Commit ca69f50d authored by arnedewall

initial commit
MIT License
Copyright (c) 2019 The BigEarthNet Authors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Deep Learning Models for BigEarthNet-MM with 19 Classes
This repository contains code to use the [BigEarthNet-MM](http://bigearth.net) archive with the nomenclature of 19 classes for deep learning applications. The nomenclature of 19 classes was defined by interpreting and arranging the CORINE Land Cover (CLC) Level-3 nomenclature based on the properties of Sentinel-2 images. This class nomenclature is the product of a collaboration between the [Direção-Geral do Território](http://www.dgterritorio.pt/) in Lisbon, Portugal and the [Remote Sensing Image Analysis (RSiM)](https://www.rsim.tu-berlin.de/) group at TU Berlin, Germany.
A paper describing the creation of the nomenclature of 19 classes is currently under review and will be referenced here in the future.
## Pre-trained Deep Learning Models
We provide code and model weights for the following deep learning models that have been pre-trained on BigEarthNet-MM with the nomenclature of 19 classes for scene classification:
| Model Names | Pre-Trained TensorFlow Models |
| ------------ | ------------------------------------------------------------ |
| K-Branch CNN | [K-BranchCNN.zip](http://bigearth.net/static/pretrained-models/BigEarthNet-MM_19-Classes/K-BranchCNN.zip) |
| VGG16 | [VGG16.zip](http://bigearth.net/static/pretrained-models/BigEarthNet-MM_19-Classes/VGG16.zip) |
| VGG19 | [VGG19.zip](http://bigearth.net/static/pretrained-models/BigEarthNet-MM_19-Classes/VGG19.zip) |
| ResNet50 | [ResNet50.zip](http://bigearth.net/static/pretrained-models/BigEarthNet-MM_19-Classes/ResNet50.zip) |
| ResNet101 | [ResNet101.zip](http://bigearth.net/static/pretrained-models/BigEarthNet-MM_19-Classes/ResNet101.zip) |
| ResNet152 | [ResNet152.zip](http://bigearth.net/static/pretrained-models/BigEarthNet-MM_19-Classes/ResNet152.zip) |
The TensorFlow code for these models can be found [here](https://gitlab.tu-berlin.de/rsim/bigearthnet-models-tf).
Pre-trained models for other deep learning libraries will be released soon.
## Generation of Training/Test/Validation Splits
After downloading the raw images from http://bigearth.net, you need to prepare them for your ML application. We provide the script `prep_splits_19_classes.py` for this purpose. It generates consumable data files (i.e., TFRecord files) for the training, validation, and test splits, suitable for use with TensorFlow. Suggested splits are provided as CSV files in the `splits` folder. The following command-line arguments for `prep_splits_19_classes.py` can be specified:
* `-r1` or `--root_folder_s1`: The root folder containing the raw images of the downloaded BigEarthNet-S1 dataset.
* `-r2` or `--root_folder_s2`: The root folder containing the raw images of the downloaded BigEarthNet-S2 dataset.
* `-o` or `--out_folder`: The output folder where the resulting files will be created.
* `-n` or `--splits`: A list of CSV files, each of which contains the patch names of the corresponding split.
* `-l` or `--library`: A flag to indicate for which ML library data files will be prepared (currently only TensorFlow).
* `--update_json`: A flag to indicate that the script should also update the original JSON label files of the BigEarthNet-MM patches by adding the BigEarthNet-19 labels.
To run the script, either the GDAL or the rasterio package must be installed, along with the TensorFlow v1 package. The script has been tested with Python 3.6, TensorFlow 1.15, and Ubuntu 16.04.
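For example, assuming the BigEarthNet-S1 and BigEarthNet-S2 archives were extracted to the (hypothetical) paths below, and that the split CSV files are named `train.csv`, `val.csv`, and `test.csv`, the TFRecord files could be generated with a command along these lines:

```shell
python prep_splits_19_classes.py \
    -r1 /path/to/BigEarthNet-S1 \
    -r2 /path/to/BigEarthNet-S2 \
    -o ./tfrecords \
    -n splits/train.csv splits/val.csv splits/test.csv \
    -l tensorflow
```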
**Note**: BigEarthNet-MM patches with a high density of snow, cloud, and cloud shadow are not included in the training, test, and validation sets constructed by the provided scripts (see the list of patches with seasonal snow [here](http://bigearth.net/static/documents/patches_with_seasonal_snow.csv) and the list of patches with cloud and cloud shadow [here](http://bigearth.net/static/documents/patches_with_cloud_and_shadow.csv)).
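The provided split files already exclude these patches, but if you assemble custom splits you would need to filter them out yourself against the two CSV files linked above. A minimal sketch (the helper names are our own, not part of the repository):

```python
import csv

def load_patch_names(csv_path):
    """Return the set of patch names listed in the first column of a CSV file."""
    with open(csv_path) as fp:
        return {row[0] for row in csv.reader(fp) if row}

def filter_split(patch_names, exclusion_files):
    """Drop every patch that appears in any of the exclusion CSV files
    (e.g. the seasonal-snow and cloud/shadow lists)."""
    excluded = set()
    for path in exclusion_files:
        excluded |= load_patch_names(path)
    return [p for p in patch_names if p not in excluded]
```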
## License
The BigEarthNet Archive is licensed under the **Community Data License Agreement – Permissive, Version 1.0** ([Text](https://cdla.io/permissive-1-0/)).
The code in this repository to facilitate the use of the archive is licensed under the **MIT License**:
```
MIT License
Copyright (c) 2021 The BigEarthNet Authors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
{
  "original_labels": {
    "Continuous urban fabric": 0,
    "Discontinuous urban fabric": 1,
    "Industrial or commercial units": 2,
    "Road and rail networks and associated land": 3,
    "Port areas": 4,
    "Airports": 5,
    "Mineral extraction sites": 6,
    "Dump sites": 7,
    "Construction sites": 8,
    "Green urban areas": 9,
    "Sport and leisure facilities": 10,
    "Non-irrigated arable land": 11,
    "Permanently irrigated land": 12,
    "Rice fields": 13,
    "Vineyards": 14,
    "Fruit trees and berry plantations": 15,
    "Olive groves": 16,
    "Pastures": 17,
    "Annual crops associated with permanent crops": 18,
    "Complex cultivation patterns": 19,
    "Land principally occupied by agriculture, with significant areas of natural vegetation": 20,
    "Agro-forestry areas": 21,
    "Broad-leaved forest": 22,
    "Coniferous forest": 23,
    "Mixed forest": 24,
    "Natural grassland": 25,
    "Moors and heathland": 26,
    "Sclerophyllous vegetation": 27,
    "Transitional woodland/shrub": 28,
    "Beaches, dunes, sands": 29,
    "Bare rock": 30,
    "Sparsely vegetated areas": 31,
    "Burnt areas": 32,
    "Inland marshes": 33,
    "Peatbogs": 34,
    "Salt marshes": 35,
    "Salines": 36,
    "Intertidal flats": 37,
    "Water courses": 38,
    "Water bodies": 39,
    "Coastal lagoons": 40,
    "Estuaries": 41,
    "Sea and ocean": 42
  },
  "label_conversion": [
    [0, 1],
    [2],
    [11, 12, 13],
    [14, 15, 16, 18],
    [17],
    [19],
    [20],
    [21],
    [22],
    [23],
    [24],
    [25, 31],
    [26, 27],
    [28],
    [29],
    [33, 34],
    [35, 36],
    [38, 39],
    [40, 41, 42]
  ],
  "BigEarthNet-19_labels": {
    "Urban fabric": 0,
    "Industrial or commercial units": 1,
    "Arable land": 2,
    "Permanent crops": 3,
    "Pastures": 4,
    "Complex cultivation patterns": 5,
    "Land principally occupied by agriculture, with significant areas of natural vegetation": 6,
    "Agro-forestry areas": 7,
    "Broad-leaved forest": 8,
    "Coniferous forest": 9,
    "Mixed forest": 10,
    "Natural grassland and sparsely vegetated areas": 11,
    "Moors, heathland and sclerophyllous vegetation": 12,
    "Transitional woodland, shrub": 13,
    "Beaches, dunes, sands": 14,
    "Inland wetlands": 15,
    "Coastal wetlands": 16,
    "Inland waters": 17,
    "Marine waters": 18
  }
}
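To illustrate how `label_conversion` is applied: entry *i* lists the original CLC Level-3 label indices that are merged into BigEarthNet-19 class *i*, and original indices that appear in no entry (3-10, 30, 32, 37) have no 19-class equivalent and are dropped. A minimal sketch of the conversion for one patch's multi-hot label vector (the function name is our own):

```python
import numpy as np

# label_conversion from label_indices.json: entry i lists the original
# label indices that are merged into BigEarthNet-19 class i
label_conversion = [
    [0, 1], [2], [11, 12, 13], [14, 15, 16, 18], [17], [19], [20], [21],
    [22], [23], [24], [25, 31], [26, 27], [28], [29], [33, 34], [35, 36],
    [38, 39], [40, 41, 42],
]

def convert_multi_hot(original_multi_hot):
    """Convert a 43-dim original multi-hot vector into the 19-class one:
    a 19-class bit is set if any of its constituent original bits is set."""
    original = np.asarray(original_multi_hot)
    return np.array([int(original[idxs].sum() > 0) for idxs in label_conversion])
```

For example, a patch labeled "Continuous urban fabric" (original index 0) and "Sea and ocean" (index 42) maps to "Urban fabric" (class 0) and "Marine waters" (class 18).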
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# This script creates splits with TFRecord files from 1) BigEarthNet
# image patches based on csv files that contain patch names and
# 2) the new class nomenclature (BigEarthNet-19)
#
# prep_splits_19_classes.py --help can be used to learn how to use this script.
#
# Date: 16 Jan 2020
# Version: 1.0.1
# Usage: prep_splits_19_classes.py [-h] [-r1 ROOT_FOLDER_S1] [-r2 ROOT_FOLDER_S2]
#                                  [-o OUT_FOLDER] [--update_json]
#                                  [-n SPLITS [SPLITS ...]] [-l LIBRARY]

from __future__ import print_function
import argparse
import os
import csv
import json

from tensorflow_utils import prep_tf_record_files

GDAL_EXISTED = False
RASTERIO_EXISTED = False
UPDATE_JSON = False

with open('label_indices.json', 'r') as f:
    label_indices = json.load(f)
if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='This script creates TFRecord files for the BigEarthNet '
                    'train, validation and test splits')
    parser.add_argument('-r1', '--root_folder_s1', dest='root_folder_s1',
                        help='root folder path containing the BigEarthNet-S1 patch folders')
    parser.add_argument('-r2', '--root_folder_s2', dest='root_folder_s2',
                        help='root folder path containing the BigEarthNet-S2 patch folders')
    parser.add_argument('-o', '--out_folder', dest='out_folder',
                        help='folder path for the resulting TFRecord files')
    parser.add_argument('--update_json', default=False, action="store_true",
                        help='flag for adding BigEarthNet-19 labels to the json file of each patch')
    parser.add_argument('-n', '--splits', dest='splits', nargs='+',
                        help='csv files each of which contains a list of patch names; '
                             'patches with snow, clouds, and shadows are already excluded')
    parser.add_argument('-l', '--library', type=str, dest='library',
                        choices=['tensorflow'], default='tensorflow',
                        help='ML library for which data files will be prepared')
    args = parser.parse_args()

    # Checks the existence of the patch root folders
    if args.root_folder_s1 and args.root_folder_s2:
        if not os.path.exists(args.root_folder_s1):
            print('ERROR: folder', args.root_folder_s1, 'does not exist')
            exit()
        if not os.path.exists(args.root_folder_s2):
            print('ERROR: folder', args.root_folder_s2, 'does not exist')
            exit()
    else:
        print('ERROR: both -r1 and -r2 root folders must be specified')
        exit()

    # Checks the existence of required python packages
    try:
        import gdal
        GDAL_EXISTED = True
        print('INFO: GDAL package will be used to read GeoTIFF files')
    except ImportError:
        try:
            import rasterio
            RASTERIO_EXISTED = True
            print('INFO: rasterio package will be used to read GeoTIFF files')
        except ImportError:
            print('ERROR: please install either GDAL or rasterio package to read GeoTIFF files')
            exit()
    try:
        import numpy as np
    except ImportError:
        print('ERROR: please install numpy package')
        exit()

    patch_names_list = []
    split_names = []
    if args.splits:
        try:
            for csv_file in args.splits:
                patch_names_list.append([])
                split_names.append(os.path.basename(csv_file).split('.')[0])
                with open(csv_file, 'r') as fp:
                    csv_reader = csv.reader(fp, delimiter=',')
                    for row in csv_reader:
                        patch_names_list[-1].append(row)
        except Exception:
            print('ERROR: some csv files either do not exist or have been corrupted')
            exit()

    if args.update_json:
        UPDATE_JSON = True

    if args.library == 'tensorflow':
        try:
            import tensorflow as tf
        except ImportError:
            print('ERROR: please install tensorflow package to create TFRecord files')
            exit()
        prep_tf_record_files(
            args.root_folder_s1,
            args.root_folder_s2,
            args.out_folder,
            split_names,
            patch_names_list,
            label_indices,
            GDAL_EXISTED,
            RASTERIO_EXISTED,
            UPDATE_JSON
        )
tensorflow==1.15
This folder contains the suggested training, validation, and test splits. Each CSV file lists the patch names of the corresponding split. These files must be passed to the `prep_splits_19_classes.py` script via the `-n` option.
import tensorflow as tf
import numpy as np
import os
import json
# SAR band names to read related GeoTIFF files
band_names_s1 = ["VV", "VH"]
# Spectral band names to read related GeoTIFF files
band_names_s2 = ['B01', 'B02', 'B03', 'B04', 'B05',
                 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12']
def prep_example(bands, BigEarthNet_19_labels, BigEarthNet_19_labels_multi_hot,
                 patch_name_s1, patch_name_s2):
    return tf.train.Example(
        features=tf.train.Features(
            feature={
                'B01': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B01']))),
                'B02': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B02']))),
                'B03': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B03']))),
                'B04': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B04']))),
                'B05': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B05']))),
                'B06': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B06']))),
                'B07': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B07']))),
                'B08': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B08']))),
                'B8A': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B8A']))),
                'B09': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B09']))),
                'B11': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B11']))),
                'B12': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B12']))),
                'VV': tf.train.Feature(
                    float_list=tf.train.FloatList(value=np.ravel(bands['VV']))),
                'VH': tf.train.Feature(
                    float_list=tf.train.FloatList(value=np.ravel(bands['VH']))),
                'BigEarthNet-19_labels': tf.train.Feature(
                    bytes_list=tf.train.BytesList(
                        value=[i.encode('utf-8') for i in BigEarthNet_19_labels])),
                'BigEarthNet-19_labels_multi_hot': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=BigEarthNet_19_labels_multi_hot)),
                'patch_name_s1': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[patch_name_s1.encode('utf-8')])),
                'patch_name_s2': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[patch_name_s2.encode('utf-8')]))
            }))
def create_split(root_folder_s1, root_folder_s2, patch_names, TFRecord_writer,
                 label_indices, GDAL_EXISTED, RASTERIO_EXISTED, UPDATE_JSON):
    label_conversion = label_indices['label_conversion']
    BigEarthNet_19_label_idx = {v: k for k, v in label_indices['BigEarthNet-19_labels'].items()}

    if GDAL_EXISTED:
        import gdal
    elif RASTERIO_EXISTED:
        import rasterio

    progress_bar = tf.contrib.keras.utils.Progbar(target=len(patch_names))
    for patch_idx, patch_name in enumerate(patch_names):
        # Each csv row is an (S2 patch name, S1 patch name) pair
        patch_name_s1, patch_name_s2 = patch_name[1], patch_name[0]
        patch_folder_path_s1 = os.path.join(root_folder_s1, patch_name_s1)
        patch_folder_path_s2 = os.path.join(root_folder_s2, patch_name_s2)
        bands = {}
        for band_name in band_names_s1:
            # First finds the related GeoTIFF path and reads values as an array
            band_path = os.path.join(
                patch_folder_path_s1, patch_name_s1 + '_' + band_name + '.tif')
            if GDAL_EXISTED:
                band_ds = gdal.Open(band_path, gdal.GA_ReadOnly)
                raster_band = band_ds.GetRasterBand(1)
                band_data = raster_band.ReadAsArray()
                bands[band_name] = np.array(band_data)
            elif RASTERIO_EXISTED:
                band_ds = rasterio.open(band_path)
                band_data = np.array(band_ds.read(1))
                bands[band_name] = np.array(band_data)
        for band_name in band_names_s2:
            # First finds the related GeoTIFF path and reads values as an array
            band_path = os.path.join(
                patch_folder_path_s2, patch_name_s2 + '_' + band_name + '.tif')
            if GDAL_EXISTED:
                band_ds = gdal.Open(band_path, gdal.GA_ReadOnly)
                raster_band = band_ds.GetRasterBand(1)
                band_data = raster_band.ReadAsArray()
                bands[band_name] = np.array(band_data)
            elif RASTERIO_EXISTED:
                band_ds = rasterio.open(band_path)
                band_data = np.array(band_ds.read(1))
                bands[band_name] = np.array(band_data)

        original_labels_multi_hot = np.zeros(
            len(label_indices['original_labels'].keys()), dtype=int)
        BigEarthNet_19_labels_multi_hot = np.zeros(len(label_conversion), dtype=int)
        # The labels metadata json is read from the Sentinel-1 patch folder
        patch_json_path = os.path.join(
            patch_folder_path_s1, patch_name_s1 + '_labels_metadata.json')
        with open(patch_json_path, 'r') as f:
            patch_json = json.load(f)
        original_labels = patch_json['labels']
        for label in original_labels:
            original_labels_multi_hot[label_indices['original_labels'][label]] = 1
        for i in range(len(label_conversion)):
            BigEarthNet_19_labels_multi_hot[i] = (
                np.sum(original_labels_multi_hot[label_conversion[i]]) > 0
            ).astype(int)
        BigEarthNet_19_labels = []
        for i in np.where(BigEarthNet_19_labels_multi_hot == 1)[0]:
            BigEarthNet_19_labels.append(BigEarthNet_19_label_idx[i])
        if UPDATE_JSON:
            patch_json['BigEarthNet_19_labels'] = BigEarthNet_19_labels
            # json.dump needs a text-mode file handle in Python 3
            with open(patch_json_path, 'w') as f:
                json.dump(patch_json, f)

        example = prep_example(
            bands,
            BigEarthNet_19_labels,
            BigEarthNet_19_labels_multi_hot,
            patch_name_s1,
            patch_name_s2
        )
        TFRecord_writer.write(example.SerializeToString())
        progress_bar.update(patch_idx)
def prep_tf_record_files(root_folder_s1, root_folder_s2, out_folder, split_names,
                         patch_names_list, label_indices, GDAL_EXISTED,
                         RASTERIO_EXISTED, UPDATE_JSON):
    try:
        writer_list = []
        for split_name in split_names:
            writer_list.append(
                tf.python_io.TFRecordWriter(os.path.join(
                    out_folder, split_name + '.tfrecord'))
            )
    except Exception:
        print('ERROR: TFRecord writer is not able to write files')
        exit()

    for split_idx in range(len(patch_names_list)):
        print('INFO: creating the', split_names[split_idx], 'split')
        create_split(
            root_folder_s1,
            root_folder_s2,
            patch_names_list[split_idx],
            writer_list[split_idx],
            label_indices,
            GDAL_EXISTED,
            RASTERIO_EXISTED,
            UPDATE_JSON
        )
        writer_list[split_idx].close()