Triplet Sampling Method for Multi-Label Remote Sensing Image Search and Retrieval
This repository contains the code for the following paper:
G. Sumbul, M. Ravanbakhsh, B. Demir, "Informative and Representative Triplet Selection for Multi-Label Remote Sensing Image Retrieval", arXiv preprint arXiv:2105.03647, 2021.
If you use the code in this repository in your research, please cite the following paper:
@article{Sumbul:2021,
  author = {Gencer Sumbul and Mahdyar Ravanbakhsh and Begüm Demir},
  title = {Informative and Representative Triplet Selection for Multi-Label Remote Sensing Image Retrieval},
  journal = {arXiv preprint arXiv:2105.03647},
  year = {2021},
  month = {September}
}
Introduction
Content-based image retrieval (CBIR) systems are trained to learn semantic similarities between images so that, given a query image, related images can be found in an archive. One popular way of training such systems is by feeding them triplets of images as input. The first image is called the anchor; the second image, which is similar to the anchor, is called the positive image; and the third, which is dissimilar to the anchor, is called the negative image. The model is forced to understand the relationship between these images by learning an embedding space in which the anchor and the positive image are closer to each other than the anchor and the negative image.
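The idea can be sketched with a plain NumPy implementation of the standard triplet margin loss. This is a generic illustration of the loss concept, not the training code from this repository:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet margin loss for a single triplet of embeddings.

    The loss is zero once the negative is at least `margin` farther
    from the anchor (in squared distance) than the positive is.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance anchor-positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance anchor-negative
    return max(d_pos - d_neg + margin, 0.0)

# A well-separated triplet yields zero loss ...
anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
negative = np.array([2.0, 0.0])
print(triplet_loss(anchor, positive, negative))  # 0.0

# ... while a triplet that violates the margin is penalized.
print(triplet_loss(anchor, negative, positive))  # positive loss (≈ 4.49)
```

Minimizing this loss over many triplets pulls similar images together and pushes dissimilar ones apart in the embedding space.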
Selecting Triplets
It is not obvious how triplets should be selected from an archive of images. The paper cited above explores different baselines and proposes a novel triplet selection method.
Selection of Anchors
Given a mini-batch of images, the following options are available in the code:
- Select all images as anchors.
- Select a random number of images as anchors.
- Select diverse anchors (see the paper for more details).
Selection of Positive/Negative Images
Similarly to the anchor selection, there are different strategies available to select the positive and negative images for given anchors:
- Select all possible combinations.
- Select a set of random images as positive and negative images.
- Select a set of deliberately chosen images based on certain criteria (see the paper for more details).
Running the Code
The code can be run either manually or within a prepared Docker container.
Prerequisites
The code in this repository requires Python 3.6 and TensorFlow 2.2, along with a number of packages listed in `requirements.txt`. The datasets need to be prepared beforehand, as described in the next section.
Datasets
Currently, support for BigEarthNet and the UC Merced Land Use Dataset is implemented. Take a look at `data.py` to see how the datasets are handled and how to add new datasets in the same manner.
The data is loaded from `TFRecord` files for fast processing. These have to be prepared beforehand. For BigEarthNet, follow the instructions here to create the necessary files or run the `prepare_ben.py` script:

```
python source/prepare_ben.py <IMAGE_ROOT_DIR> <OUTPUT_DIR>
```
This will use the official splits into training, test, and validation.
For UC Merced, the original dataset is annotated with single labels. A multi-label version is available here. Run the script `prepare_ucm.py` to create the needed `TFRecord` files:

```
python source/prepare_ucm.py <IMAGE_ROOT_DIR> <MULTI_LABELS_CSV> <OUTPUT_DIR>
```

where `IMAGE_ROOT_DIR` is the root directory of the extracted UC Merced images, `MULTI_LABELS_CSV` is the CSV file containing the multi-labels (note: the file extension is `.txt`, but its content is CSV), and `OUTPUT_DIR` is where the final files will be placed.
For visualization, an `HDF5` version of the datasets is used, as it allows fast sequential read access.
Model Architectures
Currently, `ResNet-50`, `DenseNet-121`, and `SCNN` are implemented. Other architectures can easily be added; check `model_ben.py` and `model_ucm.py`.
Configuration
The experiments are configured via a file, which may be either a `JSON` or a `YAML` file. The only things that cannot be configured in the config file are the GPUs to use and the amount of logging TensorFlow produces; these need to be set via environment variables.
An experiment consists of five possible phases: `train`, `extract`, `retrieve`, `evaluate`, and `visualize`. Each phase has its own set of parameters and may be enabled or disabled for a run. The visualization implementation is experimental and does not work for all datasets right now; it expects the data to be provided as an `HDF5` file for fast linear read access.
General Options
Some general options are set at the top level of the configuration:

- `output_path`: The root path where all output data from the experiments will be stored. Each phase creates its own subfolder.
- `version`: An arbitrary value that may be used to keep track of different experiments.
- `clear_old_output`: A boolean value indicating whether the output directory should be cleaned at the very start of the experiment.
- `num_classes`: The number of classes present in the dataset used in the experiments.
- `query_data_size`: The number of individual images in the query dataset.
- `archive_data_size`: The number of individual images in the archive dataset.
- `model_args`: A custom set of arguments passed to the constructor of the model that is created internally. This can be extended if custom model architectures are used, but these options are currently available:
  - `model_arch`: The model architecture, currently either `"resnet"`, `"densenet"`, or `"scnn"`.
  - `feature_size`: The embedding size of the images.
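A minimal top-level section might look like this (the values are illustrative, not defaults shipped with the repository):

```yaml
output_path: /output/experiment1
version: baseline-run
clear_old_output: false
num_classes: 43            # e.g. 43 for the original BigEarthNet label set
query_data_size: 1000
archive_data_size: 10000
model_args:
  model_arch: resnet
  feature_size: 128
```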
`train` Phase
In this phase, the embedding model is trained with triplets. The following options are available:
- `enabled`: A boolean value indicating whether this phase is enabled in the experiment.
- `epochs`: The number of training epochs to perform.
- `continue_training`: The filename of the training weights that will be restored for continued training. If empty, the training starts from scratch.
- `margin`: The margin of the triplet loss.
- `visualize_triplets`: A boolean value indicating whether triplets selected during training should be visualized in TensorBoard.
- `evaluation_period`: The frequency with which checkpoints for intermediate evaluation are written. Set to `0` to disable intermediate evaluation.
- `optimizer`: Configuration for the optimizer:
  - `name`: The name of the optimizer, e.g. `adam`. This has to match the names recognized by TensorFlow: https://www.tensorflow.org/versions/r2.1/api_docs/python/tf/keras/optimizers/get.
  - `args`: A set of additional arguments passed to the optimizer when it is created, e.g. `learning_rate`.
  - `learning_rate_decay`: Optional configuration for a decaying learning rate:
    - `enabled`: A boolean value indicating whether the decay is enabled.
    - `initial_learning_rate`: The initial learning rate that will be decayed over time.
    - `decay_steps`: The number of steps after which the learning rate is decayed.
    - `decay_rate`: The rate at which the learning rate is decayed.
- `data`: Configuration of the training data:
  - `filename`: The filename of the TFRecord files; can be a single file or a list of files.
  - `size`: The number of unique images in the training dataset.
  - `batch_size`: The size of each batch that will be processed.
  - `shuffle_size`: The size of the shuffle buffer, i.e., how many images are randomly shuffled before a new batch is selected.
  - `shuffle_seed`: The random seed for the shuffling. Only set this if you want to control the way the data is shuffled; if omitted, a random seed is used.
  - `num_parallel_calls`: The number of parallel calls used to load the dataset from file into memory.
  - `prefetch_size`: The number of batches that are prefetched from the file system.
- `checkpoints`: Configuration for saving checkpoints during training:
  - `enabled`: A boolean value indicating whether checkpoints are written during training.
  - `start`: The epoch in which checkpoints start to be written, i.e. `start: 10` means that checkpoints will be written starting in the 11th epoch.
  - `frequency`: The frequency with which checkpoints are written.
- `anchors`: Configuration for the selection of anchors:
  - `selection`: The selection strategy for the anchors, either `"exhaustive"`, `"random"`, or `"diverse"`.
  - `number`: The number of anchors that are selected; relevant for the selection strategies `"random"` and `"diverse"`.
- `triplets`: Configuration of the selection of positive and negative images:
  - `selection`: The selection strategy for the positive and negative images, either `"exhaustive"`, `"random"`, or `"smart"`.
  - `num_elements`: The number of positive and negative images to select; relevant for the selection strategies `"random"` and `"smart"`.
  - Additional configuration options for the selection strategy `"smart"`:
    - `initial_selection_strategy`: Initially, a different strategy can be used for a few epochs to ensure stability of the algorithm. This can be either `"exhaustive"` or `"random"`.
    - `initial_epochs`: The number of epochs for which the initial selection strategy is applied. Set to `0` to disable any initial strategy.
    - `iteration_frequency`: The frequency with which the model for the semantic similarity is updated.
    - `beta`: The weight parameter determining the influence of the hardness of the images.
    - `gamma`: The weight parameter determining the influence of the diversity of the images.
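As an illustration, a `train` section could look like the following. All values are examples, not recommendations, and the exact nesting of the `"smart"` options is inferred from the option list above:

```yaml
train:
  enabled: true
  epochs: 100
  continue_training: ""
  margin: 0.5
  visualize_triplets: false
  evaluation_period: 10
  optimizer:
    name: adam
    args:
      learning_rate: 0.001
  data:
    filename: /data/train.tfrecord
    size: 50000
    batch_size: 64
    shuffle_size: 1000
    num_parallel_calls: 4
    prefetch_size: 2
  checkpoints:
    enabled: true
    start: 10
    frequency: 5
  anchors:
    selection: diverse
    number: 16
  triplets:
    selection: smart
    num_elements: 4
    initial_selection_strategy: random
    initial_epochs: 5
    iteration_frequency: 2
    beta: 0.5
    gamma: 0.5
```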
`extract` Phase
In this phase, image embeddings are extracted from the trained model for later retrieval. The following options are available:
- `enabled`: A boolean value indicating whether this phase is enabled in the experiment.
- `minor_results`: A boolean value indicating whether image embeddings of intermediate checkpoints are extracted.
- `batch_size`: The size of a batch for the extraction.
- `query_filename`: The TFRecord file for the query dataset.
- `archive_filename`: The TFRecord file for the archive dataset.
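For example (illustrative values and paths, not defaults):

```yaml
extract:
  enabled: true
  minor_results: false
  batch_size: 64
  query_filename: /data/query.tfrecord
  archive_filename: /data/archive.tfrecord
```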
`retrieve` Phase
In this phase, similar images from the archive are retrieved for all query images. The following options are available:
- `enabled`: A boolean value indicating whether this phase is enabled in the experiment.
- `minor_results`: A boolean value indicating whether retrieval is applied to intermediate checkpoints.
- `query_batch_size`: The size of the batches from the query dataset.
- `archive_batch_size`: The size of the batches from the archive dataset.
- `num_retrieved_images`: The number of images retrieved per query image from the archive.
- `metric`: The distance metric used for retrieving images. The only possible values are currently `"sqrt_abs_error_sum"` and `"hamming_distance"`, but more can easily be added in `retrieve.py`.
- `num_cpus`: The number of CPU threads used for the image retrieval step.
- `used_memory`: The amount of memory in GB used during the retrieval.
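A possible `retrieve` section (again, illustrative values only):

```yaml
retrieve:
  enabled: true
  minor_results: false
  query_batch_size: 64
  archive_batch_size: 256
  num_retrieved_images: 30
  metric: sqrt_abs_error_sum
  num_cpus: 8
  used_memory: 16
```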
`evaluate` Phase
In this phase, the retrieved images are evaluated. The following options are available:
- `enabled`: A boolean value indicating whether this phase is enabled in the experiment.
- `minor_results`: A boolean value indicating whether evaluation is applied to intermediate checkpoints.
- `data_size`: The size of the archive.
- `batch_size`: The size of the batches that are evaluated.
- `num_retrieved_images`: Configuration of the retrieved images:
  - `start`: The number of retrieved images to start with.
  - `end`: The number of retrieved images at which to stop.
  - `frequency`: The frequency of the evaluation.
- `intermediate_evaluation_images`: A list of numbers of images used for the intermediate evaluation; e.g. `[10, 20, 30]` means that the performance of the intermediate checkpoints is evaluated for `10`, `20`, and `30` retrieved images.
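A possible `evaluate` section (illustrative values only):

```yaml
evaluate:
  enabled: true
  minor_results: false
  data_size: 10000
  batch_size: 64
  num_retrieved_images:
    start: 10
    end: 100
    frequency: 10
  intermediate_evaluation_images: [10, 20, 30]
```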
`visualize` Phase
In this phase, the retrieved images are visualized. The following options are available:
- `enabled`: A boolean value indicating whether this phase is enabled in the experiment.
- `hdf_file`: For BigEarthNet, the data must be given as an HDF5 file.
- `image_root_path`: For UC Merced, the root directory of the raw images must be given.
- `minor_results`: A boolean value indicating whether the images retrieved in the intermediate steps are visualized.
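A possible `visualize` section for BigEarthNet (illustrative path, not a default):

```yaml
visualize:
  enabled: true
  hdf_file: /data/bigearthnet.h5     # BigEarthNet only
  # image_root_path: /data/ucmerced  # use this instead for UC Merced
```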
Manually
Make sure the requirements in `requirements.txt` are met, for example like this:

```
pip install --upgrade pip && pip install -r requirements.txt
```

TensorFlow (2.2+) has to be installed; make sure you install it according to your environment with up-to-date drivers and libraries (note: if you have no special requirements, `pip install tensorflow==2.2.3` will do the job). Then start your experiment like this:
```
python source/experiment.py <CONFIG_FILE> <LOG_FILE>
```
To configure which GPU to use, set `CUDA_VISIBLE_DEVICES` accordingly beforehand.
With Docker
Prepare a configuration file as explained above and the datasets. Then build the Docker image once (or again after changes to the code):
```
docker build . -t triplet_image
```
Then start the experiment:
```
docker run --gpus all -p <PORT>:<PORT> \
    -v <YOUR_CONFIG_FOLDER_TO_MOUNT>:/config \
    -v <YOUR_DATA_FOLDER_TO_MOUNT>:/data \
    -v <YOUR_OUTPUT_FOLDER_TO_MOUNT>:/output \
    triplet_image:latest <CONFIG_FILE> <USED_GPUS> <TF_LOGGING_LEVEL>
```
Replace `PORT` and the mounted volume paths with your own values. The `CONFIG_FILE` parameter simply expects the name of the configuration file in the mounted configuration folder. In your configuration file, use the paths `/data` and `/output` in the respective places. Check `config/docker.yaml` for an example Docker configuration. The parameter `USED_GPUS` is passed to `CUDA_VISIBLE_DEVICES`, and `TF_LOGGING_LEVEL` controls the output of TensorFlow (`0` = all, `1` = no INFO, `2` = no WARNING, `3` = no ERROR).
Acknowledgement
The authors would like to thank Tristan Kreuziger for the initial version of the code.
License
The code in this repository is licensed under the MIT License:
MIT License
Copyright (c) 2021 The Authors of the paper "Informative and Representative Triplet
Selection for Multi-Label Remote Sensing Image Retrieval"
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.