Triplet Sampling Method for Multi-Label Remote Sensing Image Search and Retrieval

This repository contains the code for the following paper:

G. Sumbul, M. Ravanbakhsh, B. Demir, "Informative and Representative Triplet Selection for Multi-Label Remote Sensing Image Retrieval", arXiv preprint arXiv:2105.03647, 2021.

If you use the code in this repository in your research, please cite the following paper:

@article{Sumbul:2021,
    author = {Gencer Sumbul and Mahdyar Ravanbakhsh and Begüm Demir},
    title = {Informative and Representative Triplet Selection for Multi-Label Remote Sensing Image Retrieval},
    journal = {arXiv preprint arXiv:2105.03647},
    year = {2021},
    month={September}
}

Introduction

Content-based image retrieval (CBIR) systems are trained to learn semantic similarities between images so that, given a query image, they can find related images in an archive. One popular way of training such systems is by feeding them triplets of images as input. The first image is called the anchor; the second image, which is similar to the anchor, is called the positive image; and the third, which is dissimilar to the anchor, is called the negative image. The model is forced to capture the relationship between these images by learning an embedding space in which the anchor and the positive image are closer to each other than the anchor and the negative image.
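
To make this concrete, here is a minimal sketch of the standard margin-based triplet loss in TensorFlow. The margin corresponds to the margin option of the train phase described below; this is the textbook formulation, not necessarily the exact implementation in this repository:

import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Squared Euclidean distances in the learned embedding space.
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    # The loss is zero once the negative is farther from the anchor
    # than the positive by at least the margin.
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))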

Selecting Triplets

It is not clear how triplets should best be selected from an archive of images. The paper cited above explores different baselines and proposes a novel triplet selection method.

Selection of Anchors

Given a mini-batch of images, the following options are available in the code:

  • Select all images as anchors.
  • Select a fixed number of randomly chosen images as anchors.
  • Select diverse anchors (see the paper for more details).

Selection of Positive/Negative Images

As with the anchor selection, different strategies are available for selecting the positive and negative images for given anchors:

  • Select all possible combinations.
  • Select a set of random images as positive and negative images.
  • Select a set of deliberately chosen images based on certain criteria (see the paper for more details).
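
As a rough illustration of the random strategy, the sketch below samples positive and negative candidates for a single anchor from a mini-batch. It assumes a simple multi-label criterion (sharing at least one label with the anchor makes an image a positive candidate, sharing none makes it a negative one); the criteria actually used in the paper, especially for the deliberate selection, are more involved:

import numpy as np

def sample_triplets(labels, anchor_idx, num_elements, rng=None):
    # labels: (batch_size, num_classes) binary multi-label matrix.
    # Assumed criterion: an image sharing at least one label with the
    # anchor is a positive candidate, one sharing none is a negative.
    rng = rng or np.random.default_rng()
    shared = labels @ labels[anchor_idx]             # shared-label counts
    others = np.arange(len(labels)) != anchor_idx    # exclude the anchor
    positives = np.flatnonzero(others & (shared > 0))
    negatives = np.flatnonzero(others & (shared == 0))
    # Randomly pick up to num_elements candidates from each pool.
    pos = rng.permutation(positives)[:num_elements]
    neg = rng.permutation(negatives)[:num_elements]
    return pos, neg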

Running the Code

The code can be run either manually or within a prepared Docker container.

Prerequisites

The code in this repository requires Python 3.6 and TensorFlow 2.2, along with a number of packages listed in requirements.txt. The datasets need to be prepared beforehand as described in the next section.

Datasets

Currently, support for BigEarthNet and the UC Merced Land Use Dataset is implemented. Take a look at data.py to see how the datasets are handled and how to add new datasets in the same manner.

The data is loaded from TFRecord files for fast processing. These have to be prepared beforehand. For BigEarthNet, follow the instructions here to create the necessary files or run the prepare_ben.py script:

python source/prepare_ben.py <IMAGE_ROOT_DIR> <OUTPUT_DIR>

This will use the official splits into training, test, and validation.

For UC Merced, the original dataset is annotated with single labels. A multi-label version is available here. Run the script prepare_ucm.py to create the needed TFRecord files:

python source/prepare_ucm.py <IMAGE_ROOT_DIR> <MULTI_LABELS_CSV> <OUTPUT_DIR>

where IMAGE_ROOT_DIR is the root directory of the extracted UC Merced images, MULTI_LABELS_CSV is the CSV file containing the multi-labels (note: the file extension is .txt, but its content is CSV), and OUTPUT_DIR is where the final files will be placed.

For visualization, an HDF5 version of the datasets is used, as it allows fast sequential read access.

Model Architectures

Currently, ResNet-50, DenseNet-121, and SCNN are implemented. Other architectures can easily be added; check model_ben.py and model_ucm.py.

Configuration

The experiments are configured via a file, which may be either a JSON or a YAML file. The only things that cannot be configured in the config file are the GPUs to use and the amount of logging TensorFlow produces; those need to be set via environment variables.

An experiment consists of five possible phases: train, extract, retrieve, evaluate, and visualize. Each phase has its own set of parameters and may be enabled or disabled for a run. The visualize phase is experimental and does not yet work for all datasets; it expects the data to be provided as an HDF5 file for fast linear read access.

General Options

Some general options are set at the top-level of the configuration:

  • output_path: The root path where all output data from the experiments will be stored. Each phase creates its own subfolder.
  • version: An arbitrary value that may be used to keep track of different experiments.
  • clear_old_output: A boolean value to indicate whether the output directory should be cleaned at the very start of the experiment.
  • num_classes: The number of classes present in the dataset that is used in these experiments.
  • query_data_size: The number of individual images in the query dataset.
  • archive_data_size: The number of individual images in the archive dataset.
  • model_args: A custom set of arguments that is passed to the constructor of the model that is created internally. This can be extended, if custom model architectures are used, but these options are currently available:
    • model_arch: The model architecture, currently either "resnet", "densenet", or "scnn".
    • feature_size: The embedding size of the images.
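
Putting these options together, the top level of a configuration file might look like the following sketch. The nesting follows the list above; all values are illustrative assumptions, not defaults from the repository, and each phase adds its own section (see below):

# Hypothetical top-level options; all values are illustrative only.
output_path: /output/experiment_1
version: "run-001"
clear_old_output: false
num_classes: 17
query_data_size: 1000
archive_data_size: 10000
model_args:
  model_arch: resnet        # "resnet", "densenet", or "scnn"
  feature_size: 128         # embedding size of the images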

The train Phase

In this phase, the embedding model is trained with triplets. The following options are available:

  • enabled: A boolean value to indicate if this phase is enabled in the experiment.
  • epochs: The number of training epochs that are performed.
  • continue_training: The filename of the training weights that will be restored for continued training. If empty, the training will start from scratch.
  • margin: The margin of the triplet loss.
  • visualize_triplets: A boolean value indicating whether or not triplets that are selected during the training should be visualized in TensorBoard.
  • evaluation_period: The frequency with which checkpoints for intermediate evaluation are written. Set to 0 to disable intermediate evaluation.
  • optimizer: Configuration for the used optimizer:
    • name: The name of the used optimizer, e.g. adam. This has to match the names recognized by TensorFlow: https://www.tensorflow.org/versions/r2.1/api_docs/python/tf/keras/optimizers/get.
    • args: A set of additional arguments that are passed to the optimizer when it is created, e.g. learning_rate.
    • learning_rate_decay: Optional configuration for a decaying learning rate:
      • enabled: A boolean value indicating whether or not the decay is enabled.
      • initial_learning_rate: The initial learning rate that will be decayed over time.
      • decay_steps: The number of steps after which the learning rate is decayed.
      • decay_rate: The rate at which the learning rate is decayed.
  • data: Configuration for the used training data:
    • filename: The filename of the TFRecord files to use; this can be a single file or a list of files.
    • size: The number of unique images in the training dataset.
    • batch_size: The size of each batch that will be processed.
    • shuffle_size: The size of the shuffle buffer, i.e., how many images should be randomly shuffled before a new batch is selected.
    • shuffle_seed: The random seed for the shuffling. Only set this if you want to control the way the data is shuffled; if omitted, a random seed is used.
    • num_parallel_calls: The number of parallel calls to load the dataset from file into the memory.
    • prefetch_size: The number of batches that are prefetched from the file system.
  • checkpoints: Configuration for the saving of checkpoints during training:
    • enabled: A boolean value indicating whether or not checkpoints are written during training.
    • start: The epoch in which checkpoints start to be written, i.e. start: 10 means that checkpoints will be written starting in the 11th epoch.
    • frequency: The frequency with which checkpoints are written.
  • anchors: Configuration for the selection of anchors:
    • selection: The selection strategy for the anchors, either "exhaustive", "random", or "diverse".
    • number: The number of anchors that are selected, relevant for the selection strategies "random" and "diverse".
  • triplets: Configuration of the selection of positive and negative images:
    • selection: The selection strategy for the positive and negative images, either "exhaustive", "random", or "smart".
    • num_elements: The number of positive and negative images to select, relevant for the selection strategies "random" and "smart".
    • Additional configuration options for selection strategy "smart":
      • initial_selection_strategy: Initially, a different strategy can be used for a few epochs to stabilize the algorithm. This can be either "exhaustive" or "random".
      • initial_epochs: The number of epochs for which the initial selection strategy is applied. Set to 0 to disable any initial strategy.
      • iteration_frequency: The frequency with which the model for the semantic similarity is updated.
      • beta: The weight parameter to determine the influence of the hardness of the images.
      • gamma: The weight parameter to determine the influence of the diversity of the images.
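
For reference, a hypothetical train section assembled from the options above might look as follows. The section name, nesting, and all values are illustrative assumptions, not defaults taken from the repository (check config/docker.yaml for an actual example configuration):

train:
  enabled: true
  epochs: 100
  continue_training: ""            # empty: start training from scratch
  margin: 1.0
  visualize_triplets: false
  evaluation_period: 10            # 0 disables intermediate evaluation
  optimizer:
    name: adam
    args:
      learning_rate: 0.001
    learning_rate_decay:
      enabled: false
      initial_learning_rate: 0.001
      decay_steps: 1000
      decay_rate: 0.9
  data:
    filename: /data/train.tfrecord
    size: 10000
    batch_size: 128
    shuffle_size: 5000
    num_parallel_calls: 4
    prefetch_size: 2
  checkpoints:
    enabled: true
    start: 10
    frequency: 5
  anchors:
    selection: diverse             # "exhaustive", "random", or "diverse"
    number: 16
  triplets:
    selection: smart               # "exhaustive", "random", or "smart"
    num_elements: 4
    initial_selection_strategy: random
    initial_epochs: 5
    iteration_frequency: 5
    beta: 0.5
    gamma: 0.5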

The extract Phase

In this phase, image embeddings are extracted from the trained model for later retrieval. The following options are available:

  • enabled: A boolean value to indicate if this phase is enabled in the experiment.
  • minor_results: A boolean value to indicate if image embeddings of intermediate checkpoints are extracted.
  • batch_size: The size of a batch for the extraction.
  • query_filename: The TFRecord file for the query dataset.
  • archive_filename: The TFRecord file for the archive dataset.
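
A hypothetical extract section might look like this (again, the section name and values are illustrative assumptions):

extract:
  enabled: true
  minor_results: false
  batch_size: 256
  query_filename: /data/query.tfrecord
  archive_filename: /data/archive.tfrecord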

The retrieve Phase

In this phase, similar images from the archive are retrieved for all query images. The following options are available:

  • enabled: A boolean value to indicate if this phase is enabled in the experiment.
  • minor_results: A boolean value to indicate if retrieval is applied for intermediate checkpoints.
  • query_batch_size: The size of the batches from the query dataset.
  • archive_batch_size: The size of the batches from the archive dataset.
  • num_retrieved_images: The number of images that are retrieved per query image from the archive.
  • metric: The distance metric that is used for retrieving images. The only possible values are currently "sqrt_abs_error_sum" and "hamming_distance", but more can easily be added in retrieve.py.
  • num_cpus: The number of CPU threads that are used for the image retrieval step.
  • used_memory: The amount of memory in GB that is used during the retrieval.
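
A hypothetical retrieve section might look like this (section name and values are illustrative assumptions):

retrieve:
  enabled: true
  minor_results: false
  query_batch_size: 128
  archive_batch_size: 512
  num_retrieved_images: 100
  metric: sqrt_abs_error_sum       # or "hamming_distance"
  num_cpus: 8
  used_memory: 16                  # in GB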

The evaluate Phase

In this phase, the retrieved images are evaluated. The following options are available:

  • enabled: A boolean value to indicate if this phase is enabled in the experiment.
  • minor_results: A boolean value to indicate if evaluation is applied for intermediate checkpoints.
  • data_size: The size of the archive.
  • batch_size: The size of the batches that are evaluated.
  • num_retrieved_images: Configuration of the retrieved images:
    • start: The number of retrieved images at which the evaluation starts.
    • end: The number of retrieved images at which the evaluation stops.
    • frequency: The step size between the evaluated numbers of retrieved images.
  • intermediate_evaluation_images: A list of the numbers of retrieved images used for the intermediate evaluation; e.g. [10, 20, 30] means that the performance of intermediate checkpoints is evaluated for 10, 20, and 30 retrieved images.
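
A hypothetical evaluate section might look like this (section name and values are illustrative assumptions; the num_retrieved_images range is read as described above):

evaluate:
  enabled: true
  minor_results: false
  data_size: 10000
  batch_size: 256
  num_retrieved_images:
    start: 10                      # evaluate for 10, 20, ..., 100 images
    end: 100
    frequency: 10
  intermediate_evaluation_images: [10, 20, 30]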

The visualize Phase

In this phase, the retrieved images are visualized. The following options are available:

  • enabled: A boolean value to indicate if this phase is enabled in the experiment.
  • hdf_file: For BigEarthNet, the data must be given as an HDF5 file.
  • image_root_path: For UCMerced, the root directory of the raw images must be given.
  • minor_results: A boolean value to indicate if the images retrieved at intermediate checkpoints are visualized.
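
A hypothetical visualize section might look like this (section name, paths, and values are illustrative assumptions; only one of hdf_file and image_root_path is relevant, depending on the dataset):

visualize:
  enabled: true
  minor_results: false
  hdf_file: /data/bigearthnet.h5         # BigEarthNet: data as HDF5
  # image_root_path: /data/ucm_images    # UC Merced: raw image root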

Manually

Make sure the requirements in requirements.txt are met, for example:

pip install --upgrade pip && pip install -r requirements.txt

TensorFlow (2.2+) has to be installed; make sure you install it according to your environment with up-to-date drivers and libraries (note: if you have no special requirements, pip install tensorflow==2.2.3 will do the job). Then start your experiment like this:

python source/experiment.py <CONFIG_FILE> <LOG_FILE>

For configuring which GPU to use, set CUDA_VISIBLE_DEVICES accordingly beforehand.
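
For example, to run the experiment on the first GPU only (the configuration and log file paths below are placeholders):

CUDA_VISIBLE_DEVICES=0 python source/experiment.py config/experiment.yaml logs/experiment.log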

With Docker

Prepare the datasets and a configuration file as explained above. Then build the Docker image once (or again after changes to the code):

docker build . -t triplet_image

Then start the experiment:

docker run --gpus all -p <PORT>:<PORT> \
  -v <YOUR_CONFIG_FOLDER_TO_MOUNT>:/config \
  -v <YOUR_DATA_FOLDER_TO_MOUNT>:/data \
  -v <YOUR_OUTPUT_FOLDER_TO_MOUNT>:/output \
  triplet_image:latest <CONFIG_FILE> <USED_GPUS> <TF_LOGGING_LEVEL>

Replace PORT and the mounted volume paths with your own values. The CONFIG_FILE parameter simply expects the name of the configuration file in the mounted configuration folder. In your configuration file, use the paths /data and /output in the respective places. Check config/docker.yaml for an example Docker configuration. The parameter USED_GPUS is given to CUDA_VISIBLE_DEVICES and TF_LOGGING_LEVEL controls the output of TensorFlow (0 = ALL, 1 = no INFO, 2 = no WARNING, 3 = no ERROR).
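
As a concrete, hypothetical invocation using the sample configuration config/docker.yaml, the first GPU, and a TF_LOGGING_LEVEL of 2 (the host paths and the port are placeholders; the forwarded port is only needed if you expose a service such as TensorBoard from the container):

docker run --gpus all -p 6006:6006 \
  -v $(pwd)/config:/config \
  -v $(pwd)/data:/data \
  -v $(pwd)/output:/output \
  triplet_image:latest docker.yaml 0 2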

Authors

Gencer Sumbul

Tristan Kreuziger

Mahdyar Ravanbakhsh

Acknowledgement

The authors would like to thank Tristan Kreuziger for the initial version of the code.

License

The code in this repository is licensed under the MIT License:

MIT License

Copyright (c) 2021 The Authors of the paper "Informative and Representative Triplet 
Selection for Multi-Label Remote Sensing Image Retrieval"

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.