`mksquashfs` process terminates unexpectedly when running `squash-dataset`
I tried squashing a larger dataset with `squash-dataset`. The operation terminated unexpectedly, with the `mksquashfs` process being killed.
No alarm - this is just a squashed crash-test dummy. I need only a small subset of the data, and that subset can be squashed without problems.
The issue is only raised to document unexpected behavior.
Command run:
srun --partition=cpu-2d --pty bash
squash-dataset /home/space/datasets/google-landmark-v2/train/ /home/space/datasets-sqfs/google-landmark-v2-train.sqfs
Output:
Parallel mksquashfs: Using 64 processors
Creating 4.0 filesystem on /home/space/datasets-sqfs/google-landmark-v2-train.sqfs, block size 131072.
[===- ] 401976/5965198 6%
/usr/local/bin/squash-dataset: line 9: 2777831 Killed mksquashfs $SOURCE_DIR $TARGET_FILE -all-root -action 'chmod(o+rX)@!perm(o+rX)'
Expected behavior: The dataset should be successfully squashed without crashing.
Actual behavior: The process is unexpectedly "Killed". A `.sqfs` file is created, but it is incomplete.
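To narrow down what "Killed" means here, a small sketch that decodes the exit status of the command. It is not part of `squash-dataset`; `describe_exit` is a hypothetical helper name. Shells report a status above 128 when a process was terminated by a signal, and the "Killed" message corresponds to signal 9 (SIGKILL). What sent the signal (for example a memory limit on the job) is not something I have confirmed.

```shell
# Hypothetical helper: interpret a shell exit status.
# A status above 128 means "terminated by signal (status - 128)".
describe_exit() {
  if [ "$1" -gt 128 ]; then
    echo "killed by signal $(($1 - 128))"
  else
    echo "exited with status $1"
  fi
}
```

Calling `describe_exit $?` directly after the `squash-dataset` command would show whether the run ended with a signal; `describe_exit 137` prints `killed by signal 9`.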
Additional information:
- The dataset has over 5M images and is over 500GB large.
- `squash-dataset` works perfectly fine for other datasets (biggest difference being the size).
- The command runs >3 hours until it crashes (on a `cpu-2d` partition).
- The incomplete `.sqfs` file is 34GB large.
- There are two other datasets that are >100GB in `/home/space/datasets-sqfs/`, but they have been created before Hydra.
Questions:
- Is `squashfs`, copying to `/tmp/`, and binding via `apptainer -B` the right approach for such a dataset, or are there better approaches?
- Is this known or expected behavior for `mksquashfs` or `squash-dataset`?
- Are there any workarounds or fixes available? (e.g. breaking the dataset into smaller chunks, or squashing it on a holiday)
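One workaround I could try, sketched here under assumptions: capping the resources `mksquashfs` uses through its `-mem` and `-processors` options, in case the kill is memory-related (unconfirmed). The function name `squash_limited`, the `DRY_RUN` variable, and the default values `8G` and `16` are hypothetical and only illustrative; the remaining options are copied from the `squash-dataset` line shown in the output.

```shell
# Hypothetical wrapper around the mksquashfs call from squash-dataset,
# with added caps on memory and processor count.
# $1 = source dir, $2 = target file, $3 = memory cap, $4 = processor count
squash_limited() {
  # Build the full command line in the positional parameters.
  set -- mksquashfs "$1" "$2" \
    -all-root -action 'chmod(o+rX)@!perm(o+rX)' \
    -mem "${3:-8G}" -processors "${4:-16}"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # only print the command, do not run it
  else
    "$@"                 # run mksquashfs with the caps applied
  fi
}
```

With `DRY_RUN=1 squash_limited SRC TARGET 4G 8` the command is only printed, so the options can be checked before starting a multi-hour run. Whether smaller caps actually avoid the kill on this partition is untested.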