Final preparation
The last steps are to expand the dataset with augmentations, optionally upscale images, rename files with clean sequential names, normalize image formats and channels, produce resized versions at the dimensions you need for training, and validate the result.
Augment
The augment command increases dataset size by applying image transformations. It reads from one or more source folders and writes the transformed images (plus the originals by default) to a destination folder.
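As a rough sketch of the size arithmetic, assume each transform writes exactly one variant per source image and the originals are copied unless you opt out. The function name and parameters below are illustrative only and are not part of the tool.

```python
def augmented_count(n_images: int, n_transforms: int, keep_originals: bool = True) -> int:
    """Number of images written to the destination folder.

    Assumption: every transform yields one variant per source image.
    """
    copies_per_image = n_transforms + (1 if keep_originals else 0)
    return n_images * copies_per_image


print(augmented_count(300, 1))                        # 600: one transform plus originals (2x)
print(augmented_count(300, 3))                        # 1200: three transforms plus originals (4x)
print(augmented_count(300, 3, keep_originals=False))  # 900: transformed images only
```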
To create the final 1024px dataset with horizontal flips (2x the images):
You can combine multiple transforms in a single run. This produces the original plus three variants of each image (4x the dataset):
If you only want the transformed images without copying the originals:
Upscale
The upscale command increases image resolution using AI super-resolution models. This is useful when source images are too small for your target training resolution — for example, face crops that came out at 256px but you need 1024px.
To upscale images 4x (the default):
For 2x upscaling:
The default 4x model (Real-ESRGAN) tends to smooth out textures, especially on noisy source images. Use --denoise to control how much denoising is applied. Lower values preserve more natural texture, which is particularly important for face datasets:
The --denoise option accepts a value between 0.0 and 1.0:
0.0: maximum texture preservation (recommended for faces)
0.5: balanced
1.0: full denoising (smoothest result)
Note that --denoise is only available with 4x upscaling and activates a different, lighter model (realesr-general-x4v3). It cannot be combined with --model or --scale 2.
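The constraints on --denoise can be summarized as a small check. This is a sketch of the rules as stated here, not the tool's actual argument handling; the function name, parameter names, and error messages are hypothetical.

```python
from typing import List, Optional


def check_upscale_options(
    scale: int = 4,
    denoise: Optional[float] = None,
    model: Optional[str] = None,
) -> List[str]:
    """Return a list of problems with the option combination (empty if none)."""
    errors: List[str] = []
    if denoise is None:
        return errors  # the rules below only apply when --denoise is used
    if not 0.0 <= denoise <= 1.0:
        errors.append("--denoise must be between 0.0 and 1.0")
    if scale != 4:
        errors.append("--denoise is only available with 4x upscaling")
    if model is not None:
        errors.append("--denoise cannot be combined with --model")
    return errors


print(check_upscale_options(scale=4, denoise=0.0))  # []
print(check_upscale_options(scale=2, denoise=0.5))  # one error: wrong scale
```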
Large images are processed in tiles to avoid GPU memory issues. If you run into out-of-memory errors, reduce the tile size:
Rename
The rename command gives images clean, sequential filenames with a consistent prefix. Renaming early, before format and resize, means every downstream folder inherits the clean names automatically. The command operates in place, so there is no --to option.
To rename all images in final/1024 with a "crowd_" prefix:
This produces crowd_1.jpg, crowd_2.jpg, etc. The zero-padding width is computed automatically from the total count: 5 images get a single digit, 100 images get three digits. To set it explicitly:
This produces crowd_00001.jpg, crowd_00002.jpg, etc. Sidecar JSON files are renamed along with their images.
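The naming scheme can be sketched as follows. The automatic width is assumed to be the number of digits in the total count, which matches the two examples given (5 images, 100 images). The function and its parameters are illustrative and not the tool's own code.

```python
from typing import List, Optional


def sequential_names(
    count: int,
    prefix: str,
    ext: str = ".jpg",
    digits: Optional[int] = None,
) -> List[str]:
    """Generate prefix + zero-padded index + extension for 1..count."""
    # Assumed rule: pad to the number of digits in the total count
    # unless an explicit width is given.
    width = digits if digits is not None else len(str(count))
    return [f"{prefix}{i:0{width}d}{ext}" for i in range(1, count + 1)]


print(sequential_names(5, "crowd_")[:2])            # ['crowd_1.jpg', 'crowd_2.jpg']
print(sequential_names(100, "crowd_")[:2])          # ['crowd_001.jpg', 'crowd_002.jpg']
print(sequential_names(3, "crowd_", digits=5)[:2])  # ['crowd_00001.jpg', 'crowd_00002.jpg']
```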
Preview what would happen before committing:
Format
The format command normalizes image formats, channels, and metadata before the final resize. This is useful when your sources contain a mix of PNG and JPEG files, images with alpha channels, or embedded EXIF data you want to strip before training.
To convert everything to JPEG and enforce RGB channels:
To strip all EXIF metadata and ICC profiles while preserving the source format:
You can combine multiple normalizations in a single pass — convert to WebP, enforce RGB, and strip metadata:
dtst format -d scratch/crowd --from final/1024 --to final/formatted -f webp --channels rgb --strip-metadata
When converting images with transparency to a format that requires a background (like JPEG), or when using --channels rgb, alpha channels are composited onto white by default. Use --background to change this:
dtst format -d scratch/crowd --from final/1024 --to final/formatted -f jpg --channels rgb --background black
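To illustrate what compositing an alpha channel onto a background means, here is the standard "over" blend for a single 8-bit pixel. Whether the tool uses exactly this arithmetic is an assumption; the function name is hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite_over(pixel: RGBA, background: RGB = (255, 255, 255)) -> RGB:
    """Blend one 8-bit RGBA pixel onto an opaque background colour."""
    r, g, b, a = pixel

    def blend(fg: int, bg: int) -> int:
        # Weighted average of foreground and background by alpha,
        # in integer arithmetic with rounding to nearest.
        return (fg * a + bg * (255 - a) + 127) // 255

    return (blend(r, background[0]), blend(g, background[1]), blend(b, background[2]))


print(composite_over((10, 20, 30, 0)))               # (255, 255, 255): fully transparent shows white
print(composite_over((10, 20, 30, 255)))             # (10, 20, 30): fully opaque is unchanged
print(composite_over((255, 0, 0, 128), (0, 0, 0)))   # (128, 0, 0): half-transparent red on black
```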
For grayscale datasets:
Resize
The frame command resizes images to a target width and/or height using Lanczos resampling. Use it to produce sized versions of the formatted dataset:
dtst frame -d scratch/crowd --from final/formatted --to final/512 --width 512 --height 512
dtst frame -d scratch/crowd --from final/formatted --to final/256 --width 256 --height 256
When only one dimension is given, the other is computed proportionally to preserve the aspect ratio:
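The proportional calculation itself is simple arithmetic. The sketch below is illustrative Python, not a dtst command, and it assumes the missing side is rounded to the nearest pixel. The tool's exact rounding is not documented here.

```python
from typing import Optional, Tuple


def fit_dimensions(
    src_w: int,
    src_h: int,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the target (width, height) for a source image."""
    if width is None and height is None:
        raise ValueError("give at least one of width or height")
    if width is not None and height is not None:
        return (width, height)  # both given: use them as-is
    if width is not None:
        # Scale the height by the same factor as the width.
        return (width, max(1, round(src_h * width / src_w)))
    # Scale the width by the same factor as the height.
    return (max(1, round(src_w * height / src_h)), height)


print(fit_dimensions(1024, 768, width=512))   # (512, 384)
print(fit_dimensions(1024, 768, height=256))  # (341, 256)
```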
Validate
The validate command checks that every image in a folder shares the same dimensions and channel mode. Run it against your final outputs to catch inconsistencies before training:
If your training target requires square images (e.g. StyleGAN), add --square:
The command also warns if any PNG files use compression above level 0, which slows down data loading during training.
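The dimension, channel, and square checks amount to comparing every image against the rest of the folder. A minimal sketch over already-extracted metadata is shown below. The record layout and function name are assumptions, and the PNG compression warning is left out.

```python
from typing import Dict, List, Tuple

ImageInfo = Tuple[int, int, str]  # (width, height, channel mode), e.g. (512, 512, "RGB")


def check_consistency(images: List[ImageInfo], require_square: bool = False) -> Dict[str, bool]:
    """Report whether all images share one size and one channel mode."""
    sizes = {(w, h) for w, h, _ in images}
    modes = {mode for _, _, mode in images}
    # Exactly one distinct value means every image agrees;
    # an empty list yields zero values and is treated as a failure.
    result = {"dimensions": len(sizes) == 1, "channels": len(modes) == 1}
    if require_square:
        result["square"] = all(w == h for w, h, _ in images)
    return result


good = [(512, 512, "RGB"), (512, 512, "RGB")]
mixed = [(512, 512, "RGB"), (512, 384, "RGBA")]
print(check_consistency(good, require_square=True))   # all True
print(check_consistency(mixed, require_square=True))  # all False
```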
If everything passes you will see output like:
Validated 1,204 images (0m 3s)
Dimensions: PASS (all 512x512)
Channels: PASS (all RGB)
Square: PASS
PNG comp: OK (all 1,204 PNGs at compression level 0)
If any check fails, validate exits with code 1 so you can use it in scripts or workflows.
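A script can gate the next step on that exit code. The sketch below takes the command as a parameter, so pass whatever validate invocation you normally run. The demo uses a stand-in process that exits with code 1 instead of calling the tool, and the helper name is hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def passes(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


# Stand-in for a failing validation run: a child process that exits with code 1.
stand_in = [sys.executable, "-c", "import sys; sys.exit(1)"]
if not passes(stand_in):
    print("validation failed, stopping before training")
```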