dtst

dtst - a toolkit for dataset creation and curation.

Usage:

dtst [OPTIONS] COMMAND [ARGS]...

Options:

Name Type Description Default
--verbose, -v boolean Enable debug logging False
--help boolean Show this message and exit. False

dtst analyze

Compute image metadata and write JSON sidecars.

Analyzes images in the source folders and writes per-image sidecar JSON files containing the requested metadata (perceptual hash, blur score, or both). Sidecars are merged incrementally — running with --phash then --blur accumulates both.

At least one analyzer flag (--phash, --blur) is required unless using --clear.
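The incremental merge described above can be pictured with a short sketch. This is not dtst's actual code: the helper name `merge_sidecar`, the key names `phash` and `blur`, and the sidecar file name used in the usage note are assumptions made for illustration.

```python
import json
from pathlib import Path


def merge_sidecar(sidecar: Path, new_fields: dict, force: bool = False) -> dict:
    """Merge analyzer results into a sidecar JSON file (hypothetical helper).

    Existing keys are kept unless force is True, which mirrors the
    documented behaviour of --force. Keys not yet present are added.
    """
    # Start from the existing sidecar content, or an empty dict if none exists.
    data = json.loads(sidecar.read_text()) if sidecar.exists() else {}
    for key, value in new_fields.items():
        if force or key not in data:
            data[key] = value
    sidecar.write_text(json.dumps(data, indent=2))
    return data
```

Calling `merge_sidecar(path, {"phash": ...})` and later `merge_sidecar(path, {"blur": ...})` leaves both keys in the file, which is the accumulation the text describes.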

Examples:

dtst analyze --from raw --phash --blur -d ./my-dataset
dtst analyze config.yaml --phash
dtst analyze --from raw,extra --blur --force
dtst analyze --from raw --phash --dry-run -d ./my-dataset
dtst analyze --from raw --clear -d ./my-dataset

Usage:

dtst analyze [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--from text Comma-separated source folders (supports globs, e.g. 'images/*'). None
--phash boolean Compute perceptual hash for each image. False
--blur boolean Compute blur score (Laplacian variance) for each image. False
--force boolean Recompute all analyzers even if sidecar data already exists. False
--working-dir, -d path Working directory (default: .). None
--workers, -w integer Number of parallel workers (default: CPU count). None
--clear boolean Remove all sidecar files from source folders. False
--dry-run boolean Preview what would be computed without writing sidecars. False
--help boolean Show this message and exit. False

dtst annotate

Write source and license metadata into image sidecars.

Annotates all images in the given folders with provenance metadata (source, license, origin). Useful for manually imported images that were not fetched through the pipeline. Sidecars are merged incrementally — existing fields are preserved unless --overwrite is used.

At least one of --source, --license, or --origin is required.
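As a rough illustration of the preserve-unless-overwrite rule, the sketch below works on an in-memory dict instead of a sidecar file. The function name `annotate` and the field names `source`, `license`, and `origin` as JSON keys are assumptions; only the behaviour is taken from the description above.

```python
def annotate(existing, *, source=None, license=None, origin=None, overwrite=False):
    """Return a copy of `existing` with provenance fields merged in (sketch)."""
    candidates = {"source": source, "license": license, "origin": origin}
    updates = {key: value for key, value in candidates.items() if value is not None}
    if not updates:
        # Mirrors the documented requirement for at least one field.
        raise ValueError("at least one of source, license, or origin is required")
    merged = dict(existing)
    for key, value in updates.items():
        # Existing values win unless overwrite is requested.
        if overwrite or key not in merged:
            merged[key] = value
    return merged
```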

Examples:

dtst annotate --from extra --source "unsplash" --license "cc0" -d ./my-dataset
dtst annotate config.yaml
dtst annotate --from raw,extra --source "personal" --license "all-rights-reserved"
dtst annotate --from extra --source "flickr" --overwrite -d ./my-dataset
dtst annotate --from extra --source "personal" --dry-run

Usage:

dtst annotate [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--from text Comma-separated source folders (supports globs, e.g. 'images/*'). None
--source, -s text Source name to write (e.g. 'unsplash', 'personal'). None
--license, -l text License string to write (e.g. 'cc-by', 'cc0', 'all-rights-reserved'). None
--origin, -o text Origin URL to write (applied to all images). None
--overwrite boolean Overwrite existing source/license/origin values in sidecars. False
--working-dir, -d path Working directory (default: .). None
--dry-run boolean Preview what would be annotated without writing sidecars. False
--help boolean Show this message and exit. False

dtst augment

Augment a dataset by applying image transformations.

Reads images from one or more source folders and writes transformed copies to a destination folder. By default the original images are also copied to the output; use --no-copy to write only the transformed versions.

At least one transform flag (--flipX, --flipY, --flipXY) is required. Multiple flags can be combined in a single run to produce several variants of each image.

Transformed files are named with a suffix indicating the transform: photo.jpg becomes photo_flipX.jpg, photo_flipY.jpg, photo_flipXY.jpg.
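The naming rule and the flips themselves can be sketched in a few lines. The helpers `augmented_names` and `flip` are hypothetical and operate on file names and nested lists instead of real images; they only illustrate the behaviour described above.

```python
from pathlib import Path

TRANSFORMS = ("flipX", "flipY", "flipXY")


def augmented_names(filename, transforms, copy_original=True):
    """List the output file names for one input image (sketch)."""
    path = Path(filename)
    # The original is included unless the equivalent of --no-copy is set.
    names = [path.name] if copy_original else []
    names += [f"{path.stem}_{t}{path.suffix}" for t in transforms if t in TRANSFORMS]
    return names


def flip(grid, horizontal=False, vertical=False):
    """Flip a 2D grid of pixels; both flags together give a 180-degree rotation."""
    rows = grid[::-1] if vertical else grid
    return [row[::-1] if horizontal else list(row) for row in rows]
```

For example, `augmented_names("photo.jpg", ["flipX"], copy_original=False)` yields only `photo_flipX.jpg`.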

Can be invoked with just a config file, just CLI options, or both. When both are provided, CLI options override config file values.

Examples:

dtst augment -d ./project --from faces --to augmented --flipX
dtst augment -d ./project --from faces --to augmented --flipX --flipY --flipXY
dtst augment -d ./project --from faces --to augmented --flipX --no-copy
dtst augment config.yaml --dry-run

Usage:

dtst augment [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--working-dir, -d path Working directory containing source folders and where output is written (default: .). None
--from text Comma-separated source folders within the working directory (supports globs, e.g. 'images/*'). None
--to text Destination folder name within the working directory. None
--flipX boolean Apply horizontal flip. False
--flipY boolean Apply vertical flip. False
--flipXY boolean Apply both horizontal and vertical flip (180-degree rotation). False
--no-copy boolean Do not copy original images to the output folder. False
--workers, -w integer Number of parallel workers (default: CPU count). None
--dry-run boolean Preview what would be written without creating files. False
--help boolean Show this message and exit. False

dtst cluster

Cluster images by visual similarity.

Groups images into clusters based on embedding similarity using HDBSCAN. Each cluster is written to a numbered subdirectory (000 = largest, 001 = second largest, etc.) within the output folder. Images that do not belong to any cluster are placed in a noise/ subdirectory.

Supports two embedding models: arcface for face identity clustering (requires face images) and clip for general visual similarity clustering (works with any images).

--min-cluster-size sets the smallest group HDBSCAN will consider a real cluster (default: 5). Raise it to suppress small or spurious clusters; lower it to capture smaller groups.

--min-samples controls how conservative the density estimate is (default: 2). It decides how many close neighbors a point needs before it can join a cluster. Lower values (1-2) let borderline images in; higher values push more images into the noise folder. Keeping this low while adjusting --min-cluster-size is usually the best starting point.
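To make the output layout concrete, the sketch below maps per-image cluster labels to directory names. It assumes labels in the usual HDBSCAN convention, where -1 marks noise; the function name `cluster_dirs` and the tie-breaking order between equally sized clusters are illustrative assumptions, not dtst's documented behaviour.

```python
from collections import Counter


def cluster_dirs(labels):
    """Return the output subdirectory for each image label (sketch).

    Clusters are ranked by size: the largest becomes "000", the next
    "001", and so on. The label -1 is sent to "noise".
    """
    counts = Counter(label for label in labels if label != -1)
    ranked = [label for label, _ in counts.most_common()]
    names = {label: f"{rank:03d}" for rank, label in enumerate(ranked)}
    return [names.get(label, "noise") for label in labels]
```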

Can be invoked with just a config file, just CLI options, or both. When both are provided, CLI options override config file values.

Examples:

dtst cluster config.yaml
dtst cluster -d ./project --from faces --to clusters
dtst cluster -d ./project --model clip --from raw --to clusters
dtst cluster -d ./project --top 3 --min-cluster-size 10
dtst cluster -d ./project --min-samples 1 --min-cluster-size 8
dtst cluster config.yaml --model arcface --dry-run

Usage:

dtst cluster [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--working-dir, -d path Working directory containing source folders and where output is written (default: .). None
--from text Comma-separated source folders within the working directory (supports globs, e.g. 'images/*'). None
--to, -t text Destination folder name within the working directory. None
--model, -m choice (arcface | clip) Embedding model for similarity (default: arcface). None
--top, -n integer Maximum number of clusters to output; omit for all clusters. None
--min-cluster-size integer Minimum images to form a cluster (default: 5). None
--min-samples integer How many close neighbors a point needs to join a cluster; lower values include more borderline images (default: 2). None
--batch-size, -b integer Images per inference batch (default: 32). None
--workers, -w integer Number of workers for image preloading (default: CPU count). None
--no-cache boolean Skip the embedding cache and recompute from scratch. False
--clean boolean Remove the output directory before writing new clusters. False
--dry-run boolean Show image count and configuration without clustering. False
--help boolean Show this message and exit. False

dtst dedup

Deduplicate images by perceptual hash similarity.

Groups images by phash hamming distance and keeps the best image from each duplicate group. The winner is chosen by resolution (width x height), then file size, then blur sharpness. Losers are moved to a duplicated/ subdirectory within the source folder (configurable with --to).

Requires phash sidecar data from dtst analyze --phash. Blur scores (from dtst analyze --blur) are used as a tiebreaker when available.
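The grouping and winner selection can be sketched as follows. Several details here are assumptions: hashes are treated as integers, a pair counts as duplicate when its distance is at most the threshold, groups are formed transitively, and the metadata keys `width`, `height`, `size`, and `blur` are hypothetical names.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def group_duplicates(hashes: dict, threshold: int):
    """Group names whose hashes are within `threshold` bits (union-find sketch)."""
    parent = {name: name for name in hashes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    names = list(hashes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member contain duplicates.
    return [group for group in groups.values() if len(group) > 1]


def pick_winner(group, meta):
    """Pick by resolution, then file size, then blur score (higher is sharper)."""
    return max(
        group,
        key=lambda n: (
            meta[n]["width"] * meta[n]["height"],
            meta[n]["size"],
            meta[n].get("blur", 0.0),
        ),
    )
```

The pairwise comparison is quadratic in the number of images, which is fine for a sketch but would need an index for very large folders.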

Examples:

dtst dedup -d ./project --from faces
dtst dedup -d ./project --from faces --threshold 4
dtst dedup -d ./project --from faces --to my-dupes
dtst dedup config.yaml --dry-run
dtst dedup -d ./project --from faces --clear

Usage:

dtst dedup [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--working-dir, -d path Working directory (default: .). None
--from text Folder name to deduplicate within the working directory. None
--to text Subfolder name for duplicate images. None
--threshold, -t integer Phash hamming distance threshold for near-duplicate detection. None
--workers, -w integer Number of parallel workers (default: CPU count). None
--clear boolean Restore all deduplicated images back to the source folder. False
--dry-run boolean Show what would be deduplicated without moving anything. False
--help boolean Show this message and exit. False

dtst detect

Detect objects in images using OWL-ViT 2.

Uses open-vocabulary object detection to find specific objects in images and writes the results into per-image sidecar JSON files under a "classes" key. Each class gets all detections (score + bounding box) sorted by confidence, or null if not found.

Each run replaces the entire "classes" key in the sidecar.
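A possible shape for the "classes" entry is sketched below. The input format (dicts with `label`, `score`, and `box`) and the output field names `score` and `box` are assumptions; the source only states that each class holds its detections sorted by confidence, or null when nothing was found, and that the whole key is replaced on each run.

```python
def build_classes(requested, detections, max_instances=None):
    """Build the value stored under the "classes" key (sketch)."""
    classes = {}
    for name in requested:
        hits = sorted(
            (d for d in detections if d["label"] == name),
            key=lambda d: d["score"],
            reverse=True,
        )
        if max_instances is not None:
            hits = hits[:max_instances]
        # An empty list becomes None, which serialises to JSON null.
        classes[name] = [{"score": d["score"], "box": d["box"]} for d in hits] or None
    return classes


def update_sidecar(sidecar, classes):
    """Replace the entire "classes" key and keep all other sidecar fields."""
    return {**sidecar, "classes": classes}
```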

Examples:

dtst detect -d ./project --from raw --classes "microphone,chair,table"
dtst detect config.yaml
dtst detect -d ./project --from raw --classes "microphone" --threshold 0.4
dtst detect -d ./project --from raw --classes "microphone" --dry-run
dtst detect -d ./project --from raw --clear

Usage:

dtst detect [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--from text Comma-separated source folders (supports globs, e.g. 'images/*'). None
--classes, -c text Comma-separated object classes to detect (e.g. 'microphone,chair'). None
--threshold float Minimum detection confidence. None
--working-dir, -d path Working directory (default: .). None
--workers, -w integer Number of threads for image preloading (default: 4). None
--max-instances integer Maximum detections per class per image. None
--clear boolean Remove all detection data from sidecar files. False
--dry-run boolean Preview what would be detected without writing sidecars. False
--help boolean Show this message and exit. False

dtst extract-faces

Extract aligned face crops from images.

Detects faces in each image using MediaPipe (default) or dlib, then produces an aligned and cropped face image for each detection. The alignment normalises eye and mouth positions for consistent face crops.

Reads images from one or more source folders within the working directory and writes face crops to a destination folder. Multiple source folders can be specified as a comma-separated list with --from.

Can be invoked with just a config file, just CLI options, or both. When both are provided, CLI options override config file values.

Examples:

dtst extract-faces config.yaml
dtst extract-faces config.yaml --engine dlib --max-size 512
dtst extract-faces -d ./crowd --from raw --to faces
dtst extract-faces -d ./crowd --from raw,extra --to faces
dtst extract-faces config.yaml --max-faces 3 --no-padding

Usage:

dtst extract-faces [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--working-dir, -d path Working directory containing source folders and where output is written (default: .). None
--from text Comma-separated source folders within the working directory (supports globs, e.g. 'images/*'). None
--to text Destination folder name within the working directory. None
--max-size, -M integer Maximum side length in pixels; faces smaller than this are kept at natural size (default: no limit). None
--engine, -e choice (mediapipe | dlib) Face detection engine (default: mediapipe). None
--max-faces, -m integer Max faces to extract per image (default: 1). None
--workers, -w integer Number of parallel workers (default: CPU count). None
--padding / --no-padding boolean Enable/disable reflective padding on crops (default: enabled). None
--skip-partial boolean Skip faces whose crop extends beyond the image boundary instead of padding them. False
--refine-landmarks boolean Enable MediaPipe refined landmarks (478 vs 468). False
--debug boolean Overlay landmark points on output images. False
--help boolean Show this message and exit. False

dtst extract-frames

Extract keyframes from video files using ffmpeg.

Reads video files from one or more source folders and extracts keyframes (I-frames) to a destination folder. Each video produces a set of numbered images named as {video_stem}_{frame_number}.{format}.

Only I-frames are decoded, which avoids interpolated or blurry frames and produces the sharpest possible output. The --keyframes option sets the minimum interval between extracted frames: with the default of 10, at most one keyframe every 10 seconds is kept. Lower values produce more frames, higher values produce fewer.
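The effect of the minimum interval can be illustrated on a list of I-frame timestamps. The sketch assumes the interval is measured from the last kept frame and that a gap exactly equal to the interval is accepted; the function name `thin_keyframes` is hypothetical, and dtst may implement the filter differently (for example inside an ffmpeg filter expression).

```python
def thin_keyframes(timestamps, min_interval=10.0):
    """Keep timestamps that are at least `min_interval` seconds apart (sketch)."""
    kept = []
    for t in sorted(timestamps):
        # Always keep the first frame; afterwards compare with the last kept one.
        if not kept or t - kept[-1] >= min_interval:
            kept.append(t)
    return kept
```

With I-frames at 0.0, 2.0, 4.0, 11.5, 12.0, and 25.0 seconds and an interval of 10, this keeps the frames at 0.0, 11.5, and 25.0.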

Supported video formats: .mp4, .mkv, .avi, .mov, .webm, .flv, .wmv, .m4v.

Can be invoked with just a config file, just CLI options, or both. When both are provided, CLI options override config file values.

Examples:

dtst extract-frames -d ./project --from videos --to frames
dtst extract-frames -d ./project --from videos --to frames --keyframes 5
dtst extract-frames -d ./project --from videos --to frames --keyframes 30 --format png
dtst extract-frames config.yaml
dtst extract-frames config.yaml --keyframes 20 --dry-run

Usage:

dtst extract-frames [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--working-dir, -d path Working directory containing source folders and where output is written (default: .). None
--from text Comma-separated source folders within the working directory (supports globs, e.g. 'images/*'). None
--to text Destination folder name within the working directory. None
--keyframes, -k float Minimum interval in seconds between extracted keyframes. Only I-frames are considered; frames closer together than this value are skipped (default: 10). None
--format, -F choice (jpg | png) Output image format (default: jpg). None
--workers, -w integer Number of parallel workers (default: CPU count). None
--dry-run boolean Preview what would be done without extracting frames. False
--help boolean Show this message and exit. False

dtst fetch

Download images and videos from a URL list.

Reads a URL list from the working directory specified by --input. Two formats are supported:

.jsonl  JSON Lines with a "url" field per line (search output). Supports --min-size and --license filtering.
.txt    Plain text with one URL per line. Lines starting with # are treated as comments.

URLs are routed automatically: known video hosting domains (YouTube, Vimeo, etc.) are downloaded with yt-dlp, all other URLs are downloaded directly with HTTP requests.

Image files are named by the MD5 hash of the URL. Video files are named by yt-dlp using the video ID and original extension. Existing files are skipped unless --force is set.
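The input parsing, routing, and naming rules above can be sketched as follows. The host list is a small illustrative subset, the function names are hypothetical, and the file extension that dtst appends to the hash is not stated in the source, so the sketch returns only the hash stem.

```python
import hashlib
import json
from urllib.parse import urlparse

# Illustrative subset only; the real list of video hosts is not given here.
VIDEO_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be", "vimeo.com"}


def parse_url_list(name: str, text: str):
    """Extract URLs from .jsonl or .txt content (sketch)."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if name.endswith(".jsonl"):
            urls.append(json.loads(line)["url"])
        elif not line.startswith("#"):  # '#' lines are comments in .txt input
            urls.append(line)
    return urls


def route(url: str) -> str:
    """Choose the downloader based on the URL's host."""
    host = urlparse(url).netloc.lower()
    return "yt-dlp" if host in VIDEO_HOSTS else "http"


def image_stem(url: str) -> str:
    """MD5 hex digest of the URL, used as the image file name stem."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()
```

Because the name depends only on the URL, a second run can detect an existing file without downloading anything, which is what makes skipping cheap.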

Can be invoked with just a config file, just CLI options, or both. When both are provided, CLI options override config file values.

Examples:

dtst fetch config.yaml
dtst fetch -d ./chanterelle --to raw --input results.jsonl
dtst fetch -d ./project --to videos --input urls.txt
dtst fetch config.yaml --workers 16 --timeout 60
dtst fetch config.yaml --force
dtst fetch -d ./chanterelle --to raw --input results.jsonl --no-wait --license cc

Usage:

dtst fetch [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--working-dir, -d path Working directory where input is read from and media is written to (default: .). None
--to text Destination folder name within the working directory. None
--input, -i text Input file name relative to the working directory. Supports .jsonl and .txt formats. None
--min-size, -s integer Minimum image dimension in pixels; only applies to .jsonl input (default: 512). None
--workers, -w integer Number of parallel download threads (default: CPU count for images, 2 for video). None
--timeout, -t integer Per-request timeout in seconds. 30
--force, -f boolean Re-download files even if they already exist. False
--max-wait, -W integer Max seconds to honor a Retry-After header (default: unlimited). None
--no-wait boolean Never wait for Retry-After headers; use fast exponential backoff instead. False
--license, -l text Only download images whose license starts with this prefix (e.g. 'cc'); only applies to .jsonl input. None
--help boolean Show this message and exit. False

dtst frame

Resize images to a target width and/or height.

Reads images from one or more source folders and writes resized copies to a destination folder. Uses Lanczos resampling for high-quality downscaling.

When both --width and --height are given, the --mode option controls how aspect ratio differences are handled:

stretch  Resize to exact dimensions, distorting if needed.
crop     Scale to cover the target area, then trim excess (default).
pad      Scale to fit within the target area, then fill the rest.

When only one dimension is given, the other is computed proportionally and --mode is ignored.
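The geometry behind the three modes comes down to which scale factor is chosen. The sketch below computes the size an image is resized to before any trimming or filling; the function name `scaled_size` and the rounding to the nearest pixel are assumptions.

```python
def scaled_size(w, h, target_w=None, target_h=None, mode="crop"):
    """Return (width, height) after scaling, before cropping or padding (sketch)."""
    if target_w is None and target_h is None:
        raise ValueError("give a target width, a target height, or both")
    # One dimension only: scale proportionally; mode is irrelevant here.
    if target_h is None:
        return target_w, round(h * target_w / w)
    if target_w is None:
        return round(w * target_h / h), target_h
    if mode == "stretch":
        return target_w, target_h
    sx, sy = target_w / w, target_h / h
    # crop covers the target (larger factor); pad fits inside it (smaller factor).
    scale = max(sx, sy) if mode == "crop" else min(sx, sy)
    return round(w * scale), round(h * scale)
```

For a 1000x500 image and a 512x512 target, crop scales to 1024x512 and trims the width, while pad scales to 512x256 and fills the remaining height.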

Examples:

dtst frame -d ./project --from faces --to resized -W 512 -H 512
dtst frame -d ./project --from faces --to resized -W 512 -H 512 --mode pad --fill blur
dtst frame -d ./project --from faces --to resized -W 512 -H 512 --mode crop --gravity top
dtst frame -d ./project --from faces --to resized --width 512
dtst frame config.yaml --dry-run

Usage:

dtst frame [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--working-dir, -d path Working directory containing source folders and where output is written (default: .). None
--from text Comma-separated source folders within the working directory (supports globs, e.g. 'images/*'). None
--to text Destination folder name within the working directory. None
--width, -W integer Target width in pixels. If --height is omitted, aspect ratio is preserved. None
--height, -H integer Target height in pixels. If --width is omitted, aspect ratio is preserved. None
--mode, -m choice (stretch | crop | pad) Resize mode when both width and height are given (default: crop). None
--gravity, -g choice (center | top | bottom | left | right) Anchor position for crop (part to keep) or pad (where to place image). Default: center. None
--fill, -f choice (color | edge | reflect | blur) Fill strategy for pad mode: color, edge, reflect, or blur (default: color). None
--fill-color text Hex color for pad fill when --fill=color (default: #000000). None
--workers, -w integer Number of parallel workers (default: CPU count). None
--dry-run boolean Preview what would be written without creating files. False
--help boolean Show this message and exit. False

dtst review

Launch a web UI for manual image review.

Opens a local web server with an image grid. Click images to select or deselect them, then apply the selection to move the filtered images into a subfolder. Use the view toggle to switch between source and filtered images, for example to restore previously filtered images.

Press Ctrl+C to stop the server.

Examples:

dtst review config.yaml
dtst review -d ./project --from faces
dtst review -d ./project --from faces --to rejected --port 9000
dtst review config.yaml --no-open

Usage:

dtst review [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--from text Source folder name within working directory. None
--to text Subfolder name for filtered images. None
--port, -p integer Port for the web server. None
--no-open boolean Do not open the browser automatically. False
--working-dir, -d path Working directory (default: .). None
--help boolean Show this message and exit. False

dtst run

Run a named workflow defined in a config file.

Executes a sequence of dtst commands and shell commands as defined in the workflows section of the config file. Each command step inherits its defaults from the corresponding config section unless inherit: false is set.

Examples:

dtst run pipeline config.yaml
dtst run pipeline config.yaml --dry-run
dtst run pipeline config.yaml -d ./my_dataset

Usage:

dtst run [OPTIONS] WORKFLOW CONFIG

Options:

Name Type Description Default
--working-dir, -d path Override working directory. None
--dry-run boolean Print steps without executing. False
--help boolean Show this message and exit. False

dtst search

Search for images across multiple engines.

Reads an optional YAML config file and generates image URLs from Flickr, Serper (Google Images), Brave and Wikimedia Commons using an expanded query matrix of search terms and suffixes. Results are deduplicated and written to a JSONL file in the working directory (default: results.jsonl) so multiple runs accumulate new results.

Can be invoked with just a config file, just CLI options, or both. When both are provided, CLI options override config file values.

Query matrix: By default, the command runs two kinds of queries for each term: (1) the term alone, e.g. "chanterelle"; (2) the term with each suffix, e.g. "chanterelle mushroom", "chanterelle forest". Use --suffix-only to run only the second kind.
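The query matrix is small enough to sketch directly. The function name `build_queries` and the ordering of the resulting list are assumptions; the two kinds of queries and the effect of --suffix-only follow the description above.

```python
def build_queries(terms, suffixes, suffix_only=False):
    """Expand terms and suffixes into the list of search queries (sketch)."""
    queries = []
    for term in terms:
        if not suffix_only:
            queries.append(term)  # kind 1: the bare term
        # kind 2: the term combined with each suffix
        queries.extend(f"{term} {suffix}" for suffix in suffixes)
    return queries
```

With one term and two suffixes this produces three queries, or two when `suffix_only` is set, so the total grows as terms x (suffixes + 1).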

Examples:

dtst search config.yaml
dtst search config.yaml --dry-run
dtst search config.yaml --max-pages 3 --engines flickr,wikimedia
dtst search --terms "chanterelle" --suffixes "mushroom,forest" --engines brave -d ./chanterelle

Usage:

dtst search [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--terms text Comma-separated search terms (override config). None
--suffixes text Comma-separated query suffixes (override config). None
--working-dir, -d path Working directory where results are written (default: .). None
--output, -o text Output filename within the working directory (default: results.jsonl). None
--max-pages, -m integer Limit pages per engine per query. None
--engines, -e text Comma-separated engine list (override config). None
--dry-run, -n boolean Print query matrix and exit without searching. False
--workers, -w integer Parallel workers (default: CPU count). None
--min-size, -s integer Minimum image dimension in pixels (default: 512). None
--retries, -r integer Number of retries per request (with exponential backoff). 3
--timeout, -t float Request timeout in seconds. 30
--suffix-only boolean Run only queries that include a suffix (e.g. 'term suffix'). Skip bare term queries. False
--help boolean Show this message and exit. False

dtst select

Select images from source folders into a destination folder.

Copies (or moves with --move) images from one or more source folders into a destination folder. When filter criteria are provided, only images that pass all criteria are selected. Without criteria, all images are selected.

Files that already exist in the destination (by name) are skipped.
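How the criteria combine can be sketched as a single predicate over an image's metadata. Only a subset of the filters is shown. The metadata keys (`width`, `height`, `blur`, `classes`), the use of the highest detection score per class, and the treatment of missing data as a score of 0 are assumptions for illustration.

```python
def passes(meta, min_side=None, max_side=None, min_blur=None, max_detect=()):
    """Return True if the image passes every criterion that was given (sketch)."""
    longest = max(meta["width"], meta["height"])
    checks = []
    if min_side is not None:
        checks.append(longest >= min_side)
    if max_side is not None:
        checks.append(longest <= max_side)
    if min_blur is not None:
        checks.append(meta.get("blur", 0.0) >= min_blur)
    for name, threshold in max_detect:
        hits = (meta.get("classes") or {}).get(name) or []
        best = max((hit["score"] for hit in hits), default=0.0)
        # Excluded when the score reaches the threshold, as with --max-detect.
        checks.append(best < threshold)
    # all([]) is True, so an image with no criteria is always selected.
    return all(checks)
```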

Can be invoked with just a config file, just CLI options, or both. When both are provided, CLI options override config file values.

Examples:

dtst select -d ./project --from raw --to backup
dtst select -d ./project --from raw,extra --to combined
dtst select -d ./project --from faces --to curated --min-side 256
dtst select -d ./project --from faces --to curated --max-side 2048
dtst select -d ./project --from faces --to curated --min-width 512 --max-height 1024
dtst select -d ./project --from faces --to curated --move --min-blur 50
dtst select -d ./project --from raw --to clean --max-detect microphone 0.5
dtst select config.yaml --dry-run

Usage:

dtst select [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--working-dir, -d path Working directory containing source folders and where output is written (default: .). None
--from text Comma-separated source folders within the working directory (supports globs, e.g. 'images/*'). None
--to text Destination folder name within the working directory. None
--move boolean Move images instead of copying (removes originals). False
--min-side, -s integer Minimum largest side in pixels; images with max(w,h) below this are excluded. None
--max-side integer Maximum largest side in pixels; images with max(w,h) above this are excluded. None
--min-width integer Minimum width in pixels; narrower images are excluded. None
--max-width integer Maximum width in pixels; wider images are excluded. None
--min-height integer Minimum height in pixels; shorter images are excluded. None
--max-height integer Maximum height in pixels; taller images are excluded. None
--min-blur float Minimum blur score (Laplacian variance); lower-scoring images are excluded as too blurry. None
--max-blur float Maximum blur score (Laplacian variance); higher-scoring images are excluded. None
--max-detect Exclude images whose detection score for the named class is >= THRESHOLD (e.g. --max-detect microphone 0.5). ()
--min-detect Exclude images whose detection score for the named class is < THRESHOLD (e.g. --min-detect chair 0.3). ()
--workers, -w integer Number of parallel workers (default: CPU count). None
--dry-run boolean Preview what would be selected without creating files. False
--help boolean Show this message and exit. False

dtst upscale

Upscale images using AI super-resolution models.

Reads images from one or more source folders and writes upscaled copies to a destination folder. Uses spandrel to load PyTorch super-resolution models (Real-ESRGAN, SwinIR, HAT, etc.).

By default uses a 4x Real-ESRGAN model. Use --scale to choose between 2x and 4x upscaling, or --model to provide a custom .pth weights file (scale is auto-detected from the model).

Use --denoise to control how much denoising is applied (4x only). 0.0 preserves the most texture, 1.0 applies full denoising. This activates a lighter general-purpose model with controllable denoise strength via weight interpolation.

Large images are processed in tiles to avoid GPU memory issues. Adjust --tile-size to control memory usage (smaller = less VRAM).
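Tiling with overlap padding can be sketched as a grid of boxes. This is a generic illustration of the technique, not dtst's or spandrel's code: the function name `tile_boxes` and the box convention (left, top, right, bottom) are assumptions. Each tile has a core region that ends up in the output and a padded region that is fed to the model so that tile borders have context.

```python
def tile_boxes(width, height, tile_size=512, tile_pad=32):
    """Return (core, padded) boxes covering an image of the given size (sketch)."""
    if tile_size <= 0:
        # Tiling disabled: process the whole image at once.
        full = (0, 0, width, height)
        return [(full, full)]
    boxes = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            right = min(left + tile_size, width)
            bottom = min(top + tile_size, height)
            # Extend the tile by the padding, clamped to the image bounds.
            padded = (
                max(left - tile_pad, 0),
                max(top - tile_pad, 0),
                min(right + tile_pad, width),
                min(bottom + tile_pad, height),
            )
            boxes.append(((left, top, right, bottom), padded))
    return boxes
```

A 1000x600 image with the default settings yields four tiles. Halving the tile size roughly quarters the pixels per model call, which is why smaller tiles use less VRAM.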

Examples:

dtst upscale -d ./project --from faces --to upscaled
dtst upscale -d ./project --from faces --to upscaled --scale 2
dtst upscale -d ./project --from faces --to upscaled --denoise 0.5
dtst upscale -d ./project --from faces --to upscaled --model ./custom.pth
dtst upscale config.yaml --dry-run

Usage:

dtst upscale [OPTIONS] [CONFIG]

Options:

Name Type Description Default
--working-dir, -d path Working directory containing source folders and where output is written (default: .). None
--from text Comma-separated source folders within the working directory (supports globs, e.g. 'images/*'). None
--to text Destination folder name within the working directory. None
--scale, -s choice (2 | 4) Upscale factor. Ignored when --model is provided (default: 4). None
--model, -m text Model preset name or path to a .pth file. Overrides --scale. None
--tile-size, -t integer Tile size in pixels for processing; 0 disables tiling (default: 512). None
--tile-pad integer Overlap padding between tiles in pixels (default: 32). None
--format, -f choice (jpg | png | webp) Output image format. Default preserves the source format. None
--quality, -q integer JPEG/WebP output quality, 1-100 (default: 95). None
--denoise, -n float Denoise strength 0.0-1.0. Lower preserves more texture. Only available at 4x. None
--workers, -w integer Number of threads for image preloading (default: 4). None
--dry-run boolean Preview what would be written without processing. False
--help boolean Show this message and exit. False