Python API reference
All command functions are re-exported at the top level (from dtst import ...) and are also available as async wrappers in dtst.aio. See Python API for the usage guide.
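How dtst.aio builds its async wrappers is not described here. As a rough illustration only, a blocking keyword-only function can be made awaitable by running it in a worker thread. In the sketch below, to_async and fake_validate are hypothetical names and are not part of dtst.

```python
import asyncio
import functools


def to_async(func):
    """Wrap a blocking function so it can be awaited (illustrative helper)."""

    @functools.wraps(func)
    async def wrapper(**kwargs):
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(func, **kwargs)

    return wrapper


def fake_validate(*, from_dirs: str, square: bool = False) -> dict:
    """Hypothetical stand-in for a blocking, keyword-only command function."""
    return {"from_dirs": from_dirs, "square": square}


# The wrapped function takes the same keyword arguments as the original.
async_validate = to_async(fake_validate)
result = asyncio.run(async_validate(from_dirs="images/raw", square=True))
print(result)
```

The wrapper keeps the keyword-only calling convention used by every command on this page, so a sync call and its async counterpart differ only by the await.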
Commands
analyze
analyze(
*,
from_dirs: str,
metrics: str | None = None,
force: bool = False,
workers: int | None = None,
clear: bool = False,
dry_run: bool = False,
progress: bool = True
) -> AnalyzeResult
Compute image metrics and write per-image sidecar JSON files.
annotate
annotate(
*,
from_dirs: str,
source: str | None = None,
license: str | None = None,
origin: str | None = None,
overwrite: bool = False,
dry_run: bool = False
) -> AnnotateResult
Write source/license/origin metadata into per-image sidecars.
augment
augment(
*,
from_dirs: str,
to: str,
flip_x: bool = False,
flip_y: bool = False,
flip_xy: bool = False,
no_copy: bool = False,
workers: int | None = None,
dry_run: bool = False,
progress: bool = True
) -> AugmentResult
Augment a dataset with image flips.
cluster
cluster(
*,
from_dirs: str,
to: str,
model: str = "arcface",
top: int | None = None,
min_cluster_size: int = 5,
min_samples: int = 2,
batch_size: int = 32,
workers: int | None = None,
no_cache: bool = False,
clean: bool = False,
dry_run: bool = False,
progress: bool = True
) -> ClusterResult
Cluster images by visual similarity using HDBSCAN.
Returns a ClusterResult. Raises InputError for
missing inputs or PipelineError if no clusters emerge.
Set progress=False to silence tqdm bars.
dedup
dedup(
*,
from_dir: str,
to: str = "duplicated",
threshold: int = 8,
workers: int | None = None,
clear: bool = False,
dry_run: bool = False,
prefer_upscaled: bool = False,
progress: bool = True
) -> DedupResult
Deduplicate images by perceptual hash similarity.
The to argument names a subfolder inside from_dir into which
duplicates are moved.
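The reference does not say how threshold is applied. A common convention for perceptual hashes is to treat it as a maximum Hamming distance between two hashes, and the sketch below assumes that reading. The helper names and hash values are illustrative and are not taken from dtst.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions at which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_duplicate(hash_a: int, hash_b: int, threshold: int = 8) -> bool:
    # Assumption: hashes that differ in at most `threshold` bits are duplicates.
    return hamming_distance(hash_a, hash_b) <= threshold


original = 0xF0F0F0F0F0F0F0F0      # made-up 64-bit hash
near_copy = original ^ 0b111       # same hash with 3 bits flipped
different = original ^ 0xFFFF      # same hash with 16 bits flipped

print(is_duplicate(original, near_copy))
print(is_duplicate(original, different))
```

Under this assumption, a lower threshold makes matching stricter and a higher one groups more loosely similar images.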
detect
detect(
*,
from_dirs: str,
classes: str | None = None,
threshold: float = 0.2,
workers: int | None = None,
max_instances: int = 1,
clear: bool = False,
dry_run: bool = False,
progress: bool = True
) -> DetectResult
Detect objects in images using OWL-ViT 2.
extract_classes
extract_classes(
*,
from_dirs: str,
to: str,
classes: str,
margin: float = 0.0,
square: bool = False,
min_score: float = 0.0,
skip_partial: bool = False,
workers: int | None = None,
dry_run: bool = False,
progress: bool = True
) -> ExtractClassesResult
Extract image crops from class-detection bounding boxes.
extract_faces
extract_faces(
*,
from_dirs: str,
to: str,
max_size: int | None = None,
engine: str = "mediapipe",
max_faces: int = 1,
workers: int | None = None,
padding: bool = True,
skip_partial: bool = False,
refine_landmarks: bool = False,
debug: bool = False,
progress: bool = True
) -> ExtractFacesResult
Detect and align face crops from images.
extract_frames
extract_frames(
*,
from_dirs: str,
to: str,
keyframes: float = 10.0,
fmt: str = "jpg",
workers: int | None = None,
dry_run: bool = False,
progress: bool = True
) -> ExtractFramesResult
Extract keyframes from video files using ffmpeg.
fetch
fetch(
*,
to: str,
input_file: str,
min_size: int = 512,
workers: int | None = None,
timeout: int = 30,
force: bool = False,
max_wait: int | None = None,
no_wait: bool = False,
license_filter: str | None = None,
progress: bool = True
) -> FetchResult
Download images and videos from a URL list.
format
format(
*,
from_dirs: str,
to: str,
fmt: str | None = None,
quality: int = 95,
compress_level: int = 0,
strip_metadata: bool = False,
channels: str | None = None,
background: str = "white",
workers: int | None = None,
dry_run: bool = False,
progress: bool = True
) -> FormatResult
Convert and normalize image formats, channels, and metadata.
frame
frame(
*,
from_dirs: str,
to: str,
width: int | None = None,
height: int | None = None,
mode: str = "crop",
gravity: str = "center",
fill: str = "color",
fill_color: str = "#000000",
quality: int = 95,
compress_level: int = 0,
workers: int | None = None,
dry_run: bool = False,
progress: bool = True
) -> FrameResult
Resize images to a target width and/or height.
rename
rename(
*,
from_dirs: str,
prefix: str = "",
digits: int | None = None,
dry_run: bool = False
) -> RenameResult
Sequentially rename images in-place with a prefix + zero-padded number.
from_dirs is a comma-separated list of folders and may contain
globs. Sidecar JSON files travel with their images. Raises
InputError if from_dirs is missing or no images are
found.
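The naming scheme described above (prefix plus zero-padded number) can be sketched as a pure function. The ordering, the starting index of 1, and the padding used when digits is None are assumptions made for illustration. sequential_names is a hypothetical helper and is not part of dtst.

```python
import os


def sequential_names(filenames, prefix="", digits=None):
    """Map each filename to prefix + zero-padded index, keeping the extension."""
    ordered = sorted(filenames)
    # Assumption: with digits=None, pad to the width of the largest index.
    width = digits if digits is not None else len(str(len(ordered)))
    mapping = {}
    for index, name in enumerate(ordered, start=1):
        _, ext = os.path.splitext(name)
        mapping[name] = f"{prefix}{index:0{width}d}{ext}"
    return mapping


print(sequential_names(["b.png", "a.jpg"], prefix="cat_", digits=4))
```

Building the whole mapping before touching the disk makes it possible to preview the renames, which is what a dry_run-style mode typically reports.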
search
search(
*,
terms: list[str] | None = None,
suffixes: list[str] | None = None,
output: str = "results.jsonl",
max_pages: int | None = None,
engines: list[str] | None = None,
dry_run: bool = False,
workers: int | None = None,
min_size: int = 512,
retries: int = 3,
timeout: float = 30,
suffix_only: bool = False,
taxon_ids: list[int] | None = None,
progress: bool = True
) -> SearchResult
Search for images across multiple engines and append to a JSONL file.
select
select(
*,
from_dirs: str,
to: str,
move: bool = False,
min_side: int | None = None,
max_side: int | None = None,
min_width: int | None = None,
max_width: int | None = None,
min_height: int | None = None,
max_height: int | None = None,
min_metric: list[tuple[str, float]] | None = None,
max_metric: list[tuple[str, float]] | None = None,
max_detect: list[tuple[str, float]] | None = None,
min_detect: list[tuple[str, float]] | None = None,
source: list[str] | None = None,
license_filter: list[str] | None = None,
workers: int | None = None,
dry_run: bool = False,
progress: bool = True
) -> SelectResult
Select images from source folders into a destination folder.
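The min_metric and max_metric arguments take lists of (name, value) pairs. One plausible reading is that each pair is a bound on a metric stored in the image's sidecar. The sketch below assumes inclusive bounds, and assumes that an image without the named metric is rejected. The metric names and the flat dictionary layout are hypothetical.

```python
def passes_metric_bounds(metrics, min_metric=None, max_metric=None):
    """Return True if every (name, bound) pair is satisfied by `metrics`."""
    for name, lower in min_metric or []:
        # Assumption: an image lacking the metric fails the check.
        if name not in metrics or metrics[name] < lower:
            return False
    for name, upper in max_metric or []:
        if name not in metrics or metrics[name] > upper:
            return False
    return True


# Hypothetical sidecar metrics for one image.
sidecar_metrics = {"sharpness": 0.72, "noise": 0.10}

print(passes_metric_bounds(sidecar_metrics, min_metric=[("sharpness", 0.5)]))
print(passes_metric_bounds(sidecar_metrics, max_metric=[("noise", 0.05)]))
```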
upscale
upscale(
*,
from_dirs: str,
to: str,
scale: int = 4,
model: str | None = None,
tile_size: int = 512,
tile_pad: int = 32,
fmt: str | None = None,
quality: int = 95,
denoise: float | None = None,
workers: int = 4,
dry_run: bool = False,
progress: bool = True
) -> UpscaleResult
Upscale images using AI super-resolution models.
validate
validate(
*,
from_dirs: str,
square: bool = False,
workers: int | None = None,
progress: bool = True
) -> ValidateResult
Check that images in a folder share dimensions, mode, and optionally squareness.
Returns a ValidateResult; inspect .passed for overall
pass/fail or the individual counters for detail. Does not raise on
failed checks; raises InputError only when inputs are
missing or unreadable.
Results
results
Result dataclasses returned by dtst.core functions.
Each command's core function returns one of these so library callers get structured data instead of parsing CLI output. The CLI layer formats them for human display.
Errors
errors
Library-layer exceptions.
These are raised by dtst.core functions so library callers never
have to depend on Click. The CLI layer in dtst.cli catches them
and re-raises as click.ClickException for user-friendly output.
ConfigError
Bases: DtstError
Invalid configuration file or values.
DtstError
Bases: Exception
Base class for all library-layer errors.
InputError
Bases: DtstError
Invalid or missing user input (missing --from, bad directory, etc.).
PipelineError
Bases: DtstError
Failure during pipeline execution (e.g. no clusters found).
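Because every exception listed here derives from DtstError, a caller can handle all library failures with a single except clause. The sketch below re-declares the hierarchy locally so it runs without dtst installed; real code would import these classes instead. run_step and failing_cluster are hypothetical helpers, and the returned string stands in for whatever the caller does with the error.

```python
class DtstError(Exception):
    """Base class for all library-layer errors."""


class ConfigError(DtstError):
    """Invalid configuration file or values."""


class InputError(DtstError):
    """Invalid or missing user input."""


class PipelineError(DtstError):
    """Failure during pipeline execution."""


def run_step(step):
    """Run a callable and summarize any library-layer error it raises."""
    try:
        step()
    except DtstError as exc:
        # One handler covers ConfigError, InputError, and PipelineError.
        return f"{type(exc).__name__}: {exc}"
    return "ok"


def failing_cluster():
    # Stand-in for a command that finds no clusters.
    raise PipelineError("no clusters found")


print(run_step(failing_cluster))
print(run_step(lambda: None))
```

Catching the base class leaves unrelated exceptions such as KeyboardInterrupt or programming errors untouched, so they still surface with a full traceback.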