Concepts
Buckets
Every dtst command reads from and writes to buckets -- named folders within the working directory. A bucket is just a plain directory on disk. There is nothing to register or configure: drop files into a folder and it becomes a bucket. You can browse buckets in the file explorer, manually add or remove files, and the tools will pick up whatever is there.
Commands reference buckets through --from and --to flags:
- Sourcing commands like
fetchonly have--to-- they bring data into a bucket from the outside world. - Augmenting commands like
extract-facesorframehave both--fromand--to-- they read from one or more buckets and write results to another. - Filtering commands like
deduporreviewoperate on a bucket in place.
The --from flag accepts comma-separated names and supports globs. For example --from images/* matches all subdirectories of images/.
Because buckets are plain directories, you can always supplement pipeline output by manually copying files into a bucket. No import step is needed.
Sidecars
A sidecar is a small JSON file that lives next to an image or video and carries metadata about it. The naming convention appends .json to the full filename:
What goes in a sidecar
Different commands write different fields:
| Command | Fields | Description |
|---|---|---|
fetch |
source, origin, license |
Where the file came from, its original URL, and license info |
analyze --phash |
metrics.phash |
Perceptual hash for duplicate detection |
analyze --blur |
metrics.blur |
Laplacian variance measuring image sharpness |
detect |
classes |
Object detection results (score + bounding box per class) |
A sidecar after fetching, detecting and analyzing might look like this:
{
"source": "flickr",
"origin": "https://live.staticflickr.com/example/photo.jpg",
"license": "cc-by",
"metrics": {
"phash": "d4c6b8b0b4e1e3c7",
"blur": 1842.57
},
"classes": {
"microphone": [],
"chair": [
{ "score": 0.87, "box": [10, 200, 250, 480] }
]
}
}
How sidecars flow through the pipeline
Sidecars are automatically carried along when images move between buckets. The exact behavior depends on whether the command transforms the image pixels:
Commands that copy images unchanged (cluster, select, dedup, review) preserve the full sidecar as-is. Every field carries over because nothing about the image has changed.
Commands that transform images (augment, extract-faces, extract-frames, frame, upscale) carry over provenance and semantic fields (source, origin, license, tags) but drop computed fields (metrics, classes). These pixel-dependent values would be invalid after the transformation and need to be recomputed with analyze or detect if needed.