Extracting features
With your images collected, the next step is to extract meaningful structure from them. There are three paths depending on your use case:
- Face datasets — extract face crops, then cluster by identity using ArcFace
- General image datasets — cluster images directly by visual similarity using CLIP
- Object/class datasets — detect objects with
detect, then extract class crops withextract-classes
Paths A and B produce numbered cluster folders sorted by size, ready for selection. Path C produces cropped images of specific object classes, which can then feed into clustering or selection.
Path A: Faces
Extract faces
The extract-faces command detects faces in each image and produces aligned and cropped face images. Use the --from glob pattern images/* to capture all image subfolders in a single pass:
After this runs, scratch/crowd/faces/ contains one cropped face image per detection. Images with no detected faces are skipped.
A few options that are useful in practice:
# Limit to one face per image
dtst extract-faces -d scratch/crowd --from images/* --to faces --max-faces 1
# Use dlib instead of MediaPipe for detection
dtst extract-faces -d scratch/crowd --from images/* --to faces --engine dlib
# Skip faces whose crop extends beyond the image boundary
dtst extract-faces -d scratch/crowd --from images/* --to faces --skip-partial
The default engine is MediaPipe, which is faster. Dlib can be more accurate on challenging angles. You can try both by writing to different output folders:
Cluster by identity
Use the default arcface model to group faces by identity — photos of the same person end up in the same cluster:
Path B: General images
Cluster by visual similarity
For objects, scenes, styles, or any non-face dataset, use the clip model to cluster images directly:
No face extraction step is needed — CLIP works on any image content.
Path C: Object classes
When your dataset focuses on a specific type of object rather than faces or general scenes, you can use object detection to locate instances and crop them out.
Detect objects
First, run detect to find objects in your images. This writes bounding boxes and confidence scores into each image's sidecar file:
After this, each sidecar JSON contains a classes key with detection results:
You can detect multiple classes at once:
Extract class crops
The extract-classes command reads those detections and crops the corresponding regions from each image:
Each detected object becomes its own image in the output folder. The sidecar data is preserved with bounding box coordinates adjusted to match the cropped region.
Square crops with margin
Two options help produce cleaner crops. Use --square to extend the shorter side of the bounding box to match the longer side, centering the object. Use --margin to add breathing room around the box as a ratio of the larger side:
# Square crops with 10% margin on each side
dtst extract-classes -d scratch/dahlias --from images --to flowers \
--classes flower --square --margin 0.1
The margin is computed on the larger side of the box (after squaring if enabled) and applied equally on all four sides. For example, a 400×400 square box with --margin 0.1 adds 40 pixels on each side, producing a 480×480 crop.
Handling edge cases
When --square or --margin push the crop beyond the image boundary, the crop is clamped to the image edges by default. Use --skip-partial to discard those detections entirely:
dtst extract-classes -d scratch/dahlias --from images --to flowers \
--classes flower --square --margin 0.1 --skip-partial
Filter out low-confidence detections with --min-score:
dtst extract-classes -d scratch/dahlias --from images --to flowers \
--classes flower --min-score 0.5
Preview what would be extracted without writing any files:
Cluster the crops
Once you have your class crops, you can cluster them by visual similarity using CLIP, the same way as Path B:
Cluster output
Whichever path you chose, scratch/crowd/cluster/ now contains numbered folders sorted by size (000 is the largest). Images that don't fit any cluster are placed in noise/.
Tuning the clusters
# Only keep the top 5 largest clusters
dtst cluster -d scratch/crowd --from faces --to cluster --top 5
# Require larger clusters (minimum 20 images to form a cluster)
dtst cluster -d scratch/crowd --from faces --to cluster --min-cluster-size 20
Use --clean to remove the output directory before writing, which is useful when re-running with different parameters: