Add embed-openclip-rn101-yfcc15m model (FP16, 86-tag default vocab) #38

Open
andriiryzhkov wants to merge 1 commit into darktable-org:master from andriiryzhkov:open_clip_yfcc

@andriiryzhkov

Collaborator

Adds embed-openclip-rn101-yfcc15m – a ResNet-101 image embedder for tag suggestion and image-similarity search.

Why this one and not a stronger CLIP variant: every other CLIP training corpus (LAION, WIT-400M, DataComp, MetaCLIP, WebLI) is a web scrape with no per-image consent. YFCC15M is 15M Flickr photos uploaded under Creative Commons – the one option that meets the project's consent-based training-data criterion. The cost is a lower benchmark score (~31% ImageNet zero-shot vs ~67% for LAION ViT-B-32), but in actual photo-library use the gap is much smaller than that number suggests.

Ships model.onnx (60 MB FP16, with mean/std input normalization and output L2 normalization baked in) plus tags.json – 86 precomputed centroids for cold-start tag suggestions before users have enough data of their own. The text encoder runs at conversion time only and is not shipped.
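
For reviewers, here is a minimal sketch of how the shipped centroids could drive cold-start tag suggestion: rank tags by cosine similarity between an image embedding and each centroid. This is an illustration under assumptions, not the code in this PR. The `{tag: vector}` mapping, the `suggest_tags` and `l2_normalize` helpers, and the toy 3-dimensional vectors are all hypothetical; the actual tags.json layout and embedding size may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def suggest_tags(embedding, centroids, top_k=5):
    """Rank tags by cosine similarity to an image embedding.

    embedding: unit-length image embedding (the PR states that L2
               normalization is baked into model.onnx).
    centroids: hypothetical {tag: vector} mapping; the real tags.json
               layout is an assumption here.
    """
    scored = []
    for tag, centroid in centroids.items():
        # Defensive: do not assume the stored centroids are unit length.
        unit = l2_normalize(centroid)
        score = sum(a * b for a, b in zip(embedding, unit))
        scored.append((tag, score))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data standing in for real embeddings and centroids.
toy_centroids = {
    "beach": [1.0, 0.0, 0.0],
    "forest": [0.0, 1.0, 0.0],
    "city": [0.0, 0.0, 1.0],
}
toy_embedding = l2_normalize([0.9, 0.1, 0.3])
top = suggest_tags(toy_embedding, toy_centroids, top_k=2)
```

Because both sides are unit vectors, the dot product equals cosine similarity, so the same scoring would also serve image-to-image similarity search by comparing one image embedding against another.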

