I struggled to organize 14,000 photos and wanted an AI to help identify the ones matching my aesthetic. Initially, I relied on a vision model to classify them against a prose description, but the results were inconsistent and resource-intensive, spiking RAM usage to 15 GB.

The new architecture learns from my feedback instead of static prompts. Using CLIP embeddings and a preference model, it tracks my ratings to improve selection over time. The ingestion stage measures technical image qualities and extracts metadata before handing photos off to the AI worker. Because the pipeline is staged, each component can fail and retry independently, which keeps the curation experience smooth.

After just a few sessions, it surfaced 214 meaningful images out of 3,600 processed. At this rate, the entire backlog could be cleared within a week!
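To make the idea concrete, here is a minimal sketch of a preference model over embeddings, assuming a logistic-regression scorer updated after each rating. The `embed` function, dimensions, and update rule are all illustrative placeholders, not the post's actual implementation; in the real pipeline the vectors would come from CLIP.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 512  # hypothetical; CLIP image embeddings are typically 512-d or 768-d

def embed(n):
    """Placeholder for CLIP: random unit vectors standing in for image embeddings."""
    v = rng.normal(size=(n, EMBED_DIM))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

class PreferenceModel:
    """Logistic regression over embeddings, updated per thumbs-up/down rating."""
    def __init__(self, dim=EMBED_DIM, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def score(self, x):
        # Predicted probability that the photo matches the user's aesthetic.
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def update(self, x, liked):
        # One gradient step on the logistic loss (liked: 1.0 or 0.0).
        err = self.score(x) - liked
        self.w -= self.lr * err * x
        self.b -= self.lr * err

model = PreferenceModel()
photos = embed(100)
# Simulated ratings: pretend the user likes photos whose first component is positive.
for x in photos:
    model.update(x, float(x[0] > 0))

# Rank unseen photos by predicted preference and surface the top five.
candidates = embed(20)
top = np.argsort(-model.score(candidates))[:5]
```

The appeal of this setup is that the model improves with every rating session, so no static prompt has to anticipate the user's taste up front.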
Continue reading on qwelian.com
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email me at my personal address. To get new posts, subscribe via the RSS feed.