For the MIE1517: Introduction to Deep Learning course project, the team tackled the problem of sorting trash from an input image according to City of Toronto guidelines.
Given an image of an object, the model should output that object's classification under those guidelines.
Training an AI model properly requires a large, high-quality dataset. The City of Toronto's specific guidelines add further difficulty: the team found no suitable dataset on which a CNN model could be trained to output the correct City of Toronto classes.
The team found a workaround for the lack of training data. BLIP2 would create a caption for the input image, and LLMs would predict the class of the object in the image from that caption. To obtain classes specific to the City of Toronto, a BERT model was trained on City of Toronto data. The final output of the model would tell the user the type of trash and which bin it belongs in.
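The flow described above can be sketched as a chain of pluggable stages. This is a minimal sketch of one possible reading of the pipeline, not the team's actual code: the function and class names, the stub stages, and the category-to-bin table are all assumptions made for illustration. In a real system the stubs would be replaced by calls to BLIP2, an LLM, and the trained BERT classifier.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative category-to-bin table (an assumption, not official guidance).
BIN_FOR_CATEGORY = {
    "recycling": "Blue Bin",
    "organics": "Green Bin",
    "garbage": "Garbage Bin",
}


@dataclass
class SortingResult:
    caption: str    # text description of the image
    item: str       # object named from the caption
    category: str   # City of Toronto class for that object
    bin_name: str   # bin the user should use


def sort_trash(
    image: bytes,
    captioner: Callable[[bytes], str],
    item_namer: Callable[[str], str],
    category_classifier: Callable[[str], str],
) -> SortingResult:
    """Run the three stages in order: image -> caption -> item -> category."""
    caption = captioner(image)
    item = item_namer(caption)
    category = category_classifier(item)
    # Fall back to "Unknown" when the classifier returns an unmapped label.
    bin_name = BIN_FOR_CATEGORY.get(category, "Unknown")
    return SortingResult(caption, item, category, bin_name)


# Hypothetical stand-ins for the real models, so the sketch runs offline.
def fake_captioner(image: bytes) -> str:
    return "a photo of an empty aluminum can on a table"


def fake_item_namer(caption: str) -> str:
    return "aluminum can" if "can" in caption else "unknown item"


def fake_category_classifier(item: str) -> str:
    return "recycling" if item == "aluminum can" else "garbage"


result = sort_trash(b"", fake_captioner, fake_item_namer, fake_category_classifier)
print(f"{result.item}: {result.category} -> {result.bin_name}")
```

Passing each stage in as a callable keeps the image model, the language model, and the classifier independent, so any one of them could be swapped or retrained without touching the others.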
The resulting BERT model achieved 84% accuracy, with 80% precision and 90% recall. While the model could be improved further, the instructor and the class recognized the team's creative approach.
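For readers unfamiliar with these metrics, the sketch below shows how accuracy, precision, and recall could be computed for a multi-class classifier. It assumes macro averaging (the unweighted mean over classes), which the write-up does not specify, and the labels are toy data invented for illustration, not the project's results.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy plus macro-averaged precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gained a false positive
            fn[truth] += 1  # true class gained a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    accuracy = sum(tp.values()) / len(y_true)
    precision = sum(safe_div(tp[l], tp[l] + fp[l]) for l in labels) / len(labels)
    recall = sum(safe_div(tp[l], tp[l] + fn[l]) for l in labels) / len(labels)
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy labels (invented for this example).
y_true = ["recycling", "recycling", "organics", "garbage"]
y_pred = ["recycling", "garbage", "organics", "garbage"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Precision penalizes a class for items wrongly assigned to it, while recall penalizes a class for items it missed, which is why the two can differ even when accuracy is fixed.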
Copyright © 2024 Paul Zhou - All Rights Reserved.