The database was removed this week. MIT also urged researchers and developers to stop using the training library, and to delete any copies.
The training set, built by the university, has been used to teach machine-learning models to automatically identify and list the people and objects depicted in still images. For example, if you show one of these systems a photo of a park, it might tell you about the children, adults, pets, picnic spreads, grass, and trees present in the snap.
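For a feel of how such a labelling system is typically queried, here's a minimal sketch assuming PyTorch and torchvision with an off-the-shelf classifier pretrained on ImageNet. The model choice and the park.jpg input file are hypothetical stand-ins for illustration, not anything from MIT's collection:

```python
import torch
from PIL import Image
from torchvision.models import resnet18, ResNet18_Weights

# Load a classifier pretrained on ImageNet, plus its matching preprocessing.
weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval()
preprocess = weights.transforms()

# "park.jpg" is a hypothetical input photo used purely for illustration.
image = Image.open("park.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    scores = model(batch).softmax(dim=1)[0]

# Print the five labels the model considers most likely for the snap.
top = scores.topk(5)
for prob, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{weights.meta['categories'][idx]}: {prob:.1%}")
```

The point to note is that the words such a system prints come straight from its training labels, which is why the vocabulary baked into a dataset matters.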
However, when assembling its training set, MIT labelled pictures of women as whores or bitches, and tagged Black and Asian people with derogatory slurs. The database also contained close-up pictures of female genitalia labelled with the word c*nt. Applications, websites, and other products relying on neural networks trained using MIT's dataset may end up using these terms when analysing photographs and camera footage.
The problematic training library in question is 80 Million Tiny Images, which was created in 2008 to help produce advanced object-detection techniques.
It is, essentially, a huge collection of photos, each paired with a label describing what's in the pic; these pairs can be fed into neural networks to teach them to associate patterns in photos with the descriptive labels.
So when a trained neural network is shown a photo of a bike, it can accurately predict that a bike is present in the snap. It's called Tiny Images because its pictures are small enough, at just 32 by 32 pixels, for the computer-vision algorithms of the late-2000s and early-2010s to digest.
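To make that training recipe concrete, below is a minimal sketch assuming PyTorch. Random tensors stand in for the 32-by-32 photos and integer class indices stand in for the descriptive labels, so it runs without the now-withdrawn dataset; the network architecture and class count are illustrative assumptions:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10  # hypothetical label vocabulary, e.g. "bike", "tree", ...

# A small convolutional classifier sized for 32x32 RGB inputs.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, NUM_CLASSES),           # one score per label
)

optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    images = torch.rand(64, 3, 32, 32)             # stand-in tiny images
    labels = torch.randint(0, NUM_CLASSES, (64,))  # stand-in labels
    optimiser.zero_grad()
    loss = loss_fn(model(images), labels)  # penalise wrong label predictions
    loss.backward()                        # learn pattern -> label associations
    optimiser.step()

# After training on real data, the highest-scoring class is the prediction.
prediction = model(torch.rand(1, 3, 32, 32)).argmax(dim=1)
```

In Tiny Images itself, the labels were English nouns drawn from WordNet and used as image-search queries, which is how the offensive terms crept in.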
The Tiny Images dataset is used to benchmark computer-vision algorithms alongside the better-known ImageNet training collection. Unlike ImageNet, though, no one thought to scrutinise Tiny Images for problematic content, meaning any AI trained on it risked picking up its racist and misogynistic labels and spouting them back at users.