Researchers develop a new cost-effective training technique for AI systems.
Eric Slyman of the Oregon State University (OSU) College of Engineering and collaborators at Adobe call the novel method FairDeDup, short for fair deduplication.
Deduplication means removing redundant information from the data used to train AI systems, which lowers the high computing costs of the training.
Datasets gleaned from the internet often contain biases present in society, the researchers said.
When those biases are codified in trained AI models, they can serve to perpetuate unfair ideas and behavior.
By understanding how deduplication affects bias prevalence, it's possible to mitigate negative effects, such as an AI system automatically serving up only photos of white men if asked to show a picture of a CEO, doctor, etc., when the intended use case is to show diverse representations of people.
"We named it FairDeDup as a play on words for an earlier cost-effective method, SemDeDup, which we improved upon by incorporating fairness considerations," Slyman said.
"While prior work has shown that removing this redundant data can enable accurate AI training with fewer resources, we find that this process can also exacerbate the harmful social biases AI often learns."
Slyman presented the FairDeDup algorithm last week in Seattle at the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
FairDeDup works by thinning the datasets of image captions collected from the web through a process known as pruning.
Pruning refers to choosing a subset of the data that's representative of the whole dataset. If done in a content-aware manner, pruning allows for informed decisions about which parts of the data stay and which go.
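As a rough illustration of content-aware pruning (a simplified sketch, not the published FairDeDup algorithm), the Python below groups near-duplicate items by the cosine similarity of their embeddings and keeps one item per group. Everything in it is an assumption made for illustration: the function names, the 0.95 similarity threshold, the toy two-dimensional embeddings, and the single "a"/"b" label standing in for a human-defined attribute. With fair=False it keeps the first item in each group; with fair=True it keeps the item whose label is least represented among the items kept so far.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_duplicates(embeddings, threshold=0.95):
    """Greedy grouping: an item joins the first group whose seed it resembles."""
    groups = []  # each group is a list of indices; group[0] is the seed
    for i, emb in enumerate(embeddings):
        for group in groups:
            if cosine(embeddings[group[0]], emb) >= threshold:
                group.append(i)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([i])
    return groups


def prune(embeddings, labels, threshold=0.95, fair=True):
    """Keep one index per near-duplicate group.

    fair=False keeps the seed of each group.
    fair=True keeps the member whose label has been kept least often so far
    (ties go to the earliest index). This is a hypothetical selection rule.
    """
    kept = []
    counts = Counter()
    for group in group_duplicates(embeddings, threshold):
        if fair:
            choice = min(group, key=lambda i: (counts[labels[i]], i))
        else:
            choice = group[0]
        kept.append(choice)
        counts[labels[choice]] += 1
    return kept


# Toy data (hypothetical): two pairs of near-duplicates, each pair mixing labels.
embeddings = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.05, 0.99)]
labels = ["a", "b", "a", "b"]

naive_kept = prune(embeddings, labels, fair=False)  # [0, 2]
fair_kept = prune(embeddings, labels, fair=True)    # [0, 3]
```

On this toy data both variants retain two of the four items, but the plain variant keeps two items labeled "a" while the fairness-aware variant keeps one of each label.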
"FairDeDup removes redundant data while incorporating controllable, human-defined dimensions of diversity to mitigate biases," Slyman said.
"Our approach enables AI training that is not only cost-effective and accurate but also more fair."
In addition to occupation, race and gender, other biases perpetuated during training can include those related to age, geography and culture.