Microsoft has recently removed a database of more than 10 million faces, intended as a test and training dataset for facial recognition algorithms, according to a report by the Financial Times.
The database, known as MS Celeb, contained more than 10 million images of roughly 100,000 people, largely scraped from publicly available online sources. While no individual photo in the dataset was difficult to find, the volume of images and the structured data accompanying them made the dataset extremely useful for training programs to recognize a person’s face across different photos.
The takedown came after an earlier Financial Times investigation found that many of the people represented in the dataset were unaware of it and had not consented to having their pictures used. A number of experts speculated that the dataset might run into legal trouble under the European Union’s General Data Protection Regulation, which imposes significant requirements on the storage and transfer of a subject’s personal data.
Notably, Microsoft did not announce the removal of the dataset. A spokesperson said: “The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed.”