Tech4Good #2 — What normative patterns of life are assumed?

“The automated interpretation of images is an inherently social and political project, rather than a purely technical one,” write Kate Crawford and Trevor Paglen.

Delal Tomruk
Dec 27, 2022
“Training Humans,” a photography exhibition unveiled this week at the Fondazione Prada museum in Milan, shows how artificial intelligence systems have been trained to “see” and categorize the world. Image courtesy of Fondazione Prada; Marco Cappelletti © Trevor Paglen

This piece is a summary of my notes while reading “The Politics of Images in Machine Learning Training Sets” by Kate Crawford and Trevor Paglen. The article is linked at the bottom of the page.

In machine learning (ML), there is a process called labeling: based on what the raw data represents, humans tag each item with an appropriate label that provides informative context, so that an ML algorithm can be trained on these labels and then label further raw data on its own.
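To make that loop concrete, here is a minimal sketch in Python using scikit-learn. The feature values and the “apple”/“orange” labels are invented purely for illustration; they are not from Crawford and Paglen’s article.

```python
# A minimal sketch of the labeling-and-training loop described above.
from sklearn.linear_model import LogisticRegression

# Step 1: humans assign labels to raw data (here, toy 2-D feature vectors).
labeled_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
human_labels = ["apple", "apple", "orange", "orange"]

# Step 2: the ML algorithm is trained on the human-labeled examples.
classifier = LogisticRegression()
classifier.fit(labeled_features, human_labels)

# Step 3: the trained model labels further raw data on its own.
# It can only ever answer with the categories humans chose up front.
unlabeled_features = [[0.85, 0.15], [0.15, 0.85]]
print(classifier.predict(unlabeled_features))  # e.g. ['apple' 'orange']
```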

In “The Politics of Images in Machine Learning Training Sets”, Crawford and Paglen question how the decisions made during the labeling process affect the reliability of ML algorithms and can produce biased, unethical results. ML algorithms are trained on the labeled data and then interpret new, unlabeled data, assigning labels based on the examples they have been fed. The result is an automated interpretation of images. Crawford and Paglen state that “the automated interpretation of images is an inherently social and political project, rather than a purely technical one”.

Understanding the fallacies in the labeling process is important because, as AI develops, the results of automated interpretation are increasingly embedded in our social institutions. Crawford and Paglen explain that labeling amounts to categorization, and categorization is always political.

Anatomy of A Training Set — Labeling

Crawford and Paglen break the anatomy of a training set down into three levels.

When we have a dataset of faces of Japanese women and we try to categorize their facial expressions based on emotion, we are faced with three different levels:

  1. the overall taxonomy: “facial expressions depicting the emotions of Japanese women.”
  2. the individual classes: happiness, sadness, surprise, disgust, fear, anger, and neutral
  3. the individually labeled images: each face labeled as indicating an emotional state
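One way to picture these three levels is as a nested data structure. The sketch below is my own illustration; the file names and example labels are hypothetical and not part of Crawford and Paglen’s article.

```python
# The three levels of a training set, written out as a plain data structure.
training_set = {
    # Level 1: the overall taxonomy
    "taxonomy": "facial expressions depicting the emotions of Japanese women",

    # Level 2: the individual classes
    "classes": ["happiness", "sadness", "surprise", "disgust",
                "fear", "anger", "neutral"],

    # Level 3: individually labeled images (file names are placeholders)
    "labeled_images": [
        {"image": "face_0001.jpg", "label": "happiness"},
        {"image": "face_0002.jpg", "label": "neutral"},
    ],
}

# Every model trained on this set inherits the assumptions baked into
# levels 1 and 2: it can only describe a face with one of these seven labels.
```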

When trying to label the emotions in Japanese women’s faces, we make several assumptions, such as:

  • The concepts within “emotions” can be applied to photographs of people’s faces.
  • These emotions can be fully covered by six categories plus a neutral state.
  • There is a fixed relationship between a woman’s facial expression and her true emotional state, and this relationship is consistent.

Categorization As Political

TL;DR

All classificatory systems are political because:

  • Images are open to interpretation.
  • Words lie on an axis from abstract (e.g. health) to concrete (e.g. apple), and abstract words are hard to categorize.
  • The subclasses are often based on beliefs from centuries ago and reflect problematic thoughts.

When it comes to labeling images of apples or oranges, the introductory examples of many ML projects, labeling doesn’t look so problematic. However, Crawford and Paglen state that images are flexible: the meaning behind them can change with context, environment, location, and time frame. Thus, images are open to interpretation, which ML algorithms do not take into account.

Magritte’s paintings will have different meanings for different individuals. (The Art of Living, 1967 by Rene Magritte)

Another problem is that words lie on an axis from abstract (e.g. health) to concrete (e.g. apple). The challenge becomes more apparent at the abstract end of that axis: health is not an object, and what counts as depicting it in a specific image can be interpreted very differently.

Additionally, the subclasses are often based on beliefs from centuries ago and reflect problematic thoughts such as racism and sexism. For example, the adult body category has two subclasses, the female body and the male body, so the categorization implies that there are only two types of bodies.

Thus, Crawford and Paglen state that “to create a category or to name things is to divide an almost infinitely complex universe into separate phenomena” and all classificatory systems are political.

ImageNet

ImageNet is one of the most significant datasets in the history of AI and has been available on the Internet for more than 10 years. Its goal is to map out the entire set of objects in the world. Over the years, there have been competitions in which technologists around the world used the ImageNet dataset to compete for the highest accuracy in automatically interpreting and labeling images.

As such, these competitions rely on the initial labeling of the images that are present in the ImageNet library and categorize the raw data based on those labels.
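As a rough illustration of how that initial labeling constrains everything downstream, the sketch below runs a photo through a model pretrained on ImageNet (assuming torch, torchvision, and Pillow are installed; the file path is a placeholder). Whatever the photo shows, the model can only answer with one of the categories that were fixed when the dataset was labeled.

```python
# Sketch: automated interpretation of an image with an ImageNet-pretrained model.
# "some_photo.jpg" is a placeholder path, not a real file from the article.
import torch
from PIL import Image
from torchvision.models import resnet18, ResNet18_Weights

weights = ResNet18_Weights.DEFAULT          # trained on ImageNet's labeled images
model = resnet18(weights=weights).eval()
preprocess = weights.transforms()           # the preprocessing the model expects

image = Image.open("some_photo.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)      # shape: (1, 3, H, W)

with torch.no_grad():
    probabilities = model(batch).softmax(dim=1)

top_prob, top_idx = probabilities.max(dim=1)
categories = weights.meta["categories"]     # the fixed list of ImageNet classes
print(categories[top_idx.item()], round(top_prob.item(), 3))
# The answer is always drawn from the predefined ImageNet categories,
# no matter what the photo actually depicts.
```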

The “Person” Category on ImageNet

Interestingly, the ImageNet competitions did not include the category “Person”, even though the dataset does include images in this category and is open to the public. The reason images of people were excluded from the competitions is that ImageNet classifies people based on “race, nationality, profession, economic status, behavior, character, and even morality”, including many racist slurs and misogynistic terms. Additionally, the people in the images were often classified without their consent or participation.

However, as mentioned, the images of people are open to the public, and a project called ImageNet Roulette was developed to show the extent of misclassification by ML algorithms and its potential dangers. The algorithm regularly returns racist, misogynistic, and cruel results.

Removing Images of People: A Solution (?)

Over the years, millions of images of people have been removed from ImageNet for various reasons. Even though this seems like a solution, since it addresses the consent issues, it doesn’t change the fact that these images were once available to everyone and had already been used in the development of different ML systems. As Crawford and Paglen state, “By erasing them completely, not only is a significant part of the history of AI lost, but researchers are unable to see how the assumptions, labels, and classificatory approaches have been replicated in new systems, or trace the provenance of skews and biases exhibited in working systems”.

Conclusion

The more I read about AI ethics, the more I question what the purpose of AI is and whom it serves. The problematic results of AI cannot be solved by social science or technology alone; instead, experts from both fields should work closely together to understand how each step in the ML pipeline affects inclusivity.

Reflection:

What normative patterns of life are assumed while creating datasets?

What work do images do in AI systems? What are computers meant to recognize in an image and what is misrecognized or even completely invisible?

The questions of labeling: how do people tell a computer which words will relate to an image?

What purposes are AI meant to serve in our society?
