Project information

  • Category: University Project
  • Course: Applied Data Analysis
  • University: EPFL
  • Date: November 2019 - December 2019
  • Project URL:
  • Paper: link

Wikipedia receives more than 9 billion visits per month and its articles are edited by volunteers around the world, which means that subjectivity and bias are sometimes introduced. Like many people at this conference, we were interested in bias. The aim of this project is to find out whether there is a difference in the way people describe men and women on Wikipedia and to identify which words create that difference. We explore the bias in two ways. First, we analyze which topics are more likely to appear depending on gender. Second, we study the subjectivity introduced through the use of adjectives. Our dataset consists of the overviews of Wikipedia biography articles. There are almost 1.5 million biographies, but only about 17% of them are about women. Using a subjectivity lexicon, we found that women tend to be described with more positive and strongly subjective adjectives, while men are described with more negative and weakly subjective adjectives.
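The lexicon-based comparison can be illustrated with a minimal sketch. This is not the project's actual code: the lexicon entries, the example overviews, and the helper name `subjectivity_profile` are all hypothetical, and the sketch matches lexicon words directly instead of first tagging adjectives with a part-of-speech tagger.

```python
from collections import Counter

# Toy stand-in for a subjectivity lexicon: word -> (polarity, strength).
# These entries are illustrative assumptions, not taken from the project.
TOY_LEXICON = {
    "brilliant": ("positive", "strong"),
    "beautiful": ("positive", "strong"),
    "notable": ("positive", "weak"),
    "controversial": ("negative", "weak"),
    "notorious": ("negative", "strong"),
}


def subjectivity_profile(overviews, lexicon=TOY_LEXICON):
    """Return the share of each (polarity, strength) category among matched words."""
    counts = Counter()
    for text in overviews:
        for token in text.lower().split():
            word = token.strip(".,;:!?()\"'")  # drop surrounding punctuation
            if word in lexicon:
                counts[lexicon[word]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Hypothetical biography overviews grouped by gender.
overviews_by_gender = {
    "female": ["She was a brilliant and beautiful actress."],
    "male": ["He was a controversial but notable politician."],
}

profiles = {
    gender: subjectivity_profile(texts)
    for gender, texts in overviews_by_gender.items()
}
```

Comparing the resulting shares across genders gives a rough view of which polarity and strength categories dominate each group. A real analysis would need a full lexicon, adjective tagging, and a significance test over the whole dataset.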

  • Top 2 project out of 88 student projects
  • Paper accepted at the WikiWorkshop 2020 held at the WebConference 2020