Damegender: Towards an International and Free Dataset about Name, Gender and Frequency

EasyChair Preprint no. 5763, version 7

9 pages. Date: May 5, 2022


Gender equality is the fifth Sustainable Development Goal of the United Nations.
This equality can be pursued by measuring and analyzing data and by applying policies based on the results. Many gender studies need to count males and females by inferring gender from names, for instance in research papers, job positions or street names. The traditional approach is to use commercial APIs built on proprietary data, with no information about how the data were collected. Another approach is to take data from Wikipedia, linguistic studies, scientific websites and similar sources.
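To make "deciding gender from names" concrete, here is a minimal sketch assuming a frequency table that maps each name to male and female birth counts. A name is assigned to a gender only when that gender dominates its recorded uses. The table values, the `threshold` parameter and the function names are hypothetical and chosen for illustration. They are not real statistics and not Damegender's API.

```python
from collections import Counter

# Hypothetical toy frequency table: name -> (male_count, female_count).
# These counts are invented for illustration, not taken from any statistics institute.
FREQUENCIES = {
    "maria": (120, 98000),
    "david": (87000, 150),
    "andrea": (4000, 52000),
    "alex": (3000, 2500),
}


def guess_gender(name, threshold=0.9):
    """Return 'male', 'female' or 'unknown' from frequency counts.

    A name is assigned to a gender only when that gender accounts for at
    least `threshold` of all recorded people with the name.
    """
    counts = FREQUENCIES.get(name.strip().lower())
    if counts is None:
        return "unknown"  # name absent from the dataset
    male, female = counts
    total = male + female
    if total == 0:
        return "unknown"
    if male / total >= threshold:
        return "male"
    if female / total >= threshold:
        return "female"
    return "unknown"  # ambiguous (unisex) name under this threshold


def count_genders(names, threshold=0.9):
    """Count how many names in a list are guessed as each gender."""
    return Counter(guess_gender(n, threshold) for n in names)


authors = ["Maria", "David", "Andrea", "Alex", "Zoe"]
print(count_genders(authors))
```

Keeping an explicit "unknown" class means ambiguous or missing names are reported rather than silently forced into one gender. Raising or lowering `threshold` trades coverage for precision.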

Many statistics institutes provide open datasets about names, gender and frequency. A scientific discussion is therefore needed about unifying formats, making these data easy to process, and moving towards standards.
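Unifying formats matters because national datasets rarely share a layout. The sketch below reads two differently shaped CSV files into one common schema and sums the counts per name. It assumes one file is "long" (one row per name and sex) and the other is "wide" and semicolon-separated (one row per name, one column per sex). Both layouts, their column names and all values are hypothetical. They do not describe any particular institute's files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long layout: one row per (name, sex) pair. Toy values, not real data.
LONG_CSV = """name,sex,count
David,M,87000
David,F,150
Andrea,F,4000
"""

# Hypothetical wide layout: one row per name, one column per sex, ';'-separated.
WIDE_CSV = """nombre;hombres;mujeres
DAVID;1200;10
ANDREA;3500;48000
"""


def read_long(text):
    """Yield (name, male, female) tuples from a long-format CSV."""
    for row in csv.DictReader(io.StringIO(text)):
        count = int(row["count"])
        if row["sex"] == "M":
            yield row["name"], count, 0
        elif row["sex"] == "F":
            yield row["name"], 0, count
        # Other sex codes are skipped in this sketch.


def read_wide(text):
    """Yield (name, male, female) tuples from a wide, semicolon-separated CSV."""
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        yield row["nombre"], int(row["hombres"]), int(row["mujeres"])


def merge(*sources):
    """Sum male/female counts per normalized (lower-cased) name across sources."""
    totals = defaultdict(lambda: [0, 0])
    for source in sources:
        for name, male, female in source:
            key = name.strip().lower()
            totals[key][0] += male
            totals[key][1] += female
    return {name: tuple(counts) for name, counts in totals.items()}


unified = merge(read_long(LONG_CSV), read_wide(WIDE_CSV))
print(unified)
```

Each source gets its own small reader that emits the same `(name, male, female)` tuples. Adding a new country then means writing one reader, while the merging and any downstream calculation stay unchanged.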

Meanwhile, Damegender, a Free and Open Source Software package, has been developed to retrieve these data and perform calculations with them.

The dataset covers more than 20 countries in the Western world, includes a large number of names, and achieves an accuracy of around 87.56%. Because students and academics interested in the phenomenon can measure the gender gap at no cost and in a reproducible way, more people will be able to contribute to closing it.

Reproducible research rests on two quality guarantees: the software is Free Software, and the data cite the official sources on names, gender and frequency provided by statistics institutes. This makes peer review easier and opens the door to the semantic web and to attention to diversity.

Keyphrases: Gender Detection Tool from the Name, gender gap, open datasets

BibTeX entry
  author = {David Arroyo Menéndez},
  title = {Damegender: Towards an International and Free Dataset about Name, Gender and Frequency},
  howpublished = {EasyChair Preprint no. 5763},
  year = {EasyChair, 2022}}
