Damegender: Writing and Comparing Gender Detection Tools

EasyChair Preprint no. 3288, version 2

9 pages. Date: May 7, 2020


Sex (male or female) is one of the most widely used variables in
sociological studies, but it is often hidden in Internet communities.
Gender detection from a name is an important problem in Natural
Language Processing: deciding whether a string labeled as a name
should be classified as male or female. Engineers will find it useful
to detect gender from names when retrieving information from social
networks, mailing lists, instant messaging, software repositories,
papers, etc. Achieving gender equality and empowering all women and
girls is one of the United Nations' sustainable development goals, so
measuring the gender gap is a necessary first step toward finding
solutions to reduce it.

Nowadays, there are several Application Programming Interfaces (APIs)
that guess gender from a name. These services rely on proprietary
databases and are not free software, so some scientific works that
depend on them are difficult to reproduce.

In this paper, we envision how to solve these problems by offering a
solution with a free license and open name data from official
censuses. It can replace, be used alongside, and/or be compared with
these APIs, with very good results. The tool also provides Machine
Learning to predict the gender of strings, which is useful for
guessing diminutives or nicknames.

Keyphrases: gender detection tool, gender gap, software repositories

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:3288,
  author = {David Arroyo Menéndez},
  title = {Damegender: Writing and Comparing Gender Detection Tools},
  howpublished = {EasyChair Preprint no. 3288},
  year = {EasyChair, 2020}}