A French Weblog Corpus for New Insights on Blog Post Tagging

15 pages. Published: November 28, 2016


The rapid evolution and informational growth of blogs require enhanced functionality for searching, navigating and linking content. This paper presents the French Blog Annotation Corpus (FLOG), intended to provide a research testbed for the study of annotation practices, specifically the tagging and categorizing of blog posts. The corpus covers a ten-year time span of blog posts on cooking, law, video games and technology. Statistical analysis of the corpus suggests that tag annotation of posts is more frequent than category attribution, but that categories provide a richer semantic structure for post classification and search. The review of the state of the art on automatic tag suggestion shows that tag suggestion tools are not yet widely used among bloggers, which might be a consequence of methods that do not take into account the past tagging history of the blog, the context of the post within the blog and the tagging pattern of each blog author.

