09:00 | The Venice Time Machine SPEAKER: Frédéric Kaplan ABSTRACT. The Venice Time Machine is an international scientific programme launched by the EPFL and the University Ca’Foscari of Venice with the generous support of the Fondation Lombard Odier. It aims to build a multidimensional model of Venice and its evolution covering a period of more than 1000 years. The project aims to construct a large open-access database that can be used for research and education. Thanks to a partnership with the Archivio di Stato in Venice, kilometers of archives are currently being digitized, transcribed and indexed, laying the foundation of the largest database ever created on Venetian documents. The State Archives of Venice contain a massive amount of hand-written documentation in languages evolving from medieval times to the 20th century. An estimated 80 km of shelves are filled with over a thousand years of administrative documents, from birth registrations, death certificates and tax statements, all the way to maps and urban planning designs. These documents are often very delicate and are occasionally in a fragile state of conservation. Complementing these primary sources, the contents of thousands of monographs have been indexed and made searchable. The documents digitised in the Venice Time Machine programme are intricately interwoven, telling a much richer story when they are cross-referenced. By combining this mass of information, it is possible to reconstruct large segments of the city’s past: complete biographies, political dynamics, or even the appearance of buildings and entire neighborhoods. The information extracted from the primary and secondary sources is organized in a semantic graph of linked data and unfolded in space and time in a historical geographical information system. The resulting platform can serve both research and education. About a hundred researchers and students already collaborate on this programme.
A doctoral school is organised every year in Venice, and several bachelor and master courses currently use the data produced in the context of the Venice Time Machine. Through all these initiatives, the Venice Time Machine explores how “big data of the past” can change research and education in the historical sciences, hopefully paving the way towards a general methodology that could be applied to many other cities and archives.
14:00 | Similarity-Based Support for Text Reuse in Technical Writing SPEAKER: unknown ABSTRACT. Technical writing in professional environments, such as user manual authoring for new products, is a task that relies heavily on reuse of content. Therefore, technical content is typically created following a strategy where modular units of text reference each other. One of the main challenges faced by technical authors is avoiding duplication of existing content, as this adds unnecessary effort, generates undesirable inconsistencies, and dramatically increases maintenance and translation costs. However, there are few computational tools available to support this activity. This paper presents an exploratory study on the use of different similarity methods for identifying reuse opportunities in technical writing. We evaluated our results using existing ground truth as well as feedback from technical authors. Finally, we also propose a tool that combines text similarity algorithms with interactive visualizations to aid authors in understanding differences in a collection of topics and identifying reuse opportunities.
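The core idea of similarity-based reuse detection can be illustrated with a minimal sketch (not the paper's implementation): score every pair of text modules by cosine similarity over TF-IDF vectors and flag pairs above a threshold as reuse candidates. The tokenization, weighting scheme, and threshold here are illustrative assumptions.

```python
# Flag pairs of text modules that look like reuse/duplication candidates
# by comparing TF-IDF term vectors with cosine similarity.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(1 + n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def reuse_candidates(topics, threshold=0.5):
    """Return (i, j, score) for topic pairs whose similarity exceeds the threshold."""
    vecs = tfidf_vectors(topics)
    return [(i, j, cosine(vecs[i], vecs[j]))
            for i in range(len(vecs))
            for j in range(i + 1, len(vecs))
            if cosine(vecs[i], vecs[j]) >= threshold]
```

Two near-identical manual topics score highly and are reported, while an unrelated topic is not; an interactive tool like the one proposed would then visualize the differences within each flagged pair.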
14:30 | Exploring scholarly papers through citations SPEAKER: unknown ABSTRACT. Bibliographies are fundamental components of academic papers, and both scientific research and its evaluation are organized around the careful examination and classification of scientific bibliographies. Currently, most digital libraries publish bibliographic information about their content for free, and many include the bibliographies (outgoing and, in some cases, even incoming) of the papers they manage. Unfortunately, little sophistication is applied to these lists: they are monolithic pieces of text in which it is difficult even to automatically tell the authors apart from the title or publication details, and users are provided with no mechanisms to filter citations or access the full context of each one. For instance, there is no way to know in which sentence a work was cited (the citation context) or why (the citation function). In this paper we introduce a novel environment for navigating, filtering and making sense of citations. The interface, called BEX, exploits data freely available in a Linked Open Dataset about scholarly papers; end-user testing demonstrated its efficacy and usability.
15:00 | Filling the gaps: Improving Wikipedia stubs SPEAKER: unknown ABSTRACT. The limited number of contributors to Wikipedia cannot ensure consistent growth and improvement of the online encyclopedia. With information scattered across the web, our goal is to automate the generation of content for Wikipedia. In this work, we propose a technique for improving Wikipedia stubs that do not contain comprehensive information. A classifier learns features from existing comprehensive articles on Wikipedia and recommends content that can be added to stubs to improve their completeness. We conduct experiments using several classifiers: a Latent Dirichlet Allocation (LDA) based model, a deep-learning architecture (deep belief network), and a TF-IDF based classifier. Our experiments reveal that the LDA-based model outperforms the other models (by ~6% F-score). Our generation approach shows that this technique is capable of producing comprehensive articles. ROUGE-2 scores of the articles generated by our system outperform those of articles generated using the baseline. Content generated by our system has been appended to several stubs and successfully retained in Wikipedia.
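The ROUGE-2 measure the abstract uses to evaluate generated articles is a bigram-overlap score. A minimal sketch of ROUGE-2 recall (real evaluations use the official ROUGE toolkit; this only illustrates the idea):

```python
# ROUGE-2 recall: the fraction of the reference text's word bigrams
# that also appear in the candidate (generated) text.
from collections import Counter

def bigrams(text):
    """Counter of lowercased word bigrams in a text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def rouge2_recall(candidate, reference):
    """Clipped bigram overlap with the reference, divided by reference bigram count."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum(min(cand[b], ref[b]) for b in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

A generated article identical to the reference scores 1.0; one sharing no bigrams scores 0.0, so higher scores indicate that generated stub content covers more of a comprehensive reference article.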
15:15 | BBookX: An Automatic Book Creation Framework SPEAKER: unknown ABSTRACT. As more educational resources become available online, it is possible to acquire more up-to-date knowledge and information. However, there has not been a tool that can automatically retrieve and organize these open resources for educational purposes. This paper introduces BBookX, a novel computer-facilitated system that automatically builds free open online books using publicly available educational resources such as Wikipedia. BBookX has two separate components: one that creates an open version of existing books by linking book chapters to Wikipedia articles, and another that, through an interactive user interface, supports real-time book creation, during which users can modify a generated book with explicit feedback that is used to improve the ranking of the returned educational resources.
15:30 | VEDD: A Visual Editor for Creation and Semi-Automatic Update of Derived Documents SPEAKER: unknown ABSTRACT. Document content is increasingly customised to a particular audience. Such customised documents are typically built by combining content from selected logical content modules and then editing this to create the custom document. A major difficulty is how to efficiently update these derived documents when the source documents are changed. Here we describe a web-based visual editing tool for both creating and semi-automatically updating derived documents from modules in a source library.
15:45 | Madoko: Scholarly Documents for the Web SPEAKER: Daan Leijen ABSTRACT. Madoko is a novel authoring system for writing complex documents. The main design goal of Madoko is to enable lightweight creation of high-quality scholarly and industrial documents for the web and print, while maintaining John Gruber's Markdown philosophy of simplicity and focus on plain-text readability. In particular, it overcomes limitations of LaTeX and uses standard CSS to create both paginated PDF and rescalable, reflowable HTML.
Short Papers Presented as Posters:
- Fine Grained Access Interactive Personal Health Records. Helen Balinsky (HP Laboratories), Nassir Mohammad (HP)
- Does a Split-View Aid Navigation Within Academic Documents? Juliane Franze (Monash University and Fraunhofer), Kim Marriott (Monash University), Michael Wybrow (Monash University)
- An Approach for Designing Proofreading Views in Publishing Chains. Léonard Dumas (Université de Technologie de Compiègne), Stéphane Crozat (Université de Technologie de Compiègne), Bruno Bachimont (Université de Technologie de Compiègne), Sylvain Spinelli (Kelis)
- High-Quality Capture of Documents on a Cluttered Tabletop with a 4K Video Camera. Chelhwon Kim (University of California, Santa Cruz), Patrick Chiu (FXPAL), Henry Tang (FXPAL)
- Segmentation of overlapping digits through the emulation of a hypothetical ball and physical forces. Alberto Nicodemus Lopes Filho (CIn - UFPE), Carlos Mello (Universidade Federal de Pernambuco)
- AERO: An extensible framework for adaptive web layout synthesis. Rares Vernica (HP Labs), Niranjan Damera Venkata (HP Labs)
- Automatic Text Document Summarization Based on Machine Learning. Gabriel Silva (Federal University of Pernambuco), Rafael Lins (Federal University of Pernambuco), Luciano Cabral (CIn-UFPE), Rafael Ferreira (Federal University of Pernambuco), Hilário Tomaz (Federal University of Pernambuco), Steven Simske (Hewlett-Packard Labs), Marcelo Riss (Hewlett-Packard)
- Searching Live Meeting Documents "Show me the Action". Laurent Denoue (FXPAL), Scott Carter (FXPAL), Matthew Cooper (FXPAL)
- Multimedia Document Structure for Distributed Theatre. Jack Jansen (CWI: Centrum Wiskunde & Informatica), Michael Frantzis (Goldsmiths), Pablo Cesar (CWI: Centrum Wiskunde & Informatica)
- Change Classification in Graphics-Intensive Digital Documents. Jeremy Svendsen (University of Victoria), Alexandra Branzan Albu (University of Victoria)
ProDoc:
- Automatic Content Generation for Wikipedia. Siddhartha Banerjee (Pennsylvania State University)
- Sentiment Analysis for Web Documents. Fathima Sharmila Satthar (University of Brighton)
Banquet at Chillon Castle, Awards