DOCENG 2015: 15TH ACM SIGWEB INTERNATIONAL SYMPOSIUM ON DOCUMENT ENGINEERING
PROGRAM FOR FRIDAY, SEPTEMBER 11TH

09:00-10:15 Session 13: Logical Structures
09:00
Spatio-temporal validation of multimedia documents

ABSTRACT. A multimedia document authoring system should provide analysis and validation tools that help authors find and correct mistakes before document deployment. Although very useful, multimedia validation tools are rarely provided. Spatial validation of multimedia documents may be performed over the initial positions of media items before the presentation starts. However, such an approach does not yield ideal results when media item placement changes over time. Some document authoring languages allow the definition of spatio-temporal relationships among media items, which can then be moved or resized at runtime. Current validation approaches do not verify such dynamic spatio-temporal relationships. This paper presents a novel approach for the spatio-temporal validation of multimedia documents. We model the document state by extending the Simple Hypermedia Model (SHM) to comprise media item positioning throughout the whole document presentation. Mappings between document states represent the passage of time or user interaction. We also define a set of atomic formulas upon which the author's expectations about the spatio-temporal layout can be described and analyzed.
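The core idea can be sketched as follows. This is purely illustrative and not the paper's SHM extension: the state representation, the rectangle convention, and the `never_overlap` expectation are all assumptions made for the example.

```python
# A document state maps each media item to its on-screen rectangle; an
# author expectation such as "A never overlaps B" is then checked over
# every state reached during the presentation.

def overlaps(a, b):
    """Axis-aligned rectangle intersection test; rectangles are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def never_overlap(states, item1, item2):
    """Check the spatial expectation over all document states."""
    return all(not overlaps(s[item1], s[item2]) for s in states)

# Two states of a presentation: the video is resized at runtime and, in the
# second state, invades the caption's region.
states = [
    {"video": (0, 0, 640, 360), "caption": (0, 400, 640, 60)},
    {"video": (0, 0, 640, 480), "caption": (0, 400, 640, 60)},
]
ok = never_overlap(states, "video", "caption")
```

Checking the expectation only in the first state would miss the violation introduced by the resize, which is exactly the kind of dynamic error the abstract argues static spatial validation cannot catch.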

09:30
Detecting XSLT Rules Affected by Schema Evolution

ABSTRACT. In general, the schemas of XML documents are continuously updated according to changes in the real world. If a schema is updated, then the XSLT stylesheets that depend on it are also affected. To maintain the consistency of XSLT stylesheets with updated schemas, we have to detect the XSLT rules affected by the schema updates. However, detecting such XSLT rules manually is a difficult and time-consuming task, since recent DTDs and XSLT stylesheets are becoming more complex and users do not always fully understand the dependencies between XSLT stylesheets and DTDs. In this paper, we consider three subclasses of unranked tree transducers and present an algorithm for detecting the XSLT rules affected by a DTD update for these classes.
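A naive version of the detection problem can be sketched as below. This is not the paper's algorithm, which works on subclasses of unranked tree transducers; the sketch merely flags template rules whose match patterns mention an element name changed by a DTD update, to make the task concrete.

```python
# Flag XSLT template rules whose match patterns reference any element
# that a DTD update renamed or removed. A real analysis would have to
# reason about the transducer and the DTD's content models, not strings.
import re

def affected_rules(xslt_source, changed_elements):
    """Return the match patterns that reference any changed element name."""
    patterns = re.findall(r'<xsl:template\s+match="([^"]+)"', xslt_source)
    return [p for p in patterns
            if any(re.search(rf'\b{re.escape(e)}\b', p)
                   for e in changed_elements)]

xslt = '''
<xsl:template match="book/title">...</xsl:template>
<xsl:template match="author">...</xsl:template>
'''
hits = affected_rules(xslt, {"title"})
```

Here renaming `title` in the DTD flags only the first rule; the rule for `author` is untouched.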

09:45
Concept Hierarchy Extraction from Textbooks

ABSTRACT. A concept hierarchy is a tool for presenting and organizing knowledge and has proven to be powerful in science education. With the rapid growth in the number of available online knowledge bases, automatic concept hierarchy extraction is becoming increasingly attractive. In this work, we focus on concept extraction from textbooks with the help of Wikipedia. Given a book, our goal is to extract important concepts in each book chapter using Wikipedia and to construct a concept hierarchy for the book. We investigate local and global features to capture both the local relatedness and the global coherence embedded in the textbook. In order to evaluate the proposed features and the extracted concept hierarchies, we manually construct concept hierarchies for three high-quality textbooks by labeling important concepts for each book chapter, and we conduct experiments on this dataset. Our results show that the proposed local and global features achieve better performance than using keyphrases alone to construct concept hierarchies. Moreover, we observe that incorporating global features improves concept ranking precision, confirming the global coherence of the books.

10:15-10:45 Coffee Break
10:45-12:15 Session 14: Document Understanding
10:45
Combining Advanced Information Retrieval and Text-Mining for Digital Humanities

ABSTRACT. The Digital Humanities make more and more structured and richly annotated corpora available. Most of these corpora rely on well-known and established standards, such as TEI, which in particular enable scientists to edit and publish their work. However, one remaining problem is giving adequate access to this rich data in order to produce higher-order knowledge. In this paper, we present an integrated environment combining an advanced search engine with text-mining techniques for hermeneutics in the Digital Humanities. Relying on Semantic Web technologies, the search engine uses full text as well as complex embedding structures and offers a single interface to access rich and heterogeneous data and metadata. Text-mining facilities enable scholars to exhibit regularities in corpora. Results obtained on the Cartesian corpus illustrate these principles and tools.

11:15
The Delaunay document layout descriptor

ABSTRACT. Security applications related to document authentication require an exact match between an authentic copy and the original of a document. This implies that the document analysis algorithms used to compare two documents (original and copy) should produce the same output for both. Such algorithms include the computation of layout descriptors, as the layout of a document is part of its semantic content. To this end, this paper presents a new layout descriptor that significantly improves on the state of the art. The basis of this descriptor is a Delaunay triangulation of the centroids of the document regions. This triangulation is viewed as a graph, and the adjacency matrix of the graph forms the descriptor. While most layout descriptors have a stability of 0% with regard to an exact match, our descriptor has a stability of 74%, which can be brought up to 100% with an appropriate matching algorithm. It also achieves 100% accuracy in retrieval on a database of 960 document images. Furthermore, the descriptor is extremely efficient: it performs a search in constant time with respect to the size of the document database, and it reduces the size of the database index by a factor of 400.
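The descriptor construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the triangulation is given here as hand-made index triples, whereas in practice it would be computed from the region centroids (e.g. with `scipy.spatial.Delaunay`).

```python
# Given a Delaunay triangulation of the document-region centroids,
# expressed as triples of region indices, build the adjacency matrix
# of the triangulation graph; that matrix is the layout descriptor.

def adjacency_descriptor(n_regions, triangles):
    """Adjacency matrix (nested lists of 0/1) of a triangulation graph."""
    adj = [[0] * n_regions for _ in range(n_regions)]
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            adj[u][v] = adj[v][u] = 1
    return adj

# Four region centroids triangulated into two triangles sharing edge (1, 2).
triangles = [(0, 1, 2), (1, 2, 3)]
desc = adjacency_descriptor(4, triangles)
```

Because the matrix depends only on which centroid pairs share a triangulation edge, small pixel-level perturbations that leave the triangulation unchanged leave the descriptor unchanged, which is what makes exact-match stability plausible.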

11:45
An Approach to Convert NCL Applications into Stereoscopic 3D

ABSTRACT. This paper presents and discusses the internal operation of NCLSC (NCL Stereo Converter): a tool to convert a 2D interactive multimedia application annotated with depth information into a stereoscopic multimedia application. Stereoscopic multimedia applications are those that codify both the left-eye and right-eye views, as required by stereoscopic 3D displays. NCLSC takes as input an NCL (Nested Context Language) document and outputs an NCL stereoscopic application codified in side-by-side or top-bottom format (both common input formats for 3DTV sets). NCL is the declarative language adopted in most Latin American countries for terrestrial digital TV middleware systems and in the ITU-T H.761 Recommendation for IPTV services. However, the proposed approach is not restricted to NCL and may be used with other languages. The depth annotation allows each 2D graphical component to be positioned in a layered (2.5D or 2D+depth) user interface, and it is used by NCLSC to compute the disparity (offset) between the graphical elements in the left and right views of the resulting stereoscopic application. When the resulting application is presented on a stereoscopic 3D display, this disparity creates the illusion of floating flat 2D graphical elements. NCLSC does not require any additional native middleware support to run on currently available 3D-enabled TV sets. Moreover, NCLSC can adapt, at run time, the output application to different display sizes, viewer distances, and viewer preferences, which is usually required for a good balance between artistic effects and user experience.
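The depth-to-disparity step can be sketched as below. This is an assumption-laden illustration, not NCLSC's actual computation: the linear depth-to-disparity mapping, the sign convention, and all parameter names are invented for the example.

```python
# Map an annotated depth value to a horizontal disparity and place a 2D
# element in the left and right halves of a side-by-side stereoscopic
# frame, where each view is squeezed into half the frame width.

def side_by_side_positions(x, depth, frame_width=1920, max_disparity=40):
    """Return the element's x position in the left and right half-frames.

    depth is assumed in [0, 1]: 0 means the screen plane (zero disparity),
    1 means the maximum configured disparity.
    """
    disparity = depth * max_disparity            # assumed linear mapping
    half = frame_width // 2
    left_x = (x - disparity / 2) / 2             # left view in [0, half)
    right_x = half + (x + disparity / 2) / 2     # right view in [half, frame)
    return left_x, right_x

lx, rx = side_by_side_positions(x=800, depth=0.5)
```

The same per-element offset idea applies to the top-bottom format, with the shift still horizontal but the two views stacked vertically; scaling `max_disparity` with display size and viewer distance is one way to realize the run-time adaptation the abstract mentions.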

12:30-14:00 Lunch