SDMQ2015: INTERNATIONAL WORKSHOP ON SPATIAL DATA AND MAP QUALITY
PROGRAM FOR WEDNESDAY, JANUARY 21ST

09:00-10:30 Session 6: Political and Socioeconomic Views on Quality
Location: Main conference room
09:00
Quality and public data – why does it matter?
SPEAKER: Laila Aslesen

ABSTRACT. There is a growing amount of public information that is available under an open license and free of charge. One of the things that has puzzled the advocates of open data is how often the public sector cites worries over liability issues as a reason for not opening up its data. While this can appear to be a mere excuse covering a lack of political will to open data, it is actually a very real issue for the public servants in question. In many cases the public data in question is the result of priorities that the public body does not control: it may make suggestions, but in the end politicians and budgets decide what is actually produced. Since the PSI directive and national law require public bodies to make available what they have, they cannot really choose what they think is good for public consumption and what is not good enough. Many of them feel that the direct connection with users through an individual license gives them a better opportunity to manage users' expectations and avoid liability issues. When faced with a decision to open their data, they are concerned that the open data license should also handle these issues adequately, which is one of the reasons that we have so many different open licenses for public data in Europe.

So while one might think it fair game simply to open the data on an as-is basis and let people take responsibility for what they do with it, the liability law of various countries is a very real worry for many public bodies. The growing tendency to litigate over issues that used to be settled out of court brings not only costs but bad publicity as well. This becomes multiplied when you are not just offering the data but also building up the system that delivers them, like the ELF platform.

This is where the quality system comes in. It is an accepted defense against liability claims that you had a professional quality system in place. You are not liable for the fact that things may go wrong, but you can be held liable if you cannot prove that you did what was humanly possible to ensure that they did not. This follows from what in legal terms is called tort law, which is the most likely legal framework for a liability claim on public data. Not only do you need to make certain that the data you provide are quality assured, you also need to make certain that the system delivering them is quality assured as well. The key here is the consistency between what the system claims to do, how it performs, and what the person at the other end can reasonably expect from it. There is an additional concern when you are licensing and charging for data, because this takes you further into the area of product liability, where people rightly expect that when they pay, things should be even better and more reliable. This has elements of what we call strict liability, i.e. you cannot necessarily defend yourself by having done your best. So it is not enough to have a system for data production and the process towards publication; you also have to make certain that the actual output meets the standards you set. In order to make use of the quality systems put in place by the data providers themselves, there is also a need to secure the integrity of the data from provider to end user.

This presentation will delve further into what types of liability issues may be in question and how quality systems can help with this. It will use some existing court decisions as examples.

09:25
On the Volunteered Geographic Information Quality

ABSTRACT. Spatial data quality is a subject that has drawn the interest of researchers for quite a long time now. Extensive literature reviews can be found in the writings of Shi et al. (2002) and Devillers and Jeansoulin (2006), among others. The progress in this field has led international organizations to provide detailed frameworks on spatial data quality evaluation processes and methods (e.g. ISO). The evaluation methods can be indirect (e.g. taking into account lineage or known uses of the data) or based on direct comparison with internal or external data (reference data). The methods can be continuous or run on benchmarks, and can be applied either to the entire data population or to selected samples. Official producers of spatial data such as NMAs, governmental agencies and enterprises are moving along these guidelines to determine the quality of their data and use metadata specifications to document and communicate their findings.

However, the advent of Web 2.0 with its bi-directional flow of data, and the consequent transformation of the lay person from data user to data producer, has totally changed the landscape of data availability. Affected by this evolution, Geographic Information (GI) that originates from crowdsourced projects (i.e. Volunteered Geographic Information, as it is commonly referred to; Goodchild, 2007) is now in abundance on the Web. VGI is expanding rapidly and GI stakeholders such as governments, NGOs and enterprises have started to explore this new domain (see for example the Haklay et al., 2014 report for the World Bank on the use of crowdsourced spatial data in government).

One of the most challenging issues with VGI is its quality assessment. The usually loose, non-hierarchical and uncontrolled production process of VGI is based on contributions from non GI-professionals (or neo-geographers, Turner, 2006), and thus concerns are raised about the overall fitness for purpose of such spatial data, as the evaluation methods mentioned earlier are not used. However, academic research focusing on the assessment of specific quality elements has shown that quality is not a prohibitive factor for several spatial and map products. At the same time, new sources of uncertainty that have surfaced, mainly related to the underlying socio-economic factors of the areas mapped, make the quality assessment of VGI seem an intractable riddle. While a potential VGI user has no clear view of VGI quality, the administrators of VGI projects are in a position to recognise and correct erroneous processes and take the necessary steps to formalise and standardise GI production so as to gain a better insight into their spatial data quality. Otherwise, when it comes to a transaction between VGI producers and VGI users, neither party actually knows whether the spatial data at hand is of any real value, and therefore both parties are in mutual ignorance. Thus, the information available is either poor or imbalanced; information needed to make the final decision on the realisation of the transaction. The absence of an environment of symmetric information, where the interested parties are mutually and clearly informed of the spatial data quality, is the fundamental problem when considering VGI data quality.

Interestingly enough, the 2001 Nobel Prize in Economics was awarded to Akerlof, Spence and Stiglitz for their analyses of markets with asymmetric information (Akerlof, 1970; Spence, 1973; Stiglitz and Rothschild, 1976). In the course of the paper, the concepts and positions of the three Nobel Prize laureates are presented. More specifically, the paper will explore Akerlof’s contribution regarding the recognition and modeling of situations where a certain product in the market exists in both high and low quality but this information is asymmetrically distributed between the seller and the buyer. Akerlof showed that in these cases the consequences damage both parties, as either low-quality products will dominate the market or transactions will stop altogether, a process known as “adverse selection”. Thus, a theoretic model is provided that accurately describes the environment of asymmetric information that VGI data providers and VGI potential users are in. On the other hand, Spence and Stiglitz offered two different solutions focusing on the possible actions of sellers and buyers, respectively. The former explains that one way to solve the problem of asymmetric information environments is for the party that has better information to act upon it and signal that information to any other interested party. The latter transfers the ability to improve an asymmetric information environment to the uninformed party, through a method known as “self-screening”.

Then, returning the discussion to the Geomatics domain, these economic apparatuses are transferred and adapted to the current situation in VGI, where the product is the crowdsourced data, the sellers are the VGI producers and the buyers are the VGI potential users. Common elements and factors are recognized, parallels are highlighted and opportunities for implementation are described. By doing so, these prominent economic theories are used as the basic components of a new framework to remedy the asymmetric availability of information on the quality of VGI.

09:50
Reshaping socio-spatial representativeness from probabilistic survey data: a case study from Marseille

ABSTRACT. See attached file (SDMQ_abstract_Audard_Carpentier_vfinal.docx)

10:30-11:15 Coffee Break
10:30-11:15 Session 7: Poster Session
Location: Poster room
10:30
ISO 19157: A way for further improvement
SPEAKER: Jordi Escriu

ABSTRACT. The International Standards ISO 19113, ISO 19114 and ISO 19138, which are now withdrawn, were widely used around the world to identify quality elements and subelements, to identify quality measures, and to evaluate and report data quality, respectively. The new ISO 19157, currently in force, integrated them into a single standard with the aim of increasing legibility, usability and simplicity. Additionally, it introduces other improvements such as a common structure, formal UML models, the new metaquality element, and the possibility of providing spatial and descriptive results, among others. This paper summarizes the results of the analysis of the standard performed in our organization, with a view to implementing it and adapting our data quality and metadata records accordingly.

The analysis identified two main categories of issues:

1. Changes introduced in the standard that affect existing implementations (i.e. backwards compatibility): they may lead to potential changes in existing data quality and metadata records when implementing the new set of ISO standards (e.g. ISO 19157, ISO 19115-1). At the moment, not all the implications for these records are fully determined since the corresponding implementation standards are not available yet (ISO 19115-3, ISO 19157-2, ISO 19139).

  • The distinction between data quality elements and subelements is not explicit in the standard.
  • Lineage is no longer linked to the data quality section. Data quality overview elements (‘Purpose’, ‘Usage’ and ‘Lineage’) are considered metadata elements in the scope of ISO 19115-1.

  • The mechanism by which data providers can create new data quality elements is supposed to be replaced by Usability. This may affect quality elements defined by data providers in existing records.

2. Aspects that need further refinement, definition, harmonization or concretion in the standard:

  • The definition of some data quality elements may be ambiguous, which may lead to different interpretations. In particular, the definition of the new ‘Usability’ data quality element is quite abstract and general. This quality element may be used to report almost any information the data provider is interested in sharing. However, the standard includes no useful guidance on the use of this element.

  • The set of data quality elements foreseen in the standard is not enough to describe the quality of certain types of geographic data sets in a more specific way (e.g. for imagery data: histogram, sharpness, saturation, noise).

  • The definition of some data quality measures appears to be incorrect (e.g. Root mean square error of planimetry 2D - RMSEP - Measure 47; Root mean square error 1D or vertical - RMSE - Measure 39); the commonly used formulations are sketched after this list.

  • There is no guidance (or link to any guidelines) on how to apply complex data quality measures (e.g. sample distribution and representativeness).
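
For reference, a hedged sketch of the formulations commonly used in practice for these two measures, with (x_i, y_i, z_i) the measured coordinates and hatted symbols the reference coordinates of the n control points; the exact definitions intended by ISO 19157 should be checked against the standard itself:

```latex
% Commonly used formulations (assumed here for illustration; verify against ISO 19157)
\[
\mathrm{RMSE}_{z} \;=\; \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}\bigl(z_{i}-\hat{z}_{i}\bigr)^{2}}
\qquad \text{(Measure 39: vertical, 1D)}
\]
\[
\mathrm{RMSEP} \;=\; \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}\Bigl[\bigl(x_{i}-\hat{x}_{i}\bigr)^{2}+\bigl(y_{i}-\hat{y}_{i}\bigr)^{2}\Bigr]}
\qquad \text{(Measure 47: planimetric, 2D)}
\]
```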

The mentioned issues, especially those identified in the second category, should be considered and tackled when implementing ISO 19157. Ideally, these implementation experiences should be taken into account in future revisions of the standard to achieve further improvement and better usability. The analysis performed includes some proposals for resolving the issues until the implementation standards are approved.

10:30
On the quality assessment of mapping parties

ABSTRACT. Most studies on the quality of OpenStreetMap (OSM) do not perform a ground-truth control. As an example, in this paper we focus on assessing the quality of a data set from a mapping party (MP) held in Baeza (Spain) (MPB) (Figure 1). This work is novel because: a) it is the first quality evaluation of an MP, b) the reference is the real world, and c) we use the concepts of metaquality proposed in ISO 19157 for reporting the quality of the results.

Figure 1: The town of Baeza in OSM before (left) and after (right) the MPB.

The objective of this work is to analyze the results of two sampling strategies for the quality assessment of the MPB (see Torres-Manjón et al. 2011 for details); our analysis is centered on completeness and thematic accuracy. In particular, the data quality subelements are commission, omission and classification correctness. From ISO/TS 19138, the rate of missing items is selected for measuring omission (P1), the rate of excess items for commission (P2) and the misclassification rate for classification correctness (P3). Although errors are counted, the results are presented as quality Q = 1 - P, that is to say as the proportion of features without defects, which allows a more natural understanding of the final results; a global measure of quality is also given. Because a full inspection is not viable, the estimations of the corresponding quality measures Q1, Q2, Q3 are obtained by sampling. The choice of the sampling design is a crucial decision in order to achieve an appropriate level of accuracy in the estimations (EPA, 2002). Here, two different strategies are considered: a cluster sampling (CS) over the framework defined by the city blocks and open spaces of the city of Baeza, and a stratified sampling (SS) defined to take into account the three types of geometry: point, line and area. It is important to note that the SS treats each geometry type (point, line, area) as a homogeneous stratum, which entails designing a simple random sample within each stratum, whereas the CS simplifies the fieldwork by assuming that each cluster has the same heterogeneity as the population with respect to the presence of geometrical features, so that only a simple random sample of clusters is drawn (Alba and Ruiz, 2006). In both sampling designs we have estimated the standard error associated with each quality measure (ε), defined as the square root of the estimated variance, and the confidence intervals (CI) of level 100(1-α)%, 0<α<1, are also obtained.

Stratified sampling. As said before, to design the stratified sample, three independent simple random samples are drawn, for point, line and area features. From the population of 6529 features to be inspected, 411 features were selected: 174 points, 149 lines and 88 areas, which amounts to around 6.3% of the data.

Cluster sampling. A simple random sample of 27 clusters was selected, 20 of them corresponding to blocks and 7 to open spaces. All the features belonging to these clusters form the final sample, 995 features in total (5% of the data), which had to be inspected in the fieldwork. This number of clusters was initially decided as a pilot study, but from the analysis of the results the pilot scheme was finally considered statistically reliable enough, so no more clusters were added. Given the different sizes of the clusters in the population, a ratio-type estimator is considered for each quality measure.

Table 1 shows the global estimation of quality and its confidence interval at level 95%, and Table 2 displays the estimations of Q1, Q2 and Q3 and their CI at the same level of confidence.

Table 1. Global estimation of quality and its confidence interval in the SS and CS.
  Sampling | Sample size | Estimated global quality (%) | CI 95% (%)
  SS       | 411         | 86.40 ± 3.91                 | (82.49, 90.31)
  CS       | 995         | 45.63 ± 11.69                | (33.93, 57.32)

Table 2. Estimation of quality and the confidence interval for each quality measure.
  Quality element             | Quality measure | CS: Estimation (%) | CS: ε (%) | SS: Estimation (%) | SS: ε (%)
  Omission                    | Q1              | 47.84              | 11.59     | 94.88              | 2.47
  Commission                  | Q2              | 98.79              | 0.97      | 98.85              | 1.41
  Classification correctness  | Q3              | 98.99              | 1.72      | 98.2               | 1.72

Looking at these tables, we can conclude that both sampling strategies provide similar results with respect to the commission element and the classification correctness element, with high commission correctness and high classification correctness. However, the SS is a more appropriate design for estimating the omission element than the CS design, with a relative standard error of 4.5% instead of 25.6%.
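
As an illustration of the estimation step, a minimal Python sketch (not part of the original abstract) of how a stratified estimate of a quality rate Q = 1 - P and an approximate 95% confidence interval could be computed; the split of the 6529 features into strata and the defect counts are hypothetical placeholders, and the authors' ratio-type cluster estimator is not reproduced here:

```python
import math

# Hypothetical strata: geometry type -> (population size N_h, sample size n_h, defect count d_h)
strata = {
    "point": (3000, 174, 9),
    "line":  (2400, 149, 12),
    "area":  (1129,  88, 5),
}

N = sum(N_h for N_h, _, _ in strata.values())

# Stratified estimate of quality Q = 1 - P (proportion of features without defects)
q_hat = sum((N_h / N) * (1 - d_h / n_h) for N_h, n_h, d_h in strata.values())

# Variance of the stratified proportion, with finite population correction per stratum
var = sum(
    (N_h / N) ** 2 * (1 - n_h / N_h) * (d_h / n_h) * (1 - d_h / n_h) / (n_h - 1)
    for N_h, n_h, d_h in strata.values()
)
se = math.sqrt(var)                                  # standard error (epsilon)
ci = (q_hat - 1.96 * se, q_hat + 1.96 * se)          # approximate 95% CI

print(f"Estimated Q = {100 * q_hat:.2f}%, epsilon = {100 * se:.2f}%, "
      f"CI95 = ({100 * ci[0]:.2f}%, {100 * ci[1]:.2f}%)")
```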

References
Alba Fernández, V., Ruiz Fuentes, N. (2006). Muestreo en poblaciones finitas. Septem Ediciones.
ISO (2006). ISO/TS 19138:2006. Geographic information -- Data quality measures. International Organization for Standardization.
Torres-Manjón, J., Tamayo de la Torre, J.R., Sevillano Sepúlveda, S. (2011). Una experiencia participativa (Mapping Party Baeza). www.idejaen.es/documentos/FinMapping_Baeza.pdf
U.S. Environmental Protection Agency (2002). Guidance on choosing a sample design for environmental data collection, EPA QA/G-5S. Office of Environmental Information, Washington DC.

10:30
Proposal of a web service for positional quality control of spatial data sets

ABSTRACT. The positional accuracy of cartographic products has always been of great importance. Together with logical consistency, it is the quality element of geographic information most extensively used by the National Mapping Agencies (NMAs), and also the most commonly evaluated quality element (Jakobsson and Vauglin, 2002). Positional accuracy is a matter of renewed interest because of the capabilities offered by the Global Navigation Satellite Systems (GNSS) and the need for greater spatial interoperability to support Spatial Data Infrastructures (e.g. INSPIRE). Different positional behaviors of two spatial data sets mean the existence of an inter-product positional distortion and a barrier to interoperation (Church et al., 1998). This barrier concerns not only positional and geometric aspects, but also thematic ones, which are greatly affected by position (Carmel et al., 2006). Since positional accuracy is essential in geospatial production, all NMAs have established statistical methods (standards) for its control. Many of these standards propose the use of control samples of at least 20 control elements. Some researchers (Li 1991, Ariza-López and Atkinson-Gordo 2008, Ruiz-Lendínez 2012) indicate that a sample of 20 control elements is not large enough to obtain good estimations. Also, the preferred source for the control sample is a field survey (e.g. using GNSS), which is a very expensive operation. In European countries there are now spatial data sets of high quality, but new spatial data sets are also generated daily, and these should be evaluated. We believe that for these cases it would be interesting to have web services with the capability of assessing the positional accuracy of new spatial data sets in relation to reference data sets. In this work we propose the design of a web service for the automation of positional quality controls of spatial data sets. This proposal has to address different aspects:
• Use cases. Establish the use cases for this idea.
• Control elements. How to automatically obtain valid and representative samples of control and controlled elements (feature matching).
• Time degradation. How to deal with the quality degradation of the reference data sets.
• Standardization. The alignment with ISO (2011) and OGC WPS (Schut and Whiteside, 2007).
• System architecture. Definition of the basic elements of the system architecture.
• Outputs. Type and content of the quality evaluation reports and of the generated metadata and metaquality data.
This work finishes with the identification of alternative uses of the proposed service. We are in the early stages of the research, but we believe that the framework necessary to handle positional accuracy may also be used to assess other quality elements, such as completeness and thematic accuracy.
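
As a rough illustration of the kind of computation such a service would wrap, a minimal sketch (assumed, not taken from the abstract) that compares homologous control points of an assessed data set against a reference data set and reports an RMSE-type positional accuracy figure; the point pairs and the tolerance are placeholders, and the feature-matching step discussed above is not addressed:

```python
import math

# Hypothetical homologous point pairs: (assessed (x, y), reference (x, y)), in map units
pairs = [
    ((100.2, 200.1), (100.0, 200.0)),
    ((150.5, 249.4), (150.0, 250.0)),
    ((302.9, 399.7), (303.0, 400.0)),
]

# Planimetric error of each control point
errors = [math.hypot(xa - xr, ya - yr) for (xa, ya), (xr, yr) in pairs]

# RMSE-type summary and a simple tolerance-based verdict
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
tolerance = 1.0  # placeholder maximum allowed error
print(f"RMSE = {rmse:.3f}, worst error = {max(errors):.3f}")
print("PASS" if all(e <= tolerance for e in errors) else "REVIEW")
```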

References
Ariza-López, F.J., Atkinson-Gordo, A.D. (2008). "Variability of NSSDA Estimations". Journal of Surveying Engineering, 134(2), 39-44.
Carmel, Y., Flather, C., Dean, D. (2006). "A methodology for translating positional error into measures of attribute error, and combining the two error sources". Proceedings of Accuracy 2006, 7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Caetano, M. and Painho, M. (eds.), 5-7 July, Lisbon, Portugal, pp. 3-17.
Church, R., Curtin, K., Fohl, P., Funk, C., Goodchild, M., Kyriakidis, P., Noronha, V. (1998). "Positional Distortion in Geographic Data Sets as a Barrier to Interoperation". Technical Papers ACSM. American Congress on Surveying and Mapping, Bethesda, Maryland.
ISO (2011). "ISO/DIS 19157:2011. Geographic information - Data quality". International Organization for Standardization (ISO).
Jakobsson, A., Vauglin, F. (2002). "Report of a questionnaire on data quality in National Mapping Agencies". Report prepared for the CERCO Working Group on Quality. Comité Européen des Responsables de la Cartographie Officielle, Marne-la-Vallée, France.
Li, Z. (1991). "Effects of check points on the reliability of DTM accuracy estimates obtained from experimental tests". PE&RS, 57(10), 1333-1340.
Ruiz-Lendínez, J.J. (2012). "Automatización del control de la calidad posicional". Tesis Doctoral, Universidad de Jaén.
Schut, P., Whiteside, A. (2007). "OpenGIS® Web Processing Service". Open Geospatial Consortium (OGC).

11:15-12:30 Session 8: Map and Cartographic Quality
Location: Main conference room
11:15
Quality Assessment in the framework of Model Generalization
SPEAKER: Natalia Blana

ABSTRACT. In recent years, along with the increased production and dissemination of spatial data, numerous data quality issues arise that are being addressed by the pertinent authorities. A number of National Cartographic Organizations handle the assessment of data quality through the development and implementation of Quality Management Systems based on international standards. Nowadays, considerable parts of the disseminated spatial data bear an indication of quality, referring at least to their positional accuracy, lineage, etc. In contrast to the above, maps and charts resulting from the compilation of spatial data are missing information concerning their quality characteristics, making their fitness for use questionable. This paper elaborates on the issue of quality assessment in model generalization, considering the map as the result of the composition of spatial data, which undergo a number of transformations in a series of processes. It constitutes a component of ongoing research on the development of a map/chart quality assessment methodology aiming at the typification of the necessary quality control and the development of a quality model to be applied to the map/chart production process (Tsoulos and Blana, 2013). The proposed methodology considers map/chart composition as a series of processes (phases) executed in a production line where a Quality Management System is implemented, according to the ISO 9001:2008 standard. According to the quality model under development, quality control is carried out in every phase of the map/chart composition process. The proposed quality model utilizes the ISO 19113 and 19138 standards as tools for the assessment and quantification of quality and the ISO 19114 standard for reporting the results. Four basic structural elements comprise the proposed quality model:
• Map specifications
• Map quality requirements (derived from map specifications)
• Map composition
• Quality control tools that assess and measure quality
Model generalization is the initial phase of map/chart composition, which in general is carried out in three main and distinct processes: model generalization, cartographic generalization and symbolization. Certain issues like format compatibility between the source database schema and the new one of the map to be produced, information modification, or object and attribute aggregation are very likely to occur during model generalization (Nyerges, 1991; Yaolin et al., 2001; Van Smaalen, 2003; Regnauld and McMaster, 2007). These issues should be resolved before data transfer takes place from the source to the cartographic database, otherwise information loss, object misclassification and missing attribute values will occur. The proposed methodological approach for the implementation of model generalization considers the above issues and adopts the quality elements of the ISO 19113 standard and the corresponding measures of the ISO 19138 standard for the identification of inconsistencies between the source and the cartographic database schema before data migration. It also utilizes a quality control process, applied to model generalization, which includes the necessary quality checks for the assessment of the results of the generalization operations applied to classes, objects and attributes. Model generalization is executed in three phases:
• In the first phase, the transformation of source data (classes, objects, attributes) according to the cartographic database schema specifications is carried out. Quality elements and their corresponding measures are utilized for the assessment of the integrity and quality of the results of this phase.
• In the second phase, the actual process of model generalization is implemented. Data from the source database are migrated to a temporary database and the transformations defined in the previous phase are executed.
• In the third phase, quality control is performed on the data of the temporary database.

Quality results are mainly of Boolean type (acceptable or not acceptable). If the results are not acceptable, data migration from the temporary database to the cartographic one is not carried out and new transformations are executed until the results comply with the acceptable conformance levels. The resulting cartographic database is considered “error free” and its content is readily available as input to the subsequent phases of map composition (cartographic generalization, symbolization). The methodology described has been implemented for the model generalization of the EuroRegional Map database at scale 1:250,000 to derive the EuroGlobal Map database at scale 1:1,000,000.
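
A minimal sketch (assumed, not the authors' implementation) of the kind of Boolean conformance decision described above: each quality measure obtained on the temporary database is compared against an acceptable conformance level, and migration is allowed only if all checks pass; the measure names and thresholds are illustrative only:

```python
# Hypothetical quality results measured on the temporary database (rates in [0, 1])
results = {
    "completeness_omission_rate": 0.010,
    "misclassification_rate":     0.004,
    "missing_attribute_rate":     0.002,
}

# Acceptable conformance levels defined in the map specifications (illustrative values)
conformance_levels = {
    "completeness_omission_rate": 0.020,
    "misclassification_rate":     0.010,
    "missing_attribute_rate":     0.005,
}

# Boolean (acceptable / not acceptable) evaluation per measure
checks = {name: results[name] <= conformance_levels[name] for name in results}

if all(checks.values()):
    print("All checks acceptable: migrate data to the cartographic database")
else:
    failed = [name for name, ok in checks.items() if not ok]
    print("Not acceptable, re-run transformations for:", ", ".join(failed))
```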

References
Nyerges, T. (1991). Representing Geographical Meaning. In: Map Generalization: Making Decisions for Knowledge Representation, Longman Scientific Publications, London, UK, 59 p.
Regnauld, N., McMaster, R.B. (2007). Generalization of geographic information: Cartographic modelling and applications. Elsevier, Oxford, UK, 37 p.
Tsoulos, L., Blana, N. (2013). "Map Quality Assessment - Groundwork and Implementation Approach". In: Proceedings of the 26th International Cartographic Conference, Dresden, Germany.
Van Smaalen, J.W.N. (2003). Automated aggregation of geographic objects: A new approach to the conceptual generalization of geographic databases. PhD thesis, Wageningen University, Netherlands.
Yaolin, L., Molenaar, M., Tinghua, A., Yanfang, L. (2001). "Frameworks for generalization constraints and operations based on object-oriented data structure in database generalization". Geo-Spatial Information Science, 4(3): 42-49.

11:40
Estimation of quality of cartographical products

ABSTRACT. The article presents a vector-hierarchical approach to the estimation of map quality. It discusses the components determining map quality, such as the mathematical basis of a map, the completeness of the map content, the reliability of maps, the geometrical accuracy of maps, the currency (up-to-dateness) of maps, the quality of registration, the scientific and social value, as well as the information content. The graph approach allows the components of map quality to be visualized and thus the map quality to be estimated considering the full complexity of its characteristics. Moreover, this approach allows the information content of maps to be estimated, which is directly connected to the usability of a map.

12:05
Application of the International Standards ISO 2859-1 and 2859-2 in positional quality control of spatial data

ABSTRACT. The international standards ISO 2859-1 (ISO, 1985) and ISO 2859-2 (ISO, 1999) are widely used to determine acceptance sampling procedures for inspection by attributes. Both of them are mentioned and recommended in ISO 19157 (ISO, 2013) for dealing with counting variables such as completeness errors, consistency errors, and so on. In this work we propose to apply these standards to the positional quality control of spatial data. To this end, we propose to consider positional errors as "positional defectives" when the measured positional error is greater than a tolerance, so that the quality control process is based on counting the number of defectives, according to the user’s criteria. This idea is not new, the National Map Accuracy Standard (USBB, 1947) was based on it, but recent research by Ariza-López and Rodríguez-Avi (2014a) reopens this possibility. One great advantage of this method is that it is not limited by any underlying hypothesis, such as error distributions or error statistical models, as occurs with the majority of quality procedures; for instance, the application of ISO 3951 is not recommended if the data do not follow the Normal distribution (Montgomery, 2001). Our proposal is distribution-free. This means that ISO 2859 can be applied to the positional quality control of any kind of spatial data (e.g. points, line strings, Lidar data, etc.) and dimension (1D, 2D, 3D, …), using any kind of measure (e.g. Euclidean or Hausdorff distances, etc.) in a very simple fashion (Ariza-López and Rodríguez-Avi, 2014b). In order to apply these standards we have established the relations between the indexing parameters of ISO 2859-1 and ISO 2859-2 (acceptance quality limit and limiting quality) and the positional errors of a given spatial data set by means of the observed error distribution function. The procedure starts by establishing criteria that allow us to obtain a numerical error measure for spatial data (e.g. choosing an error measure and a method of measurement). On this numerical expression we fix a numerical tolerance (maximum allowed error) and the percentage of elements that may have errors greater than the fixed tolerance. Using these elements we can apply ISO 2859-1 and ISO 2859-2, which are adequate for lot-by-lot series and for isolated lots, respectively. In order to illustrate the proposal we include examples using different kinds of positional errors (points and line strings), and, for both cases, lot-by-lot and isolated lots. The conclusions of the work summarize the main advantages of the proposal.
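
A minimal Python sketch (assumed rather than taken from the paper) of the counting logic described above: positional errors above the tolerance are treated as defectives, and the control sample is accepted or rejected by comparing the count with an acceptance number. In practice the plan parameters (sample size, acceptance number) would be read from the ISO 2859-1 or ISO 2859-2 tables for the chosen AQL or limiting quality; the values and the simulated errors used here are placeholders:

```python
import random

# Hypothetical lot of positional errors (e.g. Euclidean distances to reference), in metres
errors = [abs(random.gauss(0.0, 0.6)) for _ in range(2000)]

tolerance = 1.0        # maximum allowed positional error (placeholder)
sample_size = 125      # would come from an ISO 2859 table (placeholder)
acceptance_number = 5  # maximum admissible defectives in the sample (placeholder)

# Draw the control sample and count "positional defectives"
sample = random.sample(errors, sample_size)
defectives = sum(1 for e in sample if e > tolerance)

verdict = "ACCEPT" if defectives <= acceptance_number else "REJECT"
print(f"defectives = {defectives} / {sample_size} -> {verdict}")
```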

12:30-14:00 Lunch Break
14:00-15:00 Session 9: Processing Quality Results
Location: Main conference room
14:00
Metadata for (topographic) data on-demand
SPEAKER: Dolors Barrot

ABSTRACT. In recent decades, the potential of hardware and software has increased exponentially, so that it is possible to manage and handle large amounts of digital data, especially spatial data; at the same time, the number and diversity of users, as well as of channels to access geographic information, have also increased significantly. This has led mapping agencies to modify the organization of their data products, moving from sheets to spatial databases, and to define policies for updating spatial data in order to adapt their services and products to the needs of customers, offering the user the chance of defining an area of interest, incremental updates or thematic subsets to visualize, access, download or acquire the dataset.

The most efficient way to manage the change of updating policies and the organization of the information in a database goes through the extension of the data model, adding metadata at object level: unique and persistent ID, life cycle and lineage. Considering that one of the aims of metadata is to help users find the dataset most suitable to their needs, it is important to be able to adjust the metadata of the spatial database to the subset of interest. Under these conditions, locating the dataset that fits the needs of users and customers requires a new approach to the treatment and generation of metadata, because there is only one dataset, the whole database. Taking into account the deployment of the INSPIRE directive, the analysis of the search parameters required in the discovery services shows that providing a single metadata record for the spatial database is not enough; it is necessary to refine the data and introduce the appropriate mechanisms to generate specific metadata for the dataset of interest, which will be downloaded with the dataset if applicable.

Finally, the metadata entity, conformant to the INSPIRE Metadata Regulation and to ISO 19115 Geographic information - Metadata, ISO 19115-1 Geographic information - Metadata - Part 1: Fundamentals and ISO 19115-2 Geographic information - Metadata - Part 2: Extensions for imagery and gridded data, will be evaluated to determine the relevant metadata items at dataset level and how this information can be derived automatically from metadata at database and object level.
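
A minimal sketch (hypothetical, not the authors' implementation) of the idea of deriving dataset-level metadata for a user-defined subset from object-level metadata (persistent ID, life cycle, lineage); the record fields and object structure are invented for illustration:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class FeatureMetadata:
    # Object-level metadata: persistent ID, life cycle and lineage (illustrative fields)
    object_id: str
    last_update: date
    lineage: str            # e.g. capture method
    bbox: tuple             # (minx, miny, maxx, maxy)

def subset_metadata(objects: list[FeatureMetadata]) -> dict:
    """Derive dataset-level metadata for a selected subset of objects."""
    xs = [b for o in objects for b in (o.bbox[0], o.bbox[2])]
    ys = [b for o in objects for b in (o.bbox[1], o.bbox[3])]
    return {
        "extent": (min(xs), min(ys), max(xs), max(ys)),
        "temporal_extent": (min(o.last_update for o in objects),
                            max(o.last_update for o in objects)),
        "lineage": sorted({o.lineage for o in objects}),   # distinct capture sources
        "feature_count": len(objects),
    }

objects = [
    FeatureMetadata("b-001", date(2014, 3, 2), "terrestrial survey", (0, 0, 10, 10)),
    FeatureMetadata("b-002", date(2013, 7, 9), "photogrammetry",     (5, 5, 20, 15)),
]
print(subset_metadata(objects))
```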

14:25
Quality control of Large-scale Reference Data at the Flemish Geographical Information Agency

ABSTRACT. The Flemish Geographical Information Agency (FGIA) is a public organization founded in 1995. The core business of FGIA is “enabling an optimal application of geographical information in Flanders”. From this point of view it is the leading provider of geodata and IT services to the Flemish market.
In its role as data provider, FGIA has created the LRD (Large-scale Reference Database). This database is an object-oriented reference database of Flanders with precise and current information on buildings, administrative parcels, roads and their lay-out, watercourses and railroads. Terrestrial measurements are complemented with photogrammetric data to cover the whole area. The data is integrated in detail so that it is usable in a large-scale presentation with a scale range between 1/250 and 1/5000. Started in 2001, the LRD was finally completed at the end of 2013. Since 2006 FGIA has been updating the LRD. Each town and village in the Flanders region is now updated every 6-9 months using different processes. All these processes have detailed data and quality specifications. FGIA has no production unit of its own, so all LRD data production is outsourced to private data providers and their data is subject to quality control by FGIA.
In the whole creation and updating procedure of the LRD, quality control is one of the key issues. The quality controls have strict deadlines, so a standardized control process and an easy-to-use quality system are necessary. Within 24 hours each delivery has to be checked for errors in its file structure. Afterwards the advanced quality control starts. Depending on the updating process, the data will be controlled in a GIS environment, in the field by topographers and/or with photo restitution.

In order to optimize the quality control, FGIA has designed and developed its own control system, “MIRO”. This system supports the controllers during the complete quality process. At its centre is the MIRO database, where all the specifications and tests are defined. Besides the database there are four applications: MiroSuite, ReceiptControl, GisControl and TopoControl.
• MiroSuite is the management tool that enables the control process. It contains different generic tests which are adaptable to different processes. These tests include changeable parameters such as feature types, attribute and topological restrictions, and a DE-9IM matrix to define the allowed interactions of the features.
• ReceiptControl checks the incoming data against the required file structure.
• GisControl is a GIS environment that offers an interface to run the automatic topological tests (cf. ISO 19157) defined for the different processes and to evaluate their output. It also supports manual tests by facilitating the sampling of the data according to ISO 2859-1 and enables the registration of the errors.
• TopoControl offers an interface for the preparation and processing of the field controls. Topographers control the positional accuracy and the completeness of the objects on site. Their measurements are uploaded into the TopoControl tool.
All quality control results are uploaded to the MIRO database. MiroSuite makes it possible to evaluate the overall quality and generates quality reports which are sent to the data providers. FGIA has used MIRO since 2009 and is currently looking to expand the system to other vector products. Nowadays each product has its own control system. Although they have a lot in common, they all use different technologies, strategies, tests, … This is very cost-inefficient and no longer manageable. Therefore in 2014 FGIA started a project to develop a generic vector quality control system. The aim is to build a modular system which is adaptable to all different vector products.
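
A minimal sketch (assumed, not the actual MIRO implementation) of an automatic topological test driven by a DE-9IM pattern, here using the shapely library; the pattern and the building/parcel geometries are placeholders chosen only to illustrate the idea of configurable allowed interactions:

```python
from shapely.geometry import Polygon

# Hypothetical features: a building that must lie within its administrative parcel
parcel   = Polygon([(0, 0), (20, 0), (20, 20), (0, 20)])
building = Polygon([(2, 2), (8, 2), (8, 8), (2, 8)])

# Configurable DE-9IM pattern expressing the allowed interaction
# ("T*F**F***" is the classic "within" pattern; a real rule would come from a rule database)
allowed_pattern = "T*F**F***"

matrix = building.relate(parcel)                      # actual DE-9IM matrix as a string
ok = building.relate_pattern(parcel, allowed_pattern) # does it satisfy the allowed pattern?

print(f"DE-9IM = {matrix}, conforms to rule = {ok}")
```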

15:00-15:30 Coffee Break
15:30-17:00 Session 10: Methods in Quality Evaluation
Location: Main conference room
15:30
Methodology for the similarity analysis of spatial patterns of samples

ABSTRACT. Sampling is usual in the quality control of spatial data sets, for instance for controlling the positional component or the quality of thematic classifications of images, among many others. In this context, the spatial distribution of the sample greatly affects its representativeness. However, effective and accepted statistical methods are not available to assess whether or not the spatial distributions followed by those samples are consistent with a particular spatial pattern, especially in complex situations. In addition, it is not easy to evaluate the spatial similarity of two samples taken independently under the same spatial layout. This is a huge limitation that affects the quality of results and therefore the metaquality.

The statistical study of spatial patterns has come a long way since the initial work of Skellam (1952) and Clark and Evans (1954). The work of Dale et al. (2002) presented the main conceptual and mathematical relationships handled in the methods of analysis. In geomatics research, Ripley K function-based methods have spread in recent years (Ripley, 1977; Wong and Lee, 2005; RDCT, 2005), together with the use of statistics based on nearest neighbors. Anyway, some of the main problems of these methods are: the absence of a spatial order, edge effects, extending beyond the 2D case, problems with the level of detail, and the lack of homogeneity of the pattern (Dixon, 2012; Bolibok, 2008; Dale et al., 2002).

Some of the problems mentioned above could be solved through the application of space-filling curves (Sagan, 1994). These curves can be applied to the 2D, 3D and n-D cases and, in consequence, they are quite adequate for the spatial case. Other properties (see Bader, 2013) are also of great interest: linearization of space, standardization of the spatial scanning pattern, different lattice levels and self-similarity. Some of these curves are applied in spatial data structures (Laurini and Thompson, 1992), and recently they have been applied in the management, querying and visualization of big data. In addition, there are several patterns (Peano, Morton, Hilbert, etc.) that allow different sweeps of space with different degrees of neighborhood to be tried.

This work proposes a methodology, based on simulation, to test the behavior of space-filling curves as a procedure to linearize spatial distributions and to compare them. More specifically, we simulated a huge number of samples from a particular spatial pattern, we applied several space-filling curves (Peano, Morton, Hilbert, etc.) and we tested whether such techniques applied to the spatial samples provide results consistent with the spatial pattern. The effects of the level of sweep and of the sample size are also studied. Once the space-filling curve most appropriate to a spatial pattern has been identified, we perturbed it in several ways in order to test whether the selected space-filling curve is able to distinguish between two sampled spatial distributions. To achieve this task, some homogeneity tests for two samples based on the empirical characteristic function are used (Alba et al., 2008), for instance the Kolmogorov-Smirnov test. The proposed process is presented in Figure 1.
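
A minimal sketch (assumed, not the authors' code) of the linearization-and-comparison idea: 2D sample points are mapped to a Morton (Z-order) key by interleaving the bits of their grid coordinates, and the two linearized samples are then compared with a two-sample Kolmogorov-Smirnov test; the grid resolution, the point sets and the choice of curve are placeholders:

```python
import random
from scipy.stats import ks_2samp

BITS = 10  # sweep level: a 2^10 x 2^10 lattice (placeholder)

def morton_key(x, y, bits=BITS):
    """Interleave the bits of integer grid coordinates (Morton / Z-order curve)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def linearize(points, bits=BITS):
    """Map points in the unit square to their position along the space-filling curve."""
    n = 1 << bits
    return [morton_key(min(int(px * n), n - 1), min(int(py * n), n - 1))
            for px, py in points]

# Two simulated samples from (here) the same uniform spatial pattern
sample_a = [(random.random(), random.random()) for _ in range(500)]
sample_b = [(random.random(), random.random()) for _ in range(500)]

stat, p_value = ks_2samp(linearize(sample_a), linearize(sample_b))
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3f}")
```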

Figure 1. Schema of the proposed methodology.

Acknowledgements
This work is supported by the research project UJA2013/08/01 (University of Jaén and Caja Provincial of Jaén).

References
Alba Fernández, V., Jiménez Gamero, M.D., Muñoz García, J. (2008). A test for the two-sample problem based on empirical characteristic functions. Computational Statistics and Data Analysis, 52: 3730-3748.
Bader, M. (2013). Space-Filling Curves: An Introduction with Applications in Scientific Computing. Springer.
Bolibok, L. (2008). Limitations of Ripley's K(t) function use in the analysis of spatial patterns of tree stands with heterogeneous structure. Acta Sci. Pol. Silv. Colendar. Rat. Ind. Lignar., 7(1): 5-18.
Clark, P.J., Evans, F.C. (1954). Distance to nearest neighbor as a measure of spatial relationships in populations. Ecology, 35: 445-453.
Dale, M.R.T., Dixon, P., Fortin, M.J., Legendre, P., Myers, D.E., Rosenberg, M.S. (2002). Conceptual and mathematical relationships among methods for spatial analysis. Ecography, 25: 558-577.
Dixon, P.M. (2012). Ripley's K function. For the Encyclopedia of Environmetrics, 2nd ed.
Laurini, R., Thompson, D. (1992). Fundamentals of Spatial Information Systems. Academic Press.
RDCT (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
Ripley, B.D. (1977). Modeling spatial patterns (with discussion). J. Roy. Stat. Soc. B, 39: 172-212.
Sagan, H. (1994). Space-Filling Curves. Springer.
Skellam, J.G. (1952). Studies in statistical ecology: I. Spatial Pattern. Biometrika, 39: 346-362.
Wong, D., Lee, J. (2005). Statistical Analysis of Geographic Information with ArcView GIS and ArcGIS. Wiley.

15:55
A methodology to model the number of completeness errors using count data regression models

ABSTRACT. When we analyze the quality of a spatial data set, we can take into account several aspects, and one of these aspects is related to completeness. In this sense, and if it were possible, we may be interested in counting omissions, or the number of elements (such as bridges, buildings, crossings, power plants, …) which exist on the ground but do not appear in the spatial data set, the number of commissions, or elements that appear in the spatial data set but do not actually exist, and even the sum of these errors. The greater the number of these errors, the worse the quality of the product. In addition, we can also obtain further information about the product, related to structural aspects, and it may be interesting to investigate the existence of relationships between errors and these structural aspects.

With respect to these errors, in a statistical sense we can see them as discrete variables; more specifically, as count data variables whose values are non-negative integers starting from 0. In consequence, we can model them through count data models (Poisson, Negative Binomial, UGWD, and so on). Additionally, structural aspects of the spatial data set may be seen as exogenous covariates. For this reason we propose the use of count data regression models in order to measure and explain the relation between covariates and errors. Specifically, and due to the nature of error data (overdispersed and without many zeroes), it is possible to use the Poisson regression model (Cameron and Trivedi, 2013), the Negative Binomial regression model (Hilbe, 2011) or the Generalized Waring Regression Model (GWRM) (Rodríguez-Avi et al., 2009). These methods allow us to explain which covariates are significantly related to the number of errors and, for each specific data set, to propose the error distribution in probabilistic terms with respect to the covariates. In this way, starting from sample data, we can fit all the models and select the best fit according to a goodness-of-fit criterion such as the Akaike Information Criterion.
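
A minimal sketch (assumed; the GWRM used in the paper is not reproduced here) of the model-fitting and AIC-comparison step with standard count regression models in Python's statsmodels; the counts and covariates are simulated placeholders:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 192

# Hypothetical covariates: number of elements in the cell and urban (1) / rural (0) type
elements = rng.poisson(50, n)
urban = rng.integers(0, 2, n)
X = sm.add_constant(np.column_stack([elements, urban]))

# Simulated counts of omission errors per cell (placeholder data-generating process)
mu = np.exp(-1.0 + 0.03 * elements + 0.4 * urban)
y = rng.poisson(mu)

# Fit Poisson and Negative Binomial regressions and compare them by AIC
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
negbin_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0)).fit()

for name, fit in [("Poisson", poisson_fit), ("Negative Binomial", negbin_fit)]:
    print(f"{name}: AIC = {fit.aic:.1f}")
    print(fit.params)
```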

In this work we propose a method and test it by modeling the number of commissions and omissions in cells of 11 km2 of the Topographic Map of Andalusia (Spain). We fit a GWRM model to a data set coming from a ground survey sample of 192 cells. In this example, the covariates considered are, for instance, the count of elements, rural or urban typology, province, and so on. We obtain two different models, for omissions and for commissions, and in both cases we determine the relations between the error variable considered and the exogenous covariates.

The results of the modeling are of practical interest for National Mapping Agencies in order to manage the updating processes of their spatial databases.

References

Cameron, A.C., Trivedi, P.K. (2013). Regression analysis of count data. Cambridge University Press.
Hilbe, J.M. (2011). Negative binomial regression. Cambridge University Press.
Rodríguez-Avi, J., Conde-Sánchez, A., Sáez-Castillo, A.J., Olmo-Jiménez, M.J., Martínez-Rodríguez, A.M. (2009). "A generalized Waring regression model for count data". Computational Statistics and Data Analysis, 53: 3717-3725.

16:20
Closing remarks