SDMQ2015: INTERNATIONAL WORKSHOP ON SPATIAL DATA AND MAP QUALITY
PROGRAM FOR TUESDAY, JANUARY 20TH

09:00-10:30 Session 1: Quality in Spatial Data; Changing Expectations
Location: Main conference room
09:00
Opening of SDMQ 2015
09:15
Quality Management Practices in European National Mapping and Cadastral Authorities
SPEAKER: Carol Agius

ABSTRACT. National Mapping and Cadastral Authorities (NMCAs) are the traditional producers of authoritative reference data in Europe. The calling card of NMCAs is that the data they produce are of high quality. They are also often public sector organisations reliant, to varying degrees, on public funding. They therefore have to do more with less and be cost efficient. Quality does not just happen: organisations that strive to provide their customers with quality services and products have to implement a strategic, organisational structure in order to ensure quality. In this respect NMCAs are no different to other organisations.

ISO 9000 and Total Quality are two of the main quality initiatives currently implemented across the globe (Goetsch & Davis, 2013, p. 236). ISO 9000 is mostly focused on a quality management system put in place for the production and delivery of an organisation’s products and services (ISO, 2008), while Total Quality Management (TQM) is a management approach built on the idea that quality is customer-driven and involves all people and functions of the organisation. TQM applies the concept of quality to all functions and actions within an organisation, be they processes, people, or values (Goetsch & Davis, 2013, p. 3 & p. 237; Chartered Quality Institute, 2012). Various quality ‘gurus’, such as Deming, Juran, Crosby, Ishikawa and Feigenbaum, highlight the factors that are required for an organisation to implement quality. Talib and Rahman (2010, p. 367) specify these as critical dimensions for the successful implementation of total quality management in a service organisation. These include factors such as top-management commitment, customer focus, training and education, continuous improvement and innovation, supplier management, employee involvement, information and analysis, and process management. Various authors divide these factors into the “soft” people side and the “hard” technical side of total quality. Both aspects have to be included when implementing TQM; however, many organisations tend to focus more on the technical side (for example, striving for ISO 9000 certification), since this is comparatively easier to implement than the soft components (Tari, 2005, p. 187).

EuroGeographics, the association of European NMCAs, has carried out various surveys to measure the level of adoption and implementation of quality management systems amongst member organisations; different survey questionnaires were sent to the association’s members in 1998, 2005 and 2012. Over the period surveyed there was an increase in the number of NMCAs which had either already implemented or were in the process of developing a quality management system: from half the respondents in 1999 to two thirds by 2012. Questions asking whether the organisations had adopted ISO 9000 standards, Lean or Six Sigma tools or TQM showed that while NMCAs have implemented a QMS based on ISO 9000 standards (80% in 1999 and 67% in 2012), only 20% responded that they had adopted Lean or Six Sigma tools and none had adopted TQM principles in 2012 (CERCO, 2004, p. 6; EuroGeographics, 2013, p. 18). The results of these surveys give the impression that European NMCAs are more focused on the implementation of the technical aspects of quality management than on the soft people side.

However, a quality management system is only successful if all the factors are implemented together; the different components cannot be viewed independently of each other (EFQM, 2003, cited by Calvo-Mora, Picón, Ruiz & Cauzo, 2014, p. 117). A research study to investigate to what extent European NMCAs actually implement the various factors of quality management systems was carried out in September 2014 through a survey of the member organisations of EuroGeographics. The survey asked 32 questions about eight factors of quality management: top-management commitment, customer focus, training and education, continuous improvement, supplier management, employee involvement, information and analysis, and process management. The responses from the questionnaire will be used to identify current quality management practice amongst European NMCAs; to highlight any gaps between current practice within NMCAs and quality management theory; and as a basis for recommending actions that NMCAs can implement to improve their current quality management systems and practices. The presentation will focus on the findings of the survey.

09:40
Volunteered Geographic Information (VGI) and other cartographic (r)evolutions: Will there be a “raison d’être” for national mapping agencies in a few decades from now?

ABSTRACT. National mapping agencies (NMAs) are in many countries the official organizations in charge of producing topographic maps. NMAs employ professional cartographers, trained through specialized schools, who follow rigorous procedures to create accurate maps of entire countries’ territories. Over the ages, cartography became a profession requiring complex scientific and technical training mixed with some artistic talent for map design.

The rise of geographic crowdsourcing a decade ago (Goodchild, 2007), called volunteered geographic information (VGI), allowed anyone with access to a computer connected to the Internet to produce and share geographic information through projects like OpenStreetMap (OSM). While such a movement was at first received with a significant level of skepticism by NMAs and professional cartographers, projects like OSM turned out to be far more popular than initially expected, and the power of the crowd proved itself in cartography as in other fields. In the last decade, OSM has involved more than 1.7 million users, generating the largest world geographic information database ever produced. Multiple studies of the quality of the data produced by VGI contributors were surprised by VGI data quality (Haklay, 2010), showing superior accuracy for populated places compared to data produced by many NMAs. From a project that started with the mapping of streets in the UK as a reaction against the pricing policy of the UK NMA, OSM has moved to a complex self-organized system with a dynamic data model (i.e., folksonomy), a hierarchical structure of contributors, regional boards, etc. At a time when NMAs face difficult economic situations, and in the context of an industry increasingly interested in mapping (e.g. Google Maps, Microsoft Bing), VGI projects have become attractive to many NMAs for their ability to capitalize on contributions from citizens (Wolf et al., 2011) at a low cost.

Giving citizens the tools to collect geographic data accentuated a phenomenon that started years ago in NMAs of different countries: a shift away from data collection activities towards becoming more of a data integrator and a provider of products and services for the population. In such a context, where VGI projects are also increasingly used as basemaps and to support decisions, some could wonder about the long-term “raison d’être” of NMAs and dream of a not-so-distant future where geographic data will be collected by citizens or automatically captured by sensors (e.g. on phones, cars) or high-resolution imagery, aggregated through various automated processes, and delivered in real time as products customized to individual users.

This paper will discuss the current status of VGI initiatives and other distributed data collection mechanisms that can collect large amounts of data, the quality of the data that are produced, the characteristics of VGI datasets, and possible future developments. We will discuss how mapping organizations like NMAs, and individuals like professional cartographers, may have to adapt rapidly to those changes in the coming decades, in order to ensure that the professional skills developed over centuries can contribute to and foster these changes so that the data best fit the use (i.e. data of quality) of end users.

10:05
The Dutch Key register Topography as open data

ABSTRACT. In the Netherlands national mapping is regulated by law. It is Kadaster’s statutory task to maintain a number of registrations. One of these registrations, the Basisregistratie Topografie (Key register Topography, BRT), consists of digital topographic data sets at different map scales (a 1:10k base data set and derived/generalized data sets at scales 1:50k, 1:100k, 1:250k, 1:500k and 1:1,000k). The law requires an actuality of less than two years for the whole range of the BRT product family. Governmental organizations are obliged to use the available BRT data sets for the exchange of geographical information. Since January 1, 2012 the data sets of the BRT have been available as open data. This means that all the products of the BRT are freely available for everyone to use and republish, without restrictions from copyright and patents.

The introduction of the BRT as open data has led to a growth in the use of the BRT, especially by commercial companies (Bregt et al., 2013). The wider use of the data has led to a demand for increased actuality, which exceeds the demand for data quality. In 2011 the actuality of the base data set (1:10k) met the requirement of an actuality of less than two years. For the small scale datasets the actuality was somewhere between two and ten years. To increase the actuality of the BRT, two important changes were applied. First, the LEAN methodology was introduced in the production of the base data set of the BRT (Bruns, 2013). In LEAN the focus is on the customer and on the direct creation of value for the customer (Figure 1). All steps in the process that do not add customer value are a target for elimination. This methodology changed the production process and the mindset of the employees. For example, the old process contained multiple quality checks. In the new process, several of them were eliminated and replaced by a quality metric at the end of the process. This step measures the quality of the product from the perspective of the customer. As a side effect, the elimination of the quality checks has led to an increased quality of the product.

The second change was the introduction of automatic generalization of the 1:50k scale data set in September 2013. The new process was developed between 2010 and 2013 in several iterations (Altena et al., 2013). During development, Kadaster used feedback from customers for the next development iteration. The result is a fully automated generalization production workflow which generalizes the 1:50k map series from the 1:10k base data set in less than three weeks. Although the automatically produced 1:50k map is not identical to the manually generalized map, the customers are satisfied. In the near future the 1:100k and smaller data sets of the BRT will also use this new procedure. Because of the vast reduction in processing time, it is now possible to generalize the small scale data sets directly after the base data set is finished. Therefore the actuality of the derived data sets is the same as the actuality of the base data set and meets the requirement of an actuality of less than two years. Nowadays the base data set and the derived map series are released simultaneously, with the same actuality, five times a year. For the customer this means more up-to-date maps, more frequently.

10:30-11:15 Coffee Break
10:30-11:15 Session 2: Poster Session
Location: Poster room
10:30
Analysis of existing training programs on spatial data quality and curriculum diagram proposal

ABSTRACT. Nowadays an unprecedented volume of spatial data is being generated. Nevertheless, it is still usual for producers and users to be more worried about the availability of data, while specific quality indicators remain in the background. Moreover, in many cases data have no quality indicators in their metadata, which does not help in decision processes. It can be said that this is related to a cultural aspect of best practice, in which the implementation of the existing standards should not be done only out of necessity (i.e. requirements in contracting) but with conviction in the benefits obtained, which encourages the efficient and effective use of geographic information (GI). In this context, training has much to say. Most of the training programs in Geomatics at university level do not go into depth on spatial data quality, so specific training on this issue is required. These programs are more oriented to production, and aspects of quality assessment, assurance, improvement and management also remain in the background. This entails the need to develop training activities with contents centred on the quality of GI and with high technical specialization. Understanding the basic rules, tools, and fundamentals of quality is a key issue. A focus on specifications and international standards about GI (OGC specifications, ISO/TC 211 standards such as 19157, 19158, 19122, 19131, etc.) is required, but also knowledge of many standards from the industrial field (e.g. ISO 8000, 9001, 2859, 3951, etc.) related to quality management, assurance, statistical aspects, etc. There are ongoing training activities on spatial data and map quality issues in various national mapping organisations, universities, and the private sector. Some examples are presented in Figure 1. Many of them are short courses (a few days or weeks), with a partial view of a topic related to spatial data quality, and/or with a research approach instead of a practical one.

Figure 1. Different training activities related to spatial data quality.

From the analysis of existing training activities worldwide, we want to propose a new master’s degree specialized in spatial data quality and with a global view. The strengths and weaknesses of the existing activities should be highlighted and taken into account for the new proposal. The curriculum of the master’s degree should be focused on the needs of national mapping agencies (and other state institutions) and centred not only on quality assessment but also on quality management. This is important given the limited experience of the quality business sector in relation to spatial data. The course should be aligned with international standards (ISO 19157, 19115, 2859, 3951, 9001, 9004, etc.) and the approach should be both theoretical and practical.

10:30
Quality information representation in ELF Geo Product Finder
SPEAKER: Kai Koistinen

ABSTRACT. Keywords: applications of spatial data quality; communication/visualization of spatial data and map quality; use of ISO metadata and quality standards.

The ELF (European Location Framework) data quality evaluation and reporting process includes multiple phases and many different parties. Data producers, ELF validation tools, expert reviewers and ELF platform users can each evaluate and report issues regarding data quality. Reporting is intended to be done mainly via two different reporting channels: INSPIRE/ISO 19115 compliant metadata documents and the ELF feedback service.

The data quality evaluation and reporting process results in well-documented data quality information in a standardized form. To benefit from the quality reports, the quality information should be presented to the ELF platform users who are considering using, or are already using, the data. In ELF, this data quality representation task is handled by a tool called the Geo Product Finder.

10:30
Information System of Production in the National Topographic Map of the National Geographic Institute of Spain

ABSTRACT. The National Geographic Institute of Spain (IGN-S) has extensive experience as a National Mapping Agency. Specifically, map sheets of the National Topographic Map (MTN) have been published from 1875 to the present.

Having total quality control of processes is critical for any production system that aims to improve the product and meet the specifications. Nowadays it is not enough to have a management system that allows the manager to verify the production process with the aim of planning and optimizing production. Current needs require answers and improvements to become more efficient; for that reason the control must be done in real time, as automatically as possible, and with greater involvement of the entire organization.

In this sense, the MTN has created an Information System of Production, which represents an important shift in the traditional philosophy of quality control processes. The system provides a list of indicators and alarm times, automatically and continuously, which makes it possible for all team members to enhance production and improve error control.

The system is based mainly on three aspects. Firstly, the automatic exploitation of the control DB and access to the results in real time; that is, the system provides the needed information directly and during process execution. Secondly, greater process consistency, with just two types of time stage, "lead time" and "waiting time", and the complete disappearance of the "lost time" stage. Thirdly and finally, open access to this information for all production members, allowing them to take an active part in the control and resolution of deviations (self-monitoring and self-correction tasks), as well as giving them a more global view of the project and of the role they play in it (transparency and involvement).

11:15-12:30 Session 3: Working with multiple data providers: The ELF approach
Location: Main conference room
11:15
ELF Data quality rules and validation results
SPEAKER: Sonja Werhahn

ABSTRACT. The European Location Framework (ELF) project will deliver the first implementation of the European Location Framework (Jakobsson, 2012): a technical infrastructure which harmonises national reference data to deliver authoritative, up-to-date, interoperable, cross-border geospatial reference data for use by the European public and private sectors in a way that is easy to use by application developers and even end users (Jakobsson et al., 2013). The ELF data specifications aim at facilitating the interoperability of topographic, administrative and cadastral reference data according to the requirements set in the INSPIRE directive and to other user requirements at regional, European and global levels. The ELF data specifications describe the conceptual data model for creating harmonised cross-border, cross-theme and cross-resolution pan-European reference data from national contributions. Thus, the ELF data specifications are important to ensure a coordinated approach to the harmonisation of the NMCAs’ data and services in matters of content, geometric resolution and quality (Jakobsson et al., 2013).

In order to achieve harmonisation not only by using common ELF data specifications but also by harmonising the geometrical and semantic content of the data and the production and maintenance process, several additional specifications for products and services and for the maintenance and processing workflow will be developed in ELF. Based on these maintenance and processing specifications, geoprocessing tools will be set up. One of these tools will be a data quality validation tool, including cloud-based commercial services. These tools will validate the input data against data quality rules developed for ELF.

The ISO 19157 Geographic Information – Data Quality standard, the findings from the ESDIN project, and the experience from ERM were the starting point for developing the ELF data quality rules. The ELF data quality rules concentrate on the requirements needed to achieve a harmonised pan-European data set and identify errors in the data’s topological structure as well as feature/attribute compliance with the ELF data specification. The ELF data quality rules are mainly based on the ERM Validation Specifications and the experience gained in the ERM validation process as described in Hopfstock et al. (2012). For ELF, only automatically testable quality criteria were considered for the data quality rules. Quality rules that can only be checked against reality or against large-scale reference data are considered to be the responsibility of the NMCAs and out of scope for ELF. Where possible, the rules were assigned to the appropriate data quality element and sub-element of the ISO 19157 Geographic Information – Data Quality standard.

The ELF data quality rules were written in RuleSpeak, a set of practical guidelines and sentence forms for expressing business rules in clear, unambiguous and well-structured English. This approach was chosen to have an implementation-independent description of the data quality rules that everybody (users, producers and tool programmers) can understand and come to a common understanding of. The rules are implemented by ESRI and 1Spatial and can be used by the data producers to validate their ELF data. The tools also generate tables stating statistics on errors and on the presence or absence of features and attributes in the data. These results are reported back to the data producer, used in the data quality reporting process and made available to users.
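By way of illustration only, the following minimal Python sketch shows how one such automatically testable rule might be checked and summarised as error statistics. The rule wording, the example features and the use of the Shapely library are assumptions made for this sketch; they are not part of the ELF validation tools, which are implemented by ESRI and 1Spatial.

    # Hypothetical sketch: checking a RuleSpeak-style rule and reporting
    # error statistics. The rule text, the features and the use of Shapely
    # are illustrative assumptions, not the actual ELF tooling.
    from itertools import combinations
    from shapely.geometry import Polygon

    RULE = "A watercourse area must not overlap another watercourse area."
    ISO_19157_ELEMENT = "logical consistency / topological consistency"

    # Illustrative features; in practice these would be read from an ELF data set.
    watercourse_areas = {
        "W1": Polygon([(0, 0), (4, 0), (4, 2), (0, 2)]),
        "W2": Polygon([(3, 1), (6, 1), (6, 3), (3, 3)]),   # overlaps W1
        "W3": Polygon([(10, 10), (12, 10), (12, 12), (10, 12)]),
    }

    def check_no_overlap(features):
        """Return the feature-id pairs that violate the rule."""
        return [(a, b) for (a, ga), (b, gb) in combinations(features.items(), 2)
                if ga.overlaps(gb)]

    errors = check_no_overlap(watercourse_areas)
    print(f"Rule: {RULE}")
    print(f"ISO 19157 element: {ISO_19157_ELEMENT}")
    print(f"Features tested: {len(watercourse_areas)}, violations: {len(errors)}")
    for a, b in errors:
        print(f"  overlap between {a} and {b}")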
The aim was to have clear and transparent data quality rules and to provide users with detailed results of the validation of the ELF data.

References
ELF Project: http://www.elfproject.eu/
ESDIN – European Spatial Data Infrastructure with a Best Practice Network: http://www.esdin.eu/
Hopfstock et al. (2012), “EuroRegionalMap – A joint production effort in creating a European Topographic Reference Dataset”. In: Jobst M. (ed.), Service Oriented Mapping 2012, JOBST Media Präsentation Management Verlag, Wien 2012, pp. 151-162.
Jakobsson, A. (2012), “Introducing a New Paradigm for Provision of European Reference Geo-Information – Case Study of the European Location Framework Concept”. In: Jobst M. (ed.), Service Oriented Mapping 2012, JOBST Media Präsentation Management Verlag, Wien 2012, pp. 51-62.
Jakobsson et al. (2013), “European Location Framework – One Reference Geo-Information Service for Europe”. In: Buchroithner et al. (eds.), Proceedings of the 26th International Cartographic Conference (ICC 2013), Dresden 2013. http://icaci.org/files/documents/ICC_proceedings/ICC2013/_extendedAbstract/377_proceeding.pdf
RuleSpeak: http://www.rulespeak.com/en/

11:40
ELF Data Quality Reporting Process

ABSTRACT. Keywords: governance aspects of spatial data management; quality issues in spatial data infrastructures (e.g. national, INSPIRE); spatial data quality assessment.

The European Location Framework (ELF) is a technical infrastructure which delivers authoritative, interoperable geospatial reference data from all over Europe for analysing and understanding information connected to places and features. This infrastructure is going to change how national geospatial reference data are supplied by National Mapping and Cadastral Authorities and consumed by the European public and private sectors.

Making reference data available from national sources across Europe through a technical infrastructure is challenging. This also applies to the metadata and quality information where national information needs to be combined with the results from the interoperability processes.

The submission discusses how data quality reporting can be organised in a multi-organisational environment to meet the expectations of the different stakeholders.

12:05
Quality Management of the European Location Framework

ABSTRACT. This paper will introduce principles for the quality management of the European Location Framework. The European Location Framework (ELF) project will, by 2016, deliver the first implementation of the European Location Framework (Jakobsson, 2012): a technical infrastructure which harmonises national reference data to deliver authoritative, up-to-date, interoperable, cross-border geospatial reference data for use by the European public and private sectors in a way that is easy to use by application developers and even end users. The project will provide a critical mass of content and coverage, as at least 14 Member States’ national ELF/INSPIRE data will be made available from a single source (the ELF platform) connecting to a number of applications and to ArcGIS Online, a commercial cloud GIS platform. The ELF platform will be implemented using Oskari, an open source development made originally for the Finnish Spatial Data Infrastructure (SDI). Covering the full range of INSPIRE Annex I, II and III themes, these datasets will provide full national coverage of the rich content available from national and regional SDIs.

We will introduce organisational and process viewpoints on the management of quality for ELF. Quality management of ELF will be based on the ISO 9000 and ISO/TS 19158 standards. ISO 9000 (ISO, 2005) is a widely used quality management standard series which concentrates on the process- and organisation-centric part of quality management. ISO 19158 Geographic Information – Quality Assurance of Data Supply (ISO, 2012) introduces quality assurance levels that can be applied to data production processes and organizations. In this approach the data custodian can gain assurance that the entire supply chain is capable of producing the quality required. Ordnance Survey has successfully implemented this in its data supply processes, and first experiences in the international context have been gained within EuroGeographics ERM production. In the ELF project, quality evaluation based on ESDIN (2011) results will be put into practice by using commercial tools. The goal is to introduce a standard way in which quality models can be expressed as rules, enabling utilization in multiple software environments. The project has decided to use RuleSpeak (Ross, 2009) for expressing the rules in a standardised, technology-agnostic way. Figure 1 represents the ELF data quality approach. As indicated in Figure 1, national data processes and quality testing are not considered in ELF.

In this paper the focus is on quality assurance at the organizational and process levels, not on the actual implementation of the data quality validation process. National quality validation processes and organization for ELF, as well as European quality validation, are considered. The final organizational model for ELF has not yet been agreed, but in this paper the following approach is assumed:
• NMCAs are responsible for validating their ELF data and setting up the ELF services.
• Regional co-ordinators will manage the expert reviews of their regional data providers’ data using the ELF data quality validation tool(s).
• The ELF Quality Manager will be responsible for overall quality management within ELF.
• The ELF Quality Manager will assure the ELF regional co-ordinators using the approach proposed below.
• The ELF Quality Manager, together with the ELF regional co-ordinators, will manage the quality assurance of NMCAs based on the Data Producer Agreement.

As the European quality validation will be done when the national data are already finished, it is too late to correct errors at this stage. Therefore it is essential that national data providers’ production processes for ELF are assured by utilization of ISO 19158. Assurance is needed at the national level but also at the European level. This assurance at the national level will consist of three steps:
1) Basic Level Assurance: assuring that a process appears to be capable of creating or maintaining ELF products at the required quality. This includes identification of production processes and utilization of ELF validation or other tools.
2) Operational Level Assurance: this will be achieved by working together with the ELF regional co-ordinators, checking that validation results are as expected and that training programmes have been implemented.
3) Final Level Assurance: this is achieved when the data provider for ELF is capable of maintaining the quality achieved at the operational level over a period of time.

Utilization of the ISO/TS 19158 principles in an international context has not been widely tested. The approach proposed is based on experience within EuroGeographics and Ordnance Survey. The methodology has been tested and shown to work in commercial supplier/customer relationships. As the provision of data for current EuroGeographics products has been based more on NMCAs’ membership of the association, the challenge is to change this into a more business-like relationship for ELF. It is critical for the success of ELF that quality management is implemented.

References
ESDIN (2011), ESDIN D1.10.3 Public Final Project Report. http://www.esdin.eu (accessed 30th September 2014).
Jakobsson, A. (2012), “Introducing a New Paradigm for Provision of European Reference Geo-Information – Case Study of the European Location Framework Concept”. In: Jobst M. (ed.), Service Oriented Mapping 2012, JOBST Media Präsentation Management Verlag, Wien 2012, pp. 51-62.
International Organisation for Standardization, ISO (2005), ISO 9000:2005 Quality Management Systems – Fundamentals and Vocabulary.
International Organisation for Standardization, ISO (2012), ISO/TS 19158 Geographic Information – Quality Assurance of Data Supply.
Ross, R.G. (2009), Basic RuleSpeak Guidelines: Do’s and Don’ts in Expressing Natural-Language Business Rules in English, Version 2.2, Business Rule Solutions, LLC.

12:30-14:00 Lunch Break
14:00-15:00 Session 4: Tools for Automation of Quality Evaluation
Location: Main conference room
14:00
Good Quality Data, Maximising Business Value
SPEAKER: Jo Shannon

ABSTRACT. Data quality is an integral and vital part of any data driven organisation, and yet is often considered more of a ‘problem creation’ exercise than a business and operational enabler.

It is well understood that the maintenance of large datasets is time consuming and requires significant investment. Therefore, ensuring that data quality validation is an inherent (and ideally automated) part of any system or set of processes, gives you the confidence that your investment in maintaining your data can deliver business value, over and above your base dataset.

The concept of ‘capture once, use many’ is a common goal and ambition in the National Mapping and Charting Agency community – in fact any data producing organisation. Automated data integration and generalisation are key capabilities in the implementation of any ‘capture once, use many’ strategy. However these processes are completely dependent on a known level of data quality to ensure that automated processes can make the correct decisions when performing data integration, inference and generalisation.

1Spatial’s innovative data management environment, 1Spatial Management Suite, contains three data processing products that build on this concept:
• 1Validate checks your data against your own rule base and provides a report showing the conformance level of your data against your rules.
• 1Integrate enables automated correction of any data quality issues, data conflation to exploit third-party datasets, and edge matching to enable cross-border integration.
• 1Generalise produces generalised data based on a configurable set of parameters.
All of these products can be used as stand-alone products or as part of an orchestrated, automated system.

To unlock the potential of your data to be used over and over to deliver value, we need to erase the stigma attached to having less-than-perfect data quality. Defining a specification for your data and assessing how well the data measure against it can uncover challenges for the organisation and require investment in resolving those challenges. This can sometimes be unwanted information, or be viewed as finding problems rather than delivering solutions. This is an understandable, if undesirable, reaction, but it can lead to not tackling quality issues, or worse, not looking for them.

This approach may be viewed as adequate for standard data maintenance, but are you getting the most out of the data you maintain? Is your business efficiency affected because your ability to exploit automation is limited? Could investing in your spatial data quality unlock more value from your data?

The question is: if it were easy and delivered business value and the opportunity to grow and expand horizons, would data quality be higher up the investment priority list? If good data quality enabled you to maximise the value you get from your data, would it be seen as an enabler rather than a problem?

By changing this perception, good quality data would no longer be a set of problems but a prerequisite for business development and growth: the foundations on which new products are innovated. By crediting the ability to do more with the same data to the measurement and improvement of data quality, data quality management becomes a key business enabler and is viewed as a solution, not a problem.

Let’s make data quality validation and management easy, quick, part of every day, and the cornerstone of your organisation’s ability to do more with its assets.

14:25
Consistency and Quality of INSPIRE & ELF Data, using GIS Tools
SPEAKER: Paul Hardy

ABSTRACT. The INSPIRE directive and its associated ‘Implementing Rules’ [European Commission 2014] are encouraging the publishing and sharing of a wealth of geospatial data from many countries and many source agencies. The ELF project [ELF 2013] builds on INSPIRE and has a goal “to deliver the European Location Framework required to provide up-to-date, authoritative, interoperable, cross-border, reference geo-information for use by the European public and private sectors”, by harmonising the authoritative data from National Mapping Organisations (NMOs), and making it readily available in scalable online services. Esri is a major supplier of GIS software and one of the partners in the ELF project. This paper reviews our experience in customising and deploying Esri commercial off-the-shelf (COTS) tools to improve the consistency and quality of such INSPIRE & ELF data.

The Esri ArcGIS platform across desktop, server and online applies rigorous geodatabase structures, and provides a consistent approach to data quality and integrity checking and enforcement. The schema supports attribute domains to ensure that only valid values are stored. Historic NMO data may contain geometry overshoots, gaps, slivers, and overlaps, arising from survey, CAD, or legacy mapping. ArcGIS has a rule-based topology engine that can take this data, clean it, and build it into structured networks and continuous polygon tessellations - important to NMOs in eliminating errors resulting from overlapping boundaries and incomplete polygon descriptions and for building clean road networks and land-use coverages.

We highlight the specific tools particularly relevant to INSPIRE and quality. These are ArcGIS for INSPIRE [Esri 2014a] and ArcGIS Data Reviewer [Esri 2014b]. We discuss the geodatabase data models for the INSPIRE Annex I themes that are provided by ArcGIS for INSPIRE, and the way that the normalised constructs of the conceptual INSPIRE models are ‘flattened’ for efficiency in a GIS relational database. We also discuss the challenges in implementing the often more complex models of INSPIRE Annex II and III.

We discuss the capabilities of ArcGIS Data Reviewer (ADR) to implement and manage data quality control workflows that include both automated and manual QC tasks. These capabilities include the ability to automate validations using standard out-of-the-box checks configured to implement specific business rules. We discuss options for deploying automated validation across the ArcGIS platform, which includes both the desktop and server environments, to support scalability, data quality reporting and stakeholder engagement via web services.

The NMO members of the ELF project have generated a specification spreadsheet document containing a set of high-level quality rules in ‘RuleSpeak’, together with appropriate parameters at the various ‘levels of detail’ of typical NMO data. These include Master LoD0 at around 1:1K, LoD1 at around 1:10K, LoD2 around 1:50K, Regional LoD3 at 1:250K, and Global LoD4 at 1:1M. RuleSpeak is “a set of guidelines for expressing business rules in concise, business-friendly fashion. It is not a language or syntax per se, but rather a set of best practices for speakers of English to capture, express and retain decision criteria and business know-how effectively” [RuleSpeak 2014]. Its use is explained further in the Business Rules Manifesto [BRG2003].

We discuss the mapping between the specified RuleSpeak forms and the ArcGIS Data Reviewer (ADR) standard checks, together with the types of configuration required to instantiate the necessary quality rules using ADR. We discuss omissions, confusions and problems in ELF rules and/or in the matching ADR checks.

We briefly discuss the wider scope and meaning of spatial data quality, and its history [Hardy1990]. To conclude we summarize the overall pros and cons and benefits of using configurable COTS GIS software for consistency and quality checking and enforcement of INSPIRE and ELF data.

15:00-15:30 Coffee Break
15:30-17:00 Session 5: User communities
Location: Main conference room
15:30
Making Open Spatial Data Operationally Useful for the British Red Cross – Assessing Metadata Quality
SPEAKER: Claire Ellul

ABSTRACT. The British Red Cross (BRC) carry out tasks including preparing for disasters, providing first aid, finding missing families, supporting disaster recovery, fund raising, refugee support, health and social care, as well as running a chain of charity shops where the public can make donations which are then sold to raise funds for the organisation. They are increasingly making use of spatial data to support these tasks, and are in the process of growing their GIS team, which is seen as fundamental to enabling the organisation to meet its targets.

Given their charitable status, they rely predominantly on free spatial data for any analysis, and one of the issues they face is identifying spatial data of appropriate quality for their work from the increasingly vast range of open data sources in the UK, including the government open data portal data.gov.uk and the emerging outputs from the INSPIRE (Infrastructure for Spatial Information in Europe) project. These data are used in conjunction with their Open Source software stack, which provides spatial data services to the organisation via a web mapping tool, as well as allowing the GIS team to conduct bespoke analysis.

This paper will describe an evaluation of the suitability of this open data for the BRC, consisting of three stages. Firstly, interviews with BRC staff were conducted to identify operational requirements for spatial data. Secondly, a manual review of available data (taking a sample of the 19,205 datasets from data.gov.uk and of the 5,289 INSPIRE datasets) was conducted and a number of case studies were developed to demonstrate potential datasets in the contexts of air pollution and chronic obstructive pulmonary disease, and of identifying where ‘home help’ is required to support the elderly in their daily activities.

The review itself highlighted the widely varying quality of metadata available for different datasets, and the BRC team also expressed contrasting views about metadata, with one person expressing the view that there is too much metadata, and others expressing a wish to know both the file size of any dataset and ‘what the shapefile contains’. No clear criteria for selecting appropriate datasets were evident in the organisation.

Given the metadata quality issues and the debate within the BRC as to the suitability and relevance of metadata, the third and final stage of the project involved developing a flexible metadata quality assessment tool which allows the BRC to define an XML template for the required metadata, including details of which information is required (e.g. abstract, keywords, spatial reference, bounding coordinates and so forth), to determine for each field whether it is mandatory or optional, and to set a minimum number of words required in each field. A flow diagram, setting criteria for the high-quality metadata that should be present before a dataset is considered for adoption by the BRC, was developed. The tool was then developed in Python, making use of XML both for the template metadata and for the metadata to be evaluated. The flexibility of the tool means that any number of metadata templates can be created and evaluated, recognising the need for project-specific flexibility in metadata content. It could also be said that high-quality metadata is an indicator of high-quality data, and such a tool would facilitate the BRC’s review and suitability assessment of the 20,000 available datasets.
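A minimal sketch of how such a template-driven check might look in Python is given below; the flat XML layout, the field names and the word-count thresholds are illustrative assumptions and do not reproduce the BRC tool itself.

    # Hypothetical sketch of a template-driven metadata quality check.
    # The XML layout, field names and thresholds are illustrative only;
    # they are not the template format of the BRC tool described above.
    import xml.etree.ElementTree as ET

    # For each metadata field: whether it is mandatory and its minimum word count.
    TEMPLATE = {
        "abstract":          {"mandatory": True,  "min_words": 30},
        "keywords":          {"mandatory": True,  "min_words": 3},
        "spatial_reference": {"mandatory": True,  "min_words": 1},
        "bounding_box":      {"mandatory": False, "min_words": 1},
    }

    def assess_metadata(xml_text, template=TEMPLATE):
        """Return (field, problem) pairs for one metadata record."""
        record = ET.fromstring(xml_text)
        problems = []
        for field, rules in template.items():
            element = record.find(field)
            text = (element.text or "").strip() if element is not None else ""
            if not text:
                if rules["mandatory"]:
                    problems.append((field, "mandatory field missing or empty"))
            elif len(text.split()) < rules["min_words"]:
                problems.append((field, f"fewer than {rules['min_words']} words"))
        return problems

    example = """<metadata>
      <abstract>Air quality monitoring sites.</abstract>
      <keywords>air pollution health</keywords>
    </metadata>"""

    for field, problem in assess_metadata(example):
        print(f"{field}: {problem}")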

Overall, it was concluded that, to date, the available INSPIRE data is not necessarily suitable for BRC operations although this may change in future as other datasets (e.g. demographics, addressing) come on stream. However, a number of other datasets from data.gov.uk do show potential for specific analyses within BRC. Metadata quality and variability was, however, highlighted as a key issue, impeding both data discovery and suitability assessment.

15:55
Predictive Analysis and Probability of Error Scoring, Study and Practice - Abstract
SPEAKER: Matt Tobin

ABSTRACT. Stephen Tope & Matt Tobin, Ordnance Survey Operations Quality Management, Southampton, SO16 0AS, England; stephen.tope@ordnancesurvey.co.uk, matt.tobin@ordnancesurvey.co.uk.

Recently we have experienced an exponential increase in the demand for geospatial data. This demand is vast in scope but also has depth in terms of detailed attribution. Through data analytics we believe we can provide affordable and detailed quality management information based on the probability of errors. In our analyses we will consider, amongst other factors, the relative importance of feature types to any given requirement, the complexity of those features, and any existing knowledge of those features. This will enable us to identify the quality of a new dataset more quickly and with more precision for specific feature types and objects within datasets. One such product provides detailed extents and access information for 42,000 educational, medical, utility and transport facilities in the UK. The dataset is refreshed on a six-monthly cycle, during which time about 12% of sites (5,000) are expected to have changed. Assuring the accuracy of the changed sites has been found to be a labour-intensive process, yet it is expected that the number and variety of sites held in the data will continue to increase. This project investigated the potential for using automated analysis of the characteristics of changed sites to predict the likelihood that they had been changed incorrectly, in order to direct manual checking to those sites most likely to require amendment, so making the quality assurance process more efficient. The methods employed evolved over three main studies:

The first looked at a random sample of sites changed during April 2014. Scores were assigned for five characteristics: which work area had made the changes to the site; whether the site was composed of a single topographic polygon; whether the site was named; whether the site was on a list of high-profile sites; and a visual scan of the site extent overlaid on a previous version. These scores were combined to give an overall score for the site, and the sites were then checked manually to determine the accuracy of the prediction.

The second investigated potential means of replacing the labour-intensive element of visually scanning site extents. 258 sites were chosen randomly from those changed during June 2014. In order to permit analysis of changes to site extents, the dataset was joined to a previous version of the same sites. Site extents were checked manually: these were categorised as degraded or not degraded, or for completely new sites as correct or incorrect. The results were correlated with four geometric measures: site compactness; change in site compactness; absolute change in site area; and percentage change in site area.
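As an illustration of how these geometric measures might be computed, a minimal Python sketch follows, assuming the Shapely library and the Polsby-Popper ratio (4*pi*area/perimeter^2) as the compactness measure; the abstract does not state which compactness definition or software was used, so both are assumptions, and the flag thresholds correspond to those applied in the third study described below.

    # Hypothetical sketch of the geometric change measures and flagging
    # thresholds. Shapely and the Polsby-Popper compactness ratio are
    # assumptions; the study does not specify its definitions or tools.
    import math
    from shapely.geometry import Polygon

    def compactness(geom):
        """Polsby-Popper compactness: 4*pi*area / perimeter**2 (1.0 = circle)."""
        return 4 * math.pi * geom.area / (geom.length ** 2)

    def change_measures(previous, current):
        """The four measures correlated with degradation in the second study."""
        return {
            "compactness": compactness(current),
            "compactness_change": compactness(current) - compactness(previous),
            "area_change_abs": current.area - previous.area,
            "area_change_pct": 100.0 * (current.area - previous.area) / previous.area,
        }

    # Illustrative site extents: previous vs. current version of one site.
    before = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
    after = Polygon([(0, 0), (10, 0), (10, 6), (0, 6)])   # extent reduced

    m = change_measures(before, after)
    flag = (m["compactness_change"] < -0.1
            or m["area_change_pct"] < -25
            or m["area_change_pct"] > 100)
    print(m)
    print("Flag for manual checking:", flag)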

The third applied the results of the second study to predict results for 697 changed Education sites. Again, the amended sites were joined to a previous version of the data. Sites were flagged for checking: if new; if unnamed; if the number of separate areas composing the site had increased; if the site’s compactness had been reduced by more than 0.1; if the site’s extent had been reduced by more than 25%; or if the site’s extent had doubled. The results of the studies showed an improvement in the effectiveness of the automated tests. The tests used in the first study were effective, identifying 72% of the sites requiring amendment; however, this still incorporated a visual scan of sites, without which the tests would have identified only 48% of target sites. In the second study, joining the data to a previous version allowed for analysis of change in the site characteristics. Change in site geometry was found to be more useful as an indicator of error than the current values. In the third study, the automated testing identified 74% of sites requiring amendment. The success rate was comparable to the first study, but with the visual scanning used in the first study replaced by automated measures of geometric change. The overall results are summarised in Table 1 below.

Table 1. Demonstrating improvements in the method.

Study | Sites in study | Sites requiring amendment | All sites identified by method | Sites for amendment identified by method
1     | 148            | 66 (45%)                  | 62 (42%)                       | 48 (72%)
2     | 258            | 43 (17%)                  | 77 (30%)                       | 32 (74%)
3     | 697            | 106 (15%)                 | 157 (23%)                      | 78 (74%)

For automated testing, the next steps will be to combine the tests applied, adjusting the scores based on the evidence gained from previous tests. The tests need to be diversified to take account of variations in the types of sites. It will also be useful to look at the significance of the errors that are found (or missed) to provide a better analysis of the impact of relying more heavily on automated testing. For this product, automated testing is not a replacement for comprehensive manual checking. However, trends affecting the dataset – increasing numbers, increasing frequency of checking, and increasing quality – are likely to make comprehensive checking less practicable. The results of these studies indicate that for selective checking, automated tests can be a useful technique for identifying where to direct manual checking to greatest effect.

16:20
Usability of building data for computation of population and for assessment of vulnerability to earthquakes

ABSTRACT. The INSPIRE Thematic Working Group Buildings has proposed a usability metadata element for two of the use cases (computation of population and assessment of vulnerability to earthquakes) that were taken into account when elaborating the data specifications for the theme Buildings.