
Crawling Based Data Collection and Data Pre-processing

10 pages. Published: March 22, 2023


This paper focuses on the design and architecture of an application that programmatically pre-processes data. The application aims to extract and provide clean passenger throughput data for airport terminals within the United States. A separate application will then use this data to forecast passenger throughput and present the results through a user-friendly webpage. The purpose of forecasting passenger throughput across U.S. airport terminals is to improve Transportation Security Administration (TSA) checkpoint operations. For example, TSA could increase personnel at security checkpoints where the forecast expects a high volume of passengers, such as during the holiday season, and reduce personnel at checkpoints where high passenger throughput is not forecast. TSA seeks to improve its personnel scheduling using this forecasting model. In addition, the forecasting model is intended to improve passenger satisfaction by avoiding excessive wait times at security checkpoints, while maintaining adequate security protocols so that passenger safety is not jeopardized.

Keyphrases: Crawling Based Data Collection, data pre-processing, PDF Extractor

In: Ajay Bandi, Mohammad Hossain and Ying Jin (editors). Proceedings of 38th International Conference on Computers and Their Applications, vol 91, pages 76--85

BibTeX entry
  author    = {Alexander Morales and Jiang Guo},
  title     = {Crawling Based Data Collection and Data Pre-processing},
  booktitle = {Proceedings of 38th International Conference on Computers and Their Applications},
  editor    = {Ajay Bandi and Mohammad Hossain and Ying Jin},
  series    = {EPiC Series in Computing},
  volume    = {91},
  pages     = {76--85},
  year      = {2023},
  publisher = {EasyChair},
  bibsource = {EasyChair,},
  issn      = {2398-7340},
  url       = {},
  doi       = {10.29007/cqkw}}