Reading Out Information From Unstructured Documents

PDF documents contain valuable information, but that information is difficult to process automatically. Our Extractor application enables machine-based extraction of information from PDFs, for example maintenance planning data (MPD) from maintenance specification documents in the aviation industry.

Using an intuitive user interface, the user defines sections in the PDF files, which are then extracted automatically. Text pattern recognition lets the user define criteria for grouping the extracted data. Because the OEM usually publishes these documents periodically, definitions can be saved as configurations once they are established and reused for later editions.
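To illustrate how a grouping criterion and a reusable configuration might fit together, the following Python sketch groups extracted text lines under the most recent line that matches a regular expression and stores the definition as JSON. It is a minimal sketch under stated assumptions, not the Extractor's implementation: the configuration keys (`name`, `group_start`), the task-number pattern, the sample lines and the function names are all hypothetical.

```python
from __future__ import annotations

import json
import re

# Hypothetical configuration: the keys and the regex are illustrative
# assumptions, not the Extractor's real configuration format.
CONFIG = {
    "name": "example-mpd",
    # A line matching this pattern starts a new group; group(1) is the key.
    "group_start": r"^(\d{2}-\d{3}-\d{2})\b",
}


def save_config(config: dict, path: str) -> None:
    """Persist a definition as JSON so it can be reused for a later revision."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)


def load_config(path: str) -> dict:
    """Load a previously saved definition."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def group_lines(lines: list[str], config: dict) -> dict[str, list[str]]:
    """Group non-empty lines under the most recent line matching group_start.

    Lines that appear before the first match are ignored.
    """
    pattern = re.compile(config["group_start"])
    groups: dict[str, list[str]] = {}
    current = None
    for line in lines:
        match = pattern.match(line)
        if match:
            # Start a new group; keep any text that follows the key.
            current = match.group(1)
            groups.setdefault(current, [])
            rest = line[match.end():].strip()
            if rest:
                groups[current].append(rest)
        elif current is not None and line.strip():
            groups[current].append(line.strip())
    return groups


# Hypothetical sample of text lines extracted from a defined PDF section.
lines = [
    "Header text",
    "20-010-01 General visual inspection",
    "Interval: 600 FH",
    "",
    "20-010-02 Lubrication",
    "Interval: 1200 FH",
]
print(group_lines(lines, CONFIG))
# {'20-010-01': ['General visual inspection', 'Interval: 600 FH'],
#  '20-010-02': ['Lubrication', 'Interval: 1200 FH']}
```

Saving the pattern rather than hard-coding it is what allows the same definition to be applied unchanged to the next periodic edition of a document.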

The extracted data are output as Excel tables. For regularly recurring extractions, the system also offers a revision comparison that highlights changed values in the Excel tables.
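A revision comparison of this kind can be sketched in an equally illustrative way. The Python example below assumes that each extraction run yields a table whose rows are keyed by an identifier such as a task number; it reports added rows, removed rows and individual changed cells, which are the values one would then highlight in the Excel output. The table shape, the sample data and the names `compare_revisions` and `RevisionDiff` are hypothetical and do not describe the Extractor's actual logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical table shape: row key (e.g. task number) -> {column name: value}.
Table = dict[str, dict[str, str]]


@dataclass
class RevisionDiff:
    added: list[str] = field(default_factory=list)    # keys only in the new revision
    removed: list[str] = field(default_factory=list)  # keys only in the old revision
    # (row key, column, old value, new value) for every cell that differs
    changed: list[tuple[str, str, str | None, str | None]] = field(default_factory=list)


def compare_revisions(old: Table, new: Table) -> RevisionDiff:
    """Compare two extraction runs row by row and cell by cell."""
    diff = RevisionDiff(
        added=sorted(new.keys() - old.keys()),
        removed=sorted(old.keys() - new.keys()),
    )
    # Only rows present in both revisions can have changed cells.
    for key in sorted(old.keys() & new.keys()):
        for column in sorted(old[key].keys() | new[key].keys()):
            before, after = old[key].get(column), new[key].get(column)
            if before != after:
                diff.changed.append((key, column, before, after))
    return diff


# Hypothetical data from two consecutive revisions of the same document.
old = {
    "20-010-01": {"Interval": "600 FH", "Zone": "100"},
    "20-010-02": {"Interval": "1200 FH", "Zone": "200"},
}
new = {
    "20-010-01": {"Interval": "750 FH", "Zone": "100"},
    "20-010-03": {"Interval": "300 FH", "Zone": "300"},
}
print(compare_revisions(old, new))
```

Applying the highlighting itself would be a separate step, for example setting a fill colour on the changed cells with a spreadsheet library such as openpyxl.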

The Extractor has a modular design, which allows us to implement customer-specific adaptations quickly and flexibly. The documents supported so far include parts catalogs (IPD), from which part numbers are extracted, and maintenance planning data (MPD) from various manufacturers and for various customers.

In addition to the Extractor product, we offer, together with our service partner eDOC Aviation Service, one-off and recurring extractions, including quality assurance, as a service.

