Extraction of data from documents with automatic controls

Using the Station platform, you can automatically extract data from any document, automating checks and comparisons with data present in other applications.

Scroll down to find out more about data extraction using Station – Document Exchanger
PDF documents
EDI files
XML files
Electronic invoices
CSV files
Any other type of document

Extract data from any type of document using Station

The RPA Station platform can automatically execute repetitive back office operations and processes, interacting with documents and company applications just as a user would:

  • Extract information from any type of document, leveraging various tools (mapping engine, OCR, preloaded dictionaries, scripts…)
  • Conduct checks and controls on the extracted data, including comparisons with the data in other company applications
  • Transform the output into any format necessary, be it XML, EDI, TXT or a spreadsheet
  • Interface with any IT system to automate the data entry of the extracted information

Using Station, you can automatically extract data from PDFs.

The simplest type of PDF to manage is a vector PDF, which contains text that you can select and search: Station obtains the text directly from these digitally native documents.

Hard copy documents, on the other hand, can be scanned to PDF: using its OCR (Optical Character Recognition) feature, Station transforms the image of the document into a format in which the text can be recognised and used to extract data, just as with native PDFs.

To extract information from PDFs, the platform leverages a powerful “mapping engine” that defines the (static or dynamic) rules for searching for and extracting data from documents.

After extracting the data, Station can run additional checks and comparisons between the extracted data and the data in other applications (for example, automatically reconciling a purchase invoice against the related orders and transport documents). It can then automatically upload the data into the company software or process the information to create a document in a new format (e.g. creating an XML file for electronic invoicing, starting from the invoice in PDF).

The RPA Station platform can work with EDI files, encoding or decoding them:

  • it can extract data from various types of EDI standards to automatically enter information into the IT system or to create a new type of document using the information from the EDI.
  • it can transform the information in the company applications into EDI format (EURITMO, EDIFACT, ODETTE, etc.) required by the company’s business partners.
  • it converts from one standard to another (e.g. from ODETTE to EURITMO), a feature that is often necessary for companies that have adopted an internal standard different from the one used by several of their business partners.

To extract data from EDI messages, the Station platform uses preloaded dictionaries containing the definitions of the main messages in the various EDI standards.

XML (eXtensible Markup Language) is a commonly used language for exchanging structured data between applications. The Station platform can read any type of XML file, extract the data and carry out the subsequent processing.

To extract data from XML files, the Station platform uses scripts and provides you with specific classes that simplify the reading and extraction of data from XML files.
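The classes that Station provides for this purpose are not described on this page, so the following is only a minimal sketch of the general idea, written with the Python standard library. The element names (Invoice, Number, Lines, Line and so on) and the extract_invoice function are hypothetical and chosen purely for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical invoice XML; the element names are illustrative only.
SAMPLE_XML = """<Invoice>
  <Number>INV-001</Number>
  <Date>2024-01-15</Date>
  <Lines>
    <Line><ItemCode>A100</ItemCode><Quantity>2</Quantity><UnitPrice>10.50</UnitPrice></Line>
    <Line><ItemCode>B200</ItemCode><Quantity>1</Quantity><UnitPrice>99.00</UnitPrice></Line>
  </Lines>
</Invoice>"""


def extract_invoice(xml_text):
    """Read the header fields and the line items from the XML text."""
    root = ET.fromstring(xml_text)
    lines = [
        {
            "item_code": line.findtext("ItemCode"),
            "quantity": int(line.findtext("Quantity")),
            # Decimal avoids floating-point rounding on monetary values.
            "unit_price": Decimal(line.findtext("UnitPrice")),
        }
        for line in root.iterfind("Lines/Line")
    ]
    return {
        "number": root.findtext("Number"),
        "date": root.findtext("Date"),
        "lines": lines,
    }


invoice = extract_invoice(SAMPLE_XML)
```

The resulting dictionary can then be passed on to the checks and uploads described on this page.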

Using Station, you can extract data from electronic purchase invoices to automatically record them in your administrative software/ERP:

  • Recognise and extract data from the XML file of the electronic invoice.
  • Automatically reconcile the invoice, conducting the necessary comparisons and checks on quantities and prices against the warehouse bookings, supplier orders and delivery notes/transport documents.
  • Notify the right user of any discrepancies and simplify the management of the problem identified.
  • Upload the data to the IT system, integrating without the need for the user to intervene.
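The reconciliation step in the list above can be sketched as follows. This is an illustrative example only, not Station's implementation: the reconcile function, the field names and the price tolerance are all assumptions, and the invoice is compared with supplier orders only.

```python
from decimal import Decimal


def reconcile(invoice_lines, order_lines, price_tolerance=Decimal("0.01")):
    """Compare invoice lines with order lines and list any discrepancies."""
    ordered = {line["item_code"]: line for line in order_lines}
    discrepancies = []
    for line in invoice_lines:
        code = line["item_code"]
        expected = ordered.get(code)
        if expected is None:
            discrepancies.append(f"{code}: not found in order")
            continue
        if line["quantity"] != expected["quantity"]:
            discrepancies.append(
                f"{code}: invoiced quantity {line['quantity']}, "
                f"ordered {expected['quantity']}"
            )
        if abs(line["unit_price"] - expected["unit_price"]) > price_tolerance:
            discrepancies.append(
                f"{code}: invoiced price {line['unit_price']}, "
                f"ordered {expected['unit_price']}"
            )
    return discrepancies


# Hypothetical sample data.
invoice_lines = [
    {"item_code": "A100", "quantity": 2, "unit_price": Decimal("10.50")},
    {"item_code": "B200", "quantity": 3, "unit_price": Decimal("99.00")},
    {"item_code": "C300", "quantity": 1, "unit_price": Decimal("5.00")},
]
order_lines = [
    {"item_code": "A100", "quantity": 2, "unit_price": Decimal("10.50")},
    {"item_code": "B200", "quantity": 1, "unit_price": Decimal("99.00")},
]

issues = reconcile(invoice_lines, order_lines)
```

An empty list means the invoice can be uploaded without intervention; otherwise the listed discrepancies are what would be notified to the right user.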

Using the Station platform, you can process documents in any format to extract the information and enter it into the IT system without manual intervention, or to convert the document to a new format (e.g. from PDF to XML or from PDF to spreadsheet).

In addition to traditional PDF files, Station can also process files that are not designed to be read by people, such as XML (including the specific format for electronic invoicing) and EDI messages.

You can also process any type of text or tabular file: TXT, XLS (spreadsheets), CSV or RTF.

How does Station extract data from documents and files?

Station can extract information from any type of document, leveraging various tools:

To extract data from any type of document, such as PDF, TXT or CSV, Station provides you with a powerful mapping engine, which, when configured, sets up both static and dynamic rules for data extraction, starting from a file selected as the template.

If, for example, you want to extract data from an order or an invoice in PDF, you use a visual mapping tool designed specifically for PDFs.

Using this tool, for each document template of a supplier/client, you can map where each piece of information is located, define what type of data is present (e.g. whole number or figure) and specify how the data is to be processed before it is imported into company applications.
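The idea of a template that records where each field sits, what type it has and how it is converted can be sketched in a few lines. This is a simplified illustration on plain text, not Station's mapping engine: the TEMPLATE structure, the field names and the sample document are all hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical template: field name -> (line index, start column, end column, converter).
TEMPLATE = {
    "order_number": (0, 14, 22, str),
    "order_date": (1, 14, 24, lambda s: datetime.strptime(s, "%d/%m/%Y").date()),
    "total": (2, 14, 24, Decimal),
}

# Hypothetical document text; ljust(14) pads each label so values start at column 14.
DOCUMENT = "\n".join(
    [
        "Order number:".ljust(14) + "PO-00042",
        "Order date:".ljust(14) + "15/01/2024",
        "Total:".ljust(14) + "1250.40",
    ]
)


def extract_fields(text, template):
    """Cut each field out of its mapped position and convert it to its type."""
    lines = text.splitlines()
    result = {}
    for name, (row, start, end, convert) in template.items():
        raw = lines[row][start:end].strip()
        result[name] = convert(raw)
    return result


fields = extract_fields(DOCUMENT, TEMPLATE)
```

A fixed position per field corresponds to a static rule; a dynamic rule would locate the field relative to other content rather than by absolute position.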

If the company receives hard copy and/or scanned documentation, Station’s OCR feature transforms the image of the document into a format where its text can be recognised and used by the platform to extract the data and enter it into the IT system or carry out other processing.

A high-definition scan gives the best possible recognition. Station can assess the reliability of the recognition carried out, character by character.
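A character-by-character reliability score makes it possible to single out the parts of a value that need a second look. The sketch below assumes a hypothetical OCR output made of (character, confidence) pairs and an arbitrary threshold of 0.80; neither reflects the actual format or settings used by Station.

```python
def flag_uncertain(chars, threshold=0.80):
    """Return the recognised text and the positions whose confidence is below the threshold."""
    text = "".join(character for character, _ in chars)
    uncertain = [
        index for index, (_, confidence) in enumerate(chars) if confidence < threshold
    ]
    return text, uncertain


# Hypothetical OCR output for an item code: "1" and "O" were recognised with low confidence.
ocr_output = [("A", 0.99), ("1", 0.55), ("0", 0.97), ("O", 0.62)]

text, uncertain = flag_uncertain(ocr_output)
```

Values with flagged positions are natural candidates for a validation step, while fully confident values can proceed automatically.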

Once the character recognition and extraction phases are completed, you can set up a phase of data validation: for example, you can check whether a specific item of data extracted is present in your administrative software (for ex. if an item code in the document actually exists in your IT system) or ask the user to validate the data extracted.

Once the document has been processed and the data extracted, the document is reduced so that it can be inserted into the company’s document management system, which can be easily integrated with Station.

Thanks to the smart recognition feature, Station can map documents automatically, autonomously identifying the information contained in the header and footer.

This powerful tool can be used both on vector documents and on hard copy documents that are scanned and transformed into documents that can be processed by the OCR feature.

To automatically map the document, Station searches for specific “labels” (defined in a dictionary that the user can populate, for example, “date”, “invoice date” or “invoice dated…”) linked to a specific field (for example, the date of the document), in order to locate the point where a specific item of data is located and autonomously understand what it is in order to extract it.

The system requires an initial learning phase, where Station analyses a number of documents defined by the user and proposes a validation form for the information extracted.

If the results are uncertain, you can specify what the item is or add the missing data, allowing the software to refine its ability to correctly identify the fields.

You can also define the rules in the dictionary, to make it easier for the software to locate the field (an example rule: in 70% of cases the field is located to the right of the label).
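The label search can be illustrated with a small sketch. It is not Station's algorithm: the LABELS dictionary, the find_fields function and the sample text are hypothetical, and the only rule applied is the simplest one, that the value sits immediately to the right of the label.

```python
import re

# Hypothetical label dictionary: field name -> labels that may introduce it.
LABELS = {
    "document_date": ["invoice date", "invoice dated", "date"],
    "document_number": ["invoice number", "invoice no."],
}


def find_fields(text, labels):
    """Locate each field by searching for one of its labels and taking the value to its right."""
    found = {}
    for field, variants in labels.items():
        # Try longer labels first so "invoice date" wins over plain "date".
        for label in sorted(variants, key=len, reverse=True):
            pattern = re.escape(label) + r"\s*:?\s*(\S+)"
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                found[field] = match.group(1)
                break
    return found


sample = "Invoice no. 2024/017\nInvoice date: 15/01/2024\nDue date: 14/02/2024"
result = find_fields(sample, LABELS)
```

Trying the most specific label first keeps the generic label "date" from matching the due date when a more precise label is present.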

Once the self-learning phase is completed, the system can operate in full autonomy.

To extract data from some types of files, such as those in EDI format, Station uses preloaded dictionaries containing the definitions of the main messages in the various standards.

During configuration, you only need to define which standards to apply when reading the file, and the visual mapping tool will show you which types of information were automatically identified in the message.

This considerably simplifies the initial configuration phase for the extraction of information.
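To show how a dictionary of message definitions helps, the sketch below parses a few EDIFACT-style segments with a heavily reduced, hypothetical dictionary. It relies only on the standard EDIFACT default separators (an apostrophe ends a segment, "+" separates data elements, ":" separates components) and ignores the release character and service segments, so it is an illustration rather than a complete parser. The field names in SEGMENT_DICTIONARY are assumptions.

```python
# Hypothetical, heavily reduced dictionary: segment tag -> names of its data elements.
SEGMENT_DICTIONARY = {
    "BGM": ["document_type", "document_number", "message_function"],
    "DTM": ["date_time_period"],
    "NAD": ["party_qualifier", "party_id"],
}


def parse_message(message, dictionary):
    """Split a message into segments and name each data element using the dictionary."""
    parsed = []
    for raw in message.split("'"):
        raw = raw.strip()
        if not raw:
            continue
        tag, *elements = raw.split("+")
        names = dictionary.get(tag)
        if names is None:
            continue  # Segment not covered by this reduced dictionary.
        # Each data element may hold several colon-separated components.
        values = [element.split(":") for element in elements]
        parsed.append((tag, dict(zip(names, values))))
    return parsed


MESSAGE = "BGM+380+INV001+9'DTM+137:20240115:102'NAD+BY+5412345000013::9'"
parsed = parse_message(MESSAGE, SEGMENT_DICTIONARY)
```

Because the dictionary already names the elements of each segment, the configuration only has to say which standard applies instead of describing every field by hand.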

For some file types, such as XML and Excel, the Station platform uses scripts for extracting data: it provides you with specific classes that simplify the reading and extraction of data from files.

Discover some of the Solutions created using the Station platform

Success stories: some projects implemented hand-in-hand with our clients using Station

Want to know more about the features of Station, the platform that automatically extracts data from documents?
Fill out the form to ask for more information.