Call: Provision of Optical Character Recognition processing and conversion of scanned images and PDF documents to coded data and data extraction
Buyer: European Patent Office, Munich, Germany
Deadline: Jan 1, 2025
Duration in months: 36
Description: The EPO receives a large volume of confidential documents every day: new patent applications, forms, free-text letters, and patent and non-patent literature relating to the patent grant process. These documents arrive in various formats and contain plain text, tables, chemical and mathematical formulae, and drawings. They need to be OCR-converted into XML-structured data, in which formulae, drawings and many tables can be captured as images. The requested Services cover XML conversion, extraction of citations and XML mark-up in line with the EPO’s capturing guidelines. On average, the Services relate to 2 million pages per month spread over 3 Lots. Service descriptions and volumes are subject to change.
All data shall be kept and services shall be performed within the territory of one of the EPC Member States. The Services shall be performed at Contractor’s premises.
Lot 1 – Early OCR – Optical Character Recognition concerning patent documents requiring digitisation of text
Lot 2 – Citation extraction covering digitised patent documents requiring citation extraction
Lot 3 – Optical Character Recognition concerning images and/or PDFs of forms related to the patent grant procedure requiring the localisation, extraction and mark-up of information in such forms
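As a rough illustration of the conversion described above, the following minimal Python sketch turns a list of recognised page blocks into XML, keeping plain text as text and storing formulae, drawings and tables as image references. The element names (`page`, `p`, `img`), the attribute names (`type`, `file`), the file names and the `page_to_xml` helper are all hypothetical; they are not taken from the EPO's capturing guidelines, which define the actual mark-up.

```python
import xml.etree.ElementTree as ET


def page_to_xml(blocks):
    """Build an XML element for one OCR-processed page.

    blocks: list of (kind, value) tuples, where kind is "text",
    "formula", "drawing" or "table". For "text" the value is the
    recognised string; otherwise it is an image file name.
    All element and attribute names here are hypothetical.
    """
    page = ET.Element("page")
    for kind, value in blocks:
        if kind == "text":
            paragraph = ET.SubElement(page, "p")
            paragraph.text = value
        else:
            # Non-text content is kept as a reference to a cropped image.
            ET.SubElement(page, "img", {"type": kind, "file": value})
    return page


# Made-up sample input, for illustration only.
blocks = [
    ("text", "The compound of formula (I):"),
    ("formula", "formula-0001.png"),
    ("text", "is prepared as shown in Figure 1."),
    ("drawing", "figure-0001.png"),
]
xml_string = ET.tostring(page_to_xml(blocks), encoding="unicode")
print(xml_string)
```

The sketch only covers the final serialisation step; the OCR itself, layout analysis and citation extraction are outside its scope.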