Product-related questions answered
Which information can be extracted from the different documents?
Out of the box, Klippa extracts information from ID documents (passports, driver's licenses and ID cards) and financial documents (receipts and invoices). For passports we extract: full name, nationality, date of birth, place of birth, gender, date of issue, location of issue, valid through date, document number, social security number, photo, signature, MRZ. For driver's licenses we extract: full name, nationality, date of birth, place of birth, gender, date of issue, locations, valid through date,
How does the OCR handle unknown documents, like an invoice from an unknown merchant?
Our model for invoices and receipts combines rules and machine learning, so it scales and works on any kind of invoice or receipt, even one we are parsing for the first time.
Can I upload multiple documents at once to the API and queue them?
You can upload multiple documents at once to the API. Up to a certain document volume, your documents are not queued and are processed simultaneously. We work with a Kubernetes cluster, which orchestrates the number of workers and scales it up or down automatically, so the number of workers is dynamic throughout the day. By default, Klippa runs 40 workers at the same time. Once we notice that more than 40 workers are necessary, we can scale up the workers.
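To make the idea of a fixed worker pool concrete, here is a minimal local sketch in Python. It is an analogy only, not Klippa's Kubernetes setup: `process_document` is a hypothetical stand-in for the OCR work, and the pool size of 40 simply mirrors the default mentioned above. Up to 40 documents run at once; any extra documents wait until a worker is free.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_WORKERS = 40  # default worker count mentioned in the answer above


def process_document(name: str) -> str:
    # Hypothetical stand-in for the real OCR work on one document.
    return f"{name}: processed"


def process_batch(names, workers=DEFAULT_WORKERS):
    # At most `workers` documents are handled at the same time;
    # the remaining ones wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input.
        return list(pool.map(process_document, names))


if __name__ == "__main__":
    results = process_batch([f"doc{i}" for i in range(100)])
    print(len(results), "documents processed")
```

A real autoscaling cluster would also change the number of workers over time, which this fixed-size sketch does not attempt to show.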
The documentation states that no data is stored. How does rejection of duplicates work if no data is stored?
Duplicates are not necessarily rejected, only detected. From every document we process, we generate a hash: a string of letters and numbers that looks random and by itself has no reference to the original document. These hashes help us detect duplicates, but we cannot retrieve any information from them.
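The hash-based approach described above can be sketched as follows. This is an illustration under assumptions: the answer does not say which hash function Klippa uses or what exactly is hashed, so SHA-256 over the raw file bytes is chosen here only as an example, and `DuplicateDetector` is a hypothetical name.

```python
import hashlib


class DuplicateDetector:
    """Keeps only document hashes, never the documents themselves."""

    def __init__(self):
        self._seen = set()

    def check(self, document_bytes: bytes) -> bool:
        """Return True if identical bytes were seen before."""
        # Assumption: SHA-256 over the raw bytes. The digest cannot be
        # turned back into the original document.
        digest = hashlib.sha256(document_bytes).hexdigest()
        is_duplicate = digest in self._seen
        self._seen.add(digest)
        return is_duplicate


if __name__ == "__main__":
    detector = DuplicateDetector()
    print(detector.check(b"invoice A"))  # first time seen
    print(detector.check(b"invoice A"))  # same bytes again
```

Note that a plain byte hash like this only catches exact copies of the same file. A rescanned or re-photographed document would produce a different hash.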
What kind of documents does Klippa support for OCR processing?
We support a wide range of documents for OCR processing. Out of the box, we support invoices, receipts, and identity documents, such as passports, ID cards and driver's licenses.
How does the SDK differ from the REST API?
The API is the bare service: you send in a document and receive the structured information back within seconds. The SDK includes a camera module that helps our clients help their own clients capture good-quality data. The SDK can be connected to another OCR provider, delivered without an OCR connection (just storing the photo locally on the phone), or delivered with the Klippa OCR API connection built in.
How can I give feedback to the Klippa system and can Klippa automate the feedback to our system?
There are two options. Manual verification from our side: you send us the documents, and we annotate the correct values and feed them to our system. Manual verification from your side: you annotate each document and send us the set of documents. For the annotation, you can use Klippa's manual verification platform. We then feed your input to our system.
Is the OCR capable of processing documents other than invoices, receipts and ID documents, for example logistics documents, and how would that work?
We are always happy to dive into any use case and take a look at the documents our clients want to parse. Today we are fully operational for invoices, receipts and identity documents. For other types of documents, when they are always structured the same way, we can create a template-based model quite easily. When they are not structured, we just need a couple of examples to determine what it would take us to build a model for these documents.
Can I link the OCR-API to external databases?
Yes. You can send additional information along with documents in the user data field to enhance the recognition, for example of merchant names. You can also send us databases. You can create external databases by using external user data sets. We have to enable the link to the external user data. Instructions on how to do that are documented in the API documentation. We can also connect to other external databases through an API.
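As an illustration of why a supplied merchant list can improve recognition, the sketch below snaps a slightly misread merchant name to the closest known one. It is not a description of how Klippa uses the user data internally: the function name, the merchant list and the similarity cutoff are all assumptions made for this example, and the matching is done with Python's standard `difflib` module.

```python
from difflib import get_close_matches

# Hypothetical merchant list that a client might supply as user data.
KNOWN_MERCHANTS = ["Albert Heijn", "Jumbo", "Lidl"]


def correct_merchant(ocr_name: str, known=KNOWN_MERCHANTS, cutoff=0.8) -> str:
    # Return the closest known merchant if it is similar enough,
    # otherwise keep the name exactly as the OCR read it.
    matches = get_close_matches(ocr_name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_name


if __name__ == "__main__":
    print(correct_merchant("Albert Hein"))  # one letter missing
    print(correct_merchant("Xyz Store"))    # not in the list
```

The cutoff of 0.8 is an arbitrary choice here. A lower value corrects more aggressively but raises the risk of mapping an unknown merchant onto the wrong known one.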
Is the Klippa OCR-API White-Label?
By default, the Klippa OCR-API is a white-label solution: it can be implemented seamlessly into your software, and you or your users won’t notice that they are using an external service.
What is the default output of the OCR-API?
The default output of the OCR is structured JSON. This allows for fast processing of the extracted data. On request, we can send the structured data in other formats as well, such as XLSX, PDF, CSV and UBL.
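Because the output is structured JSON, it is straightforward to handle on the client side. The sketch below parses a small JSON record and rewrites it as CSV using only the Python standard library. The field names and values in `SAMPLE` are made up for illustration and do not reflect the actual response schema, which is described in the API documentation.

```python
import csv
import io
import json

# Made-up example record; the real response schema may differ.
SAMPLE = '{"merchant": "Example Shop", "date": "2024-01-31", "total": "12.50"}'


def json_to_csv(json_text: str) -> str:
    # Parse one flat JSON object and write it as a header row plus a data row.
    record = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(record.keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    print(json_to_csv(SAMPLE))
```

This only covers a flat object. Nested structures such as invoice line items would need to be flattened first.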