Let’s Talk OCR

This week we are introducing version 2.0 of Prizmo Go, our app for instantly capturing text from the physical world. Prizmo Go got a warm welcome when we released it last year, and was seen as a fresh execution of several technologies toward one simple goal: capturing text instantly. This year, we wanted to revisit its core feature, namely text recognition.

As an anecdote, when we were working on version 1.0 in late 2016, we had only planned to have optical character recognition (OCR) performed on the device itself. We hadn’t given cloud processing a second thought. Around that time, we heard of Azure’s offering and thought we’d give it a shot anyway, and we were pleasantly surprised by its accuracy. It was so good, in fact, that we changed our plans and integrated it into 1.0, despite already being late. That’s the fun part of small companies: let’s build that major new feature 3 months before release, right?

Modern neural network techniques and deep learning had found their way into the text recognition world, giving it cognition capabilities that come closer and closer to those of humans. With previous-generation OCR engines, we could achieve extremely good results when the image was of high quality and well framed. But accuracy would degrade rapidly when conditions were non-optimal or unexpected, sometimes in odd ways, on images where we humans could still read the text clearly. Well-trained deep neural networks (DNN) come with that human-level capability of generalising to non-optimal conditions.

Prizmo Go 2.0 features a new neural network-based OCR that runs on the device, a new handwriting-capable OCR in the cloud, automatic cloud-based text translation, as well as the addition of a new subscription model. You can have a look at this video trailer to quickly review the new features:

Let’s walk through these new things in a bit more detail.

On-device Neural Network-based OCR

With the advent of cloud services that perform so well, we could have chosen to go only with those. Even though we are embracing them fully, we decided to double down on the integrated OCR offering as well. Why? Because it has complementary advantages, such as constant availability and better privacy.

Prizmo Go 2.0 comes with our CeedOCR text recognition library that is built on three things:

smart image preprocessing & layout analysis
a custom build of Tesseract (open source OCR library by Google)
text results & layout post-processing

First, image preprocessing is important because photos of text shot with a smartphone vary so much, from a perfectly lit and framed image to one that even humans cannot read. Building on our 10 years of experience in this field, we further improved our preprocessing to generate images that specifically suit neural network-based OCR, in turn improving the overall accuracy.
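We won’t go into the details of our own preprocessing here, but to give a feel for what this kind of step does, here is a minimal sketch of one classic technique: Otsu binarization, which picks a global threshold separating dark text from a lighter background. This is a textbook method shown purely as an illustration, not the algorithm used in CeedOCR, and the function names `otsu_threshold` and `binarize` are made up for this example.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximizes between-class variance.

    `pixels` is a flat list of integer grey levels, 0 (black) to 255 (white).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of grey levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue          # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already on the dark side
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels):
    """Map each pixel to 0 (ink) or 255 (paper) using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```

A single global threshold like this breaks down under uneven lighting, which is exactly the kind of smartphone condition that makes more careful preprocessing necessary.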

Second, last year we noticed the Tesseract project was experimenting with LSTM (long short-term memory, a form of recurrent neural network). We found that pretty exciting after witnessing similar cloud-based technologies, and we gave it a test run. Despite some quirks, the gain in accuracy was immediately obvious. That got us thinking about how this could fit in our apps, even though there were still important challenges. DNN-based OCR is typically more resource-hungry than other OCR engines, but specific optimizations and the power of recent devices are making it a reality.

Degraded image recognition results. **Left**: traditional OCR, **right**: DNN-based OCR.

Last, post-processing of OCR results is typically needed, to filter OCR errors or correct layout errors. We started this with Prizmo Go 1.0, where we built some mathematical foundations in Swift, and moving to the new OCR engine was the occasion to build further on that. We continued with Swift because it suits geometric and mathematical computations very well (thanks to generics, operators, and protocols), and we thought that would bring some more future-proofing to our libraries.
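To make the idea of post-processing concrete, here is a minimal sketch of two typical steps: dropping low-confidence results, then rebuilding reading order from bounding boxes. It is written in Python for brevity rather than Swift, and it is an illustration under stated assumptions, not our actual code: the `Word` type, the 0 to 1 confidence scale, and the 0.6 cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # hypothetical engine score, 0.0 to 1.0
    x: float           # left edge of the bounding box
    y: float           # top edge of the bounding box
    height: float


def filter_words(words, min_confidence=0.6):
    """Drop empty results and results the engine was unsure about."""
    return [w for w in words
            if w.confidence >= min_confidence and w.text.strip()]


def group_into_lines(words):
    """Group words into text lines, then order each line left to right.

    A word joins an existing line when its vertical center falls inside
    the vertical extent of that line's first word.
    """
    lines = []
    for w in sorted(words, key=lambda w: w.y):
        center = w.y + w.height / 2
        for line in lines:
            ref = line[0]
            if ref.y <= center <= ref.y + ref.height:
                line.append(w)
                break
        else:
            lines.append([w])  # no matching line found: start a new one
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x))
            for line in lines]
```

Calling `group_into_lines(filter_words(words))` on raw engine output gives one string per text line. Real layouts (columns, rotated text, tables) need considerably more geometry than this, which is where a language that is comfortable with math pays off.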

Photo-based Handwriting Recognition

Selecting through handwritten text. Ain’t that cool?

Let’s start with a disclaimer: (1) it only works in English, and (2) the quality of the results is highly dependent on writing style.

For the last 20–30 years, we thought that handwriting recognition was, in the best scenario, something that would be solved “later”. In fact, there are two cases: writing directly on the device, and the photo-based one. The first case is somewhat easier, because not only is the final image available, but so is the temporal sequence of inputs showing “how it is made”. Some apps have been handling this very well in the last few years, such as the Nebo app, great stuff. In the latter case though, where just the final image is available, it has been mostly impractical. Until deep neural networks came along, of course.

In the last few months, Microsoft announced that they would be tackling this problem, and we decided to give it a test run in Prizmo Go. Why? Many people still need real paper to express their creativity. This is fine. And having tools to bridge the physical world with the digital one sure makes sense. It is very early and it is certainly not perfect, but when it works, it is so impressive. This was not possible just 6 months ago, and now, look at this! We expect it to improve over time in terms of both accuracy and language support, so stay tuned!

Automatic Text Translation

When we thought about what users would like to do with text, translation was high on the list. It is typically useful when travelling, but also on other occasions.

Prizmo Go 2.0 supports cloud-based automatic text translation in 59 languages. Note that text-to-speech and accessibility are still very well supported in this use case, bringing new opportunities in the app.

As this too is a cloud-operated service, we can expect it to evolve over time. In our testing with French/English, we could see that the modern neural-network translation variant is available, and it’s doing impressively well (it is available in 21 languages as of writing). Just have a look at the English translation of this business magazine article written in French. Well written, same meaning, few mistakes. We added the German translation as well, so please let us know if it’s good too.

French text from magazine is OCR’ed, then translated into English and German using DNN.

Again, we expect the translation to gradually improve over time. We’ll also share the news when Microsoft updates the service with latest-generation artificial intelligence for additional languages.

New Business Model

Some closing notes about Prizmo Go. The new neural network-based, on-device OCR replaces the old one, and is provided free of charge to existing users. It can be tried out for free, but text sharing requires the Export Pack one-time in-app purchase. The cloud-based handwriting recognition is available as part of the Cloud OCR option of Prizmo Go, under the same terms, that is, each invocation requires a unit (units can be purchased in packs of 100 or 1000).

In addition to the existing options, we are introducing a new business model in Prizmo Go, the Premium Plan subscription. Premium Plan provides all features of the Export Pack (text sharing, copy/paste, smart interactions), as well as unlimited Cloud OCR operations. This is easier to manage, as there is no need to purchase Cloud OCR units, and it comes at a fixed annual or monthly price, which is great for enterprise users. The cloud-based translation service is available to Premium Plan subscribers only. Furthermore, users who had already purchased the Export Pack are offered special introductory pricing on Premium Plan (60% off the first year, yearly plan only).