The Making of Prizmo 5

Ten years after Prizmo first appeared on the Mac, introducing the then-innovative idea of using a digital camera as a document scanner, we are updating Prizmo on iOS. We’ve been busy with Prizmo 5 for over two years and we’re proud to finally release it.

Prizmo (App Store link, intro video, website) lets you scan documents on iPhone or iPad using the built-in camera or by loading input files, and then create PDFs with embedded text, read the recognized text aloud, or process business cards. Documents with text are OCRed (text is recognized) and the generated PDFs are searchable by content.

Our hope along the way has been to offer the best document scanning experience on iPhone and iPad. Prizmo 5 for iOS is an entirely new app with a new user interface, improved foundations, and many features that we’ve brought back from Prizmo Go after some successful experiments. We are so excited to finally be able to share this with our users, and we hope you’ll like it as well.

Version Highlights

Here’s a quick recap of the most important new features:

Focus on speed: in just 3 taps, have your document scanned, cleaned up, cropped, and text-recognized to a multi-page PDF right into the cloud
New best-in-class machine learning OCR options: reliable & accurate on-device OCR, and high-performance cloud-based OCR (same as Prizmo Go), including handwriting
New shooting experience: detection & tracking, CoreML analysis, result preview, VoiceOver feedback
New & much improved document editing workflow
New background processing (OCR + PDF generation) and Auto Upload for PDFs
Smart actions based on document contents (phone numbers, locations, etc.)
Much improved Text Reader with text-to-speech
New Messages extensions to scan & send a document without leaving the conversation
And so much more

Simple or Capable App?

Our design philosophy has never been to avoid complexity and complex problems, because simple solutions can’t handle every case. Nor has it been to expose complexity up front. Instead, we’ve worked hard to achieve progressive complexity: the most common cases should be handled simply, intuitively, and without compromise, while the less common cases, often the more complex ones, should get specific attention and options for when they are needed. It is also good news business-wise: very often only the most common cases are handled well by the big names, which gives us room to explore.

Prizmo is a great example of this philosophy: internally it is the most complex app we’ve worked on, capable of so many kinds of manipulations, yet it offers a fast path, as we state in its main description: just 3 taps and you get your multi-page, cleaned-up, searchable PDF available in the cloud on all your devices. That is, one tap to start the camera and autoshoot documents, one tap to end shooting, and one tap to close the document and go back to the app’s main screen.

What happens behind the scenes is this: on the first tap, the camera restarts with its last cleanup mode, page size, and OCR language pre-configured. As soon as a document is presented in front of the camera, it is shot once the “don’t move” progress bar completes. The user can dismiss an ongoing capture just by moving the device away (motion = cancel), or pause capture altogether by holding the device at an oblique angle (obliqueness = pause). When the user hits “Done” (second tap), the document is opened with all pre-configured settings applied. When the document is closed (third tap), background OCR and PDF generation take over, and after a few seconds the cleaned-up, searchable PDF is sent to the cloud destination, which has been pre-configured as well.
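The autoshoot behavior described above can be read as a small state machine. Here is a minimal sketch in Python of that reading. It is an illustration only, not Prizmo’s actual code: the `Frame` fields, the `run_capture` function, and all thresholds are hypothetical.

```python
from dataclasses import dataclass

# All names and thresholds are illustrative assumptions, not Prizmo's code.
HOLD_FRAMES = 3         # steady frames needed to fill the "don't move" bar
MOTION_LIMIT = 0.2      # motion above this cancels an ongoing capture
TILT_LIMIT_DEG = 30.0   # tilt above this pauses capture


@dataclass
class Frame:
    page_visible: bool  # a document is detected in the viewfinder
    motion: float       # device motion, arbitrary units
    tilt_deg: float     # how oblique the device is held, in degrees


def run_capture(frames):
    """Return the events produced by a sequence of camera frames."""
    events = []
    progress = 0  # how far the "don't move" bar has advanced
    for frame in frames:
        if frame.tilt_deg > TILT_LIMIT_DEG:
            # obliqueness = pause
            progress = 0
            events.append("paused")
        elif not frame.page_visible or frame.motion > MOTION_LIMIT:
            # motion = cancel (only reported if a capture was under way)
            if progress > 0:
                events.append("cancelled")
            progress = 0
        else:
            progress += 1
            if progress == HOLD_FRAMES:
                events.append("shot")
                # A real app would presumably wait for a new page here.
                progress = 0
    return events
```

Feeding it two steady frames, one shaky frame, three steady frames, and one tilted frame yields a cancelled capture, a shot, and a pause, in that order.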

What is interesting, though, is that at any of these steps you can dig in to override settings such as OCR language or cleanup mode, or export to a specific format like Microsoft Word DOCX instead of PDF. You can even automate the whole app from the outside.

New Capture Workflow

3 taps to capture, process, and upload PDF.

The overall workflow has been streamlined and simplified through a completely new user interface. Most options are chosen (and preserved for next use) at the time of capture, to avoid unnecessary downstream manipulations. Shots can be validated as they are being made, and you can immediately retake the picture if needed.

If activated, Prizmo can now autoshoot documents without user intervention when a document is presented in front of the camera, perform page cleanup (texture, lighting compensation) automatically according to the user’s preferences, and quickly perform OCR, which runs automatically in the background. Thanks to the new Auto Upload feature, the auto-generated PDF is then uploaded to a user-defined location in the cloud.

Editing multi-page documents is easier than ever: no more back-and-forth navigation to the page list. From anywhere, swipe left for the previous page and right for the next one. We tried to remove all previously required (and cumbersome) steps to offer a more natural experience. We did not drop the modal editing state from image processing tools, but instead augmented it.

Non-destructive Editing

Unlike most scanning apps, non-destructive editing is at the heart of Prizmo. After you’ve shot a document, you can apply a number of image enhancements like cropping, or image cleanup to remove uneven lighting, and of course, OCR to recognize text. Throughout the editing session, you can always go back and redo everything in case of error, as Prizmo’s native document format preserves the source photo that was shot.

Most editing in Prizmo is proposed in an orthogonal fashion, that is, most settings are independent of each other or exposed as if they were independent of each other. If users want to flatten a curved book page, they just enable that setting. It doesn’t matter whether this has to be done before or after cleanup or crop. Prizmo figures out the best path to get the best results in the shortest time, and sets the user’s mind free to achieve exactly what they really need.
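One simple way to picture orthogonal settings is a fixed internal processing order that is applied no matter how the user toggled things. The sketch below is a guess at the general idea, not a description of Prizmo’s internals: the step names and their order are assumptions made for illustration.

```python
# Hypothetical step names and order; the post does not specify the real pipeline.
PIPELINE_ORDER = ["flatten", "crop", "cleanup", "ocr"]


def plan(enabled):
    """Return the enabled steps in canonical order, whatever order they were toggled in."""
    unknown = set(enabled) - set(PIPELINE_ORDER)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return [step for step in PIPELINE_ORDER if step in enabled]
```

With this shape, enabling flattening last still schedules it first, so the user never has to think about ordering.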

What is a Document?

Prizmo, as we see it, is a processing app. It lets you create PDF, vCard, JPEG, TXT, or DOCX files, and that’s it. Once processed, outputs are yours, much like a camera app that shoots JPEGs. Prizmo does use a native format to provide non-destructive editing, but we see native documents as temporary projects while working on a scan, not as long-term storage. We advise users to export files as PDF (or use Auto Upload), a widely available format, and then organize them the way they want, in a filesystem they own and can back up.

On Prizmo’s start screen, under the document creation options & toolbar, a list of recent documents is shown for quick access. That’s it: recent documents for further editing or export. The list can be expanded, and it offers basic file management options such as searching, renaming, and exporting. Native Prizmo documents can be stored on the device itself, or in iCloud, where they will be immediately available to the Mac version of Prizmo for further editing while staying in sync with Prizmo on iOS.

We did not want to expose native documents too much. That’s why you get a PDF when dragging from that list on iPad, as we think this is what users expect in the most common case (native documents are still available from the Files app, or from within the open document itself in Prizmo). Additionally, Prizmo documents are now “actionable”, which means that based on their content, Prizmo will propose “smart actions” such as calling a phone number, navigating to an address, etc.

Deep-rooted Technologies

Technological advances have been made specifically for Prizmo 5. Our new automatic page detection technique in the camera viewfinder is more robust than before. We developed it with the specific (hard) case of a white document on a white background in mind, and with document borders that are not perfectly straight. Although this remains a challenging task, we’ve achieved unprecedented progress on that front. See for yourself how Prizmo compares to Apple’s Notes, the white-on-white top performer in our findings (most apps fail at this), in the video below.

Page detection of white page against white background (hard case). Prizmo vs. Apple’s Notes.

Moreover, Prizmo now makes use of scene tracking (like augmented reality) to provide a more fluid experience & feedback. Again, we used our own scene tracking here (same as in our other apps Emulsio, Hydra, and Prizmo Go), as we found that it outperformed Apple’s Vision framework in both precision and energy consumption. As for capture, we’re using a custom-trained machine learning model (CoreML) for structural analysis to offer complementary capabilities such as automatic orientation determination (tap the compass icon to enable it).

Energy consumption of page tracking/detection: homegrown CeedVision as used in Prizmo (left), Apple’s Vision Framework (right).

Document cropping now features an edge repair tool to remove the irregular borders or texture that typically occur around cropped edges. Prizmo’s photo capture has been enhanced with an innovative stabilization technique that increases OCR accuracy. We found it to be more effective than the hardware optical image stabilization (OIS) you get when shooting with Apple’s Camera app or other apps; that is, the image will be better suited to OCR if shot directly with Prizmo. Text polarity handling now lets Prizmo automatically determine whether text is lighter or darker than the background, with a manual override when needed, which makes it straightforward to OCR documents with all kinds of contrasts and appearances. Document flattening has been added specifically to handle everyday documents like magazines or books, which are curved or not perfectly flat. Prizmo just handles that, and the generated digital copies are made flat.

Document flattening: content-based analysis & processing. Source image (left), model estimation (center), and flattened result (right).
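As an aside on the text polarity handling mentioned above, here is a minimal sketch of one naive way to guess polarity from grayscale pixel values. It is not Prizmo’s method, which the post does not describe. It rests on a stated assumption, namely that text covers less area than the background, and the function name and labels are made up for the example.

```python
def text_polarity(pixels):
    """Guess text polarity from grayscale values (0..255).

    Assumption: text covers less area than the background, so the
    minority brightness class is taken to be the text.
    """
    lo, hi = min(pixels), max(pixels)
    if lo == hi:
        return "unknown"  # flat image, nothing to compare
    threshold = (lo + hi) / 2
    dark = sum(1 for p in pixels if p < threshold)
    light = len(pixels) - dark
    if dark == light:
        return "unknown"
    return "dark-on-light" if dark < light else "light-on-dark"
```

A heuristic like this fails on pages where text dominates the area, which is one reason a manual override is useful.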

We hand-optimized the neural network computations of the on-device OCR to get the most out of the hardware, producing fast and accurate output at up to 6x the speed of the baseline. Advanced PDF caching is introduced together with background processing, which means you can edit just a few pages in a 100-page document and re-generate the whole PDF in just a couple of seconds. As for PDF generation, Prizmo now comes with efficient compression techniques (CCITT G4, JBIG2) in case you need to generate very small files of, say, 50KB per page (usually 2MB, a 40x reduction), with high-resolution, 600DPI black & white scans. Generation of password-protected PDF documents is also available. New export formats are offered as well, such as multi-image and DOCX with preserved document layout, the latter enabling further editing in Microsoft Word, Apple’s Pages, or Google Docs.
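The post does not say how the PDF caching works, but the general idea of re-rendering only the pages that changed can be sketched as a cache keyed by a digest of each page’s inputs. Everything below is hypothetical: the `PageCache` class, the page dictionaries, and the caller-supplied `render` function are illustrative, not Prizmo’s design.

```python
import hashlib
import json


class PageCache:
    """Caches rendered page bytes, keyed by a digest of the page's inputs."""

    def __init__(self, render):
        self._render = render  # function(page_dict) -> bytes, supplied by the caller
        self._store = {}
        self.renders = 0       # number of pages actually rendered

    def get(self, page):
        # Identical source image + settings produce the same key.
        payload = json.dumps(page, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()
        if key not in self._store:
            self._store[key] = self._render(page)
            self.renders += 1
        return self._store[key]


def build_document(pages, cache):
    """Assemble the per-page outputs, rendering only what is not cached."""
    return [cache.get(page) for page in pages]
```

In this sketch, rebuilding a 100-page document after changing the settings of two pages triggers only two new renders; the other 98 pages come straight from the cache.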

Internally, we’ve now moved to a new mathematical foundation for OCR pre- and post-processing that is written entirely in Swift. We first introduced it for Prizmo Go in 2017 and further improved it to handle the more general cases of Prizmo. Swift is very well suited to handling geometrical structures and algorithms, both in terms of speed and expressiveness, and this opens up new horizons for future developments.

Optical Character Recognition

Prizmo 5 now offers two distinct OCR options, just like Prizmo Go: a robust, always available, machine learning-based on-device OCR option that runs locally, as well as a high-performance cloud-based OCR option. We covered our OCR modules previously in this blog post about Prizmo Go. Many Prizmo users wanted to have the latest advances that were introduced in Prizmo Go, and this is it.

Unlike in Prizmo 4, both new OCRs are based on machine learning. They are quick, accurate, and robust. Among the very best in the industry. The on-device OCR can be finely tuned (binarisation), while the cloud one will typically offer enhanced accuracy, more language choices, and even experimental handwriting support (English only for now). Both OCR options get our in-house pre- and post-processing techniques to enhance the readability of the text prior to OCR and to perform document layout corrections after OCR has been executed. Also note that when the Cloud OCR feature is used, data is immediately removed after processing, as per Microsoft’s Privacy statement. Prizmo is “Privacy first”, and here’s our privacy policy.

Oh, and by the way, we’ll very soon update Prizmo to include a third (on-device) OCR: Apple’s new OCR available in iOS 13. It’s English only for now, and in our testing, it’s pretty good.

Accessibility

Accessibility has always been a high priority in Prizmo. Prizmo is not an app specifically made for blind or low vision users, nor for users coping with dyslexia, but its core capabilities of OCR and text reading make it useful in these situations. Not only does Prizmo offer full VoiceOver support, but it also comes with updated voice guidance during shooting as well as a new description feature (first introduced in Prizmo Go) that lets you know the quantity and position of text in front of the camera, as well as device horizontality.

Moreover, Prizmo’s Text Reader is customizable (text size & appearance) and it will highlight the words while they are being spoken by the device. You can read more about this in the next section.

Text Reader

Prizmo also comes with a redesigned voice-capable text reader. This is almost an app by itself actually. The new text reader uses built-in voices from iOS (a number of voices can be installed from iOS settings), and will read any text you send it, at your chosen rate. Recognized textual documents can be sent to the reader, but other apps can send text to it as well.

The reader has the qualities of an audio player on iOS, and text can be scrolled through with audio navigation buttons (e.g. go to next sentence). With special markers (##), you can define multiple pages that will help with navigation.
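To illustrate the page markers, here is a minimal sketch of how text could be split into pages at each `##`. The exact rules Prizmo applies (for instance, whether a marker must sit on its own line) are not stated in this post, so the behavior below is an assumption and `split_pages` is a hypothetical helper.

```python
def split_pages(text, marker="##"):
    """Split reader text into pages at each marker, dropping empty pages.

    Assumption: any occurrence of the marker starts a new page.
    """
    return [part.strip() for part in text.split(marker) if part.strip()]
```

Text without any marker simply comes back as a single page.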

The reading experience itself can be fully customized with font, size, margins, etc. We also included the OpenDyslexic font as we are aware that Prizmo is also used by people with specific needs. That particular font combined with word highlighting as text is being read can really help.

Automation & Batch Editing

Previous versions of Prizmo were already pretty good at automation, as they could be invoked from other apps with the x-callback-url scheme. It was a great feature first introduced in Prizmo 2 (yeah, a long time ago). Prizmo 5 further improves these automation capabilities: images can be imported into Prizmo with immediate text, image, PDF, or DOCX conversion, or as a native document (see documentation on GitHub). Cleanup options are exposed, as well as page format and most other options. You can also send text to Prizmo to have it read aloud. All new OCR options are available as well. Prizmo is thus the perfect companion to the Shortcuts app for automated document processing. Note that it also works from any other app with URL-based invocations, and the plan is to go beyond x-callback-url in future versions of the app.
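For readers unfamiliar with x-callback-url, the sketch below builds a URL of the general shape the convention defines: `scheme://x-callback-url/action?parameters`, with optional `x-success` and `x-error` callbacks. The scheme `exampleapp`, the action `captureText`, and the `language` parameter used in the example are placeholders, not Prizmo’s actual names; the documentation on GitHub is the place to find the real ones.

```python
from urllib.parse import quote, urlencode


def x_callback_url(scheme, action, params=None, success=None, error=None):
    """Build a URL following the x-callback-url convention."""
    query = dict(params or {})
    if success:
        query["x-success"] = success  # URL opened by the target app on success
    if error:
        query["x-error"] = error      # URL opened by the target app on failure
    url = f"{scheme}://x-callback-url/{quote(action)}"
    if query:
        # Percent-encode values so callback URLs survive as query parameters.
        url += "?" + urlencode(query, quote_via=quote)
    return url


# Placeholder scheme, action, and parameter names, for illustration only.
example = x_callback_url(
    "exampleapp", "captureText", {"language": "en"}, success="myapp://done"
)
```

The calling app would then open the resulting URL, and the target app calls back through `x-success` once it is done.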

Prizmo now also features batch editing for pages. The batch editor lets you propagate settings (like cleanup mode, OCR language, etc.) of one page to some or all of the other pages of that document. This is a desktop-class feature, and it makes life easier for power users who need to process many documents, as it avoids repetitive tasks.
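Conceptually, batch editing amounts to copying a chosen subset of settings from one page to others. Here is a minimal sketch of that idea; the page dictionaries, setting names, and `propagate` function are hypothetical and only meant to illustrate the operation.

```python
def propagate(pages, source_index, keys, targets=None):
    """Copy the chosen settings from one page to the target pages.

    By default every other page is a target. Pages are modified in place.
    """
    source = pages[source_index]
    if targets is None:
        targets = range(len(pages))
    for i in targets:
        if i == source_index:
            continue
        for key in keys:
            pages[i][key] = source[key]
    return pages
```

Only the listed settings are overwritten, so propagating the cleanup mode leaves each page’s OCR language untouched.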

Good iOS Citizen

As a good citizen on your iPhone or iPad, Prizmo brings many modern iOS features along the way. A new Messages extension is provided, in addition to Open In and Photos extensions, that enables quick scanning of a document or text without leaving the conversation. Full support for iPad multitasking and drag & drop is also offered.

Continuity and iCloud were previously introduced, and they are still there in this new release. In particular, the Mac version of Prizmo is updated to be able to handle the document format changes of Prizmo 5 for iOS. Even though the Mac does not yet support all the new features of Prizmo 5, the updated Mac version will still be able to interoperate with iOS on shared documents and preserve the new processing options.

Prizmo also provides new Siri Shortcuts to initiate document or business card scanning through Siri. Handy. And, as mentioned previously, the callback scheme also makes it fully compatible with Apple’s Shortcuts app to automate document processing.

Finally, Prizmo is also a background audio player that will read scanned text on your stereo, smart speaker, AirPods, or in the car.

We’re constantly monitoring Apple’s new developments, and if there are OS features you’d like to have that we currently don’t offer, feel free to reach out (here).

Business Model(s)

Prizmo’s business model comes as 2 separate apps: the regular one, which is free with in-app purchase options, and the Volume Edition, a pre-paid app targeted toward volume purchasers (enterprise and education).

The Volume Edition (link) is made specifically to address the issue that in-app purchases (IAPs) can’t currently be sold on the VPP store, and we’d advise against getting that app if you’re a standard user, as this version possibly won’t get all future upgrade offers due to limitations of paid apps. Also, at this time, we don’t offer the Cloud OCR feature in the Volume Edition.

As for the regular version of Prizmo (link), as you’ve probably noticed by now, it comes as a free app. That is, Prizmo is free to download. We made that change for 3 reasons: it makes it possible to try the app before buying, it enables upgrade pricing for existing customers, and it gives us more flexibility to deploy future features.

We believe it is important to be able to try out and compare apps before committing to one. This is especially true for deeper and possibly more expensive apps where you want to avoid bad surprises. The free download lets you do just that, try the various features of the app and its user experience, and determine if it suits your workflow. It comes with limitations (text access, watermark) that are removed when you get the in-app purchase.

Finally, Prizmo also offers a subscription option, specifically for the Cloud OCR feature. Some of our users do need an OCR that offers better accuracy, in particular with specific languages. We went with a subscription because this feature has ongoing costs on our side (server / pay-per-use), and this was the fairest and most sustainable approach we could find, while still offering uncompromised results & accuracy. Icing on the cake: Cloud OCR is able to handle handwritten documents in English, which in turn makes it possible to make searchable PDFs from manuscripts or annotated texts. Cool, isn’t it?

Thank you for reading!