It’s been a while since you last heard from us, but rest assured we have not been twiddling our thumbs! Indeed, two weeks ago, on April 27th, we released our brand-new app, Prizmo Go.
If you haven’t seen Federico Viticci’s review of Prizmo Go on MacStories, please have a look: it goes in depth about the app. We are all very proud here to “have gained a permanent spot” on Federico’s iPhone. Quite an achievement!
Also, we’ve already discovered a number of podcast episodes from the accessibility community that we’d like to share here:
- AppleVis: https://www.applevis.com/podcast/episodes/get-ocr-results-go-prizmo-go
- MacAccessibility (M12y): http://maccessibility.net/tagged-podcast.html
- Blind Abilities: https://blindabilities.com/bateam/jeff-thompson/prizmo-go-ocr-app-with-voiceover-accessibility/
Below, we’ll go through the main features of the app and offer a glimpse of some related technical bits.
Instant Text Capture
Prizmo Go lets you take a picture of printed text, with real-time text highlighting in the camera preview, and recognises it in a split second. You can then interact with the text, copy it to other apps, or send it to your Mac. Say goodbye to tedious retyping. It feels like magic!
The real-time highlighting combines text line detection with the kind of motion tracking that is common in augmented reality, making the highlighted lines stick to the text in the live video feed. Running this on the device consumes a fair amount of power because the CPU, GPU, and camera all run at the same time. We noticed this during development and decided to add a “Low Power” mode for when the battery is running low or when you want to scan lots of text. This mode removes the visual feedback but does not change the OCR output.
The stabilized shooting mode also brings its share of innovation. A sharp image is what yields the best text recognition results, and we really wanted to improve that part. In our previous apps, we tracked the accelerometer to determine when the device stopped moving. Not bad, but this was a very rough proxy for image quality because of sensor delays, sudden and unpredictable hand shake, and so on. In Prizmo Go, we instead continuously track the sharpness of the video feed and pick the very best frame. That came with a number of implementation complications, but in the end it turned out to be effective.
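As a rough illustration of the idea, here is a minimal sketch of a best-frame picker. It uses the variance of a Laplacian filter as the sharpness score, a common choice for blur estimation; `Frame`, `BestFrameTracker`, and the CPU-side computation are our own simplifications for this post, not the shipping implementation, which works on live camera buffers.

```swift
import Foundation

// Simplified grayscale frame: row-major pixel intensities in 0...255.
struct Frame {
    let width: Int, height: Int
    let pixels: [Double]
}

// Sharpness score: variance of the 4-neighbor Laplacian response.
// Blurry frames have smooth gradients, hence low variance.
func sharpness(of frame: Frame) -> Double {
    var responses: [Double] = []
    for y in 1..<(frame.height - 1) {
        for x in 1..<(frame.width - 1) {
            let i = y * frame.width + x
            let lap = frame.pixels[i - 1] + frame.pixels[i + 1]
                    + frame.pixels[i - frame.width] + frame.pixels[i + frame.width]
                    - 4 * frame.pixels[i]
            responses.append(lap)
        }
    }
    guard !responses.isEmpty else { return 0 }
    let mean = responses.reduce(0, +) / Double(responses.count)
    return responses.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(responses.count)
}

// Keeps the sharpest frame seen so far in a capture session.
final class BestFrameTracker {
    private(set) var bestFrame: Frame?
    private(set) var bestScore = -Double.infinity

    func ingest(_ frame: Frame) {
        let score = sharpness(of: frame)
        if score > bestScore {
            bestScore = score
            bestFrame = frame
        }
    }
}
```

Feeding every preview frame through `ingest` means the final shot is simply whichever frame scored highest, with no need to guess when the device has settled.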
Rich Interactions with the Captured Text
You can swipe your finger to select parts of the text directly on the picture, and that text can then be copied or read aloud. Other interactions include tapping to open a printed website address, calling phone numbers, composing an email from a printed address, and many more.
As you would expect, this is implemented using iOS’s 3D Touch Peek and Pop together with Data Detectors. It is a great illustration of building upon OS features to provide complementary capabilities that make a lot of sense in this particular context.
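The tap-to-act logic can be sketched along these lines. In practice this job belongs to Foundation’s `NSDataDetector`; to keep the example self-contained we use plain regular expressions as a stand-in, and `TextAction` and the patterns below are our own illustrative names.

```swift
import Foundation

// What a tap on recognized text can trigger.
enum TextAction: Equatable {
    case openURL(String)
    case call(String)
    case email(String)
}

// Toy detector: NSDataDetector would normally do this job; plain
// regular expressions keep the sketch self-contained.
func actions(in text: String) -> [TextAction] {
    let patterns: [(String, (String) -> TextAction)] = [
        ("https?://[^\\s]+", TextAction.openURL),
        ("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}", TextAction.email),
        ("\\+?[0-9][0-9 ()-]{6,}[0-9]", TextAction.call),
    ]
    var result: [TextAction] = []
    for (pattern, make) in patterns {
        let regex = try! NSRegularExpression(pattern: pattern)
        let range = NSRange(text.startIndex..., in: text)
        for match in regex.matches(in: text, range: range) {
            if let r = Range(match.range, in: text) {
                result.append(make(String(text[r])))
            }
        }
    }
    return result
}
```

Each `TextAction` would then map to the appropriate system call (opening Safari, the dialer, or Mail) when the user taps or presses on the recognized text.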
Copy/paste across devices, Universal Clipboard, was introduced with iOS 10 and macOS Sierra. It’s a great feature, and, together with Handoff and iCloud, it seamlessly bridges separate devices. I remember when Universal Clipboard was announced at WWDC’16; I was there with my colleague @BunuelCuboSoto inside the Bill Graham Civic Auditorium, and this single feature is actually the one at the origin of Prizmo Go’s concept.
Accessibility & VoiceOver
Prizmo Go comes with enhancements specially built for VoiceOver users. We wanted to create an app that could become a trusted companion in case you need help reading printed documents. We hope it will improve the lives of visually impaired and blind users on a daily basis.
We had prior experience developing for VoiceOver with the standard version of Prizmo. This new concept of quickly and accurately capturing text had even more potential for vision accessibility, thanks to its focused use case, and we took the opportunity to further improve our techniques.
For instance, Prizmo Go has built-in capabilities for determining the text configuration in the picture. When used with VoiceOver, we go beyond the standard mechanism: Prizmo Go tests additional text pose hypotheses that would be unconventional for sighted users, but make sense when framing the text precisely is difficult.
Prizmo Go also triggers automatic audio playback of the text when VoiceOver is active, and the entire text is automatically selected. The text panel is maximized to give more space for interacting with the text in that use case. Finally, we enabled the same energy optimization that we use in Low Power mode, because we expect VoiceOver users to run longer scanning sessions, and the experience is better if the battery lasts longer.
OCR Engines
The app features an advanced image processing pipeline (cleanup, perspective correction, dynamic rescale). New in Prizmo Go is a Cloud OCR option, in addition to the built-in engine.
This Cloud OCR option runs on remote servers, as you would expect; it is built on Microsoft Cognitive Services. The good news is that it’s very accurate and offers 22 languages at this time, including Chinese, Japanese, Korean, and Arabic! This is the first time a Prizmo app can handle non-Latin languages, and we took this opportunity to make sure the whole app UI behaves correctly in right-to-left languages. Another difficulty was handling Japanese input, because it can be written horizontally (left to right) or vertically (top to bottom, with columns ordered right to left), and we had to implement specific parsing here. Note that the OCR engine still cannot handle both directions in the same scan, but we certainly hope for a service update that will allow that.
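Conceptually, calling such a cloud OCR service boils down to an authenticated HTTP request with the image bytes as the body. Here is a minimal sketch against Microsoft’s publicly documented Computer Vision OCR endpoint (the v1.0 REST API); the region, key, and query parameters are illustrative placeholders, not a description of our actual integration.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Builds a request for Microsoft's Computer Vision OCR endpoint
// (v1.0 REST API; region, key, and parameters are placeholders).
func makeOCRRequest(imageData: Data, subscriptionKey: String,
                    region: String = "westus") -> URLRequest {
    var components = URLComponents(string: "https://\(region).api.cognitive.microsoft.com/vision/v1.0/ocr")!
    components.queryItems = [
        URLQueryItem(name: "language", value: "unk"),  // let the service detect the language
        URLQueryItem(name: "detectOrientation", value: "true"),
    ]
    var request = URLRequest(url: components.url!)
    request.httpMethod = "POST"
    request.setValue(subscriptionKey, forHTTPHeaderField: "Ocp-Apim-Subscription-Key")
    request.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")
    request.httpBody = imageData  // raw JPEG/PNG bytes
    return request
}
```

The service replies with JSON describing regions, lines, and words along with their bounding boxes, which the app can then map back onto the captured image.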
The reliable built-in OCR supports 10 languages and does not require an internet connection. It’s actually the same OCR engine as in Prizmo, which we tuned further to automatically handle unaligned text. Another improvement is automatic contrast type selection: bright text on a dark background, or dark text on a bright one, is now detected automatically throughout the document, where it was previously necessary to pick the appropriate mode manually. The document language can also be determined automatically from the text contents in this update.
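A simple way to approximate that contrast detection: since ink covers far less area than paper in a typical document, split the pixels at the mean intensity and treat the minority side as the text. This is a heuristic illustration of the idea, not the shipping algorithm.

```swift
import Foundation

enum TextPolarity { case darkOnBright, brightOnDark }

// Heuristic: in a document image, ink covers far less area than paper.
// Split pixels at the mean intensity; the minority side is the text.
func detectPolarity(pixels: [Double]) -> TextPolarity {
    let mean = pixels.reduce(0, +) / Double(pixels.count)
    let brightCount = pixels.filter { $0 > mean }.count
    // More bright pixels than dark ones => bright paper, dark ink.
    return brightCount > pixels.count - brightCount ? .darkOnBright : .brightOnDark
}
```

Running this per region rather than once per image is what allows mixed documents (for instance, an inverted title bar above regular body text) to be handled without a manual mode switch.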
Business Model
Most of our apps so far have been paid apps, except Emulsio, a free app with a one-time in-app purchase to unlock its main feature. But Emulsio is more of a niche app for specific use cases, whereas Prizmo Go addresses more situations. This is the first time we try free + IAP for this kind of app. The main motivation is that users want to try before they buy, as it should be. Because the App Store does not provide trials, we implemented one through in-app purchases.
Users download the app for free and try out capture and OCR on some text. Cloud OCR can also be tested with the included free trial units. The recognized text can be viewed in the app, but to actually use it elsewhere, users need to purchase the Export Pack.
After using up the free Cloud OCR units, users can purchase additional ones in either a small pack (100 units) or a large one (1,000 units), or stay with the built-in OCR if that’s enough. Each user is responsible for their own Cloud OCR balance, which lets us cover the service costs. We also discussed a subscription option for Cloud OCR, but decided against it for now.
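In code, the unit accounting reduces to a small consumable balance. Here is a minimal in-memory sketch; the trial amount shown is a placeholder (the post doesn’t state the exact number), and a real implementation would persist the balance and credit units only after StoreKit purchase validation.

```swift
// In-memory sketch of a consumable Cloud OCR balance. Pack sizes match
// the post; the trial amount is a placeholder.
struct CloudOCRBalance {
    private(set) var units: Int

    init(trialUnits: Int = 10) { units = trialUnits }  // placeholder trial amount

    mutating func credit(packUnits: Int) { units += packUnits }

    // Returns false when the balance is empty, so the app can fall
    // back to the built-in OCR or prompt for a pack purchase.
    mutating func consumeOne() -> Bool {
        guard units > 0 else { return false }
        units -= 1
        return true
    }
}
```

Keeping the fallback decision (`consumeOne()` returning `false`) separate from the purchase flow means the built-in OCR remains usable at all times, even with an empty cloud balance.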
Going Swift
We have been working on Prizmo Go for many months, as you would expect. It’s one of our first projects where we committed to using Swift instead of Objective-C. It’s not 100% Swift, because we have many reusable bits of code that do real work and that we don’t want to rewrite now, but most of the new code is Swift. We now use Swift both at the application architecture level (Model / View / Controller) and for low-level algorithms. We also factor out utility functions and classes that can be reused in other apps.
As @benoitsan first noticed, a clear advantage of Swift has been fewer crashes. Almost no crashes at all. It was so obvious that we clearly saw it throughout the first betas (we use HockeyApp to collect our crash reports). We have about 100 crashes for a 6-figure user count (in 2 weeks!); we’ve never achieved that before. Crash proofing is a remarkable property of Swift, and it played an important role here.
The code is more expressive and there are fewer files (no headers), but navigation inside classes can sometimes be difficult. Our other difficulties were mostly the slow compiler, slow IDE completion, variable lookup in the debugger, and the lack of static libraries to organize code. Fingers crossed for Swift improvements at WWDC’17!