Hydra 2: AI-powered camera app that reinvents photography.

Major update of our flagship camera app.

After multiple years in the making, today we are releasing Hydra 2, our next-generation camera app for iPhone & iPad. Here you'll find some background on the making of this project, and also on the wider perspective we're going after. Have a seat, grab a cup of coffee, and here we go.

Photography, reinvented.

Let's get to the point. Today we are announcing 3 products:

1. a unique camera experience as a fully-featured photo app with both automatic and manual control including normal & AI-based shooting modes.

2. a high-fidelity photo editor spanning from precise colorimetric rendering to boosted high-contrast & personalized photos to suit your liking or particular intent of the moment.

3. an evolutive AI framework for photographic processing encompassing platforms & brands, device aging and latest camera hardware innovations.

A camera, an editor, an AI photo framework. A camera, an editor, an AI photo framework. Are you getting it? 😇

Hydra brings a distinct and exciting new route for photographic innovation on iPhone & iPad (and more), with continuous and inexpensive (or even free) upgrades, with unique new features year after year, and best of all, with new software capabilities that add to the latest camera hardware enhancements.

On portability, we think it would be awesome to have a reliable photographic software stack available across the diversity of devices you use, like DSLRs and other phone or tablet brands. iPhones are often said to make better photos than DSLRs, and that's true when lighting conditions are good; the reason is that the post-processing software (denoising, tone mapping…) is much more advanced than on regular cameras. Wouldn't it be great to edit ProRAW-style shots out of your Canon or Nikon camera? Over time, we'd sure like to bring this type of advanced processing to all kinds of cameras, and take simultaneous advantage of larger sensors *and* photo enhancement.

That's the path we invite you to follow with us, and this new version of Hydra is the first step towards that broader goal of an independent software stack that goes from the sensor to the final picture.

Origins of the name "Hydra"

At the start, "Hydra" was just a pun on the letters HDR (HDR stands for high dynamic range, a photographic technique to capture both dark & bright areas of a scene), which is what Hydra was all about back in the 2000s on Mac: fusing multiple exposures shot with a DSLR camera. Then, with its arrival on iOS in January 2015, the name made even more sense with the photo fusion technique (hydra = many heads) that it pioneered back in the day, which requires grabbing several photos from small sensors, and now with the multi-faceted techniques it uses to push quality forward. Today, Hydra is taking a new turn, as its mythological fighting character is here more than ever, although we hope it will remain gentle & useful to users.

Refined camera experience

When going through the design process of Hydra 2, we knew from the start that we would expose more camera control than in the previous version. Indeed, manual control was a highly requested feature. Yet we still wanted the app to be easy to use and approachable, something anyone could use without asking "what does this button do?", as is often the case when you go with fancy ideas: they might look very good & cool and even be sound once you learn them, but we really wanted to remain true to our intuition of both keeping it simple and not adding cognitive load for the sake of design ego.

We kept Hydra's signature: the big orange button that makes the main action obvious & easy to hit. We created a dual interface (auto and manual shooting modes), just like a regular camera. The automatic mode takes care of most things for you while still giving you some useful control like exposure control, and the manual mode lets you interact with all camera settings like ISO, focus, white balance, etc. This split made a lot of sense, as fine control is only needed occasionally, depending on conditions or on user preference (pros might spend more time in there). With this dual mode, we felt very confident that everything was in its logical place, even for first-time users. We streamlined camera options by cleaning up the top bar, removing most of the visual clutter to keep only A/M (auto/manual), exposure bias, and possibly flash. We moved the other options into a dedicated panel, which also gives more room for descriptions when needed.

Another goal for this design process was to make it not just look but also feel like a real camera. We had to build something more tactile than what we did previously; it had to feel light and at home when shooting, turning an app into trusty shooting hardware. The first step to achieve that was using the whole viewfinder surface as a trackpad for camera control. We call it the camera pad internally. Swiping up or down from anywhere in the preview adjusts exposure. In manual mode, swiping horizontally changes the current setting (ISO, Focus…) and the vertical axis is used for secondary settings. We added vibration feedback for every stop (doubling/halving of gathered input light), just like real DSLR cameras have clicking dials for those, and made sure that the real-life tactile distance would match for all settings: ISO, Speed, and EV Bias all have the same physical swipe distance & vibration for stops. 12 (visual) subdivisions are used to enable both third and half stops. Consistency across these behaviors builds confidence when using the app.

1-stop (light doubling) physical sliding distance is the same horizontally & vertically for all settings (exposure compensation, ISO, Speed) and provides the same haptic feedback.
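As a rough illustration of the uniform stop spacing described above, here is a minimal Python sketch. The swipe distance per stop (`POINTS_PER_STOP`) and all function names are hypothetical, not Hydra's actual values or code. The only facts taken from the text are that one stop doubles or halves the light, that every setting shares the same swipe distance per stop, and that 12 subdivisions accommodate both third and half stops.

```python
import math
from fractions import Fraction

POINTS_PER_STOP = 60   # hypothetical: swipe distance (screen points) for one stop
SUBDIVISIONS = 12      # from the text: 12 ticks per stop cover 1/3 and 1/2 stops

def swipe_to_stops(distance_pts, step=Fraction(1, 3)):
    """Convert a swipe distance to a stop offset, snapped to multiples of `step`."""
    raw = Fraction(distance_pts) / POINTS_PER_STOP
    return round(raw / step) * step

def apply_stops(value, stops):
    """Scale a setting (ISO, shutter time...) by 2**stops: +1 stop doubles it."""
    return value * 2.0 ** float(stops)

def haptic_ticks(before, after):
    """Count whole-stop boundaries crossed between two offsets (one tick each)."""
    return abs(math.floor(after) - math.floor(before))

# Third and half stops both land on the 12-tick grid (4 and 6 ticks).
ticks_third = Fraction(1, 3) * SUBDIVISIONS
ticks_half = Fraction(1, 2) * SUBDIVISIONS
```

Because the same conversion is used for every setting, a swipe of a given length always means the same change in light, whichever setting is active.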

The built-in camera shooting mode was also added to Hydra. At stake, and we discussed that a lot internally at the start, was whether it would make sense to try to bring the standard iPhone shooting capabilities into the app or just keep Hydra for niche situations. Because Hydra 2 would have manual camera control (that the default camera does not expose), and also because we wanted to somehow remove this modal thinking from the user (which app should I use), we went with integration of the built-in camera as the "normal" mode. It later made even more sense when ProRAW was introduced, as we could integrate that into our built-in editor combining the various imaging operators. It is also a path for future evolution, as camera hardware has so many intricate options that we could reveal for that particular mode.

iPad was also in our viewfinder (or is it the other way around?). Camera hardware is also evolving on iPad, although possibly a bit behind iPhone, and we see people use it to shoot photos and videos. You shoot with what you currently have with you, that's how photography works, isn't it? Moreover, iPad is the most powerful device of the mobile line and certainly is a workhorse in terms of machine learning. We focused mainly on having a great shooting experience, moving buttons according to how the device is held while shooting, and making better use of the large display. We know there is still some work to do for iPad (improved support for the left-handed, better photo editor layout), but we are nonetheless very happy with the result for the initial release of the new design, and happy to offer a full-blown version just for iPad users.

Technically, we moved to SwiftUI for many parts of the app, with some legacy parts remaining in UIKit. The viewfinder is mostly SwiftUI, as are the top bar, the camera settings, manual controls, and most panels displayed in the app. We can't say the transition was easy, especially with our constraint of supporting both iOS 14 and iOS 15, as lots of things break or require specific workarounds. We also had unexpected CPU consumption linked to the way information flows into the code, and we needed to experiment and learn how to handle these situations using Combine (it was interesting in the end). SwiftUI also brought a number of advantages like easy interruptible animations, absence of all "unsynced" UI bugs, and faster iteration when things are in place. We see this as an investment for the future to learn it and use it now, but it can be challenging at times when you want to design specific appearances on the edge of its current capabilities.

Hydra 2 shooting modes

From its initial release years ago, Hydra's goal was to enable shooting in unusual or difficult conditions. We've pushed that concept further with the help of AI in this new release, but we also added the "Normal" mode for shooting with integrated photo capabilities of the iPhone. This makes it convenient for both use cases (standard and unusual conditions) within a single app.

The AI-powered modes currently cover 3 main categories: HDR (for high dynamic range, scenes that combine very bright and very dark areas), denoising for poorly lit scenes (Lo-light), and resolution augmentation (Zoom, Macro, Hi-res). All modes perform integrated fusion, AI enhancement, and demosaicing of multiple RAW input photos.

Hydra shooting modes (right side) compared to input (left side). From left to right: HDR, Lo-light, Zoom.

HDR mode, together with its tone mapper that compresses the photo back to display dynamics, is best used for scenes with bright light and strong contrast, like architecture, scenery, and indoor-outdoor combos. The new tone mapper of Hydra 2 reduces or even suppresses halo banding versus previous versions, and its results range from subtle contrast enhancement to a pretty extreme look. Another improvement versus the previous version is the much shorter shooting time (0.5s vs 10s), leading to fewer artifacts. HDR is a very creative mode, especially with the exposed tone mapper settings of the built-in editor.
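To give an idea of what "compressing the photo back to display dynamics" means, here is a minimal Python sketch using the classic global Reinhard operator. This is a textbook stand-in chosen for its simplicity, not Hydra's tone mapper, and the sample values are made up.

```python
def reinhard(luminance):
    """Map linear scene luminance in [0, inf) to the display range [0, 1)."""
    if luminance < 0:
        raise ValueError("luminance must be non-negative")
    return luminance / (1.0 + luminance)

# Made-up linear luminances spanning five orders of magnitude.
scene = [0.01, 0.5, 1.0, 10.0, 1000.0]
display = [reinhard(v) for v in scene]   # every value now fits below 1.0
```

Dark values are left almost untouched while bright ones are squeezed more and more, which is the basic trade-off any tone mapper has to manage.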

Lo-light is, as its name suggests, a mode to enhance photo quality under dim lighting. This is clearly the Achilles heel of small sensors / mobile phones. Denoising while preserving details in photos is a complex thing, and AI brings some additional capabilities to the rescue. Quality has improved a lot compared to previous versions of Hydra, and motion-related artifacts are also better handled, which matters when you consider that a single shot can span 1 or 2 seconds.
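Why shooting several frames helps in dim light can be shown with a minimal Python sketch. It uses plain averaging of perfectly aligned frames of a static scene, which is a deliberate simplification and not Hydra's fusion or neural denoising; the signal level, noise level, and frame count are made-up values.

```python
import random
import statistics

def noisy_frame(signal, sigma, rng):
    """Simulate one capture: the true signal plus Gaussian sensor noise."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

def fuse(frames):
    """Average aligned frames pixel by pixel."""
    return [sum(px) / len(frames) for px in zip(*frames)]

def noise_std(frame, signal):
    """Standard deviation of the error against the known true signal."""
    return statistics.pstdev([f - s for f, s in zip(frame, signal)])

rng = random.Random(0)                 # fixed seed for repeatability
signal = [0.2] * 4000                  # made-up flat, dim scene
single = noisy_frame(signal, 0.05, rng)
fused = fuse([noisy_frame(signal, 0.05, rng) for _ in range(16)])
```

With independent noise, averaging N frames divides the noise standard deviation by roughly the square root of N, so 16 frames give about a 4x reduction. Real scenes move during the shot, which is where alignment and artifact handling come in.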

Resolution augmentation modes like Hi-res, Zoom, and Macro share some common foundation and try to recreate higher resolutions both from motion & multiple samples and from prior knowledge. It works best for well lit scenes (as otherwise, noise gets in the way), and will typically enable substantial improvements when compared to traditional digital zooming. It won't replace a real sensor, but it's still very valuable. Contrasted edges and textures typically are well handled in these modes.

All these modes are driven with neural networks, and their results can be tuned with the "Neural style" and "Neural computation" options. The former lets you set the desired appearance and will tune network parameters accordingly including training style (adversarial or not), while the latter lets you choose the architecture that is used, simple or complex, to respectively favor speed or quality.

Final note on modes: this is the current proposal for specialty shooting in Hydra. We are experimenting with other shooting modes and have a number of ideas for widening the applications of the app, so let's discuss that again later.

A note on photo quality & custom look

Often on social networks, we see people discussing photo rendering, comparing the latest iPhone 13 Pro vs the Samsung 13 Pro (or whatever). Let's be honest for a second here: cameras are complex hardware & now software, with so many parameters that can be tuned.

"Contrast is better on the iPhone"

"Colors are brighter on the Samsung"

"Sharpness is better on the Google"

Showing 2 photos and asking users can go like this: they see the labels under the photos (Apple, Samsung), and some would say "I like Apple better". Swap the labels, and the same people would still say Apple (which says more about how much they like the brand than about photo quality). Or the other way around. Or, with no labels, you could prefer the first one, but when shown the actual scene conditions, you switch your judgment to the other, finding the former unnatural. It's all very subjective. Yet at the same time, this does not mean there are no facts in there.

There is the inherent quality of the device / software stack, for sure. But there are also the default settings chosen by the manufacturer. If one photo is faithful to reality, and the other one is contrast and color boosted, many (as in most) will declare the latter much better than the former. This pulls the industry towards boosted photos.

We have tested the defaults applied in iPhone 13 Pro, and by comparing processed photos to RAW ones, which are not that different tone-wise, we noticed a pattern where both tone mapping and contrast (boost) have increased over the years compared to older iPhones. This produces a nice image, but it sometimes goes beyond reality depending on lighting conditions. This is certainly a trend followed by the entire industry: if one maker increases contrast, the others have to do it too to some extent, otherwise their photos will look dull. Same thing for TVs. Hydra's defaults try to be balanced, and they can be tested (editor) and changed if needed.

It really is a matter of taste, intent, and purpose, among other things. This is why customizable defaults and editing capabilities are important, as taste and intent are not absolute. They vary from person to person, from moment to moment, from scene to scene, etc. You get it.

An evolutive AI framework for photography

One of our goals was to explore photography outside of the predefined photo sandbox offered by the Apple APIs. Apple's stack is remarkable in many ways, offering great photo quality and camera control, although all photo apps using them will end up with similar photos & the same pros and cons as Apple's built-in camera app.

Instead, what we were after here was going directly from raw sensor data and offering the whole processing chain. Think of an operating system for photography that brings it all: demosaicing, denoising, fusion, enhancement, post-processing, encoding. Take control of every step needed to create the final image, in order to give back user customization, algorithm experimentation, or in short, a diversity of processing capabilities.

While implementing this framework, we noticed that AI could serve to remove the usual artifacts of photos like quantization and limited dynamic range, which in turn could be used to offer more accurate 32-bit editing. This allows re-exposing photos, adjusting brightness and contrast, etc. over unusually broad ranges of values that would have destroyed images under standard editing conditions. Expressivity & creativity are increased with 32-bit instant editing that works directly on the hi-fi image before storage.
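The benefit of editing in floating point rather than on 8-bit values can be seen with a minimal Python sketch: darken a gradient by 4 stops, then brighten it back. The helper names and the 4-stop round trip are illustrative assumptions, not Hydra's editing pipeline.

```python
def quantize8(x):
    """Clamp to [0, 1] and round to the nearest of 256 levels, as 8-bit storage does."""
    return round(min(max(x, 0.0), 1.0) * 255) / 255

def reexpose(x, stops):
    """Change exposure by a number of stops (each stop is a factor of 2)."""
    return x * 2.0 ** stops

gradient = [i / 255 for i in range(256)]

# Floating-point path: the round trip is lossless.
float_path = [reexpose(reexpose(v, -4), 4) for v in gradient]

# 8-bit path: each step is stored on 256 levels, so most tones collapse.
int_path = [quantize8(reexpose(quantize8(reexpose(v, -4)), 4)) for v in gradient]

levels_float = len(set(float_path))
levels_8bit = len(set(int_path))
```

Comparing `levels_float` and `levels_8bit` shows how many distinct tones survive each path. The floating-point gradient comes back intact, while the 8-bit one is reduced to a small number of bands.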

Note that this currently means editing must be done immediately. With HDR, we already enabled 32-bit lightmap exports to allow further editing after shooting, or even applying future tone mappers with better capabilities than the current ones. We'll also be exploring ways of delaying the processing for non-HDR photos at some point, as we understand immediate editing is not always practical.

As for file formats, we went with the usual ones of course (JPEG, HEIC, DNG) and extended them with Apple ProRAW on compatible devices. As just said, HDR now allows 32-bit lightmap exports too, in specific formats (OpenEXR, Radiance), for later/alternate tone mapping. We also added the experimental JPEG XL format, which exposes very interesting capabilities like 32-bit floats, masks, and great compression, possibly beating HEIC and ProRAW. We wanted to onboard it early because it's great, and also in case it emerges as a valuable tool for photographers.

By cross-platform, do you mean Android?

Yes, we do. But not only. Think of all digital cameras, from DSLRs to Raspberry Pi, through Android phones. Haven't you sometimes thought that the iPhone takes better photos than your DSLR? With good lighting, this is indeed true because of HDR and other post-processing. We'd like to bring that tooling more broadly, and bring in some new options, like you have for music or series for instance.

Android is very popular, in particular here in Europe but also in many other places. And it's motivating to reach a large audience when making products. Android phones have a great diversity of camera capabilities, which makes Hydra even more interesting there, and on high-end devices, the camera API is very interesting and enables rich use cases, as it gives low-level access to the camera itself.

If you are interested in a version of Hydra for a particular hardware (camera, phone, other), feel free to reach out, as this helps us sense the need for it.

Artificial Intelligence: the why's.

The question that first comes to mind when talking about artificial intelligence is what is made up versus what is real. Examples of fake imagery are everywhere, whereas previously, with good old JPEGs shot with a digital camera, "everything was real".

As usual, this is a little more complicated. Traditionally, when shooting with a digital camera, the sensor gets a partial view of the scene. That is, after light goes through the optics machinery, it is sampled at regular spatial intervals named photo sites. Each photo site location corresponds to 1 output pixel, yet each receives only one of the required colors (red, green, or blue). In a typical block of 4, each photo site samples a single color (say R, G, G, B) through the color filter array (CFA), while the output pixels need to have all colors at each location (RGB, RGB, RGB, RGB). This means that 2/3 of the information is missing from the sampled data, as a 12MP output photo has 12M photo sites but needs 36M color samples. The photo site samples are also corrupted by noise (photon/electron quantization, thermal sensor noise, optical point-spread function according to color wavelength, etc.), which makes the process of filling in missing values even more challenging.
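The sampling described above can be sketched in a few lines of Python. This assumes the RGGB layout given as an example in the text; the function names are illustrative only.

```python
from fractions import Fraction

def cfa_color(row, col):
    """Color seen by a photo site in an RGGB pattern (0 = R, 1 = G, 2 = B)."""
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2

def mosaic(rgb):
    """Keep a single color sample per photo site from a full RGB image."""
    return [[px[cfa_color(r, c)] for c, px in enumerate(line)]
            for r, line in enumerate(rgb)]

def missing_fraction(height, width):
    """Share of the output color values that the sensor never measured."""
    sampled = height * width        # one sample per photo site
    needed = 3 * height * width     # R, G and B at every output pixel
    return Fraction(needed - sampled, needed)

# A 12MP sensor (4000 x 3000 photo sites) measures 12M of the 36M values needed.
twelve_mp_missing = missing_fraction(3000, 4000)
```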

Color Filter Array (CFA) / Bayer pattern. Image by Colin M.L. Burnett (CC BY-SA 3.0)

The missing signal is typically addressed by a process called demosaicing (or debayering) that fills the holes using prior knowledge and/or correlation between color channels. Note that it can fail (differ from reality) if the missing values do not match the hypotheses being used. A number of demosaicing techniques exist, and demosaicing is often combined with denoising, as you don't want to interpolate (too much) noise into the missing values.
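For reference, here is a minimal Python sketch of about the simplest demosaicing there is: bilinear-style interpolation, where each missing color is the average of the nearest samples of that color. It assumes an RGGB layout and an image of at least 2x2 photo sites, and it is shown only to make the idea concrete; it is not what Hydra does.

```python
def cfa_color(row, col):
    """Color seen by a photo site in an RGGB pattern (0 = R, 1 = G, 2 = B)."""
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2

def demosaic(raw):
    """Rebuild an RGB image from single-color samples (image must be >= 2x2)."""
    height, width = len(raw), len(raw[0])
    out = []
    for r in range(height):
        line = []
        for c in range(width):
            pixel = []
            for ch in range(3):
                if cfa_color(r, c) == ch:
                    pixel.append(float(raw[r][c]))   # measured value: keep it
                    continue
                # Missing value: average same-color samples in the 3x3 window.
                near = [raw[rr][cc]
                        for rr in range(max(r - 1, 0), min(r + 2, height))
                        for cc in range(max(c - 1, 0), min(c + 2, width))
                        if cfa_color(rr, cc) == ch]
                pixel.append(sum(near) / len(near))
            line.append(tuple(pixel))
        out.append(line)
    return out
```

The hypothesis here is that color varies smoothly between neighbors. It holds on flat areas and breaks on sharp edges and fine textures, which is where color fringing and zipper artifacts come from.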

Related to this problem is super resolution (pixel dimension increase), where we want to estimate even more missing values between the (low-pass) sampled ones, which is even harder as we need to derive even more data (16 RGB pixels from a 4-photo-site RGGB quadruplet at 2x upscaling).
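The counting behind this can be written as a small Python helper (the function name is illustrative): it gives the share of output values that were actually measured by the sensor for a given upscaling factor.

```python
from fractions import Fraction

def known_fraction(scale):
    """Measured share of the output values for a Bayer sensor upscaled by `scale`."""
    samples = 4                          # one RGGB quadruplet: 4 photo sites
    outputs = 3 * 4 * scale * scale      # RGB values for the pixels it must produce
    return Fraction(samples, outputs)

at_native = known_fraction(1)   # plain demosaicing: 4 samples for 12 values
at_2x = known_fraction(2)       # 2x upscaling: 4 samples for 16 RGB pixels
```

Everything that is not measured has to be estimated, which is why the difficulty grows so quickly with the upscaling factor.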

Hydra 2 neural results @ 2x upscaling (adversarial): Reference input (1st row), ground truth (2nd row), neural output (3rd row).

Traditional demosaicing and super resolution have gone to great lengths over the years to provide good solutions to these problems, but not without flaws. Traditional image processing techniques need to make hypotheses that provide excellent results when met, and that can generate artifacts and ruin images when not met. The modeling of these hypotheses can be complex (yet incomplete), is hard to translate into memory- and compute-efficient algorithms written by hand, and is hard to fine-tune or improve without redesigning from scratch. Yet these techniques provide control, as we can precisely hand-tune behaviors through thresholds, for instance.

This is where artificial intelligence comes in. Filling in missing information from the surrounding data and prior knowledge is a pretty natural task here. Assuming you can teach it through enough representative examples, an AI technique can be thought of as picking the best estimate from what it sees and what it knows. It is a bit like evaluating a ton of hypotheses (without naming them) and finding the optimum.

Artificial Intelligence in Hydra 2

2015's Hydra 1.0 had its challenges. It did not make use of AI, and its main purposes were image-fusion-based denoising, HDR tone mapping, and (classical) super resolution of 32MP from 8MP sensors. All this on 32-bit devices, using limited-accuracy OpenGL for GPU computing, and with limited computing power and memory. The techniques were as described above, and I often described the super resolution one by picturing the phone as a brush with straight bristles that samples adjacent colors (1cm apart), let's say 10 meters away, on photo 1, then again with some offset (think 0.32cm) on photo 2, etc., and then combines all that info through subpixel motion estimation into a high-res photo.

We could have ported this technique to 64-bit and Metal to modernize it, but the room for improvement in traditional image processing was marginal at best over the existing implementation, yet highly demanding in development effort. This is why we started to explore AI possibilities about 4 years ago. The goal was mainly two-fold: 1. Increase the quality & capabilities when compared to traditional techniques, 2. Create an AI framework (training + runtime) that would allow future exploration of newer techniques.

So we started from scratch: on the training side, we had to learn TensorFlow 2, define a physical model for photography, figure out inputs / outputs for AI-based techniques, and determine possible neural architectures and training data & schemes. We had to see for ourselves what works and what doesn't, both through quantitative measures and visual quality assessment after training. It was not unlike terraforming a new planet for us (well, I guess). Many tools exist with their pros and cons, but we, as Apple developer maniacs, had to come up with the tooling that would allow sufficient productivity to achieve our goals. It takes significant research time & experimentation, scripting, remote commanding of Linux computing boxes from a Mac, etc., but it ends up being a valuable investment afterwards, when things get rolling & quick iterations are needed. Isn't machine learning the future of all things, after all?

Pretty intense Hydra 2 training script running on multiple GPUs.

On the runtime side (iPhone), we had to carefully implement neural network execution on top of Core ML & Metal. We wanted to support all iPhones from the 6s to the latest (13 Pro at the time of writing), which can be challenging (think of it: a 110MP floating-point image takes up 1.8GB of memory that most iPhones just don't have). We split the neural architectures into 2 versions, as older phones don't have the Apple Neural Engine (ANE) in their silicon (just a standard GPU) and can't execute the more complex ones in a reasonable time. Also, because we are using specific tools in our network architectures, and because it's a fairly new domain as well, we had to write our own custom TensorFlow to Core ML converter.
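The memory figure quoted above can be checked with a quick Python calculation. The text does not say how the image is laid out; assuming 4 channels of 32-bit floats per pixel (an assumption on our side) reproduces the quoted number.

```python
def image_bytes(megapixels, channels=4, bytes_per_sample=4):
    """Memory needed for an uncompressed image (assumed layout: 4 x float32)."""
    return int(megapixels * 1_000_000) * channels * bytes_per_sample

size_gb = image_bytes(110) / 1e9                    # assumed 4-channel layout
size_gb_rgb_only = image_bytes(110, channels=3) / 1e9
```

With 4 channels this comes to 1.76 GB, which rounds to the 1.8GB mentioned; a 3-channel layout would need 1.32 GB. Either way it is more than an older iPhone can give to a single buffer, so large images have to be processed in pieces.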

To be honest, with our limited resources, through the years, we have only scratched the surface to reach a reasonable outcome, all the time focusing on shipping. There's so much yet to be done. It is far from perfect, and we know very well where we should put more work into it. Yet we're very proud of this first step, as it opens so many doors. It feels very much like we can go anywhere from here. Quoting Robert A. Heinlein, “Once you get to Earth orbit, you’re halfway to anywhere in the solar system”.

What's Next?

It's been a long project for us, during which we had a number of team achievements. We've been transitioning many things over the last few years: Obj-C to Swift, UIKit to SwiftUI, OpenGL to Metal, classical image processing to machine learning, office to home working, an Apple-only mindset to cross-platform plans. The SwiftUI work in particular was not easy because it is new technology: iOS 14/15 workarounds were needed, as well as bridges to the existing Obj-C foundations. TensorFlow 2 and Core ML also needed a lot of investment to master.

We sure need to stabilize the app & fine-tune the photo rendering techniques & image quality to get robust foundations. There are a number of things we'd like to explore, like additional new modes, and also the capability of unifying high dynamic range & standard photo fusion into a single scheme where they currently operate separately. As mentioned previously, an Android version will be developed too. We are not forgetting DSLRs & other cameras, we just need to evaluate the best approach for them. Of course, feedback is important. For instance, we discovered pretty late in development that some kinds of scene lighting could cause artifacts, and addressing those required retraining the whole thing with improved loss functions. We need to iron out these cases through specific feedback to hopefully make the app better & more robust over time.

All in all, Hydra may sound like an ambitious project, and we sure have big hopes for it. At the same time, we're really exploring the new ways of computational photography & learning as we go. All we propose here is for you to join us in this new adventure & discover together where that brings us, capturing beautiful pictures & memories along the way.

Thank you for reading.