Recording ARKit Sessions
In this post we’ll discuss an experimental approach that can be used to record and replay AR sessions powered by ARKit on iOS.
The story so far
Unveiled at WWDC 2018 and officially released alongside iOS 12, ARKit 2.0 includes plenty of new features: world map saving, object detection and environment lighting probes, to name a few. It’s clear that Apple is serious about augmented reality, and we can expect even more from them in this area in the coming years, especially if rumours about planned hardware are to be trusted. Of course, Apple would also like third-party developers to be serious about AR, and it’s great to see many of them experimenting with ways to bring value to their customers using the new technology.
Unfortunately, one important aspect of developing for AR is still missing from Apple-provided tools: being able to capture the data collected during a live AR session, and feed it back into ARKit later to recreate the same environment, usage scenario and scene understanding as it builds up. Recording AR sessions can be useful during both development and testing: to save time while iterating on user experience, and to automate testing once a feature is complete. Of course, an experienced developer might be able to add such capabilities themselves by building an abstraction on top of ARKit and replicating its behaviour with mock data. However, such a mechanism would be quite challenging to implement and maintain, especially when taking into account integration with SceneKit and SpriteKit that’s available in ARKit out of the box.
Thankfully, that may not be necessary, as it appears that the ARKit team have already built such a tool for us to support their own automated quality assurance needs.
Disclaimer: Functionality discussed below is accessed through SPI (System Programming Interface, also known as private API). It’s not guaranteed to be reliable, or stay available in any form in future versions of ARKit. It definitely cannot be used in production versions of apps distributed on the App Store.
Ever since its first public release, ARKit has included an internal feature that allows recording the sensor data captured in a session to a file (referred to as a “sensor replay”), and later using this file to simulate a live session. This functionality and its SPI were enhanced in both ARKit 1.5 and ARKit 2.0, and at this point it’s conveniently controlled via specially constructed ARConfiguration objects that the ARSession is run with:
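```objc
#import <ARKit/ARKit.h>

// Private classes, forward-declared so the declarations below compile.
@class ARRecordingTechnique;
@class ARReplaySensor;

@interface ARConfiguration ()

// Wraps templateConfiguration so that the session also records its sensor data
// to fileURL; the recording technique is returned through the out-parameter.
+ (ARConfiguration *)recordingConfigurationWithConfiguration:(ARConfiguration *)templateConfiguration
                                          recordingTechnique:(ARRecordingTechnique **)recordingTechnique
                                                     fileURL:(NSURL *)fileURL;

// Wraps templateConfiguration so that the session reads its input from
// replaySensor instead of the hardware.
+ (ARConfiguration *)replayConfigurationWithConfiguration:(ARConfiguration *)templateConfiguration
                                             replaySensor:(ARReplaySensor *)replaySensor
                                replayingResultDataClasses:(NSSet<Class> *)resultClasses;

@end
```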
- A recording configuration instructs the session to record a replay. The ARRecordingTechnique object returned (through the recordingTechnique out-parameter) when creating such a configuration allows controlling the recording process.
- A replay configuration instructs the session to use the data from a replay file instead of interacting with the hardware. The ARReplaySensor object is created from a file URL and encapsulates the replay, allowing the consumer to control how its data is read.
A replay includes all the information necessary to run a session: camera feed, accelerometer and gyroscope readings, and more. During recording, this information is encoded as a QuickTime video with synchronised metadata, so it can even be previewed in a video player. Most importantly, both recording and playback are transparent to the ARSession consumer: the app can interact with the session by examining frames, performing hit tests, adding custom anchors and so on, just as it normally would. As a result, adding support for consuming replay data in debug configurations requires very little extra code from the app developer.
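The real replay format is a QuickTime movie whose internal layout isn’t documented, but the idea of “synchronised” streams can be illustrated with a toy model. In the Swift sketch below, SensorSample and ToyReplay are our own hypothetical types, not ARKit API: the replay stores each sensor stream separately and, during playback, merges them back into a single timeline ordered by timestamp, so the consumer receives readings in the order they were captured without knowing where they came from.

```swift
import Foundation

// Hypothetical stand-in for one sensor reading; not an ARKit type.
struct SensorSample: Equatable {
    let stream: String          // e.g. "camera", "accelerometer", "gyroscope"
    let timestamp: TimeInterval // capture time, used to keep streams in sync
    let values: [Double]
}

// Toy replay: readings are stored per stream, loosely like separate
// video and metadata tracks in a movie file.
struct ToyReplay {
    private(set) var streams: [String: [SensorSample]] = [:]

    mutating func record(_ sample: SensorSample) {
        streams[sample.stream, default: []].append(sample)
    }

    // Merges all streams into one timeline and hands each reading to the consumer,
    // which gets the same kind of input it would receive from live sensors.
    func play(to consumer: (SensorSample) -> Void) {
        let timeline = streams.values.joined().sorted { $0.timestamp < $1.timestamp }
        timeline.forEach(consumer)
    }
}
```

Recording a sequence of live readings and playing it back yields the same sequence, which is the property that lets a replay stand in for live input in this toy model.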
To demonstrate how the replay SPI can be used, we’ve built a simple example app available on GitHub. You can run it on supported iOS devices to record an AR session into the app’s Documents directory, and then replay a previously recorded session, or even extract the replay using the Files app.
The following video shows the replay in action: a session is recorded, then played back. In both cases, data produced by the session (camera transform, feature points, detected planes, etc.) is streamed live to a Mac for visualisation.
Even though this functionality is not currently available as a public API, it is simple enough to use that it may be worth trying while developing or testing an AR app or feature, especially if it saves you from repeating the same scenarios by hand. One notable limitation, however, is that replay sessions, just like live ones, don’t seem to work in the Simulator, quite possibly because they rely on the same hardware features, such as Metal support.
How does it work?
To understand how sensor replays fit into ARKit’s internal architecture, let’s look at a simplified overview of that architecture, as far as it can be observed from the outside.
ARKit abstracts hardware and lower-level frameworks like AVFoundation and Core Motion in the form of “sensors” and the data they produce. At the moment, there are six types of sensor data:
- Colour image (from back or front facing camera)
- Accelerometer reading
- Gyroscope reading
- Device orientation (from magnetometer)
- Face metadata (from TrueDepth camera)
- Depth information (from TrueDepth camera)
Each sensor can provide one or more data types: for example, a TrueDepth camera provides colour images, face metadata and depth information.
Sensor data is required to run different “techniques”: encapsulated units of asynchronous processing performed in the session (e.g. world alignment or exposure light estimation). The session configuration determines which techniques to execute, and thus which sensors the session requires.
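The relationship between data types, sensors, techniques and the configuration can be sketched as a small Swift model. All names below are hypothetical, and the data each technique requires (as well as grouping the accelerometer and gyroscope into one “motion” sensor) is our guess for illustration, not something ARKit documents. The configuration is reduced to the list of techniques it enables, and the session can start only if its sensors provide every data type those techniques need.

```swift
// The six data types listed above, as a bit set (hypothetical names).
struct SensorData: OptionSet {
    let rawValue: Int
    static let colourImage   = SensorData(rawValue: 1 << 0)
    static let accelerometer = SensorData(rawValue: 1 << 1)
    static let gyroscope     = SensorData(rawValue: 1 << 2)
    static let orientation   = SensorData(rawValue: 1 << 3)
    static let faceMetadata  = SensorData(rawValue: 1 << 4)
    static let depth         = SensorData(rawValue: 1 << 5)
}

struct Sensor {
    let name: String
    let provides: SensorData
}

struct Technique {
    let name: String
    let requires: SensorData   // assumed requirements, for illustration only
}

let backCamera      = Sensor(name: "back camera", provides: .colourImage)
let trueDepthCamera = Sensor(name: "TrueDepth camera", provides: [.colourImage, .faceMetadata, .depth])
let motionSensors   = Sensor(name: "motion", provides: [.accelerometer, .gyroscope])

let worldTracking   = Technique(name: "world tracking", requires: [.colourImage, .accelerometer, .gyroscope])
let lightEstimation = Technique(name: "light estimation", requires: .colourImage)
let faceTracking    = Technique(name: "face tracking", requires: [.colourImage, .faceMetadata, .depth])

// Union of everything the configured techniques need.
func requiredData(for techniques: [Technique]) -> SensorData {
    return techniques.reduce(SensorData()) { $0.union($1.requires) }
}

// Data types that none of the given sensors can supply (empty means the session can run).
func missingData(for techniques: [Technique], sensors: [Sensor]) -> SensorData {
    let available = sensors.reduce(SensorData()) { $0.union($1.provides) }
    return requiredData(for: techniques).subtracting(available)
}
```

For example, the toy face tracking technique cannot run with only the back camera, because nothing supplies face metadata or depth.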
While the session is running, data received from the sensors is fed into the active techniques, which process it (with some degree of parallelism) and produce intermediate results. Those are then collected and combined into what we see as the session’s output: ARCamera objects, feature points, detected planes, and so on. This process repeats as new sensor data arrives from the hardware.
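The processing loop described above can be sketched as a toy Swift session. The types are hypothetical, and unlike ARKit, which runs techniques asynchronously with some parallelism, this sketch processes everything sequentially for clarity. Each incoming sample is offered to every technique; a technique that has something new returns an intermediate result, and the session folds the latest results into a frame each time a camera image arrives, roughly the way an ARFrame accompanies each captured image.

```swift
import Foundation

struct Sample {
    let timestamp: TimeInterval
    let kind: String    // "camera" or "motion" in this toy model
    let value: Double
}

// Intermediate results a technique may produce (hypothetical).
enum PartialResult {
    case cameraPose(Double)
    case lightIntensity(Double)
}

// Consumer-facing output, loosely mirroring a few ARFrame properties.
struct Frame: Equatable {
    var timestamp: TimeInterval = 0
    var cameraPose: Double?
    var lightIntensity: Double?
}

struct Technique {
    let name: String
    let process: (Sample) -> PartialResult?   // nil: nothing new from this sample
}

final class ToySession {
    private let techniques: [Technique]
    private var latest = Frame()
    private(set) var frames: [Frame] = []

    init(techniques: [Technique]) { self.techniques = techniques }

    func receive(_ sample: Sample) {
        // Feed the sample to every active technique and keep their latest results.
        for technique in techniques {
            switch technique.process(sample) {
            case .cameraPose(let pose)?:          latest.cameraPose = pose
            case .lightIntensity(let intensity)?: latest.lightIntensity = intensity
            case nil:                             break
            }
        }
        // Emit one combined frame per camera image.
        if sample.kind == "camera" {
            latest.timestamp = sample.timestamp
            frames.append(latest)
        }
    }
}

let tracking = Technique(name: "tracking") { sample in
    guard sample.kind == "motion" else { return nil }
    return .cameraPose(sample.value)
}
let lighting = Technique(name: "light estimation") { sample in
    guard sample.kind == "camera" else { return nil }
    return .lightIntensity(sample.value * 1000)
}
```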
Replays fit neatly into this model:
- During recording, the recording technique “eavesdrops” on the session’s sensors and encodes their readings to a replay file.
- During playback, the replay sensor is added to the session instead of hardware sensors, thus acting as a stand-in that provides all input data. At the same time, essentially the same set of techniques is used to process it.
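Here is a Swift sketch of the two roles described above, again with hypothetical types rather than ARKit’s own: during recording, a recorder observes every sample on its way into the session without changing it; during playback, a replay source replaces the hardware and hands back the recorded samples, while the same processing code runs unchanged. Because this toy model is fully deterministic, the replayed session produces exactly the same output as the recorded one; we make no claim that ARKit’s real output is bit-identical between runs.

```swift
import Foundation

struct Sample: Equatable {
    let timestamp: TimeInterval
    let kind: String   // "camera" or "motion" in this toy model
    let value: Double
}

// Anything that can feed input into the session (hypothetical protocol).
protocol SampleSource {
    func nextSample() -> Sample?
}

// Stand-in for hardware: walks through canned readings.
final class FakeHardwareSensor: SampleSource {
    private var readings: [Sample]
    init(readings: [Sample]) { self.readings = readings }
    func nextSample() -> Sample? {
        return readings.isEmpty ? nil : readings.removeFirst()
    }
}

// "Eavesdrops" on the session's input and keeps a copy of every sample.
final class ToyRecorder {
    private(set) var captured: [Sample] = []
    func capture(_ sample: Sample) { captured.append(sample) }
}

// Replaces the hardware during playback, serving recorded samples in order.
final class ToyReplaySensor: SampleSource {
    private let recording: [Sample]
    private var position = 0
    init(recording: [Sample]) { self.recording = recording }
    func nextSample() -> Sample? {
        guard position < recording.count else { return nil }
        defer { position += 1 }
        return recording[position]
    }
}

// The same toy "technique" runs whatever the source is: it integrates motion
// readings into a 1-D pose and reports the pose for every camera frame.
func runSession(source: SampleSource, recorder: ToyRecorder? = nil) -> [Double] {
    var pose = 0.0
    var outputs: [Double] = []
    while let sample = source.nextSample() {
        recorder?.capture(sample)
        switch sample.kind {
        case "motion": pose += sample.value
        case "camera": outputs.append(pose)
        default: break
        }
    }
    return outputs
}
```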
It’s important to pay attention to the “template” configuration specified when creating both the recording and the replay configurations. It determines which techniques run in each session, and therefore also which sensors are required. Thus it must be essentially the same for recording and playback; otherwise the replay file may not contain the sensor data the replay session expects.
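A hypothetical Swift sketch of why the templates must match: a toy replay file only contains the data types its recording template required, so replaying it with a template that needs more data leaves the session without some of its inputs. The mapping from configurations to data types is our own assumption; for example, we assume that aligning the world to compass heading (the .gravityAndHeading world alignment of ARWorldTrackingConfiguration) needs device orientation data on top of what plain gravity-aligned world tracking uses.

```swift
// Data types a replay may contain (hypothetical names, not ARKit's).
struct SensorData: OptionSet {
    let rawValue: Int
    static let colourImage   = SensorData(rawValue: 1 << 0)
    static let accelerometer = SensorData(rawValue: 1 << 1)
    static let gyroscope     = SensorData(rawValue: 1 << 2)
    static let orientation   = SensorData(rawValue: 1 << 3)
}

// A template configuration reduced to the data its techniques need (assumed).
struct ToyConfiguration {
    let name: String
    let requiredData: SensorData
}

// Recording only captures what the recording template asked the sensors for.
struct ToyReplayFile {
    let recordedData: SensorData
    init(recordedWith template: ToyConfiguration) {
        recordedData = template.requiredData
    }
}

// What the replay template needs but the file does not contain.
func missingData(in file: ToyReplayFile, for template: ToyConfiguration) -> SensorData {
    return template.requiredData.subtracting(file.recordedData)
}

let worldTracking = ToyConfiguration(
    name: "world tracking, gravity alignment",
    requiredData: [.colourImage, .accelerometer, .gyroscope])
let worldTrackingWithHeading = ToyConfiguration(
    name: "world tracking, gravity and heading alignment",
    requiredData: [.colourImage, .accelerometer, .gyroscope, .orientation])
```

The toy check only flags missing data; since the template also determines which techniques run, keeping the two templates essentially identical remains the safer choice.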
Internally, sensor replays are used by what seems to be a regression testing tool: ARQATracer. It allows running a replay session and encoding its consumer-facing output (frames, camera orientation, detected planes, etc.) to another file. It wouldn’t be surprising if the ARKit team had prepared a set of replays and ran them automatically, comparing the results against reference data to continuously ensure the quality of ARKit’s algorithms.
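ARQATracer’s file format and comparison logic aren’t public, but the kind of regression check speculated about above can be sketched in Swift. Everything here is hypothetical: a trace is a list of per-frame outputs (camera position and number of detected planes), and a new trace is compared with a reference trace frame by frame, tolerating small positional differences (the 1 cm default tolerance is an arbitrary choice) but flagging any change in the number of planes.

```swift
import Foundation

struct Position {
    let x, y, z: Double   // metres, in the session's world coordinate space
}

func distance(_ a: Position, _ b: Position) -> Double {
    let dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z
    return (dx * dx + dy * dy + dz * dz).squareRoot()
}

// One traced frame of consumer-facing output (hypothetical trace format).
struct TracedFrame {
    let timestamp: TimeInterval
    let cameraPosition: Position
    let planeCount: Int
}

struct Regression: Equatable {
    let frameIndex: Int
    let reason: String
}

// Compares a fresh trace with a reference trace produced from the same replay.
func compare(trace: [TracedFrame], reference: [TracedFrame],
             positionTolerance: Double = 0.01) -> [Regression] {
    guard trace.count == reference.count else {
        return [Regression(frameIndex: min(trace.count, reference.count),
                           reason: "frame count \(trace.count) != \(reference.count)")]
    }
    var regressions: [Regression] = []
    for (index, (actual, expected)) in zip(trace, reference).enumerated() {
        if distance(actual.cameraPosition, expected.cameraPosition) > positionTolerance {
            regressions.append(Regression(frameIndex: index, reason: "camera drift"))
        }
        if actual.planeCount != expected.planeCount {
            regressions.append(Regression(frameIndex: index,
                reason: "plane count \(actual.planeCount) != \(expected.planeCount)"))
        }
    }
    return regressions
}
```

An empty result means the new trace matches the reference within tolerance; anything else points at the frames worth inspecting.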
I hope that one day ARKit will provide a public API for session recording and replay, as it would give third-party developers like us the power and convenience to build and test AR experiences without having to resort to manually repeating usage scenarios (rdar://4474874). This can be especially important for AR apps intended to be used outdoors, though, to be fair, at least for now we have one extra reason to spend more time outside 😉