Immersive audio extends conventional surround sound into the third dimension by adding height information to the audio experience.
This FAQ presents some of the basic concepts of immersive audio, looks at how ambisonics relates to immersive audio, and closes by looking at a new recommended immersive audio system design practice.
Immersive sound builds on multi-channel surround sound by adding height or presence channels to create a dome of sound around the listener. It also uses a newer type of encoding called object-based audio to direct specific sounds, like a person’s voice or an automobile passing by, to a precise location within a three-dimensional (3D) space. The rendering takes the 3D position of the listener into account. Newer implementations also track the six degrees of freedom (6DoF) of the listener’s head in real time: translation along three axes and rotation about them, such as tilting up or down or turning to the left or right (Figure 1).
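As a rough illustration of the object-based idea, the sketch below stores a sound as an object with a 3D position and works out the direction from which a listener would hear it. The names AudioObject, ListenerPose, and relative_direction are hypothetical, the axis convention (x forward, y left, z up, in metres) is an assumption, and only head position and yaw are handled. A full 6DoF renderer would also account for pitch and roll.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A sound source with a position in room coordinates (assumed metres)."""
    name: str
    x: float  # forward
    y: float  # left
    z: float  # up


@dataclass
class ListenerPose:
    """Listener head position plus yaw only (a simplification of 6DoF)."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    yaw_deg: float = 0.0  # head turn about the vertical axis, positive to the left


def relative_direction(obj, pose):
    """Return (azimuth_deg, elevation_deg, distance_m) of obj relative to the head."""
    # Offset of the object from the listener in room coordinates.
    dx, dy, dz = obj.x - pose.x, obj.y - pose.y, obj.z - pose.z
    # Undo the head yaw by rotating the offset by -yaw about the vertical axis.
    yaw = math.radians(pose.yaw_deg)
    hx = dx * math.cos(yaw) + dy * math.sin(yaw)
    hy = -dx * math.sin(yaw) + dy * math.cos(yaw)
    horizontal = math.hypot(hx, hy)
    azimuth = math.degrees(math.atan2(hy, hx))          # positive = to the left
    elevation = math.degrees(math.atan2(dz, horizontal))  # positive = above
    distance = math.hypot(horizontal, dz)
    return azimuth, elevation, distance


voice = AudioObject("voice", x=2.0, y=0.0, z=0.0)
turned_left = ListenerPose(yaw_deg=90.0)
print(relative_direction(voice, turned_left))
```

In this usage, the voice sits 2 m straight ahead in the room, so once the listener turns the head 90 degrees to the left it is heard from the right (a negative azimuth under the convention assumed here).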
3D spatial audio and ambisonics
Ambisonics is the most common spatial audio format, but it’s not the only approach. Other options include Vector-Based Amplitude Panning (VBAP), Multiple-Direction Amplitude Panning (MDAP), and Distance-Based Amplitude Panning (DBAP). Higher-order ambisonics (HOA) has been developed to support virtual and augmented reality environments.
Ambisonics is a full-sphere sound format and includes sound sources above and below the listener as well as around the horizontal plane. There’s a range of implementations for ambisonics, all of which require multiple microphones to capture sound from different directions. First-order ambisonics (FOA) is the simplest form and uses four microphones but does not provide good vertical spatial resolution for the listener.
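The four signals of first order are one case of a general rule: a full-sphere ambisonic representation of order N carries (N + 1)^2 component signals. A minimal sketch follows; the function name is illustrative only.

```python
def ambisonic_channel_count(order: int) -> int:
    """Number of component signals in a full-sphere ambisonic stream of a given order."""
    if order < 0:
        raise ValueError("ambisonic order must be non-negative")
    return (order + 1) ** 2


# Orders 1 to 4 give 4, 9, 16, and 25 signals respectively.
for order in range(1, 5):
    print(order, ambisonic_channel_count(order))
```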
HOA extends the format to the second, third, fourth, and higher orders of ambisonics. Third-order ambisonics requires at least 16 microphones and provides significantly improved vertical spatial resolution. A first-order ambisonics recording microphone consists of four capsules mounted closely together, typically in a tetrahedral arrangement. These capsules have cardioid polar patterns, and the signals they record are called the ambisonics A-format (Figure 2).
After recording, a simple matrix operation converts the A-format signals into the four B-format channels, W, X, Y, and Z. W is the omnidirectional component, and X, Y, and Z are the front-back, left-right, and up-down directional components. The B-format enables multichannel audio to be produced and moved from location to location without considering the specific speaker arrangement that will be used for playback. Its channels hold the components of the sound field, which are combined in a subsequent decoding step tailored to the specific playback environment and arrangement of speakers.
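For a tetrahedral first-order microphone, the matrix step can be sketched as sums and differences of the four capsule signals. The sketch assumes the commonly used capsule layout of front-left-up (FLU), front-right-down (FRD), back-left-down (BLD), and back-right-up (BRU). The function name is hypothetical, and the gain scaling and frequency-dependent correction filters that practical converters apply are left out.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert four equal-length A-format capsule signals to unscaled B-format.

    Each argument is a sequence of samples from one capsule of an assumed
    tetrahedral array. Returns four lists: W, X, Y, Z.
    """
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(a + b + c + d)  # omnidirectional: sum of all capsules
        x.append(a + b - c - d)  # front minus back
        y.append(a - b + c - d)  # left minus right
        z.append(a - b - c + d)  # up minus down
    return w, x, y, z


# A sound picked up only by the two front capsules appears in W and X alone.
w, x, y, z = a_to_b_format([1.0], [1.0], [0.0], [0.0])
print(w, x, y, z)
```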
The B-format supports immersive 3D audio. It captures information from all directions and can be rotated as needed. For example, in a VR application, B-format data can be rotated into position before decoding based on where the listener’s head is positioned and where it’s pointing.
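Rotation about the vertical axis is simple for first-order B-format: W and Z are unchanged, and X and Y are mixed by a 2D rotation. The sketch below assumes the usual axis convention (X forward, Y left, Z up) and uses hypothetical function names. Pitch and roll would need a full 3D rotation and are not shown.

```python
import math


def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate one B-format sample about the vertical axis.

    A positive angle turns the sound field counterclockwise (to the left)
    when viewed from above. W and Z are unaffected by a yaw rotation.
    """
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z


def compensate_head_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate the field so sources stay fixed in the room as the head turns."""
    return rotate_yaw(w, x, y, z, -head_yaw_deg)


# A source directly to the listener's left has its energy in Y, none in X.
sample = (0.707, 0.0, 1.0, 0.0)
# After the listener turns 90 degrees to the left, the source should be in front.
print(compensate_head_yaw(*sample, head_yaw_deg=90.0))
```

Because the rotation is applied to the B-format signals before decoding, the same decoder can be reused for any head orientation.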
Immersive audio design recommended practice
The Custom Electronic Design & Installation Association (CEDIA) and the Consumer Technology Association (CTA) have jointly developed CEDIA/CTA-RP22, the Immersive Audio Design Recommended Practice. RP22 is the first recommended practice to include an objective set of performance criteria for home audio systems. It identifies 21 design parameters that affect audio system performance and sets criteria for four levels of performance:
Level 1 – conveys basic artistic intent.
Level 2 – a higher level of performance that more accurately conveys artistic intent.
Level 3 – meets or exceeds reference commercial cinema standards.
Level 4 – achieves the maximum level of performance across every parameter.
Summary
Immersive audio is based on concepts of spatial audio that can place specific sounds at precise locations in the 3D space around the listener. It can also consider the 6DoF of the user’s head for added realism. Ambisonics, especially HOA, is an important tool for the development of immersive audio. CEDIA/CTA-RP22 was recently released to enable developers and users of immersive audio to compare different implementations objectively.
References
Higher-order ambisonics in Dolby Atmos productions, Zylia
Immersive Audio Design Recommended Practice (CEDIA/CTA-RP22), Consumer Technology Association
Open-Source Spatial Audio Compression for VR Content, Society of Motion Picture and Television Engineers
What is immersive sound?, Storm Audio