Immersive audio uses channel-, object-, and scene-based techniques to deliver a high-quality listening experience. Spatial audio falls into three general categories: channel-based audio (CBA), object-based audio (OBA), and scene-based audio (SBA), which is the next generation of OBA.
CBA is the simplest form of spatial audio, but it’s not immersive. Its most basic form is two-speaker stereo. Adding a third speaker in the center acts as an anchor that improves the stereo image for listeners who are not seated in the optimal listening position (the “sweet spot”). The most common CBA technology in use today is surround sound, which adds more speakers in a two-dimensional (2D) horizontal layout around the listener so that sound can come from the front, the sides, and the back. A common surround format is 5.1, which consists of five primary speakers (left, center, and right front channels plus left and right rear channels) and a single subwoofer, usually placed at the front. Surround sound can also be “virtualized” by a sound bar placed in front of the listener.
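To make the 5.1 description concrete, here is a minimal Python sketch that models a surround layout as a list of speakers. The `Speaker` class and the helper functions are hypothetical, for illustration only. The azimuths are the nominal angles recommended in ITU-R BS.775 (front left/right at ±30°, surrounds at ±110°); real rooms vary. Every speaker sits at 0° elevation, which is what makes this a 2D layout.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    label: str
    azimuth_deg: float          # 0 = straight ahead, positive = to the listener's left
    elevation_deg: float = 0.0  # 0 = ear level; surround-sound CBA stays in this plane
    is_lfe: bool = False        # low-frequency effects (subwoofer) channel


# Nominal 5.1 angles per ITU-R BS.775 (surrounds may sit anywhere from 100 to 120 deg).
# The LFE channel has no meaningful direction, so its azimuth is only a placeholder.
LAYOUT_5_1 = [
    Speaker("L", 30.0),
    Speaker("R", -30.0),
    Speaker("C", 0.0),
    Speaker("LFE", 0.0, is_lfe=True),
    Speaker("Ls", 110.0),
    Speaker("Rs", -110.0),
]


def describe(layout: list[Speaker]) -> str:
    """Return the 'M.N' label: number of main channels, then number of LFE channels."""
    main = sum(1 for s in layout if not s.is_lfe)
    lfe = sum(1 for s in layout if s.is_lfe)
    return f"{main}.{lfe}"


def is_horizontal_only(layout: list[Speaker]) -> bool:
    """True if every speaker sits at ear level, i.e. the layout is 2D."""
    return all(s.elevation_deg == 0.0 for s in layout)


def rear_speakers(layout: list[Speaker]) -> list[str]:
    """Labels of main speakers behind the listener (|azimuth| > 90 degrees)."""
    return [s.label for s in layout if not s.is_lfe and abs(s.azimuth_deg) > 90.0]


if __name__ == "__main__":
    print(describe(LAYOUT_5_1))            # 5.1
    print(is_horizontal_only(LAYOUT_5_1))  # True
    print(rear_speakers(LAYOUT_5_1))       # ['Ls', 'Rs']
```

The “5.1” label simply falls out of counting main and LFE channels separately.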
Object-based audio
OBA is a system where audio ‘objects’ such as voices, individual instruments, and sound effects are stored in their own audio files, usually with related metadata that defines their levels, locations, panning, and other characteristics. OBA also includes the concept of a scene, which represents the objects’ metadata in a form that is neutral with respect to the output audio format and playback system. The renderer puts everything together: it uses the metadata to mix the audio objects for the playback device (speakers or headphones, for example) and the speaker layout (stereo, 5.1, 7.1, and so on).
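The division of labor between objects, metadata, and renderer can be sketched in a few lines of Python. This is a toy example under stated assumptions, not how any commercial renderer works: `AudioObject`, `pan_gains`, and `render_stereo` are hypothetical names, the metadata is reduced to a gain and a pan position, and the only output layout is two-speaker stereo using a constant-power (sine/cosine) pan law.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical audio object: mono samples plus the metadata a renderer needs."""
    name: str
    samples: list[float]  # mono audio for this object
    gain: float = 1.0     # linear level from the metadata
    pan: float = 0.0      # -1.0 = hard left, 0.0 = center, +1.0 = hard right


def pan_gains(pan: float) -> tuple[float, float]:
    """Constant-power (sine/cosine) pan law: returns (left_gain, right_gain)."""
    pan = max(-1.0, min(1.0, pan))       # clamp out-of-range metadata
    theta = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)


def render_stereo(objects: list[AudioObject]) -> tuple[list[float], list[float]]:
    """Mix every object into left/right output buses according to its metadata."""
    length = max((len(obj.samples) for obj in objects), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for obj in objects:
        gain_l, gain_r = pan_gains(obj.pan)
        for i, sample in enumerate(obj.samples):
            left[i] += sample * obj.gain * gain_l
            right[i] += sample * obj.gain * gain_r
    return left, right


if __name__ == "__main__":
    dialog = AudioObject("dialog", [1.0, 0.5], pan=0.0)        # centered voice
    door = AudioObject("door_slam", [1.0], gain=0.5, pan=1.0)  # effect panned hard right
    left, right = render_stereo([dialog, door])
    print([round(v, 3) for v in left], [round(v, 3) for v in right])
```

Because the objects never contain a pre-mixed layout, targeting 5.1 or headphones would mean swapping in a different render function while leaving the objects and their metadata untouched.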
Because the audio objects and their metadata are passed individually to the renderer, OBA gives listeners more control over the listening experience. When the object-based audio scene is rendered, the listener can turn up the volume of certain objects, turn objects off (like turning off the commercials), and even choose a different language for dialog or subtitles (Figure 2).
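In practice, this kind of listener control amounts to editing the object list, or its metadata, before rendering. The sketch below assumes a hypothetical `ObjectMeta` record with a `kind` tag and an optional dialog `language`; the `personalize` function (also hypothetical) drops muted categories, keeps only the chosen dialog language, and scales the gains the listener has adjusted.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ObjectMeta:
    """Hypothetical per-object metadata delivered alongside the audio."""
    name: str
    kind: str                       # e.g. "dialog", "music", "effects", "commercial"
    gain: float = 1.0               # linear gain chosen by the content creator
    language: Optional[str] = None  # only meaningful for dialog objects


def personalize(objects, muted_kinds=frozenset(), gain_overrides=None, dialog_language="en"):
    """Apply listener preferences to the object list before it reaches the renderer."""
    gain_overrides = gain_overrides or {}
    chosen = []
    for obj in objects:
        if obj.kind in muted_kinds:
            continue  # the listener switched this whole category off
        if obj.kind == "dialog" and obj.language != dialog_language:
            continue  # keep only the dialog track in the selected language
        scale = gain_overrides.get(obj.name, 1.0)  # listener's volume tweak, if any
        chosen.append(replace(obj, gain=obj.gain * scale))
    return chosen


if __name__ == "__main__":
    program = [
        ObjectMeta("dialog_en", "dialog", language="en"),
        ObjectMeta("dialog_es", "dialog", language="es"),
        ObjectMeta("crowd", "effects", gain=0.8),
        ObjectMeta("ad_break", "commercial"),
    ]
    mix = personalize(program, muted_kinds={"commercial"},
                      gain_overrides={"crowd": 1.5}, dialog_language="es")
    print([obj.name for obj in mix])  # ['dialog_es', 'crowd']
```

The original objects are left untouched (the records are frozen and `replace` builds new ones), so the listener can change their mind without reloading the content.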
Scene-based audio
SBA uses higher-order ambisonics (HOA) to represent sound in a fully spherical surround format. With HOA, a sound can be reproduced at a specific point in 3D space. Capturing HOA requires more microphone capsules, arranged in more complex arrays. A basic (first-order) ambisonics microphone comprises four independent capsules, typically with cardioid polar patterns; higher orders need more. SBA is more complicated than OBA: it can employ vector-based panning, intensity panning, delay panning, the Doppler effect, and other techniques to place a sound precisely and move it around in 3D space, creating a fully immersive experience.
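Two facts explain why HOA capture needs more hardware. An ambisonic representation of order N has (N+1)² channels, so first order needs 4 and third order needs 16, and a microphone array generally needs at least that many capsules. The sketch below computes that count and shows an idealized conversion from the raw signals of a four-capsule tetrahedral microphone (“A-format”) to first-order “B-format” (W, X, Y, Z). The function names are hypothetical, and real converters also apply capsule calibration and frequency-dependent correction filters that are omitted here.

```python
def hoa_channel_count(order: int) -> int:
    """Ambisonic channels (spherical-harmonic components) needed for a given order."""
    if order < 0:
        raise ValueError("ambisonic order must be non-negative")
    return (order + 1) ** 2


def a_to_b_format(flu, frd, bld, bru):
    """Idealized A-format to first-order B-format conversion for one sample.

    Inputs are the four capsule signals of a tetrahedral microphone:
    front-left-up, front-right-down, back-left-down, back-right-up.
    Scale factors differ between conventions, and real converters add
    calibration and correction filters; none of that is modeled here.
    """
    w = flu + frd + bld + bru  # omnidirectional (pressure) component
    x = flu + frd - bld - bru  # front minus back
    y = flu - frd + bld - bru  # left minus right
    z = flu - frd - bld + bru  # up minus down
    return w, x, y, z


if __name__ == "__main__":
    for order in range(1, 5):
        print(order, hoa_channel_count(order))  # orders 1-4 -> 4, 9, 16, 25 channels
    # A sound picked up only by the front-left-up capsule reads as front, left, and up.
    print(a_to_b_format(1.0, 0.0, 0.0, 0.0))  # (1.0, 1.0, 1.0, 1.0)
```

The quadratic growth of the channel count is the main reason higher orders cost so much more in capture hardware and bandwidth.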
SBA is well suited for virtual reality (VR) and augmented reality (AR) environments. It supports positional audio, which dynamically adjusts the sound field based on the position and orientation of the listener’s head relative to the 3D virtual world. SBA can be delivered over conventional speakers, through VR/AR headsets, and so on.
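Head tracking is where SBA’s representation pays off: a yaw (left/right) head turn can be compensated by rotating the whole sound field instead of re-panning each source. This is a minimal first-order sketch assuming the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization) and azimuth measured counterclockwise, so positive angles are to the listener’s left. The function names are hypothetical, and a full implementation would also handle pitch, roll, and higher orders.

```python
import math


def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample as first-order ambisonics: [W, Y, Z, X] (ACN order, SN3D)."""
    az = math.radians(azimuth_deg)  # counterclockwise from straight ahead (left is positive)
    el = math.radians(elevation_deg)
    w = sample
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    x = sample * math.cos(az) * math.cos(el)
    return [w, y, z, x]


def compensate_head_yaw(foa, head_yaw_deg):
    """Counter-rotate a first-order sound field for a head turn of head_yaw_deg.

    After the listener turns left by head_yaw_deg, a source at azimuth phi in the
    virtual world should be heard at phi - head_yaw_deg relative to the head.
    Yaw leaves W (omnidirectional) and Z (vertical) unchanged.
    """
    w, y, z, x = foa
    t = math.radians(head_yaw_deg)
    x_rot = x * math.cos(t) + y * math.sin(t)
    y_rot = y * math.cos(t) - x * math.sin(t)
    return [w, y_rot, z, x_rot]


if __name__ == "__main__":
    sound_field = encode_foa(1.0, azimuth_deg=90.0)  # source directly to the left
    turned = compensate_head_yaw(sound_field, 90.0)  # listener turns to face it
    print([round(c, 6) for c in turned])  # approximately [1, 0, 0, 1]: now straight ahead
```

The rotation touches only two channels per sample regardless of how many sources are in the scene, which is why this approach suits real-time head tracking.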
Not mutually exclusive
Commercial audio systems such as Dolby Atmos and DTS:X can handle both CBA and OBA formats. For example, Atmos supports up to 128 independent channels, which translates into 118 audio objects plus ten channels for a 7.1.2 sound bed (seven primary speakers in the 2D space around the listener, a subwoofer, and two height channels).
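The channel arithmetic behind these figures follows from the layout notation, where “7.1.2” means seven main channels, one LFE (subwoofer) channel, and two height channels. A small hypothetical helper makes the bookkeeping explicit; the 128-channel total is the Atmos figure quoted above.

```python
def channels_in_layout(notation):
    """Total channel count for 'main.lfe' or 'main.lfe.height' notation, e.g. '7.1.2'."""
    parts = notation.split(".")
    if len(parts) not in (2, 3) or not all(part.isdigit() for part in parts):
        raise ValueError(f"unexpected layout notation: {notation!r}")
    return sum(int(part) for part in parts)


def objects_available(total_tracks, bed_layout):
    """Channels left over for audio objects after reserving a channel-based bed."""
    bed = channels_in_layout(bed_layout)
    if bed > total_tracks:
        raise ValueError("the bed needs more channels than are available")
    return total_tracks - bed


if __name__ == "__main__":
    print(channels_in_layout("7.1.2"))      # 10
    print(objects_available(128, "7.1.2"))  # 118
    print(channels_in_layout("30.2"))       # 32
```

The same helper gives 32 channels for the 30.2 configuration of DTS:X Pro.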
Dolby Atmos requires separate ‘height’ speakers, which DTS:X does not need (but can use). DTS:X adapts to most surround sound speaker setups; the Pro version supports up to 30.2 channels but also works with fewer speakers. And while Atmos supports a maximum of 128 channels and 118 audio objects, DTS:X can be configured to support an unlimited number of audio objects.
Summary
Of the three categories of spatial audio (CBA, OBA, and SBA), only the last two deliver an immersive 3D experience; CBA is strictly a 2D delivery system. SBA supports more sophisticated content delivery using HOA, giving the listener the most control over the sound field, and it’s particularly suited to AR/VR environments.