3D Audio Mixing.
This trending technique is designed to be listened to with headphones, aiming to give the impression of listening with your brain rather than just your ears. In a nutshell, "8D audio" does not really exist: we perceive sound in three dimensions, and the effect simply exploits what our brains are capable of doing with new sensory content. The term "8D" is hyped in a series of videos on various digital video and audio platforms; the correct term is 3D or spatial audio.
*Please use headphones for the full experience.
At Spatial Mastering, we have listened to hundreds of so-called "8D" audio tracks online and found that, fundamentally, they are the result of equalisation, effects and panning combined. The majority of the tracks we heard apply these techniques to the master/stereo file alone.
At Spatial Mastering, we generate 3D audio by treating every channel individually instead of working only on the master/stereo file. As a result, you get a fully immersive 360° experience. There are no shortcuts to the best possible results in the art of spatialising your work, and we push 3D audio to its full potential. We have been involved in 3D audio projects for the past 10 years; below we explain the knowledge we have acquired over that time and how we apply it to your work.
The Historical Evolution.
Throughout the development of the phonograph and gramophone, audio systems were monophonic. They subsequently evolved from mono to stereo, then to binaural stereo, cinema stereo, ambiophony, quadraphonic sound, ambisonics, ITU-standard surround and 3D audio. 3D audio is not a new concept, but it has been popularised in recent years primarily by virtual reality, merging the two worlds to create a fully immersive experience.
Head-Related Transfer Functions (HRTF).
An HRTF captures the alterations a sound wave undergoes on its way from the source to our ears, including diffraction and reflections off parts of the body: the head, pinnae, shoulders and torso. These alterations are what produce the illusion of spatially located sound. The HRTF is the Fourier transform of the head-related impulse response (HRIR): a complex function, defined for each ear, carrying both magnitude and phase information.
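The HRIR-to-HRTF relationship can be sketched in a few lines. The impulse response below is a synthetic placeholder, not a measured HRIR; real ones are measured per ear and per direction.

```python
import numpy as np

# Hypothetical example: a toy head-related impulse response (HRIR).
# Real HRIRs are measured per ear and per direction; this synthetic
# one exists only to show the transform.
fs = 48000                       # sample rate in Hz
hrir = np.zeros(256)
hrir[0] = 1.0                    # direct sound
hrir[12] = 0.4                   # a single early reflection (pinna/shoulder)

# The HRTF is the Fourier transform of the HRIR: a complex function of
# frequency carrying both magnitude and phase for one ear.
hrtf = np.fft.rfft(hrir)
freqs = np.fft.rfftfreq(len(hrir), d=1.0 / fs)

magnitude_db = 20 * np.log10(np.abs(hrtf))   # level per frequency bin
phase = np.unwrap(np.angle(hrtf))            # phase shift per bin
```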
Binaural Audio.
Binaural recording and reproduction mirror the human two-ear auditory system, capturing and reproducing sound specifically for a listener's two ears; playback usually works over headphones, though loudspeakers can also be used. Binaural is an upgrade to stereo: the sound is captured the way we actually hear it. This can be done with a headset carrying a microphone in each earpiece, with a dummy head, or with a head and torso simulator (HATS). The sounds arriving at the two ears are distributed and moulded by the geometry of the head, torso and ears, making the spatial cues for binaural hearing available in the recording.
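Binaural rendering of an existing mono source amounts to convolving it with a left- and a right-ear HRIR for the desired direction. A minimal sketch, using toy placeholder HRIRs rather than measured ones:

```python
import numpy as np

# Sketch: binauralising a mono source by convolving it with a left- and
# right-ear HRIR. These HRIRs are synthetic placeholders; in practice
# they come from a measured database for the chosen angle.
fs = 48000
mono = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s, 440 Hz tone

# Toy HRIRs: the right ear gets a quieter, slightly delayed copy,
# mimicking head shadowing and interaural time difference (ITD).
hrir_left = np.zeros(128)
hrir_left[0] = 1.0
hrir_right = np.zeros(128)
hrir_right[30] = 0.5            # ~0.6 ms later and 6 dB quieter

left = np.convolve(mono, hrir_left)
right = np.convolve(mono, hrir_right)
binaural = np.stack([left, right])   # 2 x N array for headphone playback
```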
Ambisonics.
Ambisonics is not a new format; it was developed in the 1970s. With the growth of AR and VR technology, recent years have seen renewed interest in it for spatial audio. First-order ambisonic microphones capture audio in four channels, covering both azimuth and elevation to give full spherical directionality. The four signals are W, X, Y and Z, collectively known as first-order B-format (4 channels). B-format has two standard conventions, AmbiX and FuMa, which differ in the order in which the four channels are arranged and in their normalisation.
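First-order B-format encoding of a mono source can be written directly from the channel definitions. A sketch assuming the traditional FuMa convention (W attenuated by 3 dB, channels ordered W, X, Y, Z), with a helper showing the reordering and regain needed for AmbiX (ACN order W, Y, Z, X with SN3D normalisation):

```python
import numpy as np

def encode_fuma(mono, azimuth, elevation):
    """Encode a mono signal at (azimuth, elevation), in radians,
    into first-order B-format using the FuMa convention."""
    w = mono * (1.0 / np.sqrt(2.0))                  # omni, -3 dB
    x = mono * np.cos(azimuth) * np.cos(elevation)   # front-back
    y = mono * np.sin(azimuth) * np.cos(elevation)   # left-right
    z = mono * np.sin(elevation)                     # up-down
    return np.stack([w, x, y, z])

def fuma_to_ambix(bformat):
    """Reorder W,X,Y,Z -> W,Y,Z,X (ACN) and restore W to full gain."""
    w, x, y, z = bformat
    return np.stack([w * np.sqrt(2.0), y, z, x])
```

A source encoded straight ahead (azimuth 0, elevation 0) lands entirely in W and X, as expected from the channel definitions.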
Psychoacoustics and Playback Issues.
Psychoacoustics is the study of the complex interaction between humans and acoustic waves, starting with the peripheral auditory system and ending with the data-handling characteristics of cognition. The more knowledge we have in this field, the better a job we can do of recreating virtual simulations of spatial audio. Our job, however, is not to simulate reality for accuracy's sake: simulating reality is often only the first step, and from there we alter the content to make it more fun or entertaining.
Multichannel Sound Field.
Multichannel sound dates back to the 1930s and is today a standard format in the movie industry. A stereo mix presents a spectrum of positional information between two speakers, giving the impression that a sound sits at any point between them. In reality, the sound is tied to the straight line between the speakers. This principle works well with headphones, and when the listener sits in the sweet spot between the speakers; the further the listener moves from that spot, the less effective stereo spatialisation becomes.
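The "phantom image" on the line between two speakers is produced by a pan law. A minimal sketch of a constant-power pan, where the combined power of the two channels stays constant as the source moves:

```python
import numpy as np

def constant_power_pan(mono, pan):
    """Place a mono signal between two speakers.
    pan = -1.0 is hard left, 0.0 is centre, +1.0 is hard right."""
    angle = (pan + 1.0) * np.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    left = mono * np.cos(angle)
    right = mono * np.sin(angle)
    return left, right
```

Because cos² + sin² = 1, the perceived loudness stays steady while the image slides along the line between the speakers; it can never leave that line.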
Spatialization in Surround.
Surround sound works on the same principle, although the spatial panning moves along multiple axes, giving the listener the sensation of sound arriving from multiple sources. 5.1 and 7.1 give a similar feeling, but the sound remains locked to the lines between the speakers. In a channel-based system, the listener can hear the sound move backwards and forwards, but it never leaves those lines to come closer to the listener.
Dolby Atmos, Auro-3D and DTS:X are more recent, object-based surround technologies that give the listener an immersive 3D spatial audio experience.
Decorrelation.
Decorrelation is the process of generating two or more incoherent signals from a single input signal. It has many applications in artificial auditory effects, such as broadening the apparent source width (ASW), enhancing subjective envelopment and producing subjective diffusion in multichannel reproduction. One thing to consider when working with Z-axis (height) speakers is the audio content delivered to them: it should be decorrelated from the sound coming from the ear-level speakers.
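One common way to decorrelate a signal, sketched below under the assumption of a random-phase all-pass FIR (one technique among several; others include all-pass filter cascades and short delays): the filter preserves the input's magnitude spectrum while scrambling its phase, so the output sounds the same but is largely incoherent with the original.

```python
import numpy as np

def decorrelate(signal, length=1024, seed=0):
    """Decorrelate a signal with a random-phase all-pass FIR."""
    rng = np.random.default_rng(seed)
    phases = rng.uniform(-np.pi, np.pi, length // 2 + 1)
    phases[0] = 0.0                       # keep DC real
    phases[-1] = 0.0                      # keep Nyquist real
    spectrum = np.exp(1j * phases)        # unit magnitude, random phase
    fir = np.fft.irfft(spectrum, length)  # all-pass impulse response
    return np.convolve(signal, fir)
```

Feeding the height channels through such a filter keeps their timbre matched to the ear-level speakers while breaking the coherence between them.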
To avoid an ever-increasing channel count, Dolby introduced the object. In theoretical terms, an object represents an audio channel, except that instead of feeding a specific channel, the object is free to be placed anywhere within 3D space at a precise X, Y and Z coordinate. The channel and its positional data are carried as metadata and transmitted from the point of authoring to the Dolby Atmos decoder in the consumer playback environment. The decoder reads the metadata to learn where the audio should sit in space, then uses an algorithm to work out how best to route the audio to the available output channels feeding the speakers in that environment. Different objects can represent separate sounds in the mix; this is commonly referred to as object-based audio. One advantage of Dolby's object-based approach is that a mix created in a 5.1.2 environment can be played back in 7.1.4, and conversely a mix created in 25.1.8 can automatically be played back on a 5.1.2 setup.
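The idea of rendering an object's metadata to whatever layout is available can be sketched as follows. This is not Dolby's actual renderer; the distance-weighted gain rule and the hypothetical speaker coordinates are illustrative assumptions, chosen only to show audio-plus-position being resolved to per-speaker feeds at playback time.

```python
import numpy as np

# Hypothetical 5.1.2-style layout: speaker names mapped to unit-cube
# positions (x = left/right, y = front/back, z = height).
SPEAKERS_512 = {
    "L":   np.array([-1.0,  1.0, 0.0]),
    "R":   np.array([ 1.0,  1.0, 0.0]),
    "C":   np.array([ 0.0,  1.0, 0.0]),
    "Ls":  np.array([-1.0, -1.0, 0.0]),
    "Rs":  np.array([ 1.0, -1.0, 0.0]),
    "Ltf": np.array([-1.0,  0.0, 1.0]),
    "Rtf": np.array([ 1.0,  0.0, 1.0]),
}

def render_object(audio, position, speakers):
    """Turn one object (audio + X,Y,Z metadata) into per-speaker feeds.
    Gains fall off with squared distance and are power-normalised."""
    pos = np.asarray(position, dtype=float)
    weights = {name: 1.0 / (1e-6 + np.linalg.norm(loc - pos) ** 2)
               for name, loc in speakers.items()}
    norm = np.sqrt(sum(w * w for w in weights.values()))
    return {name: audio * (w / norm) for name, w in weights.items()}
```

Because the object carries only audio and coordinates, playing the same mix on a different room simply means calling the renderer with a different speaker dictionary; the authoring side never changes.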
Data from several studies suggest that in some cases a channel-based approach may be preferred, even for Atmos delivery. Where applicable, each speaker position is configured as an object: in a 5.1 setup, for instance, the centre channel is configured as an object, and any sound intended for that position is routed to the channel feeding that object and plays at the front centre of the sphere.