
The Sound of VR Virtual Reality – Talking about Spatial audio, Ambisonic and other sound processing technologies – Earworm

by admin
Originally published on DigiLog (https://digilog.tw/posts/1188). Republished with permission.

Anyone who likes virtual reality, 360-degree video, or movies has probably wondered, even without being a creator, how those 3D sound effects are achieved. This article gives a brief introduction to some relatively new recording techniques, equipment, post-production mixing, and listening methods.

But before that, let’s watch a video first to experience it directly!

(Headphones are recommended. On a phone, watch in the YouTube app; on a computer, use Google Chrome, Firefox, Internet Explorer, or Opera. Safari is not currently supported.)

There are a few things worth mentioning about the 360 video:

Sound moves with what we see
No expensive equipment is needed; it can be captured easily with the device at hand
Few hardware limitations, but many software and platform limitations
Difficult to share with others

This way of processing 3D sound is not an entirely new field. It uses the so-called ambisonic format (there is no official Chinese translation yet), which differs from the mono, stereo, 5.1, and 7.1 formats we often hear about. Ambisonic is a sound format that captures the complete 360-degree sound field, and it is widely used across recording, post-production, and listening. It was introduced in the 1970s, but only in recent years, with the rise of related fields such as VR and AR, has it been actively discussed and applied.

But before explaining ambisonic in detail, we need to establish a simple understanding of the common classification of sound processing technology.

Types of sound processing technology

Technology emerges to solve problems and to achieve what everyone wants to do but cannot. Sound technology began with the phonograph in the early 20th century, which could record and play back only in mono. As the entertainment field pursued a more natural and richer sound, stereo arrived, followed by 5.1, 7.1, and formats with even more channels. This is the Channel Based approach, the one we are most familiar with and understand most intuitively.

Channel Based

Channel based uses the “channel” as the reference: the level of each sound object is assigned to the audio tracks of the individual channels. It is the most traditional and the most mature way to position sound. The mono, stereo, 5.1, and 7.1 formats we often hear about are all of this type.


The figures above show how channel-based formats evolved in the horizontal plane.

Later, formats with upper and lower surround channels were added. Currently, the format with the most channels is NHK's 22.2.

The figure shows the configuration of three common channel formats
AURO 10.1 configuration: in addition to the upper (height) channels, there is an overhead channel
11.1 configuration of DTS Neo:X

In fact, as the channel count grows, the configuration each brand recommends is no longer the same.


Common vendors for cinema systems include Dolby Digital, DTS, and SDDS; SDDS is not available for home theaters.
Each vendor has its own channel layouts, audio compression, and so on. The most common format at present is 5.1, which is widely supported, as standard or as an option, by playback equipment and by film and television productions.


A film print carrying soundtracks in four formats, from left to right: SDDS (blue strip at the edge), Dolby Digital (grey grid between the perforations), analog optical soundtrack (two white lines), and DTS timecode (dashed line)

Advantages and disadvantages of application


Advantages:
  • Listening: As the most mature approach, it is widely supported by traditional and digital TV, DVD, and movies.
  • Post-production: The mix is built around what the audience is meant to hear, so the result sounds the most natural.
  • Recording: Recording methods and equipment are fully developed, such as the Decca tree in the figure below, or tools that capture the overall sound directly as a 5.1-channel recording.


Disadvantages:
  • Listening: Equipment is usually not compatible across channel formats, so up-mixing or down-mixing is needed to make one format stand in for another. Even setting up a new playback space can be overwhelming for most people.
  • Post-production: The channel count determines how many monitor speakers are needed, and positioning a sound's direction is also difficult.
  • Recording: The barrier to entry is high; recording setups are complicated and expensive.
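The down-mixing mentioned in the list above can be sketched in a few lines. This is a minimal sketch, assuming the commonly used -3 dB (about 0.707) coefficient for the center and surround channels and simply dropping the LFE channel. The function name and the argument order are illustrative, and real decoders may use other coefficients.

```python
import math

# Assumed -3 dB coefficient for the center and surround channels.
MINUS_3_DB = 1.0 / math.sqrt(2.0)


def downmix_51_to_stereo(l, r, c, lfe, ls, rs):
    """Fold one 5.1 sample frame down to a stereo (left, right) pair.

    The LFE channel is discarded in this sketch, which is a common
    but not universal choice.
    """
    left = l + MINUS_3_DB * c + MINUS_3_DB * ls
    right = r + MINUS_3_DB * c + MINUS_3_DB * rs
    return left, right


if __name__ == "__main__":
    # A sound only in the center speaker lands equally in both outputs.
    print(downmix_51_to_stereo(0.0, 0.0, 1.0, 0.0, 0.0, 0.0))
```

Note that the sum can exceed full scale when several channels are loud at once, so a real down-mix would also need gain scaling or limiting.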
See also  Covid: 60 deaths, 22,527 positive. Rate at 13.6% - Medicine

After all, the human ear localizes sound very accurately (otherwise we would probably never find the phone that fell somewhere in the room…). Channel based keeps adding channels to improve fidelity, yet it cannot completely solve the difficulties that come with them. So around 2012, a new approach came out: Object Based.

Object Based

Object based records how loud each “sound object” is and in which direction, keeping the complete information for every object.

Vendors and techniques:
  • Dolby Atmos
  • VBAP (Vector Base Amplitude Panning)
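To give a feel for how an object's direction turns into speaker levels at playback time, here is a minimal sketch of textbook two-dimensional VBAP for a single pair of loudspeakers. It is not how any particular product renders objects. The function names, the angle convention (0 degrees is straight ahead, positive angles are to the left), and the power normalisation are assumptions made for this illustration.

```python
import math


def _unit_vector(azimuth_deg):
    """Unit vector for an azimuth: 0 is front, positive is toward the left."""
    a = math.radians(azimuth_deg)
    return math.cos(a), math.sin(a)


def vbap_2d(source_deg, speaker1_deg, speaker2_deg):
    """Return power-normalised gains (g1, g2) for one loudspeaker pair.

    Solves p = g1 * l1 + g2 * l2 for the gains, where p is the source
    direction and l1, l2 are the speaker directions.
    """
    px, py = _unit_vector(source_deg)
    l1x, l1y = _unit_vector(speaker1_deg)
    l2x, l2y = _unit_vector(speaker2_deg)
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("the two speakers must point in different directions")
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det
    # Scale so that g1^2 + g2^2 == 1 (constant power while panning).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


if __name__ == "__main__":
    # Source straight ahead between speakers at +30 and -30 degrees.
    print(vbap_2d(0.0, 30.0, -30.0))
```

A source outside the arc between the two speakers yields a negative gain in this sketch; a full renderer would pick a different speaker pair instead.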

For example, the State Guest Theater in Taipei has an auditorium that supports Dolby Atmos!

Advantages and disadvantages of application


Advantages:
  • Listening: Because it skips thinking in channels and simply records the sound information of each object, positioning works very well. Different playback setups can each reproduce the object positions, and traditional Channel Based software and hardware can be supported too, usually needing only a small firmware update before use.
  • Post-production, recording: Recording does not need to consider where the source really was, which suits movie sound effects that are created from scratch.

As the Dolby Atmos plugin shows, the complex panning options are gone, replaced by intuitive directional control in three-dimensional space.



Disadvantages: each sound source is concentrated at a point, so authenticity is hard to maintain, and the files are large.

Although the Object Based format positions sound well, it cannot easily avoid large files, and high production cost follows. This brings us to the focus of this article: Scene Based.

Scene Based

Scene based records how much sound information each “scene” contains. The complete scene is captured from its center, and information about individual objects is not recorded.


Currently the most common scene-based format is ambisonic B-format; ambisonic is one kind of Scene Based format.

RODE NT-SF1 1st order ambisonic microphone

The ambisonic microphone extends the M/S processing recording method, and it looks very different from other microphones. It has at least 4 capsules, which gives 1st Order Ambisonic (FOA). These four capsules do not correspond to playback channel directions; together they record the entire 360-degree scene as ambisonic A-format. But before the jargon piles up, let's look at what M/S processing is!

Simply put, it uses two mono microphones. M stands for Mid: a microphone facing the front (cardioid, omnidirectional, or bidirectional). S stands for Side: a bidirectional microphone placed perpendicular to the front. After processing, the result sounds wider than the actual speaker placement.
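The processing step of M/S is a simple sum and difference. The sketch below assumes the usual decode, left = M + S and right = M - S, with an optional width factor on the side signal. The function name and the `width` parameter are illustrative, and many tools apply an extra 0.5 scaling that is left out here.

```python
def ms_to_lr(mid, side, width=1.0):
    """Decode mid/side sample lists into left/right sample lists.

    width scales the side signal: 0.0 collapses the result to mono,
    values above 1.0 make the stereo image wider.
    """
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    mid = [1.0, 0.5, 0.0]
    side = [0.5, 0.0, -0.25]
    print(ms_to_lr(mid, side))
    print(ms_to_lr(mid, side, width=0.0))  # left and right become identical
```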

Applied to an FOA microphone, the idea covers not only left-right width but also the up-down and front-back dimensions.

4 channels = the sound information of the full 360-degree scene
We usually write them as WXYZ

W: omnidirectional
X: bidirectional, front and rear
Y: bidirectional, left and right
Z: bidirectional, up and down
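To make the four components concrete, the sketch below places a mono sample at a given direction and computes its W, X, Y, and Z values. It assumes azimuth measured from the front with positive angles to the left, elevation positive upward, and a W gain of 1 (the SN3D convention; FuMa would scale W by 1/√2). The function name is illustrative.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order W, X, Y, Z components."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return {
        "W": sample,                               # omnidirectional
        "X": sample * math.cos(az) * math.cos(el),  # front (+) / rear (-)
        "Y": sample * math.sin(az) * math.cos(el),  # left (+) / right (-)
        "Z": sample * math.sin(el),                 # up (+) / down (-)
    }


if __name__ == "__main__":
    print(encode_first_order(1.0, 0.0))        # straight ahead: mostly X
    print(encode_first_order(1.0, 90.0))       # hard left: mostly Y
    print(encode_first_order(1.0, 0.0, 90.0))  # overhead: mostly Z
```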

AmbiX or FuMa
If you read the product descriptions of some ambisonic microphones, you will see features that emphasize being able to flip the orientation along with the device.

There are two reasons. First, ambisonic recording is often combined with 360-degree camera equipment. Since the microphone sits in the center of the scene, it picks up the sound of the camera operating, and the camera also captures the microphone. Second, different formats define directions differently. The most commonly used formats are AmbiX and FuMa, and the main difference between them is the channel order: AmbiX is WYZX, FuMa is WXYZ. Fortunately, plugins can convert directly between these formats.
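For first order, the conversion a plugin performs between the two formats is small enough to sketch. Besides the channel order described above, the two conventions also differ in normalisation: FuMa stores W attenuated by 3 dB (a factor of 1/√2), while AmbiX uses SN3D, where W has unit gain. The function names below are illustrative, and the sketch only covers first order.

```python
import math

SQRT2 = math.sqrt(2.0)


def fuma_to_ambix(w, x, y, z):
    """Convert one first-order FuMa frame (W, X, Y, Z) to AmbiX (W, Y, Z, X)."""
    return (w * SQRT2, y, z, x)


def ambix_to_fuma(w, y, z, x):
    """Convert one first-order AmbiX frame (W, Y, Z, X) to FuMa (W, X, Y, Z)."""
    return (w / SQRT2, x, y, z)


if __name__ == "__main__":
    fuma_frame = (0.5, 0.1, 0.2, 0.3)
    ambix_frame = fuma_to_ambix(*fuma_frame)
    print(ambix_frame)
    print(ambix_to_fuma(*ambix_frame))  # back to the FuMa frame
```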

Encoding and decoding
As M/S processing already suggests, what the microphone records is not something we can drop into a DAW and hear as full surround sound. The microphone's format must be converted into the Scene Based format we actually want to hear before it can be used. This sounds complicated, but you usually do not have to do this step by hand. For FOA, it is enough to know that A-format is what the microphone records and B-format is the format after conversion; the software or plugin supplied with the microphone does the job.
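At its core, the A-format to B-format step is again a set of sums and differences, much like M/S. The sketch below assumes a tetrahedral microphone whose four capsules point front-left-up, front-right-down, back-left-down, and back-right-up. The capsule names and the function name are illustrative. Real converters also apply gain scaling and filters that compensate for capsule spacing, which are specific to each microphone and omitted here.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one frame of tetrahedral A-format capsule samples to W, X, Y, Z.

    flu: front-left-up     frd: front-right-down
    bld: back-left-down    bru: back-right-up
    """
    w = flu + frd + bld + bru   # everything: omnidirectional
    x = flu + frd - bld - bru   # front minus back
    y = flu - frd + bld - bru   # left minus right
    z = flu - frd - bld + bru   # up minus down
    return w, x, y, z


if __name__ == "__main__":
    # Sound reaching only the two front capsules shows up in W and X.
    print(a_to_b_format(1.0, 1.0, 0.0, 0.0))
```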

See also  Tumor of 70 kilos removed from a patient, record surgery at the Le Molinette hospital in Turin

HOA (Higher Order Ambisonic)
The localization accuracy of an ambisonic microphone depends on the number of capsules. The microphone described above is 1st Order. Following the spherical harmonic functions, 2nd Order needs at least 9 capsules and 3rd Order needs at least 16.

Layer 1: W, omnidirectional
Layer 2: 1st order
Layer 3: 2nd order
Layer 4: 3rd order
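The numbers 4, 9, and 16 follow from the spherical harmonics: order n adds 2n + 1 components, so a full set up to order N has (N + 1)² channels. A small sketch, with illustrative function names:

```python
def components_in_layer(order):
    """Number of spherical-harmonic components added by a single order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return 2 * order + 1


def ambisonic_channel_count(order):
    """Total number of channels for a full set up to the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


if __name__ == "__main__":
    for order in range(4):
        print(order, components_in_layer(order), ambisonic_channel_count(order))
```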

In recent years many manufacturers have launched ambisonic microphones, such as Sennheiser's AMBEO VR microphone and ZOOM's H3-VR recorder. There are also higher-order microphones for different purposes, such as the 3rd Order ambisonic ZYLIA.

Sennheiser AMBEO VR MIC
The 3rd order ambisonic ZYLIA can separate different instruments in software so they can be mixed later.
A handheld recorder with built-in FOA needs no external recorder and is suitable for beginners.

Hybrid recording


However, the ambisonic microphone has its limits. Sitting in the center of the scene, it can hardly capture every sound in detail. The three approaches are not trying to replace one another, and they can be combined: in addition to the ambisonic microphone, separate spot microphones are set up so the balance can be adjusted in post-production. You can listen to the difference between them.

The strength of the ambisonic format lies in its high compatibility with other formats during post-production. It accommodates the other approaches well: mono, surround, and object-based material can all be transcoded, then adjusted and edited together. It can also be easily exported to a non-ambisonic format for a variety of listening situations. On the other hand, you need to understand the complex formats, the software functions, and the platforms and their limitations.
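One simple way to export to a non-ambisonic format is to point "virtual microphones" into the recorded scene. The sketch below derives a stereo pair from the horizontal B-format components using two virtual cardioids. It assumes a W gain of 1 (SN3D/AmbiX style), X pointing forward, and Y pointing left. The function names and the default aiming angle are illustrative, and production decoders are more elaborate.

```python
import math


def virtual_cardioid(w, x, y, azimuth_deg):
    """Signal of a virtual cardioid microphone aimed at azimuth_deg.

    0 degrees is straight ahead, positive angles are to the left.
    """
    az = math.radians(azimuth_deg)
    return 0.5 * (w + x * math.cos(az) + y * math.sin(az))


def b_format_to_stereo(w, x, y, aim_deg=90.0):
    """Decode one horizontal B-format frame to a (left, right) pair."""
    left = virtual_cardioid(w, x, y, aim_deg)
    right = virtual_cardioid(w, x, y, -aim_deg)
    return left, right


if __name__ == "__main__":
    # A source encoded hard left (W=1, X=0, Y=1) goes to the left output.
    print(b_format_to_stereo(1.0, 0.0, 1.0))
    # A source encoded straight ahead (W=1, X=1, Y=0) is split evenly.
    print(b_format_to_stereo(1.0, 1.0, 0.0))
```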

It is very important to understand which formats are most commonly used and what kind of equipment you will need. Although ambisonic has higher-order microphones that improve the accuracy of sound positioning, the mainstream platforms that currently support 360 video, such as YouTube and Facebook, differ from one another in the formats they support.

If ambisonic audio is edited in a DAW in FOA format, the DAW must support 4-channel tracks. Here the channels are not playback channels in the channel-based sense but the four channels recorded by the FOA microphone. The best-known DAW with this support is probably Pro Tools, but REAPER deserves a special mention because it is easier to get started with, which is good news for beginners!

Besides the conversion software for the microphone itself, Waves has launched plugins for editing, transcoding, and monitoring ambisonic audio, and Google has launched Resonance Audio, a set of cross-platform development tools that can be used in web pages, applications, DAWs, and other environments.

For listening you need any stereo headphones, plus hardware, a platform, or a device that supports the ambisonic audio format (e.g. YouTube, Facebook).

Speakers and Headphones


The biggest difference between listening to 3D sound over speakers and over headphones is whether our own ears do the localization. Speakers play the sound into the whole space, while headphones send it directly into the ear canal. If ambisonic audio is transcoded to 5.1, for example, we can move and turn freely among the speakers and hear where each sound comes from, but this is obviously not the best way to listen to ambisonic. Ideally the speakers would be arranged on a sphere around the listener, which is obviously difficult for ordinary people to achieve. The most common method is headphones, using the Binaural format.


Binaural is also a recording method: a microphone that simulates a human head records the sound just as a person would hear it. A problem appears when ambisonic audio is transcoded to binaural. Once the headphones are on, the sound stays fixed in place no matter how you turn your head, and without a way to make the sound follow your movements, the headphones sound like ordinary stereo. (Correction: Binaural audio simply uses stereo playback, e.g. headphones, to reproduce sound in 3D. When ambisonic audio is transcoded to binaural, head tracking can make the sound move relative to the head, but the result will still seem unnatural, which has to do with how the human ear receives and recognizes the direction of sound.) The movement must therefore include a correction that models how the ears hear a sound as it moves through space. This correction can be thought of as a filter that simulates the ear, and it is called the HRTF (Head-Related Transfer Function).
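The head-tracking part can be sketched as a rotation of the B-format scene before it is rendered to binaural: when the head turns one way, the sound field is rotated the other way by the same angle. The sketch below handles only yaw (turning left or right) and assumes W, X, Y, Z components with X pointing forward and Y pointing left. The function name is illustrative, and the HRTF filtering itself is not shown.

```python
import math


def compensate_head_yaw(w, x, y, z, head_yaw_deg):
    """Rotate one B-format frame to cancel a head rotation about the vertical axis.

    head_yaw_deg is positive when the listener turns to the left.
    W (omnidirectional) and Z (up/down) are unaffected by yaw.
    """
    t = math.radians(head_yaw_deg)
    x_rot = x * math.cos(t) + y * math.sin(t)
    y_rot = -x * math.sin(t) + y * math.cos(t)
    return w, x_rot, y_rot, z


if __name__ == "__main__":
    # A source straight ahead (X=1). After the listener turns 90 degrees
    # to the left, it should be heard on the right (negative Y).
    print(compensate_head_yaw(1.0, 1.0, 0.0, 0.0, 90.0))
```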

See also  free tours and concerts, that's where

Our ears are actually very sensitive and complex. Just as a pair of eyes lets us judge the distance and size of objects in front of us, a pair of ears lets us easily locate a sound source in three-dimensional space. The HRTF combines several cues; the more important ones are as follows.

HRTF (Head-Related Transfer Functions)

Interaural Time Difference (ITD)
The difference in the time at which a sound reaches the two ears. For example, a sound from straight ahead reaches both ears at the same time. A sound from the right travels farther to reach the left ear than the right ear, by roughly the width of the head.
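As a rough worked example, the ITD can be estimated from the extra path length to the far ear. The sketch below uses the simplest model, a path difference of head width times the sine of the azimuth, and ignores the way sound bends around the head. The head width of 0.18 m and the speed of sound of 343 m/s are assumed typical values, and the function name is illustrative.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees Celsius
HEAD_WIDTH_M = 0.18         # assumed: typical ear-to-ear distance


def itd_seconds(azimuth_deg, head_width_m=HEAD_WIDTH_M):
    """Approximate interaural time difference for a distant source.

    azimuth_deg: 0 is straight ahead, positive is toward the left.
    Returns a positive value when the left ear hears the sound first
    and a negative value when the right ear hears it first.
    """
    path_difference_m = head_width_m * math.sin(math.radians(azimuth_deg))
    return path_difference_m / SPEED_OF_SOUND_M_S


if __name__ == "__main__":
    for azimuth in (0.0, 30.0, 90.0, -90.0):
        print(azimuth, round(itd_seconds(azimuth) * 1000.0, 3), "ms")
```

With these assumed values, a sound from directly to one side arrives about half a millisecond earlier at the nearer ear.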

Interaural Amplitude Difference (IAD), the volume difference between the two ears
Besides the difference in arrival time, the head also blocks and absorbs part of the sound, so the volume heard by the left and right ears differs as well.
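The volume difference is usually expressed in decibels. The sketch below only shows that arithmetic for two measured amplitudes; it does not model how the head shadows the sound, and the function name is illustrative.

```python
import math


def level_difference_db(left_amplitude, right_amplitude):
    """Level difference between the ears in dB.

    Positive when the left ear receives the louder signal.
    Both amplitudes must be positive linear values.
    """
    if left_amplitude <= 0.0 or right_amplitude <= 0.0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(left_amplitude / right_amplitude)


if __name__ == "__main__":
    # The left ear hears twice the amplitude of the right ear: about 6 dB.
    print(round(level_difference_db(1.0, 0.5), 2))
```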

Cone of confusion


However, these two values alone can still lead to misjudgment. For example, a source directly in front and a source directly behind produce the same ITD and IAD, so the ear must rely on other cues.

Auricle diffraction effect
You can test the importance of the auricle yourself: gently cover your ears and ask someone to shake a key ring while moving it up and down at a fixed distance. You will find it is actually a little hard to tell where it is. The reflections of sound among the complex folds of the auricle are one of the important factors in how we identify position.

These cues all come from the human body itself, so they differ slightly from person to person: the size of the head, the structure of the ears, and so on. To reproduce a complete ambisonic scene faithfully over headphones, everyone would need an easy way to measure their own HRTF; until then, high accuracy remains difficult.

Possibility of listening to Binaural Audio with stereo speakers
As with the video at the beginning, an immersive experience is hard to share with other people: each person can only experience it by wearing a device, and building a complete ambisonic monitoring system is impractical. If stereo speakers play the binaural audio directly, the left ear also hears the sound intended for the right ear (this phenomenon is called crosstalk), so the result is somewhat distorted.


This technology still seems to be at a stage with many limitations and has not been widely adopted. At the same time, demand for sound is less urgent than demand for visuals, so the pursuit of the ultimate sound seems less pressing.

If you had such a tool, what would you want to use it for?
