EXTENDING THE HUMAN ACOUSTIC MODEL
We have seen that we can create microphones
and loudspeakers that capture and reproduce sound sources placed in space.
These reproduced sources could then be treated as real – measured and recorded,
or listened to. We have avoided the human perception aspect. We now look at
this.
Lord Rayleigh published his paper on the
directional nature of sound perception in 1907 [[1]]. This work explained binaural hearing behaviour using a model
considering the ears as simple pressure sensing microphone elements. Two
variables were used – Inter-aural Time Delay (ITD) for the additional distance
the sound wave had to travel around the head and Inter-aural Intensity (level)
Difference (IID or ILD) for the sound source proximity to the nearest ear.
This model essentially recognised that two
pressure sensing elements separated by a small distance along a line can
selectively determine the direction of a sound source.
A primitive model of hearing
We will use an electro-acoustic analogy of
human acoustic perception to help determine exactly what the essential “system”
requirements are. This will provide further insights into how acoustic
perception could work.
We start with the basic configuration
described above. Two microphone elements are spaced 170 mm apart in free space
to model a primitive human head. The electrical outputs from the microphone
elements are summed.
Figure 1 shows the directional behaviour of
such a model. The graphs represent the measured output from a point sound source
moved around the “head” at various frequencies. To clarify the nature
of the graphs with respect to path difference, the sound level received at each
element has been adjusted in the model to be the same. In practice the levels
are different because of proximity and shadowing. This means that the nulls are
not as deep or as regular in appearance. The full three-dimensional graphs
extend the lobes to toroidal or doughnut shaped responses.
The maximum path difference between the
elements occurs when the sound source is to the extreme left or the right. At
approximately 1 kHz, the path difference from the source to each element has
become half the wavelength and so the summed signal has a cancellation null or
dip appear that occurs symmetrically to the left and the right of the elements
(the head).
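The 1 kHz figure can be checked with a short calculation. The sketch below is illustrative only: it assumes a far-field (plane wave) source, equal levels at both elements as in the adjusted model above, and a speed of sound of 343 m/s, which is not stated in the text. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
SPACING = 0.170         # m, element spacing given in the text


def first_null_frequency(spacing_m=SPACING, c=SPEED_OF_SOUND):
    """Frequency at which the maximum path difference is half a wavelength."""
    return c / (2.0 * spacing_m)


def summed_level(freq_hz, angle_deg, spacing_m=SPACING, c=SPEED_OF_SOUND):
    """Magnitude of the sum of two equal-level element signals.

    angle_deg is measured from straight ahead, so 90 is the extreme
    left or right, where the path difference equals the full spacing.
    """
    path_difference = spacing_m * math.sin(math.radians(angle_deg))
    return 2.0 * abs(math.cos(math.pi * freq_hz * path_difference / c))


f_null = first_null_frequency()
print(round(f_null))                # 1009, i.e. approximately 1 kHz
print(summed_level(f_null, 0.0))    # full level straight ahead
print(summed_level(f_null, 90.0))   # essentially zero to the side
```

Under these assumptions the first null appears to the sides at roughly 1 kHz, while a source straight ahead is unaffected because there is no path difference.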
As the frequency is further increased the
path difference needed for half-wave cancellation decreases and so the null or dip
location “moves around the head” with front-back symmetry. At the same time the
next highest frequency of cancellation corresponding to three half wavelengths
occurs to the left and the right and so multiple nulls now “appear”, and move
to the front and back. As the frequency is increased, more and more nulls or
dips appear and move to the front and the back. At any nulls, the particular
frequency is theoretically missing completely in that direction – there is an
acoustic blind spot. No amount of electrical equalisation of the overall or
summed output signals will recover signals from such nulls.
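The way the nulls multiply and migrate with frequency can be sketched numerically. This is a minimal illustration, again assuming a far-field source, equal element levels and a speed of sound of 343 m/s; the function name is hypothetical. Angles are measured from straight ahead, and each value returned is mirrored left/right and front/back by symmetry.

```python
import math


def null_angles_deg(freq_hz, spacing_m=0.170, c=343.0):
    """Angles (0 to 90 degrees from straight ahead) where the sum cancels.

    Cancellation needs a path difference of an odd number of half
    wavelengths: spacing * sin(angle) = (2n + 1) * c / (2 * f).
    """
    angles = []
    n = 0
    while True:
        sine = (2 * n + 1) * c / (2.0 * freq_hz * spacing_m)
        if sine > 1.0:
            break  # this odd multiple does not fit within the spacing
        angles.append(math.degrees(math.asin(sine)))
        n += 1
    return angles


for f in (500.0, 2000.0, 4000.0):
    print(f, [round(a, 1) for a in null_angles_deg(f)])
```

With these assumed values there is no null at 500 Hz, one null direction per quadrant at 2 kHz, and two at 4 kHz, with the first null lying closer to the front-back line at the higher frequency.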
Human hearing does not exhibit these
periodic complete nulls. The primitive model will need augmenting by some means
to more accurately represent human hearing, but the result of path differences
can be clearly seen.
Improving the situation with electrical filtering
We now consider whether electrically
treating the signals from each element separately before summing would affect
the directionality. The answer is that it can.
The simplest example to show how this could
work only changes the response for sounds in one particular direction.
The signal from the nearest element is
delayed by the propagation time to the furthermost element. The time delay
would effectively mean that there was no path difference between the elements
and so the output from the combined elements was independent of frequency. This
would only apply for a point sound source in one direction. As the source
rotated away from this direction, nulls would reappear. This is not a very
useful apparatus unless you can do one of three things: continually move the
array to align with the source (turn the head); derive delayed signals for
every direction and then determine the relevant delay to use for each source
direction; or have the luxury of signals from many “ears” around the head, so
that any nulls fall outside the frequency range of consideration. It does,
however, demonstrate three important aspects:
1. Microphone design strategies can either minimise
or compensate for design dimensions.
2. The directional response can be altered by
electrical signal processing of the individual element signals in isolation.
Nulls and dips can be shifted but not necessarily eliminated because, for this
geometry of elements the effect of the path difference cannot be removed in all
directions simultaneously by this form of signal processing alone.
3. The electrical outputs of the microphone
elements can be simultaneously electrically
processed to obtain more than one output, each with a different directional
behaviour. In the above case both the untreated response with nulls and the
delayed response with no nulls to the left (or the right) would be available.
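The effect of delaying the nearest element can be illustrated with a small calculation. This is a sketch under stated assumptions (far-field source, equal element levels, speed of sound 343 m/s), not a description of any particular apparatus; the function name and the steering-angle parameter are hypothetical.

```python
import math


def steered_level(freq_hz, source_deg, steer_deg, spacing_m=0.170, c=343.0):
    """Summed level when the nearer element is delayed for one direction.

    The applied delay equals the propagation time across the spacing for
    a source at steer_deg, so only the residual time difference between
    the actual and the steered direction remains in the sum.
    """
    residual = spacing_m * (math.sin(math.radians(source_deg))
                            - math.sin(math.radians(steer_deg))) / c
    return 2.0 * abs(math.cos(math.pi * freq_hz * residual))


# Steer to the extreme side (90 degrees) and compare two source positions.
for f in (500.0, 1000.0, 2000.0, 4000.0):
    on_target = steered_level(f, 90.0, 90.0)
    straight_ahead = steered_level(f, 0.0, 90.0)
    print(f, round(on_target, 3), round(straight_ahead, 3))
```

In this model the output is independent of frequency only for the steered direction. A source straight ahead now sees the comb response instead, which is the sense in which nulls are shifted rather than eliminated.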
A different approach will be required if
acoustic blind spots (nulls) and dips are to be eliminated, but electrical
processing is always available to assist with processing should the need arise.
When this model is translated to the human
hearing case the necessary electrical signal processing becomes the
responsibility of the brain. In particular, where frequency dependent
processing is required, such as in the case of time delay and phase filtering
of multiple frequencies, the brain will need to receive signals analysed into
frequency bands by some means (cochlea). Significant signal processing
“brainpower” would be required to undertake the processing, and some forms of
both long term and dynamic temporal storage of the processing artifacts would
need to be included in the brain along the way.
Improving the situation with directional acoustic filters
It is also possible to treat acoustic waves
arriving at each microphone element with external, purely acoustic filters.
External acoustic filters have a
significant advantage over pure electrical processing in that they can be
physically constructed to vary phase and amplitude with direction. Electrical
filter processing is limited to working with the signals from the microphones
and therefore has limited processing capability for direction. Arrays of
elements can be used to synthesise directional response by electrical means
alone, but the highest frequency of operation is limited by the element spacing
and so multiple elements are required, particularly if either high directivity,
full coverage over all space or a combination of both is required. External
acoustic filters will of necessity have to take into account the acoustic
behaviour of the overall microphone structure. This is also the case with
multi-element arrays using solely electrical processing.
Many
external acoustic filter treatments are possible. For example, with a spherical
geometry head having just two elements, it is possible to remove periodic
frequency dependent nulls and dips completely using an acoustic filter that
exactly compensates for the path length difference between the elements around
the sphere. A suitable filter for each element is based on a quarter wave
transmission tube. Whilst this filter structure is not in itself directional,
it does create a directional response on the sphere. On the median plane there
is no path difference, and on the axial line the path length difference is
exactly compensated. There will still be amplitude and phase errors with
direction (and distance) between these locations, but these are now less
severe. Figure 2 shows a graph of the summed element output response for a two
element spherical microphone showing the response anomalies related to path
length differences on-axis, an ideal ¼ wave compensation filter response and
the resultant overall response (red).
The remaining frequency response errors
that occur with direction can now be treated by using additional acoustic
filters that provide phase shift with frequency that varies with direction. A
separate filter will be required for each element. These filters will be
symmetrical both with regard to each other and independently as they are each
correcting for a symmetrical geometry spherical head.
When a restricted number of sensing
elements (two) are used, passive acoustic structures are necessary as these are
the only method of providing the necessary direction dependent phase shifts.
Electrical processing of the outputs from the two elements cannot achieve this.
Once the response correction has been
achieved in all directions, no null or dip producing phase shifts will occur in
the summed output of the two elements as sound sources move around the
microphone and as the sources move closer and further away. It is still
possible to process the difference signal and obtain divergence (sound source
distance) information at the same time, though the use of two elements placed
on a diameter would limit the usefulness of the approach for capturing source
distance information as no divergence information would be able to be captured
anywhere on the median plane.
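The median plane limitation can be seen in a simplified free-field calculation. The sketch below assumes two point elements with no sphere between them, a point source with 1/r spreading, and a speed of sound of 343 m/s; it stands in for the corrected element signals only loosely. The x axis runs through the two elements and the y axis points straight ahead, so the median plane is x = 0. The function name is hypothetical.

```python
import cmath
import math


def difference_level(src_x, src_y, freq_hz, spacing_m=0.170, c=343.0):
    """Magnitude of (right - left) element pressure for a unit point source."""
    k = 2.0 * math.pi * freq_hz / c          # wavenumber
    half = spacing_m / 2.0
    r_left = math.hypot(src_x + half, src_y)   # element at x = -half
    r_right = math.hypot(src_x - half, src_y)  # element at x = +half
    p_left = cmath.exp(-1j * k * r_left) / r_left
    p_right = cmath.exp(-1j * k * r_right) / r_right
    return abs(p_right - p_left)


# On the median plane both paths are equal at any distance.
print(difference_level(0.0, 0.5, 1000.0), difference_level(0.0, 5.0, 1000.0))
# Off the median plane the difference signal is non-zero.
print(difference_level(0.5, 0.0, 1000.0))
```

Because the two path lengths are identical everywhere on the median plane, the difference signal is zero there whatever the source distance, so it carries no distance information in that plane.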
To overcome this limitation, the microphone
elements can be moved away from their diametrically opposed locations. This
relocation will require adjustment of the acoustic equalisation filters as now
the path length differences have changed to become dependent on direction. Many
combinations of ¼ wave equalisers and directional phase filters are possible.
The ¼ wave stub equalisers could be adjusted to compensate the dominant path
difference that is now the shortest distance on the sphere surface. The
directional phase filters on each element can then equalise the remaining
directionally dependent phase shifts, noting that now the filters will be
mirror images of each other but independently asymmetric.
By this means it is possible to approach
the uniform response capability of the symmetric case, but it may not be as
effective in removing all peaks and dips with direction because of the variable
shadowing now present. The advantage, however, is that now the difference
signals will provide divergence (distance related) output in all directions.
The acoustic filters correct the
amplitude/phase differences between the two
microphone elements and so their design would need to take account of the basic
physical configuration, including both the sphere diameter and the microphone
offset from the diameter. An iterative design strategy is thus most likely required. Where the
basic shape varied from a pure sphere, the principles outlined above could be
applied to develop the appropriate equalisation that preserves both distance
and direction information for sound sources placed anywhere.
Sets of filters could be designed such that, for either sensor,
there was no direction over 360 degrees in any plane where
unacceptable loss of frequency band amplitude and phase information occurred,
and such that directional and distance related information could be determined.
These filters would exhibit physical mirror symmetry when used to compensate
spherical shapes.
This is fully analogous to the nature of
human hearing, the ear canals and the ears on the human head. In profile it can
also be seen that the human head indeed does have offset ear locations.
The electronic head
For humans, certain directions may need
preferential coverage at the expense of performance in other directions,
and certain frequency bands may be of greater interest than others. This would
enable simplification and customisation of the structures and the subsequent
signal filters.
Electrical signal processing of the outputs
from the individual elements by the digital signal processing “brain” would
also be available for assistance in the design process [[2]].
One approach would be to design acoustic
filters that minimise the amount of subsequent processing power required to
extract the required source location information. Acoustic filters are just
topological structures and can be replicated, whereas “brainpower” is usually
in demand for other tasks.
Each directional filter microphone (ear)
should now be able to discern direction information on its own, taking into
account the acoustic influence of the head and torso shape. The mechanism would
use the frequency band analysis and the memory requirements
identified previously. This is the first significant extension to the Rayleigh
model.
The originally simple two-microphone
element model can now be split into the arriving acoustic wavefront part (that
is already known to carry information about both the distance and the direction
of all sources), directional acoustic filters, the microphone elements with
spectral analysis capability and the processing required by the central
processor “brain”.
This logically leads to considering what
information can be extracted and what subsequent “brainpower” processing is
needed.
We have previously seen that a divergent
acoustic wavefront contains both distance and direction information, so we now
consider a perception model based on these parameters - a distance and
direction model. We already know that both distance and direction will need to
be considered together in order to fully locate sound sources. Their separate
consideration will now lead to a significant potential extension of the present
hearing models.

Direction sensing issues
If a point source with sufficiently broad
spectral content is measured by a single microphone element on a sphere with
external directional acoustic filters as described previously, sufficient
information is available from the wavefront to determine the source direction.
This requires the filter/element combination to have been previously calibrated
for directionality at all frequencies, and these calibrations stored in a
suitable form to be available at all times for comparison. In humans this is
described as a learned experience.
The frequency band outputs will then only
be consistent for one physical direction, and so the source direction can be
uniquely determined.
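The comparison against stored calibrations can be sketched as a simple template match. Everything numerical here is invented for illustration: the six bands, the 10 degree direction grid and the cosine-shaped band gains are hypothetical stand-ins for a real filter/element calibration, and the source is assumed to have equal energy in every band apart from an unknown overall level.

```python
import math

NUM_BANDS = 6                             # hypothetical number of bands
DIRECTIONS_DEG = list(range(0, 360, 10))  # hypothetical calibration grid


def band_gains(direction_deg):
    """Hypothetical direction-dependent band gains of one 'ear'."""
    theta = math.radians(direction_deg)
    return [1.0 + 0.4 * math.cos(theta - 2.0 * math.pi * k / NUM_BANDS)
            for k in range(NUM_BANDS)]


# The stored ("learned") calibration: one band profile per direction.
CALIBRATION = {d: band_gains(d) for d in DIRECTIONS_DEG}


def normalise(values):
    """Scale a band profile to unit length so overall level is ignored."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def estimate_direction(band_levels):
    """Return the calibrated direction whose profile best matches."""
    observed = normalise(band_levels)

    def score(direction):
        stored = normalise(CALIBRATION[direction])
        return sum(o * s for o, s in zip(observed, stored))

    return max(DIRECTIONS_DEG, key=score)


# A flat-spectrum source of unknown level arriving from 130 degrees.
observed_levels = [3.7 * g for g in band_gains(130)]
print(estimate_direction(observed_levels))  # 130
```

The match is unique here only because every stored profile is different and the source fills all bands. A source with too little spectral content would leave several directions equally plausible.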
The source has been assumed to be well
behaved and at a point. The nature of real sources can disrupt the direction
determination under certain conditions:
· The source does not contain sufficient spectral information
· The source is not a point
· The source is corrupted by specular reflections
· The source varies its phase response with frequency
Reflections or coherent phase anomalies in
the propagation path can also disrupt the determination.
Direction determination will require
significant processing in the “brain” and reference to a learned memory. This
processing will also need to split the incoming signals into frequency bands
during processing in order to determine the direction.
Thus there are two significant and distinct
processing activities required, and each activity will need access to stored
information – the compensation for the acoustic “ear” filter response, and the
split frequency band processing needed to locate each broadband sound source
direction. Each processing step will require some reference to learned or
stored information.
The significance of this will be seen in
the next section.
Distance sensing by series processing
We have seen that each of the filter/element combinations can
independently determine direction. With two “ears”, two directional determinations
are thus available. It should be possible to determine source distance from
these two determinations essentially by trigonometry with suitable additional
“brainpower” processing and with reference to a learned distances/directions
memory table.
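The trigonometry involved can be illustrated in two dimensions. This is a sketch only, assuming each “ear” reports an exact bearing, that the ears sit at either end of a 170 mm baseline, and that head shadowing is ignored. Bearings are measured from straight ahead, positive towards the right, and the function names are hypothetical.

```python
import math


def bearing_deg(ear_x, src_x, src_y):
    """Bearing of a source at (src_x, src_y) seen from an ear at (ear_x, 0)."""
    return math.degrees(math.atan2(src_x - ear_x, src_y))


def triangulate(bearing_left_deg, bearing_right_deg, spacing_m=0.170):
    """Intersect the two bearing lines; returns (x, y) or None if parallel."""
    t_l = math.radians(bearing_left_deg)
    t_r = math.radians(bearing_right_deg)
    denom = math.sin(t_l - t_r)
    if abs(denom) < 1e-12:
        return None  # parallel bearings: source too distant to range
    range_from_left = spacing_m * math.cos(t_r) / denom
    x = -spacing_m / 2.0 + range_from_left * math.sin(t_l)
    y = range_from_left * math.cos(t_l)
    return x, y


left = bearing_deg(-0.085, 0.3, 1.0)
right = bearing_deg(0.085, 0.3, 1.0)
x, y = triangulate(left, right)
print(round(x, 3), round(y, 3), round(math.hypot(x, y), 3))
```

The two bearings differ by only a few degrees for a source a metre away, so small errors in either direction estimate translate into large distance errors, which hints at why this route would be demanding.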
This would be a significant third
processing activity and memory requirement over the previous filter calibration
and direction determination requirements.
There is another way.
Distance sensing by simplified parallel processing
We know that the divergence or swelling of
the acoustic wavefront contains all the necessary information to determine
source distance.
The previous, simple acoustic filtered
model could use another processing method to more directly determine distance
information using the divergence of the wavefront.
The divergence or gradient information
essentially considers the change in the field and thus would logically start
with the difference signal from the two acoustic filtered elements. The
directivity nulls will still need to be removed with acoustic filters, as was
the case for the directional determination above, but then the difference
signal could be used for determination of distance.
This will use some processing brainpower,
but the directional filters are already available from the direction
determination processing and these could be used, minimising the additional
brainpower required. The assumptions would be that either the acoustic filter
correction was able to be done before the sum and difference processing, that
the filter corrections were duplicated or that some form of synchronised
parallel streams were used for the processing.
The key point is that after acoustic correction to remove nulls and
dips, the difference signal should contain the divergence information for
distance in all directions and this should not need a great deal of further
processing. In particular it does not need spectral analysis.
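One possible reading of this is that the ratio of the difference to the sum of the two corrected levels varies directly with source distance. The sketch below illustrates that reading for the simplest case only: a free-field point source with 1/r spreading, located on the line through both elements, with overall levels compared and no band splitting. These are assumptions for illustration, and the function names are hypothetical.

```python
def level_ratio(distance_m, spacing_m=0.170):
    """(near - far) / (near + far) for an on-axis source with 1/r levels.

    distance_m is measured from the midpoint between the elements and
    must exceed half the spacing.
    """
    near = 1.0 / (distance_m - spacing_m / 2.0)
    far = 1.0 / (distance_m + spacing_m / 2.0)
    return (near - far) / (near + far)


def distance_from_ratio(ratio, spacing_m=0.170):
    """Invert the relation ratio = spacing / (2 * distance)."""
    return spacing_m / (2.0 * ratio)


for d in (0.25, 1.0, 4.0):
    r = level_ratio(d)
    print(d, round(r, 4), round(distance_from_ratio(r), 4))
```

In this simplified case the ratio reduces to spacing / (2 × distance), so a single division recovers the distance. No table of calibrated band responses is consulted, which is consistent with the lighter processing load suggested here.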
A significant processing load for decoding
and comparison could now be left out by the “brain”, compared with processing
the two directional parts of the summed signals [[3]].
From the model, it
then makes sense that the primary human acoustic cue is
distance as this requires less “computer power” than direction based
processing. Sound source distance should show as a faster perceptual awareness,
and could thus be a trigger for cognitive control of direction processing.
Some evidence of this has already been
found for the Pre-associative Acoustic Store in humans [[4]]. The simple proximal test described in the box above also supports
this.
An overall
“wiring diagram” of human perception is suggested in figure 3, starting from
the “pink” acoustic correction filters on either side of a head and progressing
through to conscious perception.
The Rayleigh model can now be extended to
include symmetric directional acoustic filter structures, and we can be on the
lookout for parallel concurrent processing and both distance and direction
perception streams in the brain.
The
information contained herein is copyright to HuonLabs. No material can be
reproduced in its totality or in part without the express permission of
HuonLabs Pty Ltd. Any reference to this material must quote the HuonLabs
source. Trade marks and Patent applications apply to most aspects of the work
disclosed here. Contact HuonLabs for further details, product information or
licensing enquiries.