Natural auditory environments involve many different concurrent sounds. They all sum into a single mixed signal that enters the ear. How does the human brain extract information from this constant, overwhelming chaos of surrounding sound waves? This problem is commonly known as the Cocktail Party Phenomenon.
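To make the summation concrete, the following minimal sketch adds two synthetic sources sample by sample into one mixture. The sample rate, the tone frequencies, and the helper names `tone` and `mix` are illustrative assumptions, not part of any auditory model discussed here; real sound sources are far more complex than pure sine tones.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value, for illustration only)


def tone(freq_hz, duration_s=0.01, amplitude=1.0):
    """Return a sine tone as a list of samples (hypothetical stand-in for a sound source)."""
    n = int(SAMPLE_RATE * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def mix(sources):
    """Sum equally long source signals sample by sample into one mixture."""
    return [sum(samples) for samples in zip(*sources)]


# Two concurrent "sources" with different pitch and loudness.
voice_a = tone(220.0)
voice_b = tone(330.0, amplitude=0.5)

# The only signal available to the listener is this single sequence of samples.
mixture = mix([voice_a, voice_b])
```

The mixture carries no label saying which part of each sample came from which source. Recovering `voice_a` and `voice_b` from `mixture` alone is the core difficulty of the problem.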
The challenge for an auditory system in this situation is to determine which elements of the sound should be grouped together and treated as parts of the same source or object. Grouping them incorrectly can cause the listener to hear non-existent sounds built from the wrong combinations of the original components. The cocktail party problem is a significant challenge for computational audition and machine listening systems. The remarkable ability of the human brain to direct attention to the sounds of interest while ignoring the others, and to switch attention between sources in a multilayered sound environment, remains unmatched by machines to this day.
The following examples are taken from a Standard Computational Auditory Scene Recognition Test (SCASRT). In this test, humans were able to recognize 57 different scenes with an average accuracy of 78%. The best recognition rate of a machine listening system (attained by the Cithaeron Audition Model 3.1) was 56% for 33 different scenes.