There are different levels of "proximity" information we could provide:
- Distance only (no direction), conveyed by loudness alone. We could scale loudness according to a physically realistic function (and we should at least find out what that is), or we could use an artificial function to increase or decrease the range at which nearby regions become audible. This breaks down further into two sub-cases:
  - Continuous distance scaling. If the loudness of an object varies as a continuous function of distance, the user may be able to judge direction by moving the mouse and listening for the loudness gradient (though this may be difficult in practice).
  - Discrete scaling. The simplest version is to give each object an extra buffer region, within which its sound is audible but quieter. With discrete scaling, mouse motion does not reveal the direction to an object unless the boundary of a surrounding buffer is crossed.
- Directional. The most accurate directional audio requires headphones and an individually measured head-related transfer function (HRTF), but some left-right directionality can be achieved with a generic model, and even with ordinary stereo speakers. We sense the direction of high-pitched sounds better than that of low-pitched ones (which is why a surround-sound system has more tweeters and mid-range drivers than woofers, and why high-end stereo systems often use a single sub-woofer). High-quality directional audio is considerably more computationally intensive than simply varying loudness, but it is supported by program libraries (including some for Java) because it is used in games.
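The two loudness-scaling options can be sketched as simple distance-to-gain functions. Everything here is hypothetical (the class name, the reference distance, and the 0.4 buffer-zone gain are illustrative choices, not part of any existing design); the continuous case assumes an inverse-square rolloff as the "realistic" function, which is one plausible candidate.

```java
// Sketch of distance-only proximity loudness: continuous vs. discrete scaling.
// All names and constants are illustrative assumptions.
public class ProximityGain {

    // Continuous scaling: inverse-square rolloff, clamped so that gain is
    // 1.0 at or inside refDistance. The gradient of this function is what
    // a user would probe by moving the mouse.
    static double continuousGain(double distance, double refDistance) {
        if (distance <= refDistance) {
            return 1.0;
        }
        return (refDistance * refDistance) / (distance * distance);
    }

    // Discrete scaling: full volume inside the object's own boundary,
    // a single quieter level inside the surrounding buffer, silence beyond.
    // Mouse motion reveals nothing until a boundary is crossed.
    static double discreteGain(double distance, double objectRadius,
                               double bufferRadius) {
        if (distance <= objectRadius) {
            return 1.0;
        }
        if (distance <= bufferRadius) {
            return 0.4; // arbitrary reduced loudness for the buffer zone
        }
        return 0.0;
    }

    public static void main(String[] args) {
        // At twice the reference distance, inverse-square gives 1/4 gain.
        System.out.println(continuousGain(2.0, 1.0));
        // Inside the buffer (between radius 1.0 and 2.0): reduced gain.
        System.out.println(discreteGain(1.5, 1.0, 2.0));
    }
}
```

Note the trade-off the sketch makes visible: `continuousGain` changes smoothly with every mouse movement, while `discreteGain` produces only three audible states, so direction information arrives only at boundary crossings.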
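For the cheap end of the directional option, left-right positioning on stereo speakers can be done with a constant-power pan law, which keeps total loudness roughly steady as a sound moves across the stereo field. This is a sketch under assumptions (the class name and angle convention are mine), not a full HRTF implementation; the resulting per-channel gains could be applied via a mixer that exposes per-channel volume, such as a `javax.sound.sampled` line with a PAN control where the mixer supports one.

```java
// Sketch: constant-power stereo panning from a horizontal angle.
// Angle convention (an assumption): -PI/2 = hard left, 0 = center,
// +PI/2 = hard right.
public class StereoPan {

    // Returns {leftGain, rightGain}. Constant-power means
    // left^2 + right^2 == 1 at every angle, so perceived loudness
    // stays roughly constant as the source moves.
    static double[] pan(double angle) {
        double t = (angle + Math.PI / 2) / Math.PI; // map angle to [0, 1]
        double left = Math.cos(t * Math.PI / 2);
        double right = Math.sin(t * Math.PI / 2);
        return new double[] { left, right };
    }

    public static void main(String[] args) {
        double[] center = pan(0.0);       // both channels ~0.707
        double[] hardLeft = pan(-Math.PI / 2); // {1.0, 0.0}
        System.out.println(center[0] + " " + center[1]);
        System.out.println(hardLeft[0] + " " + hardLeft[1]);
    }
}
```

This gives only coarse left-right cues, nothing like the front/back and elevation cues an HRTF provides, but it is cheap enough to compute per frame alongside the loudness scaling above.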
The indirect link between this and Jake's idea about audio input is that the most accurate audio input comes from headset microphones. The built-in microphones in most computers are of poor quality and/or pick up noise from the computer itself. It should be possible to provide a good-quality microphone that is not part of a headset, but keeping it well positioned relative to a blind user could be a challenge.