Concept
Most sounds in audio clips are generated by physical systems.
- A voice is generated by a mouth, voicebox, and lungs
- A guitar sound is generated by a vibrating string and a reverberating box
The idea behind this visualization is to display procedurally animated representations of the real physical systems that produce the sound (mouth and voicebox, vibrating string, etc.).
The recognition stage would first have to be trained to recognize the sounds and model the system that produced them. Its output could then be connected to the visual models. (Could the procedural models feed into the training as well, constraining the physical recognition?)
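As a rough illustration of the recognize-then-animate pipeline, here is a minimal Python sketch for the guitar-string case. Everything in it is an assumption made for illustration: the function names (`synth_tone`, `estimate_pitch`, `string_shape`) are hypothetical, a simple autocorrelation pitch estimate stands in for the trained recognizer, and the visual model is reduced to the fundamental standing-wave mode of an ideal string fixed at both ends.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed value for this sketch


def synth_tone(freq_hz, duration_s=0.1, sample_rate=SAMPLE_RATE):
    """Generate a pure sine tone as a stand-in for a recorded audio clip."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_pitch(samples, sample_rate=SAMPLE_RATE, fmin=80.0, fmax=1000.0):
    """Stand-in for the trained recognizer: return the frequency whose
    lag maximizes the (unnormalized) autocorrelation within [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def string_shape(freq_hz, t, amplitude=1.0, points=9):
    """Procedural visual model: displacement of an ideal string (fixed at
    both ends, fundamental mode only) at time t, sampled at `points` positions."""
    phase = math.cos(2 * math.pi * freq_hz * t)
    return [amplitude * math.sin(math.pi * k / (points - 1)) * phase
            for k in range(points)]


# Recognition step feeds the visual model.
clip = synth_tone(200.0)
pitch = estimate_pitch(clip)
frame = string_shape(pitch, t=0.0)
```

A renderer would call `string_shape` once per animation frame with increasing `t`. A real version would need a learned model that also recovers parameters such as amplitude, damping, and pluck position, which this sketch does not attempt.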
