ABSTRACT
Existing violin performance datasets do not fully capture the intricate techniques of musical expression. To address this gap, we built an augmented violin system with a range of integrated sensors, yielding a multimodal dataset intended to support research in music information retrieval. The system simultaneously captures audio, video, bow pressure, bow tilt, bow and fingerboard position, violin and bow orientation, and room impulse response from ten advanced violinists performing studio-grade repertoire.
We further use hardware-captured features to correct the output of a state-of-the-art software pitch transcription algorithm, yielding highly accurate MIDI data enriched with legato bowing, bow speed, and contact point. Our dataset achieves a 62.8% improvement in note error rate over MUSC and a 14.6% relative reduction in perceived audio distance (Zimtohrli metric).
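The sensor-assisted correction step can be illustrated with a minimal sketch. All names here are our own illustration, not the actual pipeline: we assume the touch sensors yield a stopped-length ratio per string, from which the expected pitch follows (12·log₂ of the length ratio above the open-string pitch), and the software estimate is snapped to it when the two disagree.

```python
import math

# Open-string MIDI pitches for a violin in standard G-D-A-E tuning.
OPEN_STRING_MIDI = {"G": 55, "D": 62, "A": 69, "E": 76}

def expected_midi(string: str, stop_ratio: float) -> float:
    """Expected MIDI pitch from a touch-sensor reading.

    stop_ratio = vibrating length / open-string length, in (0, 1];
    1.0 means the string is played open.
    """
    return OPEN_STRING_MIDI[string] + 12 * math.log2(1.0 / stop_ratio)

def snap(software_midi: float, string: str, stop_ratio: float,
         tolerance: float = 0.5) -> int:
    """Override the software pitch estimate when it disagrees with the
    sensor-derived pitch by more than `tolerance` semitones."""
    sensor_midi = expected_midi(string, stop_ratio)
    if abs(software_midi - sensor_midi) > tolerance:
        return round(sensor_midi)   # trust the hardware reading
    return round(software_midi)
```

Halving the vibrating length of the A string, for example, raises the expected pitch by an octave (MIDI 69 → 81), so a software estimate of 70 on an open A string would be snapped back to 69.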
AUGMENTED VIOLIN SYSTEM
The augmented violin integrates several distinct sensor modalities directly onto a violin and bow, with all data routed through a Raspberry Pi Pico 2W MCU mounted on a custom protoboard.
Place photo here: img/violin-top.jpg
Place photo here: img/violin-bottom.jpg
Place photo here: img/violin-bow.jpg
Hardware System Summary
| Sensor | Model | Purpose |
|---|---|---|
| Touch sensors (×4) | TSP-L | Per-string finger position |
| Ultrasonic sensor | SR-04 | Lateral bow position from bridge |
| IMU — violin body | ICM-20948 | Violin orientation & movement |
| IMU — bow | ICM-20948 | Bow orientation & movement |
| Proximity sensors (×4) | VCNL4010 | Bow pressure & contact location |
| MCU | Raspberry Pi Pico 2W | Sensor interfacing & data transmission |
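The streams in the table above reach the Pico on a shared timebase before transmission. A minimal sketch of how one synchronized sample might be framed into a fixed-size packet; the field layout, names, and units are our illustrative assumptions, not the actual firmware:

```python
import struct

# Hypothetical packet layout (little-endian, no padding):
# uint32 ms timestamp, 4x uint16 finger positions, float32 bow
# distance in cm, 4x uint16 proximity (pressure) readings.
PACKET_FMT = "<I4Hf4H"

def pack_sample(t_ms, finger_pos, bow_dist_cm, proximity):
    """Serialize one synchronized sensor sample into a fixed-size packet."""
    return struct.pack(PACKET_FMT, t_ms, *finger_pos, bow_dist_cm, *proximity)

def unpack_sample(packet):
    """Recover the sample fields from a received packet."""
    t_ms, f0, f1, f2, f3, dist, p0, p1, p2, p3 = struct.unpack(PACKET_FMT, packet)
    return {"t_ms": t_ms, "finger_pos": (f0, f1, f2, f3),
            "bow_dist_cm": dist, "proximity": (p0, p1, p2, p3)}
```

A fixed binary layout like this keeps every packet the same length (24 bytes here), which simplifies framing on the receiving side.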
DATASET
The dataset comprises 300 minutes of studio-grade solo classical violin recordings from 10 advanced violinists, with synchronized sensor streams, video, and bow-position labels aligned to the audio.
Comparison with Existing Datasets
| Dataset | Performers | Length | Modalities |
|---|---|---|---|
| Ours | 10 | 300 min | Audio, MIDI Transcriptions, Video, Positions, Fingering, etc. |
| Violin Etudes | 21* | 1668 min | Audio, MIDI Transcriptions |
| URMP | 2 | 80 min | Audio, MIDI Transcriptions, Video |
| Bach10 | 1 | 5 min | Audio, MIDI Transcriptions |
* Violin Etudes' data is heavily skewed towards performers 1 and 2.
RESULTS
Note Error Rate
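Note error rate is the standard edit-distance metric for transcription evaluation: insertions, deletions, and substitutions between the reference and transcribed note sequences, normalized by the reference length. A minimal sketch of the computation (our own illustration, not the evaluation code used for the reported figures):

```python
def note_error_rate(ref, hyp):
    """Levenshtein distance between reference and hypothesis note
    sequences, normalized by the reference length."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                 # delete all remaining reference notes
    for j in range(n + 1):
        d[0][j] = j                 # insert all remaining hypothesis notes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n] / max(m, 1)
```

For example, transcribing one note of a three-note passage a semitone off counts as one substitution, giving a note error rate of 1/3.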
Perceived Audio Distance (Zimtohrli)
ACKNOWLEDGEMENTS
We are grateful to Professor Olga Vechtomova of the University of Waterloo NLP Lab for her support as our faculty mentor. This study has been reviewed and received ethics clearance through a University of Waterloo Research Ethics Board (REB #47874).