Dept. of Software Engineering · University of Waterloo · Design Team No. 5

SENSOR-DRIVEN VIOLIN
PERFORMANCE CAPTURE

Rohan Shetty · Matthews Ma · Trevor Du · Kevin Gao · Zhifan Li

ABSTRACT

Existing violin performance datasets fail to fully capture the intricate techniques that shape musical expression. To address this gap, we built an augmented violin system with a range of integrated sensors, yielding a multimodal dataset intended to support research in music information retrieval. Our system simultaneously captures audio, video, bow pressure, bow tilt, bow and fingerboard position, violin/bow orientation, and room impulse response from ten advanced violinists performing studio-grade repertoire.

We further used the hardware-captured features to correct a state-of-the-art software pitch-transcription algorithm, producing highly accurate MIDI data that also reflects legato bowing, bow speed, and contact point. Our corrected transcriptions achieve a 62.8% relative improvement in note error rate over MUSC and a 14.6% relative reduction in perceived audio distance (Zimtohrli metric).

AUGMENTED VIOLIN SYSTEM

The augmented violin integrates several distinct sensor modalities directly onto a violin and bow, with all data routed through a Raspberry Pi Pico 2W MCU mounted on a custom protoboard.
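One way the Pico 2W could frame synchronized sensor samples for the host is as fixed-width binary packets. The layout below is purely illustrative — the field names, widths, and ordering are our assumption, not the team's actual wire protocol:

```python
import struct

# Hypothetical packet layout (little-endian, no padding):
# timestamp (uint32, ms), 4 touch readings (uint16), ultrasonic
# distance (int16, mm), and two 3-axis IMU orientations (float32).
PACKET_FMT = "<I4Hh6f"
PACKET_SIZE = struct.calcsize(PACKET_FMT)

def pack_sample(t_ms, touch, dist_mm, imu_violin, imu_bow):
    """Serialize one synchronized sensor sample into bytes."""
    return struct.pack(PACKET_FMT, t_ms, *touch, dist_mm, *imu_violin, *imu_bow)

def unpack_sample(buf):
    """Inverse of pack_sample; returns the original field groups."""
    vals = struct.unpack(PACKET_FMT, buf)
    return vals[0], vals[1:5], vals[5], vals[6:9], vals[9:12]

pkt = pack_sample(1234, (10, 20, 30, 40), 152, (0.1, 0.2, 0.3), (0.4, 0.5, 0.6))
t_ms, touch, dist_mm, imu_violin, imu_bow = unpack_sample(pkt)
```

A fixed-width format like this makes the host-side parser trivial and keeps per-sample overhead constant, which matters when several streams must stay time-aligned.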

Fig. 1 — Violin Body (Top)

Place photo here: img/violin-top.jpg

Violin body (top view). Four TSP-L touch sensors provide per-string finger position; an SR-04 ultrasonic sensor measures lateral bow position.
Fig. 2 — Violin Body (Bottom)

Place photo here: img/violin-bottom.jpg

Violin body (bottom view). A 9-DOF ICM-20948 IMU captures violin orientation and movement; the Raspberry Pi Pico 2W interfaces with sensors and the host machine.
Fig. 3 — Violin Bow

Place photo here: img/violin-bow.jpg

Violin bow. Four VCNL 4010 proximity sensors measure bow pressure and location; a second 9-DOF ICM-20948 IMU captures bow orientation.

Hardware System Summary

Sensor                   Model                  Purpose
Touch sensors (×4)       TSP-L                  Per-string finger position
Ultrasonic sensor        SR-04                  Lateral bow position from bridge
IMU (violin body)        ICM-20948              Violin orientation & movement
IMU (bow)                ICM-20948              Bow orientation & movement
Proximity sensors (×4)   VCNL 4010              Bow pressure & contact location
MCU                      Raspberry Pi Pico 2W   Sensor interfacing & data transmission
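The ultrasonic sensor reports distance via the round-trip time of an echo pulse. A minimal conversion, assuming a speed of sound of roughly 343 m/s at room temperature (it varies with temperature), looks like:

```python
SPEED_OF_SOUND_MM_PER_US = 0.343  # ≈343 m/s at 20 °C; temperature-dependent

def echo_to_distance_mm(echo_us: float) -> float:
    """Convert an ultrasonic echo round-trip time (µs) to one-way distance (mm)."""
    # Divide by 2 because the pulse travels to the target and back.
    return echo_us * SPEED_OF_SOUND_MM_PER_US / 2.0

# e.g. a 583 µs echo corresponds to roughly 100 mm
```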

DATASET

The dataset comprises 300 minutes of studio-grade solo classical violin recordings from 10 advanced violinists, annotated with synchronized sensor streams, audio, video, and bow position labels.

Comparison with Existing Datasets

Dataset         Performers   Length     Modalities
Ours            10           300 min    Audio, MIDI Transcriptions, Video, Positions, Fingering, etc.
Violin Etudes   21*          1668 min   Audio, MIDI Transcriptions
URMP            2            80 min     Audio, MIDI Transcriptions, Video
Bach10          1            5 min      Audio, MIDI Transcriptions

* Violin Etudes' data is heavily skewed towards performers 1 and 2.

RESULTS

62.8% Improvement in note error rate over MUSC baseline
14.6% Relative reduction in perceived audio distance (Zimtohrli)
31 mm MAE for bow distance from bridge estimation
72 mm MAE for bow contact point estimation
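The bow-position figures above are mean absolute errors. As a generic sketch of the metric (example values are made up, not drawn from the dataset):

```python
def mean_absolute_error(pred, true):
    """Mean absolute error between paired measurements (here, mm)."""
    assert len(pred) == len(true) and len(pred) > 0
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

# e.g. mean_absolute_error([120, 95, 130], [118, 100, 126]) -> (2 + 5 + 4) / 3
```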

Note Error Rate: Ours 15.37% vs. MUSC 41.34%

Perceived Audio Distance (Zimtohrli): Ours 0.0178 vs. MUSC 0.0205
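Note error rate is commonly computed as an edit distance between the predicted and reference note sequences, normalized by reference length. A generic sketch of that scoring (not the paper's exact evaluation code) is:

```python
def note_error_rate(ref, hyp):
    """Levenshtein distance between note sequences, normalized by reference length."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting all of ref[:i]
    for j in range(n + 1):
        d[0][j] = j  # inserting all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n] / m

# Example with MIDI note numbers: a perfect transcription scores 0.0
```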

ACKNOWLEDGEMENTS

We gratefully thank Professor Olga Vechtomova of the University of Waterloo NLP Lab for her support as our faculty mentor. This study has been reviewed and has received ethics clearance through a University of Waterloo Research Ethics Board (REB #47874).