A Low-Cost Real-Time Tracking System for Violin

Pages 305-323 | Received 14 Jan 2015, Accepted 13 Aug 2015, Published online: 08 Oct 2015

Abstract

This paper presents two low-cost, real-time methods for performance tracking on the violin. Low-latency pitch detection is achieved by using finger position measurements from a resistive fingerboard to inform audio analysis; the combination outperforms audio-only methods. Bow position and pressure are tracked using four optical reflectance sensors placed on the bow stick, allowing the displacement of the hair to be measured under the force of the string. Both sensor arrangements can be fitted to existing violins without damaging the instrument. A case study on finding fingered and bowed note onsets during performance demonstrates the utility of these techniques.

1. Introduction

The violin is notoriously difficult to learn; unlike many other instruments, pitch control is continuous with no visual or tactile guide to what is correct. In addition, the bow requires management of seven different parameters for tone generation or, as renowned virtuoso Tasmin Little put it: ‘you can be doing everything right on the violin but, as 90% of tonal production comes from the bow, as long as your bow isn’t working, nothing is going to work’ (Footnote 1).

With such nuanced control, it is not surprising that tracking the parameters of violin performance, or violin tracking, is a challenging technical problem. Controlling pitch and bow requires such a highly developed level of feel that sensors can easily interfere physically with normal play, and mounting them on delicate, expensive instruments can also be problematic.

Despite its many challenges, violin tracking enables exciting opportunities focusing on how one plays, not just what one plays. Applications of the technology include helping students practice and learn (van der Linden, Schoonderwaldt, Bird, & Johnson, 2011), study of stylistic differences between strokes (Rasamimanana, Fléty, & Bevilacqua, 2006) or traditions (Schoonderwaldt & Jensenius, 2011), study of ensemble cohesion (Grosshauser, Feese, & Tröster, 2013), performance with automated systems ( , 2006; Buxton and Dannenberg, 1986), and expansion of performance capabilities (McMillen, 2008). Many specialized systems (both real-time and non-real-time) have been designed to pursue these targets. However, the goal of real-time violin tracking using affordable, portable, and minimally intrusive means has yet to be reliably accomplished.

This paper proposes a low-cost real-time system for both pitch and bow tracking. We consider low-cost to be below $100 USD and use the accepted target of sub-10 ms latency for real-time interactive audio systems (Freed, Chaudhary, & Davila, 1997). The system is designed to fit any violin and bow with easy installation, minimal impact to normal playing, and no damage to the instrument.

We review the present state of the art in Section 2, then introduce a simple fingerboard position sensor for rough tracking of finger placement on stringed instruments. We then describe how this can be combined with audio analysis for high-accuracy pitch estimation (Section 3), followed by the use of optical sensors for real-time bow tracking (Section 4). Finally, we demonstrate our system with a case study on finding note onsets by combining the sensing techniques presented with existing audio analysis tools (Section 5).

2. Prior work: Augmented stringed instruments

Integrating violin-family instruments and electronics predates digital technology. In the 1970s Max Mathews used analog filters to simulate violin body resonances in an early electric violin (Mathews and Kohut, 1973) and in the mid-1980s Askenfelt (1986) developed a strain gauge system for measuring bow force and position, using it to study bowing technique.

Today, many digital instruments have been developed based on the violin family, ranging from controllers inspired by violin or cello ( , 2013; Trueman and Cook, 2000) to augmented physical instruments. A thorough review of various new and augmented violin-family instruments can be found in Overholt (2012), and Poepel and Overholt (2006).

Augmented violins can be divided into two interrelated groups: instruments augmented to provide new performance techniques beyond the capabilities of the acoustic violin, and instruments altered for tracking of traditional forms of performance (Overholt, 2012). Examples of the former category include the MIT Media Lab’s Hyperinstruments (Machover, 1992), the Overtone Violin (Overholt, 2005) and Overtone Fiddle (Overholt, 2011).

Performance tracking augmentations often involve addition of sensors to an acoustic violin; the system used by Grosshauser et al. (2013) for synchrony analysis is a good example. Custom instruments and bows can also be built with embedded sensors, such as the commercially available K-Bow (McMillen, 2008). The evolution of the K-Bow also illustrates how augmented instruments can accommodate both extended and traditional uses. The K-Bow technologies originated with Paradiso and Gershenfeld’s (1997) work on their Hyperinstruments, which were further developed through Young’s (2002) Hyperbow, a system that was intended to enable extended performance. Young and others subsequently used the same core technology for performance analysis (Rasamimanana et al., 2006; Young and Deshmane, 2007) before it was used for targeting extended performance once again with the K-Bow (McMillen, 2008).

Violin performance measurement requires that accurate bow tracking and fingerboard tracking both be achieved. Askenfelt tackled both of these problems in pioneering full violin tracking work. Adding a resistance wire in amongst the bow hair, placing strain gauges on the bow and later electrifying the strings allowed bow position, pressure, velocity, and bow-bridge distance to be captured together (Askenfelt, 1986, 1989). Bow tracking developments continued with the Hyper Cello, which used an antenna to drive a resistive strip along the bow, thereby tracking bow position and bow-bridge distance (Paradiso and Gershenfeld, 1997). Young (2003) improved on their work, introducing the Hyperbow, a wireless system with accelerometers and additional strain gauges for capturing lateral and normal bow flex.

Maestre, Bonada, Blaauw, Perez, & Guaus (2007) reported a significant improvement in real-time bow measurement by using electromagnetic field (EMF) tracking. This was achieved using a Polhemus Liberty EMF tracking system (Footnote 2) providing full position and orientation data. Unfortunately, at the time of writing, the Polhemus system cost several thousand dollars, making it prohibitively expensive for general purpose use. Schoonderwaldt and Demoucron (2009) and Schoonderwaldt and Jensenius (2011) successfully demonstrated substantial non-real-time performance analysis using a Vicon optical motion capture system to measure acceleration combined with sensors from Demoucron (2008).

To date, fingerboard tracking has received less academic attention than bow tracking. This may in part be because it is a conceptually easier problem to solve but also because control of the bow is often credited with the unique expressive nuances of string instruments. Nevertheless, the left hand is a critical part of violin performance and efforts to capture left-hand finger placement range from resistive frets on the Hyper Cello (Paradiso & Gershenfeld, 1997) to experiments with capacitive and position sensors on traditional violins (Grosshauser et al., 2012; Grosshauser, Feese, & Tröster, 2013) and Freed et al.’s (2013) case study of fingerboard sensors for string-inspired instruments.

Our work in this paper augments an acoustic violin with low-cost fingerboard and bow sensors primarily designed to capture standard performance techniques in real-time. While the examples in this paper will focus on traditional performance, this sensor arrangement could equally be employed to create new musical effects or extended playing techniques.

3. Sensor-informed low-latency pitch tracking

Pitch on a violin is determined by open string tuning and the location of fingers pressing the strings. We have designed a simple method for electronically tracking the finger closest to the bridge on each string. This sensor data produces a rough estimate of the pitch with low latency but requires knowledge of the string tuning for precise pitch tracking. Audio-based pitch estimation is potentially more accurate for offline analysis but it fails to perform consistently for real-time latencies under 10 ms. In the following section we extend our work in Pardue, Nian, Harte, & McPherson (2014) by presenting a method for combining hardware pitch estimates with audio analysis to achieve high-accuracy, low-latency pitch tracking.

3.1. Fingerboard tracking—capturing finger placement

We track finger placement by configuring the violin fingerboard to act as a linear position sensor, an approach similar to those explored by both Grosshauser and Tröster (2013) and Freed, Uitti, Mansfield, & MacCallum (2013). However, we seek to improve upon previous work here by using a new sensing arrangement that is non-intrusive and instrument-safe while retaining seamless fingerboard feel and appearance.

Figure 1. Fingerboard sensor configuration. The fingerboard is fitted with a custom resistive position sensor overlay. Pressing a string down with the finger causes an electrical connection to be made; the induced voltage being linearly related to the position of contact.

The fingerboard, as shown in Figure 1, is covered with a custom linear position sensor. The sensor comprises four printed conductive traces, one aligned under each string, with a sheet of velostat (a piezo-resistive material) placed on top. An air gap between the velostat and the conductive traces is created by placing thin tape between the traces prior to attaching the velostat. When a finger presses on a string, the velostat forms an electrical circuit with the conductive trace below.

Figure 2. Circuit for fingerboard position detection. A current mirror supplies fixed current to the velostat, making the measured voltage \(V_f\) linearly proportional to the distance travelled through the resistive velostat. A JFET, BJT, or other analog switch is used so that only one of the conductive traces under the strings passes current, thus allowing measurement of finger position for each string separately.

If the open-string length, \(l_s\), and the length of the fingerboard velostat, \(l_v\), are known then the finger position can be determined by driving the bridge end of the velostat with a fixed current and measuring the voltage, \(V_f\), induced between the bridge end and the conductive trace for a particular string (using the current mirror circuit depicted in Figure 2). The ratio of \(V_f\) to the maximum measurable voltage, \(V_{\max}\) (i.e. the voltage measured for an open string), can then be used to calculate \(x_f\), the distance of the finger from the nut:

\[ x_f = l_v\left(1 - \frac{V_f}{V_{\max}}\right) \tag{1} \]

From this, if we know the tuned pitch, \(f_o\), of an open string then we can estimate the sounded pitch, \(f_p\), of that string:

\[ f_p = f_o\,\frac{l_s}{l_s - x_f} \tag{2} \]

We multiplex the velostat by grounding the conductive traces in a repeating sequence thereby allowing measurements to be made for all four strings.
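The voltage-to-pitch calculation of Equations 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the system's firmware: the function names are hypothetical, the voltage is assumed to rise linearly from zero at the bridge end of the velostat to its maximum at the nut, and the 325 mm string length and 270 mm velostat length are assumed values chosen only for the example.

```python
import math


def finger_distance_from_nut(v_finger, v_max, velostat_length_mm):
    """Distance of the finger from the nut, assuming voltage grows linearly
    with the distance travelled through the velostat from its bridge end."""
    if not 0.0 <= v_finger <= v_max:
        raise ValueError("voltage outside the calibrated range")
    return velostat_length_mm * (1.0 - v_finger / v_max)


def sounded_pitch_hz(open_pitch_hz, string_length_mm, finger_distance_mm):
    """Ideal-string pitch: frequency is inversely proportional to sounding length."""
    return open_pitch_hz * string_length_mm / (string_length_mm - finger_distance_mm)


def cents_between(f_a, f_b):
    """Interval from f_b up to f_a in cents (100 cents = one equal-tempered semitone)."""
    return 1200.0 * math.log2(f_a / f_b)


# Assumed example values: A string tuned to 440 Hz, 325 mm string, 270 mm velostat.
x = finger_distance_from_nut(v_finger=1.7, v_max=2.7, velostat_length_mm=270.0)
f = sounded_pitch_hz(440.0, 325.0, x)
print(round(x, 3), round(f, 2), round(cents_between(f, 440.0), 1))
```

Because the estimate depends on the open-string pitch and on both lengths, any mistuning or calibration drift feeds directly into the result, which is why the hardware value is treated only as a rough estimate.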

The pitch value derived from Equation 2 is a rough estimate, accurate to within 35 cents on a stable well-calibrated system. However, a stringed instrument is not always well tuned, velostat conductivity is temperature sensitive, and a performer’s body has varying electrical conductance. These all introduce variance into system performance. Our present design is an improvement on that of Pardue, Nian, Harte, & McPherson (2014), where strings were used as the conductive wire. The previous design suffered from missed fingerings due to strings, especially the E string, not firmly contacting the velostat covering the fingerboard. The current iteration still requires firm finger pressure, but initial user trials confirm that the necessary force is within expected performance practice for all but the highest hand positions.

Present builds of the fingerboard sensor use a tape (0.15 mm thick double-sided polyimide) that leaves no residue, is strongly adhesive, and re-adheres so that the fingerboard sensor can be treated as a removable sticker. As each fingerboard is slightly different, once cut to size, the sensor can be easily placed (or removed). If the velostat surface wears, it is also possible to recut and replace just that part. Resistance of the velostat has been found to change due to direct interaction with oils and friction from users so we counteract this by protecting the contact surface with a spray plastic or other laminate.

Figure 3. Hardware-assisted pitch tracking flow-chart. Pitch is estimated based on voltage signal from hardware using Equation 2 while autocorrelation is calculated on the audio signal. If the hardware pitch estimate is high enough to fall within the autocorrelation detection range, the algorithm searches for the best pitch within a whole tone of the hardware estimate. If the frequency falls below the detection range, the second harmonic is sought using autocorrelation and the result is divided by 2 to find the fundamental frequency.

Initial user trials with eight moderate to professional violinists indicated that the sensor is effective and comfortable. All eight stated the sensor was unobtrusive and did not interfere with play, with five participants giving strong positive remarks as to its seamlessness. It has also been used successfully by the first author in multiple public performances featuring virtuosic repertoire. The thickness of the sensor means that it may not fully fit all fingerboards (a qualified luthier raised the nut to compensate on our cheap test instruments). The first author has also performed with the sensor on an antique high-end instrument (over $20k) with no ill effects. In this case, to avoid recutting a low nut, thin shims held in place by string tension were used to effectively raise the nut.

3.2. Hardware assisted low-latency audio pitch tracking

Pitch determination within a monophonic recording can be performed quite reliably off-line (De Cheveigné and Kawahara, 2002; von dem Knesebeck and Zölzer, 2010), but it is more difficult in real time when targeting less than 10 ms latency. Autocorrelation of the audio signal is a standard means for pitch estimation. Time-domain autocorrelation methods typically yield results that, when wrong, are substantially numerically different from the correct frequency (Pardue, Nian, Harte, & McPherson, 2014). Shorter correlation windows have lower inherent latency but make error more likely, so a trade-off is unavoidable. Correlation windows are also unable to assess frequencies below \(F_s/N\), where \(N\) is the window size and \(F_s\) is the sample rate; the window must be at least as long as one period of the lowest frequency it can detect. A 128-sample window at 44.1 kHz, for example, cannot assess frequencies below about 345 Hz.

On the other hand, pitch errors from hardware sensors tend to be local inaccuracies, missing the correct pitch by a few percent in either direction depending on calibration and linearity. As such, combining hardware and audio approaches can yield useful improvements. Ajay Kapur demonstrated this concept with his E-Sitar (Kapur, Lazier, Davidson, Wilson, & Cook, 2004). The E-Sitar used electronic sensing to detect which fret is being played and audio analysis to compensate for the pitch variance from bending the strings.

We combine sensors with audio by restricting the search for optimal pitch in the autocorrelation to an area around the hardware estimate (Figure 3). Using the fingerboard sensing techniques presented above in combination with established pitch detection techniques (Pardue, Nian, Harte, & McPherson, 2014), we were able to improve from a nominal 5% detection of correct pitch within 30 cents when using the popular Yin algorithm (De Cheveigné and Kawahara, 2002) with a 128-sample, 2.9 ms window to a 57% detection rate using sensor-informed autocorrelation with the same sample window. Table 1 also demonstrates that with a 256-sample, 5.8 ms window, detection of correct pitch within 30 cents benefits from a 34% improvement using augmented methods instead of Yin. Additionally, for any fingered note, all errors using augmented techniques were entirely within a whole tone of the correct pitch, eliminating octave errors that commonly occur with audio-only analysis (Pardue, Nian, Harte, & McPherson, 2014).

Table 1. Pitch detection effectiveness for two window sizes and four pitch estimation algorithms: autocorrelation (AC), autocorrelation combined with hardware sensor estimate (AC+S), Yin, and Yin combined with hardware sensor estimate (Yin+S). Accuracy is evaluated by comparing the estimate from each 128, or 256 sample window with the pitch estimate derived using the Yin pitch estimation algorithm with a 2048 sample window which, though unsuitable for real-time use, is a typical standard for pitch detection.

Figure 4. Basic mechanics of bow and hair deformation when the bow is pressed against the string. The string pushes the hair towards the stick. With the two ends of the hair fixed, the resulting hair forms two sides of what we term the ‘displacement triangle’.

Further, the combination of sensor data and audio analysis allows frequencies to be found that are below the theoretical minimum for a given window size. When the hardware estimate indicates that the expected frequency is too low for the window, the algorithm searches the autocorrelation results for the second harmonic instead of the fundamental. While raw audio analysis was essentially unusable using a 128-sample window, autocorrelation informed by hardware estimates was still able to estimate the pitch to within 50 cents for over 70% of the test set.
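The restricted search described in this section can be sketched as follows. This is a hedged illustration, not the implementation evaluated in Table 1: the function names are hypothetical, a 44.1 kHz sample rate is assumed, and the sketch substitutes the squared-difference function (the first step of Yin) for a raw autocorrelation product because its minimum is less biased on very short windows. The buffer is assumed to hold the correlation window plus the largest lag examined.

```python
import math


def squared_difference(x, lag, window):
    """Sum of squared differences between the window and a copy delayed by `lag`."""
    return sum((x[i] - x[i + lag]) ** 2 for i in range(window))


def sensor_informed_pitch(x, sample_rate, window, f_hardware, search_cents=200.0):
    """Search for the best lag only within `search_cents` of the hardware estimate.

    If the hardware estimate lies below the lowest frequency the window can
    hold (sample_rate / window), search around its second harmonic instead
    and halve the result.
    """
    f_min = sample_rate / window
    harmonic = 1 if f_hardware >= f_min else 2
    f_target = f_hardware * harmonic
    ratio = 2.0 ** (search_cents / 1200.0)  # 200 cents = a whole tone
    lag_lo = max(1, math.floor(sample_rate / (f_target * ratio)))
    lag_hi = min(window, math.ceil(sample_rate / (f_target / ratio)))
    best_lag = min(range(lag_lo, lag_hi + 1),
                   key=lambda lag: squared_difference(x, lag, window))
    return sample_rate / best_lag / harmonic


fs, window = 44100, 128
# Synthetic 440 Hz sine; a real violin tone has many partials and noise.
tone = [math.sin(2.0 * math.pi * 440.0 * i / fs) for i in range(2 * window)]

# Hardware estimate slightly sharp: search lags around 450 Hz only.
print(sensor_informed_pitch(tone, fs, window, f_hardware=450.0))
# Hardware estimate below fs / window: treat the tone as a second harmonic.
print(sensor_informed_pitch(tone, fs, window, f_hardware=215.0))
```

In the second call the test tone stands in for a note whose energy sits entirely in its second harmonic, which is a deliberate simplification; a tone with a strong fundamental would shift the best lag. The integer lag also limits resolution (lag 100 corresponds to 441 Hz here), so a practical version would interpolate around the minimum.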

4. Real-time bow tracking

Existing bow tracking options tend to be either expensive (Maestre, Bonada, Blaauw, Perez, & Guaus, 2007), non-portable ( , 2009; Schoonderwaldt and Demoucron, 2009), require a custom bow (McMillen, 2008), or have limited accuracy across one or many bow parameters (Rasamimanana et al., 2006). Accurate low-cost solutions are also rarely real-time. To combat these limitations, we have developed a new method of bow tracking using optical sensors.

Our approach, initially presented in Pardue & McPherson (2013), applies near-field optical sensors mounted on the bow to enable high-speed tracking of bow position and pressure through measurement of the distance between the bow hair and the stick. When played, the bow hair is pressed towards the stick by the string, forming the displacement triangle shown in Figure 4. The location of the triangle’s apex along the stick gives the bow position while the depth of the apex gives pressure.

One implication of the displacement triangle is that, for a given bow tension, every pressure and position combination gives a unique triangle and thus a unique measurement of that triangle. Even though the hair–stick distance at one point along the bow may be the same for different combinations of pressure and position, the relationship of the hair–stick distances between more than three points will be a unique solution. Multiple optical sensors placed along the bow let us measure the hair–stick distance and through comparison of those distances, estimate which bow pressure and position creates that particular displacement triangle.
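The uniqueness argument can be illustrated with an idealised version of the displacement triangle. The sketch below assumes perfectly straight hair segments between the fixed ends and the contact point, a 600 mm hair length, and four hypothetical sensor locations; none of these values come from the prototype, and real bows deviate from this geometry.

```python
def hair_deflection(sensor_pos, contact_pos, depth, hair_length):
    """Deflection of the hair towards the stick at `sensor_pos` (mm from frog)
    when the string pushes the hair in by `depth` at `contact_pos`.
    Idealised: the hair forms two straight sides of a triangle."""
    if not 0.0 < contact_pos < hair_length:
        raise ValueError("contact point must lie between frog and tip")
    if sensor_pos <= contact_pos:
        return depth * sensor_pos / contact_pos
    return depth * (hair_length - sensor_pos) / (hair_length - contact_pos)


HAIR_LENGTH = 600.0                      # assumed, mm
SENSORS = [100.0, 250.0, 400.0, 550.0]   # assumed sensor locations, mm from frog

state_a = (300.0, 3.0)   # (contact position, apex depth)
state_b = (200.0, 2.0)

readings_a = [hair_deflection(s, *state_a, HAIR_LENGTH) for s in SENSORS]
readings_b = [hair_deflection(s, *state_b, HAIR_LENGTH) for s in SENSORS]
print(readings_a)  # the two states agree at the first sensor ...
print(readings_b)  # ... but differ at the other three
```

The two states are indistinguishable at the sensor nearest the frog, yet the full set of four readings separates them, which is the property the estimation in Section 4.2 relies on.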

Near-field optical reflectance sensors are excellent for high-resolution measurement of millimetre distances. They are small, lightweight, and require minimal circuitry, making them ideal for mounting in the limited space between the stick and the hair. They work by reflecting infrared light off a surface and releasing current in response to how much of the emitted light returns. In our case, we measure hair–stick distance at four locations along the bow. For more details on working with near-field optical sensors, particularly in the context of bow tracking, see Pardue and McPherson (2013). We have switched from a pair of analog optical sensors at each measurement location to a single Vishay VCNL8000 due to its far superior noise resistance and performance consistency. The Vishay part enables us to maintain a sample rate over 250 Hz while providing 16-bit resolution over an effective range of 1–20 mm, with optimum performance under 5 mm.

A typical challenge for tracking bow pressure based on location is variability in bow tension. Bow tension is critical for pressure estimates since it will take more pressure to displace the hair when the bow tension is high. However, the tension is not fixed; bows are stored in an un-tensioned state and manually re-tensioned for each practice or performance session. In addition, the hair and stick both respond to the environment so that tension may further change without human intervention.

Figure 5. Picture of prototype bow (left) and training setup (right): Razer Hydra and scale with mock string.

Due to the shape of the bow, with the bow off the string, hair–stick distance is monotonically related to bow tension: hair–stick distance increases as bow tension increases. Indeed, string players typically use hair–stick distance to estimate bow tension when re-tensioning. Conveniently, using optical sensors to measure the displacement triangle means we can also easily use them to measure the off-the-string hair–stick distance and obtain means for automatic tracking of bow tension.

Adding rotated optical sensors would presumably also let us measure bow tilt; however, in this work tilt was largely ignored, with sensors oriented to face the hair when bowed with an average tilt.

4.1. Data collection and processing

There are numerous non-linearities in both sensor behaviour and variations in bow flex. Therefore, rather than designing a physical model to calculate the displacement triangle, we train mathematical systems for each bow build to learn the relationships between the four sensor locations for each bow setup empirically.

It is expected that the hair–stick distance measured at each point along the bow is determined by the tension, \(T\), the location where the string touches the bow, \(x\), and the pressure of the bow against the string, \(p\). We model the hair–stick distance function, \(d_i\), at a given measurement location as a continuous piecewise function with two pieces, above and below the point \(x_i\), where \(i\) denotes the specific sensor on the bow. Using four sensors, we end up with the following four piecewise functions, one for each sensor, \(i\), that describe the sensed distances for a point in time \(t\):

\[ d_i(t) = \begin{cases} f_{i,a}\big(x(t), p(t), T\big), & x(t) \ge x_i \\ f_{i,b}\big(x(t), p(t), T\big), & x(t) < x_i \end{cases} \qquad i = 1, \dots, 4 \tag{3} \]

While the actual form of \(d_i\) may be unknown, we assume that reasonable polynomial approximations for these functions exist. We derive \(d_i\) using empirical training data from recorded bow strokes, captured on the setup shown in Figure 5. The bow is moved on a mock string (a short, thin wire) which rests atop a USB scale recording pressure. An electromagnetic tracker is attached to the frog of the bow to record bow transverse position. Training bow strokes cover the full bow length and a range of realistic pressures, and were accomplished using a standard bow grip. Due to the scale having a low 5 Hz sample rate, all strokes and pressure changes were slow.

Figure 6. Sample curve surface showing the expected reading for a sensor at a given tension. A sensor reading defines the \(z\)-axis level for the horizontal plane slicing through the curve surface, with the resulting slice defining the set of possible position and pressure combinations that would result in that reading.

Once training sets have been collected, a bow tension metric for each set is calculated. Tension has an important effect on the sensor readings and therefore on the equations for calculating bow position; however, the actual tension measurement in Newtons is never needed, so we calculate a simple tension metric as the average of the hair–stick distance of all four sensors, sampled when the bow is off the string and horizontal. It is expected that bow tension will remain stable through any moderate length performance, an assumption backed up by results from the training sets. There was typically less than 0.05 mm difference between the tension metric at the start and end of each set.

During training, bow transverse location is determined by finding the location of the attached electromagnetic position sensor with the test string at the frog and then at the tip. Provided each training stroke uses the full bow length, the distance between the tip and frog measurements should always be the bow’s length. This allows the frog–tip vector to be used to rotate results into a single dimension, assuming that the bow always moves along a straight vector. This assumption, sensor noise, and human error resulted in the expectation of as much as 5% error in the position training data as determined by comparing electromagnetic position estimate with known fixed locations on the bow. The scale data is also shifted in time slightly to minimize the effects of the low sample rate and latency of the scale. Additionally, the mock string and slow bow speed used in training strokes (required for the slow digital scales) result in excessive bouncing of the bow on the test string. This bouncing is reflected in the optical sensor measurements, but is not captured in the slower scale measurement readings, further introducing training measurement error.
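The reduction of tracker readings to a single bow-position dimension can be sketched as a projection onto the frog–tip vector. The function name and the coordinates below are hypothetical, and the sketch shares the assumption stated above that the bow travels along a straight line.

```python
def normalised_bow_position(sample, frog_ref, tip_ref):
    """Project a 3-D tracker sample onto the line from `frog_ref` to `tip_ref`.

    `frog_ref` and `tip_ref` are the tracker positions recorded with the test
    string at the frog and at the tip. Returns 0.0 at the frog reference and
    1.0 at the tip reference; motion perpendicular to the line is discarded.
    """
    axis = [t - f for t, f in zip(tip_ref, frog_ref)]
    rel = [s - f for s, f in zip(sample, frog_ref)]
    axis_len_sq = sum(a * a for a in axis)
    if axis_len_sq == 0.0:
        raise ValueError("frog and tip references coincide")
    return sum(a * r for a, r in zip(axis, rel)) / axis_len_sq


# Assumed coordinates in arbitrary tracker units.
frog = (0.0, 0.0, 0.0)
tip = (3.0, 4.0, 0.0)
print(normalised_bow_position((1.5, 2.0, 0.3), frog, tip))  # mid-bow, slightly off-axis
```

Any deviation of the real stroke from a straight line is lost in the projection, which is one source of the position error noted in the training data.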

Guaus, Bonada, Maestre, Pérez, & Blaauw (2009) describe an effective bow pressure calibration technique, building on Schoonderwaldt and Demoucron (2009), that uses similar principles with more accurate sensing technologies. However, in keeping with our low-cost target, we intentionally limited ourselves to simple, cheap, and portable off-the-shelf technologies, accepting that this will reduce the accuracy of our result. We also found that, for our calibration tools, continuous bowed strokes were both easier to accomplish and, by spanning the entire bow to produce far more data points, yielded better results than obtaining sensor estimates at a pre-defined set of combined locations and pressures.

Figure 7. Distance between expected hair–stick distance and actual momentary hair–stick distance for all reasonable \((x, p)\) pairs for each sensor (top four, clockwise from upper left: frog, lower bow, upper bow, tip) and for the sensor sets combined (bottom left). The \((x, p)\) pair with the minimal distance metric is chosen as the expected bow location \(\hat{x}\) and bow pressure \(\hat{p}\), shown vs the actual measured position and force (bottom right).

Now we can do the actual curve estimation for the functions given in Equation 3. We use polyfitn (Footnote 3), a Matlab package by John D’Errico, to perform three-dimensional linear regression. The sample size was largely determined by the need to run enough sample sets to avoid overfitting in the tension dimension (Bishop & Nasrabadi, 2006) and, provided a reasonable range of pressure and position values were visited at each tension, led by default to a reasonable fit in other dimensions. For this paper, 20 training sets were produced, each running 2–3 min using the full bow length with downward forces ranging 0.15–4.0 N, producing over 130,000 data points. As tension, \(T\), was treated as constant for each set, with 20 points of comparison, the polynomial fit order of \(T\) was restricted to the second or third degree to reduce the risk of overfitting along that dimension.

Evaluation of the curve of best fit was done through cross-validation using a one-set-out approach. Initial experiments suggested that overfitting became observable with eighth-order or higher polynomials. As a result, polynomial orders above 8 were not evaluated. For each of the eight polynomial equations in Equation 3, the optimal order was considered independently and chosen by finding the order that yielded the lowest average root-mean-square error (RMSE) across all test sets. Any discontinuity within \(d_i\) between polynomials \(f_{i,a}\) and \(f_{i,b}\) is smoothed by the use of a Gaussian weighted average of the two centred around the sensor location.
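The smoothing across the sensor location can be sketched as below. The text does not give the exact weighting, so this is only one plausible reading: the piece fitted for the other side of the sensor is mixed in with a weight that is one half at the sensor location and decays as a Gaussian with distance. The function name, the width `sigma`, and the two linear stand-in pieces are all assumptions made for the example.

```python
import math


def blended_distance(x, sensor_x, below, above, sigma):
    """Blend two fitted pieces so the result is continuous at `sensor_x`.

    `below` and `above` are callables giving the expected hair-stick distance
    on each side of the sensor. At x == sensor_x the result is their plain
    average; far from the sensor it reverts to the piece for that side.
    """
    own, other = (below, above) if x < sensor_x else (above, below)
    w = 0.5 * math.exp(-((x - sensor_x) ** 2) / (2.0 * sigma ** 2))
    return (1.0 - w) * own(x) + w * other(x)


# Stand-in pieces that disagree by 0.1 mm at the sensor location (x = 100).
below = lambda x: 1.0 + 0.01 * x
above = lambda x: 2.3 - 0.002 * x

for x in (0.0, 99.999, 100.0, 100.001, 200.0):
    print(x, round(blended_distance(x, 100.0, below, above, sigma=10.0), 4))
```

With this weighting the 0.1 mm step at the sensor is replaced by a smooth transition a few `sigma` wide, at the cost of slightly altering both fits near the sensor.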

4.2. Using curves to estimate bow pressure and position

Having found the polynomial curves to calculate each expected optical reading for given tension, position and pressure, we can estimate the expected set of hair–stick distances for any real-world bow state. Of course, outside of the test environment, these hair–stick distances are the only measurements we have to infer information during a performance. Again, the bow tension metric is the average of hair–stick distances with the bow off the string. We assume \(T\) is fixed and determine \(T\) for a session by measuring the hair–stick distances with the bow off the string just prior to playing.

Given the tension, \(T\), for the session, we generate the expected measurement curve surface, a two-dimensional matrix (with position along the \(x\)-dimension and pressure along the \(y\)-dimension) holding the expected reading for each \((x, p)\) pair for each sensor. Figure 6 provides an example of the expected resulting surface for a given sensor and tension. Assuming ideal curve fits, a given hair–stick distance measurement will determine the slice of the curve set which is the possible \((x, p)\) pairs that would produce that result. Due to expected noise, we assume the actual \((x, p)\) pair should be near, but not necessarily on, the ideal slice.

During performance, we compute the absolute difference between the measured result and each sensor’s matrix of expected possible results. The estimated difference is scaled by the quality of the corresponding curve fit, as given by the inverse of the fit’s RMSE, which was calculated while determining the best polynomial: (4)

The RMSE also implicitly reflects the overall variation of a given sensor’s measurement range, so this scaling also balances the different sensors’ contributions. This scaled difference between expected and measured is then summed across all sensors. As illustrated in Figure , with fixed, the pair for which the expected distances across all four sensor locations provides the lowest summed difference is theoretically the actual momentary location and pressure: (5)

Accumulated measurement and regression error mean that, while theory and reality often match, sometimes, they do not.
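The search described by Equations 4 and 5 can be sketched as an exhaustive scan over the position and pressure grid. This is an illustrative reading of the text, not the authors' implementation: the function name `estimate_bow_state` and the tiny two-sensor, 2×2 example grid are assumptions made for the sketch.

```python
def estimate_bow_state(measured, expected, rmse):
    """Return (position_index, pressure_index, cost) with the lowest summed scaled difference.

    measured: one hair-stick distance per sensor.
    expected: per sensor, a 2-D list indexed [position][pressure].
    rmse:     per sensor, the RMSE of that sensor's curve fit.
    """
    n_pos = len(expected[0])
    n_prs = len(expected[0][0])
    best = None
    for i in range(n_pos):
        for j in range(n_prs):
            # Absolute difference scaled by the inverse RMSE, summed over sensors.
            cost = sum(
                abs(m - surface[i][j]) / r
                for m, surface, r in zip(measured, expected, rmse)
            )
            if best is None or cost < best[2]:
                best = (i, j, cost)
    return best

# Hypothetical two-sensor example on a 2x2 grid.
expected = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[4.0, 3.0], [2.0, 1.0]],
]
state = estimate_bow_state([3.1, 2.0], expected, rmse=[0.5, 1.0])
```

With four sensors and realistic grid resolutions the same loop applies unchanged; only the sizes of `measured`, `expected` and `rmse` grow.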

4.3. Utilizing expected time series continuity

Effectiveness of bow position and force estimation was evaluated by cross-validation using the same one-set-out method used for curve selection. The average RMSE for the estimated versus actual normalized bow position was 0.165 AU (Arbitrary UnitsFootnote4) and the average RMSE for the estimated versus actual force was 0.296 N. We can improve results by noting that bow motion and downward force should be continuous, i.e. any estimate should neighbour the previous estimate.

Estimates are weighted using a Gaussian with the mean, , centred on the previous estimate so that neighbouring points are more likely to be selected as the optimum. This approach has the drawback that when wrong, the estimate may get stuck in local minima. This issue is partially alleviated by changing the weighting distribution based on the likelihood of the last estimate’s correctness. If the estimate is believed to be incorrect, the weighting is flattened by altering the Gaussian’s deviation , so that a distant point may be considered equally likely. Confidence (using the colloquial meaning of the term) is initially assessed using the distance, (Equation 4). With ideal data and ideal equations describing the relationship between sensor reading, position, force and tension, would be zero. The non-zero difference, , that does occur is attributable to real-world error and becomes a useful estimate for correctness.

Experimental results suggest some additional trends in error that do not correlate with the basic confidence factor, . First, if estimates suggest the bow is not moving, although it could reflect actual performance, it is more often evidence of the nearest neighbour requirement causing the estimate to get stuck. In this case the confidence is decreased, to reduce the effect of nearest neighbour weighting allowing the algorithm to jump to a better but more distant bow estimate. Second, the position and force estimation algorithm is most likely to be wrong and get stuck at lower forces, often resulting in the estimate for position at the extreme tip or frog. As a result, we decrease the confidence for estimates with low pressure or at the extrema of the bow. We have also tried using an unscented Kalman filter to optimize results and balance measurement error with state, but it did not outperform the flexible weighting design.
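The neighbour weighting can be illustrated with a one-dimensional sketch. Several details here are assumptions, since the text does not fix them: the candidate cost is divided by the Gaussian weight, confidence is a number in [0, 1], and `sigma_min` and `sigma_max` are hypothetical bounds on the Gaussian's deviation. The real system weights a two-dimensional position and pressure grid.

```python
import math

def gaussian_weight(distance, sigma):
    """Unnormalised Gaussian weight for a candidate at the given distance."""
    return math.exp(-0.5 * (distance / sigma) ** 2)

def weighted_choice(costs, previous_index, confidence, sigma_min=1.0, sigma_max=50.0):
    """Pick the index with the lowest cost after favouring neighbours of the previous estimate.

    confidence near 1 gives a narrow Gaussian (strong neighbour preference);
    confidence near 0 flattens it so that distant candidates compete equally.
    """
    sigma = sigma_max - confidence * (sigma_max - sigma_min)
    best_index, best_score = None, math.inf
    for index, cost in enumerate(costs):
        weight = gaussian_weight(index - previous_index, sigma)
        if weight == 0.0:
            continue  # too far away to be considered at this confidence
        score = cost / weight
        if score < best_score:
            best_index, best_score = index, score
    return best_index

costs = [0.5, 0.6, 0.9, 0.8, 0.1]
confident = weighted_choice(costs, previous_index=1, confidence=1.0)
doubtful = weighted_choice(costs, previous_index=1, confidence=0.0)
```

In this example a confident previous estimate keeps the choice at index 1, while a doubtful one lets the estimate jump to the distant but better-fitting index 4, which is the behaviour the flexible weighting is meant to allow.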

4.4. Estimation effectiveness

Weighting to select nearest neighbours decreased the RMSE for the estimated versus actual normalized bow position from 0.165 to 0.121 AU and the average RMSE for the estimated versus actual force from 0.296 to 0.260 N. Generally, the weighted neighbour restriction significantly improved results when the force is above 0.6 N. In fact, within the training set, for forces that measured above 0.6 N, the weight of a standard violin bow (61 g), the RMSE for the normalized position estimate, 0.063 AU, suggests an expected error of 6%, which is only marginally above the expected 5% accuracy of the electro-magnetically measured bow transverse position. Similarly, force estimates tend to be worse near the frog, and excluding the bottom 10% of the bow reduces the force estimate RMSE to 0.190 N. Sample results are shown in Figure .

4.5. Further experiments and discussion

We experimented both with sensor placement and varying the number of sensors. The results presented in this paper were generated from sensors placed at 45, 151, 485 and 624 mm from the frog. The bow hair is 650 mm long. The positions targeted measuring at bow extrema and equal distribution along the length of the bow while not interfering with bow use.

Figure 8. Raw sensor readings, bow position and downward force estimates from a test set with measured position and measured force. The sensor readings are taken at the four sensor locations on the bow and are used to derive from the hair–stick distance measurements in the top graph.

Training was done with two further sensors at 530 and 581 mm from the frog. Using all six sensors did improve results but not dramatically. Based on a subset of the full training set, using six sensors resulted in a position estimate RMSE of 0.114 AU compared to 0.121 AU and a force estimate RMSE of 0.256 N compared to 0.260 N. We felt that the additional latency required to poll six sensors outweighed the 5% performance improvement. We also experimented with using either of the two added sensors instead of the sensors at 624 or 485 mm, or using only three sensors but found these both reduced accuracy with little or no added benefit.

Crucially, the accuracy of the bow estimates is highly dependent on physically stable sensors; even a slight change in sensor angle or position can dramatically impact estimate accuracy. Because tension, used to calculate curve estimations, is derived from a combination of all four sensors, error from a single sensor propagates to the estimates from all sensors. If a sensor cannot be returned to its original position, the bow must be retrained.

Having previously spent significant time working with flexible circuits, we switched to slim 0.8 mm circuit boards and experimented with means of reliably and safely attaching them to the bow. Results in this paper used the slim circuit boards attached using a combination of double-sided tape and instrument putty. Though susceptible to movement, the putty is reasonably easy to readjust if a sensor moves and is also suitable for expensive instruments.

The bow used in the eight-person initial user tests (the same tests as in Section 3.1) had balsa wood supports (lighter and more stable than putty) at the tip and frog and, as the bows we are primarily working with are carbon fibre, we used hot glue and double-sided tape to fix the sensors. The various iterations of bow build have weighed in between 73 and 76 g, 13–16 g heavier than the unaugmented violin bow (59–61 g) but only slightly more than a viola bow (68–72 g). Improvements in the manufacturing process would allow us to reduce the weight and improve feel. For instance, we are using unnecessarily heavy wire to connect to the sensors, which could be replaced with lightweight small connectors.

We intentionally opted to cable the bow rather than use wireless connections as the addition of batteries to support wireless systems inevitably impacts balance. Still, the cabled approach requires supporting the cable. The player wears three straps, two at the wrist to support the cable near the bow, and one farther up the arm so that the cable doesn’t flap with fast hand gestures. To make it convenient to put the bow down, we use easily disconnected magnetic connectors at the wrist. The two wrist straps sit either side of the connectors, with the bow-side strap providing optional velcro support between the bow and the bow-side connector.

In user tests, users rated the bow heavy but acceptable. Only one player stated the bow had altered balance, with all others saying the augmentations had only minimal effect on performance technique. The cable was deemed more intrusive, although easily improved by increasing the length between the bow and the first connector. The bow has also been used by the first author in public performance of pieces that are technically demanding for the bow hand without adverse effects.

It is worth noting that with the sensor positioning in this paper, it is possible for the string to catch the sensor placed 151 mm from the frog. Likelihood of catching the string is directly affected by bow tension, and how much the player tilts the bow. Manufacturing improvements should resolve this issue in future.

5. Case study: Note onset using sensors

Note onset detection is useful for a variety of tasks such as automated transcription (Benetos , Citation2012), score-following (Cont , Citation2008), and performance analysis (Grosshauser et al. , Citation2012). Though audio-based note onset is fairly mature for percussive instruments, it remains an open challenge for many non-percussive instruments (Böck, Krebs, & Schedl , Citation2012b). Real-time note onset detection provides a useful demonstration of our real-time violin tracking methods since it requires recognizing both bowed and pitched events.

5.1. Non-percussive note onset detection

For instruments with clear attack transients like piano and percussion, audio note onset is considered largely solved (Collins , Citation2005a). However, the magnitude-based methods used for percussive instruments do not work as well for instruments with slow or subtle onsets such as voice, wind instruments and bowed string instruments. Dixon (Citation2006) and Collins (Citation2005a) provide overviews of various approaches to onset detection.

Methods for non-percussive note onset detection commonly focus on differences in the spectral energy or frequency difference between windows (Collins , Citation2005b), phase differences that might suggest a new waveform, evaluation of differences in Mel band as a more psycho-acoustically relevant technique (Klapuri , Citation1999), and more complex means that blend multiple techniques to improve results. Two problems for non-percussive note onset detection are latency and accuracy. Many of these onset algorithms are non-causal, though recurrent neural networks have shown promise for real-time detection (Böck, Arzt, Krebs, & Schedl , Citation2012a). Real-time non-percussive onset detection typically uses spectral analysis (Collins , Citation2005a). Not only are the minimal sample windows required for computing accurate spectral content typically over 10 ms, but the need to compare changes across successive windows will also push latency well above the target of 10 ms for real-time performance systems ( , Citation1997). A recent comparison of real-time onset detectors requires that onsets are detected within 25 ms of their ground truth labels, but it uses windows of 46 ms (2048 samples) with 10 ms between hops (Böck, Krebs, & Schedl , Citation2012b), implying a minimum average latency of 33 ms.

Accuracy also remains below usable standards for controlling sounds in a live performance. Spectral methods can be confused by vibrato, trills and other expressive devices. Slow pitch changes can lead to both missed onsets and false positives, while vibrato typically leads to a significant increase in error. OnsetDetector from Stowell and Plumbley (Citation2007) is one of the better tools for bowed onset in monophonic pitched contexts, achieving around 70% F-measure (defined as F = 2PR/(P + R), where P is the precision, the percentage of detections that are true positives, and R is the recall, the percentage of actual onsets found). Böck’s SuperFlux algorithm was significantly improved through adding vibrato suppression (Böck & Widmer , Citation2013). With a 25 ms window, it achieved an F-measure of around 75%, with 83% precision and 69% recall.
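As a quick check on the figures quoted above, the standard F-measure (the harmonic mean of precision and recall) can be computed directly. The function name `f_measure` is chosen for this sketch only; the precision and recall values are those reported for SuperFlux in the text.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision 83% and recall 69%, as quoted for SuperFlux with vibrato suppression.
superflux = f_measure(0.83, 0.69)
```

The result is roughly 0.75, consistent with the F-measure of around 75% stated for that algorithm.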

5.2. Mechanics of violin note-onset

On the violin, two things are required for a bowed note to sound: bow velocity and string contact. New notes come mainly from change of bow direction, change of string (correlated with bow angle to the ground), or a change in left-hand finger placement. Sensors can capture each of these actions before the sound is produced, allowing early identification of note onsets. String changes may be challenging to identify quickly from bow position and pressure alone, but audio analysis may be able to fill in this gap. Audio analysis may also help identify false positives or even missed detections in sensor data.

We can consider five types of note onset:

(1)

Off-string attack (OSA): bow is placed on the string and moves

(2)

Bow change (BC): bow is already in contact with the string but the bow changes direction

(3)

Finger change (FC): pitch change of at least one semitone through left-hand finger change

(4)

String change (SC): pitch change through change of string the bow is playing

(5)

Slurred repetition (SR): delineation of new notes through accent accomplished by abrupt change in bow pressure and/or velocity.

While only one of these actions is necessary to define a new note, they often occur in conjunction. Identifying these actions requires fingerboard tracking and bow tracking. Here, we consider four of the five note onset cases separately (excluding string changes) and assess the ability of the augmented violin to detect note onset in real time. Informal tests suggest that string change can be detected with a gyroscope on the frog of the bow or a bridge which provides separate pickups for each string, but this is beyond the scope of this paper.

5.2.1. Off-string attack

This category of note onsets includes any situation in which the bow first makes contact with the string when having previously been in the air. It includes setting the bow firmly on the string and then moving it (typical for accents), initial contact with string while the bow is already in motion, and bouncing the bow on the string (spiccato).

Optical sensors on the bow easily detect whether the bow is on the string, and thus can detect whether an off-string attack may be happening. Contact with the string causes the average hair–stick distance measurement across the four sensors to deviate from the off-the-string measurement (which is also used to estimate bow tension). Placing the bow on the string will typically result in at least a 10% increase in the instantaneous average while all but the most aggressive waving of the bow in the air will result in only a 3–4% deviation in the raw tension.Footnote5

We use the pressure estimate from Section 4 to detect bow–string contact; this estimate is less dependent on the location of the contact along the bow than raw sensor readings. We chose 60 g (0.59 N), corresponding to 15% above off-string sensor readings and slightly less than the weight of the bow, as the threshold for bow on the string; we chose 45 g (0.44 N) as the threshold for the bow coming off the string. Figure provides an example application of the off-string onset detection algorithm.

To detect bow motion, we simply take the difference between successive bow locations. This imposes an implicit latency of 4–8 ms in detecting off-string attacks. Lastly, in order to avoid false triggers around the same event, no off-string note onset is considered within 100 ms of the previous onset. This corresponds to a realistic maximum rate of note production for almost all performance scenarios.
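The off-string attack logic of this section can be sketched as a small state machine. The force thresholds and the 100 ms lockout come from the text and the 4 ms sample period from Section 5.3; `MIN_MOTION`, the function name, and the assumption that the bow starts off the string are hypothetical choices made for illustration.

```python
ON_THRESHOLD_N = 0.59    # bow counted as on the string above this force (60 g)
OFF_THRESHOLD_N = 0.44   # bow counted as off the string below this force (45 g)
REFRACTORY_MS = 100      # minimum spacing between off-string onsets
SAMPLE_PERIOD_MS = 4     # sensor sample period
MIN_MOTION = 0.005       # hypothetical minimum position change between samples

def detect_off_string_attacks(force, position):
    """Return sample indices at which an off-string attack is declared."""
    onsets = []
    on_string = False      # assumes the bow starts off the string
    armed = False          # True once contact follows an off-string period
    last_onset_ms = -REFRACTORY_MS
    for n in range(1, len(force)):
        # Hysteresis: separate thresholds for landing on and leaving the string.
        if not on_string and force[n] > ON_THRESHOLD_N:
            on_string, armed = True, True
        elif on_string and force[n] < OFF_THRESHOLD_N:
            on_string, armed = False, False
        # Bow motion is the difference between successive position estimates.
        moving = abs(position[n] - position[n - 1]) >= MIN_MOTION
        if armed and moving:
            now_ms = n * SAMPLE_PERIOD_MS
            if now_ms - last_onset_ms >= REFRACTORY_MS:
                onsets.append(n)
                last_onset_ms = now_ms
            armed = False  # one onset at most per landing
    return onsets

# Two landings separated by 30 samples (120 ms) off the string.
force = [0.0] * 2 + [0.8] * 3 + [0.2] * 30 + [0.8] * 3
position = [0.5] * 3 + [0.52, 0.54] + [0.54] * 30 + [0.54, 0.56, 0.58]
attacks = detect_off_string_attacks(force, position)
```

Using two thresholds rather than one keeps small force fluctuations near the contact point from toggling the on-string state.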

5.2.2. Bow change

Bow changes are variations in direction of movement while the bow remains on the string. These appear as local maxima and minima of bow position over time (see Figure , Section 5.4.2). Finding these extrema in real-time is more difficult; extrema can only be identified retrospectively, and noise in the position estimates imposes a trade-off between latency and accuracy depending on the number of successive frames examined.

Figure 9. Pseudo-code describing the algorithm for finding the transition from a down bow to an up bow. The transition from up to down bow looks for a minimum instead of maximum.

Bow changes can only be detected when the bow is known to be on the string, i.e. when an off-string attack has previously been detected. Upon detection of an off-string attack, the initial stroke is identified as up bow or down bow. Changing from up bow to down bow implies a local minimum in position; changing from down to up implies a local maximum.

Figure shows the bow change detection procedure. Position data is filtered to remove high-frequency noise, and filtered bow position is then examined within a sliding historical window of 15 samples. To find a down-to-up change, we look for the first instance where the most recent position sample is below the earliest sample in the window. The bow change is then identified as the location of the local maximum in that window. A similar procedure is used for up-to-down change. To reduce false positives due to the position estimate getting stuck (Section 4.3), the window must include a clear change in bow position with motion passing a minimum threshold. The start of the new bow stroke must also be in a physically plausible location compared to the start of the previous bow: for example, when changing from up bow to down bow, the down bow must start at a position closer to the frog than where the up bow started.

A shorter time window will reduce latency of identification at the cost of greater noise susceptibility. We also reduce noise susceptibility by requiring several successive samples to confirm the direction change, though this adds further latency. We chose to require five successive confirmations of the direction change. For symmetrical alternating bow strokes, the extremum will fall roughly in the middle of the window, introducing  ms of total expected latency between the physical bow change and labelling. The latency is offset by the fact that the bow change will precede the audio onset since the string must be re-excited. Bow change latency is thus still above our targets, but further reductions in sensor noise will allow tighter windows and lower latency. No two bow changes can be detected within 125 ms of each other.
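The core of the down-to-up search can be sketched as follows. This is not the pseudo-code of Figure 9 but a simplified illustration: it assumes already-filtered positions that increase during a down bow, uses a hypothetical `MIN_TRAVEL` threshold for the "clear change" requirement, and omits the five-sample confirmation, the plausibility check against the previous stroke, and the 125 ms lockout.

```python
WINDOW = 15          # samples in the sliding history window
MIN_TRAVEL = 0.02    # hypothetical minimum position range within the window

def find_down_to_up_change(position):
    """Return (detected_at, change_at) sample indices for the first
    down-to-up bow change, or None if no change is found."""
    for end in range(WINDOW - 1, len(position)):
        start = end - WINDOW + 1
        window = position[start:end + 1]
        travelled = max(window) - min(window)
        # The newest sample has fallen below the oldest: the bow has turned.
        if window[-1] < window[0] and travelled >= MIN_TRAVEL:
            peak = start + window.index(max(window))
            return end, peak
    return None

# Synthetic stroke: rise for 21 samples, then fall at the same rate.
position = [i / 100 for i in range(21)] + [(20 - k) / 100 for k in range(1, 21)]
result = find_down_to_up_change(position)
```

For this symmetric stroke the change is placed at sample 20 but only detected at sample 28, eight samples (32 ms at a 4 ms sample period) later, which illustrates the latency cost of the windowed search before any confirmation samples are added.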

5.2.3. Finger changes

Note onset from pitch change is caused by either string changes (not considered here) or change in left-hand finger placement. To detect the latter, we use hardware pitch estimates from the fingerboard sensor (Section 3). Position readings from the finger closest to the bridge are converted to a frequency estimate using Equation 2. For onset detection, pitch accuracy is less important than detecting changes, so we do not need the assisted autocorrelation technique.

The frequency estimate is linearized by converting it to the number of fractional semitones. It is then high-pass filtered for frequencies above 1 Hz to emphasize instantaneous transitions. Because the resulting filtered equation may decay but not fully return to zero between onsets, we take a high order differential, comparing with 40 ms previously and looking for transitions that pass a minimum threshold of of the instantaneous change expected from a semitone. The lower threshold allows transitions which do not change instantaneously between successive samples. If the sign of the filtered signal matches the sign of the transition, it is declared an onset event. This last restriction avoids false positives due to noise in the decay of the signal after filtering. No new pitch onset is considered for 100 ms after the preceding one. Figure provides a sample of data used for finger change detection both raw, and processed.
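The finger change pipeline can be sketched end to end. The 1 Hz cutoff, 40 ms comparison and 100 ms lockout follow the text, with a 4 ms sample period assumed from Section 5.3. The one-pole filter design, the 440 Hz reference and the `THRESHOLD` of half a semitone are assumptions for illustration, since the exact filter and threshold fraction are not given here.

```python
import math

SAMPLE_PERIOD_S = 0.004   # 4 ms sensor sample period
CUTOFF_HZ = 1.0           # high-pass cutoff from the text
LAG = 10                  # 40 ms expressed in samples
THRESHOLD = 0.5           # hypothetical fraction of a semitone
REFRACTORY = 25           # 100 ms expressed in samples

def to_semitones(freq_hz, ref_hz=440.0):
    """Linearize a frequency as fractional semitones relative to ref_hz."""
    return 12.0 * math.log2(freq_hz / ref_hz)

def high_pass(signal):
    """First-order (one-pole) high-pass filter; an assumed filter design."""
    if not signal:
        return []
    rc = 1.0 / (2.0 * math.pi * CUTOFF_HZ)
    alpha = rc / (rc + SAMPLE_PERIOD_S)
    out = [0.0]
    for n in range(1, len(signal)):
        out.append(alpha * (out[-1] + signal[n] - signal[n - 1]))
    return out

def detect_finger_changes(freqs_hz):
    """Return sample indices where a fingered pitch change is declared."""
    filtered = high_pass([to_semitones(f) for f in freqs_hz])
    onsets = []
    last = -REFRACTORY
    for n in range(LAG, len(filtered)):
        change = filtered[n] - filtered[n - LAG]
        # The filtered signal must share the sign of the transition, which
        # rejects the decay back towards zero after an earlier onset.
        same_sign = change * filtered[n] > 0
        if abs(change) >= THRESHOLD and same_sign and n - last >= REFRACTORY:
            onsets.append(n)
            last = n
    return onsets

# A semitone step up from A4 after 50 samples (200 ms).
step = [440.0] * 50 + [440.0 * 2 ** (1 / 12)] * 50
changes = detect_finger_changes(step)
```

The sign check matters in this example: ten samples after the step, the filtered signal has decayed, so the 40 ms difference turns negative while the signal itself is still positive, and no spurious onset is produced.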

Figure 10. Hardware data is used to determine pitch estimates for each string (top) and then run through a high pass filter to emphasize pitch changes (bottom). We label a pitch based note onset (blue vertical lines) when the function exceeds the expected difference due to a half step above/below the prior value.

Figure 11. Bowing styles used for testing: (a) slurred legato, (b) legato with repeated notes, (c) spiccato. Two octave G-major scales were used in the testing, of which the lower octave is shown here.

Using the high pass filter, false positives due to vibrato will be eliminated as the pitch change is not fast enough to pass through the filter, nor is the pitch variation significant enough to exceed the threshold. On the other hand, note onsets from slow, continuous glissandi are unlikely to be detected, and we did not design for handling trills. Both glissandi and trills should be detectable with the sensor data, but this is left for future research.

5.2.4. Slurred repetition

Slurred repetition is when a note is repeated without changing the bow. It is typically accomplished by momentarily reducing pressure and slowing the bow before quickly resuming a higher speed and downward force. It is possible to repeat a note with only pressure or speed, but more commonly the two change together. Focusing only on slurred repetitions which include a pressure change, we can build on the off-string detection algorithm which already identifies drops in pressure.

5.3. Testing

Onset detection was tested using three performance cases. The first consisted of three 2-octave G major scales, each with a different bowing style (Figure ): the first scale was played legato with two notes to the bow; the second also contained two notes per bow but repeated each pitch across the bow change; the third was played with separate bows, with an off-string spiccato stroke on the ascending scale and a more on-string staccato or martelé on the descending scale. The scales cover each onset case except slurred repetition.

The second case comes from the entirety of the Schubert Lullaby from Book 4 of the Suzuki Violin Method (2009 revised edition). This piece includes slower material and longer slurs. The third case was the first 20 bars of the Seitz Student Violin Concerto No. 5, op. 22. It includes a spiccato section, a variety of on-the-string transitions including slurred repetition, and faster inter-onset intervals, with notes occurring less than 150 ms apart several times. Each of the three cases used a different bow tension, which was measured prior to the selection’s start. The total number of different types of onset is given in Table .

Table 2. Number of different kinds of note onsets (off-string attack, bow change, finger change, string change, and string re-attack) in the three sample pieces. Because multiple onset actions can coincide, the sum of the number of onsets per type does not match the total number of notes.

Figure 12. Off-string attack classification including spiccato strokes. Force estimates must be above 0.59 N (60 g) following a period of the bow being off the string and bow movement must also be visible. Small up and down spiccato strokes following note onset are visible. Momentary spikes in pressure are due to estimate error at low applied force. Evidence of grabbing the string can be seen later in the sample at 85.87 and 87.31 s.

The results presented below were not calculated in real time, but the algorithms were all explicitly written to be real-time capable and causal. Audio from the violin was recorded at 44.1 kHz directly to the computer, and sensor data was logged. Audio and sensor data were synchronized by capturing a low sample rate version of the audio on the embedded sensor ADC, and then comparing audio markers at the beginning and end of each performance. We found that based on relative marker locations, audio and sensor streams are expected to remain within one sensor sample (4 ms) of one another when run in real-time. Ground truth note onsets were manually labelled using a combination of visual assessment of audio signal, spectral analysis and auditory judgment in ambiguous cases.

5.4. Results

Results were calculated separately for each note onset type. They are classified as clear false positives (FP), clear false negatives (FN), and whether they are identified within 75, 50, 25 or 10 ms of the labelled onset time or precede the label by more than 25 or 50 ms. Anything below 10 ms matches our real-time target while anything within 25 ms is more comparable to standard note-onset metrics. As we are often detecting the physical actions that lead to sound production, we would frequently expect results to precede the labelled note onset.

Table 3. Number of false positive (FP) and false negative (FN) off-string attacks along with correct detections found within  ms of the hand labelled note onset. A negative time means the attack was detected prior to the label.

Table 4. Number of false positive and false negative detected bow changes and correctly detected bow changes found within  ms of the hand labelled note onset. A negative time means the attack was detected prior to the label.

Figure 13. Bow based note onset classifications for complete Schubert Lullaby. Estimated bow changes are marked where the algorithm places the change, not when it is detected. The position estimate gets ‘stuck’ around 53, 79, 81 and 86 s.

Table 5. Number of false positive and false negative detected fingered pitch changes and correctly detected fingered pitch changes found within  ms of the hand labelled note onset. A negative time means the attack was detected prior to the label.

Figure 14. Bow based note onset classifications for opening phrase of Seitz Concerto. Despite bow position estimate errors shortly before, slurred re-attack is easily detectable at 47.24 s. An accented note is identifiable at 45.59 s by the drop in down force, including the lift immediately required to re-attack the string. It is also clear that the tension metric derived prior to play for this session is too low as the clear drop in pressure after 51 s suggests the bow is no longer on the string but the force estimate remains around 0.2 N rather than 0 N.

5.4.1. Note onset through off-string attack

The effectiveness of note onset detection in the case of an off-string attack is given in Table and illustrated by the sample in Figure . A positive result of the off-string note onset detection is that there were only four clear false positives and one false negative. 86% of onsets are detected within 25 ms and 68% within 10 ms. 24% of onsets were found more than 25 ms in advance, possibly due to the length of time it takes for the string to vibrate and the violin to resonate. Some particularly early detections during spiccato and staccato strokes were likely due to the bow being set lightly on the string in advance of the stroke in order to ‘grab’ it and then holding the bow in place prior to pulling on the string.

5.4.2. Note onset through bow change

Results for note onset detection through bow change are given in Table . Due to the impact of noise on the position estimates, real-time bow change detection performs poorly. Although bow direction changes are easy to recognize over longer periods of time, false positives frequently occur when looking for extrema using relatively small windows in time. If the bow position has any significant instability, it is incorrectly recognized as a hit. Worse, as the present algorithm expects the bow to alternate directions, false positives always come in pairs.

As expected, performance was also slow. Figure illustrates classification of bow changes. Altering the detection algorithm to reduce false positives comes at the cost of higher onset detection latency. The effect of the inherent latency due to the real-time constraint is apparent in that only 26% were detected by 25 ms after the hand label. Further, by removing the real-time constraint and expanding both the search window for extrema to 400 ms and the thresholds to declare an event, it is possible to rival audio note onset techniques; for the scale test sample, we eliminate all false positives and false negatives with 89% of bow changes placed no later than 10 ms after the hand label. These bow measurements are clearly useful; however, improvements in estimate noise will have to be made in order to use optical bow tracking for real-time bow change detection.

For real-time detection of bow changes, measuring movement with an inertial measurement unit (IMU) may be a better option. The use of an IMU for classifying bow strokes is well established (Rasamimanana et al. , Citation2006; Young & Deshmane , Citation2007) and has in fact been incorporated into our violin tracking hardware, but it is not included in this paper as we focus on the optical approach here.

5.4.3. Note onset through finger change

Results for note onset detection through finger change are given in Table . Real-time note onset detection using pitch change performs well with an overall F-measure of 91%. False positives and negatives were almost entirely due to lost contacts between the sensor conductive layers, with occasional additional false negatives due to the player pressing the finger down gradually. Further refinements in sensor build may solve the minor remaining issues. 63% of notes are within 25 ms and 56% meet our target of 10 ms. Again, there are a high number of predictive onsets. The number of predictive onset labels is due in large part to off-the-string notes. In these cases, fingers are typically put down on the fingerboard prior to the bow. Notes with an off-string attack account for almost of onset detections more than 50 ms prior to the expected onset. In these cases, off-string attack should be preferred over pitch change because that corresponds better to sound production.

It was also possible to see a difference in onset detection timings between putting fingers down on the string versus lifting them up. The average difference in timing between hand labels and algorithm labels when adding fingers up the fingerboard was 30 ms, whereas when removing fingers the average difference was  ms. This suggests that it takes 25–30 ms to firmly contact the string. Note onsets with higher delays tended to fall into two cases: change of string allowing for late release of the finger on the previous string, or poor finger contact. Pitch changes with poor finger contact typically produced a period of time in which no clear pitch was evident in the spectral domain analysis. In these cases, note onset detection tended to coincide with the point when the audio pitch became clean.

5.4.4. Note onset through slurred repetition

Our tests included only two examples similar to slurred repetitions, both re-attacks without a change of bow direction and both presented in Figure . In both examples, the pressure reduction prior to the re-attack caused the downward force to drop below the 0.44 N limit that the off-string attack algorithm uses to decide whether the bow remains on the string, so both slurred repetitions were classified as off-string attacks even though the bow may not technically have left the string. With only two sample cases, one detected within 35 ms of the hand-labelled onset and one within 4 ms, it is not possible to draw robust conclusions, but the onset detection techniques have so far been able to detect this often challenging case.
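The reason a slurred repetition registers as an off-string attack can be seen in a simplified threshold sketch. Only the 0.44 N limit comes from the text; the function name, the sample values, the treatment of a reading exactly at the limit, and the assumption that the bow starts off the string are illustrative, and the actual off-string attack algorithm may apply further conditions.

```python
ON_STRING_LIMIT_N = 0.44  # on-string force limit quoted in the text


def off_string_attacks(force_n, limit_n=ON_STRING_LIMIT_N):
    """Return sample indices where the force rises to or above the
    limit after having been below it.

    Assumes the bow starts off the string, so the first sample at or
    above the limit is also reported as an attack.
    """
    attacks = []
    on_string = False
    for i, force in enumerate(force_n):
        now_on = force >= limit_n
        if now_on and not on_string:
            attacks.append(i)
        on_string = now_on
    return attacks


# Invented downward-force samples in newtons: an initial attack,
# then a dip to 0.3 N before a re-attack in the same bow direction.
force = [0.0, 0.6, 0.9, 0.5, 0.3, 0.7, 0.8]
attacks = off_string_attacks(force)
```

Here the dip to 0.3 N at index 4 is enough for the re-attack at index 5 to be reported as a new off-string attack, even though the hair may never have left the string.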

6. Conclusion

We have demonstrated a new low-cost, non-intrusive set of sensors for real-time violin performance tracking. A linear position sensor on the fingerboard tracks left-hand fingerings that determine pitch. Rough pitch estimates from the sensors can be used to inform audio-based pitch tracking methods to achieve both higher accuracy and lower latency than audio-only approaches. Our bow tracking system uses four optical reflectance sensors which enable real-time estimates for whether the bow is on or off the string, the downward force applied, and the location of the point of contact between the bow hair and the string.

As a case study, we have used our system to detect note onsets in three short performances. We have used sensor inputs to detect three different types of note onset in real time: off-string attacks, on-string bow changes and pitch changes. Off-string attack detection is effective, with 81.5% of events detected no later than 25 ms after the hand-labelled event and an overall F-measure of 0.96 for detection. Onsets due to pitch change were found no later than 25 ms after the hand label 63.4% of the time, with an overall F-measure of 0.91. Note onset detection through bow change is less effective, with only 11.3% detected no later than 25 ms after the label. However, when real-time detection constraints are dropped, bow changes are placed no later than 25 ms after the label 90.0% of the time. A fourth type of note onset, slurred repetition, can also be detected from the sensors, though it rarely appeared in the musical excerpts.

Unlike motion capture systems, the sensor technologies in this paper do not place any requirements on the surrounding space or the movement of the player. They can be added to any violin, with applications to teaching and augmented violin performance. The combination of sensor and audio data enables faster, higher-accuracy real-time performance tracking than either modality alone.

Acknowledgements

Thanks to the Electronics Lab staff and the Centre for Digital Music at Queen Mary University of London.

Additional information

Funding

Andrew McPherson was partly supported by the Engineering and Physical Sciences Research Council under grant EP/K032046/1.

Notes

1 Violin or Guitar: Which is Easier? A debate originally broadcast on 16 September 2014 for the BBC Radio 4 Today show with guitarist John Etheridge.

4 As used in this paper, the units are not truly arbitrary, as 1 AU is equivalent to the length of the bow hair, 650 mm. However, normalized bow position aligns more usefully with how violinists describe bow position.

5 These percentages will vary depending on the bow tightness and contact location. Raw sensor readings are inversely related to distance.

References

  • Askenfelt, A. (1986). Measurement of bow motion and bow force in violin playing. Journal of the Acoustical Society of America, 80, 1007.
  • Askenfelt, A. (1989). Measurement of the bowing parameters in violin playing. II: Bow-bridge distance, dynamic range, and limits of bow force. Journal of the Acoustical Society of America, 86, 503.
  • Benetos, E. (2012). Automatic transcription of polyphonic music exploiting temporal evolution (PhD thesis), Queen Mary University of London.
  • Bevilacqua, F., Rasamimanana, N., Fléty, E., Lemouton, S., & Baschet, F. (2006). The augmented violin project: research, composition and performance report. Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, France. (pp. 402–406). Available online: http://www.nime.org/proceedings/2006/nime2006_402.pdf.
  • Bishop, C.M., & Nasrabadi, N.M. (2006). Pattern recognition and machine learning, Vol. 1. New York: Springer.
  • Böck, S., Arzt, A., Krebs, F., & Schedl, M. (2012a). Online realtime onset detection with recurrent neural networks. In Proceedings of the 15th International Conference on Digital Audio Effects (DAFx-12), York, UK. http://www.dafx12.york.ac.uk/papers/dafx12submission_4.pdf
  • Böck, S., Krebs, F., & Schedl, M. (2012b). Evaluating the online capabilities of onset detection methods. Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012) (pp. 49–54). Canada: International Society for Music Information Retrieval.
  • Böck, S., & Widmer, G. (2013). Maximum filter vibrato suppression for onset detection. In Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland. http://dafx13.nuim.ie/papers/09.dafx2013_submission_12.pdf
  • Buxton, W., & Dannenberg, R. (1986). The computer as musical accompanist. ACM SIGCHI Bulletin, 17, 41–43.
  • Collins, N. (2005a). A comparison of sound onset detection algorithms with emphasis on psycho-acoustically motivated detection functions. In Audio Engineering Society Convention 118 (pp. 28–31). New York: Audio Engineering Society.
  • Collins, N. (2005b). Using a pitch detector for onset detection. In Proceedings of 6th International Conference on Music Information Retrieval (ISMIR 2005) (pp. 100–106). London: Queen Mary, University of London.
  • Cont, A. (2008, August). ANTESCOFO: Anticipatory synchronization and control of interactive parameters in computer music. In International Computer Music Conference (ICMC), Belfast, Ireland. pp. 33–40.
  • De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 111, 1917.
  • Demoucron, M. (2008). On the control of virtual violins-Physical modelling and control of bowed string instruments (PhD thesis), Université Pierre et Marie Curie-Paris VI, Paris, France.
  • Dixon, S. (2006). Onset detection revisited. Proceedings of the 9th International Conference on Digital Audio Effects, (DAFx’06) Montreal, Canada. pp. 133–137.
  • Freed, A., Chaudhary, A., & Davila, B. (1997). Operating systems latency measurement and analysis for sound synthesis and processing applications. In Proceedings of the 1997 International Computer Music Conference (pp. 479–481) Thessaloniki, Greece.
  • Freed, A., Uitti, F.M., Mansfield, S., & MacCallum, J. (2013). “Old” is the new “New”: A fingerboard case study in recrudescence as a NIME development strategy. In Proceedings of NIME (pp. 442–445). Available online: http://www.nime.org/proceedings/2012/nime2012_256.pdf.
  • Grosshauser, T., Candia, V., Hildebrandt, H., & Tröster, G. (2012). Sensor based measurements of musicians’ synchronization issues. In Proceedings of NIME (6 pp.). Available online: http://nime.org/proceedings/2013/nime2013_286.pdf.
  • Grosshauser, T., Feese, S., & Tröster, G. (2013). Capacitive left hand finger and bow sensors for synchronization and rhythmical regularity analysis in string ensembles. In Proceedings of the Sound and Music Computing Conference 2013, SMC 2013, Stockholm, Sweden (pp. 438–442). Berlin: Logos Verlag.
  • Grosshauser, T., & Tröster, G. (2013). Finger position and pressure sensing techniques for stringed and keyboard instruments. In Proceedings of 13th International Conference on New Interfaces for Musical Expression (NIME13) (6 pp.). Available online: http://nime.org/proceedings/2013/nime2013_286.pdf.
  • Guaus, E., Bonada, J., Maestre, E., Pérez, A., & Blaauw, M. (2009). Calibration method to measure accurate bow force for real violin performances. In International Computer Music Conference (ICMC), Montreal, Canada.
  • Kapur, A., Lazier, A.J., Davidson, P., Wilson, R.S., & Cook, P.R. (2004). The electronic sitar controller. Proceedings of NIME (pp. 7–12). Available online: http://www.nime.org/proceedings/2004/nime2004_007.pdf.
  • Klapuri, A. (1999). Sound onset detection by applying psychoacoustic knowledge. In Proceedings of ICASSP (Vol. 6, pp. 3089–3092).
  • Machover, T. (1992). Hyperinstruments: A progress report, 1987–1991. Cambridge, MA: MIT Media Laboratory.
  • Maestre, E., Bonada, J., Blaauw, M., Perez, A., & Guaus, E. (2007). Acquisition of violin instrumental gestures using a commercial EMF tracking device. Proceedings of International Computer Music Conference (ICMC), Copenhagen, Denmark (Vol. 1, pp. 386–393).
  • Mathews, M.V., & Kohut, J. (1973). Electronic simulation of violin resonances. Journal of the Acoustical Society of America, 53, 1620.
  • McMillen, K.A. (2008). Stage-worthy sensor bows for stringed instruments. In Proceedings of NIME (pp. 347–348). Available online: http://www.nime.org/proceedings/2008/nime2008_347.pdf.
  • Overholt, D. (2005). The Overtone Violin: A new computer music instrument. In Proceedings of International Computer Music Conference, Barcelona, Spain (pp. 604–607). Available online: http://www.nime.org/proceedings/2005/nime2005_034.pdf.
  • Overholt, D. (2011). The Overtone Fiddle: An actuated acoustic instrument. In Proceedings of New Interfaces for Musical Expression, Oslo, Norway. Available online: http://www.nime.org/proceedings/2011/nime2011_004.pdf.
  • Overholt, D. (2012). Violin-related HCI: A taxonomy elicited by the musical interface technology design space. Arts and technology (pp. 80–89). Berlin: Springer.
  • Paradiso, J. & Gershenfeld, N. (1997). Musical applications of electric field sensing. Computer Music Journal, 21(2), 69–89.
  • Pardue, L.S. & McPherson, A.P. (2013). Near-field optical reflective sensing for bow tracking. In Proceedings of NIME (pp. 363–368). Available online: http://nime.org/proceedings/2013/nime2013_247.pdf.
  • Pardue, L.S., Nian, D., Harte, C., & McPherson, A.P. (2014). Low-latency audio pitch tracking: a multi-modal sensor-assisted approach. Proceedings of NIME (pp. 54–59). Available online: http://www.nime.org/proceedings/2014/nime2014_336.pdf.
  • Poepel, C., & Overholt, D. (2006). Recent developments in violin-related digital musical instruments: where are we and where are we going? In Proceedings of NIME (6 pp.). Available online: http://www.nime.org/proceedings/2006/nime2006_390.pdf.
  • Rasamimanana, N., Bernardin, D., Wanderley, M., & Bevilacqua, F. (2009). String bowing gestures at varying bow stroke frequencies: A case study. In Gesture-based human-computer interaction and simulation (pp. 216–226). Berlin: Springer.
  • Rasamimanana, N., Fléty, E., & Bevilacqua, F. (2006). Gesture analysis of violin bow strokes. Gesture in Human-Computer Interaction and Simulation (Lecture Notes in Computer Science 3881, pp. 145–155). Berlin, Heidelberg: Springer-Verlag.
  • Schoonderwaldt, E. & Demoucron, M. (2009). Extraction of bowing parameters from violin performance combining motion capture and sensors. Journal of the Acoustical Society of America, 126, 2695.
  • Schoonderwaldt, E. & Jensenius, A.R. (2011). Effective and expressive movements in a French-Canadian fiddler’s performance. Proceedings of NIME (pp. 256–259). Available online: http://www.nime.org/proceedings/2011/nime2011_256.pdf.
  • Stowell, D., & Plumbley, M. (2007). Adaptive whitening for improved real-time audio onset detection. Proceedings of International Computer Music Conference (ICMC), Copenhagen, Denmark (Vol. 18).
  • Trueman, D., & Cook, P. (2000). BoSSA: The deconstructed violin reconstructed. Journal of New Music Research, 29(2), 121–130.
  • van der Linden, J., Schoonderwaldt, E., Bird, J., & Johnson, R. (2011). Musicjacket-combining motion capture and vibrotactile feedback to teach violin bowing. IEEE Transactions on Instrumentation and Measurement, 60(1), 104–113.
  • von dem Knesebeck, A., & Zölzer, U. (2010). Comparison of pitch trackers for real-time guitar effects. In Proceedings of the 13th International Conference on Digital Audio Effects (DAFx’10), Graz, Austria (pp. 266–269).
  • Young, D. (2002). The hyperbow controller: Real-time dynamics measurement of violin performance. In Proceedings of NIME (pp. NIME02-1-6). Available online: http://www.nime.org/proceedings/2002/nime2002_201.pdf.
  • Young, D. (2003). Wireless sensor system for measurement of violin bowing parameters. Proceedings of the Stockholm Music Acoustics Conference (pp. 111–114). Available online: http://opera.media.mit.edu/papers/YoungSMAC03.pdf.
  • Young, D. & Deshmane, A. (2007). Bowstroke database: A web-accessible archive of violin bowing data. Proceedings of NIME (pp. 352–357). Available online: http://www.nime.org/proceedings/2007/nime2007_352.pdf.