Abstract
Word–gesture keyboards allow users to enter text using continuous input strokes, a technique also known as gesture typing or shape writing. We developed a production model of gesture typing input based on an optimal control theory of human motor control (specifically, one that models human drawing movements as minimizing jerk, the third derivative of position). In contrast to existing models, which treat gestural input as a series of concatenated aiming movements and predict a user's time performance, this descriptive theory of human motor control predicts the shapes and trajectories that users will draw. The theory is supported by an analysis of user-produced gestures, which found qualitative and quantitative agreement between the shapes users drew and the minimum-jerk theory of motor control. Furthermore, by using a small number of statistical via-points whose distributions reflect the sensorimotor noise and the speed–accuracy trade-off in gesture typing, we developed a model of gesture production that can predict realistic gesture trajectories for arbitrary text input tasks. The model accurately reflects features of the figural shapes and dynamics observed from users and can be used to improve the design and evaluation of gestural input systems.
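The minimum-jerk principle invoked above has a well-known closed form for a straight point-to-point movement (Flash & Hogan, 1985): the position blends from start to end as 10s³ − 15s⁴ + 6s⁵ in normalized time s ∈ [0, 1], with zero velocity and acceleration at both endpoints. The following is an illustrative sketch only (the function name and sampling scheme are ours, not the article's model):

```python
def minimum_jerk(x0, x1, n=5):
    """Sample the 1-D minimum-jerk path from x0 to x1 at n evenly spaced
    normalized times (apply per axis for a 2-D stroke)."""
    traj = []
    for i in range(n):
        s = i / (n - 1)  # normalized time in [0, 1]
        # Polynomial that minimizes integrated squared jerk subject to
        # rest-to-rest boundary conditions (zero velocity/acceleration).
        blend = 10 * s**3 - 15 * s**4 + 6 * s**5
        traj.append(x0 + (x1 - x0) * blend)
    return traj

print(minimum_jerk(0.0, 1.0))  # symmetric S-shaped profile: slow-fast-slow
```

This smooth, bell-shaped velocity profile is what distinguishes the theory from concatenated aiming-movement models, which predict timing rather than shape.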
Notes
1 In this article the term trajectory refers to all kinematic variables that describe some motion (position, velocity, acceleration, etc.), whereas the term path refers to only its geometric form.
2 General reviews of the issues for text entry interaction are also given by MacKenzie and Tanaka-Ishii (2007) and MacKenzie and Soukoreff (2002).
3 This article uses terms such as distance and closest as generic terms for the metrics of a gesture recognition algorithm and not to imply a specific matching method.
4 In the rest of this article the term gesture typing is used as a generic descriptor for the basic word-gesture keyboard input process.
5 When precise control is required under strict visual guidance, the principle does not necessarily hold (reviewed by Fitts, 1954), and there is a speed–accuracy trade-off to be considered (Woodworth, 1899).
6 Collection procedures are detailed in Section 5.3. All gesture examples in this article were selected from this data set.
7 In practice, gestures are ultimately validated for correctness by a gesture recognition system, but using such a system here would introduce a confound between our model of gesture production and the implicit model of the chosen recognition system and lexicon.
8 Model I9250, with a 4.65-in. screen running at a resolution of 720 × 1280 pixels (316 ppi). Touch events were observed at a resolution of approximately 90 Hz but arrived at irregular intervals and required resampling (see upcoming text).
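The note does not specify the resampling procedure; one common approach for irregularly timed touch events is linear interpolation onto a uniform time grid. A minimal sketch, with names and the 90 Hz default chosen by us for illustration:

```python
def resample_uniform(timestamps, xs, rate_hz=90.0):
    """Linearly interpolate irregularly timed samples onto a uniform grid.

    `timestamps` (seconds) must be strictly increasing; `xs` are the
    corresponding 1-D coordinates (apply once per axis for a 2-D trace).
    Returns (uniform_times, interpolated_values).
    """
    dt = 1.0 / rate_hz
    out_t, out_x = [], []
    t = timestamps[0]
    j = 0  # index of the segment [timestamps[j], timestamps[j + 1]] containing t
    while t <= timestamps[-1]:
        while timestamps[j + 1] < t:
            j += 1
        t0, t1 = timestamps[j], timestamps[j + 1]
        w = (t - t0) / (t1 - t0)  # interpolation weight within the segment
        out_t.append(t)
        out_x.append(xs[j] * (1 - w) + xs[j + 1] * w)
        t += dt
    return out_t, out_x
```

Uniform sampling of this kind is a prerequisite for estimating velocity, acceleration, and jerk by finite differences, which the article's kinematic analysis requires.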
Additional information
Notes on contributors
Philip Quinn
Philip Quinn ([email protected]) is a researcher with an interest in human factors measurement and modeling; he recently completed a doctorate at the University of Canterbury, New Zealand, and is presently a research scientist at Google Inc.
Shumin Zhai ([email protected], shuminzhai.com) is an HCI scientist with an interest in foundational research and practical inventions of interaction methods; he is a senior staff research scientist at Google Inc.