Abstract
Multimodal studies posit that meaning is communicated not only through spoken and written words, but also through other modes such as images, gesture, gaze, and proximity. The widespread availability of high-quality, miniaturised audio and video recording and storage technology has made multimodal data collection cheap and easy. However, the transcription and analysis of the resulting avalanche of recorded data are complex, time-consuming, labour-intensive, and expensive. To date, there is no established practice or consensus as to scope, methods, objectives, or definitions. Indeed, concern has been voiced that the field risks expanding to the point of incoherence, sometimes building theory from intuition and generalising from single case studies. Lessons from the 200-year-old discipline of modern linguistics can offer one way forward for the vibrant emerging field of multimodal studies by introducing methods that generate results and hypotheses which can be critically evaluated and empirically tested.