Why theory is essential: the relationship between theory, analysis and data

Author: David Adger


Issues relating to why theories of linguistics have such an important place in the academic discipline


I will restrict my remarks here to ‘core’ areas of linguistics (those that constitute the traditional levels of description: semantics; syntax; morphology; phonology), excluding phonetics, where the physical nature of the data makes the questions somewhat different under current understanding, and excluding areas such as pragmatics and text linguistics, where I have little competence. Within these core areas, linguistic data is of essentially three types: corpus data, experimental data and intuition data.

One important thing to get across to students about these kinds of data is that they are meaningless in the absence of a theory. There currently exist no reasonable operational techniques or inductive procedures for extracting anything from linguistic data which can lead to insight or understanding in our focus area. However, given a framework of assumptions of some kind (a theory), distributional patterns in the data may constitute evidence for the (in)correctness of these assumptions. Only under an analysis driven by a theory does the data impact on our understanding of language. Because of this, there is no well-defined notion of ‘proof’ in empirically driven linguistics, a point which students often find difficult to grasp.

Students need to appreciate that theory is not simply a summary of the data (a common misconception); rather it is a specification of mechanisms (loosely construed) which give rise to the observed effects. Nor is analysis merely a summary of the data: an analysis is an attempt to discover how well the mechanisms of the theory uncover meaningful patterns in the data.

There are, then, three separate aspects of the process whereby we can achieve insight in core linguistics: the development of theory; the construction of analyses; the collection of data. Some of the data of core linguistics is actually generated by the theory, in that experimental and intuition-based data are collected to test theory, and the collection techniques are designed with this aim in mind. Even corpus data undergoes basic analysis when it is tagged for (possibly computational) search, and this analysis is, to some extent, driven by theoretical concerns, or by methodological principles deriving from such concerns. Theory development, then, is essential because there is no route to understanding except through theory.

Because of these properties of theories, it is important also to emphasize that the database for a theory cannot be delimited prior to the definition of that theory. One cannot treat data as evidence for or against a theory simply by claiming that the data is inside or outside its antecedently defined domain; rather there must be clear argument as to the causal link between the data and the theory, and clear decisions about whether such data can be safely ignored given the state of current understanding. These are extremely difficult and subtle questions, as hard for experienced linguists to resolve in practice as they are for students to understand. However, such questions are extremely important, since they emphasize the tentative nature of scientific knowledge in general.

Finally, an important object of study is the theory itself. As well as its explanatory potential in accounting for observed and (as yet) unobserved facts, a theory has other properties that can be independently investigated: are its basic assumptions mutually consistent; does the theory contain assumptions which overlap in their effects, essentially contributing to redundancy; does it conform to other metrics of simplicity (number of primitives, etc.)?


