ABSTRACT
It is argued that many of the major notions in ‘mainstream’ linguistic theory and method over the years have been influenced by a ‘classical formalist’ ambience that suited conventional ideas about how science ought to proceed but fostered an idealized ‘frozen’ conception of the language system in isolation from reality and society. Today, the tide is turning toward functionalist accounts of language; but the accompanying shift in our scientific programme calls for careful reflection. Some deep-lying motives for the shift are explored with a view to potential consequences.
A. The quest for ‘language by itself’
1. Two basic ‘facts’ about language seem fairly plain. One ‘fact’ is that language has a high degree of organization reflected in the ‘front end’ presented to our perception — the sounds and forms of words and phrases. The other ‘fact’ is that people use language to do things — to ‘mean’ things and to achieve things. Within the big picture of language adopted by speakers and hearers in everyday life, these two facts seldom compete or conflict. Yet the study of language has often sought to choose either the one fact or the other — either how language ‘by itself’ is organized or what people use language to do. If you face such a decision, the first choice may well look more appealing in promising a smaller, tidier job. Instead of confronting what J.R. Firth (1957a: 187) was pleased to call, after Alfred North Whitehead, ‘the mush of general goings on’, we can focus on the organization of language and divide up the labour of our studies, one person or group studying the sounds, another the words, another the phrases, and so on. Once all these items have been found and classified, our job should be finished.
2. Given such an appeal, it is hardly surprising if the majority of study so far, ranging from ‘traditional grammar’ up through philology and modern linguistics, has been devoted to ‘language by itself’. When Saussure’s influential Course in General Linguistics emphatically concluded with ‘the fundamental idea’ that ‘the true and unique object of linguistics is language studied in and for itself’ (1969 [1916]: 232), he (or his students) presumably intended to shield linguistics from absorption by neighboring sciences. Saussure complained that ‘heretofore language has almost always been studied in connection with something else’ (1969: 16). Though surrounded by ‘other sciences that sometimes borrow from its data, sometimes supply it with data’ — ‘political history’, ‘psychology’, ‘anthropology’, ‘sociology’, ‘ethnography’, ‘prehistory’, and ‘palaeontology’ — ‘linguistics must be carefully distinguished’ from such sciences, which can contribute only to ‘external linguistics’, concerning ‘everything that is outside’ the ‘system’ of ‘language’; in return, ‘we can draw no accurate conclusions outside the domain of linguistics proper’ (1969: 102f, 147, 6, 9, 224, 20f, 228) (but see § 43).
3. In effect, the prospects for any science of language were made contingent on the precept that ‘language by itself’ can indeed be located and studied, given the proper methods. This precept was in turn reflected in several tenets propounded in influential books setting down the ‘classical’ programme of ‘mainstream’ linguistics:1
(1) A language should be considered a uniform, stable, and abstract system in a single stage of its evolution.
(2) This system is to be defined by internal, language-based criteria.
(3) Language is a phenomenon distinct from other domains of human knowledge or activity.
(4) A language should be described apart from variations due to time, place, or identity of speakers.
(5) The description of a language should be couched in statements at a high degree of generality, if possible about the language as a whole or even about all languages.
Within that programme, the tenets interlock in projecting a free-standing and self-sufficient conception of language that stands firm while we are describing it (cf. § 26, 65).
4. Most of the theoretical and practical problems throughout modern linguistics have arisen from the tendency to consider tenets (1-5) as fundamental postulates which any science of language must accept rather than as empirical hypotheses to be tested by a range of methods. Linguistics was rendered highly self-conscious about the hypothetical but henceforth essential borderline between ‘linguistic’ versus ‘extra-linguistic’ or ‘non-linguistic’ data, issues, explanations, and so on (cf. § 30, 63). Since ‘language by itself’ is not a ‘fact’ or ‘object’ directly presented to observation, linguistics sought to construct it by sheer theoretical bootstrapping. The most fateful consequence has been the idea that language can be removed from all contexts for purposes of rigorous analysis; in fact, such analysis merely creates a different and special context, one that may exert powerful but largely unacknowledged controls on the language data (cf. § 40, 54).
5. Let us focus here on hypothesis (1) stating that language should be considered a uniform, stable, and abstract system, which can be called for short the u-s-a hypothesis (though it was by no means limited to or universally accepted in the real USA). The strongest test for this hypothesis would be whether linguistic research does indeed discover such a ‘u-s-a system’ for a given ‘natural language’ like English. The discovery process has proceeded by means of conventional data-handling strategies, such as:
(1) collating: a large set of data samples is compared and contrasted to distill out what they have in common, e.g. which word types frequently occur with other types;
(2) generalizing: certain aspects of the observed data are construed to be general ones, e.g. that the ‘Subject-Verb-Object’ order of a sample set of English sentences is a typical pattern for the language as a whole;
(3) rarefying: the ‘rich’ data are rendered more ‘sparse’ by disregarding certain aspects or details, e.g. variations in the actual pronunciation of language sounds;
(4) decontextualizing: the data are taken out of the observed context and treated as if they had occurred in isolation or could occur in a wide range of contexts, e.g. irrespective of the social status of groups or speakers;
(5) introspecting: the linguists make estimations based on their own intuitions about the language, e.g. which sentences do or do not violate the ‘rules’;
(6) consulting informants: native speakers are asked to judge or rate data samples of their language, e.g. to decide whether two utterances ‘mean the same’.
Since the data by themselves do not tell us exactly how these strategies should be applied, the validity of the strategies ought to be a further hypothesis, or rather a set of hypotheses, to be tested by our results.
6. But how can the results provide a test for the validity of the very strategies expressly deployed to produce those results? To escape circularity, the key tests would surely be the convergence among data discovered and described, and the consensus among linguists about how the data should be treated and interpreted. In retrospect, these two tests have been met with full success only in the description of language sounds in ‘phonology’. Here, linguistics indeed found a ‘u-s-a system’ of ‘phonemes’ whose quantity and nature can be precisely described by two sets of criteria. Physically, each ‘phoneme’ can be uniquely described by its features, e.g., a ‘voiced stop’ such as [d] produced when the vocal cords vibrate and the air flow is fully blocked; the visual correspondence between phonemes and written letters of the Roman alphabet was also supportive, though it was not an official base because the description was strictly addressed to spoken language. Mentally, each ‘phoneme’ must be capable of differentiating between units that also differ in meaning, e.g. [d] versus [t] in ‘hid’ versus ‘hit’. This full success made the study of language sound systems in ‘phonemics’ or ‘phonology’ into the ‘model paradigm’ in modern linguistics, e.g. when Firth (1957a [1951]: 222; 1968 [1957b]: 191) recommended that ‘phonemic description should serve primarily as a basis for the statement of grammatical and lexical facts’, and that ‘linguistic analysis’ should have ‘the same rigorous control of formal categories’ as ‘in all phonological analysis’. A lasting heritage of this view has been the proliferation of ‘-eme’ terms (e.g. ‘morpheme’, ‘lexeme’, ‘tagmeme’, ‘syntagmeme’, ‘sememe’) modeled after the ‘phoneme’.
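The ‘mental’ criterion just described, that a phoneme must differentiate units which also differ in meaning, can be illustrated with a short sketch in Python. This is only a hedged illustration: the toy lexicon, its broad transcriptions, and the function names `minimal_pairs` and `contrasts` are assumptions introduced here, not a procedure taken from the linguistic literature. The sketch also takes for granted that the listed words differ in meaning, which is exactly the judgment the test leaves to speakers.

```python
from itertools import combinations

# Hypothetical toy lexicon: spelling -> broad phonemic transcription,
# one symbol per phoneme (an assumption made for illustration only).
LEXICON = {
    "hid": ("h", "ɪ", "d"),
    "hit": ("h", "ɪ", "t"),
    "bid": ("b", "ɪ", "d"),
    "bit": ("b", "ɪ", "t"),
    "hip": ("h", "ɪ", "p"),
}


def minimal_pairs(lexicon):
    """Return (word_a, word_b, sound_a, sound_b) for every pair of words
    whose transcriptions have equal length and differ in exactly one position."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue
        # Collect the positions at which the two transcriptions disagree.
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0][0], diffs[0][1]))
    return pairs


def contrasts(lexicon, x, y):
    """True if at least one minimal pair in the lexicon hinges on x versus y."""
    return any({a, b} == {x, y} for _, _, a, b in minimal_pairs(lexicon))
```

With this toy lexicon, `contrasts(LEXICON, "d", "t")` holds because of ‘hid’ versus ‘hit’ (and ‘bid’ versus ‘bit’). Note that the sketch never states what the words mean; it merely needs pairs whose meanings nobody would deny to differ.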
7. Henceforth, ‘mainstream’ theories confidently projected language to be an array of ‘u-s-a subsystems’ (usually called ‘levels’), each consisting of a repertory of minimal combinable elements comparable to ‘phonemes’. A complete description of a language would be the sum of the descriptions for each subsystem, supplied by linguists working in the several areas within a neat ‘division of labour’. For a time, some linguists (especially in America) insisted that ‘rigid, water-tight compartments or levels are aesthetically satisfying and provide the only valid scientific conclusions’, and that ‘level mixing’ was a ‘sin’, e.g. ‘the Pike heresy’ of ‘persistently using non-phonetic criteria in phonemics’ (quoted by Pike 1967: 59, 443, 66, 362; cf. Hockett 1942, 1955; Moulton 1947; Voegelin 1949:78; W. Smith 1950: 8; Trager & H. Smith 1951).
8. Yet matters have proven less manageable as research has moved beyond the subsystem of sounds. The subsystem of minimal meaningful forms, called ‘morphemes’, is already less tidy. Convergence and consensus are fairly high for identifying and isolating the morphemes in our data, where the chief physical criterion, the linear arrangement of the data written down, is visually clear though less well-defined than the articulatory criteria of phonology. But convergence and consensus are rather lower for classifying morphemes into categories, since observed linear positions by themselves do not afford explicit, clean-cut indications of category; at most, we can set up some categories whose names indicate where items appear, e.g. ‘prefixes’ in front, ‘suffixes’ behind, and ‘infixes’ in the middle. Some languages do present specific morphemic sectors, such as the inflections of Nouns and Verbs, which can be precisely and exhaustively described; yet even there, complexities can arise, e.g., the category of ‘English Noun plural morphemes’ written ‘‑s’ or ‘‑es’ but pronounced /s/, /z/, /əs/, or /əz/2, plus the ‘zero morpheme’ not written at all (like ‘sheep’). Otherwise, the majority of ‘morphemes’ fall into very large and fuzzy sets, e.g., all Nouns or all Verbs. The standard solution to this problem has been to put all indivisible words over into the class of ‘lexemes’ and reserve the term ‘morphemes’ for the tidier sectors.
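How one written plural ending corresponds to several pronunciations can be sketched as a small rule in Python. The sketch below is an assumption-laden illustration, not the analysis of any particular grammar: it encodes only the familiar textbook three-way pattern (an extra vowel after sibilants, a voiceless ending after voiceless sounds, a voiced ending elsewhere), and the sound classes and the function name `plural_allomorph` are hypothetical simplifications. It does not cover every variant listed above, nor the ‘zero morpheme’ or irregular plurals.

```python
# Assumed sound classes for illustration, written with broad IPA symbols.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_sound):
    """Pick a plural pronunciation from the last sound of the Noun stem.

    The order of the checks matters: sibilants are tested first,
    because /s/ and /ʃ/ would otherwise count as ordinary voiceless sounds.
    """
    if final_sound in SIBILANTS:
        return "əz"   # e.g. 'bus' -> 'buses'
    if final_sound in VOICELESS:
        return "s"    # e.g. 'cat' -> 'cats'
    return "z"        # e.g. 'dog' -> 'dogs'
```

Even this tidy sector needs three ordered conditions and two auxiliary sound classes, and a case such as ‘sheep’ falls outside the rule altogether, which gives some sense of why the larger and fuzzier sets resist the same treatment.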
9. The subsystem of ‘syntax’, which concerns the arrangement of phrases and clauses, is still more problematic, chiefly because we are dealing with a repertory consisting not of minimal units but of complex units (sometimes called ‘syntagmemes’ after the terms ‘phonemes’ and ‘morphemes’) ranging from just one morpheme (e.g. ‘help!’) up to an extensive phrase, clause, or sentence. Nor does it seem feasible to give an exhaustive, precise listing of phrases or clauses; even the traditional division into ‘Subject’ and ‘Predicate’ can leave tricky residues, e.g., signals of the speaker’s viewpoint like ‘frankly’. And syntax inherits the problems of morphology about classifying items in sets. Again, the physical appearance of data written down for visual inspection does a deal of handiwork insofar as the divisions between words and between at least some phrases seem evident. But the reasons why an observed pattern of words and phrases has that shape must be inferred.
10. Evidently, the methods of identifying and classifying units into repertories had supported consensus among linguists quite well for phonology and fairly well for morphology, but later much less well for syntax. So amid a flood of animated controversy, phonology plus morphology were moved toward the sidelines and syntax assumed the role of ‘model paradigm’ in linguistics. True, the ‘u-s-a hypothesis’ remained firmly in place; but the ‘u-s-a system’ was now conceived to be a repertory of ‘rules’ for arranging units into phrases and sentences. Yet since — unlike the units — these ‘rules’ plainly do not appear ‘in’ the data, this new paradigm placed increasing demands on the ingenuity of linguists in devising ‘rules’. The data-handling strategy (5) of ‘introspecting’ now assumed a key dual role not just in relating the rules to discovered data but in generating invented data that would reflect the linguists’ knowledge of the rules as native speakers — their ‘competence’ (cf. § 46). In this dual and somewhat circular role, introspecting threatened to overshadow the other data-handling strategies, especially the strategies of ‘collating samples’ and ‘consulting informants’.
11. The state of affairs was most diffuse in semantics, the investigation of the meanings of language. While phonology was the model paradigm, semantics had sought to set up a repertory of abstract minimal units called ‘semes’ or ‘sememes’ like the ‘phonemes’, e.g. ‘± Animate’ or ‘± Human’, but the criteria for identifying them lacked any straightforward basis such as the phonemes had (cf. § 6, 19ff). When syntax became the model paradigm, semantics was handed the job of supplying ‘rules’ to ‘interpret’ syntactic ‘strings’. The wherewithal of this ‘interpretation’ was ‘semantic features’, which strongly resembled the ‘sememes’.
12. What gradually ensued was an uneasy imbalance between the language data and a descriptive apparatus which was still to be defined solely by internal, language-based criteria separated from variations due to time, place, or identity of speakers. Predictably, convergence and consensus receded dramatically. Groups of linguists proposing different types of rules proliferated; and even linguists who agreed about rule types attained conflicting descriptions when they moved beyond the more straightforward and well-behaved examples. Decades of further work on rule-systems have not managed either to supply a complete, definitive description of any language or even to attain consensus about how we should seek one.
13. So if the tests for the ‘u-s-a hypothesis’ are convergence and consensus (§ 6), then it stands refuted, and we need to reconsider the ‘mainstream’ research programme based upon it. My sense is that such a reconsideration is now well under way, but has not been guided by a sufficiently consolidated and well-argued rationale. The danger persists that recent trends may be seen as retreating from scientific standards, whereas we are in fact redefining those standards (cf. section C).
B. The enduring problem of ‘constraints’
14. A constraint can be defined as any factor which makes certain items or patterns more or less likely than others. The firmest constraints upon language found in linguistics so far are in phonology, namely the physically defined features of the phonemes and the mentally defined capability of differentiating between units that also differ in meaning (§ 6). The physical constraints are the most powerfully distinctive ones: a ‘stop’ cannot be both ‘labial’ and ‘dental’, or both ‘voiced’ and ‘unvoiced’. Though speakers may not actually make audible distinctions between, say, [d] and [t], the two phonemes retain their secure and unique positions in the ‘u-s-a system’ of English phonemes (§ 6). The mental constraints are less obviously distinctive, but are easily met by finding at least one contrastive pair like ‘hid’ and ‘hit’, whose members do not mean the same thing. Notice here that we need not state what they mean or in what respects their meanings differ; we merely need a pair for which nobody would deny the meanings do differ (cf. § 38!).
15. In morphology, the constraints are already less tractable. Are we to assume, for instance, that the speaker of English is aware of Romance-language-based morphemes like /in-/ and /im-/ signifying negation and their sensitivity to phonemic position before dentals (‘intangible’) versus labials (‘impossible’); or of the criteria for using them versus /un-/, /non-/, or /a-/; or of their distinctness from the same set of phonemes and graphemes signifying direction in ‘inject’ or ‘impale’? Or are these constraints merely a historical sediment of English that has become ‘arbitrary’, whereas the constraints on, say, singular versus plural are still active and productive?
16. It was in syntax that the problem of constraints was destined to become truly virulent. As we saw in section A, the notion of a system being a repertory of minimal combinable elements proved explosively unmanageable for syntax and was replaced by the notion of a system of ‘rules’ for arranging units into phrases and sentences (§ 10). Losing the constraints supplied by ‘minimalness’ and by the straightforward procedures for isolating minimal units turned out to be quite costly. An extensive new set of constraints was required which would distinguish all the allowed or ‘grammatical’ sequences, of whatever length and complexity, from all the disallowed or ‘ungrammatical’ ones. Since the ‘classical’ programme of ‘mainstream’ linguistics required this job to be done by internal, language-based criteria alone (§ 3, 12), the ‘natural’ constraints of situation and context that always apply to real data in human interaction were not deemed admissible unless they had been formally reconstructed as purely linguistic ‘rules’.
17. The ‘rules’ were accordingly envisioned to be explicit, formal statements of constraints applying directly to sequences or ‘strings’ composed not of words as such but of syntactic categories such as ‘NP’ (noun phrase) or ‘VP’ (verb phrase). The set of the allowable (or ‘grammatical’) category-sequences of a language like English is indefinitely large but not, as was claimed, infinitely large, at least not in the genuine mathematical sense of ‘infinity’. A truly infinite system will eventually produce all possible combinations, even unimaginably improbable ones, just as in infinite time a roomful of chimpanzees pressing the keys of typewriters will eventually write the works of Shakespeare. Such statements tell us nothing about language or about Shakespeare, but are mere tautologies of the concept of infinity. What might actually be infinite is the set of possible realizations of such sequences as utterances, plus the specific details of time, place, tone of voice, etc., which were not addressed by mainstream descriptions anyway, witness tenet (4) in § 3.
18. Still, it is troubling to imagine that an indefinitely (let alone infinitely) large set of category sequences might call for an indefinitely (let alone infinitely) large set of rules. So the ‘transformational’ approach was eagerly greeted as a means for constraining the set of proposed rules by postulating rules that convert some sequences into other sequences and thereby provide them all with their respective ‘structural descriptions’. This attractive idea not merely slammed the door shut again on infinity (which, I have suggested, was not really necessary) but allowed some sequences to act as constraints on other sequences, with the rules acting as channels for relaying the constraints. Within this conception, three scenarios were possible:
(a) There exists precisely one such rule set for a given language like English;
(b) There exist several, perhaps many such sets;
(c) There exists no such set.
Only if (a) holds can we predict a steady trend toward convergence and consensus.
19. The confidence that (a) does hold was buoyed up for a time by the expanded freedom to devise rules that are not ‘in’ the data but merely held to ‘underlie’ it. The freedom was much enhanced when the domain of rules was expanded from syntax to include semantics (§ 11). But the freedom worked against convergence and consensus as long as it remained unclear how these semantic constraints could be derived and stated. Syntactic ‘rules’ had been conceived as constraints on linear orders and needed merely to state where items should go. ‘Semantic rules’ had to operate between the domain of meaning, which is hardly linear in any straightforward sense, and the domain of syntax, which presumably is. The ‘sememes’ like ‘± Animate’ or ‘± Human’ were merely binary; if they were internally ordered, then chiefly by hierarchy, e.g. ‘Human’ being a subclass of ‘Animate’, rather than by linearity. For the rules to operate upon sequences, a feature like ‘+ Human’ assigned to a Noun category would be a constraint on what categories can precede (e.g. of Adjectives) or follow (e.g. of Verbs).
20. So the ‘transformational generative’ solution to the spiraling problem of constraints and rule-sets undercut convergence and consensus still further. Symptomatic here was the virulent and unresolvable dispute over how much of the formal arranging of sentences should be done by the syntax or by the semantics. The ‘standard’ model held the line in favour of the ‘syntactic component’ as the sole motor of arranging the sequence which was then ‘interpreted’ by the ‘semantic component’; but this scheme made it difficult for the semantic constraints to actively assist the arranging. In the converse model (‘generative semantics’), the ‘logical form’ of the sentence was first set up by the semantics and then ‘interpreted’ by the syntax to yield the actual sentence pattern; but how can ‘logicality’, focusing on issues like ‘quantification’ (e.g. ‘all’, ‘some’, ‘every’, etc.), be interfaced with linearity?
21. The ensuing controversies and the rapid withdrawal of support for ‘generative semantics’ suggest that semantic constraints are vastly more subtle and complicated than any other constraints linguistics has been seeking. A syntactic sequence is at least a clear arrangement, with some items definitely placed before (or on the ‘left’ of) others and some items definitely placed after (or on the ‘right’). But semantics keeps hitting on ambiguities, i.e., on cases where the ‘same’ linguistic material may have several meanings; and so we need a large additional set of constraints to determine which of those meanings is the chosen one, e.g. for written examples received in the absence of the writer:
[1] Blind woman forced by cop to clean up after her guard dog accepts settlement (Evening Times-Globe [Saint John, New Brunswick, Canada]) 8/17/88
[2] State Recycling Skyrockets in 1988 (Tulsa World 8/18/88)
[3] Police chase winds through three towns (Saint Croix Courier [New Brunswick, Canada.] 12/14/88)
[4] Actor sent to jail for not finishing sentence (Knoxville, TN New Sentinel, 1/21/89)
We could resolve [1] by stipulating that ‘accepting a settlement’ belongs to the class of actions requiring a ‘Human’ Agent, such as ‘woman’ but not ‘dog’; and that ‘clean up after’ belongs to the class of standing ‘Prepositional Verbs’ having their own meanings. We could resolve [2] and [3] by alternative syntactic descriptions, with ‘Recycling’ being Noun, not Verb, and ‘winds’ being Verb, not Noun; but do we want semantic rules to stipulate that a ‘state’ cannot ‘recycle skyrockets’ if it so decides (and can get them back), or that ‘police’ cannot ‘chase the wind’ in hopes of apprehending it, say, on charges of vandalism and property damage? For [4], we can’t get help from syntax, since both meanings of ‘sentence’ (uttered sequence and court punishment) apply to a simple Noun. Conceivably, an ‘actor’ could end up in ‘jail’ for willfully violating a contract by breaking off his or her performance in mid-sentence. Or, less conceivably, some authority could be so convinced of the inviolate status of formal grammatical rules as to make incomplete sentences a punishable offense. (Lord knows, sillier laws have been passed, such as that statute in force on a Norwegian coastal island making it a crime to be in a bad mood in public.)
22. Often, we can only resolve such ambiguities by reasoning about what the writer probably intended to say, based on our knowledge about the world. To assist semantics in these fresh and thorny tasks of ‘disambiguation’, linguistics finally turned to ‘pragmatics’, which, being the study of ‘the relation between linguistic expressions and their uses’ (Webster’s Seventh Collegiate Dictionary, p. 667), might seem at odds with the ‘classical’ programme of ‘mainstream’ linguistics to study ‘language by itself’ (cf. § 3). However, the programme was upheld by conceiving the speaker in a uniform, stable, and abstract (‘u-s-a’) fashion as a faceless supplier of intentions to perform ‘speech acts’ that constrain the meaning of sentences. The hearer was a similarly faceless recoverer of those intentions. So language remained firmly at the theoretical centre, and the sentence merely acquired the further role of a basis for reasoning backwards to the speaker’s intention(s) and forward to the hearer’s recovery of the intention(s). Again, the constraints were to be stated as formal ‘rules’ at the highest degree of generality.
23. In this section, I have argued that the historical development of linguistics was driven by the search for the one set of constraints that apply all across the language, that is, scenario (a) in § 18. But the long-range failure to attain or merely to approach convergence and consensus could well be taken (and has been, e.g. Bierwisch 1965) to support scenario (b) allowing for several such sets, perhaps a great many. Recent developments indicate, however, that the lack of convergence and consensus is instead evidence for scenario (c), stating that no such set can ever be discovered. If so, important progress must wait until the ‘classical’ programme has been fundamentally revised (§ 57-69).
24. Basically, language can be described as a mediating system interposed like a layer between a layer of ‘reality’, i.e. the world we live in (however we conceive it to be) and a layer of ‘society’, which talks in and about that world. Society can of course go directly to reality by acting upon it, e.g. plowing fields or building houses. But having language typically makes most such actions more worthwhile and effective, and makes many other actions possible quite apart from acting upon reality.
25. The ‘classical’ programme of ‘mainstream’ linguistics indicates that we can and should detach language from this configuration and roll the other layers aside. The validity of this move hinges on the deeper (‘u-s-a’) hypothesis that once detached, the language system will stand firm: complete and fully organized by its own internal constraints (cf. § 3). The lines of argument I have developed in Sections A and B lead to the opposite conclusion: that once detached, the system tends to skid out of control, and can only be described if we restore the constraints of reality and society. Much theoretical and reconstructive work in linguistics has in effect been such a restoration but has stopped short of drawing the conclusion itself. More often, certain constraints of reality and society have been disguised as ‘formal rules’ operating upon isolated sentences, each sentence being a valid instantiation of ‘language by itself’. When we move beyond straightforward, simple examples hand-picked to fit the rules (like ‘John is easy to please’), the other missing constraints take their vengeance upon us by stubbornly blocking convergence and consensus. And no amount of redoubled ingenuity in designing rules, or ‘extending’ and ‘revising’ the ‘mainstream’ theories, can ever resolve this impasse.
26. If language were a uniform, stable, and abstract (‘u-s-a’) system, we could indeed detach it from reality and society. But such is at most the putative local status of phonology, with every ‘phoneme’ held uniquely and precisely in place by physical and mental criteria (§ 6). In its global status, however, language is an evolving system that is not uniform over time, and fluctuates between abstract and concrete. We must take account of how links are temporarily established to relate items within the current version of the system (cf. § 48). If we took away the constraints from reality and society that help to build these transitory networks, the language system would not stand firm but would skid out of control, whence the principled impossibility of describing it in that state.
27. However, this global status supports local frozen islands, to borrow a key term from complexity theory (e.g. Kauffman 1990a, b). In language, these islands include exactly those formal domains or factors that have been successfully described by ‘mainstream’ morphology and syntax. But many other domains or factors remain in flux until they become relevant for the current version of the language needed to support the ongoing discourse. The main reason why linguistics did not attain convergence and consensus was the inappropriate and untested notion that the global status of the system is frozen, or can be frozen for purposes of description. Doing this job even partially demands a heroic ‘freezing’ action on the linguist’s part; and the divergence of the outcome from the outcome of other such actions is not surprising, but inevitable. We might even predict the degree of divergence by reference to the relative state of flux that is to be frozen: lower in morphology, higher in syntax, and highest of all in semantics.
28. When language is put to use in discourse, brief local ‘freezings’ continually congeal and then disperse, rather like a liquid at a ‘subcritical stage’ which readily attains ‘critical mass’ and then ‘critical dispersion’ with modest inputs of energy (cf. Beaugrande, in preparation). Some of the constraints used here come from the standing frozen islands, while others are made to order for the occasion. The demonstration sentences picked for most formal linguistic analysis attempt to take a footing on the standing frozen islands, but they slip off to the degree that this terrain is insufficient and often slippery as well, whence the disputes among linguists.
C. Formalism versus functionalism
29. Going back to the two ‘basic facts’ cited at the outset (§ 1), we can now contrast two fundamental outlooks on language. The ‘fact’ that language has a high degree of organization is essential for formalism, a term that can subsume all methods construing form to be the basis and framework of language — how entities are shaped or arranged. The ‘fact’ that people use language to do things is essential for functionalism, a term that can subsume all methods construing function to be the basis and framework of language — what means are used toward which ends. In the past, constructive interaction between the two stances has been regrettably hindered by the predisposition of each to regard itself as the outermost framework of language science and its counterpoint as a limited subdomain, as suggested graphically in Fig. 2.
30. The ‘classical’ programme for describing ‘language by itself’ has naturally favoured formalism, since the forms seem to be the most uniform, stable, and abstract (‘u-s-a’) aspects of language, whereas functional aspects tend to be associated with use. So it has become conventional in linguistics to presuppose the legitimacy of formalism, whereas the legitimacy of functionalism must be expressly justified. Formalism was widely held to confer high ‘scientific’ status, whereas functionalism was either ignored or else patronized as ‘unscientific’, ‘pre-theoretical’, or merely ‘applied’. So functional research has been severely held back by inappropriate or premature demands for rigor, abstractness, generality, and so on, stated as absolute, a priori criteria of science.2 One concrete symptom has been the routine efforts of functionalist methods to justify and defend themselves by constantly reasserting what ought to be obvious, e.g.:
there is more to using language, and communicating successfully with other people, than being able to produce correct sentences. Not all sentences are interesting, relevant, or suitable; one cannot put any sentence after another and hope that it will mean something. (Cook 1989: 3)
Such an argument would be pointless had not formalism attached vast importance to ‘grammaticality’ (here, ‘correctness’) of the isolated sentence (§ 16f), making it the cornerstone of ‘linguistic competence’ and declining to inquire whether a sentence might be interesting, relevant, or suitable in actual communication, questions which would inevitably reach beyond the boundaries of ‘language by itself’ (cf. § 4).
31. Symptomatic too are the many hesitant compromises in which modest amounts of functional data are cautiously admitted without revising the formalist framework, e.g. in situating ‘functional sentence perspective’ upon ‘generative semantics’; or in which formalist methods are glibly renamed ‘functional’ ones, as in ‘structural-functional’ grammar. Ironically, these compromises are sometimes faulted for going too far, whereas their weakness lies rather in not going far enough!
32. In the long run, though, pure formalism runs aground on its own austere principles and is trapped in irresolvable dilemmas because, I have argued, it is based on hypotheses that stand refuted by the collective result of linguistic research over at least the past thirty years. The promise for a complete, precise formal description of any natural ‘language by itself’ remains unfulfilled not because linguists have not yet worked out the ‘correct’ theory or model, but because no theory can ever freeze the design of ‘language by itself’ (§ 27).
33. I would surmise here that the significant advances of functionalism in recent decades have reached a turning point — a ‘subcritical stage’ close to ‘critical mass’ (in the sense of § 28). Instead of merely patching up or abetting formalism with sporadic functional constraints, we are now seeking a convergence and consensus for theories and models which are genuinely and unabashedly functional from start to finish and which will determine the role and valence of formality on that basis. We will bring to fulfillment the long-standing advocacy of the ‘Prague School’ scholars led by Vilém Mathesius who proposed that instead of ‘proceeding’ ‘from form to function’, as ‘older linguistics’ had done, we ‘proceed from function to form’ (1926: 198; cf. Mathesius 1975 [1961]; and see now Nekvapil 1991). Leading in to his contrast between Czech and English sentences, Mathesius (1975 [1961]: 84f) suggested that functional factors (e.g. ‘theme’) originally preceded formal ones (e.g. ‘Subject’) and thus coincided with them for a time, but not for a ‘long duration’.
34. Let us reconsider in this light the organization of language into ‘levels’ (cf. § 7). The characteristic descriptive formalist scheme had its levels defined by the units of a set of ‘u-s-a’ systems, one each for phonemes, morphemes, words or ‘lexemes’, and phrases or ‘syntagmemes’, each being the subject matter of one established field in linguistics, as suggested in Table 1.
In some sense, these units appear ‘in’ the data of language samples, at least when they have been transcribed into a consistent visual orthography (cf. § 6, 8f). Perhaps encouraged by this visual medium, the relationship among the levels was assumed, at least implicitly, to be based on a building-block conception of size and constituency, the phonemes being the components of morphemes, the morphemes the components of words, and the words the components of phrases (cf. Bloomfield 1933: 162). Hence, the whole scheme was held together by a ratio of parts to wholes, even though the criteria for defining the respective types of units were not consistent, e.g. features of articulation (like ‘voiced’) applying only to the phonemes. The meaning of a sentence or utterance should accordingly be the straightforward sum of the meanings of the parts — a precept expressly stated by Saussure (1966 [1916]: 121) and Chomsky (1965: 144, 162f), among others. The validity of the precept could not be seriously tested until linguistics proceeded from stipulating that phonemes (can) differentiate meanings and that morphemes have meanings over to stating what those meanings are (cf. § 6ff, 14).
35. A characteristic functionalist scheme, in contrast, might have levels such as ‘intonation’ or ‘prosody’, ‘lexicogrammar’, and ‘discourse’, which are the subject-matter of more recent or less established fields in linguistics (Table 2).
Intonation or prosody is both the sequence of uttered sounds corresponding to the abstract units (the phonemes) and the overall curve or ‘melody’ of pitch, tone, and volume of the sounds. The ‘lexicogrammar’ includes not just the morphemes and the phrase structures, but their cognitive grounding in the community’s system of world-knowledge about how processes and their participants are organized, e.g. whether an Action (e.g. ‘accept a settlement’) has a Human Agent (cf. § 19, 67) (cf. Longacre 1976; Halliday 1985). And ‘discourse’ is the total communicative event, including gestures, facial expressions, emotional displays, and so on. These levels are interrelated not through size and constituency, but through mutually determining functions, witness the intonation curves that are typical for certain discourse domains (e.g. political speeches). We can turn here to the influential idea of František Daneš (1964) that one level be regarded as the means which serve the ends of the other levels.
36. This idea can be insightfully applied also to the more familiar scheme of descriptive ‘levels’ in order to characterize their relations to each other and to meaning. As shown in Table 3 (up to down axis on the left side),
the levels whose units are typically but not obligatorily smaller are the means for the ends of the levels whose units are typically but not obligatorily larger. Moreover, each level as a whole is the means for the end of the meanings on that level (left to right axis). And finally, the meanings on the levels whose units are typically but not obligatorily smaller are the means for the ends of the meanings on the levels whose units are typically but not obligatorily larger (up to down axis on the right side).
37. This formulation seems well-suited to the precepts of pioneering functionalists. We can recall here Firth’s pronouncement that ‘descriptive’ or ‘structural linguistics’ should ‘deal with meaning throughout the whole range of the discipline’ and ‘at all levels of analysis’ (1968: 50, 160). We can also recall Pike’s warning that ‘the sharp-cut segmentation of meanings’ is ‘in principle impossible’: ‘meaning has its locus not in the individual bits and pieces’, but ‘within the language structure’ in an ‘identified context’ (1967: 609, 134). There, ‘the meaning of one unit in part constitutes’ and ‘is constituted of the meaning of a neighbouring unit’ (1967: 609). So ‘meaning’ is a ‘contrastive component of the entire complex’ and ‘occurs only as a function of a total behavioural event in a total social matrix’ (1967: 148f, 609). Pike’s view might help resolve such ‘difficulties’ as arise when ‘morphemes’ seem ‘lexically meaningless’ or ‘lack’ an ‘unchanging core of meaning’ (1967: 184, 186, 598f; cf. Bazell 1949; Bolinger 1950; Hockett 1947; Nida 1948, 1951). And we can treat ‘semantic variants’ in terms of how they are ‘conditioned by the universe of discourse’ (1967: 599).
38. In the functionalist scheme, relations or ratios of size and constituency are not decisive, because a means relates to its end first and foremost in terms of its function, purpose, or motivation and only secondarily and at times arbitrarily through its form, shape, or dimensions. The co-presence of several ‘levels’ follows simply from the requirement that so complex a system as language must avail itself of several types of items, each type specialized for some functions more than for others. Each type helps to render it probable (albeit not totally certain) that the active version of the language system will support the stretches of discourse that participants actually process, whose length and complexity are decided on line by ‘packaging and scheduling strategies’ (see § 44 below) rather than defined a priori by the units of formal linguistic analysis. For the wherewithal of spoken sounds to be sufficiently distinctive to be reliably produced and received, the phonemes supply targets around which the variations of actual uttered and heard speech are clustered while current contextual constraints ensure that mistakes or miscues happen fairly seldom and endanger communication even more seldom. To enable distinctions among the differing functions of the same word-base (e.g. the stem of a Verb), a language is highly likely (though not forced) to work with means whose formal signals consist of modifications or expansions of the base; so the morphemes get organized into modest ‘frozen islands’ whose borders are stable enough that many cases can be handled with compact resources, e.g. the Arabic ‘broken’ or ‘internal’ plural that modifies the form versus the ‘sound’ or ‘external’ plural that adds an ending (like ‘-iin’ to the masculine and ‘-aat’ to the feminine in Spoken Iraqi Arabic); even special cases are then readily handled, e.g. for assigning plurals to English words that get borrowed into Arabic, some with the internal plural like ‘film - aflaam’ (film/s) and some with the suffixed plural like ‘tilifizyoon - tilifizyoonat’ (television/s), depending on whether the word happens to resemble native words; even nonce-borrowings follow, as observed in Arab code-switching, e.g. ‘daktoor - dakaatra’ (doctor/s) versus ‘muudeel - muudeelaat’ (model/s) (Sallo 1994). Finally, the language needs standing word-base units to carry the brunt of distinct combinable meanings; hence the lexemes for a large open category whose sub-categories (the ‘parts of speech’) may be indicated by morphemic systems or by linear position or by both, whence the dual imperative for syntax. The meaning of the utterance is not registered separately on any of the levels but is the operational result of the strategies which draw upon these resources as suits the current context. So such questions as how much of the formal arranging of sentences is done by the syntax or by the semantics (§ 20) are unanswerable in principle, because meaning is never absent from any ‘level’ or ‘component’.
39. The notions of ‘frozen’ and ‘flux’ can help capture the central functionalist notions of ‘unmarked’ versus ‘marked’, which have often been interpreted merely in terms of higher versus lower frequencies. The standing frozen islands tend to coordinate the most unmarked options, e.g. the Active versus Passive Clause formats of English. The more marked the options, the more they would tend to involve express momentary ‘freezing’. The effort of producing and receiving them would depend on this factor rather than on frequencies of occurrence, which are unduly abstract and computationally unrewarding or in many cases totally unworkable. In a Shakespeare passage like this:
[5] But when the planets
In evil mixture to disorder wander [...]
Fights, changes, horrors
Divert and crack, rend and deracinate
The unity and married calm of states
Quite from their fixure! (Troilus and Cressida I, iii, 94-101)
the combinations are strikingly marked, e.g. the ‘married calm of states’; yet we can comprehend the meaning (i.e. that ‘disorder’ follows when ‘the speciality of rule hath been neglected’, as Ulysses says) and appreciate the imagistic effects by performing a similar freezing in our own current versions of the English language, which may require some literary training.
40. Viewed this way, degrees of ‘markedness’ become the functional successor to formal ‘degrees of grammaticalness’. When an utterance is consensually deemed by native speakers to instantiate a ‘grammatical sentence’ of the language, it is the output channeled predominantly, though (aside from standing clichés like ‘no man is an island’) not exclusively, from frozen islands and their immediate vicinity (§ 46). So we do not have a clean contrast between yes-or-no or between ‘grammatical’ versus ‘ungrammatical’ unless we set about to create deliberate ‘non-sentences’, an act which necessarily drives a wedge between our analysis and the empirical realities of language wherein ‘non-data’ are seldom produced on purpose (§ 4).
41. The functionalist project advocated above does not reject formalism at large but rather its claims to be the exclusive source and statement of categories, criteria, and constraints. Our leading criteria cannot be formality and rigour as ends in themselves, but empirically and computationally supportable descriptions of how a language as a complex system can be designed to operate and evolve as rapidly and effectively as it evidently does. Formality and rigour will not be rejected but shifted to a new position. The results of formalism would be ‘bracketed’ and situated in a deeper and wider perspective, such that the patterns and regularities uncovered so far are viewed not as parts of a final description or explanation of language but as data which still require a functional description or explanation.
42. A promising pathway for research might be to seek formal and rigorous accounts of the ‘requirements for evolvability in complex systems’, such as ‘self-organization’ and ‘selection’ (e.g. Kauffmann 1990a, b). Such accounts are now available across a range of sciences, including mathematics, physics (especially condensed matter physics), astronomy, chemistry, biology, immunology, psychology, economics, computer science, engineering, and robotics (e.g. Anderson et al. [eds.] 1988; Langton [ed.] 1988; Perelson [ed.] 1988; Jen [ed.] 1989; Stein [ed.] 1989; Langton et al. [eds.] 1992; Zurek [ed.] 1990) and suggest significant principles for the new foundations of a science of text and discourse as well (Beaugrande, in preparation). We might thereby explicitly resituate linguistics among the other sciences after a long tradition of either fending off presumed encroachments (as in Saussure’s claims cited in § 2) or making sporadic or sketchy borrowings, e.g. comparing a ‘grammar’ that ‘generates all grammatically “possible” utterances’ with a ‘chemical theory’ that ‘generates all physically possible compounds’ (Chomsky 1957: 48).
43. As an evolving complex system, language would operate not directly with standing ‘rules’ but with powerful packaging and scheduling strategies that select some ‘rules’ from a standing repertory (e.g. that the English Article precedes the Noun) and generate other ‘rules’ on the spot (e.g. that ‘recycling’ is done to commonplace plentiful objects like paper and cans rather than to uncommon objects like ‘skyrockets’), and apply the rules in some workable order, sometimes in sequence and sometimes in parallel. These strategies can freely derive constraints from reality (e.g. that winds are unprofitable to chase) and from society (e.g. that uttering grammatically incomplete sentences can hardly be a prison offense) (examples from § 21).
44. The most powerful constraints would therefore apply not directly to the sentence as a sequence of syntactic categories, as formalist linguists have consistently assumed, but rather to the design processes which are tuning the ‘current version’ of the language and generating those constraints needed for the ongoing communicative context. Formalist linguists have, as it were, been looking too far ‘downstream’ for ‘shallow’ constraints on the sentence itself; but these cannot reveal the working of the system until we uncover the ‘deeper’ constraints ‘upstream’ that are charged with specifying constraints at varying degrees of ‘shallowness’, including those addressed by formalism as well as those above or below them.
45. It would follow that the language system, or a native speaker’s ‘competence’ in it, cannot consist of a complete set of standing formal ‘rules’ that apply to the sentence (cf. § 10, 30). Instead, it consists of a complex of constraints shading outward from a modest ‘inner set’ of general standing rules (more or less ‘frozen islands’) likely to apply in most of the currently active versions of the language, toward ‘outer zones’ (‘in flux’) wherein more specific and transitory ‘rules’ are set up to sustain the one currently active version by means of operations for search, activation, and regulation of linkages among items in patterns. The ‘rules’ about which linguists do attain consensus would come from that inner set, while the ‘rules’ which remain in dispute would come from the outer zones (cf. § 40). So what we might take to be an abstract or formal linguistic ‘rule’ describing a formal ‘sentence structure’ would actually be a commonplace selection or output of operations that fluctuate to suit the motivations and organizational demands of the context. The notion that such motivations can be parcelled off to ‘components’ like ‘syntax’, ‘semantics’ and ‘pragmatics’ clouds our understanding of the empirical fact that the motivations are products of continual interactions. Placed in abstraction and isolation, those ‘components’ have no organization amenable to complete or definitive empirical discovery and description, much less to a definitive formalization. Only a fully developed functional framework can tell us which sections of language can be formalized and to what degrees.
46. The precept that actual communication runs on the currently activated system offers an opportunity to reformulate the whole issue of meaning in terms of which meanings might be activated at a given moment. One empirical strategy has been developed in recent research on ‘priming’ (e.g. Kintsch 1988, 1989). A probe item is held to be primed — its level of activation is raised above the inactive state — if people consistently recognize and respond to it more rapidly than otherwise, e.g., by pressing a key to signal that it either is or is not an English word (Kintsch 1989: 197). During text reception such as reading, the initial association among a word and its possible meanings was surprisingly found to be not merely non-determinate but non-selective! So when people are reading a given word in a text, both its relevant and its non-relevant meanings are initially ‘primed’ and activated; but after a short time, the non-relevant ones are deactivated while the relevant ones raise their activation and ‘spread’ it to farther associates. Suppose you are a speaker of English reading a text, on a moving computer display, containing the passage:
[6] The townspeople were amazed to find that all the buildings had collapsed except the mint
The text suddenly halts at ‘mint’, and the display gives you a target item to decide if it’s a real word. For a brief interval up to roughly half a second, your response will probably show priming for both the relevant target ‘money’ and the non-relevant ‘candy’, but not for the inferrable ‘earthquake’ (what caused ‘the buildings’ to ‘collapse’). Thereafter, the non-relevant item loses its priming while the relevant and the inferential items gain. Evidently, the constraints of context exert their control during this interval and regulate the strength whereby any one word or meaning is associated with the current ‘control centres’ of the topic.
47. The importance of this finding can hardly be overestimated. The resulting ‘construction-integration model’ is a distinguished case of a major theoretical revision driven directly by empirical data — a rare event in the study of language by linguistics and language philosophy. We find concrete evidence that the meaning of a discourse is not just constructed on the spot, but constructed with extremely cheap ‘rules’ — in fact, ‘rules’ may not be the proper term at all. The processing of the discourse at the receptive end first activates the ‘nodes’ within the knowledge network that are stored for each word (or word-part) being recognized. This activation automatically spreads to the meaning-nodes in the same network. The now active network (suggested by the right-hand graphic of Fig. 2 in § 26) runs through several cycles whereby the strengths of the connections are adjusted, some being raised and others lowered; and which adjustments occur is evidently determined by the constraints of the context. Here, linguistics and semantics would frame the leading question: what sort of rules could possibly be skilled and rapid enough to do the job? And how could they be called up and applied if, at (or near) the split second when they are needed, the processor has not resolved ambiguous word senses?
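The two phases just described can be illustrated with a deliberately small sketch for the ‘mint’ passage [6]. Everything in it is an assumption made for illustration: the six nodes, the link strengths, the starting activations, and the update rule (weighted summing followed by rescaling) are not Kintsch’s published parameters, only a toy in the same spirit. Construction activates both senses of ‘mint’ indiscriminately; integration then lets the context settle which associates keep their activation.

```python
# Toy construction-integration sketch (all values are illustrative assumptions).
NODES = ["mint", "money", "candy", "buildings", "collapsed", "earthquake"]

# Symmetric link strengths; the negative link makes the two senses rivals.
LINKS = {
    ("mint", "money"): 0.5,
    ("mint", "candy"): 0.5,
    ("money", "candy"): -0.5,
    ("mint", "buildings"): 0.5,
    ("buildings", "money"): 0.5,
    ("buildings", "collapsed"): 0.5,
    ("collapsed", "earthquake"): 0.5,
}


def weight(a, b):
    """Strength of the link between two nodes (1.0 for a node with itself)."""
    if a == b:
        return 1.0
    return LINKS.get((a, b), LINKS.get((b, a), 0.0))


def construct():
    """Construction: words of the text are fully active; both senses of
    'mint' are primed equally, regardless of relevance; inferences are not."""
    activation = {node: 0.0 for node in NODES}
    for word in ("mint", "buildings", "collapsed"):
        activation[word] = 1.0
    for sense in ("money", "candy"):
        activation[sense] = 0.5
    return activation


def integrate(activation, cycles=20):
    """Integration: spread activation along the links for a fixed number of
    cycles, clipping negative values and rescaling so the maximum is 1."""
    for _ in range(cycles):
        spread = {
            node: max(0.0, sum(weight(node, other) * activation[other]
                               for other in NODES))
            for node in NODES
        }
        top = max(spread.values())
        activation = {node: value / top for node, value in spread.items()}
    return activation


initial = construct()
final = integrate(initial)
for node in ("money", "candy", "earthquake"):
    print(f"{node:10s} before {initial[node]:.2f}  after {final[node]:.2f}")
```

In this toy network the irrelevant associate ‘candy’ loses activation while ‘money’ and the inferred ‘earthquake’ gain it, although no rule ever states which sense of ‘mint’ is the right one. The outcome depends entirely on the assumed links, so the sketch shows the mechanism and proves nothing about the data.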
48. The answers may lie in a striking parallel that has come to the fore in ‘complexity theory’, relating again to the ‘requirements for evolvability in complex systems’ cited in § 42, namely the concepts of ‘self-organization’ and ‘increasing returns’ under the folksy motto: ‘them that has, gets’ (Waldrop 1992: 17; see Beaugrande, in preparation, for details and sources). The most rudimentary requirements for ‘self-organizing processes’ have been studied in research on the ‘cellular automaton’, a self-operating mechanism embodying a ‘programmable universe’ wherein time is ‘ticked off’ by a ‘cosmic clock’ and space is filled with an arrangement of discrete ‘cells’, each of which can be in only one of a fixed repertory of states, say, either living or dead (compare Burks 1970). With each tick of the clock, each cell makes a transition to a new state determined by its own current state and the current state of its neighbors. The ‘laws’ of such a universe can be encoded in a ‘transition table’ stating the ‘rules’ for changing from any current state to a possible consequent state. A cellular automaton can be simulated with current computer technology, e.g. as a program for generating patterns of dots on a screen according to rules specified by the programmer (see Wolfram 1984; Wolfram [ed.] 1986). The simulations uncovered a surprising regularity conforming to only four classes of ‘rules’ (Table 4), whose names I have reformulated somewhat (compare Waldrop 1990: 225ff).
Class 1 are ‘doomsday rules’: no matter what random pattern of living or dead cells you start with, they all get rapid death within a few time-steps, and the grid on the computer screen goes completely uniform. Stated within the theory of dynamical systems, these rules have a single ‘point attractor’, like a marble rolling around in a basin: wherever it started it would soon roll down and stop in the centre. Class 2 are ‘stagnation rules’ whereby the initial pattern soon congeals into stable blobs that sit there in a lethargy of faint, regular oscillations. In dynamical systems, these rules have a set of ‘periodic attractors’, like a pattern of hollows in a bumpy bowl, in each of which the marble could keep rolling gently but indefinitely. Class 3 are ‘chaotic rules’ that produce an excess of activity, and the grid on the screen appears to be boiling with a ‘chaos’ (in an ordinary sense) of structures so unpredictable and unstable that they break up almost as soon as they form. In dynamical systems, these rules have a set of ‘strange attractors’, like a marble rolling around in a bowl so fast and furiously that it can never settle down. Finally, Class 4 are ‘self-organizing rules’ that produce an ‘order’ of structures which multiply, grow, split, and recombine in coherent patterns but don’t ever fully settle down. These rules, which have no correlated ‘attractor’ in the theory of dynamical systems, seem the most similar to the basic principles that could construct life-systems and their processes and in fact generate patterns quite reminiscent, say, of the growth of ferns.
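A minimal sketch may make the notions of ‘cell’, ‘tick’, and ‘transition table’ concrete. It assumes the simplest textbook case, a one-dimensional row of two-state cells in which each cell looks only at itself and its two immediate neighbors, with the row wrapped into a ring. The function names are hypothetical, and the rule numbers follow Wolfram’s numbering convention. The choice of rules 0, 4, 30, and 110 as samples of the four classes follows common attributions in the literature on such automata and is not taken from the simulations cited above.

```python
def make_table(rule_number):
    """Transition table for a two-state, three-cell neighborhood.
    Maps (left, self, right) to the next state, using Wolfram's numbering:
    bit k of rule_number gives the outcome for the neighborhood whose
    three cells, read as a binary number, equal k."""
    return {
        (left, centre, right): (rule_number >> (4 * left + 2 * centre + right)) & 1
        for left in (0, 1) for centre in (0, 1) for right in (0, 1)
    }


def step(cells, table):
    """One tick of the clock: every cell is updated from the old row at once.
    The row is treated as a ring, so the two ends are neighbors."""
    n = len(cells)
    return [table[(cells[i - 1], cells[i], cells[(i + 1) % n])] for i in range(n)]


def run(rule_number, cells, ticks):
    """Return the list of rows from the starting row through `ticks` updates."""
    table = make_table(rule_number)
    history = [list(cells)]
    for _ in range(ticks):
        cells = step(cells, table)
        history.append(cells)
    return history


def show(history):
    """Render living cells as '#' and dead cells as '.'."""
    return "\n".join("".join("#" if c else "." for c in row) for row in history)


start = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1]
for rule in (0, 4, 30, 110):
    print(f"rule {rule}")
    print(show(run(rule, start, 8)))
```

Under rule 0 every cell is dead after the first tick, and under rule 4 the row freezes after the first tick into isolated living cells. The patterns for rules 30 and 110 are better judged by eye on a wider row and a longer run than this short demonstration allows.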
49. Programmers kept putting in ‘rules’ and sorting them into one of these four ‘classes’ just by watching the results, hoping that the classes could be reliably distinguished by some definable property. And, surprisingly, one such property was found in the straightforward ‘survival probability’, i.e. the likelihood that any given cell would be alive in the next ‘generation’ ticked off by the ‘clock’ (shown in Table 4). A probability near 0 goes with ‘doomsday rules’, and everything dies off almost at once. A somewhat higher probability goes with ‘stagnation rules’, and things survive but in stasis. A 50-50 probability goes with ‘chaotic rules’, and each cell switches constantly from life to death and back, so that nothing can stay organized. A ‘critical threshold’ around 27.3% turns out to go with ‘self-organizing rules’, where life-like structures arise spontaneously.
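One simple way to read such a probability off a transition table, offered here as an assumption and not as the procedure of the studies cited, is to count the share of table entries whose outcome is ‘alive’. The sketch does this for the smallest kind of table, with two states and three-cell neighborhoods, numbered in Wolfram’s convention; the function names are hypothetical. A table of this size has only eight entries, so the measure can take only multiples of 1/8 and cannot land on a threshold like the one cited above, which belongs to richer automata with more states or larger neighborhoods.

```python
def table_from_rule(rule_number):
    """Eight-entry transition table, (left, self, right) -> next state."""
    return {
        (left, centre, right): (rule_number >> (4 * left + 2 * centre + right)) & 1
        for left in (0, 1) for centre in (0, 1) for right in (0, 1)
    }


def survival_probability(table):
    """Share of neighborhoods whose outcome is a living cell, i.e. the chance
    of 'alive' if every neighborhood were equally likely to occur."""
    return sum(table.values()) / len(table)


for rule in (0, 4, 30):
    print(rule, survival_probability(table_from_rule(rule)))
```

For these three sample tables the measure rises from 0 through 1/8 to 1/2, which at least matches the ordering described above, from rules where everything dies to rules where nothing stays organized.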
50. The findings in priming during the reception of discourse strongly suggest that there too, some mode of ‘self-organization’ must be at work, and that its key feature is again the regulation of critical values, as has in fact been simulated on computers by Kintsch’s group. The nodes whose mutual linkage is near these values will become ‘attractors’ for their surrounding sectors in the knowledge network and thence the ‘control centres’ for building up the array of knowledge that corresponds to the ‘meaning’ of the discourse as the construct of the receiver (here the reader), and not as the output of ‘shallow rules’ called up to map out specific ‘phrase structures’, ‘transform’ them into others, and to ‘interpret’ the result by pasting together the meanings of the constituent formal pieces. The simulations by the Kintsch group in Colorado and the group around David Rumelhart and James McClelland in California indicate that an associative network can support a coherent array of text meaning (Kintsch’s ‘textbase’) by adjusting strengths of linkage in a ‘connectionist’ manner (cf. Rumelhart & McClelland 1986). Here, ‘concepts are defined in a knowledge net by meaning constructed from their position in the net; immediate associates and semantic neighbors of a node constitute its core meaning’, whereas ‘its complete and full meaning’ could be obtained only by ‘exploring its relations to all the other nodes in the net’ (Kintsch 1988: 164). If so, the attempts of classical semantics to expound the exact meanings of words necessarily branch out indefinitely, whence the conspicuous lack of convergence and consensus noted in § 11f.
51. I would see a confirmation here for my own long-standing conjecture (e.g. Beaugrande 1987, written in 1985) that language processing entails a significant margin of ‘non-determinacy’ that has not been adequately reflected in linguistic theory but is vital for managing language complexity and fluctuation, especially within the subsystem of ‘semantics’. Against the deterministic research ‘tradition’ of ‘modelling knowledge use in comprehension by designing powerful rules to ensure that the right elements are generated in the right context’, Kintsch and his group have shown us how much can be accounted for by a ‘weaker production system’ whose ‘rules’ are ‘just powerful enough that the right element is likely to be among those generated’ along with ‘irrelevant or inappropriate’ ones (Kintsch 1988: 163f). Such a system ‘can operate in many contexts’, as befits discourse ‘environments characterized by almost infinite variability’ (ibid.). So ‘a computational model of text comprehension’ as ‘the construction of a mental representation of a text with simple, though rough and crude rules’ being ‘used promiscuously’, followed by ‘a wholistic integration phase’ that produces ‘a coherent picture’, would seem to be ‘psychologically more plausible and computationally more flexible’ than the ‘precise rules’ that classical semantics has envisioned (Kintsch 1992: 263).
52. Computers have also made a significant advance in a different direction but again indicating that relatively few constraints (universal ‘frozen islands’) apply all across the language as an abstract system. The majority apply rather to discourse domains or contexts, some sparser, some richer. These contexts, which have largely remained implicit in ostensibly formal analysis (§ 4), can now be systematically described through huge computerized corpuses of real language data, such as the ‘Bank of English’ at Birmingham University, which, as of January 1994, contains ‘several hundred millions of words of running text’, with an operational sample corpus of 167 million words of text from 797 British and American books; newspapers (Times, Independent, Guardian, Today, Wall Street Journal, New Scientist, Economist); magazines; radio broadcasts (BBC and NPR); and recordings of conversations4 (cf. Baker et al. [eds.] 1993). Such data banks can reveal regularities that simply aren’t evident either from modest samples or from introspection of native speakers (Sinclair 1992a, b). The question of how general a given regularity might be is no longer a matter of intuition subtly biased by a vested interest in situating things on the highest plane (§ 5, 17, 33). Instead, it is a matter to be verified by looking at sets of contexts in which key words appear more often or less often, and at the phrasings which frequently link certain word-types.
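The basic operation behind such verification, counting which words turn up in the neighborhood of a key word, can be sketched in a few lines. The four sentences below are invented for the illustration and come from no corpus. The function name and the window size are likewise assumptions, and a real corpus tool would add tokenizing, lemmatizing, and statistical weighting that are left out here.

```python
from collections import Counter


def collocates(sentences, key, window=1):
    """Count the words found within `window` positions of each occurrence
    of `key`; sentences are split on whitespace after lowercasing."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == key:
                before = tokens[max(0, i - window):i]
                after = tokens[i + 1:i + 1 + window]
                counts.update(before + after)
    return counts


# Invented toy data, standing in for a corpus of running text.
toy_corpus = [
    "she drank strong tea at breakfast",
    "he prefers strong tea with milk",
    "a strong wind blew all night",
    "they bought a powerful engine",
]

for word, count in collocates(toy_corpus, "strong").most_common():
    print(word, count)
```

Even on four sentences the count singles out ‘tea’ as the recurrent neighbor of ‘strong’. On a corpus of many millions of words the same tally, suitably refined, is what brings to light the regularities that introspection misses.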
53. An interesting case in point is the Verb ‘build up’. If used in the Active as a Productive Process with a Human Agent as Subject and with a Target, the corpus collocations show an ameliorative attitude (e.g. when ‘you build up an organisation’); used in the Medial with a non-human Subject as Developmental Process and no Target, the collocations show a pejorative attitude (e.g. when ‘cholesterol builds up in the body’) (Louw 1993: 171).4 In formalist linguistics, such a factor would probably be set aside as ‘subjective’, ‘vague’, or simply ‘extralinguistic’.
54. Admittedly, the display of data by no means eliminates the need for careful interpretation by the investigator nor transfers it over onto the computer. The assignment of attitudes just mentioned is still a subjective decision based on our world knowledge about whether things involved with ‘building u
