The specialization of function: Cognitive and neural perspectives on modularity

Modular processes in mind and brain

Pages 156-208 | Published online: 20 Dec 2011
 

Abstract

One approach to understanding a complex process starts with an attempt to divide it into modules, subprocesses that are independent in some sense, and have distinct functions. In this paper, I discuss an approach to the modular decomposition of neural and mental processes. Several examples of process decomposition are presented, together with discussion of inferential requirements. Two examples are of well-established and purely behavioural realizations of the approach (signal detection theory applied to discrimination data; the method of additive factors applied to reaction-time data), and lead to the identification of mental modules. Other examples, leading to the identification of modular neural processes, use brain measures, including the fMRI signal, the latencies of electrophysiological events, and their amplitudes. Some measures are pure (reflecting just one process), while others are composite. Two of the examples reveal mental and neural modules that correspond. Attempts to associate brain regions with behaviourally defined processing modules that use a brain manipulation (transcranial magnetic stimulation, TMS) are promising but incomplete. I show why the process-decomposition approach discussed here, in which the criterion for modularity is separate modifiability, is superior for modular decomposition to the more frequently used task comparison procedure (often used in cognitive neuropsychology) and to its associated subtraction method. To demonstrate the limitations of task comparison, I describe the erroneous conclusion to which it has led about sleep deprivation, and the interpretive difficulties in a TMS study.

Acknowledgments

For providing unpublished details of their data I thank Brent Alsop, Marinella Cappelletti, Stanislas Dehaene, Russell Epstein, Silke Goebel, John Kounios, Lotfi Merabet, Allen Osman, Alvaro Pascual-Leone, Philippe Pinel, Eric Schumacher, and Fren Smulders. For helpful discussions I thank Geoffrey Aguirre, David Brainard, Russell Epstein, Martha Farah, Joshua Gold, Roy Hamilton, Nancy Kanwisher, John Kounios, David Meyer, Jacob Nachmias, Allen Osman, and Seth Roberts. For helpful comments on the manuscript I thank Jessica Cantlon, Max Coltheart, Stanislas Dehaene, Martha Farah, Silke Goebel, Ronald Knoll, Brad Mahon, David Meyer, Allen Osman, Brenda Rapp, Eric Schumacher, Richard Schweickert, Fren Smulders, Sharon Thompson-Schill, Vincent Walsh, and two anonymous reviewers. For computer support I thank Vincent Hurtubise, Christopher Leary, and Roderick Smith.

Supplementary data (a table of features of the nineteen examples in this article and Sternberg, 2001) is published online alongside this article at http://dx.doi.org/10.1080/02643294.2011.557231

Notes

1A module may itself be composed of modules.

2Heuristic arguments for the modular organization of complex biological computations have been advanced by Simon (1962, 2005) and, in his “principle of modular design”, by Marr (1976), who argued (p. 485) that “Any large computation should be split up and implemented as a collection of small sub-parts that are as nearly independent of one another as the overall task allows. If a process is not designed in this way, a small change in one place will have consequences in many other places. This means that the process as a whole becomes extremely difficult to debug or to improve, whether by a human designer or in the course of natural evolution, because a small change to improve one part has to be accompanied by many simultaneous compensating changes elsewhere.”

3Machamer, Darden, and Craver (2000) distinguish “activities” and “entities”.

4This criterion for modularity seems to be far weaker than the set of module properties suggested by Fodor (1983), according to whom modules are typically innate, informationally encapsulated, domain specific, “hard-wired”, autonomous, and fast. However, domain specificity appears to imply separate modifiability.

5Such double dissociation of subprocesses should be distinguished (Sternberg, 2003) from the more familiar double dissociation of tasks (Schmidt & Vorberg, 2006), discussed in Section 9.

6Adapted from Sternberg (2001) by permission.

7When the hypotheses about A and B are sufficiently detailed to specify particular process-specific factors that should influence them selectively, this leads to an alternative formulation of the inferential logic, in which the specification of F and G is included in the joint hypothesis, with the remainder of the reasoning adjusted accordingly. For a discussion of such alternatives, see Sternberg (2001, Section A.2.3).

8A common error of interpretation is to assert the nonexistence of an effect or interaction merely because it fails to reach statistical significance. In evaluating a claim that an effect is null, it is crucial to have at least an index of precision (such as a confidence interval) for the size of the effect. One alternative is to apply an equivalence test that reverses the asymmetry of the standard significance test (Berger & Hsu, 1996; Rogers, Howard, & Vessey, 1993). In either case we need to specify a critical effect size (depending on what we know and the particular circumstances) such that it is reasonable to treat the observed effect as null if, with high probability, it is less than that critical size. The critical size might be determined by the sizes of effects generated by plausible models. Bayesian methods (e.g., Gallistel, 2009; Rouder, Speckman, Sun, Morey, & Iverson, 2009) provide another alternative, especially if the null is framed as an appropriate interval hypothesis rather than a point hypothesis. An example of suitable caution about inferring a null effect can be found in Ghorashi, Enns, Klein, and Di Lollo (2010).
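To make the logic concrete, here is a minimal sketch of such an equivalence test in Python (a sketch only; the 10-ms criterion, the sample size, and the simulated per-subject effects are hypothetical, and only numpy and scipy are used): two one-sided t-tests against a pre-specified critical effect size, reported together with the observed mean and a confidence half-width.

    # Minimal TOST-style equivalence test for a per-subject effect, assuming
    # the critical effect size `delta` (in ms) was chosen in advance.
    import numpy as np
    from scipy import stats

    def tost_one_sample(effects, delta):
        """Two one-sided t-tests: is the mean effect inside (-delta, +delta)?"""
        effects = np.asarray(effects, dtype=float)
        n = effects.size
        mean = effects.mean()
        se = effects.std(ddof=1) / np.sqrt(n)
        # H0: mean <= -delta  vs  H1: mean > -delta
        p_lower = 1.0 - stats.t.cdf((mean + delta) / se, df=n - 1)
        # H0: mean >= +delta  vs  H1: mean < +delta
        p_upper = stats.t.cdf((mean - delta) / se, df=n - 1)
        ci_half = stats.t.ppf(0.975, n - 1) * se      # 95% CI half-width
        # Equivalence is declared only if BOTH one-sided tests reject.
        return max(p_lower, p_upper), mean, ci_half

    # Hypothetical per-subject interaction contrasts (ms), 10-ms criterion:
    rng = np.random.default_rng(0)
    p, m, ci = tost_one_sample(rng.normal(1.0, 8.0, size=16), delta=10.0)
    print(f"mean = {m:.1f} ms, 95% CI half-width = {ci:.1f} ms, TOST p = {p:.3f}")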

9Adapted from Sternberg (2001) by permission.

10Whereas properties () apply to observable quantities, the analogous properties () apply to contributions to a composite measure that are not directly observable.

11Matters may not be so simple for other combination rules, such as multiplication; see Section 8.2.

12The reasoning described above is sometimes erroneously expressed as “If we assume H3 then additivity confers support on H1 and H2.” This ignores the support that additivity also confers on H3.

13It is important to note that whereas factors that selectively influence serially arranged processes will have additive effects on mean RT, this is not the only possible basis for additive effects on an interesting measure, despite beliefs to the contrary (e.g., Poldrack, 2010, p. 148; Jennings, McIntosh, Kapur, Tulving, & Houle, 1997, p. 237). The critical requirement for additivity is combination by summation, whatever its basis.
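A toy numerical illustration of this point, with hypothetical contribution values: when the combination rule is summation, the 2 x 2 interaction contrast is exactly zero no matter what the summed contributions represent; with multiplication as the combination rule it generally is not.

    # If a composite measure is the SUM of two contributions, each influenced
    # by only one factor, the factor effects are additive; with multiplication
    # as the combination rule they generally are not. (Hypothetical values.)
    u = {1: 100.0, 2: 130.0}   # contribution influenced only by factor F
    v = {1:  80.0, 2: 110.0}   # contribution influenced only by factor G

    def interaction(combine):
        M = {(i, j): combine(u[i], v[j]) for i in (1, 2) for j in (1, 2)}
        # 2 x 2 interaction contrast: zero means perfectly additive effects
        return M[2, 2] - M[2, 1] - M[1, 2] + M[1, 1]

    print(interaction(lambda a, b: a + b))   # 0.0   -> additive
    print(interaction(lambda a, b: a * b))   # 900.0 -> interaction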

14As in some other brain measurements (e.g., PET, fMRI), the poor signal-to-noise ratio often means that averaging over trials is required for the measures to be interpretable. Here, the “noise” is due partly to neural events unrelated to the task being performed, whose contributions are reduced by combining subtraction of the pre-stimulus baseline level with an averaging process that reveals only those events that are consistently time-locked to the stimulus or the response.
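A minimal numpy sketch of the averaging just described, assuming the epochs have already been extracted time-locked to the stimulus (the array shapes, sampling details, and signal values are illustrative, not taken from the studies discussed here):

    import numpy as np

    def average_erp(epochs, baseline_samples):
        """epochs: array (n_trials, n_samples), time-locked to the stimulus;
        baseline_samples: number of samples preceding stimulus onset."""
        baseline = epochs[:, :baseline_samples].mean(axis=1, keepdims=True)
        corrected = epochs - baseline     # remove the per-trial baseline level
        return corrected.mean(axis=0)     # non-time-locked activity averages out

    # Hypothetical data: 200 trials, 300 samples, a small time-locked deflection
    rng = np.random.default_rng(1)
    signal = np.zeros(300)
    signal[120:160] = 2.0                                      # microvolts
    epochs = signal + rng.normal(0.0, 20.0, size=(200, 300))   # large "noise"
    erp = average_erp(epochs, baseline_samples=100)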

15The effect of on was ms; its effect on was ms; the difference between these effects is ms (; ).

16It is this requirement that would have made it difficult to implement a suitable factorial experiment.

17The effect of on was ms; its effect on was ms; the difference between these effects is ms (; ).

18 would then be a composite measure, influenced by both factors, with summation as the combination rule.

19When a mean is taken over values of a subscript, that subscript is replaced by a dot.

20The error variance values reported by Smulders et al. (1995) and the SE estimates provided here are likely to be overestimates (because balanced condition-order effects were treated as error variance); the data required to calculate better values are no longer available (F. T. Y. Smulders, personal communication, September, 1999).

21It is perhaps a confusion between time (process) and space (processor) that has led some commentators (e.g., Broadbent, 1984) to believe that a process whose modules are organized in stages cannot include feedback because it must be implemented by a “pipeline”: an ordered set of processors through which information passes in a fixed direction from input to output. Broadbent's “pipelines” constrain the relation between process and representation: later processes must operate on representations that have been processed more highly—that are “further from the input”. Stage models need not be constrained in this way; they merely partition processing operations into temporally successive components. There is no reason why a later stage cannot make use of new sensory information (such as feedback) in (re)processing earlier sensory information. For further discussion of Broadbent's (1984) critique of stage models, and the distinction among three kinds of stage (completion-controlled, outcome-contingent, and data-dependent), see Sternberg (1984).

22This way of deriving the forces their means into agreement: ; the question of interest is whether the differences among the four values agree.

23Given a constraint on the durations of different stages that is stronger than zero correlation but weaker than stochastic independence, the assumption of stages plus selective influence implies numerous properties of aspects of the RT distributions in addition to their means (Sternberg, 1969; Roberts & Sternberg, 1993), such as additive effects on var(RT). However, without this constraint, stages plus selective influence don't require effects on var(RT) to be additive.
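The following simulation sketch illustrates both claims under the stronger assumption of stochastically independent stage durations (the distributions and parameters are hypothetical): the factor effects on mean RT are additive, and with independent stage durations their effects on var(RT) are additive as well.

    # RT as the sum of two stage durations; factor F influences only stage A,
    # factor G only stage B (hypothetical gamma-distributed durations).
    import numpy as np
    rng = np.random.default_rng(2)

    def rt_sample(f_level, g_level, n=200_000):
        a = rng.gamma(shape=4.0, scale=50.0 + 20.0 * f_level, size=n)  # stage A
        b = rng.gamma(shape=3.0, scale=60.0 + 30.0 * g_level, size=n)  # stage B
        return a + b        # independent stage durations

    means = {(f, g): rt_sample(f, g).mean() for f in (0, 1) for g in (0, 1)}
    varis = {(f, g): rt_sample(f, g).var() for f in (0, 1) for g in (0, 1)}
    ic = lambda d: d[1, 1] - d[1, 0] - d[0, 1] + d[0, 0]
    print(ic(means))   # ~0: additive effects on mean RT
    print(ic(varis))   # ~0: additive effects on var(RT) under independence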

24The conclusion that and influence separately-modifiable sequential processes, or stages, is further strengthened by analyses of complete RT distributions (rather than just RT means) from similar experiments (Sternberg, 1969, Sec. V; Roberts & Sternberg, 1993, Exp. 2).

25If , for example, for each rewarded response there are four rewarded responses, encouraging a liberal (low) criterion for .

26Such tests require no assumptions about whether a change in factor level causes an increase or decrease in activation. This contrasts with the assumption, sometimes used to infer modular neural processors (Kanwisher, Downing, Epstein, & Kourtzi, 2001), that stimuli more prototypical of those for which a processor is specialized will produce greater activation.

27Without requiring it, this finding invites us to consider that there are two qualitatively different encoding processes , one for each notation, rather than “one” process whose settings depend on . This possibility is supported by the observation that “the notation factor affects the circuit where information is processed, not just the intensity of the activity within a fixed circuit” (S. Dehaene, personal communication, September 29, 2006). If so, we have a case where a change in the level of a factor (here, ) induces a task change (one operation replaced by another; see Section 12.1), but evidence for modularity emerges nonetheless: the proximity effect is invariant across the two tasks. Based on the idea that the processes implemented by different processors are probably different, the (multidimensional) activation data from such a simple (two-factor) experiment can support a claim of operations replacement. In contrast, an RT experiment that alone could support such a claim has yet to be devised.

28Using a dual-task experiment, Sigman and Dehaene (2005) have added to the evidence that distinguishes E from C: E could occur concurrently with all stages of the initial task, whereas C had to await completion of the “central” stage of the initial task.

29Extrastriate cortex was expected to respond to ; previous studies had implicated the remaining five regions (see ) in response selection.

30It is noteworthy and requires explanation that in each of the five cases where an effect is not statistically significant, it is nonetheless in the same direction as in those cases where the effect is significant. Is this because the neural populations that implement the S and R processes are incompletely localized, or because the measured regions don't correspond to the populations, or for some other reason?

31Schumacher and D'Esposito (2002) suggest that such a process might occur only under the stress of a subject's being in the scanner, and not under normal conditions. However, RT data from the practice session, outside the scanner, showed a non-significant interaction of about the same size and in the same direction. It is also of interest that a whole-brain analysis of the fMRI data did not reveal any additional task-sensitive regions (E. Schumacher, personal communication, November 27, 2006).

32There is an unresolved puzzle about these data that suggests that it would be valuable to replicate this experiment, using a procedure known to produce additive effects on . The large SEs associated with the very small mean interaction contrasts for the data shown in panels 7A and 7B reflect the fact that the variability of the interaction contrast over subjects is quite large—so large relative to the mean that the reported F-statistics in both cases were 0.00. Indeed, relative to the variability, the reported mean interaction contrasts were significantly () too small.
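For readers who want to carry out this kind of check on their own data, here is a sketch of how a per-subject 2 x 2 interaction contrast and its between-subject SE can be computed (the data array, effect sizes, and noise level are hypothetical):

    import numpy as np
    rng = np.random.default_rng(3)

    def interaction_contrast(cell_means):
        """cell_means: array (n_subjects, 2, 2) of per-subject condition means,
        indexed as [subject, F-level, G-level]."""
        c = (cell_means[:, 1, 1] - cell_means[:, 1, 0]
             - cell_means[:, 0, 1] + cell_means[:, 0, 0])
        return c.mean(), c.std(ddof=1) / np.sqrt(c.size)

    # Hypothetical per-subject means: additive factor effects plus subject noise
    f_eff, g_eff = 40.0, 25.0
    additive_cells = np.array([[0.0, g_eff], [f_eff, f_eff + g_eff]])
    cells = 400.0 + additive_cells + rng.normal(0.0, 30.0, size=(12, 2, 2))
    mean_ic, se_ic = interaction_contrast(cells)
    print(f"interaction contrast = {mean_ic:.1f} +/- {se_ic:.1f} (SE)")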

33The relation between fMRI adaptation and neuronal activity is controversial; see, e.g., Sawamura, Orban, and Vogels (2006), and Grill-Spector (2006).

34Suppose that the fMRI analysis leads to the conclusion that the two factors influence separate modules within the PPA. This would not preclude their having interactive effects on the mean RT. This could happen, e.g., if the processes that contribute to the RT include one or more processes, other than the one(s) implemented by the PPA, that are influenced by both factors. Or it could happen if the processes that contribute to the RT are selectively influenced by the two factors, but are arranged in parallel rather than as stages. On the other hand, suppose that the fMRI analysis leads to the conclusion that there is a single module influenced by the two factors. Then, unless the PPA does not play a role in generating the response, additivity of the RT effects of the two factors would be unexpected.

35In the present analysis of the effects of place change, the data for same place were collapsed over levels of the view factor. This analysis differs from that of Epstein et al. (2008).

36These are means of the BOLD signal strengths over the PPA regions in the two hemispheres.

37In a whole-brain analysis, reliable long- and short-interval effects were found in many other brain regions. There was no persuasive evidence that any region had just one of these effects. Furthermore, the number of regions in which the interaction of the two effects was significant (two among 21 tests) can be explained as the result of type I error. Thus, no regions were found that provided pure measures of either effect, and additional evidence was found of additivity of the two effects on the fMRI signal.

38In the popular diffusion model (e.g., Ratcliff & Smith, 2010), the most natural way in which a factor has its effect is by changing the rate of evidence accumulation. Additive effects on this rate produce effects on mean RT that are overadditive.
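One simple way to see the overadditivity, under the idealization that mean decision time equals boundary divided by accumulation rate (a single-boundary simplification, not the full diffusion model; all numbers hypothetical):

    # Mean RT ~ residual time + boundary / rate; two factors that additively
    # reduce the rate then have an overadditive (positive) interaction on RT.
    a, base_rate, ter = 100.0, 1.0, 300.0   # boundary, baseline rate, residual (ms)

    def mean_rt(df, dg):
        return ter + a / (base_rate - df - dg)   # each factor lowers the rate

    rt = {(f, g): mean_rt(0.2 * f, 0.2 * g) for f in (0, 1) for g in (0, 1)}
    ic = rt[1, 1] - rt[1, 0] - rt[0, 1] + rt[0, 0]
    print(round(ic, 1))   # about +16.7 ms: overadditive interaction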

39These constants would depend, for example, on the mean proximities of the two populations to the centre of the brain region in which the fMRI signal is measured, and on the time relation between the two activations.

40If the duration of either process is changed sufficiently by the change in factor level so that the temporal distribution of neural activation in the region is altered, additivity could fail. However, the findings in Ex. 6.4 are perhaps reassuring: Despite the fact that the data indicated substantial effects on the durations of both processes of interest, the additivity of the fMRI effects in both of the regions influenced by both factors was remarkably good. Apparently, even these duration changes are small relative to the sluggishness of B.

41This argument is stronger for Ex. 6.4, where pure measures of each factor were found in three regions: This makes it more likely that the finding of effects of both factors in two other regions should be explained by those regions containing two populations of neurons, each responsive to one factor.

42For two perceptually separable (integral) dimensions, variation in one does not (does) interfere with making decisions based on the other, and perceptual distances obey a city-block (Euclidean) metric. See also, e.g., Ashby and Maddox (1994).
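For concreteness, the two metrics mentioned here differ as follows (illustrative coordinates on two perceptual dimensions):

    # City-block vs Euclidean distance between two stimuli represented on two
    # perceptual dimensions (illustrative coordinates).
    def city_block(p, q):
        return sum(abs(a - b) for a, b in zip(p, q))

    def euclidean(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    s1, s2 = (1.0, 4.0), (3.0, 1.0)
    print(city_block(s1, s2))   # 5.0   (separable dimensions)
    print(euclidean(s1, s2))    # ~3.61 (integral dimensions)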

43While viewing each sequence of shapes, subjects reported on the position of a bisecting line.

44This is a weaker requirement than stochastic independence, but may nonetheless be important.

45Adapted from Table 4 of Sternberg (2001) by permission.

46See SM:13 and SM:A.13 for more details.

47Numerical experiments show that under some plausible conditions, Eq. 12 is well approximated even when the contributions u and v to are highly correlated.

48This follows from the electrical linearity of brain tissue (Nunez & Srinivasan, 2006, Ch. 1.5).

49Support for the theory is support for all of its three components. However, because the combination rule is given by physics in this application, there is no need to test component H3.

50In this application, modularity appears to change over time: During an earlier epoch (400 to 600 ms after probe onset) the two effects interacted substantially, while their topographies changed little from one epoch to the next. See SM:14 for more details.

51See SM:A.1.

52Subscripts d and r refer to the two tasks; subscripts s and o refer to the two stimulated brain regions. SEs are based on between-subject variability. Also supporting the claim of double dissociation, the differences and are significantly greater than zero, with and , respectively. However, because non-rTMS measurements were made only before rTMS, rather than being balanced over practice, straightforward interpretation of the slope values requires us to assume negligible effects of practice on those values.

53Tasks in which the number of iterations of the same process can be controlled, as in some search tasks, provide a special case of the subtraction method in which it is easier to validate the required assumptions. If the numbers of iterations in three variations of the same task are n1, n2, and n3, the test is the linearity of mean RT as a function of the number of iterations; the slope of the function is an estimate of the duration of the iterated process.
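A sketch of the linearity check just described, with hypothetical iteration counts and mean RTs: the fitted slope estimates the duration of the iterated process, and the deviation of the middle point from the line through the outer two points indexes any departure from linearity.

    import numpy as np

    n = np.array([1.0, 2.0, 4.0])           # numbers of iterations
    rt = np.array([480.0, 518.0, 601.0])    # mean RT (ms) at each

    slope, intercept = np.polyfit(n, rt, 1)
    # Departure from linearity: compare the middle point with linear
    # interpolation between the two outer points.
    predicted_mid = rt[0] + (rt[2] - rt[0]) * (n[1] - n[0]) / (n[2] - n[0])
    deviation = rt[1] - predicted_mid
    print(f"slope = {slope:.1f} ms/iteration, deviation = {deviation:.1f} ms")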

54Tests of the invariance of the response process across tasks are provided by Ulrich, Mattes, and Miller (1999).

55The test would require finding factors F and G such that in a factorial experiment using Task 1 there would be an effect of F but not of G, and in a factorial experiment using Task 2 the effect of F would be equal to its effect in Task 1, and be additive with the effect of G. The inclusion of Task 1 in such a study could add to what was learned from Task 2: it would test the pure insertion assumption as well as permitting estimation of and , rather than just of the effects on these quantities. For example, if we define the Smulders et al. (1995) experiment discussed in Sections 3.2 and 4.1 as Task 2 (analogous to Donders' “b”), with target stimuli for the left and right hand, one could add a Task 1 (analogous to Donders' “c”) in which the subject would respond with a single keystroke with the right hand if the right-hand target appeared, and make no response otherwise. One test of pure insertion would be to determine whether the effect of on in Task 1 was equal to its effect on in Task 2.

56This use of two related pairs of tasks is similar to the “cognitive conjunction” method for brain activation experiments introduced by Price and Friston (1997), except that they appear to have proposed no analogous test.

57An alternative approach (sometimes called “parametric design”) is exemplified by variation of attentional load over six levels by Culham, Cavanagh, and Kanwisher (2001), the use of the same working-memory task with five retention intervals by Haxby, Ungerleider, Horwitz, Rapoport, and Grady (1995), and the use of a different working-memory task with four sizes of memory load by Braver, Cohen, Nystrom, Jonides, Smith, and Noll (1997).

58Electrophysiological evidence that confirms the selectivity of the effect of has been found by Humphrey, Kramer, and Stanny (1994).

59An important advantage of TMS over measures of brain activation (Section 6) in determining which brain regions are involved in implementing a process is that whereas activation of a region in conjunction with process occurrence does not mean that such activation is necessary for that process, interference with a process by stimulation of a region is better evidence that the region is necessary for the process to occur normally, just as is interference by a lesion in that region (Chatterjee, 2005). However, it needs to be kept in mind that the mechanism of TMS action is controversial (Harris, Clifford, & Miniussi, 2008; Johnson, Hamidi, & Postle, 2010; Miniussi, Ruzzoli, & Walsh, 2010; Siebner, Hartwigsen, Kassuba, & Rothwell, 2009).

60If the effect of is time specific, as is likely with single-pulse TMS or a burst of rTMS after the trial starts (“on-line” TMS), the interpretation of its interactions with other factors may not be straightforward. Thus, in the present example, suppose that it is region R that implements process B, and that B follows A. Because effects on the duration of A influence the starting time of B, and hence the time of TMS relative to B, a change in the level of F might modulate the effect of on B, and hence its effect on . The resulting interaction of with F would lead to the erroneous conclusion that region R is involved in the implementation of A. This argues for using rTMS before the task is performed (“off-line” TMS) in such studies, rather than using one or more TMS pulses during the task. When doing so, note that the cognitive aftereffects of TMS are short-lived and plausibly decline over time, which suggests that tests should be balanced in small blocks and that, to reduce error variance, trends over trials should be estimated and corrected.

61Which intercept is appropriate depends on details of the search process, and may differ for target-absent and target-present trials.

62O'Shea, Muggleton, Cowey, and Walsh (2006, p. 948) say that “a single set size was used because adding a set size doubles the number of trials … ”. This reason is valid if a specified level of precision is desired for the effect of for each level of , but not if the goal is to achieve a specified level of precision for the main effect of , unless variability accelerates with . (But in their data, both and decelerate with .) Thus, if one runs 50 TMS trials each with and , instead of 100 trials with (and likewise for the non-TMS control condition), the precision of the estimated main effect of (which depends on the means over levels) would be about the same, and, at minimal added cost one would also have an estimate of the effect of , and therefore an estimate of the extent to which TMS modulates that effect.
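The precision argument can be checked with a small simulation (the condition means, trial standard deviation, and trial counts below are hypothetical): the SE of the estimated TMS main effect is essentially the same whether 100 trials are run at one set size or 50 at each of two.

    import numpy as np
    rng = np.random.default_rng(4)
    sd, reps = 120.0, 20_000     # trial-to-trial RT SD (ms), simulation runs

    def tms_main_effect(cells):
        # cells: list of (mean_with_TMS, mean_without_TMS, n_trials_per_condition)
        with_tms = np.mean([rng.normal(m_t, sd, n).mean() for m_t, _, n in cells])
        without = np.mean([rng.normal(m_c, sd, n).mean() for _, m_c, n in cells])
        return with_tms - without

    single = [(660.0, 630.0, 100)]                     # 100 trials, one set size
    split = [(600.0, 580.0, 50), (720.0, 680.0, 50)]   # 50 trials at each of two
    se_single = np.std([tms_main_effect(single) for _ in range(reps)])
    se_split = np.std([tms_main_effect(split) for _ in range(reps)])
    print(round(se_single, 1), round(se_split, 1))     # ~17 ms in both cases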

63The first study used neither sham TMS nor TMS of a different brain region as the control condition. The authors, who reported only data from the target-present trials, in which the mean slope increased from 18.5 to 22.4 ms/item, claimed that they had found no effect of TMS on the slope, but they reported neither a test of the slope difference nor a confidence interval. In the second study, in which the control condition was TMS of a different brain region, RTs were shorter, slopes were smaller, and the effect of TMS on the slope was greater.

64Data were retrieved for 11 of the 12 subjects; one of these, a clear outlier, was omitted. For each of the two conditions there were only about 50 RTs per subject for trials with each of the two responses. Starting with robust locally fitted polynomial regression (loess), a monotonic function was fitted to each of the four sets of observations for each subject. No adjustment was made for any decline over time in the aftereffect of rTMS.

65Future studies should concentrate observations on the closer proximities, where the -effect is greater. Also, because there is considerable variation across individuals in the magnitude of the -effect, subjects should perhaps be selected for large -effects.

66This factor could, but need not, be the presence or absence of a lesion in a region within which process B was carried out.

67This problem need not arise when the lesion is produced surgically, as in Farah, Soso, and Dashieff (1992). However, because such lesions are produced to ameliorate some other pathology, such as epilepsy, they may be associated with more than one effect.

68Crawford and Garthwaite's (2006) methods are concerned with showing the presence of an effect in a single patient, not its absence.

69There is a sharp distinction between persuasive evidence of invariance and the failure to find a significant effect. In some papers it is concluded that brain damage has no effect merely from the fact that the test of the effect proves not to be statistically significant. Often no confidence intervals or other measures of the effect are provided that would permit one to decide how large the effect would have to be to reach significance. One such example can be found in the interesting study of lateral prefrontal damage by Gehring and Knight (2002).

70Furthermore, if the second factor was a lesion in region , and only a composite measure was available, then a test of the hypothesized combination rule would require some patients who had lesions in both regions.

71Other impediments to using brain damage as a factor in process decomposition are:

(3) The victims of strokes often have widespread cerebrovascular disease. Traumatic head injuries tend to produce widespread minor damage, as well as localized major damage. This may be why damage that appears to be localized seems often to produce at least small effects on many functions.

(4) Even where functionally distinct brain regions are spatially distinct, there is no reason to expect the region of damage due to a stroke (which is determined by the brain's vascular organization) to correspond to those regions, so as to be functionally specific. Indeed, the localized effects of a stroke may be to damage nerve tracts that project to many brain regions.

(5) It may be difficult to find undamaged control subjects with overall levels of performance that are poor enough to be comparable. One approach is to increase the difficulty of the task for these subjects, but such increases may themselves have differential effects on different aspects of performance. Another approach is to select the better performing among the brain damaged subjects, but such selection is also a potential source of bias.

72If one variety of differential influence obtains, one can find factors F and G such that both factors influence both processes A and B, but for A (B) the effect of F (G) is the larger (Kanwisher et al., 2001). Whether differential or selective influence characterizes processors is controversial (Haxby, 2004; Reddy & Kanwisher, 2006).

73With suitable normalization, these pairs of inequalities are equivalent.

74For example, suppose that and , where x is the level (“strength”) of a factor. Let , , , and . Then both of the above pairs of properties are satisfied.

75For example, a 2 × 2 design, with four conditions, provides only one df for interaction, whereas a 3 × 3 design, with nine conditions, provides four df.
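The counts follow from the general relation for a two-factor design with l_F and l_G factor levels:

\[
\mathrm{df}_{F \times G} = (l_F - 1)(l_G - 1), \qquad (2-1)(2-1) = 1, \qquad (3-1)(3-1) = 4 .
\]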
