Sleep Research Online 5(1): 19-51, 2003 http://www.sro.org/2003/Fulda/19/ Printed in the USA. All rights reserved.
1096-214X © 2003 WebSciences
Sleep related breathing disorders (SRBD) are usually associated with impaired daytime functioning. The magnitude of this impairment might vary for different neuropsychological functions. Our objective was to assess cognitive dysfunction in SRBD patients. Different medical and psychological databases (Evidence Based Medicine, Medline, Embase, PsychInfo, PsychLit, The Eric Database, BiblioSleep) were searched (last search, December 2000). The reference lists of articles were checked and several journals and conference proceedings were hand-searched. We selected all observational studies comparing patients with an established diagnosis of SRBD to non-sleep disordered control groups, to clinical control groups, or to population norms on neuropsychological or psychometric performance measures, including computer-assisted tests and driving simulators. We rated the quality of each study according to criteria of external validity, internal validity, statistical validity, and the level of evidence. Outcome measures were classified according to a taxonomy of neuropsychological functions and statistically analyzed using meta-analytical techniques. Fifty-four studies reporting cognitive functioning of SRBD patients were reviewed. A total of 1,635 patients were compared with 1,737 control subjects. Twenty-eight studies provided adequate statistics and were integrated further. SRBD patients showed moderate to large reductions in mental flexibility, visual delayed-memory retrieval, and driving simulation performance (pooled effect size estimates ranged from 0.61 to 0.72). Small to moderate reductions were found for focused and sustained attention, verbal delayed-memory retrieval, verbal fluency and composite measures of general intellectual functioning (pooled effect size estimates ranged from 0.17 to 0.51). No difference was observed for divided attention, concept formation and reasoning, and verbal or visual immediate-memory performance. 
Data integration was not undertaken in the areas of attention-span and motor functions due to large between-study heterogeneity, and in the areas of perception, alertness, selective attention, vigilance, constructional performance, learning performance, executive functions and verbal and performance IQ measures due to insufficient data. Our conclusions were that cognitive performance of SRBD patients was impaired, yet there are remarkable differences between various neuropsychological functions and subfunctions. The integrated data show convincingly that disordered breathing during sleep is a risk factor for cognitive functioning during the daytime.
Patients with sleep-related breathing disorders experience wide-ranging cognitive dysfunction.
Experimental data corroborate the everyday experience that undisturbed sleep of appropriate duration, intensity and consistency is a prerequisite for adequate cognitive functioning, while sleep disorders are frequently associated with impaired daytime functioning. Major corollaries of disturbed sleep are cognitive dysfunction, mood disorders and social impairment. The kind and degree of impairment differ widely between diagnostic groups, and within groups between patients.
The objective of this meta-analysis is to summarize present knowledge on cognitive dysfunction in patients with sleep related breathing disorders (SRBD), an area in which the majority of studies were published. Cognitive dysfunction in other sleep disorders like insomnia, narcolepsy and restless legs syndrome will be reviewed in a separate meta-analysis.
Patients with SRBD experience cognitive dysfunction that is apparent in most areas of neuropsychological functioning (Hudgel, 1989; Kelly et al., 1990; Day et al., 1999). The available evidence was reviewed in three recent publications (Décary et al., 2000; Engleman and Joffe, 1999; Engleman et al., 2000). While Décary et al. (2000) summarized study results on cognitive dysfunction in a narrative review, Engleman and Joffe (1999) and Engleman et al. (2000) were the first to provide a quantitative overview of effect sizes and to integrate results across studies statistically. They used broad neuropsychological categories such as attention and psychomotor tasks, memory and learning, and executive and "frontal" tasks. As Décary et al. showed, the construct validity of neuropsychological task performance, especially in the area of attentional functions, is not well understood and has led to very different interpretations even for the same task. The aggregation level for neuropsychological task performance in SRBD patients thus remains to be determined empirically. For this reason, we have tried to combine both approaches. In the present review we summarize evidence on cognitive dysfunction in SRBD patients by grouping individual study outcomes according to the well-established taxonomy of neuropsychological functions by Lezak (1995). If summary statistics were available for individual studies, they were further processed for homogeneous groups of functions using meta-analytical techniques. This yields measures of between-study heterogeneity and pooled effect sizes for neuropsychological task performance of SRBD patients.
METHODS
Selection Criteria
For the present review we considered all studies that compared cognitive performance
in sleep disordered persons to either (i) cognitive performance in adequate
non-sleep disordered control groups, (ii) clinical control groups without known
neuropsychological impairments, or (iii) population norms. Selection criteria
were (a) types of studies (all observational studies), (b) types of participants
(sleep disordered persons where the sleep disorder had been established according
to at least minimum diagnostic criteria; ASDA, 1990), (c) types of comparisons
(to non-sleep disordered control groups, to clinical control groups without
known neuropsychological impairment, or to population norms), and (d) types
of outcome measures (neuropsychological or psychometric performance measures,
including computer-assisted tests and driving simulators).
Search Strategy
The following electronic databases were searched from June to December, 2000
(last search December 12, 2000): Evidence Based Medicine for the period of 1974
to May, 2000; Medline for the period of 1966 to November, 2000; Embase for the
period of 1989 to October, 2000; PsychInfo for the period of 1987 to September,
2000; PsychLit for the period of 1987 to June, 2000; The Eric Database for the
period of 1982 to June, 2000; and BiblioSleep for the period of 1990 to November,
2000.
The following search terms were used: sleep disorder, sleep apnea, OSAS, CSAS, sleep related breathing disorder, SRBD, upper airway resistance syndrome, UARS, snoring, neuropsychological, cognitive, vigilance, attention, memory, performance, driving simulation. Furthermore, the reference lists of articles were checked and several journals and conference proceedings were hand-searched, especially Sleep Research, Volumes 1 to 25, corresponding to the years 1972 through 1996.
Assessment of Study Quality
All studies were evaluated according to criteria of external, internal and statistical
validity. External and internal validity were indexed by two key concepts each
(see below). The quality criteria of the present review were based upon criteria
for evidence-based medicine (EBM) (Clarke and Oxman, 2000), but also took into account the non-interventional
nature of those studies where randomized allocation to cases and controls was
not possible for obvious reasons. Study quality was assessed with respect to
the aims of the present review and was based on information provided in the
actual publication. No attempt was made to obtain further information from individual
authors.
All studies were evaluated according to (a) external validity related to sampling, (b) external validity related to case definition, (c) internal validity with regard to selection bias, (d) internal validity with regard to performance bias, and (e) statistical validity. Validity was judged to be high, satisfactory, undetermined or unsatisfactory with the exception of statistical validity, which was only judged as high, satisfactory or undetermined. A detailed description of the quality assessment is given in Appendix I.
Levels of Evidence
In evidence-based medicine, results of primary and secondary analyses are classified
according to ten levels of evidence (1a to 1c, 2a to 2c, 3a, 3b, 4 and 5; Clarke
and Oxman, 2000). These levels describe the best
available evidence with regard to a particular research question. So far, evidence-based
medicine has not developed standard quality criteria for non-randomized or non-interventional
studies that are not concerned with therapeutic interventions, prognosis, diagnosis
or economic efficiency. Nevertheless, for the present review levels of evidence
for primary studies were closely matched to existing levels of evidence:
We classified each individual study according to the levels of evidence specified above. Whenever feasible, the effect of study quality was examined by retrospectively excluding studies of poor quality from the analysis, to test the stability of the pooled effect sizes.
Classification of Outcome Measures
Neuropsychological outcome measures were grouped according to the taxonomy of
neuropsychological functions as proposed by Lezak (1995). In all those cases where at least
five independent studies were available that compared the performance of sleep-disordered
patients with a control group, outcome measures were aggregated by means of
meta-analytical techniques (Voyer et al., 1995).
Integration of Outcome Measures: Effect Sizes and Meta-analysis
In experimental research, the main means of evaluating a scientific hypothesis
is the statistical test. The p value resulting from a statistical test
is the probability of obtaining an effect at least as large as the one estimated
from the data if the null hypothesis (i.e., no effect) is true. This probability, in turn,
is a function of (a) the number of observations, (b) the size of the effect,
and (c) the relative efficiency of the statistical test used. Components (a) and (c)
are specific to the design of a particular study and have only marginal relevance
when the hypothesis is to be evaluated on substantive (as opposed to technical) grounds.
The central idea of meta-analysis is that an "average" effect size can be estimated by combining all the unrepresentative, scattered effect sizes obtained in small-scale studies into one combined ("big") effect size that describes the central tendency of the whole distribution of study outcomes. In doing so, the focus of attention is shifted from the idea of significance testing (is the effect greater than zero?) to the idea of estimating the size of an effect (how large is it, exactly?). Since outcomes are often measured by a variety of different questionnaires, computer-assisted tests and miscellaneous observations, each with a different response metric, study outcomes have to be standardized to make them statistically comparable. The usual way is to transform the means of the outcome variables into a "z-metric" (a distribution with zero mean and unit standard deviation), and then to compute the number of standard deviations by which two group means differ. The standardized mean difference is the effect size. Effect sizes around 0.2 are considered small, those around 0.5 medium, and those of 0.8 or greater large (Cohen, 1992). An illustrative description of effect sizes states that "medium represents an effect size likely to be visible with the naked eye," a small effect size is to be "noticeably smaller, yet not trivial," and large effect sizes are "the same distance above medium as small is below it" (Cohen, 1992, p. 156). A small effect size is equivalent to the difference in height between 15- and 16-year-old girls, a medium effect size is equivalent to the difference in intelligence scores between clerical and semiskilled workers, and a large effect size is equivalent to the difference in intelligence scores between college professors and college freshmen (Johnson and Eagly, 2000). When combining effect sizes from different studies, the most common weight is the reciprocal of each effect size's sampling variance, so that studies with larger samples, whose estimates are more precise, are given more weight.
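The two computations described in this paragraph, the standardized mean difference and the reciprocal-variance weighting, can be illustrated with a minimal Python sketch. It is not the procedure documented in Appendix II: the function names, the pooled-standard-deviation form of the effect size, the large-sample variance approximation, and all numbers are assumptions chosen for illustration only.

```python
import math

def standardized_mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two group means in pooled-standard-deviation units."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def effect_size_variance(d, n_a, n_b):
    """Common large-sample approximation to the sampling variance of d."""
    return (n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b))

def inverse_variance_mean(effects, variances):
    """Weighted mean effect size with weights 1/variance, plus its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

# Hypothetical test scores (not taken from any reviewed study): controls vs. patients.
d = standardized_mean_difference(100.0, 15.0, 20, 92.5, 15.0, 20)  # half a pooled SD
v = effect_size_variance(d, 20, 20)

# Two hypothetical studies; the one with the smaller variance gets the larger weight.
pooled, se = inverse_variance_mean([0.2, 0.8], [0.05, 0.2])
```

In the second example the more precise study (variance 0.05) receives four times the weight of the less precise one, so the pooled estimate lies closer to 0.2 than to 0.8.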
Before computing a weighted mean effect size, the homogeneity of the single study effect sizes must be examined to determine whether the studies can be adequately described by a single effect size. The homogeneity statistic evaluates the hypothesis that the effect sizes are consistent across studies and can thus be meaningfully combined. Given a homogeneous set of effect sizes, the result of a meta-analysis is a weighted mean effect size for a population of studies, which can be tested statistically. The typical graphical display of the results (see Figures) shows the effect sizes and confidence intervals from each of the single studies and, below them, the weighted mean effect size from all studies combined. If a confidence interval includes zero (i.e., crosses the vertical line at zero), the corresponding effect size is not significantly different from zero.
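As a hedged illustration of the homogeneity check and the confidence-interval reading described here, the sketch below computes a weighted sum of squared deviations from the pooled effect (often called Q) and compares it with a chi-square critical value on k-1 degrees of freedom. Whether this is exactly the statistic specified in Appendix II is an assumption; the function names and the input values are hypothetical.

```python
import math

# Upper 5% critical values of the chi-square distribution for df = 1..6.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def pooled_effect(effects, variances):
    """Inverse-variance weighted mean effect size and its standard error."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))

def homogeneity_q(effects, variances):
    """Weighted squared deviations from the pooled effect; df = k - 1."""
    mean, _ = pooled_effect(effects, variances)
    q = sum((d - mean) ** 2 / v for d, v in zip(effects, variances))
    return q, len(effects) - 1

def confidence_interval_95(mean, se):
    """Normal-approximation 95% confidence interval."""
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical effect sizes and variances (not from the reviewed studies).
effects, variances = [0.2, 0.8], [0.05, 0.2]
q, df = homogeneity_q(effects, variances)
heterogeneous = q > CHI2_CRIT_05[df]      # reject homogeneity at the 5% level?
mean, se = pooled_effect(effects, variances)
low, high = confidence_interval_95(mean, se)
significant = low > 0 or high < 0         # interval excludes zero?
```

With these made-up inputs the statistic stays below the critical value, so the two effects would be pooled, and the interval around the pooled effect includes zero, so the pooled effect would not count as significant.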
In the present study we aggregated data from single studies within basic neuropsychological functions (e.g., memory) on the level of well-defined sub-functions (e.g., immediate memory) if at least five studies could be found for the given function. Since meta-analysis relies on independent observations, effect sizes from studies comparing two patient groups to one control group, or multiple control groups with one patient group, were averaged so that only strictly independent observations were entered into each analysis. A technical description of the applied methods is given in Appendix II.
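The averaging step can be pictured as collapsing all comparisons from one study into a single entry before pooling. The sketch below uses a simple unweighted mean per study; whether the averaging in this review was weighted is not stated in this section, so that choice, the function name, and the study labels and values are all assumptions.

```python
from collections import defaultdict
from statistics import mean

def one_effect_per_study(comparisons):
    """Collapse (study, effect size) pairs into one averaged effect per study."""
    by_study = defaultdict(list)
    for study, effect in comparisons:
        by_study[study].append(effect)
    return {study: mean(values) for study, values in by_study.items()}

# Hypothetical input: study "A" compared two patient groups with one control group.
comparisons = [("A", 0.4), ("A", 0.6), ("B", 0.3)]
independent = one_effect_per_study(comparisons)
```

After this step each study contributes exactly one effect size, which is the form the pooling and homogeneity computations expect.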
RESULTS
Study Description
The literature search in the electronic databanks yielded a total of 308 documents. All abstracts were read and 167 articles in full-text format were selected for further evaluation. Thirty-four of those were selected for the present review (Findley et al., 1986, 1989, 1995, 1999; Lojander et al., 1999; Sauter et al., 2000; Bédard et al., 1991; Verstraeten et al., 1996, 1997; Schulz et al., 1997; Camus et al., 1999; Stone et al., 1994; Klonoff et al., 1987; Roehrs et al., 1995; Borak et al., 1996; Kotterba et al., 1997; Cassel et al., 1989; Kales et al., 1985; Walsleben et al., 1989; Knight et al., 1987; Risser et al., 2000; Berry et al., 1987, 1990; Muñoz et al., 2000; Juniper et al., 2000; Randerath et al., 2000; George et al., 1996; Naëgelé et al., 1995; Barbé et al., 1998; Redline et al., 1997; Greenberg et al., 1987; Kim et al., 1997; Ingram et al., 1994; Phillips et al., 1994).
Hand searching and the checking of references yielded another 20 documents. Fourteen of these were located in conference proceedings (Zozula et al., 1998a, 1998b; Sloan et al., 1989; Bonanni et al., 1999; Naëgelé et al., 1999; Kuo et al., 2000; Pietrini et al., 1998; Chugh et al., 1998; Lauer et al., 1998; Dani et al., 1996; Dinges et al., 1998; Morisson et al., 1997; Van Son et al., 2000; Verstraeten et al., 2000), two in books (Findley et al., 1991; Weeß, 1996), and four in hand-searched journals (Lee et al., 1999; Kotterba et al., 1998; Rohmfeld et al., 1994; Büttner et al., 2000).
Ten studies reported multiple patient (Findley et al., 1986; Lojander et al., 1999; Sauter et al., 2000; Bédard et al., 1991; Rohmfeld et al., 1994) or control groups (Findley et al., 1995; Verstraeten et al., 1996, 1997). Two cases where results from two different studies were reported in one publication were treated as separate studies (Findley et al., 1989, 1991). In another case, where the same patient group was compared with two different control groups (Verstraeten et al., 1996, 1997), the data were treated as one study. The final database thus contained 55 studies. A detailed description of all selected studies is provided in Table 1.
Thirty-three studies compared the performance of SRBD patients and control subjects, sampled from a non-complaining population. Eleven studies compared patients with a control group, sampled in the sleep laboratory. One of them (Findley et al., 1995) included a sample of healthy subjects in addition to subjects who were screened for, but did not fulfil criteria for sleep apnea syndrome (SAS). The clinical control groups included treated patients (Schulz et al., 1997), non-apneic snorers (Verstraeten et al., 1997; Chugh et al., 1998), a mixed group of treated patients and non-apneic snorers (Camus et al., 1999), insomniacs (Verstraeten et al., 1996; Stone et al., 1994) and non-apneic patients referred for evaluation of sleep apnea (Findley et al., 1991, 1995), and a group of patients scheduled for bypass surgery (Klonoff et al., 1987). Ten studies compared performance of SRBD patients to population norms (Findley et al., 1986; Lojander et al., 1999; Sauter et al., 2000; Roehrs et al., 1995; Borak et al., 1996; Kotterba et al., 1997; Cassel et al., 1989; Kales et al., 1985; Walsleben et al., 1989; Verstraeten et al., 2000); one study (Bonanni et al., 1999) compared with an unspecified database group; one study (Kotterba et al., 1998) compared with a normal control group and population norms; and one (Stone et al., 1994) compared with a clinical control group and population norms.
There were eight definitions of SRBD used within the studies. The type of sleep-related breathing disorder was defined as obstructive sleep apnea syndrome (OSAS) in 29 studies (Findley et al., 1986, 1989, 1999; Lojander et al., 1999; Sauter et al., 2000; Bédard et al., 1991; Verstraeten et al., 1996, 1997; Schulz et al., 1997; Camus et al., 1999; Klonoff et al., 1987; Roehrs et al., 1995; Kotterba et al., 1997, 1998; Kales et al., 1985; Knight et al., 1987; Risser et al., 2000; Berry et al., 1990; Muñoz et al., 2000; Juniper et al., 2000; Zozula et al., 1998a, 1998b; Pietrini et al., 1998; Lauer et al., 1998; Morisson et al., 1997; Weeß, 1996; Rohmfeld et al., 1994; Büttner et al., 2000), obstructive sleep apnea (OSA) in eleven studies (Findley et al., 1991, 1995; Borak et al., 1996; Walsleben et al., 1989; George et al., 1996; Sloan et al., 1989; Bonanni et al., 1999; Chugh et al., 1998; Van Son et al., 2000; Verstraeten et al., 2000), sleep apnea syndrome (SAS) in four studies (Naëgelé et al., 1995; Barbé et al., 1998; Dani et al., 1996; Lee et al., 1999) and occasionally sleep apnea (SA) (Cassel et al., 1989), obstructive sleep apnea/hypopnea syndrome (Naëgelé et al., 1995), sleep disordered breathing (Redline et al., 1997), sleep apnea DOES syndrome (Greenberg et al., 1987), or insomnia with obstructive sleep apnea (Stone et al., 1994). Six studies, in which the groups were identified outside the sleep laboratory, distinguished between cases and controls on the basis of the apnea/hypopnea index (AHI) (Kim et al., 1997; Ingram et al., 1994; Phillips et al., 1994; Berry et al., 1987) or the respiratory disturbance index (RDI) (Kuo et al., 2000; Dinges et al., 1998). Apnea severity indices that were reported included the apnea hypopnea index (AHI, 22 studies), the apnea index (AI, 9 studies), the respiratory disturbance index (RDI, 12 studies), the respiratory event index (REI, 2 studies), or the oxygen desaturation index (ODI, 3 studies). 
Six studies did not report an apnea severity index. Average apnea severity measures ranged for AHI from 11 (Phillips et al., 1994) to 73 (George et al., 1996; Zozula et al., 1998a), for AI from 17 (Knight et al., 1987) to 83 (Findley et al., 1989), for RDI from 12 (Rohmfeld et al., 1994) to 30 (Sauter et al., 2000), for ODI from 26 (Lojander et al., 1999) to 86 (Findley et al., 1986), and for the respiratory event index (REI) from 66 (Roehrs et al., 1995) to 71 (Camus et al., 1999).
The minimal diagnostic requirement for the diagnosis of sleep-related breathing disorders in the present review was nocturnal oximetry, which was considered to have been performed if an apnea severity index was reported. The majority of studies also performed a full-night polysomnography to establish the diagnosis in the patient group, with four exceptions: one study (Lojander et al., 1999) used oximetry in combination with the static-charge-sensitive bed; one study (Juniper et al., 2000) used oximetry and snoring; and two studies did not specify diagnostic procedures but provided measures of apnea severity (Sloan et al., 1989; Dani et al., 1996). For these four studies, external validity related to case definition was considered undetermined. Although not all subjects with sleep-disordered breathing were patients, for the sake of simplicity we will refer to them as SRBD patients in the following.
The average age varied between 37 and 78 years for the SRBD patients and between 34 and 75 years for the control subjects, with a peak between 40 and 50 years for both groups. Forty-eight studies reported the gender of patients and 39 of them did so for the control group. Thirty studies included females, with a total of 132 females in patient groups and 201 in control groups. In comparison, a total of 995 males were included in patient groups and 669 in control groups. Summarized across all studies, there were 1,635 SRBD patients and 1,737 control subjects.
Study Quality
The results of the evaluation process are summarized in Table 2. External validity related to case definition was high for 36 studies, satisfactory for 15 studies, and undetermined for four studies. External validity related to sampling was high in only three studies, satisfactory in another two, undetermined in 16 studies, and unsatisfactory in 34 studies. Internal validity and statistical validity were evaluated only for the 44 studies that compared performance of SRBD patients and controls. Internal validity with regard to selection bias was high in three studies, satisfactory in six studies, undetermined in 34 studies, and unsatisfactory in one study. Likewise, internal validity controlling for performance bias was high in three studies, satisfactory in eight studies, undetermined in 31 studies, and unsatisfactory in two studies. Statistical validity was high in three studies, satisfactory in 13 studies, and undetermined in 28 studies. Regarding the level of evidence for single studies, only one study was judged as Level 1 evidence, while seven were Level 2, 14 Level 3, and 33 Level 4 evidence.
Neuropsychological Functions
Perception
Four studies investigated basic perceptual abilities in SRBD patients (Bédard et al., 1991; Knight et al., 1987; Dani et al., 1996; Lee et al., 1999). Patients did not differ from controls in skin writing perception (graphesthesia) (Knight et al., 1987), the Hooper visual organization test (Bédard et al., 1991), and a visual matching test (Bédard et al., 1991). In a sensory motor task, patients showed a higher number of correct responses than controls (Lee et al., 1999). Finally, Dani et al. (1996) reported reduced facial recognition in a small group of five SRBD patients when compared to controls. No data integration was undertaken since the number of studies was small and the tasks employed diverse.
Attention
Thirty-eight studies have assessed attentional performance (Findley et al., 1986, 1991; Sauter et al., 2000; Bédard et al., 1991; Verstraeten et al., 1996, 1997, 2000; Schulz et al., 1997; Camus et al., 1999; Stone et al., 1994; Roehrs et al., 1995; Borak et al., 1996; Kotterba et al., 1997, 1998; Cassel et al., 1989; Walsleben et al., 1989; Knight et al., 1987; Muñoz et al., 2000; Randerath et al., 2000; Naëgelé et al., 1995; Barbé et al., 1998; Redline et al., 1997; Greenberg et al., 1987; Kim et al., 1997; Phillips et al., 1994; Zozula et al., 1998a, 1998b; Sloan et al., 1989; Bonanni et al., 1999; Kuo et al., 2000; Pietrini et al., 1998; Chugh et al., 1998; Lauer et al., 1998; Dani et al., 1996; Dinges et al., 1998; Morisson et al., 1997; Weeß, 1996; Lee et al., 1999; Rohmfeld et al., 1994). Since most studies reported multiple outcome measures, data will be reviewed for different areas of attention separately.
Measures of alertness were employed in six studies (Bonanni et al., 1999; Verstraeten et al., 2000; Weeß, 1996; Lee et al., 1999; Kotterba et al., 1998; Rohmfeld et al., 1994). SRBD patients and controls did not differ in the Critical Flicker Fusion test (CFF) in two studies (Weeß, 1996; Rohmfeld et al., 1994) and in a short two-minute choice reaction time task (Lee et al., 1999). Simple reaction time, on the other hand, was prolonged in patients when compared to controls and norms (Kotterba et al., 1998) as well as to an unspecified database (Bonanni et al., 1999). Verstraeten et al. (2000) found that while some patients showed impaired performance in a phasic alertness task, performance in a tonic alertness task was unimpaired in patients when compared to norms. Only three studies reported means and standard deviations, so that no data integration was undertaken.
Attention span was assessed in ten studies in the auditory (Borak et al., 1996; Knight et al., 1987; Naëgelé et al., 1995; Redline et al., 1997; Greenberg et al., 1987; Pietrini et al., 1998; Lauer et al., 1998; Dani et al., 1996; Verstraeten et al., 2000; Lee et al., 1999) and visual domain (Naëgelé et al., 1995; Pietrini et al., 1998; Lauer et al., 1998). The digit span forward did not differ between patients and controls in one study (Lee et al., 1999), while it was reduced in two others compared to controls (Naëgelé et al., 1995) or norms (Verstraeten et al., 2000). Similarly, two studies (Knight et al., 1987; Lee et al., 1999) found no difference between patients and controls in the reversed digit span; another two (Naëgelé et al., 1995; Redline et al., 1997) found a reduced performance of patients, and a fifth study (Verstraeten et al., 2000) reported that some of the patients showed impaired performance in comparison to norms. The combined digit span did not differ between patients and controls in two studies (Knight et al., 1987; Lauer et al., 1998), while it was reduced in four studies compared to controls (Greenberg et al., 1987; Pietrini et al., 1998; Dani et al., 1996) or norms (Borak et al., 1996). In the visual domain, performance was reduced on the Corsi block-tapping task in one study (Naëgelé et al., 1995); on the Hiskey-Nebraska blocks, in one study (Pietrini et al., 1998) but not in another study (Lauer et al., 1998). In addition, Naëgelé et al. (1995) employed a double encoding task where a visual, a verbal and a double span were assessed, all of which were reduced in patients. Five studies (Naëgelé et al., 1995; Redline et al., 1997; Greenberg et al., 1987; Dani et al., 1996; Lee et al., 1999) reported means and standard deviations for attention span measures. 
The final data set compared the performance of 84 SRBD patients with that of 71 controls on a combined digit span measure (Naëgelé et al., 1995; Greenberg et al., 1987; Dani et al., 1996; Lee et al., 1999) or the reversed digit span (Redline et al., 1997). Effect sizes ranged from -0.18 (Lee et al., 1999) to 2.30 (Dani et al., 1996) with significant between-study heterogeneity (χ²=9.61, df=4, p<0.05; Table 3). Figure 1 shows the individual study effect sizes.
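The reported heterogeneity result (a chi-square value of 9.61 on 4 degrees of freedom) can be checked against the chi-square distribution. The sketch below uses the closed-form upper-tail probability that exists for an even number of degrees of freedom; the function name is hypothetical, and the check assumes only the two reported numbers, not the underlying study data.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variate; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term            # adds (x/2)^j / j!
        term *= half / (j + 1)
    return math.exp(-half) * total

# Values as reported in the text for the attention-span analysis.
p_value = chi2_upper_tail_even_df(9.61, 4)
below_five_percent = p_value < 0.05
```

The resulting probability falls just under 0.05, which is consistent with the statement that between-study heterogeneity was significant at that level.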
Focused attention was assessed in 22 studies (Findley et al., 1986, 1991; Sauter et al., 2000; Bédard et al., 1991; Stone et al., 1994; Borak et al., 1996; Kotterba et al., 1997, 1998; Cassel et al., 1989; Walsleben et al., 1989; Knight et al., 1987; Naëgelé et al., 1995; Redline et al., 1997; Greenberg et al., 1987; Kim et al., 1997; Phillips et al., 1994; Zozula et al., 1998a; Kuo et al., 2000; Lauer et al., 1998; Dinges et al., 1998; Verstraeten et al., 2000; Lee et al., 1999). In the majority of studies more than one test was employed. For this reason data will be first reviewed for separate tests and then for pooled measures of focused attention.
Nineteen studies compared Trail Making Test (TMT) performance of SRBD patients and control subjects (Bédard et al., 1991; Naëgelé et al., 1995, 1999; Redline et al., 1997; Greenberg et al., 1987; Kim et al., 1997; Phillips et al., 1994; Zozula et al., 1998a; Kuo et al., 2000; Findley et al., 1991; Lee et al., 1999; Kotterba et al., 1998) or population norms (Findley et al., 1986; Sauter et al., 2000; Kotterba et al., 1997, 1998; Cassel et al., 1989; Walsleben et al., 1989; Verstraeten et al., 2000). Thirteen studies employed the Trails A with seven of them (Redline et al., 1997; Phillips et al., 1994; Zozula et al., 1998a; Naëgelé et al., 1999; Kuo et al., 2000; Lauer et al., 1998; Lee et al., 1999) reporting no difference between patients and controls. Two studies (Naëgelé et al., 1995; Kotterba et al., 1998) found that patients scored lower than controls, and four studies (Kotterba et al., 1997; Cassel et al., 1989; Walsleben et al., 1989; Verstraeten et al., 2000) reported that patients' performance was impaired when compared to norms. Trails A effect sizes from only four studies were available (Naëgelé et al., 1995; Redline et al., 1997; Kuo et al., 2000; Lee et al., 1999), thus data integration was not undertaken. Fourteen studies used the Trails B with half of them (Naëgelé et al., 1995; Redline et al., 1997; Greenberg et al., 1987; Kim et al., 1997; Kuo et al., 2000; Findley et al., 1991; Lee et al.,