False Memory for Narrative Statements

A good novel encourages readers to make inferences about events that are happening behind the scenes. However, that leads to a common mistake: as readers proceed through the novel, they may mistake inferences that they have made based on the plot for events that were specifically discussed at some point in the story. In a classic paper, Bransford and Franks (1971) studied a laboratory analogue of this situation and showed that participants' recollections of narratives are thoroughly infected with false memories, and that false memories seem to be as strong as true ones (later research showed the latter result to be qualified by faulty instructions; Reyna & Kiernan, 1994). Perhaps most importantly, they found that those false memories are by-products of making sense of narratives, of integrating sentences into more general ideas, a process known as gist memory in fuzzy-trace theory (Reyna & Brainerd, 1995). The adult participants in Bransford and Franks' research listened to a series of sentences that had been constructed so as to contain different numbers of meaning units (propositions). After they had studied the sentences, they were given an old-new recognition test on which some of the probes were old (i.e., sentences that they had just heard) and others were new sentences that contained the same meaning units. To take a concrete example that has often been recounted in textbooks, the participants studied 12 sentences that contained the following four propositions:

The ants are in the kitchen.

The jelly is sweet.

The jelly is on the table.

The ants ate the jelly.

Examples of the sentences that were studied are shown in Table 9.1. Note that the sentences varied in complexity, with complexity being determined by the number of propositions in each sentence—one, two, or three. All of these sentences were presented on the old-new recognition test, along with new sentences such as, "The ants ate the jelly that was on the table," "The ants in the kitchen ate the sweet jelly," and "The ants in the kitchen ate the sweet jelly that was on the table." Although none of the latter sentences was studied, it is obvious that each is consistent with the gist of the ant story.

Table 9.1. Sentences from Bransford and Franks (1971) Narrative False Memory Study

Complexity Level    Sentences
1                   The ants were in the kitchen.
                    The jelly was on the table.
2                   The ants in the kitchen ate the jelly.
                    The ants ate the sweet jelly.
3                   The ants ate the sweet jelly which was on the table.

Bransford and Franks' (1971) participants listened to a total of four narratives (six sentences apiece) before they responded to the recognition test. There were three types of probes on the test: presented and unpresented sentences of the sort just described, plus further unpresented sentences that violated the gist of that narrative. After each probe was presented, participants were required to make two responses: to indicate whether the sentence was old or new and to rate how confident they were in this old-new judgment on a scale of 1 to 5. There are two key results that are very surprising. First, although the participants were very good at identifying new sentences as new when they violated the gist of the narratives, they made poor choices when new sentences preserved the gist of the narratives. Second, the participants' confidence in their memories for presented sentences and in their false memories for unpresented gist-preserving sentences were similar. This second pattern is shown in Figure 9.1. Participants' confidence ratings for a sentence were given a positive sign when the recognition judgment was old and a negative sign when it was new, regardless of whether the sentence was actually old or new. Common sense (and memory theories of that era) would say that presented sentences would be recognized as old with high positive ratings, while unpresented sentences would be recognized as new with high negative ratings. Instead, unpresented sentences were recognized as old at very high levels. What is more, it can be seen in Figure 9.1 that participants' confidence in their erroneous recognition of unpresented sentences was indistinguishable from their confidence in their correct recognition of presented sentences. In fact, as can also be seen, confidence ratings were not predicted at all by whether a sentence was actually old or new. Instead, confidence was predicted by how many narrative propositions from the story a test sentence contained (one, two, three, or four). In other words, the more completely a test sentence recapitulated the gist of a narrative, the more confident participants were in their memory judgments about it, regardless of whether the judgment was correct or incorrect.
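The signed-confidence scoring described above can be made concrete with a short sketch. The function names and the response values below are hypothetical and invented purely to illustrate the arithmetic; they are not data from Bransford and Franks (1971). Each response pairs an old-new judgment with a 1-to-5 confidence rating, and the rating is signed by the judgment alone, not by whether the probe was actually presented.

```python
from statistics import mean


def signed_confidence(judgment: str, confidence: int) -> int:
    """Sign a 1-5 confidence rating: positive for 'old', negative for 'new'."""
    if judgment not in ("old", "new"):
        raise ValueError("judgment must be 'old' or 'new'")
    if not 1 <= confidence <= 5:
        raise ValueError("confidence must be between 1 and 5")
    return confidence if judgment == "old" else -confidence


# Hypothetical responses (NOT real data):
# (propositions in probe, actually presented?, judgment, confidence)
responses = [
    (1, True, "new", 2),
    (1, False, "new", 3),
    (2, True, "old", 2),
    (2, False, "old", 1),
    (3, True, "old", 4),
    (3, False, "old", 4),
    (4, False, "old", 5),
]


def mean_signed_by_propositions(rows):
    """Average signed confidence per proposition count, ignoring old/new status."""
    grouped = {}
    for n_props, _presented, judgment, confidence in rows:
        grouped.setdefault(n_props, []).append(
            signed_confidence(judgment, confidence)
        )
    return {n: mean(values) for n, values in sorted(grouped.items())}


summary = mean_signed_by_propositions(responses)
```

Grouping by proposition count rather than by presentation status mirrors the way the result is described in the text: the variable of interest is how much of the narrative's gist a probe contains.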

Figure 9.1. Relations between number of meaning propositions contained in sentence-recognition probes, true memory (hits), and false memory (false alarms) in Bransford and Franks's (1971) research.

In the years that followed the publication of this research, a number of developmental investigators studied the question of whether children also display these extremely high levels of spontaneous memory distortion for narratives. The earliest publication was an article by Paris and Carter (1973). These investigators presented children with shorter narratives than Bransford and Franks (1971) had used. The narratives consisted of two sentences that specified a spatial relation between some common objects (e.g., "The bird is in the cage. The cage is under the table") and a filler sentence (e.g., "The bird has yellow feathers"). It was found that, like Bransford and Franks's adults, children were very likely to state that new sentences were old when those sentences preserved the gist of the short narrative (e.g., "The bird is under the table"). Other studies of children's false memory for narrative statements were soon reported by investigators such as Johnson and Scholnick (1979), Liben and Posnansky (1977), Paris and Mahoney (1974), Prawatt and Cancelli (1976), and Small and Butterworth (1981). Paris and Carter's original finding continued to hold. Children falsely recognized new sentences that preserved the gist of narratives, and they did so at much higher levels than they falsely recognized new sentences that violated the gist of narratives. Other investigators found that it was not only sentences for which children displayed this effect. They also falsely recognized numbers and pictures that preserved the gist of narratives (Brainerd & Reyna, 1993; Brainerd & Gordon, 1994; Paris & Mahoney, 1974).

As interesting as these results are, the question of what causes them (their theoretical interpretation, in other words) is of far greater significance (see Reyna, 1996; Reyna & Kiernan, 1994, 1995). For the most part, the aforementioned investigators relied on an interpretation known as constructivism, which Bransford and Franks (1971) had used to explain their data and which had originally been proposed by Bartlett (1932). According to constructivism, people do not retain memories of the individual sentences in a narrative but instead store the meaning content of sentences and develop memories that Bransford and Franks called holistic semantic structures or schemas. The crux of this hypothesis is that information about the exact details of sentences is not preserved in memory, at least not beyond a few minutes, and instead an overall, integrated semantic representation is preserved that is then used to make judgments about memory probes. Considering the extensive developmental literature on limitations in young children's meaning-making (e.g., Bjorklund & Hock, 1982; Bjorklund & Jacobs, 1985) and in making meaning-driven inferences (e.g., Piaget & Inhelder, 1973), a straightforward developmental prediction of constructivism is that false memory for narratives should be less pronounced in young children than in older children and less pronounced in older children than in adolescents or adults.

Although Bransford and Franks's (1971) data certainly seem to suggest that the formation of overall meaning structures controls memory judgments, from which the prediction of developmental increases in narrative false memory falls out, developmental studies failed to generate consistent findings confirming that prediction. Actually, the findings were rather confusing. On the one hand, investigators such as Liben and Posnansky (1977) and Paris and Carter (1973) reported studies in which no definite pattern of development was detected. Regardless of whether development was measured by chronological age or tests of conceptual understanding, children from both more and less advanced developmental levels exhibited comparable levels of false memory for meaning-preserving statements. On the other hand, as constructivism predicts, investigators such as Johnson and Scholnick (1979) and Prawatt and Cancelli (1976) reported that false memory for meaning-preserving statements increased with age in all conditions. If that were not sufficiently confusing, still other studies were reported by Ackerman (1992, 1994) and by Poole and White (1991) in which developmental decreases in false memory were obtained in all conditions. Finally, there were published studies whose developmental patterns were internally inconsistent. Here, Brown, Smiley, Day, Townsend, and Lawton (1977) and Paris and Mahoney (1974) reported studies in which narrative false memories displayed different age trends in different conditions.

To muddy the waters further, some investigators pointed to a potential methodological problem in these studies, a problem in which the familiarity of the meaning of test probes is confounded with the familiarity of their surface form. In the sorts of test materials that were used to measure true and false memory for narratives, true test sentences contained only words that were presented to the children earlier (in a prior example: "The bird is under the table"), and false sentences always contained new words (in a prior example: "The bird is out of the cage"). Thus, when true sentences are correctly recognized or when false sentences are correctly rejected, contrary to constructivism, this may simply be because the words are familiar, not because the meanings of narrative sentences have been understood and remembered. This problem of confounding the meaning familiarity of narratives with the familiarity of the words that are used in the sentences of which narratives are composed was eventually solved by Paris and Mahoney (1974) and Liben and Posnansky (1977; see also Reyna & Kiernan, 1994, 1995, who further unconfounded paraphrases from inferences, separating comprehension from reasoning). These authors developed a narrative procedure in which meaning familiarity and word familiarity were factorially manipulated. However, when this design was applied in a series of developmental studies, the data ran decidedly against constructivism. Contrary to that hypothesis, false memory for statements that preserved the meaning of narratives declined with age in all conditions.

That this disconfirmation of constructivism is a correct finding is suggested by disconfirmation of another key prediction of constructivism that is concerned with dissociations between true and false memory. The prediction in question follows from the fact that constructivism is what memory theorists call a one-process theory; that is, it posits that a single memory representation of a narrative, a schema, underlies participants' responses to all test probes—true statements, false-but-gist-consistent statements, and false-and-gist-violating statements. Because the same schema is used for all responses, children's performance on the different types of probes should be related as follows: the stronger and more accessible a schema is, the more likely children should be to recognize both true statements and unpresented-but-gist-consistent statements as old, because both fit the schema, and the less likely they should be to recognize unpresented-and-gist-violating statements as old. In the earliest article using Bransford and Franks's (1971) procedures with children, Paris and Carter (1973) focused attention on the first half of this prediction, noting that false recognition of unpresented-but-gist-consistent statements should increase as correct recognition of presented statements increased. This half of the prediction is the more interesting one because it seems counterintuitive. It says, paradoxically, that the more accurate children's memories are (as indexed by the hit rate for actual statements), the less accurate children's memories are (as indexed by the false-alarm rate for unpresented-but-gist-consistent statements).

Of the various papers that were published prior to the appearance of Reyna and Kiernan's (1994, 1995) articles, we know of none that reported credible evidence that confirmed the putative association between hit and false-alarm rates that constructivism expects. The prediction was examined in detail in these latter articles. Reyna and Kiernan's studies, with children in the elementary grades, disconfirmed the prediction in all conditions and at all age levels. Rather than being positively associated, children's hit and false-alarm rates were dissociated on recognition tests: hits and false alarms were uncorrelated within experimental conditions, and they were driven in opposite directions by certain experimental manipulations. In the latter connection, it was found that when narratives were changed from statements about spatial relations (e.g., the bird narrative, above) to statements about magnitude relations (e.g., "The coffee is hotter than the tea. The tea is hotter than the cocoa. The cocoa is sweet"), the hit rate went up and the false-alarm rate went down. Although Reyna and Kiernan's dissociation findings disconfirmed a core prediction of constructivism, their findings were replicated in other contexts. For instance, in a doctoral dissertation that grew out of their work, Lim (2003) found that adults exhibited the same patterns of dissociation. Similarly, Brainerd and Gordon (1994) and Brainerd and Reyna (1995) found analogous dissociations between hit and false-alarm rates in children's memory for numbers that had been presented in narratives, and Brainerd and Reyna (1993) found dissociations when pictures were used to measure children's memory for true narrative statements and unpresented-but-gist-consistent statements.
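To make the association prediction concrete, the following sketch shows one way that hit rates, false-alarm rates, and their correlation across participants could be computed. All names and trial values are hypothetical and exist only to illustrate the calculation; they do not reproduce any data set or analysis from the studies cited above. A one-process account would expect the correlation to be positive, whereas the dissociation findings correspond to a correlation near zero.

```python
from math import sqrt


def old_rate(trials, probe_type):
    """Proportion of 'old' responses among trials of one probe type."""
    responses = [resp for kind, resp in trials if kind == probe_type]
    if not responses:
        raise ValueError(f"no trials of type {probe_type!r}")
    return sum(resp == "old" for resp in responses) / len(responses)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical trials per participant (NOT real data): (probe type, response)
participants = {
    "p1": [("presented", "old"), ("presented", "old"),
           ("gist_consistent", "old"), ("gist_consistent", "new")],
    "p2": [("presented", "old"), ("presented", "new"),
           ("gist_consistent", "old"), ("gist_consistent", "old")],
    "p3": [("presented", "old"), ("presented", "old"),
           ("gist_consistent", "new"), ("gist_consistent", "new")],
    "p4": [("presented", "new"), ("presented", "old"),
           ("gist_consistent", "old"), ("gist_consistent", "new")],
}

# Hit rate: 'old' responses to presented probes.
hit_rates = [old_rate(t, "presented") for t in participants.values()]
# False-alarm rate: 'old' responses to unpresented-but-gist-consistent probes.
fa_rates = [old_rate(t, "gist_consistent") for t in participants.values()]
r = pearson(hit_rates, fa_rates)
```

The correlation is written out by hand so that the sketch depends only on the standard library; in practice an established statistics package would be the more usual choice.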

Summing up the story so far, constructivism was developed to explain the powerful false-memory effects that are present in narrative tasks, but the developmental data ran heavily against its most central predictions, which leads one to ask, what other theory can handle the data? Two possibilities are spreading-activation theories, which are in the classical associationist tradition of verbal learning research, and the source-monitoring framework (e.g., Johnson et al., 1993). Although spreading-activation theories have most commonly been applied to memory for individual items, rather than memory for connected statements, Anderson and his associates have developed models of this sort for sentences (e.g., Anderson, Budiu, & Reder, 2001). In such models, recognition of unpresented-but-gist-consistent sentences is explained on the ground that (a) presented sentences and unpresented sentences activate certain areas (nodes) of underlying associative (or semantic) networks, and (b) the areas that are activated by presented sentences overlap considerably with the areas that are activated by unpresented-but-gist-consistent sentences (making it difficult to discriminate between them). Readers will probably have noticed that there is already an empirical problem with spreading-activation models. They, too, are one-process theories (the same network is activated by presented and unpresented-but-gist-consistent sentences), and hence, unless they are enriched with ad hoc assumptions, they predict the same patterns of association between hits and false alarms that constructivism predicts (Reyna et al., 2007). It is easy to see that spreading activation has a difficult time with the finding that some manipulations drive hit and false-alarm rates in opposite directions. When two sentences activate the same area of a network (because they have the same meaning), how can activation simultaneously go up for one sentence and down for the other?
Activation theories also have difficulty accounting for the fact that in certain types of experimental conditions, participants recognize unpresented-but-gist-consistent sentences at higher rates than true sentences (Reyna & Lloyd, 1997).

What about the source-monitoring framework? Although, historically, this has been one of the most influential theories in false-memory research (for reviews, see Reyna, 2000a; Reyna & Lloyd, 1997), it makes the same predictions of association between hit and false-alarm rates as constructivism, which is perhaps not surprising considering that it evolved from constructivism. This account focuses on participants' ability to judge the points of origin of familiar information. (Was it actually presented to me in the experiment? Is it just similar to something that was presented in the experiment? Did I dream it? Did I hear it on the radio this morning? Did I read it in the newspaper last night?) The core claim is that these judgments rely on the content of the memories that were stored in connection with experienced events. As Reyna and Lloyd (1997) and Reyna (2000a) showed, constructivism's prediction of positive association between hit and false-alarm rates then falls out because the same memory content is used to make judgments about experienced events (which controls the hit rate) and about closely related events that were not actually experienced (which controls the false-alarm rate). More specifically, contrary to many findings of dissociation in the literature, the source-monitoring framework forecasts association between presented and related (but unpresented) items in memory. Rememberers are said to decide that unpresented-but-gist-consistent sentences were actually presented (rather than inferred) because those sentences share a great deal of the memory content of actual experience and because rememberers do not necessarily access the few features of memory content that are not shared.
With respect to developmental trends in false memory, the source-monitoring framework is more successful in accounting for the developmental declines in false memory that have been observed in experiments that eliminated the aforementioned confound between meaning familiarity and word familiarity. Several developmental source-monitoring studies have been reported, and the consistent prediction in those studies has been that children's ability to make accurate judgments of the points of origin of their memories will improve steadily with age. The prediction has been repeatedly confirmed using designs in which children make source judgments about probes (Was this sentence presented in red or green letters? Was this sentence presented in large or small letters?), rather than old/new memory judgments (Roberts, 2002). If erroneous source judgments are responsible for false memories, narrative false memories should decline with age, as they do in unconfounded experimental designs. However, as we saw, contrary developmental trends were obtained in experiments that used other designs. The source-monitoring framework has nothing definite to say in that connection.

However, fuzzy-trace theory (FTT; see Brainerd & Reyna, 2001; Reyna & Brainerd, 1995) is able to handle those conflicting developmental patterns. FTT is an attempt to avoid the mistake of throwing out the theoretical baby with the theoretical bath water by preserving and integrating the successful parts of two important traditions, verbal learning and psycholinguistics, and then applying them to relations between memory and reasoning (Reyna & Brainerd, 1995). False memory is a prime area of application because it is not quite memory and not quite reasoning. FTT posits that children store dissociated verbatim and gist representations of their experience, that they do so in parallel (rather than the gist of experience being distilled from memory for its verbatim form), that verbatim and gist representations also exhibit retrieval dissociation because they are accessed by different types of cues, and that verbatim and gist representations become inaccessible (i.e., are "forgotten") at different rates as time passes. Developmentally, FTT posits that both verbatim and gist memory improve with age: older children are better able to store, retrieve, and preserve traces of the exact surface form of their experience; older children understand and are able to extract a broader range of meanings from experience (e.g., robin is a bird but is also a mascot); and when different experiences exemplify related meanings, older children are better able to connect that meaning across the different experiences (e.g., "There were several animals, including birds, cows, and cats, on that list that was just read to me"). Reyna and Kiernan's (1994, 1995) experiments were designed not only to eliminate the meaning/word familiarity confound but also to test explicit predictions that FTT makes about children's true and false memories.
According to Reyna and Kiernan, the predictions that one makes about different experimental manipulations turn on a simple question, namely, how would those manipulations be expected to affect the relative accessibility of verbatim and gist traces of experience? Examples of forensically significant manipulations for which the answer is straightforward include delay (verbatim traces become inaccessible more rapidly than gists), age (verbatim and gist memory both improve with age), materials (some types of memory materials, such as pictures or metaphorical statements, enhance verbatim traces more than gist traces, while other materials have the opposite effect), and memory test instructions (some instructions encourage verbatim processing and others encourage gist processing). But how does this translate into concrete predictions about children's true and false memories? The answer is that predictions follow from the fact that FTT is an opponent-processes model with specific representational assumptions: gist processing is assumed to support true memories of actual events ("I drank a Coke at the baseball game") and false memories that preserve the meaning of those events ("I drank a Pepsi at the baseball game"), whereas verbatim processing is assumed to support true memories and to suppress false ones ("No, I didn't drink a Pepsi because I clearly remember Mom telling me to buy Cokes for us"). At this point, it is easy to see that FTT expects the sorts of dissociations that have been observed between true and false memories (because they are controlled by opposing processes).
Further, predictions about the preceding manipulations are now obvious: false memory will tend to increase with delay (because the traces that suppress false memories become inaccessible more rapidly than the traces that support false memories); materials that strengthen verbatim traces will reduce false memories, but materials that strengthen gist traces will increase false memories; test instructions that stress verbatim processing will reduce false memories, but test instructions that stress gist processing will increase false memories. All of these predictions were studied by Reyna and Kiernan (1994, 1995) and also by Brainerd and Gordon (1994), and all were confirmed.

What does FTT say about age or developmental variability in false memory? As will be readily apparent to most readers, all possible developmental trends (increase, decrease, no change) can occur under specific experimental conditions because (a) verbatim and gist processing both improve with age and (b) they are opponent processes with respect to false memory. Thus, as we have discussed elsewhere (Brainerd & Reyna, 1998), developmental trends in false memory are not monolithic across experimental conditions but rather are variable as a function of experimental conditions. To put it another way, the age trend that is observed in a particular condition will depend in the first instance on the mix of verbatim and gist processing that occurs in that condition, and in the second on the respective amounts of verbatim and gist development that occur during the target age range.

An important feature of this analysis is that it has no difficulty handling the basic finding of inconsistent age trends in studies of children's narrative false memories. A far more important feature, however, is that different age trends are not actually inconsistent, but they are expected to occur on theoretical grounds and, even more important, their direction can be predicted. Assuming that we are dealing with some age range in which there are substantial age improvements in both verbatim and gist processing, it is obvious that age declines are predicted for tasks in which performance variability is primarily under the control of verbatim processing, age increases are predicted for tasks in which performance variability is primarily under the control of gist processing, and age invariance is predicted when verbatim and gist processing make reasonably equivalent contributions to performance variability. Are such predictions confirmed? One way to answer this question is to conduct a retrospective analysis of the aforementioned studies of children's false memories of narratives in order to determine whether there are salient methodological differences that divide along verbatim-gist lines and are therefore probably responsible for the contrasting results. A more satisfactory approach is to study the development of false memory using other paradigms that allow the mix of verbatim and gist processing to be brought under experimental control. We take up that issue in the next section, where developmental studies of false memory for words are considered.
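The directional logic in the preceding paragraph can be written down as a deliberately simplified linear sketch. This is an illustration only, not FTT's formal model: the function name, the weights, and the growth parameters are all hypothetical. It assumes that gist processing pushes false memory up, verbatim processing pushes it down, and both improve with age, so that the sign of the net effect gives the predicted age trend for a task.

```python
def predicted_age_trend(gist_weight, verbatim_weight,
                        gist_growth=1.0, verbatim_growth=1.0,
                        tolerance=1e-9):
    """Toy opponent-process rule for the direction of an age trend.

    gist_weight / verbatim_weight: hypothetical shares of performance
        variability that a task places under gist vs. verbatim control.
    gist_growth / verbatim_growth: hypothetical amounts of developmental
        improvement in each process over the age range studied.
    """
    # Gist development raises false memory; verbatim development lowers it.
    net_change = gist_weight * gist_growth - verbatim_weight * verbatim_growth
    if net_change > tolerance:
        return "increase"
    if net_change < -tolerance:
        return "decrease"
    return "no change"


# Task mostly under gist control: false memory is predicted to rise with age.
gist_dominated = predicted_age_trend(gist_weight=0.8, verbatim_weight=0.2)
# Task mostly under verbatim control: false memory is predicted to fall.
verbatim_dominated = predicted_age_trend(gist_weight=0.2, verbatim_weight=0.8)
# Roughly equal contributions: age invariance is predicted.
balanced = predicted_age_trend(gist_weight=0.5, verbatim_weight=0.5)
```

The growth parameters are kept separate from the weights because the text distinguishes the mix of processing in a condition from the amount of development that occurs in each process during the target age range.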

Before we move to that topic, however, it should be mentioned that FTT makes predictions about false memory phenomena in children that are of considerable forensic interest. Four examples are the mere-testing effect (Brainerd & Reyna, 1996), the false-persistence effect (Brainerd, Reyna, & Brandse, 1995), the false-recognition reversal effect (Brainerd, Reyna, & Kneer, 1995), and the repeated gist cuing effect (Reyna, 2000b; Reyna & Lloyd, 1997). The mere-testing effect is the tendency for there to be elevated errors on follow-up memory tests (as opposed to initial memory tests), the false-persistence effect is the tendency of false memories to be stable over time, the false-recognition reversal effect is the fact that participants can sometimes be highly accurate at rejecting false memories, and the repeated gist cuing effect is the tendency of false memories to increase as memories of the meaning of experience become stronger. The first effect, which bears on the aforementioned issue of repeated questioning of children in legal cases, is based on the theoretical idea that the mere administration of false-but-gist-consistent information on a memory test ("Did you drink Pepsi at the baseball game?") stimulates memory-falsifying gist processing. If this principle is correct, false memory for that information should be elevated on later tests, and it is. The second effect, which bears on the types of retrieval environments that are provided when children are questioned in legal cases, such as returning to the scene of the crime, is based on the principle that providing exact surface cues from the target experience will stimulate memory-defalsifying verbatim processing. If this principle is correct, testing participants in the presence of such cues should reduce false memory when questions are asked about false-but-gist-consistent information; it does.
The third effect, which bears on whether children who are questioned in legal cases should be interviewed in a timely fashion, is based on the principle that verbatim traces become inaccessible more rapidly than gist. If this principle is correct, false memory for gist-consistent information should be higher a few days after events than, say, shortly after; it is. The last effect, which is relevant to legal cases in which children have been repeatedly exposed to the same crime, such as CSA, is based on the principle that when people are repeatedly exposed to situations in which the gist remains the same but the details vary, memory-falsifying gist processing is strengthened. If this principle is correct, false memory for gist-consistent information should increase as repetition increases, and it does. This last effect can have especially pernicious consequences for convicting CSA perpetrators when their victims have been repeatedly abused, however. The same repetition of superficially different, but related, events that reinforces memory for gist interferes with accurate memory for verbatim details. Thus, repeated abuse, in which separate events resemble one another but differ in details, should lead to poor memory for details. As a result, the theory says, with repeated abuse, it will be easier for defense investigators and cross-examining attorneys to impeach children's allegations of abuse by showing that their testimony is inaccurate with respect to key details. That is especially tragic because the gist memories of repeated victims of CSA (i.e., that they were in fact abused) are more likely to be accurate than the gist memories of children who have suffered a single incident of abuse. However, the law bases convictions on verbatim details, such as whether the perpetrator had a tattoo on his left or right arm.
Also, as we saw earlier, whether or not a crime has occurred may depend on precise differences in behavior (e.g., exactly where a touch occurred), and if a crime has occurred, differences in its judged severity likewise depend on precise differences in behavior (e.g., whether a touch occurred over or under a child's clothing).

