The loci of Stroop effects: a critical review of methods and evidence for levels of processing contributing to color-word Stroop effects and the implications for the loci of attentional selection

  • Open access
  • Published: 13 August 2021
  • Volume 86, pages 1029–1053 (2022)


  • Benjamin A. Parris (ORCID: 0000-0003-2402-2100),
  • Nabil Hasshim,
  • Michael Wadsley,
  • Maria Augustinova &
  • Ludovic Ferrand


Despite instructions to ignore the irrelevant word in the Stroop task, it robustly influences the time it takes to identify the color, leading to performance decrements (interference) or enhancements (facilitation). The present review addresses two questions: (1) What levels of processing contribute to Stroop effects? and (2) Where does attentional selection occur? The methods that are used in the Stroop literature to measure the candidate varieties of interference and facilitation are critically evaluated, and the processing levels that contribute to Stroop effects are discussed. It is concluded that the literature does not provide clear evidence for a distinction between conflicting and facilitating representations at the phonological, semantic and response levels (together referred to as informational conflict), because the methods do not currently permit their isolated measurement. In contrast, it is argued that the evidence for task conflict as being distinct from informational conflict is strong and, thus, that there are at least two loci of attentional selection in the Stroop task. Evidence suggests that task conflict occurs earlier, has a different developmental trajectory, and is independently controlled, which supports the notion of a separate mechanism of attentional selection. The modifying effects of response modes and evidence for Stroop effects at the level of response execution are also discussed. It is argued that multiple studies claiming to have distinguished response and semantic conflict have not done so unambiguously, and that models of Stroop task performance need to be modified to more effectively account for the loci of Stroop effects.


Introduction

In his doctoral dissertation, John R. Stroop was interested in the extent to which difficulties that accompany learning, such as interference, can be reduced by practice (Stroop, 1935). For this purpose, he constructed a particular type of stimulus: words displayed in a color that was different from the one they actually designated (e.g., the word red in blue font). After he failed to observe any interference from the colors on the time it took to read the words (Exp. 1), he asked his participants to identify the words' font color. Because the meaning of these words (e.g., red) interfered with the to-be-named target color (e.g., blue), Stroop observed that naming aloud the color of these words took longer than naming aloud the color of the small squares included in his control condition (Exp. 2). In line with both his expectations and other learning experiments carried out at the time, this interference decreased substantially over the course of practice; however, daily practice did not eliminate it completely (Exp. 3). During the next thirty years, this result, and more generally this paradigm, received only modest interest from the scientific community (see, e.g., Jensen & Rohwer, 1966; MacLeod, 1992, for discussions). Things changed dramatically when the color-word stimuli so ingeniously constructed by Stroop became a prime paradigm for studying attention, and in particular selective attention (Klein, 1964).

The ability to selectively attend to and process only certain features in the environment while ignoring others is crucial in many everyday activities (e.g., Jackson & Balota, 2013). Indeed, it is this very ability that allows us to drive without being distracted by beautiful surroundings or to quickly find a friend in a hallway full of people. It is clear, then, that the ability to reduce the impact of potentially interfering information by selectively attending to the parts of the world that are consistent with our goals is essential to functioning in the world as a purposive individual. The Stroop task (Stroop, 1935), as this paradigm is now known, is a selective attention task in that it requires participants to focus on one dimension of the stimulus whilst ignoring another dimension of the very same stimulus. When the word dimension is not successfully ignored, it elicits interference: Naming aloud the color that a word is printed in takes longer when the word denotes a different color (incongruent trials, e.g., the word red displayed in color-incongruent blue font) compared to a baseline condition. This difference in color-naming times is often referred to as the Stroop interference effect or the Stroop effect (see the section ‘Definitional issues’ for further development and clarification of these terms).

Evidencing its utility, the Stroop task has been widely used in clinical settings as an aid to assessing disorders related to frontal lobe and executive attention impairments (e.g., in attention deficit hyperactivity disorder, Barkley, 1997; schizophrenia, Henik & Salo, 2004; dementia, Spieler et al., 1996; and anxiety, Mathews & MacLeod, 1985; see MacLeod, 1991, for an in-depth review of the Stroop task). The Stroop task is also ubiquitously used in basic and applied research—as indicated by the fact that the original paper (Stroop, 1935) is one of the most cited in the history of psychology and cognitive science (e.g., Gazzaniga et al., 2013; MacLeod, 1992). It is, however, important to understand that the Stroop task as it is currently employed in neuropsychological practice (e.g., Strauss et al., 2007), its implementations in most basic and applied research (see below), and leading accounts of the effect it produces are profoundly rooted in the idea that the Stroop effect is a unitary phenomenon, in that it is caused by the failure of a single mechanism (i.e., it has a single locus). By addressing the critical issue of whether there is a single locus or multiple loci of Stroop effects, the present review not only addresses several pending issues of theoretical and empirical importance, but also critically evaluates these current practices.

The where vs. the when and the how of attentional control

The Stroop effect has been described as the gold standard measure of selective attention (MacLeod, 1992), in which a smaller Stroop interference effect is an indication of greater attentional selectivity. However, the notion that selective attention is the cognitive mechanism enabling successful performance in the Stroop task has recently been sidelined (see Algom & Chajut, 2019, for a discussion of this issue). For example, in a recent description of the Stroop task, Braem et al. (2019) noted that the size of the Stroop congruency effect is “indicative of the signal strength of the irrelevant dimension relative to the relevant dimension, as well as of the level of cognitive control applied” (p. 769). Cognitive control is a broader concept than selective attention in that it refers to the entirety of the mechanisms used to control thought and behavior to ensure goal-oriented behavior (e.g., task switching, response inhibition, working memory). Its invocation in describing the Stroop task has proven somewhat controversial, given that it implies the operation of top-down mechanisms, which might or might not be necessary to explain certain experimental findings (Algom & Chajut, 2019; Braem et al., 2019; Schmidt, 2018). It does, however, have the benefit of hypothesizing a form of attentional control that is not a static, invariant process but instead a more dynamic, adaptive one, and it provides foundational hypotheses about how and when attentional control might happen. However, the present work addresses that which the cognitive control approach tends to eschew (see Algom & Chajut, 2019): the question of where the conflict that causes the interference comes from. Importantly, the answer to the where question will have implications for the how and when questions.

The question of where the interference arises has historically been referred to as the locus of the Stroop effect (e.g., Dyer, 1973; Logan & Zbrodoff, 1998; Luo, 1999; Scheibe et al., 1967; Seymour, 1977; Wheeler, 1977; see also MacLeod, 1991, and Parris, Augustinova & Ferrand, 2019). Whilst, by virtue of our interest in where attentional selection occurs, we review evidence for the early or late selection of information in the color-word Stroop task, recent models of selective attention have shown that whether selection is early or late is a function of either the attentional resources available to process the irrelevant stimulus (Lavie, 1995) or the strength of the perceptual representation of the irrelevant dimension (Tsal & Benoni, 2010). Moreover, despite being referred to as the gold standard attentional measure and as one of the most robust findings in the field of psychology (MacLeod, 1992), it is clear that Stroop effects can be substantially reduced or eliminated by making what appear to be small changes to the task. For example, Besner, Stolz, and Boutillier (1997) showed that the Stroop effect can be reduced and even eliminated by coloring a single letter instead of all letters of the irrelevant word (although, notably, they used button-press responses, which produce smaller Stroop effects (Sharma & McKenna, 1998), making it easier to eliminate interference; see also Parris, Sharma, & Weekes, 2007). In addition, Melara and Mounts (1993) showed that by making the irrelevant words smaller, to equate the discriminability of word and color, the Stroop effect can be eliminated and even reversed.

Later, Dishon-Berkovits and Algom (2000) noted that the dimensions in the Stroop task are often correlated, in that one dimension can be used to predict the other (i.e., when an experimenter matches the number of congruent trials (e.g., the word red presented in the color red) and incongruent trials, the irrelevant word is more often presented in its matching color than in any other color, which sets up a response contingency). They demonstrated that when this dimensional correlation was removed, the Stroop effect was substantially reduced. By showing that the Stroop effect is malleable through the modulation of dimensional uncertainty (the degree of correlation of the dimensional values and how expected the co-occurrences are) or dimensional imbalance (the relative salience of each dimension), their data, and the resulting model (Melara & Algom, 2003; see also Algom & Fitousi, 2016), indicate that selective attention fails because the experimental set-up of the Stroop task provides a context with little or no perceptual load or perceptual competition, and in which the dimensions (word and color) are often correlated and/or asymmetrical in discriminability, features that contribute to the robust nature of the Stroop effect. In other words, the Stroop task sets selective attention mechanisms up to fail, pitching as it does the intention to ignore irrelevant information against the tendency and resources to process conspicuous and correlated characteristics of the environment (Melara & Algom, 2003). But, in the same way that neuropsychological impairments teach us something about how the mind works (Shallice, 1988), it is these failures that give us an opportunity to explore the architecture of the mechanisms of selective attention in healthy and impaired populations. We, therefore, ask: if control does fail, where (at what levels of processing) is conflict experienced in the color-word Stroop task?

Given our focus on the varieties of conflict (and facilitation), the where of control, we will not concern ourselves with the how and the when of control. Manipulations and models of the Stroop task that are not designed to understand the types of conflict and facilitation that contribute to Stroop effects, such as list-wise versus item-specific congruency proportion manipulations (e.g., Botvinick et al., 2001; Bugg & Crump, 2012; Gonthier et al., 2016; Logan & Zbrodoff, 1979; Schmidt & Besner, 2008; Schmidt, Notebaert, & Van Den Bussche, 2015; see Schmidt, 2019, for a review) or memory load manipulations (e.g., De Fockert, 2013; Kalanthroff et al., 2015; Kim et al., 2005; Kim, Min, Kim & Won, 2006), will be set aside, unless these manipulations have been specifically modified in a way that permits an understanding of the processing involved in producing Stroop interference and facilitation. To reiterate the aims of the present review: we are less concerned here with the evaluative function of control, which judges when and how control operates (Chuderski & Smolen, 2016), and instead concerned with the regulative function of control, and specifically the processing levels at which it might occur. In short, the present review attempts to identify whether processing at any level other than the historically favoured level of response output reliably leads to conflict (or facilitation) between activated representations. Before we address this question, however, we must first address the terminology used here, and in the literature, to describe different types of Stroop effects.

Definitional issues to consider before we begin

A word about baselines and descriptions of Stroop effects

Given the number of studies that have employed the Stroop task since its inception in 1935, it is no surprise that a variety of modifications of the original task have been employed, including the introduction of new trial types (as exemplified by Klein, 1964 ) and new ways of responding, to measure and understand mechanisms of selective attention. This has led to disagreement over what is being measured by each manipulation, obfuscating the path to theoretical enlightenment. Various trial types have been used to distinguish types of conflict and facilitation in the color-word Stroop task (see Fig.  1 ), although with less fervor for facilitation varieties, resulting in a lack of agreement about how one should go about indexing response conflict, semantic conflict, and other forms of conflict and facilitation. Indeed, as can be seen in Fig.  1 , one person’s semantic conflict can be another person’s facilitation; a problem that arises due to the selection of the baseline control condition. Differences in performance between a critical trial and a control trial might be attributed to a specific variable but this method relies on having a suitable baseline that differs only in the specific component under test (Jonides & Mack, 1984 ).

Fig. 1

Examples of the various trial types that have been used to decompose the Stroop effect into various types of conflict (interference) and facilitation. This has resulted in a lack of clarity about which components are being measured. Indeed, as can be seen, one person’s semantic conflict can be another person’s facilitation, a problem that arises from the selection of the baseline control condition.

Selecting an appropriate baseline, and indeed an appropriate critical trial, to measure the specific component under test is non-trivial. For example, congruent trials, first introduced by Dalrymple-Alford and Budayr (1966, Exp. 2), have become a popular baseline condition against which to compare performance on incongruent trials. Congruent trials are commonly responded to much faster than incongruent trials, and the difference in reaction time between the two conditions has been variously referred to as the Stroop congruency effect (e.g., Egner et al., 2010), the Stroop interference effect (e.g., Leung et al., 2000), the Total Stroop Effect (Brown et al., 1998), and the Color-Word Impact (Kahneman & Chajczyk, 1983). However, when compared to non-color-word neutral trials, congruent trials are often reported to be responded to faster, evidencing a facilitation effect of the irrelevant word on the task of color naming (Dalrymple-Alford, 1972; Dalrymple-Alford & Budayr, 1966). Referring to the difference between incongruent and congruent trials as Stroop interference, then—as is often the case in the Stroop literature—fails to recognize the role of facilitation observed on congruent trials and epitomizes a wider problem. As already emphasized by MacLeod (1991), this difference corresponds to “(…) the sum of facilitation and interference, each in unknown amounts” (MacLeod, 1991, p. 168). Moreover, as will be discussed in detail later, congruent-trial reaction times have been shown to be influenced by a more recently identified form of conflict, known as task conflict (Goldfarb & Henik, 2007), and are not, therefore, straightforwardly a measure of facilitation either.

Furthermore, whilst the common implementation of the Stroop task involves incongruent, congruent, and non-color-word neutral trials (or versions in which the non-color-word neutral baseline is replaced by repeated letter strings, e.g., xxxx), this common format ignores the possibility that the difference between incongruent and neutral trials involves multiple processes (e.g., semantic and response level conflict). As Klein (1964) showed, the irrelevant word in the Stroop task can refer to concepts semantically associated with a color (e.g., sky), potentially permitting a way to answer the question of whether selection occurs early in the processing stream, at the level of semantics, before response selection. But it is unclear whether such trials are direct measures of semantic conflict or indirect measures of response conflict.

Here, we employ the following terms: We refer to the difference between incongruent and congruent conditions as the Stroop congruency effect, because it contrasts performance in conditions with opposite congruency values. For the reasons noted above, the term Stroop interference, or just interference, is reserved for referring to slower performance on one trial type compared to another. The word conflict will denote competing representations at any particular level that could be the cause of interference (note that interference need not result from conflict (De Houwer, 2003): in the emotional Stroop task, for example, interference can arise without conflict between competing representations (Algom et al., 2004)). When the distinction is not critical, the terms interference and conflict will be used interchangeably. The term Stroop facilitation, or just facilitation, will refer to the speeding up of performance on one trial type compared to another (unless specified otherwise). In common with the literature, facilitation will also be used to refer to the opposite of conflict; that is, it will denote facilitating representations at any level. Finally, the term Stroop effect(s) will be employed to refer more generally to all of these effects.
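These trial-type contrasts can be made concrete with a small numeric sketch. The reaction times below are hypothetical values chosen only to mirror the typical ordering (incongruent slowest, congruent fastest); the variable names are ours, for illustration, not standard terminology:

```python
# Illustrative decomposition of color-word Stroop effects from mean
# reaction times (ms). The RT values are hypothetical.
mean_rt = {
    "incongruent": 720,  # e.g., the word RED shown in blue
    "neutral": 660,      # e.g., a non-color word such as DESK shown in blue
    "congruent": 630,    # e.g., the word BLUE shown in blue
}

# Stroop congruency effect: incongruent vs. congruent. This single number
# conflates interference and facilitation, "each in unknown amounts"
# (MacLeod, 1991).
congruency_effect = mean_rt["incongruent"] - mean_rt["congruent"]

# Interference: slowing relative to the non-color-word neutral baseline.
interference = mean_rt["incongruent"] - mean_rt["neutral"]

# Facilitation: speeding relative to the same neutral baseline.
facilitation = mean_rt["neutral"] - mean_rt["congruent"]

print(congruency_effect, interference, facilitation)
# With a neutral baseline, the congruency effect decomposes exactly:
assert congruency_effect == interference + facilitation
```

The final assertion holds by construction here; the substantive point in the text is that, empirically, the neutral baseline itself is contaminated (e.g., by task conflict), so the decomposition is not as clean as this arithmetic suggests.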

Levels of conflict vs. levels of selection

When considering the standard incongruent Stroop trial (e.g., red in blue), where the word dimension is a color word (e.g., red) that is incongruent with the target color dimension that is being named, and where the color red is also a potential response, one might surmise numerous levels of representation at which these two concepts might compete. Processing of the color dimension of a Stroop stimulus to name the color would, on a simple analysis, require initial visual processing, followed by activation of the relevant semantic representation and then word-form (phonetic) encoding of the color name in preparation for a response. For this process to advance unimpeded until response, there would need to be no competing representations activated at any of those stages. Like color naming, the process of word reading also requires visual processing, but of letters rather than colors, perhaps avoiding conflict at this level, although there is evidence for competition for resources at the level of visual processing under some conditions (Kahneman & Chajczyk, 1983). Word reading also requires the computation of phonology from orthography, which color processing does not. One way interference might occur at this level is if semantic processing or word-form encoding during the processing of the color dimension also leads to the unnecessary (for the purposes of providing a correct response) activation of the orthographic representation of the color name; as far as we are aware, there is no evidence for this. However, orthography does appear to lead to conflict through a different route: the presence of a word or word-like stimulus appears to activate the full mental machinery used to process words. This unintentionally activated word-reading task set conflicts with the intentionally activated color-identification task set, creating task conflict. Task conflict occurs whenever an orthographically plausible letter string is presented (e.g., the word table leads to interference, as does the non-word but pronounceable letter string fanit; the letter string xxxxx less so; Levin & Tzelgov, 2016; Monsell et al., 2001).

Despite word reading being a task in which participants do not intend to engage, processing of the irrelevant word would also likely involve the activation of a phonological representation of the word and the activation of a semantic representation (and likely some word-form encoding), either of which could lead to the activation of representations competing for selection. However, just because the word is processed at a certain level (e.g., orthography or phonology here) does not mean that each of these levels independently leads to conflict. Phonological information would only independently contribute to conflict if the process of color naming activated a competing representation at the same level. Otherwise, the phonological representation of the irrelevant word might simply facilitate activation of the semantic representation of the irrelevant word, thereby providing competition for the semantic representation of the relevant color. In that case, whilst phonological information would contribute to Stroop effects, no selection mechanism would be required at the phonological level. And of course, there could be conflict at the phonological processing level, but with no selection mechanism available there, conflict would have to be resolved later. To identify whether selection occurs at the level of phonological processing, a method would be needed to isolate phonological information from information at the semantic and response levels.

So-called late selection accounts would argue that any activated representations at these levels would result in increased activation at the response level, where selection would occur, with no competition or selection at earlier stages (e.g., Dyer, 1973; Logan & Zbrodoff, 1998; Luo, 1999; Scheibe et al., 1967; Seymour, 1977; Wheeler, 1977; see also MacLeod, 1991, and Parris, Augustinova & Ferrand, 2019a, 2019b, 2019c, for discussions of this topic). In contrast, so-called early selection accounts (De Houwer, 2003; Scheibe et al., 1967; Seymour, 1977; Stirling, 1979; Zhang & Kornblum, 1998; Zhang et al., 1999) argue for earlier and multiple sites of attentional selection, with Hock and Egeth (1970) even arguing that the perceptual encoding of the color dimension is slowed by the irrelevant word, although this has been shown to be a problematic interpretation of their results (Dyer, 1973). In Zhang and colleagues’ models, attentional selection occurred, and conflict was resolved, at the stimulus identification stage, before any information was passed on to the response level, which had its own selection mechanism.

The organization of the review

It is important to emphasize at this point that, when considering the locus or loci of the Stroop effect, there are in fact two issues to address. The first concerns the level(s) of processing that contribute significantly to Stroop interference (and facilitation), such that a specific type of conflict actually arises at a given level. The second issue concerns the level(s) of attentional selection: Is there, as Zhang and Kornblum (1998) and Zhang et al. (1999) have suggested, more than one level at which attentional selection occurs?

With regard to the first issue, we start below by critically evaluating the evidence for the different levels of processing that putatively contribute to conflict, with the objective of assessing the methods used to index each form of conflict and what we can learn from them. To structure the review, we employ the distinction introduced by MacLeod and MacDonald (2000), who argued for two categories of conflict: informational conflict and the aforementioned task conflict (see also Levin & Tzelgov, 2016). Informational conflict arises from the semantic and response information that the irrelevant word conveys. This roughly corresponds to the distinction between stimulus-based and response-based conflicts (Kornblum & Lee, 1995; Kornblum et al., 1990; Zhang & Kornblum, 1998; Zhang et al., 1999). According to this approach, conflict arises from overlap between the dimensions of the Stroop stimulus at the level of stimulus processing (Stimulus–Stimulus or S–S overlap) and at the level of response production (Stimulus–Response or S–R overlap). At the level of stimulus processing, interference can occur at the perceptual encoding, memory retrieval, conceptual encoding and stimulus comparison stages. At the level of response production, interference can occur at the response selection, motor programming and response execution stages. In the Stroop task, the relevant and irrelevant dimensions both involve colors and would thus produce Stimulus–Stimulus conflict, and both dimensions overlap with the response (S–R overlap) because the response involves color classification. We also include phonological processing and word frequency in the informational conflict taxon (cf. Levin & Tzelgov, 2016). We discuss informational conflict and its varieties in the first section, entitled ‘Decomposing informational conflict’.

Task conflict, as noted above, arises when two task sets compete for resources. In the Stroop task, the task set for color identification is endogenously and purposively activated, and the task set for word reading is exogenously activated on presentation of the word. The simultaneous activation of two task sets creates conflict even before the identities of the Stroop dimensions have been processed. Therefore, this form of conflict is generated by all irrelevant words in the Stroop task including congruent and neutral words (Monsell et al., 2001 ). We discuss task conflict in the section ‘ Task conflict ’. We then discuss the often overlooked phenomenon of Stroop facilitation in the section entitled ‘ Informational facilitation ’. In the section entitled “Other evidence relevant to the issue of locus vs. loci of the Stroop effect” we consider the influence of response mode (vocal, manual, oculomotor) on the variety of conflicts and facilitation observed in the subsection ‘Response modes and the loci of the Stroop effect’ and we consider whether conflict and facilitation effects are resolved even once a response has been favored in the subsection ‘Beyond response selection: Stroop effects on response execution’. In the final section entitled “Locus or loci of selection?”, we use the outcome of these deliberations to discuss the second issue of whether the evidence supports attentional selection at a single or at multiple loci.

Decomposing informational conflict

A seminal paper by George S. Klein in 1964 (Klein, 1964 ) represents a critical impetus for understanding different types of informational conflict. Indeed, up until Klein, all studies had utilized incongruent color-word stimuli as the irrelevant dimension. Klein was the first to manipulate the relatedness of the irrelevant word to the relevant color responses to determine the “evocative strength of the printed word” ( 1964 , p. 577). To this end, he compared color-naming times of lists of nonsense syllables, low-frequency non-color-related words, high-frequency non-color words, words with color-related meanings (semantic associates: e.g., lemon, frog, sky), color words that were not in the set of possible response colors (non-response set stimuli), and color words that were in the set of possible response colors (response set stimuli). The response times increased linearly in the order they are presented above. Whilst lists of nonsense syllables vs. low-frequency words, high-frequency words vs. semantic-associative stimuli, and semantic-associative stimuli vs. non-response set stimuli did not differ, all other comparisons were significant.

It is important to underscore that for Klein himself, there was no competition between semantic nodes or at any stage of processing, and, thus, no need for attentional selection other than at the response stage. Only when both irrelevant word and relevant color are processed to the point of providing evidence towards different motor responses, do the two sources of information compete. Said differently, whilst he questioned the effect of semantic relatedness, Klein assumed that semantic relatedness would only affect the strength of activation of alternative motor responses. Highlighting his favoring of a single late locus for attentional selection, Klein noted that words that are semantically distant from the color name would be less likely to “arouse the associated motor-response in competitive intensity” (p. 577). Although others (e.g., early selection accounts mentioned above) have argued for competition and selection occurring earlier than response output, a historically favored view of the Stroop interference effect as resulting solely from response conflict has prevailed (MacLeod, 1991 ) such that so-called informational conflict (MacLeod & MacDonald, 2000 ) is viewed as being essentially solely response conflict. That is, the color and word dimensions are processed sufficiently to produce evidence towards different responses and before the word dimension is incorrectly selected, mechanisms of selective attention at response output have to either inhibit the incorrect response or bias the correct response.

Response and semantic level processing

To assess the extent to which we can (or cannot) move forward from this latter view, we describe and critically evaluate methods used to dissociate and measure the potentially independent contributions of response and semantic conflict. We start by considering so-called same-response trials before going on to consider semantic-associative trials, non-response set trials and a method that has used semantic distance on the electromagnetic spectrum as a way to determine the involvement of semantic conflict in the color-word Stroop task. Indeed, this is an important first step for determining whether at this point informational conflict can (or cannot) be reliably decomposed.

Same-response trials

Same-response trials utilize a two-to-one color-response mapping and have become the most popular way of distinguishing semantic and response conflict in recent studies (e.g., Chen et al., 2011 ; Chen, Lei, Ding, Li, & Chen, 2013a ; Chen, Tang & Chen, 2013b ; Jiang et al., 2015 ; van Veen & Carter, 2005 ). First introduced by De Houwer ( 2003 ), this method maps two color responses to the same response button (see Fig.  1 ), which allows for a distinction between stimulus–stimulus (lexico-semantic) and stimulus–response (response) conflict.

By mapping two response options onto the same response key (e.g., both ‘blue’ and ‘yellow’ are assigned to the ‘z’ key), certain stimulus combinations (e.g., when blue is printed in yellow) are purported not to involve competition at the level of response selection; thus, any interference during same-response trials is thought to involve only semantic conflict. Any additional interference on different-response incongruent trials (e.g., when red is printed in yellow and where both ‘red’ and ‘yellow’ are assigned to different response keys) is taken as an index of response conflict. Performance on congruent trials (sometimes referred to as identity trials when used in the context of the two-to-one color-response mapping paradigm, hereafter the 2:1 paradigm) is compared to performance on same-response incongruent trials to reveal interference that can be attributed only to semantic conflict, whereas a different-response incongruent vs same-response incongruent trial comparison is taken as an index of response conflict. Thus, the main advantage of using same-response incongruent trials as an index of semantic conflict is that this approach claims to be able to remove all of the influence of response competition (De Houwer, 2003 ). Notably, according to some models of Stroop task performance, same-response incongruent trials should not produce interference because they do not involve response conflict (Cohen, Dunbar & McClelland, 1990 ; Roelofs, 2003 ).

Despite providing a seemingly convenient measure of semantic and response conflict, the studies that have employed the 2:1 paradigm share one major issue—that of an inappropriate baseline (see MacLeod, 1992 ). Same-response incongruent trials have consistently been compared to congruent trials to index semantic conflict. However, congruent trials also involve facilitation (both response and semantic facilitation—see below for more discussion of this) and thus, the difference between these two trial types could simply be facilitation and not semantic interference, a possibility De Houwer ( 2003 ) alluded to in his original paper (see also Schmidt et al., 2018 ). And whilst same-response trials plausibly involve semantic conflict, they are also likely to involve response facilitation because, despite being semantically incongruent, the two dimensions of this type of Stroop stimulus provide evidence towards the same response. This means that both same-response and congruent trials involve response facilitation. Therefore, the difference between same-response and congruent trials would actually be semantic conflict (experienced on same-response trials) + semantic facilitation (experienced on congruent trials), not just semantic conflict. This also has ramifications for the difference between different-response and same-response trials, since the involvement of response facilitation on same-response trials means that the comparison of these two trial types would actually be response conflict plus response facilitation, not just response conflict.
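The confound can be made concrete with a toy additive model of RT components. All component magnitudes below are hypothetical, chosen only to illustrate why the standard 2:1 contrasts conflate conflict and facilitation:

```python
# Toy additive decomposition of RTs in the 2:1 paradigm.
# All component values (ms) are hypothetical illustrations.
SEM_CONFLICT = 30   # semantic conflict (SC)
RESP_CONFLICT = 40  # response conflict (RC)
SEM_FACIL = 15      # semantic facilitation (SF)
RESP_FACIL = 20     # response facilitation (RF)
BASELINE = 600      # non-color-word neutral RT

# Assumed composition of each trial type:
congruent = BASELINE - SEM_FACIL - RESP_FACIL
same_response = BASELINE + SEM_CONFLICT - RESP_FACIL    # response facilitation present
different_response = BASELINE + SEM_CONFLICT + RESP_CONFLICT

# The standard 2:1 contrasts:
semantic_index = same_response - congruent          # intended to isolate SC
response_index = different_response - same_response  # intended to isolate RC

print(semantic_index)  # 45 = SC + SF, not SC alone
print(response_index)  # 60 = RC + RF, not RC alone
assert semantic_index == SEM_CONFLICT + SEM_FACIL
assert response_index == RESP_CONFLICT + RESP_FACIL
```

Under these additive assumptions, both contrasts overestimate their target components, which is exactly the interpretive problem discussed above.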

Hasshim and Parris ( 2014 ) explored this possibility by comparing same-response incongruent trials to non-color-word neutral trials. They reasoned that this comparison could reveal faster RTs to same-response incongruent trials, thereby providing evidence for response facilitation on same-response trials. In contrast, it could also reveal faster RTs to non-color-word neutral trials, which would have provided evidence for semantic interference (and would indicate that whatever response facilitation is present is hidden by an opposing and greater amount of semantic conflict). Hasshim and Parris reported no statistical difference between the RTs of the two trial types and reported Bayes Factors indicating evidence in favor of the null hypothesis of no difference. This would suggest that, when using reaction time as the index of performance, same-response incongruent trials cannot be employed as a measure of semantic conflict since they are not different from non-color-word neutral trials. In a later study, the same researchers investigated whether the two-to-one color-response mapping paradigm could still be used to reveal semantic conflict when using a more sensitive measure of performance than RT (Hasshim & Parris, 2015 ). They attempted to provide evidence for semantic conflict using an oculomotor Stroop task and an early, pre-response pupillometric measure of effort, which had previously been shown to provide a reliable alternative measure of the potential differences between conditions (Hodgson et al., 2009 ). However, in line with their previous findings, they reported Bayes Factors indicating evidence for no statistical difference between the same-response incongruent trials and non-color-word neutral trials.
These findings, therefore, suggest that the difference between same-response incongruent trials and congruent trials indexes facilitation on congruent trials, and that the former trials are not, therefore, a reliable measure of semantic conflict when reaction times or pupillometry are used as the dependent variable. Notably, Hershman and Henik ( 2020 ) included neutral trials in their study of the 2:1 paradigm, but did not report statistics comparing same-response and neutral trials (although they did report differences between same-response and congruent trials, where the latter had similar RTs to their neutral trials). It is clear from their Fig. 1, however, that pupil sizes for neutral and same-response trials do begin to diverge at around the time the button press response was made. This divergence gets much larger ~ 500 ms post-response, indicating that a difference between the two trial types is detectable using pupillometry. Importantly, however, Hershman and Henik employed repeated letter strings as their neutral condition, which do not involve task conflict (see the section on task conflict below for more details). This means that any differences between their neutral trials and the same-response trials could be entirely due to task and not semantic conflict.
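The Bayes Factor logic behind these null results can be sketched with a BIC-based approximation for a paired comparison (Wagenmakers, 2007); the per-participant difference scores below are invented for illustration:

```python
import math
from statistics import mean, stdev

def bf01_paired(diffs):
    """BIC-approximation Bayes factor in favor of the null hypothesis
    for a paired comparison (Wagenmakers, 2007). `diffs` holds one
    RT difference per participant between the two trial types."""
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # paired t statistic
    return math.sqrt(n) * (1.0 + t * t / (n - 1)) ** (-n / 2)

# Hypothetical per-participant differences (ms) between same-response
# incongruent and non-color-word neutral trials, centred near zero:
diffs = [3, -5, 1, 4, -2, 0, 2, -4, 1, -1, 3, -2]
print(bf01_paired(diffs))  # > 1, i.e., evidence favoring the null
```

Values of BF01 above 1 favor the null of no difference between the trial types; values well below 1 favor a difference. Note this BIC approximation is a sketch of the general logic, not necessarily the Bayes Factor variant computed in the studies cited above.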

However, despite Hasshim and Parris consistently reporting no difference between same-response and non-color-word neutral trials, in an unpublished study, Lakhzoum ( 2017 ) reported a significant difference between non-color-word neutral trials and same-response trials. Lakhzoum’s study contained no special modifications to induce a difference between these two trial types, and had roughly similar trial and participant numbers and a similar experimental set-up to Hasshim and Parris. Yet Lakhzoum observed the effect that Hasshim and Parris have consistently failed to observe. The one clear difference between Lakhzoum ( 2017 ) and Hasshim and Parris ( 2014 , 2015 ), however, was that Lakhzoum used French participants and presented the stimuli in French, whereas Hasshim and Parris conducted their studies in English. A question for further research, then, is whether and to what extent language, including issues such as the orthographic depth of the written script of that language, might modify the utility of same-response trials as an index of semantic conflict.

Indeed, even though the 2:1 paradigm is prone to limitations, more research is needed to assess its utility for distinguishing response and semantic conflict. Notably, in both their studies Hasshim and Parris used colored patches as the response targets (at least initially, Hasshim & Parris, 2015 , replaced the colored patches with white patches after practice trials) which could have reduced the magnitude of the Stroop effect (Sugg & McDonald, 1994 ). Same-response trials cannot, for obvious reasons, be used with the commonly used vocal response as a means to increase Stroop effects (see Response Modes and varieties of conflict section below), but future studies could use written word labels, a manipulation that has also been shown to increase Stroop effects (Sugg & McDonald, 1994 ), and thus might reveal a difference between same-response incongruent and non-color-word neutral conditions. At the very least future studies employing same-response incongruent trials should also employ a neutral non-color-word baseline (as opposed to color patches used by Shichel & Tzelgov, 2018 ) to properly index semantic conflict and should avoid the confounding issues associated with congruent trials (see also the section on Informational Facilitation below).

As noted above, same-response incongruent trials are also likely to involve response facilitation since both dimensions (word and color) provide evidence toward the same response. Since congruent trials and same-response incongruent trials both involve response facilitation, the difference between the two conditions likely represents semantic facilitation, not semantic conflict. As a consequence, indexing response conflict via the difference between different-response and same-response trials is also problematic. Until further work is done to clarify these issues, work applying the 2:1 color-response paradigm to understand the neural substrates of semantic and response conflicts (e.g., van Veen & Carter, 2005 ) or wider issues such as anxiety (Berggren & Derakshan, 2014 ) remains difficult to interpret.

Non-response set trials

Non-response set trials are trials on which the irrelevant color word used is not part of the response set (e.g., the word ‘orange’ in blue, where orange is not a possible response option and blue is; originally introduced by Klein, 1964 ). Since the non-response set color word will activate color-processing systems, interference on such trials has been interpreted as evidence for conflict occurring at the semantic level. These trials should in theory remove the influence of response conflict because the irrelevant color word is not a possible response option and thus, conflict at the response level is not present. The difference in performance between the non-response set trials and a non-color-word neutral baseline condition (e.g., the word ‘table’ in red) is taken as evidence of interference caused by the semantic processing of the irrelevant color word (i.e., semantic conflict). In contrast, response conflict can be isolated by comparing the difference between the performance on incongruent trials and the non-response set trials. This index of response conflict has been referred to as the response set effect (Hasshim & Parris, 2018 ; Lamers et al., 2010 ) or the response set membership effect (Sharma & McKenna, 1998 ) and describes the interference that is a result of the irrelevant word denoting a color that is also a possible response option. The aim of non-response set trials is to provide a condition where the irrelevant word is semantically incongruent with the relevant color such that the resultant semantic conflict is the only form of conflict present.

It has been argued that the interference measured using non-response set trials, the non-response set effect, is an indirect measure of response conflict (Cohen et al., 1990 ; Roelofs, 2003 ) and is, thus, not a measure of semantic conflict. That is, the non-response set effect results from the semantic link between the non-response set words and the response set colors: indirect activation of the other response set colors leads to response competition with the target color. As far as we are aware, no study has provided or attempted to provide evidence that is inconsistent with this argument. Thus, for non-response set trials to have utility in distinguishing response and semantic conflict, future research will need to provide evidence for the independence of these types of conflict in RTs and other dependent measures.

Semantic-associative trials

Another method that has been used to tease apart semantic and response conflict employs words that are semantically associated with colors (e.g., sky-blue, frog-green). In trials of this kind (e.g., sky printed in green), first introduced by Klein ( 1964 ), the irrelevant words are semantically related to each of the response colors. Recall that for Klein this was a way of investigating different magnitudes of response conflict (the indirect response conflict interpretation). Indeed, the notion of comparing RTs on color-associated incongruent trials to those on color-neutral trials to specifically isolate semantic conflict (i.e., the so-called “sky-put” design) was first suggested by Neely and Kahan ( 2001 ). It was later implemented empirically by Manwell, Roberts and Besner ( 2004 ) and has since been used in multiple studies investigating Stroop interference (e.g., Augustinova & Ferrand, 2014 ; Risko et al., 2006 ; Sharma & McKenna, 1998 ; White et al., 2016 ).

Interference observed when using semantic associates tends to be smaller than when using non-response set trials (Klein, 1964 ; Sharma & McKenna, 1998 ). This suggests that semantic associates may not capture semantic interference in its entirety (or alternatively that non-response set trials involve some response conflict). Sharma and McKenna ( 1998 ) postulated that this is because non-response set trials involve an additional level of semantic processing which, following Neumann ( 1980 ) and La Heij, Van der Heijden, and Schreuder ( 1985 ), they called semantic relevance (due to the fact that color words are also relevant in a task in which participants identify colors). It is, however, also the case that the smaller interference observed with semantic associates compared to non-response set trials can be conceptualized simply as less semantic association with the response colors for non-color words (sky-blue) than for color words (red–blue).

As with non-response set trials, it is unclear whether semantic associates exclude the influence of response competition because they too can be modeled as indirect measures of response conflict (e.g., Roelofs, 2003 ). Since semantic-associative interference could be the result of the activation of the set of response colors to which they are associated (for instance when sky in red activates competing response set option blue), it does not allow for a clear distinction between semantic and response processes. In support of this possibility, Risko et al. ( 2006 ) reported that approximately half of the semantic-associative Stroop effect is due to response set membership and therefore response level conflict. The raw effect size of pure semantic-associative interference (after interference due to response set membership was removed) in their study was only between 6 ms (manual response, 112 participants) and 10 ms (vocal response, 30 participants).

When the same group investigated this issue with a different approach (i.e., ex-Gaussian analysis), their conclusions were quite different. White and colleagues ( 2016 ) found the semantic Stroop interference effect (the difference between semantic-associative and color-neutral trials) in the mean of the normal distribution (mu) and in the standard deviation of the normal distribution (sigma), but not in the tail of the RT distribution (tau). This finding was different from past studies that found standard Stroop interference in all three parameters (see, e.g., Heathcote et al., 1991 ). Therefore, White and colleagues reasoned that the source of the semantic (as opposed to the standard) Stroop effect is different, such that the interference associated with response competition on standard color-incongruent trials (that is to be seen in tau) is absent in incongruent semantic associates. However, White et al. only investigated semantic conflict. A more recent study that considered both response and semantic conflict in the same experiment found they influence similar portions of the RT distribution (Hasshim, Downes, Bate, & Parris, 2019 ), suggesting that ex-Gaussian analysis cannot be used to distinguish the two types of conflict.
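For readers unfamiliar with ex-Gaussian decomposition, the three parameters can be recovered from an RT sample. The studies above typically use maximum-likelihood fitting; the sketch below instead uses a simple method-of-moments estimator on simulated data, purely to make the mu/sigma/tau terminology concrete:

```python
import math
import random
from statistics import fmean

def exgauss_moments(rts):
    """Method-of-moments estimates of the ex-Gaussian parameters
    (mu, sigma, tau) from a sample of RTs, using the facts that
    mean = mu + tau, variance = sigma^2 + tau^2, and
    skewness = 2*tau^3 / variance^1.5."""
    m = fmean(rts)
    v = sum((x - m) ** 2 for x in rts) / len(rts)
    skew = (sum((x - m) ** 3 for x in rts) / len(rts)) / v ** 1.5
    tau = (max(skew, 1e-9) * v ** 1.5 / 2) ** (1 / 3)
    sigma = math.sqrt(max(v - tau ** 2, 0.0))  # guard against tau overshoot
    return m - tau, sigma, tau

# Recover parameters from simulated ex-Gaussian RTs
# (a Gaussian component plus an exponential tail):
random.seed(1)
rts = [random.gauss(500, 50) + random.expovariate(1 / 150) for _ in range(20000)]
mu, sigma, tau = exgauss_moments(rts)
print(round(mu), round(sigma), round(tau))  # near 500, 50, 150
```

Mu and sigma capture shifts and spread of the distribution's Gaussian body, while tau captures its slow exponential tail, which is why effects confined to mu/sigma versus tau were read as pointing to different conflict sources.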

Interestingly, Schmidt and Cheesman ( 2005 ) explored whether semantic-associative trials involve response conflict by employing the 2:1 paradigm described above. With the standard Stroop stimuli, they reported the common differences between same- and different-response incongruent trials (which are thought to indicate response conflict) and between congruent and same-response incongruent trials (which are thought to indicate semantic conflict in the 2:1 paradigm). However, with semantic-associative stimuli they only observed an effect of semantic conflict, a finding that differs from that of Risko et al. ( 2006 ), whose results indicate an effect of response conflict with semantic-associative stimuli. But, as already noted, the issues associated with employing just congruent trials as a baseline in the 2:1 paradigm and the potential response facilitation on same-response trials lessen the interpretability of this result.

Complicating matters further still, Lorentz et al. ( 2016 ) showed that the semantic-associative Stroop effect is not present in reaction time data when response contingency (a measure of how often an irrelevant word is paired with any particular color) is controlled by employing two separate contingency-matched non-color-word neutral conditions (but see Selimbegovic, Juneau, Ferrand, Spatola & Augustinova, 2019 ). There was, however, evidence for Stroop facilitation with these stimuli and for interference effects in the error data. Nevertheless, studies utilizing semantic-associative stimuli that have not controlled for response contingency might not have accurately indexed semantic-associative interference. Future research should focus on assessing the magnitude of the semantic-associative Stroop interference effect after the influences of response set membership and response contingency have been controlled.
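Response contingency can be computed directly from a design's trial list; the minimal sketch below uses an invented, deliberately biased design to show how a semantic associate can come to predict a particular response:

```python
from collections import Counter

def response_contingency(trials):
    """Proportion of trials on which each irrelevant word appears
    with each color. `trials` is a list of (word, color) pairs;
    the design below is hypothetical."""
    by_word = Counter(word for word, _ in trials)
    pairs = Counter(trials)
    return {(w, c): n / by_word[w] for (w, c), n in pairs.items()}

# A biased design: 'sky' appears with blue on 75% of its trials,
# so 'sky' predicts the response and contingency is confounded
# with semantic association.
trials = ([('sky', 'blue')] * 9 + [('sky', 'red')] * 3
          + [('plan', 'blue')] * 6 + [('plan', 'red')] * 6)
cont = response_contingency(trials)
print(cont[('sky', 'blue')])   # 0.75
print(cont[('plan', 'blue')])  # 0.5
```

Matching the neutral baseline on these proportions, as Lorentz et al. ( 2016 ) did, removes the possibility that faster or slower responses reflect learned word-response contingencies rather than semantic association.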

Levin and Tzelgov ( 2016 ) also reported that they failed to observe the semantic-associative Stroop effect across multiple experiments using a vocal response (in both Hebrew and Russian). Only when the semantic associations were primed via a training protocol were semantic-associative Stroop effects observed, although they were not able to consistently report evidence for the null hypothesis of no difference. They subsequently argued that the semantic-associative Stroop effect is probably present but is a small and “unstable” contributor to Stroop interference. This is a somewhat surprising conclusion given the small but consistent effects reported by others with a vocal response (Klein, 1964 ; Risko et al., 2006 ; Scheibe et al., 1967 ; White et al., 2016 ; see Augustinova & Ferrand, 2014 , for a review). However, it seems reasonable to conclude that the semantic-associative Stroop effect is not easily observed, especially with a manual response (e.g., Sharma & McKenna, 1998 ).

Finally, any observed semantic-associative interference could be interpreted as being an indirect measure of response competition (even after factors such as response set membership and response contingency are controlled). Indeed, the colors associated with the semantic-associative stimuli are also linked to the response set colors (Cohen et al., 1990 ; Roelofs, 2003 ) and thus, semantic associates do not generate an unambiguous measure of semantic conflict, at least when only RTs are used. Thus, it seems essential for future research to investigate this issue with additional, and perhaps more refined indicators of response processing such as EMGs.

Semantics as distance on the electromagnetic spectrum

Klopfer ( 1996 ) demonstrated that RTs were slower when both dimensions of the Stroop stimulus were closely related on the electromagnetic spectrum. The electromagnetic spectrum is the range of frequencies of electromagnetic radiation and their wavelengths, including those for visible light. The visible portion of the spectrum runs from violet, with the shortest wavelengths, to red, with the longest, with blue, green, yellow and orange (amongst others) in between. The Stroop effect has been reported to be larger when the color and word dimensions of the Stroop stimulus are close on the spectrum (e.g., blue in green) compared to when the colors are distantly related (e.g., blue in red; see also Laeng et al., 2005 , for an effect of color opponency on Stroop interference). In other words, Stroop interference is greater when the semantic distance between the color denoted by the word and the target color in “color space” is smaller, making it seemingly difficult to argue that semantic conflict does not contribute to Stroop interference. However, Kinoshita, Mills, and Norris ( 2018 ) recently failed to replicate this electromagnetic spectrum effect, indicating that more research is needed to assess whether it is robust. Even if replicated, however, this manipulation cannot escape the interpretation of semantic conflict as an indirect index of response conflict. Therefore, future replications also call for additional indicators of the presence or absence of response processing.
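Klopfer's manipulation can be illustrated by treating semantic distance as wavelength separation; the wavelength values below are rough approximations for illustration and are not taken from the original study:

```python
# Approximate peak wavelengths (nm) for each color term;
# values are illustrative, not Klopfer's (1996) stimuli.
WAVELENGTH = {'red': 700, 'orange': 620, 'yellow': 580,
              'green': 530, 'blue': 470, 'violet': 410}

def spectral_distance(word_color, ink_color):
    """Semantic distance in 'color space' as wavelength separation."""
    return abs(WAVELENGTH[word_color] - WAVELENGTH[ink_color])

# The reported pattern: blue in green (close pair) interferes more
# than blue in red (distant pair).
print(spectral_distance('blue', 'green'))  # 60
print(spectral_distance('blue', 'red'))    # 230
assert spectral_distance('blue', 'green') < spectral_distance('blue', 'red')
```

On Klopfer's account, smaller distances like the blue-green pair should produce the larger Stroop interference.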

Can we distinguish the contribution of response and semantic processing?

Perhaps due to the past competition between early- and late-selection single-stage accounts of Stroop interference (Logan & Zbrodoff, 1998 ; MacLeod, 1991 ), response and semantic conflict have historically been the most studied and, therefore, compared types of conflict. For instance, there is a multitude of studies indicating that semantic conflict is often preserved when response conflict is reduced by experimental manipulations including hypnosis-like suggestion (Augustinova & Ferrand, 2012 ), priming (Augustinova & Ferrand, 2014 ), Response–Stimulus Interval (Augustinova et al., 2018a ), viewing position (Ferrand & Augustinova, 2014a ) and single letter coloring (Augustinova & Ferrand, 2007 ; Augustinova et al., 2010 , 2015 , 2018a , 2018b ). This dissociative pattern (i.e., significant semantic conflict while response conflict is reduced or even eliminated) is often viewed as indicating two qualitatively distinct types of conflict, suggesting that these manipulations prevent response conflict. However, these studies have commonly employed semantic-associative conflict, which could be indirectly measuring response conflict, and it could, therefore, be argued that it is not the type of conflict but simply residual response conflict that remains (Cohen et al., 1990 ; Roelofs, 2003 ). Therefore, it remains plausible that the dissociative pattern simply indicates quantitative differences in response conflict.

As we have discussed in this section, interference generated by both non-response set trials and trials that manipulate proximity on the electromagnetic spectrum is prone to the same limitations. The 2:1 paradigm could in principle remove response conflict from the conflict equation, but the issues surrounding this manipulation need to be further researched before we can be confident of its utility. Therefore, at this point, it seems reasonable to conclude that published research conducted so far with additional color-incongruent trial types (same-response, non-response set, or semantic-associative trials) does not permit the unambiguous conclusion that the informational conflict generated by standard color-incongruent trials (the word ‘red’ presented in blue) can be decomposed into semantic and response conflicts. More than ever, then, cumulative evidence from more time- and process-sensitive measures is required.

Other types of informational conflict: considering the role of phonological processing and word frequency

Whilst participants are asked to ignore the irrelevant word in the color-word Stroop task, it is clear that their attempts to do so are not successful. If word processing proceeds in an obligatory fashion such that before accessing the semantic representation of the irrelevant word, the letters, orthography, and phonology are also processed, interference could happen at these levels of processing. But, as anticipated by Klein ( 1964 ), just because the word is processed at these levels does not mean that each leads to level-specific conflict. To determine whether or not these different levels of processing also independently contribute to Stroop interference, various trial types and manipulations have been employed that have attempted to dissociate pre-semantic levels of processing. The most notable methods are: (1) phonological overlap between the irrelevant word and color name; (2) the use of pseudowords; and (3) manipulation of word frequency. This section attempts to identify whether pre-semantic processing of the irrelevant word reliably leads to conflict (or facilitation) at levels other than response output.

Phonological overlap between word and color name

A study by Dalrymple-Alford ( 1972 ) presented evidence for solely phonological interference in the Stroop task. Dalrymple-Alford manipulated the phonemic overlap between the irrelevant word and color name. For example, if the color to be named was red, the to-be-ignored word would be rat (sharing initial phoneme) or pod (sharing the end phoneme) or a word that shares no phoneme at all (e.g., fit ). Dalrymple-Alford reported evidence for greater interference at the initial letter than at the end letter position (similar effects were observed for facilitation). Using a more carefully designed set of stimuli (originally created by Coltheart et al., 1999 , who focused on just facilitation), Marmurek et al. ( 2006 ) also showed greater interference and facilitation at the initial letter position than the end letter position; although, in their study effects at the end letter position did not reach significance. This paradigm represents a direct measure of phonological processing that, importantly, does not have a semantic component (other than the weak conflict that would result from the activation of two semantic representations with unrelated meanings). However, in line with the interpretation by Coltheart et al. ( 1999 ), Marmurek and colleagues argued it was evidence for phonological processing of the irrelevant word that either facilitates or interferes with the production of the color name at the response output stage (see also Parris et al., 2019a , 2019b , 2019c ; Regan, 1978; Singer et al., 1975 ). Thus, whilst the word is processed phonologically, the only phonological representation with which the resulting representation could compete is that created during the phonological encoding of the color name, which would only be produced at later response processing levels. In sum, it is not possible to conclude in favor of qualitatively different conflict (or facilitation) other than that at the response level using this approach.

Pseudowords

A pseudoword is a non-word that is pronounceable (e.g., veglid ). In fact, some real words are so rare (e.g., helot , eft ) that to most they are equivalent to pseudowords. As noted above, Klein ( 1964 ) used rare words in the Stroop task and showed that they interfered less than higher-frequency words but more than consonant strings (e.g., GTBND ). Both Burt’s ( 2002 ) and Monsell et al.’s ( 2001 ) studies later supported the finding that pseudowords result in more interference than consonant strings. In recent work, Kinoshita et al. ( 2017 ) asked what aspects of the reading process are triggered by the irrelevant word stimulus to produce interference in the color-word Stroop task. They compared performance on five types of color-neutral letter strings to incongruent words. They included real words (e.g., hat ), pronounceable non-words (or pseudowords; e.g., hix ), consonant strings (e.g., hdk ), non-alphabetic symbol strings (e.g., &@£ ), and a row of Xs. They reported that there was a word-likeness or pronounceability gradient, with real words and pseudowords showing an equal amount of interference (with interference increasing with string length) and more than that produced by the consonant strings. Consonant strings produced more interference than the symbol strings and the row of Xs, which did not differ from each other. The absence of a lexicality effect (defined by color-neutral real words producing more interference than pseudowords) was explained by Kinoshita and colleagues as being a consequence of the pre-lexically generated phonology from the pronounceable irrelevant words interfering with the speech production processes involved in naming the color. Under this account, the process of phonological encoding (the segment-to-frame association processes in articulation planning) of the color name must be slowed by the computation of phonology that occurs independent of lexical status (because it happens with pronounceable pseudowords).
Notably, the authors reported evidence for pre-lexically generated phonology when participants responded vocally (by saying aloud the color name), but not when participants responded manually (by pressing a key that corresponds to the target color) suggesting the effects were the result of the need to articulate the color name.

Some pseudowords can sound like color words (e.g., bloo), and are known as pseudohomophones. Besner and Stolz ( 1998 ) employed pseudohomophones as the irrelevant dimension, and found substantial Stroop effects when compared to a neutral baseline (see also Lorentz et al., 2016 ; Monahan, 2001 ) suggesting that there is phonological conflict in the Stroop task. However, pseudohomophones do not involve only phonological conflict since they contain substantial orthographic overlap with their base words (e.g., bloo , yeloe , grene , wred ) and will likely activate the semantic representations of the colors indicated by the word via their shared phonology. In short, interference produced by pseudohomophones could result from phonological, orthographic, or semantic processing but also and importantly it can still simply result from response conflict (see also Tzelgov et al., 1996 , work on cross-script homophones which shows phonologically mediated semantic/response conflict, but not phonological conflict).

Taken together, this work shows a clear effect of phonological processing of the irrelevant word on Stroop task performance; and one that likely results from the pre-lexical phonological processing of the irrelevant word. Again, however, it is unclear whether the resulting competition arises at the pre-lexical level (suggesting the color name’s pre-lexical phonological representation is unnecessarily activated) or whether phonological processing of the irrelevant word leads to phonological encoding of that word that then interferes with the phonological encoding of the relevant color name. The latter seems more likely than the former.

High- vs. low-frequency words

In support of the notion that non-semantic lexical factors contribute to Stroop effects, studies have shown an effect of the word frequency of non-color-related words on Stroop interference. Word frequency refers to the likelihood of encountering a word in reading and conversation. It is a factor that has long been known to contribute to word reading latency and, given that color words tend to be high-frequency words, it is possible that word frequency contributes to Stroop effects. Whilst the locus of word frequency effects in word reading is unclear, it is known that it takes longer to access lexico-semantic (phonological/semantic) representations of low-frequency words (Gerhand & Barry, 1998 , 1999 ; Monsell et al., 1989 ).

According to influential models of the Stroop task, the magnitude of Stroop interference is determined by the strength of the connection between the irrelevant word and the response output level (Cohen et al., 1990 ; Kalanthroff et al., 2018 ; Zhang et al., 1999 ). Since high-frequency words are by definition encountered more often, their strength of connection to the response output level would be higher than that for low-frequency words. This leads to the prediction that color-naming times should be longer when the distractor word is of a higher frequency. Evidence in support of this has been reported by Klein ( 1964 ), Fox et al. ( 1971 ) and Scheibe et al. ( 1967 ). However, Monsell et al. ( 2001 ) pointed out methodological issues in these older studies that could have confounded the results. First, these previous studies employed the card presentation version of the Stroop task in which the items from each stimulus condition (e.g., all the high-frequency words) are placed on different cards and the time taken to respond to all the items on one card is recorded. This method, it was argued, could result in the adoption of different response criteria for the different cards and permits previews of the next stimulus, which could result in overlap of processing. Second, Monsell et al. noted that these studies employed a limited set of 4–5 stimuli in each condition which were repeated numerous times on each card, potentially leading to practice effects that would nullify any effects of word frequency. After addressing these issues, Monsell et al. ( 2001 ) reported no effects of word frequency on color-naming times, although there was a non-significant tendency for low-frequency words to result in more interference than high-frequency words.
With the same methodological control as Monsell et al., but with a greater difference in frequency between the high and low conditions, Burt ( 1994 , 1999 , 2002 ) has repeatedly reported that low-frequency words produce significantly more interference than high-frequency words (findings recently replicated by Navarrete et al., 2015 ). A recent study by Levin and Tzelgov ( 2016 ) also reported more interference to low-frequency words although their effects were not consistent across experiments, a finding that could be attributed to their use of a small set of words for each class of words.

The repeated finding of greater interference for low-frequency words is consistent with the notion that word frequency contributes to determining response times in the Stroop task, but is inconsistent with predictions from models of the class exemplified by Cohen et al. ( 1990 ). The finding of larger Stroop effects for lower-frequency words provides a potent challenge to the many models based on the Parallel Distributed Processing (PDP) connectionist framework (Cohen et al., 1990 ; Kalanthroff et al., 2018 ; Kornblum et al., 1990 ; Kornblum & Lee, 1995 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ; see Monsell et al., 2001 for a full explanation of this). As noted, these models would argue, on the basis of a fundamental tenet of their architectures, that higher-frequency words should produce greater interference because they have stronger connection strengths with their word forms. Notably, whilst unsupported by later studies, the lack of an effect of word frequency in Monsell et al.’s data led them to the conclusion that there was another type of conflict involved in the Stroop task, called task conflict. It is to the topic of task conflict that we now turn.
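The prediction that the frequency data ultimately challenged can be made concrete with a toy race between pathways. This is our illustration only, not an implementation of any published model; the weights, threshold and the name `color_naming_time` are arbitrary assumptions. The point is structural: in an architecture of this class, a stronger word pathway can only slow incongruent color naming.

```python
def color_naming_time(word_weight, color_weight=1.0, threshold=10.0, dt=1.0):
    """Toy accumulator for an incongruent trial: the color response gains
    color_weight per time step, while the irrelevant word pathway (whose
    weight stands in for word frequency / training) pushes the competing
    response and subtracts from the color response's activation.
    Time to threshold therefore grows with word pathway strength."""
    activation, t = 0.0, 0
    while activation < threshold:
        activation += dt * (color_weight - 0.5 * word_weight)
        t += 1
    return t

# A stronger (higher-frequency) word pathway predicts slower color naming --
# the opposite of the low-frequency disadvantage Burt repeatedly observed.
assert color_naming_time(word_weight=1.0) > color_naming_time(word_weight=0.2)
```

On this architecture there is no natural way for low-frequency words to produce more interference than high-frequency ones, which is why the frequency findings challenge this model class.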

Task conflict

The presence of task conflict in the Stroop task was first proposed in MacLeod and MacDonald’s ( 2000 ) review of brain imaging studies (see also Monsell et al., 2001 ; see Littman et al., 2019 , for a mini review). The authors proposed its existence because the anterior cingulate cortex (ACC) appeared to be more activated by incongruent and congruent stimuli when compared to repeated letter neutral stimuli such as xxxx (e.g., Bench et al., 1993 ). MacLeod and MacDonald suggested that increased ACC activation by congruent and incongruent stimuli reflects signaling of the need for control recruitment in response to task conflict. Since task conflict is produced by the activation of the mental machinery used to read, interference at this level occurs with any stimulus that is found in the mental lexicon. Studies have used this logic to isolate task conflict from informational conflict (e.g., Entel & Tzelgov, 2018 ).

Congruent trials, proportion of repeated letter strings trials and negative facilitation

In contrast to color-incongruent trials that are thought to produce both task and informational conflicts, color-congruent trials are only thought to produce task conflict. Conflict of any type, by definition, increases response times and thus, congruent trial reaction times can be expected to be longer than those on trials that do not activate a task set for word reading. Repeated color patches, symbols or letters (e.g., ■■■, xxxx or ####) have, therefore, been introduced as a baseline for such a comparison. Indeed, these trials are not expected to generate task conflict as they do not activate an item in the mental lexicon. The difference between these non-linguistic baselines and congruent trials would therefore represent a measure of task conflict, and has been referred to as negative facilitation. However, a common finding in such experiments is that congruent trials still produce faster RTs than neutral non-word stimuli, that is, positive facilitation (Entel et al., 2015 ; see also Augustinova et al., 2019 ; Levin & Tzelgov, 2016 , Shichel & Tzelgov, 2018 ), indicating that task conflict is not fully measured under such conditions. Goldfarb and Henik ( 2007 ) reasoned that this is likely because responses on congruent trials are faster than a non-linguistic baseline when task conflict control is highly efficient, permitting the expression of positive facilitation.
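In RT terms, the measures at stake here reduce to simple condition-mean differences. A minimal sketch (the function name and the example means are our hypothetical choices, not data from any cited study):

```python
def facilitation_effects(rt_congruent, rt_neutral_word, rt_letter_string):
    """Stroop facilitation measures from condition mean RTs (in ms).

    positive_facilitation > 0: congruent trials are faster than non-color-word
    neutral trials. negative_facilitation > 0: congruent trials are SLOWER
    than the repeated-letter-string baseline -- the signature of task conflict.
    """
    return {
        "positive_facilitation": rt_neutral_word - rt_congruent,
        "negative_facilitation": rt_congruent - rt_letter_string,
    }

# Hypothetical means in which both effects coexist: congruent trials are
# faster than neutral words yet slower than letter strings.
effects = facilitation_effects(rt_congruent=640, rt_neutral_word=660,
                               rt_letter_string=620)
```

On this logic, positive facilitation can mask negative facilitation: whether congruent trials look "facilitated" depends entirely on which baseline is subtracted.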

To circumvent this issue, they attempted to reduce task conflict control by increasing the proportion of non-word neutral trials (repeated letter strings) to 75% (see also Kalanthroff et al., 2013 ). Increasing the proportion of non-word neutral trials would create the expectation for a low task conflict context and so task conflict monitoring would effectively be offline. In addition to increasing the proportion of non-word neutral trials, on half of the trials, the participants received cues that indicated whether the following stimulus would be a non-word or a color word, giving another indication as to whether the mechanisms that control task conflict should be activated. For non-cued trials, when presumably task conflict control was at its nadir, and therefore task conflict at its peak, RTs were slower for congruent trials than for non-word neutral trials, producing a negative facilitation effect. Goldfarb and Henik ( 2007 ) suggested that previous studies had not detected a negative facilitation effect because resolving task conflict for congruent stimuli does not take long, and thus, as mentioned above, the effects of positive facilitation had hidden those of negative facilitation. In sum, by reducing task control both globally (by increasing the proportion of neutral trials) and locally (by adding cues to half of the trials), Goldfarb and Henik were able to increase task conflict enough to demonstrate a negative facilitation effect; an effect that has been shown to be a robust and prime signature of task conflict (Goldfarb & Henik, 2006 , 2007 ; Kalanthroff et al., 2013 ).

Steinhauser and Hübner ( 2009 ) manipulated task conflict control by combining the Stroop task with a task-switching paradigm. In this paradigm participants switch between color naming and reading the irrelevant word (see Kalanthroff et al., 2013 , for a discussion on task switching and task conflict). Thus, the two task sets are active in this task context. This means that during color-naming Stroop trials, the word dimension of the stimulus will be more strongly associated with word processing than it otherwise would. This would have the effect of increasing the conflict between the task set for color naming and the task set of word reading. Steinhauser and Hübner ( 2009 ) found that under these experimental conditions, participants performed worse on congruent (and incongruent) trials than they did on the non-word neutral trials, evidencing negative facilitation, the key marker of task conflict. These results, showing increased task conflict when there is less control over the task set for word reading on color-naming trials, reaffirmed Goldfarb and Henik’s ( 2007 ) findings that showed that reducing task control on color-naming trials leads to task conflict.

Whilst both of the above methods are useful in showing that task conflict can influence the magnitude of Stroop interference and facilitation, both manipulations magnify task conflict (and likely other forms of conflict) to levels greater than those present when such targeted manipulations are not used.

Repeated letter strings without a task conflict control manipulation

As has been noted, task conflict appears to be present whenever the irrelevant stimulus has an entry in the lexical system. Consequently, studies have used the contrast in mean color-naming latencies between color-neutral words and repeated letter strings to index task conflict (Augustinova et al., 2018a ; Levin & Tzelgov, 2016 ). However, Augustinova et al. argued that both of these stimuli might include task conflict in different quantities. This is because the processing activated by a string of repeated letters (e.g., xxx) stops at the orthographic pre-lexical level, whereas that activated by color-neutral words (e.g., dog) proceeds through to access to meaning (see also Augustinova et al., 2019 ; Ferrand et al., 2020 ), and as such the latter might more strongly activate the task set for word reading. Augustinova et al. ( 2019 ) reported task conflict (color-neutral—repeated letter strings) with vocal responses but not manual responses. Likewise, in a manual response study, Hershman et al. ( 2020 ) reported that repeated letter strings did not differ in terms of Stroop interference relative to symbol strings, consonant strings and color-neutral words. All were responded to more slowly than congruent trials, however, evidencing facilitation on congruent trials. Levin and Tzelgov ( 2016 ) compared vocal response color-naming times for repeated letter strings and shapes and found that repeated letter strings produced longer color-naming times, indicating some level of extra conflict with repeated letter strings, which they referred to as orthographic conflict, but which could also be expected to activate a task set for word reading.
The implication of this work is that whilst repeated letter strings can be used as a baseline against which to measure task conflict relative to color-neutral words, they are likely to be useful mainly with vocal responses (Augustinova et al., 2019 ), and moreover can be expected to lead to some level of task conflict (Levin & Tzelgov, 2016 ).

For a purer measure of task conflict, when eschewing manipulations needed to produce negative facilitation, future research would do better to compare response times for color-neutral stimuli with those for shapes whilst employing a vocal response (Levin & Tzelgov, 2016 ; see Parris et al., 2019a , 2019b , 2019c , who reported no difference between color-neutral stimuli and unnamable/novel shapes with a manual response in an fMRI experiment). This does not mean, however, that task conflict is not measurable with manual responses in designs that eschew manipulations that produce negative facilitation: Continuing with their exploration of Stroop effects in pupillometric data, Hershman et al. ( 2020 ) reported that pupil sizes were larger for congruent trials than for repeated letter strings (and also symbol strings, consonant strings and non-color-related words); in other words, they reported negative facilitation.

Does task conflict precede informational conflict?

The studies discussed above also suggest that task conflict occurs earlier than informational conflict. Hershman and Henik ( 2019 ) recently provided evidence that supports this supposition. Using incongruent, congruent and a repeated letter string baseline, but without manipulating the task conflict context in a way that would produce negative facilitation, Hershman and Henik observed a large interference effect and a small, non-significant positive facilitation effect. However, the authors also recorded pupil dilations during task performance and reported both interference and negative facilitation (pupils were smaller for the repeated letter string condition than for congruent stimuli). Importantly, the pupil data began to distinguish between the repeated letter string condition and the two word conditions (incongruent and congruent) up to 500 ms before there was divergence between the incongruent and congruent trials. In other words, task conflict appeared earlier than informational conflict in the pupil data.

Although it is not firmly established that task conflict comes before informational conflict on a single trial, recent research has shown that it certainly seems to come first developmentally. By comparing performance in 1st, 3rd and 5th graders, Ferrand and colleagues ( 2020 ) showed that 1st graders experience smaller Stroop interference effects (even when controlling for processing speed differences) compared to 3rd and 5th graders. Importantly, whereas the Stroop interference effect in these older children is largely driven by the presence of response, semantic and task conflict, in the 1st graders (i.e., pre-readers) this interference effect was entirely due to task conflict. Indeed, these children produced slower color-naming latencies for all items using words as distractors compared to repeated letter strings, without being sensitive to color-(in)congruency and to the informational (phonological, semantic or response) conflict that it generates. The finding of task conflict’s developmental precedence is consistent with the idea that visual expertise for letters (as evidenced by the aforementioned N170 tuning for print) is known to be present even in pre‐readers (Maurer et al., 2005 ).

A model of task conflict

Kalanthroff et al. ( 2018 ) presented a model of Stroop task performance that is based on processing principles of Cohen and colleagues’ models (Botvinick et al., 2001 ; Cohen et al., 1990 ). What is unique about their model is the role proactive (intentional, sustained) control plays in modifying task conflict (see Braver, 2012 ). When proactive control is strong, bottom-up activation of word reading is weak, and top-down control resolves any remaining task competition rapidly. Conversely, when proactive control is weak, bottom-up information can activate task representations more readily, leading to greater task conflict. According to their model, the presence of task conflict inhibits all response representations, effectively raising the response threshold and slowing responses. This raising of the response threshold would not happen for repeated letter string trials (e.g., xxxx) because the task unit for word reading would not be activated. Since responses for congruent trials would be slowed, negative facilitation results. To control task conflict when it arises, Kalanthroff et al. ( 2018 ) argued that, due to the low level of proactive control, reactive control is triggered to resolve task conflict via the weak top-down input from the controlling module in the anterior cingulate cortex. Thus, in contrast to Botvinick et al.’s ( 2001 ) model, reactive control is triggered by weak proactive control, not by the detection of informational conflict. When proactive control is high, there is no task conflict, the reactive control mechanism is not triggered, and response convergence at the response level leads to response facilitation that can be fully expressed. Since task conflict control is not reliant on the presence of intra-trial informational conflict, and it is not resolved at the response output level, it is resolved by an independent control mechanism. Thus, the Kalanthroff et al. model predicts the independent resolution of response and task conflict.
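The threshold-raising mechanism can be caricatured in a few lines. This is our sketch only: the function name and the numerical constants are arbitrary illustrations, not fitted values from Kalanthroff et al.'s ( 2018 ) model.

```python
def toy_congruent_rt(is_word, proactive_control, base_rt=600.0):
    """Sketch of the Kalanthroff et al. (2018) account, comparing congruent
    word trials with a repeated-letter-string baseline: any lexical stimulus
    activates the word-reading task unit; the weaker proactive control is,
    the greater the task conflict, which raises the response threshold and
    slows responding. Congruent words also enjoy a fixed convergence
    (facilitation) gain at the response level."""
    task_conflict = (1.0 - proactive_control) if is_word else 0.0
    threshold_cost = 80.0 * task_conflict   # slowing from the raised threshold
    convergence_gain = 40.0 if is_word else 0.0
    return base_rt + threshold_cost - convergence_gain

# Weak proactive control: negative facilitation (congruent slower than xxxx).
assert toy_congruent_rt(True, proactive_control=0.2) > toy_congruent_rt(False, 0.2)
# Strong proactive control: positive facilitation is fully expressed.
assert toy_congruent_rt(True, proactive_control=0.9) < toy_congruent_rt(False, 0.9)
```

The sketch captures the model's qualitative signature: the sign of the facilitation effect flips with the level of proactive control, with no role for informational conflict in triggering control.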

In sum, task conflict has been shown to be an important contributor to both Stroop interference and Stroop facilitation effects. Task conflict can result in the reduction of the Stroop facilitation effect, increased Stroop interference, and in its more extreme form, it can produce negative facilitation (RTs to congruent trials are longer than those to a non-word neutral baseline). A concomitant decrease in Stroop facilitation and increase in Stroop interference (or vice versa) is also another potential marker of task conflict (Parris, 2014 ), although since a reduced Stroop facilitation and an increased Stroop interference can be produced by other mechanisms (i.e., decreased word reading/increased attention to the color dimension and increased response conflict, respectively), at this point, negative facilitation is clearly the best marker of task conflict (in RT or pupil data; Hershman & Henik, 2019 ). Kalanthroff et al. ( 2018 ) have argued that task conflict is a result of low levels of proactive control. However, more work is perhaps needed to identify what triggers activation of the task set for word reading and how types of informational conflict might interact with task conflict. Levin and Tzelgov ( 2016 ) describe informational conflict as being an “episodic amplification of task interference” (p. 3), where task conflict is a marker of the automaticity of reading and informational conflict the effect of dimensional overlap between stimuli and responses. With recent evidence suggesting that readability is a key factor in producing task conflict (Hershman et al., 2020 ), task conflict is possibly closely related to the ease with which a string of letters is phonologically encoded, its pronounceability (Kinoshita et al., 2017 ), suggesting a link between task and phonological conflict. Indeed, Levin and Tzelgov ( 2016 ) associated the orthographic and lexical components of word reading with task conflict.
However, it is unclear how phonological processing is categorized in their framework and importantly how facilitation effects are accounted for under such a taxonomy.

Informational facilitation

As already mentioned, Dalrymple-Alford and Budayr ( 1966 , Exp. 2) were the first to report a facilitation effect of the irrelevant word on color naming (see also Dalrymple-Alford, 1972 for coining the term). Since then, the Stroop facilitation effect has become an oft-present effect in Stroop task performance and is usually measured by the difference in color-naming performance on non-color-word trials and color-congruent trials. However, the use of congruent trials is, more than any other trial type, fraught with confounding issues. As amply developed in the previous section, when task conflict is high, congruent word trial RTs can actually be longer than non-color-word trial RTs, eliminating the expression of positive facilitation in the RT data and even producing negative facilitation (Goldfarb & Henik, 2007 ). Indeed, in perhaps the first record of task conflict in the Stroop literature, Heathcote et al. ( 1991 ) reported that whilst the arithmetic mean difference between color-congruent and color-neutral trial types reveals facilitation in the Gaussian portion of the RT distribution, it actually reveals interference in the tail of the RT distribution. In sum, congruent trial RTs are clearly influenced by processes that pull RTs in different directions. Moreover, it has been argued that Stroop facilitation effects are not true facilitation effects at all, in the sense that the faster RTs on congruent trials do not represent the benefit of converging information from the two dimensions of the Stroop stimulus (see below for a further discussion of this issue). Thus, before considering what levels of processing contribute to facilitation effects, we must first consider the nature of such effects.

Accounting for positive facilitation

Since clear empirical demonstrations of task conflict being triggered by color-congruent trials were reported (see above), it has become difficult to consider the Stroop facilitation effect as the flip side of Stroop interference (Dalrymple-Alford & Budayr, 1966 ). Stroop facilitation is often observed to be smaller, and less consistent, than Stroop interference (MacLeod, 1991 ) and this asymmetry is largely dependent on the baseline used (Brown, 2011 ). Yet, this asymmetrical effect has been accounted for by models of the Stroop task via informational facilitation (i.e., without considering the opposing effect of task conflict). For example, in Cohen et al.’s ( 1990 ) model smaller positive facilitation is accounted for via a non-linear activation function which imposes a ceiling effect on the activation of the correct response—in other words, double the input (convergence) does not translate into double the output (Cohen et al., 1990 ).
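The ceiling argument follows from any saturating activation function; a minimal illustration with the logistic function (the specific input values are arbitrary choices of ours):

```python
import math

def logistic(net_input):
    """Saturating (non-linear) activation of the kind used in Cohen et al. (1990)."""
    return 1.0 / (1.0 + math.exp(-net_input))

# The same increment of converging input buys less and less extra activation
# as the unit approaches its ceiling, so "double the input" does not
# translate into "double the output".
gain_low = logistic(2.0) - logistic(1.0)    # increment applied far from ceiling
gain_high = logistic(4.0) - logistic(3.0)   # same increment applied near ceiling
assert gain_high < gain_low
```

Because the congruent word's contribution arrives when the correct response is already strongly activated by the color, its extra boost is compressed, yielding facilitation that is smaller than the interference produced on incongruent trials.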

MacLeod and MacDonald ( 2000 ) and Kane and Engle ( 2003 ) have argued that the facilitating effect of the color-congruent irrelevant word is not true facilitation from any level of processing and is instead the result of ‘inadvertent reading’. That is, on some color-congruent trials, participants use only the word dimension to generate a response, meaning that these responses would be 100 ms–200 ms faster than if they were color naming (because word reading is that much faster than color naming). The argument is that it happens on only the occasional congruent trial (because of the penalty, in errors or long RTs, that would result from carrying it over to incongruent trials). Doing this occasionally would equate to the roughly 25 ms Stroop facilitation effect observed in most studies and would explain why facilitation is generally smaller than interference. Since the color-naming goal is not predicted to be active on these occasional congruent trials, it implies that only the task set for word reading is active, and hence the absence (or a large reduction) of task conflict, which fits with the finding of more informational facilitation in low task conflict contexts. Inadvertent reading would also be expected to produce facilitation in the early portion of the reaction time distribution (as supported by Heathcote et al.’s findings).

Roelofs ( 2010 ) argued, however, that with cross-language stimuli presented to bilingual participants, words cannot be read aloud to produce facilitation between languages (i.e., the Dutch word Rood—meaning ‘red’—cannot be read aloud to produce the response ‘red’ by Dutch–English bilinguals). Roelofs ( 2010 ) asked Dutch–English bilingual participants to name color patches either in Dutch or English whilst trying to ignore contiguously presented Dutch or English words. Given that informational facilitation effects were observed both within and between languages, Roelofs argued that the Stroop facilitation effect cannot be based on inadvertent reading. However, whilst Rood (Red), Groen (Green), and Blauw (Blue) are not necessarily phonologically similar to their English counterparts, they clearly share orthographic similarities, which could produce facilitation effects (including semantic facilitation). Still, Roelofs observed large magnitudes of facilitation effects, rendering it less likely that facilitation was based solely on orthography, although this was primarily when the word preceded the onset of the color patch. There were indeed relatively small facilitation effects when the word and color were presented at the same time. Nevertheless, the inadvertent reading account also cannot easily explain facilitation on semantic-associative congruent trials (see below for evidence of this) since the word does not match the response.

Another influence that can account for the facilitating effect of congruent trials is response contingency. Response contingency refers to the association between an irrelevant word and a response. In a typical Stroop task set-up, the numbers of congruent and incongruent trials are matched (e.g., 48 congruent/48 incongruent). Since on each congruent trial there is only one possible word for each color, each color word is more frequently paired with its corresponding color (when the word red is displayed, there is a higher probability of its color being red). This would mean that responses on congruent trials would be further facilitated through learned word–response associations, and those on incongruent trials further slowed, by something other than and additional to the consequence of word processing (Melara & Algom, 2003 ; Schmidt & Besner, 2008 ). Indeed, it is as yet unclear whether informational facilitation would remain if facilitative effects of response contingency were controlled. Therefore, future studies are needed to address this still open issue (see Lorentz et al., 2016 for this type of endeavor but with semantic associates).
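The contingency can be made explicit by counting. In the hypothetical matched design sketched below (four colors, 48 congruent and 48 incongruent trials; the trial-list construction and the function name are ours), each color word appears in its own color on half of its presentations, well above the chance rate of one in four:

```python
from collections import Counter

def contingency(trials):
    """P(color matches the word | word) for each color word in a list of
    (word, color) pairs -- the learned word-response association that Melara
    and Algom (2003) and Schmidt and Besner (2008) argue contaminates
    congruency effects."""
    word_counts = Counter(word for word, _ in trials)
    match_counts = Counter(word for word, color in trials if word == color)
    return {w: match_counts[w] / word_counts[w] for w in word_counts}

colors = ["red", "green", "blue", "yellow"]
trials = [(c, c) for c in colors for _ in range(12)]               # 48 congruent
trials += [(w, c) for w in colors for c in colors if w != c
           for _ in range(4)]                                      # 48 incongruent
probs = contingency(trials)   # each word predicts its own color with p = .5
```

With the word predictive of the response at twice the chance rate, faster congruent RTs could reflect contingency learning rather than (or in addition to) genuine informational facilitation.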

Decomposing informational facilitation

Perhaps because it has been perceived as the lesser, and less stable effect, the Stroop facilitation effect has not been explored as much as the Stroop interference effect in terms of potential varieties of which it may be comprised (Brown, 2011 ). Coltheart et al. ( 1999 ) have shown that when the irrelevant word and the color share phonemes (e.g., rack in red, boss in blue), participants are faster to name the color than when they do not (e.g., hip in red, mock in blue). Given that none of the words used in their experiment contained color relations, their effect was likely entirely based on phonological facilitation (see also Dennis & Newstead, 1981 ; Marmurek et al., 2006 ; Parris et al., 2019a , 2019b , 2019c ; Regan, 1979 ). Notably, effects such as this could not be explained by either the inadvertent reading or the response convergence accounts of Stroop facilitation and could not have resulted from response contingency (whilst any word in red, green or blue would have a greater chance of beginning with an ‘r’, ‘g’ or ‘b’ than with any other letter, respectively, there were three times as many trials in which the words did not begin with those letters). It is possible, however, that phonological facilitation operates on a different mechanism to semantic and response facilitation effects.

To the best of our knowledge only four published studies have explored this variety of informational facilitation directly. Dalrymple-Alford ( 1972 ) reported a 42 ms semantic-associative facilitation effect (non-color-word neutral—semantic-associative congruent) and a 67 ms standard facilitation effect (non-color-word neutral—congruent) suggesting a response facilitation effect of 25 ms (see Glaser & Glaser, 1989 ; and Mahon et al., 2012 , for replications of this effect). Interestingly, however, when compared to a letter string baseline (e.g., xxxx), the congruent semantic associates actually produced interference—a finding implicating an influence of task conflict. More recently, Augustinova et al. ( 2019 ) reported semantic (11 ms) and response (39 ms) facilitation effects with vocal responses but only semantic facilitation (14 ms) with manual responses (response facilitation was a non-significant 7 ms). Interestingly, the comparison between the letter string baseline and congruent semantic associates produced 9 ms facilitation with the manual response, but 33 ms interference with the vocal response suggesting a complex relationship between response mode, semantic facilitation and task conflict. Indeed, exactly like color-congruent items discussed above, both congruent semantic-associative trials and their color-neutral counterpart with no facilitatory components still involve task conflict.
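The subtraction logic behind these decompositions can be sketched in a few lines. The function name is ours and the condition means below are hypothetical values chosen only to reproduce the 67/42/25 ms pattern reported by Dalrymple-Alford ( 1972 ):

```python
def decompose_facilitation(rt_neutral_word, rt_sem_assoc_congruent, rt_congruent):
    """Split overall Stroop facilitation into semantic-associative and
    response components by subtraction, following Dalrymple-Alford (1972):
    semantic = neutral - semantic-associative congruent;
    response = (neutral - congruent) - semantic."""
    semantic = rt_neutral_word - rt_sem_assoc_congruent
    total = rt_neutral_word - rt_congruent
    return {"semantic": semantic, "response": total - semantic, "total": total}

parts = decompose_facilitation(rt_neutral_word=700,
                               rt_sem_assoc_congruent=658,
                               rt_congruent=633)
# parts: semantic = 42 ms, response = 25 ms, total = 67 ms
```

As the text notes, these differences are only interpretable against the chosen baseline: swapping the non-color-word neutral baseline for a letter-string baseline can turn apparent facilitation into interference because the two baselines carry different amounts of task conflict.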

These (potentially) isolable forms of facilitation are interesting, require further study, and have the potential to shed light on impairments in selective attention and cognitive control. Of particular interest is how these forms of facilitation are modified by the presence of various levels of task conflict. Nevertheless, as with semantic conflict, it is possible that apparent semantic facilitation effects result from links between the irrelevant dimension and the response set colors (Roelofs, 2003 ) meaning that they are response- and not semantically based effects. Therefore, other approaches are needed to tackle the issue of semantic (vs. response) facilitation. It might be useful to recall at this point that both Roelofs’ ( 2010 ) cross-language findings and the differences in reaction times between congruent and same-response trials (e.g., De Houwer, 2003 ) possibly result from semantic facilitation and so would not be helpful in this regard.

Other evidence relevant to the issue of locus vs. loci of the Stroop effect

Response modes and the loci of the Stroop effect

Responding manually (via keypress) in the Stroop task consistently leads to smaller Stroop effects when compared to responding vocally (saying the name aloud, e.g., Augustinova et al., 2019 ; McClain, 1983 ; Redding & Gerjets, 1977 ; Repovš, 2004 ; Sharma & McKenna, 1998 ). It has been argued that this is because each response type has differential access to the lexicon where interference is proposed to occur (Glaser & Glaser, 1989 ; Kinoshita et al., 2017 ; Sharma & McKenna, 1998 ). Indeed, smaller Stroop effects with manual (as opposed to vocal) responses have been attributed to one of its components (i.e., semantic conflict) being significantly reduced (Brown & Besner, 2001 ; Sharma & McKenna, 1998 ). Therefore, the manipulation of response mode has been used to address the issue of the locus of the Stroop effect.

In response to reports of failing to observe Stroop effects with manual responses (e.g., McClain, 1983 ), Glaser and Glaser ( 1989 ) proposed in their model that manual responses with color patches on the response keys could not produce interference because perception of the color and the response to it were handled by the semantic system with little or no involvement of the lexical system where interference was proposed to occur. However, based on the earlier translation models (e.g., Virzi & Egeth, 1985 ), Sugg and McDonald ( 1994 ) showed that Stroop interference was obtained with manual responses when the response buttons were labeled with written color words instead of colored patches. Sugg and McDonald argued that written label responses must have direct access to the lexical system.

Using written label manual responses, Sharma and McKenna ( 1998 ) tested Glaser and Glaser’s model and showed that response mode matters when considering the types of conflict that participants experience in the Stroop task. They reported that in contrast to vocal responses, manual responses produced no lexico-semantic interference as measured by comparing semantic-associative and non-color-word neutral trials, and by comparing non-response set trials with semantic-associative trials, although they did report a response set effect (response set—non-response set) with both vocal (spoken) and manual responses. Sharma and McKenna interpreted their results as being partially consistent with Glaser and Glaser’s model, suggesting that the types of conflict experienced in the Stroop task are different between response modes. However, Brown and Besner ( 2001 ) later re-analyzed the data from Sharma and McKenna and showed that if one compares not only adjacent conditions (with condition order determined by a priori beliefs about the magnitude of Stroop effects) but also non-adjacent conditions, such as non-response set and non-color-word neutral trials (the non-response set effect), semantic conflict is observed with a manual response.

Roelofs ( 2003 ) has theorized that interference with manual responses only occurs because verbal labels are attached to the response keys; such a position predicts that manual and vocal responses should lead to similar conflict and facilitation effects, but smaller overall effects with manual responses due to the proposed mediated nature of manual Stroop effects. Consistent with this, many studies have since reported robust interference effects including semantic conflict effects with manual responses using colored patch labels (as measured by non-response set—non-color-word neutral, e.g., Hasshim & Parris, 2018 ; or as measured by semantic-associative Stroop trials, e.g., Augustinova et al., 2018a ). Parris et al. ( 2019a , 2019b ), Zahedi et al. ( 2019 ) and Kinoshita et al. ( 2017 ) have reported data indicating that the difference between manual and vocal responses occurs later, in the phonological encoding or articulation planning stage, where vocal responses encourage greater phonological encoding than does the manual response (see Van Voorhis & Dark, 1995 for a similar argument).

Augustinova et al. ( 2019 ) have reported that the difference between manual and vocal responses is largely due to a larger contribution of response conflict with vocal responses. Yet, in addition they also reported a much larger contribution of task conflict with vocal responses. Notably, the contribution of both semantic conflict and semantic facilitation remained roughly the same for the response modes, whereas response facilitation increased dramatically (from a non-significant 7 ms to 39 ms) with vocal responses, indicating that response and semantic forms of facilitation are independent. Therefore, the research to date suggests that there are larger response- and task-based effects with vocal responses. Since negative facilitation was not used as a measure of performance in this study, which has been reported with manual responses (e.g., Goldfarb & Henik, 2007 ), one needs to be careful what conclusions are drawn about task conflict; nevertheless, task conflict does seem to contribute less to Stroop effects with manual responses under common Stroop task conditions in which task conflict control is not manipulated. Importantly, this only applies to response times. As already noted, Hershman and Henik ( 2019 ) reported no task conflict with manual responses but also showed that, in the same participants, pupil size changes revealed task conflict in the form of negative facilitation on the very same trials.

It is important that more research investigates how the make-up of Stroop interference might change with response mode, especially since other response modes such as typing (Logan & Zbrodoff, 1998), oculomotor (Hasshim & Parris, 2015; Hodgson et al., 2009) and mouse (Bundt, Ruitenberg, Abrahamse, & Notebaert, 2018) responses have been utilized. This is especially important given that a lesion to the ACC has been reported to affect manual but not vocal Stroop effects (Turken & Swick, 1999). Until very recently, little consideration had been given to how response mode might affect Stroop facilitation effects (Augustinova et al., 2019), so more research is needed to better understand this influence. Indeed, as noted above, models have proposed either the same or different processes underlying manual and vocal Stroop effects, providing predictions that need to be more fully tested. Aside from issues surrounding the measurement of the varieties of conflict and facilitation that underlie Stroop effects with manual and vocal responses, which limit the conclusions that can be drawn from the work summarized in this section, it is interesting that the way we act on the Stroop stimulus can potentially change how it is processed.

Beyond response selection: Stroop effects on response execution

So far, we have concentrated on Stroop effects that occur before response selection. However, it is also possible that Stroop effects could be observed after (or during) response selection. When addressing questions about the locus of the Stroop effect, some studies have questioned the commonly held assumption that there is modularity between response selection and response execution; that is, they have considered whether interference experienced at the level of response selection spills over into the actual motoric action of the effectors (e.g., the time it takes to articulate the color name) or whether interference is entirely resolved before then. Researchers have considered this possibility with vocal (measuring the time between the production of the first phoneme and the end of the last; Kello et al., 2000), type-written (measuring the time between the pressing of the first letter key and the pressing of the last; Logan & Zbrodoff, 1998), oculomotor (measuring the amplitude (size) of the saccade (eye movement) to the target color patch; Hodgson et al., 2009), and mouse movement (Bundt et al., 2018; Yamamoto, Incera, & McLennan, 2016) responses.

In Hodgson et al.’s (2009) study, participants responded by making an eye movement to one of four color patches located in a plus-sign configuration around the centrally presented Stroop stimulus to indicate its font color. In two experiments, one in which each target color remained in the same location throughout the experiment and one in which the colors occupied a different patch location (still in the plus-sign configuration) on every trial, Stroop interference effects were observed on saccadic latency, but not on saccade amplitude or velocity, indicating that all interference was resolved before the motor movement was made and, therefore, that Stroop interference does not affect response execution. Similar null effects on response execution were reported for type-written responses across four experiments by Logan and Zbrodoff (1998).

Kello et al. (2000) initially also observed no Stroop effects on vocal naming durations (the time it takes to actually vocalize the response). In a follow-up experiment, however, in which they introduced a response deadline of 575 ms, they observed Stroop congruency effects on response durations. The same may well hold for the other response-execution studies mentioned here: effects could emerge under similar time pressure. Indeed, Hodgson et al. pointed out that they could not exclude the possibility that under some circumstances the spatial characteristics of saccades would also show effects on incongruent trials, given previous work showing that increasing the spatial separation between target and distractor stimuli leads to an increase in the effect of the distractor on characteristics of the saccadic response (Findlay, 1982; McSorley et al., 2004; Walker et al., 1997).

Bundt et al. (2018) recently reported a Stroop congruency effect on response execution times in a study requiring participants to use a computer mouse to point to the target patch on the screen. Response targets were all in the upper half of the computer screen, and participants guided the mouse from a start position in the lower half. They observed this effect despite neither separating the target and distractor nor enforcing a response deadline. The configuration differences, the use of mouse-tracking rather than the oculomotor methodology, and the language of the stimuli (Dutch vs. English) might have contributed to producing the different results. Unfortunately, Bundt and colleagues did not employ a neutral trial baseline, so it is not clear whether their effect represents interference, facilitation, or both.

In summary, two studies have reported Stroop effects on response execution; findings that represent a challenge to the currently assumed modularity between response selection and execution. More work is needed to determine what conditions produce Stroop effects on response execution and in which response modalities. Furthermore, it would be interesting for future research to reveal whether semantic and task conflict are registered at this very late stage of selection. For now, this work suggests that even if selection only occurred at the level of response output and not before, it is not always entirely successful, even if the eventual response is correct.

Locus or loci of selection?

In many early considerations of the Stroop effect, a putative explanation was that interference would not occur unless a name had been generated for the irrelevant dimension, and that interference was a form of response conflict due to there being a single response channel (Morton, 1969). Since word reading produces a name more quickly than color naming, it was thought that the word name would sit in the response buffer before the color name arrived and would, thus, have to be expunged before the correct name could be produced. Stroop interference was therefore thought to be a consequence of the time it took to process each of the dimensions.

Treisman (1969) questioned why selective attention did not gate the irrelevant word, concluding that focusing on one dimension whilst excluding the other is impossible, especially when the dimensions are presented simultaneously. Parallel processing of both dimensions would, therefore, occur, and response competition could thus be conceived of as the failure of selective attention to fully focus on the color dimension and gate the input from word processing. Bringing Treisman’s (1969) and Morton’s (1969) positions together, Dyer (1973) proposed that interference results from both a failure in selective attention and a bottleneck at the level of response (at which the word information arrives more quickly). However, the speed-of-processing account has been shown to be unsupported (Glaser & Glaser, 1982; MacLeod & Dunbar, 1988), leaving the failure of attentional selection as the main mechanism leading to Stroop interference.

Whilst it is clear that participants must select a single response in the Stroop task and, thus, that selection occurs at response output, conflict stems from incompatibility between task-relevant and task-irrelevant stimulus features (Egner et al., 2007) and is, thus, stimulus-based. However, even if stimulus incompatibility makes an independent contribution to Stroop interference, it might not have an independent selection mechanism; interference produced at all levels might accumulate and be resolved only later, when a single response has to be selected. One way to investigate whether selection occurs at any level other than response output would be to show successful resolution of conflict in the complete absence of response conflict. The 2:1 color-response mapping paradigm is the closest method yet devised that would permit this, but, as we have explained, it is problematic and, moreover, only addresses the distinction between semantic and response conflict.

There are now accounts of the Stroop task which argue that selection occurs at both early and late stages of processing (Altmann & Davidson, 2001; Kornblum & Lee, 1995; Kornblum et al., 1990; Phaf et al., 1990; Sharma & McKenna, 1998; Zhang & Kornblum, 1998; Zhang et al., 1999). For example, in Kornblum and colleagues’ models, selection occurs independently for stimulus-stimulus (SS) and stimulus-response (SR) conflict. We have provided evidence for multiple levels of processing contributing to Stroop interference, with both stimulus- and response-based contributions. At the level of the stimulus, we have argued that there is good evidence for task conflict. At the level of the response, we have argued that the current methods used to dissociate forms of informational conflict, including phonological, semantic (stimulus) and response conflict, do not permit us to conclude in favor of separate selection mechanisms for each. Moreover, we have discussed evidence that selection at the level of response output is not entirely successful, given that response execution effects have been reported.

Another approach would be to show that the different forms of conflict are independently affected by experimental manipulations. Above, we alluded to Augustinova and colleagues’ research showing that semantic conflict is often preserved in contexts where response conflict is reduced (e.g., Augustinova & Ferrand, 2012), although we discussed the potential limitations of this approach. Taking another example, in an investigation of the response set and non-response set effects, Hasshim and Parris (2018) reported within-subjects experiments in which the trial types (e.g., response set, non-response set, non-color-word neutral) were presented either in separate blocks (pure) or in blocks containing all trial types in a random order (mixed). They observed a decrease in RTs to response set trials in mixed blocks compared with RTs to response set trials in pure blocks. These findings demonstrate that presentation format modulates the magnitude of the response set effect, substantially reducing it when trials are presented in mixed blocks. Importantly for present purposes, the non-response set effect was not affected by the manipulation, suggesting that the response set and non-response set effects are driven by independent mechanisms. However, Hasshim and Parris’s result could also be a consequence of the limited effect of presentation format, simply showing that some conflict is left over; we do not know which type of conflict remains because the measure cannot distinguish them (see also Hershman et al., 2020; Hershman & Henik, 2019, 2020, showing that conflict can be present but not expressed in the RT data). Future research could further investigate the effect of mixing trial types in blocks on the expression of types of conflict and facilitation in both within- and between-subjects designs.

Kinoshita et al. (2018) argued that semantic Stroop interference can be endogenously controlled, evincing independent selection. The authors reported that a high proportion (75%) of non-readable neutral trials (#s) magnified semantic conflict (in the same way this manipulation increases task conflict); conversely, a low proportion of non-readable neutral trials led to reduced semantic conflict. However, since their manipulation was based on the number of non-readable stimuli, Kinoshita et al. (2018) would also have increased task conflict. Neatly, their non-color-related neutral word baseline condition permitted them to show that the semantic component of informational conflict was modulated. Uniquely, they employed both semantic-associative and non-response set trials to measure semantic conflict, perhaps providing converging evidence for a modification of semantic conflict. Problematically, however, they did not include a measure of response conflict, so it is not known whether purported indices of response conflict were affected along with the indices of semantic conflict; thus, their results do not unambiguously represent a modification of semantic conflict. Their study does, however, provide evidence that as task conflict increases, so inevitably does informational conflict, because task conflict is an indication that the word is being processed (assuming a sufficient reading age; see Ferrand et al., 2020).

It is our contention that, despite attempts to show independent control of semantic and response conflict, the published evidence so far does not permit a clear conclusion on the matter because the measures themselves are problematic. Future research could combine the semantic distance manipulation (Klopfer, 1996) with a corollary for responses (see, e.g., Chen & Proctor, 2014; Wühr & Heuer, 2018). For example, effects of the physical (e.g., the word RED in blue, where the red response key is next to the blue key, vs. RED in green, where the green key is further from the red key) and conceptual (e.g., RED in blue, where the red response is indicated by the key labeled ‘5’ and the blue by the key labeled ‘6’) distance between response keys have been reported, whereby the closer the response keys are physically or conceptually, the greater the interference experienced (Chen & Proctor, 2014). Controlling for semantic distance whilst manipulating response distance, and vice versa, might give insight into the contributions of semantic and response conflict to Stroop interference by allowing the independent manipulation of both.

In our opinion, methods addressing task conflict, particularly those demonstrating negative facilitation and its control, provide evidence for a form of conflict that is independent of response conflict. The evidence for an earlier locus (Hershman & Henik, 2019), a distinct developmental trajectory (Ferrand et al., 2020) and independent control (Goldfarb & Henik, 2007; Kalanthroff et al., 2013) supports the notion that task conflict has a different locus and selection mechanism from response conflict. Therefore, any model of Stroop performance that does not account for task conflict does not provide a full account of the factors contributing to Stroop effects. Only one model currently accounts for task conflict (Kalanthroff et al., 2018), although this model employs the PDP connectionist architecture that falls foul of the word frequency findings noted above.

Unambiguous evidence that interference (or facilitation) is observed even in the absence of response competition (or convergence) constitutes a necessary prerequisite for moving beyond the historically favored response locus of Stroop effects. In our opinion, task conflict has been shown to be an independent locus of Stroop interference, but phonological, semantic and response conflict (collectively, informational conflict) have not been shown to be independent forms of conflict. One could argue that models incorporating early selection mechanisms are better supported by the evidence, at least in their ability to represent the multiple levels at which selection might occur, if not necessarily where that selection occurs, since these models do not account for task conflict. Moreover, no extant model can currently predict the interference observed at the level of response execution, and only one model seems able to account for differences in the magnitude of Stroop effects as a function of response mode (Roelofs, 2003).

In short, if the conclusions drawn here are accepted, models of Stroop task performance will have to be modified so that they can more effectively account for multiple loci of both Stroop interference and facilitation. This also applies to the implementations of the Stroop task currently used in neuropsychological practice (e.g., Strauss et al., 2007) and in basic and applied research. As discussed by Ferrand and colleagues (2020), the extra sensitivity of such a Stroop test (stemming from the ability to detect and rate each of these components separately) would provide clinical practitioners with invaluable information, since the different forms of conflict are possibly detected and resolved by different neural regions. In sum, this review also calls for changes in Stroop research practices in basic, applied and clinical research.

Availability of data and material

Not applicable.

Algom, D., & Chajut, E. (2019). Reclaiming the Stroop effect back from control to input-driven attention and perception. Frontiers in Psychology, 10 , 1683. https://doi.org/10.3389/fpsyg.2019.01683

Algom, D., Chajut, E., & Lev, S. (2004). A rational look at the emotional Stroop phenomenon: A generic slowdown, not a Stroop effect. Journal of Experimental Psychology: General, 133(3), 323–338.

Algom, D., & Fitousi, D. (2016). Half a century of research on Garner interference and the separability–integrality distinction. Psychological Bulletin, 142 (12), 1352–1383.

Altmann, E. M. & Davidson, D. J. (2001). An integrative approach to Stroop: Combining a language model and a unified cognitive theory. In J. D. Moore & K. Stenning (Eds.), Proceedings of the 23rd Annual Conference of the Cognitive Science Society (pp. 21–26). Hillsdale, NJ: Laurence Erlbaum.

Augustinova, M., Clarys, D., Spatola, N., & Ferrand, L. (2018b). Some further clarifications on age-related differences in Stroop interference. Psychonomic Bulletin & Review, 25 , 767–774.

Augustinova, M., & Ferrand, L. (2007). Influence de la présentation bicolore des mots sur l’effet Stroop [First letter coloring and the Stroop effect]. L’Année Psychologique, 107, 163–179.

Augustinova, M., & Ferrand, L. (2012). Suggestion does not de-automatize word reading: Evidence from the semantically based Stroop task. Psychonomic Bulletin & Review, 19 (3), 521–527.

Augustinova, M., & Ferrand, L. (2014). Automaticity of word reading: Evidence from the semantic Stroop paradigm. Current Directions in Psychological Science, 23(5), 343–348.

Augustinova, M., Flaudias, V., & Ferrand, L. (2010). Single-letter coloring and spatial cueing do not eliminate or reduce a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review, 17, 827–833.

Augustinova, M., Parris, B. A., & Ferrand, L. (2019). The loci of Stroop interference and facilitation effects with manual and vocal responses. Frontiers in Psychology, 10 , 1786.

Augustinova, M., Silvert, L., Ferrand, L., Llorca, P. M., & Flaudias, V. (2015). Behavioral and electrophysiological investigation of semantic and response conflict in the Stroop task. Psychonomic Bulletin & Review, 22 , 543–549.

Augustinova, M., Silvert, S., Spatola, N., & Ferrand, L. (2018a). Further investigation of distinct components of Stroop interference and of their reduction by short response stimulus intervals. Acta Psychologica, 189 , 54–62.

Barkley, R. A. (1997). Behavioral inhibition, sustained attention, and executive functions: Constructing a unifying theory of ADHD. Psychological Bulletin, 121 (1), 65.

Bench, C. J., Frith, C. D., Grasby, P. M., Friston, K. J., Paulesu, E., Frackowiak, R. S. J., & Dolan, R. J. (1993). Investigations of the functional anatomy of attention using the Stroop test. Neuropsychologia, 31 (9), 907–922.

Berggren, N., & Derakshan, N. (2014). Inhibitory deficits in trait anxiety: Increased stimulus-based or response-based interference? Psychonomic Bulletin & Review, 21 (5), 1339–1345.

Besner, D., Stolz, J. A., & Boutilier, C. (1997). The Stroop effect and the myth of automaticity. Psychonomic Bulletin & Review, 4(2), 221–225. https://doi.org/10.3758/BF03209396

Besner, D., & Stolz, J. A. (1998). Unintentional reading: Can phonological computation be controlled? Canadian Journal of Experimental Psychology-Revue Canadienne De Psychologie Experimentale, 52 (1), 35–43.

Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict monitoring and cognitive control. Psychological Review, 108 (3), 624–652.

Braem, S., Bugg, J. M., Schmidt, J. R., Crump, M. J., Weissman, D. H., Notebaert, W., & Egner, T. (2019). Measuring adaptive control in conflict tasks. Trends in Cognitive Sciences., 23 (9), 769–783.

Braver, T. S. (2012). The variable nature of cognitive control: A dual mechanisms framework. Trends in Cognitive Sciences, 16 (2), 106–113.

Brown, M., & Besner, D. (2001). On a variant of Stroop’s paradigm: Which cognitions press your buttons? Memory & Cognition, 29 (6), 903–904.

Brown, T. L. (2011). The relationship between Stroop interference and facilitation effects: Statistical artifacts, baselines, and a reassessment. Journal of Experimental Psychology: Human Perception and Performance, 37 (1), 85–99.

Brown, T. L., Gore, C. L., & Pearson, T. (1998). Visual half-field Stroop effects with spatial separation of word and color targets. Brain and Language, 63 (1), 122–142.

Bugg, J. M., & Crump, M. J. C. (2012). In support of a distinction between voluntary and stimulus-driven control: A review of the literature on proportion congruent effects. Frontiers in Psychology, 3 , 367.

Bundt, C., Ruitenberg, M. F., Abrahamse, E. L., & Notebaert, W. (2018). Early and late indications of item-specific control in a Stroop mouse tracking study. PLoS ONE, 13(5), e0197278.

Burt, J. S. (1994). Identity primes produce facilitation in a colour naming task. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 47 (A), 957–1000.

Burt, J. S. (1999). Associative priming in color naming: Interference and facilitation. Memory & Cognition, 27(3), 454–464.

Burt, J. S. (2002). Why do non-colour words interfere with colour naming? Journal of Experimental Psychology-Human Perception and Performance, 28 (5), 1019–1038.

Chen, A., Bailey, K., Tiernan, B. N., & West, R. (2011). Neural correlates of stimulus and response interference in a 2–1 mapping Stroop task. International Journal of Psychophysiology, 80 (2), 129–138.

Chen, A., Tang, D., & Chen, X. (2013b). Training reveals the sources of Stroop and Flanker interference effects. PLoS ONE, 8 (10), e76580. https://doi.org/10.1371/journal.pone.0076580

Chen, J., & Proctor, R. W. (2014). Conceptual response distance and intervening keys distinguish action goals in the Stroop color-identification task. Psychonomic Bulletin & Review, 21(5), 1238–1243.

Chen, Z., Lei, X., Ding, C., Li, H., & Chen, A. (2013a). The neural mechanisms of semantic and response conflicts: An fMRI study of practice-related effects in the Stroop task. NeuroImage, 66 , 577–584.

Chuderski, A., & Smolen, T. (2016). An integrated utility-based model of conflict evaluation and resolution in the Stroop task. Psychological Review, 123 (3), 255–290.

Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97 (3), 332.

Coltheart, M., Woollams, A., Kinoshita, S., & Perry, C. (1999). A position-sensitive Stroop effect: Further evidence for a left-to-right component in print-to-speech conversion. Psychonomic Bulletin & Review, 6 (3), 456–463.

Dalrymple-Alford, E. C. (1972). Associative facilitation and interference in the Stroop color-word task. Perception & Psychophysics, 11 (4), 274–276.

Dalrymple-Alford, E. C., & Budayr, B. (1966). Examination of some aspects of the Stroop color-word test. Perceptual and Motor Skills, 23 , 1211–1214.

De Fockert, J. W. (2013). Beyond perceptual load and dilution: A review of the role of working memory in selective attention. Frontiers in Psychology, 4 , 287.

De Houwer, J. (2003). On the role of stimulus-response and stimulus-stimulus compatibility in the Stroop effect. Memory & Cognition, 31 (3), 353–359.

Dennis, I., & Newstead, S. E. (1981). Is phonological recoding under strategic control? Memory & Cognition, 9 (5), 472–477.

Dishon-Berkovits, M., & Algom, D. (2000). The Stroop effect: It is not the robust phenomenon that you have thought it to be. Memory & Cognition, 28, 1437–1449.

Dyer, F. N. (1973). The Stroop phenomenon and its use in the study of perceptual, cognitive and response processes. Memory & Cognition, 1 (2), 106–120.

Egner, T., Delano, M., & Hirsch, J. (2007). Separate conflict-specific cognitive control mechanisms in the human brain. NeuroImage, 35 (2), 940–948.

Egner, T., Ely, S., & Grinband, J. (2010). Going, going, gone: Characterising the time-course of congruency sequence effects. Frontiers in Psychology, 1 , 154.

Entel, O., & Tzelgov, J. (2018). Focussing on task conflict in the Stroop effect. Psychological Research Psychologische Forschung, 82 (2), 284–295.

Entel, O., Tzelgov, J., Bereby-Meyer, Y., & Shahar, N. (2015). Exploring relations between task conflict and informational conflict in the Stroop task. Psychological Research Psychologische Forschung, 79 , 913–927.

Ferrand, L., & Augustinova, M. (2014). Differential effects of viewing positions on standard versus semantic Stroop interference. Psychonomic Bulletin & Review, 21 (2), 425–431.

Ferrand, L., Ducrot, S., Chausse, P., Maïonchi-Pino, N., O’Connor, R. J., Parris, B. A., Perret, P., Riggs, K. J., & Augustinova, M. (2020). Stroop interference is a composite phenomenon: Evidence from distinct developmental trajectories of its components. Developmental Science, 23 (2), e12899.

Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22 (8), 1033–1045.

Fox, L. A., Schor, R. E., & Steinman, R. J. (1971). Semantic gradients and interference in color, spatial direction, and numerosity. Journal of Experimental Psychology, 91 (1), 59–65.

Gazzaniga, M. S., Ivry, R., & Mangun, G. R. (2013). Cognitive neuroscience: The biology of mind (4th ed.). Norton.

Gerhand, S., & Barry, C. (1998). Word frequency effects in oral reading are not merely age-of-acquisition effects in disguise. Journal of Experimental Psychology: Learning, Memory and Cognition, 24, 267–283.

Gerhand, S., & Barry, C. (1999). Age of acquisition, word frequency, and the role of phonology in the lexical decision task. Memory & Cognition, 27(4), 592–602.

Glaser, W. R., & Glaser, M. O. (1989). Context effects in Stroop-like word and picture processing. Journal of Experimental Psychology: General, 118(1), 13–42.

Goldfarb, L., & Henik, A. (2006). New data analysis of the Stroop matching task calls for a reevaluation of theory. Psychological Science, 17 (2), 96–100.

Goldfarb, L., & Henik, A. (2007). Evidence for task conflict in the Stroop effect. Journal of Experimental Psychology: Human Perception and Performance, 33 (5), 1170–1176.

Gonthier, C., Braver, T. S., & Bugg, J. M. (2016). Dissociating proactive and reactive control in the Stroop task. Memory & Cognition, 44(5), 778–788.

Hasshim, N., Bate, S., Downes, M., & Parris, B. A. (2019). Response and semantic Stroop effects in mixed and pure blocks contexts: An ex-Gaussian analysis. Experimental Psychology, 66 (3), 231–238.

Hasshim, N., & Parris, B. A. (2014). Two-to-one color-response mapping and the presence of semantic conflict in the Stroop task. Frontiers in Psychology, 5 , 1157.

Hasshim, N., & Parris, B. A. (2015). Assessing stimulus-stimulus (semantic) conflict in the Stroop task using saccadic two-to-one colour response mapping and preresponse pupillary measures. Attention, Perception and Psychophysics, 77 , 2601–2610.

Hasshim, N., & Parris, B. A. (2018). Trial type mixing substantially reduces the response set effect in the Stroop task. Acta Psychologica, 189 , 43–53.

Heathcote, A., Popiel, S. J., & Mewhort, D. J. K. (1991). Analysis of response time distributions: An example using the Stroop task. Psychological Bulletin, 109 , 340–347.

Henik, A., & Salo, R. (2004). Schizophrenia and the Stroop effect. Behavioral and Cognitive Neuroscience Reviews, 3(1), 42–59.

Hershman, R., & Henik, A. (2019). Dissociation between reaction time and pupil dilation in the Stroop task. Journal of Experimental Psychology: Learning, Memory and Cognition, 45 (10), 1899–1909.

Hershman, R., & Henik, A. (2020). Pupillometric contributions to deciphering Stroop conflicts. Memory & Cognition, 48 (2), 325–333.

Hershman, R., Levin, Y., Tzelgov, J., & Henik, A. (2020). Neutral stimuli and pupillometric task conflict. Psychological Research Psychologische Forschung . https://doi.org/10.1007/s00426-020-01311-6

Hock, H. S., & Egeth, H. (1970). Verbal interference with encoding in a perceptual classification task.  Journal of Experimental Psychology, 83 (2, Pt.1), 299–303.

Hodgson, T. L., Parris, B. A., Gregory, N. J., & Jarvis, T. (2009). The saccadic Stroop effect: Evidence for involuntary programming of eye movements by linguistic cues. Vision Research, 49 (5), 569–574.

Jackson, J. D., & Balota, D. A. (2013). Age-related changes in attentional selection: Quality of task set or degradation of task set across time? Psychology and Aging , 28 (3), 744– 753. https://doi.org/10.1037/a0033159

Jiang, J., Zhang, Q., & van Gaal, S. (2015). Conflict awareness dissociates theta-band neural dynamics of the medial frontal and lateral frontal cortex during trial-by-trial cognitive control. NeuroImage, 116 , 102–111.

Jonides, J., & Mack, R. (1984). On the cost and benefit of cost and benefit. Psychological Bulletin, 96(1), 29–44.

Kahneman, D., & Chajczyk, D. (1983). Tests of automaticity of reading: Dilution of Stroop effects by color-irrelevant stimuli. Journal of Experimental Psychology: Human Perception and Performance, 9 (4), 497–509.

Kalanthroff, E., Goldfarb, L., Usher, M., & Henik, A. (2013). Stop interfering: Stroop task conflict independence from informational conflict and interference. Quarterly Journal of Experimental Psychology, 66, 1356–1367. https://doi.org/10.1080/17470218.2012.741606

Kalanthroff, E., Avnit, A., Henik, A., Davelaar, E., & Usher, M. (2015). Stroop proactive control and task conflict are modulated by concurrent working memory load. Psychonomic Bulletin and Review, 22 (3), 869–875.

Kalanthroff, E., Davelaar, E., Henik, A., Goldfarb, L., & Usher, M. (2018). Task conflict and proactive control: A computational theory of the Stroop task. Psychological Review, 125 (1), 59–82.

Kane, M. J., & Engle, R. W. (2003). Working-memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. Journal of Experimental Psychology: General, 132 (1), 47–70.

Kello, C. T., Plaut, D. C., & MacWhinney, B. (2000). The task-dependence of staged versus cascaded processing: An empirical and computational study of Stroop interference in speech production. Journal of Experimental Psychology: General, 129 (3), 340–360.

Kim, M.-S., Min, S.-J., Kim, K., & Won, B.-Y. (2006). Concurrent working memory load can reduce distraction: An fMRI study [Abstract]. Journal of Vision, 6(6), 125a. https://doi.org/10.1167/6.6.125

Kim, S.-Y., Kim, M.-S., & Chun, M. M. (2005). Concurrent working memory load can reduce distraction. Proceedings of the National Academy of Sciences, 102 (45), 16524–16529.

Kinoshita, S., De Wit, B., & Norris, D. (2017). The magic of words reconsidered: Investigating the automaticity of reading color-neutral words in the Stroop task. Journal of Experimental Psychology: Learning Memory and Cognition, 43 (3), 369–384.

Kinoshita, S., Mills, L., & Norris, D. (2018). The semantic Stroop effect is controlled by endogenous attention. Journal of Experimental Psychology: Learning, Memory and Cognition. https://doi.org/10.1037/xlm0000552

Klein, G. S. (1964). Semantic power measured through the interference of words with color-naming. The American Journal of Psychology, 77 (4), 576–588.

Klopfer, D. S. (1996). Stroop interference and color-word similarity. Psychological Science, 7 (3), 150–157.

Kornblum, S., Hasbroucq, T., & Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus-response compatibility–a model and taxonomy. Psychological Review, 97 (2), 253–270.

Kornblum, S., & Lee, J. W. (1995). Stimulus-response compatibility with relevant and irrelevant stimulus dimensions that do and do not overlap with the response. Journal of Experimental Psychology: Human Perception and Performance, 21 (4), 855–875.

La Heij, W., van der Heijden, A. H. C., & Schreuder, R. (1985). Semantic priming and Stroop-like interference in word-naming tasks. Journal of Experimental Psychology: Human Perception and Performance, 11, 60–82.

Laeng, B., Torstein, L., & Brennan, T. (2005). Reduced Stroop interference for opponent colours may be due to input factors: Evidence from individual differences and a neural network simulation. Journal of Experimental Psychology: Human Perception and Performance, 31 (3), 438–452.

Lakhzoum, D. (2017). Dissociating semantic and response conflicts in the Stroop task: evidence from a response-stimulus interval effect in a two-to-one paradigm. Master’s thesis in partial fulfilment of the requirements for the research Master’s degree in Psychology. Faculty of Psychology, Social Sciences and Education Science Clermont-Ferrand.

Lamers, M. J., Roelofs, A., & Rabeling-Keus, I. M. (2010). Selection attention and response set in the Stroop task. Memory & Cognition, 38 (7), 893–904.

Leung, H.-C., Skudlarski, P., Gatenby, J. C., Peterson, B. S., & Gore, J. C. (2000). An event-related functional MRI study of the Stroop color word interference task. Cerebral Cortex, 10 (6), 552–560.

Levin, Y., & Tzelgov, J. (2016). What Klein’s “semantic gradient” does and does not really show: Decomposing Stroop interference into task and informational conflict components. Frontiers in Psychology, 7, 249.


Littman, R., Keha, E., & Kalanthroff, E. (2019). Task conflict and task control: A mini-review. Frontiers in Psychology, 10 , 1598.

Logan, G. D., & Zbrodoff, N. J. (1979). When it helps to be misled: Facilitative effects of increasing the frequency of conflicting stimuli in a Stroop-like task. Memory and Cognition, 7 , 166–174.

Logan, G. D., & Zbrodoff, N. J. (1998). Stroop-type interference: Congruity effects in colour naming with typewritten responses. Journal of Experimental Psychology-Human Perception and Performance, 24 (3), 978–992.

Lorentz, E., McKibben, T., Ekstrand, C., Gould, L., Anton, K., & Borowsky, R. (2016). Disentangling genuine semantic Stroop effects in reading from contingency effects: On the need for two neutral baselines. Frontiers in Psychology, 7 , 386.

Luo, C. R. (1999). Semantic competition as the basis of Stroop interference: Evidence from Color-Word matching tasks. Psychological Science, 10 (1), 35–40.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109 (2), 163–203.

MacLeod, C. M. (1992). The Stroop task: The “gold standard” of attentional measures. Journal of Experimental Psychology: General, 121 (1), 12–14.

MacLeod, C. M., & Dunbar, K. (1988). Training and Stroop-like interference: Evidence for a continuum of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14 (1), 126–135.

MacLeod, C. M., & MacDonald, P. A. (2000). Interdimensional interference in the Stroop effect: Uncovering the cognitive and neural anatomy of attention. Trends in Cognitive Sciences, 4 (10), 383–391.

Mahon, B. Z., Garcea, F. E., & Navarrete, E. (2012). Picture-word interference and the Response-Exclusion Hypothesis: A response to Mulatti and Coltheart. Cortex, 48 , 373–377.

Manwell, L. A., Roberts, M. A., & Besner, D. (2004). Single letter colouring and spatial cuing eliminates a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review, 11 (3), 458–462.

Marmurek, H. H. C., Proctor, C., & Javor, A. (2006). Stroop-like serial position effects in color naming of words and nonwords. Experimental Psychology, 53 (2), 105–110.

Mathews, A., & MacLeod, C. (1985). Selective processing of threat cues in anxiety states. Behaviour Research and Therapy, 23 (5), 563–569.

Maurer, U., Brem, S., Bucher, K., & Brandeis, D. (2005). Emerging neurophysiological specialization for letter strings. Journal of Cognitive Neuroscience, 17 (10), 1532–1552.

McClain, L. (1983). Effects of response type and set size on Stroop color-word performance. Perceptual & Motor Skills, 56 , 735–743.

McSorley, E., Haggard, P., & Walker, R. (2004). Distractor modulation of saccade trajectories: Spatial separation and symmetry effects. Experimental Brain Research, 155 , 320–333.

Melara, R. D., & Algom, D. (2003). Driven by information: A tectonic theory of Stroop effects. Psychological Review, 110 (3), 422–471.

Melara, R. D., & Mounts, J. R. W. (1993). Selective attention to Stroop dimension: Effects of baseline discriminability, response mode, and practice. Memory & Cognition , 21 , 627–645.

Monahan, J. S. (2001). Coloring single Stroop elements: Reducing automaticity or slowing color processing? The Journal of General Psychology, 128 (1), 98–112.

Monsell, S., Doyle, M. C., & Haggard, P. N. (1989). Effects of frequency on visual word recognition tasks: Where are they? Journal of Experimental Psychology: General, 118, 43–71.

Monsell, S., Taylor, T. J., & Murphy, K. (2001). Naming the colour of a word: Is it responses or task sets that compete? Memory & Cognition, 29 (1), 137–151.

Morton, J. (1969). Categories of interference: Verbal mediation and conflict in card sorting. British Journal of Psychology, 60 (3), 329–346.

Navarrete, E., Sessa, P., Peressotti, F., & Dell’Acqua, R. (2015). The distractor frequency effect in the colour-naming Stroop task: An overt naming event-related potential study. Journal of Cognitive Psychology, 27 (3), 277–289.

Neely, J. H., & Kahan, T. A. (2001). Is semantic activation automatic? A critical re-evaluation. In H.L. Roediger, J.S. Nairne, I. Neath, & A.M. Surprenant (Eds.), The Nature of Remembering: Essays in Honor of Robert G. Crowder (pp. 69–93). Washington, DC: American Psychological Association.

Neumann, O. (1980). Selection of information and control of action. Unpublished doctoral dissertation, University of Bochum, Bochum, Germany.

Parris, B. A. (2014). Task conflict in the Stroop task: When Stroop interference decreases as Stroop facilitation increases in a low task conflict context. Frontiers in Psychology, 5 , 1182.

Parris, B. A., Sharma, D., & Weekes, B. (2007). An optimal viewing position effect in the Stroop task when only one letter is the color carrier. Experimental Psychology, 54 (4), 273–280. https://doi.org/10.1027/1618-3169.54.4.273

Parris, B. A., Augustinova, M., & Ferrand, L. (2019a). Editorial: The locus of the Stroop effect. Frontiers in Psychology . https://doi.org/10.3389/fpsyg.2019.02860

Parris, B. A., Sharma, D., Weekes, B. S. H., Momenian, M., Augustinova, M., & Ferrand, L. (2019b). Response modality and the Stroop task: Are there phonological Stroop effects with manual responses? Experimental Psychology, 66 (5), 361–367.

Parris, B. A., Wadsley, M. G., Hasshim, N., Benattayallah, A., Augustinova, M., & Ferrand, L. (2019c). An fMRI study of Response and Semantic conflict in the Stroop task. Frontiers in Psychology, 10 , 2426.

Phaf, R. H., Van Der Heijden, A. H. C., & Hudson, P. T. W. (1990). SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology, 22 , 273–341.

Redding, G. M., & Gerjets, D. A. (1977). Stroop effects: Interference and facilitation with verbal and manual responses. Perceptual & Motor Skills, 45 , 11–17.

Regan, J. E. (1979). Automatic processing (Doctoral dissertation, University of California, Berkeley, 1977). Dissertation Abstracts International, 39, 1018-B.

Repovš, G. (2004). The mode of response and the Stroop effect: A reaction time analysis. Horizons of Psychology, 13 , 105–114.

Risko, E. F., Schmidt, J. R., & Besner, D. (2006). Filling a gap in the semantic gradient: Color associates and response set effects in the Stroop task. Psychonomic Bulletin & Review, 13 (2), 310–315.

Roelofs, A. (2003). Goal-referenced selection of verbal action: Modeling attentional control in the Stroop task. Psychological Review, 110 (1), 88–125.

Roelofs, A. (2010). Attention and Facilitation: Converging information versus inadvertent reading in Stroop task performance. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36 , 411–422.

Scheibe, K. E., Shaver, P. R., & Carrier, S. C. (1967). Color association values and response interference on variants of the Stroop test. Acta Psychologica, 26 , 286–295.

Schmidt, J. R. (2019). Evidence against conflict monitoring and adaptation: An updated review. Psychonomic Bulletin and Review, 26 (3), 753–771.

Schmidt, J. R., & Besner, D. (2008). The Stroop effect: Why proportion congruent has nothing to do with congruency and everything to do with contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34 (3), 514–523.

Schmidt, J. R., & Cheesman, J. (2005). Dissociating stimulus-stimulus and response-response effects in the Stroop task. Canadian Journal of Experimental Psychology, 59 (2), 132–138.

Schmidt, J. R., Hartsuiker, R. J., & De Houwer, J. (2018). Interference in Dutch-French bilinguals: Stimulus and response conflict in intra- and interlingual Stroop. Experimental Psychology, 65 (1), 13–22.

Schmidt, J. R., Notebaert, W., & Van den Bussche, E. (2015). Is conflict adaptation an illusion? Frontiers in Psychology, 6, 172.

Selimbegovič, L., Juneau, C., Ferrand, L., Spatola, N., & Augustinova, M. (2019). The Impact of Exposure to Unrealistically High Beauty standards on inhibitory control. L’année Psychologique/topics in Cognitive Psychology, 119 , 473–493.

Seymour, P. H. K. (1977). Conceptual encoding and locus of the Stroop effect. Quarterly Journal of Experimental Psychology, 29 (2), 245–265.

Shallice, T. (1988). From neuropsychology to mental structure. Cambridge: Cambridge University Press.

Sharma, D., & McKenna, F. P. (1998). Differential components of the manual and vocal Stroop tasks. Memory & Cognition, 26 (5), 1033–1040.

Shichel, I., & Tzelgov, J. (2018). Modulation of conflicts in the Stroop effect. Acta Psychologica, 189 , 93–102.

Singer, M. H., Lappin, J. S., & Moore, L. P. (1975). The interference of various word parts on colour naming in the Stroop test. Perception & Psychophysics, 18 (3), 191–193.

Spieler, D. H., Balota, D. A., & Faust, M. E. (1996). Stroop performance in healthy younger and older adults and in individuals with dementia of the Alzheimer’s type. Journal of Experimental Psychology: Human Perception and Performance, 22 (2), 461.

Steinhauser, M., & Hübner, R. (2009). Distinguishing response conflict and task conflict in the Stroop task: Evidence from ex-Gaussian distribution analysis. Journal of Experimental Psychology: Human Perception and Performance, 35 (5), 1398–1412.

Stirling, N. (1979). Stroop interference: An input and an output phenomenon. The Quarterly Journal of Experimental Psychology, 31 (1), 121–132.

Strauss, E., Sherman, E., & Spreen, O. (2007). A compendium of neuropsychological tests: Administration, Norms and Commentary (3rd ed.). Oxford University Press.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18 (6), 643–662.

Sugg, M. J., & McDonald, J. E. (1994). Time course of inhibition in color-response and word-response versions of the Stroop task. Journal of Experimental Psychology: Human Perception and Performance, 20 (3), 647–675.

Treisman, A. M. (1969). Strategies and models of selective attention. Psychological Review, 76 (3), 282–299.

Tsal, Y., & Benoni, H. (2010). Diluting the burden of load: Perceptual load effects are simply dilution effects. Journal of Experimental Psychology: Human Perception and Performance, 36 (6), 1645–1656.

Turken, A. U., & Swick, D. (1999). Response selection in the human anterior cingulate cortex. Nature Neuroscience, 2 , 920–924.

Tzelgov, J., Henik, A., Sneg, R., & Baruch, O. (1996). Unintentional word reading via the phonological route: The Stroop effect with cross-script homophones. Journal of Experimental Psychology: Learning, Memory and Cognition, 22 (2), 336–349.

Van Veen, V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27 (3), 497–504.

Van Voorhis, B. A., & Dark, V. J. (1995). Semantic matching, response mode, and response mapping as contributors to retroactive and proactive priming. Journal of Experimental Psychology: Learning, Memory and Cognition, 21 , 913–932.

Virzi, R. A., & Egeth, H. E. (1985). Toward a Translational Model of Stroop Interference. Memory & Cognition, 13 (4), 304–319.

Walker, R., Deubel, H., Schneider, W., & Findlay, J. (1997). Effect of remote distractors on saccade programming: Evidence for an extended fixation zone. Journal of Neurophysiology, 78 , 1108–1119.

Wheeler, D. D. (1977). Locus of interference on the Stroop test. Perceptual and Motor Skills, 45 , 263–266.

White, D., Risko, E. F., & Besner, D. (2016). The semantic Stroop effect: An ex-Gaussian analysis. Psychonomic Bulletin & Review, 23 (5), 1576–1581.

Wühr, P., & Heuer, H. (2018). The impact of anatomical and spatial distance between responses on response conflict. Memory and Cognition, 46 , 994–1009.

Yamamoto, N., Incera, S., & McLennan, C. T. (2016). A reverse Stroop task with mouse tracking. Frontiers in Psychology, 7, 670.

Zahedi, A., Rahman, R. A., Stürmer, B., & Sommer, W. (2019). Common and specific loci of Stroop effects in vocal and manual tasks, revealed by event-related brain potentials and post-hypnotic suggestions. Journal of Experimental Psychology: General. Advance online publication. https://doi.org/10.1037/xge0000574

Zhang, H., & Kornblum, S. (1998). The effects of stimulus–response mapping and irrelevant stimulus–response and stimulus–stimulus overlap in four-choice Stroop tasks with single-carrier stimuli. Journal of Experimental Psychology: Human Perception and Performance, 24 (1), 3–19.

Zhang, H. H., Zhang, J., & Kornblum, S. (1999). A parallel distributed processing model of stimulus–stimulus and stimulus–response compatibility. Cognitive Psychology, 38 (3), 386–432.


The work reported was supported in part by ANR Grant ANR-19-CE28-0013 and RIN Tremplin Grant 19E00851 of Normandie Région, France.

Author information

Authors and Affiliations

Department of Psychology, Faculty of Science and Technology, Bournemouth University, Talbot Campus, Poole, Fern Barrow, BH12 5BB, UK

Benjamin A. Parris, Nabil Hasshim & Michael Wadsley

School of Psychology, University College Dublin, Dublin, Ireland

Nabil Hasshim

Normandie Université, UNIROUEN, CRFDP, 76000, Rouen, France

Maria Augustinova

Université Clermont Auvergne, CNRS, LAPSCO, 63000, Clermont-Ferrand, France

Ludovic Ferrand

School of Applied Social Sciences, De Montfort University, Leicester, UK

Nabil Hasshim

Corresponding author

Correspondence to Benjamin A. Parris.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Parris, B.A., Hasshim, N., Wadsley, M. et al. The loci of Stroop effects: a critical review of methods and evidence for levels of processing contributing to color-word Stroop effects and the implications for the loci of attentional selection. Psychological Research 86 , 1029–1053 (2022). https://doi.org/10.1007/s00426-021-01554-x


Received: 10 July 2020

Accepted: 27 June 2021

Published: 13 August 2021

Issue Date: June 2022

DOI: https://doi.org/10.1007/s00426-021-01554-x



REVIEW article

The Stroop Color and Word Test

Federica Scarpina*

  • 1 “Rita Levi Montalcini” Department of Neuroscience, University of Turin, Turin, Italy
  • 2 IRCCS Istituto Auxologico Italiano, Ospedale San Giuseppe, Piancavallo, Italy
  • 3 CiMeC Center for the Mind/Brain Sciences, University of Trento, Rovereto, Italy

The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used to assess the ability to inhibit cognitive interference, which occurs when the processing of one stimulus feature impedes the simultaneous processing of a second stimulus attribute, a phenomenon well known as the Stroop effect. The aim of the present work is to verify the theoretical adequacy of the various scoring methods used to measure the Stroop effect. We present a systematic review of studies that have provided normative data for the SCWT, identified through both electronic databases (i.e., PubMed, Scopus, Google Scholar) and citations. Our findings show that while several scoring methods have been reported in the literature, none of the reviewed methods enables the Stroop effect to be fully assessed. Furthermore, we discuss the normative scoring methods reported in the Italian literature. We call for an alternative scoring method that takes into consideration both the speed and the accuracy of the response. Finally, we underline the importance of assessing performance in all Stroop Test conditions (word reading, color naming, and color-word naming).

Introduction

The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used for both experimental and clinical purposes. It assesses the ability to inhibit cognitive interference, which occurs when the processing of a stimulus feature affects the simultaneous processing of another attribute of the same stimulus ( Stroop, 1935 ). In the most common version of the SCWT, originally proposed by Stroop in 1935, subjects are required to read three different tables as fast as possible. Two of them represent the “congruous condition,” in which participants are required to read names of colors (henceforth referred to as color-words) printed in black ink (W) and to name different color patches (C). Conversely, in the third table, the named color-word (CW) condition, color-words are printed in an inconsistent ink color (for instance, the word “red” printed in green ink). Thus, in this incongruent condition, participants are required to name the ink color instead of reading the word. In other words, the participants must perform a less automated task (i.e., naming the ink color) while inhibiting the interference arising from a more automated task (i.e., reading the word; MacLeod and Dunbar, 1988 ; Ivnik et al., 1996 ). This difficulty in inhibiting the more automated process is called the Stroop effect ( Stroop, 1935 ). While the SCWT is widely used to measure the ability to inhibit cognitive interference, previous literature also reports its application to measure other cognitive functions such as attention, processing speed, cognitive flexibility ( Jensen and Rohwer, 1966 ), and working memory ( Kane and Engle, 2003 ). Thus, the SCWT may be used to measure multiple cognitive functions.
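To make the three conditions concrete, the stimulus structure described above can be expressed in a few lines of code. This is a hypothetical illustration only: the five-color set and the field names are our own choices, not part of any standardized SCWT version.

```python
import random

# Illustrative color set (assumption; versions of the SCWT differ in the colors used).
COLORS = ["red", "green", "blue", "brown", "purple"]

def make_trial(condition):
    """Build one SCWT item for the W, C, or CW condition."""
    word = random.choice(COLORS)
    if condition == "W":   # color-word printed in black ink: the task is to read the word
        return {"word": word, "ink": "black", "correct_response": word}
    if condition == "C":   # color patch with no word: the task is to name the color
        ink = random.choice(COLORS)
        return {"word": None, "ink": ink, "correct_response": ink}
    if condition == "CW":  # incongruent color-word: name the ink, ignore the word
        ink = random.choice([c for c in COLORS if c != word])
        return {"word": word, "ink": ink, "correct_response": ink}
    raise ValueError(condition)

trial = make_trial("CW")
assert trial["word"] != trial["ink"]            # incongruent by construction
assert trial["correct_response"] == trial["ink"]
```

Note that the correct response switches from the word dimension (W) to the ink dimension (C and CW), which is exactly what makes the CW condition an inhibition task.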

In the present article, we present a systematic review of the SCWT literature in order to assess the theoretical adequacy of the different scoring methods proposed to measure the Stroop effect ( Stroop, 1935 ). We focus on the Italian literature, which reports the use of several versions of the SCWT that vary in terms of stimuli, administration protocol, and scoring methods. Finally, we attempt to identify a scoring method that allows the ability to inhibit cognitive interference to be measured from subjects' performance on the SCWT.

We looked for normative studies of the SCWT. All studies included a healthy adult population. Since our aim was to understand the various available scoring methods, no studies were excluded on the basis of age, gender, and/or education of participants, or of the specific version of the SCWT used (e.g., short or long, computerized or paper). Studies were identified using electronic databases and citations from a selection of relevant articles. The electronic databases searched were PubMed (all years), Scopus (all years), and Google Scholar (all years). The last search was run on 22 February 2017, using the following search terms: “Stroop; test; normative.” All studies written in English or Italian were included.

Two independent reviewers screened the papers according to their titles and abstracts; no disagreements about the suitability of the studies were recorded. Thereafter, a summary chart was prepared to highlight the mandatory information to be extracted from each report (see Table 1 ).


Table 1. Summary of data extracted from reviewed articles; those related to the Italian normative data are in bold .

One author extracted data from the papers while the second author provided further supervision. No disagreements about the extracted data emerged. We did not seek additional information from the original reports, except for Caffarra et al. (2002) , whose full text was not available: the relevant information was extracted from Barletta-Rodolfi et al. (2011) .

We extracted the following information from each article:

• Year of publication.

• Indexes whose normative data were provided.

Finally, regarding the variables of interest, we focused on the scores used in the reviewed studies to assess performance on the SCWT.

We identified 44 articles through our electronic search and screening process. Twelve of them were judged inadequate for our purpose and excluded. Four papers were excluded as they were written in languages other than English or Italian ( Bast-Pettersen, 2006 ; Duncan, 2006 ; Lopez et al., 2013 ; Rognoni et al., 2013 ); two were excluded as they included children ( Oliveira et al., 2016 ) or a clinical population ( Venneri et al., 1992 ). Lastly, we excluded six Stroop Test manuals, as they were not fully procurable ( Trenerry et al., 1989 ; Artiola and Fortuny, 1999 ; Delis et al., 2001 ; Golden and Freshwater, 2002 ; Mitrushina et al., 2005 ; Strauss et al., 2006a ). At the end of the selection process, 32 articles were suitable for review (Figure 1 ).


Figure 1. Flow diagram of studies selection process .

From the systematic review, we extracted five studies with Italian normative data. Details are reported in Table 1 . Of the remaining 27 studies that provide normative data for non-Italian populations, 16 studies ( Ivnik et al., 1996 ; Ingraham et al., 1988 ; Rosselli et al., 2002 ; Moering et al., 2004 ; Lucas et al., 2005 ; Steinberg et al., 2005 ; Seo et al., 2008 ; Peña-Casanova et al., 2009 ; Al-Ghatani et al., 2011 ; Norman et al., 2011 ; Andrews et al., 2012 ; Llinàs-Reglà et al., 2013 ; Morrow, 2013 ; Lubrini et al., 2014 ; Rivera et al., 2015 ; Waldrop-Valverde et al., 2015 ) adopted the scoring method proposed by Golden (1978) . In this method, the number of items correctly named in 45 s is counted in each condition (i.e., W, C, CW). Then the predicted CW score (Pcw) is calculated using the following formula:

Pcw = (W × C) / (W + C)

equivalent to:

1/Pcw = 1/W + 1/C

Then, the Pcw value is subtracted from the actual number of items correctly named in the incongruous condition (CW) (i.e., IG = CW − Pcw): this procedure yields an interference score (IG) based on the performance in both the W and C conditions. Thus, a negative IG value indicates a pathological ability to inhibit interference, with a lower score meaning greater difficulty in inhibiting interference.
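As a worked illustration, Golden's (1978) scoring can be written directly in code. This is a minimal sketch assuming the standard predicted-score formula Pcw = (W × C) / (W + C); the function name and the example counts are ours.

```python
def golden_interference(w, c, cw):
    """Golden (1978) interference score.

    w, c, cw: number of items correctly named in 45 s in the
    word-reading (W), color-naming (C), and color-word (CW) conditions.
    Returns (pcw, ig); a negative ig indicates greater difficulty in
    inhibiting interference than predicted from W and C performance.
    """
    pcw = (w * c) / (w + c)  # predicted CW score from the W and C baselines
    ig = cw - pcw            # interference score: actual CW minus predicted CW
    return pcw, ig

# Hypothetical example: 100 words read, 75 colors named, 40 CW items named.
pcw, ig = golden_interference(100, 75, 40)
# pcw ≈ 42.86, so ig ≈ -2.86 (slightly below the predicted CW score)
```

The predicted score is a harmonic-style combination of W and C, so a participant's CW performance is judged relative to their own reading and color-naming speed rather than against a fixed cutoff.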

Six articles ( Troyer et al., 2006 ; Bayard et al., 2011 ; Campanholo et al., 2014 ; Bezdicek et al., 2015 ; Hankee et al., 2016 ; Tremblay et al., 2016 ) adopted the Victoria Stroop Test. In this version, three conditions are assessed: the C and the CW correspond to the equivalent conditions of the original version of the test ( Stroop, 1935 ), while the W condition includes common words which do not refer to colors. This condition represents an intermediate inhibition condition, as the interference effect between the written word and the color name is not present. In this SCWT form ( Strauss et al., 2006b ), for each condition, the completion time and the number of errors (corrected, non-corrected, and total errors) are recorded and two interference scores are computed:

Five studies ( Strickland et al., 1997 ; Van der Elst et al., 2006 ; Zalonis et al., 2009 ; Kang et al., 2013 ; Zimmermann et al., 2015 ) adopted different SCWT versions. Three of them ( Strickland et al., 1997 ; Van der Elst et al., 2006 ; Kang et al., 2013 ) computed the completion time and the number of errors independently for each condition. Additionally, Van der Elst et al. (2006) computed an interference score based on speed alone:

Interference = CWT − (WT + CT) / 2

where WT, CT, and CWT represent the times to complete the W, C, and CW tables, respectively. Zalonis et al. (2009) recorded (i) the time, (ii) the number of errors, and (iii) the number of self-corrections in the CW condition. Moreover, they computed an interference score by subtracting the number of errors in the CW condition from the number of items correctly named in 120 s in the same table. Lastly, Zimmermann et al. (2015) counted the number of errors and the number of correct answers given in 45 s in each condition. Additionally, they calculated an interference score derived from the original scoring method provided by Stroop (1935) .
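For comparison with Golden's count-based score, the speed-only index of Van der Elst et al. (2006) reduces to a one-line computation. This sketch assumes the conventional time-based formula CWT − (WT + CT) / 2; the example times are invented.

```python
def time_interference(wt, ct, cwt):
    """Speed-only interference index in the style of Van der Elst et al. (2006).

    wt, ct, cwt: completion times (in seconds) for the W, C, and CW tables.
    Returns the extra time needed in the incongruent condition relative to
    the mean of the two baseline conditions; larger values mean more
    interference.
    """
    return cwt - (wt + ct) / 2.0

# Hypothetical example: 45 s to read the words, 60 s to name the colors,
# 110 s to complete the color-word table.
print(time_interference(45, 60, 110))  # 57.5
```

Note that, as the review points out, such a purely time-based index is blind to accuracy: a patient who simply reads the words in the CW condition can obtain a normal-looking time despite failing the task.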

Of the five studies ( Barbarotto et al., 1998 ; Caffarra et al., 2002 ; Amato et al., 2006 ; Valgimigli et al., 2010 ; Brugnolo et al., 2015 ) that provide normative data for the Italian population, two were originally written in Italian ( Caffarra et al., 2002 ; Valgimigli et al., 2010 ), while the others were written in English ( Barbarotto et al., 1998 ; Amato et al., 2006 ; Brugnolo et al., 2015 ). An English translation of the title and abstract of Caffarra et al. (2002) is available. Three of the studies consider performance on the SCWT only ( Caffarra et al., 2002 ; Valgimigli et al., 2010 ; Brugnolo et al., 2015 ), while the others also include other neuropsychological tests in the experimental assessment ( Barbarotto et al., 1998 ; Amato et al., 2006 ). The studies are heterogeneous in that they differ in terms of administered conditions, scoring procedures, number of items, and colors used. Three studies adopted a 100-item version of the SCWT ( Amato et al., 2006 ; Valgimigli et al., 2010 ; Brugnolo et al., 2015 ), similar to the original version proposed by Stroop (1935) . In this version, in every condition (i.e., W, C, CW), items are arranged in a 10 × 10 matrix; the colors are red, green, blue, brown, and purple. However, while two of these studies administered the W, C, and CW conditions once ( Amato et al., 2006 ; Valgimigli et al., 2010 ), Barbarotto et al. (1998) administered the CW table twice, requiring participants to read the word during the first administration and then to name the ink color during the consecutive administration. Additionally, they also administered a computerized version of the SCWT in which 40 stimuli are presented in each condition; red, blue, green, and yellow are used. Valgimigli et al. (2010) and Caffarra et al. (2002) administered shorter paper versions of the SCWT including only three colors (i.e., red, blue, green). More specifically, the former administered only the C and CW conditions, each including 60 items arranged in six columns of 10 items. The latter employed a version with 30 items for each condition (i.e., W, C, CW), arranged in three columns of 10 items each.

Only two of the five studies assessed and provided normative data for all the conditions of the SCWT (i.e., W, C, CW; Caffarra et al., 2002 ; Brugnolo et al., 2015 ), while the others provide only partial results. Valgimigli et al. (2010) provided normative data only for the C and CW conditions, while Amato et al. (2006) and Barbarotto et al. (1998) administered all the SCWT conditions (i.e., W, C, CW) but provide normative data only for the CW condition, and for the C and CW conditions, respectively.

These studies use different methods to compute subjects' performance. Some record the time needed, independently in each condition, to read all ( Amato et al., 2006 ) or a fixed number ( Valgimigli et al., 2010 ) of the presented stimuli. Others count the number of correct answers produced in a fixed time (30 s; Amato et al., 2006 ; Brugnolo et al., 2015 ). Caffarra et al. (2002) and Valgimigli et al. (2010) provide a more complex interference index that relates the subject's performance in the incongruous condition to the performance in the others. In Caffarra et al. (2002) , two interference indexes, based on reading speed and accuracy respectively, are computed using the following formulas:

time interference = CW time − (W time + C time) / 2
error interference = CW errors − (W errors + C errors) / 2

Furthermore, in Valgimigli et al. (2010) an interference score is computed using the formula:

where DC represents the correct answers produced in 20 s in naming colors and DI corresponds to the correct answers achieved in 20 s in the interference condition. However, they do not take into account the performance on the word reading condition.

According to the present review, multiple SCWT scoring methods are available in the literature, with Golden's (1978) method being the most widely used. In the Italian literature, the heterogeneity of SCWT scoring methods increases dramatically. The parameters of speed and accuracy of performance, essential for a proper detection of the Stroop effect, are scored differently across studies, highlighting methodological inconsistencies. Some of the reviewed studies score only the speed of performance ( Amato et al., 2006 ; Valgimigli et al., 2010 ). Others measure both the accuracy and the speed of performance ( Barbarotto et al., 1998 ; Brugnolo et al., 2015 ); however, they provide no comparison between subjects' performance on the different SCWT conditions. On the other hand, Caffarra et al. (2002) compared performance in the W, C, and CW conditions, but computed speed and accuracy independently. Only Valgimigli et al. (2010) present a scoring method in which an index merging speed and accuracy is computed for performance in the administered conditions; however, the authors assessed only the C and CW conditions, neglecting the subject's performance in the W condition.

In our opinion, the reported scoring methods impede the exhaustive description of SCWT performance that clinical practice requires. For instance, if only reading time is scored, while accuracy is not computed ( Amato et al., 2006 ) or is computed independently ( Caffarra et al., 2002 ), the consequences of possible inhibition difficulties on processing speed cannot be assessed. Indeed, a patient could show a non-pathological naming speed in the incongruous condition despite extremely poor accuracy, simply by failing to apply the rule "name the ink color" and reading the word instead (e.g., in the CW condition, when the stimulus is the word "red" printed in green ink, the patient says "red" instead of "green"). Such behavior indicates a failure to maintain consistent activation of the intended response in the incongruent Stroop condition, even when the patient properly understands the task. Such scenarios are often reported in clinical populations: in the incongruous condition, patients with frontal lesions ( Vendrell et al., 1995 ; Stuss et al., 2001 ; Swick and Jovanovic, 2002 ) as well as patients affected by Parkinson's disease ( Fera et al., 2007 ; Djamshidian et al., 2011 ) show significant impairments in accuracy, but not in processing speed. Counting the number of correct answers produced in a fixed time ( Amato et al., 2006 ; Valgimigli et al., 2010 ; Brugnolo et al., 2015 ) may be a plausible solution.

Moreover, it must be noted that the error rate (and not the speed) is an index of inhibitory control ( McDowd et al., 1995 ) or of the ability to maintain the task's goal temporarily in a highly retrievable state ( Kane and Engle, 2003 ). Nevertheless, computing the error rate alone (i.e., the accuracy of the performance), without measuring speed, would be insufficient for an extensive evaluation of SCWT performance. In fact, behavior in the incongruous condition (i.e., CW) may be affected by difficulties that are not directly related to an impaired ability to suppress interference, which may lead to misinterpretation of the patient's performance. People affected by color-blindness or dyslexia represent the extreme cases. More ordinarily, slowness due to clinical conditions such as dysarthria, mood disorders such as depression, or medication side effects may irremediably affect SCWT performance. In Parkinson's disease, ideomotor slowness ( Gardner et al., 1959 ; Jankovic et al., 1990 ) impacts processing speed in all SCWT conditions, producing a global difficulty in response execution rather than a specific impairment in the CW condition ( Stacy and Jankovic, 1992 ; Hsieh et al., 2008 ). Consequently, when inhibition capability has to be assessed, it seems necessary to relate performance in the incongruous condition to word-reading and color-naming abilities, as proposed by Caffarra et al. (2002) , whose method subtracts the W and C scores from the CW score. However, as previously mentioned, the scoring method of Caffarra et al. (2002) computes errors and speed separately. Thus, so far, none of the proposed Italian normative scoring methods seems adequate to assess patients' performance in the SCWT properly and informatively.

Examples of more suitable interference scores can be found in the non-Italian literature. Stroop (1935) proposed that the ability to inhibit cognitive interference can be measured in the SCWT using the formula:

Score = total time + (mean time per word × number of uncorrected errors)
where total time is the overall time for reading; mean time per word is the overall time for reading divided by the number of items; and the number of uncorrected errors is the number of errors not spontaneously corrected. Gardner et al. (1959) propose a similar formula:

Score = total time + [(total time / 100) × number of errors]
where 100 refers to the number of stimuli used in this version of the SCWT. When speed and errors are computed together, patients who show difficulties in inhibiting interference despite a non-pathological reading time are recognized more readily. However, both of these scores ( Stroop, 1935 ; Mitrushina et al., 2005 ) are open to criticism ( Jensen and Rohwer, 1966 ): accuracy and speed are merged into a single global score and therefore cannot be evaluated independently. Moreover, in Gardner et al. (1959) the number of errors is weighted by the mean time per item and then added to the total time, which may be redundant and lead to miscomputation.
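A short sketch contrasting the two error-corrected scores, assuming that Stroop's correction adds the mean time per word for each uncorrected error and that Gardner's adds the number of errors weighted by the mean time per item (total time / 100 in the 100-stimulus version); the function names and values are illustrative:

```python
def stroop_adjusted_time(total_time: float, n_items: int,
                         uncorrected_errors: int) -> float:
    # Each uncorrected error adds the mean time per word to the total time
    mean_time_per_word = total_time / n_items
    return total_time + mean_time_per_word * uncorrected_errors

def gardner_adjusted_time(total_time: float, errors: int,
                          n_items: int = 100) -> float:
    # Errors are weighted by the mean time per item and added to the total;
    # note that the error weight itself depends on total_time -- the
    # redundancy criticized by Jensen and Rohwer (1966)
    return total_time + (total_time / n_items) * errors

print(stroop_adjusted_time(120.0, 100, 5))  # 126.0
print(gardner_adjusted_time(120.0, 5))      # 126.0
```

With 100 items and all errors uncorrected, the two corrections coincide; they diverge as soon as the item count differs or some errors are spontaneously corrected.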

The most widely adopted scoring method internationally is Golden's (1978) . Lansbergen et al. (2007) point out that Golden's interference index IG might not be adequately corrected for inter-individual differences in reading ability, despite its effective adjustment for color naming. They highlight that the reading process is more automated in expert readers, who may consequently be more susceptible to interference ( Lansbergen et al., 2007 ), thus requiring that the score be weighted according to individual reading ability. However, experimental data suggest that increased reading practice does not affect susceptibility to interference in the SCWT ( Jensen and Rohwer, 1966 ). The article by Chafetz and Matthews (2004) may be useful for a deeper understanding of the relationship between reading words and naming colors, but the debate about the role of reading ability in the inhibition process is still open. Nor can the issue be adequately addressed by adopting the Victoria Stroop Test scoring method ( Strauss et al., 2006b ), given the absence of the standard W condition.
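Golden's index compares the observed CW score with a CW score predicted from the W and C scores; a minimal sketch of this standard computation (the example values are illustrative):

```python
def golden_interference(w: float, c: float, cw: float) -> float:
    """Golden's (1978) IG: the observed CW score minus the CW score
    predicted from the word-reading (W) and color-naming (C) scores."""
    predicted_cw = (w * c) / (w + c)
    return cw - predicted_cw

# Illustrative item counts completed within the fixed time window
print(golden_interference(w=100.0, c=60.0, cw=45.0))  # 7.5
```

Because the predicted CW score is a harmonic-mean-style combination of W and C, the index adjusts for color-naming speed, but, as Lansbergen et al. (2007) argue, it may not fully correct for inter-individual differences in reading ability.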

In the light of the previous considerations, we recommend that a scoring method for the SCWT fulfill two main requirements. First, both accuracy and speed must be computed for all SCWT conditions. Second, a global index must be calculated that relates performance in the incongruous condition to word-reading and color-naming abilities. The first requirement can be achieved by counting the number of correct answers produced in each condition within a fixed time ( Amato et al., 2006 ; Valgimigli et al., 2010 ; Brugnolo et al., 2015 ). The second can be achieved by subtracting the W and C scores from the CW score, as suggested by Caffarra et al. (2002) . None of the studies reviewed satisfies both requirements.
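The two requirements can be sketched together as follows; the counts, the fixed time window, and the exact form of the global index (here, CW minus the mean of the W and C baselines, one possible reading of the Caffarra et al., 2002 subtraction) are all illustrative assumptions:

```python
# Requirement (i): correct answers produced within a fixed time window,
# counted separately for each condition. Each count already merges speed
# and accuracy, since only correct answers given in the window are tallied.
correct_in_window = {"W": 45, "C": 38, "CW": 20}  # illustrative counts

# Requirement (ii): a global index relating CW performance to the
# word-reading and color-naming baselines
global_index = (correct_in_window["CW"]
                - (correct_in_window["W"] + correct_in_window["C"]) / 2.0)
print(global_index)  # -21.5: the more negative, the stronger the interference
```

Note that with correct-answer counts (rather than times) a lower CW score indicates worse performance, so interference shows up as a negative index.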

According to the review, the studies reporting Italian normative data offer different theoretical interpretations of the SCWT scores. Amato et al. (2006) and Caffarra et al. (2002) describe the SCWT score as a measure of fronto-executive functioning, while others use it as an index of attentional functioning ( Barbarotto et al., 1998 ; Valgimigli et al., 2010 ) or of general cognitive efficiency ( Brugnolo et al., 2015 ). According to Chafetz and Matthews (2004) , slowing in the face of response conflict could reflect a failure of selective attention or a lack of cognitive efficiency rather than a failure of response inhibition; however, performance in the SCWT is not exclusively related to concentration, attention, or cognitive effectiveness, but relies on a more specific executive-frontal domain. Indeed, in order to solve the task correctly, subjects have to selectively process a specific visual feature while continuously blocking out the automatic processing of reading ( Zajano and Gorman, 1986 ; Shum et al., 1990 ). The specific involvement of executive processes is supported by clinical data. Patients with anterior frontal lesions, but not those with posterior cerebral damage, show significant difficulties in maintaining consistent activation of the intended response ( Valgimigli et al., 2010 ). Furthermore, patients with Parkinson's disease, characterized by executive dysfunction due to disruption of the dopaminergic pathway ( Fera et al., 2007 ), show difficulties in the SCWT despite unimpaired attentional abilities ( Fera et al., 2007 ; Djamshidian et al., 2011 ).

In sum, the heterogeneity of SCWT scoring methods in the international literature, and even more markedly in the Italian literature, calls for a single, shared scoring system that allows a proper interpretation of SCWT performance. We propose a scoring method in which (i) the number of correct answers produced in a fixed time is counted in each SCWT condition (W, C, CW) and (ii) a global index relating CW performance to word-reading and color-naming abilities is computed. Further studies are required to collect normative data for this scoring method and to study its applicability in clinical settings.

Author Contributions

Conception of the work: FS. Acquisition of data: ST. Analysis and interpretation of data for the work: FS and ST. Drafting the work: ST; revising it critically: FS. Final approval of the version to be published and agreement to be accountable for all aspects of the work: FS and ST.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors thank Prerana Sabnis for her careful proofreading of the manuscript.

Al-Ghatani, A. M., Obonsawin, M. C., Binshaig, B. A., and Al-Moutaery, K. R. (2011). Saudi normative data for the Wisconsin Card Sorting test, Stroop test, test of non-verbal intelligence-3, picture completion and vocabulary (subtest of the Wechsler Adult Intelligence Scale-Revised). Neurosciences 16, 29–41.

Amato, M. P., Portaccio, E., Goretti, B., Zipoli, V., Ricchiuti, L., De Caro, M. F., et al. (2006). The Rao's brief repeatable battery and stroop test: normative values with age, education and gender corrections in an Italian population. Mult. Scler. 12, 787–793. doi: 10.1177/1352458506070933

Andrews, K., Shuttleworth-Edwards, A., and Radloff, S. (2012). Normative indications for Xhosa speaking unskilled workers on the Trail Making and Stroop Tests. J. Psychol. Afr. 22, 333–341. doi: 10.1080/14330237.2012.10820538

Artiola, L., and Fortuny, L. A. I. (1999). Manual de Normas Y Procedimientos Para la Bateria Neuropsicolog . Tucson, AZ: Taylor & Francis.

Barbarotto, R., Laiacona, M., Frosio, R., Vecchio, M., Farinato, A., and Capitani, E. (1998). A normative study on visual reaction times and two Stroop colour-word tests. Neurol. Sci. 19, 161–170. doi: 10.1007/BF00831566

Barletta-Rodolfi, C., Gasparini, F., and Ghidoni, E. (2011). Kit del Neuropsicologo Italiano . Bologna: Società Italiana di Neuropsicologia.

Bast-Pettersen, R. (2006). The Hugdahl Stroop Test: A normative study involving male industrial workers. J. Norwegian Psychol. Assoc. 43, 1023–1028.

Bayard, S., Erkes, J., and Moroni, C. (2011). Collège des psychologues cliniciens spécialisés en neuropsychologie du languedoc roussillon (CPCN Languedoc Roussillon). Victoria Stroop Test: normative data in a sample group of older people and the study of their clinical applications in the assessment of inhibition in Alzheimer's disease. Arch. Clin. Neuropsychol. 26, 653–661. doi: 10.1093/arclin/acr053

Bezdicek, O., Lukavsky, J., Stepankova, H., Nikolai, T., Axelrod, B. N., Michalec, J., et al. (2015). The Prague Stroop Test: normative standards in older Czech adults and discriminative validity for mild cognitive impairment in Parkinson's disease. J. Clin. Exp. Neuropsychol. 37, 794–807. doi: 10.1080/13803395.2015.1057106

Brugnolo, A., De Carli, F., Accardo, J., Amore, M., Bosia, L. E., Bruzzaniti, C., et al. (2015). An updated Italian normative dataset for the Stroop color word test (SCWT). Neurol. Sci. 37, 365–372. doi: 10.1007/s10072-015-2428-2

Caffarra, P., Vezzaini, G., Dieci, F., Zonato, F., and Venneri, A. (2002). Una versione abbreviata del test di Stroop: dati normativi nella popolazione italiana. Nuova Rivis. Neurol. 12, 111–115.

Campanholo, K. R., Romão, M. A., Machado, M. A. D. R., Serrao, V. T., Coutinho, D. G. C., Benute, G. R. G., et al. (2014). Performance of an adult Brazilian sample on the Trail Making Test and Stroop Test. Dement. Neuropsychol. 8, 26–31. doi: 10.1590/S1980-57642014DN81000005

Chafetz, M. D., and Matthews, L. H. (2004). A new interference score for the Stroop test. Arch. Clin. Neuropsychol. 19, 555–567. doi: 10.1016/j.acn.2003.08.004

Delis, D. C., Kaplan, E., and Kramer, J. H. (2001). Delis-Kaplan Executive Function System (D-KEFS) . San Antonio, TX: Psychological Corporation.

Djamshidian, A., O'Sullivan, S. S., Lees, A., and Averbeck, B. B. (2011). Stroop test performance in impulsive and non impulsive patients with Parkinson's disease. Parkinsonism Relat. Disord. 17, 212–214. doi: 10.1016/j.parkreldis.2010.12.014

Duncan, M. T. (2006). Assessment of normative data of Stroop test performance in a group of elementary school students Niterói. J. Bras. Psiquiatr. 55, 42–48. doi: 10.1590/S0047-20852006000100006

Fera, F., Nicoletti, G., Cerasa, A., Romeo, N., Gallo, O., Gioia, M. C., et al. (2007). Dopaminergic modulation of cognitive interference after pharmacological washout in Parkinson's disease. Brain Res. Bull. 74, 75–83. doi: 10.1016/j.brainresbull.2007.05.009

Gardner, R. W., Holzman, P. S., Klein, G. S., Linton, H. P., and Spence, D. P. (1959). Cognitive control: a study of individual consistencies in cognitive behaviour. Psychol. Issues 1, 1–186.

Golden, C. J. (1978). Stroop Color and Word Test: A Manual for Clinical and Experimental Uses . Chicago, IL: Stoelting Co.

Golden, C. J., and Freshwater, S. M. (2002). The Stroop Color and Word Test: A Manual for Clinical and Experimental Uses . Chicago, IL: Stoelting.

Hankee, L. D., Preis, S. R., Piers, R. J., Beiser, A. S., Devine, S. A., Liu, Y., et al. (2016). Population normative data for the CERAD word list and Victoria Stroop Test in younger-and middle-aged adults: cross-sectional analyses from the framingham heart study. Exp. Aging Res. 42, 315–328. doi: 10.1080/0361073X.2016.1191838

Hsieh, Y. H., Chen, K. J., Wang, C. C., and Lai, C. L. (2008). Cognitive and motor components of response speed in the Stroop test in Parkinson's disease patients. Kaohsiung J. Med. Sci. 24, 197–203. doi: 10.1016/S1607-551X(08)70117-7

Ingraham, L. J., Chard, F., Wood, M., and Mirsky, A. F. (1988). An Hebrew language version of the Stroop test. Percept. Mot. Skills 67, 187–192. doi: 10.2466/pms.1988.67.1.187

Ivnik, R. J., Malec, J. F., Smith, G. E., Tangalos, E. G., and Petersen, R. C. (1996). Neuropsychological tests' norms above age 55: COWAT, BNT, MAE token, WRAT-R reading, AMNART, STROOP, TMT, and JLO. Clin. Neuropsychol. 10, 262–278. doi: 10.1080/13854049608406689

Jankovic, J., McDermott, M., Carter, J., Gauthier, S., Goetz, C., Golbe, L., et al. (1990). Parkinson Study Group. Variable expression of Parkinson's disease: a base-line analysis of DATATOP cohort. Neurology 40, 1529–1534.

Jensen, A. R., and Rohwer, W. D. (1966). The Stroop Color-Word Test: a Review. Acta Psychol. 25, 36–93. doi: 10.1016/0001-6918(66)90004-7

Kane, M. J., and Engle, R. W. (2003). Working-memory capacity and the control of attention: the contributions of goal neglect, response competition, and task set to Stroop interference. J. Exp. Psychol. Gen. 132, 47–70. doi: 10.1037/0096-3445.132.1.47

Kang, C., Lee, G. J., Yi, D., McPherson, S., Rogers, S., Tingus, K., et al. (2013). Normative data for healthy older adults and an abbreviated version of the Stroop test. Clin. Neuropsychol. 27, 276–289. doi: 10.1080/13854046.2012.742930

Lansbergen, M. M., Kenemans, J. L., and van Engeland, H. (2007). Stroop interference and attention-deficit/hyperactivity disorder: a review and meta-analysis. Neuropsychology 21:251. doi: 10.1037/0894-4105.21.2.251

Llinàs-Reglà, J., Vilalta-Franch, J., López-Pousa, S., Calvó-Perxas, L., and Garre-Olmo, J. (2013). Demographically adjusted norms for Catalan older adults on the Stroop Color and Word Test. Arch. Clin. Neuropsychol. 28, 282–296. doi: 10.1093/arclin/act003

Lopez, E., Salazar, X. F., Villasenor, T., Saucedo, C., and Pena, R. (2013). “Validez y datos normativos de las pruebas de nominación en personas con educación limitada,” in Poster Presented at The Congress of the “Sociedad Lationoamericana de Neuropsicologıa” (Montreal, QC).

Lubrini, G., Periañez, J. A., Rios-Lago, M., Viejo-Sobera, R., Ayesa-Arriola, R., Sanchez-Cubillo, I., et al. (2014). Clinical Spanish norms of the Stroop test for traumatic brain injury and schizophrenia. Span. J. Psychol. 17:E96. doi: 10.1017/sjp.2014.90

Lucas, J. A., Ivnik, R. J., Smith, G. E., Ferman, T. J., Willis, F. B., Petersen, R. C., et al. (2005). Mayo's older african americans normative studies: norms for boston naming test, controlled oral word association, category fluency, animal naming, token test, wrat-3 reading, trail making test, stroop test, and judgment of line orientation. Clin. Neuropsychol. 19, 243–269. doi: 10.1080/13854040590945337

MacLeod, C. M., and Dunbar, K. (1988). Training and Stroop-like interference: evidence for a continuum of automaticity. J. Exp. Psychol. Learn. Mem. Cogn. 14, 126–135. doi: 10.1037/0278-7393.14.1.126

McDowd, J. M., Oseas-Kreger, D. M., and Filion, D. L. (1995). “Inhibitory processes in cognition and aging,” in Interference and Inhibition in Cognition , eds F. N. Dempster and C. J. Brainerd (San Diego, CA: Academic Press), 363–400.

Mitrushina, M., Boone, K. B., Razani, J., and D'Elia, L. F. (2005). Handbook of Normative Data for Neuropsychological Assessment . New York, NY: Oxford University Press.

Moering, R. G., Schinka, J. A., Mortimer, J. A., and Graves, A. B. (2004). Normative data for elderly African Americans for the Stroop color and word test. Arch. Clin. Neuropsychol. 19, 61–71. doi: 10.1093/arclin/19.1.61

Morrow, S. A. (2013). Normative data for the stroop color word test for a north american population. Can. J. Neurol. Sci. 40, 842–847. doi: 10.1017/S0317167100015997

Norman, M. A., Moore, D. J., Taylor, M., Franklin, D. Jr., Cysique, L., Ake, C., et al. (2011). Demographically corrected norms for African Americans and Caucasians on the hopkins verbal learning test–revised, brief visuospatial memory test–revised, stroop color and word test, and wisconsin card sorting test 64-card version. J. Clin. Exp. Neuropsychol. 33, 793–804. doi: 10.1080/13803395.2011.559157

Oliveira, R. M., Mograbi, D. C., Gabrig, I. A., and Charchat-Fichman, H. (2016). Normative data and evidence of validity for the Rey Auditory Verbal Learning Test, Verbal Fluency Test, and Stroop Test with Brazilian children. Psychol. Neurosci. 9, 54–67. doi: 10.1037/pne0000041

Peña-Casanova, J., Quiñones-Ubeda, S., Gramunt-Fombuena, N., Quintana, M., Aguilar, M., Molinuevo, J. L., et al. (2009). Spanish multicenter normative studies (NEURONORMA Project): norms for the Stroop color-word interference test and the Tower of London-Drexel. Arch. Clin. Neuropsychol. 24, 413–429. doi: 10.1093/arclin/acp043

Rivera, D., Perrin, P. B., Stevens, L. F., Garza, M. T., Weil, C., Saracho, C. P., et al. (2015). Stroop color-word interference test: normative data for the Latin American Spanish speaking adult population. Neurorehabilitation 37, 591–624. doi: 10.3233/NRE-151281

Rognoni, T., Casals-Coll, M., Sánchez-Benavides, G., Quintana, M., Manero, R. M., Calvo, L., et al. (2013). Spanish normative studies in a young adult population (NEURONORMA young adults Project): norms for the Boston Naming Test and the Token Test. Neurología 28, 73–80. doi: 10.1016/j.nrl.2012.02.009

Rosselli, M., Ardila, A., Santisi, M. N., Arecco Mdel, R., Salvatierra, J., Conde, A., et al. (2002). Stroop effect in Spanish–English bilinguals. J. Int. Neuropsychol. Soc. 8, 819–827. doi: 10.1017/S1355617702860106

Seo, E. H., Lee, D. Y., Kim, S. G., Kim, K. W., Youn, J. C., Jhoo, J. H., et al. (2008). Normative study of the Stroop Color and Word Test in an educationally diverse elderly population. Int. J. Geriatr. Psychiatry 23, 1020–1027 doi: 10.1002/gps.2027

Shum, D. H. K., McFarland, K. A., and Brain, J. D. (1990). Construct validity of eight tests of attention: comparison of normal and closed head injured samples. Clin. Neuropsychol. 4, 151–162. doi: 10.1080/13854049008401508

Stacy, M., and Jankovic, J. (1992). Differential diagnosis of parkinson's disease and the parkinsonism plus syndrome. Neurol. Clin. 10, 341–359.

Steinberg, B. A., Bieliauskas, L. A., Smith, G. E., and Ivnik, R. J. (2005). Mayo's older Americans normative studies: age-and IQ-adjusted norms for the trail-making test, the stroop test, and MAE controlled oral word association test. Clin. Neuropsychol. 19, 329–377. doi: 10.1080/13854040590945210

Strauss, E., Sherman, E. M., and Spreen, O. (2006a). A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary . Oxford: American Chemical Society.

Strauss, E., Sherman, E. M. S., and Spreen, O. (2006b). A Compendium of Neuropsychological Tests, 3rd Edn. New York, NY: Oxford University Press.

Strickland, T. L., D'Elia, L. F., James, R., and Stein, R. (1997). Stroop color-word performance of African Americans. Clin. Neuropsychol. 11, 87–90. doi: 10.1080/13854049708407034

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. J. Exp. Psychol. 18, 643–662. doi: 10.1037/h0054651

Stuss, D. T., Floden, D., Alexander, M. P., Levine, B., and Katz, D. (2001). Stroop performance in focal lesion patients: dissociation of processes and frontal lobe lesion location. Neuropsychologia 39, 771–786. doi: 10.1016/S0028-3932(01)00013-6

Swick, D., and Jovanovic, J. (2002). Anterior cingulate cortex and the Stroop task: neuropsychological evidence for topographic specificity. Neuropsychologia 40, 1240–1253. doi: 10.1016/S0028-3932(01)00226-3

Tremblay, M. P., Potvin, O., Belleville, S., Bier, N., Gagnon, L., Blanchet, S., et al. (2016). The victoria stroop test: normative data in Quebec-French adults and elderly. Arch. Clin. Neuropsychol. 31, 926–933. doi: 10.1093/arclin/acw029

Trenerry, M. R., Crosson, B., DeBoe, J., and Leber, W. R. (1989). Stroop Neuropsychological Screening Test . Odessa, FL: Psychological Assessment Resources.

Troyer, A. K., Leach, L., and Strauss, E. (2006). Aging and response inhibition: normative data for the Victoria Stroop Test. Aging Neuropsychol. Cogn. 13, 20–35. doi: 10.1080/138255890968187

Valgimigli, S., Padovani, R., Budriesi, C., Leone, M. E., Lugli, D., and Nichelli, P. (2010). The Stroop test: a normative Italian study on a paper version for clinical use. G. Ital. Psicol. 37, 945–956. doi: 10.1421/33435

Van der Elst, W., Van Boxtel, M. P., Van Breukelen, G. J., and Jolles, J. (2006). The Stroop Color-Word Test influence of age, sex, and education; and normative data for a large sample across the adult age range. Assessment 13, 62–79. doi: 10.1177/1073191105283427

Vendrell, P., Junqué, C., Pujol, J., Jurado, M. A., Molet, J., and Grafman, J. (1995). The role of prefrontal regions in the Stroop task. Neuropsychologia 33, 341–352. doi: 10.1016/0028-3932(94)00116-7

Venneri, A., Molinari, M. A., Pentore, R., Cotticelli, B., Nichelli, P., and Caffarra, P. (1992). Shortened Stroop color-word test: its application in normal aging and Alzheimer's disease. Neurobiol. Aging 13, S3–S4. doi: 10.1016/0197-4580(92)90135-K

Waldrop-Valverde, D., Ownby, R. L., Jones, D. L., Sharma, S., Nehra, R., Kumar, A. M., et al. (2015). Neuropsychological test performance among healthy persons in northern India: development of normative data. J. Neurovirol. 21, 433–438. doi: 10.1007/s13365-015-0332-4

Zajano, M. J., and Gorman, A. (1986). Stroop interference as a function of percentage of congruent items. Percept. Mot. Skills 63, 1087–1096. doi: 10.2466/pms.1986.63.3.1087

Zalonis, I., Christidi, F., Bonakis, A., Kararizou, E., Triantafyllou, N. I., Paraskevas, G., et al. (2009). The stroop effect in Greek healthy population: normative data for the Stroop Neuropsychological Screening Test. Arch. Clin. Neuropsychol. 24, 81–88. doi: 10.1093/arclin/acp011

Zimmermann, N., Cardoso, C. D. O., Trentini, C. M., Grassi-Oliveira, R., and Fonseca, R. P. (2015). Brazilian preliminary norms and investigation of age and education effects on the Modified Wisconsin Card Sorting Test, Stroop Color and Word test and Digit Span test in adults. Dement. Neuropsychol. 9, 120–127. doi: 10.1590/1980-57642015DN92000006

Keywords: stroop color and word test, neuropsychological assessment, inhibition, executive functions, systematic review

Citation: Scarpina F and Tagini S (2017) The Stroop Color and Word Test. Front. Psychol. 8:557. doi: 10.3389/fpsyg.2017.00557

Received: 10 November 2016; Accepted: 27 March 2017; Published: 12 April 2017.

Copyright © 2017 Scarpina and Tagini. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Federica Scarpina, [email protected]


Half a century of research on the Stroop effect: An integrative review

Colin M. MacLeod

1991, Psychological Bulletin




Reclaiming the Stroop Effect Back From Control to Input-Driven Attention and Perception

Affiliations

  • 1 School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel.
  • 2 Department of Education and Psychology, Open University of Israel, Ra'anana, Israel.
  • PMID: 31428008
  • PMCID: PMC6688540
  • DOI: 10.3389/fpsyg.2019.01683

According to a growing consensus, the Stroop effect is understood as a phenomenon of conflict and cognitive control. A tidal wave of recent research alleges that incongruent Stroop stimuli generate conflict, which is then managed and resolved by top-down cognitive control. We argue otherwise: control studies fail to account for major Stroop results obtained over a century-long history of research. We list some of the most compelling developments and show that no control account can serve as a viable explanation for major Stroop phenomena and that there exist more parsimonious explanations for other Stroop related phenomena. Against a wealth of studies and emerging consensus, we posit that data-driven selective attention best accounts for the gamut of existing Stroop results. The case for data-driven attention is not new: a mere twenty-five years ago, the Stroop effect was considered "the gold standard" of attention (MacLeod, 1992). We identify four pitfalls plaguing conflict monitoring and control studies of the Stroop effect and show that the notion of top-down control is gratuitous. Looking at the Stroop effect from a historical perspective, we argue that the recent paradigm change from stimulus-driven selective attention to control is unwarranted. Applying Occam's razor, the effects marshaled in support of the control view are better explained by a selectivity of attention account. Moreover, many Stroop results, ignored in the control literature, are inconsistent with any control account of the effect.

Keywords: Stroop; conflict; congruity; contingency; control; salience.


Figures

  • Schematics of the influence of relative salience on the outcome of the Stroop…
  • The influence of stimulus makeup on the Stroop effect: the larger the baseline…
  • Anatomy of the standard Stroop experiment: Four color words are combined factorially with…
  • Allocation of colors to words to form the set of color-word stimuli in…
  • The relation between the color-word correlation built into the experimental design, usually by…
  • Possible chain of reasoning accommodating both the basic Stroop findings reviewed in the…




Impacts of Generative Artificial Intelligence in Higher Education: Research Trends and Students’ Perceptions


1. Introduction

2. Materials and Methods

  • “Generative Artificial Intelligence” or “Generative AI” or “Gen AI”, AND;
  • “Higher Education” or “University” or “College” or “Post-secondary”, AND;
  • “Impact” or “Effect” or “Influence”.
  • Q1— Does GenAI have more positive or negative effects on higher education? Options (to choose one): 1. It has more negative effects than positives; 2. It has more positive effects than negative; 3. There is a balance between positive and negative effects; 4. Don’t know.
  • Q2— Identify the main positive effect of Gen AI in an academic context . Open-ended question.
  • Q3— Identify the main negative effect of Gen AI in an academic context . Open-ended question.
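The three AND-groups of search terms above can be combined into a single boolean query. The sketch below shows one way to compose such a string programmatically; the helper name `build_query` and the exact formatting are illustrative assumptions, not the authors' actual query syntax (which may also use database-specific field tags such as Scopus's TITLE-ABS-KEY).

```python
# Sketch: compose a boolean search string from synonym groups.
# Synonyms within a group are joined with OR; groups are joined with AND.
# This mirrors the three term groups listed above; the exact string the
# authors submitted to SCOPUS/Web of Science is not reproduced here.

def build_query(groups):
    """Return a boolean query: (t1 OR t2 ...) AND (t3 OR ...) AND ..."""
    clauses = []
    for synonyms in groups:
        clause = " OR ".join(f'"{term}"' for term in synonyms)
        clauses.append(f"({clause})")
    return " AND ".join(clauses)

groups = [
    ["Generative Artificial Intelligence", "Generative AI", "Gen AI"],
    ["Higher Education", "University", "College", "Post-secondary"],
    ["Impact", "Effect", "Influence"],
]

query = build_query(groups)
print(query)
```

Quoting each term keeps multi-word phrases (e.g. "Higher Education") intact as phrase searches rather than letting the database split them into separate tokens.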

3.1. Impacts of Gen AI in HE: Research Trends

3.1.1. HE with Gen AI

The Key Role That Pedagogy Must Play

New Ways to Enhance the Design and Implementation of Teaching and Learning Activities

  • Firstly, prompting in teaching should be prioritized as it plays a crucial role in developing students’ abilities. By providing appropriate prompts, educators can effectively guide students toward achieving their learning objectives.
  • Secondly, configuring reverse prompting within the capabilities of Gen AI chatbots can greatly assist students in monitoring their learning progress. This feature empowers students to take ownership of their education and fosters a sense of responsibility.
  • Furthermore, it is essential to embed digital literacy in all teaching and learning activities that aim to leverage the potential of the new Gen AI assistants. By equipping students with the necessary skills to navigate and critically evaluate digital resources, educators can ensure that they are prepared for the digital age.

The Student’s Role in the Learning Experience

The Key Teacher’s Role in the Teaching and Learning Experience

3.1.2. Assessment in Gen AI/ChatGPT Times

The Need for New Assessment Procedures

3.1.3. New Challenges to Academic Integrity Policies

New Meanings and Frontiers of Misconduct

Personal Data Usurpation and Cheating

3.2. Students’ Perceptions About the Impacts of Gen AI in HE

  • “It harms the learning process”:
    ▪ “What is generated by Gen AI has errors”;
    ▪ “Generates dependence and encourages laziness”;
    ▪ “Decreases active effort and involvement in the learning/critical thinking process”.

4. Discussion

  • Training: providing training for both students and teachers on effectively using and integrating Gen AI technologies into teaching and learning practices.
  • Ethical use and risk management: developing policies and guidelines for ethical use and risk management associated with Gen AI technologies.
  • Incorporating AI without replacing humans: incorporating AI technologies as supplementary tools to assist teachers and students rather than replacements for human interaction.
  • Continuously enhancing holistic competencies: encouraging the use of AI technologies to enhance specific skills, such as digital competence and time management, while ensuring that students continue to develop vital transferable skills.
  • Fostering a transparent AI environment: promoting an environment in which students and teachers can openly discuss the benefits and concerns associated with using AI technologies.
  • Data privacy and security: ensuring data privacy and security using AI technologies.
  • The dynamics of technological support to align with the most suitable Gen AI resources;
  • The training policy to ensure that teachers, students, and academic staff are properly trained to utilize the potential of Gen AI and its tools;
  • Security and data protection policies;
  • Quality and ethical action policies.

5. Conclusions

  • Database constraints: the analysis is based on existing publications in SCOPUS and the Web of Science, potentially omitting relevant research from other sources.
  • Inclusion criteria: due to the early stage of scientific production on this topic, all publications were included in the analysis, rather than focusing solely on articles from highly indexed journals and/or with a high number of citations as recommended by bibliometric and systematic review best practices.
  • Dynamic landscape: the rate of publications on Gen AI has been rapidly increasing and diversifying in 2024, highlighting the need for ongoing analysis to track trends and changes in scientific thinking.

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

  • Akakpo, Martin Gameli. 2023. Skilled for the Future: Information Literacy for AI Use by University Students in Africa and the Role of Librarians. Internet Reference Services Quarterly 28: 19–26. [ Google Scholar ] [ CrossRef ]
  • AlAfnan, Mohammad Awad, Samira Dishari, Marina Jovic, and Koba Lomidze. 2023. ChatGPT as an Educational Tool: Opportunities, Challenges, and Recommendations for Communication, Business Writing, and Composition Courses. Journal of Artificial Intelligence and Technology 3: 60–68. [ Google Scholar ] [ CrossRef ]
  • Almaraz-López, Cristina, Fernando Almaraz-Menéndez, and Carmen López-Esteban. 2023. Comparative Study of the Attitudes and Perceptions of University Students in Business Administration and Management and in Education toward Artificial Intelligence. Education Sciences 13: 609. [ Google Scholar ] [ CrossRef ]
  • Al-Zahrani, Abdulrahman. 2023. The impact of generative AI tools on researchers and research: Implications for academia in higher education. Innovations in Education and Teaching International , 1–15. [ Google Scholar ] [ CrossRef ]
  • Athilingam, Ponrathi, and Hong-Gu He. 2023. ChatGPT in nursing education: Opportunities and challenges. Teaching and Learning in Nursing 19: 97–101. [ Google Scholar ] [ CrossRef ]
  • Álvarez-Álvarez, Carmen, and Samuel Falcon. 2023. Students’ preferences with university teaching practices: Analysis of testimonials with artificial intelligence. Educational Technology Research and Development 71: 1709–24. [ Google Scholar ] [ CrossRef ]
  • Bannister, Peter, Elena Alcalde Peñalver, and Alexandra Santamaría Urbieta. 2023. Transnational higher education cultures and generative AI: A nominal group study for policy development in English medium instruction. Journal for Multicultural Education . ahead-of-print . [ Google Scholar ] [ CrossRef ]
  • Bearman, Margaret, and Rola Ajjawi. 2023. Learning to work with the black box: Pedagogy for a world with artificial intelligence. British Journal of Educational Technology 54: 1160–73. [ Google Scholar ] [ CrossRef ]
  • Boháček, Matyas. 2023. The Unseen A+ Student: Evaluating the Performance and Detectability of Large Language Models in the Classroom. CEUR Workshop Proceedings 3487: 89–100. Available online: https://openreview.net/pdf?id=9ZKJLYg5EQ (accessed on 7 January 2024).
  • Chan, Cecilia Ka Yuk. 2023. A comprehensive AI policy education framework for university teaching and learning. International Journal of Educational Technology in Higher Education 20: 38. [ Google Scholar ] [ CrossRef ]
  • Chan, Cecilia Ka Yuk, and Wenjie Hu. 2023. Students’ voices on generative AI: Perceptions, benefits, and challenges in higher education. International Journal of Educational Technology in Higher Education 20: 43. [ Google Scholar ] [ CrossRef ]
  • Chan, Cecilia Ka Yuk, and Wenxin Zhou. 2023. An expectancy value theory (EVT) based instrument for measuring student perceptions of generative AI. Smart Learning Environments 10: 64. [ Google Scholar ] [ CrossRef ]
  • Chang, Daniel H., Michael Pin-Chuan Lin, Shiva Hajian, and Quincy Q. Wang. 2023. Educational Design Principles of Using AI Chatbot That Supports Self-Regulated Learning in Education: Goal Setting, Feedback, and Personalization. Sustainability 15: 12921. [ Google Scholar ] [ CrossRef ]
  • Chiu, Thomas. 2023. The impact of Generative AI (GenAI) on practices, policies and research direction in education: A case of ChatGPT and Midjourney. Interactive Learning Environments , 1–17. [ Google Scholar ] [ CrossRef ]
  • Chun, John, and Katherine Elkins. 2023. The Crisis of Artificial Intelligence: A New Digital Humanities Curriculum for Human-Centred AI. International Journal of Humanities and Arts Computing 17: 147–67. [ Google Scholar ] [ CrossRef ]
  • Cowling, Michael, Joseph Crawford, Kelly-Ann Allen, and Michael Wehmeyer. 2023. Using leadership to leverage ChatGPT and artificial intelligence for undergraduate and postgraduate research supervision. Australasian Journal of Educational Technology 39: 89–103. [ Google Scholar ] [ CrossRef ]
  • Crawford, Joseph, Carmen Vallis, Jianhua Yang, Rachel Fitzgerald, Christine O’Dea, and Michael Cowling. 2023a. Editorial: Artificial Intelligence is Awesome, but Good Teaching Should Always Come First. Journal of University Teaching & Learning Practice 20: 01. [ Google Scholar ] [ CrossRef ]
  • Crawford, Joseph, Michael Cowling, and Kelly-Ann Allen. 2023b. Leadership is needed for ethical ChatGPT: Character, assessment, and learning using artificial intelligence (AI). Journal of University Teaching & Learning Practice 20: 02. [ Google Scholar ] [ CrossRef ]
  • Currie, Geoffrey. 2023a. A Conversation with ChatGPT. Journal of Nuclear Medicine Technology 51: 255–60. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Currie, Geoffrey. 2023b. GPT-4 in Nuclear Medicine Education: Does It Outperform GPT-3.5? Journal of Nuclear Medicine Technology 51: 314–17. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Currie, Geoffrey, and Kym Barry. 2023. ChatGPT in Nuclear Medicine Education. Journal of Nuclear Medicine Technology 51: 247–54. [ Google Scholar ] [ CrossRef ]
  • Currie, Geoffrey, Clare Singh, Tarni Nelson, Caroline Nabasenja, Yazan Al-Hayek, and Kelly Spuur. 2023. ChatGPT in medical imaging higher education. Radiography 29: 792–99. [ Google Scholar ] [ CrossRef ]
  • Dai, Yun, Ang Liu, and Cher P. Lim. 2023. Reconceptualizing Chatgpt and Generative AI as a Student-driven Innovation in Higher Education. Procedia CIRP Volume 119: 84–90. [ Google Scholar ] [ CrossRef ]
  • Dogru, Tarik, Nathana Line, Lydia Hanks, Fulya Acikgoz, Je’Anna Abbott, Selim Bakir, Adiyukh Berbekova, Anil Bilgihan, Ali Iskender, Murat Kizildag, and et al. 2023. The implications of generative artificial intelligence in academic research and higher education in tourism and hospitality. Tourism Economics 30: 1083–94. [ Google Scholar ] [ CrossRef ]
  • Duong, Cong Doanh, Trong Nghia Vu, and Thi Viet Nga Ngo. 2023. Applying a modified technology acceptance model to explain higher education students’ usage of ChatGPT: A serial multiple mediation model with knowledge sharing as a moderator. The International Journal of Management Education 21: 100883. [ Google Scholar ] [ CrossRef ]
  • Eager, Bronwyn, and Ryan Brunton. 2023. Prompting Higher Education Towards AI-Augmented Teaching and Learning Practice. Journal of University Teaching & Learning Practice 20: 5. [ Google Scholar ] [ CrossRef ]
  • Elkhodr, Mahmoud, Ergun Gide, Robert Wu, and Omar Darwish. 2023. ICT students’ perceptions towards ChatGPT: An experimental reflective lab analysis. STEM Education 3: 70–88. [ Google Scholar ] [ CrossRef ]
  • Farrelly, Tom, and Nick Baker. 2023. Generative Artificial Intelligence: Implications and Considerations for Higher Education Practice. Education Sciences 13: 1109. [ Google Scholar ] [ CrossRef ]
  • Farrokhnia, Mohammadreza, Seyyed Kazem Banihashem, Omid Noroozi, and Arjen Wals. 2023. A SWOT analysis of ChatGPT: Implications for educational practice and research. Innovations in Education and Teaching International 61: 460–74. [ Google Scholar ] [ CrossRef ]
  • Gong, Furong. 2023. The Impact of Generative AI like ChatGPT on Digital Literacy Education in University Libraries. Documentation, Information & Knowledge 40: 97–106, 156. [ Google Scholar ] [ CrossRef ]
  • Han, Bingyi, Sadia Nawaz, George Buchanan, and Dana McKay. 2023. Ethical and Pedagogical Impacts of AI in Education. In Artificial Intelligence in Education . Edited by Ning Wang, Genaro Rebolledo-Mendez, Noboru Matsuda, Olga Santos and Vania Dimitrova. Lecture Notes in Computer Science. Cham: Springer, pp. 667–73. [ Google Scholar ] [ CrossRef ]
  • Hassoulas, Athanasios, Ned Powell, Lindsay Roberts, Katja Umla-Runge, Laurence Gray, and Marcus J. Coffey. 2023. Investigating marker accuracy in differentiating between university scripts written by students and those produced using ChatGPT. Journal of Applied Learning and Teaching 6: 71–77. [ Google Scholar ] [ CrossRef ]
  • Hernández-Leo, Davinia. 2023. ChatGPT and Generative AI in Higher Education: User-Centered Perspectives and Implications for Learning Analytics. CEUR Workshop Proceedings , 1–6. Available online: https://ceur-ws.org/Vol-3542/paper2.pdf (accessed on 7 January 2024).
  • Hidayat-ur-Rehman, Imdadullah, and Yasser Ibrahim. 2023. Exploring factors influencing educators’ adoption of ChatGPT: A mixed method approach. Interactive Technology and Smart Education . ahead-of-print . [ Google Scholar ] [ CrossRef ]
  • Ilieva, Galina, Tania Yankova, Stanislava Klisarova-Belcheva, Angel Dimitrov, Marin Bratkov, and Delian Angelov. 2023. Effects of Generative Chatbots in Higher Education. Information 14: 492. [ Google Scholar ] [ CrossRef ]
  • Javaid, Mohd, Abid Haleem, Ravi Pratap Singh, Shahbaz Khan, and Haleem Ibrahim. 2023. Unlocking the opportunities through ChatGPT Tool towards ameliorating the education system. Bench Council Transactions on Benchmarks, Standards and Evaluations 3: 100115. [ Google Scholar ] [ CrossRef ]
  • Kaplan-Rakowski, Regina, Kimberly Grotewold, Peggy Hartwick, and Kevin Papin. 2023. Generative AI and Teachers’ Perspectives on Its Implementation in Education. Journal of Interactive Learning Research 34: 313–38. Available online: https://www.learntechlib.org/primary/p/222363/ (accessed on 7 January 2024).
  • Karunaratne, Thashmee, and Adenike Adesina. 2023. Is it the new Google: Impact of ChatGPT on Students’ Information Search Habits. Paper presented at the European Conference on e-Learning (ECEL 2023), Pretoria, South Africa, October 26–27; pp. 147–55. [ Google Scholar ] [ CrossRef ]
  • Kelly, Andrew, Miriam Sullivan, and Katrina Strampel. 2023. Generative artificial intelligence: University student awareness, experience, and confidence in use across disciplines. Journal of University Teaching & Learning Practice 20: 12. [ Google Scholar ] [ CrossRef ]
  • Kohnke, Lucas, Benjamin Luke Moorhouse, and Di Zou. 2023. Exploring generative artificial intelligence preparedness among university language instructors: A case study. Computers and Education: Artificial Intelligence 5: 100156. [ Google Scholar ] [ CrossRef ]
  • Laker, Lauren, and Mark Sena. 2023. Accuracy and detection of student use of ChatGPT in business analytics courses. Issues in Information Systems 24: 153–63. [ Google Scholar ] [ CrossRef ]
  • Lemke, Claudia, Kathrin Kirchner, Liadan Anandarajah, and Florian Herfurth. 2023. Exploring the Student Perspective: Assessing Technology Readiness and Acceptance for Adopting Large Language Models in Higher Education. Paper presented at the European Conference on e-Learning, (ECEL 2023), Pretoria, South Africa, October 26–27; pp. 156–64. [ Google Scholar ] [ CrossRef ]
  • Limna, Pongsakorn, Tanpat Kraiwanit, Kris Jangjarat, and Prapasiri Klayklung. 2023a. The use of ChatGPT in the digital era: Perspectives on chatbot implementation. Journal of Applied Learning and Teaching 6: 64–74. [ Google Scholar ] [ CrossRef ]
  • Limna, Pongsakorn, Tanpat Kraiwanit, Kris Jangjarat, and Yarnaphat Shaengchart. 2023b. Applying ChatGPT as a new business strategy: A great power comes with great responsibility [Special issue]. Corporate & Business Strategy Review 4: 218–26. [ Google Scholar ] [ CrossRef ]
  • Lopezosa, Carlos, Carles Lluís Codina, Carles Pont-Sorribes, and Mari Vállez. 2023. Use of Generative Artificial Intelligence in the Training of Journalists: Challenges, Uses and Training Proposal. Profesional De La información Information Professional 32: 1–12. [ Google Scholar ] [ CrossRef ]
  • Martineau, Kim. 2023. What Is Generative AI? IBM Research Blog . April 20. Available online: https://research.ibm.com/blog/what-is-generative-AI (accessed on 7 January 2024).
  • Mondal, Himel, Shaikat Mondal, and Indrashis Podder. 2023. Using ChatGPT for Writing Articles for Patients’ Education for Dermatological Diseases: A Pilot Study. Indian Dermatology Online Journal 14: 482–86. [ Google Scholar ] [ CrossRef ]
  • Moorhouse, Benjamin, Marie Alina Wan, and Yuwei Wan. 2023. Generative AI tools and assessment: Guidelines of the world’s top-ranking universities. Computers and Education Open 5: 100151. [ Google Scholar ] [ CrossRef ]
  • Overono, Acacia L., and Annie Ditta. 2023. The Rise of Artificial Intelligence: A Clarion Call for Higher Education to Redefine Learning and Reimagine Assessment. College Teaching , 1–4. [ Google Scholar ] [ CrossRef ]
  • Page, Matthew J., Joanne E. McKenzie, Patrick M. Bossuyt, Isabelle Boutron, Tammy C. Hoffmann, Cynthia D. Mulrow, Larissa Shamseer, Jennifer M. Tetzlaff, Elie A. Akl, Sue E. Brennan, and et al. 2021. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 372: n71. [ Google Scholar ] [ CrossRef ]
  • Pechenkina, Ekaterina. 2023. Artificial intelligence for good? Challenges and possibilities of AI in higher education from a data justice perspective. In Higher Education for Good: Teaching and Learning Futures . Edited by Laura Czerniewicz and Catherine Cronin. Cambridge, UK: Open Book Publishers, pp. 239–66. [ Google Scholar ] [ CrossRef ]
  • Perkins, Mike, Jasper Roe, Darius Postma, James McGaughran, and Don Hickerson. 2023. Detection of GPT-4 Generated Text in Higher Education: Combining Academic Judgement and Software to Identify Generative AI Tool Misuse. Journal of Academic Ethics 22: 89–113. [ Google Scholar ] [ CrossRef ]
  • Pitso, Teboho. 2023. Post-COVID-19 Higher Learning: Towards Telagogy, A Web-Based Learning Experience. IAFOR Journal of Education 11: 39–59. [ Google Scholar ] [ CrossRef ]
  • Plata, Sterling, Maria Ana De Guzman, and Arthea Quesada. 2023. Emerging Research and Policy Themes on Academic Integrity in the Age of Chat GPT and Generative AI. Asian Journal of University Education 19: 743–58. [ Google Scholar ] [ CrossRef ]
  • Rudolph, Jürgen, Samson Tan, and Shannon Tan. 2023a. War of the chatbots: Bard, Bing Chat, ChatGPT, Ernie and beyond. The new AI gold rush and its impact on higher education. Journal of Applied Learning and Teaching 6: 364–89. [ Google Scholar ] [ CrossRef ]
  • Rudolph, Jürgen, Samson Tan, and Shannon Tan. 2023b. ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning and Teaching 6: 342–63. [ Google Scholar ] [ CrossRef ]
  • Ryall, Adelle, and Stephen Abblitt. 2023. “A Co-Pilot for Learning Design?”: Perspectives from Learning Designers on the Uses, Challenges, and Risks of Generative Artificial Intelligence in Higher Education. In People, Partnerships and Pedagogies. Proceedings ASCILITE 2023 . Edited by Thomas Cochrane, Vickel Narayan, Cheryl Brown, MacCallum Kathryn, Elisa Bone, Christopher Deneen, Robert Vanderburg and Brad Hurren. Christchurch: Te Pae Conference Center, pp. 525–30. [ Google Scholar ] [ CrossRef ]
  • Santiago, Cereneo S., Steve I. Embang, Ricky B. Acanto, Kem Warren P. Ambojia, Maico Demi B. Aperocho, Benedicto B. Balilo, Erwin L. Cahapin, Marjohn Thomas N. Conlu, Samson M. Lausa, Ester Y. Laput, and et al. 2023. Utilization of Writing Assistance Tools in Research in Selected Higher Learning Institutions in the Philippines: A Text Mining Analysis. International Journal of Learning, Teaching and Educational Research 22: 259–84. [ Google Scholar ] [ CrossRef ]
  • Solopova, Veronika, Eiad Rostom, Fritz Cremer, Adrian Gruszczynski, Sascha Witte, Chengming Zhang, Fernando Ramos López, Lea Plößl, Florian Hofmann, Ralf Romeike, and et al. 2023. PapagAI: Automated Feedback for Reflective Essays. In KI 2023: Advances in Artificial Intelligence. KI 2023 . Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Cham: Springer, vol. 14236, pp. 198–206. [ Google Scholar ] [ CrossRef ]
  • Sridhar, Pragnya, Aidan Doyle, Arav Agarwal, Christopher Bogart, Jaromir Savelka, and Majd Sakr. 2023. Harnessing LLMs in Curricular Design: Using GPT-4 to Support Authoring of Learning Objectives. CEUR Workshop Proceedings 3487: 139–50. [ Google Scholar ]
  • Sullivan, Miriam, Andrew Kelly, and Paul McLaughlan. 2023. ChatGPT in higher education: Considerations for academic integrity and student learning. Journal of Applied Learning and Teaching 6: 31–40. [ Google Scholar ] [ CrossRef ]
  • Tominc, Polona, and Maja Rožman. 2023. Artificial Intelligence and Business Studies: Study Cycle Differences Regarding the Perceptions of the Key Future Competences. Education Sciences 13: 580. [ Google Scholar ] [ CrossRef ]
  • van den Berg, Geesje, and Elize du Plessis. 2023. ChatGPT and Generative AI: Possibilities for Its Contribution to Lesson Planning, Critical Thinking and Openness in Teacher Education. Education Sciences 13: 998. [ Google Scholar ] [ CrossRef ]
  • Walczak, Krzysztof, and Wojciech Cellary. 2023. Challenges for higher education in the era of widespread access to Generative AI. Economics and Business Review 9: 71–100. [ Google Scholar ] [ CrossRef ]
  • Wang, Ting, Brady D. Lund, Agostino Marengo, Alessandro Pagano, Nishith Reddy Mannuru, Zoë A. Teel, and Jenny Pange. 2023. Exploring the Potential Impact of Artificial Intelligence (AI) on International Students in Higher Education: Generative AI, Chatbots, Analytics, and International Student Success. Applied Sciences 13: 6716. [ Google Scholar ] [ CrossRef ]
  • Watermeyer, Richard, Lawrie Phipps, Donna Lanclos, and Cathryn Knight. 2023. Generative AI and the Automating of Academia. Postdigital Science and Education 6: 446–66. [ Google Scholar ] [ CrossRef ]
  • Wolf, Leigh, Tom Farrelly, Orna Farrell, and Fiona Concannon. 2023. Reflections on a Collective Creative Experiment with GenAI: Exploring the Boundaries of What is Possible. Irish Journal of Technology Enhanced Learning 7: 1–7. [ Google Scholar ] [ CrossRef ]
  • Yilmaz, Ramazan, and Fatma Gizem Karaoglan Yilmaz. 2023. The effect of generative artificial intelligence (AI)-based tool use on students’ computational thinking skills, programming self-efficacy and motivation. Computers and Education: Artificial Intelligence 4: 100147. [ Google Scholar ] [ CrossRef ]
  • Zawiah, Mohammed, Fahmi Y. Al-Ashwal, Lobna Gharaibeh, Rana Abu Farha, Karem H. Alzoubi, Khawla Abu Hammour, Qutaiba A. Qasim, and Fahd Abrah. 2023. ChatGPT and Clinical Training: Perception, Concerns, and Practice of Pharm-D Students. Journal of Multidisciplinary Healthcare 16: 4099–110. [ Google Scholar ] [ CrossRef ]


Selected Group of Students | Students Who Answered the Questionnaire
M | F | M | F
1st year595342
2nd year365294
1st year393242
2nd year212152
Country | N. | Country | N. | Country | N. | Country | N.
Australia | 16 | Italy | 2 | Egypt | 1 | South Korea | 1
United States | 7 | Saudi Arabia | 2 | Ghana | 1 | Sweden | 1
Singapore | 5 | South Africa | 2 | Greece | 1 | Turkey | 1
Hong Kong | 4 | Thailand | 2 | India | 1 | United Arab Emirates | 1
Spain | 4 | Viet Nam | 2 | Iraq | 1 | Yemen | 1
United Kingdom | 4 | Bulgaria | 1 | Jordan | 1
Canada | 3 | Chile | 1 | Malaysia | 1
Philippines | 3 | China | 1 | Mexico | 1
Germany | 2 | Czech Republic | 1 | New Zealand | 1
Ireland | 2 | Denmark | 1 | Poland | 1
Country | N. | Country | N. | Country | N. | Country | N.
Singapore | 271 | United States | 15 | India | 2 | Iraq | 0
Australia | 187 | Italy | 11 | Turkey | 2 | Jordan | 0
Hong Kong | 37 | United Kingdom | 6 | Denmark | 1 | Poland | 0
Thailand | 33 | Canada | 6 | Greece | 1 | United Arab Emirates | 0
Philippines | 31 | Ireland | 6 | Sweden | 1 | Yemen | 0
Viet Nam | 29 | Spain | 6 | Saudi Arabia | 1
Malaysia | 29 | South Africa | 6 | Bulgaria | 1
South Korea | 29 | Mexico | 3 | Czech Republic | 0
China | 17 | Chile | 3 | Egypt | 0
New Zealand | 17 | Germany | 2 | Ghana | 0
Categories | Subcategories | Nr. of Documents | References
HE with Gen AI 15 ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ).
15 ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ).
14 ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ).
8 ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ).
Assessment in Gen AI/ChatGPT times 8 ( ); ( ); ( ); ( ); ( ); ( ); ( ); ( ).
New challenges to academic integrity policies 4 ( ); ( ); ( ); ( ).
Have You Tried Using a Gen AI Tool?Nr.%
Yes5246.4%
No6053.6%
Categories and Subcategories%Unit of Analysis (Some Examples)
1. Learning support:
1.1. Helpful to solve doubts, to correct errors34.6%
1.2. Helpful for more autonomous and self-regulated learning19.2%
2. Helpful to carry out the academic assignments/individual or group activities17.3%
3. Facilitates research/search processes
3.1. Reduces the time spent with research13.5%
3.2. Makes access to information easier9.6%
4. Reduction in teachers’ workload3.9%
5. Enables new teaching methods1.9%
Categories and Subcategories%Unit of Analysis (Some Examples)
1. Harms the learning process:
1.1. What is generated by Gen AI has errors13.5%
1.2. Generates dependence and encourages laziness15.4%
1.3. Decreases active effort and involvement in the learning/critical thinking process28.8%
2. Encourages plagiarism and incorrect assessment procedures17.3%
3. Reduces relationships with teachers and interpersonal relationships9.6%
4. No negative effect—as it will be necessary to have knowledge for its correct use7.7%
5. Don’t know7.7%



  • R Soc Open Sci
  • v.9(3); 2022 Mar

Strategies that reduce Stroop interference

B. Palfi 1,2, B. A. Parris 3, A. F. Collins 1, Z. Dienes 1,2

1 School of Psychology, University of Sussex, Brighton, UK

2 Sackler Centre for Consciousness Science, University of Sussex, Pevensey Building 1, North South Road, Brighton, East Sussex BN1 9QH, UK

3 Department of Psychology, Bournemouth University, Poole, UK

Associated Data

  • Palfi B, Parris BA, Collins AF, Dienes Z. 2022. Strategies that reduce stroop interference. Figshare .

The materials, the data and the analysis script of the experiments can be retrieved from https://osf.io/6a58r .

The data are provided in electronic supplementary material [ 92 ].

A remarkable example of reducing Stroop interference is provided by the word blindness post-hypnotic suggestion (a suggestion to see words as meaningless during the Stroop task). This suggestion has repeatedly been demonstrated to halve Stroop interference when it is given to highly hypnotizable people. To explore how highly hypnotizable individuals manage to reduce Stroop interference when they respond to the word blindness suggestion, we tested four candidate strategies in two experiments outside of the hypnotic context. Strategies of looking away from the target words and of visually blurring them yielded compelling evidence of substantially reduced Stroop interference in both experiments. However, the pattern of results produced by these strategies did not match that of the word blindness suggestion. Crucially, neither looking away nor visual blurring sped up incongruent responses, suggesting that neither of these strategies is the likely underlying mechanism of the word blindness suggestion. Although the current results did not unravel the mystery of the word blindness suggestion, they show that there are multiple voluntary ways through which participants can dramatically reduce Stroop interference.

1.  Introduction

An essential feature of the human cognitive system is its ability to attend to and use goal-related stimuli while ignoring distractors in the environment. The Stroop task ([ 1 ]; for a review see [ 2 ]) provides a window into selective attention and, since its publication, has inspired many theories of attention and cognitive control [ 3 – 7 ]. The task requires participants to name the displayed colour of presented words while disregarding the meaning of the words. People produce the quickest responses on congruent trials, in which the meaning of the presented word is in accordance with its displayed colour (e.g. RED displayed in red ), followed by neutral trials, in which the meaning of the presented word is unrelated to colours (e.g. LOT displayed in red ). The slowest response times (RTs) are observed on incongruent trials, where the displayed colour and the meaning of the word conflict (e.g. RED displayed in blue ). Performance on the task can be assessed by computing RT differences between these experimental conditions. The standard Stroop effect is the RT difference between incongruent and congruent trials, and it can be broken down into two components: the Stroop interference effect, which is the RT difference between incongruent and neutral trials, and the Stroop facilitation effect, which is the RT difference between neutral and congruent trials.
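To make the decomposition concrete, here is a minimal Python sketch; the condition means are taken from the no strategy condition of table 1 in the Results, and the point is the arithmetic, not the values:

```python
# Mean RTs (ms) from the no strategy condition of table 1.
rt = {"congruent": 682, "neutral": 730, "incongruent": 808}

stroop_effect = rt["incongruent"] - rt["congruent"]   # standard Stroop effect
interference = rt["incongruent"] - rt["neutral"]      # Stroop interference
facilitation = rt["neutral"] - rt["congruent"]        # Stroop facilitation

# The standard effect decomposes exactly into its two components.
assert stroop_effect == interference + facilitation
print(stroop_effect, interference, facilitation)  # 126 78 48
```

Note that the decomposition is exact by construction: the neutral condition simply splits the incongruent-congruent difference into an interference part and a facilitation part.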

The Stroop effect is remarkably large, and many report experiencing cognitive conflict during an incongruent trial [ 2 ]. A long line of research has demonstrated that the Stroop effect is very robust; it persists despite long-term training (e.g. [ 8 ]), and bringing it under control through the application of deliberate strategies is difficult [ 2 ]. While methods have been reported that result in reduced Stroop effects [ 9 – 12 ], all involve a manipulation of the stimulus context (e.g. colouring a single letter instead of all letters or decreasing the response–stimulus interval) so as to provide exogenous support to control mechanisms, and the reductions are thus not likely the consequence of deliberate, top-down control. Even financial rewards offered to increase motivation to perform well result either in no effect on reaction times beyond a general speeding up across all trial types [ 13 , 14 ] or in only small (approx. 10 ms) reductions of the Stroop effect [ 13 ].

One of the few exceptions to the robustness of the Stroop effect may be provided by the word blindness post-hypnotic suggestion [ 15 , 16 ]. When the word blindness suggestion, a suggestion to see the words during the Stroop task as gibberish or meaningless characters, is given to highly hypnotizable people (henceforth highs), they can substantially reduce the Stroop effect compared with a standard, no suggestion condition. This finding has been replicated by the original authors as well as by independent laboratories (e.g. [ 16 – 21 ]). The magnitude of Stroop interference in the suggestion condition is roughly half the size of the effect in the no suggestion condition (for a meta-analysis, see table 1 of [ 22 ]). By contrast, the influence of the word blindness suggestion on the facilitation component of the Stroop effect (neutral RT minus congruent RT) appears to be more volatile. Importantly, responding to the suggestion speeds up RTs on incongruent trials relative to the no suggestion condition, a pattern not observed in the control group of low hypnotizable people. Hence, the effect is an interesting use of cognitive control that is not produced simply by holding back and slowing down on neutral and congruent trials (thereby equalizing RTs across all trial types; [ 23 ]).

Thus, the question arises: what exactly happens when highs respond to this post-hypnotic suggestion? Many theories of hypnosis concur that responding to a hypnotic suggestion involves top-down cognitive control processes and that the feeling of involuntariness, which is the central feature of hypnotic phenomena [ 24 , 25 ], is only the result of deteriorated or relinquished metacognition ([ 26 – 30 ]; for a review see [ 31 ]). 1 Perhaps the simplest theory of hypnosis (i.e. the one that operates with the fewest assumptions) is cold control theory, which takes reduced metacognition as the fundamental process of hypnotic responding. Specifically, it asserts that hypnotic responding is implemented by intentional control. Subjects intentionally engage in perceptual or cognitive strategies to create the experiences described in the suggestion while they alter the monitoring of their intentions and make themselves believe that they are not acting deliberately [ 27 , 36 , 37 ]. The theory draws on the higher order thought (HOT) theories of consciousness [ 38 , 39 ], according to which a mental state becomes conscious by virtue of a higher order state referring to it. For instance, to create the experience of a buzzing mosquito, one can form the following first-order intention: ‘imagine a buzzing mosquito'. To be aware that one is engaged in imagination, one would need a second-order state that refers to the first-order state (i.e. ‘I intend to imagine a buzzing mosquito'). One can also create the experience of this noise without being aware of the first-order intention (i.e. ‘I do not intend to imagine a buzzing mosquito'), in which case it would feel as if it happened by itself, akin to the experience of hallucination. Importantly, this experience of involuntariness is what hypnotic subjects report about their behaviour when they respond to suggestions.
Taken together, according to cold control theory, responding to a suggestion consists of engaging in a strategy to produce the experience described in the suggestion without being aware of using a strategy. From this assumption, it follows that the sole difference between a hypnotic and a non-hypnotic response is the form of the accompanying second-order state. Therefore, if one is capable of reducing the Stroop interference effect by responding to the word blindness suggestion, one should be able to do it by voluntary, non-hypnotic means as well, using the very same strategy that they used when they responded to the suggestion. Identifying such a strategy is central to cold control theory and to simple, metacognitive explanations of hypnosis, as the lack of a clear explanation involving intentional actions invites more complex theories to address the word blindness suggestion.

We review here four distinct strategies that have the potential to be the underlying mechanism of the word blindness suggestion. Some of these strategies were reported to be spontaneously used by highs (and lows) when they undertook the Stroop task outside of the hypnotic context [ 40 ], and some are simple strategies with the face validity to be able to reduce Stroop interference. The most straightforward candidate is the looking-away strategy. Subjects may divert their attention from the word so that they can easily process the colour but not the meaning of the word, which can result in reduced interference. Indeed, it has been demonstrated that lows can reduce Stroop interference by diverting their attention from the words [ 19 ]. However, Raz et al . [ 15 , 19 ] argued that it is unlikely that highs engage in this strategy when they respond to the suggestion. First, subjects reported that they saw the words in all instances and claimed that they did not engage in any strategy related to spatial attention. Second, the experimental sessions were videotaped, and independent judges were unable to distinguish between highs and lows based on their eye-movement patterns. Nonetheless, these arguments are far from bulletproof. As stated earlier, it is the essence of hypnosis that when subjects respond hypnotically, they can engage in strategies without being aware of doing so [ 27 , 30 ]; hence, asking them whether they used any strategies may not be a sensitive way to explore the underlying mechanism of the suggestion. Moreover, human judges may not be able to detect subtle eye-movement patterns, and thus an objective criterion based on, for instance, fixation time outside of an area of interest defined around the words could provide a more severe test of the strategy.

A more subtle form of the looking-away strategy is when subjects focus their attention on a single letter, or a portion of a letter, so that they can more easily name the font colour. There is ample evidence that colouring only the last or the first letter of a Stroop word, compared with the middle letter, decreases the size of the Stroop interference effect ([ 10 , 12 , 41 ]; for a review see [ 42 ]). Moreover, highs can respond more quickly on incongruent trials when this strategy is provided in a hypnotic context ([ 40 ]; cf. [ 43 ]). Nonetheless, the Sheehan et al . [ 40 ] study lacked a non-hypnotic strategy condition; hence, it is unclear whether the inclusion of hypnosis in the strategy condition increased the motivation and expectations of highs compared with the non-hypnotic baseline condition. The lack of an appropriate control could thus create a ‘hold-back' effect [ 30 , 44 ] in the non-hypnotic baseline condition as a way of satisfying demand characteristics.

Another visually related strategy is blurring. Subjects may adjust visual accommodation (e.g. by relaxing the muscles around their eyes) so that the image of the word does not fall directly on the retina. Blurring may prioritize the colour of the word over the meaning. Raz et al . [ 19 ] provided a test of this strategy by administering a pharmacological agent to highs to disrupt visual accommodation (in other words, induce the state of cycloplegia). The subjects were administered two drops of 1% cyclopentolate hydrochloride and their vision was corrected by lenses so that they saw the words clearly during the Stroop task. Remarkably, highs still decreased the Stroop interference effect when they responded to the suggestion compared with the no suggestion condition. One might, therefore, conclude that highs achieved the reduction by means other than visual blurring. However, this conclusion is conditional on the participants being in a state of complete cycloplegia. There was no outcome neutral test examining whether the participants had completely lost their ability to accommodate vision. The authors point out that residual accommodation can still occur, especially for younger participants, when this particular agent is used.

Finally, there is evidence that subjects spontaneously resort to a strategy that involves rehearsal of the task instructions, such as the word ‘colour' [ 40 ]. Goal maintenance has been shown to play a critical role in Stroop task performance; therefore, a strategy that sustains an active goal representation might help participants mitigate Stroop interference [ 9 , 45 , 46 ].

The purpose of this project is to explore the underlying mechanism or mechanisms of the word blindness suggestion by testing whether any of these four strategies (looking-away, visual blurring, single-letter focus and goal-maintenance) could be one that highs use when they respond to the suggestion. 2 The test of these strategies is especially relevant to the cold control theory of hypnosis, as it expects that suggestions are implemented by intentional actions, but it also has the potential to further our understanding of the Stroop task and cognitive control. To test the efficiency of the strategies, we designed a fully within-subjects experiment in which participants undertook the Stroop task in five separate blocks: in four blocks they were explicitly asked to use one of the mentioned strategies and in one block they were told to not use any of these strategies (baseline/control condition). According to the cold control theory, if a strategy can be applied hypnotically to reduce the Stroop effect, it should be equally available and applicable non-hypnotically. Hence, the experiment was administered outside of the hypnotic context; in fact, no reference was made to hypnosis or to the word blindness suggestion. The key tests were whether each strategy could reduce Stroop interference, and whether the reduction happens via speeding up RTs of incongruent trials. We did not define a key test involving Stroop facilitation as the effect of the word blindness suggestion on this component is unclear [ 22 ]. In order to allow for the comparison of our results and the results of earlier studies demonstrating the word blindness effect, the stimuli and design (e.g. manual version of the Stroop task) of our experiments were largely the same as those of the original study of Raz et al . [ 15 ].

As a secondary analysis, we tested whether the efficiency of a specific strategy is related to hypnotizability beyond the effect of expectations and motivations conditional on a hypnotic context. Cold control theory postulates that individual differences in hypnotizability are grounded in differential metacognitive skills (which may or may not be limited to the domain of intentions) and not in differential cognitive control. Empirical evidence is in harmony with this assumption [ 48 – 50 ]. Consequently, lows and mediums should be able to use a specific strategy just as efficiently as highs, when they are sufficiently motivated. This assumption also has practical relevance to the current study, as it implies that to test the strategies recruitment does not need to be limited to highs only. Nonetheless, if the results reveal a positive relationship between hypnotizability and strategy efficiency outside of the hypnotic context, the purely metacognitive account of hypnosis would need to be revised, and the plausible strategies would need to be tested only on highs. To exclude the effect of expectations and motivations regarding hypnosis, we recruited participants from a subject pool where the majority of the people had already been screened for hypnotizability, so that we would not need to disclose the hypothesis to the participants. Consent to link results to hypnotizability scores was acquired after the experiment; therefore, it is unlikely that they could associate the current experiment in any way with hypnosis or hypnotizability.

2.  Experiment 1

2.1. Methods

2.1.1. Participants

We recruited 78 participants, of whom 57 (mean age = 19.61, s.d. = 1.47, females = 51) had been screened for hypnotizability with the Sussex-Waterloo Scale of Hypnotizability (SWASH; [ 51 ]). As specified in the pre-registration, we excluded the data of those who did not have a SWASH score from all of the analyses. The experiment was advertised to first- and second-year psychology students of the University of Sussex who had earlier completed a module in which they had the opportunity to participate in a hypnosis screening session. High and low hypnotizability were defined as scoring in the top and bottom 15% of the SWASH, respectively. We calculated the cut-offs a priori based on the composite (objective and subjective) SWASH scores of all the first- and second-year students in our database. The cut-off for highs was 5.35, whereas the cut-off for lows was 2.00 (on a scale of 0 to 10). Of the 57 participants, 10 were high, 39 medium and 8 low hypnotizables. The participants were proficient readers of English and attended the experiment in exchange for course credits. All participants gave their informed consent before the experiment as well as after the experiment, when we revealed that we wished to correlate their performance with their hypnotizability scores. The Ethical Committee of the University of Sussex approved the study (ER/BP210/5).
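The classification into highs and lows amounts to a percentile rule on the pooled scores. The sketch below (`swash_cutoffs` is a hypothetical helper; the paper does not specify the exact percentile convention it used) illustrates the idea:

```python
def swash_cutoffs(scores, tail=0.15):
    """Return (low_cut, high_cut): the scores bounding the bottom and top
    `tail` proportion of the distribution. A hypothetical helper; the exact
    percentile rule used in the paper is not specified."""
    s = sorted(scores)
    n = len(s)
    k = max(1, round(n * tail))
    low_cut = s[k - 1]    # highest score still within the bottom 15%
    high_cut = s[n - k]   # lowest score still within the top 15%
    return low_cut, high_cut

# Illustrative scores from 0.0 to 10.0 in steps of 0.1 (not the real database).
low, high = swash_cutoffs([x / 10 for x in range(0, 101)])
```

Any monotone percentile rule of this kind will produce a pair of cut-offs such as the reported 2.00 and 5.35, given the actual database of composite scores.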

2.1.2. Stimuli and apparatus

The current stimuli closely followed those used by Raz et al . [ 15 ] for the purpose of comparability. The stimulus set included four colour words (RED, BLUE, GREEN and YELLOW) and four neutral words (LOT, SHIP, KNIFE and FLOWER). The congruent trials consisted of colour words presented in colours matching the meaning of the words (e.g. RED in the colour red). The incongruent trials were colour words displayed in colours mismatching the meaning of the word (e.g. RED in the colour blue), covering all possible pairings of presented colours and meanings. The colour and neutral words were matched for frequency and length. All words were written in upper case and presented against a white background. The words were presented in the following hex colour codes: #ff0000 (red), #0000ff (blue), #008000 (green) and #ffef36 (yellow). The vertical visual angle of the stimuli was 0.5°, while the horizontal visual angle lay between 1.3° and 1.9° depending on the length of the word. The distance between the participants' eyes and the computer screen was approximately 65 cm. The response keys used in the experiment were ‘V', ‘B', ‘N' and ‘M' for the colours red, blue, green and yellow, respectively. The keyboard buttons were not colour labelled (note that Raz et al . [ 15 ] used colour labels; we did not provide these visual aids in order to control for a potential colour-matching strategy). The experiment was built and run in OpenSesame [ 52 ] on a computer with a screen resolution of 1366 × 768 (15.6-inch screen).
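The reported visual angles follow from the standard formula, angle = 2·arctan(size / (2 × distance)). A short sketch, where the stimulus sizes are assumed values back-computed from the reported angles at the 65 cm viewing distance:

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus at a viewing distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# At 65 cm, a letter height of about 0.57 cm (an assumed, back-computed value)
# reproduces the reported 0.5 degree vertical visual angle.
print(round(visual_angle_deg(0.57, 65), 2))  # 0.5
```

The same formula with assumed word widths of roughly 1.5 to 2.2 cm reproduces the reported 1.3° to 1.9° horizontal range.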

2.1.3. Design and procedure

The study had a 3 × 5 × 3 mixed design with the independent variables of trial congruency (congruent versus neutral versus incongruent), strategy condition (no strategy, looking away, blurring, single-letter focus, goal-maintenance) and hypnotizability (low, medium or high). 3 The proportions of congruent, neutral and incongruent trials were equal (33%) in each condition. The order of conditions as well as the order of the Stroop trials (144 per condition) were randomized across participants.
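One way to realize these proportions is sketched below; `build_block` and its sampling of particular word-colour pairings within a block are assumptions, as the paper does not detail that level of the procedure:

```python
import itertools
import random

COLOURS = ["red", "blue", "green", "yellow"]
NEUTRALS = ["LOT", "SHIP", "KNIFE", "FLOWER"]

def build_block(n_trials=144, seed=None):
    """Build one block with equal proportions (one third each) of congruent,
    neutral and incongruent trials in a randomized order. A hypothetical
    sketch of the trial-list construction, not the paper's exact procedure."""
    rng = random.Random(seed)
    per_type = n_trials // 3
    congruent = [(w.upper(), w) for w in COLOURS]
    # All 12 mismatching word/colour pairings, as in the stimulus set.
    incongruent = [(w.upper(), c) for w, c in itertools.permutations(COLOURS, 2)]
    neutral = [(w, c) for w in NEUTRALS for c in COLOURS]
    trials = ([("congruent", *rng.choice(congruent)) for _ in range(per_type)]
              + [("incongruent", *rng.choice(incongruent)) for _ in range(per_type)]
              + [("neutral", *rng.choice(neutral)) for _ in range(per_type)])
    rng.shuffle(trials)
    return trials
```

Each trial is a (type, word, colour) triple; shuffling the concatenated list gives the randomized trial order within a condition.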

The experiment took place in a dimly lit room with the experimenter present and only one participant at a time. The participants were told that they would undertake the Stroop task several times and that, in some cases, they would be provided with explicit instructions to use a specific strategy to help them with the task. After providing their informed consent, the participants completed a practice Stroop task (36 trials). The participants were instructed to place their left middle finger on ‘V', left index finger on ‘B', right index finger on ‘N' and right middle finger on ‘M' while undertaking the Stroop task. They were asked to respond to the colour of the word on the screen as quickly and as accurately as they could. The participants were instructed to focus on the fixation cross and retain their focus on the centre of the screen during the Stroop task. After 1500 ms, the fixation cross was replaced by one of the Stroop words, which remained on the screen until a response was given or for 2000 ms. Finally, feedback (CORRECT or INCORRECT) flashed in black on the screen and then a new trial started with the fixation cross. The response-to-stimulus interval was 2000 ms. This sequence remained constant across all conditions.

Next, the participants undertook the five experimental conditions. The order of the conditions was randomly generated for each participant. In the no strategy condition, the participants were asked not to use any of the mentioned strategies and to respond as fast and as accurately as they could. All strategy conditions started with a screen explaining the strategy participants were asked to use on each trial. For the visual strategies, an example word was presented so that the participants could practise the strategy (see the appendix for exact instructions). Before the start of the condition, the experimenter asked the participants whether they had understood how to use the strategy and provided clarification on request. After each strategy condition, the participants were asked to report the percentage of trials on which they managed to use the strategy (‘What do you think, on what percentage of the trials did you use the strategy? Please answer with a number between 0 and 100.'). After finishing the last condition, the participants were thanked and debriefed.

2.2. Data analysis

2.2.1. Statistical analyses

We conducted all of our analyses with the statistical software R 3.3.1 [ 53 ]. We calculated difference scores for the RTs so that we were able to test all of our hypotheses directly with Bayesian paired t -tests (comparing two conditions or testing whether a regression slope is different from zero) or Bayesian independent t -tests. Note that we did not run any omnibus tests (e.g. an F test including all five conditions at a time), as these would not be informative with respect to the hypotheses of the current project. We reported p -values for each statistical test but used the Bayes factor (B) to draw conclusions.
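The difference-score approach amounts to paired t-tests on per-participant differences; the frequentist side of each comparison can be sketched as follows (the Bayesian analyses in the paper were run with the authors' R script, not this helper):

```python
import math
import statistics

def paired_test(diffs):
    """t statistic and Cohen's d_z for a list of per-participant difference
    scores (e.g. each participant's no-strategy-minus-strategy interference).
    A minimal sketch of the frequentist computation."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)   # sample s.d. of the difference scores
    d_z = mean / sd                # standardized effect size for paired data
    t = d_z * math.sqrt(n)         # equivalently: mean / (sd / sqrt(n))
    return t, d_z
```

This makes explicit the relation used throughout the Results: t = d_z × √n, so the reported d_z values can be recovered from the t statistics and sample size.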

2.2.2. Bayes factor

We applied the R script of Dienes & McLatchie [ 54 ] to calculate the Bayes factors. This calculator has a t -distribution as a likelihood function for the data as well as for the model of H1. We set the degrees of freedom of the model of H1 to 10 000 in each analysis so that the likelihood function for the theory approximates a normal distribution. To calculate B, one also needs to specify the predictions of the two models (H1 and H0) under comparison. Every tested hypothesis made a directional prediction; hence, we applied a half-normal distribution with a mode of zero to model the predictions of H1. We specified the distribution as a half-normal since this is in line with the assumption that smaller effects are more probable than larger effects [ 55 ]. We report Bs as B H(0,X) , in which H indicates that the model is half-normal, the first parameter (0) indicates the mode of the distribution and the second parameter (X) represents the s.d. of the distribution. We used various strategies to define the s.d.s of the different H1s.
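The computation can be sketched numerically: the likelihood of the observed mean difference is a scaled t-distribution with the data's degrees of freedom, H1 is a half-normal with mode 0 and s.d. X, and B is the marginal likelihood under H1 divided by the likelihood at zero. This is a re-implementation in the spirit of the Dienes & McLatchie calculator, not their exact script (their H1 model is a t with df = 10 000, effectively the normal shape used here):

```python
import math

def t_pdf(x, df):
    """Student-t probability density."""
    return (math.gamma((df + 1) / 2)
            / (math.sqrt(df * math.pi) * math.gamma(df / 2))
            * (1 + x * x / df) ** (-(df + 1) / 2))

def bf_half_normal(mean_diff, se, df, h1_sd, steps=20000):
    """B_H(0, h1_sd): half-normal model of H1 versus a point-null H0.
    A numerical sketch in the spirit of the Dienes & McLatchie calculator."""
    def likelihood(theta):
        # Likelihood of the observed mean difference given true effect theta.
        return t_pdf((mean_diff - theta) / se, df) / se
    upper = 6 * h1_sd              # half-normal mass beyond 6 s.d.s is negligible
    step = upper / steps
    marginal = 0.0                 # average likelihood under the H1 model
    for i in range(steps):
        theta = (i + 0.5) * step
        prior = (2 / (h1_sd * math.sqrt(2 * math.pi))
                 * math.exp(-theta ** 2 / (2 * h1_sd ** 2)))
        marginal += likelihood(theta) * prior * step
    return marginal / likelihood(0.0)
```

For example, with the looking-away comparison reported in the Results (M diff = 65 ms, t 56 = 4.99, hence s.e. of about 13 ms) and a 30 ms half-normal, this sketch gives a B of the same order as the reported 3.93 × 10 3.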

Concerning the outcome neutral tests of the Stroop interference and Stroop effects, we informed the s.d.s of the models predicting these effects with the results of the baseline condition of a recent study of ours that used identical Stroop materials [ 47 ]. That is, the s.d.s of the models of the Stroop interference and Stroop effects were 60 and 105 ms, respectively. For the critical analysis, testing the efficiency of the strategies, we used 30 ms, which is half of the baseline Stroop interference. This value is based on the finding that the word blindness suggestion usually halves the baseline Stroop interference, and we expect a successful strategy to produce about the same effect size [ 22 ]. Incidentally, this value is exactly the same as we would obtain by using the room-to-move heuristic to define the maximum possible effect size, provided that the baseline Stroop interference is 60 ms [ 56 ]. The s.d. of the model predicting a positive relationship between hypnotizability and the reduction in Stroop interference by strategy application was 5 ms, based on the findings of Parris & Dienes [ 18 ], who demonstrated a positive link between hypnotizability and the imaginative word blindness effect. In other words, H1 predicts that a one unit increase on the SWASH aids the ability to reduce Stroop interference using one of the strategies by about 5 ms.

In order to draw conclusions about the compared models, we used the convention of B > 3 to distinguish between insensitive and good enough evidence for the alternative hypothesis [ 57 ]. By symmetry, we used the cut-off of B < 1/3 to identify good enough evidence for the null hypothesis. To evaluate the robustness of our Bayesian conclusions to the s.d.s of the H1 models, we report a robustness region for each B, providing the range of s.d.s of the half-normal models that qualitatively support the same conclusion (using the threshold of 3 for moderate evidence for H1 and 1/3 for moderate evidence for H0) as the chosen s.d. [ 56 , 58 ]. The robustness regions are reported as RR conclusion [x1, x2], where x1 is the smallest and x2 is the largest s.d. that gives the same conclusion: B < 1/3, 1/3 < B < 3, or B > 3.

2.3. Pre-registration

The design and analysis plan of this experiment were pre-registered at https://osf.io/4z3xu . We closely followed the steps of the pre-registration when running the experiment and the analysis. Nonetheless, we added an analysis to the set of crucial tests (Crucial test 1): the test of the efficiency of the strategies with all participants who had SWASH scores. This analysis is critical for demonstrating whether or not there is a main effect of successful strategy application irrespective of the participants' hypnotizability.

2.4. Results

2.4.1. Data processing

We excluded the trials with errors from the analyses (8.2% in total, of which 1.3% were from the no strategy, 2.1% from the looking away, 1.6% from the blurring, 1.7% from the single letter focus and 1.5% from the goal-maintenance conditions). 4 Following the outlier exclusion criterion of Raz et al . [ 15 ], we omitted trials with RTs that were three standard deviations either above or below the mean. The proportions of outliers were low and comparable across conditions (we excluded 1.2% of the correct trials, of which 0.2% were from the no strategy, 0.3% from the looking away, 0.3% from the blurring, 0.2% from the single letter focus and 0.2% from the goal-maintenance conditions).
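The three-standard-deviation rule can be sketched as a single-pass helper; whether the criterion was applied per participant and per condition is not modelled here, so this only illustrates the rule itself:

```python
import statistics

def exclude_outliers(rts, criterion=3.0):
    """Drop RTs more than `criterion` standard deviations above or below the
    mean: the exclusion rule of Raz et al. adopted in the paper. A simplified
    single-pass sketch over one list of correct-trial RTs."""
    mean = statistics.mean(rts)
    sd = statistics.stdev(rts)
    return [rt for rt in rts if abs(rt - mean) <= criterion * sd]
```

For example, a single extreme RT (say, 3000 ms among values around 700 ms) inflates the sample s.d. but still falls more than three s.d.s from the mean and is therefore removed, while the remaining trials are kept.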

2.4.2. Outcome neutral checks 1 (non-preregistered): on what percentage of the trials did the participants use the strategies?

In descending order of the mean reported percentage of strategy usage, the conditions were: goal-maintenance ( M = 86%, 95% CI [82%, 90%]); looking away ( M = 83%, 95% CI [80%, 87%]); blurring ( M = 73%, 95% CI [68%, 78%]); and single-letter focus ( M = 66%, 95% CI [61%, 71%]).

2.4.3. Outcome neutral tests 2: is there a Stroop interference effect in the no strategy condition?

As anticipated, RTs in the no strategy condition were fastest on congruent trials, followed by neutral trials and then incongruent trials (see table 1 for condition means and s.d.s). The comparison of the incongruent and neutral trials yielded evidence for the Stroop interference effect ( t 56 = 7.74, p < 0.001, M diff = 78 ms, d z = 1.03, B H(0,60) = 1.49 × 10 8 , RR B > 3 [3, 2.76 × 10 4 ]). The contrast of the incongruent and congruent trials revealed evidence in support of the Stroop effect ( t 56 = 11.73, p < 0.001, M diff = 126 ms, d z = 1.55, B H(0,105) = 2.23 × 10 14 , RR B > 3 [4, 4.62 × 10 4 ]).

Table 1. Mean RTs (ms) in the five strategy conditions. Note: standard deviations (s.d.) are shown in brackets.

strategy condition | incongruent | neutral | congruent
no strategy | 808 (127) | 730 (101) | 682 (94)
looking-away | 815 (94) | 802 (94) | 771 (97)
blurring | 821 (121) | 776 (119) | 739 (114)
single-letter focus | 880 (157) | 812 (133) | 766 (130)
goal-maintenance | 804 (142) | 726 (107) | 689 (90)

2.4.4. Crucial test 1 (non-preregistered): are the strategies effective in reducing the Stroop interference effect?

Using the data of all the participants, we tested whether any of the four strategies decreased Stroop interference (incongruent RTs − neutral RTs). Comparing the no strategy and strategy conditions revealed evidence for the effectiveness of the looking-away ( t 56 = 4.99, p < 0.001, M diff = 65 ms, d z = 0.66, B H(0,30) = 3.93 × 10 3 , RR B > 3 [5, 2.05 × 10 4 ]) and the blurring ( t 56 = 2.85, p = 0.006, M diff = 33 ms, d z = 0.38, B H(0,30) = 20.05, RR B > 3 [6, 365]) strategies. There was anecdotal evidence for no difference between the no strategy and single-letter focus conditions ( t 56 = 0.73, p = 0.469, M diff = 9 ms, d z = 0.10, B H(0,30) = 0.73, RR 1/3 < B < 3 [0, 74]), and between the no strategy and goal-maintenance conditions ( t 56 = 0.01, p = 0.993, M diff = 0 ms, d z = 0.00, B H(0,30) = 0.38, RR 1/3 < B < 3 [0, 34]); the Bayes factors of these latter two tests did not reach the level of good enough evidence. See figure 1 for the distribution of the Stroop interference scores and table 1 for congruency condition means and s.d.s broken down by the strategy conditions.
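The per-participant logic of this test can be sketched with synthetic data: compute an interference score (incongruent RT − neutral RT) for each participant in each condition, then contrast conditions with a paired t-test. The numbers below are illustrative, not the study's data:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
n = 57  # participants, as in Experiment 1

# Synthetic per-participant mean RTs (ms); the looking-away condition is
# simulated with a genuinely smaller interference effect than no strategy
incong_none = rng.normal(808, 40, n)
neutral_none = rng.normal(730, 40, n)
incong_look = rng.normal(815, 40, n)
neutral_look = rng.normal(802, 40, n)

interference_none = incong_none - neutral_none  # incongruent − neutral
interference_look = incong_look - neutral_look

# Paired comparison of the two conditions' interference scores
t, p = ttest_rel(interference_none, interference_look)
```

A Bayes factor for the resulting mean difference (with its standard error) would then be computed against a half-normal model of H1, as described in the data analysis section.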

Figure 1. Violin plot depicting the distribution of Stroop interference score differences (ms) between the no strategy and the four strategy conditions. Each black dot represents the reduction of the Stroop interference score (incongruent RT − neutral RT) by a specific strategy for a single participant.

2.4.5. Crucial test 2 (non-preregistered): do the strategies decrease the RTs of the incongruent trials?

Interestingly, the mean RTs of incongruent trials in the looking-away and blurring conditions were numerically higher than in the no strategy condition. We found evidence that neither the looking-away ( t 56 = −0.46, p = 0.647, M diff = −7 ms, d z = −0.06, B H(0,30) = 0.34, RR 1/3 < B < 3 [0, 30]) nor the blurring strategy ( t 56 = −0.86, p = 0.392, M diff = −13 ms, d z = −0.11, B H(0,30) = 0.27, RR B < 1/3 [23, ∞]) reduced the RTs of incongruent trials. Bayesian evidence regarding the slow-down of incongruent RTs remained insensitive for both the looking-away (B H(0,30) = 0.65, RR 1/3 < B < 3 [0, 66]) and the blurring strategies (B H(0,30) = 0.93, RR 1/3 < B < 3 [0, 101]).

2.4.6. Crucial test 3: is there a relationship between hypnotizability and the extent to which people can reduce the Stroop interference by the tested strategies?

To this aim, we regressed the extent of the reduction in Stroop interference by each strategy on the SWASH scores and tested the regression slopes against zero. Even though the raw regression slopes were close to zero, we did not obtain good enough evidence for the null in any case. The raw regression slopes, in descending order, were: blurring ( t 55 = 0.25, p = 0.801, b = 1.74 ms/SWASH unit, β = 0.03, B H(0,5) = 0.91, RR 1/3 < B < 3 [0, 24]), single-letter focus ( t 55 = 0.11, p = 0.920, b = 0.79 ms/SWASH unit, β = 0.01, B H(0,5) = 0.92, RR 1/3 < B < 3 [0, 23]), looking-away ( t 55 = 0.06, p = 0.950, b = 0.49 ms/SWASH unit, β = 0.01, B H(0,5) = 0.86, RR 1/3 < B < 3 [0, 23]) and goal-maintenance ( t 55 = −0.11, p = 0.911, b = −0.81 ms/SWASH unit, β = −0.02, B H(0,5) = 0.78, RR 1/3 < B < 3 [0, 18]). Figure 2 depicts the scatterplots, regression slopes and their 95% confidence intervals for each strategy separately. The electronic supplementary material reports an alternative analysis of this question in which we directly compared the groups of highs and lows in the extent to which they reduced the Stroop interference effect. Importantly, the results are in accordance across the analyses.
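The slope test can be sketched as an ordinary linear regression of each participant's interference reduction on their SWASH score (synthetic data; the 1.74 ms per SWASH-unit slope is borrowed from the blurring result purely for illustration):

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(7)
swash = rng.uniform(0.0, 5.0, 57)   # hypothetical SWASH scores for 57 participants

# Construct reductions (ms) with a known slope of 1.74 ms/SWASH unit plus noise
reduction = 1.74 * swash + rng.normal(0.0, 40.0, 57)

res = linregress(swash, reduction)  # res.slope estimates b; res.pvalue tests b = 0
```

In the article, the slope estimate and its standard error feed into a Bayes factor with H(0, 5) as the model of H1, rather than being interpreted via the p-value alone.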

Figure 2. Scatterplots showing the relationship between hypnotizability (measured by the SWASH) and the reduction in Stroop interference induced by the four strategies. The four panels show the looking-away ( a ), blurring ( b ), single-letter focus ( c ) and goal-maintenance ( d ) strategies.

2.4.7. Supporting test of interest 1 (non-preregistered): do the strategies influence RTs in general?

Table 1 suggests that the strategies may trigger a general slow-down effect on RTs. To test this, we ran four Bayesian t -tests comparing the average of the incongruent and neutral RTs between the no strategy condition and each strategy condition. Note that these analyses are equivalent to four tests of the main effect of each strategy on the RTs of incongruent and neutral trials. We found good enough evidence for a general slow-down effect for the looking-away ( t 56 = 3.20, p = 0.002, M diff = 40 ms, d z = 0.40, B H(0,30) = 43.33, RR B > 3 [6, 992]), the blurring ( t 56 = 2.42, p = 0.019, M diff = 29 ms, d z = 0.26, B H(0,30) = 8.34, RR B > 3 [8, 132]) and the single-letter focus strategies ( t 56 = 2.42, p = 0.019, M diff = 29 ms, d z = 0.26, B H(0,30) = 8.34, RR B > 3 [8, 132]). By contrast, goal-maintenance did not increase the RTs of incongruent and neutral trials ( t 56 = 0.39, p = 0.700, M diff = −4 ms, d z = −0.04, B H(0,30) = 0.26, RR B < 1/3 [23, ∞]).

2.5. Discussion

In this experiment, we tested four strategies that putatively reduce the Stroop interference effect to examine whether any of them could be the underlying mechanism of the word blindness suggestion. The crucial test of the strategies provided insufficient evidence either way as to whether the single-letter focus or the goal-maintenance strategy could mitigate the extent of the interference. On the other hand, the looking-away and the visual blurring strategies passed the crucial tests as they drastically decreased the extent of interference at all levels of hypnotizability. Moreover, the blurring strategy approximately halved the extent of Stroop interference (a reduction of 33 ms from the baseline of 78 ms), which is precisely what the word blindness suggestion achieves in general [ 22 ]. However, as mentioned earlier, the word blindness effect has another distinctive feature: it achieves the reduction of the interference effect by reducing the RTs of incongruent trials ([ 23 ]; see electronic supplementary material, table S1 for a meta-analysis of studies demonstrating the word blindness effect and the reduction of incongruent RTs in the suggestion compared with the no suggestion conditions). Surprisingly, our results do not match this pattern; there is evidence that neither of the strategies managed to decrease the RTs of the incongruent trials (although for the looking-away strategy the corresponding B = 0.34 was just above the conventional rough guideline of B < 1/3). If this finding is robust, it challenges the idea that these strategies are the underlying mechanisms of the suggestion. Therefore, in the next experiment, we pre-registered the reduction in incongruent RTs as a test of the strategies.
An alternative analysis of this question, in which we compared the strategy conditions with the word blindness suggestion condition of a different experiment that used the same Stroop materials as the current experiment (see the electronic supplementary material), yielded the same result, namely, that the strategies and the word blindness suggestion produced different patterns.

Another key characteristic of the word blindness suggestion is that it seems to reduce interference by attenuating response competition and not by de-automatizing reading per se [ 17 , 59 ]. By introducing colour-associated words (e.g. sky), Augustinova & Ferrand [ 17 ] distinguished the effect of the suggestion on the semantic and the response conflict components. Crucially, the de-automatization of reading account predicts the reduction of semantic conflict, whereas the response competition account expects semantic processing to remain unaffected by the suggestion. In two experiments, it was demonstrated that the word blindness suggestion modulated only the response conflict component, deeming it unlikely that the suggestion operates via the dampening of semantic processing. 5 Parris, Dienes & Hodgson [ 22 ] argued on the basis of response time distributional analysis that the word blindness suggestion took its effect on the portion of the response time distribution associated with response conflict and not semantic conflict (for other behavioural evidence supporting the response competition account, see Palfi, Parris, Seth, & Dienes [ 60 ]; cf. the neural correlates of the word blindness suggestion found by [ 20 ]). Hence, the strategy that underlies the suggestion should not take its effect by dampening the visual input of the meaning of the words; rather, it should help subjects handle response conflict between the competing response options. It is not clear, however, whether looking away or visual blurring would be in accordance with this notion. Therefore, in the next experiment, we introduce a new condition to dissociate the semantic and response conflict components of the Stroop interference effect, and we specify a new crucial test: a strategy deemed a plausible underlying mechanism of the suggestion should only reduce response conflict and should not influence semantic conflict.

3.  Experiment 2

In this experiment, we aim to test whether the beneficial effects of the looking-away and visual blurring strategies on the mitigation of Stroop interference can be replicated. As argued earlier, the cold control theory assumes that hypnotizability is only related to metacognitive abilities and so strategies used during hypnosis should be applicable to anyone (irrespective of their hypnotizability) inside or outside of hypnosis. As the first experiment did not provide sensitive evidence against this assumption, we retained it and tested the strategies by recruiting participants from the whole range of hypnotizability.

We defined two conditions that the strategies ought to meet to be considered as appropriate underlying mechanisms of the word blindness suggestion: (i) they need to reduce incongruent RTs, and (ii) as suggested by previous findings (e.g. [ 17 ]) they should alleviate response conflict rather than semantic conflict. In order to test the latter assumption, we added non-response set incongruent trials to all of the experimental conditions. These trials consist of colour words that are not part of the response set (e.g. brown) displayed in one of the colours of the response set. Therefore, responding to these types of trials should not involve response competition, and the non-response set interference (RT difference between non-response set incongruent and neutral trials) can be taken as an index of conflict that occurs during semantic processing [ 61 , 62 ]. 6 Henceforth, we refer to the non-response set interference effect simply as semantic conflict or semantic interference effect.
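Given the four trial types, the decomposition this design affords can be illustrated with the no strategy condition means later reported for this experiment in table 2 (the dictionary keys are ours):

```python
# Mean RTs (ms) from the no strategy condition of Experiment 2 (table 2)
rt = {
    "incongruent": 791,      # response-set colour word in a conflicting colour
    "incongruent_nrs": 746,  # non-response set colour word (e.g. BROWN)
    "neutral": 712,
    "congruent": 661,
}

stroop_interference = rt["incongruent"] - rt["neutral"]        # total interference
semantic_conflict = rt["incongruent_nrs"] - rt["neutral"]      # no response competition
response_conflict = rt["incongruent"] - rt["incongruent_nrs"]  # competition component
facilitation = rt["neutral"] - rt["congruent"]
```

The total interference effect thus splits additively into a semantic component (non-response set incongruent − neutral) and a response conflict component (incongruent − non-response set incongruent).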

3.1. Methods

3.1.1. Participants

We recruited 35 participants; however, one of the participants claimed that they did not follow the instructions closely and used visual blurring in the no strategy condition. We excluded the data of that participant, and all analyses were run on the data of 34 participants (mean age = 21.82, s.d. = 4.38, females = 27). The participants received either course credits or payment (£5) in exchange for attending the study.

3.1.2. Stimuli and apparatus

The materials of the registered experiment closely followed those of the first experiment. We added four colour words to the stimulus set (BROWN, PINK, GREY, ORANGE) and created two independent stimulus sets defined by the colours in which the words were presented (A and B). In set A, all words were presented in one of the original colours (red, blue, green or yellow), and so the non-response set incongruent trials comprised the new colour words presented in the original colours. In set B, all words were presented in one of the new colours (brown, pink, grey or orange), and so the non-response set incongruent trials consisted of the original colour words presented in the new colours. The hex colour codes of the new colours were #a52a2a (brown), #ffaaff (pink), #808080 (grey) and #ffa500 (orange). We ran the experiment in OpenSesame [ 52 ] and the resolution of the computer screen was 1920 × 1080 (18-inch screen).

3.1.3. Design and procedure

There were three major changes in this experiment: we did not include the single-letter focus and goal-maintenance strategy conditions; there were more trials in each strategy condition because we included non-response set trials as well (48 trials of each trial type, and so 192 trials in total in each strategy condition); and we did not take into account the hypnotizability of the participants. The experiment had a 4 × 3 × 2 mixed design with congruency type (congruent, neutral, incongruent non-response set, incongruent response set), strategy condition (no strategy, looking away, visual blurring) and response set group (set A, with the response set equivalent to the first experiment, versus set B) as independent variables. The participants were assigned to response set group A or B based on the parity of their subject number. Group membership determined whether the colours of set A or set B had corresponding response buttons. For instance, if someone was assigned to group B, then the colours brown, pink, grey and orange had the corresponding response buttons ‘V', ‘B', ‘N' and ‘M', respectively. In this case, none of the words was displayed in red, blue, green or yellow. Apart from this, the procedure of the experiment was identical to that of the first experiment.

3.2. Data analysis

The steps of the data analysis were in line with those of the first experiment, including the exclusion criterion regarding RT data and how we drew conclusions based on the Bayes factors. We informed the parameters of the model predicting the presence of the semantic interference effect by the findings of Augustinova & Ferrand [ 17 ], who found in two experiments that the size of the semantic interference (using colour-associated words) was about 20 ms. We expected that an intervention impacting semantic processing would approximately halve this effect. For the test of the regression slopes investigating the relationship between general response speed and the extent of the Stroop effect, the model parameters of H1 stemmed from the finding that the slope was 0.13 ms ms −1 in the no strategy condition of the first experiment. We used this value as the s.d. of H1 for the tests of the slopes against zero as well as for their comparisons.

3.3. Pre-registration

The design and analysis plan of the experiment were pre-registered, and they can be accessed at https://osf.io/gbsaf . We closely followed the steps of the design and of the analysis plan.

3.4. Results

3.4.1. Data processing

First, we omitted trials with errors from the analyses (10.4% in total, of which 2.3% were from the no strategy, 4.4% from the looking away, 3.7% from the blurring conditions). Next, we eliminated trials with RTs that were three standard deviations either above or below the mean. Similarly to the first experiment, the proportions of outliers remained low and comparable across conditions (we excluded 1.2% of all correct trials, of which 0.5% were from the no strategy, 0.4% from the looking-away, 0.3% from the blurring conditions).

3.4.2. Outcome neutral checks 1 (non-preregistered): on what percentage of the trials did the participants use the strategies?

The participants reported that, on average, they used the looking-away strategy on 80% (95% CI [75%, 85%]) of the trials, and the blurring strategy on 73% (95% CI [66%, 81%]) of the trials.

3.4.3. Outcome neutral tests 2: is there a difference between the two response set groups regarding the magnitude of the Stroop interference and the semantic Stroop effect (in the no strategy condition)?

Before collapsing the data across response set groups, we compared the two groups in terms of the extent of the Stroop interference and semantic Stroop effects. For instance, the colours used in set A were more saturated and luminous than those used in set B, which may have made it easier for the participants to differentiate between the response options in the former case. This in turn may have produced a smaller interference or semantic Stroop effect in set A than in set B. The size of the Stroop interference effect was comparable in the two response set groups ( M A = 78 ms, M B = 79 ms) and there was weak evidence in favour of the model predicting no difference ( t 30.66 = −0.05, p = 0.958, M diff = 1 ms, d z = 0.02, B N(0,60) = 0.38, RR 1/3 < B < 3 [0, 69]); however, the strength of evidence did not reach the conventional cut-off of good enough evidence. The size of the semantic Stroop effect was numerically larger in the group with the response set of the first experiment ( M A = 49 ms, M B = 15 ms); however, the analysis yielded data insensitivity ( t 29.46 = 1.27, p = 0.212, M diff = 35 ms, d z = 0.44, B N(0,20) = 1.08, RR 1/3 < B < 3 [0, 179]). Consequently, we decided to conduct all of the subsequent analyses on the collapsed data.

3.4.4. Outcome neutral tests 3: is there a Stroop interference and a semantic Stroop effect in the no strategy condition?

As in the first experiment, the RTs in the no strategy condition were the fastest in the congruent trials followed by the neutral trials. The RTs of the non-response set incongruent trials were slower than those of the neutral trials, and the longest RTs were observed in the incongruent trials (see table 2 for condition means and s.d.s). The analyses revealed strong evidence for Stroop interference ( t 33 = 6.56, p < 0.001, M diff = 79 ms, d z = 1.12, B H(0,60) = 2.48 × 10 5 , RR B > 3 [5, 2.6 × 10 4 ]) as well as for the Stroop effect ( t 33 = 10.16, p < 0.001, M diff = 130 ms, d z = 1.74, B H(0,105) = 3.36 × 10 9 , RR B > 3 [6, 4.57 × 10 4 ]). Moreover, the contrast of the non-response set incongruent and the neutral trials yielded evidence for the semantic Stroop interference effect ( t 33 = 2.53, p = 0.016, M diff = 34 ms, d z = 0.43, B H(0,20) = 8.29, RR B > 3 [8, 177]).

Table 2. Mean RTs (ms) in the three strategy conditions. Standard deviations (s.d.) are shown in parentheses.

strategy condition   incongruent   incongruent non-response set   neutral     congruent
no strategy          791 (131)     746 (112)                      712 (97)    661 (81)
looking-away         838 (126)     822 (126)                      830 (127)   790 (118)
blurring             822 (130)     812 (130)                      786 (128)   737 (119)

3.4.5. Crucial test 1: are the strategies effective in reducing the Stroop interference effect?

First, we examined whether or not the beneficial effect of the looking-away and blurring strategies replicated in the current experiment. We found strong evidence that both the looking-away ( t 33 = 4.42, p < 0.001, M diff = 71 ms, d z = 0.76, B H(0,30) = 297.77, RR B > 3 [7, 1.93 × 10 4 ]) and the blurring strategies ( t 33 = 3.05, p = 0.005, M diff = 43 ms, d z = 0.52, B H(0,30) = 24.93, RR B > 3 [7, 632]) helped the participants to reduce the Stroop interference compared with the no strategy condition. Figure 3 depicts the distribution of the Stroop interference scores broken down by the strategy conditions, and table 2 presents the congruency condition means and s.d.s.

Figure 3. Violin plot portraying the distribution of Stroop interference score differences (ms) between the no strategy and the two strategy conditions. Each black dot represents the reduction of Stroop interference (incongruent RT − neutral RT) by a specific strategy for a single participant.

As an additional analysis, we tested whether the strategies reduced the response conflict component (incongruent RTs − non-response set incongruent RTs) of the Stroop interference effect so that our results can be compared with those of Augustinova & Ferrand [ 17 ]. The analyses revealed moderate evidence that the blurring strategy reduced response conflict ( t 33 = 1.98, p = 0.056, M diff = 34 ms, d z = 0.34, B H(0,30) = 3.94, RR B > 3 [16, 64]) and anecdotal evidence that the looking-away strategy reduced response conflict ( t 33 = 1.61, p = 0.117, M diff = 29 ms, d z = 0.27, B H(0,30) = 2.40, RR 1/3 < B < 3 [0, 364]) compared with the no strategy condition. Note: these last two tests were not pre-registered.

3.4.6. Crucial test 2: do the strategies diminish the RTs of the incongruent trials?

We found moderate evidence supporting the claim that neither the looking-away ( t 33 = −2.35, p = 0.025, M diff = −47 ms, d z = −0.40, B H(0,30) = 0.22, RR B < 1/3 [19, ∞]) nor the blurring strategy ( t 33 = −1.99, p = 0.055, M diff = −31 ms, d z = −0.34, B H(0,30) = 0.19, RR B < 1/3 [16, ∞]) reduced the incongruent RTs compared with the no strategy condition. In fact, we found moderate evidence regarding the slow-down of incongruent RTs for both the looking-away (B H(0,30) = 6.37, RR B > 3 [13, 175]) and the blurring strategies (B H(0,30) = 3.98, RR B > 3 [14, 58]).

3.4.7. Crucial test 3: do the strategies influence the magnitude of the semantic Stroop interference effect?

There was anecdotal evidence that the looking-away strategy reduced the semantic Stroop interference effect ( t 33 = 2.41, p = 0.022, M diff = 42 ms, d z = 0.41, B H(0,10) = 2.80, RR 1/3 < B < 3 [0, 11]). In fact, the strategy eliminated the semantic Stroop effect in the looking-away condition ( t 33 = −1.06, p = 0.296, M diff = −8 ms, d z = −0.18, B H(0,20) = 0.20, RR B < 1/3 [12, ∞]). In the case of the blurring strategy, there was no evidence either way as to whether semantic Stroop interference was reduced ( t 33 = 0.50, p = 0.617, M diff = 8 ms, d z = 0.09, B H(0,10) = 1.07, RR 1/3 < B < 3 [0, 74]).

3.4.8. Supporting test of interest 1 (non-preregistered): do the strategies influence RTs in general?

We repeated the test of the main effect of strategy on the RTs of incongruent and neutral trials. The analyses revealed strong evidence for a general slow-down effect for both the looking-away ( t 33 = 4.98, p < 0.001, M diff = 83 ms, d z = 0.71, B H(0,30) = 7.66 × 10 2 , RR B > 3 [7, 2.47 × 10 4 ]) and blurring strategies ( t 33 = 3.72, p < 0.001, M diff = 53 ms, d z = 0.44, B H(0,30) = 94.59, RR B > 3 [6, 3.66 × 10 3 ]).

3.5. Discussion

Once more, both the looking-away and blurring strategies demonstrated utility in reducing Stroop interference, and the blurring strategy approximately halved the Stroop interference effect, as the word blindness suggestion tends to do when it is given to highly hypnotizable people. We also replicated the finding that neither of the strategies sped up responses during incongruent trials, and the direct comparison of the strategy conditions with the word blindness condition of a different experiment yielded evidence for their dissimilarity (for the latter analysis, see the electronic supplementary material). By introducing non-response set incongruent trials, we were able to distinguish the semantic and response conflict components of the interference effect, and we found some evidence that the looking-away strategy alleviates both sources of conflict, whereas for the blurring strategy the evidence does not make clear whether it solely reduces response conflict or diminishes semantic conflict as well. Importantly, we specified these two latter analyses as severe tests that can disconfirm the idea that looking away or blurring is responsible for the word blindness effect. Consequently, we ought to conclude that none of the strategies met the criteria, and they are unlikely to be the strategies that highs resort to when they respond to the word blindness suggestion.

4.  General discussion

The purpose of the project was to investigate whether cognitive or perceptual strategies can attenuate the Stroop interference effect. According to cold control theory of hypnotic responding [ 27 ], people use strategies to create the experience that was described to them in the suggestion. Hence, the investigation of strategies is crucial to assess cold control theory and to understand how highs can manage to reduce the interference effect when they respond to the word blindness suggestion. Importantly, the ability of highs to respond hypnotically (with the feeling of involuntariness) seems to be independent of their first-order executive functions, such as cognitive inhibition [ 48 ] and selective attention [ 50 ], that could help them overcome cognitive conflict during the Stroop task (see [ 49 ], for a review). We found no evidence one way or the other for a correlation between hypnotizability and the extent to which any of the strategies could decrease Stroop interference.

Next, we probed the efficiency of the four strategies: looking away, visual blurring, single-letter focus and goal-maintenance. Importantly, the looking-away and blurring strategies were shown to be useful in diminishing the interference effect in both experiments, substantiating the notion that participants are able to reduce Stroop interference by consciously engaging in simple strategies; a finding that has rarely been demonstrated in the Stroop literature (cf. [ 19 ]). Nonetheless, none of these strategies should be considered a likely candidate for being the underlying mechanism of the word blindness suggestion, as they did not meet other criteria, such as reducing the RT of incongruent trials. Rather, these strategies seemed to attenuate Stroop interference by affecting the general speed of responses (see Supporting test of interest 1). Participants responded more slowly overall, which made the RTs of the different trial types more similar. This slow-down effect is not a unique finding; for instance, neutral RTs have been demonstrated to increase due to experimental manipulations, such as goal-priming [ 16 , 46 ] or single-letter colouring and spatial cueing [ 64 ], that reduce Stroop interference. And in some cases, the latter manipulation leaves incongruent RTs unaffected or even elevates them, similarly to the looking-away and blurring strategies (e.g. [ 65 , 66 ]). Future research is needed to understand the cognitive mechanisms underlying these processes.

The idea that goal-maintenance plays a crucial role in responding quickly and accurately to a Stroop word is well established (e.g. [ 9 , 45 ]) and it is embedded in many of the cognitive control models (e.g. [ 4 , 67 ]). It is important to note that our findings do not challenge this idea. In this project, we solely aimed to test whether a simple way to update one's goal (i.e. rehearsal of the target) is sufficient to improve performance in the Stroop task. We did not provide strong evidence one way or the other for whether highs achieve the reduction of the Stroop interference when they respond to the word blindness suggestion by internally rehearsing task instructions. However, it is still possible that the strategy with which highs reduce Stroop interference facilitates goal-maintenance. In fact, based on the finding that the word blindness suggestion operates better when the response–stimulus interval is short (500 ms) than when it is long (3500 ms), it remains possible that the strategy that highs employ influences processes related to goal-maintenance ([ 16 ]; cf. [ 22 ]).

In many cases, the word blindness suggestion impacts the RTs of neutral trials as well, and surprisingly, it reduces them (e.g. [ 15 – 17 , 46 ]). This feature of the suggestion is completely in harmony with a strategy that shrinks the interference by simply speeding up all responses. However, it is unlikely that either the looking-away or the blurring strategy operates by this mechanism. First, neither of these strategies reduced the neutral RTs (tables 1 and 2; see also the Bayesian evidence that the strategies increased RTs overall). Second, we conducted a formal analysis to test this notion, in which we compared the conditions in terms of the patterns of the relationship between the general speed of responses and the magnitude of the interference effect (cf. [ 68 , 69 ]). These analyses confirmed that there is no relationship between the general speed of responses and the extent of the interference in the looking-away and blurring conditions (for the details of the analyses, see the electronic supplementary material, Exploration S3).

Finally, it is established in hypnosis research that people sometimes use different strategies to respond to the same suggestion [ 70 , 71 ]. Hence, one may ask: if none of our strategies alone can explain the suggestion effect, could a combination of them do so, with different highs using different strategies? To assess this possibility, we assigned the participants to idiosyncratic strategy groups based on their subjective reports of strategy usage and then repeated Crucial tests 1 and 2 on these groups and on a combined dataset (for more details of the analysis, see the electronic supplementary material). 7 None of the participants found the single-letter focus strategy to be the easiest to use, deeming it unlikely that this strategy contributes to the word blindness effect. While a combination of the looking-away and blurring strategies (data of Experiment 2) decreased Stroop interference, it failed to reduce incongruent RTs, deeming this combination of strategies insufficient to account for the word blindness effect. Nonetheless, the combination of the goal-maintenance, looking-away and blurring strategies (data of Experiment 1) passed Crucial test 1 and provided insensitive Bayesian evidence regarding the reduction of incongruent RTs. Thus, as it stands, the combination of these three strategies may be able to explain the word blindness suggestion, but future research is needed to settle this option.

Another possibility is that moving along the interference-overall RT slope is a strategy in itself. For example, a simple model of motivation is that it moves people along this slope, speeding up overall RT and hence reducing Stroop interference (cf. [ 69 ]). Indeed, enhanced motivation has most commonly led to an overall speeding-up of responses [ 13 , 14 , 72 ]. Nonetheless, the introduction of a reward has not often produced large reductions in Stroop effects [ 13 , 14 ]. More promisingly, setting up competition for reward in the presence of a competitive other has been shown to result in a greater than 50% reduction in Stroop interference [ 73 ]. 8 One might argue that the hypnotic context provides stronger motivation for highs than monetary reward by itself or combined with competition. However, the re-analysis of an earlier study that had an identical design to the current experiment in terms of the Stroop test, but used the word blindness suggestion, revealed a raw slope of zero ( b = 0.005 ms ms −1 , 95% CI [−0.04, 0.05]) between Stroop interference and overall RT (sum of RTs of incongruent and neutral trials) in the suggestion condition (Pilot study of [ 47 ]). That is, it does not appear that in the suggestion condition people simply move along a fixed slope, generally speeding up and thereby reducing interference. Instead, people typically reduce the RT of especially the incongruent condition when responding to the suggestion. A proper understanding of the relation of motivation to the word blindness suggestion remains to be explored.

One simple strategy still remains that was not tested in the current experiment. When highs are suggested to see meaningless words throughout the Stroop task, perhaps, they take the instructions literally, and they create the experience of meaninglessness by imagining a counterfactual world in which words are truly meaningless. One may argue that imagining a counterfactual world is not needed to create an experience of meaninglessness as subjects may simply see the words as something similar to foreign words and do not actually see them as foreign words. Nonetheless, the phenomenology of this ‘seeing as’ scenario does not align well with what highs generally say they experience when they are given hypnotic suggestions, such as the word blindness suggestion (e.g. [ 47 ]). Highs typically report they experience the requested phenomenology as being a genuine one (e.g. they report seeing foreign words when responding to the word blindness suggestion), which aligns better with the notion of them imagining a counterfactual world without being aware of doing so.

Seeing the Stroop words as meaningless characters by imagining a counterfactual world might influence top-down cognitive control processes in a way that helps subjects reduce Stroop interference. This notion is plausible for two reasons. First, imagination can have an impact on behaviour as well as on cognitive processes. For instance, mental practice can improve performance in golf [ 74 ]. Moreover, imagination can advance self-regulation [ 75 ]; confirm or, in some cases, challenge and mitigate prejudice [ 76 ]; create false autobiographical memories [ 77 ]; and even enhance visual search performance [ 78 , 79 ]. Second, cognitive penetrability is not unprecedented in the Stroop task. For instance, expectations modulated by placebo-suggestion have been shown to influence performance as measured by accuracy [ 80 ], although such placebo Stroop reduction does not appear to match the word blindness suggestion in reducing Stroop interference in RTs (contrast response expectancy theory [ 32 ]). Depending on the instructions of the placebo-suggestion, it can either enhance or impair response accuracy. There is, moreover, evidence from independent laboratories that a prime designed to impair one's reading abilities, by imagining what it is like to have dyslexia, can help people reduce the Stroop interference effect relative to a baseline condition with a neutral prime that makes no reference to reading [ 59 , 81 ].

Interestingly, the dyslexia prime and word blindness suggestion phenomena share many properties. Both substantially decrease the interference effect by speeding up RTs on incongruent trials relative to no-suggestion/no-prime baseline conditions when the response mode is manual (see Experiment 1 of [ 59 ]; and Experiment 1 of [ 81 ]). The dyslexia prime, like the word blindness suggestion, affects the response competition component of the interference while leaving the semantic conflict component unaffected [ 17 , 59 ]. This latter feature of the dyslexia prime is particularly important in challenging the initially proposed mechanism, namely the de-automatization of reading account that putatively underlies these phenomena. An even more remarkable similarity between the instructions of the dyslexia prime and word blindness suggestion experiments is that both invite participants to think about disrupting their reading abilities. One could develop this line of thought and propose that both effects are achieved via deliberate strategy engagement, specifically the imagination of a counterfactual world in which words are meaningless. Theories of social priming argue that responses to primes are unintentional and triggered purely by the activation of a specific social concept [ 82 , 83 ]. However, there are many reasons to remain sceptical about the unintentional nature of responses to social primes, such as the presence of demand characteristics, or the absence of valid and reliable outcome-neutral tests demonstrating that participants were unaware of the link between the social prime and the dependent variable of the experiment [ 84 – 87 ]. These criticisms apply to the dyslexia studies as well, making it plausible that participants reduced Stroop interference via intentional strategy use rather than via the unintentional or automatic activation of the concept of dyslexia.

Nonetheless, the idea that imagining that one is unable to derive meaning from the Stroop words facilitates the resolution of response competition is a conjecture that needs to be tested. Recently, a registered report undertook such a test by asking highs to voluntarily imagine the words during the Stroop task as meaningless characters so that they could reduce Stroop interference compared with a baseline condition in which they were asked not to engage in imagery strategies [ 47 ]. Given the results of the current study, it is likely that the subjects of the registered report did exactly what they were asked to do and used imagination, rather than one of the strategies tested here, to achieve the experience of meaninglessness and to reduce Stroop interference. Nevertheless, the evidence against the combination of the goal-maintenance, looking-away and blurring strategies is insensitive, so the efficiency of the imagination strategy should be tested directly. Moreover, the registered report recruited only highs, so to explore the reach of the imagination strategy, it remains to be tested whether those across the full spectrum of hypnotizability can use it to alleviate interference.

Finally, it is important to bear in mind that the purpose and design of this study were inspired by cold control theory, and so the conclusions regarding the word blindness suggestion are most meaningful under the assumptions of this theory. Special process theories of hypnosis, such as the integrative cognitive theory [ 88 ], the neodissociation theory [ 89 ] and the dissociated control theory of hypnosis [ 90 , 91 ], postulate that hypnosis influences non-metacognitive processes as well. Hence, they allow that a strategy that is unsuccessful outside the hypnotic context may be successful when applied under hypnosis. Nonetheless, we are not aware of experimental evidence disconfirming the simpler theory, cold control, which provides the basis of the current study (see [ 47 ] for a review of the evidence in support of the core assumption of cold control theory). Moreover, the above-cited registered report deems it unlikely that, in the case of the word blindness suggestion, highs use a strategy under hypnosis that they cannot use outside of hypnosis.

One might ask how cold control theory accounts for highs responding to the word blindness suggestion by reducing the Stroop effect, even without a hypnotic induction, when lows do not (e.g. [ 18 ]). That is, if highs have no special attentional or control abilities (i.e. highs and lows differ only in the capacity to control awareness of intentions), how do highs reduce the Stroop effect where lows do not? One hypothesis is that highs are more motivated to respond to imaginative suggestions; if lows were incentivized to engage as much as highs, they too would reduce the Stroop effect just as much by use of their imagination. This remains a hypothesis for future research to test.

In sum, reducing interference in the Stroop task via intentional means is difficult, and the current study provided compelling evidence for at least two strategies, looking away from the target word and visual blurring, that any subject can apply. Interestingly, neither of these strategies met the criteria to be considered a potential underlying mechanism of the word blindness suggestion, and thus the modus operandi of the word blindness suggestion remains open. Although these findings deepen the mystery surrounding the word blindness suggestion, we hypothesize that imagination (i.e. imagining that the Stroop words are meaningless) may be the key strategy with which subjects reset top-down cognitive processing to comply with the request of the suggestion, leading to the reduction of Stroop interference.

Supplementary Material

Acknowledgements.

Bence Palfi is grateful to the Dr Mortimer and Theresa Sackler Foundation which supports the Sackler Centre for Consciousness Science.

Appendix A. Instructions in the experimental conditions of Experiment 1

A.1.  no strategy.

‘This time do not use any of the strategies we have instructed you in previous blocks.

We would now like you to respond to the colour of the word on the screen as quickly and as accurately as you can.'

A.2.  Looking away

‘We would like you to focus on the top-right corner of the screen throughout the following experimental block and use only your peripheral vision to identify the colour of the words that appear on the screen.

You can practice this strategy now on an example word.'

In this condition, the participants were told that they can focus on a spot that is closer to the word if they found the top-right corner to be too far away to easily identify the colour of the word.

A.3.  Blurring

‘We would like you to blur your vision throughout the following experimental block by focusing on the screen as if you were looking into the distance.'

A.4.  Single-letter focus

‘We would like you to attend to a portion of the last coloured letter of each word in the next experimental block.'

A.5.  Goal-maintenance

‘We would like you to internally repeat the phrase ‘displayed colour’ whenever you see the fixation cross.

Please repeat the phrase until the target appears on the screen.'

1 One exception to this is the response expectancy theory [ 32 , 33 ], which provides a simple explanation of hypnotic responding that does not involve altered metacognitive processes. The theory postulates that expectations, produced by hypnotic suggestions, are by themselves enough to create the experiences and behaviour of hypnotic subjects. The subjects feel these responses are involuntary because the processes are truly unintentional, there being no need to involve intentional cognitive control processes. This theory is not mutually exclusive with theories involving cognitive control and metacognitive processes. However, measured expectations do not fully account for hypnotic responding [ 34 , 35 ]. These findings may be due to measurement unreliability, but they also give rise to alternative accounts such as the metacognitive theories of hypnotic responding. Therefore, in this paper we focus on the explanations and predictions of the metacognitive theories to understand the underlying mechanism of the word blindness suggestion.

2 Note that there may be a fifth strategy. It may be that highly hypnotizable people use the instructions of the word blindness suggestion as a strategy and by creating the experience of meaninglessness they can reduce Stroop interference. We tested this possibility in another paper of ours [ 47 ] and we discuss its results and implications in the General discussion.

3 Note that hypnotizability was measured as a continuous variable, and we created groups using cut-offs described in the Participants subsection of the Methods section.

4 See electronic supplementary material for the analyses of the error rates (Exploration S5).

5 Note that the lack of a significant reduction in semantic conflict by the word blindness suggestion could also indicate data insensitivity. Therefore, we calculated Bayes factors comparing evidence for the de-automatization account against the response competition model. We found a B_H(0,10) of 0.42 in Experiment 1 and a B_H(0,10) of 0.39 in Experiment 2. Next, we meta-analytically combined the evidence from these two experiments and calculated a meta Bayes factor: B_H(0,10) = 0.29. This implies that we have Bayesian evidence in support of the model predicting no effect on semantic conflict (i.e. the response competition model is supported).
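A Bayes factor of the kind reported in this footnote (B_H(0,10), i.e. H1 modelled as a half-normal with a scale of 10 ms) can be approximated by numerical integration. This is a hedged sketch of the general half-normal Bayes factor method, not the authors' code; the function name and grid settings are illustrative choices.

```python
import numpy as np
from scipy import stats

def bf_half_normal(mean_obt, se_obt, prior_sd, n=20000):
    """Bayes factor B_H(0, prior_sd): evidence for H1 (effect drawn from a
    half-normal with scale prior_sd) over H0 (no effect), given an observed
    effect mean_obt with standard error se_obt (all in ms)."""
    thetas = np.linspace(0.0, 6.0 * prior_sd, n)      # candidate effect sizes
    dtheta = thetas[1] - thetas[0]
    prior = 2.0 * stats.norm.pdf(thetas, loc=0.0, scale=prior_sd)   # half-normal density
    likelihood = stats.norm.pdf(mean_obt, loc=thetas, scale=se_obt)
    m1 = np.sum(likelihood * prior) * dtheta          # marginal likelihood under H1
    m0 = stats.norm.pdf(mean_obt, loc=0.0, scale=se_obt)  # likelihood under H0
    return m1 / m0
```

Values below 1/3, as in the footnote, are conventionally read as substantial evidence for the null.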

6 To distinguish between the semantic and response conflict components of the Stroop interference effect, one can also use colour-associated words (e.g. sky), which tend to produce longer RTs than neutral words but shorter RTs than response-set incongruent trials [ 61 ]. For instance, Augustinova and Ferrand [ 17 ] applied colour-associated words in their experiments to assess the magnitude of semantic conflict and to present evidence that the word blindness suggestion influences solely the response conflict component of the interference effect. Nonetheless, their experiments employed vocal responses, and with manual responses the colour-associated interference effect is volatile [ 62 , 63 ].
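The decomposition this footnote describes amounts to simple differences of condition means. A minimal sketch, with illustrative function and variable names:

```python
import numpy as np

def conflict_components(rt_neutral, rt_colour_associated, rt_incongruent):
    """Split Stroop interference into its putative components using
    colour-associated words (e.g. 'sky'):
      semantic conflict = colour-associated RT - neutral RT
      response conflict = incongruent RT - colour-associated RT
    The two components sum to the overall interference effect."""
    semantic = np.mean(rt_colour_associated) - np.mean(rt_neutral)
    response = np.mean(rt_incongruent) - np.mean(rt_colour_associated)
    total = np.mean(rt_incongruent) - np.mean(rt_neutral)
    return semantic, response, total
```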

7 We thank a reviewer, Jérôme Sackur, for recommending this analysis.

8 But note that in the Huguet et al. [ 73 ] study the baseline level of interference (and the reaction times) was unusually large, so that even the reduced manual-response Stroop interference values were still greater than 70 ms, considerably larger than under the typical word blindness suggestion (about 35 ms).

Data accessibility

Authors' contributions.

B.P.: conceptualization, data curation, formal analysis, investigation, methodology, project administration, software, writing—original draft, writing—review and editing; B.A.P.: conceptualization, methodology, writing—review and editing; A.F.C.: investigation, methodology, writing—review and editing; Z.D.: conceptualization, methodology, supervision, writing—review and editing.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

Competing interests

At the time of writing, Prof. Zoltan Dienes was a Board Member of Royal Society Open Science but had no involvement in the review or assessment of the paper. All other authors declare no competing interests.

We received no funding for this study.

Figure 2. Scatterplots showing the relationship between hypnotizability (measured by the SWASH) and the reduction in Stroop interference induced by each of the four strategies: looking-away (a), blurring (b), single-letter focus (c) and goal-maintenance (d).