Review Article

Feasibility of decoding visual information from EEG


ABSTRACT

Decoding visual information, such as visual imagery and perception, from EEG data can be used to improve understanding of the neural representation of visual information and to provide commands for BCI systems. The appeal of EEG as a neuroimaging tool lies in its high temporal resolution, cost-effectiveness, and portability. Nevertheless, the feasibility of using EEG for visual information decoding remains a subject of ongoing inquiry. In this review, we explore the neural correlates of this visual information, specifically focusing on visual features such as colour, shape, and texture, as well as naturalistic whole objects. We begin by examining which visual features can be effectively measured using EEG, taking into account its inherent characteristics, such as its measurement depth, limited spatial resolution, and high temporal resolution. Using a systematic approach, the review provides an in-depth analysis of the current state-of-the-art in EEG-based decoding of visual features for BCI purposes. Finally, we address some potential methodological improvements that can be made to the experimental design of EEG visual information decoding studies, such as palette cleansing, augmentation to bolster dataset size, and fusion of neuroimaging techniques.

1. The question of feasibility

In recent years, advances have been made in decoding visual imagery and perception for Brain–Computer Interface (BCI) inputs, using a range of neuroimaging techniques, such as functional Magnetic Resonance Imaging (fMRI). The exploration of Electroencephalography (EEG) for decoding visual information has yielded mixed results, despite its appealing attributes. EEG, as a noninvasive alternative to intracranial electroencephalography (iEEG), offers higher portability, superior temporal resolution, and a more cost-effective solution compared to fMRI and functional Near-Infrared Spectroscopy (fNIRS). However, there are also concerns about EEG’s suitability for decoding visual information due to its limited spatial resolution and restricted signal acquisition from deep brain regions. This review examines the current state of, and potential for, decoding visual information using EEG. Notably, significant progress and growing interest in this area have emerged since the last review in 2015, which combined EEG with fMRI [Citation1]. Thus, this review is particularly timely for researchers intrigued by the prospect of decoding visual information from EEG data.

We begin in Section 2 by introducing visual information. Section 3 outlines the motivations behind decoding visual information via EEG, while Section 3.2 explores visual processing in the brain. Following this, we investigate the current state-of-the-art in EEG-based decoding of visual information. We consider the viability of EEG for decoding different aspects of visual information: Section 5 covers simple types of visual information such as colour, texture, shape, and position, whilst Section 6 addresses the decoding of complex stimuli. In both sections, we assess the quality of existing studies and datasets, pinpointing key neural regions and time-points evidenced to be usable for decoding specific types of visual information. This information aids in optimal channel placement and temporal window selection. By collating and reviewing widely used EEG visual information decoding datasets, our goal is to assist researchers in making informed choices about dataset selection and determining when generating new datasets might be necessary. As creating new datasets can be time-consuming and expensive, making use of existing resources is a valuable option. Section 7 considers constraints and advantages of the EEG hardware itself, such as its spatial and temporal resolution, practicality, cost, and safety. Sections 8 and 9 shift the focus to challenges related to data gathering and explore potential strategies to enhance data quality and, consequently, improve decoding performance.

2. What is visual information

Visual information includes visual properties such as colour, shape, orientation, size, contrast, luminance, spatial location, movement, and texture. These properties can be experienced via visual perception and imagination.

Visual perception involves receiving and then interpreting visual information in the surrounding environment, such as the vibrant orange and red hues of a sunset, or the roundness of a beach ball, when they are within sight. This process is largely, though not exclusively, driven by external input. Perception is also impacted by beliefs and past experiences [Citation2]. Visual imagery is referred to in the literature as visual mental imagery, visual imagination, or the mind’s eye. Visual imagery is the capacity to evoke the appearance of things, such as the sunset or beach ball, in their absence – meaning the object is not in the visual field [Citation3,Citation4] and cannot be seen. This internally generated appearance should include some of the sensory visual characteristics of the object, such as shape, colour, and size, and is considered to be a conscious visual experience [Citation5]. Though mental imagery is sometimes used to refer specifically to visual imagination in the literature, we regard mental imagery as encompassing other sensory modalities such as inner-speech, imagined taste, touch, and also imagined movement, i.e. motor imagery.

We note that event-related potentials (ERPs) are responses of the brain to visual information, which can be captured with EEG. ERPs have a long history of successful use as inputs for BCIs [Citation6] and for assessment of cognitive [Citation7], sensory, and perceptual capacities (Woodman [Citation8]). For example, the visual-P300 is a component which occurs approximately 300 ms after presentation of a visual stimulus, with an increased amplitude for surprising stimuli [Citation9], and has been used in spelling-based BCIs [Citation10,Citation11]. Visually evoked potentials are a subtype of ERPs. Steady state visually evoked potentials (SSVEPs) are brain signals that occur in response to a visual stimulus flickering at a fixed frequency, and are also used as BCI input [Citation10,Citation11]. Whilst SSVEPs and the visual-P300 are visual responses, they relate less to the visual and semantic properties of the stimulus, such as colour, shape, and category. For this reason, they are out of the scope of this review, which focuses on the aforementioned visual properties. Further, though machine learning is an important part of the decoding pipeline, and innovations in it may boost decoding performance, this review does not cover the specific algorithms used in the EEG visual information decoding literature or more generally. This is because there is a wealth of review papers dedicated to this, for example, Aggarwal and Chugh [Citation12]; Cao [Citation13]; Roy et al. [Citation14]; Xu et al. [Citation15].

3. Decoding visual information from EEG

3.1. Why decode visual information

There are many motives for researchers to decode visual imagery and perception from neural activity, such as 1) providing inputs to BCIs, as mentioned previously, with potential applications in communication [Citation16] and gaming [Citation17]; 2) studying neural dynamics and the mechanisms of visual imagery and perception, which can provide insights into disorders related to imagery, such as post-traumatic stress disorder (PTSD) and schizophrenia, within clinical psychology [Citation18]; and 3) on a philosophical level, helping to externalize internal processes and address questions about the nature of thought. Neural decoding can use classification and reconstruction to retrieve stimulus content [Citation19,Citation20]; see Figure 1. Classification involves predicting the class a stimulus belongs to given some brain recording data. For example, was the person perceiving an image of an animal or a tool? Reconstruction entails generating a pictorial representation of the perceived animal picture, based on the brain activity. With feature extraction, on the other hand, a set of features such as colour or shape that capture important stimulus characteristics may be obtained from the brain activity.

Figure 1. Classic pipeline for decoding visual information. First the neural signal is measured whilst the individual performs a visual imagery or perception task. This neural signal is then decoded. The output contents include an example of pictorial reconstruction (a), semantic classification (b), and identification of low level visual information such as shape (c) and colour (d).


3.1.1. Visual based BCIs

BCIs, as mentioned in Section 2, can be driven by visual information, whether by visual input into the sensory system or internal generation of visual content. BCI research has predominantly focused on neural decoding of VEPs, the P300 component, motor movement, and motor imagery [Citation21], and more recently on speech decoding [Citation22]. In contrast, decoding visual pictorial-level content for BCI use is relatively unexplored. While there are a few studies comparing which mental states users prefer [Citation23,Citation24], the BCI field lacks research directly comparing the three modalities (auditory, visual, and motor) for information transmission and user experience. Information transmission quality can include the amount of information conveyed, the information redundancy, the accuracy and stability of the command, and the speed of information transmission. User experience can include comfort, the cognitive load of processing and generating the information, and how intuitive the modality is for a given task. Visual and auditory commands have some advantages over motor commands, providing an alternative option in cases where motor commands are not appropriate. Visual and auditory commands may also enable individuals to communicate a more semantically rich variety of information, not restricted to the motor domain.

There may be benefits to using the visual modality as input over the auditory modality. Many BCIs focus on simple auditory properties such as auditory event-related potentials [Citation25]; here, though, we take auditory commands to refer to inner-speech, as this is an increasingly explored input for BCIs in the auditory domain [Citation26–28]. It is said that a ‘picture paints a thousand words’. In the context of BCI usage, this claim requires empirical validation. However, it is feasible that for BCI usage imagining or perceiving a single picture may very rapidly convey rich information. For example, basic dissociation of scene categories from visual perception seems achievable from just the first 100 ms of EEG data [Citation29]. This would likely be faster than the auditory presentation and comprehension of a sentence describing the same picture. For instance, seeing a picture of “dark green trees in front of a blue glacier and a snow-capped mountain” will be quicker than internally speaking or hearing this sentence, which takes around 4000 ms (see Figure 2). Thus, a BCI system driven by visual perception may enable fast identification of, and response to, an individual’s input. Though of course the ideal system would be driven by visual imagery rather than visual perception, it is possible that the internal generation speed of visual imagery varies substantially depending on the stimulus complexity, and also between individuals. Indeed, it has been demonstrated that individuals exhibit varying dominance in sensory modalities for mental imagery, with the auditory and visual modalities generally being the more dominant [Citation30]. Research directly comparing the two would make for interesting future work. For now, such individual differences should be noted when selecting the sensory modality for BCI input commands. We discuss these individual differences in more depth in Section 8.1. A disadvantage of visually driven BCIs is that good vision is required, potentially excluding those with vision impairments.

Figure 2. This demonstrates the relative speed taken to process similar information content in either a. visual or b. auditory modalities. Object recognition, such as this glacier scene, can be detected as early as 100ms from visual perception, figure taken and adapted from Lowe et al. [Citation29]. How long does it take to say the sentence out loud in B?


3.1.2. Understanding neural mechanisms

The bulk of the visual information decoding literature is motivated by deepening our understanding of neural dynamics and the mechanisms behind visual imagery and perception. A pivotal contribution of visual-decoding research has been ending the debate as to whether mental imagery is propositional (symbolic and language based) or depictive (i.e. pictorial) – visual mental images involve depictive representations [Citation31], as visual information can be decoded from activity in V1, also known as the primary visual cortex. Whilst it seems there is phenomenal depictive similarity between perception and imagery, another goal has been to understand the extent to which the two overlap in their spatiotemporal dynamics. Visual decoding in this context has been applied via cross-decoding, i.e. training a classifier on perception neural data then testing it on imagination data, and vice versa. This cross-decoding has revealed that perception and imagery show similar top-down activity in visual and frontoparietal regions of the brain [Citation32]. Further, by applying the classifier to different time points of MEG data, it was shown that imagery and perception overlap most in time-windows related to high-level processing. The authors used this finding to support the claim that these two processes rely on the same predictive processes. A further motivation of visual decoding is to externalize internal subjective experiences, such as dreams and visual imagery experienced during waking, and to learn more about their respective contents and their neural representation. One pivotal study attempted to decode the visual contents of dreams from fMRI activity [Citation33]. The authors trained decoders with brain activity obtained during waking, in which participants saw images labeled with feature values extracted from a deep convolutional neural network. These decoders were then applied to brain activity captured whilst dreaming, which revealed that dreams have a hierarchical visual representation, not just a categorical one. Additional future motivations include gaining insight into disorders related to imagery, such as PTSD and schizophrenia [Citation18], as a way to externalize related hallucinations or intrusive visual imagery. Another is investigating how psychedelics modulate the visual imagery experience, by examining the spatiotemporal dynamics and how these relate to vividness, to changes in altered conscious experience, and to the imagined content itself.
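To make the cross-decoding protocol concrete, below is a minimal sketch using scikit-learn; the arrays, their shapes, and the logistic-regression classifier are hypothetical placeholders for illustration, not the setup of the cited studies.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical data: rows are trials, columns are flattened channel x time features.
rng = np.random.default_rng(0)
X_perception = rng.normal(size=(200, 64 * 100))
y_perception = rng.integers(0, 2, size=200)
X_imagery = rng.normal(size=(200, 64 * 100))
y_imagery = rng.integers(0, 2, size=200)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Train on perception trials, test on imagery trials ...
clf.fit(X_perception, y_perception)
print("perception -> imagery accuracy:", clf.score(X_imagery, y_imagery))

# ... and vice versa.
clf.fit(X_imagery, y_imagery)
print("imagery -> perception accuracy:", clf.score(X_perception, y_perception))
```

Above-chance accuracy in both transfer directions is taken as evidence that the two processes share a neural representation.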

3.2. How is visual information represented in the brain

Here, we make the distinction between spatial-temporal dynamics that are related to the processing of visual information, and the neural correlates which actually encode the visual information. We can group stimuli into two levels when decoding – low and high level information. Some researchers focus on one level exclusively, others combine both, or rather do not consider a distinction. In this context, low-level refers to pictorial qualities such as colour, shape, edges and orientation, whereas high-level relates to complex compositions and eventually semantic representation, for example whether the stimulus can be labeled as a ‘dog’ or a ‘cat’. This high level information can be invariant to the low level pictorial details, and in some cases invariant across sensory modalities, such as in work by Man et al. [Citation34]; Viganò et al. [Citation35]. We provide a brief overview to introduce brain regions discussed later within the wider context of visual processing dynamics.

3.2.1. Visual processing

In visual perception, light passes into the eye to produce a reversed image of the visual stimulus on the retina. The light-sensitive receptors in the retina, rods and cones, then convert this light into electrical signals. This visual information is relayed to the visual cortex, part of the occipital lobe at the back of the brain, via the lateral geniculate nucleus (LGN) in the dorsal thalamus [Citation36]. The visual cortex is composed of six core regions: V1, V2, V3, V4, V5, and the inferotemporal cortex. These can be considered hierarchical in the complexity of information they process (see Figure 3 for a schematic of some key regions related to visual processing). The LGN sends visual information first to V1, also known as the primary visual cortex, which then passes it through to V2, the secondary visual cortex, and so forth. The visual system can be divided into a ventral stream for object visual information, and a dorsal stream for spatial visual information [Citation37]. Visual perception is not exclusively driven by external input, as it entails both top-down and bottom-up processing, and is demonstrated to evoke recurrent processing [Citation38]. In contrast, visual imagery involves predominantly top-down processing. The subjective vividness of visual imagery is evidenced to be modulated by the strength of coupling from the intraparietal sulcus to the early visual cortex [Citation39]. Both imagery and perception overlap in the neural regions involved, namely the visual cortex and parietal and premotor/frontal areas [Citation3,Citation40], yet, as mentioned, they have distinct temporal dynamics [Citation41]. Still, not all areas involved in visual processing encode visual information relevant to stimulus-specific properties. For example, subcortical regions are often content invariant. Levinson et al. [Citation42] found subcortical regions such as the basal ganglia and brain stem to be involved in visual processing but not to encode detectable category information. This may be due to the sensitivity of neuroimaging techniques, though in this specific case, 7 Tesla fMRI was used, which has high spatial resolution.

Figure 3. This demonstrates key regions related to visual processing, such as V1–3 and the lateral geniculate nucleus. This image is adapted from BodyParts3D, © the Database Center for Life Science, licensed under CC Attribution-Share Alike 2.1 Japan.


4. Review methodology

Here, we outline our approach for the following review of the literature on EEG-based decoding of simple and complex stimuli. The Google Scholar and PubMed search engines were used to find papers. The targets for simple stimuli included shapes, colour, and texture, whereas the targets for complex stimuli included naturalistic compositions such as faces, scenes, places, and characters (see Table 1 for the keyword variations). Note that predominantly only peer-reviewed work is included; however, exceptions were made where a non-peer-reviewed paper has been foundational in the field, with many citations. We have chosen to include these as we are evaluating the current state-of-the-art, and such papers have already substantially influenced the field.

Table 1. Key words used to identify papers included in the following literature review for each target.

Information transfer rate (ITR) is a metric used to measure the amount of information conveyed in a set period of time, in bits. This metric is useful when comparing across studies, as it takes into account the number of classes and the time window used to calculate the decoding accuracy. ITR was defined in 1998 by Wolpaw et al. [Citation43], as follows in equation 1:

(1) B = log₂(N) + P·log₂(P) + (1 − P)·log₂((1 − P)/(N − 1))

B is the bit rate in bits per trial, N is the number of classes, and P is the decoding accuracy. To obtain the BCI ITR in bits per minute, the trial duration in seconds (T) is incorporated in equation 2:

(2) ITR = B × (60 / T)

Where sufficient information is provided in the decoding papers, we include the ITR alongside the reported decoding accuracy.
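As a worked example, here is a minimal Python sketch of equations 1 and 2; the function names are ours, for illustration only.

```python
import math

def wolpaw_bits_per_trial(n_classes: int, accuracy: float) -> float:
    """Bits per trial B from equation 1 (valid for 0 < accuracy < 1)."""
    n, p = n_classes, accuracy
    return (math.log2(n)
            + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n - 1)))

def itr_bits_per_minute(n_classes: int, accuracy: float, trial_seconds: float) -> float:
    """ITR from equation 2: bits per trial scaled to bits per minute."""
    return wolpaw_bits_per_trial(n_classes, accuracy) * 60.0 / trial_seconds

# For example, a 4-class decoder at 70% accuracy with 5-second trials gives roughly 7.7 bits/min.
print(itr_bits_per_minute(n_classes=4, accuracy=0.70, trial_seconds=5.0))
```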

5. Simple visual features feasibility

Simple visual features refer to visual properties such as colour, shape, texture, orientation, and position, which can be considered the building blocks of vision. EEG decoding of simple features, in both visual perception and imagination, has mostly focused on colour, which we discuss first. Next, we discuss spatial information such as shape, orientation, and position, combined into one section. Finally, we discuss texture decoding, which is a relatively unexplored area. For an overview of key EEG datasets relating to simple visual features see Table 4.

Table 4. Simple visual features EEG datasets.

5.1. Colour

Colour decoding involves determining the colour contents of a stimulus. Everyday stimuli generally contain a multitude of colours; still, most colour decoding focuses on monochromatic stimuli. Past studies typically divide colours into discrete categories, such as red, blue, green, and yellow. Colour could also be decoded along a continuous colour spectrum rather than discrete categories, though to the best of our knowledge this has not been attempted.

5.1.1. Neural representation of colour

When considering the neural correlates of colour, it is important to distinguish the processing of chromatic content from actual colour perception. In this context, chromatic content refers to hues, colour intensity, and spectral information, whereas colour perception relates to the subjective experience of seeing and interpreting colour. In the early stage of colour processing, chromatic light stimuli from retinal input contribute to object-contour perception, which enables objects to be distinguished from their backgrounds. An fMRI experiment [Citation52] showed this chromatic information to be most decodable from early retinotopic areas including V1, V2, and V3. Though there were some cells in V2 that also contained information about the perceived colour surface of objects, this level of colour perception was most decodable from V4, specifically in the lateral occipital cortex. Further evidence for V4’s involvement in perceived colour comes from a colour perception and imagination fMRI study which decoded imagined green, yellow, or red stimuli; V4 was found to be the best predictor of colour category [Citation53]. It is possible there is also a verbal and/or semantic representation of colour that can be used for decoding in higher-level regions.

5.1.2. Colour state of the art

There are at least two EEG-based imagery and perception colour datasets, and two which are restricted to perception. The first perception dataset includes neural data from 30 participants who perceived Gabor patches (sinusoidal gratings, usually with a Gaussian envelope) with 48 colour variations and 48 different orientations [Citation50]. Two Gabor patches were presented simultaneously. Rather than passive viewing, participants were asked to remember the colours and orientations for a later recall test, to encourage engagement. Though data were gathered from 61 electrodes at 1000 Hz, the researchers’ decoding pipeline used a subset of the 17 most posterior electrodes, and binned the orientations and colours into 12 bins, to achieve reliable classification of features using Linear Discriminant Analysis. This was because a searchlight analysis showed posterior electrodes at 150 to 350 ms post stimulus to be optimal for decoding. This identification of posterior electrodes supports the idea that the colour decoding was based on visual information as opposed to semantic colour labels. In the other colour decoding perception dataset [Citation54], 9 participants perceived red, blue, and green, with EEG data measured from 59 channels. The data were down-sampled to 200 Hz, and decoded via binary classification (red vs blue, blue vs green, green vs red) using a Support Vector Machine (SVM) and a feedforward neural network. When decoding between two colours using an SVM, the average accuracy was 79.1% (ITR = 51.5 bpm). Channel selection for decoding included the posterior electrodes P7, P5, P3, P1, Pz, P2, P4, P6, P8, PO7, PO3, POz, PO4, PO8, O1, Oz and O2. This study did not consider frontal electrodes; however, the best-performing electrode out of the posterior channels was POz. Interestingly, the visual second-layer channels (POz – PO8), which would be involved in a slightly later stage of processing, demonstrated higher performance than the first-layer channels (Oz – O2), related to early visual processing. This result seems consistent with the fMRI literature discussed in Section 5.1.1, showing information related to perceived colour to be encoded in V4 rather than earlier visual processing areas.
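As an illustration of this style of pipeline (posterior channel selection, epoching the reported 150–350 ms window, then Linear Discriminant Analysis), here is a minimal sketch using MNE-Python and scikit-learn; the synthetic data, event coding, and channel subset are placeholders rather than the setup of the cited studies.

```python
import numpy as np
import mne
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Posterior channels roughly over visual cortex (illustrative subset, not the cited montage).
posterior = ["P7", "P3", "Pz", "P4", "P8", "PO7", "PO3", "POz", "PO4", "PO8", "O1", "Oz", "O2"]
sfreq = 1000.0

# Synthetic stand-in for a continuous recording; replace with the dataset's own raw data.
rng = np.random.default_rng(0)
info = mne.create_info(posterior, sfreq, ch_types="eeg")
raw = mne.io.RawArray(rng.normal(size=(len(posterior), int(sfreq * 300))), info)

# Synthetic stimulus events: onset sample, 0, colour label (1-4), one per second.
events = np.column_stack([np.arange(1, 290) * int(sfreq),
                          np.zeros(289, dtype=int),
                          rng.integers(1, 5, size=289)])

# Epoch the 150-350 ms post-stimulus window reported as most informative.
epochs = mne.Epochs(raw, events, tmin=0.15, tmax=0.35, baseline=None, preload=True)
X = epochs.get_data().reshape(len(epochs), -1)   # trials x (channels * time samples)
y = epochs.events[:, 2]                           # colour labels

# Linear Discriminant Analysis with 5-fold cross-validation.
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())
```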

The two imagery and perception datasets use a mix of RGB colours, and white, yellow, and gray. One dataset uses 14 channels to record data whilst 10 participants perceive, then imagine, red, green, blue, white, and yellow LED lights [Citation49]. On this dataset, 61.5% accuracy has been achieved for imagination (ITR = 0.3 bpm). The other dataset is recorded from four electrodes positioned in occipital and posterior regions, from 7 participants who perceived and imagined RGB and gray coloured squares displayed on a large curved projector [Citation51]. This dataset was used by other researchers to achieve an average of 38% accuracy with random forest classification for the three perceived colours (ITR = 1.2 bpm), and 36% using an SVM classifier for the imagined colours (ITR = 0.9 bpm) [Citation55]. For a comparison table of the colour decoding studies see Table 2, and for examples of colour stimuli see Figure 4.

Figure 4. Types of stimuli used in the three mentioned colour decoding studies; a) shows examples of coloured Gabor patch stimuli, b) depicts the coloured square on a curved projector in Rasheed [Citation51], and c) shows the Arduino with LED lights used in Yu and Sim [Citation49]. Images b and c are adapted from their respective papers.


Table 2. Decoding performance metrics for colour, including details on the classifier employed, feature extraction methods, spatial areas selected, chosen time windows, and the calculation of information transfer rate (ITR).

5.1.3. Colour feasibility

In summary, categorical colour decoding seems possible with EEG for both perception and imagination. This can be achieved via channels placed in posterior and occipital regions, which is compatible with findings that colour is represented in V1–V4. Whether colour category can be decoded from EEG via its verbal/semantic label, as opposed to its low-level representation, requires further research in which frontal channels are used for decoding. Potentially, combining posterior and occipital channels, relating to low-level representation, with frontal channels, relating to high-level representation, could boost decoding performance.

5.2. Spatial information

An object for decoding can vary in its spatial properties, such as its shape, orientation, and location. For example, it may be a triangle, square, or circle, positioned in the center of the field of vision, or it may be toward the top right corner and angled 45°. There are a few ways of operationalizing shape, for example, by its contours and edges, or by its skeletal structure, a geometric model based on the medial axis – these are commonly used in pose estimation-based decoding. Orientation can be operationalized categorically, such as horizontal or vertical, or on a continuous spectrum of angle degrees. Location refers to the positioning of an object, and is generally operationalized categorically, such as up, down, left, right, or center.

5.2.1. Neural representation of spatial information

Contour shape information is neurally represented mostly in mid-to-higher visual regions. One fMRI study found the shape identities of a circle and a square to be represented in V3, the lateral occipital cortex, the intraparietal sulcus, and the parahippocampal cortex, though some decodability of shape was also found in early visual regions such as V1–3 [Citation56]. This is somewhat surprising, as a previous study found reduced V1 activity during shape perception [Citation57]. The decodability of V1 in Erlikhman et al. [Citation56] was interpreted as potentially due to feedback connections.

An object’s shape features are often correlated with its category-level information. For example, the shape of a car or a hammer is fairly consistent across within-class variations. One study explicitly disentangled shape and category-level information [Citation58]. They found that shape representations, from low-level pixel-based shape (image silhouette) to high-level perceived shape (perceived shape similarity), followed a posterior-to-anterior gradient, and that ventral category-sensitive regions also showed sensitivity to shape information.

Skeletal shape is revealed to be represented in V3 and LO through two fMRI studies [Citation59,Citation60]. The second presented 20 participants with 20 novel objects devoid of semantic meaning, and explored representation in V1–V4, LO, and the posterior fusiform sulcus (pFs). Unsurprisingly, these skeletal structures were not found to be represented in pFs, which is often implicated in object recognition [Citation61]. pFs is a new functionally defined region which mostly refers to the fusiform gyrus [Citation62].

5.2.2. Spatial state of the art

fMRI studies show that shape is encoded throughout the visual cortex and the lateral occipital cortex, and can be entangled with semantic representation. To what extent is shape information decodable using EEG data? There are several studies which have classified perceived or imagined shape categories from EEG data. In one study, 7 participants perceived and then imagined 7 geometric shapes: a circle, square, horizontal line, triangle, pentagon, hexagon, and parallelogram. The EEG channels included the parietal-occipital regions P3, P4, PO3, POz, PO4, PO7, PO8, and Oz, recorded at 250 Hz. When classifying between all 7 shapes, 35% accuracy was achieved for imagination (ITR = 3.8 bpm), with baseline at 14.3%. On another dataset using the same hardware, the same authors recorded 11 participants and classified between two imagined geometric shapes. Perception of the shape was followed by a 5-second imagination task, with an average of 70% accuracy across participants (ITR = 5.8 bpm) [Citation44]. This indicates decodability of shape category from posterior brain regions. However, as there was no gap between perception and imagination, it may have been the case that the imagined shape was actually decoded from remnant perceptual activity (see Section 8.5).

Another EEG study chose a more distributed electrode positioning of 14 channels, including AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4. EEG data was recorded from 10 participants at 128 Hz [Citation46]. Participants viewed five shapes: a cube, sphere, cylinder, pyramid, and cone (see Figure 5). Colour and size were kept constant, with 2 seconds to perceive and 5 seconds to imagine. Using a linear discriminant classifier, 44.6% accuracy was achieved (ITR = 2.6 bpm). There was a second iteration of the experiment in which a verbal cue for the image was given, to ensure the classification result was not due to remnant visual perception (see Section 8.5). Interestingly, the classifier trained on session 1 performed just as well on session 2 with the verbal cues. Though any difference in performance between the two sessions may be partly attributed to the variation in the amount of training data (100 vs 50 training images), making a direct comparison less fair. As it was not explored which electrodes contribute to decoding performance, the decoding-relevant information at different levels cannot be teased apart. This includes the possibility of semantic labels in frontal electrodes and higher-level shape perception in posterior electrodes.

Figure 5. Depicted are: a) the geometric shapes used in [Citation46] and b) the seven geometric shapes from [Citation44].


Shapes can vary in their spatial location and orientation. A study using 21 electrodes in posterior-occipital regions successfully classified eight spatial patterns (up, down, and middle horizontal; left, middle, and right vertical; and oblique 45° and 135°), which were flashed at 2 Hz [Citation45]. Each perception block was 4 seconds, followed by a checkerboard stimulus for 0.5 seconds to clear the previous pattern. Averaged across participants, binary classification was 92.9%, and classification between 4 categories was ~70% (ITR = 9.7 bpm).

In another perception study, with high spatial resolution (256 channels, recorded at 1000 Hz), visual stimuli were varied along two dimensions: shape (how elongated or stubby the profile is) and toolness (tool or graspable object). Classification of shape was highest around 100–200 ms and again at 350 ms. This study also used source localization, based on MRI images acquired of the participants, to enhance the spatial resolution [Citation47]. See Table 3 for a comparison of these decoding studies.

Table 3. Decoding performance metrics for spatial features, including details on the classifier employed, feature extraction methods, spatial areas selected, chosen time windows, and the calculation of information transfer rate (ITR).

5.2.3. Spatial feasibility

In summary, shape and position decoding both seem feasible with EEG, with channel placement over posterior-occipital regions and, at least for shape, distributed channel placement across the head. Future studies should disentangle whether decoding is possible with frontal regions alone. It is interesting that previous studies have chosen to select posterior-occipital regions when shape and category-level information representation is evidenced in the ventral temporal cortex [Citation63]. Further, the literature on the neural representation of shape indicates that such decoding should be possible, due to shape’s correlation with category information. For time-window selection, at least for perception, timepoints corresponding to mid-to-late stages of visual processing have been implicated as having peak decodability for shapes. Further investigation is required into time-window selection for orientation and position/location, and into imagination for all types of spatial information, as the spatial imagery decoding literature is currently scarce. Imagined spatial information could be an interesting feature to decode in the case of aphantasia, an impairment related to visual imagery; the authors of [Citation64] suggest a distinction between object and spatial aphantasia. For a more in-depth discussion of individual differences in visual imagery see Section 8.1.

5.3. Texture

Image texture refers to spatially global, complex image regions, often with repeated elements. Texture information provides a useful cue for discriminating between image classes, facilitating object recognition. In fact, there is an interesting study [Citation65] showing that convolutional neural networks, which have been used to model the human visual cortex [Citation66], have a bias toward relying on texture rather than shape information when classifying images. Decoding of texture can be operationalized either as classifying categorisable, nameable, and typically artificial textures, such as checkerboards or stripes, or as reconstruction of perceived and imagined textures, whether naturalistic or artificial, which involves a more continuous representation.

5.3.1. Neural representation of texture

Texture can be considered a mid-level representation, captured in the intermediate visual cortex. This representation is composed of low-level visual representations such as orientation and intensity. Yet, texture relates to summary statistics, such as moments of intensity, which are invariant to transformations such as translation; the exact spatial arrangement of visual features therefore becomes less important. An fMRI study showed that V2 response strength predicted the naturalistic structure of different texture types well; however, the early visual cortex, i.e. V1, did not [Citation67]. Additionally, in an fMRI-adaptation paradigm, which enables identification of finer-scale neural responses, neural selectivity in the fusiform face area and parahippocampal place area was found to be sensitive to an image’s texture properties when participants perceived faces and places [Citation68]. These results indicate that texture representation is distributed throughout the brain but is predominantly in early visual cortex such as V2.

5.3.2. Texture state of the art

To the best of our knowledge, there are only two studies which reconstruct texture from EEG activity. Specifically, visually evoked potentials (VEPs) relating to 500 ms perception of 166 natural texture images were fed into a reconstruction pipeline. The dataset used in both studies consists of 15 participants, recorded at 1000 Hz with a BrainVision 19-electrode cap consisting of Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T7, T8, P7, P8, Fz, Cz, and Pz. In the first study, texture statistics for each image are obtained via a Portilla-Simoncelli (PS) texture synthesis algorithm [Citation48], in which high-order statistical constraints are applied to local image features [Citation69]. These statistics were used as response variables in a linear regression with the VEPs as the predictors.
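A minimal sketch of that regression step, assuming the VEPs and PS statistics have already been extracted into arrays; all shapes and feature counts below are illustrative placeholders, not those of the cited study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical pre-extracted features: VEPs as predictors, PS texture statistics as responses.
rng = np.random.default_rng(0)
veps = rng.normal(size=(166, 19 * 128))   # e.g. 19 channels x 128 time samples per image
ps_stats = rng.normal(size=(166, 700))    # e.g. 700 Portilla-Simoncelli statistics per image

X_train, X_test, y_train, y_test = train_test_split(veps, ps_stats, test_size=0.2, random_state=0)

# One multi-output linear regression predicting every texture statistic from the VEP.
reg = LinearRegression().fit(X_train, y_train)
predicted_stats = reg.predict(X_test)
# The predicted statistics would then be passed to a PS texture synthesiser to render an image.
```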

In the second study, which uses the same dataset, a multimodal variational autoencoder (MVAE) is employed [Citation70]. The texture image and corresponding VEP are passed into separate encoding models, which form a joint latent space, after which the image and VEP pass through separate decoders for reconstruction. This approach enables reconstruction from partially observable information, i.e. reconstructing the texture image from just the VEP input, as the latent variables can be inferred.
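The broad shape of such an architecture is sketched below in PyTorch, assuming a product-of-experts fusion of the modality-specific posteriors; this is our own illustrative simplification, not the implementation of the cited study, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps one modality (texture image or VEP) to a latent mean and log-variance."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class MVAE(nn.Module):
    """Two encoders and two decoders sharing one latent space."""
    def __init__(self, image_dim, vep_dim, latent_dim=32):
        super().__init__()
        self.enc_image = Encoder(image_dim, latent_dim)
        self.enc_vep = Encoder(vep_dim, latent_dim)
        self.dec_image = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, image_dim))
        self.dec_vep = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, vep_dim))

    def fuse(self, mus, logvars):
        # Product of Gaussian experts: precision-weighted combination of the available modalities.
        precisions = [torch.exp(-lv) for lv in logvars]
        var = 1.0 / sum(precisions)
        mu = var * sum(m * p for m, p in zip(mus, precisions))
        return mu, torch.log(var)

    def forward(self, image=None, vep=None):
        mus, logvars = [], []
        if image is not None:
            m, lv = self.enc_image(image); mus.append(m); logvars.append(lv)
        if vep is not None:
            m, lv = self.enc_vep(vep); mus.append(m); logvars.append(lv)
        mu, logvar = self.fuse(mus, logvars)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return self.dec_image(z), self.dec_vep(z), mu, logvar

# Training would combine both reconstruction losses with a KL term; at test time only the
# VEP is observed, and the texture image is reconstructed from it:
model = MVAE(image_dim=64 * 64, vep_dim=19 * 128)
vep_batch = torch.randn(8, 19 * 128)
reconstructed_image, _, _, _ = model(vep=vep_batch)
```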

In Figure 6 we illustrate the respective results of the two studies. Notably, the second study shows improved reconstruction, both in terms of correlations between the original and reconstructed images, and in naive observer ratings.

Figure 6. a) shows reconstructions using the MVAE from Wakita et al. [Citation70] and b) shows reconstructions from Orima and Motoyoshi [Citation71]. The figure is adapted from Orima and Motoyoshi [Citation71]; Wakita et al. [Citation70].


5.3.3. Texture feasibility

The results depicted in Figure 6 indicate promising feasibility for EEG texture decoding. This is particularly the case as only 19 electrodes were used, which indicates texture can be decoded from very low spatial resolution. Interestingly, there seems to be a distinction when decoding naturalistic vs non-naturalistic texture images; different VEPs were found for each at 200–300 ms, from occipital electrodes. To the best of our knowledge there is no existing work on decoding imagined texture from EEG. As texture can rely on fine spatial resolution of an image, it will be interesting to see the general capacity of individuals to generate stable representations of texture – likely, decoding will depend on the vividness of the internal representation.

Here, we present existing datasets for visual imagery and/or visual perception decoding from EEG activity. The information in Table 6 refers to the raw datasets only, as the researchers reusing the datasets often apply electrode/channel selection and down-sampling during pre-processing.

Table 5. Decoding performance metrics for complex stimuli, including details on the classifier employed, feature extraction methods, spatial areas selected, chosen time windows, and the calculation of information transfer rate (ITR).

Table 6. Datasets for naturalistic/complex stimuli relaying stimuli details, acquisition device, sampling rate and duration of the imagination or perception task.

6. Complex visual stimuli feasibility

Complex visual stimuli refer to compositions of whole objects and images, rather than the building blocks of visual features such as colour, orientation, and shape. For the neural decoding use-case of BCIs, decoding complex visual stimuli is arguably a more valuable goal than decoding simple stimuli, as complex stimuli can carry and convey more information. Further, naturalistic complex stimuli relate more closely to the visual information that we are exposed to on a daily basis. In this section, we cover both imagined and perceived complex stimuli. Common stimulus choices for decoding include naturalistic stimuli such as faces, scenes, and objects, and artificial stimuli such as simple objects, digits, and letters. We consider faces separately as there are distinct neural representations for face processing [Citation78]. For an overview of key EEG datasets relating to complex visual features see Table 6. For a comparison of the decoding results, and the time and spatial feature selection, for complex stimuli, see Table 5.

6.1. Neural representation

When considering the neural representation of complex stimuli, we consider both semantic and pictorial content to be important. There are some specific areas related to object identity; for example, the fusiform gyrus is considered a computational hub for face processing [Citation78,Citation79]. The parahippocampal place area has been implicated in the processing of place-related stimuli, for example pictures of landscapes or scenes [Citation80]. Additionally, distinctions between inanimate and animate stimuli, such as an animal vs a tool, can be decoded from the inferior temporal cortex [Citation81]. Of course, it can be argued that category-level information can become amodal, and then is debatably no longer visual information. On one hand, it can be hard to disentangle the two to ascertain which is leading to decodability. On the other, amodal category-level information can be useful extra information, which can boost the decoding of visual information.

6.2. Imagined objects, digit, letters

Overall, there are fewer datasets for imagined than for perceived visual information. This is in part because studies that focus on decoding imagery will also capture perceptual data, whereas the reverse is not usually the case. For visual imagery, there is one core dataset used by many researchers, which consists of visually imagined coloured digits (0–9), letters, and non-text objects from 23 participants [Citation72]. This dataset was recorded using a 14-channel Emotiv EPOC+ device, at a sampling frequency of 2048 Hz, though this was later down-sampled to 128 Hz. Each participant saw and imagined each image just once. There was a 10-second duration for perception followed immediately by a 10-second imagination of the same stimulus. Between each stimulus was a 20-second gap. While this gap prevented contamination between stimuli, the lack of a gap between the perception and imagination trials is problematic, as the prior perception neural signals can contaminate the visual imagery brain signals. This is a common methodological flaw in visual information decoding work, not unique to this study, which we discuss in depth in Section 8.5. The presentation order of the stimuli was kept constant and not randomized. Classification performance on this dataset was high. Kumar et al. [Citation72] achieved an average classification accuracy of 85.2% when decoding which of the three categories the neural activity relates to, when combining all participants’ data (ITR = 5 bpm). Tirupattur et al. [Citation82] achieved within-category classification accuracies of 72.9% for digits, 71.2% for characters, and 72.9% for objects. This dataset was further used by Jolly et al. [Citation83] to create a GRU-based universal encoding pipeline which can be used to extract meaningful features from multiple datasets. Whilst these results would indicate that decoding imagined objects, digits, and letters is feasible from EEG data, the methodological issues within the dataset make this conclusion shaky.

6.3. Perceived faces

One body of research aims to classify and reconstruct perceived faces [Citation73]. The dataset produced for this work includes 64-electrode EEG recordings from 13 participants obtained whilst perceiving faces. Specifically, 70 different individuals’ faces were displayed. The focal stimuli of the experiment were 54 images of unfamiliar Caucasian males, of which a happy and a neutral expression were shown for each (see Figure 7). An additional six famous males were included to promote alertness, and 10 female faces were used as part of a gender-recognition go/no-go task. Each face is shown on a black background, with an oval mask which keeps the shape consistent throughout. Images were presented in a pseudo-random order, with each male face presented twice and female faces presented once in each block. As there were 32 blocks, this results in 64 trials for each unfamiliar male face. The original authors achieved 64% (ITR = 5.7 bpm) and 71% (ITR = 13.1 bpm) accuracy for across- and within-expression facial identity respectively, which remained almost unchanged after normalization. It is worth noting here that chance was at 50% due to the classification approach used. They also explored the temporal dynamics, identifying which time points were most important for decoding: across-expression identity was most robustly discriminated from 150 ms onwards, whereas within-expression identity was discriminated at earlier time points.

Figure 7. Examples of the unfamiliar male faces used. The top row are the original displayed faces, the bottom row are their respective reconstructions. Adapted from Nemrodov et al. [Citation73].


6.4. Imagination and perception - binary classification of naturalistic stimuli

Several datasets have been created for binary classification tasks of naturalistic stimuli. Though these are binary classification problems, the stimuli used are 3D and coloured, meaning they are relatively complex, which may explain the low performance.

One such dataset was recorded from 16 participants, using a 1000 Hz, 64-electrode Brain Products ActiCAP system [Citation74]. The two decoding categories are Sydney Harbour Bridge images (places) and Santa Claus images (faces), though exemplar-level decoding is also attempted. Participants viewed four sequential images, an ensemble of Santa Claus and Sydney Harbour Bridge photographs (see Figure 8). Once cued, the participant visually imagined one of these four images and then indicated which one they had chosen by clicking a picture of it on the screen. An advantage of this dataset is that it provides Vividness of Visual Imagery Questionnaire (VVIQ) scores for each participant. For perception, above-chance accuracy was obtained for category-level decoding, and there was also some success for exemplar-level decoding; however, neither category- nor exemplar-level decoding was successful for imagination.

Another dataset involves 26 participants who perceived, then imagined, a picture of a hammer and a flower against black backgrounds [Citation75]. The EEG data was gathered using a 36-channel g.tec kit (see Figure 8). Using a spectrally weighted common spatial patterns classifier, 60% accuracy (ITR = 1.1 bpm) was achieved for category-level perceived images, and 52% accuracy for the imagery task (ITR = 0.03 bpm).

Figure 8. Showing the types of binary stimuli used. In a) classification of an imagined/perceived hammer vs a flower, adapted from Kosmyna et al. [Citation75]; b) classification of Sydney Harbour Bridge vs Santa Claus, adapted from Shatek et al. [Citation74].


6.5. Perception classification and reconstruction of naturalistic stimuli

A core dataset for visual perception decoding [Citation76] has been used in at least 9 further studies (see [Citation85–89]). This is likely due to the large number of classes (40), complex stimuli, and high electrode count. This EEG dataset was gathered using a 128-channel Brainvision system, and consists of neural responses from 6 participants (5 male, 1 female). The stimuli used were easily identifiable, naturalistic stimuli. Specifically, they use a subset of images from ImageNet: 40 classes with 50 exemplars from each class. The task involved passively perceiving images from each class, displayed in bursts for 500 ms each. A common evaluation method for assessing how realistic and high-quality generated images are is the Inception Score (IS) [Citation90]. The lowest possible IS is 1, and the highest is bounded by the number of classes. A higher IS indicates a higher quality generated image. The generated images are passed through the pre-trained Inception V3 to predict the class probabilities. The IS algorithm uses these conditional probabilities to output a score based on the quality and diversity of the generated images. This score is shown to correlate with human observer ratings of generated image realism. Zheng et al. [Citation84] achieve an IS of 5.53 (chance classification is 2.5%), a slight improvement on Palazzo et al. [Citation91], in which the IS is 5.07. See Figure 9 for example reconstructions from Zheng et al. [Citation84].
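A minimal sketch of this computation, assuming `probs` already holds the Inception V3 class probabilities for each generated image; the split count and array shapes are illustrative.

```python
import numpy as np

def inception_score(probs: np.ndarray, n_splits: int = 10) -> float:
    """probs: (n_images, n_classes) conditional class probabilities p(y|x)."""
    scores = []
    for chunk in np.array_split(probs, n_splits):
        p_y = chunk.mean(axis=0, keepdims=True)             # marginal p(y) over the chunk
        kl = (chunk * (np.log(chunk + 1e-12) - np.log(p_y + 1e-12))).sum(axis=1)
        scores.append(np.exp(kl.mean()))                     # exp of mean KL(p(y|x) || p(y))
    return float(np.mean(scores))

# Toy example: near-uniform predictions give an IS close to 1 (low quality/diversity).
probs = np.full((100, 40), 1.0 / 40)
print(inception_score(probs))
```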

There is another dataset of naturalistic stimuli, in which participants passively perceive video clips belonging to five different categories [Citation77]. These categories include abstract geometric shapes, human faces displaying different emotions, Goldberg mechanisms, natural waterfalls, and extreme sports. The extreme sports category includes first-person videos of high-speed motion activities. This introduces the confound that the brain states associated with each category will also differ by emotional state; for example, the waterfall will be relaxing whereas the sports will be stressful. This dataset was recorded using a 128-channel system on 17 participants. Reconstruction performance was assessed by human observers, who found roughly 90% of the reconstructed images to be identifiable as belonging to their intended category. One motivation for using videos rather than images is that participants may be more motivated and engaged during the experiment. Problematically, undergoing long sessions of EEG results in boredom, mental fatigue, and low alertness, which reduce attention on the task. In fact, alpha and theta power [Citation92,Citation93] are found to change over time as a consequence of mental fatigue and boredom. Therefore, using videos as stimuli, which are more engaging than passive viewing of single images, may enable longer testing sessions and therefore more data per participant.

Figure 9. Examples of reconstruction results from Zheng et al. [Citation84].


Figure 10. Illustrates that at the time of this review, at least eight further visual decoding studies have been carried out which use Spampinato et al. [Citation76]’s foundational dataset.


7. Feasibility of EEG hardware

When selecting a neuroimaging modality for visual information decoding, there are trade-offs to consider, including spatial and temporal resolution, practicality, cost, and safety. These all impact the feasibility of using EEG for visual information decoding. We examine each in turn, considering how the feasibility of EEG compares to other neuroimaging techniques (see Table 7 for a comparison).

Table 7. This table compares the relative cost, spatial and temporal resolution, portability, and invasiveness of the neuroimaging modalities discussed in this current section on EEG hardware.

7.1. Spatial resolution

Throughout the visual cortex, visual information is represented in a retinotopic reference frame which relates to where the perceived or imagined stimuli are spatially located [Citation94]. Exploiting this rich spatial information via a neuroimaging technique with high spatial resolution can boost decoding. Unsurprisingly then, fMRI has been a popular neuroimaging choice for visual decoding due to its high spatial resolution – around 2–3 mm with 3 Tesla fMRI and under 1 mm with 7 Tesla fMRI. Consequently, it has been used to produce impressive visual decoding results [Citation95,Citation96]. One of the main concerns for visual information decoding with EEG is its low spatial resolution. Improvements can be made by using high-density devices, though even a 128-channel system is limited to around 6–8 cm³ spatial resolution. Deep learning super-resolution algorithms [Citation97] may be applied to boost the spatial resolution of EEG, though to the best of our knowledge these have not yet been applied in the visual information decoding context. To boost spatial accuracy, or rather the localization of the signal origin, source localization algorithms can be applied. This is a more commonly used technique for EEG data than super-resolution, though it is less typically used in visual decoding studies, as source reconstruction generally requires obtaining an anatomical MRI image.

Another feasibility question is to what depth and extent EEG can record from the brain. Electrical activity picked up by scalp electrodes must first pass through cerebrospinal fluid, the insulating skull, and finally skin, which distorts the signal. Additionally, the cortex is not flat; it undulates with gyri, sulci, and deep fissures. This makes establishing the source of a signal difficult. To record from the primary visual cortex, which mostly sits within fissures, electrodes can be placed over the calcarine fissure, yet just the central 10 degrees of the visual field is represented on the surface [Citation98]. There is also a general consensus that deep brain regions are not visible via EEG. Such subcortical regions include the thalamus, nucleus accumbens, and hippocampus. One recent study recorded high-density (256-channel) scalp EEG alongside intracranial electrodes, which provide a ground truth for subcortical activity [Citation99]. An MRI scan was acquired to enable EEG source reconstruction. Importantly, the authors found high correlation between the intracranial and source-reconstructed EEG recordings. Whilst this does not demonstrate that the extent of subcortical signal obtainable via EEG will improve visual information decoding, it illustrates that, with further progress in source reconstruction algorithms, there is scope for EEG to measure at greater depth.

7.2. Temporal resolution

Visual processing has complex and quick temporal dynamics; for example, a study using 1200 Hz MEG showed image categories of faces and scenes could be distinguished from 85 ms after the image was presented (van de Nieuwenhuijzen et al. [Citation100]). EEG’s superior temporal resolution makes it a potentially strong candidate for visual imagery and perception decoding, as these processes contain temporal information that can be exploited for improved decoding. The two processes have some temporal similarities – Xie et al. [Citation40] demonstrate that visual perception and imagery overlap in the alpha band, which is associated with top-down, or rather feedback, processing. However, they also have distinct temporal profiles, which enables identification of whether an individual is engaging in visual perception or imagination [Citation75]. For example, imagery ERPs are shown to have a longer latency than in the perceptual condition, with a delay of about 200–400 ms [Citation101]. The authors suggest this may be because accessing the mental representation and reconstructing the image take more time than solely sensory processes. The temporal information related to visual processing may be exploitable for extracting meaningful information about the visual image. Non-invasive technologies with high spatial resolution, such as fNIRS and fMRI, rely on the hemodynamic response, which is slow and results in a typical sampling interval of around 2 seconds in fMRI. They consequently suffer from poor temporal resolution, which can be a limitation in visual decoding.

7.3. Practicality and cost

We suggest that much of the motivation behind selecting EEG in visual decoding studies is its accessibility. EEG is accessible due to its relatively low cost and the wide range of EEG-based technology, from consumer to research grade, differing in the number of electrodes, sampling frequency and cost. EEG is also very versatile for use cases which require portability. While traditionally EEG use was confined to a Faraday cage and large amplifiers, which can be suitable for a clinical setting, there are now systems with amplifiers smaller than an A4 notepad which can operate wirelessly and with stability in the field. MEG offers relatively high spatial and temporal resolution and is consequently used to research the spatio-temporal dynamics of visual processing [Citation100,Citation102]. Yet, whilst MEG is a potential solution to the spatial–temporal trade-off, it is costly and lacks mobility. Still, EEG setup can be time-consuming depending on the system used. For example, channels may need to be carefully placed, and for wet electrode systems, application of conductive gel can be a lengthy process, particularly with high-density montages. Wet electrodes remain the gold standard for acquiring optimal signal quality compared to dry electrodes, which, whilst quicker to set up, can result in signal degradation and susceptibility to artifacts [Citation103]. We recommend that researchers select wet electrodes to boost the feasibility of decoding visual information via EEG.

7.4. Safety

Some of the more successful attempts to decode visual information have involved invasive techniques such as microelectrode arrays in mice [Citation104] and electrocorticography (ECoG) in humans [Citation105,Citation106], who are usually undergoing surgery for conditions such as epilepsy. Microelectrode arrays, often inserted inside the cortex, can yield single-unit or multi-unit activity, enabling extremely high spatial resolution. ECoG electrodes placed subdurally on the surface of the brain also offer high spatial resolution. Consequently, there is growing interest in using invasive techniques for people without neurological injuries. For example, the Neuralink device has impressive spatial resolution and wireless capability [Citation107]. The company aims to develop a neural implant that can be used to control a computer or mobile device anywhere the user goes, for those with disabilities but also those without. So far, this has only been implanted in pigs and monkeys, though there is a large body of invasive BCI research in humans [Citation108,Citation109]. For example, Willett et al. [Citation110] developed an intracortical BCI that facilitated decoding of handwriting movements in a participant with hand paralysis. Further, stereo-electroencephalography (sEEG), owing to the deep brain regions its implanted intracranial electrodes can reach, has a high signal-to-noise ratio and has demonstrated efficacy in a P300 paradigm [Citation111]. Still, there are safety and ethical concerns surrounding invasive neuroimaging techniques (see Burwell et al. [Citation112] for a review), such as the risks of surgery and infection, alongside glial scarring which can impede the implant and impair the signal.

Alongside the development of invasive technologies, there remains a need for further advances in alternative, noninvasive technologies. EEG alongside other noninvasive techniques such as fMRI, fNIRS and MEG can be considered safer, and also are not limited to participant pools of solely those already undergoing surgery.

8. Challenges during data gathering

The design choices made during data gathering can impact decoding performance, the validity of results and the generalizability of approaches and findings to other contexts. Well thought-out experimental design can boost the feasibility of using EEG for visual information decoding. We outline and discuss the design factors which should be carefully considered for visual perception and imagery decoding studies. These include: (1) individual differences in visual imagery ability when recruiting participants; (2) block versus event-related design; (3) controlling for confounding perceptual input; (4) whether the task is active or passive and the impact of eye movement; and (5) the need for gaps between trial components, referred to as ‘palette cleansing’.

8.1. Participants − individual differences in visual imagery ability

There is large variance in how different individuals experience visual imagery. When recruiting participants for an EEG study involving visual imagery tasks, it is important to assess their experience of visual imagery, for optimal task design and decoding. Depending on the specific task, the potential participant’s visualization ability may serve as exclusion or inclusion criteria or be useful information when designing the decoding pipeline.

Vividness is one common dimension of imagery variation. Recently, the terms aphantasia and hyperphantasia were coined to refer to the extreme ends of a vividness of visual imagery spectrum [Citation113]. Aphantasia refers to the inability to voluntarily visually imagine, though with some nuances: there is a case study in which a hallucinogen improved visual imagery [Citation114], and some aphantasics report having visual content in dreams [Citation115]. Hyperphantasia refers to the capacity for extremely vivid, as detailed as if ‘seeing’, visual images. During data gathering for a visual imagery task, if a person experiences no visual imagery, then they might interpret the instruction ‘imagine the face you saw previously’ to mean ‘think of the semantic details of that face’ as opposed to its visual content. Additionally, neural activity will be modulated by the vividness of the imagined stimulus [Citation3], as well as by the participant’s ability to visualize, as demonstrated in both resting-state and task-based fMRI [Citation116]. We recommend assessing a potential participant’s visualization experience by using the Vividness of Visual Imagery Questionnaire (VVIQ), which measures how vivid the visualization is, alongside asking them to describe any nuances or characteristics of the visual imagery they generate. For example, individuals report nuances such as three-dimensionality, movement, and the ability to project the mental imagery into space [Citation117].

While it is interesting to include participants with a wide range of imagery strategies in studies, it is useful to consider these strategies when creating decoding pipelines. For example, if one person has a widely different experience of imagery, it may be sensible to create separate decoding pipelines on an individual basis rather than combining across participants.

8.2. Is block design problematic?

Block design and event-related design are two types of experimental design often employed in neuroimaging studies. In a block design, each condition is presented continuously for an extended time period. The block relating to one condition is compared to the block for another condition, rather than comparing individual trials. In contrast, event-related design involves discrete trials of short duration for each condition, often in a randomized order. There is debate in the literature on visual information decoding from EEG as to whether block designs are appropriate or whether stimulus randomization is essential [Citation118–120]. Contention surrounds one of the core datasets mentioned in Section 6.0.5, which has been heavily critiqued for its block design [Citation119,Citation121]. If this critique is warranted, it would weaken the dataset and the fairly large body of work already built upon it, as well as emerging new decoding studies.

The dataset, as described in Section 6.0.5, consists of 40 image classes. For each class, 50 image exemplars are shown consecutively, in a block, in 0.5-second bursts. High classification accuracy by the original authors and later researchers for these naturalistic, complex stimuli has resulted in extensive media attention. Li et al. [Citation119] draw attention to the lack of stimulus randomization and jittering as problematic: the classification could be driven by long-term static neural activity relating to the time block rather than to the stimulus class itself. In fact, they demonstrate a reduction in classification accuracy when temporal correlates are accounted for. Similar critique has been made of a perception and imagery decoding dataset in which stimulus presentation order was kept constant and not randomized. As this could result in unwanted temporal correlations driving the classification accuracy, the study setup has been questioned by Rekrut et al. [Citation122] and by Li et al. [Citation118]. The original authors of the 40-class dataset contest this critique of block designs, arguing that people react faster and more consistently when conditions are presented in blocks, resulting in more stable responses with a higher signal-to-noise ratio.

Indeed, block design can facilitate a higher signal-to-noise ratio and was standard until event-related fMRI work began in the early 1990s [Citation123]. Liu [Citation124] notes that the convolutional model for fMRI analysis [Citation125] was one of the key advances enabling more complex event-related designs. EEG research in this area is still in its early stages and suffers from a notoriously low signal-to-noise ratio. Hence, given the current state of the art, classification work which decodes from block design datasets is arguably the most suitable approach until advances are made to reduce noise.

Nonetheless, it should be acknowledged that using block design can limit ecological validity. When applying BCIs in a real-world setting, it is essential to decode visual stimuli which are temporally sandwiched between other, irrelevant visual stimuli. In this context, it is unrealistic to rely on successive presentation of visual stimuli from the same category, as is the case with block design. While currently there are advantages to using block design to increase the signal-to-noise ratio, and thus decodability, the findings will generally not have the same applicability as an event-related design. Researchers creating or reusing block design datasets should be aware of these limitations.
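
One diagnostic motivated by this critique is to attempt classification using only pre-stimulus baseline activity: above-chance accuracy there cannot reflect the stimulus itself and points to block-level temporal confounds driving decodability. The sketch below illustrates this check with scikit-learn on placeholder data; the array shapes, number of classes, and the assumption that the first 50 samples precede stimulus onset are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder epochs: (n_trials, n_channels, n_times) with a pre-stimulus baseline, and 40 classes.
rng = np.random.default_rng(2)
epochs = rng.standard_normal((2000, 64, 300))
labels = rng.integers(0, 40, 2000)
baseline_samples = 50  # assumed number of samples before stimulus onset

# Classify from the baseline period only.
X_baseline = epochs[:, :, :baseline_samples].reshape(len(epochs), -1)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
baseline_acc = cross_val_score(clf, X_baseline, labels, cv=5).mean()
print(f"baseline-only accuracy: {baseline_acc:.3f} (chance is about {1 / 40:.3f})")
```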

8.3. Confounding external sensory input

In perception decoding experiments, a participant must keep their eyes open to perceive the stimuli. Problematically, perceptual inputs unrelated to the stimulus can interfere with the stimulus-related neural signal. Similarly, in imagery decoding experiments, perceptual inputs can interfere with the neural activity related to the visual imagery tasks, due to the overlap between the visual perception and imagery systems. Decoding success may then be attributable to erroneous correlations between the perceptual information and the imagery task rather than to imagery activation itself. The most confounding perceptual inputs will be those that exhibit a systematic bias, i.e. they vary in parallel with the stimuli. One example is text- and image-based cues, which can result in systematic eye movements whilst reading the word or scanning the image. A more nuanced confounding perceptual input can be the environment itself, such as a room full of different objects in the participant’s field of view. Visual attention may be deployed to different aspects of this external environment throughout the duration of the experiment. This may become systematic if there are, for example, semantic associations between the experimental task and objects in the visual field. It is therefore important to control for perceptual inputs external to the stimuli by keeping these inputs constant and, when unrelated to the experiment, out of view of the participant. Another potential benefit of eradicating perceptual interference is an increase in visual working memory capacity due to the reduced perceptual load, allowing for more complex stimuli to be decoded, although this hypothesis remains to be tested.

Two possibilities for controlling these external perceptual inputs during imagery tasks are (1) eye closure and (2) a pitch-black environment with eyes open. Both options have relative advantages and disadvantages. Requiring a pitch-black setting constrains the generalizability of results to artificial settings. Eye closure is more feasible in naturalistic settings than a pitch-black room; however, it is linked to strong alpha and related motor signals which can create interference. These could create difficulties in picking out the more subtle signals related to the stimuli. Further, eye closure prevents the tracking of eye movements in hybrid designs, where eye tracking is used in conjunction with EEG. Though a pitch-black room can also be used for visual perception tasks, past studies tend to rely on participants focusing on the stimulus display and ignoring extra perceptual information.

Each approach to reduce perceptual interference has been used in past EEG imagery decoding studies. Yu and Sim [Citation49] use a dim, though not pitch black, room. Alharbi et al. [Citation126] and Rasheed [Citation51] also use a dark room. To the best of our knowledge, Kumar et al. [Citation72] is the only study to employ eye closure, though this dataset has also been used in several subsequent studies. Several studies use neither eye closure nor lightless conditions to prevent perceptual interference.

8.4. Passive vs active tasks – fixation points and eye movements

Here, we discuss the impact of eye movements in passive versus active viewing tasks. In a passive task, participants are asked to simply perceive a visual stimulus without any additional actions. In an active task, by contrast, the participant may be required to perform an additional task based on the contents of the stimulus they perceive, such as pressing a button – therefore, they must be actively perceiving the stimulus. Eye movements have many important roles in visual perception, for example preventing perceptual fading [Citation127] and supporting pattern completion [Citation128]. However, there are indications that when decoding perceived stimuli from neural activity, eye movements can result in confounding early visual cortex activity [Citation129]. We also consider the impact of eye movement in visual imagery decoding, with consideration of fixation points.

In general, eye movement is considered to be confounding. Eye movement has been demonstrated to be systematic when active viewing is required to perform the task [Citation129]. This can drive decodability in visual perception tasks, which confounds neural activity related to visual information. In Thielen et al.’s experiment [Citation129], participants performed both passive viewing of oriented square-wave gratings under fixation constraints, and active viewing requiring a button press when the stimulus was perturbed. In the active task, eye movements drove classification of the perceived stimulus, whereas this was not possible in the passive version. The authors recommend post hoc analyses to check for confounding effects of eye movements in active studies; they found removing saccades reduced the confounding effect. Though, we note that when the aim is not to understand neural mechanisms, there will be some BCI researchers for whom the activity driving decoding performance is not important so long as performance is still high.
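
One practical form of such a post hoc check is to remove ocular components from the EEG with independent component analysis and then rerun the decoding analysis: a large drop in accuracy suggests eye activity was contributing to performance. The sketch below uses MNE-Python and assumes the recording contains EOG channels; the file name is a placeholder.

```python
import mne
from mne.preprocessing import ICA

# Hypothetical active-viewing recording with EOG channels included.
raw = mne.io.read_raw_fif("active_viewing_raw.fif", preload=True)
raw.filter(1.0, 40.0)  # high-pass helps ICA decomposition

ica = ICA(n_components=20, random_state=0)
ica.fit(raw)

# Identify components correlated with the EOG channels and remove them,
# then repeat the decoding pipeline on the cleaned data for comparison.
eog_indices, eog_scores = ica.find_bads_eog(raw)
ica.exclude = eog_indices
raw_clean = ica.apply(raw.copy())
```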

Most perceptual decoding studies considered in this paper use passive tasks [Citation51,Citation55,Citation75,Citation76,Citation87,Citation126,Citation130–132]. Only a few studies use active viewing tasks, such as Shatek et al. [Citation74] and the go/no-go paradigm used by Ling et al. [Citation133] and Nemrodov et al. [Citation73]. The majority of perceptual and imagery decoding studies incorporate fixation points. This is desirable for reducing confounding eye movement driven neural activity. Fixation points also ensure that overt attention is consistent across trials. However, preventing eye movement may be problematic in imagery decoding studies. This is because there is evidence of image-specific fixation reinstatement during imagery [Citation134], and reenacted scan paths [Citation135]. It is debated whether eye movement plays a functional role in visual imagery [Citation136]. Yet, recent work does support a functional role of eye movement in memory retrieval [Citation128,Citation137]. Problematically, preventing eye movement seems to impair visual imagery [Citation135]. Consequently, when decoding studies use fixation points as a response to reducing noisy signals driven by eye movements, the compromise is that this may reduce the quality of the mental image formed and thus the decodability of the neural signal.

It has not been directly tested whether eye movement can result in confounding decodability in imagery decoding as has been demonstrated in perception decoding [Citation129]. Yet the overlap between the two processes means that we may infer that eye movement will also be problematic in visual imagery decoding paradigms. Consequently, there is a trade-off between preventing eye movement, and thus impairing visual imagery, and including it, risking eye driven neural activity and confounded decoding.

8.5. Palette cleansing

Studies that lack a sufficient gap between visual trials risk contaminating neural signals. Addressing this methodological flaw is essential to ensure accurate decoding of visual information.

In particular, when conducting perceptual decoding tasks, it is crucial to ensure that neural activity associated with perceiving the target stimulus is not impacted by perception of a prior stimuli. Similarly, in visual imagery decoding studies, neural activity for imagery and perceptual tasks can become confounded by previous stimuli, especially as the stimulus is often shown to the participant prior to the imagination task. To mitigate this, it is important to introduce a temporal gap between the perception task and imagery task. Additional measures should be taken to minimize overlap in the neural representations of the two tasks, ensuring that visual imagery is not solely decoded from residual neural activity associated with perception. Visual noise masks can help reduce this interference. We explore various considerations when selecting a visual noise mask, including factors like structured versus unstructured and static versus dynamic options.

Figure 11. The left side depicts the difference between unstructured, structured and dynamic visual noise. For unstructured dynamic, black and white dots are selected at random with multiple iterations. This example shows three random dot variations shown consecutively. The right hand side depicts how visual noise can be injected into the experiment pipeline, before and after each perception and imagery task.


The literature indicates that visual noise should be unstructured, for example random black and white dots, rather than structured, such as fragments of letters [Citation138]. Visual noise can also be static or dynamic (see Figure 11). Dynamic means the unstructured noise changes during one presentation. Direct comparison between dynamic and static masks indicates that they are each effective at disrupting the representation of images, and there are no unintended side effects of dynamic masks [Citation139,Citation140].

In previous decoding work, visual noise masks, when used, are generally static. However, there are some likely advantages of using dynamic noise. Static noise can result in an ‘imprint’ of the previously seen visual stimulus onto the displayed noise after the disappearance of the actual stimulus. As dynamic visual noise disrupts this perceptual trace, it prevents such ‘imprinting’. We therefore recommend the use of dynamic unstructured visual noise in perceptual and imagery decoding studies.
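
Generating such a mask is straightforward; the sketch below produces a sequence of unstructured random-dot frames with NumPy, with frame count and resolution chosen arbitrarily for illustration. Presenting the frames in quick succession (e.g. one per screen refresh) gives a dynamic mask, while a single frame gives a static one.

```python
import numpy as np

def dynamic_noise_frames(n_frames=30, size=(256, 256), seed=None):
    """Return a stack of unstructured (random black/white dot) noise frames."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(n_frames, *size), dtype=np.uint8) * 255

frames = dynamic_noise_frames(n_frames=30)
# The frames can then be displayed with a stimulus-presentation package such as
# PsychoPy or Psychtoolbox between the perception and imagery tasks.
```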

In most perception decoding studies, participants are shown a blank screen for varying intervals of time between each perceived stimulus [Citation76,Citation130,Citation131], though Nemrodov et al. [Citation73] used a screen with a fixation point. A white noise mask was used in just one study, Ling et al. [Citation133], with a duration of 100 ms. A mixture of ‘palette cleansers’ have been used in decoding studies with both perception and imagination tasks. In one study, ‘REST’ in white letters was shown after the imagination task, followed by a blank screen [Citation75], though no interlude was given between perception and imagination. Kumar et al. [Citation72] leave a gap of 20 seconds. Yu and Sim [Citation49] turn off the LED colour light for 5000 ms. A stimulus cue is shown between perception and imagination in Shatek et al. [Citation74]. Rasheed [Citation51] uses a 9-second rest with closed eyes after 3 seconds, though this occurs only after every three sequences. All of these approaches may partly reduce the impact of the previously perceived or imagined stimulus; however, we recommend using dynamic visual noise masks as an evidence-backed and consistent method of palette cleansing.

9. EEG dataset size limitations

We begin by discussing synthetic data generation and augmentation (Section 9.1). This solution addresses the common challenge of limited dataset size and sits between data gathering and data decoding. We break this augmentation discussion down into traditional machine learning techniques in Section 9.2, followed by deep learning augmentation approaches in Section 9.3. Finally, in Section 9.4 we discuss how multimodal approaches, or rather fusing EEG with additional neuroimaging modalities, may boost decoding performance.

9.1. Small datasets - augmentation

In the previous section, we established the importance of gathering high-quality datasets with robust design choices to improve performance and validity in the decoding stage. Data gathering for visual information-related processing can be a time-consuming, resource-intensive and financially costly process. Further, visual imagery tasks can be cognitively taxing, which limits the amount of data that can be gathered in one session. Using multiple sessions can be negatively impacted by the non-stationarity of EEG time-series data. Non-stationarity refers to change over time in the EEG signal due to technical changes such as drying electrodes, changes in mental state such as fatigue and disengagement, and changes in neural dynamics [Citation141]. This creates difficulties in creating large real-world datasets. Yet large, high-quality datasets are a prerequisite for decoding, particularly when using deep learning models. One solution to having only small datasets is to create synthetic data, typically by augmenting empirical data. Synthetic data generation can be used to boost the size of an existing dataset and also to increase its generalizability to a test set. This generalizability is facilitated by the increase in data variance that augmentation provides. While synthetic datasets are becoming quite common in fields such as computer vision [Citation142], this technique is only recently gaining traction for BCI applications [Citation143].

9.2. Traditional data augmentation techniques

There is a wide variety of traditional methods that can be used for EEG data augmentation. These augmentations differ somewhat from those applied to image data, although EEG signal data can be transformed into images to enable augmentation and decoding techniques from the image analysis domain to be more readily applied [Citation144]. Typically, augmentation is applied to the time-series data. One example is CutCat, developed for motor imagery data, an approach that cuts the time window related to one trial and concatenates it with a time window from another related trial. Only afterwards is the time series converted into 2D images (spectrograms) to input to a CNN.

In their review of data augmentation approaches for EEG, Lashgari et al. [Citation145] show that a sliding window technique is the most commonly used approach. This involves sliding a window over the time series and cropping it at various time points. Jittering is also a popular technique, and entails adding noise, such as Gaussian, salt-and-pepper or Poisson noise, to the raw EEG or to the image-transformed input data [Citation145]. Rotation, a common image augmentation technique, can in the case of EEG be used to create robustness to the shifted electrode locations that can occur when testing the same participant across multiple sessions [Citation146]. The spatial data, i.e. where the electrodes are located on the head, can be considered in 3D; the data can then be augmented by rotating it around the three main head axes.
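
To make the two most common techniques concrete, the sketch below implements sliding window crops and Gaussian jittering on a single placeholder epoch with NumPy; the window length, step size and noise level are illustrative assumptions rather than recommended values.

```python
import numpy as np

def sliding_window_crops(epoch, win_len, step):
    """Crop one epoch (channels x times) into overlapping windows (sliding-window augmentation)."""
    n_times = epoch.shape[-1]
    starts = range(0, n_times - win_len + 1, step)
    return np.stack([epoch[:, s:s + win_len] for s in starts])

def jitter(epoch, sigma=0.1, rng=None):
    """Add Gaussian noise to an epoch (jittering augmentation)."""
    rng = rng if rng is not None else np.random.default_rng()
    return epoch + rng.normal(0.0, sigma, size=epoch.shape)

rng = np.random.default_rng(3)
epoch = rng.standard_normal((64, 500))                      # one placeholder trial: 64 channels, 500 samples
crops = sliding_window_crops(epoch, win_len=250, step=50)   # several shorter training examples per trial
noisy = jitter(epoch, sigma=0.05, rng=rng)                  # a perturbed copy of the original trial
```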

Identifying the best augmentation technique can be tricky, as in prior work the baseline decoding performance without augmentation is rarely provided. However, an empirical comparison of data augmentation techniques on 128 time-series datasets [Citation147] found that magnitude-domain transformations such as jittering, magnitude warping and scaling all achieved similar performance for boosting accuracy, as did the time-domain transformation of time-warping. However, flipping, slicing and window-warping achieved lower performance. This comparison used eight types of data, varying from ECG and ECoG to traffic and trajectory data. Therefore, these findings may not generalize to visual perception and imagery EEG data.

While it is not visual information decoding, Lashgari et al. [Citation145] directly compare data augmentation techniques on the same motor imagery EEG dataset, BCI IV-2a. The techniques compared include noise addition, sliding window, Fourier transform, recombination of segmentation and the deep learning augmentation technique Generative Adversarial Networks (GAN). Augmentation using GAN showed the largest boost in accuracy. We discuss such deep learning approaches to data augmentation in the following section.

9.3. Deep learning data augmentation techniques

One of the more popular deep learning techniques for data augmentation is to generate synthetic data using Generative Adversarial Networks (GANs) [Citation148]. In this context, a GAN is used to generate additional EEG time series data to be used as input to the decoding model. GANs consist of a jointly optimized generator and discriminator neural network: the generator attempts to produce realistic synthetic EEG data while the discriminator tries to identify whether the data produced by the generator is real or synthetic. We note that GANs can also be used in the decoding process, such as a reconstruction of an image that a person perceived.
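
As a concrete illustration of this jointly optimized generator–discriminator setup, the sketch below trains a deliberately small GAN on placeholder single-channel EEG segments using PyTorch. The architecture, segment length, training schedule and the random "real" data are illustrative assumptions only, not the networks used in the studies cited here; real applications would use multichannel data, class conditioning and far more careful tuning.

```python
import torch
import torch.nn as nn

latent_dim, signal_len = 64, 256  # assumed latent size and EEG segment length

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, signal_len), nn.Tanh(),        # synthetic EEG segment scaled to [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(signal_len, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1),                            # real-vs-synthetic logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_data = torch.randn(512, signal_len)          # placeholder for real (normalized) EEG segments

for epoch in range(100):
    for i in range(0, len(real_data), 32):
        real = real_data[i:i + 32]
        z = torch.randn(len(real), latent_dim)
        fake = generator(z)

        # Discriminator step: learn to separate real from synthetic segments.
        d_loss = bce(discriminator(real), torch.ones(len(real), 1)) + \
                 bce(discriminator(fake.detach()), torch.zeros(len(real), 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: produce segments the discriminator labels as real.
        g_loss = bce(discriminator(fake), torch.ones(len(real), 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# generator(torch.randn(n, latent_dim)) then yields synthetic segments to augment the training set.
```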

There are many variants of GAN that can be used for synthetic data generation [Citation149], such as the deep convolutional GAN, conditional GAN, auxiliary GAN, and Wasserstein GAN (W-GAN) [Citation150]. Synthetic data generated using a W-GAN has led to improvements in decoding performance in applications such as emotion recognition [Citation151] and in producing synthetic high-resolution EEG data from low-resolution samples [Citation152].

Conditional deep convolutional GANs, in which label information is given to the generator and discriminator, have also been used for augmenting motor imagery datasets resulting in performance enhancement [Citation153]. Fahimi et al. [Citation153] have used these GANs to boost performance on datasets where attention is diverted.

Of course, synthetic data, while increasing dataset size and variation and thus filling gaps, will vary in quality. In some circumstances it will introduce substantially more noise into the dataset, even reducing performance. In fact, GAN approaches to data generation have received criticism due to the low likelihood of the generated synthetic data following the same distribution as the original target distribution [Citation154]. Consequently, it is important to evaluate the quality of the synthetic data produced by comparing it to the real-world data. Some evaluation metrics include Euclidean distance, sliced Wasserstein distance, Fréchet Inception Distance [Citation155], Inception Score [Citation90] and Jensen–Shannon divergence, which is based on KL divergence [Citation156]. To the best of our knowledge, these synthetic data generation techniques have not yet been applied to BCI EEG-based visual information decoding. Researchers in this field should weigh up the potential costs and benefits of using synthetic EEG data and use appropriate evaluation techniques to assess its quality. We recommend Lashgari et al. [Citation145]’s recent review for a more general overview of data augmentation techniques for deep learning-based EEG.
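
As a simple example of such an evaluation, the sketch below compares the amplitude distributions of real and synthetic signals with the Jensen–Shannon distance from SciPy. It is a crude one-dimensional check on placeholder data, not a substitute for the fuller metrics cited above: values near 0 suggest similar distributions, values near 1 suggest the synthetic data departs strongly from the real data.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_distance(real, synthetic, bins=50):
    """Jensen-Shannon distance between amplitude histograms of real and synthetic signals."""
    lo = min(real.min(), synthetic.min())
    hi = max(real.max(), synthetic.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(synthetic, bins=bins, range=(lo, hi), density=True)
    return jensenshannon(p, q)  # normalizes p and q internally

rng = np.random.default_rng(4)
real = rng.standard_normal(10_000)          # placeholder for real EEG samples
synthetic = rng.normal(0.1, 1.1, 10_000)    # placeholder for generator output
print(js_distance(real, synthetic))
```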

9.4. Multimodal neuroimaging fusion

As detailed in Section 7, each neuroimaging modality has its own set of limitations. For instance, fMRI offers high spatial resolution but suffers from low temporal resolution, while EEG, conversely, provides high temporal resolution but has limited spatial resolution. Combining the superior temporal resolution of EEG with the enhanced spatial resolution of fNIRS or fMRI has the potential to significantly enhance decoding capabilities, mitigating the shortcomings of each individual modality. This approach holds particular promise for harnessing fine-grained spatial features and retinotopic mapping related to visual information. Although there is limited prior research involving the fusion of EEG and fMRI for visual information decoding, and none that combines EEG with fNIRS, there are relevant studies, such as Cichy and Oliva [Citation157] for MEG and fMRI fusion in the investigation of spatiotemporal dynamics in visual object perception and Muukkonen et al. [Citation158] for face perception.

Currently, there is only one dataset that combines intracranial EEG and fMRI, consisting of 18 participants watching an audiovisual film; however, the two modalities were recorded non-simultaneously [Citation159]. As this dataset is relatively recent, no comparisons between bimodal and unimodal decoding have been conducted. For speech decoding, EEG has shown promising results when fused with fNIRS data [Citation160]. In this study, a recording device with 64 EEG and 16 fNIRS channels was used to collect data from 16 participants during overt and imagined speech tasks. Bimodal decoding outperformed both unimodal methods for both overt and imagined speech, with an average decoding performance of 46.41% for bimodal, 43.83% for EEG unimodal, and 32.62% for fNIRS unimodal. The fusion approach used in this study concatenated the feature extractions from separate sub-networks for each modality, followed by additional processing in subsequent layers. This is an example of intermediate fusion, one type of fusion approach.

Different fusion approaches can be categorized into early, intermediate, or late fusion. Early fusion combines the original input data before it is fed into the decoder architecture, intermediate fusion combines learned marginal representations of each modality, and late fusion combines decisions from unimodal architectures. Early and intermediate fusion can be valuable for finding cross-modal relationships between the two modalities. These inter-dependencies may provide extra information for decoding. An example of early fusion is fMRI-informed EEG source localization to enable decoding of music from the EEG signal [Citation161]. Here, the locations of significant voxels in the fMRI data are identified on a source model which is used to create the EEG features. As well as facilitating decoding for participants whose fMRI and EEG had both been recorded, this approach enabled decoding on a separate dataset containing only EEG recordings.
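
To make the intermediate fusion idea concrete, the sketch below shows a minimal PyTorch architecture in which separate sub-networks extract features from flattened EEG and fNIRS trials, the features are concatenated, and shared layers produce the class decision. The channel counts, feature sizes and number of classes are illustrative assumptions, not the architecture of the speech decoding study cited above.

```python
import torch
import torch.nn as nn

class IntermediateFusionNet(nn.Module):
    """Sketch of intermediate fusion: per-modality feature extractors, concatenation, shared head."""

    def __init__(self, eeg_dim=64 * 250, fnirs_dim=16 * 50, n_classes=5):
        super().__init__()
        self.eeg_branch = nn.Sequential(nn.Linear(eeg_dim, 128), nn.ReLU())
        self.fnirs_branch = nn.Sequential(nn.Linear(fnirs_dim, 32), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(128 + 32, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, eeg, fnirs):
        feats = torch.cat([self.eeg_branch(eeg), self.fnirs_branch(fnirs)], dim=1)
        return self.head(feats)

model = IntermediateFusionNet()
logits = model(torch.randn(8, 64 * 250), torch.randn(8, 16 * 50))  # one batch of paired trials
```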

We recommend further exploration of EEG fusion with fNIRS or fMRI to enhance decoding performance, especially techniques like fMRI-informed EEG source localization which holds promise in BCI applications. This approach leverages the advantages of fMRI while remaining practical for EEG-only contexts. Unfortunately, for visual information, there is a lack of datasets that provide either simultaneous or non-simultaneous acquisition of multimodal neuroimaging data. Therefore, creating an open-source dataset for researchers to investigate optimal data representation, feature extraction, and fusion techniques is a logical next step.

10. Overall evaluation and recommendations

Using EEG for decoding visual information is gaining traction for BCI applications, and also for furthering our understanding of the brain. There are appealing reasons for selecting EEG: it is noninvasive, portable, relatively cheap and has high temporal resolution. Yet this growing interest is matched by concerns about the feasibility of using EEG for these purposes. The last review of decoding visual information from EEG was in 2015 [Citation1], and one prior to that covered fMRI encoding and decoding in 2011 [Citation162]. As a substantial amount of new work has emerged since then, the current work provides a more up-to-date review accounting for new developments in this field, so that researchers can make informed choices about using EEG for visual information decoding purposes.

In this review paper, we evaluated the feasibility of using EEG by examining the capacity of the hardware, and by identifying the current state of the art for decoding both simple and complex visual information via EEG. Here we summarize the conclusions drawn around feasibility and reiterate methodological recommendations for future researchers to consider if they select EEG as their neuroimaging technique for visual information decoding.

EEG has low spatial resolution and is considered limited in recording from subcortical regions, yet visual encoding and processing involve fine-grained retinotopic mapping [Citation94] and subcortical regions [Citation42]. There is emerging evidence that scalp EEG has the power to measure from subcortical regions [Citation99]; though we cannot claim that this will yield meaningful information in the context of visual information decoding, it does hint at further untapped potential in EEG that can be explored and exploited to boost decoding performance. Whilst EEG cannot compete with the high spatial resolution of invasive techniques such as ECoG, some improvements can be made to boost EEG’s resolution. For example, MRI acquisition can be used for source localization, though we also note that as MRI scans are generally expensive and non-portable, this measure may negate the practical benefits of choosing EEG in the first place.

Through reviewing previously used datasets, we demonstrate that researchers have selected complex naturalistic images, basic images such as digits and letters, and low-level visual features such as colour for decoding. The simple features we considered in this review include colour, texture and spatial information (orientation, position and shape). Perceptual decoding has been shown to be possible for all the aforementioned categories [Citation44,Citation45,Citation47,Citation48,Citation50,Citation51,Citation54,Citation70]. Imagination decoding lacks exploration for texture, spatial position and orientation, but attempts to decode colour and shape have shown success [Citation46,Citation49,Citation55]. We recommend that future research explore channel placement and time-point selection for decoding more explicitly, as the optimal spatial and temporal selections have often not been fully explored; instead, all channels and time points are used for decoding. For example, a proper comparison of frontal versus posterior channels may indicate whether decoding is occurring based on low-level feature information or on high-level categorical information such as verbal labelling. In the context of simple features, this would be whether decoding is occurring from representations in the early visual cortex, or from categorical representations in more frontal regions, for example the shape category square versus triangle. At least for colour, perceptual decoding has been achieved from posterior regions, indicating the feasibility of decoding colour from visual information alone. Excitingly, there is also indicated feasibility for imagined colour decoding.
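
Such a channel comparison is straightforward to run within an existing decoding pipeline, as in the scikit-learn sketch below; the data and the index groups standing in for "posterior" and "frontal" electrodes are placeholders, and in practice the groups would be chosen from the montage's channel labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder epochs (n_trials, n_channels, n_times) and labels.
rng = np.random.default_rng(5)
epochs = rng.standard_normal((400, 64, 250))
labels = rng.integers(0, 4, 400)

channel_groups = {
    "posterior": list(range(48, 64)),  # stand-in for occipito-parietal electrodes
    "frontal": list(range(0, 16)),     # stand-in for frontal electrodes
}

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
for name, picks in channel_groups.items():
    X = epochs[:, picks, :].reshape(len(epochs), -1)
    acc = cross_val_score(clf, X, labels, cv=5).mean()
    print(name, round(acc, 3))  # compare accuracies to gauge where class information resides
```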

The complex stimuli previously used for decoding are varied, including faces, places, objects, tools, characters, and basic shapes and colours. For complex stimuli, most of the decoding work has focused on perception [Citation72–75,Citation77], which in general indicates that this is a feasible aim, though the impressive decoding performances with naturalistic stimuli are potentially limited by the methodology used to gather that particular dataset [Citation76]. Performance for decoding perceived faces is relatively low, where chance was at 50% [Citation73].

When considering what tweaks can be made to existing methodologies to boost performance, and what challenges can be mitigated (see Sections 8 and 9), we begin by returning to an ongoing debate on block vs event-related design, which impacts many previously collected datasets in this domain (Section 8.2). There is merit in using block designs as EEG-based visual information decoding is still in its early stages. However, we caution researchers that block design may hamper ecological validity. These limits of block design should be noted, and event-related designs should be considered when possible.

We next drew attention to individual differences in visual imagery, such as vividness and other nuanced characteristics (Section 8.1). Visual imagery is a noisy experience, differing not just across individuals but also within individuals over time. We recommend using the VVIQ in the participant screening phase to identify individuals’ visual imagery abilities, as well as noting characteristics of their experience such as faded edges or distorted perceptions. Through this process, it may be ascertained that visual imagery is not a suitable BCI command choice for a particular individual, and alternatives can then be considered.

Not just imagery but also visual perception is a noisy experience; therefore, to achieve high decoding performance, it becomes important to reduce extraneous neural noise and confounding inputs. During visual perception, we recommend using fixation crosses to reduce confounding eye movements, a technique used in the majority of past studies (Section 8.4). With visual imagery, eye movements may be a functional part of image generation, and disrupting eye movement can reduce imagery quality. Researchers should take this into consideration, particularly in studies that measure both perception and the subsequent visual imagery of that same image, although this will also depend on whether or not the imagery task involves eye closure. We recommend reducing confounding external perceptual inputs, other than the stimuli, in both perceptual and imagery tasks. In some previous studies, researchers have addressed this for visual imagery tasks through eye closure or by using a pitch-black room. Darkening the testing room is not commonly done for visual perception tasks, and we recommend that researchers consider it in future studies. Researchers should weigh up the two strategies, noting that eye closure is more naturalistic than a pitch-black room but also induces strong alpha and related motor signals [Citation163], which can add unwanted noise to the neural signal.

Another important design choice for reducing confounding and noisy neural signals is using palette cleansers (Section 8.5). We recommend using unstructured, dynamic visual noise before and after every perception and imagery task. This helps remove lingering signal left over from previous tasks or other irrelevant internally generated content.

We also offer data decoding recommendations. EEG datasets are often small due to time, money and participant concentration constraints. We recommend synthetic data generation for augmenting EEG datasets (Section 9.1 notes some popular algorithm choices for this). This has been shown to enhance decoding performance in other EEG domains but has not yet been applied to visual information decoding. Additionally, future researchers can explore fusion with other neuroimaging techniques, for example complementing the high temporal resolution of EEG with a high spatial resolution neuroimaging technique such as fMRI.

11. Conclusion

In summary, decoding visual information has interesting use-cases for BCIs and for understanding the neural representation of visual imagination and perception. EEG is an appealing modality for doing this due to its low cost, portability, noninvasiveness and high temporal resolution. As described in this review, there are concerns over the feasibility of using EEG for visual information decoding due to its low spatial resolution. It is essential to choose a neuroimaging technique that has the capacity to acquire meaningful information for decoding visual features or high-level categories. The current review evaluated the feasibility of decoding visual information from EEG data first from a theoretical perspective, by assessing the capacity of EEG hardware in comparison to other neuroimaging techniques, and second by evaluating what types of visual information have already been successfully decoded with EEG. Finally, potential problems can arise from the methodology used to gather datasets and from limited dataset size; we proposed solutions for these and discussed the merits of fusing EEG data with another neuroimaging technique that has higher spatial resolution.

Acknowledgements

We thank Max Townsend at University of Leeds for his valuable comments on earlier iterations of this work. We acknowledge our funding bodies.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Holly Wilson’s research is funded by the UK Engineering and Physical Sciences Research Council (EPSRC). Eamonn O’Neill and Michael J. Proulx’s research is supported and partly funded by the UKRI Centre for the Analysis of Motion, Entertainment Research and Applications (CAMERA 2.0; EP/T022523/1) at the University of Bath.

References

  • Zafar R, Malik AS, Kamel N, et al. Decoding of visual information from human brain activity: a review of fMRI and EEG studies. J Integr Neurosci. 2015;14(2):155–168. doi: 10.1142/S0219635215500089
  • Vetter P, Newen A. Varieties of cognitive penetration in visual perception. Conscious Cogn. 2014;27:62–75. doi: 10.1016/j.concog.2014.04.007
  • Dijkstra N, Bosch SE, van Gerven MA. Vividness of visual imagery depends on the neural overlap with perception in visual areas. J Neurosci. 2017a;37(5):1367–1373. doi: 10.1523/JNEUROSCI.3022-16.2016
  • Zeman A, MacKisack M, Onians J. The eye’s mind – visual imagination, neuroscience and the humanities. Cortex. 2018;105:1–3. doi: 10.1016/j.cortex.2018.06.012
  • Pearson J. The human imagination: the cognitive neuroscience of visual mental imagery. Nat Rev Neurosci. 2019;20(10):624–634. doi: 10.1038/s41583-019-0202-9
  • Farwell LA, Donchin E. Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalogr Clin Neurophysiol. 1988;70(6):510–523. doi: 10.1016/0013-4694(88)90149-6
  • Gozke E, Tomrukcu S, Erdal N. Visual event-related potentials in patients with mild cognitive impairment. Int J Gerontol. 2016;10(4):190–192. doi: 10.1016/j.ijge.2013.03.006
  • Woodman GF. A brief introduction to the use of event-related potentials in studies of perception and attention. Atten Percept Psychophys. 2010;72(8):2031–2046. doi: 10.3758/BF03196680
  • Picton TW. The p300 wave of the human event-related potential. J Clin Neurophysiol. 1992;9(4):456–479.
  • Friman O, Volosyak I, Graser A. Multiple channel detection of steady-state visual evoked potentials for brain-computer interfaces. IEEE Trans Biomed Eng. 2007;54(4):742–750. doi: 10.1109/TBME.2006.889160
  • Garcia-Molina G. High frequency SSVEPs for BCI applications. Extended Abstracts on Human Factors in Computing Systems; Florence, Italy. Citeseer; 2008.
  • Aggarwal S, Chugh N. Review of machine learning techniques for eeg based brain computer interface. Archiv Comput Methods Eng. 2022;1–20. doi: 10.1007/s11831-022-09819-3
  • Cao Z. A review of artificial intelligence for eeg-based brain- computer interfaces and applications. Brain Sci Adv. 2020;6(3):162–170. doi: 10.26599/BSA.2020.9050017
  • Roy Y, Banville H, Albuquerque I, et al. Deep learning-based electroencephalography analysis: a systematic review. J Neural Eng. 2019;16(5):051001. doi: 10.1088/1741-2552/ab260c
  • Xu L, Xu M, Jung T-P, et al. Review of brain encoding and decoding mechanisms for eeg-based brain–computer interface. Cogn Neurodyn. 2021;15(4):569–584. doi: 10.1007/s11571-021-09676-z
  • van den Boom MA, Vansteensel MJ, Koppeschaar MI, et al. Towards an intuitive communication-bci: decoding visually imagined characters from the early visual cortex using high-field fmri. Biomed Phys Eng Express. 2019;5(5):055001. doi: 10.1088/2057-1976/ab302c
  • Kaplan AY, Shishkin SL, Ganin IP, et al. Adapting the P300-based brain–computer interface for gaming: a review. IEEE Trans Comput Intell AI Games. 2013;5(2):141–149. doi: 10.1109/TCIAIG.2012.2237517
  • Pearson J, Naselaris T, Holmes EA, et al. Mental imagery: functional mechanisms and clinical applications. Trends Cogn Sci. 2015;19(10):590–602. doi: 10.1016/j.tics.2015.08.003
  • Nestor A, Lee AC, Plaut DC, et al. The face of image reconstruction: progress, pitfalls, prospects. Trends Cogn Sci. 2020;24(9):747–759. doi: 10.1016/j.tics.2020.06.006
  • van Gerven MA, Seeliger K, Güçlü U, et al. Current advances in neural decoding. In: Explainable AI: interpreting, explaining and visualizing deep learning. Springer Cham; 2019. p. 379–394.
  • Singh A, Hussain AA, Lal S, et al. A comprehensive review on critical issues and possible solutions of motor imagery based electroencephalography brain-computer interface. Sensors. 2021;21(6):2173.
  • Lee S-H, Lee M, Lee S-W. Neural decoding of imagined speech and visual imagery as intuitive paradigms for BCI communication. IEEE Trans Neural Syst Rehabil Eng. 2020;28(12):2647–2659. doi: 10.1109/TNSRE.2020.3040289
  • Bos DP-O, Poel M, Nijholt A. A study in user-centered design and evaluation of mental tasks for BCI. In International Conference on Multimedia Modeling. Taipei, Taiwan. Springer; 2011. p. 122–134.
  • Weyand S, Schudlo L, Takehara-Nishiuchi K, et al. Usability and performance-informed selection of personalized mental tasks for an online near-infrared spectroscopy brain-computer interface. Neurophotonics. 2015;2(2):025001. doi: 10.1117/1.NPh.2.2.025001
  • Kübler A, Furdea A, Halder S, et al. A brain–computer interface controlled auditory event-related potential (p300) spelling system for locked-in patients. Ann N Y Acad Sci. 2009;1157(1):90–100. doi: 10.1111/j.1749-6632.2008.04122.x
  • Nieto N, Peterson V, Rufiner HL, et al. Thinking out loud, an open-access eeg-based bci dataset for inner speech recognition. Sci Data. 2022;9(1):1–17. doi: 10.1038/s41597-022-01147-2
  • Panachakel JT, Ramakrishnan AG. Decoding covert speech from eeg-a comprehensive review. Front Neurosci. 2021;15:392. doi: 10.3389/fnins.2021.642251
  • Simistira Liwicki F, Gupta V, Saini R, et al. Rethinking the methods and algorithms for inner speech decoding and making them reproducible. Neurosci. 2022;3(2):226–244.
  • Lowe MX, Rajsic J, Ferber S, et al. Discriminating scene categories from brain activity within 100 milliseconds. Cortex. 2018;106:275–287. doi: 10.1016/j.cortex.2018.06.006
  • Sulfaro AA, Robinson AK, Carlson TA. Comparing mental imagery experiences across visual, auditory, and other sensory modalities. bioRxiv. 2023;2023–2025.
  • Pearson J, Kosslyn SM. The heterogeneity of mental representation: ending the imagery debate. Proc Nat Acad Sci. 2015;112(33):10089–10092. doi: 10.1073/pnas.1504933112
  • Dijkstra N, Bosch SE, van Gerven MA. Shared neural mechanisms of visual perception and imagery. Trends Cogn Sci. 2019;23(5):423–434. doi: 10.1016/j.tics.2019.02.004
  • Horikawa T, Kamitani Y. Hierarchical neural representation of dreamed objects revealed by brain decoding with deep neural network features. Front Comput Neurosci. 2017;11:4. doi: 10.3389/fncom.2017.00004
  • Man K, Kaplan JT, Damasio A, et al. Sight and sound converge to form modality-invariant representations in temporoparietal cortex. J Neurosci. 2012;32(47):16629–16636.
  • Viganò S, Borghesani V, Piazza M. Symbolic categorization of novel multisensory stimuli in the human brain. Neuroimage. 2021;235:118016. doi: 10.1016/j.neuroimage.2021.118016
  • Usrey WM, Alitto HJ. Visual functions of the thalamus. Annu Rev Vis Sci. 2015;1(1):351. doi: 10.1146/annurev-vision-082114-035920
  • Konen CS, Kastner S. Two hierarchically organized neural systems for object information in human visual cortex. Nat Neurosci. 2008;11(2):224–231. doi: 10.1038/nn2036
  • Kietzmann TC, Spoerer CJ, Sörensen LK, et al. Recurrence is required to capture the representational dynamics of the human visual system. Proc Natl Acad Sci. 2019;116(43):21854–21863. doi: 10.1073/pnas.1905544116
  • Dijkstra N, Zeidman P, Ondobaka S, et al. Distinct top-down and bottom-up brain connectivity during visual perception and imagery. Sci Rep. 2017b;7(1):1–9.
  • Xie S, Kaiser D, Cichy RM. Visual imagery and perception share neural representations in the alpha frequency band. Curr Biol. 2020;30(13):2621–2627. doi: 10.1016/j.cub.2020.04.074
  • Dijkstra N, Mostert P, de Lange FP, et al. Differential temporal dynamics during visual imagery and perception. Elife. 2018;7:e33904. doi: 10.7554/eLife.33904
  • Levinson M, Podvalny E, Baete SH, et al. Cortical and subcortical signatures of conscious object recognition. Nat Commun. 2021;12(1):1–16.
  • Wolpaw JR, Ramoser H, McFarland DJ, et al. Eeg-based communication: improved accuracy by response verification. IEEE Trans Neural Syst Rehabil Eng. 1998;6(3):326–333.
  • Llorella FR, Iáñez E, Azorn JM, et al. Classification of imagined geometric shapes using eeg signals and convolutional neural networks. Neurosci Inform. 2021;1(4):100029. doi: 10.1016/j.neuri.2021.100029
  • Qiao J, Tang J, Yang J, et al. Basic graphic shape decoding for eeg-based brain-computer interfaces. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), IEEE; 2021. p. 812–815.
  • Esfahani ET, Sundararajan V. Classification of primitive shapes using brain–computer interfaces. Comput Aided Des. 2012;44(10):1011–1019. doi: 10.1016/j.cad.2011.04.008
  • Gurariy G, Mruczek RE, Snow JC, et al. Using high-density electroencephalography to explore spatiotemporal representations of object categories in visual cortex. J Cognitive Neurosci. 2022;34(6):967–987. doi: 10.1162/jocn_a_01845
  • Portilla J, Simoncelli EP. A parametric texture model based on joint statistics of complex wavelet coefficients. Int J Comput Vis. 2000;40(1):49–70. doi: 10.1023/A:1026553619983
  • Yu J-H, Sim K-B. Classification of color imagination using Emotiv EPOC and event-related potential in electroencephalogram. Optik. 2016;127(20):9711–9718. doi: 10.1016/j.ijleo.2016.07.074
  • Hajonides J, van Ede F, Nobre K, et al. Parametric decoding of visual colour from contralateral scalp electroencephalography. J Vis. 2020;20(11):1139–1139. doi: 10.1167/jov.20.11.1139
  • Rasheed S. Recognition of primary colours in electroencephalograph signals using support vector machines. Università degli Studi di Milano. 2011.
  • Pasupathy A, Kim T, Popovkina DV. Object shape and surface properties are jointly encoded in mid-level ventral visual cortex. Curr Opin Neurobiol. 2019;58:199–208. doi: 10.1016/j.conb.2019.09.009
  • Bannert MM, Bartels A. Human v4 activity patterns predict behavioral performance in imagery of object color. J Neurosci. 2018;38(15):3657–3668. doi: 10.1523/JNEUROSCI.2307-17.2018
  • Wu Y, Zeng X, Feng K, et al. (2022). Decoding human visual colour eeg information using machine learning and visual evoked potentials.
  • Torres-Garca AA, Molinas M. Analyzing the recognition of color exposure and imagined color from EEG signals. In 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE); Athens, Greece. IEEE; 2019. p. 386–391.
  • Erlikhman G, Gurariy G, Mruczek RE, et al. The neural representation of objects formed through the spatiotemporal integration of visual transients. Neuroimage. 2016;142:67–78. doi: 10.1016/j.neuroimage.2016.03.044
  • de Wit LH, Kubilius J, Wagemans J, et al. Bistable gestalts reduce activity in the whole of v1, not just the retinotopically predicted parts. J Vis. 2012;12(11):12–12. doi: 10.1167/12.11.12
  • Bracci S, de Beeck HO. Dissociations and associations between shape and category representations in the two visual pathways. J Neurosci. 2016;36(2):432–444. doi: 10.1523/JNEUROSCI.2314-15.2016
  • Ayzenberg V, Kamps FS, Dilks DD, et al. Skeletal representations of shape in the human visual cortex. Neuropsychologia. 2022;164:108092. doi: 10.1016/j.neuropsychologia.2021.108092
  • Lescroart MD, Biederman I. Cortical representation of medial axis structure. Cerebral Cortex. 2013;23(3):629–637. doi: 10.1093/cercor/bhs046
  • Weiner KS, Zilles K. The anatomical and functional specialization of the fusiform gyrus. Neuropsychologia. 2016;83:48–62. doi: 10.1016/j.neuropsychologia.2015.06.033
  • Weiner KS, Natu VS, Grill-Spector K. On object selectivity and the anatomy of the human fusiform gyrus. Neuroimage. 2018;173:604–609. doi: 10.1016/j.neuroimage.2018.02.040
  • Grill-Spector K, Weiner KS. The functional architecture of the ventral temporal cortex and its role in categorization. Nat Rev Neurosci. 2014;15(8):536–548. doi: 10.1038/nrn3747
  • Palermo L, Boccia M, Piccardi L, et al. Congenital lack and extraordinary ability in object and spatial imagery: an investigation on sub-types of aphantasia and hyperphantasia. Conscious Cogn. 2022;103:103360. doi: 10.1016/j.concog.2022.103360
  • Hermann K, Chen T, Kornblith S. The origins and prevalence of texture bias in convolutional neural networks. Adv Neural Inf Process Syst. 2020;33:19000–19015.
  • Lindsay GW. Convolutional neural networks as a model of the visual system: past, present, and future. J Cognitive Neurosci. 2021;33(10):2017–2031. doi: 10.1162/jocn_a_01544
  • Freeman J, Ziemba CM, Heeger DJ, et al. A functional and perceptual signature of the second visual area in primates. Nat Neurosci. 2013;16(7):974–981.
  • Coggan DD, Watson DM, Wang A, et al. The representation of shape and texture in category-selective regions of ventral-temporal cortex. Eur J Neurosci. 2022;56(3):4107–4120.
  • Vacher J, Briand T. The portilla-simoncelli texture model: towards understanding the early visual cortex. Image Processing On Line. 2021;11:170–211. doi: 10.5201/ipol.2021.324
  • Wakita S, Orima T, Motoyoshi I. Photorealistic reconstruction of visual texture from eeg signals. Front Comput Neurosci. 2021;15:15. doi: 10.3389/fncom.2021.754587
  • Orima T, Motoyoshi I. Analysis and synthesis of natural texture perception from visual evoked potentials. Front Neurosci. 2021;15:876. doi: 10.3389/fnins.2021.698940
  • Kumar P, Saini R, Roy PP, et al. Envisioned speech recognition using EEG sensors. Pers Ubiquitous Comput. 2018;22(1):185–199.
  • Nemrodov D, Niemeier M, Patel A, et al. The neural dynamics of facial identity processing: insights from EEG-based pattern analysis and image reconstruction. eNeuro. 2018;5(1):ENEURO.0358–17.2018.
  • Shatek SM, Grootswagers T, Robinson AK, et al. Decoding images in the mind’s eye: the temporal dynamics of visual imagery. Vision. 2019;3(4):53. doi: 10.3390/vision3040053
  • Kosmyna N, Lindgren JT, Lécuyer A. Attending to visual stimuli versus performing visual imagery as a control strategy for EEG-based brain-computer interfaces. Sci Rep. 2018;8(1):1–14. doi: 10.1038/s41598-018-31472-9
  • Spampinato C, Palazzo S, Kavasidis I, et al. Deep learning human mind for automated visual classification. In Proceedings of the IEEE conference on computer vision and pattern recognition; Honolulu, HI, USA; 2017. p. 6809–6817.
  • Rashkov G, Bobe A, Fastovets D, et al. Natural image reconstruction from brain waves: a novel visual bci system with native feedback. bioRxiv. 2019;787101.
  • Ghuman AS, Brunet NM, Li Y, et al. Dynamic encoding of face information in the human fusiform gyrus. Nat Commun. 2014;5(1):1–10.
  • Kanwisher N, McDermott J, Chun MM. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci. 1997;17(11):4302–4311. doi: 10.1523/JNEUROSCI.17-11-04302.1997
  • Aminoff EM, Kveraga K, Bar M. The role of the parahippocampal cortex in cognition. Trends Cogn Sci. 2013;17(8):379–390. doi: 10.1016/j.tics.2013.06.009
  • Kriegeskorte N, Mur M, Ruff DA, et al. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron. 2008;60(6):1126–1141. doi: 10.1016/j.neuron.2008.10.043
  • Tirupattur P, Rawat YS, Spampinato C, et al. ThoughtViz: visualizing human thoughts using generative adversarial network. In Proceedings of the 26th ACM international conference on Multimedia; Seoul, Korea; 2018. p. 950–958.
  • Jolly BLK, Aggrawal P, Nath SS, et al. Universal EEG encoder for learning diverse intelligent tasks. In 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM); Singapore. IEEE; 2019. p. 213–218.
  • Zheng X, Chen W, Li M, et al. Decoding human brain activity with deep learning. Biomed Signal Process Control. 2020a;56:101730. doi: 10.1016/j.bspc.2019.101730
  • Fares A, Zhong S-H, Jiang J. EEG-based image classification via a region-level stacked bi-directional deep learning framework. BMC Med Inform Decis Mak. 2019;19(6):1–11. doi: 10.1186/s12911-019-0967-9
  • Fares A, Zhong S-H, Jiang J. Brain-media: a dual conditioned and lateralization supported GAN (DCLS-GAN) towards visualization of image-evoked brain activities. In Proceedings of the 28th ACM International Conference on Multimedia; Seattle, WA, USA; 2020. p. 1764–1772.
  • Jiao Z, You H, Yang F, et al. Decoding EEG by visual-guided deep neural networks. IJCAI. 2019;28:1387–1393.
  • Li D, Du C, He H. Semi-supervised cross-modal image generation with generative adversarial networks. Pattern Recognition. 2020a;100:107085. doi: 10.1016/j.patcog.2019.107085
  • Mukherjee P, Das A, Bhunia AK, et al. Cogni-net: cognitive feature learning through deep visual perception. In 2019 IEEE International Conference on Image Processing (ICIP); Taipei, Taiwan. IEEE; 2019. p. 4539–4543.
  • Salimans T, Goodfellow I, Zaremba W, et al. Improved techniques for training GANs. 30th Conference on Neural Information Processing Systems (NIPS 2016); Barcelona, Spain; 2016. p. 29.
  • Palazzo S, Spampinato C, Kavasidis I, et al. Generative adversarial networks conditioned by brain signals. In Proceedings of the IEEE international conference on computer vision; Venice, Italy; 2017. p. 3410–3418.
  • Touryan J, Apker G, Lance BJ, et al. Estimating endogenous changes in task performance from EEG. Front Neurosci. 2014;8:155. doi: 10.3389/fnins.2014.00155
  • Wascher E, Rasch B, Sänger J, et al. Frontal theta activity reflects distinct aspects of mental fatigue. Biol Psychol. 2014;96:57–65. doi: 10.1016/j.biopsycho.2013.11.010
  • Gardner JL, Merriam EP, Movshon JA, et al. Maps of visual space in human occipital cortex are retinotopic, not spatiotopic. J Neurosci. 2008;28(15):3988–3999.
  • Güçlütürk Y, Güçlü U, Seeliger K, et al. Reconstructing perceived faces from brain activations with deep adversarial neural decoding. Adv Neural Inf Process Syst. 2017;30:4246–4257.
  • VanRullen R, Reddy L. Reconstructing faces from fMRI patterns using deep generative neural networks. Commun Biol. 2019;2(1):1–10. doi: 10.1038/s42003-019-0438-y
  • Kwon M, Han S, Kim K, et al. Super-resolution for improving EEG spatial resolution using deep convolutional neural network–feasibility study. Sensors. 2019;19(23):5317. doi: 10.3390/s19235317
  • Creel DJ. Visually evoked potentials. In: Handbook of Clinical Neurology. Vol. 160. 2016. p. 501–522.
  • Seeber M, Cantonas L-M, Hoevels M, et al. Subcortical electrophysiological activity is detectable with high-density EEG source imaging. Nat Commun. 2019;10(1):1–7.
  • van de Nieuwenhuijzen ME, Backus A, Bahramisharif A, et al. MEG-based decoding of the spatiotemporal dynamics of visual category perception. Neuroimage. 2013;83:1063–1073. doi: 10.1016/j.neuroimage.2013.07.075
  • Proverbio AM, Tacchini M, Jiang K. What do you have in mind? ERP markers of visual and auditory imagery. Brain Cogn. 2023;166:105954. doi: 10.1016/j.bandc.2023.105954
  • Amano K, Goda N, Nishida S, et al. Estimation of the timing of human visual perception from magnetoencephalography. J Neurosci. 2006;26(15):3981–3991.
  • Shad EHT, Molinas M, Ytterdal T. Impedance and noise of passive and active dry EEG electrodes: a review. IEEE Sens J. 2020;20(24):14565–14577. doi: 10.1109/JSEN.2020.3012394
  • Iqbal A, Dong P, Kim CM, et al. Decoding neural responses in mouse visual cortex through a deep neural network. In 2019 International Joint Conference on Neural Networks (IJCNN); Budapest, Hungary. IEEE; 2019. p. 1–7.
  • Date H, Kawasaki K, Hasegawa I, et al. Deep learning for natural image reconstruction from electrocorticography signals. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); San Diego, CA, USA. IEEE; 2019. p. 2331–2336.
  • Kapeller C, Ogawa H, Schalk G, et al. Real-time detection and discrimination of visual perception using electrocorticographic signals. J Neural Eng. 2018;15(3):036001. doi: 10.1088/1741-2552/aaa9f6
  • Musk E. An integrated brain-machine interface platform with thousands of channels. J Med Internet Res. 2019;21(10):e16194.
  • Pandarinath C, Nuyujukian P, Blabe CH, et al. High performance communication by people with paralysis using an intracortical brain-computer interface. Elife. 2017;6:e18554. doi: 10.7554/eLife.18554
  • Vansteensel MJ, Pels EG, Bleichner MG, et al. Fully implanted brain–computer interface in a locked-in patient with ALS. N Engl J Med. 2016;375(21):2060–2066.
  • Willett FR, Avansino DT, Hochberg LR, et al. High-performance brain-to-text communication via handwriting. Nature. 2021;593(7858):249–254.
  • Huang W, Zhang P, Yu T, et al. A P300-based BCI system using stereoelectroencephalography and its application in a brain mechanistic study. IEEE Trans Biomed Eng. 2020;68(8):2509–2519. doi: 10.1109/TBME.2020.3047812
  • Burwell S, Sample M, Racine E. Ethical aspects of brain computer interfaces: a scoping review. BMC Med Ethics. 2017;18(1):1–11. doi: 10.1186/s12910-017-0220-y
  • Zeman A, Milton F, Della Sala S, et al. Phantasia–the psychological significance of lifelong visual imagery vividness extremes. Cortex. 2020;130:426–440. doi: 10.1016/j.cortex.2020.04.003
  • dos Santos RG, Enyart S, Bouso JC, et al. “Ayahuasca turned on my mind’s eye”: enhanced visual imagery after ayahuasca intake in a man with “blind imagination” (aphantasia). J Psychedelic Stud. 2018;2(2):74–77. doi: 10.1556/2054.2018.008
  • Dawes AJ, Keogh R, Andrillon T, et al. A cognitive profile of multi-sensory imagery, memory and dreaming in aphantasia. Sci Rep. 2020;10(1):1–10.
  • Milton F, Fulford J, Dance C, et al. Behavioral and neural signatures of visual imagery vividness extremes: aphantasia vs. hyperphantasia. Cereb Cortex Commun. 2020;2(2).
  • Marks DF. Phenomenological studies of visual mental imagery: a review and synthesis of historical datasets. Vision. 2023;7(4):67. doi: 10.3390/vision7040067
  • Li R, Johansen JS, Ahmed H, et al. Training on the test set? An analysis of Spampinato et al. [31]. arXiv preprint arXiv:1812.07697. 2018.
  • Li R, Johansen JS, Ahmed H, et al. The perils and pitfalls of block design for EEG classification experiments. IEEE Trans Pattern Anal Mach Intell. 2020b;43(1):316–333. doi: 10.1109/TPAMI.2020.2973153
  • Palazzo S, Spampinato C, Schmidt J, et al. Correct block-design experiments mitigate temporal correlation bias in EEG classification. arXiv preprint arXiv:2012.03849. 2020.
  • Cudlenco N, Popescu N, Leordeanu M. Reading into the mind’s eye: boosting automatic visual recognition with EEG signals. Neurocomputing. 2020;386:281–292. doi: 10.1016/j.neucom.2019.12.076
  • Rekrut M, Sharma M, Schmitt M, et al. Decoding semantic categories from EEG activity in object-based decision tasks. In 2020 8th International Winter Conference on Brain-Computer Interface (BCI); Gangwon, South Korea. IEEE; 2020. p. 1–7.
  • Blamire AM, Ogawa S, Ugurbil K, et al. Dynamic mapping of the human visual cortex by high-speed magnetic resonance imaging. Proc Nat Acad Sci. 1992;89(22):11069–11073. doi: 10.1073/pnas.89.22.11069
  • Liu TT. The development of event-related fMRI designs. Neuroimage. 2012;62(2):1157–1162. doi: 10.1016/j.neuroimage.2011.10.008
  • Friston KJ. Functional and effective connectivity in neuroimaging: a synthesis. Human Brain Mapp. 1994;2(1–2):56–78. doi: 10.1002/hbm.460020107
  • Alharbi ET, Rasheed S, Buhari SM. Feature selection algorithm for evoked EEG signal due to RGB colors. In 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI); Datong, China. IEEE; 2016. p. 1503–1520.
  • Martinez-Conde S, Macknik SL, Troncoso XG, et al. Microsaccades counteract visual fading during fixation. Neuron. 2006;49(2):297–305. doi: 10.1016/j.neuron.2005.11.033
  • Wynn JS, Ryan JD, Buchsbaum BR. Eye movements support behavioral pattern completion. Proc Nat Acad Sci. 2020;117(11):6246–6254. doi: 10.1073/pnas.1917586117
  • Thielen J, Bosch SE, van Leeuwen TM, et al. Evidence for confounding eye movements under attempted fixation and active viewing in cognitive neuroscience. Sci Rep. 2019;9(1):1–8. doi: 10.1038/s41598-019-54018-z
  • Kaneshiro B, Perreau Guimaraes M, Kim H-S, et al. A representational similarity analysis of the dynamics of object processing using single-trial EEG classification. PLoS One. 2015;10(8):e0135697. doi: 10.1371/journal.pone.0135697
  • Khasnobish A, Konar A, Tibarewala D, et al. Object shape recognition from EEG signals during tactile and visual exploration. In International Conference on Pattern Recognition and Machine Intelligence; Kolkata, India. Springer; 2013. p. 459–464.
  • Zheng X, Chen W, You Y, et al. Ensemble deep learning for automated visual classification using EEG signals. Pattern Recognition. 2020b;102:107147. doi: 10.1016/j.patcog.2019.107147
  • Ling S, Lee AC, Armstrong BC, et al. EEG-based decoding of visual words from perception and imagery. J Vis. 2019;19(10):33–33. doi: 10.1167/19.10.33
  • Bone MB, St-Laurent M, Dang C, et al. Eye movement reinstatement and neural reactivation during mental imagery. Cerebral Cortex. 2019;29(3):1075–1089.
  • Laeng B, Teodorescu D-S. Eye scanpaths during visual imagery reenact those of perception of the same visual scene. Cogn Sci. 2002;26(2):207–231. doi: 10.1207/s15516709cog2602_3
  • Mast FW, Kosslyn SM. Eye movements during visual mental imagery. Trends Cogn Sci. 2002;6(7):271–272. doi: 10.1016/S1364-6613(02)01931-9
  • Wynn JS, Shen K, Ryan JD. Eye movements actively reinstate spatiotemporal mnemonic content. Vision. 2019;3(2):21. doi: 10.3390/vision3020021
  • Borst G, Ganis G, Thompson WL, et al. Representations in mental imagery and working memory: evidence from different types of visual masks. Mem Cognit. 2012;40(2):204–217. doi: 10.3758/s13421-011-0143-7
  • Avons S, Sestieri C. Dynamic visual noise: no interference with visual short-term memory or the construction of visual images. Eur J Cognit Psychol. 2005;17(3):405–424. doi: 10.1080/09541440440000104
  • Vasques R, Garcia RB, Galera C. Short-term memory recall of visual patterns under static and dynamic visual noise. Psychology & Neuroscience. 2016;9(1):46. doi: 10.1037/pne0000039
  • Krumpe T, Baumgaertner K, Rosenstiel W, et al. Non-stationarity and Inter-subject variability of EEG characteristics in the context of BCI development. GBCIC. 2017;7:260–265.
  • Mikołajczyk A, Grochowski M. Data augmentation for improving deep learning in image classification problem. In 2018 international interdisciplinary PhD workshop (IIPhDW); Swinoujscie, Poland. IEEE; 2018. p. 117–122.
  • Lee JS, Lee O. CTGAN vs TGAN? Which one is more suitable for generating synthetic EEG data. J Theor Appl Inf Technol. 2021;99(10):2359–2372.
  • Lee HK, Choi Y-S. A convolution neural networks scheme for classification of motor imagery EEG based on wavelet time-frequency image. In 2018 International Conference on Information Networking (ICOIN); Chiang Mai, Thailand. IEEE; 2018. p. 906–909.
  • Lashgari E, Liang D, Maoz U. Data augmentation for deep-learning-based electroencephalography. J Neurosci Methods. 2020;346:108885. doi: 10.1016/j.jneumeth.2020.108885
  • Krell MM, Kim SK. Rotational data augmentation for electroencephalographic data. In 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); Jeju Island, South Korea. IEEE; 2017. p. 471–474.
  • Haradal S, Hayashi H, Uchida S. Biosignal data augmentation based on generative adversarial networks. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); Honolulu, Hawaii, USA. IEEE; 2018. p. 368–371.
  • Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Adv Neural Inf Process Syst. 2014;2:27.
  • Kazeminia S, Baur C, Kuijper A, et al. GANs for medical image analysis. Artif Intell Med. 2020;109:101938. doi: 10.1016/j.artmed.2020.101938
  • Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. In International conference on machine learning; Sydney, Australia. PMLR; 2017. p. 214–223.
  • Luo Y, Lu B-L. EEG data augmentation for emotion recognition using a conditional Wasserstein GAN. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); Honolulu, Hawaii, USA. IEEE; 2018. p. 2535–2538.
  • Corley IA, Huang Y. Deep EEG super-resolution: upsampling EEG spatial resolution with generative adversarial networks. In 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI); Las Vegas, NV, USA. IEEE; 2018. p. 100–103.
  • Fahimi F, Dosen S, Ang KK, et al. Generative adversarial networks-based data augmentation for brain-computer interface. IEEE Trans Neural Netw Learn Syst. 2020;32:4039–4051.
  • Arora S, Ge R, Liang Y, et al. Generalization and equilibrium in generative adversarial nets (gans). In International Conference on Machine Learning; Sydney, Australia. PMLR; 2017. p. 224–232.
  • Heusel M, Ramsauer H, Unterthiner T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv Neural Inf Process Syst. 2017;30.
  • Nowozin S, Cseke B, Tomioka R. f-GAN: training generative neural samplers using variational divergence minimization. In Proceedings of the 30th International Conference on Neural Information Processing Systems; Barcelona, Spain; 2016. p. 271–279.
  • Cichy RM, Oliva A. A M/EEG-fMRI fusion primer: resolving human brain responses in space and time. Neuron. 2020;107(5):772–781. doi: 10.1016/j.neuron.2020.07.001
  • Muukkonen I, Ölander K, Numminen J, et al. Spatio-temporal dynamics of face perception. Neuroimage. 2020;209:116531. doi: 10.1016/j.neuroimage.2020.116531
  • Berezutskaya J, Vansteensel MJ, Aarnoutse EJ, et al. Open multimodal iEEG-fMRI dataset from naturalistic stimulation with a short audiovisual film. Sci Data. 2022;9(1):1–13. doi: 10.1038/s41597-022-01173-0
  • Cooney C, Folli R, Coyle D. A bimodal deep learning architecture for EEG-fNIRS decoding of overt and imagined speech. IEEE Trans Biomed Eng. 2021;69(6):1983–1994. doi: 10.1109/TBME.2021.3132861
  • Daly I. Neural decoding of music from the EEG. Sci Rep. 2023;13(1):624. doi: 10.1038/s41598-022-27361-x
  • Naselaris T, Kay KN, Nishimoto S, et al. Encoding and decoding in fMRI. Neuroimage. 2011;56(2):400–410. doi: 10.1016/j.neuroimage.2010.07.073
  • Legewie H, Simonova O, Creutzfeldt O. EEG changes during performance of various tasks under open- and closed-eyed conditions. Electroencephalogr Clin Neurophysiol. 1969;27(5):470–479. doi: 10.1016/0013-4694(69)90187-4