Free Cognitive Capacity Assessed by the P300 Method During Manual Docking Training in Space
INTRODUCTION: The classical P300 brain potential method was used to assess the cognitive capacity during training of manual docking in space. The aim of the study was to enhance the safety of this operation during a mission. METHODS: To examine this, N = 8 cosmonauts had to perform the manually controlled docking task simultaneously with an acoustic monitoring task. The P300 component was evoked by the acoustic stimuli of the secondary task. The docking task had to be executed at three difficulty levels: low (station not turning); medium (station turning around one axis); and difficult (station turning around three axes). In the secondary task, subjects had to discriminate between a low and a high tone, which occurred with a probability of 90% and 10%, respectively. Subjects had to count the high tones. After the 10th high tone, they had to inspect the power supply by giving an oral command. RESULTS: A methodology for event-related potentials was successfully demonstrated under space conditions. The P300 amplitude was largest and the latency shortest during the medium difficult task. DISCUSSION: The results suggest that P300 can be recorded during the complex manual docking task in space and could be used to assess individual available cognitive capacity of cosmonauts during a space mission. Bubeev JA, Johannes B, Kotrovska TI, Schastlivtseva D, Bronnikov S, Hoermann H-J, Gaillard AWK. Free cognitive capacity assessed by the P300 method during manual docking training in space. Aerosp Med Hum Perform. 2024; 95(4):187–193.
Occasionally, we are reminded that human spaceflight is far from being routine. For example, when in 1997 the Soyuz TM-25 spacecraft with the German astronaut Reinhold Ewald on board was approaching the Mir space station, a few meters before contact the automated control system was disengaged because of a misalignment with the docking port. The commander took control and performed a successful manual docking. British astronauts Helen Sharman on board Soyuz TM-12 in 1991 and Timothy Peake with Soyuz TMA-19M in 2015 had the same experience. The “Kurs” docking navigation system failed on the final approach and manual docking had to be performed to prevent accidental damage or a catastrophic outcome.26 These and further safety-related occurrences with the remotely operated Progress freighter28 confirm the significance of training cosmonauts and astronauts in manually controlled docking operations. Manually controlled docking of a spacecraft is a challenging and critical maneuver during space missions18 because the cosmonaut/astronaut has to control the spacecraft’s movements with six degrees of freedom. Three control devices and a two-dimensional display serve this purpose.11 Moreover, Salnitski24,25 observed during the Russian Mir missions that manual control performance eroded due to lack of training over 60 mission days such that the safety of a docking maneuver could be jeopardized. Therefore, Manzey16 recommended continuous performance monitoring in space. Psychophysiological measurements, such as the electroencephalogram (EEG), have proven sensitive to levels of attention and cognitive activity in humans.19 An assessment of theta waves with the EEG in space13 indicated differences between mission phases and was related to decreased alertness even during a simulated docking task in space. Herein we focused on another EEG approach: assessing event-related potentials (ERPs).
Reliable human performance requires extra cognitive resources that can be devoted to unexpected demands which might occur during a task execution.6,12,27 Several EEG methods have been applied in secondary task settings on the ground.1,7,15 To gauge free cognitive capacity, a secondary task was added to the main task, which evoked a positive ERP around 300 ms, called P300 or P3.22 A comprehensive summary of studies examining ERPs in relation to varied cognitive task demands can be found in Prinzel et al.23
In previous studies,8,10 we found that P300 indicated the free cognitive capacity during training of a spacecraft docking maneuver. We applied a secondary task4,17 that was realistically embedded into the docking maneuver scenario. As Bubeev et al.3 ascertained, the nature of the dual-task design is essential for the cosmonaut’s motivation to commit to such research. The cosmonaut’s enthusiasm increases with the perceived mission relevance of the tasks. Therefore, the secondary task was framed as part of adjusting solar panels toward the sun. In this study, we tested whether a secondary task implemented in a docking task with the space-certified equipment Neurolab-201011 evoked a reliable P300 under space conditions. A key condition for a reliable P300 assessment is a reliable EEG. The research was conducted as the final step of developing a space application after many years of preparatory international cooperation on the ground.
METHODS
Subjects
Eight male Russian cosmonauts participated in this study. They were between 34 and 54 yr old (mean 46 ± 7.3 SD) with a body mass index of 27 (± 2.7 SD) kg · m−2. This space experiment, “PILOT-T”, was approved both by the local internal review board (Institute of Biomedical Problems) and the Human Research Multilateral Review Board (for International Space Station experiments).
The German participation in the experiment was approved by the Ethics Committee of the Medical Association North-Rhine in Düsseldorf, Germany. Written informed consent was obtained from all subjects.
Procedure
Subjects were trained in manual docking according to the standard educational procedure at the beginning of their professional career as cosmonauts. The “easy” task simulates the standard setting of a manually controlled spacecraft docking maneuver at a space station. The spacecraft is located abeam the docking point, and the subject looks at the point of contact. The spacecraft had to be flown sideways in a 90° curve, maintaining a safety distance until stabilized at the centerline, before finally approaching the docking port and docking. In the standard setting the space station is fixed relative to the spacecraft. The experimental tasks were identical to those in earlier publications.2,9 In the first two tasks the cosmonauts had to perform the easy task (data aggregated as “easy”), which required flying in the x-plane, resembling horizontal conditions on Earth. These tasks simulate realistic operational training; the station was stable, as for regular standard training of the cosmonauts. The “medium” task required flying in the x-y plane (vertical) and turning down. With respect to the degrees of freedom, the medium-difficulty task was physically identical to the easy task but mentally more demanding, because humans living on Earth are usually not familiar with such movements. In the “difficult” tasks the station was rotating, with fixed rotation speed and two rotating axes. In this third condition the target turned around the y-axis faster and additionally turned constantly around its x- and z-axes.
The docking performance was evaluated by a factor analytical approach, described in detail by Johannes et al.11 Exploratory and confirmatory factor models were verified in 10 independent subcohorts; the vector sums of the scores in underlying factors were adopted as integrated performance scores. A set of discriminant functions made these factor models applicable to the actual data.
A series of tones in two different pitches (750 and 1000 Hz) was presented to the cosmonauts via headphones (presentation time 50 ms, volume 80 dB, fixed interstimulus interval 2.2 s). The relevant tone was higher than the irrelevant tone and occurred less often, 1 time out of 10 tones. To make the secondary task more authentic to the cosmonauts, we related the task to the monitoring of the battery power. Subjects had to count the high tones and switch to the solar panel system; when 10 high tones were detected, they had to give the voice command “переключить” (switch). We used voice commands to avoid interference with the manual docking task. The number of incorrect reactions was counted as errors in the secondary task. If the subjects reacted too early (for example, already switching after the eighth tone), the error score was “−2.” Conversely, if they reacted too late (e.g., after the 12th tone), the error score was “2.”
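The signed error score described above can be illustrated with a minimal Python sketch. The function names, the seeded random generator, and the assumption that each tone is drawn independently with a 10% chance of being high are ours; the text does not state how the actual tone sequence was randomized.

```python
import random

HIGH_HZ = 1000     # relevant (rare) tone, from the text
LOW_HZ = 750       # irrelevant (frequent) tone, from the text
TARGET_COUNT = 10  # the voice command is due after the 10th high tone


def make_tone_sequence(n_tones, p_high=0.1, seed=0):
    """Draw an oddball sequence; each tone is high with probability p_high.

    Independent draws are an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    return [HIGH_HZ if rng.random() < p_high else LOW_HZ for _ in range(n_tones)]


def error_score(high_tones_at_command):
    """Signed error: negative = too early, positive = too late, 0 = correct."""
    return high_tones_at_command - TARGET_COUNT


def classify(score):
    """Map a signed error score to the error type used in the figures."""
    if score < 0:
        return "too early"
    if score > 0:
        return "too late"
    return "correct"
```

With this coding, a command given after the eighth high tone yields −2 and one given after the 12th yields 2, matching the examples in the text.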
The sampling plan was defined by the operational restrictions of spaceflights. The number of subjects from the space agency was eight. The number of experiments (sessions) corresponded to the standard of spaceflights. Preflight, an additional training session without recordings was run. In this familiarization session the same conditions were given as in the experimental sessions. The primary goal was to familiarize the subjects with the secondary task and with the use of voice commands. The experiment was run during the three flight phases: preflight (three sessions in the training center on the ground 14 d prior to departure with 3 intersession days), in flight (half year, one session each second week on the space station), and postflight (two sessions in the training center on the ground 1 and 3 d after landing).
In each session, the subjects had to perform five tasks, two with the easy condition, one with the medium condition, and two with the difficult condition. The data obtained in the two easy conditions and in the two difficult conditions were combined.
The Neurolab-2010 and the hand controls that resemble those used on the Russian Soyuz for docking on the International Space Station were developed and produced for space applications by Koralewski Industrie Elektronik oHG, Hambühren, Germany. The complete software package to manage the measurement systems was developed by SpaceBit GmbH, Eberswalde, Germany. The core module (which has been in space since 2015) controlled the entire communication with the experimental computer via USB interface and registered all electrophysiological parameters. Due to the limited sanitary conditions on the space station, cosmonauts often rejected wet electrodes. Therefore, we used dry electrodes. Either 8 or 19 electrodes (Standard 10-20) provided reliable EEG signals of high quality. The reference electrode was placed between Fp1 and Fp2, and the ground electrode was located between Fp1 and F7. Impedance was kept below 50 kΩ by applying drops of water. EEG was registered with a sample rate of 500 Hz, gain = 22,000, and 24-bit digitalization depth using an ADS 1299 chip from Texas Instruments at gain level 12 (22,878 digits/mV). Data were filtered with 0.5-Hz high pass and 20-Hz low pass frequencies. We had to exclude the very first dataset preflight due to technical problems.
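As a small worked conversion, the scale factor of 22,878 digits/mV quoted above can be turned into microvolts as follows. This sketch assumes that the factor maps raw samples directly to input voltage; the function name is hypothetical and not part of the Neurolab-2010 software.

```python
DIGITS_PER_MILLIVOLT = 22_878  # scale factor quoted in the text


def digits_to_microvolts(raw_digits):
    """Convert one raw ADC reading (in digits) to microvolts.

    Assumes the quoted factor is the complete scale from digits to input
    voltage, with no additional offset.
    """
    return raw_digits / DIGITS_PER_MILLIVOLT * 1000.0
```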
Statistical Analyses
To develop an autonomous onboard analysis and feedback loop of the EEG results, the R system was used (R 3.5.2, package nlme, R Foundation, Vienna, Austria). The EEG channels were examined for whether they passed a signal quality check, including peak-to-peak moving window artifact detection (ERPLAB14). To obtain robust and reliable data, the ERPs were averaged across channels. Because the Cz-channel at the middle of the skull (Vertex) is a standard for ERP studies under laboratory conditions, these data were compared to the averaged ERPs of the other channels. For a more reliable P300 assessment, difference waves (DWs) were calculated.5 The ERPs for the irrelevant tones were subtracted from the ERPs for the relevant tones. Finally, the EEG responses were averaged separately for relevant and irrelevant tones.6,21
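The combination of artifact rejection and difference-wave calculation can be sketched in plain Python as below. This is an illustration under stated assumptions, not the ERPLAB or onboard implementation: the window length, step, and 100 µV threshold are hypothetical defaults, and epochs are assumed to be equally long lists of samples in µV.

```python
def peak_to_peak_ok(epoch, window, step, threshold_uv):
    """Return True if no moving window exceeds the peak-to-peak threshold.

    Windows start every `step` samples; a window longer than the epoch
    is truncated to the whole epoch.
    """
    last_start = max(len(epoch) - window, 0)
    for start in range(0, last_start + 1, step):
        segment = epoch[start:start + window]
        if max(segment) - min(segment) > threshold_uv:
            return False
    return True


def average_epochs(epochs):
    """Sample-by-sample mean of equally long epochs (empty list if none)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(relevant_epochs, irrelevant_epochs,
                    window=100, step=50, threshold_uv=100.0):
    """ERP(relevant tones) minus ERP(irrelevant tones) after rejection."""
    def keep(epochs):
        return [e for e in epochs
                if peak_to_peak_ok(e, window, step, threshold_uv)]

    erp_relevant = average_epochs(keep(relevant_epochs))
    erp_irrelevant = average_epochs(keep(irrelevant_epochs))
    return [r - i for r, i in zip(erp_relevant, erp_irrelevant)]
```

Rejecting whole epochs before averaging keeps a single large artifact from dominating the comparatively small number of relevant-tone trials.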
The following measures of the P300 were obtained. Latency is the time between the presentation of a tone and the start of the P300 component (in a window of 200–450 ms), the point at which the brain has discriminated between a high and a low tone; the P300 ends where the two curves come together again. Amplitude is the maximum peak (µV) in a window of 200–500 ms. Magnitude is the area (µV) bounded by the beginning of the P300, its end, and its largest peak; like the amplitude, it was assessed in a window of 200–500 ms. The statistical analyses of the data were done with the IBM SPSS package (version 21; IBM, Armonk, NY, United States). Correlations between performance parameters were estimated by Spearman’s rho. Linear mixed effect models were developed to test the statistical significance of the independent variables flight phase and task difficulty as fixed effects.
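One way to read these three definitions is sketched below in Python. Several choices are assumptions of the sketch, not of the study: the difference wave is taken to start at tone onset, the start of the P300 is taken as the first sample in the 200–450 ms window that exceeds a threshold (0 µV by default), its end as the last consecutive sample above that threshold up to 500 ms, and the area is computed with a rectangle rule, which yields µV·ms rather than the µV reported in the text.

```python
def p300_measures(dw, sample_rate_hz=500, threshold_uv=0.0):
    """Return (latency_ms, amplitude_uv, magnitude) for one difference wave.

    dw[0] is assumed to be the sample at tone onset (0 ms).
    """
    ms_per_sample = 1000.0 / sample_rate_hz

    def index_at(ms):
        # Clamp to the last sample so short waves do not raise IndexError.
        return min(int(round(ms / ms_per_sample)), len(dw) - 1)

    start_i = index_at(200)
    latency_end_i = index_at(450)
    window_end_i = index_at(500)

    # Amplitude: maximum of the difference wave between 200 and 500 ms.
    amplitude = max(dw[start_i:window_end_i + 1])

    # Onset: first supra-threshold sample between 200 and 450 ms.
    onset_i = next((i for i in range(start_i, latency_end_i + 1)
                    if dw[i] > threshold_uv), None)
    if onset_i is None:
        return None, amplitude, 0.0

    # End: last consecutive supra-threshold sample, at most 500 ms.
    end_i = onset_i
    while end_i < window_end_i and dw[end_i + 1] > threshold_uv:
        end_i += 1

    latency_ms = onset_i * ms_per_sample
    # Magnitude: rectangle-rule area between onset and end (uV * ms).
    magnitude = sum(dw[onset_i:end_i + 1]) * ms_per_sample
    return latency_ms, amplitude, magnitude
```

On noisy single-subject data a threshold of 0 µV would trigger on noise, so a practical version would need a smoothed wave or a higher threshold.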
RESULTS
The docking performance for the three flight phases is illustrated in Fig. 1. The accuracy of docking decreased at the higher difficulty levels for each of the flight phases. The accuracy pattern remained nearly the same across all three flight phases.
Citation: Aerospace Medicine and Human Performance 95, 4; 10.3357/AMHP.6192.2024
When the difficulty increased, the docking accuracy decreased (Spearman’s rho = −0.337, P < 0.001, Fig. 1); also, the performance accuracy was significantly different between flight phases [F(num: 2, denom: 354.781) = 3.935, P = 0.020] and levels of difficulty [F(num: 2, denom: 353.539) = 13.071, P < 0.001].
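For readers unfamiliar with the statistic, Spearman’s rho is the Pearson correlation of the ranked data, with tied values (such as repeated difficulty levels) sharing their mean rank. The following Python sketch shows the calculation; it is a generic illustration, not the SPSS routine used in the study, and it does not compute a P value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A negative rho, as reported here, means that higher difficulty levels tend to go with lower accuracy ranks.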
In flight, 121 responses to the secondary task were counted for all 134 flights. Fig. 2 presents the number of errors in the secondary task, switching the system “too early” (16.54%) or “too late” (13.81%). Most responses (69.65%) were made in time.
The number of errors in the secondary task clearly increased with task difficulty. The low number in the “correct” category is due to the coding of correct responses with an error score of “0.” In general, subjects tended to react too early (Fig. 3).
Secondary task performance was not correlated with docking accuracy. The secondary task performance was extremely high; the task may have been too easy for the subjects. The number of errors, too early or too late, increased with docking difficulty. However, the accuracy differences were not correlated with differences in secondary task performance.
The effects of combining EEG channels were analyzed. A comparison of ERPs which were averaged over 8 or 19 channels with those which were obtained from the Cz alone showed sufficient similarity. An averaged correlation of 0.94 was found preflight for the standard task (easy); in flight the respective averaged correlation was 0.89. However, postflight measures were worse and not significantly correlated.1
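The comparison between a channel-averaged ERP and the Cz ERP can be sketched as follows. The sketch assumes a Pearson correlation between the two waveforms, which the text does not specify, and leaves open whether Cz itself is part of the average; the channel names in the usage note are illustrative only.

```python
def average_channels(channel_erps, usable):
    """Mean ERP across the channels that passed the quality check.

    channel_erps: dict mapping channel name -> list of samples (uV)
    usable: list of channel names to include in the average
    """
    selected = [channel_erps[name] for name in usable]
    return [sum(samples) / len(selected) for samples in zip(*selected)]


def waveform_correlation(x, y):
    """Pearson correlation of two equally long waveforms."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

For example, `waveform_correlation(average_channels(erps, ["Pz", "Fz"]), erps["Cz"])` gives one similarity value per session, which could then be averaged per flight phase.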
Fig. 4A, B, and C present the EEG DWs. The ERPs to the lower tones were subtracted from those to the higher tones5 separately for the three flight phases (preflight, in flight, and postflight) and for the easy, medium, and difficult docking tasks. Table I presents the averaged DW magnitude values of the P300. In general, the DWs were similar among flight phases as well as among difficulties.
The latencies were assessed in the area between 200–450 ms. P300 amplitudes and magnitudes were measured in a larger window (200–500 ms).
Magnitude is an area measure. The differences between difficulties as well as between flight phases were tested for significance. For the EEG data, the commonly used analysis of variance could not be applied because the assumption of normally distributed input data could not be statistically confirmed. Therefore, we used nonparametric tests, which make no distributional assumption. For a dependent data analysis (Friedman test), the data had to be averaged by subject, condition, and measurement points to have equal amounts of data. The independent analysis (Kruskal-Wallis) ignores the dependency of measurements within a subject. However, all three analysis types provided highly significant results confirming each other, so we present only a summarizing verbal evaluation.
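To make the dependent analysis concrete, the Friedman statistic for a subjects-by-conditions table can be computed as in the Python sketch below. It assumes one averaged value per subject and condition and no ties within a subject; it omits the tie correction and the P value (which would come from a chi-square distribution with k − 1 degrees of freedom), and it is not the routine used in the study.

```python
def friedman_statistic(table):
    """Friedman chi-square for table[subject][condition].

    Assumes no ties within a subject's row (no tie correction applied).
    """
    n = len(table)      # number of subjects (blocks)
    k = len(table[0])   # number of conditions, e.g., easy/medium/difficult
    rank_sums = [0.0] * k
    for row in table:
        # Rank the conditions within this subject, 1 = smallest value.
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)
```

With eight subjects and three conditions the statistic reaches its maximum of n(k − 1) = 16 when every subject orders the conditions identically.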
The P300 magnitude differed highly significantly (P < 0.001) between the difficulty levels during preflight and in flight, but not at all during postflight experiments. It also differed highly significantly between all flight phases. Between in-flight and postflight experiments the significance level was slightly lower: P = 0.004 (still highly significant). Reducing the information to a set of parameters per P300 reduced the statistical power so drastically that almost no significant differences remained in any comparison except for the P300 slope.
The number of stimuli depended on the individual flight duration and differed between task difficulties and subjects. The average number of ERPs for relevant stimuli was 124 for the easy tasks, 71 for the medium tasks, and 150 for the difficult tasks.
The P300 of the DWs was visible in all three task conditions. However, the P300 magnitude did not differ significantly between difficulties as a single value per ERP. There were also no differences in latency between task difficulties or between flight phases. For the slopes, we found a significant (P = 0.046) difference among difficulties, indicating a steeper slope during the medium-difficulty tasks. No slope differences were found among flight phases.
DISCUSSION
This experiment was the latest attempt in a Russian-German joint project to use in-space electroencephalographic measures to diagnose the workability of cosmonauts during a mission. The data were obtained with three generations of a system named “Neurolab-B,” “Neurolab-2000,” and “Neurolab-2010.” The first joint experiment in space was realized in 1996.
The main result of the experiment presented here is not any statistically surprising new relation between, e.g., performance and physiological correlates, but the successful application of a P300 methodology in space during a highly complex operational task, first applied in 2008. This may become highly relevant in future attempts to evaluate cosmonauts’ readiness and proficiency while manually docking a spacecraft in space on a station, eventually very autonomously and far from Earth, supported by onboard expert systems.
EEG signals are highly sensitive to several environmental and behavioral factors. By averaging channels, we intended to accumulate the reliable variations across all data. We compared our “averaged” channels with the standard Cz channel. The results confirmed a wide range of variability, but also a good chance to substitute the single channel measure Cz by an averaged, more stable measurement. With the EEG methodology presented here, we found several important results which provide information on human workability, especially as an indicator of available cognitive capacity. This could be used, for example, as a quick first feedback to the operators about their current performance and mental state.
Overall, the acquisition and analysis of these data were successful. Despite the methodological restrictions in space, it was possible to apply the P300 methodology to assess free cognitive capacity during docking training. Our study confirms that the method of combining dual tasks and evoked brain potentials is suitable for assessing an operator’s cognitive spare capacity in operationally relevant tasks under laboratory terrestrial conditions20 as well as in space. On Earth several studies demonstrated that P300 was affected by the difficulty of the primary task.6,12 The same was observed in space.
Real docking in space is a prime example of a professional task where poor operator performance is likely to result in catastrophic consequences. Self-evaluation alone is not sufficient for judging one’s readiness for proper task performance.
Usually, P300 is visually inspected and manually analyzed. Such an approach is time consuming and cannot be used to provide immediate feedback regarding free cognitive capacity for cosmonauts in space. Instead, software tools based on intraindividual statistical analyses could provide immediate feedback on board in the context of performing the docking task.
One limitation of our study is the limited number of subjects. However, the effort invested and the reported results justify pursuing this research to further develop this prototype application for practical usage in space with an improved data analysis procedure. The present status of the automated data analysis already supported the scientific work, but still required too much interaction with the researcher.
The findings should be further confirmed in independent samples under real or simulated space conditions (such as isolation or bedrest studies). However, the experience gained from the initial approach seems promising. Some findings are of special interest. We did not find a linear relation between P300 occurrence and better performance, but an inverted V-shaped relation to difficulty with respective performance differences. Thus, smaller P300-amplitudes with either easy or very difficult conditions were found.
We also could not verify a significant tradeoff between the primary and secondary task. Performance in the secondary task was constantly very high and did not vary. Thus, a correlational analysis was not meaningful. We concluded that the reason is a ceiling effect of the performance data in this special cohort of well-trained professionals.
Overall, we suggest that P300 is a useful method for gauging an operator’s cognitive capacity during mission relevant tasks like the hand-controlled docking of a spacecraft on a space station. Decreased P300 magnitude could be an indicator for a lack of free cognitive capacity, which is needed for unexpected changes or events. Kramer et al.13 showed that during a simulated aircraft flight scenario the amplitude of P300 decreased with task difficulty, but only in well trained subjects. We propose that training should be continued until individuals demonstrate sufficient cognitive spare capacity by a clear P300 during the training. Thereby, reliability of this operation and mission safety would be enhanced. However, this study generally confirmed the high level of skills of cosmonauts in this very specific operational task—the manual docking maneuver.

Fig. 1. Docking accuracy as a function of difficulty and flight phase.

Fig. 2. Number of errors in the secondary task as a function of flight phase and task difficulty.

Fig. 3. Type of error in the secondary task in relation to task difficulty.

Fig. 4. EEG difference waves for the flight phases and levels of difficulty.