Article Category: Review Article
Online Publication Date: 01 Apr 2025

Methodologies Using Artificial Intelligence to Detect Cognitive Decrements in Aviation Environments

G. M. Rice, S. Linnville, and D. Snider
Page Range: 327 – 338
DOI: 10.3357/AMHP.6555.2025

INTRODUCTION: Despite significant advancements in aerospace engineering and safety protocols over the last decade, U.S. Naval mishap rates have remained essentially unchanged. This paper explores how researchers may leverage current artificial intelligence (AI) technologies to enhance aviation safety.

METHODS: A critical review was performed identifying aviation research protocols which have incorporated machine learning (ML) to enhance the accuracy of detecting common aviation hazards leading to cognitive decrements. The review proposes a three-step methodology for creating protocols to identify cognitive decrements in aviators: 1) sensor selection; 2) preprocessing techniques; and 3) ML algorithm development. Natural language processing was utilized to assist with the development of aviation-related denoising and ML algorithm tables.

RESULTS: Several psychophysiological biosensors, enhanced by ML modeling, show promise in identifying cognitive deficits secondary to fatigue, hypoxia, and spatial disorientation. The most cited biosensors integrated with ML models include electroencephalographic, electrocardiographic, and eye-tracking devices. The application of preprocessing techniques to biosensor data is a critical methodological step prior to applying ML algorithms for data training and classification. ML algorithms utilized were categorized into supervised, unsupervised, and semi-supervised types, often used in combination for more accurate predictions.

DISCUSSION: Current literature suggests that AI, when used in conjunction with various psychophysiological sensors, can predict and potentially mitigate common aeromedical hazards such as fatigue, spatial disorientation, and hypoxia in simulated settings. The miniaturization of preprocessing and ML algorithmic hardware is the next phase of transitioning AI to operational environments for real-time continuous monitoring.

Rice GM, Linnville S, Snider D. Methodologies using artificial intelligence to detect cognitive decrements in aviation environments. Aerosp Med Hum Perform. 2025; 96(4):327–338.

Over the last decade, rates of U.S. Naval aviation Class A mishaps for all platforms have remained essentially constant, averaging 1.41 per 100,000 flight hours (Fig. 1).1 Resultant loss of life and total cost of all naval aviation mishaps over this last decade has been 100 fatalities and over $8 billion.2 At the heart of these events lie human factors, which have consistently contributed to upwards of 80% of these mishaps.3 For the U.S. Navy, among the potentially detectable aeromedical preconditions for these human factors involving Class A, B, and C mishaps, the top three include fatigue, spatial disorientation (SD), and respiratory/physiological events3 (Fig. 2). The common denominator for these preconditions is a degradation of the aviator’s cognitive performance. How might aeromedical researchers leverage the use of artificial intelligence (AI) to combat the most common detectable human factors that contribute to aviation mishaps in real time?

Fig. 1. U.S. Navy Class A mishap rates FY2014Q1 – FY2024Q1.


Fig. 2. HFACS 8.0, leading aeromedical preconditions associated with Class A, B, and C Naval mishaps 2013-FY2024.


Generally, AI may be broken down into several broad categories (Fig. 3). For example, Mukhamediev recently categorized AI into seven classifications: machine learning (ML), natural language processing (NLP), planning, robotics, expert systems, speech recognition, and vision recognition.4 The underlying foundation for each of these categories is the development of ML algorithms and computational models that enable machines to simulate intelligent behavior. Although the development of these algorithms is central to simulating and identifying the aviation hazard or precondition one seeks to model, there are other key steps in protocol development that are essential prior to using AI to mitigate a potential mishap.

Fig. 3. Subcategories of artificial intelligence (AI).


The objective of this paper is to provide a critical review of the available research integrating psychophysiological sensors with ML algorithms to detect cognitive decrements in pilots. Psychophysiological sensors, as the name implies, are biosensors that in some way convey the cognitive state of the subject through physiological measurement; for brevity, these will be termed “biosensors.” Several systematic reviews of the literature on the topic, specifically with regard to the identification of aviation hazards, suggest a three-step methodology by which researchers create the foundations of their protocols to identify cognitive decrements in aviators.5–7 These steps include: hazard identification/sensor selection; preprocessing or denoising techniques; and development of ML algorithms. In the following paragraph, to illustrate this methodology for using AI to identify cognitive decrements in the cockpit, we provide a synopsis of a recently published aeromedical protocol whose objective was to identify a precondition (hypoxia) that could conceivably result in cognitive decline and potentially a mishap.8,9

Realizing the need for real-time sensors to detect cognitive performance decrements in the cockpit, Rice et al. evaluated the ability of dry electroencephalography (EEG) to detect hypoxia.8 As compared to wet EEG, dry EEG, as the name implies, does not require extensive preparation of the subject to connect and does not require transducer gel to improve signal transduction. Both advantages lend themselves to transitioning this technology to an operational environment. Their research suggested that a reduction in overall dry-EEG power could identify hypoxia even when aviators did not recognize their own meaningful decreases in oxygen saturation and cognitive performance.8 Snider et al. advanced this work further by reducing the variance of the data sets through the preprocessing technique of principal component analysis (PCA) and then applying various ML algorithms, such as decision tree (DT), neural network (NN), and naïve Bayes (NB).9 By doing so, these researchers increased the sensitivity and specificity of dry-EEG technology to detect hypoxia to greater than 97%.

Utilizing the framework described in the previous aeromedical protocol example, which is consistent with the methodology of most protocols found within available systematic reviews on aviation safety and ML,5–7 we find three common steps in the process of using AI to prevent aviation mishaps: 1) biosensor selection; 2) preprocessing of data/denoising; and 3) ML algorithm development. Fig. 4 illustrates a schematic of this process, in which AI is used to identify hazards that may affect pilots’ cognitive performance and subsequently result in a mishap.

Fig. 4. Methodology steps of integrating artificial intelligence (AI) with psychophysiologic sensors.


Acknowledging that most aeromedical researchers may not be routinely exposed to denoising techniques, such as PCA, or ML algorithms, such as DT or NB, we have developed quick-reference tables to orient the reader as to their purpose when these terms arise (Tables I and II). The overarching goal of this review is that aeromedical practitioners may use this paper as a blueprint for future research involving AI to identify and mitigate cognitive performance decrements in the cockpit.

Table I. Denoising Techniques for Time Series Data.
Table II. Description of the Three Major Types of Machine Learning.

METHODS

For each of the above methodological steps (biosensor selection, preprocessing techniques, and ML algorithm development), we performed a critical review of the literature, identifying aviation-applicable citations that would provide the reader a basic conceptual understanding of how current researchers are integrating ML models with psychophysiological sensors to identify cognitive deficits.

Specifically, within “Preprocessing and Denoising Techniques,” in the development of Table I we utilized the NLP program ChatGPT v. 3.5 (San Francisco, CA, United States), with the query “Denoising techniques for EEG” as a starting point, and cross-referenced this list with published aviation-applicable references and protocols (see Supplement A, which can be found in the online version of this article). ChatGPT is an AI tool that responds to user questions and can handle a variety of tasks, making it more flexible than traditional AI systems that are designed for specific functions such as face recognition or playing chess. In some ways it mimics general human reasoning, a capability referred to as artificial general intelligence. The information generated by ChatGPT is not a final product and often requires editing and cross-referencing. The supplement demonstrates the output ChatGPT returned to the user and the editing and verification required to present these data in a scientifically valid format.

RESULTS

Biosensor Selection

There have been numerous biosensors utilized to assess cognitive states of pilots over the last decade.5,36 For example, EEG, electrocardiography (ECG), galvanic skin response (GSR), near-infrared spectroscopy (NIRS), electrooculography (EOG), electromyography (EMG), and eye-tracking (ET) have all been used directly to monitor cognitive states or indirectly as surrogate markers of current or impending cognitive deficits.36 A preponderance of the recent research involving ML and aviation has relied upon noninvasive dry or wet EEG because of its temporal resolution, which allows cognitive states to be monitored directly in real time.5,37,38 Subsequently, this review will focus on EEG as the primary biosensor used in combination with ML; other sensors, such as ECG and ET, will also be discussed to a lesser extent.

As we will be focusing much of the discussion on the interpretation of EEG data, it is appropriate to provide a brief primer on how EEG data are typically characterized. Classically, neuroscientists describe the various frequencies of brainwaves from highest to lowest as gamma (γ), beta (β), alpha (α), theta (θ), and delta (δ).39,40 These frequencies have been correlated with various levels of cognitive functioning: γ (38–100 Hz) is associated with high levels of cognitive processing; β (16–38 Hz) with alertness and concentration; α (8–16 Hz) with relaxation and calmness; θ (3–8 Hz) with meditation and presleep states; and δ (1–3 Hz) with deep sleep and cognitive disorders.40,41 These ranges vary slightly depending upon which references you read; however, in general, they tend to be consistent at identifying predominant cognitive states. A systematic review of the current research involving EEG indices to assess cognitive human performance suggests that the power of these individual frequencies, and to a lesser degree the amplitudes of event-related potentials (ERPs: i.e., stimulus-induced, millisecond “snapshots” of EEG), are the primary features of EEGs extracted to identify performance decrements.37 These spatiotemporal changes in EEG frequency, power, and amplitude can be exploited and introduced within ML models to accurately predict in real time the mental states of those monitored. Moreover, there are several types of EEG sensors cited in the literature that have been regularly used for research purposes. A nonexhaustive list of these products was recently compiled using Google Scholar by Liu in his most recent review of cognitive neuroscience and robotics and updated for this paper (Table III).38

Table III. Summary of Popular EEG Headsets (Statistics 7/15/2024).

One of the first papers to enhance the interpretation of biosensor data with ML algorithms was by Harrivel et al.42 Noting that most commercial aviation accidents were due to a loss of flight crew airplane state awareness, her team evaluated attention-related human performance limiting states (AHPLS) in 24 commercial pilots with multimodal psychophysiological sensing. Extracting features from five different biological sensing modalities [EEG, heart rate variability (HRV), ECG, respiration, and GSR], they identified unique indices of attention for each modality and subsequently trained ML algorithms to accurately identify AHPLS. Specifically for EEG, they extracted power spectral density (PSD) estimates for the various brainwave frequencies (1–40 Hz) and their corresponding EEG channels. PSD, as the name implies, refers to the distribution of power across the individual frequency components of a signal, and it has been used extensively in neuroscience research to better classify epileptic seizures.43 The selected features of each sensor modality in Harrivel’s study were then trained on four ML models: 1) random forest (RF); 2) gradient boosting (GB); 3) Nu-Support Vector Machine (Nu-SVM); and 4) polynomial kernels. They noted that the combination of EEG, respiration, and GSR features provided the best accuracy at determining AHPLS in their study population.42 Although groundbreaking with regard to augmenting biosensor data with ML, the study’s description of the EEG features extracted was limited to mention of PSD and wavelet decomposition, without noting which EEG bandwidths were of particular importance or which preprocessing techniques were employed.
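
To make this type of feature extraction concrete, the sketch below computes Welch band power per EEG channel and feeds the resulting feature vectors to a random forest classifier. The sampling rate, channel count, band edges, and synthetic labels are illustrative assumptions; this is not the pipeline used by Harrivel et al.

```python
# Illustrative sketch only: sampling rate, channel count, and labels are
# assumed for demonstration and do not reproduce Harrivel et al.'s protocol.
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FS = 256                      # assumed sampling rate (Hz)
BANDS = {"delta": (1, 3), "theta": (3, 8), "alpha": (8, 16),
         "beta": (16, 38), "gamma": (38, 40)}   # band edges from the text, capped at 40 Hz

def band_powers(epoch):
    """epoch: (n_channels, n_samples) EEG segment -> flat vector of band powers."""
    freqs, psd = welch(epoch, fs=FS, nperseg=FS * 2, axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        idx = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, idx].mean(axis=-1))   # mean PSD per channel in band
    return np.concatenate(feats)

# Synthetic stand-in data: 200 two-second epochs, 8 channels, binary state labels.
rng = np.random.default_rng(0)
epochs = rng.standard_normal((200, 8, FS * 2))
labels = rng.integers(0, 2, size=200)

X = np.vstack([band_powers(e) for e in epochs])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())
```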

Evolving the methodology of identifying fatigue states with ML, Masse introduced the identification of inattentional deafness in the form of alarm omission and alarm detection.44 Their methodology employed the preprocessing techniques of bandpass filtering and independent component analysis (ICA) to reject eye and muscle artifacts. Additionally, PSD was obtained for each brain frequency bandwidth for corresponding time-frequency analysis of omission “hits” or omission errors. They identified that the PSD of γ and mid-band β frequencies tended to be higher for subjects who did not omit auditory alarm alerts. Quantifying these power spectrum differences between those who omitted alarm alerts and those who did not, they developed ML models that could identify individuals who would commit omission errors with 74.6% accuracy across their study population and a maximum accuracy of 90.4% for one individual.44
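
The sketch below illustrates the same class of preprocessing (bandpass filtering followed by ICA-based artifact rejection) on a generic multichannel array. The kurtosis heuristic for flagging artifact components is an assumption for illustration; published EEG pipelines typically inspect components manually or use dedicated tooling such as MNE-Python, and this is not Masse's implementation.

```python
# Minimal sketch of bandpass filtering plus ICA-based artifact removal.
# The kurtosis threshold for flagging artifact components is an assumed heuristic.
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA

FS = 256  # assumed sampling rate (Hz)

def bandpass(data, lo=1.0, hi=40.0, order=4):
    """data: (n_channels, n_samples); zero-phase Butterworth bandpass."""
    b, a = butter(order, [lo, hi], btype="bandpass", fs=FS)
    return filtfilt(b, a, data, axis=-1)

def remove_artifact_components(data, n_components=8, kurt_thresh=5.0):
    """Decompose into independent components and zero out high-kurtosis ones
    (ocular and muscle artifacts tend to produce heavy-tailed components)."""
    ica = FastICA(n_components=n_components, random_state=0)
    sources = ica.fit_transform(data.T)              # (n_samples, n_components)
    bad = kurtosis(sources, axis=0) > kurt_thresh
    sources[:, bad] = 0.0
    return ica.inverse_transform(sources).T          # back to (n_channels, n_samples)

raw = np.random.default_rng(1).standard_normal((8, FS * 10))  # 10 s of stand-in EEG
clean = remove_artifact_components(bandpass(raw))
```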

Lee et al., in addition to quantifying the PSD for each EEG bandwidth, utilized the amplitudes of the individual frequency bandwidths from aviators as input for their ML models.45 Combining the spatial-temporal features of EEG data within layers of convolutional neural network (CNN) ML models and stacking them in front of a long short-term memory (LSTM) ML model, they were able to accurately identify multiple abnormal mental states: high workload, low workload, low distraction, high distraction, high fatigue, and low fatigue. These abnormal mental states were objectively quantified by the complexity of flight operations for workload; by counting the number of words in the ATC message while maintaining the predefined aircraft conditions for distraction; and, for fatigue, by the Karolinska sleepiness scale (KSS), a widely used index of subjective drowsiness. Their hybrid ML model, “Mentalnet,” achieved 68.8% accuracy, demonstrating the utility of combining various layers of ML models.
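
A minimal sketch of the general CNN-plus-LSTM idea is shown below, with convolutional layers extracting local spatial-temporal features before an LSTM models longer-range dependencies. The layer sizes, kernel widths, and six-class output are assumptions for illustration and do not reproduce the published “Mentalnet” architecture.

```python
# Sketch of a CNN-LSTM hybrid for multi-class mental-state classification.
# Layer sizes and the six-class output are illustrative assumptions only.
import numpy as np
from tensorflow.keras import layers, models

N_CHANNELS, N_SAMPLES, N_STATES = 8, 512, 6   # e.g., workload/distraction/fatigue x high/low

model = models.Sequential([
    layers.Input(shape=(N_SAMPLES, N_CHANNELS)),            # time steps x EEG channels
    layers.Conv1D(32, kernel_size=16, activation="relu"),   # local spatial-temporal features
    layers.MaxPooling1D(pool_size=4),
    layers.Conv1D(64, kernel_size=8, activation="relu"),
    layers.MaxPooling1D(pool_size=4),
    layers.LSTM(64),                                         # longer-range temporal dependencies
    layers.Dense(N_STATES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Synthetic stand-in data, shape (epochs, time, channels).
X = np.random.randn(64, N_SAMPLES, N_CHANNELS).astype("float32")
y = np.random.randint(0, N_STATES, size=64)
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```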

Concerning actual in-flight acquisition of EEG data, Caldwell demonstrated the ability to obtain EEG data from sleep-deprived helicopter pilots sensitive enough to detect fatigue.46 Taheri-Gorji et al. recently contributed to the field by establishing that EEG feature extraction may be enhanced by ML algorithms during actual flight.47 Their team evaluated 16 pilots with a 20-channel dry-EEG device during training flights in either a Piper Archer or a Cessna 172S. They characterized pilot workload by the complexity of flight operations; for example, straight-and-level flight would be considered low workload, whereas a precision approach would be considered high workload. The EEG features extracted were either the aforementioned PSD or the log energy entropy of each bandwidth. Log energy entropy describes the amount of information carried by a signal, or how much randomness is in the signal.48 They trained their ML models on over 200 EEG features of the δ, θ, α, and β frequency bands from various EEG channels, ultimately achieving 93% accuracy at determining low, medium, and high workload states.
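
For readers unfamiliar with this feature, the snippet below computes log energy entropy using one common definition (the sum of the logarithm of squared sample values); this is a generic illustration and not necessarily the exact formulation used by Taheri-Gorji et al.

```python
# Log energy entropy of a band-filtered EEG segment, using one common
# definition (sum of the log of squared sample values); other variants exist.
import numpy as np

def log_energy_entropy(signal, eps=1e-12):
    return float(np.sum(np.log(signal ** 2 + eps)))   # eps avoids log(0)

segment = np.random.default_rng(2).standard_normal(512)
print(log_energy_entropy(segment))
```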

Thus far, most of the discussion has centered on the identification of cognitive workload and fatigue with ML models using predominantly EEG data; but what about the other two leading causes of abnormal mental states associated with aviation mishaps, SD and hypoxia? Recently, researchers have noted EEG features that identify vection illusions and unperceived somatosensory illusions.49,50 Specifically, Hoa noted statistically significant increases in α waves of the right frontal cortex for subjects who experienced unrecognized vection.49 Sciortino noted widespread power spectral decreases in α and β for subjects exposed to a perceptual illusion in which participants experienced a fake model hand as being part of their own body, i.e., the rubber hand illusion.50 In the foreseeable future, researchers could develop machine training models, analogous to those described previously for workload and fatigue states, to accurately predict SD in actual flight using the EEG indices noted by these studies.

Regarding hypoxia, our initial aeromedical protocol example developed by Rice et al., and later further evaluated by Snider et al., established the utility of EEG indices, specifically decreasing PSD of β and θ waves, and the ability of DT, NB, and NN ML models to predict hypoxia with greater than 97% accuracy.8,9 Liu demonstrated similar utility of support vector machine (SVM) ML models in differentiating the sustained attention performance of adults exposed to high altitude from healthy norms by utilizing EEG indices of ERPs, achieving an accuracy of 92.54%.51

In summary, regarding the use of EEG as a biosensor to detect changes in mental states, the literature suggests that high-frequency β PSD typically decreases in fatigued states, whereas α- and θ-wave PSD have been shown to increase. Although fewer studies exist regarding the utilization of ML to identify the aviation hazards of SD and hypoxia, both conditions have demonstrated unique changes in PSD and, more recently, in ERPs and the gravity frequency of PSD transition. Taken together, these EEG feature extractions, in combination with current ML models, hold promise of high accuracy in identifying cognitive decrements in aviation environments.

There is a substantial body of scientific evidence that features of ECG, specifically HRV, may be useful in identifying aspects of mental workload and cognitive states in a nonaviation environment.52,53 HRV is a physiological phenomenon characterized by fluctuations in the time intervals between consecutive heartbeats, and it reflects the influence on the sinus node of the two limbs of the autonomic nervous system (ANS)—sympathetic (SNS) and parasympathetic (PNS).54,55

HRV indices have been identified within flight simulators to index cognitive workload states.56–58 Specifically, the ratio of low-frequency (LF) HRV (LF: 0.04–0.15 Hz; an index of SNS activity) to high-frequency (HF) HRV (HF: 0.15–0.4 Hz; an index of PNS activity) has been observed to increase due to the predominance of the SNS during stressful events.57 Capitalizing on this observation, Qin applied both unsupervised ML, in the form of Toeplitz Inverse Covariance-Based Clustering (TICC), and supervised ML, in the form of SVM models, to these ECG features and achieved 91.8% accuracy at identifying mental fatigue produced by prolonged flight missions.59 Indeed, in some cases, HRV has performed as well as or better than EEG as a biosensor in predicting pilots’ cognitive workload during takeoff, cruise, and landing phases when combined with common ML algorithms such as SVM or k-nearest neighbors (k-NN).60
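
A minimal sketch of the LF/HF computation is shown below, assuming a sequence of RR intervals in seconds: the tachogram is resampled to an even time base, a Welch PSD is estimated, and power is integrated over the LF and HF bands defined above. The 4-Hz resampling rate and Welch parameters are common but assumed choices.

```python
# Sketch of computing the LF/HF ratio from RR intervals (seconds).
# The 4-Hz resampling rate and Welch parameters are assumed choices.
import numpy as np
from scipy.signal import welch
from scipy.interpolate import interp1d

def lf_hf_ratio(rr_s, resample_hz=4.0):
    t = np.cumsum(rr_s)                                   # beat times (s)
    t_even = np.arange(t[0], t[-1], 1.0 / resample_hz)
    rr_even = interp1d(t, rr_s, kind="cubic")(t_even)     # evenly sampled tachogram
    rr_even -= rr_even.mean()
    freqs, psd = welch(rr_even, fs=resample_hz, nperseg=min(256, len(rr_even)))
    df = freqs[1] - freqs[0]
    lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum() * df   # LF power (SNS index)
    hf = psd[(freqs >= 0.15) & (freqs < 0.40)].sum() * df   # HF power (PNS index)
    return lf / hf

rr = 0.8 + 0.05 * np.random.default_rng(3).standard_normal(300)  # ~300 synthetic beats
print(lf_hf_ratio(rr))
```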

Less is known with regard to HRV’s ability to specifically identify SD and hypoxia in aviation environments. Lower HRV has been demonstrated in numerous pilot communities who have undergone SD training.61 As suggested previously, lower HF HRV may be seen in a variety of stressful conditions; so, although this type of biosensor index is not unique to SD, it could be used in conjunction with other biosensors to infer abnormal spatial perception. As for hypoxia, HRV has been shown to decrease with mild normobaric hypoxia at 10,000 ft (3048 m), equivalent to slightly above commercial aviation cabin altitudes. This decrease in HRV was enhanced when combined with higher cognitive workload states, suggesting a synergistic response to two stressful conditions.62 Similarly, Castro-Herrera recently exposed 44 aviators to acute severe hypobaric hypoxia at a simulated altitude of 25,000 ft (7620 m), resulting in decreases in both HF and LF HRV upon arrival at terminal altitude.63 Other researchers have questioned the validity of LF HRV and the LF/HF ratio in determining cardiac sympathovagal balance and subsequent physiological states.64–66 The current literature suggests the ANS response to hypoxia is complex, and HRV may not be useful for specifically characterizing acute hypoxia, especially at higher heart rates, regardless of whether ML models are applied to the data.

Eye-tracking (ET) applications in aviation to identify various cognitive states have evolved greatly over the last decade, so much so that three systematic reviews have been performed on the topic.67–69 To summarize, these reviews evaluated the literature on the subjects involved (civilian pilots, military pilots, air traffic controllers), the type of visual equipment used, the eye metrics extracted, and the aviation hazard the studies were attempting to identify (fatigue, SD, hypoxia, cognitive workload). All reviews concluded that ET has the potential to be effective in preventing errors or injuries by detecting, for example, fatigue or performance decrements.

As of this writing, there are four main methods used to measure eye movements: electrooculography (EOG), scleral contact lens/search coil, photo-oculography (POG), and video-oculography (VOG).68,70 In ET aviation research, distinct eye metrics have been identified and related to different cognitive, emotional, and physiological states, which can be used to gain a wider understanding of the human mind.69 These eye metrics are fixation, saccadic movements, pupillary response, and eye blink rate.

Fixation refers to when the eye remains still, meaning the pupil is stationary for approximately 180–300 ms.71 Per Mengtao’s aforementioned review, a majority of aviation research in the last decade has involved some measurement of fixation.68 Ziv noted that experienced pilots tended to fixate more on multiple instruments as compared to novice aviators, who often focused on fewer.70 Moreover, pilots’ situational awareness (SA) performance and expertise level can be inferred from the distribution of fixations and fixation durations on relevant areas of interest.72 Fudali-Czyz observed that effective dwells (dwell times exceeding 600 ms) on the stimulation area can reflect whether pilots have incurred SD.73

Saccades are rapid eye movements that occur when a person shifts between fixations.71 Saccades last around 10–100 ms, during which time visual information transfer is suppressed; therefore, it is generally concluded that saccades are not directly related to cognitive processing. However, the literature suggests that saccade velocity may be related to lethargy, stress, and fatigue.74–76 Scannella noted better utility with the measurement of saccades in detecting cognitive workload as compared to cardiac metrics such as HRV during actual flight.77 Regarding aviation-specific preconditions that may result in mishaps, decreases in saccadic drift and velocity have been found to be associated with both hypoxia and fatigue.78,79
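
To illustrate how fixations and saccades are typically separated in raw gaze data, the sketch below applies a simple velocity-threshold (I-VT) rule and then summarizes fixation durations. The 30°/s threshold and 250-Hz sampling rate are assumed illustrative values, not parameters drawn from the cited studies.

```python
# Minimal velocity-threshold (I-VT) sketch separating fixations from saccades.
# The 30 deg/s threshold and 250-Hz sampling rate are assumed illustrative values.
import numpy as np

FS = 250.0          # gaze sampling rate (Hz), assumed
THRESH = 30.0       # saccade velocity threshold (deg/s), assumed

def classify_gaze(x_deg, y_deg):
    """x_deg, y_deg: gaze angles in degrees. Returns boolean mask of saccade samples."""
    vx = np.gradient(x_deg) * FS
    vy = np.gradient(y_deg) * FS
    speed = np.hypot(vx, vy)            # angular speed, deg/s
    return speed > THRESH

def fixation_durations(is_saccade):
    """Lengths (ms) of consecutive non-saccade runs, i.e., candidate fixations."""
    durations, run = [], 0
    for s in is_saccade:
        if not s:
            run += 1
        elif run:
            durations.append(run / FS * 1000.0)
            run = 0
    if run:
        durations.append(run / FS * 1000.0)
    return durations

gaze_x = np.cumsum(np.random.default_rng(4).normal(0, 0.05, 1000))
gaze_y = np.cumsum(np.random.default_rng(5).normal(0, 0.05, 1000))
print(fixation_durations(classify_gaze(gaze_x, gaze_y))[:5])
```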

Utilizing the biometric indices of fixation and saccadic movements, researchers have incorporated them within ML models to identify pilots’ attention distribution and accurately predict their SA.80 Specifically, Jiang monitored the flight deviations of cadets during assigned headings, altitudes, and airspeeds. They found that they could accurately determine the SA of these cadets by extracting their main visual areas of interest during the flight and applying CNN and LSTM ML models to these ET indices. Although not cited as frequently as fixation or saccades, blink rate and pupil diameter have also been incorporated in multimodal ML algorithms to detect fatigue81 and have the potential to be informative in identifying hypoxia82 and cognitive workload.83,84

In summary, several biosensors have shown individual promise in identifying cognitive deficits during both simulated and actual flight. Their ability to detect cognitive deficits has been enhanced by recent advancements in ML computer modeling. The most cited biosensors that have been integrated with ML models are EEG, ECG, and ET devices. Not explicitly covered in this section are nonpsychophysiological sensors, such as sensors that monitor deviations in flight control inputs. An important example of such research is Wang’s study evaluating joystick deviation as a predictor of space module docking crashes when combined with semi-supervised ML algorithms.85 Future integration of these engineering sensors with biosensors into multimodal human–machine interfaces is on the foreseeable horizon. On a final note, Mengtao concluded that the real-time application of this technology, as with all sensors of these types, is still rare due to the preprocessing times of raw data.68 This topic is explored in the next section of this review.

Preprocessing & Denoising Techniques

A variety of time series biodata (e.g., EEG, ECG, GSR, blood oxygen saturation, and eye movements) can be recorded and mathematically interpreted through ML as a biofeedback system for pilots. However, analyzing these time series data is challenging due to electronic or biological noise.

The following discussion proposes a framework that aims to: 1) clean and interpret physiological measures of the common aeromedical hazards (fatigue, hypoxia, and SD) that could impact cognitive performance; and 2) transition this technology to operational environments. The goal is to use any multidimensional biodata as AI input.

Preprocessing techniques range from the hardware used for data collection to the mathematical algorithms that separate noise from signal. ML techniques (supervised, unsupervised, and semi-supervised learning, discussed in the next section) may also be employed to denoise signals. Table I, provided within the introduction of this paper, highlights a few of these denoising methods. In general, these methods aim to achieve clarity of the primary signal, with different denoising techniques serving different purposes.

The importance of preprocessing biosensor data prior to incorporating it within ML models cannot be overemphasized. As an example, initial efforts by Snider to apply ML to Rice’s dry-EEG data on aviators exposed to hypoxia without denoising techniques resulted in a ML model accuracy of only 67%.86 However, after applying denoising techniques, they achieved an accuracy of over 97%.9
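
The sketch below mirrors this comparison on synthetic data: the same decision tree is evaluated with and without a PCA step in the pipeline. The data set, component count, and resulting accuracies are illustrative assumptions and do not reproduce the cited studies’ figures.

```python
# Sketch comparing a decision tree trained on raw features vs. PCA-reduced
# features; data are synthetic, so the accuracies are illustrative only.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=60, n_informative=8,
                           n_redundant=40, random_state=0)   # noisy, redundant features

raw = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
with_pca = make_pipeline(StandardScaler(), PCA(n_components=8),
                         DecisionTreeClassifier(random_state=0))

print("raw features:", cross_val_score(raw, X, y, cv=5).mean())
print("with PCA    :", cross_val_score(with_pca, X, y, cv=5).mean())
```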

Since EEG’s discovery in 1929 by Hans Berger, noise has complicated its interpretation, despite noise filtering.87 Medical specialists use EEG extensively but still face challenges with subtle abnormalities or complex conditions, often requiring collaboration to interpret them. Machines can now aid these experts in real-time EEG interpretation. The development of an artificial general intelligence model that adapts denoising techniques in real time for specific biosensors could enhance ML accuracy for instantaneous biofeedback. Preprocessed biosensor data can then be used by various ML algorithms, each with unique benefits, as detailed in the next section.

Machine Learning Algorithm Development

ML algorithms form the foundation of AI, enabling decision-making.88,89 Successful ML development requires data preprocessing, as described in the previous section. Following this, exploratory data analysis and ML algorithm selection are critical steps. ML algorithms are broadly classified into supervised, unsupervised, and semi-supervised categories, which are discussed below and summarized in Table II. The challenge lies in selecting the appropriate algorithm and parameters, depending on data characteristics and analysis objectives. This section will explore these ML categories and their applications in monitoring pilots’ cognitive and physiological states. Table IV summarizes the various aviation studies cited in this paper that have incorporated ML models to enhance the interpretation of biosensor data by category and the aviation hazard they were attempting to identify.

Table IV. Selected Aviation Related Studies Utilizing Machine Learning in this Review.

Supervised learning uses labeled data during training to map input data to output labels accurately.26,27 The training dataset, comprising precategorized observations, helps the model learn input–output relationships. By analyzing labeled examples, the model identifies patterns, enabling it to generalize to new data. Examples of supervised learning include: Boolean classification, which predicts binary outcomes such as whether an email is spam; nominal classification, which assigns inputs to predefined categories, such as classifying images as “dog,” “cat,” or “bird;” and regression, which predicts continuous values, such as house prices.

Common algorithms in supervised learning include: linear regression, which predicts continuous values by assuming a linear relationship; logistic regression, which is used for binary classification, thereby modeling the probability of category membership; DTs, which are applicable for classification and regression by splitting data into subsets; and NNs, which model complex relationships using interconnected nodes. An example of an aviation protocol that has utilized supervised ML models is the study by Snider et al., who extracted EEG indices such as PSD values and applied them to DT and NB algorithms to accurately identify hypoxia.9 Likewise, Masse applied the supervised algorithms of RF and SVM to identify EEG indices predictive of cognitive fatigue.44
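
As a minimal illustration of supervised learning with two of the algorithms named above, the sketch below trains a decision tree and a Gaussian naïve Bayes classifier on a generic labeled feature matrix and reports held-out accuracy; the data are synthetic.

```python
# Sketch of supervised classification with a decision tree and naive Bayes
# on a generic labeled feature matrix (synthetic data, illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

for clf in (DecisionTreeClassifier(random_state=1), GaussianNB()):
    clf.fit(X_tr, y_tr)                                  # learn the input-output mapping
    print(type(clf).__name__, accuracy_score(y_te, clf.predict(X_te)))
```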

Unsupervised learning works with unlabeled data, finding patterns and structures without specific guidance.29 The algorithm independently explores the data to uncover its inherent structure, aiming to discover hidden patterns, natural clusters, or underlying organization. This method adapts based on the data’s properties, providing insights that might be missed with predefined labels. Key tasks for unsupervised learning include: clustering, which involves grouping similar items based on features, with algorithms like k-means and hierarchical clustering partitioning data into clusters; anomaly detection, which identifies outliers or anomalies by understanding normal data patterns; and data visualization, which simplifies data with techniques like PCA to make it easier to visualize and interpret.

Common unsupervised learning algorithms include clustering algorithms, such as k-means and hierarchical clustering, and dimensionality reduction algorithms, such as PCA and association rule learning. Examples of these types of ML being utilized in recent aviation protocols include recognizing pilots’ fatigue status using a deep contractive autoencoder network by Wu et al.,90 as well as Li’s protocol predicting unsafe pilot operations utilizing k-means clustering unsupervised ML.31
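
A minimal clustering sketch is shown below: k-means groups unlabeled feature vectors into clusters, and a silhouette score summarizes how well separated those clusters are. The cluster count and synthetic data are assumed for illustration.

```python
# Sketch of unsupervised clustering: k-means groups unlabeled feature vectors
# into k clusters; the cluster count here is an assumed illustrative value.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, n_features=10, random_state=2)
km = KMeans(n_clusters=3, n_init=10, random_state=2).fit(X)
print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])
print("silhouette   :", silhouette_score(X, km.labels_))
```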

Semi-supervised learning combines labeled and unlabeled data during training, using a small amount of labeled data with a large amount of unlabeled data.32 This approach is beneficial when labeled data is scarce or costly and unlabeled data is abundant, such as Xu’s evaluation of ML algorithms’ ability to interpret wearable-ECG data.33 The algorithm first learns from labeled data to understand input–output relationships, then utilizes unlabeled data to identify patterns and structures, improving performance by exploiting the information in unlabeled data.

Common semi-supervised learning approaches include: self-training, where the algorithm trains on labeled data, then uses its predictions to label unlabeled data and iteratively refines its predictions; cotraining, where multiple classifiers train on different feature subsets, label the unlabeled data, and train each other; label propagation, where labels propagate from labeled to unlabeled data based on similarity; and active learning, where a human expert assigns class labels to “kickstart,” augment, or reinforce the learning process. Semi-supervised ML models have recently become a focus of interest in aviation to detect anomalies and predict incident risk.34,35
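
The sketch below illustrates self-training with scikit-learn, where unlabeled samples are marked with -1 and iteratively pseudo-labeled by a base classifier; the 90% unlabeled fraction and choice of base estimator are assumptions for illustration.

```python
# Sketch of semi-supervised self-training: unlabeled samples are marked -1 and
# iteratively pseudo-labeled by the base classifier (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=600, n_features=20, random_state=3)
rng = np.random.default_rng(3)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.9] = -1        # keep labels for only ~10% of samples

model = SelfTrainingClassifier(SVC(probability=True, random_state=3))
model.fit(X, y_partial)
print("accuracy against true labels:", accuracy_score(y, model.predict(X)))
```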

Translating the methodological steps of sensor selection, preprocessing, and ML algorithm development into operational aviation environments, we can envision how this technology may optimize a pilot’s performance in next-generation aircraft. The studies we have summarized suggest that EEG, ECG, and ET indices of a pilot may be used to identify cognitive decrements. When combined simultaneously with preprocessing techniques and ML algorithms, they hold the potential to mitigate cognitive decrements and enhance human performance in real time.

DISCUSSION

In this paper, we presented a three-step methodology by which AI may be applied to data obtained from biosensors to identify cognitive decrements in aviators: sensor selection, preprocessing, and ML algorithm development. Data integration was intentionally not presented as a separate step; however, to accurately link the cognitive decrement under investigation with the biosensor output, effective data integration is essential. For example, in Rice’s study, the frequency sampling rate of the cognitive performance task being monitored required the data to be time-matched with the same frequency sampling rate for both biosensors, EEG and oxygen saturation, in order to correlate precisely with the independent variable under investigation.8 So, if one biosensor has a sampling rate of 200 Hz and another has a sampling rate of 250 Hz, the data must be integrated at the lower sampling rate so the two streams can be appropriately correlated with one another. Various computing platforms such as LabVIEW® (Austin, TX, United States), MATLAB® (Natick, MA, United States), and Python® (Fredericksburg, VA, United States) have been used to accomplish this integration within simulated and actual flight.44,45,47
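
As a minimal sketch of this integration step, the example below downsamples an assumed 250-Hz stream to the 200-Hz stream’s rate so the two series share a common time base before being correlated; the signals and rates are illustrative only.

```python
# Sketch of aligning two biosensor streams recorded at different rates by
# resampling the 250-Hz stream down to the 200-Hz stream's rate (factor 4/5).
import numpy as np
from scipy.signal import resample_poly

fs_low, fs_high = 200, 250
duration_s = 10
eeg_200 = np.random.default_rng(6).standard_normal(fs_low * duration_s)
spo2_250 = np.random.default_rng(7).standard_normal(fs_high * duration_s)

spo2_200 = resample_poly(spo2_250, up=fs_low, down=fs_high)   # 250 Hz -> 200 Hz
assert len(spo2_200) == len(eeg_200)            # streams now share a common time base
corr = np.corrcoef(eeg_200, spo2_200)[0, 1]     # sample-wise correlation is now valid
print(corr)
```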

There is less published research on more difficult preconditions, such as motivation, overconfidence, and personality style, that could contribute to mishaps but do not yet lend themselves readily to “real-time” AI identification. As such, these preconditions were not the focus of our methodologies. However, these harder-to-detect human behavioral preconditions have been the focus of ML investigations in both the systematic analysis of aviation mishap reports and human factors classification in recent publications.91–93

This review purposefully incorporated NLP into sections of this paper to demonstrate the utility of these ML models and to conceptualize the capabilities of this technology for aerospace environments. Most meta-analyses and systematic reviews incorporate some form of NLP within their methodology to perform searches for reference inclusion/exclusion. Some researchers have noted the potential for bias when utilizing NLP models such as ChatGPT.94 This bias has been shown to be predominantly a product of opinion-generated references. We attempted to exclude this condition when developing our tables by ensuring that references cited were peer-reviewed, aviation-related, and incorporated established ML algorithms.

From an educational perspective, we have introduced several AI methodologies whose importance in an operational environment can be difficult to conceptualize. Specifically, the methodological steps of preprocessing and ML algorithm development are not routinely encountered by aeromedical professionals, so presenting their importance within referenced tables explicitly demonstrates the operational impact they may have. Moreover, utilizing NLP as a bridge to carry these concepts from the laboratory to the operational environment is, and will increasingly become, an ever-present methodological tool researchers use to make scientific gains. Exposing readers to the appropriate referencing of such tools is of value to future papers; it is not so much a novelty as our current state of science.

In Alreshidi’s systematic analysis of ML and aviation safety, only 10% of the 80 papers included in their review had obtained data during actual flight.5 None of these studies provided biosensor data feedback in real time to their pilots during flight. Future direction for this research will need to focus on the integration of both biosensors and ML computation into aircraft display systems and/or the helmet visor to provide meaningful real-time data to detect or prevent undesirable cognitive states. Miniaturization of the data preprocessing hardware and maturation of ML algorithmic selection will be the next phase of the transition to operational environments for real-time continuous monitoring.

This paper is not a traditional, systematic review of the literature regarding ML and aviation safety. It is more appropriately characterized as a critical review, with the primary objective of serving as a guidepost for future aeromedical investigators interested in utilizing AI to enhance their research protocols. Reviews of this type emphasize the conceptual importance of the available literature, as compared to systematic and meta-analytic reviews, which have a more structured methodology. As such, the paper, as Grant eloquently stated in his analysis of 14 scientific review types, “should serve as a starting point and not an endpoint.”95

Copyright © by The Authors.
Contributor Notes

Address correspondence to: Dr. G. Merrill Rice, 375 A Street, Norfolk, VA 23511, United States; gmerrillrice@gmail.com.
Received: 01 Jul 2024
Accepted: 01 Nov 2024