Prediction of Alzheimer’s Disease Using Machine Learning Classifiers

Introduction
Alzheimer's disease (AD) is a type of brain dysfunction in which the patient's mental abilities gradually decline. Its most obvious manifestations are dementia and memory impairment (1). In 2010, about 35 million people were affected by AD, and it is expected that by 2050, 1 in every 85 people will develop AD (2).
In the United States in 2010, nearly 5.5 million people were diagnosed with AD, with an estimated cost of $230 million per year for treatment, medicine, and healthcare (3). Memory impairment usually progresses gradually. At first, it is limited to short-term memory and learning, but long-term memory is gradually damaged as well. Researchers believe that AD starts many years before the onset of clinical signs.
Because of this long period before clinical symptoms appear, scientists are working to find ways to detect AD early in people at risk. It is therefore crucial to detect structural changes in the brains of people at high risk of the disease in a timely and accurate manner, in order to stop or at least slow its development (4). Magnetic resonance imaging (MRI) can detect changes in the size of brain regions. Measurement of the regions of interest (ROIs) where atrophy occurs during the development of AD can be considered an indicator for diagnosis (5,6). A large number of studies have addressed the diagnosis of AD using brain MRI, but these studies have monitored only a limited number of brain regions, and the other regions have not yet been investigated. Most studies have used ROIs in multiple slices of MRI based on volume image analysis to diagnose the disease, which limits the applicability of their results in MRI centers. This study aimed to determine whether it is possible to distinguish between AD and healthy control (HC) subjects with acceptable precision using only one slice, based on pixel color. We identified ROIs on one slice of MRI and applied support vector machine (SVM) and Bayesian SVM classifiers based on pixel weight.

Subjects
The data were obtained from an MRI database (MIRIAD) of 69 subjects, including 46 AD subjects (19 males and 27 females) and 23 HC subjects (12 males and 11 females). These data were collected at a central hospital in London and are available to the public. Subjects were categorized using two scales, NINCDS-ADRDA (7) and MMSE (8), as the gold standard to separate HC (MMSE = 29.4 ± 0.8 and negative NINCDS-ADRDA) from AD (MMSE = 19.2 ± 0.4 and positive NINCDS-ADRDA). We separated the subjects into HC and AD groups using brain MRI and the examination of ROIs, and compared the results to the gold standard. For sample size (n) adequacy, between 5 and 15 samples per variable (k) should be available; for the 10 variables of the present study, between 50 and 150 samples are therefore required (5k < n < 15k) (9).
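The sample-size rule above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python (the study itself used R, MATLAB, and SPSS); the function names are hypothetical and introduced only for illustration.

```python
def sample_size_range(k, low_factor=5, high_factor=15):
    """Return the (minimum, maximum) sample size for k variables.

    Implements the 5-to-15-samples-per-variable rule quoted in the text.
    """
    return low_factor * k, high_factor * k


def is_adequate(n, k):
    """True when n lies inside the recommended range for k variables."""
    low, high = sample_size_range(k)
    return low <= n <= high


if __name__ == "__main__":
    # 10 ROI variables and 69 subjects, as reported in the study.
    print(sample_size_range(10))
    print(is_adequate(69, 10))
```

With 10 variables the range is 50 to 150, so the 69 available subjects fall inside it.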

MRI Acquisition
All scans were performed on the same Signa 1.5T MRI unit (GE Medical Systems, Milwaukee). T1-weighted volumetric images were obtained using a spoiled fast GRASS sequence with a 24-cm field of view and a 256 × 256 matrix to provide 124 contiguous 1.5-mm-thick slices in the coronal plane. The scan acquisition parameters were as follows: repetition time = 15 ms, echo time = 5.4 ms, flip angle = 15°, and inversion time = 650 ms.

MRI Processing
In this study, each MRI consisted of 124 slices covering different levels of the brain. For each person, we isolated the identified ROIs (left parahippocampal gyrus, right parahippocampal gyrus, right hippocampus, left hippocampus, left insula, right insula, right middle temporal gyrus, left middle temporal gyrus, right superior temporal gyrus, and left superior temporal gyrus) on one slice of the MRI (Figure 1) using SPM12 software (10). Each ROI forms an image made up of pixels. Because the MRI contains only two colors, white and gray, each image was treated as a two-color image. Next, we set the color of each pixel. Each pixel is stored in one byte of eight bits, so each pixel can take 2^8 (256) states, or color levels (11). This spectrum ranged from absolute white to absolute gray. This split was done in MATLAB software.
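The text does not give the exact formula for the pixel weight of an ROI, so the following Python sketch is only one plausible reading, not the authors' MATLAB procedure. It assumes that 255 encodes absolute white and 0 encodes absolute gray, and it defines the gray pixel weight as the mean "grayness" of the ROI expressed as a percentage. Both the encoding and the function name `gray_pixel_weight` are assumptions made for illustration.

```python
LEVELS = 2 ** 8          # one byte per pixel gives 256 possible states
WHITE = LEVELS - 1       # assumed encoding: 255 = absolute white, 0 = absolute gray


def gray_pixel_weight(pixels):
    """Mean grayness of an ROI as a percentage (hypothetical definition).

    `pixels` is a flat list of 8-bit intensities taken from one ROI.
    A fully white ROI scores 0.0 and a fully gray ROI scores 100.0.
    """
    if not pixels:
        raise ValueError("ROI contains no pixels")
    if any(not 0 <= p <= WHITE for p in pixels):
        raise ValueError("pixel values must lie in the range 0..255")
    grayness = [(WHITE - p) / WHITE for p in pixels]
    return 100.0 * sum(grayness) / len(grayness)


if __name__ == "__main__":
    roi = [255, 255, 0, 0]   # toy ROI: half white, half gray
    print(gray_pixel_weight(roi))
```

Under this definition, each of the ten ROIs yields one number per subject, which gives the ten variables used by the classifiers.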

Classification
The pixel weight of each ROI was used, and subjects were classified using two classifiers, SVM and Bayesian SVM. R software version 3.2.3 was used for this part. Cross-validation was used for running the classifiers, with about 70% of the data used for training and 30% for testing (12). Our data were initially not linearly separable, so they had to be mapped to a feature space using kernel functions for separation (13). SVM classifiers can separate data directly when the data are linearly separable; when they are not, kernel functions should be used to make them linearly separable (14,15). We used three kernel functions: linear, polynomial, and Gaussian radial basis function (RBF). The kernel with the highest accuracy among the goodness-of-fit indicators was then chosen. The only difference between Bayesian SVM and SVM is that Bayesian SVM uses prior information, following the Bayesian method, to estimate the parameters in the training part for use in the test part. Finally, the ROC curve was computed in SPSS software version 16.0, and the cut-point for each ROI significant at the 0.05 level was determined. To evaluate the output of the models, we used the following criteria: accuracy (16), sensitivity (17), and specificity (18).
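For readers who want to see the three kernels and the three evaluation criteria written out, the following Python sketch gives their standard textbook forms. It is not the R code used in the study; the hyperparameter defaults (`degree`, `gamma`, `coef0`) are assumptions, since the text does not report them, and the helper name `evaluate` is hypothetical.

```python
import math


def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, y> + coef0) ** degree; defaults are assumed values."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.1):
    """Gaussian RBF: exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def evaluate(y_true, y_pred, positive="AD"):
    """Accuracy, sensitivity, and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of AD subjects detected
        "specificity": tn / (tn + fp),   # share of HC subjects cleared
    }
```

Sensitivity here treats AD as the positive class, so it measures how many AD subjects are correctly identified, while specificity measures how many HC subjects are correctly identified.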

Results
The data were extracted from a database containing brain MRI acquisitions from 69 people including 46 people with AD (19 men and 27 women) and 23 healthy elderly people (12 men and 11 women). In total, 55% of people were women and 45% were men.
In this study, individuals were divided into two groups, HC and AD, with acceptable results; however, the SVM and Bayesian SVM models with different kernels did not perform equally in distinguishing Alzheimer's from healthy individuals. Each SVM kernel was run 800 times, and the mean results were reported. We used linear, polynomial, and Gaussian RBF kernels to run the program based on the SVM model. The accuracy of the SVM model in distinguishing between Alzheimer's and healthy individuals was 83.20% with the linear kernel, 85.83% with the polynomial kernel, and 88.34% with the Gaussian RBF kernel. The Bayesian SVM was likewise run 800 times with each of the three kernels, and the mean results were reported; it achieved accuracies of 83.50%, 85.92%, and 86.72% with the linear, polynomial, and Gaussian RBF kernels, respectively (Table 1). The highest accuracy was obtained in the SVM model with Gaussian RBF kernel (88.34%) (Figure 2).
In the next step, the ROC curve was used to identify the importance of all ROIs. As can be seen in Figure 3, six ROIs were statistically significant or close to significant at the 0.05 level (Table 2). Four ROIs are the most important areas of the brain in AD: the left parahippocampal gyrus, right parahippocampal gyrus, right hippocampus, and left hippocampus. Then, for the six ROIs, a proper cut-point (the point with the highest sum of sensitivity and specificity) was determined (Table 2).
Then, the ROIs that were significant at the 0.05 level were examined by gender. For women, the left parahippocampal gyrus, right parahippocampal gyrus, right hippocampus, and left hippocampus were significant at the 0.05 level; among these, the most important area was the right hippocampus. For men, the right parahippocampal gyrus, right hippocampus, right superior temporal gyrus, and left superior temporal gyrus were significant at the 0.05 level; among these, the most important area was the right parahippocampal gyrus. The importance of the areas significant at the 0.05 level was compared across the whole population and within men and women.
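The procedure of running each model 800 times and reporting the mean can be sketched as a repeated random hold-out. The Python code below assumes that every run draws a fresh random 70/30 split, which the text does not state explicitly. The `majority_fit_predict` function is a deliberately trivial placeholder, not an SVM; any function with the same signature, including a real SVM wrapper, could be passed in its place. All names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def holdout_split(n_subjects, train_fraction, rng):
    """Shuffle subject indices and cut them into train and test parts."""
    indices = list(range(n_subjects))
    rng.shuffle(indices)
    cut = round(train_fraction * n_subjects)
    return indices[:cut], indices[cut:]


def majority_fit_predict(train_x, train_y, test_x):
    """Placeholder classifier: always predicts the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_x)


def mean_holdout_accuracy(features, labels, fit_predict,
                          n_runs=800, train_fraction=0.7, seed=0):
    """Average test accuracy over n_runs random train/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_runs):
        train_idx, test_idx = holdout_split(len(labels), train_fraction, rng)
        predictions = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        hits = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(hits / len(test_idx))
    return mean(accuracies)
```

With 69 subjects and a 0.7 training fraction, each run trains on 48 subjects and tests on 21. Averaging over many runs reduces the dependence of the reported accuracy on any single split, which matters with a sample this small.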
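The cut-point rule described here, choosing the point with the highest sum of sensitivity and specificity, can be sketched as a scan over candidate thresholds. The Python code below is an illustration and not the SPSS procedure used in the study. It assumes that a subject is flagged as AD when the ROI's pixel weight is at or above the threshold, in line with the "or higher" wording used for the cut-point in the Discussion. The function name and the toy values in the usage example are hypothetical, not study data.

```python
def best_cut_point(values, labels, positive="AD"):
    """Return (threshold, sensitivity, specificity) maximizing their sum.

    A subject is predicted positive when value >= threshold.
    Ties are resolved in favor of the lowest threshold.
    """
    n_pos = sum(label == positive for label in labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best = None
    for threshold in sorted(set(values)):
        tp = sum(v >= threshold and l == positive
                 for v, l in zip(values, labels))
        tn = sum(v < threshold and l != positive
                 for v, l in zip(values, labels))
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        score = sensitivity + specificity
        if best is None or score > best[0]:
            best = (score, threshold, sensitivity, specificity)
    return best[1:]


if __name__ == "__main__":
    # Toy pixel weights (percent) for one ROI; illustrative values only.
    weights = [40.0, 45.0, 50.0, 58.0, 60.0, 65.0]
    groups = ["HC", "HC", "HC", "AD", "AD", "AD"]
    print(best_cut_point(weights, groups))
```

Maximizing the sum of sensitivity and specificity is equivalent to maximizing the Youden index, since the index is that sum minus one.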

Discussion
We distinguished Alzheimer's and healthy individuals using the SVM model with Gaussian RBF kernel, with an accuracy of 88.34%. In a study by Plant et al that predicted AD based on specific areas of the brain from MRI, three models were used and the SVM model was reported to be highly reliable (19), which agrees with the result obtained with our model. In a study entitled "improvement of cognitive deficit prediction to AD using genetic algorithms", Ferreira et al were able to classify AD and HC with 88% accuracy (20), which is also consistent with the results of our study. In a study by Magnin et al, AD and healthy subjects were distinguished with an accuracy of 94.5% (21). In a review study using ADNI data, Westman et al reported a predictive power of 80 to 90% (22). The results of their study were consistent with the results of our study.
The clinical objective of this study was to identify the most important ROIs for the diagnosis of Alzheimer's by a clinical specialist. Experts should focus on atrophy in four ROIs: the left parahippocampal gyrus, right parahippocampal gyrus, right hippocampus, and left hippocampus. In a study on the pathology of cells in AD, Hyman et al concluded that the most important areas in the brain for AD are the hippocampus and parahippocampal gyrus (23), which is in line with the results of this study.
Moreover, based on the cut-point, if the weight of gray pixels in these ROIs reaches this level or higher, it can be an alarm for AD. For example, if the gray pixel weight in the right parahippocampal gyrus is 57.03% or higher, it is a warning sign of AD and the subject should be examined more carefully.
Different studies have classified different regions of the brain for use in the diagnosis of the disease. Most have used several levels of the brain (multiple slices), which is not practical for technicians in MRI centers. In this study, we used only one brain level (one slice) that included the important ROIs in AD, together with pixel color. This makes it possible to deliver the output of this study as a software package for MRI centers that screen subjects at risk of AD before a neurologist detects the disease; the same package could also be used in a specialist's office.

Conclusion
The purpose of this study was to separate Alzheimer's and HC subjects using SVM and Bayesian SVM classifiers applied to brain MRI. The results of the study can be readily implemented. The accuracy obtained ranged from 83.20% to 88.34%. Among the classifiers, the highest accuracy in distinguishing between Alzheimer's and HC was obtained with SVM with Gaussian RBF kernel (88.34%). Additionally, the specialist should focus on the examination of atrophy in four areas: the left parahippocampal gyrus, right parahippocampal gyrus, right hippocampus, and left hippocampus. For a more careful examination, more attention should be paid to atrophy in the right hippocampus in women and in the right parahippocampal gyrus in men.