Original Articles

Fast Discrimination of Apple Varieties Using Vis/NIR Spectroscopy

Pages 9-18 | Received 05 Oct 2005, Accepted 15 Jan 2006, Published online: 31 Jan 2007

Abstract

We evaluated the potential of visible/near-infrared (Vis/NIR) spectroscopy to differentiate apple varieties nondestructively. The varieties used in this research were Fuji, Red Delicious, and Copefrut Royal Gala. The chemometric procedures applied to the Vis/NIR data were principal component analysis (PCA), wavelet transform (WT), and artificial neural network (ANN). The apple varieties could be qualitatively discriminated in the PC1-PC2 space resulting from PCA. The wavelet transform was used as a tool for dimension reduction and noise removal, reducing the spectra to wavelet components. These wavelet components were used as inputs to a three-layer back-propagation ANN model. The WT-ANN model gave the highest level of correct classification (100%) of the apple varieties.

INTRODUCTION

Apple is an excellent fruit for our diet. Its quality is defined by physical characteristics (such as texture, size, color, and odor) and chemical parameters (such as sugar content, starch content, lipids, or vitamins), and these factors are affected by the variety of the fruit. Near-infrared (NIR) spectroscopy is a well-established technique for constituent analysis of agricultural and food products, as it has many advantages over classical chemical and physical analytical methods: it has a short measuring time; it requires limited sample preparation; it is chemical-free; and it can easily be used in continuous quality evaluation. Optical methods based on Vis/NIR spectrometry have been evaluated for nondestructive estimation of the internal attributes, soluble solids content, dry matter content, acidity, and other physical properties of apples.[Citation1–5] Differences between sound and damaged tissues in visible and NIR diffuse reflectance are useful for detecting bruises, chilling injury, scald, decay lesions, and numerous other defects.[Citation6] Bruises on apples can be detected at specific NIR wavelengths. However, the wavelengths chosen for apples differ between fresh and aged bruises because the injured tissues dry out.[Citation7]

Several nondestructive instrumental methods have been attempted for distinguishing varieties of melon genotypes,[Citation8] apples,[Citation9,Citation10] apple juice,[Citation11] soybeans,[Citation12] and wheat,[Citation13] using different techniques: near-infrared (NIR) spectroscopy,[Citation8] electronic nose,[Citation10] nuclear magnetic resonance (NMR),[Citation11] and digital image analysis.[Citation13] Alonso et al.[Citation9] put forward a way to recognize apple varieties based on the ion concentrations (such as sugar, starch, nitrate, and potassium) obtained from ISFET sensors. This method is indirect, and the classification accuracy is influenced by the measurement precision of the ion concentrations. Marrazzo et al.[Citation10] investigated the feasibility of differentiating apple cultivars using an electronic nose chemical sensor. They used several classical discrimination tools, especially PCA, soft independent modeling of class analogy (SIMCA), and hierarchical cluster analysis (HCA). Different apple cultivars could be differentiated qualitatively, but a quantitative discrimination model for unknown samples was not built.[Citation10] Reid et al.[Citation14] evaluated the potential of mid-infrared (MIR) and near-infrared (NIR) spectroscopy to differentiate apple juice samples using partial least squares (PLS) regression analysis.[Citation14,Citation15]

NIR spectral data have been effectively combined with multivariate techniques, such as factor analysis and discriminant analysis, for classification, discrimination, and authentication purposes.[Citation7] An NIR spectrum of a sample is typically measured by modern scanning instruments at hundreds of equally spaced wavelengths. The information in the spectral curve is used to predict the chemical composition of the sample by extracting the relevant information from many overlapping peaks. Extracting the useful information from the large volume of original spectral data is therefore a pivotal step. Osborne et al.[Citation16] described standard approaches, such as linear discriminant analysis (LDA). These methods fail when there are many variables, so different approaches are needed in the data analysis.[Citation17] A common solution is to reduce the dimension of the predictor matrix by using principal components and then apply LDA. Wu et al.[Citation18] compared several methods for classification based on mass spectra, including linear and quadratic discriminant analysis and classification tree methods. In their conclusions, the authors emphasized the need for methods to remove noise from the data and select relevant features. The wavelet transform has been employed to reduce dimension and remove noise.[Citation19]

The principal objective of this article was to assess the potential of Vis/NIR spectroscopy to distinguish apple cultivars. A second objective was to investigate the capability of different chemometric techniques to differentiate varieties of apple samples based on the analytical results.

MATERIALS AND METHODS

Plant Materials

A total of 90 apples were purchased from a local market, 30 of each of the following three varieties: Fuji (from Shanxi, China), Red Delicious (from USA), and Copefrut Royal Gala (from USA). The skins of these samples were smooth and free of defects. All the samples were allowed to equilibrate to room temperature (25°C) before Vis/NIRS analysis.

System Set-up and Reflectance Measurements

Considering its 25° field-of-view (FOV), the spectrophotometer was placed approximately 100 mm above the sample. The light source, a Lowell pro-lam interior light source assembly with a Lowell pro-lam 150 W tungsten halogen bulb, which could be used in both the visible and near-infrared regions, was placed 120 mm from the fruit surface. The angle between the incident light (light source) and the detector was 45°. A 100 mm2 thick Teflon disk was used as the optical reference standard for the system. The reflectance (R) was calculated by comparing the near-infrared energy reflected from the sample (fruit) with that from the standard reference. The signals were pre-processed with the software ViewSpec Pro version 2.14 (Analytical Spectral Device, Inc., 5335 Sterling Drive, Suite A, Boulder, CO 80301, USA).

From each fruit, reflection spectra were taken at three equidistant positions around the equator (approximately 120° apart)[Citation20] with a spectrophotometer (325–1075 nm), using the RS2 software for Windows. Ten scans were performed at each position, for a total of 30 scans per sample. All recorded spectra were checked visually and averaged using ViewSpec, so that each fruit was represented by the mean spectrum of its 30 scans. Figure 1 shows the average reflectance (R) spectrum for one sample. In our system, strong scattering noise that affected the accuracy of data analysis was observed at the beginning and end of the spectra. The values for the first 75 nm and the last 115 nm were therefore excluded from all analyses, leaving the 400–960 nm range.

Figure 1 Original reflectance spectra for one apple (325–1075 nm) (mean value of 30 scans).
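The averaging and trimming steps can be illustrated with a short Python sketch. It is not the ViewSpec procedure used in the study: the 1 nm sampling interval is an assumption (it is consistent with the 561 variables reported later for 400–960 nm), and the scan values are synthetic placeholders.

```python
# Sketch: average repeated scans, then trim the noisy ends of the range.
START_NM, END_NM = 325, 1075           # instrument range stated in the text
TRIM_LOW_NM, TRIM_HIGH_NM = 75, 115    # widths discarded at each end

def mean_spectrum(scans):
    """Average a list of equally long scans band by band."""
    n = len(scans)
    return [sum(band) / n for band in zip(*scans)]

def trim(wavelengths, spectrum):
    """Keep only the bands between the two trimmed ends (inclusive)."""
    lo = START_NM + TRIM_LOW_NM
    hi = END_NM - TRIM_HIGH_NM
    kept = [(w, r) for w, r in zip(wavelengths, spectrum) if lo <= w <= hi]
    return [w for w, _ in kept], [r for _, r in kept]

# Assumed 1 nm sampling; 30 synthetic scans stand in for real measurements.
wavelengths = list(range(START_NM, END_NM + 1))
scans = [[0.5 + 0.001 * k for _ in wavelengths] for k in range(30)]

avg = mean_spectrum(scans)
kept_wl, kept_r = trim(wavelengths, avg)
print(kept_wl[0], kept_wl[-1], len(kept_wl))   # 400 960 561
```

Under the 1 nm assumption, trimming 75 nm and 115 nm from the two ends leaves 561 bands, which matches the number of variables used in the later analysis.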


Pretreatment of the Optical Data

To reduce noise, Savitzky-Golay smoothing was applied with a segment size of 9. Much of the high-frequency noise could be eliminated at this segment size. The second type of preprocessing was multiplicative scatter correction (MSC).[Citation21] This technique was used to correct additive and multiplicative effects in the spectra. Because of light scattering in the fruit flesh, the light does not always travel the same distance in the sample before it is detected. A longer light path corresponds to a lower relative reflectance value, since more light is absorbed. This causes a parallel translation of the spectra. This kind of variation is not useful for the calibration models and needs to be eliminated by MSC. The preprocessing and calculations were carried out using The Unscrambler V9.1 (CAMO PROCESS AS, Oslo, Norway), a statistical software package for multivariate calibration. To avoid a low signal-to-noise ratio, only the wavelengths from 400 to 960 nm were used in this investigation.
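The two pretreatments can be sketched in plain Python as follows. This is an illustration only, not The Unscrambler implementation. The polynomial order of the smoothing filter is not stated above, so the standard 9-point quadratic/cubic Savitzky-Golay weights are assumed; leaving the four points at each end unsmoothed and using the mean spectrum as the MSC reference are also assumptions.

```python
# Standard 9-point Savitzky-Golay smoothing weights (quadratic/cubic fit).
SG9 = [-21, 14, 39, 54, 59, 54, 39, 14, -21]
SG9_NORM = 231  # sum of the weights

def savgol9(y):
    """Smooth y with a 9-point window; edge points are left unchanged."""
    half = 4
    out = list(y)
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(SG9, window)) / SG9_NORM
    return out

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is regressed on the reference r (s ~ offset + slope * r)
    and corrected as (s - offset) / slope.
    """
    n_bands = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        sxy = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s))
        slope = sxy / sxx
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected

# Synthetic example: one shape seen with two different offsets and gains.
base = [0.30, 0.35, 0.45, 0.60, 0.58, 0.50, 0.52, 0.55, 0.57, 0.60, 0.62]
spectra = [[0.05 + 1.1 * v for v in base], [-0.02 + 0.9 * v for v in base]]
smoothed = [savgol9(s) for s in spectra]
corrected = msc(smoothed)
```

In this synthetic case the two spectra differ only by an additive offset and a multiplicative gain, so after MSC they coincide, which is the kind of scatter-induced variation the correction is meant to remove.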

Wavelet Transform (WT)

Wavelet transforms (WT) have been studied for many years by mathematicians and are widely used in numerous applications. The WT is a mathematical transform that maps a time-domain signal to a two-dimensional time-scale domain. It has two useful features: 1) the signal can be viewed in the time domain or the frequency domain at any selected scale; and 2) it is well suited to compressing data and filtering noise. Its working principle is to construct a wavelet function, which must satisfy certain constraints.

For our analysis, we used the Daubechies wavelet, as it closely matches the signal to be processed, which is of great importance in wavelet applications. Because wavelets provide multiple resolutions in both time and frequency, sub-band information can be extracted from the original signal.[Citation22,Citation23] When applied to discriminating apple varieties, this sub-band information from the spectral data provided useful signatures of the varieties, so that they could be classified effectively. The wavelet transform was carried out with programs written by the authors in Matlab 7.0.

ANN Quantitative Analysis Model

An ANN trained with the back-propagation algorithm can be used for data compression tasks as well as class discrimination tasks. In general, such an ANN consists of an input layer, a hidden layer, and an output layer. The input and output nodes are linked by a hidden layer of nodes through connection weights. The weighted input signals are transferred to the hidden layer. Each node in the hidden layer computes the sum of its weighted inputs and transforms this sum by means of a linear or non-linear transfer function. The outputs from the hidden layer are weighted and then sent to the output node. We used a three-layer back-propagation network architecture to develop neural classifiers for sorting apples by variety. Several networks with different numbers of hidden nodes were evaluated to determine an optimal structure for generalization. Typically, the number of hidden nodes is chosen by trial as the one that gives the smallest final error.

The ANN was trained with a basic error back-propagation algorithm.[Citation24,Citation25] A three-layer ANN was built, with the sigmoid function as the transfer function of each layer. The input layer had 26 nodes and the output layer had 1 node. The goal error was set to 0.001, the learning rate to 0.2, and the number of training iterations to 1000.
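A minimal Python sketch of such a network is given below, using the settings listed above (26 inputs, 1 output, sigmoid transfer functions, learning rate 0.2, goal error 0.001, at most 1000 iterations). It is not the authors' program. Several details are assumptions made for illustration: the class name `BPNet`, the weight initialization, sample-by-sample weight updates, 12 hidden nodes (the number reported in the results), and the scaling of the class labels 1, 2, and 3 into the (0, 1) range of the sigmoid output. The input data are synthetic.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNet:
    """Three-layer network: inputs, one hidden layer, one output node."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight list ends with a bias term.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1])
             for ws in self.w_h]
        y = sigmoid(sum(w * v for w, v in zip(self.w_o[:-1], h)) + self.w_o[-1])
        return h, y

    def train(self, X, T, lr=0.2, epochs=1000, goal=0.001):
        """Online back propagation; returns the last epoch's mean squared error."""
        mse = float("inf")
        for _ in range(epochs):
            sq_err = 0.0
            for x, t in zip(X, T):
                h, y = self.forward(x)
                err = t - y
                sq_err += err * err
                delta_o = err * y * (1.0 - y)
                # Hidden deltas use the output weights before they are updated.
                delta_h = [delta_o * self.w_o[j] * h[j] * (1.0 - h[j])
                           for j in range(len(h))]
                for j in range(len(h)):
                    self.w_o[j] += lr * delta_o * h[j]
                self.w_o[-1] += lr * delta_o
                for j, ws in enumerate(self.w_h):
                    for i in range(len(x)):
                        ws[i] += lr * delta_h[j] * x[i]
                    ws[-1] += lr * delta_h[j]
            mse = sq_err / len(X)
            if mse <= goal:
                break
        return mse

# Synthetic stand-in for 26 wavelet components per sample (not real data).
rng = random.Random(1)
prototypes = [[rng.random() for _ in range(26)] for _ in range(3)]
X, T = [], []
for k, proto in enumerate(prototypes):
    for _ in range(3):
        X.append([v + rng.uniform(-0.02, 0.02) for v in proto])
        T.append((k + 1) / 4.0)  # assumed scaling of labels 1, 2, 3 into (0, 1)

net = BPNet(n_in=26, n_hidden=12, seed=0)
initial_mse = sum((t - net.forward(x)[1]) ** 2 for x, t in zip(X, T)) / len(X)
final_mse = net.train(X, T, lr=0.2, epochs=1000, goal=0.001)
print(initial_mse, final_mse)
```

Training stops either when the mean squared error falls to the goal error or when the iteration limit is reached, which mirrors the two stopping settings given above.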

Principal Components Analysis

Principal component analysis is a very effective data mining technique. The principle of PCA is to find the linear combinations of the initial variables that contribute most to making the samples different from each other. These combinations are called principal components (PCs). They are computed iteratively, in such a way that the first PC is the one that carries the most information (in statistical terms, the most explained variance). The second PC then carries the maximum share of the residual information, and so on. PCA thus finds an alternative set of coordinate axes, the PCs, on which the data set may be represented. The PCs are orthogonal to each other, and they are ranked so that each one carries more information than any of those that follow. The score is the value of a sample along a principal component; each spectrum has a score along each PC. In this study, PCA was used to visualize the hyperspectral data and to describe the varieties of the samples.
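The iterative computation described above can be sketched in Python with power iteration and deflation. This is one simple way to obtain the leading components, chosen here for illustration; it is not the algorithm used by The Unscrambler, and it is only practical for small matrices. The function name `pca` and the toy data are hypothetical.

```python
import math

def pca(X, n_components=2, iters=500):
    """Return (scores, explained_variance_ratios) for the leading components."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Sample covariance matrix of the mean-centered data.
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
         for a in range(p)]
    total_var = sum(C[j][j] for j in range(p))
    comps, ratios = [], []
    for _ in range(n_components):
        # Non-uniform start vector, to reduce the chance of starting
        # orthogonal to the dominant direction.
        v = [1.0 + j for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: variance carried by this component.
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(p))
                  for a in range(p))
        comps.append(v)
        ratios.append(lam / total_var)
        # Deflation: remove this component so the next pass finds the
        # direction carrying most of the residual variance.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in comps] for r in Xc]
    return scores, ratios

# Toy data: five samples, three variables, mostly varying along one direction.
X = [[1.0, 2.0, 0.50], [2.0, 4.1, 0.40], [3.0, 5.9, 0.60],
     [4.0, 8.2, 0.50], [5.0, 9.9, 0.45]]
scores, ratios = pca(X)
print([round(r, 4) for r in ratios])
```

Each row of `scores` holds one sample's coordinates on PC1 and PC2, which are the values plotted in a scores image, and `ratios` gives the share of the total variance explained by each component.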

RESULTS AND DISCUSSION

Features of NIR Reflectance Spectra

The average reflectance spectra from 400 nm to 960 nm are shown in Figure 2 for randomly selected samples of Fuji, Red Delicious, and Copefrut Royal Gala apples. The spectral profiles were clearly distinguishable from each other, both at the visible wavelengths (i.e., 500–700 nm) and in the NIR region (i.e., 720–960 nm). This indicates the potential of the spectra for discriminating one variety from another. The baseline drift in the spectra shown in Figure 2 was mostly due to differences in sample size, since NIR spectra also reflect the physical properties of the samples. With the MSC pretreatment, the distinct spectral features associated with the different samples became more apparent.

Figure 2 Near infrared reflectance spectra of three different apple varieties.


Clustering Analyses Based on Principal Components Analysis

Principal component analysis (PCA) is an unsupervised clustering method that requires no knowledge of the data set structure and reduces the dimensionality of multivariate data while preserving most of the variance within it. The results of a PCA are usually interpreted by visualizing the component scores. PCA was performed on the whole spectra (400–960 nm), yielding several principal components. Plotting the scores of these principal components for each apple produces a new plot, called a PCA scores image. If only the scores of the first principal component (PC1) are used, the result is called the PC1 scores image. If the scores of the first and second principal components (PC1 and PC2) are plotted against each other, the result is called the PC1 and PC2 scores image, as shown in Figure 3. The advantage of a principal component scores image is that it displays clustering information about the varieties drawn from multiple wavebands.

Table 1 shows the reliabilities (explained variance) of PC1, PC2, and PC3. The cumulative reliability of PC1 and PC2 is 98%, which means that PC1 and PC2 together explain 98% of the data variance. The PC1 and PC2 scores image appears to provide a clean clustering of the varieties.[Citation15] The remaining scores images did not give additional useful information for distinguishing the varieties.

Table 1 PCs and reliabilities

The scatter plot of PC1 (variability 68%) × PC2 (variability 30%) is shown in Figure 3. Three separate groups were observed, corresponding to the three varieties, owing to their different spectral reflectance. In this scatter plot, the differences among Fuji, Red Delicious, and Copefrut Royal Gala were pronounced. The Copefrut Royal Gala samples clustered closely in the quadrant where PC1 was negative and PC2 was positive, while the Red Delicious samples were always in the negative direction of PC2. In contrast, the Fuji samples were scattered over the region where PC1 was positive. PCA can distinguish the apple varieties qualitatively, but it cannot discriminate them quantitatively. An ANN was therefore used to build a quantitative model for discriminating the apple varieties, as described below.

Figure 3 Principal component scores image (PC1 × PC2) of three species of apples.


Extract Wavelet Components to Build Variety Discrimination Model

In this research, the Daubechies-5 (db5) wavelet function was applied to compress the spectral data, a matrix of 90 samples by 561 variables. Each spectral signal was used as the input to a filter bank, where it passed through a series of low-pass filters (LPF) and high-pass filters (HPF). After each filtering stage, the signal was decomposed into a low-frequency signal and a high-frequency signal. The high-frequency signal contained much noise and redundant information and contributed little to classifying the varieties.[Citation26] The low-frequency signal obtained from a 5-level decomposition with the db5 wavelet function was therefore used as the wavelet components. The original spectra are shown in Figure 4, and the wavelet components can be seen in Figure 5. In Figure 5, the abscissa represents the wavelet components compressed by the db5 wavelet function, and the ordinate represents the coefficients of the wavelet components. The 561 original variables were compressed to 26 characteristic wavelet components, so the number of new variables was less than 5% of that of the original spectral data.

Figure 4 Original spectra of samples before wavelet transform.


Figure 5 Characteristic information of spectra after wavelet transforms.
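The reduction from 561 variables to 26 wavelet components can be checked with a short Python calculation. It assumes that the signal is extended at its ends before each filtering stage (rather than periodized), in which case one decomposition level maps a signal of length n to floor((n + L − 1) / 2) coefficients for a filter of length L. The extension mode actually used by the authors is not stated, so this is a consistency check rather than a description of their program. The function names are hypothetical.

```python
def dwt_coeff_len(n, filter_len):
    """Coefficients per band after one decomposition level, assuming the
    signal is extended at its ends before filtering (not periodized)."""
    return (n + filter_len - 1) // 2

def approx_len(n, filter_len, levels):
    """Length of the low-frequency band after repeated decomposition."""
    for _ in range(levels):
        n = dwt_coeff_len(n, filter_len)
    return n

DB5_FILTER_LEN = 10  # a Daubechies wavelet of order N has 2N filter taps
n_bands = 561        # 400-960 nm spectra
n_components = approx_len(n_bands, DB5_FILTER_LEN, levels=5)
print(n_components, round(100 * n_components / n_bands, 1))   # 26 4.6
```

The band length shrinks as 561, 285, 147, 78, 43, 26 over the five levels, so under this assumption a 5-level db5 decomposition yields the 26 low-frequency components used as network inputs, about 4.6% of the original number of variables.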


The whole spectra were replaced by the 26 characteristic wavelet components. Each sample (fruit) was individually numbered, and the samples were separated randomly into two parts: 75 randomly selected apples were used as calibration samples, and the remaining 15 were used as prediction samples. The matrix of 75 samples by 26 variables was used to build the BP-ANN model. The optimal architecture of the neural network was reached by adjusting the number of nodes in the hidden layer[Citation27] based on the DPS. In the model, 26 input neurons were connected to 12 hidden neurons, which transmitted their output to a single output neuron producing the final measure of resemblance as a number. Red Delicious, Copefrut Royal Gala, and Fuji apples were assigned dummy values of 1, 2, and 3, respectively. A sample was considered to be categorized correctly if the predicted value lay on the same side of the midpoint between assigned values as its true value.[Citation28] The residual error was 9.993 × 10⁻⁶. Fifteen unknown samples were discriminated by this model, and the recognition rate was 100% (Table 2).
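The midpoint rule for turning the network's single numeric output into a variety can be written out as a short Python sketch. The function names are hypothetical, the predicted values in the example are invented for illustration and are not the values in Table 2, and assigning a value exactly on a midpoint to the higher class is an assumption.

```python
# Dummy values assigned to the varieties in the text.
CLASS_VALUES = {1: "Red Delicious", 2: "Copefrut Royal Gala", 3: "Fuji"}

def assign_class(predicted):
    """Map a network output to the dummy value whose interval contains it.

    The boundaries are the midpoints between neighboring dummy values
    (1.5 and 2.5 here). A value exactly on a midpoint goes to the higher class.
    """
    values = sorted(CLASS_VALUES)
    for lower, upper in zip(values, values[1:]):
        if predicted < (lower + upper) / 2:
            return lower
    return values[-1]

def recognition_rate(predictions, actual):
    """Percentage of samples whose assigned class matches the true class."""
    hits = sum(assign_class(p) == a for p, a in zip(predictions, actual))
    return 100.0 * hits / len(actual)

# Invented outputs for six samples, two per variety.
predictions = [0.97, 1.08, 2.10, 1.90, 3.02, 2.80]
actual = [1, 1, 2, 2, 3, 3]
print([CLASS_VALUES[assign_class(p)] for p in predictions])
print(recognition_rate(predictions, actual))   # 100.0
```

With this rule, a prediction does not need to equal the dummy value exactly; it only has to fall closer to the correct value than to either neighbor.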

Table 2 Result of prediction for 15 unknown samples by BP-ANN model

The prediction results obtained in this research are superior to those obtained by Reid et al.[Citation14] in discriminating apple juice varieties, whose method correctly classified 82.4% (Golden Delicious), 87% (Jonagold), 100% (Bramley), and 100% (Elstar) of the apple juice samples (based on NIR data).

CONCLUSION

Our results indicate that it is possible to develop a fast, non-destructive technique to discriminate apple varieties. Visible/near-infrared spectroscopy can be used to achieve this purpose. The chemometric procedures involving PCA, WT, and ANN can deal effectively with the pattern recognition problem. In particular, the combination of WT and ANN can be used not only for classification but also to select pertinent feature information (the low-frequency wavelet components) from large multidimensional signals. Further research will focus on establishing an optimal, standardized, and practical multi-spectral system.

ACKNOWLEDGMENTS

This study was supported by the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of MOE, P. R. C., Natural Science Foundation of China, Specialized Research Fund for the Doctoral Program of Higher Education (Project No: 20040335034), and Natural Science Foundation of Zhejiang (Project No: RC02067).

REFERENCES

  • Kawano, S., Watanabe, H. and Iwamoto, M. 1992. Determination of Sugar Content in Intact Peaches by Near Infrared Spectroscopy with Fibre Optics in Interactance Mode. J. Japanese Soc. Hort. Sci., 61(2): 445–451.
  • Lovász, T., Merész, P. and Salgo, A. 1994. Application of Near Infrared Transmission Spectroscopy for the Determination of Some Quality Parameters of Apples. J. Near Infrared Spectro., 2: 213–221.
  • Moons, E., Dubois, A., Dardenne, P. and Sindic, M. Nondestructive Visible and NIR Spectroscopy for the Determination of Internal Quality in Apple. Proc., Sensors for Non-destructive Testing Int'l. Conf. Orlando, Florida. February. pp. 122–132. Ithaca
  • Bechar, A., Mizrach, A., Barreiro, P. and Landahl, S. 2005. Determination of Mealiness in Apples Using Ultrasonic Measurements. Biosyst. Eng., 91: 329–334.
  • Grotte, M., Duprat, F., Loonis, D. and Petri, E. 2001. Mechanical Properties of the Skin and the Flesh of Apples. International Journal of Food Properties, 4(1): 149–161.
  • Kleynen, O., Leemans, V. and Destain, M.F. 2003. Selection of the Most Efficient Wavelength Bands for “Jonagold” Apple Sorting. Postharvest Biology and Technology, 30(3): 221–232.
  • Boscaini, E., Mikoviny, T., Wisthaler, A., Hartungen, E.V. and Mark, T.D. 2004. Characterization of Wine with PTR-MS. International Journal of Mass Spectrometry, 239: 215–219.
  • Seregely, Z., Deak, T. and Bisztray, G.D. 2004. Distinguishing Melon Genotypes Using NIR Spectroscopy. Chemometrics and Intelligent Laboratory Systems, 72: 195–203.
  • Alonso, J., Artigas, J. and Jimenez, C. 2003. Analysis and Identification of Several Apple Varieties Using ISFETs Sensors. Talanta, 59: 1245–1252.
  • Marrazzo, W.N., Heinemann, P.H., Crassweller, R.E. and LeBlanc, E. 2005. Electronic Nose Chemical Sensor Feasibility Study for the Differentiation of Apple Cultivars. American Society of Agricultural Engineers, 48(5): 1995–2002.
  • Belton, P.S., Colquhoun, I.J., Kemsley, E.K., Delgadillo, I., Roma, P., Dennis, M.J., Sharman, M., Holmes, E., Nicholson, J.K. and Spraul, M. 1998. Application of Chemometrics to the 1H NMR Spectra of Apple Juices: Discrimination Between Apple Varieties. Food Chemistry, 61(1): 207–213.
  • Turza, S., Toth, A. and Varadi, M. In Multivariate Classification of Different Soybean Varieties. Journal of Near Infrared Spectroscopy: Proceedings of the 8th International Conference. March 1998. Edited by: Davies, A.M.C. pp. 183–187. Chichester, UK: NIR Publications.
  • Haluk, U. 2000. Application of the Feature Selection Method to Discriminate Digitized Wheat Varieties. Journal of Food Engineering, 46: 211–216.
  • Reid, L.M., Woodcock, T., O'Donnell, C.P., Kelly, J.D. and Downey, G. 2005. Differentiation of Apple Juice Samples on the Basis of Heat Treatment and Variety Using Chemometric Analysis of MIR and NIR Data. Food Research International, 38: 1109–1115.
  • Qi, X.M., Zhang, L., Du, X.L., Song, Z.J., Zhang, Y. and Xu, S.Y. 2003. Quantitative Analysis Using NIR by Building PLS-BP Model. Spectroscopy and Spectral Analysis, 23(5): 870–872.
  • Osborne, B.G., Fearn, T. and Hindle, P.H. 1993. Practical NIR Spectroscopy, U.K: Longman, Harlow.
  • Krzanowski, W.J., Jonathan, P., McCarthy, W.V. and Thomas, M.R. 1995. Discriminant Analysis with Singular Covariance Matrices: Methods and Applications to Spectroscopic Data. Applied Statistics, 44: 105–115.
  • Wu, B., Abbott, T., Fishman, D., McCurray, W., Mor, G., Stone, K., Ward, D., Williams, K. and Zhao, H. 2003. Comparison of Statistical Methods for Classification of Ovarian Cancer Using Mass Spectrometry Data. Bioinformatics, 19: 1636–1643.
  • Marina, V., Naijun, S. and Philip, J.B. 2005. NIR and Mass Spectra Classification: Bayesian Methods for Wavelet-Based Feature Selection. Chemometrics and Intelligent Laboratory System, 77: 139–148.
  • Pereira, G.A., Gomez, H.A. and He, Y. 2003. Advances in Measurement and Application of Physical Properties of Agricultural Products. Transactions of the CSAE, 19(5): 7–11.
  • Nigel, Y. 2001. Potato Crisp Moisture Estimation Using Near Infrared Spectroscopy. International Journal of Food Properties, 4(2): 247–260.
  • Mallat, S.G. 1989. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Machine Intell., 11(7): 674–693.
  • Daubechies, I. 1992. Ten Lectures on Wavelets, Vol. 61, 357 Philadelphia, PA: SIAM.
  • Rich, E. and Knight, K. 1991. “Connectionist Models”. In Artificial Intelligence, Edited by: Shapiro, D.M. and Murphy, J.F. 487–519. Singapore: McGraw Hill Book Co.
  • He, Y., Zhang, Y. and Xiang, L.G. 2005. Study of Application Model on BP Neural Network Optimized by Fuzzy Clustering. Lecture Notes in Artificial Intelligence, 3789: 712–720.
  • Wang, H.B. and Wu, Y.F. 2004. Recognition Based on Wavelet Transform and BP Neural Network. Computer Engineering and Applications, 24: 51–53.
  • Zhao, C., QU, H.B. and Cheng, Y.U. 2004. A New Approach to the Fast Measurement of Content of Amino Acids in Cordyceps Sinensis by ANN-NIR. Spectroscopy and Spectral Analysis, 24(1): 50–53.
  • Wang, D., Dowell, F.E., Lan, Y., Pasikatan, M. and Maghirang, E. 2002. Determining Pecky Rice Kernels Using Visible and Near-Infrared Spectroscopy. International Journal of Food Properties, 5(3): 629–639.
