The volume of data grows at an unprecedented rate, in particular in the fields of scientific data, design and manufacturing, logistics engineering, medical, marketing, and financial data. Data mining offers tools for analysis of large databases and discovery of trends, patterns and knowledge. Since the first IEEE Data Engineering Conference in 1982, at least four data mining related journals have been established. Data mining is entering many applications in engineering design, manufacturing and logistics engineering. Numerous books have been dedicated to these applications. While many data mining papers have appeared in largely theory-based publications, journal issues discussing applications of data mining in engineering design, manufacturing and logistics engineering are rare. This special issue attempts to fill this coverage gap. It focuses on the theory and applications of data mining, text mining, web mining, and image mining in engineering design, manufacturing, and logistics engineering.
The process of editing this special issue was guided by the principle that the papers must be of the highest quality, offer new contributions, and be relevant to the practice and further research and development of production systems. Therefore, all papers have been reviewed at least twice by at least two outstanding researchers in this field. Two principles were followed to avoid conflicts of interest in the review process. First, papers submitted by the guest editors were seen by the journal Editor, John Middle. Second, the papers were reviewed largely by reviewers outside of the pool of authors. As a result, some authors have not been assigned any papers to review, while many papers were reviewed by professionals who did not submit papers to the special issue.
We have received papers from the following regions: Canada, China, Israel, Italy, Japan, Taiwan, the United Kingdom, and the United States. Table 1 illustrates the application areas of the accepted papers.
Table 1. Areas of accepted papers.
Data mining methodology
Jung et al. propose a vertical group-wise threshold (VGWT) procedure for data reduction of multiple high-dimensional functions containing class information in the wavelet domain. They found that the VGWT procedure increases the class separation ability with a reasonably small loss of data reduction efficiency. In addition to illustrating the procedure with a real-world example of process fault detection in metal stamping, they used a Monte Carlo simulation to study the impact of different levels of class variation and noise on the performance of the proposed procedure.
The diverse and massive nature of input data sets limits the practicality of obtaining test data sets through statistical sampling, while access to public data sets is often restricted by the proprietary and privacy rights that protect many sources of data. An alternative to obtaining actual data sets is to generate synthetic data sets based on partial information about associations between attributes in the data, which is usually available. Jeske et al. address this issue by using the iterative proportional fitting algorithm, a well-known statistical method for facilitating the analysis of contingency table data. Their goal was to devise a scheme that incorporates all the information that can be found about associations between attributes without forcing additional structure (e.g., a distributional assumption) into the scheme.
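For readers unfamiliar with iterative proportional fitting, the following is a minimal sketch of the textbook two-way version, not the scheme of Jeske et al. The function name, the 2 x 2 seed table and the target margins are illustrative assumptions. The seed table carries the known association between two attributes, and the procedure rescales it until its row and column totals match the desired margins.

```python
def iterative_proportional_fitting(seed, row_targets, col_targets,
                                   tol=1e-9, max_iter=1000):
    """Rescale a two-way seed table until its margins match the targets.

    Assumes a strictly positive seed and targets with equal grand totals
    (illustrative assumptions, not taken from the paper).
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        for i, row in enumerate(table):
            total = sum(row)
            if total > 0:
                factor = row_targets[i] / total
                table[i] = [v * factor for v in row]
        # Column step: scale each column so that it sums to its target.
        for j in range(len(col_targets)):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = col_targets[j] / total
                for row in table:
                    row[j] *= factor
        # After the column step the columns match exactly, so check the rows.
        row_error = max(abs(sum(row) - target)
                        for row, target in zip(table, row_targets))
        if row_error < tol:
            break
    return table


# Hypothetical seed table and margins, chosen only for illustration.
fitted = iterative_proportional_fitting([[1, 2], [3, 4]], [40, 60], [30, 70])
```

The fitted table reproduces the requested margins while keeping the cross-product (odds) ratio of the seed, which is why the method suits generating synthetic data from partial association information.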
Liao et al. propose an adaptive clustering method based on a genetic algorithm for exploratory mining of feature vector and time series data. Their method is essentially an implementation of the k-medoids algorithm, with distance computed by dynamic time warping for data of unequal length and by Euclidean distance for data of equal length. They used grinding force data to demonstrate the performance of their procedure in comparison with some other clustering methods.
Buddhakulsomsiri et al. present an association rule generation algorithm for mining automotive warranty data. The proposed algorithm uses the elementary set concept and database manipulation techniques to develop useful relationships between product attributes and causes of failure, and represents these relationships as IF-THEN association rules. They illustrated the algorithm with automotive warranty data to detect the root causes of a particular type of warranty problem.
Caramia and Felici present a clique-based approach for mining relevant information on the web, with applications in web-based design and manufacturing. They consider two related problems in web mining: how to select an appropriate set of keywords for a thematic engine, taking into account the semantic and linguistic extensions of the search context, and how to select and rank a subset of relevant pages given a set of search keywords. Both problems are solved with the aid of a graph representation and by searching for a particular subset of such a graph. The maximum-weight clique algorithm is used to identify the subsets effectively.
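The distance rule described above for the clustering method of Liao et al. (Euclidean distance for series of equal length, dynamic time warping otherwise) can be sketched as follows. This is an illustrative sketch only: the function names, the use of univariate series and the absolute difference as the local cost are assumptions, not details taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences.

    Uses the absolute difference as the local cost (an assumption).
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def series_distance(a, b):
    """Euclidean distance for equal lengths, DTW for unequal lengths."""
    if len(a) == len(b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return dtw_distance(a, b)
```

A k-medoids procedure needs only such a pairwise distance, since each medoid is an actual data item rather than an average, which is what allows series of different lengths to be clustered together.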
Application in engineering design
Engineering configuration and requirements configuration are two interrelated critical issues in product configuration design and management. Shao et al. propose a methodology and system architecture for accomplishing these two product configuration tasks and bridging the gap between them. The methodology is based on the integration of popular data mining approaches (such as fuzzy clustering and association rule mining) with variable precision rough sets, and it focuses on the discovery of configuration rules from purchased products according to customer groups. The proposed methodology is illustrated with a case study of an electric bicycle design.
Capturing design knowledge by tracking the activities involved in starting and detailing a design in an existing CAD system could be valuable for improving future designs and for training current and new design engineers. Jin and Ishino present a design activity knowledge acquisition (DAKA) framework to help extract a designer's design activity knowledge from CAD operation event data. The DAKA system is composed of a product model roadmap that represents the trajectory of the designer's design moves and a function-based operation mining algorithm that extracts meaningful design operations from the CAD event database. Their framework was illustrated by a case study in automotive door design.
Application in manufacturing
Dengiz et al. present a two-stage data mining approach for flaw identification in ceramics manufacture. In the first stage, digital micro-structural image processing is used to characterise the flaws and surface damage. In the second stage, an extreme value probability distribution is fitted using the information from stage one. Results from the two-stage data mining showed that the ceramic production method significantly affects flaw characteristics that, in turn, determine the ceramics' fracture strength.
The problem of detecting changes in the distribution of process variables is referred to as change point detection, and it has been important in root cause analysis in manufacturing. Li et al. studied the important multivariate case with multiple change points and without an assumed distribution. A tree-based supervised learner is used to identify the subset of variables that change in a multivariate process with hundreds of variables. On the basis of the same sets of simulated data, the authors demonstrated the superiority of their proposed method over the multivariate exponentially weighted moving average (EWMA) control chart.
Raheja et al. present a data fusion/data mining-based architecture for condition-based maintenance (CBM). Data fusion is widely used in the defence industry to automatically combine information from multiple sources in order to make decisions regarding the state of an object. Data mining is defined in their application as a method for seeking unknown patterns and relationships in large datasets. In the proposed architecture, they use both methods to determine the overall condition or health of a machine. This information is then used by predictive maintenance models to determine the best course of action for maintaining critical equipment.
Huang et al. present a rough set-based approach to manufacturing process document retrieval. To overcome the deficiency of current methods, such as the vector space model (VSM), Boolean model, fuzzy set model and probability model, in processing qualitative and attribute types of data, they enhance the VSM by incorporating rough sets. They then demonstrate the benefits of their proposed rough set-based approach over the VSM approach alone in manufacturing document retrieval.
Qian et al. present a clustering-based functional mixture approach to modelling the customer profile for churn detection in the telephone industry. Their method can be generalised to model sophisticated manufacturing processes or systems for quality improvement.
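As background to the document retrieval work of Huang et al. summarised above, the plain vector space model that they take as a starting point can be sketched as below. This shows only the baseline; the rough set enhancement is not reproduced. The raw term-frequency weighting, the whitespace tokenisation and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two raw term-frequency vectors."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return (score, document) pairs, most similar to the query first."""
    scored = [(cosine_similarity(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because this baseline compares documents only through term counts, it has no direct way of handling qualitative or attribute-type data, which is the deficiency the rough set extension is intended to address.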
Application in logistics engineering
Tseng et al. present a hybrid data mining approach to predicting preferred suppliers. In this approach, they use the rough set algorithm for feature selection and an enhanced multi-class support vector machine (SVM) method for accurate prediction. The hybrid approach can simultaneously derive decision rules, identify the most significant features, and generate a well-tuned prediction model with high accuracy.
List of reviewers*
Nihat Altintas – Carnegie Mellon University, USA
Jirachai (Sim) Buddhakulsomsiri – University of Michigan, USA
Victoria Chen – University of Texas, USA
Yong Chen – University of Iowa, USA
John Edwards – Loughborough University, UK
Jack Feng (four papers) – Bradley University, USA
Nagi Gebraeel – University of Iowa, USA
Matt Giess – University of Bath, UK
Jenny Harding – Loughborough University, UK
Chun-Che Huang – National Chi Nan University, Taiwan
Daniel Jeske – University of California at Riverside, USA
Myong K. Jeong – University of Tennessee, USA
Wei Jiang – Stevens Institute of Technology, USA
Yan Jin – University of Southern California, USA
Andrew Kusiak (two papers) – University of Iowa, USA
Sarah Lam – SUNY at Binghamton, USA
Mark Last – Ben-Gurion University, Israel
Peigen Li – Huazhong University of Science and Technology, China
Xiangyang (Sean) Li – University of Michigan, USA
Ming Liang – University of Ottawa, Canada
Gary Lin – Bradley University, USA
Jye-Chyi Lu – Georgia Institute of Technology, USA
A. Richard Mileham – University of Bath, UK
George Runger – Arizona State University, USA
Xinyu Shao – Huazhong University of Science and Technology, China
Jianjun Shi – University of Michigan, USA
Alice Smith – Auburn University, USA
Theodore Trafalis – University of Oklahoma, USA
Marietta Tretter – Texas A & M University, USA
Tzu-Liang (Bill) Tseng – University of Texas, USA
Janet Twomey – Wichita State University, USA
Zhonghao Wang – Huazhong University of Science and Technology, China
Jaekyung Yang – Chonbuk National University, Korea
Yuehwern Yih – Purdue University, USA
Yong Yin – Yamagata University, Japan
Armen Zakarian (two papers) – University of Michigan, USA
List of authors included in the special issue
Jirachai (Sim) Buddhakulsomsiri – University of Michigan, USA
Massimiliano Caramia – Institute of Applied Mathematics, Italy
Pei-Chann Chang – Yuan Ze University, Taiwan
Horng-Fu Chaung – Da Yeh University, Taiwan
Orhan Dengiz – Auburn University, USA
Giovanni Felici – Institute for Systems Analysis and Informatics, Italy
Jack Feng – Bradley University, USA
D. V. Gokhale – University of California at Riverside, USA
Johnny Ho – University of Texas, USA
Chun-Che Huang – National Chi Nan University, Taiwan
Yoko Ishino – Hiroshima University, Japan
Myong K. Jeong – University of Tennessee, USA
Dan Jeske – University of California, Riverside, USA
Fuhua Jiang – Georgia State University, USA
Wei Jiang – Stevens Institute of Technology, USA
Yan Jin – University of Southern California, USA
Uk Jung – Georgia Institute of Technology, USA
Fang Li – Arizona State University, USA
Peigen Li – Huazhong University of Science and Technology, China
Xiangyang (Sean) Li – University of Michigan, USA
T. Warren Liao – Louisiana State University, USA
Hui-Fen Liang – National Chi Nan University, Taiwan
James Llinas – SUNY at Buffalo, USA
Jye-Chyi Lu – Georgia Institute of Technology, USA
Rakesh Nagi – SUNY at Buffalo, USA
Ian Nettleship – University of Pittsburgh, USA
Zhiguang Qian – Georgia Institute of Technology, USA
Dhruv Raheja – SUNY at Buffalo, USA
Carol Romanowski – Rochester Institute of Technology, USA
George Runger – Arizona State University, USA
Xinyu Shao – Huazhong University of Science and Technology, China
Yuri Siradeghyan – University of Michigan, USA
Alice Smith – Auburn University, USA
Chi-Feng Ting – Louisiana State University, USA
Tzu-Liang (Bill) Tseng – University of Texas, USA
Kwok-Leung Tsui – Georgia Institute of Technology, USA
Eugene Tuv – Intel (Chandler, Arizona), USA
Zhonghao Wang – Huazhong University of Science and Technology, China
Lan Ye – Wells Fargo Financial (Philadelphia), USA
Armen Zakarian – University of Michigan, Dearborn, USA
Acknowledgements
This special issue could not have been produced without the enthusiastic support of the numerous authors and reviewers. A list of all contributors is provided at the end of the Editorial. As guest editors, we are grateful for the timely and professional service of the reviewers and for the timely response of the authors. We also appreciate the professional and timely support and seamless cooperation of the journal Editor, John Middle, and his assistant, Carmela Valentine, as well as production assistance from Taylor & Francis. While the authors, reviewers and editorial staff have contributed to the success of the special issue, the guest editors take full responsibility for any mistakes or errors that might have occurred. Finally, we are grateful to the respective departments of the two guest editors for support received during this project.
Notes
*Two anonymous reviewers are not listed here. They were assigned by Editor John Middle to review the paper submitted by Jack Feng and his colleagues.