ABSTRACT
Early detection of cancer is an important application because it reduces the risk of death for cancer patients. DNA microarray technology generates tissue data that contain a very large number of genes (attributes) and a relatively small number of samples, which makes it difficult to build an accurate classifier. Hence, several machine learning algorithms that emphasize feature selection have been proposed to handle the large number of genes. Most feature selection (FS) algorithms reported in the literature are supervised learning algorithms. The authors propose an unsupervised feature selection algorithm, which makes the selection independent of the class label. The first stage of the proposed algorithm is a simple feature ranking based on Singular Value Decomposition (SVD) entropy. The SVD-entropy based method selects attributes independently of each other, and hence reduces the complexity involved in handling multiple associations among attributes in large datasets. In the second stage, correlation among attributes is used to remove features that are highly correlated with each other. Once the features are selected, a logistic regression model is trained to predict the class label, that is, whether a tissue is cancerous or non-cancerous. The proposed algorithm proved effective at accurately predicting cancerous tissues: experiments on three datasets (ovarian, lung and breast cancer) gave accuracies of 100%, 75% and 96.4%, respectively.
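The two-stage selection described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It follows the commonly used SVD-entropy definition: normalised eigenvalues V_j = s_j^2 / sum_k s_k^2, entropy E = -(1/log N) * sum_j V_j log V_j, and a feature's contribution CE_i = E(X) - E(X without feature i). Features are ranked by CE from highest to lowest, and then any feature whose absolute Pearson correlation with an already kept feature exceeds a threshold is dropped. The threshold of 0.9, the number of kept features k, the use of raw (uncentred) data and all function names are hypothetical choices, since the abstract does not specify them. The squared singular values are taken as eigenvalues of the small sample-by-sample Gram matrix X X^T, which suits microarray data with few samples.

```python
import math


def jacobi_eigenvalues(g, max_sweeps=60):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in g]
    n = len(a)
    norm2 = sum(v * v for row in a for v in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[p][q] * a[p][q] for p in range(n) for q in range(n) if p != q)
        if off <= 1e-24 * norm2:  # off-diagonal mass negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    # A Gram matrix is positive semidefinite; clamp tiny negative round-off to 0.
    return [max(a[i][i], 0.0) for i in range(n)]


def svd_entropy(eigenvalues):
    """Normalised SVD-entropy: E = -(1/log N) * sum_j V_j log V_j, V_j = s_j^2 / sum_k s_k^2."""
    total = sum(eigenvalues)
    n = len(eigenvalues)
    if total <= 0.0 or n < 2:
        return 0.0
    e = -sum((lam / total) * math.log(lam / total) for lam in eigenvalues if lam > 0.0)
    return e / math.log(n)


def gram(x):
    """Sample-by-sample Gram matrix X X^T; its eigenvalues are the squared singular values of X."""
    return [[sum(u * v for u, v in zip(r1, r2)) for r2 in x] for r1 in x]


def svd_entropy_contributions(x):
    """CE_i = E(X) - E(X without feature i), with X given as samples x features."""
    g = gram(x)
    n = len(x)
    e_all = svd_entropy(jacobi_eigenvalues(g))
    ce = []
    for i in range(len(x[0])):
        col = [row[i] for row in x]
        # Removing column i subtracts its outer product from X X^T.
        g_minus = [[g[a][b] - col[a] * col[b] for b in range(n)] for a in range(n)]
        ce.append(e_all - svd_entropy(jacobi_eigenvalues(g_minus)))
    return ce


def pearson(u, v):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) * (a - mu) for a in u))
    sv = math.sqrt(sum((b - mv) * (b - mv) for b in v))
    return cov / (su * sv) if su > 0.0 and sv > 0.0 else 0.0


def select_features(x, k, max_abs_corr=0.9):
    """Stage 1: rank by CE (highest first). Stage 2: skip features too correlated with kept ones."""
    ce = svd_entropy_contributions(x)
    order = sorted(range(len(ce)), key=lambda i: ce[i], reverse=True)
    cols = [[row[i] for row in x] for i in range(len(x[0]))]
    kept = []
    for i in order:
        if len(kept) >= k:
            break
        if all(abs(pearson(cols[i], cols[j])) <= max_abs_corr for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy matrix: 4 samples x 4 features; column 1 duplicates column 0.
    X = [[1.0, 1.0, 0.0, 2.0],
         [2.0, 2.0, 1.0, 0.0],
         [3.0, 3.0, 0.0, 1.0],
         [4.0, 4.0, 1.0, 3.0]]
    print(select_features(X, k=3))  # only one of the duplicated columns 0 and 1 survives
```

Updating X X^T by subtracting one outer product per removed feature avoids rebuilding the matrix for every leave-one-out entropy. With thousands of genes, a vectorised library routine such as NumPy's numpy.linalg.svd would be the practical choice. The final stage, a logistic regression classifier fitted on the kept columns, is not shown here.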
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Ummadi Janardhan Reddy
U Janardhan Reddy obtained his BTech (Computer Science and Engineering) from JNT University, Hyderabad, in 2007 and his MTech (Computer Science and Engineering) from JNT University, Kakinada, in 2011. He is pursuing his doctoral degree in the area of data mining in the Department of Computer Science and Engineering, JNTUA, Anantapur, India. He is an assistant professor at Vignan's University, Guntur, Andhra Pradesh, India. His areas of interest include data mining and machine learning techniques in the fields of bioinformatics and healthcare.
B. Venkata Ramana Reddy
B Venkata Ramana Reddy received his BTech (Computer Science and Engineering) from JNT University, Hyderabad, in 2007 and his MTech (Computer Science and Engineering) from JNT University, Kakinada, in 2011. He is pursuing his doctoral degree in the area of data mining in the Department of Computer Science and Engineering, JNTUA, Anantapur, India. He is an assistant professor at Vignan's University, Guntur, Andhra Pradesh, India. His areas of interest include data mining and machine learning techniques in the fields of bioinformatics and healthcare.
B. Eswara Reddy
B Eswara Reddy obtained his BTech from Sri Krishna Devaraya University in 1995, his MTech in software engineering from JNTU Hyderabad in 1999 and his PhD in computer science and engineering from JNTU Hyderabad in 2008. He has over 20 years of teaching and research experience. He is a member of IEEE, CSI, ISTE, IE(I), ISCA and IAENG. He has served as Chairman of both the UG and PG Boards of Studies. He has also served as NSS programme officer, officer in charge of examinations and the computer center, IEEE student branch counselor, program chair for the International Conference on Emerging Trends in Electrical, Communication and Information Technologies (ICECIT), coordinator for MSIT and the Incubation Center, Head of the Department, Vice Principal and President of the Teachers' Association, and he is presently serving as Principal, JNTUA College of Engineering, Kalikiri. He has published over 120 research papers in international conferences and journals. Eight scholars have received their PhD degrees under his guidance. He has co-authored two engineering textbooks: Data Mining: Principles and Approaches (Elsevier, ISBN: 978-93-82291-49-7) and Programming with Java (Pearson/Sanguine, ISBN: 978-81-317-5834-2). He received a grant of 10,66,000/- from the UGC and completed a Major Research Project (MRP) titled ‘Cloud computing framework for rural health care in Indian scenario’. In April 2016 he visited New York, USA, to present a research paper at the IEEE Big Data Security conference held at Columbia University. His areas of interest include pattern recognition and image analysis, data mining and cloud computing.