
In this paper, we show how deep learning can be used for cancer detection and cancer-type analysis, and why it is superior to many previously applied machine learning methods such as the Artificial Neural Network, Bayesian Network, Support Vector Machine and Decision Tree. The technique is applied here to the classification of cancer types. In this domain we show that its performance is better than that of previous methods, promising a more comprehensive and generic approach to cancer detection and diagnosis. We also present a framework relating various cancer types to the classification methods applied to them. The size of the dataset was observed to be a factor in how well the cancer could be classified, and, at least theoretically, the deep learning method for classifying cancer was found to be better than the older classification methods.


Chapter 1


 

Introduction

 

Machine
learning, the ability of machines to learn
without being explicitly programmed, has proved to be a promising part of
artificial intelligence. Because of new computing technologies, machine
learning today is not like machine learning of the past. It was born from
pattern recognition and the theory that computers can learn without being
programmed to perform specific tasks; researchers interested in artificial
intelligence wanted to see if computers could learn from data. The
iterative aspect of machine learning is important because as models are exposed
to new data, they are able to independently adapt. They learn from previous
computations to produce reliable, repeatable decisions and results. It’s a
science that’s not new – but one that has gained fresh momentum.

Artificial intelligence has been at the root of solving problems in many fields such as economics, robotics, linguistics and medical diagnosis. Machine learning in cancer research dates back to the 20th century. Machine learning can be supervised, unsupervised or semi-supervised, the latter proving to be especially useful. New methods based on modified algorithms are being implemented for predicting and treating cancer. High accuracy is essential when predicting cancer because even one life matters: only a method with 100% accuracy guarantees that not a single prediction was wrong. The search for the best possible method of predicting cancer is therefore essential.

The artificial neural network (ANN) has long been the gold standard, the Bayesian network (BN) has proved good at predicting certain cancers such as colon cancer, and decision trees (DT) have been efficient as well. Deep learning is the promising method that has only recently been researched and has not yet been covered in many review papers. It uses a variety of optimization techniques that allow it to learn from past training and to detect complex patterns in large and complex data sets. Since cancer prediction requires a large dataset for training and testing, deep learning could be more efficient than, and in comparison to the other methods possibly the best choice.

Cancer
is the general name for a group of more than 100 diseases. Although cancer
includes different types of diseases, they all start because abnormal cells
grow out of control. Without treatment, cancer can cause serious health
problems and even loss of life. Early detection of cancer may reduce mortality
and morbidity. AI techniques are approaches used to produce and develop computer software programs. AI is an application that can re-create human perception; it normally requires obtaining input to endow the system with analysis or problem-solving ability, as well as the ability to categorize and identify objects. This paper describes various AI techniques, such as the support vector machine (SVM), neural networks, fuzzy models, the artificial neural network (ANN), and K-nearest neighbors (K-NN). Feedforward neural networks capable of classifying cancer cases with a high accuracy rate have become an effective tool. Computation time is fixed, and the parallel structure yields extremely high computation speed. Moreover, the approach is fault-tolerant because network knowledge is distributed. General solutions can be learned from the presented training data, and neural networks eliminate the need to produce an explicit model of a process. They can even model parts of a process that cannot otherwise be modeled or are usually not identified, and they can learn from incomplete and noisy data.


Chapter 2

 

Background

 

The various machine learning methods covered in the referenced papers are discussed below:

 The purely supervised learning algorithms:

1. Logistic Regression – using Theano for something simple

2. Multilayer Perceptron – introduction to layers

3. Deep Convolutional Network – a simplified version of LeNet5

The unsupervised and semi-supervised
learning algorithms:

· Auto Encoders, Denoising Autoencoders – description of autoencoders

· Stacked Denoising Auto-Encoders – easy steps into unsupervised pre-training for deep nets

· Restricted Boltzmann Machines – single layer generative RBM model

· Deep Belief Networks – unsupervised generative pre-training of stacked RBMs followed by supervised fine-tuning

All of these will be discussed ahead; as a concrete starting point for the first of them, a minimal logistic-regression sketch is given below.
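The sketch below implements logistic regression trained by gradient descent in plain NumPy (the referenced tutorial uses Theano, which has since been discontinued); the synthetic data, learning rate and number of steps are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # 200 samples, 5 features
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ true_w + 0.1 * rng.normal(size=200) > 0).astype(float)

w, b, lr = np.zeros(5), 0.0, 0.1
for step in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)            # gradient of the cross-entropy loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

print("training accuracy:", ((p > 0.5) == y).mean())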

2.1 Methods

Restricted
Boltzmann Machine

A restricted Boltzmann machine (RBM) is a two-layer generative network, consisting of a layer of visible units and a layer of hidden units with connections only between (not within) the layers, that learns a probability distribution over its inputs. In our cancer data analysis problem, for each patient, we call the measured genomic data from each platform (e.g., gene expression, miRNA expression and DNA methylation) the genomic profile.
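As a rough illustration of how an RBM could be trained on such profiles (binarized for simplicity), the sketch below implements one step of contrastive divergence (CD-1) in NumPy; the layer sizes, learning rate and random data are assumptions made purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def sample_hidden(self, v):
        p_h = sigmoid(v @ self.W + self.b_h)
        return p_h, (rng.random(p_h.shape) < p_h).astype(float)

    def sample_visible(self, h):
        p_v = sigmoid(h @ self.W.T + self.b_v)
        return p_v, (rng.random(p_v.shape) < p_v).astype(float)

    def cd1_update(self, v0, lr=0.05):
        # positive phase
        p_h0, h0 = self.sample_hidden(v0)
        # negative phase: one step of Gibbs sampling
        p_v1, v1 = self.sample_visible(h0)
        p_h1, _ = self.sample_hidden(p_v1)
        # approximate gradient and parameter update
        self.W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
        self.b_v += lr * (v0 - p_v1).mean(axis=0)
        self.b_h += lr * (p_h0 - p_h1).mean(axis=0)

# toy usage: 100 binary profiles with 20 features, 8 hidden units
X = (rng.random((100, 20)) < 0.3).astype(float)
rbm = RBM(n_visible=20, n_hidden=8)
for epoch in range(50):
    rbm.cd1_update(X)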


Artificial Neural Network

ANN

a. Input layer

The input layer receives the values of the explanatory attributes for each observation. Usually, the number of input nodes is equal to the number of explanatory variables. The patterns are introduced to the network through this layer, which communicates them to one or more 'hidden layers'. Nodes of this layer do not change the data; they receive a single value on their input and duplicate that value to their many outputs, the hidden nodes.

b. Hidden layer

The hidden layers, which can be many in number, apply given transformations to the input values inside the network. Each hidden node connects with outgoing arcs to output nodes or to other hidden nodes. Here the actual processing is done via a system of weighted 'connections': the values entering a hidden node are multiplied by weights, a set of predetermined numbers stored in the program, and the weighted inputs are then added to produce a single number.

c. Output layer

Output layers are linked from the hidden layers. They receive connections from the hidden layers or from the input layer and return an output value that corresponds to the prediction of the response variable. In classification problems, there is usually only one output node. Data is changed in this layer of the network. The ability of the neural network to provide useful data manipulation lies in the proper selection of the weights.
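The three layers described above can be summarized in a short forward-pass sketch; the layer sizes, the random weights and the single sigmoid output node are illustrative assumptions rather than a trained model.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W_hidden, b_hidden, W_out, b_out):
    # hidden layer: weighted inputs are summed and transformed
    h = sigmoid(x @ W_hidden + b_hidden)
    # output layer: one node for a binary (e.g. benign/malignant) prediction
    return sigmoid(h @ W_out + b_out)

rng = np.random.default_rng(0)
x = rng.random(9)                       # e.g. 9 explanatory attributes
W_hidden = rng.normal(size=(9, 5))      # 9 input nodes -> 5 hidden nodes
b_hidden = np.zeros(5)
W_out = rng.normal(size=(5, 1))         # 5 hidden nodes -> 1 output node
b_out = np.zeros(1)
print(forward(x, W_hidden, b_hidden, W_out, b_out))  # value in (0, 1)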

Some sub
methods of ANN:

a) Fuzzy neural network: A neuro-fuzzy system is represented as a special three-layer feedforward neural network. The first layer corresponds to the input variables, the second layer symbolizes the fuzzy rules, and the third layer represents the output variables. The fuzzy sets are encoded as (fuzzy) connection weights.

b) k-NN: In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression; a small classification sketch is given after this list.

c) Multilayer perception (MLP): A multilayer perceptron (MLP) is a
class of feedforward artificial neural network. An MLP consists of at least
three layers of nodes. Except for the input nodes, each node is a neuron that
uses a nonlinear activation function.

d)  Self-organising
map: A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to
produce a low-dimensional (typically two-dimensional), discretized
representation of the input space of the training samples, called a map, and is therefore a method to
do dimensionality reduction.
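The classification sketch referred to in (b) above uses scikit-learn and its bundled copy of the Wisconsin Diagnostic Breast Cancer data; the library, the dataset and the choice of k = 5 are assumptions for illustration only, not the Weka setup used later in this paper.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# predict the class held by the majority of the 5 closest training examples
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("k-NN test accuracy:", knn.score(X_test, y_test))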

 

Support Vector Machine (SVM):

A Support
Vector Machine (SVM) is a discriminative classifier formally defined by a
separating hyperplane. In other words, given labeled training data (supervised
learning), the algorithm outputs an optimal hyperplane which categorizes new
examples.
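A minimal sketch of training such a classifier is given below, assuming scikit-learn with a linear kernel; this paper's experiments actually use Weka's SMO, so the dataset, kernel and C value here are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
svm = SVC(kernel="linear", C=1.0)        # linear kernel: a separating hyperplane
scores = cross_val_score(svm, X, y, cv=10)   # 10-fold cross-validation
print("mean CV accuracy:", scores.mean())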

Hybrid network

k-SVM: It uses John Platt's SMO algorithm for solving the SVM QP problem on most SVM formulations. For the spoc-svc, kbb-svc, C-bsvc and eps-bsvr formulations, a chunking algorithm based on the TRON QP solver is used. For multiclass classification with k classes, k > 2, ksvm uses the 'one-against-one' approach, in which k(k-1)/2 binary classifiers are trained and the appropriate class is found by a voting scheme. The spoc-svc and kbb-svc formulations deal with multiclass classification by solving a single quadratic problem involving all the classes. If the predictor variables include factors, the formula interface must be used to get a correct model matrix.

 

 

Decision Tree:

A decision tree is a decision support tool that uses
a tree-like graph or model
of decisions and their
possible consequences, including chance event outcomes, resource costs, and
utility. It is one way to display an algorithm that only contains conditional
control statements.

C4.5/J48: C4.5 is an algorithm used to
generate a decision tree developed by Ross Quinlan. C4.5 is an extension of
Quinlan’s earlier ID3 algorithm. The decision trees generated by C4.5 can be
used for classification, and for this reason, C4.5 is often referred to as a
statistical classifier.
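The sketch below trains a decision tree in scikit-learn; note that scikit-learn implements CART rather than C4.5/J48, so this is only an approximate stand-in, and the entropy criterion and depth limit are arbitrary illustrative choices.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# information-gain style splits, limited depth to keep the tree readable
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
scores = cross_val_score(tree, X, y, cv=10)
print("mean CV accuracy:", scores.mean())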

 

 

 

Bayesian network (BN):

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model
is a probabilistic graphical model (a type of statistical model) that
represents a set of variables and their conditional dependencies via a directed
acyclic graph.

Suppose that there are two events which could cause the grass to be wet: either the sprinkler is on or it is raining. Also, suppose that the rain has a direct effect on the use of the sprinkler (namely that when it rains, the sprinkler is usually not turned on). Then the situation can be modeled with a Bayesian network over three variables, each of which has two possible values, T (for true) and F (for false).
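A small worked version of this example is sketched below. The conditional probability tables are illustrative values commonly used for the sprinkler example (they are not given in this paper); the code computes P(Rain = T | GrassWet = T) by enumerating the joint distribution.

from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},    # P(Sprinkler | Rain=T)
               False: {True: 0.4, False: 0.6}}     # P(Sprinkler | Rain=F)
P_wet = {(True, True): 0.99, (True, False): 0.90,  # P(Wet=T | Sprinkler, Rain)
         (False, True): 0.80, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    p_w = P_wet[(sprinkler, rain)]
    return P_rain[rain] * P_sprinkler[rain][sprinkler] * (p_w if wet else 1 - p_w)

# P(Rain=T | Wet=T) = sum_s joint(T, s, T) / sum_{r,s} joint(r, s, T)
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print("P(Rain | grass wet) =", num / den)   # about 0.36 with these tables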

 

Deep learning

 

Deep Learning is a new area of Machine Learning research, introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.

 

 Deep
learning allows computational models that are composed of multiple processing
layers to learn representations of data with multiple levels of abstraction.
These methods have dramatically improved the state of the art in speech recognition, visual object recognition, object detection and many other domains such as drug
discovery and genomics. Deep learning discovers intricate structure in large
data sets by using the backpropagation algorithm to indicate how a machine
should change its internal parameters that are used to compute the
representation in each layer from the representation in the previous layer.
Deep convolutional nets have brought about breakthroughs in processing images,
video, speech and audio, whereas recurrent nets have shone light on sequential
data such as text and speech.
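The sketch below illustrates, in plain NumPy, how backpropagation adjusts the internal parameters that compute each layer's representation in a small multi-layer network; the layer sizes, learning rate and synthetic data are illustrative assumptions, not the configuration of any reviewed system.

import numpy as np

rng = np.random.default_rng(0)
sizes = [30, 16, 8, 1]                       # input -> hidden -> hidden -> output
Ws = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(b) for b in sizes[1:]]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = rng.random((200, 30))                    # synthetic "profiles"
y = (X.mean(axis=1) > 0.5).astype(float).reshape(-1, 1)

lr = 0.5
for epoch in range(500):
    # forward pass: keep each layer's activation for the backward pass
    acts = [X]
    for W, b in zip(Ws, bs):
        acts.append(sigmoid(acts[-1] @ W + b))
    # backward pass: cross-entropy loss with a sigmoid output gives this delta
    delta = (acts[-1] - y) / len(X)
    for i in reversed(range(len(Ws))):
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        delta = (delta @ Ws[i].T) * acts[i] * (1 - acts[i])  # propagate backwards
        Ws[i] -= lr * grad_W
        bs[i] -= lr * grad_b

print("training accuracy:", ((acts[-1] > 0.5) == y).mean())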


Performance Measures used:

• Sensitivity = tp / (tp + fn)

• Specificity = tn / (fp + tn)

• Accuracy = (tp + tn) / (tp + fp + fn + tn)

• Area under the curve (AUC)

 

where tp are true positive, fp false
positive, fn false negative, and tn true negative counts.
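These measures translate directly into code; a minimal sketch with made-up example counts is given below.

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (fp + tn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + fp + fn + tn)

# illustrative confusion-matrix counts
tp, fp, fn, tn = 90, 5, 10, 95
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, tn, fp, fn))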

 

Tools used:

a)     
Datasets from the UCI repository: the Wisconsin original breast cancer dataset, the breast cancer dataset (long), the colon cancer dataset and the lung cancer dataset; all datasets were preprocessed, with the encoding changed (in Java), before being run through Weka.

 

b)     
WEKA application software: Weka contains a collection of visualization tools and
algorithms for data
analysis and predictive
modeling, together with
graphical user interfaces for easy access to these functions. The original
non-Java version of Weka was a Tcl/Tk
front-end to (mostly third-party) modeling algorithms implemented in other
programming languages, plus data
preprocessing utilities
in C, and a Makefile-based
system for running machine learning experiments. This original version was
primarily designed as a tool for analyzing data from agricultural
domains, but the more recent fully Java-based version (Weka 3), for which
development started in 1997, is now used in many different application areas,
in particular for educational purposes and research.
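Weka reads datasets in the ARFF format. The sketch below shows one way such a file could be produced from raw rows; the attribute names, values and file name are placeholders (the actual preprocessing for this paper was done in Java, as noted above).

# write a tiny ARFF file that Weka can open directly
rows = [
    [5, 1, 1, "benign"],
    [8, 10, 10, "malignant"],
]

header = """@relation breast-cancer-toy
@attribute clump_thickness numeric
@attribute cell_size numeric
@attribute cell_shape numeric
@attribute class {benign,malignant}
@data
"""

with open("breast-cancer-toy.arff", "w") as f:
    f.write(header)
    for row in rows:
        f.write(",".join(str(v) for v in row) + "\n")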

 

Advantages
of Weka include:

a)     
Free availability under the GNU
General Public License.

b)     
Portability, since it is fully implemented in
the Java
programming language and thus runs on almost any modern
computing platform.

c)     
A comprehensive collection of data
preprocessing and modeling techniques.

d)    
Ease of use due to its graphical user
interfaces.

 

 

 

 

Chapter 3

 

Literature
Review

 

3.1 Classifying tumor cells

Physicians can benefit from the extracted abstract tumor attributes by better understanding the properties of different types of tumors [2]. Different kinds of machine learning and statistical approaches are used to classify tumor cells [14]. Hybrid methods have proved to be very accurate: the K-SVM methodology, a hybrid of K-means clustering and SVM, improves the accuracy to 97.38% when tested on the Wisconsin Diagnostic Breast Cancer (WDBC) data set from the University of California, Irvine machine learning repository. The results show both diagnostic capability and time savings during the training phase [2].

 

 

 

3.2 Various techniques used for prediction

According to the better designed and validated studies, machine learning methods have been shown to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality [27]. Even though some progress has been achieved, many challenges and directions for further research remain, such as developing better classification algorithms and integrating classifiers to reduce false positives [1]. Using automated computer tools, and in particular machine learning, to facilitate and enhance medical analysis and diagnosis is a promising and important area [3]. A review article surveyed the applications, opportunities and barriers of intelligent data analysis as an approach to improving cancer care management. It showed that Intelligent Data Analysis (IDA) has a significant role in improving cancer care and prevention, increasing the speed and accuracy of diagnosis and treatment, and reducing costs, demonstrating in every way that machine learning is a promising route to detecting cancer. It has also been noticed that different methods provide high accuracies for different types of cancer.

 

Neural networks are currently among the most active research areas in medical science, especially in cardiology, radiology, oncology and urology. Classification of normal, abnormal and cancerous cells by an artificial neural network produces more accurate results than manual screening methods such as the Pap smear and the liquid cytology based (LCB) test [14]. The ANN, however, is an older technique, and better machine learning techniques are available [10]. A growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on older techniques such as artificial neural networks (ANNs) and the support vector machine (SVM) can be noticed, instead of more recently developed or more easily interpretable machine learning methods [27]. The motivations for using ensemble classifiers are that the results are less dependent on the peculiarities of a single training set and that the ensemble system outperforms the best base classifier in the ensemble [26]. Results of neural network structures can be enhanced by proper settings of the network parameters. Although neural network techniques provide a good classification rate, their training time is very high. Several researchers therefore hybridize neural network techniques with optimization algorithms such as PSO for further enhancement of accuracy. The optimization algorithms are used for dimensionality reduction; they shrink the search space and therefore reduce the training time of the neural network. FLANN alone shows 63.4% accuracy, whereas PSO-FLANN provides a good classification rate of 92.36%. In future studies, the accuracy of a neural network could be enhanced by increasing the number of neurons in the hidden layer, and different training and learning rules could be applied for training the ANN in order to improve classifier performance [14].

 

The results in paper [13] show that as the number of training samples increases, the false positive and false negative rates decrease, and the authors propose increasing the number of patients tested when implementing their proposed method, called GONN. In one paper [6], SSL proved to be the best among ANN and SVM, and the differences in performance were statistically significant [6].

 

3.3 Gap and superior method

Deep learning, specifically the deep neural network, has been considered a superior method by several researchers. DeeperBind (mentioned in [20]), an application using a deep learning method, can model the positional dynamics of probe sequences and can be trained and tested on datasets containing sequences of different lengths. It has been claimed to be the most accurate pipeline for predicting the binding specificities of DNA sequences from data produced by high-throughput technologies, by utilizing the power of deep learning [23]. A database, HGMD, consisting of information about germline mutations in nuclear genes, has been compared with other related databases such as OMIM and ClinVar and was found to be superior [9]. Data could therefore be collected from this database as well, though the data used in this paper come from the UCI repository, including the Wisconsin Breast Cancer original dataset. In paper [20], among the four main algorithms (SVM, NB, k-NN and C4.5) applied to the Wisconsin Breast Cancer (original) dataset, SVM proved its efficiency in breast cancer prediction and diagnosis and obtained the best performance in terms of precision and low error rate [20]. However, deep learning methods were not applied to that dataset, so we are interested in applying deep learning techniques to the same dataset used in the mentioned paper.


Chapter 4

 

Methodology

Running methods via Weka

To find out which machine learning method is superior, Weka, a software package that collects machine learning algorithms, was used. The targeted methods, SVM (run as SMO), ANN (as MultilayerPerceptron), decision tree (as DecisionStump) and Bayes network (specifically NaiveBayes), were run on the breast, colon and lung cancer datasets. The Wisconsin original dataset for breast cancer was run through Weka to recheck the results.

The following screenshots show some of the results:

ANN for the Wisconsin original dataset (showing "Correctly Classified Instances", i.e. an accuracy of 96.2264%).

SVM for the same dataset.

SVM (SMO) for the colon cancer dataset.

SVM for the lung cancer dataset (which has the highest accuracy, i.e. the lowest rate of incorrectly classified instances, 4.4235%).

Chapter 5

 

Results
and Discussion


Method                        Sub-method     Paper no.   Accuracy   Sensitivity   Specificity
Decision Tree                 -              5           0.936      0.958         0.907
Decision Tree                 -              17          0.93       -             -
Artificial Neural Network     -              6           0.65       0.73          0.58
Artificial Neural Network     -              17          0.835      -             -
Multi-layer Perceptron (MLP)  -              5           0.947      0.956         0.928
Support Vector Machine        -              1           0.6456     1             0.6449
Support Vector Machine        -              6           0.51       0.65          0.52
Support Vector Machine        -              5           0.957      0.971         0.945
Support Vector Machine        -              17          0.69       -             -
Support Vector Machine        -              17          0.75       -             -
Bayesian Network              -              17          -          -             -
Deep Learning                 -              -           0.921      0.887         0.941
Semi-supervised Learning      -              6           0.71       0.76          0.65
Semi-supervised Learning      Graph-based    17          0.807      -             -
Semi-supervised Learning      Graph-based    17          0.767      -             -

Table: A framework of accuracy of different cancer prediction techniques

Each entry corresponds to one of the cancer categories studied: colon, breast, oral, basal cell or lung cancer.

 

 

For the Wisconsin original breast cancer dataset, according to the findings of this paper, the highest accuracy could be achieved using ANN. However, with ANN it was not possible to obtain results from huge datasets: with Weka the computation would take hours or more, showing that ANN is inappropriate for huge datasets.

It can also be seen that for long datasets the accuracy is very low; for example, for the colon cancer dataset the highest accuracy obtained was via SMO, at 85.4839%, and J48/C4.5 (a decision tree method) gives about 83%, clearly indicating that a better method is needed for long datasets to reach higher accuracy.

MLP (multilayer perceptron) gives 97.1% accuracy, PNN (probabilistic neural network) provides 96% accuracy, the perceptron gives 93%, and ART1 shows 92% accuracy as well [14].

The "Recall" value shown in the screenshots is the sensitivity of each method.

 

 

 

Chapter
6

 

Conclusion

It can clearly be seen from the screenshots and the calculated results above that ANN is a comparatively old approach for most classification purposes (the exceptions being short datasets), while the Support Vector Machine proves to be superior for the colon and lung cancer datasets.

Further, deep learning methods can and should be applied, as their more sophisticated approach promises better accuracy.