CIO OPINION
Watching an artist develop a painting over
several weeks and seeing the canvas change
as they work towards their vision has many
parallels to trying out lines of code and
testing different algorithms.
I studied artificial intelligence (AI) at the
multidisciplinary Centre for Cognitive
Sciences at the University of Sussex and
taught interaction design at the Royal
College of Art, so I can see the strong
association between the creative arts,
programming and data science.
During the presentation, I discussed why
data science is often described as an art,
given the need to adopt an exploratory
workflow, and the significant challenges
data scientists face as they work.
These include:
• Clearly defining a problem that
may initially be ill-defined
• Identifying and preparing the
relevant data (described as data curation
and feature engineering)
• Choosing the algorithmic approach to take
• Adjusting these elements based on your
experience of running your system
Harvard Business Review recently
noted that, “if your data is bad, your
machine learning tools are useless.”
When you see examples of data science
failing, such as specialist cancer centre
MD Anderson’s use of IBM Watson, the
reason often comes down to failings
across the three principles of good data,
right data and in-time data.
In MD Anderson’s case, the cancer centre
placed the project on hold after issues with
data fed into Watson meant that it failed
to expedite clinical decision-making and
match patients to clinical trials.
How does data science work
in practice?
In many industries, data science is used
extensively in operations, whether in the
retail sector to personalise the customer’s
experience, or in the life sciences to aid
research processes when looking for a cure
for the likes of Alzheimer’s disease.
Whatever its use, creative data science
is delivered through a multidisciplinary
workplace, which brings together domain
experts, specialists in creating taxonomies
and ontologies, and experts who focus on
how best to apply the right algorithms
that make up AI approaches. The following
examples show how this is done:
Rare disease treatment: Taking highly
curated data enabling predictions about
which drugs can be repurposed to treat
rare diseases.
Translational safety: Determining where
animal testing can be avoided in research
and development when testing drug toxicity.
Evidence selection: Applying neural
networks to identify complex evidence-based
statements to choose the right data for
building data science models.
Real-world data interpretation: Bringing
together machine learning classification of
text with taxonomies, alongside images,
to deliver learning across multimodal data
sets. These approaches create opportunities
to develop classification of data sources with
Supporting data science using three
key principles
While these challenges may seem
overwhelming, data scientists who follow
three key principles will be able to
overcome the obstacles they encounter:
Good data: Cleaned and curated to remove
noise, with feature engineering such as
scaling or reducing dimensionality applied
where needed. Companies should spend a
great deal of effort on data curation to make
the lives of data scientists easier.
Right data: Having enough of the relevant
data for a hypothesis to be able to build
a predictive model. Problem description
is important (defining the hypothesis or
model you want to explore), as is the choice
of algorithmic approach (are you choosing
Naïve Bayes, support vector machines, or
logistic regression?).
In-time data: Avoiding waiting hours between
process steps. Create a platform that brings
this together, so you have the information at
your fingertips when you need it.
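
To make the first two principles concrete, here is a minimal sketch in Python, using only the standard library. It scales features (one of the feature engineering steps mentioned under "Good data") and then tries two of the algorithmic approaches named under "Right data", Naïve Bayes and logistic regression, on the same held-out data. Everything in it is an illustrative assumption: the toy data is synthetic, and the function names (fit_scaler, run_demo and so on) are hypothetical rather than taken from any particular library or from the projects described above.

```python
import math
import random


def fit_scaler(rows):
    """Learn per-feature mean and standard deviation from training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(rows, means, stds):
    """Apply z-score scaling so every feature is on a comparable scale."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def fit_naive_bayes(X, y):
    """Gaussian Naive Bayes: per-class prior plus per-feature mean and variance."""
    model = {}
    for label in set(y):
        cols = list(zip(*[x for x, t in zip(X, y) if t == label]))
        means = [sum(c) / len(c) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(cols[0]) / len(X), means, variances)
    return model


def predict_naive_bayes(model, x):
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression trained with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_logistic(w, b, x):
    return 1 if sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0


def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def make_toy_data(n, rng):
    """Synthetic two-class data whose two features sit on very different scales."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(10 + 6 * label, 2), rng.gauss(1000 + 600 * label, 200)])
        y.append(label)
    return X, y


def run_demo(seed=0):
    rng = random.Random(seed)
    X, y = make_toy_data(200, rng)
    X_train, y_train, X_test, y_test = X[:140], y[:140], X[140:], y[140:]
    # Fit the scaler on training data only, then reuse it on the test data.
    means, stds = fit_scaler(X_train)
    X_train, X_test = scale(X_train, means, stds), scale(X_test, means, stds)
    nb = fit_naive_bayes(X_train, y_train)
    w, b = fit_logistic(X_train, y_train)
    return {
        "naive_bayes": accuracy([predict_naive_bayes(nb, x) for x in X_test], y_test),
        "logistic_regression": accuracy([predict_logistic(w, b, x) for x in X_test], y_test),
    }


if __name__ == "__main__":
    for name, score in run_demo().items():
        print(f"{name}: held-out accuracy {score:.2f}")
```

Scoring both approaches on the same held-out rows is one simple way to support the exploratory workflow described earlier: change the data preparation or the algorithm, rerun, and compare. On real data the picture would rarely be this clean, and the curation effort would matter far more than the few lines of scaling shown here.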
INTELLIGENTCIO
www.intelligentcio.com