Intelligent CIO Europe Issue 08 | Page 52

CIO OPINION

Watching an artist develop a painting over several weeks, and seeing the canvas change as they work towards their vision, has many parallels with trying out lines of code and testing different algorithms. I studied artificial intelligence (AI) at the multidisciplinary Centre for Cognitive Sciences at the University of Sussex and taught interaction design at the Royal College of Art, so I can see the strong association between the creative arts, programming and data science.

During the presentation, I discussed why data science is described as an art, given the need to adopt an exploratory workflow, and the significant challenges data scientists face as they work. These include:

• Clearly defining a problem that may initially be ill-defined
• Identifying and preparing the relevant data (described as data curation and feature engineering)
• Choosing the algorithmic approach to take
• Adjusting these elements based on your experience of running your system

It was recently said in Harvard Business Review that, “if your data is bad, your machine learning tools are useless.” When you see examples of data science failing, such as specialist cancer centre MD Anderson’s use of IBM Watson, the cause often comes down to failings across three principles: good data, right data and in-time data. In MD Anderson’s case, the cancer centre placed the project on hold after issues with the data fed into Watson meant that it failed to expedite clinical decision-making and match patients to clinical trials.

How does data science work in practice?

For many industries, data science is used extensively in operations, whether in the retail sector to personalise the customer’s experience, or in the life sciences to aid research when looking for a cure for diseases such as Alzheimer’s.
Whatever its use, creative data science is delivered through a multidisciplinary workplace, one that brings together domain experts, specialists in creating taxonomies and ontologies, and experts who focus on how best to apply the algorithms that make up AI approaches. Some good examples of how this is done:

• Rare disease treatment: Taking highly curated data to enable predictions about which drugs can be repurposed to treat rare diseases.
• Translational safety: Determining where animal testing can be avoided in research and development when testing drug toxicity.
• Evidence selection: Applying neural networks to identify complex evidence-based statements to choose the right data for building data science models.
• Real-world data interpretation: Bringing together machine learning classification of text with taxonomies, alongside images, to deliver learning across multimodal data sets.

These approaches create opportunities to develop classification of data sources with

Supporting data science using three key principles

While these challenges may seem overwhelming, data scientists who follow the following three key principles will be able to overcome the obstacles they encounter:

• Good data: Cleaned and curated to remove noise, with feature engineering such as scaling or reducing dimensionality. Companies should spend a great deal of effort on data curation to make the lives of data scientists easier.
• Right data: Having enough relevant data for a hypothesis to be able to build a predictive model. Problem description is important (defining the hypothesis or model you want to explore), as is the choice of algorithmic approach (are you choosing Naïve Bayes, support vector machines or logistic regression?).
• In-time data: Avoiding waits of hours between process steps. Create a platform that brings this together, so you have the information at your fingertips when you need it.
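
To make the "scaling" part of the good data principle concrete, here is a minimal sketch in Python using only the standard library. The function name `standardize` and the sample figures are hypothetical, chosen purely for illustration; a real project would more likely rely on an established data science library.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a list of numbers to zero mean and unit variance (z-scores)."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        # A constant column has no spread to rescale; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical features measured on very different scales.
ages = [30, 40, 50]
incomes = [20000, 40000, 60000]

scaled_ages = standardize(ages)
scaled_incomes = standardize(incomes)
```

After scaling, both features sit on a comparable range, so an algorithm that is sensitive to magnitude (a support vector machine, for example) is not dominated by whichever column happens to have the largest raw numbers.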