Blog Series: Steps to building an Artificial Intelligence & Machine Learning capability

This is the second post in our 3-part series on building an Artificial Intelligence & Machine Learning (AI/ML) capability for the first time. In case you missed our post last week, the first article covered three items required before beginning development on an AI/ML initiative. This post focuses on the steps that help you prepare for development, recommendations on a development and model selection approach, and how to communicate results to stakeholders.

Set a specific objective.

Baseline stakeholder expectations by setting a very clear objective for the project. Doing so helps prevent scope creep during development and makes meaningful results more likely. One example may be to identify atypical, or fraudulent, transactions made in 2019 by a group of customers. Can the new capability identify fraud where it was formerly not found? Are there trends you never discovered before? With these new insights, how much loss does the organization avoid?

When setting the objective, clarify the scope enough that results can be shown to prove the AI/ML capability works. Establish key performance indicators (KPIs) to measure performance and, at the same time, create a common understanding of what success looks like.
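
As an illustration of turning an objective into measurable KPIs, the sketch below scores a fraud-detection capability on precision and recall and checks each against a target. The labels, predictions, and target values are hypothetical placeholders, not figures from a real project; your own KPIs may be entirely different measures.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels where 1 means fraud."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    flagged = true_pos + false_pos
    fraud_total = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / fraud_total if fraud_total else 0.0
    return precision, recall


# Hypothetical targets agreed with stakeholders at project onset.
KPI_TARGETS = {"precision": 0.75, "recall": 0.60}

# Hypothetical ground truth and model output for ten transactions.
actual = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

precision, recall = precision_recall(actual, predicted)
results = {"precision": precision, "recall": recall}

# One pass/fail flag per KPI gives everyone the same definition of success.
kpis_met = {name: results[name] >= target for name, target in KPI_TARGETS.items()}
print(results, kpis_met)
```

Agreeing on the targets before development starts is what makes the pass/fail flags meaningful later.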

Identify where the Artificial Intelligence & Machine Learning capability will live.

Working with Operations from the project onset prepares internal partners for possible changes to current systems. Their early inclusion enables a smoother handoff and the possibility of a more natural integration into existing systems. The following are a few approaches to consider: keep the AI/ML capability in a separate sandbox until performance is proven over time, run the capability in parallel with current functionality, or replace or supplement the current functions with the new AI/ML capability. Depending on the degree of integration required, you may want to go as far as enlisting a representative from Operations as a stakeholder.

Prepare for development.

Configuration management for both data and software/models is crucial. If multiple trials are run with various models and sets of data, poorly managing that complexity can lead to costly and time-consuming mistakes. The flavor of development environment is less important and varies widely among data scientists and Artificial Intelligence & Machine Learning practitioners. As your team begins to aggregate data, expect data preparation to take a significant amount of time, depending on the data quality. To give a perspective on the time required for data prep, Anaconda surveyed data scientists and found that 45% of their time is still spent on data loading and data cleansing. [1]
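
One lightweight way to keep trials traceable is to record, for every run, a fingerprint of the exact data used alongside the model settings and code version. The sketch below is a minimal illustration using only the Python standard library; the function names, field names, sample data, and version string are all hypothetical, and a dedicated version control or experiment tracking tool may serve a larger team better.

```python
import hashlib
import json


def fingerprint(data_bytes):
    """Return a short SHA-256 fingerprint identifying an exact dataset version."""
    return hashlib.sha256(data_bytes).hexdigest()[:12]


def record_trial(log, trial_name, data_bytes, model_name, params, code_version):
    """Append one record tying a trial to its exact data, model, and code."""
    entry = {
        "trial": trial_name,
        "data_fingerprint": fingerprint(data_bytes),
        "model": model_name,
        "params": params,
        "code_version": code_version,
    }
    log.append(entry)
    return entry


trial_log = []

# Hypothetical dataset contents; in practice, read the bytes of the real file.
raw_v1 = b"customer_id,amount\n1,20.00\n2,950.00\n"
raw_v2 = b"customer_id,amount\n1,20.00\n2,950.00\n3,15.50\n"

# Same model and settings, different data: the fingerprint exposes the change.
record_trial(trial_log, "trial-01", raw_v1, "logistic_regression", {"C": 1.0}, "abc123")
record_trial(trial_log, "trial-02", raw_v2, "logistic_regression", {"C": 1.0}, "abc123")

print(json.dumps(trial_log, indent=2))
```

If two trials disagree, the log shows at a glance whether the data, the parameters, or the code changed between them.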

Establish the development approach.

When developing a proof-of-concept (POC) capability, the best approach is iterative and adaptable, and success will likely take many trials. Establish checkpoints for reviewing POC performance and aim to produce a minimum viable product from the first iteration on. An agile methodology lends itself well to this type of effort, with the degree of process formality dependent on the appetite of the organization. Treat each iteration as an experiment, and use the hypothesis behind each iteration as an avenue to incorporate stakeholder expectations.

Select the software package or tool.

The amount of software development required when building a POC varies greatly depending on the package or tool selected. Free packages generally require a larger lift, either for setting up your environment or for preparing and visualizing the data. The most popular programming languages and packages are Python [SciPy, scikit-learn, TensorFlow, etc.], R [randomForest, caret, kernlab, etc.] and Java [Weka, Java-ML, etc.], but they require time and expertise to set up and use. Tools like Alteryx are easy to use right out of the box and are great for someone who does not want to code but needs more power than Excel. The expertise of your resources, the budget available and the timeline desired for results will drive which packages and tools work best.

Develop the Artificial Intelligence & Machine Learning capability.

The specific model to use is driven by the data that is available, the objectives and the expertise level of the practitioner. As a best practice, most practitioners start with a simpler model to baseline performance. Increase complexity only if performance on the business objective metrics is insufficient. If performance is not as expected after a few iterations, do not give up hope, as there may be promise in expanding the dataset to adjacent features or in utilizing another AI/ML method.
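
The baseline-first practice can be sketched in a few lines. The example below tries the simplest possible model first and moves to a slightly more complex one only when the target is missed. The transaction amounts, labels, and target accuracy are hypothetical, and both "models" are deliberately toy stand-ins written in plain Python rather than calls to any particular package.

```python
def majority_baseline(train_labels):
    """Simplest model: always predict the most common training label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return lambda amount: most_common


def fit_threshold_rule(train_amounts, train_labels):
    """One step up in complexity: flag amounts at or above the best single cutoff."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(train_amounts)):
        correct = sum(
            1 for x, y in zip(train_amounts, train_labels) if int(x >= cutoff) == y
        )
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return lambda amount: int(amount >= best_cutoff)


def accuracy(model, amounts, labels):
    """Fraction of examples the model labels correctly."""
    return sum(1 for x, y in zip(amounts, labels) if model(x) == y) / len(labels)


# Hypothetical transaction amounts; label 1 means fraud.
train_amounts = [12, 25, 40, 55, 900, 1200, 30, 1500]
train_labels = [0, 0, 0, 0, 1, 1, 0, 1]
test_amounts = [18, 60, 1100, 35, 950]
test_labels = [0, 0, 1, 0, 1]

TARGET_ACCURACY = 0.9  # hypothetical business target

# Candidates ordered from simplest to most complex.
candidates = [
    ("majority baseline", majority_baseline(train_labels)),
    ("threshold rule", fit_threshold_rule(train_amounts, train_labels)),
]

scores = {}
chosen = None
for name, model in candidates:
    scores[name] = accuracy(model, test_amounts, test_labels)
    if scores[name] >= TARGET_ACCURACY:
        chosen = name  # stop at the first model that meets the target
        break

print(scores, chosen)
```

Accuracy is used here only to keep the sketch short; when fraud is rare, a measure such as recall is usually a better fit, because a model that never flags anything can still score high on accuracy.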

Communicate results.

Circle back to the KPIs identified at the project onset and show any progress made. Compare business performance in the current state to the estimated performance provided by the AI/ML capability. Share how the new capability will work in the operational system, if adopted. If removing or reducing manual labor, highlight the extent of time savings for targeted users. If there appears to be lift, provide the estimated ROI once operational.
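
A back-of-the-envelope ROI estimate can make the comparison concrete for stakeholders. The sketch below assumes one common definition, first-year net gain divided by first-year total cost; every figure in it is a hypothetical placeholder, and your finance team may prefer a different formula or time horizon.

```python
def estimated_roi(annual_benefit, annual_run_cost, build_cost):
    """First-year ROI as a fraction: net gain divided by total cost."""
    total_cost = build_cost + annual_run_cost
    return (annual_benefit - total_cost) / total_cost


# Hypothetical inputs; replace with figures from your own project.
loss_avoided = 400_000.0  # fraud loss avoided per year
hours_saved = 1_000       # manual review hours removed per year
hourly_rate = 50.0        # loaded cost per review hour

annual_benefit = loss_avoided + hours_saved * hourly_rate
roi = estimated_roi(annual_benefit, annual_run_cost=50_000.0, build_cost=100_000.0)

print(f"Estimated first-year ROI: {roi:.0%}")
```

Keeping the benefit split into loss avoided and time saved lets stakeholders see which assumption drives the estimate.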

Next up in our Artificial Intelligence & Machine Learning Blog Series

Part 3, the final portion of the series, will guide you through the steps of transitioning the AI/ML capability to operations, establishing maintenance routines, and ensuring performance continues to meet requirements for your business.

Connect with Strive!

Here at Strive Consulting, our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Whether you’re interested in AI/ML implementation or an overall data and analytics assessment, Strive Consulting is dedicated to being your partner, committed to success.



  1. Anaconda 2020 State of Data Science Survey Report
