Algorithm Assurance Services

Introduction

Data-driven algorithms are progressively making decisions that were previously made by humans, thanks to machine learning’s growing capability. Algorithm assurance is the process used to determine whether particular algorithms adhere to their desired objectives and produce the desired results.

It is a particular kind of IT assurance that aids in risk management and control over the use of risky algorithms in both products and enterprises. These algorithms are frequently referred to in organisations as advanced analytics, artificial intelligence applications (AI), or just predictive models.

At Canarys, we see our algorithms in the same light as our workforce. We show empathy to our algorithms, just as we do for our people. They should receive the same mentoring, assistance with their development, and performance evaluations as we do for our staff. With our assistance, you can promote the growth of your algorithm legally, efficiently, and productively.

Data Curation and Verification

Data is the new language of AI-driven solutions. These solutions must be tried and tested for every modification in the input data to ensure the system runs without any hiccups. We often compare this with conventional testing methods, where any modification, even a slight change in the code, triggers testing of the revised code. When evaluating AI-based solutions, it is crucial to take the following into account:

Developing curated training data sets

These are semi-automatically curated collections of input and output data. It is essential to conduct static analysis of data dependencies so that data sources and features can be annotated, a crucial prerequisite for data migration and deletion.
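
As a minimal sketch of this kind of annotation (the record fields, feature names, and source identifier below are hypothetical), each curated example can carry its source and feature lineage so that deletion and migration requests can be traced back through the data dependencies:

```python
from dataclasses import dataclass, field

@dataclass
class CuratedRecord:
    """One curated training example with provenance annotations."""
    inputs: dict                      # feature name -> value
    label: str                        # curated output label
    source: str                       # originating data source (hypothetical name)
    features_used: list = field(default_factory=list)
    deletable: bool = True            # consulted for deletion/migration requests

records = [
    CuratedRecord(
        inputs={"age": 54, "blood_pressure": 132},
        label="high_risk",
        source="clinic_feed_v2",      # hypothetical source identifier
        features_used=["age", "blood_pressure"],
    ),
]

# Static check: every feature a record depends on must be a known, annotated feature.
known_features = {"age", "blood_pressure", "bmi"}
for record in records:
    unknown = set(record.features_used) - known_features
    assert not unknown, f"Unannotated feature dependencies: {unknown}"
```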

Developing test data sets

Each test data set is carefully constructed to exercise every relevant combination and permutation and to determine whether the trained models are effective. As training progresses and the data becomes richer, the model is further improved.
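
A minimal sketch of enumerating such combinations, assuming three hypothetical input features with curated value ranges:

```python
from itertools import product

# Hypothetical value ranges for three input features.
age_bands = ["18-39", "40-64", "65+"]
smoker_flags = [True, False]
regions = ["urban", "rural"]

# Every combination becomes one test case for the trained model.
test_cases = [
    {"age_band": a, "smoker": s, "region": r}
    for a, s, r in product(age_bands, smoker_flags, regions)
]

print(len(test_cases), "test combinations generated")   # 3 x 2 x 2 = 12
```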

Developing system validation test suites

These are based on algorithms and test data sets. For example, for a system built to predict patient outcomes from pathology or diagnostic reports, scenarios must be designed around risk profiling of patients for the concerned disease, patient demographics, patient treatment, and other similar test scenarios.
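
These scenarios translate naturally into a parametrised validation suite. The sketch below uses pytest; the predict_outcome stub and the expected risk labels are assumptions for illustration, not the actual system:

```python
import pytest

def predict_outcome(patient):
    """Stand-in for the deployed prediction model (hypothetical logic)."""
    return "high_risk" if patient["age"] > 60 and patient["smoker"] else "low_risk"

# Each tuple is one validation scenario: a patient profile and the expected prediction.
SCENARIOS = [
    ({"age": 72, "smoker": True,  "region": "urban"}, "high_risk"),
    ({"age": 30, "smoker": False, "region": "rural"}, "low_risk"),
]

@pytest.mark.parametrize("patient,expected", SCENARIOS)
def test_patient_risk_scenarios(patient, expected):
    assert predict_outcome(patient) == expected
```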

Reporting of test results

Reporting must be done statistically, because validating ML-based algorithms produces range-based accuracy (confidence scores) rather than the exact, deterministic results expected of conventional software. Testers must define and determine the confidence thresholds for each outcome within a given range.
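
A minimal sketch of threshold-based reporting, assuming the model exposes a confidence score per prediction and that the thresholds shown are values agreed by the test team:

```python
# Hypothetical per-class confidence thresholds defined by the testers.
THRESHOLDS = {"high_risk": 0.80, "low_risk": 0.70}

# Each validation result pairs the predicted label with the model's confidence score.
results = [
    {"predicted": "high_risk", "confidence": 0.91},
    {"predicted": "low_risk",  "confidence": 0.64},
]

passed = [r for r in results if r["confidence"] >= THRESHOLDS[r["predicted"]]]
pass_rate = len(passed) / len(results)
print(f"{len(passed)}/{len(results)} predictions met their confidence threshold "
      f"({pass_rate:.0%})")
```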

The development of impartial systems has become crucial for contemporary businesses. Supervised learning techniques, which currently account for more than 70% of AI use cases, depend on labelled data that can be contaminated with human reasoning and biases. This makes evaluating how "bias-free" the input training data sets are a double-edged sword: if the evaluation is skipped, data biases can enter the picture.

These biases can be reduced by a priori testing of the input labelled data for hidden patterns, spurious correlations, heteroscedasticity, and so on. Let's analyse some primary biases that testers must consider when conducting AI/ML testing.
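
One lightweight a priori check, sketched here with pandas on a hypothetical labelled data set, is to flag features whose correlation with the label is suspiciously high, for example an identifier column that accidentally leaks the label:

```python
import pandas as pd

# Hypothetical labelled data: record IDs were assigned per source batch,
# so they accidentally encode the label.
df = pd.DataFrame({
    "record_id":      [1001, 2001, 1002, 2002, 1003, 2003],
    "blood_pressure": [120, 125, 130, 118, 140, 135],
    "label":          [0, 1, 0, 1, 0, 1],
})

# Absolute correlation of each feature column with the label.
correlations = df.drop(columns="label").corrwith(df["label"]).abs()

# Flag spurious-looking correlations before any training starts.
suspicious = correlations[correlations > 0.9]
print("Potentially spurious correlations:\n", suspicious)
```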

Data bias

Often, the data that we use to train the model is extremely skewed. The most common example is sentiment analysis: most data sets do not have an equal (or sufficient) number of data points for the different types of sentiment. Hence, the resulting model is skewed and "biased" toward the sentiments that have larger data sets.
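
A quick way to surface this kind of skew before training, sketched with pandas and an illustrative label distribution:

```python
import pandas as pd

# Hypothetical sentiment-labelled training data.
labels = pd.Series(["positive"] * 800 + ["negative"] * 150 + ["neutral"] * 50)

distribution = labels.value_counts(normalize=True)
print(distribution)

# Flag classes that fall below a minimum share (e.g. 20% for a three-class problem).
MIN_SHARE = 0.20
under_represented = distribution[distribution < MIN_SHARE]
if not under_represented.empty:
    print("Under-represented sentiments:", list(under_represented.index))
```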

Prediction bias

In a system that is working as intended, the distribution of predicted labels should equal that of the observed labels. While this is not a comprehensive test, it is a surprisingly useful diagnostic step. Changes in metrics such as this are often indicative of an issue that requires attention. For example, this method can help detect cases in which the system’s behavior changes suddenly. In such cases, training distributions drawn from historical data are no longer reflective of the current reality.
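
A minimal sketch of this diagnostic, assuming the observed and predicted labels are available and using total variation distance as the drift metric (the labels and tolerance are illustrative):

```python
from collections import Counter

observed  = ["spam"] * 120 + ["ham"] * 880   # labels observed historically
predicted = ["spam"] * 310 + ["ham"] * 690   # labels produced by the live model

def label_distribution(labels):
    total = len(labels)
    return {k: v / total for k, v in Counter(labels).items()}

obs_dist, pred_dist = label_distribution(observed), label_distribution(predicted)

# Total variation distance between the two label distributions.
tvd = 0.5 * sum(abs(obs_dist.get(k, 0) - pred_dist.get(k, 0))
                for k in set(obs_dist) | set(pred_dist))

TOLERANCE = 0.05   # hypothetical alerting threshold
print(f"Prediction bias (TVD) = {tvd:.2f}:", "ALERT" if tvd > TOLERANCE else "ok")
```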

Relational bias

Users are typically limited and biased in how they think a data pattern or problem set should be solved, based on their view of which solution has worked for a similar kind of problem before. This can skew the solution towards what the user is comfortable with, avoiding complex or less familiar approaches.

While data biases need to be resolved, as explained above, we should also look at under-fitting or over-fitting of the model on the training data, which happens far too often and results in poor model performance. The ability to measure the extent of over-fitting is crucial to ensuring the model has generalised the solution effectively and that the trained model can be deployed to production.
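
A minimal sketch of such an over-fitting gate, built with scikit-learn on synthetic data and a tolerance value chosen purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
val_acc = model.score(X_val, y_val)
gap = train_acc - val_acc

MAX_GAP = 0.10   # hypothetical tolerance agreed with the test team
print(f"train accuracy={train_acc:.2f}, validation accuracy={val_acc:.2f}, gap={gap:.2f}")
if gap > MAX_GAP:
    print("WARNING: model appears over-fitted; review training data and regularisation")
```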

Canarys Algorithm Assurance Offerings

Canarys has broadly classified its Algorithm Assurance (AI & ML) testing offerings into four categories:

Algorithms
  • Natural Language Processing/Understanding
  • Image Processing
  • Machine Processing
  • Deep Learning
Data Creation and Curation
  • Domain Specific Data
  • Cleansing and identifying sample data set
  • Contextual data clusters
  • Data denoising
  • Data labelling
Smart Interaction Testing
  • Devices (Alexa, Siri etc.)
  • AR/VR
Real-Life Testing
  • Human Unbiased testing
  • Challenger Model (Algorithm accuracy testing)
  • Decision analysis (Explainable AI)
  • Deployment and accessibility Testing
  • Test Triangle (Unit, Service, and UI)
  • White Box and Black Box Testing
  • Model back-testing

While we foresee that Explainable AI (XAI) and AutoML techniques will significantly improve testing effectiveness going forward, we will focus here on only some of the techniques that need to be used in real-life testing from a model and data set perspective.

Black Box and White Box testing

As with conventional test techniques, testing ML models comprises both Black Box and White Box testing. It can be hard to find training data sets extensive enough to address the requirements of ML testing.

Data scientists test the model performance during the model development phase by contrasting the model outputs (predicted values) with the actual values.

The following are a few methods for Black Box testing of ML models:

Model performance testing

This involves testing with test data/new data sets and comparing the model's performance, in terms of parameters such as precision, recall, F-score, and the confusion matrix (true/false positives and true/false negatives), against the pre-determined accuracy with which the model was originally built and moved into production.
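
A sketch of such a check with scikit-learn metrics; the labels, predictions, and the production baseline value below are stand-ins for illustration:

```python
from sklearn.metrics import precision_recall_fscore_support, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # labels of the new test data set
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]   # predictions from the deployed model

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")

# Compare against the accuracy recorded when the model originally went to production.
PRODUCTION_F1_BASELINE = 0.85   # hypothetical baseline
if f1 < PRODUCTION_F1_BASELINE:
    print("Model performance has degraded relative to the production baseline")
```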

Metamorphic testing

This aims to solve the test oracle problem. A test oracle is the mechanism a tester uses to determine whether a system responds correctly; the oracle problem arises when it is challenging to ascertain whether the actual output matches the expected outcomes of the chosen test cases. Metamorphic testing works around this by checking relations that must hold between the outputs of related inputs, rather than checking each output against a known expected value.
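
As a minimal sketch of a metamorphic relation (the predict_sentiment stub is hypothetical), we verify that a semantically neutral change to the input does not change the predicted label, without needing to know the "correct" label in advance:

```python
def predict_sentiment(text: str) -> str:
    """Stand-in for the model under test (hypothetical keyword-based stub)."""
    return "positive" if "good" in text.lower() else "negative"

def test_metamorphic_case_insensitivity():
    # Relation: changing the letter case should not change the sentiment.
    source = "The service was good and fast"
    follow_up = source.upper()
    assert predict_sentiment(source) == predict_sentiment(follow_up)

def test_metamorphic_neutral_suffix():
    # Relation: appending a neutral phrase should not flip the sentiment.
    source = "The service was good and fast"
    follow_up = source + " Thanks."
    assert predict_sentiment(source) == predict_sentiment(follow_up)
```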

Dual coding/Algorithm ensemble

Multiple models using different algorithms are built, and their predictions on the same input data set are compared. For example, to assemble a typical model for a classification problem, multiple algorithms such as Random Forest or a neural network such as an LSTM could be used; the model that produces outcomes closest to what is expected is finally chosen as the default model.
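
A minimal sketch of dual coding with scikit-learn on synthetic data; for brevity the second model is a logistic regression rather than the LSTM mentioned above, and the agreement check is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two candidate models built with different algorithms on the same data.
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

rf_pred, lr_pred = rf.predict(X_test), lr.predict(X_test)
print(f"Models agree on {np.mean(rf_pred == lr_pred):.0%} of the test inputs")

# The model closer to the expected outcomes would be promoted as the default.
print("RandomForest accuracy:      ", rf.score(X_test, y_test))
print("LogisticRegression accuracy:", lr.score(X_test, y_test))
```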

Coverage Guided Fuzzing

Data fed into the ML models is designed to test all the feature activations. For instance, for a model built with neural networks, testers need test data sets that could result in the activation of each of the neurons/nodes in the neural network.
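
A minimal sketch of measuring neuron coverage for a toy ReLU layer built with NumPy; the random weights stand in for a trained model, and a real coverage-guided fuzzer would keep and mutate inputs that activate previously uncovered neurons:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hidden layer (4 inputs -> 8 neurons) standing in for a trained network.
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)

def hidden_activations(x):
    """ReLU activations of the 8 hidden neurons for one input vector."""
    return np.maximum(0, x @ W1 + b1)

# Fuzzed test inputs.
test_inputs = rng.normal(size=(100, 4))

activated = np.zeros(8, dtype=bool)
for x in test_inputs:
    activated |= hidden_activations(x) > 0   # neuron fired for at least one input

print(f"Neuron coverage: {activated.mean():.0%} "
      f"({activated.sum()}/8 neurons activated)")
```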

NFR (Non-Functional Requirements) testing

In addition to performance and security testing, the deployment approach and whether the test data provides a representative sample of production inputs also need to be considered while testing ML models. Questions such as: How do we replace an existing model in production? What is our view on A/B testing or challenger models? must also be answered.
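
One common way to approach the challenger question, sketched below with hypothetical champion and challenger stubs, is to shadow-run both models on the same requests and log disagreements before any switch-over is made:

```python
def champion_model(features):
    """Stand-in for the model currently serving production traffic."""
    return "approve" if features["score"] >= 600 else "decline"

def challenger_model(features):
    """Stand-in for the candidate replacement model."""
    return "approve" if features["score"] >= 580 else "decline"

requests = [{"score": s} for s in (550, 590, 610, 700)]

disagreements = []
for request in requests:
    live = champion_model(request)        # decision actually returned to users
    shadow = challenger_model(request)    # recorded only, never served
    if live != shadow:
        disagreements.append((request, live, shadow))

print(f"{len(disagreements)}/{len(requests)} requests would change decision")
```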

Reach Us

With Canarys,
Let’s Plan. Grow. Strive. Succeed.