Machine Learning Confidence Score

03 July 2019

In Codeit Version 4.11, users can now display a Machine Learning Confidence score for each coding suggestion. The score represents a measure of the confidence that a specific category should be applied to a verbatim text snippet. In this blog post, we will explain how the Confidence score is calculated, using a simple example, and how we use the score to improve Codeit’s Machine Learning accuracy.

The goal of the Machine Learning classifier is to predict whether a verbatim should be tagged with a given category from the codeframe.

For example, given this codeframe:

  • 1: Screen

  • 2: Sound

And this training data:

  • “Good screen, good sound”: 1, 2

  • “Great screen”: 1

  • “Dolby 7.0!”: 2

The classifier learns to predict each category based on the training data.

After training, it outputs a Confidence score (a probability) between 0 and 1 for each suggestion. This number represents how confident Codeit is in that particular suggestion.

For example, for the verbatim “great screen and dolby 7.0”, the classifier might output two suggestions: 1: Screen with a Confidence of 0.9, and 2: Sound with a Confidence of 0.95.

In this case the model is very confident because the verbatim closely matches our training data.

On the other hand, for the verbatim “the bass was heavy”, our model might output the suggestion 2: Sound, but with a Confidence of only 0.5, because none of its words appear in our training data.
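
The post does not describe Codeit’s actual model, so the following is only a minimal sketch of how per-category Confidence scores can arise. It assumes a one-vs-rest setup: one binary logistic regression per category over simple bag-of-words counts, trained by stochastic gradient descent. The function names (`tokenize`, `train_one_vs_rest`, `confidences`) are hypothetical, and the training data assumes the codeframe 1: Screen, 2: Sound, with “Great screen” coded 1 and “Dolby 7.0!” coded 2.

```python
import math
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case a verbatim and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def sigmoid(z):
    """Numerically stable logistic function, mapping any score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_one_vs_rest(examples, codes, epochs=200, lr=0.5):
    """Train one binary logistic-regression model per code (hypothetical helper).

    examples: list of (verbatim, set_of_codes) pairs.
    Returns {code: (weights, bias)}, where weights maps token -> weight.
    """
    models = {}
    for code in codes:
        weights = defaultdict(float)
        bias = 0.0
        for _ in range(epochs):
            for text, applied in examples:
                tokens = tokenize(text)
                target = 1.0 if code in applied else 0.0
                p = sigmoid(bias + sum(weights[t] for t in tokens))
                step = lr * (target - p)  # stochastic gradient step on the log-loss
                bias += step
                for t in tokens:  # a repeated token is updated once per occurrence
                    weights[t] += step
        models[code] = (dict(weights), bias)
    return models

def confidences(models, text):
    """Return {code: Confidence} for a verbatim; unseen words contribute nothing."""
    tokens = tokenize(text)
    return {
        code: sigmoid(bias + sum(weights.get(t, 0.0) for t in tokens))
        for code, (weights, bias) in models.items()
    }

# Training data from the example (the codes on the last two rows are assumed).
CODEFRAME = {1: "Screen", 2: "Sound"}
TRAINING = [
    ("Good screen, good sound", {1, 2}),
    ("Great screen", {1}),
    ("Dolby 7.0!", {2}),
]

models = train_one_vs_rest(TRAINING, CODEFRAME)
for verbatim in ["great screen and dolby 7.0", "the bass was heavy"]:
    scores = confidences(models, verbatim)
    print(verbatim, {CODEFRAME[c]: round(s, 2) for c, s in scores.items()})
```

Because none of the words in “the bass was heavy” occur in the training data, this sketch scores it with each model’s bias alone. With only three training verbatims it also tends to learn “dolby” as evidence against Screen, so its scores will not reproduce the illustrative 0.9 and 0.95 above; a real codeframe is trained on far more verbatims.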

Understanding how the Confidence score is calculated

To understand how the Confidence score is calculated and how we can use it to improve the classifier, let’s use a simple example.

We’re going to use the Iris flower dataset from the great Python library scikit-learn. The Iris dataset contains data for three types of Iris (Setosa, Versicolour, and Virginica). We will build a classifier that learns to distinguish Setosa from the other two types based on features such as Sepal length and width.

Example of an Iris flower with a Sepal and Petal


In our training data, we have 50 Setosa Irises and 100 examples of other Irises, each with a known Sepal length and width. The goal of our classifier is to learn how to distinguish a Setosa Iris from other Irises using their Sepal length and width.

Below, we have plotted our training data by Sepal length and width.

Using this training data, the classifier can learn to predict if an Iris is a Setosa or not.

On the graph below:

  • The green zone represents where the classifier is highly confident an Iris is a Setosa. The Confidence score is close to 1.

  • The red zone is where the classifier, conversely, is highly confident an Iris is not a Setosa. The Confidence score is also close to 1.

  • The orange zone reflects where the classifier is uncertain, so the Confidence score is lower.
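
The three zones can be expressed as a small function. This sketch assumes a binary classifier that outputs the probability p that an Iris is a Setosa; the Confidence of its prediction is then the probability of the predicted class, max(p, 1 - p), which is why it is close to 1 in both the green and red zones. The name `zone` and the 0.8 cut-off for the orange zone are assumptions for illustration (0.8 borrows the rule of thumb this post gives for Codeit suggestions).

```python
def zone(p_setosa, threshold=0.8):
    """Map the probability that an Iris is a Setosa to (colour zone, Confidence).

    The Confidence is the probability of whichever class is predicted, so it is
    high both where the model is sure of Setosa and where it is sure of not-Setosa.
    `threshold` is an assumed cut-off below which a point counts as uncertain.
    """
    if not 0.0 <= p_setosa <= 1.0:
        raise ValueError("p_setosa must be a probability between 0 and 1")
    is_setosa = p_setosa >= 0.5
    confidence = p_setosa if is_setosa else 1.0 - p_setosa
    if confidence < threshold:
        return "orange", confidence
    return ("green" if is_setosa else "red"), confidence

for p in (0.97, 0.6, 0.4, 0.03):
    print(p, zone(p))
```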

To predict whether a new Iris is a Setosa, we first need to measure its Sepal length and width. Depending on which zone it falls in, we can then tell whether it is a Setosa and, more importantly, how confident our classifier is about its prediction.

For example, suppose we measure a new Iris with a Sepal width of 4.2cm and a Sepal length of 3.8cm. The new Iris is therefore represented by the vector (4.2, 3.8), which can be placed on our Confidence graph.

We can see that it is in the green zone, so the classifier is very confident the Iris is a Setosa.
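
To make the Iris example concrete, here is a minimal sketch of a Setosa-versus-rest classifier: logistic regression on the two sepal measurements, fitted with plain batch gradient descent using only the standard library. The choice of model is an assumption (the post does not say which classifier produced its graphs), and `train_setosa_classifier` and `predict` are hypothetical names. Points are ordered (sepal length, sepal width), the same column order scikit-learn’s Iris data uses, so the new Iris measured above (width 4.2cm, length 3.8cm) becomes `(3.8, 4.2)`.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_setosa_classifier(points, is_setosa, epochs=2000, lr=0.1):
    """Fit p(Setosa | sepal length, sepal width) by logistic regression.

    points: list of (sepal_length_cm, sepal_width_cm) tuples.
    is_setosa: list of bools, True for Setosa.
    Returns (means, stds, weights, bias).
    """
    n = len(points)
    # Standardise both features so that gradient descent converges smoothly.
    means = [sum(p[i] for p in points) / n for i in range(2)]
    stds = [
        math.sqrt(sum((p[i] - means[i]) ** 2 for p in points) / n) or 1.0
        for i in range(2)
    ]
    xs = [[(p[i] - means[i]) / stds[i] for i in range(2)] for p in points]
    ys = [1.0 if label else 0.0 for label in is_setosa]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        # Gradient of the mean log-loss over the whole training set.
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b + w[0] * x[0] + w[1] * x[1]) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[i] - lr * grad_w[i] / n for i in range(2)]
        b -= lr * grad_b / n
    return means, stds, w, b

def predict(model, point):
    """Return (is_setosa, confidence) for a (sepal_length_cm, sepal_width_cm) point.

    The Confidence is the probability of the predicted class, so it is close
    to 1 deep inside either the Setosa or the non-Setosa region.
    """
    means, stds, w, b = model
    x = [(point[i] - means[i]) / stds[i] for i in range(2)]
    p = sigmoid(b + w[0] * x[0] + w[1] * x[1])
    return p >= 0.5, max(p, 1.0 - p)
```

In practice you would train it on the 150 labelled Iris samples (for example from scikit-learn’s `load_iris`, keeping the first two columns and marking target 0, Setosa, as `True`) and then call `predict(model, (3.8, 4.2))`.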

As you can see, the Orange zone (where the classifier’s Confidence score is lower) is very wide. For Irises in this zone, our classifier cannot confidently predict whether an Iris is a Setosa or not.

If we look at the training data once more, we can see points in the Orange zone whose labels could be wrong:

As a researcher, I can go back to my training data and check that the Iris types of the highlighted points are indeed correct.

To improve a classifier’s accuracy efficiently, we need to focus on checking the data near the Orange zone and tell the classifier which labels are definitely correct. This matters because the points near the Orange zone have the most impact on our classifier.
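
Finding the points worth checking can be automated. This is a minimal sketch, assuming you have some function `p_setosa(point)` that returns the classifier’s probability that a point is a Setosa: it lists the training points whose Confidence falls below a threshold, least confident first, and marks those where the prediction contradicts the recorded label. The name `points_to_review`, the 0.8 threshold and the disagreement flag are assumptions, not part of the workflow described in this post.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (sepal_length_cm, sepal_width_cm)

def points_to_review(
    points: Sequence[Point],
    is_setosa: Sequence[bool],
    p_setosa: Callable[[Point], float],
    threshold: float = 0.8,
) -> List[Tuple[int, float, bool]]:
    """List training points in the uncertain zone, least confident first.

    Each entry is (index, confidence, disagrees). `disagrees` is True when the
    classifier's prediction contradicts the recorded label, a hint that the
    label itself may be wrong and should be checked first.
    """
    flagged = []
    for i, (point, label) in enumerate(zip(points, is_setosa)):
        p = p_setosa(point)
        confidence = max(p, 1.0 - p)  # probability of the predicted class
        if confidence < threshold:  # the point lies in the orange zone
            disagrees = (p >= 0.5) != label
            flagged.append((i, confidence, disagrees))
    flagged.sort(key=lambda item: item[1])  # least confident first
    return flagged
```

Any model works here as long as it exposes a probability; working down the returned list gives an order in which to re-check labels.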

It may turn out that some of the points in our training data were indeed mislabelled. We can then correct them and train our classifier again. Having done this, we end up with the Confidence zones below:

We can see that the Orange zone is now narrower than in the first graph.

By focusing our efforts on correcting the data on the edge of the uncertainty zone, we have managed to easily and efficiently tweak the classifier to be more accurate. 


Introducing Confidence score in Codeit

The same principles discussed above can also be applied in Codeit for text classification.

In the newest version of Codeit (4.11), you now have access to the Confidence score of each coding suggestion. To display it, right-click the codeframe panel and select “Show Suggestions (all)”.

Show Suggestions filter in the codeframe panel

All the suggestions are then highlighted with a simple “Hover and Click” interface.

Hovering over the question mark displays the corresponding Code and the Confidence score for each suggestion.

Examples of Highlighted Suggestions in Codeit

As we saw in the Iris example, it is best to focus on suggestions with lower Confidence scores (below 0.8) when correcting the training data, as these are the examples with the most impact on our classifier.


Using the Confidence Score

The Confidence score is a probability output by the classifier, with a value between 0 and 1.

A Confidence score of 1 means that the classifier is 100% confident (certain) that the category should be applied. Users can generally rely on this being a correct classification.

In general, any suggestion with a Confidence score above 0.8 should be applied.

Between 0.5 and 0.8, there will be a mix of correct and incorrect suggestions. This is the range where most effort should go in order to improve the classifier.

Finally, below 0.5, suggestions will generally be wrong, except for very frequent categories.
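
These rules of thumb translate directly into a triage step. The sketch below assumes suggestions are available as (verbatim, code, Confidence) records; the `Suggestion` type and the function names are hypothetical, not a Codeit export format or API. It applies the thresholds stated above (above 0.8 apply, 0.5 to 0.8 review, below 0.5 likely wrong) and builds a review queue from the middle band, lowest Confidence first, using the example suggestions from the start of this post.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Suggestion:
    verbatim: str
    code: str
    confidence: float  # classifier probability between 0 and 1

def triage(confidence: float) -> str:
    """Bucket a Confidence score using the thresholds described in this post."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("Confidence must be between 0 and 1")
    if confidence > 0.8:
        return "apply"
    if confidence >= 0.5:
        return "review"  # mix of correct and incorrect: where effort pays off most
    return "likely wrong"  # except, per the post, for very frequent categories

def review_queue(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Suggestions in the 0.5-0.8 band, lowest Confidence first."""
    return sorted(
        (s for s in suggestions if triage(s.confidence) == "review"),
        key=lambda s: s.confidence,
    )

examples = [
    Suggestion("great screen and dolby 7.0", "Screen", 0.9),
    Suggestion("great screen and dolby 7.0", "Sound", 0.95),
    Suggestion("the bass was heavy", "Sound", 0.5),
]
for s in examples:
    print(f"{s.verbatim!r} -> {s.code}: {s.confidence} ({triage(s.confidence)})")
print([s.verbatim for s in review_queue(examples)])
```

Because the post says “above 0.8”, a score of exactly 0.8 lands in the review band in this sketch.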

What’s Next?

To further improve the usability of, and interactivity between, our Machine Learning and Codeit’s User Interface, we plan to display even more metrics in a future release (soon!) to help users understand which specific categories to focus their efforts on. As part of our ongoing development to improve our users’ efficiency and productivity, we are also continuously adding new data filters and making it possible for users to train more accurate Machine Learning classifiers.

For a demo of Codeit, please contact us at !
