Codeit “Layers” – ASC Conference Preview

19 October 2018

Digital Taxonomy will be presenting at the ASC One-Day Conference in London on Thursday 15th November. This blog post is a sneak preview of what we'll be talking about.

Codeit provides all the necessary tools that allow people to code verbatims in the traditional Market Research way.  However, Codeit also contains an advanced Artificial Intelligence system that, with a bit of configuration and training, can automatically code verbatims without the need for human intervention.

The Codeit AI System

The Artificial Intelligence built into Codeit comprises a system of several layers, each with its own specific functions and capabilities.

This architecture grew out of several years of experimentation and testing. During the development of Codeit, we discovered that no single technique is ideal in all circumstances. Different techniques offer different benefits, and a system that blends these techniques will get the best results by playing to the strengths of each.

The layers in the system are as follows:


1. Text Matching

The text matching layer in Codeit is the simplest and easiest to understand. 

This layer seeks to find exact prior examples of a given verbatim in the corpus of training data. For example, if a respondent gives the answer: “Good service”, Codeit will look for previous examples where respondents have given the exact same whole verbatim. If sufficient examples exist, then Codeit will make suggestions based on how this text has been coded in the past.  

This kind of matching is very useful in situations where responses are typically short and repetitive – the classic example is brand coding. Surprisingly, Machine Learning (see below) often struggles with this kind of coding. Partly this is because typical, real-world training data (coded by humans) always contains a percentage of miscoding. Faced with this contradictory information and with little else to go on, Machine Learning can struggle to make sense of the data. Another complication arises when different brands have very similar names (for example, “Cherry Coke” and “Diet Coke”): machine learning can struggle to differentiate between the two.

Codeit will also attempt to match the verbatim text against the text labels in the codeframe. An exact match between the verbatim and a codeframe item will result in Codeit suggesting that item. For example, if a respondent gives the response “Good service” and the codeframe contains an item labelled “Good service”, then this will trigger a suggestion from Codeit.
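To make the idea concrete, here is a minimal sketch of exact matching against prior examples and codeframe labels. It is not Codeit's actual implementation: the function names, the `min_examples` threshold and the case/whitespace normalisation are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def normalise(text):
    """Lower-case and collapse whitespace (an assumed, very light normalisation)."""
    return " ".join(text.lower().split())

def build_index(coded_examples):
    """Map each normalised verbatim to a tally of the codes applied to it."""
    index = defaultdict(Counter)
    for verbatim, code in coded_examples:
        index[normalise(verbatim)][code] += 1
    return index

def suggest_by_text_match(verbatim, index, codeframe, min_examples=2):
    """Suggest codes from whole-verbatim matches in the history and the codeframe."""
    key = normalise(verbatim)
    suggestions = []
    tally = index.get(key, Counter())
    # Only trust the history when enough prior examples exist.
    if sum(tally.values()) >= min_examples:
        suggestions.append(tally.most_common(1)[0][0])
    # An exact match with a codeframe label also triggers a suggestion.
    for code, label in codeframe.items():
        if normalise(label) == key and code not in suggestions:
            suggestions.append(code)
    return suggestions

codeframe = {1: "Poor service", 3: "Good service"}
# The third example is deliberately miscoded, as real training data often is.
history = [("Good service", 3), ("good service", 3), ("Good service", 1)]
index = build_index(history)

print(suggest_by_text_match("Good  Service", index, codeframe))  # [3]
```

Taking the most common code among identical prior verbatims is one simple way to stop an occasional miscoded example from driving the suggestion.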

2. Rules Matching  

Codeit allows you to define text matching rules, which the rules matching layer applies to verbatims during processing. For example, you may wish to define a rule: “If ‘Excellent Service’ appears anywhere within a verbatim, then code this as (3) Good service”.

The rules matching in Codeit allows you to be quite sophisticated with these rules. For example, you may wish to define a rule like this: 

(Good OR Excellent OR Brilliant OR Great) NEAR Service => (3) Good Service.

The advantage of this technique is that it will work even in the absence of any training data or prior examples. It is also deterministic, i.e. it will always give the same result every time so there is a clear link between the inputs and the outputs. 

Lastly, this technique is great if there are any specific “Hot Topics” you are looking to catch in the data. You can write rules that describe the hot topic you’re looking for and Codeit will flag any verbatim in which a respondent mentions it.
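As an illustration, the “(Good OR Excellent OR Brilliant OR Great) NEAR Service” rule above could be sketched as follows. This is not Codeit's rule syntax or engine: `near_rule` is a hypothetical helper, and reading NEAR as “within three words, in either direction” is an assumption.

```python
import re

def tokenize(text):
    """Split a verbatim into lower-case words."""
    return re.findall(r"[a-z']+", text.lower())

def near_rule(any_of, anchor, code, window=3):
    """Build a rule that returns `code` when any term in `any_of`
    occurs within `window` words of `anchor`, and None otherwise."""
    terms = {t.lower() for t in any_of}
    anchor = anchor.lower()

    def rule(verbatim):
        words = tokenize(verbatim)
        anchor_pos = [i for i, w in enumerate(words) if w == anchor]
        term_pos = [i for i, w in enumerate(words) if w in terms]
        hit = any(abs(a - t) <= window for a in anchor_pos for t in term_pos)
        return code if hit else None

    return rule

good_service = near_rule(["good", "excellent", "brilliant", "great"], "service", code=3)

print(good_service("The service was really excellent"))          # 3
print(good_service("Great food, shame about the slow service"))  # None
```

Because the rule is a pure function of the verbatim, the same input always produces the same code, and no training data is needed.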

3. Text Analytics

Using NLP (natural language processing), Codeit will attempt to break up longer verbatims and separate the key useful elements from the surrounding ‘noise’.

For example, if a respondent gives the verbatim: 

“They give good service, but it was expensive and the staff were rude”

Then the Text Analytics within Codeit will extract the following key segments (marked here in square brackets):

“They give [good service], but it was [expensive] and the [staff were rude]”

Codeit is then able to treat these segments as stand-alone items of text. So, in the case of “good service”, it will be able to match this against the previous examples of “good service”. Similarly, Codeit may also be able to match the other two segments against prior examples or text matching rules.

Text Analytics is therefore useful when there is no training data or prior example to work from. It can also extract meaning from longer verbatims where the Text Matching and Rules Matching techniques might often fail. 

The text analytics in Codeit is also capable of detecting a wide range of other types of segment. For example, it can detect profanity or names of people which can then be flagged and edited out of the verbatims before reporting. 
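The segmentation idea can be sketched very crudely in a few lines. This is only a stand-in for real NLP, not a description of how Codeit works: the split pattern and the list of leading “noise” words are hand-picked assumptions, and `extract_segments` is a hypothetical name.

```python
import re

# Assumed, hand-picked lists; a real text analytics layer would be far richer.
SPLIT_PATTERN = re.compile(r"[,.;!?]|\b(?:but|and|although|however)\b", re.IGNORECASE)
LEADING_NOISE = {"they", "it", "i", "we", "the", "a", "an",
                 "was", "were", "is", "give", "gave"}

def extract_segments(verbatim):
    """Split on punctuation and conjunctions, then trim leading filler words."""
    segments = []
    for chunk in SPLIT_PATTERN.split(verbatim):
        words = chunk.split()
        while words and words[0].lower() in LEADING_NOISE:
            words.pop(0)
        if words:
            segments.append(" ".join(words))
    return segments

print(extract_segments(
    "They give good service, but it was expensive and the staff were rude"
))
# ['good service', 'expensive', 'staff were rude']
```

Each segment returned can then be passed on as a stand-alone item, for example to an exact text match or a rule.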

4. Machine Learning

Codeit has a machine learning layer built into the AI system. 

This layer will look at all of the previously coded examples given to it and create a predictive coding model.  Using this model, it can take a given input verbatim (or segment derived from the Text Analytics) and predict the codes that can be applied. 

Because this layer is using advanced machine learning techniques it gains a much more sophisticated “understanding” of how to code than the simpler techniques above. 

As a simple example, the system may learn, by association, that the term “expensive” is synonymous with “pricey”, “not cheap”, “too costly” and so on.

In this way the machine learning layer becomes capable of learning and inference at a level much closer to human interpretation.
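The post does not say which machine learning technique Codeit uses, so the following is only a toy illustration of learning from coded examples. It uses a tiny naive Bayes classifier as a stand-in; the class name, the training data and the code labels are all invented for the example.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """A toy multinomial naive Bayes model over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # code -> word -> count
        self.code_counts = Counter()             # code -> number of examples
        self.vocab = set()

    def train(self, coded_examples):
        for text, code in coded_examples:
            words = text.lower().split()
            self.code_counts[code] += 1
            self.word_counts[code].update(words)
            self.vocab.update(words)

    def predict(self, text):
        """Return (most likely code, its normalised probability)."""
        words = [w for w in text.lower().split() if w in self.vocab]
        total = sum(self.code_counts.values())
        log_scores = {}
        for code, n in self.code_counts.items():
            denom = sum(self.word_counts[code].values()) + len(self.vocab)
            score = math.log(n / total)
            for w in words:
                # Add-one smoothing so unseen word/code pairs are not impossible.
                score += math.log((self.word_counts[code][w] + 1) / denom)
            log_scores[code] = score
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())

model = TinyNaiveBayes()
model.train([
    ("too expensive", "Price"), ("very pricey", "Price"),
    ("not cheap at all", "Price"), ("too costly", "Price"),
    ("good service", "Service"), ("friendly helpful staff", "Service"),
    ("great service", "Service"),
])

print(model.predict("a bit pricey")[0])  # Price
```

The model has never seen “a bit pricey”, but because “pricey” was previously coded alongside “expensive” and “costly”, it associates the word with the same code.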

5. Human Coders

No automated system can be expected to automatically code 100% of items. 

There will always be outliers, exceptions and novel expressions given to us by respondents.  These can, of course, be ignored, but they may, in fact, contain new emerging themes and insights that do need to be captured. 

Codeit makes it easy for people to act as the “fallback” coding method. Any items that cannot be sufficiently handled by the AI layers can fall to an expert human coder. The primary advantage of this is that nothing gets missed. However, a very useful byproduct is that the human coding feeds further training examples back to the AI, so its model improves incrementally over time.
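Putting the layers together, the fallback and feedback loop might look something like the sketch below. How Codeit actually combines its layers is not described here, so trying the layers in a fixed order, the confidence `threshold`, and every name in the code are assumptions for illustration only.

```python
def code_verbatim(verbatim, layers, ask_human, training_data, threshold=0.8):
    """Try each automated layer in turn; fall back to a human coder.

    Each layer is a function returning (code, confidence) or None.
    Returns (code, name of the layer that produced it).
    """
    for name, layer in layers:
        result = layer(verbatim)
        if result is not None and result[1] >= threshold:
            return result[0], name
    # No layer was confident enough: ask a person, and keep the
    # answer as a new training example for later retraining.
    code = ask_human(verbatim)
    training_data.append((verbatim, code))
    return code, "human"

# Stand-in layers, purely for demonstration.
def exact_match(verbatim):
    known = {"good service": 3}
    code = known.get(verbatim.lower())
    return (code, 1.0) if code is not None else None

def unsure_model(verbatim):
    return (3, 0.4)  # always below the threshold

layers = [("text matching", exact_match), ("machine learning", unsure_model)]
training_data = []

print(code_verbatim("Good service", layers, lambda v: 7, training_data))
# (3, 'text matching')
print(code_verbatim("Loved the new app", layers, lambda v: 7, training_data))
# (7, 'human')
print(training_data)
# [('Loved the new app', 7)]
```

Here the human coder is simulated by a function that always answers with code 7; in practice this would be a person working through a queue of unresolved items.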

