MURMURATE’S DATA LABELLING APPROACH

Understand how our approach to labelling data differs from others.
HAND-LABELLING DATA
Case Study: COVID-19 Misinformation in Tweets
A large training dataset (circa 1,000,000 tweets) is collected, containing examples of what the final AI model will need to recognise.
Next, criteria are established for labelling tweets as misinformation. At the most basic level, this might be: ‘Does this tweet include false information? If yes, label it as misinformation.’
Have people familiar with the topic (subject matter experts) go through each of the 1,000,000 tweets and apply these criteria until at least 50,000 instances of misinformation have been identified and labelled.
Presuming only 5% of tweets are misinformation, and that it takes 5 seconds to read each tweet and decide whether the criteria apply, this process would take around 1,388 hours of work for one person: 173 working days at 8 hours a day, or roughly 7.5 months. Even with 10 people labelling, it would take 17.3 working days, almost a month.
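The arithmetic behind these estimates can be checked with a short Python sketch. The 8-hour working day is an assumption inferred from 1,388 hours / 173 days, and `hand_labelling_estimate` is a hypothetical helper written only for this illustration. Unrounded, the figures come to about 1,388.9 hours, 173.6 days and 17.4 days; the text above truncates them.

```python
def hand_labelling_estimate(target_examples: int, prevalence_pct: int,
                            seconds_per_tweet: float, labellers: int = 1,
                            hours_per_day: float = 8.0):
    """Return (tweets to read, total labelling hours, working days per labeller)."""
    # At 5% prevalence, finding 50,000 examples means reading about 1,000,000 tweets.
    tweets_to_read = target_examples * 100 // prevalence_pct
    total_hours = tweets_to_read * seconds_per_tweet / 3600
    # Assumption: the reading splits evenly across labellers working in parallel.
    working_days = total_hours / (hours_per_day * labellers)
    return tweets_to_read, total_hours, working_days

tweets, hours, days_one = hand_labelling_estimate(50_000, 5, 5)
_, _, days_ten = hand_labelling_estimate(50_000, 5, 5, labellers=10)
print(tweets)             # 1000000
print(f"{hours:.1f}")     # 1388.9
print(f"{days_one:.1f}")  # 173.6
print(f"{days_ten:.1f}")  # 17.4
```

Running the same estimate for three labelling iterations gives over 500 working days for one person, which is why repeated hand-labelling quickly stretches into years.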
The bulk of the labelled data is used to train the AI model to predict misinformation on its own.
A smaller set of the hand-labelled data (known as a ground truth dataset) is held back and used to validate the trained AI model’s predictions.
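A minimal sketch of this hold-out step in Python, assuming a 10% ground-truth split (the text only says ‘a smaller set’) and toy data at the 5% misinformation rate used above. The helper names are hypothetical. Precision (how many flagged tweets really are misinformation) and recall (how much of the misinformation is caught) are two common ways to score predictions against the ground truth.

```python
import random

MISINFO, OTHER = "misinformation", "other"

def split_ground_truth(labelled, fraction=0.1, seed=42):
    """Shuffle labelled examples and hold out `fraction` of them as ground truth."""
    rows = list(labelled)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * fraction)
    return rows[cut:], rows[:cut]  # (training set, ground-truth set)

def precision_recall(predicted, actual, positive=MISINFO):
    """Compare model predictions with ground-truth labels for one class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labelled data: every 20th tweet (5%) is misinformation.
labelled = [(f"tweet {i}", MISINFO if i % 20 == 0 else OTHER) for i in range(1000)]
train, ground_truth = split_ground_truth(labelled)
print(len(train), len(ground_truth))  # 900 100

print(precision_recall([MISINFO, MISINFO, OTHER, OTHER],
                       [MISINFO, OTHER, MISINFO, OTHER]))  # (0.5, 0.5)
```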
If the AI model needs updating as the topics or language of COVID-19 misinformation change, the data must be re-labelled by hand all over again. Re-labelling time: around 1,388 hours.
MURMURATE’S DATA LABELLING
Case Study: COVID-19 Misinformation in Tweets
A large training dataset (circa 1,000,000 tweets) is collected, containing examples of what the final AI model will need to recognise.
Design a set of rules, called labelling functions, that decide which tweets are misinformation. Murmurate then automatically labels all 1,000,000 tweets according to these rules.
(Figure: two example labelling functions, a and b, each of the form ‘Label as misinformation if the text contains…’, built by combining keywords such as ‘vaccines’, ‘gates’, ‘bioweapon’, ‘hoax’, ‘microchip’, ‘control’, ‘lethal’, ‘who.org’ and ‘philanthropy’ with AND, OR and NOT.)
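Labelling functions are ordinary code that either votes for a label or abstains. The Python sketch below encodes three hypothetical keyword rules in the AND/OR/NOT style shown in the figure; the exact keyword combinations, the ABSTAIN = -1 convention and the function names are assumptions for illustration, not Murmurate’s actual rules.

```python
ABSTAIN, MISINFO = -1, 1

def lf_a(text: str) -> int:
    # Hypothetical rule: 'gates' AND ('microchip' OR 'control') AND NOT 'philanthropy'.
    t = text.lower()
    if "gates" in t and ("microchip" in t or "control" in t) and "philanthropy" not in t:
        return MISINFO
    return ABSTAIN

def lf_b(text: str) -> int:
    # Hypothetical rule: ('bioweapon' OR 'hoax') AND NOT 'who.org'.
    t = text.lower()
    if ("bioweapon" in t or "hoax" in t) and "who.org" not in t:
        return MISINFO
    return ABSTAIN

def lf_c(text: str) -> int:
    # Hypothetical rule: 'vaccines' AND 'lethal'.
    t = text.lower()
    return MISINFO if "vaccines" in t and "lethal" in t else ABSTAIN

LABELLING_FUNCTIONS = [lf_a, lf_b, lf_c]

def apply_labelling_functions(tweets, lfs=LABELLING_FUNCTIONS):
    """Label matrix: one row per tweet, one column per labelling function."""
    return [[lf(tweet) for lf in lfs] for tweet in tweets]

tweets = [
    "Bill Gates wants to microchip everyone",
    "The pandemic is a hoax",
    "Fact check from who.org: the pandemic is not a hoax",
    "Lovely weather today",
]
label_matrix = apply_labelling_functions(tweets)
# [[1, -1, -1], [-1, 1, -1], [-1, -1, -1], [-1, -1, -1]]
```

Plain substring matching is a simplification: ‘gates’ would also match ‘delegates’, so real rules would usually match whole words.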
No single rule, taken on its own, accurately detects misinformation. A labelling model therefore combines the labelling functions, comparing them against each other to assign each one a probabilistic weight; the resulting weighted labels are used to train the AI model.
(Figure: multiple labelling functions, a to f, feed into the labelling model, which finds the best combination of them and outputs weighted labels.)
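The sketch below shows one simplified way a labelling model can compare labelling functions against each other: each function’s accuracy is estimated from how often it agrees with the majority vote of the other functions, turned into a log-odds weight, and the weighted votes are combined into a probability per tweet. This is an illustrative heuristic with hypothetical names, not Murmurate’s labelling model; open-source weak-supervision tools such as Snorkel instead fit a generative label model for this step.

```python
import math

ABSTAIN, OTHER, MISINFO = -1, 0, 1

def _sign(vote):
    # Map a non-abstaining vote to +1 (misinformation) or -1 (other).
    return 1 if vote == MISINFO else -1

def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def fit_label_model(label_matrix):
    """Weight each labelling function by how often it agrees with the others."""
    weights = []
    for j in range(len(label_matrix[0])):
        agree = compared = 0
        for row in label_matrix:
            if row[j] == ABSTAIN:
                continue
            # Unweighted majority vote of the *other* functions on this tweet.
            others = sum(_sign(v) for k, v in enumerate(row) if k != j and v != ABSTAIN)
            if others == 0:
                continue  # no other votes, or a tie: nothing to compare against
            compared += 1
            agree += _sign(row[j]) == (1 if others > 0 else -1)
        # Laplace-smoothed accuracy, converted to a log-odds weight.
        acc = (agree + 1) / (compared + 2)
        weights.append(math.log(acc / (1 - acc)))
    return weights

def predict_proba(label_matrix, weights):
    """P(misinformation) per tweet: a weighted vote passed through a sigmoid."""
    return [_sigmoid(sum(w * _sign(v) for w, v in zip(weights, row) if v != ABSTAIN))
            for row in label_matrix]

L = [
    [MISINFO, MISINFO, ABSTAIN],
    [MISINFO, MISINFO, OTHER],
    [OTHER, OTHER, ABSTAIN],
    [OTHER, OTHER, MISINFO],
    [ABSTAIN, ABSTAIN, ABSTAIN],
]
weights = fit_label_model(L)    # ≈ [1.099, 1.099, -1.099], i.e. ±log(3)
probs = predict_proba(L, weights)  # ≈ [0.9, 0.964, 0.1, 0.036, 0.5]
```

A negative weight means a function disagrees with the others more often than not, so its votes count against its own label. Summing log-odds weights assumes the functions make independent errors, which is the main simplification here.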
Murmurate then automatically labels the 1,000,000 tweets with the generated labels, a process that can take as little as 3 to 4 hours.
The bulk of the labelled data is used to train the AI model to predict misinformation on its own.
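When the training labels are probabilities rather than hard yes/no answers, one common approach is to train against those soft targets directly with a cross-entropy loss. The pure-Python sketch below does this for a toy bag-of-words logistic regression; the model, features, learning rate, function names and example labels are all assumptions for illustration, and a real project would use a proper ML library and far more data.

```python
import math
from collections import Counter

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def featurise(text):
    # Bag-of-words counts: a deliberately simple stand-in for real features.
    return Counter(text.lower().split())

def train_on_soft_labels(texts, soft_labels, epochs=200, lr=0.5):
    """Logistic regression fitted by SGD to probabilistic targets in [0, 1]."""
    feats = [featurise(t) for t in texts]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(feats, soft_labels):
            p = _sigmoid(bias + sum(weights.get(w, 0.0) * c for w, c in x.items()))
            g = p - y  # gradient of cross-entropy w.r.t. the logit, for soft target y
            bias -= lr * g
            for w, c in x.items():
                weights[w] = weights.get(w, 0.0) - lr * g * c
    return weights, bias

def predict(weights, bias, text):
    """Predicted probability that a tweet is misinformation."""
    return _sigmoid(bias + sum(weights.get(w, 0.0) * c for w, c in featurise(text).items()))

texts = ["the pandemic is a hoax", "gates microchip plot",
         "lovely weather today", "new vaccine trial results"]
soft_labels = [0.9, 0.85, 0.1, 0.15]  # e.g. probabilities output by a labelling model
weights, bias = train_on_soft_labels(texts, soft_labels)
```

Training on the probabilities, rather than rounding them to 0 or 1, lets the model learn less from tweets the labelling model was unsure about.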
A smaller set of labelled data (known as a ground truth dataset) is used to validate the trained AI model’s predictions.
If the AI model needs updating as the topics or language of COVID-19 misinformation change, the data can be quickly re-labelled with adjusted rules. Re-labelling time: 3 to 4 hours.
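Because the labels come from code, adjusting them means editing the rules and running them again. A minimal Python sketch, assuming a hypothetical `keyword_lf` helper and a new rule added for a shift in the conversation (here, tweets about 5G):

```python
ABSTAIN, MISINFO = -1, 1

def keyword_lf(*keywords):
    """Build a hypothetical labelling function that fires on any of the keywords."""
    def lf(text):
        t = text.lower()
        return MISINFO if any(k in t for k in keywords) else ABSTAIN
    lf.__name__ = "lf_" + "_".join(keywords)
    return lf

def relabel(tweets, lfs):
    # Re-applying the rules is just another pass over the data: no manual review.
    return [[lf(t) for lf in lfs] for t in tweets]

lfs = [keyword_lf("hoax"), keyword_lf("bioweapon")]
tweets = ["It's all a hoax", "5G towers spread the virus", "Stay safe everyone"]
before = relabel(tweets, lfs)  # [[1, -1], [-1, -1], [-1, -1]]

# The conversation shifts: adjust the rules by adding a function, then re-label.
lfs.append(keyword_lf("5g"))
after = relabel(tweets, lfs)   # [[1, -1, -1], [-1, -1, 1], [-1, -1, -1]]
```

After re-labelling, the labelling model would be refitted and the AI model retrained on the new labels.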
TOTAL AI PROJECT TIME WITH HAND-LABELLING:
Several months, or even years.
(Figure: 1st, 2nd and 3rd labelling iterations, each spanning months on the calendar.)
TOTAL AI PROJECT TIME WITH MURMURATE:
A few hours, or a day or two.
(Figure: 1st, 2nd and 3rd labelling iterations, each taking 3 to 4 hours.)
Murmurate’s automated labelling process means that you can label your data in hours, not weeks or even months.
MURMURATE’S DATA LABELLING
INCREASED SPEED
The main benefit over other labelling approaches is speed. Instead of taking weeks and months, datasets can now be labelled in a matter of hours.
ITERATE QUICKLY
Your AI model may need to be updated to keep up with changing needs or data drift. Because our data labelling approach is so quick, it’s easy to relabel data and retrain your model.
REDUCE COSTS
Achieve in hours and with just one person what would typically take a large team many months. Traditional ways of labelling data are slow and costly, putting AI out of reach of many businesses.
INCREASED ACCURACY
Because our labelling approach is faster and cheaper, much larger training datasets can be created, which can in turn lead to more accurate models.
MURMURATE
NOW ANYONE CAN BUILD AI.
Murmurate’s unique and innovative approach is transforming the process of building AI and bringing it within the reach of many more businesses than before, both small and large. Book a demo for your team to learn how it works. Bring the power of AI to your business.
AI POWERED MEDICAL + HEALTHCARE
AI POWERED RETAIL + MARKETING
AI POWERED HUMANITARIAN SERVICES
AI POWERED SECURITY + DEFENSE
AI POWERED ENVIRONMENTAL SERVICES
AI POWERED FINANCIAL SERVICES
AI POWERED HOUSING + DEVELOPMENT