Time for Tea?
Just like everyone else in the UK, we are getting very excited about National Tea Day, which takes place on 21st April.
The day is a great opportunity to share tea brewing preferences, and the strength of the perfect 'cuppa' is always hotly debated.
So we thought it would be interesting to get the view of AI.
What we did
We took 6 AI instances (2 from each of ChatGPT, Perplexity and Gemini) through the same assessment process:
- Prompt - 'Look' at a Cup of Tea (CoT) and score it out of 20 (1 being 'Very Weak' and 20 being 'Very Strong'). The score was then converted to a % score. (Initial Grade)
- Prompt - Compare the CoT to Colour Chart 1 (Intervention A)
- Prompt - "Review your initial grade" (Post Intervention 1)
- Prompt - Explain how Intervention A equates to the reviewed grade (Intervention B)
- Prompt - "Review your initial grade" (Post Intervention 2)
- Prompt - Compare the CoT to Colour Chart 2 (Intervention C)
- Prompt - "Review the initial grade" (Post Intervention 3)
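For anyone who wants to repeat the exercise, here is a minimal sketch of the bookkeeping. It assumes the % score is simply the score divided by 20, which the steps above do not spell out, and the function names and example scores are hypothetical placeholders, not figures from our experiment.

```python
# Grading stages, in the order the prompts were given.
STAGES = [
    "Initial Grade",
    "Post Intervention 1",
    "Post Intervention 2",
    "Post Intervention 3",
]


def to_percent(score: int, max_score: int = 20) -> float:
    """Convert a 1..max_score strength score to a percentage.

    Assumption: percentage = score / max_score * 100.
    """
    if not 1 <= score <= max_score:
        raise ValueError(f"score must be between 1 and {max_score}")
    return 100.0 * score / max_score


def grade_changes(scores: list[int]) -> list[float]:
    """Return the change (in percentage points) between consecutive stages."""
    percents = [to_percent(s) for s in scores]
    return [later - earlier for earlier, later in zip(percents, percents[1:])]


# Placeholder scores for one hypothetical AI instance (NOT real results).
example_scores = [12, 15, 9, 14]

for stage, score in zip(STAGES, example_scores):
    print(f"{stage}: {to_percent(score):.0f}%")
print("Changes between stages:", grade_changes(example_scores))
```

A stable assessor would show changes close to zero after each intervention; large swings in either direction are the instability we were looking for.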
What we found
There was a lot of disagreement (see below) and very little stability.
- Each AI instance changed its mind as it moved through the interventions. There was no obvious pattern.
- The 6 AI instances failed to agree on very much at all!
- Even the 2 instances within each product failed to agree with each other.
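One simple way to put numbers on this kind of disagreement is to look at the spread of grades at a given stage, and at the gap between the 2 instances of each product. The sketch below is only an illustration: the helper names are our own invention and the grades are placeholders, not the results of the experiment.

```python
from statistics import mean, pstdev


def spread(percent_scores: list[float]) -> dict[str, float]:
    """Summarise how far apart a set of % grades are at one stage."""
    return {
        "range": max(percent_scores) - min(percent_scores),
        "mean": mean(percent_scores),
        "stdev": pstdev(percent_scores),
    }


def within_product_gaps(grades: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Absolute gap between the two instances of each product."""
    return {product: abs(a - b) for product, (a, b) in grades.items()}


# Placeholder % grades for the 2 instances of each product (NOT real results).
placeholder_grades = {
    "ChatGPT": (60.0, 75.0),
    "Perplexity": (45.0, 70.0),
    "Gemini": (55.0, 80.0),
}

all_grades = [g for pair in placeholder_grades.values() for g in pair]
print(spread(all_grades))
print(within_product_gaps(placeholder_grades))
```

If the assessors agreed, both the overall range and the within-product gaps would be small at every stage.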
What's the point of this? Tea? Who cares?
Hopefully you've ascertained that there is a far wider play here beyond Tea. After all, the colour charts work just fine if anyone is really interested in assessing it. But what about all of those subjective things that we might want to assess that AI also struggles with? This is what Compare has been built for from the outset, and now we are moving closer to our long-standing ambition to provide an on-demand version.
Is there a better way?
There just has to be, right? We can't have machines assessing the really important things in life like Tea - it's not right, and the results (as we have shown) are terrible.
Assessing the quality of Tea, and so many other things we treasure, is a human endeavour and one where comparative judgement can get to the truth, giving reliable, valuable and trusted assessments at scale.
What's even better is that there is a way to do this 'on-demand', even on a mobile device.
There is even the possibility that we can train the AI to do a bit better with our AI validation layer.
Watch this space for more details coming soon.