Comparing AI Services–the final analysis

I set out to give an indication of how different AI services compare here:

Testing the differences between AI services

I did a quick comparison here:

An analysis of how AI services vary

I then did a deep analysis of all the generated articles using:

Copilot Researcher

Gemini Deep Thinking

ChatGPT Deep Thinking

If you now take those three results, score each analysis's recommendations from 7 (highest) to 1 (lowest) and total the scores for each service, you end up with this ranking table:

AI Service             Researcher  Gemini  ChatGPT  Total Score
Deepseek                        5       4        7           16
M365 Copilot                    7       3        4           14
Copilot Researcher              6       6        1           13
Gemini                          4       7        2           13
Copilot Studio                  2       5        5           12
ChatGPT Deep Research           3       2        3            8
ChatGPT                         1       1        6            8

The winner then appears to be, on average, Deepseek. However, you will note that most of the AI services tested, except the two ChatGPT entries, have similar scores. The ‘average’ total score is 12, and every service other than ChatGPT and ChatGPT Deep Research scored at or above it.
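For anyone who wants to check the arithmetic, here is a minimal sketch of how the totals and the average can be reproduced. The rankings are copied from the table above. The function and variable names are my own choices for illustration and are not part of any of the tools used.

```python
# Rankings copied from the table above: 7 = highest recommendation, 1 = lowest.
# Tuple order: (Copilot Researcher, Gemini Deep Thinking, ChatGPT Deep Thinking).
rankings = {
    "Deepseek": (5, 4, 7),
    "M365 Copilot": (7, 3, 4),
    "Copilot Researcher": (6, 6, 1),
    "Gemini": (4, 7, 2),
    "Copilot Studio": (2, 5, 5),
    "ChatGPT Deep Research": (3, 2, 3),
    "ChatGPT": (1, 1, 6),
}


def total_scores(rankings):
    """Sum each service's three rankings, sorted from highest to lowest total."""
    totals = {name: sum(scores) for name, scores in rankings.items()}
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


def mean_total(totals):
    """Average of the total scores across all services."""
    return sum(totals.values()) / len(totals)


totals = total_scores(rankings)
average = mean_total(totals)

for name, score in totals.items():
    position = "at or above average" if score >= average else "below average"
    print(f"{name:<22} {score:>2}  ({position})")
print(f"Average total: {average:.0f}")
```

Because every analysis hands out each of the rankings 1 to 7 exactly once, the totals always sum to 84 across the seven services, so the average of 12 is fixed by the scoring method. Only the spread around it says anything about the services.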

This analysis is far from perfect or ideal or, for that matter, without bias. There are so many variables that possibly come into play that it is very difficult, if not impossible, to get a true ‘apples vs apples’ comparison of AI services. However, I think this result still does provide value if you are looking to answer the question of the ‘best’ AI service. That answer seems to largely be that most AI services, apart from ChatGPT, are pretty much on par when it comes to prompting, so choosing from amongst these based simply on their responses to prompts doesn’t seem to matter all that much.

Of course, there are plenty of other factors, aside from prompt results, that should be considered. The quality of the generated results is also greatly affected by the actual prompts used, and I am sure that effect varies across the AI services as well.

What I’ll now be interested to see is what the ‘click’ rate is on each article after a period of time. Will the Google AI service generate more article ‘hits’ than the other articles? Time will tell and I’ll report back once enough time has elapsed. These results also make a good benchmark to test against down the track, to see whether things have changed and how much progress these AI agents have made.

Interesting times ahead.

4 thoughts on “Comparing AI Services–the final analysis”

  1. This is a fascinating result, with ChatGPT (many people’s ‘go-to’) coming out last. I mainly use Copilot, as I subscribe to Copilot for M365 and so have that increased functionality, and I am generally impressed with the results it returns (fact checked where appropriate of course, and it is sometimes wrong). But I have many clients happy with ChatGPT, although some have gone very quiet when I explained the possible consequences of uploading confidential data to it without proper evaluation and thought. Thank you for your work, Robert.
