
I set out to give an indication of how different AI services compare here:
Testing the differences between AI services
I did a quick comparison here:
An analysis of how AI services vary
I then did a deep analysis of all the generated articles using:
Copilot Researcher
Gemini Deep Thinking
ChatGPT Deep Thinking
If you now take those three results, score each analysis’s recommendations from 7 (highest) to 1 (lowest), and total the scores, you end up with this ranking table:
| AI Service | Researcher | Gemini | ChatGPT | Total Score |
| --- | --- | --- | --- | --- |
| Deepseek | 5 | 4 | 7 | 16 |
| M365 Copilot | 7 | 3 | 4 | 14 |
| Copilot Researcher | 6 | 6 | 1 | 13 |
| Gemini | 4 | 7 | 2 | 13 |
| Copilot Studio | 2 | 5 | 5 | 12 |
| ChatGPT Deep Research | 3 | 2 | 3 | 8 |
| ChatGPT | 1 | 1 | 6 | 8 |
The winner, on average, appears to be Deepseek. However, you will note that most of the AI services tested, except the two ChatGPT entries, have similar scores. The ‘average’ score is 12, which every service, except again the ChatGPT ones, scored at or above.
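For anyone who wants to check the arithmetic, here is a minimal Python sketch that re-totals the scores from the table and compares each service to the average. The variable names (`scores`, `totals`, `ranking`, `average`) are my own for illustration; only the numbers come from the table.

```python
# Scores copied from the ranking table, in the order
# (Researcher, Gemini, ChatGPT). 7 = highest, 1 = lowest.
scores = {
    "Deepseek": (5, 4, 7),
    "M365 Copilot": (7, 3, 4),
    "Copilot Researcher": (6, 6, 1),
    "Gemini": (4, 7, 2),
    "Copilot Studio": (2, 5, 5),
    "ChatGPT Deep Research": (3, 2, 3),
    "ChatGPT": (1, 1, 6),
}

# Total the three scores for each service.
totals = {service: sum(points) for service, points in scores.items()}

# Sort from highest to lowest total; ties keep their table order.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Mean of the totals across all services.
average = sum(totals.values()) / len(totals)

for service, total in ranking:
    marker = "at or above average" if total >= average else "below average"
    print(f"{service:<22} {total:>2}  ({marker})")
print(f"Average total: {average:.1f}")
```

Because every analysis hands out each of the scores 1 to 7 exactly once, the totals always add up to the same amount, so the average is fixed regardless of who ranks where. What matters is how the totals spread around it.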
This analysis is far from perfect or ideal, or for that matter without bias. There are so many variables that possibly come into play that it is very difficult, if not impossible, to get a true ‘apples vs apples’ comparison of AI services. However, I think this result still provides value if you are looking to answer the question of the ‘best’ AI service. That answer seems largely to be that most AI services, apart from ChatGPT, are pretty much on par when it comes to prompting, so choosing amongst them based simply on their responses to prompts doesn’t seem to matter all that much.
Of course, there are plenty of other factors, aside from prompt results, that should be considered. The quality of the generated results is also greatly affected by the actual prompts used, and I am sure that effect varies across the AI services as well.
What I’ll now be interested to see is what the ‘click’ rate is on each article after a period of time. Will the Google AI service generate more article ‘hits’ than the other articles? Time will tell, and I’ll report back once enough time has elapsed. These results also make a good benchmark to test against down the track, to see whether things have changed and how much progress these AI agents have made.
Interesting times ahead.