Alibaba's artificial intelligence division, Qwen, has introduced a significant upgrade to its AI research tool, positioning it as a direct competitor to established platforms like Google's Gemini. The enhanced Qwen Deep Research tool can now automatically generate not only detailed reports but also fully functional webpages and multi-speaker podcasts from a single user prompt.
In a direct comparison with Gemini, ChatGPT, and Grok, Qwen demonstrated superior performance in research depth and output versatility, particularly with its unique webpage creation feature. While it matched Google's Gemini for accuracy, the two platforms showed different strengths, signaling a new rivalry in the specialized field of AI-driven research and content creation.
Key Takeaways
- Alibaba's Qwen AI now generates comprehensive research reports, live webpages, and podcasts from a single query.
- In a comparative analysis, Qwen and Google's Gemini tied for the top score (9 out of 10), ahead of ChatGPT (8) and Grok (6).
- Qwen's unique strength is its automatic webpage generation, a feature none of the other tested tools offered.
- Gemini maintains an edge in multimedia, with higher-quality, more natural-sounding audio podcasts and video generation capabilities.
- The test highlights a growing specialization in the AI market, with Qwen targeting researchers and content creators, while Gemini focuses on a polished multimedia experience.
A New Contender in AI-Powered Research
The latest update from Qwen introduces a streamlined workflow for users needing to conduct in-depth research. The process begins with a query in the Qwen Chat interface. The AI then performs web searches, analyzes information from public sources, and compiles a comprehensive report complete with citations.
Following the report's generation, users are presented with two new options. The "Web Dev" feature creates a live, professional-looking webpage that is automatically deployed and hosted by Qwen. This page includes formatted text, tables, and inline graphics. The second option, "Podcast," produces an audio discussion about the research topic, featuring a conversation between two AI hosts.
How It Works
The new functionality is powered by a combination of three open-source models working together. Qwen3-Coder manages the web structure and layout, Qwen-Image generates relevant graphics for the webpage, and Qwen3-TTS drives the dynamic, multi-speaker audio narration for the podcasts.
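The division of labor described above can be pictured as a small orchestration step that runs after the report exists. The sketch below is illustrative only: the function names (layout_page, make_graphic, narrate, build_outputs), the Report structure, and the "web_dev" and "podcast" option keys are hypothetical, and the stub functions return placeholders rather than calling any real Qwen API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Report:
    """A research report with its citations (hypothetical structure)."""
    topic: str
    body: str
    citations: List[str] = field(default_factory=list)


# Hypothetical stand-ins for the three models named in the article.
# None of these functions call a real Qwen API; they only return placeholders.
def layout_page(report: Report) -> str:
    """Role attributed to Qwen3-Coder: page structure and layout."""
    return f"<html><body><h1>{report.topic}</h1><p>{report.body}</p></body></html>"


def make_graphic(report: Report) -> str:
    """Role attributed to Qwen-Image: a graphic for the page."""
    return f"graphic-for-{report.topic.lower().replace(' ', '-')}.png"


def narrate(report: Report, speakers: List[str]) -> List[str]:
    """Role attributed to Qwen3-TTS: multi-speaker narration."""
    lines = [s.strip() for s in report.body.split(".") if s.strip()]
    # Alternate speakers line by line to mimic a two-host conversation.
    return [f"{speakers[i % len(speakers)]}: {line}." for i, line in enumerate(lines)]


def build_outputs(report: Report, want: List[str]) -> Dict[str, object]:
    """Produce only the outputs the user asked for once the report exists."""
    outputs: Dict[str, object] = {"report": report}
    if "web_dev" in want:
        outputs["webpage"] = {"html": layout_page(report), "image": make_graphic(report)}
    if "podcast" in want:
        outputs["podcast"] = narrate(report, ["Host", "Co-host"])
    return outputs


if __name__ == "__main__":
    demo = Report("Sample Topic", "First finding. Second finding.", ["https://example.org"])
    result = build_outputs(demo, ["web_dev", "podcast"])
    print(result["webpage"]["image"])
    for line in result["podcast"]:
        print(line)
```

The point of the sketch is the shape of the workflow: the report is produced once, and the webpage and podcast are optional outputs derived from it, each handled by a separate component.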
Putting the AI Models to the Test
To evaluate its capabilities, Qwen was benchmarked against Google's Gemini, OpenAI's ChatGPT, and xAI's Grok. All four models were given the same complex research task: to analyze the philosophical and scientific arguments for and against the existence of God. The evaluation focused on accuracy, depth of information, clarity, and overall quality.
Qwen and Gemini emerged as the top performers, each earning a score of 9 out of 10. ChatGPT followed with an 8, and Grok received a 6. Although the top two models tied on score, their strengths suit different user needs.
Accuracy and Depth of Information
In terms of accuracy, Qwen and Gemini were nearly indistinguishable. Qwen excelled at citing reputable academic sources, referencing specific works by philosophers such as Bertrand Russell and debates involving figures such as William Lane Craig. Its citations often led to materials from institutions like Stanford, Princeton, and Oxford.
Gemini matched this precision, providing 94 numbered citations in its report and correctly distinguishing between nuanced philosophical concepts. Both models avoided the common AI pitfall of oversimplification.
Going Deeper Than the Competition
Qwen was the only model to generate a dedicated section titled "Critiques of Atheism: The Burden of Proof and the Nature of Evidence." This showed an ability to explore adjacent but relevant aspects of a topic that the other models overlooked, which is where its advantage in research depth was most visible.
ChatGPT's research was competent but relied heavily on a few primary sources, such as the Stanford Encyclopedia of Philosophy. Grok provided accurate but much briefer summaries with less specific attribution.
Clarity, Multimedia, and the User Experience
While Qwen and Gemini led in academic rigor, ChatGPT and Grok were found to be more accessible for a general audience. ChatGPT used helpful parenthetical explanations to clarify complex ideas, making its report easier for non-experts to understand. Grok organized its findings into a clean, scannable table, ideal for quick reference.
In contrast, Qwen and Gemini adopted a more formal, academic tone that, while precise, required more focused reading. This suggests they are tailored more for serious researchers, academics, and students.
The Podcast and Webpage Battle
The new multimedia features are where the differences between Qwen and Gemini become most apparent. Qwen's standout capability is its one-click webpage generation. None of the other models tested can instantly convert a research report into a live, shareable website.
This is a significant workflow advantage for creators, students, and professionals who need to publish their findings quickly without knowledge of web development. The output is clean, responsive, and immediately ready for an audience.
However, when it comes to audio, Gemini holds a clear advantage. Google's Audio Overviews, integrated into its NotebookLM and Gemini platforms, produce remarkably human-like podcasts. The speech patterns are natural, engaging, and even include conversational banter.
Qwen's podcast feature offers a wider variety of voices (17 hosts and 7 co-hosts), but the quality is inconsistent. Many of the voices sound robotic and unnatural, which can detract from the listening experience. Furthermore, Gemini offers video generation, a multimedia dimension that Qwen currently lacks.
The Final Verdict: A Tale of Two Strengths
The competition between Qwen and Gemini highlights a maturing AI landscape where platforms are beginning to specialize. Neither model is definitively "better"; instead, they serve different purposes effectively.
Qwen is the stronger choice for in-depth research and content creation. Its analytical depth, robust citation of academic sources, and webpage generation feature make it a capable all-in-one tool for anyone who needs to research and publish findings efficiently.
Gemini remains the leader for a polished, multimedia-rich experience. Its superior audio quality and video generation capabilities make it the ideal choice for users who prefer to consume information through listening and watching, or for educators creating engaging learning materials.
Meanwhile, ChatGPT serves as an excellent educational tool for explaining complex topics clearly, and Grok functions as a reliable, if basic, tool for quick summaries. The latest advancements from Qwen ensure that the race for AI supremacy remains highly competitive, with users benefiting from the increasing specialization and innovation.





