TECHNICAL REPORT · MARCH 2026 · 72 PAGES

An Automated Survey of Generative Artificial Intelligence

Large Language Models, Architectures, Protocols, and Applications

Eduardo C. Garrido-Merchán and Álvaro López López

Institute for Research in Technology (IIT) · Universidad Pontificia Comillas, Madrid, Spain

📅 March 2026 · 📚 200+ references · 📧 ecgarrido@comillas.edu

25+ LLM families analysed · 15 industry sectors · 3 AI modalities · 5 deployment protocols

Abstract

Generative artificial intelligence, and large language models in particular, have emerged as one of the most transformative paradigms in modern computer science. This automated survey provides an accessible treatment of the field as of early 2026, with a strong focus on the leading model families, deployment protocols, and real-world applications. The core of the survey is devoted to a detailed comparative analysis of the frontier large language models, with particular emphasis on open-weight systems: DeepSeek-V3, DeepSeek-R1, DeepSeek-V3.2, and the forthcoming DeepSeek V4; the Qwen 3 and Qwen 3.5 series; GLM-5; Kimi K2.5; MiniMax M2.5; LLaMA 4; Mistral Large 3; Gemma 3; and Phi-4, alongside proprietary systems including GPT-5.4, Gemini 3.1 Pro, Grok 4.20, and Claude Opus 4.6. For each model, we describe the architectural innovations, training regimes, and empirical performance on current benchmarks and the Chatbot Arena leaderboard. The survey further covers deployment protocols including Retrieval-Augmented Generation, the Model Context Protocol, the Agent-to-Agent protocol, function calling standards, and serving frameworks. We present an extensive review of real-world applications across fifteen industry sectors, from financial services and legal technology to tourism and agriculture, supported by empirical evidence and case studies. Technical foundations covering the Transformer architecture and training methodologies are provided in a self-contained appendix. This survey synthesises over 200 references to provide researchers and practitioners with a unified reference for the current state and future directions of generative AI.

Keywords: large language models · transformers · DeepSeek · Qwen · open-weight models · RAG · Model Context Protocol · RLHF · DPO · generative AI applications

Key Results

Interactive charts from the paper's comparative analysis. Data sourced from Chatbot Arena (arena.ai) and independent benchmarks as of March 2026.

Chatbot Arena Text ELO Rankings: top 12 models as of March 2026, coloured by geographic origin (US vs China)

SWE-bench Verified Performance: software engineering capability (%) for top models, March 2026

Geographic Leadership by Modality: highest Arena ELO per region

US–China ELO Gap: gap between the top US and top Chinese models

Parameters vs Arena ELO: open-weight models with disclosed architectures

Model Ecosystem by Region: number of frontier model families per modality

Key Findings

🏆 Claude Opus 4.6 leads at 1504 ELO: the top 10 text models span only 31 ELO points, indicating an extremely competitive landscape among proprietary systems.

🇨🇳 Chinese open-weight models reach near-parity: GLM-5 (1451) closes to within 53 ELO points of the best proprietary model. The proprietary premium over open-weight systems is shrinking rapidly.

💻 SWE-bench convergence: the best US model (Claude Opus 4.6, 80.8%) and the best Chinese model (MiniMax M2.5, 80.2%) achieve near-identical software engineering scores.

🌍 Text parity, image gap persists: the US–China gap is just 53 ELO points in text but 117 points in image generation, suggesting uneven capability transfer across modalities.
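For readers unfamiliar with Arena-style ratings, an ELO gap maps directly to an expected head-to-head win rate via the standard Elo formula. The sketch below is purely illustrative, plugging in the 53-point and 117-point gaps reported in these findings:

```python
def expected_score(elo_gap: float) -> float:
    """Expected win probability of the stronger model under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))

# A 53-point text gap implies the stronger model wins ~57.6% of pairwise
# comparisons; the 117-point image gap implies ~66.2%.
print(f"text gap 53:   {expected_score(53):.3f}")   # → 0.576
print(f"image gap 117: {expected_score(117):.3f}")  # → 0.662
```

In other words, the text race is close to a coin flip between the US and Chinese frontiers, while the image gap is still decisive in two of every three comparisons.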

🔍 MoE dominates the frontier: all open-weight models above 1400 ELO use mixture-of-experts architectures, clustering at 400B–1T total parameters with 37B–170B active.
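Mixture-of-experts is what lets total and active parameter counts diverge so sharply: a learned router activates only k of n expert subnetworks per token. The toy NumPy sketch below (illustrative dimensions and random weights, not any surveyed model's architecture) shows top-2-of-16 routing, i.e. roughly 1/8 of the expert parameters doing work per token:

```python
import numpy as np

def moe_layer(x, experts_w, gate_w, k=2):
    """Minimal top-k mixture-of-experts layer (illustrative only).

    x:         (d,)  token representation
    experts_w: (n, d, d) per-expert weight matrices
    gate_w:    (n, d) router weights
    """
    logits = gate_w @ x                        # router score per expert
    top = np.argsort(logits)[-k:]              # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the selected experts
    # Only the k chosen experts run; the other n - k stay inactive.
    return sum(g * (experts_w[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)
experts = rng.normal(size=(n_experts, d, d))
gate = rng.normal(size=(n_experts, d))
y = moe_layer(x, experts, gate, k=2)           # 2 of 16 experts contribute
print(y.shape)                                 # → (8,)
```

Scaled up, the same sparsity is how a model with ~1T total parameters can run inference with only tens of billions active.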

📈 China leads in model count: 12 Chinese text model families vs 9 US. China also leads in video (4 vs 3). Europe has only 1 text and 1 image family at frontier level.
