Large Language Models, Architectures, Protocols, and Applications
Institute for Research in Technology (IIT) · Universidad Pontificia Comillas, Madrid, Spain
Generative artificial intelligence, and large language models in particular, have emerged as one of the most transformative paradigms in modern computer science. This automated survey provides an accessible treatment of the field as of early 2026, with a strong focus on the leading model families, deployment protocols, and real-world applications. The core of the survey is devoted to a detailed comparative analysis of the frontier large language models, with particular emphasis on open-weight systems: DeepSeek-V3, DeepSeek-R1, DeepSeek-V3.2, and the forthcoming DeepSeek-V4; the Qwen 3 and Qwen 3.5 series; GLM-5; Kimi K2.5; MiniMax M2.5; LLaMA 4; Mistral Large 3; Gemma 3; and Phi-4, alongside proprietary systems including GPT-5.4, Gemini 3.1 Pro, Grok 4.20, and Claude Opus 4.6. For each model, we describe the architectural innovations, training regimes, and empirical performance on current benchmarks and the Chatbot Arena leaderboard. The survey further covers deployment protocols including Retrieval-Augmented Generation, the Model Context Protocol, the Agent-to-Agent protocol, function calling standards, and serving frameworks. We present an extensive review of real-world applications across fifteen industry sectors, from financial services and legal technology to tourism and agriculture, supported by empirical evidence and case studies. Technical foundations covering the Transformer architecture and training methodologies are provided in a self-contained appendix. This survey synthesises over 200 references to provide researchers and practitioners with a unified reference for the current state and future directions of generative AI.
The following charts summarise the paper's comparative analysis. Data sourced from Chatbot Arena (arena.ai) and independent benchmarks as of March 2026.
Chart panels:
- Top 12 models as of March 2026, coloured by geographic origin (US vs China)
- Software engineering capability (%) for top models, March 2026
- Highest Arena ELO per region
- Gap between the top US and Chinese models
- Open-weight models with disclosed architectures
- Number of frontier model families per modality
Key findings:
- The top 10 text models span only 31 ELO points, indicating an extremely competitive landscape among proprietary systems.
- GLM-5 (1451) comes within 53 ELO points of the best proprietary model; the open-weight premium is diminishing rapidly.
- The best US model (Claude, 80.8%) and the best Chinese model (MiniMax M2.5, 80.2%) achieve near-identical software engineering scores.
- The US-China gap is just 53 points in text but 117 points in image generation, suggesting uneven capability transfer across modalities.
- All open-weight models above 1400 ELO use mixture-of-experts, clustering at 400B-1T total parameters with 37B-170B active.
- China fields 12 frontier text model families versus 9 for the US, and also leads in video (4 vs 3); Europe has only one text and one image family at the frontier level.
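The ELO gaps quoted above can be translated into expected head-to-head win rates. The sketch below uses the standard Elo expected-score formula on the conventional 400-point logistic scale; whether the leaderboard's published ratings follow exactly this scale is an assumption, and the labels merely echo the gaps cited in the findings:

```python
def elo_win_prob(gap: float) -> float:
    """Expected win probability for the higher-rated model,
    given an Elo rating gap, on the standard 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

# Gaps quoted in the findings above:
for label, gap in [("top-10 text spread", 31),
                   ("US-China text gap", 53),
                   ("US-China image gap", 117)]:
    print(f"{label}: {gap} ELO -> {elo_win_prob(gap):.1%} expected win rate")
```

On this scale a 53-point gap corresponds to only a roughly 58% expected win rate for the stronger model, which is consistent with the abstract's framing of a highly competitive frontier; the 117-point image-generation gap implies a more decisive roughly 66% win rate.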