Generating applications with LLMs: a developer’s review


Objective

Test and compare several LLMs by having each one build the same simple Node.js application, in order to identify their strengths and weaknesses.

I use GitHub Copilot Pro (€10 per month), which provides access to several of the models tested here. Gemini was tested through Gemini CLI, which offers free requests within a monthly limit.

Instructions given to the models (given in French, translated into English)

You must build an application in NodeJs. Do not stop until all the instructions are met.

# Objective

Serve a route at http://localhost:3520/index.html that displays the week's menus, with random recipes.

# Data

- Generate a set of 100 data entries representing classic French recipe ideas (starter, main course, dessert) in a JSON file. Propose a suitable, extensible format.

# Functional constraints

- Provide a light mode and a dark mode.
- Provide a responsive layout.

# Technical constraints

- Use NPM for dependency management.
- Use Git for version control.
- Use ESLint with the standard configuration.
- Use Prettier for code formatting.
- The application must use ExpressJs.
- Use Handlebars as the template engine.
- Use JSON files to store the data.
- Minimum code coverage of 85%, using Istanbul with `mocha`. Neither too many nor too few tests.
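The Data requirement leaves the JSON format open. As an illustration only, here is a minimal sketch of one possible format plus a check for the "100 recipes" rule. The field names (`id`, `name`, `course`, `tags`) and the `validateRecipes` helper are hypothetical, not taken from any model's output; the code follows the semicolon-free ESLint standard style requested in the prompt.

```javascript
// Hypothetical recipe format: field names are illustrative assumptions,
// not the schema produced by any of the tested models.
const COURSES = ['starter', 'main', 'dessert']

const sampleRecipes = [
  { id: 1, name: 'Œufs mimosa', course: 'starter', tags: ['classic'] },
  { id: 2, name: 'Bœuf bourguignon', course: 'main', tags: ['slow-cooked'] },
  { id: 3, name: 'Tarte Tatin', course: 'dessert', tags: [] }
]

// Returns a list of problems found in the dataset (an empty array means valid).
// Keys other than id, name and course (e.g. tags) are never inspected,
// so new optional fields can be added later without failing validation.
function validateRecipes (recipes, expectedCount = 100) {
  if (!Array.isArray(recipes)) {
    return ['recipes must be an array']
  }
  const errors = []
  if (recipes.length !== expectedCount) {
    errors.push(`expected ${expectedCount} recipes, got ${recipes.length}`)
  }
  const seenIds = new Set()
  for (const recipe of recipes) {
    if (seenIds.has(recipe.id)) {
      errors.push(`duplicate id ${recipe.id}`)
    }
    seenIds.add(recipe.id)
    if (typeof recipe.name !== 'string' || recipe.name.trim() === '') {
      errors.push(`recipe ${recipe.id} has no name`)
    }
    if (!COURSES.includes(recipe.course)) {
      errors.push(`recipe ${recipe.id} has unknown course "${recipe.course}"`)
    }
  }
  return errors
}

module.exports = { COURSES, sampleRecipes, validateRecipes }
```

For instance, `validateRecipes(sampleRecipes)` reports `expected 100 recipes, got 3`, while `validateRecipes(sampleRecipes, 3)` returns an empty array. A check like this, run in a test, would catch a dataset that falls short of the requested count.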

Results

All models produced a functional, responsive application with test coverage above 85%, in 8 to 25 minutes.

Visuals

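Grok Code Fast 1

[Screenshot of the generated application]

Claude Sonnet 4.5

[Screenshot of the generated application]

OpenAI GPT-5 Codex

[Screenshot of the generated application]

Gemini CLI

[Screenshot of the generated application]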

Comparison table

| LLM model | Time (min) | Lines of code | Quality | Cost | Notes |
|---|---|---|---|---|---|
| Grok Code Fast 1 | 8 | 86 | Simple and effective | Non-premium requests on GitHub Copilot | 🔴 Outdated dependencies |
| Claude Sonnet 4.5 | 14 | 440 | Comprehensive, well-structured code | Premium requests on GitHub Copilot | 🔴 Nitpicky about code coverage; 🔴 Outdated dependencies |
| OpenAI GPT-5 Codex | 25 | 506 | Advanced and structured, but overly verbose | Premium requests on GitHub Copilot | 🔴 Slow on large JSON files; 🔴 Repeated back-and-forth over lint errors; 🔴 No Git versioning; 🟢 More extensible, parameterizable random function |
| Gemini CLI | 8 | 87 | Minimalist, loosely structured | Free requests within a monthly limit | 🔴 CLI less convenient than Copilot Chat in VS Code; 🔴 No `npm start` script; 🔴 Forgets `.nyc_output` in `.gitignore`; 🔴 89 recipes generated instead of the requested 100 |
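The table credits GPT-5 Codex with a more extensible, parameterizable random function. Without reproducing that function, here is a minimal sketch of the general idea, assuming each recipe has a `course` field set to `starter`, `main` or `dessert`: the list of days and the random source are both injectable options. `createSeededRandom` (an implementation of the small mulberry32 generator), `pickRandom` and `buildWeeklyMenus` are hypothetical names.

```javascript
// Hypothetical helpers illustrating a "parameterizable" random menu generator.
const DAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
// Not cryptographically secure; only meant for reproducible tests.
function createSeededRandom (seed) {
  let state = seed >>> 0
  return function random () {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Picks one element using the injected random source.
function pickRandom (items, random) {
  if (items.length === 0) {
    throw new Error('cannot pick from an empty list')
  }
  return items[Math.floor(random() * items.length)]
}

// Builds one menu (starter, main, dessert) per day. The options object is the
// extension point: callers can change the days or inject their own random source.
function buildWeeklyMenus (recipes, { days = DAYS, random = Math.random } = {}) {
  const byCourse = { starter: [], main: [], dessert: [] }
  for (const recipe of recipes) {
    const bucket = byCourse[recipe.course]
    if (Array.isArray(bucket)) {
      bucket.push(recipe)
    }
  }
  return days.map((day) => ({
    day,
    starter: pickRandom(byCourse.starter, random),
    main: pickRandom(byCourse.main, random),
    dessert: pickRandom(byCourse.dessert, random)
  }))
}

module.exports = { createSeededRandom, buildWeeklyMenus }
```

Calling `buildWeeklyMenus(recipes, { random: createSeededRandom(42) })` twice yields identical menus, which keeps assertions in `mocha` tests deterministic, while the default `Math.random` keeps production menus varied. Injecting the random source is a design choice assumed here, not necessarily how GPT-5 Codex structured its function.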

Personal conclusion

For quick, small changes, Grok Code Fast 1 is responsive and efficient.

For projects that require structure, extensibility, and comprehensive tests, GPT-5 Codex is the best choice, despite its longer generation time (25 minutes, versus 8 to 14 for the other models).

Overall, the models deliver fairly similar results, and none of them performs poorly.

What’s next

Continue exploring the models' capabilities on other use cases:

  • Bug fixing
  • Test generation
  • Code reviews

Compare these results with existing benchmarks.

Appendix: Code structure

Grok Code Fast 1

Claude Sonnet 4.5

OpenAI GPT-5 Codex

Gemini CLI
