Open Source · GitHub Pages · EN & ES

BC-Bench Guide

A step-by-step guide to evaluating coding agents on real Microsoft Dynamics 365 Business Central tasks with the BC-Bench framework.

🐛 101 real bugs

🤖 Multi-agent

📊 Statistical metrics

📚 5 chapters

🌐 EN · ES

Choose your language / Elige tu idioma

English: Start the guide in English
Español: Empezar la guía en español

What is BC-Bench?

BC-Bench is an open-source benchmarking framework by Microsoft for evaluating coding agents (Claude Code, GitHub Copilot CLI, and others) on real-world Business Central (AL) development tasks. It includes:

101 real bugs from the BCApps and NAV repositories
Automated evaluation with compilation and test execution in BC containers
Configurable agents with custom instructions, skills, and MCP servers
Statistical metrics including pass rate, bootstrap confidence intervals (CI), and pass@k
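
The three metrics named above can be illustrated with a minimal Python sketch. This is not BC-Bench's own implementation: the function names (`pass_rate`, `bootstrap_ci`, `pass_at_k`), the percentile bootstrap, and the combinatorial pass@k estimator are assumptions chosen for illustration, and the framework may compute these quantities differently.

```python
import math
import random


def pass_rate(outcomes):
    """Fraction of tasks resolved; `outcomes` is a list of booleans, one per task."""
    return sum(outcomes) / len(outcomes)


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the pass rate.

    Resamples the per-task outcomes with replacement and reads the
    alpha/2 and 1 - alpha/2 quantiles off the sorted resampled rates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(outcomes)
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return rates[lo_idx], rates[hi_idx]


def pass_at_k(n, c, k):
    """Probability that at least one of k attempts, drawn without replacement
    from n attempts of which c passed, is a passing attempt."""
    if k > n:
        raise ValueError("k must not exceed the number of attempts n")
    if n - c < k:
        return 1.0  # fewer than k failures, so any draw of k includes a pass
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


if __name__ == "__main__":
    # Hypothetical outcomes, not real BC-Bench results.
    outcomes = [True, False, True, True, False, True, False, True]
    print("pass rate:", pass_rate(outcomes))
    print("95% CI:", bootstrap_ci(outcomes))
    print("pass@3 for one task (n=10 attempts, c=3 passed):", pass_at_k(10, 3, 3))
```

Pass rate and its confidence interval are computed across tasks, whereas pass@k is computed per task from repeated attempts and then typically averaged over tasks.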

Guide Contents

#	English	Español
1	Introduction	Introducción
2	Setup & First Evaluation	Setup y Primera Evaluación
3	Agent Configuration	Configuración de Agentes
4	Baselines & VM Scripts	Baselines y Scripts VM
5	Results & Analysis	Resultados y Análisis

Quick Start

# Install BC-Bench
gh repo fork microsoft/BC-Bench --clone && cd BC-Bench
uv python install && uv sync --all-groups

# Explore the dataset
uv run bcbench dataset list
uv run bcbench dataset view microsoft__BCApps-4822

# Run your first evaluation (patch only, no container needed)
uv run bcbench run claude microsoft__BCApps-4822 --category bug-fix --model claude-sonnet-4-6