A step-by-step guide to evaluating coding agents on real Microsoft Dynamics 365 Business Central tasks using the BC-Bench framework.
BC-Bench is an open-source benchmarking framework by Microsoft for evaluating coding agents (Claude Code, GitHub Copilot CLI, and others) on real-world Business Central (AL) development tasks. It includes:
```shell
# Install BC-Bench
gh repo fork microsoft/BC-Bench --clone && cd BC-Bench
uv python install && uv sync --all-groups

# Explore the dataset
uv run bcbench dataset list
uv run bcbench dataset view microsoft__BCApps-4822

# Run your first evaluation (patch only, no container needed)
uv run bcbench run claude microsoft__BCApps-4822 --category bug-fix --model claude-sonnet-4-6
```
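The quickstart assumes the GitHub CLI (`gh`), `uv`, and `git` (which `gh repo fork --clone` uses to clone the fork) are already on your `PATH`. The helper below is a minimal, hypothetical sketch, not part of BC-Bench, that reports every missing tool before any install step runs. The function name `require_tools` and the tool list you pass it are our own assumptions, not an official BC-Bench prerequisite list.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BC-Bench): report every required
# command that cannot be found, then return non-zero if any were missing.
# Which tools to check (e.g. gh, uv, git) is an assumption based on the
# quickstart commands, not an official prerequisite list.
require_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    # `command -v` succeeds only if the name resolves to an executable
    # on PATH, a shell builtin, a function, or an alias.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  # 0 when every tool was found, 1 when at least one was missing.
  return "$missing"
}
```

For example, putting `require_tools gh uv git || exit 1` at the top of a setup script stops early with one message per missing tool, rather than failing partway through the fork or `uv sync` step. Checking all tools before returning (instead of exiting on the first miss) lets you fix every gap in one pass.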