Fullstack
eda-bench – Data Exploration & Reasoning Benchmark
Research-grade benchmarking site for evaluating how effectively large language models explore and reason about data, with a live leaderboard, task registry, and documentation.
- Role: Frontend & Full-Stack
- Year: 2025
- Duration: 2 weeks
Next.js 16, TypeScript, Tailwind CSS 4, Recharts, GSAP, MDX, shadcn/ui
- Built in: 2 weeks
- Surface: Leaderboard + Docs
- Tasks: GitHub-backed
Problem
The benchmark needed a public home that could present complex results in a form researchers and practitioners could actually use: comparable model scores, browsable tasks, documentation, and a ready-made citation.
Approach
I built a Next.js 16 App Router site with an animated leaderboard, a GitHub-backed task registry, and MDX-powered docs.
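The filterable, GitHub-backed task registry can be sketched as a pure filter over task metadata. This is a minimal sketch, not the site's actual code: the `Task` and `TaskFilter` shapes, their fields, and `filterTasks` are hypothetical, and it assumes the task list has already been loaded from GitHub (for example at build time), which is not shown.

```typescript
// Hypothetical shape of one task entry in the registry; field names are assumptions.
interface Task {
  id: string;
  title: string;
  domain: string;
  difficulty: "easy" | "medium" | "hard";
  tags: string[];
}

// Hypothetical filter state driven by the registry's UI controls.
interface TaskFilter {
  query?: string;
  domain?: string;
  difficulty?: Task["difficulty"];
  tags?: string[];
}

// Returns the tasks matching every active filter; unset filters match everything.
function filterTasks(tasks: readonly Task[], filter: TaskFilter): Task[] {
  const query = filter.query?.trim().toLowerCase() ?? "";
  const tags = filter.tags ?? [];
  return tasks.filter((task) => {
    // Case-insensitive substring match against the task id and title.
    if (query && !`${task.id} ${task.title}`.toLowerCase().includes(query)) {
      return false;
    }
    if (filter.domain && task.domain !== filter.domain) return false;
    if (filter.difficulty && task.difficulty !== filter.difficulty) return false;
    // Every selected tag must be present on the task (AND semantics).
    if (tags.length > 0 && !tags.every((t) => task.tags.includes(t))) {
      return false;
    }
    return true;
  });
}
```

Keeping the filter pure means the same function could run on the server or in a client component as the filter controls change.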
Key features
- Live leaderboard with smooth GSAP-powered transitions
- Filterable task registry sourced from GitHub
- Integrated documentation written in MDX
- One-click copy for plain text and BibTeX citations
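The one-click citation copy needs two string formats built from the same metadata. A minimal sketch, assuming a hypothetical `Citation` shape and hypothetical `toBibTeX`/`toPlainText` helpers; the entry type, citation key, and plain-text layout are illustrative rather than the benchmark's real citation, and BibTeX special characters such as `&` or `%` are not escaped here.

```typescript
// Hypothetical citation metadata; the real eda-bench entry is not given here.
interface Citation {
  key: string;
  title: string;
  authors: string[];
  year: number;
  url?: string;
}

// Formats a BibTeX @misc entry; authors are joined with " and " as BibTeX expects.
function toBibTeX(c: Citation): string {
  const fields: [string, string][] = [
    ["title", `{${c.title}}`], // inner braces preserve the title's capitalization
    ["author", c.authors.join(" and ")],
    ["year", String(c.year)],
  ];
  if (c.url) fields.push(["url", c.url]);
  const body = fields.map(([k, v]) => `  ${k} = {${v}}`).join(",\n");
  return `@misc{${c.key},\n${body}\n}`;
}

// Plain-text variant for the other copy button: "Authors. Title. Year. [URL]"
function toPlainText(c: Citation): string {
  const url = c.url ? ` ${c.url}` : "";
  return `${c.authors.join(", ")}. ${c.title}. ${c.year}.${url}`;
}
```

In the browser, a copy button's handler could pass either string to `navigator.clipboard.writeText`, which returns a promise that resolves once the text is on the clipboard.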
Outcomes
- Leaderboard, tasks, docs, and contributors all live under one consistent design system
- Built to support papers, talks, and community contributions