Fullstack

eda-bench – Data Exploration & Reasoning Benchmark

Research-grade benchmarking site for evaluating how effectively large language models explore and reason about data, with a live leaderboard, task registry, and documentation.

Role
Frontend & Full-Stack
Year
2025
Duration
2 weeks
Next.js 16, TypeScript, Tailwind CSS 4, Recharts, GSAP, MDX, shadcn/ui

Built in
2 weeks
Surface
Leaderboard + Docs
Tasks
GitHub-backed

Problem

The benchmark needed a public home that presents complex results in a form researchers and practitioners can actually use.

Approach

I built a Next.js 16 App Router site with an animated leaderboard, GitHub-backed task registry, and MDX-powered docs.
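As a rough illustration of what a GitHub-backed, filterable registry might involve, the sketch below parses a JSON task index and filters it by search text and tag. The `Task` shape, the field names, and the idea of a single JSON index file are assumptions made for this example; the write-up does not describe the actual registry schema or how the site reads from GitHub.

```typescript
// Hypothetical shape of one task entry; the real registry schema is not given here.
interface Task {
  id: string;
  title: string;
  tags: string[];
}

interface TaskFilter {
  query?: string; // case-insensitive substring match on id or title
  tag?: string; // exact tag match
}

// Parse a JSON index (for example, a file read from a GitHub repository)
// and drop any entry that does not match the assumed Task shape.
function parseTaskIndex(json: string): Task[] {
  const raw: unknown = JSON.parse(json);
  if (!Array.isArray(raw)) return [];
  return raw.filter(
    (t): t is Task =>
      typeof t === "object" &&
      t !== null &&
      typeof t.id === "string" &&
      typeof t.title === "string" &&
      Array.isArray(t.tags) &&
      t.tags.every((tag: unknown) => typeof tag === "string"),
  );
}

// Keep only tasks that satisfy every filter that is set.
function filterTasks(tasks: Task[], filter: TaskFilter): Task[] {
  const query = filter.query?.trim().toLowerCase() ?? "";
  return tasks.filter((task) => {
    const matchesQuery =
      query === "" ||
      task.id.toLowerCase().includes(query) ||
      task.title.toLowerCase().includes(query);
    const matchesTag =
      filter.tag === undefined || task.tags.includes(filter.tag);
    return matchesQuery && matchesTag;
  });
}
```

Keeping parsing and filtering as pure functions means the same code can run on the server at build time or in the browser as the user types, independent of how the index is fetched.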

Key features

  • Live leaderboard with smooth GSAP-powered transitions
  • Filterable task registry sourced from GitHub
  • Integrated docs using MDX
  • One-click copy for plain text and BibTeX citations
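The one-click citation copy could be sketched roughly as follows: one record is rendered either as BibTeX or as plain text, and the result is handed to a clipboard writer. The `Citation` fields, the `@misc` entry type, and the plain-text layout are assumptions for illustration, not the site's actual citation format.

```typescript
// Hypothetical citation record; field names are illustrative only.
interface Citation {
  key: string;
  title: string;
  authors: string[];
  year: number;
  url: string;
}

// Render the citation as a BibTeX @misc entry (entry type is an assumption).
function toBibtex(c: Citation): string {
  return [
    `@misc{${c.key},`,
    `  title  = {${c.title}},`,
    `  author = {${c.authors.join(" and ")}},`,
    `  year   = {${c.year}},`,
    `  url    = {${c.url}}`,
    `}`,
  ].join("\n");
}

// Render the citation as a single plain-text line.
function toPlainText(c: Citation): string {
  return `${c.authors.join(", ")} (${c.year}). ${c.title}. ${c.url}`;
}

// The writer is injected so the formatter can run outside a browser.
type ClipboardWriter = (text: string) => Promise<void>;

// Format the citation, pass it to the writer, and return the copied text.
async function copyCitation(
  c: Citation,
  format: "bibtex" | "plain",
  write: ClipboardWriter,
): Promise<string> {
  const text = format === "bibtex" ? toBibtex(c) : toPlainText(c);
  await write(text);
  return text;
}
```

In a browser click handler, the writer could be `(text) => navigator.clipboard.writeText(text)`, which is only available in secure contexts. Injecting it keeps the formatting logic testable without a DOM.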

Outcomes

  • Leaderboard, tasks, docs, and contributors all live under one consistent design system
  • Built to support papers, talks, and community contributions