
eda-bench

Benchmarking platform for evaluating how effectively large language models explore and reason about data. Features a live leaderboard, task registry, CLI documentation, and agent performance visualizations.


Challenge

The goal was a clean, research-grade web platform that presents LLM benchmark results in a form researchers and practitioners can use directly. The main difficulty was designing an interface that communicates complex model-comparison data clearly, through a live leaderboard, task browsing, and citation tooling, while keeping the experience fast and accessible.

Approach

Developed a full-stack benchmark website using Next.js 16 with the App Router. The site surfaces live leaderboard rankings, a browsable task registry, and animated agent performance visualizations. Task data is pulled dynamically from GitHub and presented in a minimal UI.

Built an MDX-powered documentation system integrated directly into the app, covering CLI installation, usage guides, and first-steps walkthroughs — with consistent navigation across the entire site.

Category: Full-Stack Development
Year: 2025
Role: Frontend & Full-Stack
Type: Research Platform

Technical Implementation

Leaderboard & Performance Visualization

Built an animated agent performance chart that ranks agents by task resolution success rate. Each leaderboard row shows model, organization, score, task count, and version, and the bars animate smoothly on load using GSAP.
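The ranking step behind such a chart can be sketched as follows. This is a minimal illustration, not the site's actual code: the `LeaderboardEntry` shape, the `rankEntries` helper, and the choice to scale bars relative to the top score are all assumptions.

```typescript
// Hypothetical shape of one leaderboard row; the field names are assumptions
// modelled on the columns described above.
interface LeaderboardEntry {
  model: string;
  organization: string;
  resolved: number;  // tasks resolved successfully
  taskCount: number; // tasks attempted
  version: string;
}

interface RankedEntry extends LeaderboardEntry {
  rank: number;     // 1-based position after sorting
  score: number;    // success rate in percent, rounded to one decimal
  barWidth: number; // bar length in percent, relative to the top score
}

// Computes each entry's success rate, sorts descending (ties broken by
// model name), and derives a relative bar width for the chart.
function rankEntries(entries: LeaderboardEntry[]): RankedEntry[] {
  const scored = entries.map((e) => ({
    ...e,
    score:
      e.taskCount > 0
        ? Math.round((e.resolved / e.taskCount) * 1000) / 10
        : 0,
  }));

  scored.sort((a, b) => b.score - a.score || a.model.localeCompare(b.model));

  const top = scored[0]?.score ?? 0;
  return scored.map((e, i) => ({
    ...e,
    rank: i + 1,
    barWidth: top > 0 ? (e.score / top) * 100 : 0,
  }));
}
```

In a client component, each `barWidth` could then be handed to GSAP as the target width of a bar element, for example with `gsap.fromTo(bar, { width: "0%" }, { width: barWidth + "%", duration: 0.8 })`.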

Task Registry & GitHub Integration

Implemented dynamic task fetching from GitHub to power a filterable task browser. Tasks are organized by domain, with featured tasks surfaced on the homepage.
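A minimal sketch of how a task browser like this might load and filter its data. The `Task` shape, the helper names, and the idea of a single JSON index file are assumptions for illustration; the real repository layout is not described here, so `loadTasks` takes the URL as a parameter rather than guessing one.

```typescript
// Hypothetical task record; the real registry's fields may differ.
interface Task {
  id: string;
  title: string;
  domain: string;
  featured?: boolean;
}

// Fetches a JSON array of tasks from a caller-supplied URL
// (for example, a raw file URL in a GitHub repository).
async function loadTasks(indexUrl: string): Promise<Task[]> {
  const res = await fetch(indexUrl);
  if (!res.ok) {
    throw new Error(`Task fetch failed with status ${res.status}`);
  }
  return (await res.json()) as Task[];
}

// Groups tasks by domain, preserving input order within each group.
function groupByDomain(tasks: Task[]): Map<string, Task[]> {
  const groups = new Map<string, Task[]>();
  for (const task of tasks) {
    const list = groups.get(task.domain) ?? [];
    list.push(task);
    groups.set(task.domain, list);
  }
  return groups;
}

// Filters by domain (null means "all domains") and by a
// case-insensitive substring match on the title.
function filterTasks(tasks: Task[], domain: string | null, query = ""): Task[] {
  const q = query.trim().toLowerCase();
  return tasks.filter(
    (t) =>
      (domain === null || t.domain === domain) &&
      (q === "" || t.title.toLowerCase().includes(q)),
  );
}

// Picks the tasks flagged as featured, capped for the homepage.
function featuredTasks(tasks: Task[], limit = 3): Task[] {
  return tasks.filter((t) => t.featured === true).slice(0, limit);
}
```

Keeping the grouping and filtering as pure functions separates them from the network call, so they can run either on the server at render time or in the browser as the user types.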

MDX Documentation System

Built the docs with MDX inside the Next.js app. Sidebar navigation covers installation, the CLI reference, and first-steps guides, and the pages share the design system used by the rest of the site.

Citation & Licensing

Implemented one-click copy for both plain text and BibTeX citation formats with toast feedback, supporting proper academic attribution.
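The two citation formats and the copy action could be sketched like this. The `Citation` fields, the `@misc` entry type, and the helper names are assumptions, not the site's actual implementation; the clipboard writer and the toast callback are injected so the formatting logic stays independent of the browser.

```typescript
// Hypothetical citation record; all field names are illustrative.
interface Citation {
  key: string;
  authors: string[];
  title: string;
  year: number;
  url: string;
}

// Formats the citation as a BibTeX @misc entry.
function toBibTeX(c: Citation): string {
  return [
    `@misc{${c.key},`,
    `  author = {${c.authors.join(" and ")}},`,
    `  title = {${c.title}},`,
    `  year = {${c.year}},`,
    `  howpublished = {\\url{${c.url}}}`,
    `}`,
  ].join("\n");
}

// Formats the citation as a single plain-text line.
function toPlainText(c: Citation): string {
  return `${c.authors.join(", ")} (${c.year}). ${c.title}. ${c.url}`;
}

// Copies text with the supplied writer and reports the outcome through
// `notify`, which stands in for whatever toast function the UI uses.
// In the browser, `write` would typically be
// (t) => navigator.clipboard.writeText(t).
async function copyCitation(
  text: string,
  write: (text: string) => Promise<void>,
  notify: (message: string) => void,
): Promise<boolean> {
  try {
    await write(text);
    notify("Citation copied");
    return true;
  } catch {
    notify("Copy failed");
    return false;
  }
}
```

Reporting failure through the same toast path matters because clipboard writes can be rejected, for instance when the page lacks permission or is not served over a secure context.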

Screenshots: Homepage (light mode), Homepage (dark mode), Agent Performance Leaderboard, Task Registry, Benchmark Registry, MDX Documentation, Contributors Page

Core Features

  • Live model leaderboard with animated performance bars
  • Browsable task registry sourced from GitHub
  • MDX documentation with sidebar navigation
  • BibTeX + plain text citation with one-click copy
  • Contributors page
  • Dark/light theme toggle
  • Fully responsive design
  • Research-grade minimal UI

Delivered a polished, research-grade platform that makes complex benchmark data easy to explore and cite — giving the eda-bench project a credible public face and a solid foundation for community growth.