A code judge that doesn't just check your output — it checks if you understand your solution. Submit Python code, run it in an isolated Docker container, get evaluated against multiple test cases. On accepted, Gemini generates follow-up questions specific to your code and evaluates your answers.
- User submits Python code via the React frontend
- Rails saves the submission and enqueues a Sidekiq job — returns 202 immediately
- Frontend polls every 2 seconds until status is terminal
- Sidekiq worker runs the code inside an isolated Docker container per test case
- Output is compared against all test cases — partial results tracked (e.g. 3/4 passed)
- On wrong answer: shows the failing input, your output, and the expected output
- On accepted: Gemini generates 3 short follow-up questions specific to your code
- User has 3 minutes to answer — Gemini evaluates leniently and explains each result
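The submit-then-poll flow above can be sketched as a small loop. This is a minimal illustration in Python, not the project's React code: `poll_until_terminal`, `fetch_status`, and `max_attempts` are hypothetical names, and the set of terminal statuses is taken from the verdict list in this README.

```python
import time
from typing import Callable

# Terminal statuses as listed in this README; anything else
# (for example "pending" or "running") is treated as still in flight.
TERMINAL_STATUSES = {
    "accepted",
    "wrong_answer",
    "runtime_error",
    "compile_error",
    "time_limit_exceeded",
}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    max_attempts: int = 30,  # assumption: give up after about a minute
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status every interval_s seconds until a terminal status appears."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"no terminal status after {max_attempts} polls")


# Usage with a stubbed status source standing in for the HTTP request.
_responses = iter(["pending", "running", "accepted"])
final_status = poll_until_terminal(lambda: next(_responses), sleep=lambda _s: None)
print(final_status)
```

In a real client, `fetch_status` would wrap the GET request for the submission created by the initial 202 response.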
Key decisions:
- 202 + polling instead of WebSockets — execution takes 2-15s, polling at 2s is simpler and sufficient
- Docker sandbox for isolation — no network, memory capped, read-only filesystem, hard time limit
- Sidekiq keeps the web process non-blocking during code execution
- Gemini via raw HTTP — no SDK needed for a single endpoint
- SubmissionRecoveryWorker runs every 5 minutes via Sidekiq cron to re-enqueue stuck `pending` submissions and mark crashed `running` submissions as `runtime_error`
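To show why a raw HTTP call can be enough for a single endpoint, here is a minimal sketch in Python (the project itself makes this call from Rails). It only builds the request and does not send it. The prompt wording and the `build_followup_request` helper are assumptions for illustration; the URL and `x-goog-api-key` header follow the public Gemini REST API as I understand it, so check them against the current API reference.

```python
import json
import urllib.request

# Assumed endpoint for the model named in this README (Gemini 2.5 Flash).
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.5-flash:generateContent"
)


def build_followup_request(api_key: str, source_code: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for follow-up questions."""
    # Hypothetical prompt; the project's real prompt is not shown in this README.
    prompt = (
        "Ask 3 short follow-up questions about this Python solution:\n\n"
        + source_code
    )
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        GEMINI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


req = build_followup_request("your_key_here", "print(sum(map(int, input().split())))")
print(req.get_method(), req.full_url)
# Sending it would look like: urllib.request.urlopen(req, timeout=30)
```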
Each submission runs in a fresh Docker container with:
| Constraint | Value |
|---|---|
| Network | None |
| Memory | 128MB (swap also capped) |
| CPU | 0.5 cores |
| Max processes | 32 |
| Time limit | 10 seconds |
| Filesystem | Read-only |
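The constraints in the table map onto standard `docker run` flags. The sketch below, written in Python for illustration, assembles such a command; the mount path, the `solution.py` file name, and the `build_docker_command` helper are hypothetical, and the project's actual worker may build its command differently.

```python
from typing import List


def build_docker_command(code_dir: str, image: str = "python:3.11-alpine") -> List[str]:
    """Return a docker argv list that applies the sandbox limits from the table."""
    return [
        "docker", "run", "--rm", "-i",   # -i keeps stdin open for test input
        "--network", "none",             # Network: None
        "--memory", "128m",              # Memory: 128MB
        "--memory-swap", "128m",         # same as --memory, so no extra swap
        "--cpus", "0.5",                 # CPU: 0.5 cores
        "--pids-limit", "32",            # Max processes: 32
        "--read-only",                   # Filesystem: read-only
        "-v", f"{code_dir}:/code:ro",    # hypothetical mount for the submission
        image,
        "python", "/code/solution.py",   # hypothetical entry point
    ]


cmd = build_docker_command("/tmp/submission")
print(" ".join(cmd))
# The 10 second limit could be enforced by the caller, for example:
#   subprocess.run(cmd, input=test_input, capture_output=True, text=True, timeout=10)
```

Note that a caller-side timeout only stops the `docker` client process, so a real worker would likely also need to stop or kill the container itself.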
- `accepted`: All test cases passed
- `wrong_answer`: Output mismatch on at least one test case
- `runtime_error`: Non-zero exit code
- `compile_error`: Python SyntaxError detected
- `time_limit_exceeded`: Execution exceeded 10 seconds
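One way these verdicts could be derived from per-test results is sketched below in Python. The precedence of the checks, the `RunResult` type, and the choice to report the first failing verdict as the overall one are assumptions, not confirmed behavior of the project.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RunResult:
    stdout: str
    stderr: str
    exit_code: int
    timed_out: bool = False


def classify(result: RunResult, expected: str) -> str:
    """Map one test-case run to a verdict (assumed check order)."""
    if result.timed_out:
        return "time_limit_exceeded"
    if result.exit_code != 0:
        # A SyntaxError in stderr is reported separately from other crashes.
        if "SyntaxError" in result.stderr:
            return "compile_error"
        return "runtime_error"
    if result.stdout != expected:
        return "wrong_answer"
    return "accepted"


def judge(results: List[RunResult], expected: List[str]) -> Tuple[str, int, int]:
    """Return (overall verdict, passed count, total count)."""
    verdicts = [classify(r, e) for r, e in zip(results, expected)]
    passed = verdicts.count("accepted")
    # Assumption: the first non-accepted verdict becomes the overall verdict.
    overall = next((v for v in verdicts if v != "accepted"), "accepted")
    return overall, passed, len(verdicts)


runs = [
    RunResult("3\n", "", 0),
    RunResult("7\n", "", 0),
    RunResult("0\n", "", 0),  # wrong output for the third case
    RunResult("5\n", "", 0),
]
summary = judge(runs, ["3\n", "7\n", "1\n", "5\n"])
print(summary)  # ('wrong_answer', 3, 4)
```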
- Python 3 only
- Output compared as plain strings — whitespace and formatting matter
- No per-user accounts — submissions are not tied to a user
- Gemini follow-up is synchronous — a slow API response delays the result
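The plain-string comparison mentioned in the list above means that even invisible differences fail a test case. A short Python illustration follows, assuming the comparison is exact with no trimming; `outputs_match` is a hypothetical helper, and the project's actual comparison may normalize some whitespace.

```python
def outputs_match(actual: str, expected: str) -> bool:
    """Exact comparison: no trimming, no whitespace normalization (assumed)."""
    return actual == expected


print(outputs_match("1 2 3\n", "1 2 3\n"))   # True
print(outputs_match("1 2 3 \n", "1 2 3\n"))  # False: trailing space
print(outputs_match("1  2 3\n", "1 2 3\n"))  # False: double space
```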
- API: Ruby on Rails 7.2 (API mode)
- Background jobs: Sidekiq + Redis
- Database: PostgreSQL
- Execution sandbox: Docker (python:3.11-alpine)
- AI follow-up: Google Gemini 2.5 Flash
- Frontend: React + Vite
Prerequisites: Ruby 3.2.2, Rails 7.2, PostgreSQL, Redis, Docker
git clone https://github.com/Alokxk/CodeBench.git
cd CodeBench
bundle install
rails db:create db:migrate db:seed
docker pull python:3.11-alpine
Create .env in the project root:
GEMINI_API_KEY=your_key_here
SIDEKIQ_WEB_PASSWORD=any_password_you_want
SIDEKIQ_WEB_SECRET=generate_and_paste_a_random_secret_here
NOTE:
- Run this to generate SIDEKIQ_WEB_SECRET:
ruby -e "require 'securerandom'; puts SecureRandom.hex(32)"
- Get a free Gemini API key at aistudio.google.com.
CodeBench requires three separate processes.
# Terminal 1
rails server
# Terminal 2
bundle exec sidekiq -C config/sidekiq.yml
# Terminal 3
cd frontend && npm install && npm run dev
Open http://localhost:5173 in your browser to access the frontend. You can submit code and see results in real time.
The Sidekiq dashboard is available at http://localhost:3000/sidekiq (password from .env).
Contributions are welcome. Please open an issue or submit a pull request. Make sure to follow the existing code style. For major changes, please discuss them in an issue first.
