add live stress harness, app-level admin login rate limit

tests/stress/live_accuracy.mjs: classroom-scale accuracy + latency test that targets the deployed server (single-session, sid=main). Logs in as admin via /admin/login, resets the session, joins N students serially over HTTP, opens N student WebSockets in batches of 8 (250ms apart) plus the instructor WS, then drives every question through the admin "next" command. Each student picks uniformly at random from A-D, sends the submit, waits for the submit_ack, and records the round-trip latency. After session_ended, the script verifies that every student whose pick == correct got score > 0 and every other submission got score == 0, then reports p50/p95/p99 ack latency.

First live run: 50 students, 100 submits, 100% acks, 100% accuracy match, p99 555ms (≈intercontinental RTT to HK).

tests/stress/live_loop.sh: tmux-friendly loop that runs the live test every 60s and appends a JSONL summary line per cycle to runs/live_summary.jsonl. Mirrors the morning's api_stress run_loop shape so per-cycle aggregates are easy to scrape.

app/rate_limit.py: tiny in-memory token bucket. Capacity + refill in tokens/minute, keyed by client IP via X-Forwarded-For (with a fallback to request.client.host). Process-local state; admin login is the only user.

POST /admin/login: rate-limited at 10 attempts/minute/IP. Generous for the legit instructor (who succeeds in 1-2 tries) and prohibitive for brute force from a single attacker IP. Student endpoints are deliberately NOT rate-limited because campus students share NAT gateways and IP-level limits would false-positive a whole class.

The bucket is per-app-instance (instantiated inside the router factory), so test apps each get a fresh one and tests don't poison each other.
2026-05-03 00:23:07 +08:00
parent 7a483ad3ee
commit 2136286275
5 changed files with 483 additions and 1 deletions
--- a/app/routes_admin.py
+++ b/app/routes_admin.py
@@ -22,17 +22,30 @@ from app import auth
 from app.config import Settings
 from app.csv_export import export_session_csv
 from app.models import AdminLoginRequest
+from app.rate_limit import TokenBucket, client_ip
 from app.room import RoomManager


 def router(settings: Settings, rooms: RoomManager) -> APIRouter:
    api = APIRouter()

+    # Per-app instance so test apps get fresh state.
+    # 10 attempts/minute/IP — generous for the instructor, hostile to brute
+    # force without locking out the campus network on student endpoints
+    # (which are not rate-limited at all, see rate_limit.py).
+    login_bucket = TokenBucket(capacity=10, refill_per_minute=10)
+
    def require_admin(request: Request) -> None:
        auth.require_admin_request(settings, request)

    @api.post("/admin/login")
-    async def login(body: AdminLoginRequest, response: Response):
+    async def login(body: AdminLoginRequest, request: Request, response: Response):
+        ip = client_ip(request)
+        if not login_bucket.take(ip):
+            raise HTTPException(
+                status_code=429,
+                detail="Too many login attempts; try again in a minute.",
+            )
        if not auth.verify_admin_password(settings, body.password):
            raise HTTPException(status_code=401, detail="Invalid admin password")
        auth.set_admin_cookie(settings, response, auth.sign_admin(settings))