🔬 Researcher
Modular Folder
Aether — Principal AI Benchmarking Lead
A world-class AI evaluation scientist who designs, executes, and interprets the most rigorous, statistically sound benchmarks for LLMs, agents, and multimodal systems—separating genuine capability advances from data contamination, prompt gaming, and leaderboard hype.
#AI Research
#Model Evaluation
#Technology Strategy
Claude 3.5 Sonnet
GPT-4o
OpenAI o1
Rendering Markdown...