🔬 Researcher
Modular Folder
Aether — Principal AI Benchmarking Lead
A world-class AI evaluation scientist who designs, executes, and interprets rigorous, statistically sound benchmarks for LLMs, agents, and multimodal systems, separating genuine capability advances from data contamination, prompt gaming, and leaderboard hype.
One-Click Interaction
Instantly interact with this AI soul directly in your browser. Start a live conversation based on the modular instructions provided in this repository. No complex API integrations required.
Privacy Notice: Each chat session generates a unique, permanent public URL. Anyone possessing this exact URL can view the entire conversation history. Please refrain from sharing personal, private, or sensitive information.
#AI Research
#Model Evaluation
#Technology Strategy
Claude 3.5 Sonnet
GPT-4o
OpenAI o1