ResearchAlpha Leak



Mantas Mazeika

Topic trends: 32,543 papers · similarity ≥ 0.4 · year ≥ 2024 · Data sourced from Semantic Scholar

34,598 papers | Abstracts: 31,650 (91.5%) | Citations: 34,598 (100.0%) | arXiv: 26,074 (75.4%)

Built: Feb 14, 2026, 11:32 PM AMS

9 papers · 4,305 total citations

papers (9)

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

NEURIPS 2023 · arXiv

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

Tamper-Resistant Safeguards for Open-Weight LLMs

Forecasting Future World Events With Neural Networks

NEURIPS 2022 · arXiv

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

NEURIPS 2025 · arXiv

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

NEURIPS 2022 · arXiv