Ruoxi Jia
23 papers · 1,384 total citations

Papers (23)
The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks (CVPR 2020) · 488 citations
Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective (ICCV 2021) · 255 citations
Knowledge-Enriched Distributional Model Inversion Attacks (ICCV 2021) · 118 citations
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models (ICML 2024) · 99 citations
Label-Only Model Inversion Attacks via Boundary Repulsion (CVPR 2022) · 95 citations
CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks (NeurIPS 2022) · 89 citations
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content (ICML 2024) · 67 citations
Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study (ICCV 2023) · 42 citations
Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits (ICML 2024) · 23 citations
Scalability vs. Utility: Do We Have To Sacrifice One for the Other in Data Importance Quantification? (CVPR 2021) · 21 citations
Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources (NeurIPS 2023) · 17 citations
LLMs Can Plan Only If We Tell Them (ICLR 2025) · 16 citations
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes (CVPR 2024) · 15 citations
A Randomized Approach to Tight Privacy Accounting (NeurIPS 2023) · 12 citations
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models (COLM 2025) · 10 citations
Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning (NeurIPS 2022) · 5 citations
Data-Centric Human Preference with Rationales for Direct Preference Alignment (COLM 2025) · 4 citations
Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning (ICML 2025) · 4 citations
Detecting Adversarial Data Using Perturbation Forgery (CVPR 2025) · 3 citations
Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation (ICCV 2025) · 1 citation
Probing Hidden Knowledge Holes in Unlearned LLMs (NeurIPS 2025) · 0 citations
Position: A Safe Harbor for AI Evaluation and Red Teaming (ICML 2024) · 0 citations
A Privacy-Friendly Approach to Data Valuation (NeurIPS 2023) · 0 citations