"attention mechanism acceleration" Papers
2 papers found
Conference
ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
Xinhao Luo, Zihan Liu, Yangjie Zhou et al.
NEURIPS 2025, arXiv:2508.18850
2 citations
Pushing the Limits of BFP on Narrow Precision LLM Inference
Hui Wang, Yuan Cheng, Xiaomeng Han et al.
AAAI 2025, arXiv:2502.00026
1 citation