"llm serving" Papers
2 papers found
NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs
Haeun Lee, Omin Kwon, Yeonhong Park et al.
NeurIPS 2025, arXiv:2506.02024
1 citation
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
Jiangfei Duan, Runyu Lu, Haojie Duanmu et al.
ICML 2024 (oral), arXiv:2404.02015
39 citations