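Research Directions
Model Training and Inference
Automatic parallel training of large models, efficient inference frameworks, large-scale advertising model systems
Information Retrieval and Vector Databases
Vector-based information retrieval, similarity search, cloud-native distributed architecture
Graph Processing and Graph Learning
Graph algorithm execution, graph neural network training and inference, distributed graph systems
Federated Learning
Distributed privacy-preserving machine learning, federated optimization algorithms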
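Highlights
Vector Databases NeurIPS'21 Competition Winner Recommended by OpenAI
In collaboration with Zilliz, a leading startup in the vector database field, architected Manu, the world's first cloud-native distributed vector database. The work won the global vector search competition at NeurIPS'21, a top AI conference, and is officially recommended by OpenAI.
- Built a comprehensive vector search tool stack covering single-machine in-memory, single-machine on-disk, multi-machine distributed, cloud, edge-device, and GPU-accelerated settings
- Related results are deployed in production at Huawei and Microsoft Research Asia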
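Graph Processing and Graph Learning Systems Integrated into AWS DGL
Developed a complete system stack for graph processing and graph learning, with more than 10 CCF-A papers published on related topics.
- CoroGraph, an efficient graph algorithm execution engine; gSampler, a GPU graph sampling engine
- DGCL, a distributed graph communication engine; DSP and ATP, distributed graph neural network training systems
- DiskGNN, a disk-based graph model training system; DGI, Atom, and SG-Serve, graph model inference frameworks
- Results have been integrated into DGL, the widely used graph learning platform from Amazon Web Services, and deployed in production at Huawei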
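Large Model Training and Inference Systems
Developed large model systems jointly with several leading companies to reduce training and inference costs:
- TensorOPT (with Huawei): automatic parallel training system for large models
- RetroKV (with Microsoft Research Asia): efficient large model inference framework based on sparse attention
- MTGenRec / MTServe (with Meituan): training and inference systems for large advertising models
- FEC / RECcompile (with Meta): parallel training systems for large advertising models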
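Student Mentoring
- Undergraduate and master's students advised have received fully funded PhD offers from universities including the University of Pennsylvania, the University of Michigan, MIT, and The Chinese University of Hong Kong.
- PhD students advised have joined leading companies including Meta, AWS, and Huawei (Genius Youth program).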
Representative Papers
DSP: Efficient GNN Training with Multiple GPUs CCF A
PPoPP, 2023
gSampler: General and Efficient GPU-based Graph Sampling for Graph Learning CCF A
SOSP, 2023
Face2Exp: Combating Data Biases for Facial Expression Recognition CCF A
CVPR, 2022
Manu: A Cloud Native Vector Database Management System CCF A
VLDB, 2022
DGCL: An Efficient Communication Library for Distributed GNN Training CCF A
EuroSys, 2021
Elastic Deep Learning in Multi-tenant GPU Clusters CCF A
TPDS, 2021
TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-parallelism CCF A
TPDS, 2021
G-Miner: An Efficient Task-oriented Graph Mining System CCF A
EuroSys, 2018
FlexPS: Flexible Parallelism Control in Parameter Server Architecture CCF A
VLDB, 2018
Norm-ranging LSH for Maximum Inner Product Search CCF A
NeurIPS, 2018
Other Papers
- H Li, S Deng, X Yan, X Zhi, J Cheng. SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation. SIGMOD, 2025. CCF A
- Y Chen, X Yan, A Meliou, E Lo. DiskJoin: Large-scale Vector Similarity Join with SSD. SIGMOD, 2025. CCF A
- P Yin, Q Zhou, X Yan, C Wang, E Lo, C Li, L Lu, H Fan, W Zhou, MC Yang, J Cheng. CARINA: An Efficient CXL-Oriented Embedding Serving System for Recommendation Models. SIGMOD, 2025. CCF A
- M Li, X Yan, B Lu, Y Zhang, J Cheng, C Ma. Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study. SIGMOD, 2025. CCF A
- R Liu, Y Wang, X Yan, H Jiang, Z Cai, M Wang, B Tang, J Li. DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-core GNN Training. SIGMOD, 2025. CCF A
- K Ma, R Liu, X Yan, Z Cai, X Song, M Wang, Y Li, J Cheng. Adaptive Parallel Training for Graph Neural Networks. PPoPP, 2025. CCF A
- X Zhou, X Yan, F Fu, Z Fu, T Qian, Y Zhu, Q Zhang, B Cui, J Jiang. PS-MI: Accurate, Efficient, and Private Data Valuation in Vertical Federated Learning. VLDB, 2025. CCF A
- H Jiang, R Liu, Z Huang, Y Wang, X Yan, Z Cai, M Wang, D Wipf. MuseGNN: Forming Scalable, Convergent GNN Layers that Minimize a Sampling-Based Energy. ICLR, 2025.
- X Wang, J Jiang, X Yan, Q Huang. TESA: A Trajectory and Semantic-aware Dynamic Heterogeneous Graph Neural Network. WWW, 2025. CCF A
- Q Zhang, X Yan, Y Ding, F Fu, Q Xu, Z Li, C Hu, J Jiang. Hacore: Efficient Coreset Construction with Locality Sensitive Hashing for Vertical Federated Learning. AAAI, 2025. CCF A
- Q Zhang, X Yan, Y Zhao, F Fu, Q Xu, Y Ding, X Zhou, C Hu, J Jiang. Model Rake: A Defense Against Stealing Attacks in Split Learning. IJCAI, 2025. CCF A
- X Zhou, X Yan, F Fu, X Li, H Huang, Q Xu, C Yang, B Du, T Qian, J Jiang. Hounding Data Diversity: Towards Participant Selection in Vertical Federated Learning. ICDE, 2025. CCF A
- Y Chi, H Wang, Y Chen, Y Yang, J Yang, Z Yang, X Yan, G Chen. PI-SQL: Enhancing Text-to-SQL with Fine-Grained Guidance from Pivot Programming Languages. EMNLP Findings, 2025. CCF B
- C Zheng, G Jiang, X Yan, P Yin, Q Zhou, J Cheng. GE2: A General and Efficient Knowledge Graph Embedding Learning System. SIGMOD, 2024. CCF A
- Q Zhou, P Yin, X Yan, C Li, G Jiang, J Cheng. Atom: An Efficient Query Serving System for Embedding-based Knowledge Graph Reasoning with Operator-Level Batching. SIGMOD, 2024. CCF A
- W Ning, R Cheng, X Yan, B Kao, N Huo, NAH Haldar, B Tang. Debiasing Recommendation with Personal Popularity. WWW, 2024. CCF A
- Y Wang, X Yan, C Hu, Q Xu, C Yang, F Fu, W Zhang, H Wang, B Du, J Jiang. Generative and Contrastive Paradigms are Complementary for Graph Self-supervised Learning. ICDE, 2024. CCF A
- Z Bian, X Yan, J Zhang, ML Yiu, B Tang. QSRP: Efficient Reverse Query Processing on High-Dimensional Embeddings. ICDE, 2024. CCF A
- Y Wang, X Yan, S Jin, H Huang, Q Xu, Q Zhang, B Du, J Jiang. Self-supervised Learning for Graph Dataset Condensation. KDD, 2024. CCF A
- X Zhi, X Yan, B Tang, Z Yin, Y Zhu, M Zhou. CoroGraph: Bridging Cache Efficiency and Work Efficiency for Graph Algorithm Execution. VLDB, 2023. CCF A
- F Wang, X Yan, ML Yiu, S Li, Z Mao, B Tang. Speeding up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation. SIGMOD, 2023. CCF A
- K Ma, X Yan, Z Cai, Y Huang, Y Wu, J Cheng. FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication. SIGMOD, 2023. CCF A
- S Zhang, R Yang, X Xiao, X Yan, B Tang. Effective and Efficient PageRank-based Positioning for Graph Visualization. SIGMOD, 2023. CCF A
- P Yin, X Yan, J Zhou, Q Fu, Z Cai, J Cheng, B Tang, M Wang. DGI: An Easy and Efficient Framework for GNN Model Evaluation. KDD, 2023. CCF A
- W Ning, X Yan, W Liu, R Cheng, R Zhang, B Tang. Multi-domain Recommendation with Embedding Disentangling and Domain Alignment. CIKM, 2023. CCF B
- Z Li, D Zeng, X Yan, Q Shen, B Tang. Analyzing and Combating Attribute Bias for Face Restoration. IJCAI, 2023. CCF A
- Q Shen, Z You, X Yan, C Zhang, K Xu, D Zeng, J Qin, B Tang. Qevis: Multi-grained Visualization of Distributed Query Execution. TVCG, 2023. CCF A
- J Zeng, X Yan, Y Li, M Han, B Tang. Extracting Top-Frequent and Diversified Patterns in Knowledge Graphs. TKDE, 2023. CCF A
- H Liu, B Tang, J Zhang, Y Deng, X Yan, X Zheng, Q Shen, D Zeng, Z Mao, ML Yiu, H Li, M Han, Q Li, Z Luo. GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing. SoCC, 2022. CCF B
- J Zhang, B Tang, ML Yiu, X Yan, K Li. τ-LevelIndex: Towards Efficient Query Processing in Continuous Preference Space. SIGMOD, 2022. CCF A
- W Ning, R Cheng, J Shen, NAH Haldar, B Kao, X Yan, N Huo, WK Lam, B Tang. Automatic Meta-path Discovery for Effective Graph-based Recommendation. CIKM, 2022. CCF B
- H Yang, X Yan, X Dai, Y Chen, J Cheng. Self-enhanced GNN: Improving Graph Neural Networks Using Model Outputs. IJCNN, 2021.
- J Zeng, X Yan, M Han, B Tang. Fast Core-based Top-k Frequent Pattern Discovery in Knowledge Graphs. ICDE, 2021. CCF A
- W Ning, X Yan, B Tang. Towards Efficient MaxBRNN Computation for Streaming Updates. ICDE, 2021. CCF A
- Y Zhao, Z Liu, Y Wu, G Jiang, J Cheng, K Liu, X Yan. Timestamped State Sharing for Stream Analytics. TPDS, 2021. CCF A
- J Fang, H Fu, D Zeng, X Yan, Y Yan, J Liu. Combating Ambiguity for Hash-code Learning in Medical Instance Retrieval. IEEE JBHI, 2021.
- L Xiang, X Yan, L Lu, B Tang. GAIPS: Accelerating Maximum Inner Product Search with GPU. SIGIR, 2021. CCF A
- J Liu, X Yan, X Dai, Z Li, J Cheng, MC Yang. Understanding and Improving Proximity Graph based Maximum Inner Product Search. AAAI, 2020. CCF A
- X Dai, X Yan, KKW Ng, J Liu, J Cheng. Norm-explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search. AAAI, 2020. CCF A
- X Dai, X Yan, K Zhou, Y Wang, H Yang, J Cheng. Convolutional Embedding for Edit Distance. SIGIR, 2020. CCF A
- Y Meng, X Yan, W Liu, H Wu, J Cheng. Wasserstein Collaborative Filtering for Item Cold-start Recommendation. UMAP, 2020.
- Y Meng, X Dai, X Yan, J Cheng, W Liu, J Guo, B Liao, G Chen. PMD: An Optimal Transportation-based User Distance for Recommender Systems. ECIR, 2020.
- H Chen, C Li, J Fang, C Huang, J Cheng, J Zhang, Y Hou, X Yan. Grasper: A High Performance Distributed System for OLAP on Property Graphs. SoCC, 2019. CCF B
- Y Huang, X Yan, G Jiang, T Jin, J Cheng, A Xu, Z Liu, S Tu. Tangram: Bridging Immutable and Mutable Abstractions for Distributed Data Analytics. USENIX ATC, 2019. CCF A
- S Deng, X Yan, KKW Ng, C Jiang, J Cheng. Pyramid: A General Framework for Distributed Similarity Search on Large-scale Datasets. IEEE BigData, 2019.
- J Li, X Yan, J Zhang, A Xu, J Cheng, J Liu, KKW Ng, T Cheng. A General and Efficient Querying Method for Learning to Hash. SIGMOD, 2018. CCF A
- J Li, J Cheng, F Yang, Y Huang, Y Zhao, X Yan, R Zhao. Losha: A General Framework for Scalable Locality Sensitive Hashing. SIGIR, 2017. CCF A