Xiao Yan (晏潇)

Research Professor / Associate Professor / PhD Supervisor

Recipient of the NSFC Excellent Young Scientists Fund (Overseas)

Institute for Math & AI, Wuhan University
Professional and technical post: full senior rank, Grade 4

📧 yanxiaosunny@whu.edu.cn 📍 Room A912, Lei Jun Science and Technology Building, Wuhan University 🎓 Google Scholar
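Dr. Xiao Yan received a PhD from The Chinese University of Hong Kong in 2020. From 2020 to 2023, Dr. Yan was a research associate professor in the computer science department at Southern University of Science and Technology (SUSTech), and from 2024 to April 2025 a postdoctoral researcher at the Centre for Perceptual and Interactive Intelligence (CPII) in Hong Kong. The main research area is database and machine learning systems, with a focus on improving system efficiency. Dr. Yan has published more than 60 CCF-A papers, including at flagship systems and data venues such as SOSP, PPoPP, SIGMOD, and VLDB. Dr. Yan also serves as a reviewer for more than 10 prominent conferences and journals, including VLDB, ICML, and VLDBJ. Systems and techniques from this work have been deployed in practice at Amazon Web Services (AWS), Huawei, Microsoft Research Asia, and Zilliz. Ongoing collaborations with Alibaba, Tencent, Meituan, Meta, and other companies continue to carry the research into production.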

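Research Interests

Model Training and Inference

Automatic parallel training for large models, efficient inference frameworks, large-scale advertising model systems

Information Retrieval and Vector Databases

Vector-based information retrieval, similarity search, cloud-native distributed architectures

Graph Processing and Graph Learning

Graph algorithm execution, GNN training and inference, distributed graph systems

Federated Learning

Distributed privacy-preserving machine learning, federated optimization algorithms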

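Highlights

Vector Databases (NeurIPS'21 competition winner; recommended by OpenAI)

Worked with Zilliz, a leading startup in the vector database field, to architect Manu, the world's first cloud-native distributed vector database. The work won first place in the NeurIPS'21 global vector search competition and was officially recommended by OpenAI.

  • Architected a comprehensive vector retrieval stack covering single-machine in-memory, single-machine on-disk, multi-machine distributed, cloud, edge-device, and GPU-accelerated settings
  • Related results have been deployed at Huawei and Microsoft Research Asia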

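Graph Processing and Graph Learning Systems (integrated into AWS DGL)

Developed a complete stack of graph processing and graph learning systems, with more than 10 CCF-A papers on related topics

  • CoroGraph, an efficient graph algorithm execution engine; gSampler, a GPU graph sampling engine
  • DGCL, a distributed graph communication engine; DSP and ATP, distributed GNN training systems
  • DiskGNN, a disk-based graph model training system; DGI, Atom, and SG-Serve, graph model inference frameworks
  • These results have been integrated into Amazon's widely used DGL (Deep Graph Library) graph learning platform and deployed at Huawei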

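Large Model Training and Inference Systems

Built large-model systems with several leading companies to reduce training and inference costs:

  • TensorOpt (with Huawei): automatic parallel training system for large models
  • RetroKV (with Microsoft Research Asia): efficient large-model inference framework based on sparse attention
  • MTGenRec / MTServe (with Meituan): training and inference systems for large advertising models
  • FEC / RECcompile (with Meta): parallel training systems for large advertising models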

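Student Supervision

  • Supervised undergraduate and master's students have received fully funded PhD offers from universities including the University of Pennsylvania, the University of Michigan, MIT, and The Chinese University of Hong Kong.
  • Supervised PhD students have joined leading companies such as Meta, AWS, and Huawei (Genius Youth program).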

Selected Publications

DSP: Efficient GNN Training with Multiple GPUs CCF A

PPoPP, 2023

gSampler: General and Efficient GPU-based Graph Sampling for Graph Learning CCF A

SOSP, 2023

Face2Exp: Combating Data Biases for Facial Expression Recognition CCF A

CVPR, 2022

Manu: A Cloud Native Vector Database Management System CCF A

VLDB, 2022

DGCL: An Efficient Communication Library for Distributed GNN Training CCF A

EuroSys, 2021

Elastic Deep Learning in Multi-tenant GPU Clusters CCF A

TPDS, 2021

TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-parallelism CCF A

TPDS, 2021

G-Miner: An Efficient Task-oriented Graph Mining System CCF A

EuroSys, 2018

FlexPS: Flexible Parallelism Control in Parameter Server Architecture CCF A

VLDB, 2018

Norm-ranging LSH for Maximum Inner Product Search CCF A

NeurIPS, 2018

Other Publications

  • H Li, S Deng, X Yan, X Zhi, J Cheng. SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation. SIGMOD, 2025. CCF A
  • Y Chen, X Yan, A Meliou, E Lo. DiskJoin: Large-scale Vector Similarity Join with SSD. SIGMOD, 2025. CCF A
  • P Yin, Q Zhou, X Yan, C Wang, E Lo, C Li, L Lu, H Fan, W Zhou, MC Yang, J Cheng. CARINA: An Efficient CXL-Oriented Embedding Serving System for Recommendation Models. SIGMOD, 2025. CCF A
  • M Li, X Yan, B Lu, Y Zhang, J Cheng, C Ma. Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study. SIGMOD, 2025. CCF A
  • R Liu, Y Wang, X Yan, H Jiang, Z Cai, M Wang, B Tang, J Li. DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-core GNN Training. SIGMOD, 2025. CCF A
  • K Ma, R Liu, X Yan, Z Cai, X Song, M Wang, Y Li, J Cheng. Adaptive Parallel Training for Graph Neural Networks. PPoPP, 2025. CCF A
  • X Zhou, X Yan, F Fu, Z Fu, T Qian, Y Zhu, Q Zhang, B Cui, J Jiang. PS-MI: Accurate, Efficient, and Private Data Valuation in Vertical Federated Learning. VLDB, 2025. CCF A
  • H Jiang, R Liu, Z Huang, Y Wang, X Yan, Z Cai, M Wang, D Wipf. MuseGNN: Forming Scalable, Convergent GNN Layers that Minimize a Sampling-Based Energy. ICLR, 2025.
  • X Wang, J Jiang, X Yan, Q Huang. TESA: A Trajectory and Semantic-aware Dynamic Heterogeneous Graph Neural Network. WWW, 2025. CCF A
  • Q Zhang, X Yan, Y Ding, F Fu, Q Xu, Z Li, C Hu, J Jiang. HaCore: Efficient Coreset Construction with Locality Sensitive Hashing for Vertical Federated Learning. AAAI, 2025. CCF A
  • Q Zhang, X Yan, Y Zhao, F Fu, Q Xu, Y Ding, X Zhou, C Hu, J Jiang. Model Rake: A Defense Against Stealing Attacks in Split Learning. IJCAI, 2025. CCF A
  • X Zhou, X Yan, F Fu, X Li, H Huang, Q Xu, C Yang, B Du, T Qian, J Jiang. Hounding Data Diversity: Towards Participant Selection in Vertical Federated Learning. ICDE, 2025. CCF A
  • Y Chi, H Wang, Y Chen, Y Yang, J Yang, Z Yang, X Yan, G Chen. PI-SQL: Enhancing Text-to-SQL with Fine-Grained Guidance from Pivot Programming Languages. EMNLP Findings, 2025. CCF B
  • C Zheng, G Jiang, X Yan, P Yin, Q Zhou, J Cheng. GE2: A General and Efficient Knowledge Graph Embedding Learning System. SIGMOD, 2024. CCF A
  • Q Zhou, P Yin, X Yan, C Li, G Jiang, J Cheng. Atom: An Efficient Query Serving System for Embedding-based Knowledge Graph Reasoning with Operator-Level Batching. SIGMOD, 2024. CCF A
  • W Ning, R Cheng, X Yan, B Kao, N Huo, NAH Haldar, B Tang. Debiasing Recommendation with Personal Popularity. WWW, 2024. CCF A
  • Y Wang, X Yan, C Hu, Q Xu, C Yang, F Fu, W Zhang, H Wang, B Du, J Jiang. Generative and Contrastive Paradigms are Complementary for Graph Self-supervised Learning. ICDE, 2024. CCF A
  • Z Bian, X Yan, J Zhang, ML Yiu, B Tang. QSRP: Efficient Reverse Query Processing on High-Dimensional Embeddings. ICDE, 2024. CCF A
  • Y Wang, X Yan, S Jin, H Huang, Q Xu, Q Zhang, B Du, J Jiang. Self-supervised Learning for Graph Dataset Condensation. KDD, 2024. CCF A
  • X Zhi, X Yan, B Tang, Z Yin, Y Zhu, M Zhou. CoroGraph: Bridging Cache Efficiency and Work Efficiency for Graph Algorithm Execution. VLDB, 2023. CCF A
  • F Wang, X Yan, ML Yiu, S Li, Z Mao, B Tang. Speeding up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation. SIGMOD, 2023. CCF A
  • K Ma, X Yan, Z Cai, Y Huang, Y Wu, J Cheng. FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication. SIGMOD, 2023. CCF A
  • S Zhang, R Yang, X Xiao, X Yan, B Tang. Effective and Efficient PageRank-based Positioning for Graph Visualization. SIGMOD, 2023. CCF A
  • P Yin, X Yan, J Zhou, Q Fu, Z Cai, J Cheng, B Tang, M Wang. DGI: An Easy and Efficient Framework for GNN Model Evaluation. KDD, 2023. CCF A
  • W Ning, X Yan, W Liu, R Cheng, R Zhang, B Tang. Multi-domain Recommendation with Embedding Disentangling and Domain Alignment. CIKM, 2023. CCF B
  • Z Li, D Zeng, X Yan, Q Shen, B Tang. Analyzing and Combating Attribute Bias for Face Restoration. IJCAI, 2023. CCF A
  • Q Shen, Z You, X Yan, C Zhang, K Xu, D Zeng, J Qin, B Tang. QEVIS: Multi-grained Visualization of Distributed Query Execution. TVCG, 2023. CCF A
  • J Zeng, X Yan, Y Li, M Han, B Tang. Extracting Top-Frequent and Diversified Patterns in Knowledge Graphs. TKDE, 2023. CCF A
  • H Liu, B Tang, J Zhang, Y Deng, X Yan, X Zheng, Q Shen, D Zeng, Z Mao, ML Yiu, H Li, M Han, Q Li, Z Luo. GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing. SoCC, 2022. CCF B
  • J Zhang, B Tang, ML Yiu, X Yan, K Li. T-LevelIndex: Towards Efficient Query Processing in Continuous Preference Space. SIGMOD, 2022. CCF A
  • W Ning, R Cheng, J Shen, NAH Haldar, B Kao, X Yan, N Huo, WK Lam, B Tang. Automatic Meta-path Discovery for Effective Graph-based Recommendation. CIKM, 2022. CCF B
  • H Yang, X Yan, X Dai, Y Chen, J Cheng. Self-enhanced GNN: Improving Graph Neural Networks Using Model Outputs. IJCNN, 2021.
  • J Zeng, X Yan, M Han, B Tang. Fast Core-based Top-k Frequent Pattern Discovery in Knowledge Graphs. ICDE, 2021. CCF A
  • W Ning, X Yan, B Tang. Towards Efficient MaxBRNN Computation for Streaming Updates. ICDE, 2021. CCF A
  • Y Zhao, Z Liu, Y Wu, G Jiang, J Cheng, K Liu, X Yan. Timestamped State Sharing for Stream Analytics. TPDS, 2021. CCF A
  • J Fang, H Fu, D Zeng, X Yan, Y Yan, J Liu. Combating Ambiguity for Hash-code Learning in Medical Instance Retrieval. IEEE JBHI, 2021.
  • L Xiang, X Yan, L Lu, B Tang. GAIPS: Accelerating Maximum Inner Product Search with GPU. SIGIR, 2021. CCF A
  • J Liu, X Yan, X Dai, Z Li, J Cheng, MC Yang. Understanding and Improving Proximity Graph based Maximum Inner Product Search. AAAI, 2020. CCF A
  • X Dai, X Yan, KKW Ng, J Liu, J Cheng. Norm-explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search. AAAI, 2020. CCF A
  • X Dai, X Yan, K Zhou, Y Wang, H Yang, J Cheng. Convolutional Embedding for Edit Distance. SIGIR, 2020. CCF A
  • Y Meng, X Yan, W Liu, H Wu, J Cheng. Wasserstein Collaborative Filtering for Item Cold-start Recommendation. UMAP, 2020.
  • Y Meng, X Dai, X Yan, J Cheng, W Liu, J Guo, B Liao, G Chen. PMD: An Optimal Transportation-based User Distance for Recommender Systems. ECIR, 2020.
  • H Chen, C Li, J Fang, C Huang, J Cheng, J Zhang, Y Hou, X Yan. Grasper: A High Performance Distributed System for OLAP on Property Graphs. SoCC, 2019. CCF B
  • Y Huang, X Yan, G Jiang, T Jin, J Cheng, A Xu, Z Liu, S Tu. Tangram: Bridging Immutable and Mutable Abstractions for Distributed Data Analytics. USENIX ATC, 2019. CCF A
  • S Deng, X Yan, KKW Ng, C Jiang, J Cheng. Pyramid: A General Framework for Distributed Similarity Search on Large-scale Datasets. IEEE BigData, 2019.
  • J Li, X Yan, J Zhang, A Xu, J Cheng, J Liu, KKW Ng, T Cheng. A General and Efficient Querying Method for Learning to Hash. SIGMOD, 2018. CCF A
  • J Li, J Cheng, F Yang, Y Huang, Y Zhao, X Yan, R Zhao. LoSHa: A General Framework for Scalable Locality Sensitive Hashing. SIGIR, 2017. CCF A