Leyuan Wang
TVM: An automated end-to-end optimizing compiler for deep learning
T Chen, T Moreau, Z Jiang, L Zheng, E Yan, H Shen, M Cowan, L Wang, ...
13th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2018
TVM: end-to-end optimization stack for deep learning
T Chen, T Moreau, Z Jiang, H Shen, EQ Yan, L Wang, Y Hu, L Ceze, ...
arXiv preprint arXiv:1802.04799, 2018
HAWQ-V3: Dyadic neural network quantization
Z Yao, Z Dong, Z Zheng, A Gholami, J Yu, E Tan, L Wang, Q Huang, ...
International Conference on Machine Learning, 11875-11886, 2021
Gunrock: GPU graph analytics
Y Wang, Y Pan, A Davidson, Y Wu, C Yang, L Wang, M Osama, C Yuan, ...
ACM Transactions on Parallel Computing (TOPC) 4 (1), 1-49, 2017
A comparative study on exact triangle counting algorithms on the GPU
L Wang, Y Wang, C Yang, JD Owens
Proceedings of the ACM Workshop on High Performance Graph Processing, 1-8, 2016
A unified optimization approach for CNN model inference on integrated GPUs
L Wang, Z Chen, Y Liu, Y Wang, L Zheng, M Li, Y Wang
Proceedings of the 48th International Conference on Parallel Processing, 1-10, 2019
Bolt: Bridging the gap between auto-tuners and hardware-native performance
J Xing, L Wang, S Zhang, J Chen, A Chen, Y Zhu
Proceedings of Machine Learning and Systems 4, 204-216, 2022
UNIT: Unifying tensorized instruction compilation
J Weng, A Jain, J Wang, L Wang, Y Wang, T Nowatzki
2021 IEEE/ACM International Symposium on Code Generation and Optimization …, 2021
ByteTransformer: A high-performance transformer boosted for variable-length inputs
Y Zhai, C Jiang, L Wang, X Jia, S Zhang, Z Chen, X Liu, Y Zhu
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2023
Fast parallel suffix array on the GPU
L Wang, S Baxter, JD Owens
European Conference on Parallel Processing, 573-587, 2015
Fast parallel skew and prefix‐doubling suffix array construction on the GPU
L Wang, S Baxter, JD Owens
Concurrency and Computation: Practice and Experience 28 (12), 3466-3484, 2016
Fast parallel subgraph matching on the GPU
L Wang, Y Wang, JD Owens
HPDC, 2016
Fast Gunrock subgraph matching (GSM) on GPUs
L Wang, JD Owens
arXiv preprint arXiv:2003.01527, 2020
Fast BFS-based triangle counting on GPUs
L Wang, JD Owens
2019 IEEE High Performance Extreme Computing Conference (HPEC), 1-6, 2019
Optimal message scheduling for aggregation
L Wang, M Li, E Liberty, A Smola