Dehao Chen
Verified email at google.com
Title
Cited by
Year
GPipe: Efficient training of giant neural networks using pipeline parallelism
Y Huang, Y Cheng, A Bapna, O Firat, D Chen, M Chen, HJ Lee, J Ngiam, ...
Advances in Neural Information Processing Systems 32, 2019
Cited by 1409 · 2019
LaMDA: Language models for dialog applications
R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ...
arXiv preprint arXiv:2201.08239, 2022
Cited by 1036 · 2022
GShard: Scaling giant models with conditional computation and automatic sharding
D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ...
arXiv preprint arXiv:2006.16668, 2020
Cited by 671 · 2020
MLPerf training benchmark
P Mattson, C Cheng, G Diamos, C Coleman, P Micikevicius, D Patterson, ...
Proceedings of Machine Learning and Systems 2, 336-349, 2020
Cited by 299 · 2020
MapCG: Writing parallel program portable between CPU and GPU
C Hong, D Chen, W Chen, W Zheng, H Lin
Proceedings of the 19th international conference on Parallel architectures …, 2010
Cited by 225 · 2010
Lingvo: a modular and scalable framework for sequence-to-sequence modeling
J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ...
arXiv preprint arXiv:1902.08295, 2019
Cited by 197 · 2019
Image classification at supercomputer scale
C Ying, S Kumar, D Chen, T Wang, Y Cheng
arXiv preprint arXiv:1811.06992, 2018
Cited by 149 · 2018
AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications
D Chen, DX Li, T Moseley
Proceedings of the 2016 International Symposium on Code Generation and …, 2016
Cited by 111 · 2016
Renelito Delos Santos
R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ...
Cited by 98 · 2022
GSPMD: general and scalable parallelization for ML computation graphs
Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang, R Joshi, M Krikun, ...
arXiv preprint arXiv:2105.04663, 2021
Cited by 85 · 2021
Taming hardware event samples for FDO compilation
D Chen, N Vachharajani, R Hundt, S Liao, V Ramasamy, P Yuan, W Chen, ...
Proceedings of the 8th annual IEEE/ACM international symposium on Code …, 2010
Cited by 85 · 2010
Tree partition based parallel frequent pattern mining on shared memory systems
D Chen, C Lai, W Hu, WG Chen, Y Zhang, W Zheng
Proceedings 20th IEEE International Parallel & Distributed Processing …, 2006
Cited by 52 · 2006
Taming hardware event samples for precise and versatile feedback directed optimizations
D Chen, N Vachharajani, R Hundt, X Li, S Eranian, W Chen, W Zheng
IEEE Transactions on Computers 62 (2), 376-389, 2011
Cited by 48 · 2011
Scale MLPerf-0.6 models on Google TPU-v3 pods
S Kumar, V Bitorff, D Chen, C Chou, B Hechtman, HJ Lee, N Kumar, ...
arXiv preprint arXiv:1909.09756, 2019
Cited by 35 · 2019
Automatic cross-replica sharding of weight update in data-parallel training
Y Xu, HJ Lee, D Chen, H Choi, B Hechtman, S Wang
arXiv preprint arXiv:2004.13336, 2020
Cited by 26 · 2020
Feedback-directed optimizations in GCC with estimated edge profiles from hardware event sampling
V Ramasamy, P Yuan, D Chen, R Hundt
Proceedings of GCC Summit, 87-102, 2008
Cited by 22 · 2008
Overlap communication with dependent computation via decomposition in large deep learning models
S Wang, J Wei, A Sabne, A Davis, B Ilbeyi, B Hechtman, D Chen, ...
Proceedings of the 28th ACM International Conference on Architectural …, 2022
Cited by 21 · 2022
Providing source code level portability between CPU and GPU with MapCG
CT Hong, DH Chen, YB Chen, WG Chen, WM Zheng, HB Lin
Journal of Computer Science and Technology 27 (1), 42-56, 2012
Cited by 21 · 2012
Compile-time feedback-directed optimizations using estimated edge profiles from hardware-event sampling
R Hundt, V Ramasamy, D Chen
US Patent 8,387,026, 2013
Cited by 20 · 2013
Exploring the limits of Concurrency in ML Training on Google TPUs
S Kumar, Y Wang, C Young, J Bradbury, N Kumar, D Chen, A Swing
Proceedings of Machine Learning and Systems 3, 81-92, 2021
Cited by 16 · 2021