On the variance of the adaptive learning rate and beyond
L Liu, H Jiang, P He, W Chen, X Liu, J Gao, J Han
arXiv preprint arXiv:1908.03265, 2019
Cited by: 1442; Year: 2019

Multi-task deep neural networks for natural language understanding
X Liu, P He, W Chen, J Gao
arXiv preprint arXiv:1901.11504, 2019
Cited by: 1045; Year: 2019

DeBERTa: Decoding-enhanced BERT with disentangled attention
P He, X Liu, J Gao, W Chen
arXiv preprint arXiv:2006.03654, 2020
Cited by: 807; Year: 2020

ReasoNet: Learning to stop reading in machine comprehension
Y Shen, PS Huang, J Gao, W Chen
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge …, 2017
Cited by: 311; Year: 2017

Short text conceptualization using a probabilistic knowledgebase
Y Song, H Wang, Z Wang, H Li, W Chen
Proceedings of the twenty-second international joint conference on …, 2011
Cited by: 271; Year: 2011

SMART: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization
H Jiang, P He, W Chen, X Liu, J Gao, T Zhao
arXiv preprint arXiv:1911.03437, 2019
Cited by: 264; Year: 2019

LoRA: Low-rank adaptation of large language models
EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang, L Wang, W Chen
arXiv preprint arXiv:2106.09685, 2021
Cited by: 209; Year: 2021

FusionNet: Fusing via fully-aware attention with application to machine comprehension
HY Huang, C Zhu, Y Shen, W Chen
arXiv preprint arXiv:1711.07341, 2017
Cited by: 193; Year: 2017

What Makes Good In-Context Examples for GPT-3?
J Liu, D Shen, Y Zhang, B Dolan, L Carin, W Chen
arXiv preprint arXiv:2101.06804, 2021
Cited by: 179; Year: 2021

Document transformation for multi-label feature selection in text categorization
W Chen, J Yan, B Zhang, Z Chen, Q Yang
Seventh IEEE International Conference on Data Mining (ICDM 2007), 451-456, 2007
Cited by: 158; Year: 2007

Improving multi-task deep neural networks via knowledge distillation for natural language understanding
X Liu, P He, W Chen, J Gao
arXiv preprint arXiv:1904.09482, 2019
Cited by: 138; Year: 2019

Understanding the difficulty of training transformers
L Liu, X Liu, J Gao, W Chen, J Han
arXiv preprint arXiv:2004.08249, 2020
Cited by: 137; Year: 2020

User-click modeling for understanding and predicting search-behavior
Y Zhang, W Chen, D Wang, Q Yang
Proceedings of the 17th ACM SIGKDD international conference on Knowledge …, 2011
Cited by: 123; Year: 2011

A novel click model and its applications to online advertising
ZA Zhu, W Chen, T Minka, C Zhu, Z Chen
Proceedings of the third ACM international conference on Web search and data …, 2010
Cited by: 121; Year: 2010

DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing
P He, J Gao, W Chen
arXiv preprint arXiv:2111.09543, 2021
Cited by: 118; Year: 2021

Adversarial training for large neural language models
X Liu, H Cheng, P He, W Chen, Y Wang, H Poon, J Gao
arXiv preprint arXiv:2004.08994, 2020
Cited by: 96; Year: 2020

P-packSVM: Parallel primal gradient descent kernel SVM
AZ Zeyuan, C Weizhu, W Gang, Z Chenguang, C Zheng
2009 Ninth IEEE International Conference on Data Mining, 677-686, 2009
Cited by: 96; Year: 2009

Personalized click model through collaborative filtering
S Shen, B Hu, W Chen, Q Yang
Proceedings of the fifth ACM international conference on Web search and data …, 2012
Cited by: 86; Year: 2012

Characterizing search intent diversity into click models
B Hu, Y Zhang, W Chen, G Wang, Q Yang
Proceedings of the 20th international conference on World wide web, 17-26, 2011
Cited by: 83; Year: 2011

Generation-augmented retrieval for open-domain question answering
Y Mao, P He, X Liu, Y Shen, J Gao, J Han, W Chen
arXiv preprint arXiv:2009.08553, 2020
Cited by: 81; Year: 2020