Kaiyue Wen
PhD Student, Stanford University
Verified email at stanford.edu
Title · Cited by · Year
On transferability of prompt tuning for natural language processing
Y Su, X Wang, Y Qin, CM Chan, Y Lin, H Wang, K Wen, Z Liu, P Li, J Li, ...
arXiv preprint arXiv:2111.06719, 2021
151 · 2021
How Sharpness-Aware Minimization Minimizes Sharpness?
K Wen, T Ma, Z Li
International Conference on Learning Representations
80*
Finding Skill Neurons in Pre-trained Transformer-based Language Models
X Wang, K Wen, Z Zhang, L Hou, Z Liu, J Li
arXiv preprint arXiv:2211.07349, 2022
74 · 2022
Sharpness minimization algorithms do not only minimize sharpness to achieve better generalization
K Wen, Z Li, T Ma
Advances in Neural Information Processing Systems 36, 2024
25 · 2024
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
K Wen, Y Li, B Liu, A Risteski
Advances in Neural Information Processing Systems 36, 2024
20* · 2024
RNNs are not Transformers (yet): The key bottleneck on in-context retrieval
K Wen, X Dang, K Lyu
arXiv preprint arXiv:2402.18510, 2024
17 · 2024
Benign overfitting in classification: Provably counter label noise with larger models
K Wen, J Teng, J Zhang
arXiv preprint arXiv:2206.00501, 2022
7* · 2022
Residual permutation test for high-dimensional regression coefficient testing
K Wen, T Wang, Y Wang
arXiv preprint arXiv:2211.16182, 2022
6 · 2022
Understanding warmup-stable-decay learning rates: A river valley loss landscape perspective
K Wen, Z Li, J Wang, D Hall, P Liang, T Ma
arXiv preprint arXiv:2410.05192, 2024
3 · 2024
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
K Wen, H Zhang, H Lin, J Zhang
arXiv preprint arXiv:2410.05459, 2024
1 · 2024
Practically Solving LPN in High Noise Regimes Faster Using Neural Networks
H Jiang, K Wen, Y Chen
arXiv preprint arXiv:2303.07987, 2023
· 2023
Articles 1–11