Fedor Moiseev
Verified email at google.com
Title
Cited by
Year
Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned
E Voita, D Talbot, F Moiseev, R Sennrich, I Titov
arXiv preprint arXiv:1905.09418, 2019
Cited by: 1025; Year: 2019
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ...
arXiv preprint arXiv:2312.11805, 2023
Cited by: 419; Year: 2023
SKILL: Structured knowledge infusion for large language models
F Moiseev, Z Dong, E Alfonseca, M Jaggi
arXiv preprint arXiv:2205.08184, 2022
Cited by: 57; Year: 2022
Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv 2019
E Voita, D Talbot, F Moiseev, R Sennrich, I Titov
arXiv preprint arXiv:1905.09418, 0
Cited by: 21
Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv
E Voita, D Talbot, F Moiseev, R Sennrich, I Titov
arXiv preprint arXiv:1905.09418, 2019
Cited by: 13; Year: 2019
SamToNe: Improving Contrastive Loss for Dual Encoder Retrieval Models with Same Tower Negatives
F Moiseev, GH Abrego, P Dornbach, I Zitouni, E Alfonseca, Z Dong
arXiv preprint arXiv:2306.02516, 2023
Cited by: 1; Year: 2023