Luowei Zhou
Research Scientist, Google DeepMind
Verified email at google.com
Title
Cited by
Year
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ...
arXiv preprint arXiv:2312.11805, 2023
Cited by: 1907 | Year: 2023
Unified vision-language pre-training for image captioning and VQA
L Zhou, H Palangi, L Zhang, H Hu, J Corso, J Gao
Proceedings of the AAAI conference on artificial intelligence 34 (07), 13041 …, 2020
Cited by: 977 | Year: 2020
Florence: A new foundation model for computer vision
L Yuan, D Chen, YL Chen, N Codella, X Dai, J Gao, H Hu, X Huang, B Li, ...
arXiv preprint arXiv:2111.11432, 2021
Cited by: 866 | Year: 2021
Towards automatic learning of procedures from web instructional videos
L Zhou, C Xu, J Corso
Proceedings of the AAAI Conference on Artificial Intelligence 32 (1), 2018
Cited by: 814 | Year: 2018
Less is more: ClipBERT for video-and-language learning via sparse sampling
J Lei, L Li, L Zhou, Z Gan, TL Berg, M Bansal, J Liu
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
Cited by: 698 | Year: 2021
End-to-end dense video captioning with masked transformer
L Zhou, Y Zhou, JJ Corso, R Socher, C Xiong
Proceedings of the IEEE conference on computer vision and pattern …, 2018
Cited by: 662 | Year: 2018
RegionCLIP: Region-based language-image pretraining
Y Zhong, J Yang, P Zhang, C Li, N Codella, LH Li, L Zhou, X Dai, L Yuan, ...
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022
Cited by: 516 | Year: 2022
BEVT: BERT pretraining of video transformers
R Wang, D Chen, Z Wu, Y Chen, X Dai, M Liu, YG Jiang, L Zhou, L Yuan
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022
Cited by: 235 | Year: 2022
Grounded video description
L Zhou, Y Kalantidis, X Chen, JJ Corso, M Rohrbach
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019
Cited by: 222 | Year: 2019
OmniVL: One foundation model for image-language and video-language tasks
J Wang, D Chen, Z Wu, C Luo, L Zhou, Y Zhao, Y Xie, C Liu, YG Jiang, ...
Advances in neural information processing systems 35, 5696-5710, 2022
Cited by: 136 | Year: 2022
CLIP-Event: Connecting text and images with event structures
M Li, R Xu, S Wang, L Zhou, X Lin, C Zhu, M Zeng, H Ji, SF Chang
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022
Cited by: 129 | Year: 2022
Language models with image descriptors are strong few-shot video-language learners
Z Wang, M Li, R Xu, L Zhou, J Lei, X Lin, S Wang, Z Yang, C Zhu, ...
Advances in Neural Information Processing Systems 35, 8483-8497, 2022
Cited by: 111 | Year: 2022
VALUE: A multi-task benchmark for video-and-language understanding evaluation
L Li, J Lei, Z Gan, L Yu, YC Chen, R Pillai, Y Cheng, L Zhou, XE Wang, ...
arXiv preprint arXiv:2106.04632, 2021
Cited by: 110 | Year: 2021
Dense video captioning
Y Zhou, L Zhou, C Xiong, R Socher
US Patent 10,542,270, 2020
Cited by: 101 | Year: 2020
Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction
L Zhou, N Louis, JJ Corso
British Machine Vision Conference, 2018
Cited by: 96 | Year: 2018
Watch what you just said: Image captioning with text-conditional attention
L Zhou, C Xu, P Koch, JJ Corso
Proceedings of the Thematic Workshops of ACM Multimedia 2017, 305-313, 2017
Cited by: 96 | Year: 2017
UC2: Universal cross-lingual cross-modal vision-and-language pre-training
M Zhou, L Zhou, S Wang, Y Cheng, L Li, Z Yu, J Liu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Cited by: 85 | Year: 2021
MIST: Multi-modal iterative spatial-temporal transformer for long-form video question answering
D Gao, L Zhou, L Ji, L Zhu, Y Yang, MZ Shou
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023
Cited by: 79 | Year: 2023
Multiagent reinforcement learning with sparse interactions by negotiation and knowledge transfer
L Zhou, P Yang, C Chen, Y Gao
IEEE transactions on cybernetics 47 (5), 1238-1250, 2016
Cited by: 65 | Year: 2016
AssistGPT: A general multi-modal assistant that can plan, execute, inspect, and learn
D Gao, L Ji, L Zhou, KQ Lin, J Chen, Z Fan, MZ Shou
arXiv preprint arXiv:2306.08640, 2023
Cited by: 59 | Year: 2023