SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents K Cheng, Q Sun, Y Chu, F Xu, Y Li, J Zhang, Z Wu arXiv preprint arXiv:2401.10935, 2024 | 9 | 2024 |
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model K Cheng, W Song, Z Ma, W Zhu, Z Zhu, J Zhang Proceedings of the 31st ACM International Conference on Multimedia, 5038-5047, 2023 | 2 | 2023 |
ADS-Cap: A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora K Cheng, Z Ma, S Zong, J Zhang, X Dai, J Chen CCF International Conference on Natural Language Processing and Chinese …, 2022 | 1 | 2022 |
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond Q Sun, Z Chen, F Xu, K Cheng, C Ma, Z Yin, J Wang, C Han, R Zhu, ... arXiv preprint arXiv:2403.14734, 2024 | | 2024 |
Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description M Pan, J Li, M Yu, Z Ma, K Cheng, J Zhang, J Chen arXiv preprint arXiv:2312.07294, 2023 | | 2023 |
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models Z Ma, M Pan, W Wu, K Cheng, J Zhang, S Huang, J Chen Proceedings of the 31st ACM International Conference on Multimedia, 5674-5685, 2023 | | 2023 |