End-to-end audio visual scene-aware dialog using multimodal attention-based video features
C Hori, H Alamri, J Wang, G Wichern, T Hori, A Cherian, TK Marks, ...
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and …, 2019
Video Representation Learning Using Discriminative Pooling
J Wang, A Cherian, F Porikli, S Gould
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
Ordered Pooling of Optical Flow Sequences for Action Recognition
J Wang, A Cherian, F Porikli
Winter Conference on Applications of Computer Vision (WACV), 2017
Audio visual scene-aware dialog
H Alamri, V Cartillier, A Das, J Wang, A Cherian, I Essa, D Batra, TK Marks, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019
Audio visual scene-aware dialog (AVSD) challenge at DSTC7
H Alamri, V Cartillier, RG Lopes, A Das, J Wang, I Essa, D Batra, D Parikh, ...
arXiv preprint arXiv:1806.00525, 2018
GODS: Generalized One-class Discriminative Subspaces for Anomaly Detection
J Wang, A Cherian
Proceedings of the International Conference on Computer Vision (ICCV), 2019
Learning discriminative video representations using adversarial perturbations
J Wang, A Cherian
Proceedings of the European Conference on Computer Vision (ECCV), 685-701, 2018
Multimodal attention for fusion of audio and spatiotemporal features for video description
C Hori, T Hori, G Wichern, J Wang, TY Lee, A Cherian, TK Marks
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2018
Human action forecasting by learning task grammars
T Han, J Wang, A Cherian, S Gould
arXiv preprint arXiv:1709.06391, 2017
Discriminative Video Representation Learning Using Support Vector Classifiers
J Wang, A Cherian
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A Cherian, J Wang, C Hori, T Marks
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2020
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
X Lin, G Bertasius, J Wang, SF Chang, D Parikh, L Torresani
arXiv preprint arXiv:2101.12059, 2021
