Updates
Nov 2024: | We won best paper for our work on image transcreation at EMNLP 2024! Very honored and humbled :') |
Oct 2024: | We recently released Pangea-7B, an open-sourced multi-(lingual, modal, cultural) model! |
Sept 2024: | Our paper on image transcreation was accepted at EMNLP (Main) '24 (tweet thread)! |
Sept 2024: | Invited talk on image transcreation at Pinterest! |
June 2024: | Invited keynote at the AmericasNLP workshop at NAACL '24! (slides, recording) |
May 2024: | Mentor for OpenNLP Labs where we are building technology to support cultural translation of stories for EduLang! |
Apr 2024: | Grateful to be supported by the Waibel Presidential Fellowship for 2024-25! |
Mar 2024: | Gave a talk on image transcreation at University of Edinburgh! |
Mar 2024: | Our work on data-efficient multilingual learning to appear in NAACL 2024! Reach out if you'd like to catch up in Mexico City :) |
Jan 2024: | Gave a talk on image transcreation (preprint, slides) at Google Research, Microsoft Research, IISc and Microsoft IDC! |
Dec 2023: | Gave a lecture at CMU 11737 (Multilingual NLP) on Image-Text Modeling for Multilingual NLP! (link to slides, tweet thread) |
Aug 2023: | Organized (and won a best paper at!) the Student Research Symposium at CMU LTI |
May 2023: | Our work on multi-cultural figurative language to appear in ACL 2023 Findings! We will be presenting at the LAW workshop :) |
May 2023: | I will be attending EACL 2023 in-person to present our work at SIGTYP and C3NLP! |
Jan 2023: | Honored to receive the best paper award for FLEURS at SLT 2022! |
Aug 2022: | I started my PhD at CMU, LTI! |
Aug 2022: | I presented our research and its application to Google Assistant at the Decode with Google 2022 event! Thanks to my amazing team at Google Research India for the opportunity! |
Oct 2021: | I'll be attending ALPS 2022! Feel free to get in touch if you'll be attending too. |
Aug 2021: | Conducting a hands-on TensorFlow Tutorial session at the 5th CVIT IIIT Summer School! |
Aug 2021: | Hosting the NLP networking session at IKDD 2021 where Dr. Monojit Choudhury is our guest speaker! |
May 2021: | Our work on merging multiple pre-trained LMs to appear in ACL 2021 Findings. |
Mar 2021: | Technical write-up on MuRIL is now available on arXiv. |
Mar 2021: | The pre-trained MuRIL model (with MLM) is now available on HuggingFace. |
Nov 2020: | Open-sourced a multilingual model for Indian languages named MuRIL on TFHub! |
Sep 2020: | Hosted a Fireside Chat with Jeff Dean on his virtual Google India visit! |
Aug 2020: | I am joining the Google Research India lab as a Pre-Doctoral Researcher where I am working with Dr. Partha Talukdar! |
Aug 2020: | Graduated from BITS Pilani Goa with a dual degree in Computer Science and Economics. |
July 2020: | The GLUECoS code and leaderboard website are now open-sourced! |
Apr 2020: | Paper on building a benchmark for code-switched language processing to appear at ACL 2020! (Talk) |
Mar 2020: | We created a new dataset for code-mixed conversational NLI! Paper to appear in CALCS, LREC 2020. |
Jul 2019: | I am doing my bachelor thesis at the Microsoft Research India lab, where I am working with Dr. Sunayana Sitaram! |
Jun 2019: | Work done on generating code-mixed text in summer 2018 to appear at TLT SyntaxFest 2019 |
Apr 2018: | Summer internship at the MT-NLP lab, IIIT Hyderabad where I will be working with Dr. Dipti Misra Sharma |
Publications
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Xiang Yue*, Yueqi Song*, Akari Asai, Seungone Kim, Jean de Dieu Nyandwi, , Anjali Kantharuban, Lintang Sutawika, Sathyanarayanan Ramamoorthy, Graham Neubig
Under Review
web
pdf
cite
🏆 Best Paper
An image speaks a thousand words, but can everyone listen? On translating images for cultural relevance
, Sathyanarayanan Ramamoorthy, Yueqi Song, Graham Neubig
EMNLP '24 |
Conference on Empirical Methods in Natural Language Processing
web
pdf
code
cite
slides
video
DeMuX: Data-efficient Multilingual Learning
, Srinivas Gowriraj, Lucio Dery, Graham Neubig
NAACL '24 |
Conference of the North American Chapter of the Association for Computational Linguistics
pdf
code
slides
poster
video
cite
GlobalBench: A Benchmark for Global Progress in Natural Language Processing
Yueqi Song, Catherine Cui, , Pengfei Liu, ..., Graham Neubig
EMNLP '23 |
Conference on Empirical Methods in Natural Language Processing
pdf
cite
Multi-lingual and Multi-cultural Figurative Language Understanding
Anubha Kabra*, Emmy Liu*, , Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, Graham Neubig
ACL '23 Findings |
Annual Meeting of the Association for Computational Linguistics
pdf
code
cite
🏆 Best Paper
FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau*, Min Ma*, , Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna
SLT '22 |
IEEE Spoken Language Technology Workshop
pdf
cite
MergeDistill: Merging Pre-trained Language Models using Distillation
, Melvin Johnson, Partha Talukdar
Findings of ACL'21 |
Annual Meeting of the Association for Computational Linguistics
pdf
abstract
slides
cite
🗞️ Media Coverage
MuRIL: Multilingual Representations for Indian Languages
, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, Shruti Gupta, Subhash Chandra Bose Gali, Vish Subramanian, Partha Talukdar
tfhub
huggingface
pdf
abstract
cite
Coverage: Economic Times, Indian Express, Google AI Blog
GLUECoS: An Evaluation Benchmark for Code-Switched NLP
, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury
ACL'20 |
Annual Meeting of the Association for Computational Linguistics
pdf
abstract
code
website
slides
video
cite
A New Dataset for Natural Language Inference from Code-mixed Conversations
, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
CALCS, LREC'20 |
International Conference on Language Resources and Evaluation
pdf
abstract
data
cite
Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities
Pratik Joshi, Christain Barnes, Sebastin Santy, , Sanket Shah, Anirudh Srinivasan, Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury, Kalika Bali
ICON'19 |
International Conference on Natural Language Processing
pdf
abstract
cite
Dependency Parser for Bengali-English Code-Mixed Data enhanced with a Synthetic Treebank
Urmi Ghosh, , Dipti Misra Sharma
TLT, SyntaxFest 2019
pdf
abstract
code
cite
Talks and Interviews
Decode With Google 2022
Speaker List
Talk (2:28:00 onwards; registration required)
An Introduction to (Modern) TensorFlow
, Ameya Daigavane
CVIT Summer School, IIIT Hyderabad
slides
Journey into Research
Rotaract Club, BITS Hyderabad
interview
🏆 National Rank 1
ICSE National Topper
St. Mary's School, Pune
Media Coverage: India Times, Times of India, Indian Express