Updates
| Oct 2025: | Presenting my research at the MIT EECS Rising Star Workshop! | 
| Oct 2025: | Invited keynote at the CEGIS workshop at ICCV '25! (slides) | 
| Oct 2025: | Invited talk as a Rising Star in AI at the AI for Science Symposium at University of Michigan! (slides) | 
| June 2025: | We are one of the five teams that won the Imminent Translated Grant for our Human-AI Image Localization Platform (platform and pre-print coming soon)! | 
| June 2025: | Invited Keynote at the Demographic Diversity in CV workshop at CVPR '25, alongside a fantastic set of speakers (slides, video)! | 
| Mar 2025: | Interning at Google DeepMind this summer with Lun Wang! | 
| Mar 2025: | Invited talk at UT Austin NLL Reading Group! | 
| Jan 2025: | Our paper on designing automatic evaluation metrics for image transcreation was accepted at NAACL 2025 (code, tweet)! | 
| Jan 2025: | Pangea was accepted at ICLR 2025! | 
| Dec 2024: | We won best paper runner-up for our work on building an image-editing platform for localization at IEEE Big Data 2024! Pre-print out soon :) | 
| Nov 2024: | We won best paper for our work on image transcreation at EMNLP 2024! Very honored and humbled :') | 
| Oct 2024: | We recently released Pangea-7B, an open-sourced multi-(lingual, modal, cultural) model! | 
| Sept 2024: | Our paper on image transcreation was accepted at EMNLP (Main) '24 (tweet thread)! | 
| Sept 2024: | Invited talk on image transcreation at Pinterest! | 
| June 2024: | Invited keynote at the AmericasNLP workshop at NAACL '24! (slides, recording) | 
| May 2024: | Mentor for OpenNLP Labs where we are building technology to support cultural translation of stories for EduLang! | 
| Apr 2024: | Grateful to be supported by the Waibel Presidential Fellowship for 2024-25! | 
| Mar 2024: | Gave a talk on image transcreation at University of Edinburgh! | 
| Mar 2024: | Our work on data-efficient multilingual learning to appear in NAACL 2024! Reach out if you'd like to catch up in Mexico City :) | 
| Jan 2024: | Gave a talk on image transcreation (preprint, slides) at Google Research, Microsoft Research, IISc and Microsoft IDC! | 
| Dec 2023: | Gave a lecture at CMU 11737 (Multilingual NLP) on Image-Text Modeling for Multilingual NLP! (link to slides, tweet thread) | 
| Aug 2023: | Organized (and won a best paper at!) the Student Research Symposium at CMU LTI | 
| May 2023: | Our work on multi-cultural figurative language to appear in ACL 2023 Findings! We will be presenting at the LAW workshop :) | 
| May 2023: | I will be attending EACL 2023 in person to present our work at SIGTYP and C3NLP! | 
| Jan 2023: | Honored to receive the best paper award for FLEURS at SLT 2022! | 
| Aug 2022: | I started my PhD at CMU, LTI! | 
| Aug 2022: | I presented our research and its application to Google Assistant at the Decode with Google 2022 event! Thanks to my amazing team at Google Research India for the opportunity! | 
| Oct 2021: | I'll be attending ALPS 2022! Feel free to get in touch if you'll be attending too. | 
| Aug 2021: | Conducting a hands-on TensorFlow Tutorial session at the 5th CVIT IIIT Summer School! | 
| Aug 2021: | Hosting the NLP networking session at IKDD 2021 where Dr. Monojit Choudhury is our guest speaker! | 
| May 2021: | Our work on merging multiple pre-trained LMs to appear in ACL 2021 Findings. | 
| Mar 2021: | Technical write-up on MuRIL is now available on arXiv. | 
| Mar 2021: | The pre-trained MuRIL model (with MLM) is now available on HuggingFace. | 
| Nov 2020: | Open-sourced a multilingual model for Indian languages named MuRIL on TFHub! | 
| Sep 2020: | Hosted a Fireside Chat with Jeff Dean on his virtual Google India visit! | 
| Aug 2020: | I am joining the Google Research India lab as a Pre-Doctoral Researcher where I am working with Dr. Partha Talukdar! | 
| Aug 2020: | Graduated from BITS Pilani Goa with a dual degree in Computer Science and Economics. | 
| July 2020: | The GLUECoS code and leaderboard website are now open-sourced! | 
| Apr 2020: | Paper on building a benchmark for code-switched language processing to appear at ACL 2020! (Talk) | 
| Mar 2020: | We created a new dataset for code-mixed conversational NLI! Paper to appear in CALCS, LREC 2020. | 
| Jul 2019: | I am doing my bachelor thesis at the Microsoft Research India lab, where I am working with Dr. Sunayana Sitaram! | 
| Jun 2019: | Work done on generating code-mixed text in summer 2018 to appear at TLT SyntaxFest 2019 | 
| Apr 2018: | Summer internship at the MT-NLP lab, IIIT Hyderabad where I will be working with Dr. Dipti Misra Sharma | 
Publications
                        Steering LLMs for Culturally Localized Generation
                        , Hongbin Liu, Shujian Zhang, John Lambert, Mingqing Chen, Rajiv Mathews, Lun Wang
                        Preprint (coming soon) | Under Conference Submission
                    
                        HILITe: Human-AI Collaborative Framework for Image Transcreation
                        , Yutong Zhang, Aayush Bheemaiah, Jainish Patel, Arya Pasumarthi, Armaan Sharma, Sophia Li, Yueqi Song, Michael Saxon, Diyi Yang, Graham Neubig
                         HCI+NLP@EMNLP '25 | Under Conference Submission
                    
                        CAIRE: Cultural Attribution of Images by Retrieval-Augmented Evaluation
                        Arnav Yayavaram*, Siddharth Yayavaram*, *, Michael Saxon, Graham Neubig
                         CEGIS@ICCV '25 | Under Conference Submission
                        pdf
                        code
                    
                        Towards Automatic Evaluation for Image Transcreation
                        
                        *, Vivek Iyer*, Claire He, Graham Neubig
                         NAACL 2025 | 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics  
                        pdf
                        code
                        cite
                    
                        Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
                        
                        Xiang Yue*, Yueqi Song*, Akari Asai, Seungone Kim, Jean de Dieu Nyandwi, , 
                        Anjali Kantharuban, Lintang Sutawika, Sathyanarayanan Ramamoorthy, Graham Neubig
                         ICLR 2025 | International Conference on Learning Representations  
                        web
                        pdf
                        cite
                    
                         🏆 Best Paper Runner-Up 
HILITE: Human-in-the-loop Interactive Tool for Image Editing
                        Arya Pasumarthi, Armaan Sharma, Jainish H. Patel, ..., Diyi Yang, Graham Neubig, 
                        IEEE BigData 2024 |
                            2024 IEEE International Conference on Big Data (Undergraduate Symposium)
                    
                         🏆 Best Paper 
An image speaks a thousand words, but can everyone listen?
 On translating images for cultural relevance
                        
                       , Sathyanarayanan Ramamoorthy, Yueqi Song, Graham Neubig
                       EMNLP '24 |
                        Conference on Empirical Methods in Natural Language Processing
                        web
                        pdf
                        code
                        cite
                        slides
                        video
                        NAACL Talk
                    
                        DeMuX: Data-efficient Multilingual Learning
                        
                       , Srinivas Gowriraj, Lucio Dery, Graham Neubig
                        NAACL '24 |
                            Conference of the North American Chapter of the Association for Computational Linguistics
                        pdf
                        code
                        slides
                        poster
                        video
                        cite
                    
                        GlobalBench: A Benchmark for Global Progress in Natural Language Processing
                        
                        Yueqi Song, Catherine Cui, , Pengfei Liu, ..., Graham Neubig
                        EMNLP '23 |
                            Conference on Empirical Methods in Natural Language Processing
                        pdf
                        cite
                    
                        Multi-lingual and Multi-cultural Figurative Language Understanding
                        
                        Anubha Kabra*, Emmy Liu*, , Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, Graham Neubig
                        ACL '23 Findings |
                            Annual Meeting of the Association for Computational Linguistics
                        pdf
                        code
                        cite
                    
                         🏆 Best Paper 
FLEURS: Few-Shot Learning Evaluation of
                            Universal Representations of Speech
                        
                        Alexis Conneau*, Min Ma*, , Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna 
                        SLT '22 |
                            IEEE Spoken Language Technology Workshop
                        pdf
                        cite
                    
                        MergeDistill: Merging Pre-trained Language Models using Distillation
                        
                        , Melvin Johnson, Partha Talukdar
                        Findings of ACL'21 |
                            Annual Meeting of the Association for Computational Linguistics
                        pdf
                        abstract
                        slides
                        cite
                    
                         🗞️ Media Coverage
MuRIL: Multilingual Representations for Indian Languages
                        , Diksha Bansal, Sarvesh Mehtani, Savya Khosla,
                        Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi
                        Dave, Shruti Gupta, Subhash Chandra Bose Gali, Vish Subramanian, Partha Talukdar
                        tfhub
                        huggingface
                        pdf
                        abstract
                        cite
                        Coverage: Economic Times
                    Indian Express
                    Google AI Blog 
                    
                        GLUECoS: An Evaluation Benchmark for Code-Switched NLP
                        , Sandipan Dandapat, Anirudh Srinivasan, Sunayana
                        Sitaram, Monojit Choudhury 
                        ACL'20 | Annual Meeting of the Association for Computational Linguistics
                        pdf
                        abstract
                        code
                        website
                        slides
                        video
                        cite
                    
                        A New Dataset for Natural Language Inference from Code-mixed
                            Conversations
                        , Sandipan Dandapat, Sunayana Sitaram, Monojit
                        Choudhury 
                        CALCS, LREC'20 | International
                            Conference on Language Resources and Evaluation
                        pdf
                        abstract
                        data
                        cite
                    
                        Unsung Challenges of Building and Deploying Language Technologies for
                            Low Resource Language Communities
                        Pratik Joshi, Christain Barnes, Sebastin Santy, ,
                        Sanket Shah, Anirudh Srinivasan, Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury,
                        Kalika Bali 
                        ICON'19 | International Conference on Natural Language Processing
                        pdf
                        abstract
                        cite
                    
                        Dependency Parser for Bengali-English Code-Mixed Data enhanced with a
                            Synthetic Treebank
                        Urmi Ghosh, , Dipti Misra Sharma 
                        TLT, SyntaxFest 2019
                        pdf
                        abstract
                        code
                        cite
                    
Talks and Interviews
                        Decode With Google 2022
                        Speaker List
                        Talk (2:28:00 onwards; registration required)
                    
                        An Introduction to (Modern) TensorFlow
                        , Ameya Daigavane 
                        CVIT Summer School, IIIT
                            Hyderabad
                        slides
                    
                        Journey into Research
                        Rotaract Club, BITS Hyderabad
                        interview
                    
                         🏆 National Rank 1 
ICSE National Topper
                        St. Mary's School, Pune
                        Media Coverage: India Times
                        Times of India
                        Indian Express