Publications

(2025). GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning. arXiv Preprint.

(2025). SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs. ACL 2025.

(2025). Mind the Gap: Static and Interactive Evaluations of Large Audio Models. ACL 2025.

(2025). AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation. arXiv Preprint.

(2025). EnronQA: Towards Personalized RAG over Private Documents. arXiv Preprint.

(2025). LangProBe: a Language Programs Benchmark. arXiv Preprint.

(2024). Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs. EMNLP 2024.

(2024). Distilling an End-to-End Voice Assistant Without Instruction Training Data. ACL 2025.

(2024). Unintended Impacts of LLM Alignment on Global Representation. ACL 2024.

(2023). Towards Massively Multi-domain Multilingual Readability Assessment. EMNLP 2024.

(2023). Having Beer after Prayer? Measuring Cultural Bias in Large Language Models. ACL 2024.

(2018). Cloud Computed Machine Learning Based Real-Time Litter Detection using Micro-UAV Surveillance. IEEE MIT URTC.
