Michael Ryan
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
We present MIPROv2, a language model program optimizer that improves both prompts and few-shot demonstrations for multi-stage language model programs. Our strategies include (i) program- and data-aware techniques for proposing effective instructions, (ii) a stochastic mini-batch evaluation function for learning a surrogate model of our objective, and (iii) a meta-optimization procedure in which we refine how LMs construct proposals over time. MIPROv2 outperforms baseline optimizers on five of seven diverse multi-stage LM programs using a best-in-class open-source model (Llama-3-8B), by up to 13% accuracy.
Krista Opsahl-Ong*
,
Michael J. Ryan*
,
Josh Purtell
,
David Broman
,
Christopher Potts
,
Matei Zaharia
,
Omar Khattab
PDF
Cite
Code
Dataset
Slides
Video
DOI
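The stochastic mini-batch evaluation strategy above can be sketched as a simple bandit-style loop. This is an illustrative stand-in only: the epsilon-greedy selection, running-average surrogate, and all function names here are assumptions for exposition, not the MIPROv2 implementation (which uses a Bayesian surrogate model).

```python
import random

def optimize(candidates, trainset, evaluate, num_trials=30, batch_size=5, epsilon=0.3):
    """Pick the best candidate configuration from noisy mini-batch scores.

    candidates: proposed instruction/demonstration configurations.
    evaluate(candidate, example) -> score in [0, 1].
    """
    scores = {i: [] for i in range(len(candidates))}  # running scores per candidate

    def mean(i):
        return sum(scores[i]) / len(scores[i]) if scores[i] else 0.0

    for t in range(num_trials):
        if t < len(candidates):
            i = t  # warm-up: evaluate every candidate at least once
        elif random.random() < epsilon:
            i = random.randrange(len(candidates))  # explore a random candidate
        else:
            i = max(scores, key=mean)  # exploit the current best estimate
        # Score on a random mini-batch rather than the full training set,
        # trading evaluation accuracy for many more optimization trials.
        batch = random.sample(trainset, min(batch_size, len(trainset)))
        scores[i].append(sum(evaluate(candidates[i], ex) for ex in batch) / len(batch))

    return candidates[max(scores, key=mean)]
```

The mini-batch scores act as cheap, noisy estimates of full-dataset quality, so many more candidate configurations can be explored under the same evaluation budget.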
Unintended Impacts of LLM Alignment on Global Representation
We explore how alignment impacts performance along three axes of global representation: English dialects, multilingualism, and opinions from and about countries worldwide. Our results show that current alignment procedures create disparities between English dialects and global opinions. We find alignment improves capabilities in several languages. We conclude by discussing design decisions that led to these unintended impacts and recommendations for more equitable preference tuning.
Michael J. Ryan
,
William Held
,
Diyi Yang
PDF
Cite
Code
Dataset
Poster
Slides
Video
DOI
Revisiting non-English Text Simplification: A Unified Multilingual Benchmark
We release the MultiSim benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs. This benchmark will encourage research in developing more effective multilingual text simplification models and evaluation metrics. Our experiments using MultiSim with pre-trained multilingual language models reveal clear performance improvements from multilingual training in non-English settings.
Michael J. Ryan
,
Tarek Naous
,
Wei Xu
PDF
Cite
Code
Dataset
Poster
Slides
Video
DOI
Towards Massively Multi-domain Multilingual Readability Assessment
We present ReadMe++, a massively multi-domain multilingual dataset for automatic readability assessment. Prior work on readability assessment has been mostly restricted to the English language and one or two text domains. Additionally, the readability levels of sentences used in many previous datasets are assigned at the document level rather than the sentence level, which raises doubt about the quality of previous evaluations. We address these gaps in the literature by providing an annotated dataset of 9,757 sentences in Arabic, English, Hindi, French, and Russian collected from 112 different sources.
Tarek Naous
,
Michael J. Ryan
,
Anton Lavrouk
,
Mohit Chandra
,
Wei Xu
PDF
Cite
Dataset
Cloud Computed Machine Learning Based Real-Time Litter Detection using Micro-UAV Surveillance
Litter can remain undetected and uncollected for extended periods of time, leading to detrimental consequences for the environment. The use of drones to detect this litter marks an important step towards solving this problem. We test five different computer vision algorithms for litter detection using drone surveillance and show that a bagging ensemble of these methods achieves the highest performance.
Ashley Chung
,
Sean Kim
,
Ethan Kwok
,
Michael J. Ryan
,
Erika Tan
,
Ryan Gamadia
PDF
Cite
Code
Video
DOI
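The aggregation step of an ensemble like the one above can be sketched as a majority vote over per-frame detector outputs. This is a minimal illustration under assumed interfaces: the function names are hypothetical, and the paper's five detectors and exact bagging procedure are not reproduced here.

```python
from collections import Counter

def ensemble_detect(models, frame):
    """Majority vote across litter detectors.

    models: callables mapping a video frame to a binary label
            (1 = litter detected, 0 = no litter).
    """
    votes = [model(frame) for model in models]
    # The most common label among the detectors wins.
    return Counter(votes).most_common(1)[0][0]
```

Aggregating several weak detectors this way tends to reduce the variance of any single model's predictions, which is the usual motivation for bagging.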