Daily Papers

by AK and the research community

Oct 31

Mind the Gap: A Practical Attack on GGUF Quantization

With the increasing size of frontier LLMs, post-training quantization has become the standard for memory-efficient deployment. Recent work has shown that basic rounding-based quantization schemes pose security risks, as they can be exploited to inject malicious behaviors into quantized models that remain hidden in full precision. However, existing attacks cannot be applied to more complex quantization methods, such as the GGUF family used in the popular ollama and llama.cpp frameworks. In this work, we address this gap by introducing the first attack on GGUF. Our key insight is that the quantization error (the difference between the full-precision weights and their (de-)quantized version) provides sufficient flexibility to construct malicious quantized models that appear benign in full precision. Leveraging this, we develop an attack that trains the target malicious LLM while constraining its weights based on quantization errors. We demonstrate the effectiveness of our attack on three popular LLMs across nine GGUF quantization data types on three diverse attack scenarios: insecure code generation ($\Delta=88.7\%$), targeted content injection ($\Delta=85.0\%$), and benign instruction refusal ($\Delta=30.1\%$). Our attack highlights that (1) the most widely used post-training quantization method is susceptible to adversarial interferences, and (2) the complexity of quantization schemes alone is insufficient as a defense.

  • 5 authors
·
May 24

Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models

In computational social science (CSS), researchers analyze documents to explain social and political phenomena. In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. One increasingly common way to annotate documents cheaply at scale is through large language models (LLMs). However, like other scalable ways of producing annotations, such surrogate labels are often imperfect and biased. We present a new algorithm for using imperfect annotation surrogates for downstream statistical analyses while guaranteeing statistical properties -- like asymptotic unbiasedness and proper uncertainty quantification -- which are fundamental to CSS research. We show that direct use of surrogate labels in downstream statistical analyses leads to substantial bias and invalid confidence intervals, even with high surrogate accuracy of 80-90%. To address this, we build on debiased machine learning to propose the design-based supervised learning (DSL) estimator. DSL employs a doubly-robust procedure to combine surrogate labels with a smaller number of high-quality, gold-standard labels. Our approach guarantees valid inference for downstream statistical analyses, even when surrogates are arbitrarily biased and without requiring stringent assumptions, by controlling the probability of sampling documents for gold-standard labeling. Both our theoretical analysis and experimental results show that DSL provides valid statistical inference while achieving root mean squared errors comparable to existing alternatives that focus only on prediction without inferential guarantees.
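
The doubly-robust combination of surrogate and gold-standard labels described above can be illustrated on a toy problem. The following is a minimal sketch, assuming the simplest possible setting: estimating a population proportion, with a known, constant probability `pi` of sampling each document for gold-standard labeling. It uses the generic design-based pseudo-outcome `surrogate + (R / pi) * (gold - surrogate)`, which may differ from the paper's exact estimator. The function name `dsl_style_mean` and all simulated numbers are hypothetical.

```python
import random


def dsl_style_mean(surrogate, gold, sampled, pi):
    """Average of pseudo-outcomes: surrogate + (R / pi) * (gold - surrogate).

    `gold` is only read for documents with sampled == True, mirroring the
    fact that gold-standard labels exist only for the sampled subset.
    """
    total = 0.0
    for s, g, r in zip(surrogate, gold, sampled):
        correction = (g - s) / pi if r else 0.0
        total += s + correction
    return total / len(surrogate)


rng = random.Random(0)
n, pi = 20000, 0.1  # assumed corpus size and gold-labeling probability

# Simulated ground truth: roughly 30% of documents are positive.
gold = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
# Biased surrogate: 20% of true negatives are mislabeled as positive,
# which gives an overall surrogate accuracy of roughly 86%.
surrogate = [1 if (g == 1 or rng.random() < 0.2) else 0 for g in gold]
# Design step: each document is sampled for gold labeling with probability pi.
sampled = [rng.random() < pi for _ in range(n)]

true_mean = sum(gold) / n
naive_mean = sum(surrogate) / n
dsl_mean = dsl_style_mean(surrogate, gold, sampled, pi)

print(f"truth={true_mean:.3f} naive={naive_mean:.3f} corrected={dsl_mean:.3f}")
```

Because the sampling probability is controlled by the researcher, the correction term has expectation equal to the surrogate's bias regardless of how the surrogate errs, which is why the corrected estimate lands near the truth while the naive surrogate mean does not.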

  • 4 authors
·
Jun 7, 2023

Exploiting LLM Quantization

Quantization leverages lower-precision weights to reduce the memory usage of large language models (LLMs) and is a key technique for enabling their deployment on commodity hardware. While LLM quantization's impact on utility has been extensively explored, this work for the first time studies its adverse effects from a security perspective. We reveal that widely used quantization methods can be exploited to produce a harmful quantized LLM, even though the full-precision counterpart appears benign, potentially tricking users into deploying the malicious quantized model. We demonstrate this threat using a three-staged attack framework: (i) first, we obtain a malicious LLM through fine-tuning on an adversarial task; (ii) next, we quantize the malicious model and calculate constraints that characterize all full-precision models that map to the same quantized model; (iii) finally, using projected gradient descent, we tune out the poisoned behavior from the full-precision model while ensuring that its weights satisfy the constraints computed in step (ii). This procedure results in an LLM that exhibits benign behavior in full precision but when quantized, it follows the adversarial behavior injected in step (i). We experimentally demonstrate the feasibility and severity of such an attack across three diverse scenarios: vulnerable code generation, content injection, and over-refusal attack. In practice, the adversary could host the resulting full-precision model on an LLM community hub such as Hugging Face, exposing millions of users to the threat of deploying its malicious quantized version on their devices.
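
The constraint in step (ii) and the projection in step (iii) can be illustrated for the simplest case. This is a minimal sketch, assuming plain round-to-nearest quantization with a single fixed scale, which is far simpler than the schemes evaluated in the paper. The function names and the numeric values are hypothetical. It shows only why many full-precision values share one quantized code, so that inspecting the full-precision model alone cannot certify its quantized version.

```python
def quantize(w, scale):
    """Round-to-nearest integer code for a weight under a fixed scale."""
    return round(w / scale)


def preimage_interval(q, scale, margin=1e-6):
    """Interval of full-precision values that quantize to code q.

    A small margin keeps values away from the rounding boundaries.
    """
    half = 0.5 * scale
    return (q * scale - half + margin, q * scale + half - margin)


def project(w, interval):
    """Clamp a weight back into its interval (the projection step)."""
    lo, hi = interval
    return min(max(w, lo), hi)


scale = 0.1
weights = [0.23, -0.41, 0.07]
codes = [quantize(w, scale) for w in weights]
intervals = [preimage_interval(q, scale) for q in codes]

# Stand-in for a gradient update that moves the weights arbitrarily.
updated = [w + d for w, d in zip(weights, [0.2, -0.01, -0.3])]
projected = [project(w, iv) for w, iv in zip(updated, intervals)]

# After projection, every weight still maps to its original code.
print(codes, [quantize(w, scale) for w in projected])
```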

  • 5 authors
·
May 28, 2024

WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks

Powered by a large language model (LLM), a web browsing agent operates web browsers in a human-like manner and offers a highly transparent path toward automating a wide range of everyday tasks. As web agents become increasingly capable and demonstrate proficiency in general browsing tasks, a critical question emerges: Can they go beyond general browsing to robustly handle tasks that are tedious and complex, or chores that humans often avoid doing themselves? In this paper, we introduce WebChoreArena, a new fully reproducible benchmark comprising 532 carefully curated tasks designed to extend the scope of WebArena beyond general browsing to more labor-intensive and tedious tasks. WebChoreArena systematically integrates three key challenges: (i) Massive Memory tasks requiring accurate retrieval of large amounts of information in the observations, (ii) Calculation tasks demanding precise mathematical reasoning, and (iii) Long-Term Memory tasks necessitating long-term memory across multiple webpages. Built on top of the fully reproducible and widely adopted four WebArena simulation environments, WebChoreArena ensures strict reproducibility and enables fair, direct comparisons with the established WebArena benchmark, offering key insights into agent progress. Our experimental results demonstrate that as LLMs evolve, represented by GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro, significant improvements in performance are observed on WebChoreArena. These findings suggest that WebChoreArena is well-suited to measure the advancement of state-of-the-art LLMs with greater clarity. Nevertheless, the results also indicate that even with Gemini 2.5 Pro, there remains substantial room for improvement compared to WebArena, highlighting the increased challenges posed by WebChoreArena.

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

Accelerating research on Large Multimodal Models (LMMs) in non-English languages is crucial for enhancing user experiences across broader populations. In this paper, we introduce JMMMU (Japanese MMMU), the first large-scale Japanese benchmark designed to evaluate LMMs on expert-level tasks based on the Japanese cultural context. To facilitate comprehensive culture-aware evaluation, JMMMU features two complementary subsets: (i) a culture-agnostic (CA) subset, where the culture-independent subjects (e.g., Math) are selected and translated into Japanese, enabling one-to-one comparison with its English counterpart MMMU; and (ii) a culture-specific (CS) subset, comprising newly crafted subjects that reflect Japanese cultural context. Using the CA subset, we observe a performance drop in many LMMs when evaluated in Japanese, which is purely attributable to language variation. Using the CS subset, we reveal their inadequate Japanese cultural understanding. Further, by combining both subsets, we identify that some LMMs perform well on the CA subset but not on the CS subset, exposing a shallow understanding of the Japanese language that lacks depth in cultural understanding. We hope this work will not only help advance LMM performance in Japanese but also serve as a guideline to create high-standard, culturally diverse benchmarks for multilingual LMM development. The project page is https://mmmu-japanese-benchmark.github.io/JMMMU/.

  • 8 authors
·
Oct 22, 2024

Computing Power and the Governance of Artificial Intelligence

Computing power, or "compute," is crucial for the development and deployment of artificial intelligence (AI) capabilities. As a result, governments and companies have started to leverage compute as a means to govern AI. For example, governments are investing in domestic compute capacity, controlling the flow of compute to competing countries, and subsidizing compute access to certain sectors. However, these efforts only scratch the surface of how compute can be used to govern AI development and deployment. Relative to other key inputs to AI (data and algorithms), AI-relevant compute is a particularly effective point of intervention: it is detectable, excludable, and quantifiable, and is produced via an extremely concentrated supply chain. These characteristics, alongside the singular importance of compute for cutting-edge AI models, suggest that governing compute can contribute to achieving common policy objectives, such as ensuring the safety and beneficial use of AI. More precisely, policymakers could use compute to facilitate regulatory visibility of AI, allocate resources to promote beneficial outcomes, and enforce restrictions against irresponsible or malicious AI development and usage. However, while compute-based policies and technologies have the potential to assist in these areas, there is significant variation in their readiness for implementation. Some ideas are currently being piloted, while others are hindered by the need for fundamental research. Furthermore, naive or poorly scoped approaches to compute governance carry significant risks in areas like privacy, economic impacts, and centralization of power. We end by suggesting guardrails to minimize these risks from compute governance.

  • 19 authors
·
Feb 13, 2024

The dark side of early galaxies: $\texttt{geko}$ uncovers dark-matter fractions at $z\sim4-6$

JWST/NIRCam slitless spectroscopy enables dynamical mass measurements for typical star-forming galaxies only a billion years after the Big Bang. We model the H$\alpha$ morpho-kinematics of 163 galaxies at redshift $z\approx4-6$ from FRESCO and CONGRESS (with JADES imaging), using the geko code, and infer rotational velocities and dispersions within $r_{\rm e}$. Our sample spans $\log M_{\star}\approx7-10$ and $\log M_{\rm dyn}\approx9-11$. Gas masses are estimated via scaling relations, yielding baryonic masses and dark-matter (DM) fractions $f_{\rm DM}(r<r_{\rm e})$ within the H$\alpha$ half-light radius. We find high median fractions of $\langle f_{\rm gas}\rangle=0.77$ and $\langle f_{\rm DM}\rangle=0.73$, where $f_{\rm gas}$ is measured with respect to the baryonic mass and $f_{\rm DM}$ with respect to the DM+baryonic mass. About two-thirds of systems are DM-dominated within $r_{\rm e}\sim0.5-1$ kpc. Both $f_{\rm gas}$ and $f_{\rm DM}$ decrease with stellar mass, consistent with simulations. The stellar Tully-Fisher relation shows a tentative offset to higher $v_{\rm circ}$ at fixed $M_{\star}$ and substantial intrinsic scatter, suggesting that the relation is only beginning to emerge at $z\sim5$. We measure a negative correlation between $f_{\rm DM}$ and baryonic surface density $\Sigma_{\rm bar}$, weaker but broadly consistent with trends at cosmic noon and at $z\sim0$. Qualitatively comparing with modified NFW profiles coupled to an empirical stellar-to-halo mass relation suggests that the lowest $f_{\rm DM}$ ($\lesssim0.4$) require cored inner DM profiles, while the highest fractions favour cuspier profiles, potentially reflecting adiabatic contraction. Overall, the elevated $f_{\rm gas}$ and $f_{\rm DM}$ at $z\gtrsim4$ are compatible with progenitors of baryon-dominated systems at $z\sim2$ and naturally anticipate overmassive black holes at fixed $M_{\star}$.
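
The two fractions defined above can be written out explicitly. This is a minimal sketch, assuming that the dark-matter mass inside the effective radius is the dynamical mass minus the baryonic mass inside the same radius, and that half of the total baryonic mass lies inside that radius. Both are illustrative assumptions rather than the paper's stated procedure, and the function names and input masses are hypothetical, not values from the paper.

```python
def gas_fraction(m_star, m_gas):
    """Gas fraction measured with respect to the baryonic mass."""
    return m_gas / (m_gas + m_star)


def dm_fraction(m_dyn, m_bar_within):
    """Dark-matter fraction with respect to the DM + baryonic mass.

    Assumes M_DM = M_dyn - M_bar inside the same radius, floored at zero
    in case the baryonic estimate exceeds the dynamical mass.
    """
    m_dm = max(m_dyn - m_bar_within, 0.0)
    return m_dm / (m_dm + m_bar_within)


# Illustrative masses in solar masses (not taken from the paper).
m_star, m_gas = 1.0e8, 3.0e8
m_bar_within = 0.5 * (m_star + m_gas)  # assumed: half the baryons lie within r_e
m_dyn = 8.0e8                          # assumed dynamical mass within r_e

f_gas = gas_fraction(m_star, m_gas)
f_dm = dm_fraction(m_dyn, m_bar_within)
print(f"f_gas={f_gas:.2f} f_DM={f_dm:.2f}")
```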

  • 18 authors
·
Oct 16

The Stellar Populations and Rest-Frame Colors of Star-Forming Galaxies at $z \approx 8$: Exploring the Impact of Filter Choice and Star Formation History Assumption with JADES

Our understanding of the physical properties of star-forming galaxies during the Epoch of Reionization (EoR, at $z > 6$) suffers from degeneracies among the apparent properties of the stars, the nebular gas, and the dust. These degeneracies are most prominent with photometry, which has insufficient (1) spectral resolution and (2) rest-frame spectral coverage. We explore ways to break these degeneracies with a sample of $N = 22$ high-redshift star-forming galaxies at $7 < z_{phot} \leq 9$, using some of the deepest existing imaging from JWST/NIRCam and JWST/MIRI with JADES. Key to this study is the imaging from JWST/MIRI at 7.7 $\mu$m, which provides coverage of the rest-frame I-band at the observed redshifts. We infer stellar population properties and rest-frame colors using a variety of filter sets and star formation history assumptions to explore the impact of these choices. Evaluating these quantities both with and without the 7.7 $\mu$m data point shows that dense spectral coverage with JWST/NIRCam (eight or more filters, including at least one medium-band) can compensate for lacking the rest-frame I-band coverage for the vast majority ($\approx 80\%$) of our sample. Furthermore, these galaxy properties are most consistently determined by assuming the delayed-$\tau$ star formation history, which provides the smallest offsets and scatters around these offsets when including JWST/MIRI. Within extragalactic surveys like JADES and CEERS, our findings suggest that robust characterization of the stellar population properties and rest-frame colors for high-redshift star-forming galaxies is possible with JWST/NIRCam alone at $z \approx 8$.

  • 33 authors
·
Jun 2

ALMA Lensing Cluster Survey: Physical characterization of near-infrared-dark intrinsically faint ALMA sources at z=2-4

We present results from Atacama Large Millimeter/submillimeter Array (ALMA) spectral line-scan observations at 3-mm and 2-mm bands of three near-infrared-dark (NIR-dark) galaxies behind two massive lensing clusters MACS J0417.5-1154 and RXC J0032.1+1808. Each of these three sources is a faint (de-lensed $S_{\rm 1.2\,mm} < 1$ mJy) triply lensed system originally discovered in the ALMA Lensing Cluster Survey. We have successfully detected CO and [C I] emission lines and confirmed that their spectroscopic redshifts are $z=3.652$, 2.391, and 2.985. By utilizing a rich multi-wavelength data set, we find that the NIR-dark galaxies are located on the star formation main sequence in the intrinsic stellar mass range of $\log (M_*/M_\odot) = 9.8 - 10.4$, which is about one order of magnitude lower than that of typical submillimeter galaxies (SMGs). These NIR-dark galaxies show a variety in gas depletion times and spatial extent of dust emission. One of the three is a normal star-forming galaxy with gas depletion time consistent with a scaling relation, and its infrared surface brightness is an order of magnitude smaller than that of typical SMGs. Since this galaxy has an elongated axis ratio of $\sim 0.17$, we argue that normal star-forming galaxies in an edge-on configuration can be heavily dust-obscured. This implies that existing deep WFC3/F160W surveys may miss a fraction of typical star-forming main-sequence galaxies due to their edge-on orientation.

  • 36 authors
·
Jun 14, 2024

Overview of the JWST Advanced Deep Extragalactic Survey (JADES)

We present an overview of the James Webb Space Telescope (JWST) Advanced Deep Extragalactic Survey (JADES), an ambitious program of infrared imaging and spectroscopy in the GOODS-S and GOODS-N deep fields, designed to study galaxy evolution from high redshift to cosmic noon. JADES uses about 770 hours of Cycle 1 guaranteed time largely from the Near-Infrared Camera (NIRCam) and Near-Infrared Spectrograph (NIRSpec) instrument teams. In GOODS-S, in and around the Hubble Ultra Deep Field and Chandra Deep Field South, JADES produces a deep imaging region of ~45 arcmin^2 with an average of 130 hrs of exposure time spread over 9 NIRCam filters. This is extended at medium depth in GOODS-S and GOODS-N with NIRCam imaging of ~175 arcmin^2 with an average exposure time of 20 hrs spread over 8-10 filters. In both fields, we conduct extensive NIRSpec multi-object spectroscopy, including 2 deep pointings of 55 hrs exposure time, 14 medium pointings of ~12 hrs, and 15 shallower pointings of ~4 hrs, targeting over 5000 HST and JWST-detected faint sources with 5 low, medium, and high-resolution dispersers covering 0.6-5.3 microns. Finally, JADES extends redward via coordinated parallels with the JWST Mid-Infrared Instrument (MIRI), featuring ~9 arcmin^2 with 43 hours of exposure at 7.7 microns and twice that area with 2-6.5 hours of exposure at 12.8 microns. For nearly 30 years, the GOODS-S and GOODS-N fields have been developed as the premier deep fields on the sky; JADES is now providing a compelling start on the JWST legacy in these fields.

  • 76 authors
·
Jun 4, 2023