new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Feb 4

TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks

Learning control policies for complex, long-horizon tasks is a central challenge in robotics and autonomous systems. Signal Temporal Logic (STL) offers a powerful and expressive language for specifying such tasks, but its non-Markovian nature and inherent sparse reward make it difficult to be solved via standard Reinforcement Learning (RL) algorithms. Prior RL approaches focus only on limited STL fragments or use STL robustness scores as sparse terminal rewards. In this paper, we propose TGPO, Temporal Grounded Policy Optimization, to solve general STL tasks. TGPO decomposes STL into timed subgoals and invariant constraints and provides a hierarchical framework to tackle the problem. The high-level component of TGPO proposes concrete time allocations for these subgoals, and the low-level time-conditioned policy learns to achieve the sequenced subgoals using a dense, stage-wise reward signal. During inference, we sample various time allocations and select the most promising assignment for the policy network to rollout the solution trajectory. To foster efficient policy learning for complex STL with multiple subgoals, we leverage the learned critic to guide the high-level temporal search via Metropolis-Hastings sampling, focusing exploration on temporally feasible solutions. We conduct experiments on five environments, ranging from low-dimensional navigation to manipulation, drone, and quadrupedal locomotion. Under a wide range of STL tasks, TGPO significantly outperforms state-of-the-art baselines (especially for high-dimensional and long-horizon cases), with an average of 31.6% improvement in task success rate compared to the best baseline. The code will be available at https://github.com/mengyuest/TGPO

What Regularized Auto-Encoders Learn from the Data Generating Distribution

What do auto-encoders learn about the underlying data generating distribution? Recent work suggests that some auto-encoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density. We show that the auto-encoder captures the score (derivative of the log-density with respect to the input). It contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the auto-encoder: they show what the auto-encoder would tend to if given enough capacity and examples. These results are for a contractive training criterion we show to be similar to the denoising auto-encoder training criterion with small corruption noise, but with contraction applied on the whole reconstruction function rather than just encoder. Similarly to score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be setup to recover samples from the estimated distribution, and this is confirmed in sampling experiments.

  • 2 authors
·
Nov 18, 2012

MCMC: Bridging Rendering, Optimization and Generative AI

Generative artificial intelligence (AI) has made unprecedented advances in vision language models over the past two years. During the generative process, new samples (images) are generated from an unknown high-dimensional distribution. Markov Chain Monte Carlo (MCMC) methods are particularly effective in drawing samples from such complex, high-dimensional distributions. This makes MCMC methods an integral component for models like EBMs, ensuring accurate sample generation. Gradient-based optimization is at the core of modern generative models. The update step during the optimization forms a Markov chain where the new update depends only on the current state. This allows exploration of the parameter space in a memoryless manner, thus combining the benefits of gradient-based optimization and MCMC sampling. MCMC methods have shown an equally important role in physically based rendering where complex light paths are otherwise quite challenging to sample from simple importance sampling techniques. A lot of research is dedicated towards bringing physical realism to samples (images) generated from diffusion-based generative models in a data-driven manner, however, a unified framework connecting these techniques is still missing. In this course, we take the first steps toward understanding each of these components and exploring how MCMC could potentially serve as a bridge, linking these closely related areas of research. Our course aims to provide necessary theoretical and practical tools to guide students, researchers and practitioners towards the common goal of generative physically based rendering. All Jupyter notebooks with demonstrations associated to this tutorial can be found on the project webpage: https://sinbag.github.io/mcmc/

  • 2 authors
·
Oct 10, 2025

Denotational validation of higher-order Bayesian inference

We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem.

  • 10 authors
·
Nov 8, 2017

State and parameter learning with PaRIS particle Gibbs

Non-linear state-space models, also known as general hidden Markov models, are ubiquitous in statistical machine learning, being the most classical generative models for serial data and sequences in general. The particle-based, rapid incremental smoother PaRIS is a sequential Monte Carlo (SMC) technique allowing for efficient online approximation of expectations of additive functionals under the smoothing distribution in these models. Such expectations appear naturally in several learning contexts, such as likelihood estimation (MLE) and Markov score climbing (MSC). PARIS has linear computational complexity, limited memory requirements and comes with non-asymptotic bounds, convergence results and stability guarantees. Still, being based on self-normalised importance sampling, the PaRIS estimator is biased. Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs PPG sampler, which can be viewed as a PaRIS algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities. We substantiate the PPG algorithm with theoretical results, including new bounds on bias and variance as well as deviation inequalities. Our second contribution is to apply PPG in a learning framework, covering MLE and MSC as special examples. In this context, we establish, under standard assumptions, non-asymptotic bounds highlighting the value of bias reduction and the implicit Rao--Blackwellization of PPG. These are the first non-asymptotic results of this kind in this setting. We illustrate our theoretical results with numerical experiments supporting our claims.

  • 5 authors
·
Jan 2, 2023

Accelerating Distributed Stochastic Optimization via Self-Repellent Random Walks

We study a family of distributed stochastic optimization algorithms where gradients are sampled by a token traversing a network of agents in random-walk fashion. Typically, these random-walks are chosen to be Markov chains that asymptotically sample from a desired target distribution, and play a critical role in the convergence of the optimization iterates. In this paper, we take a novel approach by replacing the standard linear Markovian token by one which follows a nonlinear Markov chain - namely the Self-Repellent Radom Walk (SRRW). Defined for any given 'base' Markov chain, the SRRW, parameterized by a positive scalar {\alpha}, is less likely to transition to states that were highly visited in the past, thus the name. In the context of MCMC sampling on a graph, a recent breakthrough in Doshi et al. (2023) shows that the SRRW achieves O(1/{\alpha}) decrease in the asymptotic variance for sampling. We propose the use of a 'generalized' version of the SRRW to drive token algorithms for distributed stochastic optimization in the form of stochastic approximation, termed SA-SRRW. We prove that the optimization iterate errors of the resulting SA-SRRW converge to zero almost surely and prove a central limit theorem, deriving the explicit form of the resulting asymptotic covariance matrix corresponding to iterate errors. This asymptotic covariance is always smaller than that of an algorithm driven by the base Markov chain and decreases at rate O(1/{\alpha}^2) - the performance benefit of using SRRW thereby amplified in the stochastic optimization context. Empirical results support our theoretical findings.

  • 3 authors
·
Jan 17, 2024

Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts

While score-based generative models are the model of choice across diverse domains, there are limited tools available for controlling inference-time behavior in a principled manner, e.g. for composing multiple pretrained models. Existing classifier-free guidance methods use a simple heuristic to mix conditional and unconditional scores to approximately sample from conditional distributions. However, such methods do not approximate the intermediate distributions, necessitating additional 'corrector' steps. In this work, we provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We derive a weighted simulation scheme which we call Feynman-Kac Correctors (FKCs) based on the celebrated Feynman-Kac formula by carefully accounting for terms in the appropriate partial differential equations (PDEs). To simulate these PDEs, we propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality. We empirically demonstrate the utility of our methods by proposing amortized sampling via inference-time temperature annealing, improving multi-objective molecule generation using pretrained models, and improving classifier-free guidance for text-to-image generation. Our code is available at https://github.com/martaskrt/fkc-diffusion.

  • 9 authors
·
Mar 4, 2025 2

Efficient estimation of multiple expectations with the same sample by adaptive importance sampling and control variates

Some classical uncertainty quantification problems require the estimation of multiple expectations. Estimating all of them accurately is crucial and can have a major impact on the analysis to perform, and standard existing Monte Carlo methods can be costly to do so. We propose here a new procedure based on importance sampling and control variates for estimating more efficiently multiple expectations with the same sample. We first show that there exists a family of optimal estimators combining both importance sampling and control variates, which however cannot be used in practice because they require the knowledge of the values of the expectations to estimate. Motivated by the form of these optimal estimators and some interesting properties, we therefore propose an adaptive algorithm. The general idea is to adaptively update the parameters of the estimators for approaching the optimal ones. We suggest then a quantitative stopping criterion that exploits the trade-off between approaching these optimal parameters and having a sufficient budget left. This left budget is then used to draw a new independent sample from the final sampling distribution, allowing to get unbiased estimators of the expectations. We show how to apply our procedure to sensitivity analysis, by estimating Sobol' indices and quantifying the impact of the input distributions. Finally, realistic test cases show the practical interest of the proposed algorithm, and its significant improvement over estimating the expectations separately.

  • 3 authors
·
Nov 30, 2022

Denoising MCMC for Accelerating Diffusion-Based Generative Models

Diffusion models are powerful generative models that simulate the reverse of diffusion processes using score functions to synthesize data from noise. The sampling process of diffusion models can be interpreted as solving the reverse stochastic differential equation (SDE) or the ordinary differential equation (ODE) of the diffusion process, which often requires up to thousands of discretization steps to generate a single image. This has sparked a great interest in developing efficient integration techniques for reverse-S/ODEs. Here, we propose an orthogonal approach to accelerating score-based sampling: Denoising MCMC (DMCMC). DMCMC first uses MCMC to produce samples in the product space of data and variance (or diffusion time). Then, a reverse-S/ODE integrator is used to denoise the MCMC samples. Since MCMC traverses close to the data manifold, the computation cost of producing a clean sample for DMCMC is much less than that of producing a clean sample from noise. To verify the proposed concept, we show that Denoising Langevin Gibbs (DLG), an instance of DMCMC, successfully accelerates all six reverse-S/ODE integrators considered in this work on the tasks of CIFAR10 and CelebA-HQ-256 image generation. Notably, combined with integrators of Karras et al. (2022) and pre-trained score models of Song et al. (2021b), DLG achieves SOTA results. In the limited number of score function evaluation (NFE) settings on CIFAR10, we have 3.86 FID with approx 10 NFE and 2.63 FID with approx 20 NFE. On CelebA-HQ-256, we have 6.99 FID with approx 160 NFE, which beats the current best record of Kim et al. (2022) among score-based models, 7.16 FID with 4000 NFE. Code: https://github.com/1202kbs/DMCMC

  • 2 authors
·
Sep 29, 2022

Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin

The diffusion models (DMs) have demonstrated the remarkable capability of generating images via learning the noised score function of data distribution. Current DM sampling techniques typically rely on first-order Langevin dynamics at each noise level, with efforts concentrated on refining inter-level denoising strategies. While leveraging additional second-order Hessian geometry to enhance the sampling quality of Langevin is a common practice in Markov chain Monte Carlo (MCMC), the naive attempts to utilize Hessian geometry in high-dimensional DMs lead to quadratic-complexity computational costs, rendering them non-scalable. In this work, we introduce a novel Levenberg-Marquardt-Langevin (LML) method that approximates the diffusion Hessian geometry in a training-free manner, drawing inspiration from the celebrated Levenberg-Marquardt optimization algorithm. Our approach introduces two key innovations: (1) A low-rank approximation of the diffusion Hessian, leveraging the DMs' inherent structure and circumventing explicit quadratic-complexity computations; (2) A damping mechanism to stabilize the approximated Hessian. This LML approximated Hessian geometry enables the diffusion sampling to execute more accurate steps and improve the image generation quality. We further conduct a theoretical analysis to substantiate the approximation error bound of low-rank approximation and the convergence property of the damping mechanism. Extensive experiments across multiple pretrained DMs validate that the LML method significantly improves image generation quality, with negligible computational overhead.

  • 12 authors
·
May 30, 2025

Applying the Polynomial Maximization Method to Estimate ARIMA Models with Asymmetric Non-Gaussian Innovations

Classical estimators for ARIMA parameters (MLE, CSS, OLS) assume Gaussian innovations, an assumption frequently violated in financial and economic data exhibiting asymmetric distributions with heavy tails. We develop and validate the second-order polynomial maximization method (PMM2) for estimating ARIMA(p,d,q) models with non-Gaussian innovations. PMM2 is a semiparametric technique that exploits higher-order moments and cumulants without requiring full distributional specification. Monte Carlo experiments (128,000 simulations) across sample sizes N in {100, 200, 500, 1000} and four innovation distributions demonstrate that PMM2 substantially outperforms classical methods for asymmetric innovations. For ARIMA(1,1,0) with N=500, relative efficiency reaches 1.58--1.90 for Gamma, lognormal, and χ^2(3) innovations (37--47\% variance reduction). Under Gaussian innovations PMM2 matches OLS efficiency, avoiding the precision loss typical of robust estimators. The method delivers major gains for moderate asymmetry (|γ_3| geq 0.5) and N geq 200, with computational costs comparable to MLE. PMM2 provides an effective alternative for time series with asymmetric innovations typical of financial markets, macroeconomic indicators, and industrial measurements. Future extensions include seasonal SARIMA models, GARCH integration, and automatic order selection.

  • 1 authors
·
Nov 10, 2025 1

Efficient Massive Black Hole Binary parameter estimation for LISA using Sequential Neural Likelihood

The inspiral, merger, and ringdown of Massive Black Hole Binaries (MBHBs) is one the main sources of Gravitational Waves (GWs) for the future Laser Interferometer Space Antenna (LISA), an ESA-led mission in the implementation phase. It is expected that LISA will detect these systems throughout the entire observable universe. Robust and efficient data analysis algorithms are necessary to detect and estimate physical parameters for these systems. In this work, we explore the application of Sequential Neural Likelihood, a simulation-based inference algorithm, to detect and characterize MBHB GW signals in synthetic LISA data. We describe in detail the different elements of the method, their performance and possible alternatives that can be used to enhance the performance. Instead of sampling from the conventional likelihood function, which requires a forward simulation for each evaluation, this method constructs a surrogate likelihood that is ultimately described by a neural network trained from a dataset of simulations of the MBHB signals and noise. One important advantage of this method is that, given that the likelihood is independent of the priors, we can iteratively train models that target specific observations in a fraction of the time and computational cost that other traditional and machine learning-based strategies would require. Because of the iterative nature of the method, we are able to train models to obtain qualitatively similar posteriors with less than 2\% of the simulator calls that Markov Chain Monte Carlo methods would require. We compare these posteriors with those obtained from Markov Chain Monte Carlo techniques and discuss the differences that appear, in particular in relation with the important role that data compression has in the modular implementation of the method that we present. We also discuss different strategies to improve the performance of the algorithms.

  • 2 authors
·
Jun 1, 2024

Automatic Data Augmentation via Invariance-Constrained Learning

Underlying data structures, such as symmetries or invariances to transformations, are often exploited to improve the solution of learning tasks. However, embedding these properties in models or learning algorithms can be challenging and computationally intensive. Data augmentation, on the other hand, induces these symmetries during training by applying multiple transformations to the input data. Despite its ubiquity, its effectiveness depends on the choices of which transformations to apply, when to do so, and how often. In fact, there is both empirical and theoretical evidence that the indiscriminate use of data augmentation can introduce biases that outweigh its benefits. This work tackles these issues by automatically adapting the data augmentation while solving the learning task. To do so, it formulates data augmentation as an invariance-constrained learning problem and leverages Monte Carlo Markov Chain (MCMC) sampling to solve it. The result is a practical algorithm that not only does away with a priori searches for augmentation distributions, but also dynamically controls if and when data augmentation is applied. Our experiments illustrate the performance of this method, which achieves state-of-the-art results in automatic data augmentation benchmarks for CIFAR datasets. Furthermore, this approach can be used to gather insights on the actual symmetries underlying a learning task.

  • 3 authors
·
Sep 29, 2022

Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach

Diffusion models (DMs) have proven to be effective in modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior sampling methods proposed for solving common BIPs rely on heuristic approximations to the generative process. To exploit the generative capability of DMs and avoid the usage of such approximations, we propose an ensemble-based algorithm that performs posterior sampling without the use of heuristic approximations. Our algorithm is motivated by existing works that combine DM-based methods with the sequential Monte Carlo (SMC) method. By examining how the prior evolves through the diffusion process encoded by the pre-trained score function, we derive a modified partial differential equation (PDE) governing the evolution of the corresponding posterior distribution. This PDE includes a modified diffusion term and a reweighting term, which can be simulated via stochastic weighted particle methods. Theoretically, we prove that the error between the true posterior distribution can be bounded in terms of the training error of the pre-trained score function and the number of particles in the ensemble. Empirically, we validate our algorithm on several inverse problems in imaging to show that our method gives more accurate reconstructions compared to existing DM-based methods.

  • 5 authors
·
Jun 4, 2025

Bayesian Bi-clustering of Neural Spiking Activity with Latent Structures

Modern neural recording techniques allow neuroscientists to obtain spiking activity of multiple neurons from different brain regions over long time periods, which requires new statistical methods to be developed for understanding structure of the large-scale data. In this paper, we develop a bi-clustering method to cluster the neural spiking activity spatially and temporally, according to their low-dimensional latent structures. The spatial (neuron) clusters are defined by the latent trajectories within each neural population, while the temporal (state) clusters are defined by (populationally) synchronous local linear dynamics shared with different periods. To flexibly extract the bi-clustering structure, we build the model non-parametrically, and develop an efficient Markov chain Monte Carlo (MCMC) algorithm to sample the posterior distributions of model parameters. Validating our proposed MCMC algorithm through simulations, we find the method can recover unknown parameters and true bi-clustering structures successfully. We then apply the proposed bi-clustering method to multi-regional neural recordings under different experiment settings, where we find that simultaneously considering latent trajectories and spatial-temporal clustering structures can provide us with a more accurate and interpretable result. Overall, the proposed method provides scientific insights for large-scale (counting) time series with elongated recording periods, and it can potentially have application beyond neuroscience.

  • 1 authors
·
Sep 5, 2023

A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models

Recent years have witnessed the rapid progress and broad application of diffusion probabilistic models (DPMs). Sampling from DPMs can be viewed as solving an ordinary differential equation (ODE). Despite the promising performance, the generation of DPMs usually consumes much time due to the large number of function evaluations (NFE). Though recent works have accelerated the sampling to around 20 steps with high-order solvers, the sample quality with less than 10 NFE can still be improved. In this paper, we propose a unified sampling framework (USF) to study the optional strategies for solver. Under this framework, we further reveal that taking different solving strategies at different timesteps may help further decrease the truncation error, and a carefully designed solver schedule has the potential to improve the sample quality by a large margin. Therefore, we propose a new sampling framework based on the exponential integral formulation that allows free choices of solver strategy at each step and design specific decisions for the framework. Moreover, we propose S^3, a predictor-based search method that automatically optimizes the solver schedule to get a better time-quality trade-off of sampling. We demonstrate that S^3 can find outstanding solver schedules which outperform the state-of-the-art sampling methods on CIFAR-10, CelebA, ImageNet, and LSUN-Bedroom datasets. Specifically, we achieve 2.69 FID with 10 NFE and 6.86 FID with 5 NFE on CIFAR-10 dataset, outperforming the SOTA method significantly. We further apply S^3 to Stable-Diffusion model and get an acceleration ratio of 2times, showing the feasibility of sampling in very few steps without retraining the neural network.

  • 4 authors
·
Dec 12, 2023

Sliced Wasserstein Estimation with Control Variates

The sliced Wasserstein (SW) distances between two probability measures are defined as the expectation of the Wasserstein distance between two one-dimensional projections of the two measures. The randomness comes from a projecting direction that is used to project the two input measures to one dimension. Due to the intractability of the expectation, Monte Carlo integration is performed to estimate the value of the SW distance. Despite having various variants, there has been no prior work that improves the Monte Carlo estimation scheme for the SW distance in terms of controlling its variance. To bridge the literature on variance reduction and the literature on the SW distance, we propose computationally efficient control variates to reduce the variance of the empirical estimation of the SW distance. The key idea is to first find Gaussian approximations of projected one-dimensional measures, then we utilize the closed-form of the Wasserstein-2 distance between two Gaussian distributions to design the control variates. In particular, we propose using a lower bound and an upper bound of the Wasserstein-2 distance between two fitted Gaussians as two computationally efficient control variates. We empirically show that the proposed control variate estimators can help to reduce the variance considerably when comparing measures over images and point-clouds. Finally, we demonstrate the favorable performance of the proposed control variate estimators in gradient flows to interpolate between two point-clouds and in deep generative modeling on standard image datasets, such as CIFAR10 and CelebA.

  • 2 authors
·
Apr 30, 2023

Don't Play Favorites: Minority Guidance for Diffusion Models

We explore the problem of generating minority samples using diffusion models. The minority samples are instances that lie on low-density regions of a data manifold. Generating a sufficient number of such minority instances is important, since they often contain some unique attributes of the data. However, the conventional generation process of the diffusion models mostly yields majority samples (that lie on high-density regions of the manifold) due to their high likelihoods, making themselves ineffective and time-consuming for the minority generating task. In this work, we present a novel framework that can make the generation process of the diffusion models focus on the minority samples. We first highlight that Tweedie's denoising formula yields favorable results for majority samples. The observation motivates us to introduce a metric that describes the uniqueness of a given sample. To address the inherent preference of the diffusion models w.r.t. the majority samples, we further develop minority guidance, a sampling technique that can guide the generation process toward regions with desired likelihood levels. Experiments on benchmark real datasets demonstrate that our minority guidance can greatly improve the capability of generating high-quality minority samples over existing generative samplers. We showcase that the performance benefit of our framework persists even in demanding real-world scenarios such as medical imaging, further underscoring the practical significance of our work. Code is available at https://github.com/soobin-um/minority-guidance.

  • 3 authors
·
Jan 28, 2023

DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

Diffusion probabilistic models (DPMs) have achieved impressive success in high-resolution image synthesis, especially in recent large-scale text-to-image generation applications. An essential technique for improving the sample quality of DPMs is guided sampling, which usually needs a large guidance scale to obtain the best sample quality. The commonly-used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps for high-quality samples. Although recent works propose dedicated high-order solvers and achieve a further speedup for sampling without guidance, their effectiveness for guided sampling has not been well-tested before. In this work, we demonstrate that previous high-order fast samplers suffer from instability issues, and they even become slower than DDIM when the guidance scale grows large. To further speed up guided sampling, we propose DPM-Solver++, a high-order solver for the guided sampling of DPMs. DPM-Solver++ solves the diffusion ODE with the data prediction model and adopts thresholding methods to keep the solution matches training data distribution. We further propose a multistep variant of DPM-Solver++ to address the instability issue by reducing the effective step size. Experiments show that DPM-Solver++ can generate high-quality samples within only 15 to 20 steps for guided sampling by pixel-space and latent-space DPMs.

  • 6 authors
·
Nov 2, 2022

Accelerated Bayesian Inference for Pulsar Timing Arrays: Normalizing Flows for Rapid Model Comparison Across Stochastic Gravitational-Wave Background Sources

The recent detection of nanohertz stochastic gravitational-wave backgrounds (SGWBs) by pulsar timing arrays (PTAs) promises unique insights into astrophysical and cosmological origins. However, traditional Markov Chain Monte Carlo (MCMC) approaches become prohibitively expensive for large datasets. We employ a normalizing flow (NF)-based machine learning framework to accelerate Bayesian inference in PTA analyses. For the first time, we perform Bayesian model comparison across SGWB source models in the framework of machine learning by training NF architectures on the PTA dataset (NANOGrav 15-year) and enabling direct evidence estimation via learned harmonic mean estimators. Our examples include 10 conventional SGWB source models such as supermassive black hole binaries, power-law spectrum, cosmic strings, domain walls, scalar-induced GWs, first-order phase transitions, and dual scenario/inflationary gravitational wave. Our approach jointly infers 20 red noise parameters and 2 SGWB parameters per model in sim 20\,hours (including training), compared to sim 10\,days with MCMC. Critically, the NF method preserves rigorous model selection accuracy, with small Hellinger distances (lesssim 0.3) relative to MCMC posteriors, and reproduces MCMC-based Bayes factors across all tested scenarios. This scalable technique for SGWB source comparison will be essential for future PTA expansions and next-generation arrays such as the SKA, offering orders-of-magnitude efficiency gains without sacrificing physical interpretability.

  • 2 authors
·
Apr 5, 2025

Weighted least-squares approximation with determinantal point processes and generalized volume sampling

We consider the problem of approximating a function from L^2 by an element of a given m-dimensional space V_m, associated with some feature map varphi, using evaluations of the function at random points x_1,dots,x_n. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPP) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features varphi(x_i). We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples n = O(mlog(m)), that means that the expected L^2 error is bounded by a constant times the best approximation error in L^2. Also, further assuming that the function is in some normed vector space H continuously embedded in L^2, we further prove that the approximation is almost surely bounded by the best approximation error measured in the H-norm. This includes the cases of functions from L^infty or reproducing kernel Hilbert spaces. Finally, we present an alternative strategy consisting in using independent repetitions of projection DPP (or volume sampling), yielding similar error bounds as with i.i.d. or volume sampling, but in practice with a much lower number of samples. Numerical experiments illustrate the performance of the different strategies.

  • 2 authors
·
Dec 21, 2023

Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization

In this paper, we consider non-convex multi-block bilevel optimization (MBBO) problems, which involve mgg 1 lower level problems and have important applications in machine learning. Designing a stochastic gradient and controlling its variance is more intricate due to the hierarchical sampling of blocks and data and the unique challenge of estimating hyper-gradient. We aim to achieve three nice properties for our algorithm: (a) matching the state-of-the-art complexity of standard BO problems with a single block; (b) achieving parallel speedup by sampling I blocks and sampling B samples for each sampled block per-iteration; (c) avoiding the computation of the inverse of a high-dimensional Hessian matrix estimator. However, it is non-trivial to achieve all of these by observing that existing works only achieve one or two of these properties. To address the involved challenges for achieving (a, b, c), we propose two stochastic algorithms by using advanced blockwise variance-reduction techniques for tracking the Hessian matrices (for low-dimensional problems) or the Hessian-vector products (for high-dimensional problems), and prove an iteration complexity of O(mepsilon^{-3I(I<m)}{II} + mepsilon^{-3}{IB}) for finding an epsilon-stationary point under appropriate conditions. We also conduct experiments to verify the effectiveness of the proposed algorithms comparing with existing MBBO algorithms.

  • 5 authors
·
May 30, 2023

Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing

We propose an inference-time scaling approach for pretrained flow models. Recently, inference-time scaling has gained significant attention in LLMs and diffusion models, improving sample quality or better aligning outputs with user preferences by leveraging additional computation. For diffusion models, particle sampling has allowed more efficient scaling due to the stochasticity at intermediate denoising steps. On the contrary, while flow models have gained popularity as an alternative to diffusion models--offering faster generation and high-quality outputs in state-of-the-art image and video generative models--efficient inference-time scaling methods used for diffusion models cannot be directly applied due to their deterministic generative process. To enable efficient inference-time scaling for flow models, we propose three key ideas: 1) SDE-based generation, enabling particle sampling in flow models, 2) Interpolant conversion, broadening the search space and enhancing sample diversity, and 3) Rollover Budget Forcing (RBF), an adaptive allocation of computational resources across timesteps to maximize budget utilization. Our experiments show that SDE-based generation, particularly variance-preserving (VP) interpolant-based generation, improves the performance of particle sampling methods for inference-time scaling in flow models. Additionally, we demonstrate that RBF with VP-SDE achieves the best performance, outperforming all previous inference-time scaling approaches.

  • 4 authors
·
Mar 25, 2025 4

Procedural Generation of Grain Orientations using the Wave Function Collapse Algorithm

Statistics of grain sizes and orientations in metals correlate to the material's mechanical properties. Reproducing representative volume elements for further analysis of deformation and failure in metals, like 316L stainless steel, is particularly important due to their wide use in manufacturing goods today. Two approaches, initially created for video games, were considered for the procedural generation of representative grain microstructures. The first is the Wave Function Collapse (WFC) algorithm, and the second is constraint propagation and probabilistic inference through Markov Junior, a free and open-source software. This study aimed to investigate these two algorithms' effectiveness in using reference electron backscatter diffraction (EBSD) maps and recreating a statistically similar one that could be used in further research. It utilized two stainless steel EBSD maps as references to test both algorithms. First, the WFC algorithm was too constricting and, thus, incapable of producing images that resembled EBSDs. The second, MarkovJunior, was much more effective in creating a Voronoi tessellation that could be used to create an EBSD map in Python. When comparing the results between the reference and the generated EBSD, we discovered that the orientation and volume fractions were extremely similar. With the study, it was concluded that MarkovJunior is an effective machine learning tool that can reproduce representative grain microstructures.

  • 3 authors
·
Nov 20, 2023

ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design

Machine learning is a prevalent approach to tame the complexity of design space exploration for domain-specific architectures. Using ML for design space exploration poses challenges. First, it's not straightforward to identify the suitable algorithm from an increasing pool of ML methods. Second, assessing the trade-offs between performance and sample efficiency across these methods is inconclusive. Finally, lack of a holistic framework for fair, reproducible, and objective comparison across these methods hinders progress of adopting ML-aided architecture design space exploration and impedes creating repeatable artifacts. To mitigate these challenges, we introduce ArchGym, an open-source gym and easy-to-extend framework that connects diverse search algorithms to architecture simulators. To demonstrate utility, we evaluate ArchGym across multiple vanilla and domain-specific search algorithms in designing custom memory controller, deep neural network accelerators, and custom SoC for AR/VR workloads, encompassing over 21K experiments. Results suggest that with unlimited samples, ML algorithms are equally favorable to meet user-defined target specification if hyperparameters are tuned; no solution is necessarily better than another (e.g., reinforcement learning vs. Bayesian methods). We coin the term hyperparameter lottery to describe the chance for a search algorithm to find an optimal design provided meticulously selected hyperparameters. The ease of data collection and aggregation in ArchGym facilitates research in ML-aided architecture design space exploration. As a case study, we show this advantage by developing a proxy cost model with an RMSE of 0.61% that offers a 2,000-fold reduction in simulation time. Code and data for ArchGym is available at https://bit.ly/ArchGym.

  • 11 authors
·
Jun 15, 2023

A Hierarchical Bayesian Model for Deep Few-Shot Meta Learning

We propose a novel hierarchical Bayesian model for learning with a large (possibly infinite) number of tasks/episodes, which suits well the few-shot meta learning problem. We consider episode-wise random variables to model episode-specific target generative processes, where these local random variables are governed by a higher-level global random variate. The global variable helps memorize the important information from historic episodes while controlling how much the model needs to be adapted to new episodes in a principled Bayesian manner. Within our model framework, the prediction on a novel episode/task can be seen as a Bayesian inference problem. However, a main obstacle in learning with a large/infinite number of local random variables in online nature, is that one is not allowed to store the posterior distribution of the current local random variable for frequent future updates, typical in conventional variational inference. We need to be able to treat each local variable as a one-time iterate in the optimization. We propose a Normal-Inverse-Wishart model, for which we show that this one-time iterate optimization becomes feasible due to the approximate closed-form solutions for the local posterior distributions. The resulting algorithm is more attractive than the MAML in that it is not required to maintain computational graphs for the whole gradient optimization steps per episode. Our approach is also different from existing Bayesian meta learning methods in that unlike dealing with a single random variable for the whole episodes, our approach has a hierarchical structure that allows one-time episodic optimization, desirable for principled Bayesian learning with many/infinite tasks. The code is available at https://github.com/minyoungkim21/niwmeta.

  • 2 authors
·
Jun 16, 2023

Active Diffusion Subsampling

Subsampling is commonly used to mitigate costs associated with data acquisition, such as time or energy requirements, motivating the development of algorithms for estimating the fully-sampled signal of interest x from partially observed measurements y. In maximum-entropy sampling, one selects measurement locations that are expected to have the highest entropy, so as to minimize uncertainty about x. This approach relies on an accurate model of the posterior distribution over future measurements, given the measurements observed so far. Recently, diffusion models have been shown to produce high-quality posterior samples of high-dimensional signals using guided diffusion. In this work, we propose Active Diffusion Subsampling (ADS), a method for performing active subsampling using guided diffusion in which the model tracks a distribution of beliefs over the true state of x throughout the reverse diffusion process, progressively decreasing its uncertainty by choosing to acquire measurements with maximum expected entropy, and ultimately generating the posterior distribution p(x | y). ADS can be applied using pre-trained diffusion models for any subsampling rate, and does not require task-specific retraining - just the specification of a measurement model. Furthermore, the maximum entropy sampling policy employed by ADS is interpretable, enhancing transparency relative to existing methods using black-box policies. Experimentally, we show that ADS outperforms fixed sampling strategies, and study an application of ADS in Magnetic Resonance Imaging acceleration using the fastMRI dataset, finding that ADS performs competitively with supervised methods. Code available at https://active-diffusion-subsampling.github.io/.

  • 4 authors
·
Jun 20, 2024

Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution

Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks. However, due to the randomness introduced in the reverse process of diffusion models, the performances of diffusion-based SR models are fluctuating at every time of sampling, especially for samplers with few resampled steps. This inherent randomness of diffusion models results in ineffectiveness and instability, making it challenging for users to guarantee the quality of SR results. However, our work takes this randomness as an opportunity: fully analyzing and leveraging it leads to the construction of an effective plug-and-play sampling method that owns the potential to benefit a series of diffusion-based SR methods. More in detail, we propose to steadily sample high-quality SR images from pre-trained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs) and analyze the characteristics between the choices of BCs and their corresponding SR results. Our analysis shows the route to obtain an approximately optimal BC via an efficient exploration in the whole space. The quality of SR results sampled by the proposed method with fewer steps outperforms the quality of results sampled by current methods with randomness from the same pre-trained diffusion-based SR model, which means that our sampling method "boosts" current diffusion-based SR models without any additional training.

  • 5 authors
·
May 24, 2023

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) lies in the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation in each training step. While existing methods approximate the log-likelihoods by their evidence lower bounds (ELBOs) via customized Monte Carlo (MC) sampling, the forward computational graphs of all MC samples need to be retained for the gradient computation of non-linear terms in the RL objective, resulting in significant memory overhead. This constraint restricts feasible sample sizes, leading to imprecise likelihood approximations and ultimately distorting the RL objective. To overcome this limitation, we propose Boundary-Guided Policy Optimization (BGPO), a memory-efficient RL algorithm that maximizes a specially constructed lower bound of the ELBO-based objective. This lower bound is carefully designed to satisfy two key properties: (1) Linearity: it is formulated in a linear sum where each term depends only on a single MC sample, thereby enabling gradient accumulation across samples and ensuring constant memory usage; (2) Equivalence: Both the value and gradient of this lower bound are equal to those of the ELBO-based objective in on-policy training, making it also an effective approximation for the original RL objective. These properties allow BGPO to adopt a large MC sample size, resulting in more accurate likelihood approximations and improved RL objective estimation, which in turn leads to enhanced performance. Experiments show that BGPO significantly outperforms previous RL algorithms for dLLMs in math problem solving, code generation, and planning tasks.

zai-org Z.ai
·
Oct 13, 2025 2

All You Need is a Good Functional Prior for Bayesian Deep Learning

The Bayesian treatment of neural networks dictates that a prior distribution is specified over their weight and bias parameters. This poses a challenge because modern neural networks are characterized by a large number of parameters, and the choice of these priors has an uncontrolled effect on the induced functional prior, which is the distribution of the functions obtained by sampling the parameters from their prior distribution. We argue that this is a hugely limiting aspect of Bayesian deep learning, and this work tackles this limitation in a practical and effective way. Our proposal is to reason in terms of functional priors, which are easier to elicit, and to "tune" the priors of neural network parameters in a way that they reflect such functional priors. Gaussian processes offer a rigorous framework to define prior distributions over functions, and we propose a novel and robust framework to match their prior with the functional prior of neural networks based on the minimization of their Wasserstein distance. We provide vast experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements over alternative choices of priors and state-of-the-art approximate Bayesian deep learning approaches. We consider this work a considerable step in the direction of making the long-standing challenge of carrying out a fully Bayesian treatment of neural networks, including convolutional neural networks, a concrete possibility.

  • 4 authors
·
Nov 25, 2020

DualFast: Dual-Speedup Framework for Fast Sampling of Diffusion Models

Diffusion probabilistic models (DPMs) have achieved impressive success in visual generation. While, they suffer from slow inference speed due to iterative sampling. Employing fewer sampling steps is an intuitive solution, but this will also introduces discretization error. Existing fast samplers make inspiring efforts to reduce discretization error through the adoption of high-order solvers, potentially reaching a plateau in terms of optimization. This raises the question: can the sampling process be accelerated further? In this paper, we re-examine the nature of sampling errors, discerning that they comprise two distinct elements: the widely recognized discretization error and the less explored approximation error. Our research elucidates the dynamics between these errors and the step by implementing a dual-error disentanglement strategy. Building on these foundations, we introduce an unified and training-free acceleration framework, DualFast, designed to enhance the speed of DPM sampling by concurrently accounting for both error types, thereby minimizing the total sampling error. DualFast is seamlessly compatible with existing samplers and significantly boost their sampling quality and speed, particularly in extremely few sampling steps. We substantiate the effectiveness of our framework through comprehensive experiments, spanning both unconditional and conditional sampling domains, across both pixel-space and latent-space DPMs.

  • 4 authors
·
Jun 15, 2025