Warm Hawking Relics From Primordial Black Hole Domination
We study the cosmological impact of warm, dark-sector relic particles produced as Hawking radiation in a primordial-black-hole-dominated universe before big bang nucleosynthesis. If these dark-sector particles are stable, they would survive to the present day as "Hawking relics" and modify the growth of cosmological structure. We show that such relics are produced with much larger momenta, but in smaller quantities than the familiar thermal relics considered in standard cosmology. Consequently, Hawking relics with keV-MeV masses affect the growth of large-scale structure in a similar way to eV-scale thermal relics like massive neutrinos. We model their production and evolution, and show that their momentum distributions are broader than comparable relics with thermal distributions. Warm Hawking relics affect the growth of cosmological perturbations and we constrain their abundance to be less than 2% of the dark matter over a broad range of their viable parameter space. Finally, we examine how future measurements of the matter power spectrum can distinguish Hawking relics from thermal particles.
Stochastic acceleration in arbitrary astrophysical environments
Turbulent magnetic fields are to some extent a universal feature in astrophysical phenomena. Charged particles that encounter this turbulence are on average accelerated according to the so-called second-order Fermi process. However, in most astrophysical environments there are additional competing processes, such as different kinds of first-order energy changes and particle escape, that affect the resulting momentum distribution of the particles. In this work we provide, to our knowledge, the first semi-analytical solution of the isotropic steady-state momentum diffusion equation including continuous and catastrophic momentum changes that can be applied to any astrophysical system of interest. Here, we assume that the assigned magnetic turbulence is constrained to a finite range and that the particle flux vanishes beyond these boundaries. Consequently, we show that the so-called pile-up bump, which has long been established for some special cases, is a universal feature of stochastic acceleration that emerges around the momentum $\chi_{\rm eq}$ where acceleration and continuous loss are in equilibrium, provided the particle's residence time in the system is sufficient at $\chi_{\rm eq}$. In general, continuous and catastrophic momentum changes play a crucial role in the shape of the steady-state momentum distribution of the accelerated particles, for which simplified unbroken power-law approximations are often not adequate.
Drift surface solver for runaway electron current dominant equilibria during the Current Quench
Runaway electron current generated during the Current Quench phase of tokamak disruptions could result in severe damage to future high-performance devices. To control and mitigate such runaway electron current, it is important to accurately describe the runaway electron current dominated equilibrium, based on which further stability analysis could be carried out. In this paper, we derive a Grad-Shafranov-like equation solving for the axisymmetric drift surfaces of the runaway electrons for the simple case in which all runaway electrons share the same parallel momentum. This new equilibrium equation is then solved numerically for a simple rectangular wall with ITER-like and MAST-like geometry parameters. The deviation between the drift surfaces and the flux surfaces is readily obtained, and runaway electrons are found to be well confined even in regions with open field lines. A change in the runaway electron parallel momentum is found to result in a horizontal current center displacement without any changes in the total current or the external field. The runaway current density profile is found to affect the susceptibility to such displacement, with flatter profiles resulting in more displacement for the same momentum change. With up-down asymmetry in the external poloidal field, such displacement is accompanied by a vertical displacement of the runaway electron current. This effect is found to be more pronounced in smaller, compact devices and in weaker poloidal field cases. The above results demonstrate the dynamics of current center displacement caused by momentum space changes in the runaway electrons, and pave the way for a future, more sophisticated runaway current equilibrium theory with more realistic consideration of the runaway electron momentum distribution. This new equilibrium theory also provides a foundation for future stability analysis of the runaway electron current.
First principles simulations of dense hydrogen
Accurate knowledge of the properties of hydrogen at high compression is crucial for astrophysics (e.g. planetary and stellar interiors, brown dwarfs, atmospheres of compact stars) and laboratory experiments, including inertial confinement fusion. Experimental data exist for the equation of state, conductivity, and Thomson scattering spectra. However, the analysis of measurements at extreme pressures and temperatures typically involves additional model assumptions, which makes it difficult to rigorously assess the accuracy of the experimental data. On the other hand, theory and modeling have produced extensive collections of data. They originate from a very large variety of models and simulations including path integral Monte Carlo (PIMC) simulations, density functional theory (DFT), chemical models, machine-learned models, and combinations thereof. At the same time, each of these methods has fundamental limitations (fermion sign problem in PIMC, approximate exchange-correlation functionals of DFT, inconsistent interaction energy contributions in chemical models, etc.), so for some parameter ranges accurate predictions are difficult. Recently, a number of breakthroughs in first-principles PIMC and DFT simulations were achieved, which are discussed in this review. Here we use these results to benchmark different simulation methods. We present an update of the hydrogen phase diagram at high pressures, the expected phase transitions, and thermodynamic properties including the equation of state and momentum distribution. Furthermore, we discuss available dynamic results for warm dense hydrogen, including the conductivity, dynamic structure factor, plasmon dispersion, imaginary-time structure, and density response functions. We conclude by outlining strategies to combine different simulations to achieve accurate theoretical predictions.
Using angular momentum maps to detect kinematically distinct galactic components
In this work we introduce a physically motivated method of performing disc/spheroid decomposition of simulated galaxies, which we apply to the Eagle sample. We make use of the HEALPix package to create Mollweide projections of the angular momentum map of each galaxy's stellar particles. A number of features arise in angular momentum space that allow us to decompose galaxies and classify them into different morphological types. We assign stellar particles with an angular separation of less/greater than 30 degrees from the densest grid cell on the angular momentum sphere to the disc/spheroid components, respectively. We analyse the spatial distribution for a subsample of galaxies and show that the surface density profiles of the disc and spheroid closely follow an exponential and a Sersic profile, respectively. In addition, discs rotate faster, have smaller velocity dispersions, are younger, and are more metal rich than spheroids. Thus our morphological classification reproduces the observed properties of such systems. Finally, we demonstrate that our method is able to identify a significant population of galaxies with counter-rotating discs and provides a more realistic classification of such systems compared to previous methods.
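The 30-degree assignment rule described in this abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' pipeline: the function name `disc_spheroid_split` is hypothetical, a simple equal-area grid in (cos θ, φ) stands in for the HEALPix pixelisation, particles are counted without mass weighting, and the reference direction is taken as the mean direction of the particles in the densest cell rather than the cell centre.

```python
import numpy as np

def disc_spheroid_split(pos, vel, n_cos=20, n_phi=40, max_angle_deg=30.0):
    """Return a boolean mask that is True for particles assigned to the disc.

    Assumes every particle has a non-zero angular momentum vector.
    """
    # Unit vector of each particle's specific angular momentum j = r x v.
    j = np.cross(pos, vel)
    j_hat = j / np.linalg.norm(j, axis=1, keepdims=True)

    # Equal-area cells on the sphere: uniform bins in cos(theta) and phi.
    # (Assumption: this grid replaces the HEALPix cells used in the paper.)
    cos_t = np.clip(j_hat[:, 2], -1.0, 1.0)
    phi = np.arctan2(j_hat[:, 1], j_hat[:, 0])
    i = np.minimum(((cos_t + 1.0) / 2.0 * n_cos).astype(int), n_cos - 1)
    k = np.minimum(((phi + np.pi) / (2.0 * np.pi) * n_phi).astype(int), n_phi - 1)
    cell = i * n_phi + k

    # Reference direction: mean direction of the particles in the densest cell.
    densest = np.bincount(cell, minlength=n_cos * n_phi).argmax()
    ref = j_hat[cell == densest].mean(axis=0)
    ref /= np.linalg.norm(ref)

    # Disc if the angular separation from the reference is below the cut.
    angle = np.degrees(np.arccos(np.clip(j_hat @ ref, -1.0, 1.0)))
    return angle < max_angle_deg
```

Calling `disc_spheroid_split(pos, vel)` with `(N, 3)` position and velocity arrays returns the disc mask; its negation gives the spheroid particles, which would include any counter-rotating material.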
Baryonic Effects on Lagrangian Clustering and Angular Momentum Reconstruction
Recent studies illustrate the correlation between the angular momenta of cosmic structures and their Lagrangian properties. However, only baryons are observable and it is unclear whether they reliably trace the cosmic angular momenta. We study the Lagrangian mass distribution, spin correlation, and predictability of dark matter, gas, and stellar components of galaxy-halo systems using IllustrisTNG, and show that the primordial segregations between components are typically small. Their protoshapes are also similar in terms of the statistics of moment of inertia tensors. Under the common gravitational potential they are expected to exert the same tidal torque and the strong spin correlations are not destroyed by the nonlinear evolution and complicated baryonic effects, as confirmed by the high-resolution hydrodynamic simulations. We further show that their late-time angular momenta traced by total gas, stars, or the central galaxies, can be reliably reconstructed by the initial perturbations. These results suggest that baryonic angular momenta can potentially be used in reconstructing the parameters and models related to the initial perturbations.
Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning
Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts. In addition, subsequent fine-tuning can considerably improve performance on a selected downstream task. However, through naive fine-tuning, these zero-shot models lose their generalizability and robustness towards distribution shifts. This is a particular problem for tasks such as Continual Learning (CL), where continuous adaptation has to be performed as new task distributions are introduced sequentially. In this work, we showcase that where fine-tuning falls short in adapting such zero-shot capable models, simple momentum-based weight interpolation can provide consistent improvements for CL tasks in both memory-free and memory-based settings. In particular, we find improvements of over +4% on standard CL benchmarks, while reducing the gap to the upper limit of jointly training on all tasks at once, in parts by more than half, allowing the continual learner to inch closer to the joint training limits.
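One plausible reading of "momentum-based weight interpolation" is an exponential moving average in weight space: a slowly changing copy of the weights is pulled a small step toward the fine-tuned weights after each update. The sketch below illustrates that reading only; the function name `momentum_interpolate`, the dictionary-of-arrays weight format, and the value of `tau` are assumptions, and the abstract does not state the exact update rule or schedule the authors use.

```python
import numpy as np

def momentum_interpolate(slow, fast, tau=0.99):
    """Interpolate slow weights toward fast (fine-tuned) weights.

    slow, fast: dicts mapping parameter names to NumPy arrays.
    tau: momentum coefficient (assumed value); closer to 1 keeps more
         of the slow weights and hence more of the zero-shot behaviour.
    """
    return {name: tau * slow[name] + (1.0 - tau) * fast[name] for name in slow}
```

In a training loop this would be called after each fine-tuning step, with the slow weights initialised from the pre-trained zero-shot model and used for evaluation.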
Improving Federated Learning Communication Efficiency with Global Momentum Fusion for Gradient Compression Schemes
Communication costs within federated learning (FL) hinder system scalability for reaching more data from more clients. The proposed FL adopts a hub-and-spoke network topology in which all clients communicate through the central server. Hence, reducing communication overheads via techniques such as data compression has been proposed to mitigate this issue. Another challenge of federated learning is unbalanced data distribution: in a typical federated learning setting, the data on each client are not independent and identically distributed (non-IID). In this paper, we propose a new compression compensation scheme called Global Momentum Fusion (GMF), which reduces communication overheads between FL clients and the server while maintaining comparable model accuracy in the presence of non-IID data. GitHub repository: https://github.com/tony92151/global-momentum-fusion-fl
Torque-Aware Momentum
Efficiently exploring complex loss landscapes is key to the performance of deep neural networks. While momentum-based optimizers are widely used in state-of-the-art setups, classical momentum can still struggle with large, misaligned gradients, leading to oscillations. To address this, we propose Torque-Aware Momentum (TAM), which introduces a damping factor based on the angle between the new gradients and previous momentum, stabilizing the update direction during training. Empirical results show that TAM, which can be combined with both SGD and Adam, enhances exploration, handles distribution shifts more effectively, and improves generalization performance across various tasks, including image classification and large language model fine-tuning, when compared to classical momentum-based optimizers.
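To make the idea of angle-based damping concrete, the sketch below scales each incoming gradient by a factor derived from the cosine of its angle with the previous momentum. The abstract does not give TAM's actual formula, so the damping function `0.5 * (1 + cos)`, the function name `damped_momentum_step`, and the hyperparameter values are all assumptions chosen purely for illustration.

```python
import numpy as np

def damped_momentum_step(w, grad, m, lr=0.1, beta=0.9, eps=1e-12):
    """One SGD-with-momentum step with angle-based gradient damping.

    The damping factor is an assumed form, not the published TAM rule:
    1 when the gradient is aligned with the momentum, 0 when opposed.
    """
    g_norm = np.linalg.norm(grad)
    m_norm = np.linalg.norm(m)
    if g_norm < eps or m_norm < eps:
        # Angle undefined (e.g. first step): apply the gradient undamped.
        damping = 1.0
    else:
        cos_sim = float(grad @ m) / (g_norm * m_norm)
        damping = 0.5 * (1.0 + cos_sim)
    m_new = beta * m + damping * grad
    return w - lr * m_new, m_new
```

With this choice, a gradient that points against the accumulated momentum contributes little to the update, which is the mechanism the abstract credits with reducing oscillations.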
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
Sparse Mixture of Experts (SMoE) has become the key to unlocking unparalleled scalability in deep learning. SMoE has the potential to exponentially increase parameter count while maintaining the efficiency of the model by only activating a small subset of these parameters for a given sample. However, it has been observed that SMoE suffers from unstable training and has difficulty adapting to new distributions, leading to the model's lack of robustness to data contamination. To overcome these limitations, we first establish a connection between the dynamics of the expert representations in SMoEs and gradient descent on a multi-objective optimization problem. Leveraging our framework, we then integrate momentum into SMoE and propose a new family of SMoEs named MomentumSMoE. We theoretically prove and numerically demonstrate that MomentumSMoE is more stable and robust than SMoE. In particular, we verify the advantages of MomentumSMoE over SMoE on a variety of practical tasks including ImageNet-1K object recognition and WikiText-103 language modeling. We demonstrate the applicability of MomentumSMoE to many types of SMoE models, including those in the Sparse MoE model for vision (V-MoE) and the Generalist Language Model (GLaM). We also show that other advanced momentum-based optimization methods, such as Adam, can be easily incorporated into the MomentumSMoE framework for designing new SMoE models with even better performance, almost negligible additional computation cost, and simple implementations.
Hyperparameter Tuning is All You Need for LISTA
Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) introduces the concept of unrolling an iterative algorithm and training it like a neural network. It has had great success on sparse recovery. In this paper, we show that adding momentum to intermediate variables in the LISTA network achieves a better convergence rate and, in particular, the network with instance-optimal parameters is superlinearly convergent. Moreover, our new theoretical results lead to a practical approach of automatically and adaptively calculating the parameters of a LISTA network layer based on its previous layers. Perhaps most surprisingly, such an adaptive-parameter procedure reduces the training of LISTA to tuning only three hyperparameters from data: a new record set in the context of the recent advances on trimming down LISTA complexity. We call this new ultra-light weight network HyperLISTA. Compared to state-of-the-art LISTA models, HyperLISTA achieves almost the same performance on seen data distributions and performs better when tested on unseen distributions (specifically, those with different sparsity levels and nonzero magnitudes). Code is available: https://github.com/VITA-Group/HyperLISTA.
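For readers unfamiliar with the iteration that LISTA unrolls, the sketch below shows classical ISTA for the lasso problem with a simple momentum (extrapolation) term added. It is not HyperLISTA: nothing here is learned or adaptively computed, and the function names, the fixed momentum coefficient `beta`, and the step size `1/L` are assumptions for illustration.

```python
import numpy as np

def soft_threshold(x, theta):
    """Elementwise shrinkage operator used by ISTA."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def ista_momentum(A, b, lam, n_iter=100, beta=0.3):
    """Approximately minimise 0.5 * ||A x - b||^2 + lam * ||x||_1.

    Each iteration corresponds to one layer of an unrolled network;
    in LISTA-style models the step and threshold would be learned.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (largest singular value)^2
    x_prev = x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Momentum: extrapolate along the previous change in x.
        y = x + beta * (x - x_prev)
        # Gradient step on the data term, then shrinkage.
        x_prev, x = x, soft_threshold(y - step * A.T @ (A @ y - b), step * lam)
    return x
```

When `A` is the identity, the iteration reduces to a single soft-thresholding of `b`, which is a convenient sanity check.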
