Beyond L2: Scaling Transformer Attention with Lp Norms

A new approach to normalizing attention mechanisms in Transformer models uses Lp norms to improve training stability and accelerate convergence.

New research offers a streamlined approach to constrained contextual bandits, improving performance in challenging, unpredictable scenarios.
![The speed of sound squared, [latex]c_{s}^{2}[/latex], is shown to vary with quark chemical potential μ, exhibiting a relationship constrained by a conformal bound of [latex]c_{s}^{2} = 1/3[/latex], as indicated by the solid light-blue line and consistent with observations detailed in Figure 1.](https://arxiv.org/html/2602.05796v1/x8.png)
New research shows that accurately modeling the behavior of extremely dense quark matter requires careful consideration of medium effects and a consistent regularization scheme.
A new review reveals that successful blockchain implementation in government requires a nuanced governance model that balances decentralization with necessary oversight.

Researchers have developed a new family of relativistic basis sets for p-block elements, promising enhanced accuracy in electronic structure calculations.

A new JAX library, lrux, dramatically accelerates quantum Monte Carlo calculations by optimizing the computation of key determinants and Pfaffians.

New research assesses how effectively RISC-V systems can isolate critical tasks from less sensitive ones, ensuring safety and reliability in complex applications.
![The study investigates the possible configurations of double-bottom tetraquarks, specifically exploring both meson-meson and diquark-antidiquark arrangements (where [latex]Q=b[/latex] represents bottom quarks and [latex]q=u,d[/latex] signifies up or down quarks) to understand the fundamental building blocks of these exotic hadronic states.](https://arxiv.org/html/2602.05941v1/x1.png)
Researchers are leveraging quantum simulation to probe the elusive structure and properties of tetraquarks, complex particles composed of four quarks.
![The system’s time complexity is delineated into three phases - Initialization [latex]T_{init}[/latex], Data Processing [latex]T_{process}[/latex], and Finalization [latex]T_{finalize}[/latex] - each contributing to the overall computational cost and defined by distinct equations that govern their respective durations.](https://arxiv.org/html/2602.05641v1/figure/time_complexity_diagram.png)
A new analysis provides a standardized framework for comparing the speed of ten finalist algorithms in the NIST Lightweight Cryptography competition.
This review explores the construction of efficient error-correcting codes derived from the intricate geometry of algebraic curves.