तद्वनः मम हृदये वसति।
Tadvanaḥ mama hṛdaye vasati.
(Tadvana dwells in my heart.)

About Me

I am a fifth-year doctoral candidate in the Computer Science department at the University of California, San Diego, where I am co-advised by Prof. Sanjoy Dasgupta and Prof. Misha Belkin. Previously, I was a research fellow at the Max Planck Institute (Saarbrücken, Germany) under Dr. Adish Singla. I completed a BSc in Mathematics and Computer Science, followed by an MSc in Computer Science, at Chennai Mathematical Institute (CMI, India).

I have been fortunate to be supported by the following fellowships: the Jacobs School of Engineering Fellowship (UCSD), the Crerar Fellowship (awarded by UChicago CS; declined), the Max Planck Institute Fellowship, and the Chennai Mathematical Institute Scholastic Fellowship.

I am broadly interested in advancing both the theoretical foundations and practical applications of machine learning. Specifically, my focus lies in statistical machine learning, algorithm design, interactive learning, optimization, and the theoretical aspects of deep learning. I am particularly enthusiastic about leveraging tools from probability theory, analysis, differential geometry, and statistics to rigorously study the computational and statistical efficiency of learning algorithms. My goal is to deepen our understanding of the principles underlying data-driven learning and the capabilities of machines to extract meaningful insights from complex datasets.

Email: akk002 at ucsd dot edu

Recent News

  • [August, 2025] Presented my recent work at Princeton University and Yale University (Theory Seminar).
  • [May, 2025] Selected for the Machine Learning Theory Summer School at Princeton University.
  • [May, 2025] One paper to appear in the Journal of Machine Learning Research (JMLR).
  • [May, 2025] Two papers accepted, one each in ICML 2025 and COLT 2025.
  • [Feb, 2025] Presenting my recent work at ITA’25 (San Diego) and reviewing for COLT’25.
  • [Dec, 2024] Attended Unknown Futures of Generalization workshop at Simons Institute, UC Berkeley.
  • [Summer, 2023] I was a research scientist intern at Adobe Research (San Jose, CA).
  • [Aug, 2022] Attended the Deep learning theory workshop at Simons Institute, UC Berkeley.
  • [July, 2022] Attended a summer school on Discrete Mathematics at Charles University, Prague (Czech Republic).

Publications

  • A Gap Between the Gaussian RKHS and Neural Networks: An Infinite-Center Asymptotic Analysis
    A Kumar, R Parhi, M Belkin
    The 38th Annual Conference on Learning Theory (COLT 2025)
    colt · arxiv ·
    abstract
    Recent works have characterized the function-space inductive bias of infinite-width bounded-norm single-hidden-layer neural networks as a kind of bounded-variation-type space. This novel neural network Banach space encompasses many classical multivariate function spaces, including certain Sobolev spaces and the spectral Barron spaces. Notably, this Banach space also includes functions that exhibit less classical regularity, such as those that only vary in a few directions. On bounded domains, it is well-established that the Gaussian reproducing kernel Hilbert space (RKHS) strictly embeds into this Banach space, demonstrating a clear gap between the Gaussian RKHS and the neural network Banach space. It turns out that when investigating these spaces on unbounded domains, e.g., all of $\mathbb{R}^d$, the story is fundamentally different. We establish the following fundamental result: certain functions that lie in the Gaussian RKHS have infinite norm in the neural network Banach space. This provides a nontrivial gap between kernel methods and neural networks by exhibiting functions that kernel methods can easily represent but neural networks cannot. (The gap is restated schematically after this publication list.)
  • Mirror Descent on Reproducing Kernel Banach Space (RKBS)
    A Kumar, M Belkin, P Pandit
    Journal of Machine Learning Research (JMLR), 2025 (To appear)
    arxiv ·
    abstract
    Recent advances in machine learning have led to increased interest in reproducing kernel Banach spaces (RKBS) as a more general framework that extends beyond reproducing kernel Hilbert spaces (RKHS). These works have resulted in the formulation of representer theorems under several regularized learning schemes. However, little is known about an optimization method that encompasses these results in this setting. This paper addresses a learning problem on Banach spaces endowed with a reproducing kernel, focusing on efficient optimization within RKBS. To tackle this challenge, we propose a mirror descent algorithm (MDA). Our approach involves an iterative method that employs gradient steps in the dual space of the Banach space using the reproducing kernel. We analyze the convergence properties of our algorithm under various assumptions and establish two types of results: first, we identify conditions under which a linear convergence rate is achievable, akin to optimization in the Euclidean setting, and provide a proof of the linear rate; second, we demonstrate a standard convergence rate in a constrained setting. Moreover, to instantiate this algorithm in practice, we introduce a novel family of RKBSs with $p$-norm ($p \neq 2$), characterized by both an explicit dual map and a kernel. (A toy finite-dimensional sketch of the mirror descent iteration appears after this publication list.)
  • Learning Smooth Distance Functions via Queries
    A Kumar, S Dasgupta
    Preprint
    arxiv ·
    abstract
    In this work, we investigate the problem of learning distance functions within the query-based learning framework, where a learner is able to pose triplet queries of the form: “Is $x_i$ closer to $x_j$ or $x_k$?” We establish formal guarantees on the query complexity required to learn smooth, but otherwise general, distance functions under two notions of approximation: $\omega$-additive approximation and $(1 + \omega)$-multiplicative approximation. For the additive approximation, we propose a global method whose query complexity is quadratic in the size of a finite cover of the sample space. For the (stronger) multiplicative approximation, we introduce a method that combines global and local approaches, utilizing multiple Mahalanobis distance functions to capture local geometry. This method has a query complexity that scales quadratically with both the size of the cover and the ambient dimension of the sample space. (A toy sketch of the triplet-query setup appears after this publication list.)
  • Robust Empirical Risk Minimization with Tolerance
    R Bhattacharjee, K Chaudhuri, M Hopkins, A Kumar, H Yu
    Authors listed alphabetically
    The 34th International Conference on Algorithmic Learning Theory (ALT'23)
    alt · arxiv ·
    abstract
    Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today’s tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far from answering the need. In this work we study the fundamental paradigm of (robust) empirical risk minimization (RERM), a simple process in which the learner outputs any hypothesis minimizing its training error. RERM famously fails to robustly learn VC classes, a bound we show extends even to ‘nice’ settings such as (bounded) halfspaces. As such, we study a recent relaxation of the robust model called tolerant robust learning, where the output classifier is compared to the best achievable error over slightly larger perturbation sets. We show that under geometric niceness conditions, a natural tolerant variant of RERM is indeed sufficient for $\gamma$-tolerant robust learning of VC classes over $\mathbb{R}^d$, and requires only $\tilde{\mathcal{O}}(\mathrm{VC}(\mathcal{H})\,d\,\log(D/\gamma)/\varepsilon^2)$ samples for robustness regions of (maximum) diameter $D$. (A toy illustration of the tolerant comparison for halfspaces appears after this publication list.)
  • The Teaching Dimension of Kernel Perceptron
    A Kumar, H Zhang, A Singla, Y Chen
    The 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)
    aistats · arxiv ·
    abstract
    Algorithmic machine teaching has been studied under the linear setting where exact teaching is possible. However, little is known about teaching nonlinear learners. Here, we establish the sample complexity of teaching, also known as the teaching dimension, for kernelized perceptrons under different families of feature maps. As a warm-up, we show that the teaching complexity is $\Theta(d)$ for the exact teaching of linear perceptrons in $\mathbb{R}^d$, and $\Theta(d^k)$ for a kernel perceptron with a polynomial kernel of order $k$. Furthermore, under certain smoothness assumptions on the data distribution, we establish a rigorous bound on the complexity of approximately teaching a Gaussian kernel perceptron. We provide numerical examples of the optimal (approximate) teaching set under several canonical settings for linear, polynomial, and Gaussian kernel perceptrons. (A minimal kernel perceptron, the learner in question, appears after this publication list.)
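
The gap result of the COLT 2025 paper above can be restated schematically as follows; the symbols $\mathcal{H}_\sigma$ (Gaussian RKHS) and $\mathcal{B}$ (neural network Banach space) are shorthand chosen here for illustration, not the paper's notation.

    % Schematic restatement of the abstract; H_sigma = Gaussian RKHS,
    % B = neural network Banach space (shorthand for illustration only).
    \begin{align*}
      &\text{bounded domain } \Omega \subset \mathbb{R}^d:
        \quad \mathcal{H}_\sigma(\Omega) \subsetneq \mathcal{B}(\Omega), \\
      &\text{unbounded domain } \mathbb{R}^d:
        \quad \exists\, f \in \mathcal{H}_\sigma(\mathbb{R}^d)
        \ \text{such that } \|f\|_{\mathcal{B}(\mathbb{R}^d)} = \infty .
    \end{align*}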
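
As a companion to the mirror descent paper above, the toy sketch below runs mirror descent on $\mathbb{R}^d$ with the squared $p$-norm as mirror map: each iteration takes a gradient step in the dual space and maps back through the inverse link function. The objective, dimensions, and step size are made up for illustration; this is a heavily simplified finite-dimensional stand-in, not the paper's RKBS algorithm.

    # Toy mirror descent on R^d with mirror map psi(w) = 0.5 * ||w||_p^2.
    # A finite-dimensional illustration only, not the paper's RKBS algorithm.
    import numpy as np

    def grad_psi(w, p):
        """Link function: gradient of 0.5 * ||w||_p^2 (maps primal -> dual)."""
        norm = np.linalg.norm(w, ord=p)
        if norm == 0:
            return np.zeros_like(w)
        return norm ** (2 - p) * np.sign(w) * np.abs(w) ** (p - 1)

    def grad_psi_star(theta, p):
        """Inverse link: gradient of the conjugate 0.5 * ||.||_q^2, q = p / (p - 1)."""
        return grad_psi(theta, p / (p - 1))

    def mirror_descent(grad_f, w0, p=1.5, eta=1e-3, n_steps=2000):
        """Minimize f by taking gradient steps in the dual space."""
        w = w0.copy()
        for _ in range(n_steps):
            theta = grad_psi(w, p) - eta * grad_f(w)   # gradient step in the dual space
            w = grad_psi_star(theta, p)                # map back to the primal space
        return w

    # Example: least squares, f(w) = 0.5 * ||Xw - y||^2.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
    y = X @ w_true
    print(mirror_descent(lambda w: X.T @ (X @ w - y), np.zeros(5)))  # approaches w_true

The two link functions are exact inverses of each other, which is what lets the iterate move back and forth between the primal and dual spaces.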
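
In the same spirit, here is a toy version of the triplet-query setting from the distance-functions preprint: an oracle answers “Is $x_i$ closer to $x_j$ or $x_k$?” under a hidden Mahalanobis distance, and a naive learner fits a positive semidefinite matrix to those answers with a hinge loss. The oracle, learner, and constants are all illustrative; the paper's cover-based methods and query-complexity guarantees are not reproduced here.

    # Toy triplet-query setup: an oracle answers "is x_i closer to x_j or x_k?"
    # under a hidden Mahalanobis distance; a naive learner fits a PSD matrix to
    # the answers with a hinge loss. Illustration only.
    import numpy as np

    rng = np.random.default_rng(1)
    d, n_points, n_queries = 4, 100, 2000
    X = rng.normal(size=(n_points, d))

    A = rng.normal(size=(d, d))
    M_true = A @ A.T                      # hidden metric: d(x, y) = (x-y)^T M_true (x-y)

    def dist(M, a, b):
        diff = a - b
        return diff @ M @ diff

    def oracle(i, j, k):
        """Triplet query: is x_i closer to x_j or to x_k under the hidden metric?"""
        return dist(M_true, X[i], X[j]) <= dist(M_true, X[i], X[k])

    def project_psd(M):
        vals, vecs = np.linalg.eigh(M)
        return (vecs * np.clip(vals, 0, None)) @ vecs.T

    M, eta = np.eye(d), 0.01
    for _ in range(n_queries):
        i, j, k = rng.choice(n_points, size=3, replace=False)
        if not oracle(i, j, k):
            j, k = k, j                   # reorder so that x_j is the closer point
        # hinge loss: want d_M(x_i, x_j) + margin <= d_M(x_i, x_k)
        if dist(M, X[i], X[j]) + 0.1 > dist(M, X[i], X[k]):
            g = np.outer(X[i] - X[j], X[i] - X[j]) - np.outer(X[i] - X[k], X[i] - X[k])
            M = project_psd(M - eta * g)

    # Check: fraction of fresh triplets ordered consistently with the hidden metric.
    correct = sum(
        oracle(i, j, k) == (dist(M, X[i], X[j]) <= dist(M, X[i], X[k]))
        for i, j, k in rng.choice(n_points, size=(500, 3))
    )
    print(f"triplet agreement: {correct / 500:.2f}")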
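
The next sketch illustrates the tolerant comparison from the robust ERM paper: for halfspaces with $\ell_2$ perturbations the robust error has a closed form, so one can pick the minimizer of the empirical robust error at radius $r$ from a candidate pool and compare it against the best error achievable at the inflated radius $(1+\gamma)r$. The data, candidate pool, and constants are invented; this only illustrates the definition, not the paper's algorithm or sample-complexity bound.

    # Toy tolerant robust ERM for halfspaces with l2 perturbations: a point (x, y)
    # is robustly correct at radius r iff y * (w @ x + b) >= r * ||w||. The RERM
    # learner picks, from a random pool of halfspaces, one minimizing the robust
    # training error at radius r; tolerance compares it to the best error at the
    # inflated radius (1 + gamma) * r. Sketch of the definitions only.
    import numpy as np

    rng = np.random.default_rng(2)
    n, d, r, gamma = 200, 2, 0.2, 0.5

    # Linearly separable data with a margin along w_star.
    w_star = np.array([1.0, 1.0]) / np.sqrt(2)
    X = rng.uniform(-1, 1, size=(n, d))
    y = np.sign(X @ w_star)
    X = X + 0.4 * y[:, None] * w_star      # push the two classes apart

    def robust_error(w, b, radius):
        margin = y * (X @ w + b)
        return np.mean(margin < radius * np.linalg.norm(w))

    # Candidate pool of halfspaces (a stand-in for a hypothesis class).
    candidates = [(rng.normal(size=d), rng.normal()) for _ in range(500)]

    # Robust ERM at radius r.
    w_hat, b_hat = min(candidates, key=lambda h: robust_error(*h, r))

    # Tolerant comparison: best achievable error at the larger radius (1 + gamma) * r.
    best_tolerant = min(robust_error(w, b, (1 + gamma) * r) for w, b in candidates)

    print(f"RERM robust error at r:             {robust_error(w_hat, b_hat, r):.3f}")
    print(f"best robust error at (1+gamma) * r: {best_tolerant:.3f}")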
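
Finally, since the teaching-dimension paper above concerns kernel perceptrons, here is the learner itself in its standard dual form, with polynomial and Gaussian kernels on a toy XOR dataset. This is the textbook learner shown for context, not the paper's teaching-set construction; the kernel parameters and data are arbitrary.

    # Minimal kernel perceptron (the learner studied in the AISTATS 2021 paper),
    # shown with polynomial and Gaussian kernels. Standard learner only, not the
    # paper's (approximate) teaching-set construction.
    import numpy as np

    def poly_kernel(A, B, degree=2):
        return (A @ B.T + 1.0) ** degree

    def gauss_kernel(A, B, sigma=1.0):
        sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * sigma**2))

    def kernel_perceptron(X, y, kernel, n_epochs=20):
        """Dual perceptron: alpha[i] accumulates signed mistakes on example i."""
        K = kernel(X, X)
        alpha = np.zeros(len(X))
        for _ in range(n_epochs):
            for i in range(len(X)):
                if np.sign(alpha @ K[:, i]) != y[i]:
                    alpha[i] += y[i]
        return alpha

    def predict(alpha, X_train, X_test, kernel):
        return np.sign(kernel(X_train, X_test).T @ alpha)

    # Example: XOR labels, not linearly separable but separable with these kernels.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([-1.0, 1.0, 1.0, -1.0])
    for kernel in (poly_kernel, gauss_kernel):
        alpha = kernel_perceptron(X, y, kernel)
        print(kernel.__name__, predict(alpha, X, X, kernel))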

Recent talks

  • Learning Smooth Distance Functions via Queries (slides)
    Yale Student Theory Seminar; Princeton ML Theory Summer School; UCSD CSE
  • A Gap Between Gaussian Kernel Machine and Neural Networks
    COLT 2025; UCSD CSE Thesis Proposal
  • Feature Learning in Large Language Models
    Adobe Research, San Jose

Some notes

Improved Certified Adversarial Lower Bound Using Adaptive Relaxations
Ongoing project on adversarial deep learning.

Escaping Saddle Points and Tensor Decomposition
Master’s Thesis under the guidance of Dr. K V Subrahmanyam. [Slides]

Natural Proofs vs. Derandomization
Project report completed as part of the Advanced Complexity course at Chennai Mathematical Institute.