Posts by Collection



Tabular Data Summarization

Posted on:

The task is to give a contextual description from non-fixed schema tabular data. The work involves curating a dataset based on NLP based heuristics and a Look-up algorithm. Achieved 90% success, which bypasses the performance of state-of-the-art Natural Language Processing models like SEMPRE by 40-50%. Further, we ran baselines models based on Copy Mechanism (Christopher D. Manning et al). Also, to evaluate the process we devise deep learning model: Attention Based Copy Mechanism, that learns to copy unseen vocabulary, has an alignment matrix, and learns vectorial representation for tables.

Escaping Saddle Points and Tensor Decomposition

Posted on:

One of the key problems in optimization is understanding optima of non-convex functions, because of its both theoretical and practical importance. In general, optimizing a non-convex function is NP-hard. The difficulty arises because of two reasons: finding the global solution among many local solutions, and presence of saddle points. First order methods (which rely on the first-order derivative) have been employed before to tackle the problem but are prone to saddle points i.e. $\nabla f$(saddle point) = 0. In this thesis we will review recent work on this problem, in particular by Jason D. Lee et al (First-order Methods Almost Always Avoid Saddle Points). The authors study non-convex functions which satisfy a mild property called the “Saddle-point property” and show that first order methods on non-convex functions satisfying this property almost always escape saddle points. Along the way, we’ll see how Stable Manifold Theorem guarantees that the set of starting points (for first-order methods) that get trapped around saddle points has measure zero and strengthens the intuition for “Strict-Saddle Property”. Finally, we will talk about the problem of tensor decomposition which is inherently non-convex and how it satisfies “Strict-Saddle Property”.

Improved Certified Adversarial Lower Bound Using Adaptive Relaxations

Posted on:

In this work, we are trying to improve upon the prior works Wong et al.(Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope) and Zhang et al.(Efficient Neural Network Robustness Certification with General Activation Functions) of convex linear approximation of ReLU networks to obtain certified adversarial perturbations. Existing work tries to get a certified lower bound on the minimal adversarial bound using convex relaxations for ReLU networks. For a fixed normed $\epsilon$-ball for an input instance $x$ i.e. $\mathcal{B}(x,\epsilon)$, they strive to find linear approximations for ReLU activation functions. This convex relaxation is used to upper and lower bound the components of the logit(final) layer. Although the linear relaxation thus used, helps in finding certified lower bound on the minimal adversarial bounds efficiently, it is at the cost of large error in approximating the ReLU function. Our work addresses this issue and builds on the work based on convex relaxations. To minimize the error in approximating the ReLU activations, we use regulated softplus function which we call ReST. Our approximation is based on adaptive selection of ReST, Linear and Quadratic bounds for ReLU activations in such a manner that we obtain concave upper bound and convex lower bound for the component functions of the logit layer. Thus, for a targeted attack, we obtain a concave objective to be maximized till a threshold.

Average-case Complexity of Teaching Convex Polytopes via Halfspace Queries

Posted on:

We examine the task of locating a target region among those induced by intersections of $n$ halfspaces in $\mathbb{R}^d$. This generic task connects to fundamental machine learning problems, such as training a perceptron and learning a $\phi$-separable dichotomy. We investigate the average teaching complexity of the task, i.e., the minimal number of samples (halfspace queries) required by a teacher to help a version-space learner in locating a randomly selected target. As our main result, we show that the average-case teaching complexity is $\Theta(d)$, which is in sharp contrast to the worst-case teaching complexity of $\Theta(n)$. If instead, we consider the average-case learning complexity, the bounds have a dependency on $n$ as $\Theta(n)$ for i.i.d. queries and $\Theta(d\cdot \log n)$ for actively chosen queries by the learner. Our proof techniques are based on novel insights from computational geometry, which allow us to count the number of convex polytopes and faces in a Euclidean space depending on the arrangement of halfspaces. Our insights allow us to establish a tight bound on the average-case complexity for $\phi$-separable dichotomies, which generalizes the known $\mathcal{O}(d)$ bound on the average number of “extreme patterns” in the classical computational geometry literature (Cover, 1965).

The Teaching Dimension of Kernel Perceptrons

Posted on:

Algorithmic machine teaching has been studied under the linear setting where exact teaching is possible. However, little is known for teaching nonlinear learners. Here, we establish the sample complexity of teaching, aka teaching dimension, for kernelized perceptrons for different families of feature maps. As a warm-up, we show that the teaching complexity is $\Theta(d)$ for the exact teaching of linear perceptrons in $\mathbb{R}^d$, and $\Theta(d^k)$ for kernel perceptron with a polynomial kernel of order $k$. Furthermore, under certain smooth assumptions on the data distribution, we establish a rigorous bound on the complexity for approximately teaching a Gaussian kernel perceptron. We provide numerical examples of the optimal (approximate) teaching set under several canonical settings for linear, polynomial and Gaussian kernel perceptrons.

Deletion to Induced Matching

Posted on:

In the Deletion to induced matching problem, we are given a graph $G$ on $n$ vertices, $m$ edges and a non-negative integer $k$ and asks whether there exists a set of vertices $S \subseteq V(G) $ such that $\lvert S\rvert \le k $ and the size of any connected component in $G-S$ is exactly 2. In this paper, we provide a fixed-parameter tractable (FPT) $O^*(1.748^{k})$ running time and polynomial space algorithm for the Deletion to induced matching problem using branch-and-reduce strategy and path decomposition. We also extend our work to the exact-exponential version of the problem.




Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.