Publications

1. AAAI
Asking the Right Questions: Learning Interpretable Action Models through Query Answering.

In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021. (To appear)

This paper develops a new approach for estimating an interpretable, relational model of a black-box autonomous agent that can plan and act. Our main contributions are a new paradigm for estimating such models using a minimal query interface with the agent, and a hierarchical querying algorithm that generates an interrogation policy for estimating the agent’s internal model in a vocabulary provided by the user. Empirical evaluation of our approach shows that despite the intractable search space of possible agent models, our approach allows correct and scalable estimation of interpretable agent models for a wide class of black-box autonomous agents. Our results also show that this approach can use predicate classifiers to learn interpretable models of planning agents that represent states as images.
@inproceedings{verma2021asking,
author        = {Verma, Pulkit and Marpally, Shashank Rao and Srivastava, Siddharth},
title         = {Asking the Right Questions: Learning Interpretable Action Models through Query Answering},
year          = {2021},
booktitle     = {Proceedings of the AAAI Conference on Artificial Intelligence},
}
Older Version(s):

Asking the Right Questions: Active Action-Model Learning.
Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava.
In AAAI 2021 Workshop on Explainable Agency in Artificial Intelligence, 2021.

Learning Interpretable Models for Black-Box Agents.
Pulkit Verma and Siddharth Srivastava.
In ICML 2020 Workshop on Human in the Loop Learning, 2020.

Learning Generalized Models by Interrogating Black-Box Autonomous Agents.
Pulkit Verma and Siddharth Srivastava.
In AAAI 2020 Workshop on Generalization in Planning, 2020.

2021

1. ICSC
A Comparative Study of Resource Usage for Speaker Recognition Techniques.

In Proceedings of the 2016 International Conference on Signal Processing and Communication, 2016.

The resource usage of software is an important factor to consider when developing speaker recognition applications for mobile devices. Usage parameters are sometimes considered as important as the accuracy of such systems. In this work, we analyze resource utilization in terms of power consumption, memory, and space requirements of three standard speaker recognition techniques, viz. the GMM-UBM framework, Joint Factor Analysis, and i-vectors. Experiments are performed on the MIT MDSVC corpus using the Energy Measurement Library (EML). It is found that though the i-vector approach requires more storage space, it is superior to the other two approaches in terms of memory and power consumption, which are critical factors for evaluating software performance on resource-constrained mobile devices.
@inproceedings{verma2016comparative,
author    = {Verma, Pulkit and Das, Pradip K},
title     = {A Comparative Study of Resource Usage for Speaker Recognition Techniques},
booktitle = {Proceedings of the 2016 International Conference on Signal Processing and Communication},
pages     = {314--319},
year      = {2016},
publisher = {IEEE},
doi       = {10.1109/ICSPCom.2016.7980598},
url       = {https://doi.org/10.1109/ICSPCom.2016.7980598},
}

2016

1. IJST
i-Vectors in Speech Processing Applications: A Survey.

In International Journal of Speech Technology, 2015.

In the domain of speech recognition, many methods have been proposed over time, such as Gaussian mixture models (GMM), GMM with a universal background model (the GMM-UBM framework), joint factor analysis, etc. i-Vector subspace modeling is one of the recent methods that has become the state-of-the-art technique in this domain. This method largely provides the benefit of modeling both the intra-domain and inter-domain variabilities in the same low-dimensional space. In this survey, we present a comprehensive collection of research work related to i-vectors since their inception. Some recent trends of using i-vectors in combination with other approaches are also discussed. The application of i-vectors in various fields of speech recognition, viz. speaker, language, accent recognition, etc., is also presented. This paper should serve as a good starting point for anyone interested in working with i-vectors for speech processing in general. We then conclude the paper with a brief discussion on the future of i-vectors.
@article{verma2015ivectors,
author    = {Verma, Pulkit and Das, Pradip K},
title     = {i-{Vectors} in Speech Processing Applications: {A Survey}},
journal   = {International Journal of Speech Technology},
year      = {2015},
volume    = {18},
number    = {4},
pages     = {529--546},
publisher = {Springer Nature},
doi       = {10.1007/s10772-015-9295-3},
url       = {https://doi.org/10.1007/s10772-015-9295-3},
}
2. UIST
Investigating the “Wisdom of Crowds” at Scale.

In Adjunct Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, 2015.

In a variety of problem domains, it has been observed that the aggregate opinions of groups are often more accurate than those of the constituent individuals, a phenomenon that has been termed the "wisdom of the crowd." Yet, perhaps surprisingly, there is still little consensus on how generally the phenomenon holds, how best to aggregate crowd judgements, and how social influence affects estimates. We investigate these questions by taking a meta wisdom-of-crowds approach. With a distributed team of over 100 student researchers across 17 institutions in the United States and India, we develop a large-scale online experiment to systematically study the wisdom-of-crowds effect for 1,000 different tasks in 50 subject domains. These tasks involve various types of knowledge (e.g., explicit knowledge, tacit knowledge, and prediction), question formats (e.g., multiple choice and point estimation), and inputs (e.g., text, audio, and video). To examine the effect of social influence, participants are randomly assigned to one of three different experiment conditions in which they see varying degrees of information on the responses of others. In this ongoing project, we are now preparing to recruit participants via Amazon’s Mechanical Turk.
@inproceedings{mysore2015investigating,
author    = {Shankar Mysore, Alok and Yaligar, Vikas S. and Arrieta Ibarra, Imanol and
Simoiu, Camelia and Goel, Sharad and Arvind, Ramesh and Sumanth, Chiraag and Srikantan, Arvind
and HS, Bhargav and Pahadia, Mayank and Dobha, Tushar and Ahmed, Atif and Shankar, Mani and
Agarwal, Himani and Agarwal, Rajat and Anirudh-Kondaveeti, Sai and Arun-Gokhale, Shashank and
Attri, Aayush and Chandra, Arpita and Chilukur, Yogitha and Dharmaji, Sharath and Garg, Deepak
and Gupta, Naman and Gupta, Paras and Jacob, Glincy Mary and Jain, Siddharth and Joshi,
Shashank and Khajuria, Tarun and Khillan, Sameeksha and Konam, Sandeep and Kumar-Kolla, Praveen
and Loomba, Sahil and Madan, Rachit and Maharaja, Akshansh and Mathur, Vidit and Munshi, Bharat
and Nawazish, Mohammed and Neehar-Kurukunda, Venkata and Nirmal-Gavarraju, Venkat and
Parashar, Sonali and Parikh, Harsh and Paritala, Avinash and Patil, Amit and Phatak, Rahul and
Pradhan, Mandar and Ravichander, Abhilasha and Sangeeth, Krishna and
Sankaranarayanan, Sreecharan and Sehgal, Vibhor and Sheshan, Ashrith and Shibiraj, Suprajha and
Singh, Aditya and Singh, Anjali and Sinha, Prashant and Soni, Pushkin and Thomas, Bipin and
Varma-Dattada, Kasyap and Venkataraman, Sukanya and Verma, Pulkit and Yelurwar, Ishan},
title     = {Investigating the ``{Wisdom of Crowds}'' at Scale},
booktitle = {Adjunct Proceedings of the 28th Annual ACM Symposium on User Interface Software \& Technology},
pages     = {75--76},
year      = {2015},
isbn      = {9781450337809},
publisher = {Association for Computing Machinery},
doi       = {10.1145/2815585.2815725},
url       = {https://doi.org/10.1145/2815585.2815725},
}
3. AIR
A Mobile Agents based Distributed Speech Recognition Engine for Controlling Multiple Robots.

In Proceedings of the 2015 Conference on Advances In Robotics, 2015.

Interaction with a robot has been an active area of research since the inception of robotics. Talking to a robot has always been considered the most natural way to communicate with it. But it is not always possible to have a full-fledged, standalone speech processing engine present on a robot or on a single machine. A dedicated system to convert the commands from audio to text is needed. However, as the number of commands and robots increases, it becomes necessary to eliminate all single points of failure in the system. Thus, a distributed speech engine comes into the picture. Also, users may want to talk to the robot in different languages. The approach proposed in this paper is distributed, fault-tolerant, and scalable, such that any new recognition algorithm or language support can be added and used without any changes to the existing system. The work has been demonstrated on a freely available mobile agents based Internet of Things platform. However, any platform can be used.
@inproceedings{gupta2015mobile,
author    = {Gupta, Mayank and Verma, Pulkit and Bhattacharya, Tuhin and Das, Pradip K},
title     = {A Mobile Agents based Distributed Speech Recognition Engine for Controlling Multiple Robots},
booktitle = {Proceedings of the 2015 Conference on Advances In Robotics},
pages     = {1--6},
year      = {2015},
isbn      = {9781450333566},
publisher = {Association for Computing Machinery},
doi       = {10.1145/2783449.2783477},
url       = {https://doi.org/10.1145/2783449.2783477},
}

2015

1. IC3I
Improving Services Using Mobile Agents-based IoT in a Smart City.

In Proceedings of the 2014 International Conference on Contemporary Computing and Informatics, 2014.

Modern-day devices like smartphones, tablets, televisions, etc. possess very powerful processors and huge storage capacities compared to what was available a few years ago. Most of these devices are also connected to the Internet. However, the capabilities of these devices are not fully harnessed and thus, they are not as intelligent as they could be. These devices, together with the Internet, can be used as an “Internet of Things” where each device can be both a producer and a consumer of information. This framework is realizable in a real dynamic system if there is an intelligent distributed layer above it which can cater to the services of all heterogeneous devices as required. The existing solutions to this problem are either too hardware-dependent or too abstract. In this paper, we present a concept of this layer using mobile agents, which makes the system flexible and dynamically adaptable. This layer has been deployed using a publicly available Prolog-based mobile agent emulator (however, any other mobile agent framework can also be used). The proposed approach is capable of dynamically updating information like the availability and usability of services. It also has speech processing modules to provide solutions using voice-based commands and prompts. The prototype is scalable and robust to partial network failures. The implementation details and performance analysis of this work are reported and discussed. This framework can be used to deploy systems that enable people to search for services like health facilities, food services, transportation, and law and order using a common interface, including voice commands.
@inproceedings{verma2014improving,
author    = {Verma, Pulkit and Gupta, Mayank and Bhattacharya, Tuhin and Das, Pradip K},
title     = {Improving Services Using Mobile Agents-based {IoT} in a Smart City},
booktitle = {2014 International Conference on Contemporary Computing and Informatics},
pages     = {107--111},
year      = {2014},
publisher = {IEEE},
doi       = {10.1109/IC3I.2014.7019766},
url       = {https://doi.org/10.1109/IC3I.2014.7019766},
}