1. GenPlan
    Learning Causal Models of Autonomous Agents using Interventions.
    Pulkit Verma, and Siddharth Srivastava.

    In IJCAI 2021 Workshop on Generalization in Planning, 2021.

    One of the several obstacles to the widespread use of AI systems is the lack of interpretability requirements that can enable a layperson to ensure the safe and reliable behavior of such systems. We extend the analysis of an agent assessment module that lets an AI system execute high-level instruction sequences in simulators and answer user queries about its execution of sequences of actions. We show that such a primitive query-response capability is sufficient to efficiently derive a user-interpretable causal model of the system in stationary, fully observable, and deterministic settings. We also introduce dynamic causal decision networks (DCDNs) that capture the causal structure of STRIPS-like domains. A comparative analysis of different classes of queries is also presented in terms of the computational requirements needed to answer them and the effort required to evaluate their responses to learn the correct model.
      author    = {Verma, Pulkit and Srivastava, Siddharth},
      title     = {Learning Causal Models of Autonomous Agents using Interventions},
      booktitle = {IJCAI 2021 Workshop on Generalization in Planning},
      year      = {2021},
  2. KEPS
    Learning User-Interpretable Descriptions of Black-Box AI System Capabilities.
    Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava.

    In ICAPS 2021 Workshop on Knowledge Engineering for Planning and Scheduling, 2021.

    Several approaches have been developed to answer specific questions that a user may have about an AI system that can plan and act. However, the problems of identifying which questions to ask and that of computing a user-interpretable symbolic description of the overall capabilities of the system have remained largely unaddressed. This paper presents an approach for addressing these problems by learning user-interpretable symbolic descriptions of the limits and capabilities of a black-box AI system using low-level simulators. It uses a hierarchical active querying paradigm to generate questions and to learn a user-interpretable model of the AI system based on its responses. In contrast to prior work, we consider settings where imprecision of the user’s conceptual vocabulary precludes a direct expression of the agent’s capabilities. Furthermore, our approach does not require assumptions about the internal design of the target AI system or about the methods that it may use to compute or learn task solutions. Empirical evaluation on several game-based simulator domains shows that this approach can efficiently learn symbolic models of AI systems that use a deterministic black-box policy in fully observable scenarios.
      author    = {Verma, Pulkit and Marpally, Shashank Rao and Srivastava, Siddharth},
      title     = {Learning User-Interpretable Descriptions of Black-Box AI System Capabilities},
      booktitle = {ICAPS 2021 Workshop on Knowledge Engineering for Planning and Scheduling},
      year      = {2021},
  3. AAAI
    Asking the Right Questions: Learning Interpretable Action Models Through Query Answering.
    Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava.

    In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021.

    This paper develops a new approach for estimating an interpretable, relational model of a black-box autonomous agent that can plan and act. Our main contributions are a new paradigm for estimating such models using a minimal query interface with the agent, and a hierarchical querying algorithm that generates an interrogation policy for estimating the agent’s internal model in a vocabulary provided by the user. Empirical evaluation of our approach shows that despite the intractable search space of possible agent models, our approach allows correct and scalable estimation of interpretable agent models for a wide class of black-box autonomous agents. Our results also show that this approach can use predicate classifiers to learn interpretable models of planning agents that represent states as images.
      author        = {Verma, Pulkit and Marpally, Shashank Rao and Srivastava, Siddharth},
      title         = {Asking the Right Questions: Learning Interpretable Action Models Through Query Answering},
      year          = {2021},
      booktitle     = {Proceedings of the AAAI Conference on Artificial Intelligence},
    Older Version(s):

    Asking the Right Questions: Active Action-Model Learning.
    Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava.
    In AAAI 2021 Workshop on Explainable Agency in Artificial Intelligence, 2021.

    Learning Interpretable Models for Black-Box Agents.
    Pulkit Verma, and Siddharth Srivastava.
    In ICML 2020 Workshop on Human in the Loop Learning, 2020.

    Learning Generalized Models by Interrogating Black-Box Autonomous Agents.
    Pulkit Verma, and Siddharth Srivastava.
    In AAAI 2020 Workshop on Generalization in Planning, 2020.


  1. ICSC
    A Comparative Study of Resource Usage for Speaker Recognition Techniques.
    Pulkit Verma, and Pradip K Das.

    In Proceedings of the 2016 International Conference on Signal Processing and Communication, 2016.

    The resource usage of software is an important factor to consider when developing speaker recognition applications for mobile devices; such usage parameters are sometimes considered as important as the accuracy of these systems. In this work, we analyze resource utilization in terms of the power consumption, memory, and storage requirements of three standard speaker recognition techniques, viz. the GMM-UBM framework, Joint Factor Analysis, and i-vectors. Experiments are performed on the MIT MDSVC corpus using the Energy Measurement Library (EML). We find that although the i-vector approach requires more storage space, it is superior to the other two approaches in terms of memory and power consumption, which are critical factors for evaluating software performance on resource-constrained mobile devices.
      author    = {Verma, Pulkit and Das, Pradip K},
      title     = {A Comparative Study of Resource Usage for Speaker Recognition Techniques},
      booktitle = {Proceedings of the 2016 International Conference on Signal Processing and Communication},
      pages     = {314–319},
      year      = {2016},
      publisher = {IEEE},
      doi       = {10.1109/ICSPCom.2016.7980598},


  1. IJST
    i-Vectors in Speech Processing Applications: A Survey.
    Pulkit Verma, and Pradip K. Das.

    In International Journal of Speech Technology, 2015.

    In the domain of speech recognition, many methods have been proposed over time, such as Gaussian mixture models (GMM), GMM with a universal background model (the GMM-UBM framework), joint factor analysis, etc. i-Vector subspace modeling is one of the more recent methods and has become the state-of-the-art technique in this domain. This method largely provides the benefit of modeling both the intra-domain and inter-domain variabilities in the same low-dimensional space. In this survey, we present a comprehensive collection of research work related to i-vectors since their inception. Some recent trends of using i-vectors in combination with other approaches are also discussed. The application of i-vectors in various fields of speech recognition, viz. speaker, language, and accent recognition, is also presented. This paper should serve as a good starting point for anyone interested in working with i-vectors for speech processing in general. We then conclude the paper with a brief discussion on the future of i-vectors.
      author    = {Verma, Pulkit and Das, Pradip K},
      title     = {i-{Vectors} in Speech Processing Applications: {A Survey}},
      journal   = {International Journal of Speech Technology},
      year      = {2015},
      volume    = {18},
      number    = {4},
      pages     = {529–546},
      publisher = {Springer Nature},
      doi       = {10.1007/s10772-015-9295-3},
  2. UIST
    Investigating the “Wisdom of Crowds” at Scale.
    Alok Shankar Mysore, Vikas S. Yaligar, Imanol Arrieta Ibarra, Camelia Simoiu, Sharad Goel, Ramesh Arvind, Chiraag Sumanth, Arvind Srikantan, Bhargav HS, Mayank Pahadia, Tushar Dobha, Atif Ahmed, Mani Shankar, Himani Agarwal, Rajat Agarwal, Sai Anirudh-Kondaveeti, Shashank Arun-Gokhale, Aayush Attri, Arpita Chandra, Yogitha Chilukur, Sharath Dharmaji, Deepak Garg, Naman Gupta, Paras Gupta, Glincy Mary Jacob, Siddharth Jain, Shashank Joshi, Tarun Khajuria, Sameeksha Khillan, Sandeep Konam, Praveen Kumar-Kolla, Sahil Loomba, Rachit Madan, Akshansh Maharaja, Vidit Mathur, Bharat Munshi, Mohammed Nawazish, Venkata Neehar-Kurukunda, Venkat Nirmal-Gavarraju, Sonali Parashar, Harsh Parikh, Avinash Paritala, Amit Patil, Rahul Phatak, Mandar Pradhan, Abhilasha Ravichander, Krishna Sangeeth, Sreecharan Sankaranarayanan, Vibhor Sehgal, Ashrith Sheshan, Suprajha Shibiraj, Aditya Singh, Anjali Singh, Prashant Sinha, Pushkin Soni, Bipin Thomas, Kasyap Varma-Dattada, Sukanya Venkataraman, Pulkit Verma, and Ishan Yelurwar.

    In Adjunct Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, 2015.

    In a variety of problem domains, it has been observed that the aggregate opinions of groups are often more accurate than those of the constituent individuals, a phenomenon that has been termed the "wisdom of the crowd." Yet, perhaps surprisingly, there is still little consensus on how generally the phenomenon holds, how best to aggregate crowd judgements, and how social influence affects estimates. We investigate these questions by taking a meta wisdom of crowds approach. With a distributed team of over 100 student researchers across 17 institutions in the United States and India, we develop a large-scale online experiment to systematically study the wisdom of crowds effect for 1,000 different tasks in 50 subject domains. These tasks involve various types of knowledge (e.g., explicit knowledge, tacit knowledge, and prediction), question formats (e.g., multiple choice and point estimation), and inputs (e.g., text, audio, and video). To examine the effect of social influence, participants are randomly assigned to one of three different experiment conditions in which they see varying degrees of information on the responses of others. In this ongoing project, we are now preparing to recruit participants via Amazon’s Mechanical Turk.
      author    = {Shankar Mysore, Alok and Yaligar, Vikas S. and Arrieta Ibarra, Imanol and 
      Simoiu, Camelia and Goel, Sharad and Arvind, Ramesh and Sumanth, Chiraag and Srikantan, Arvind 
      and HS, Bhargav and Pahadia, Mayank and Dobha, Tushar and Ahmed, Atif and Shankar, Mani and 
      Agarwal, Himani and Agarwal, Rajat and Anirudh-Kondaveeti, Sai and Arun-Gokhale, Shashank and 
      Attri, Aayush and Chandra, Arpita and Chilukur, Yogitha and Dharmaji, Sharath and Garg, Deepak 
      and Gupta, Naman and Gupta, Paras and Jacob, Glincy Mary and Jain, Siddharth and Joshi, 
      Shashank and Khajuria, Tarun and Khillan, Sameeksha and Konam, Sandeep and Kumar-Kolla, Praveen 
      and Loomba, Sahil and Madan, Rachit and Maharaja, Akshansh and Mathur, Vidit and Munshi, Bharat 
      and Nawazish, Mohammed and Neehar-Kurukunda, Venkata and Nirmal-Gavarraju, Venkat and     
      Parashar, Sonali and Parikh, Harsh and Paritala, Avinash and Patil, Amit and Phatak, Rahul and 
      Pradhan, Mandar and Ravichander, Abhilasha and Sangeeth, Krishna and 
      Sankaranarayanan, Sreecharan and Sehgal, Vibhor and Sheshan, Ashrith and Shibiraj, Suprajha and 
      Singh, Aditya and Singh, Anjali and Sinha, Prashant and Soni, Pushkin and Thomas, Bipin and 
      Varma-Dattada, Kasyap and Venkataraman, Sukanya and Verma, Pulkit and Yelurwar, Ishan},
      title     = {Investigating the "{Wisdom of Crowds}" at Scale},
      booktitle = {Adjunct Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology},
      pages     = {75–76},
      year      = {2015},
      isbn      = {9781450337809},
      publisher = {Association for Computing Machinery},
      doi       = {10.1145/2815585.2815725},
  3. AIR
    A Mobile Agents based Distributed Speech Recognition Engine for Controlling Multiple Robots.
    Mayank Gupta, Pulkit Verma, Tuhin Bhattacharya, and Pradip K Das.

    In Proceedings of the 2015 Conference on Advances In Robotics, 2015.

    Interaction with a robot has been an active area of research since the inception of robotics, and talking to a robot has always been considered the most natural way to communicate with it. However, it is not always feasible to host a full-fledged, standalone speech processing engine on a robot or on a single machine, so a dedicated system is needed to convert commands from audio to text. As the number of commands and robots increases, it becomes necessary to eliminate all single points of failure in the system; thus, a distributed speech engine comes into the picture. Users may also want to talk to the robot in different languages. The approach proposed in this paper is distributed, fault-tolerant, and scalable, such that any new recognition algorithm or language support can be added and used without any changes to the existing system. The work has been demonstrated on a freely available mobile agents-based Internet of Things platform; however, any platform can be used.
      author    = {Gupta, Mayank and Verma, Pulkit and Bhattacharya, Tuhin and Das, Pradip K},
      title     = {A Mobile Agents based Distributed Speech Recognition Engine for Controlling Multiple Robots},
      booktitle = {Proceedings of the 2015 Conference on Advances In Robotics},
      pages     = {1–6},
      year      = {2015},
      isbn      = {9781450333566},
      publisher = {Association for Computing Machinery},
      doi       = {10.1145/2783449.2783477},


  1. IC3I
    Improving Services Using Mobile Agents-based IoT in a Smart City.
    Pulkit Verma, Mayank Gupta, Tuhin Bhattacharya, and Pradip K Das.

    In Proceedings of the 2014 International Conference on Contemporary Computing and Informatics, 2014.

    Modern-day devices like smartphones, tablets, and televisions possess very powerful processors and huge storage capacities compared to what was available a few years ago, and most of these devices are also connected to the Internet. However, the capabilities of these devices are not fully harnessed, and thus they are not as intelligent as they could be. Together with the Internet, these devices can form an “Internet of Things” in which each device is both a producer and a consumer of information. This framework is realizable in a real dynamic system if there is an intelligent distributed layer above it that can cater to the services of all heterogeneous devices as required. The existing solutions to this problem are either too hardware-dependent or too abstract. In this paper, we present a concept of this layer using mobile agents, which makes the system flexible and dynamically adaptable. This layer has been deployed using a publicly available Prolog-based mobile agent emulator (however, any other mobile agent framework can also be used). The proposed approach is capable of dynamically updating information such as the availability and usability of services. It also has speech processing modules to provide solutions using voice-based commands and prompts. The prototype is scalable and robust to partial network failures. The implementation details and performance analysis of this work are reported and discussed. This framework can be used to deploy systems that enable people to search for services like health facilities, food services, transportation, and law and order using a common interface, including voice commands.
      author    = {Verma, Pulkit and Gupta, Mayank and Bhattacharya, Tuhin and Das, Pradip K},
      title     = {Improving Services Using Mobile Agents-based {IoT} in a Smart City},
      booktitle = {2014 International Conference on Contemporary Computing and Informatics},
      pages     = {107–111},
      year      = {2014},
      publisher = {IEEE},
      doi       = {10.1109/IC3I.2014.7019766},