Publications | Pulkit Verma

2025

AIA
Safety Beyond Verification: The Need for Continual, User-Driven Assessment of AI Systems

Siddharth Srivastava, Georgios Fainekos, Pulkit Verma, and Daniel R. Bramblett

In IJCAI 2025 Workshop on User-Aligned Assessment of Adaptive AI Systems, 2025 (To appear)

How should we assess the safety and functionality of taskable AI systems that are designed to continually learn and solve user-desired tasks in user-specific environments? From household robotics to digital assistants that can make potentially dangerous changes to their operational environments, this question is central to realizing the promise of AI.

We investigate why answering this question requires more than an extrapolation of existing paradigms for verification and validation, and identify concrete desiderata and promising directions for research on formal assessment of AI systems.
@inproceedings{srivastava2025safety,
  author        = {Srivastava, Siddharth and Fainekos, Georgios and Verma, Pulkit and Bramblett, Daniel R.},
  title         = {Safety Beyond Verification: {The} Need for Continual, User-Driven Assessment of {AI} Systems},
  booktitle     = {IJCAI 2025 Workshop on User-Aligned Assessment of Adaptive AI Systems},
  year          = {2025},
}
X-HRI
Interpretability Analysis of Symbolic Representations for Sequential Decision-Making Systems

Pulkit Verma and Julie A. Shah

In HRI 2025 Workshop on Explainability for Human-Robot Collaboration: Real-World Concerns, 2025

Interpretability in sequential decision-making (SDM) systems is critical for ensuring trust and transparency in human-robot collaboration scenarios. As robots increasingly work alongside humans in manufacturing, healthcare, and service environments, their decision-making processes must be understandable to their human collaborators. While significant progress has been made in interpretability for single-step decision-making systems, there remains a lack of consolidated research on interpretability techniques for SDM systems. This work analyzes various symbolic representations, evaluating their interpretability and applicability for effective human-robot teaming. We introduce a framework for analyzing these representations along key dimensions including interpretability, temporal expressiveness, and human-robot interaction capabilities. By synthesizing existing work and highlighting open challenges, this work guides researchers in selecting and designing interpretable symbolic representations that enhance trust in human-robot collaborative tasks.
@inproceedings{verma2025interpretability,
  author        = {Verma, Pulkit and Shah, Julie A.},
  title         = {Interpretability Analysis of Symbolic Representations for Sequential Decision-Making Systems},
  year          = {2025},
  booktitle     = {HRI 2025 Workshop on Explainability for Human-Robot Collaboration: Real-World Concerns},
}
AAAI Symposium
Developing Shared Mental Models for Human-AI Collaboration in Autonomous Cyber-Physical System Operations

Pulkit Verma, Samir Wadhwania, Josh Rountree, Anthony Favier, and Julie A. Shah

In AAAI 2025 Spring Symposium on Current and Future Varieties of Human-AI Collaboration, 2025

This work explores the development of shared mental models and refined representations to enhance human-AI collaboration in complex, uncertain environments, particularly in autonomous cyber-physical system operations. By enabling iterative refinement of abstractions, we aim to calibrate trust between human operators and AI systems, ensuring users can effectively assess system performance and limitations. Simultaneously, these refined representations improve AI systems’ ability to model human behavior, interpret intent, and adapt to dynamic contexts. The goal is to foster seamless collaboration, where humans and AI systems leverage their complementary strengths to achieve shared objectives in real-time, high-stakes scenarios.
@inproceedings{verma2025developing,
  author        = {Verma, Pulkit and Wadhwania, Samir and Rountree, Josh and Favier, Anthony and Shah, Julie A.},
  title         = {Developing Shared Mental Models for Human-AI Collaboration in Autonomous Cyber-Physical System Operations},
  year          = {2025},
  booktitle     = {AAAI 2025 Spring Symposium on Current and Future Varieties of Human-AI Collaboration},
}
AAAI Symposium
Leveraging LLMs for Collaborative Human-AI Decision Making

Anthony Favier, Pulkit Verma, Ngoc La, and Julie A. Shah

In Proceedings of the AAAI 2025 Spring Symposium on Current and Future Varieties of Human-AI Collaboration, 2025

Human-AI collaboration is a rapidly evolving field that seeks to leverage the complementary strengths of humans and artificial intelligence (AI) to solve complex problems. An area where such collaboration holds significant promise is decision-making tasks, particularly automated planning. Although classical symbolic approaches are widely used in this field, they are limited when solving large and complex problems. Furthermore, interacting with them requires expert knowledge of formal and structured languages, which hinders their use. Recently, Large Language Models (LLMs) have emerged as a potential solution to these challenges, but LLMs alone are not sufficient for solving such problems. A promising way to achieve seamless human-AI collaboration could be hybrid approaches that combine the strength of symbolic reasoning with the flexibility of LLMs.
@inproceedings{favier2025leveraging,
  author        = {Favier, Anthony and Verma, Pulkit and La, Ngoc and Shah, Julie A.},
  title         = {Leveraging LLMs for Collaborative Human-AI Decision Making},
  year          = {2025},
  booktitle     = {Proceedings of the AAAI 2025 Spring Symposium on Current and Future Varieties of Human-AI Collaboration},
}
EAAI
Using Explainable AI and Hierarchical Planning for Outreach with Robots

Rushang Karia^*, Jayesh Nagpal^*, Daksh Dobhal^*, Pulkit Verma, Rashmeet Kaur Nayyar, Naman Shah, and Siddharth Srivastava

In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (EAAI Symposium Track), 2025

Understanding how robots plan and execute tasks is crucial in today’s world, where they are becoming more prevalent in our daily lives. However, teaching non-experts, such as K-12 students, the complexities of robot planning can be challenging. This work presents an open-source platform, JEDAI.Ed, that simplifies the process using a visual interface that abstracts the details of various planning processes that robots use for performing complex mobile manipulation tasks. Using principles developed in the field of explainable AI, this intuitive platform enables students to use a high-level intuitive instruction set to perform complex tasks, visualize them on an in-built simulator, and obtain helpful hints and natural language explanations for errors. Finally, JEDAI.Ed includes an adaptive curriculum generation method that provides students with customized learning ramps. This platform’s efficacy was tested through a user study with university students who had little to no computer science background. Our results show that JEDAI.Ed is highly effective in increasing student engagement, teaching robotics programming, and decreasing the time needed to solve tasks as compared to baselines.
@inproceedings{karia2025using,
  author        = {Karia, Rushang and Nagpal, Jayesh and Dobhal, Daksh and Verma, Pulkit and Nayyar, Rashmeet Kaur and Shah, Naman and Srivastava, Siddharth},
  title         = {Using Explainable AI and Hierarchical Planning for Outreach with Robots},
  year          = {2025},
  booktitle     = {Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (EAAI Symposium Track)},
}
^*Equal Contribution.
PRL
AI Planning: A Primer and Survey (Preliminary Report)

Dillon Z. Chen, Pulkit Verma, Siddharth Srivastava, Michael Katz, and Sylvie Thiébaux

In AAAI 2025 Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning, 2025

Automated decision-making is a fundamental topic that spans multiple sub-disciplines in AI: reinforcement learning (RL), AI planning (AP), foundation models, and operations research, among others. Despite recent efforts to "bridge the gaps" between these communities, there remain many insights that have not yet transcended the boundaries. Our goal in this paper is to provide a brief and non-exhaustive primer on ideas well-known in AP, but less so in other sub-disciplines. We do so by introducing the classical AP problem and representation, and extensions that handle uncertainty and time through the Markov Decision Process formalism. Next, we survey state-of-the-art techniques and ideas for solving AP problems, focusing on their ability to exploit problem structure. Lastly, we cover subfields within AP for learning structure from unstructured inputs and learning to generalise to unseen scenarios and situations.
@inproceedings{chen2025aiplanning,
  author        = {Chen, Dillon Z. and Verma, Pulkit and Srivastava, Siddharth and Katz, Michael and Thiébaux, Sylvie},
  title         = {AI Planning: A Primer and Survey (Preliminary Report)},
  booktitle     = {AAAI 2025 Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning},
  year          = {2025},
}

EAAI

Bridging the Language Divide: Generative AI’s Promise for Global Education

Pulkit Verma

In Fifteenth Symposium on Educational Advances in Artificial Intelligence (Blue Sky Ideas), 2025

🏆 Winner of the AAAI/ACM SIGAI Innovative AI Education Award 2025.

@inproceedings{verma2025bridging,
  author        = {Verma, Pulkit},
  title         = {Bridging the Language Divide: Generative AI’s Promise for Global Education},
  year          = {2025},
  booktitle     = {Fifteenth Symposium on Educational Advances in Artificial Intelligence (Blue Sky Ideas)},
}

2024

Preprint
∀uto∃val: Autonomous Assessment of LLMs in Formal Synthesis and Interpretation Tasks

Rushang Karia^*, Daniel R. Bramblett^*, Daksh Dobhal, Pulkit Verma, and Siddharth Srivastava

arXiv 2403.18327, 2024

This paper presents ∀uto∃val, a new approach for scaling LLM assessment in translating formal syntax – such as first-order logic, regular expressions, etc. – to natural language (interpretation) or vice versa (compilation), thereby facilitating their use in applications such as generating/explaining logic and control flow for programs, etc. Existing approaches for LLM assessment in these areas require labor-intensive ground-truth creation, the availability of which undermines the separation of training and test sets. Furthermore, such datasets typically include relatively few hand-coded test cases over which LLM accuracy is determined, thus making them inadequate for determining the safety or correctness of their generated outputs. We introduce a new approach that utilizes context-free grammars (CFGs) to generate out-of-distribution datasets on the fly and perform closed-loop testing of LLM capabilities using formal verifiers to guarantee the correctness of LLM outputs without any human intervention. We release our dataset and benchmark as open-source code at https://github.com/AAIR-lab/auto-llm-assessment. We also conduct an assessment of several SOTA closed and open-source LLMs to showcase the feasibility and scalability of this paradigm. Our experiments reveal that SOTA LLMs are unable to solve the formal translation task adequately.
@misc{karia2024can,
  author        = {Karia, Rushang and Bramblett, Daniel R. and Dobhal, Daksh and Verma, Pulkit and Srivastava, Siddharth},
  title         = {∀uto∃val: Autonomous Assessment of LLMs in Formal Synthesis and Interpretation Tasks},
  year          = {2024},
  eprint        = {2403.18327},
  archivePrefix = {arXiv},
}
^*Equal Contribution.

Older Version(s):
Can LLMs translate SATisfactorily? Assessing LLMs in Generating and Interpreting Formal Specifications
Rushang Karia, Daksh Dobhal, Daniel R. Bramblett, Pulkit Verma, and Siddharth Srivastava.
In AAAI 2024 Spring Symposium on User-Aligned Assessment of Adaptive AI Systems, 2024
Preprint
From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions, and Models for Planning from Raw Data

Naman Shah, Jayesh Nagpal, Pulkit Verma, and Siddharth Srivastava

arXiv 2402.11871, 2024

Hand-crafted, logic-based state and action representations have been widely used to overcome the intractable computational complexity of long-horizon robot planning problems, including task and motion planning problems. However, creating such representations requires experts with strong intuitions and detailed knowledge about the robot and the tasks it may need to accomplish in a given setting. Removing this dependency on human intuition is a highly active research area.
This paper presents the first approach for autonomously learning generalizable, logic-based relational representations for abstract states and actions starting from unannotated high-dimensional, real-valued robot trajectories. The learned representations constitute auto-invented PDDL-like domain models. Empirical results in deterministic settings show that powerful abstract representations can be learned from just a handful of robot trajectories; the learned relational representations include but go beyond classical, intuitive notions of high-level actions; and that the learned models allow planning algorithms to scale to tasks that were previously beyond the scope of planning without hand-crafted abstractions.
@misc{shah2024from,
  author        = {Shah, Naman and Nagpal, Jayesh and Verma, Pulkit and Srivastava, Siddharth},
  title         = {From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions, and Models for Planning from Raw Data},
  year          = {2024},
  eprint        = {2402.11871},
  archivePrefix = {arXiv},
}
ICAPS
Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings

Rushang Karia^*, Pulkit Verma^*, Alberto Speranzon, and Siddharth Srivastava

In Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling, 2024

Learning interpretable generalizable models of sequential decision-making agents is essential for user-driven assessment as well as for continual agent-design processes in several AI applications. Discovering an agent’s broad capabilities in terms of concepts a user understands and summarizing them for a user is a comparatively new solution approach for agent assessment. Prior work on this topic focuses on deterministic settings, or settings where the names of the agent’s capabilities are already known, or situations where the learning system has access to only passively collected data regarding the agent’s behavior. These settings result in a limited scope and/or accuracy of the learned models. This paper presents an approach for discovering a black-box sequential decision-making agent’s capabilities and interactively learning an interpretable model of the agent in stochastic settings. Our approach uses an initial set of observations to discover the agent’s capabilities and a hierarchical querying process to learn a probability distribution of the discovered stochastic capabilities. Our evaluation demonstrates that our method learns lifted SDM models with complex capabilities accurately.
@inproceedings{karia2024epistemic,
  author        = {Karia, Rushang and Verma, Pulkit and Speranzon, Alberto and Srivastava, Siddharth},
  title         = {Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings},
  booktitle     = {Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling},
  year          = {2024},
}
^*Equal Contribution.

Older Version(s):
Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Stochastic Settings
Rushang Karia^*, Pulkit Verma^*, Gaurav Vipat, and Siddharth Srivastava.
In NeurIPS 2023 Workshop on Generalization in Planning, 2023
Thesis
Data-Efficient Paradigms for Personalized Assessment of Taskable AI Systems

Pulkit Verma

PhD Thesis, School of Computing and Augmented Intelligence, Arizona State University, 2024

Recent advances in Artificial Intelligence (AI) have brought AI closer to laypeople than ever before. This leads to a pervasive problem: how would a user ascertain whether an AI system will be safe, reliable, or useful in a given situation? This problem becomes particularly challenging when it is considered that most autonomous systems are not designed by their users; the internal software of these systems may be unavailable or difficult to understand; and the functionality of these systems may even change from initial specifications as a result of learning. To overcome these challenges, this dissertation proposes a paradigm for third-party autonomous assessment of black-box taskable AI systems. The four main desiderata of such assessment systems are: (i) interpretability: generating a description of the AI system’s functionality in a language that the target user can understand; (ii) correctness: ensuring that the description of the AI system’s working is accurate; (iii) generalizability: creating a solution approach that works well for different types of AI systems; and (iv) minimal requirements: creating an assessment system that does not place complex requirements on AI systems to support the third-party assessment, since otherwise the manufacturers of AI systems might not support such an assessment.
To satisfy these properties, this dissertation presents algorithms and requirements that would enable user-aligned autonomous assessment that helps the user understand the limits of a black-box AI system’s safe operability. This dissertation proposes a personalized AI assessment module that discovers the high-level “capabilities” of an AI system with arbitrary internal planning algorithms/policies and learns an accurate symbolic description of these capabilities in terms of concepts that a user understands. Furthermore, the dissertation includes the associated theoretical results and the empirical evaluations. The results show that (i) a primitive query-response interface can enable the development of autonomous assessment modules that can derive a causally accurate user-interpretable model of the system’s capabilities efficiently; and (ii) such descriptions are easier to understand and reason with for the users than the agent’s primitive actions.
@phdthesis{verma2024data,
  author        = {Verma, Pulkit},
  title         = {Data-Efficient Paradigms for Personalized Assessment of Taskable AI Systems},
  school        = {Arizona State University},
  address       = {Tempe, AZ, USA},
  type          = {PhD Thesis},
  year          = {2024},
}
Tech Report
Learning Causally Accurate Models for Autonomous Assessment of Deterministic Black-Box Agents

Pulkit Verma and Siddharth Srivastava

Technical Report TR-ASUSCAI-2024-001, School of Computing and Augmented Intelligence, Arizona State University, 2024

This paper develops a new approach for estimating an interpretable, relational, and causally accurate model of a black-box autonomous agent that can plan and act in fully observable deterministic settings. Our main contributions are a new paradigm for estimating such models using a rudimentary query-response interface with the agent and a hierarchical querying algorithm that generates an interrogation policy. We also introduce dynamic causal decision networks (DCDNs) that capture the causal structure of planning models expressed in STRIPS-like languages. We show that the models we learn can be represented in the form of these DCDNs, and are causally accurate. Empirical evaluation of our approach shows that despite the exponential number of possible agent models in terms of the number of predicates and agent capabilities, our approach results in the correct and scalable estimation of interpretable agent models for a wide class of black-box autonomous agents. Our results also show that this approach can use predicate classifiers to learn interpretable models of planning agents that represent states as images.
@techreport{verma2024learning,
  author        = {Verma, Pulkit and Srivastava, Siddharth},
  title         = {Learning Causally Accurate Models for Autonomous Assessment of Deterministic Black-Box Agents},
  year          = {2024},
  number        = {TR-ASUSCAI-2024-001},
  department    = {School of Computing and Augmented Intelligence},
  institution   = {Arizona State University},
}
AI Magazine
Reports of the Association for the Advancement of Artificial Intelligence’s 2024 Spring Symposium Series

Jessica Coates, Mononito Goswami, Takashi Kido, William Lawless, Xinyu Li, Christopher J. MacLellan, Andreas Martin, Siddharth Srivastava, Reinhard Stolle, Keiki Takadama, Pulkit Verma, Jie Yang, and Melo-Jean Yap

Interactive AI Magazine, May 2024

The Association for the Advancement of Artificial Intelligence’s 2024 Spring Symposium Series was held at Stanford University in Stanford, California, March 25-27, 2024. There were eight symposia in the spring program: Bi-directionality in Human-AI Collaborative Systems, Clinical Foundation Models Symposium, Empowering Machine Learning and Large Language Models with Domain and Commonsense Knowledge (AAAI-MAKE 2024), Federated Learning on the Edge, Impact of GenAI on Social and Individual Well-being, Increasing Diversity in AI Education and Research, Symposium on Human-Like Learning, User-Aligned Assessment of Adaptive AI Systems. This report contains summaries of the workshops, which were submitted by some, but not all, of the workshop chairs.
@article{Coates2024Reports,
  author        = {Coates, Jessica and Goswami, Mononito and Kido, Takashi and Lawless, William and Li, Xinyu and MacLellan, Christopher J. and Martin, Andreas and Srivastava, Siddharth and Stolle, Reinhard and Takadama, Keiki and Verma, Pulkit and Yang, Jie and Yap, Melo-Jean},
  title         = {Reports of the Association for the Advancement of Artificial Intelligence's 2024 Spring Symposium Series},
  journal       = {Interactive AI Magazine},
  url           = {https://interactiveaimag.org/updates/reports/symposium-reports/reports-of-the-association-for-the-advancement-of-artificial-intelligences-2024-spring-symposium-series/},
  year          = {2024},
  month         = may,
}
AAAI Symposium
User-Aligned Autonomous Capability Assessment of Black-Box AI Systems

Pulkit Verma and Siddharth Srivastava

In AAAI 2024 Spring Symposium on User-Aligned Assessment of Adaptive AI Systems, 2024

The vast diversity of internal designs of black-box AI systems and their nuanced zones of safe functionality make it difficult for a layperson to use them without unintended side effects. This work focuses on developing paradigms that enable a user to assess and understand the limits of an AI system’s safe operability. We develop a personalized AI assessment module that lets an AI system execute instruction sequences in simulators and answer queries about these executions. Our results show that such a primitive query-response interface is sufficient to efficiently derive a user-interpretable model of a system’s capabilities.
@inproceedings{verma2024user,
  author        = {Verma, Pulkit and Srivastava, Siddharth},
  title         = {User-Aligned Autonomous Capability Assessment of Black-Box {AI} Systems},
  booktitle     = {AAAI 2024 Spring Symposium on User-Aligned Assessment of Adaptive AI Systems},
  year          = {2024},
}

2023

NeurIPS
Autonomous Capability Assessment of Sequential Decision-Making Systems in Stochastic Settings

Pulkit Verma, Rushang Karia, and Siddharth Srivastava

In Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems, 2023

It is essential for users to understand what their AI systems can and can’t do in order to use them safely. However, the problem of enabling users to assess AI systems with evolving sequential decision making (SDM) capabilities is relatively understudied. This paper presents a new approach for modeling the capabilities of black-box AI systems that can plan and act, along with the possible effects and requirements for executing those capabilities in stochastic settings. We present an active-learning approach that can effectively interact with a black-box SDM system and learn an interpretable probabilistic model describing its capabilities. Theoretical analysis of the approach identifies the conditions under which the learning process is guaranteed to converge to the correct model of the agent; empirical evaluations on different agents and simulated scenarios show that this approach is few-shot generalizable and can effectively describe the capabilities of arbitrary black-box SDM agents in a sample-efficient manner.
@inproceedings{verma2023autonomous,
  author        = {Verma, Pulkit and Karia, Rushang and Srivastava, Siddharth},
  title         = {Autonomous Capability Assessment of Sequential Decision-Making Systems in Stochastic Settings},
  booktitle     = {Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems},
  year          = {2023},
}
Older Version(s):
Autonomous Capability Assessment of Black-Box Sequential Decision-Making Systems
Pulkit Verma, Rushang Karia, and Siddharth Srivastava.
In ICAPS 2023 Workshop on Knowledge Engineering for Planning and Scheduling, 2023
GenPlan
Learning AI-System Capabilities under Stochasticity

Pulkit Verma^*, Rushang Karia^*, Gaurav Vipat, Anmol Gupta, and Siddharth Srivastava

In NeurIPS 2023 Workshop on Generalization in Planning, 2023

Learning interpretable generalizable models of sequential decision-making agents is essential for user-driven assessment as well as for continual agent-design processes in several AI applications. Discovering an agent’s broad capabilities in terms of concepts a user understands and summarizing them for a user is a comparatively new solution approach for agent assessment. Prior work on this topic focuses on deterministic settings, or settings where the names of the agent’s capabilities are already known, or situations where the learning system has access to only passively collected data regarding the agent’s behavior. These settings result in a limited scope and/or accuracy of the learned models. This paper presents an approach for discovering a black-box sequential decision-making agent’s capabilities and interactively learning an interpretable model of the agent in stochastic settings. Our approach uses an initial set of observations to discover the agent’s capabilities and a hierarchical querying process to learn a probability distribution of the discovered stochastic capabilities. Our evaluation demonstrates that our method learns lifted SDM models with complex capabilities accurately.
@inproceedings{verma2023learning,
  author        = {Verma, Pulkit and Karia, Rushang and Vipat, Gaurav and Gupta, Anmol and Srivastava, Siddharth},
  title         = {Learning AI-System Capabilities under Stochasticity},
  booktitle     = {NeurIPS 2023 Workshop on Generalization in Planning},
  year          = {2023},
}

2022

KR
Discovering User-Interpretable Capabilities of Black-Box Planning Agents

Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava

In Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning, 2022

[Also appeared in AAAI 2022 Workshop on Explainable Agency in Artificial Intelligence, 2022]

Several approaches have been developed for answering users’ specific questions about AI behavior and for assessing their core functionality in terms of primitive executable actions. However, the problem of summarizing an AI agent’s broad capabilities for a user has received little research attention. This is aggravated by the fact that users may not know which questions to ask in order to understand the limits and capabilities of a system. This paper presents an algorithm for discovering from scratch the suite of high-level "capabilities" that an AI system with arbitrary internal planning algorithms/policies can perform. It computes conditions describing the applicability and effects of these capabilities in user-interpretable terms. Starting from a set of user-interpretable relational state properties, an AI agent, and a simulator that the agent can interact with, using arbitrary decision-making paradigms over primitive operations (unknown to the user), our algorithm returns a set of high-level capabilities with capability descriptions in the user’s relational vocabulary. Empirical evaluation on several game-based scenarios shows that this approach efficiently learns interpretable descriptions of various types of AI agents in deterministic, fully observable settings. User studies show that such interpretable descriptions are easier to understand and reason with than the agent’s primitive actions.
@inproceedings{verma2022discovering,
  author        = {Verma, Pulkit and Marpally, Shashank Rao and Srivastava, Siddharth},
  title         = {Discovering User-Interpretable Capabilities of Black-Box Planning Agents},
  booktitle     = {Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning},
  year          = {2022},
}
Older Version(s):
Learning User-Interpretable Descriptions of Black-Box AI System Capabilities
Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava.
In ICAPS 2021 Workshop on Knowledge Engineering for Planning and Scheduling, 2021
AAMAS
JEDAI: A System for Skill-Aligned Explainable Robot Planning

Naman Shah^*, Pulkit Verma^*, Trevor Angle, and Siddharth Srivastava

In Proceedings of the Twenty-First International Conference on Autonomous Agents and MultiAgent Systems (Demonstration Track), 2022

[Also appeared in ICAPS 2022 Workshop on Explainable Artificial Intelligence Planning, 2022] (Video)

🏆 Winner of Best Demo Award at AAMAS 2022.

This paper presents JEDAI, an AI system designed for outreach and educational efforts aimed at non-AI experts. JEDAI features a novel synthesis of research ideas from integrated task and motion planning and explainable AI. JEDAI helps users create high-level, intuitive plans while ensuring that they will be executable by the robot. It also provides users customized explanations about errors and helps improve their understanding of AI planning as well as the limits and capabilities of the underlying robot system.
@inproceedings{shah2022jedai,
  author        = {Naman Shah and Pulkit Verma and Trevor Angle and Siddharth Srivastava},
  title         = {{JEDAI}: {A} System for Skill-Aligned Explainable Robot Planning},
  booktitle     = {Proceedings of the Twenty-First International Conference on Autonomous Agents and MultiAgent Systems},
  year          = {2022},
}
^*Equal Contribution.
AAAI
Differential Assessment of Black-Box AI Agents

Rashmeet Kaur Nayyar^*, Pulkit Verma^*, and Siddharth Srivastava

In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

[Also appeared in AAAI 2022 Workshop on Artificial Intelligence Safety, 2022] (Video)

Much of the research on learning symbolic models of AI agents focuses on agents with stationary models. This assumption fails to hold in settings where the agent’s capabilities may change as a result of learning, adaptation, or other post-deployment modifications. Efficient assessment of agents in such settings is critical for learning the true capabilities of an AI system and for ensuring its safe usage. In this work, we propose a novel approach to differentially assess black-box AI agents that have drifted from their previously known models. As a starting point, we consider the fully observable and deterministic setting. We leverage sparse observations of the drifted agent’s current behavior and knowledge of its initial model to generate an active querying policy that selectively queries the agent and computes an updated model of its functionality. Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch. We also show that the cost of differential assessment using our method is proportional to the amount of drift in the agent’s functionality.
@inproceedings{nayyar2022differential, author = {Nayyar, Rashmeet Kaur and Verma, Pulkit and Srivastava, Siddharth}, title = {Differential Assessment of Black-Box {AI} Agents}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, year = {2022}, }
^*Equal Contribution.
EMNLP
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ Tasks

Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei^*, Anjana Arunkumar^*, Arjun Ashok^*, Arut Selvan Dhanasekaran^*, Atharva Naik^*, David Stap^*, Eshaan Pathak^*, Giannis Karamanolakis^*, Haizhi Gary Lai^*, Ishan Purohit^*, Ishani Mondal^*, Jacob Anderson^*, Kirby Kuznia^*, Krima Doshi^*, Maitreya Patel^*, Kuntal Kumar Pal^*, Mehrad Moradshahi^*, Mihir Parmar^*, Mirali Purohit^*, Neeraj Varshney^*, Phani Rohitha Kaza^*, Pulkit Verma^*, Ravsehaj Singh Puri^*, Rushang Karia^*, Shailaja Keyur Sampat^*, Savan Doshi^*, Siddharth Deepak Mishra^*, Sujan Reddy^*, Sumanta Patro^*, Tanay Dixit^*, Xudong Shen^*, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, and Daniel Khashabi

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Abstract BibTeX arXiv Publisher PDF Code Poster

How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting, and text composition. This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions: training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones.
Furthermore, we build Tk-Instruct, a transformer model trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples). Our experiments show that Tk-Instruct outperforms existing instruction-following models such as InstructGPT by over 9% on our benchmark despite being an order of magnitude smaller. We further analyze generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances per task, and model sizes. We hope our dataset and model facilitate future progress towards more general-purpose NLP models.
@inproceedings{Wang2022SuperNaturalInstructions, author = {Yizhong Wang and Swaroop Mishra and Pegah Alipoormolabashi and Yeganeh Kordi and Amirreza Mirzaei and Anjana Arunkumar and Arjun Ashok and Arut Selvan Dhanasekaran and Atharva Naik and David Stap and Eshaan Pathak and Giannis Karamanolakis and Haizhi Gary Lai and Ishan Purohit and Ishani Mondal and Jacob Anderson and Kirby Kuznia and Krima Doshi and Maitreya Patel and Kuntal Kumar Pal and Mehrad Moradshahi and Mihir Parmar and Mirali Purohit and Neeraj Varshney and Phani Rohitha Kaza and Pulkit Verma and Ravsehaj Singh Puri and Rushang Karia and Shailaja Keyur Sampat and Savan Doshi and Siddharth Deepak Mishra and Sujan Reddy and Sumanta Patro and Tanay Dixit and Xudong Shen and Chitta Baral and Yejin Choi and Noah A. Smith and Hannaneh Hajishirzi and Daniel Khashabi}, title = {{Super-NaturalInstructions}: {Generalization} via Declarative Instructions on 1600+ Tasks}, booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing}, year = {2022}, }
^*Equal Contribution.

2021

GenPlan
Learning Causal Models of Autonomous Agents using Interventions

Pulkit Verma and Siddharth Srivastava

In IJCAI 2021 Workshop on Generalization in Planning, 2021

Abstract BibTeX Publisher PDF Poster Slides

One of several obstacles to the widespread use of AI systems is the lack of interpretability requirements that would enable a layperson to ensure the safe and reliable behavior of such systems. We extend the analysis of an agent assessment module that lets an AI system execute high-level instruction sequences in simulators and answer user queries about its execution of sequences of actions. We show that such a primitive query-response capability is sufficient to efficiently derive a user-interpretable causal model of the system in stationary, fully observable, and deterministic settings. We also introduce dynamic causal decision networks (DCDNs) that capture the causal structure of STRIPS-like domains. A comparative analysis of different classes of queries is also presented in terms of the computational requirements needed to answer them and the effort required to evaluate their responses to learn the correct model.
@inproceedings{verma2021learningcausal, author = {Verma, Pulkit and Srivastava, Siddharth}, title = {Learning Causal Models of Autonomous Agents using Interventions}, booktitle = {IJCAI 2021 Workshop on Generalization in Planning}, year = {2021}, }
AAAI
Asking the Right Questions: Learning Interpretable Action Models Through Query Answering

Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava

In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Abstract BibTeX arXiv Publisher PDF Video Code Poster Slides

This paper develops a new approach for estimating an interpretable, relational model of a black-box autonomous agent that can plan and act. Our main contributions are a new paradigm for estimating such models using a minimal query interface with the agent, and a hierarchical querying algorithm that generates an interrogation policy for estimating the agent’s internal model in a vocabulary provided by the user. Empirical evaluation of our approach shows that despite the intractable search space of possible agent models, our approach allows correct and scalable estimation of interpretable agent models for a wide class of black-box autonomous agents. Our results also show that this approach can use predicate classifiers to learn interpretable models of planning agents that represent states as images.
@inproceedings{verma2021asking, author = {Verma, Pulkit and Marpally, Shashank Rao and Srivastava, Siddharth}, title = {Asking the Right Questions: Learning Interpretable Action Models Through Query Answering}, year = {2021}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, }
Older Version(s):
Asking the Right Questions: Active Action-Model Learning
Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava.
In AAAI 2021 Workshop on Explainable Agency in Artificial Intelligence, 2021
Publisher PDF Slides Video

Learning Interpretable Models for Black-Box Agents
Pulkit Verma and Siddharth Srivastava.
In ICML 2020 Workshop on Human in the Loop Learning, 2020
Publisher PDF Poster

Learning Generalized Models by Interrogating Black-Box Autonomous Agents
Pulkit Verma and Siddharth Srivastava.
In AAAI 2020 Workshop on Generalization in Planning, 2020
Publisher PDF Poster Slides

2016

ICSC
A Comparative Study of Resource Usage for Speaker Recognition Techniques

Pulkit Verma and Pradip K. Das

In Proceedings of the 2016 International Conference on Signal Processing and Communication, 2016

Abstract BibTeX Publisher PDF Slides

The resource usage of software is an important factor to consider while developing speaker recognition applications for mobile devices. Sometimes usage parameters are considered as important as the accuracy of such systems. In this work, we analyze resource utilization in terms of power consumption, memory, and space requirements of three standard speaker recognition techniques, viz. the GMM-UBM framework, Joint Factor Analysis, and i-vectors. Experiments are performed on the MIT MDSVC corpus using the Energy Measurement Library (EML). It is found that although the i-vector approach requires more storage space, it is superior to the other two approaches in terms of memory and power consumption, which are critical factors for evaluating software performance in resource-constrained mobile devices.
@inproceedings{verma2016comparative, author = {Verma, Pulkit and Das, Pradip K.}, title = {A Comparative Study of Resource Usage for Speaker Recognition Techniques}, booktitle = {Proceedings of the 2016 International Conference on Signal Processing and Communication}, pages = {314--319}, year = {2016}, publisher = {IEEE}, doi = {10.1109/ICSPCom.2016.7980598}, url = {https://doi.org/10.1109/ICSPCom.2016.7980598}, }

2015

IJST
i-Vectors in Speech Processing Applications: A Survey

Pulkit Verma and Pradip K. Das

International Journal of Speech Technology, vol. 18, no. 4, pp. 529–546, 2015

Abstract BibTeX Publisher PDF

In the domain of speech recognition, many methods have been proposed over time, such as Gaussian mixture models (GMM), GMM with universal background model (GMM-UBM framework), and joint factor analysis. i-Vector subspace modeling is one of the recent methods that has become the state-of-the-art technique in this domain. This method largely provides the benefit of modeling both the intra-domain and inter-domain variabilities into the same low-dimensional space. In this survey, we present a comprehensive collection of research work related to i-vectors since their inception. Some recent trends of using i-vectors in combination with other approaches are also discussed. The application of i-vectors in various fields of speech recognition, viz. speaker, language, and accent recognition, is also presented. This paper should serve as a good starting point for anyone interested in working with i-vectors for speech processing in general. We then conclude the paper with a brief discussion on the future of i-vectors.
@article{verma2015ivectors, author = {Verma, Pulkit and Das, Pradip K.}, title = {i-{Vectors} in Speech Processing Applications: {A Survey}}, journal = {International Journal of Speech Technology}, year = {2015}, volume = {18}, number = {4}, pages = {529--546}, publisher = {Springer Nature}, doi = {10.1007/s10772-015-9295-3}, url = {https://doi.org/10.1007/s10772-015-9295-3}, }
UIST
Investigating the “Wisdom of Crowds” at Scale

Alok Shankar Mysore, Vikas S. Yaligar, Imanol Arrieta Ibarra, Camelia Simoiu, Sharad Goel, Ramesh Arvind, Chiraag Sumanth, Arvind Srikantan, Bhargav HS, Mayank Pahadia, Tushar Dobha, Atif Ahmed, Mani Shankar, Himani Agarwal^*, Rajat Agarwal^*, Sai Anirudh-Kondaveeti^*, Shashank Arun-Gokhale^*, Aayush Attri^*, Arpita Chandra^*, Yogitha Chilukur^*, Sharath Dharmaji^*, Deepak Garg^*, Naman Gupta^*, Paras Gupta^*, Glincy Mary Jacob^*, Siddharth Jain^*, Shashank Joshi^*, Tarun Khajuria^*, Sameeksha Khillan^*, Sandeep Konam^*, Praveen Kumar-Kolla^*, Sahil Loomba^*, Rachit Madan^*, Akshansh Maharaja^*, Vidit Mathur^*, Bharat Munshi^*, Mohammed Nawazish^*, Venkata Neehar-Kurukunda^*, Venkat Nirmal-Gavarraju^*, Sonali Parashar^*, Harsh Parikh^*, Avinash Paritala^*, Amit Patil^*, Rahul Phatak^*, Mandar Pradhan^*, Abhilasha Ravichander^*, Krishna Sangeeth^*, Sreecharan Sankaranarayanan^*, Vibhor Sehgal^*, Ashrith Sheshan^*, Suprajha Shibiraj^*, Aditya Singh^*, Anjali Singh^*, Prashant Sinha^*, Pushkin Soni^*, Bipin Thomas^*, Kasyap Varma-Dattada^*, Sukanya Venkataraman^*, Pulkit Verma^*, and Ishan Yelurwar^*

In Adjunct Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, 2015

Abstract BibTeX Publisher PDF Code Poster

In a variety of problem domains, it has been observed that the aggregate opinions of groups are often more accurate than those of the constituent individuals, a phenomenon that has been termed the "wisdom of the crowd." Yet, perhaps surprisingly, there is still little consensus on how generally the phenomenon holds, how best to aggregate crowd judgements, and how social influence affects estimates. We investigate these questions by taking a meta wisdom of crowds approach. With a distributed team of over 100 student researchers across 17 institutions in the United States and India, we develop a large-scale online experiment to systematically study the wisdom of crowds effect for 1,000 different tasks in 50 subject domains. These tasks involve various types of knowledge (e.g., explicit knowledge, tacit knowledge, and prediction), question formats (e.g., multiple choice and point estimation), and inputs (e.g., text, audio, and video). To examine the effect of social influence, participants are randomly assigned to one of three different experiment conditions in which they see varying degrees of information on the responses of others. In this ongoing project, we are now preparing to recruit participants via Amazon’s Mechanical Turk.
@inproceedings{mysore2015investigating, author = {Shankar Mysore, Alok and Yaligar, Vikas S. and Arrieta Ibarra, Imanol and Simoiu, Camelia and Goel, Sharad and Arvind, Ramesh and Sumanth, Chiraag and Srikantan, Arvind and HS, Bhargav and Pahadia, Mayank and Dobha, Tushar and Ahmed, Atif and Shankar, Mani and Agarwal, Himani and Agarwal, Rajat and Anirudh-Kondaveeti, Sai and Arun-Gokhale, Shashank and Attri, Aayush and Chandra, Arpita and Chilukur, Yogitha and Dharmaji, Sharath and Garg, Deepak and Gupta, Naman and Gupta, Paras and Jacob, Glincy Mary and Jain, Siddharth and Joshi, Shashank and Khajuria, Tarun and Khillan, Sameeksha and Konam, Sandeep and Kumar-Kolla, Praveen and Loomba, Sahil and Madan, Rachit and Maharaja, Akshansh and Mathur, Vidit and Munshi, Bharat and Nawazish, Mohammed and Neehar-Kurukunda, Venkata and Nirmal-Gavarraju, Venkat and Parashar, Sonali and Parikh, Harsh and Paritala, Avinash and Patil, Amit and Phatak, Rahul and Pradhan, Mandar and Ravichander, Abhilasha and Sangeeth, Krishna and Sankaranarayanan, Sreecharan and Sehgal, Vibhor and Sheshan, Ashrith and Shibiraj, Suprajha and Singh, Aditya and Singh, Anjali and Sinha, Prashant and Soni, Pushkin and Thomas, Bipin and Varma-Dattada, Kasyap and Venkataraman, Sukanya and Verma, Pulkit and Yelurwar, Ishan}, title = {Investigating the ``{Wisdom of Crowds}'' at Scale}, booktitle = {Adjunct Proceedings of the 28th Annual ACM Symposium on User Interface Software \& Technology}, pages = {75--76}, year = {2015}, isbn = {9781450337809}, publisher = {Association for Computing Machinery}, doi = {10.1145/2815585.2815725}, url = {https://doi.org/10.1145/2815585.2815725}, }
^*Equal Contribution.
AIR
A Mobile Agents based Distributed Speech Recognition Engine for Controlling Multiple Robots

Mayank Gupta, Pulkit Verma, Tuhin Bhattacharya, and Pradip K. Das

In Proceedings of the 2015 Conference on Advances In Robotics, 2015

Abstract BibTeX Publisher PDF Slides

Interaction with a robot has been an active area of research since the inception of robotics. Talking to a robot has always been considered the most natural way to communicate with it. But it is not always possible to have a full-fledged, standalone speech processing engine present on a robot or on a single machine. A dedicated system to convert the commands from audio to text is needed. However, as the number of commands and robots increases, it becomes necessary to eliminate all single points of failure in the system. Thus, a distributed speech engine comes into the picture. Also, users may want to talk to the robot in different languages. The approach proposed in this paper is distributed, fault tolerant, and scalable, such that any new recognition algorithm or language support can be added and used without any changes to the existing system. The work has been demonstrated on a freely available mobile agents based Internet of Things platform. However, any platform can be used.
@inproceedings{gupta2015mobile, author = {Gupta, Mayank and Verma, Pulkit and Bhattacharya, Tuhin and Das, Pradip K.}, title = {A Mobile Agents based Distributed Speech Recognition Engine for Controlling Multiple Robots}, booktitle = {Proceedings of the 2015 Conference on Advances In Robotics}, pages = {1--6}, year = {2015}, isbn = {9781450333566}, publisher = {Association for Computing Machinery}, doi = {10.1145/2783449.2783477}, url = {https://doi.org/10.1145/2783449.2783477}, }

2014

IC3I
Improving Services Using Mobile Agents-based IoT in a Smart City

Pulkit Verma, Mayank Gupta, Tuhin Bhattacharya, and Pradip K. Das

In Proceedings of the 2014 International Conference on Contemporary Computing and Informatics, 2014

Abstract BibTeX Publisher PDF Slides

Modern-day devices like smartphones, tablets, and televisions possess very powerful processors and huge storage capacities compared to what was available a few years ago. Most of these devices are also connected to the Internet. However, the capabilities of these devices are not fully harnessed and thus, they are not as intelligent as they could be. These devices, together with the Internet, can be used as an “Internet of Things” where each device can be both a producer and a consumer of information. This framework is realizable in a real dynamic system if there is an intelligent distributed layer above it which can cater to services of all heterogeneous devices as required. The existing solutions to this problem are either too hardware dependent or too abstract. In this paper we present a concept of this layer using mobile agents, which makes the system flexible and dynamically adaptable. This layer has been deployed using a publicly available Prolog-based mobile agent emulator (however, any other mobile agent framework can also be used). The proposed approach is capable of dynamically updating information such as the availability and usability of services. It also has speech processing modules to provide solutions using voice-based commands and prompts. The prototype is scalable and robust to partial network failures. The implementation details and performance analysis of this work are reported and discussed. This framework can be used to deploy systems that enable people to search for services like health facilities, food services, transportation, and law and order using a common interface that includes voice commands.
@inproceedings{verma2014improving, author = {Verma, Pulkit and Gupta, Mayank and Bhattacharya, Tuhin and Das, Pradip K.}, title = {Improving Services Using Mobile Agents-based {IoT} in a Smart City}, booktitle = {2014 International Conference on Contemporary Computing and Informatics}, pages = {107--111}, year = {2014}, publisher = {IEEE}, doi = {10.1109/IC3I.2014.7019766}, url = {https://doi.org/10.1109/IC3I.2014.7019766}, }