TL;DR

Most enterprise RFPs for agentic AI in 2026 are still using checklists designed for traditional automation software. They ask about integrations, security certifications, and pricing. They don’t ask the questions that determine whether the platform will actually survive your next audit cycle, your next regulator review, or your next model upgrade.

This RFP template fixes that. It covers the 30 questions you should be asking every agentic AI vendor in 2026, organized into eight categories:

  • Architecture and reasoning (5 questions)
  • Audit trail and explainability (5 questions)
  • Model governance and version control (4 questions)
  • Human oversight and HITL design (4 questions)
  • Data lineage and security (3 questions)
  • Regulatory and compliance alignment (3 questions)
  • Implementation and operational readiness (3 questions)
  • Commercial and contractual terms (3 questions)

For each question, this template tells you what a good answer looks like, what a red-flag answer sounds like, and why the question matters under 2026 standards (COSO February 2026 guidance, PCAOB AS 2201 effective December 15, 2026, EU AI Act Article 11 enforcement beginning August 2, 2026, and CFPB Circular 2023-03 on ECOA adverse action notices for credit-affecting decisions).

A note on the publisher. Kognitos publishes this template because it is the questionnaire we wish every prospect would send us. The questions are honest. They are designed to surface the architectural differences between agentic AI platforms, not to favor any one vendor. If you score Kognitos against the same 30 questions, we expect to do well. If a competitor scores higher on your particular use case, that is useful information. Use this template even if Kognitos is not on your shortlist.

Why agentic AI RFPs need a different template in 2026

The 2024-era RFP for AI vendors was a software RFP with an AI section added at the end. Three things changed in 2025–2026 that make that approach inadequate:

1. Audit trails became a procurement requirement. COSO published “Achieving Effective Internal Control Over Generative AI” on February 23, 2026, requiring that monitoring of AI-driven processes capture prompts, inputs, outputs, model and configuration versions, and evidence of human review sufficient to reconstruct what the AI acted on. PCAOB AS 2201, effective for fiscal years beginning on or after December 15, 2026, expands benchmarking provisions for automated controls but conditions reliance on the AI’s decision logic not having changed. Together, these mean RFPs must explicitly verify the vendor can produce reconstructable, attributable audit evidence. See our 2026 AI audit trail checklist for the field-level breakdown.

2. The EU AI Act moved from theory to enforcement. Full enforcement of high-risk AI provisions begins August 2, 2026 under current law. Article 11 (technical documentation), Article 12 (logs), Article 13 (transparency to deployers), Article 14 (human oversight), and Article 86 (right to explanation) all create vendor obligations that the buyer inherits if not contractually addressed. The RFP is where those obligations get caught or missed.

3. The platforms diverged architecturally. Some agentic AI platforms are deterministic (same input produces same output, with the reasoning grounded in explicit rules). Others are probabilistic (outputs vary based on model state, with reasoning emergent from model weights). The difference matters for every downstream procurement consideration, but most RFPs don’t ask about it directly. The questions below do.

The 30 questions

Each question includes the question itself, what to look for in a good answer, and the red flag that should make you ask a follow-up.

Category 1: Architecture and reasoning (5 questions)

Q1. Is your platform's decision logic deterministic or probabilistic? What does that mean for the same input being processed twice?

Good answer: “Deterministic. The same input produces the same output every time, grounded in explicit rules. We can demonstrate this with a sample workflow.” Or: “Probabilistic, with structured prompt-and-policy frameworks. The same input may produce different outputs depending on model state; here is how we control for that.”

Red flag: “Our AI learns from context, so each decision is unique.” This is a non-answer wrapped in marketing language. Push for specifics.
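
During a proof of concept, one way to test the vendor's answer to Q1 is to replay the same input several times and compare the outputs. The Python sketch below is a minimal illustration under stated assumptions: `decide` stands in for whatever call the vendor's platform exposes, and `rule_based_decide` is a hypothetical toy workflow, not any vendor's API.

```python
import json
from typing import Any, Callable, Dict


def is_repeatable(decide: Callable[[Dict[str, Any]], Dict[str, Any]],
                  payload: Dict[str, Any], runs: int = 5) -> bool:
    """Return True if `decide` yields identical output on every replay."""
    # Serialize with sorted keys so dict ordering cannot cause false mismatches.
    # Each call gets its own copy of the payload so one run cannot affect the next.
    outputs = {json.dumps(decide(dict(payload)), sort_keys=True) for _ in range(runs)}
    return len(outputs) == 1


def rule_based_decide(invoice: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical explicit rule: approve invoices at or under a fixed limit."""
    approved = invoice["amount"] <= 10_000
    return {"action": "approve" if approved else "escalate",
            "rule": "amount <= 10000"}
```

A probabilistic platform will not necessarily fail this check on every input, so it is worth running it across a representative sample of transactions rather than one.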

Q2. In plain language, how does your platform represent the decision logic that drives an agent's actions? Show me a sample policy.

Good answer: The vendor shows you the actual policy that runs in production, in a readable format. Plain English, a domain-specific language, or a clearly structured rule format. You should be able to understand it without engineering help.

Red flag: “The logic is in the model” or “it’s emergent from prompting.” Both mean the policy is not inspectable, which means audit walkthroughs will be difficult.

Q3. How do you handle exceptions or edge cases the AI cannot resolve on its own?

Good answer: The vendor describes a defined exception taxonomy with documented escalation paths, plain-English explanations to human reviewers, and structured fallback behavior. Bonus points for showing the actual reviewer interface.

Red flag: “The AI escalates when confidence is low.” This is incomplete. Confidence-based escalation without structured explanations creates HITL bottlenecks (see Category 4).

Q4. Can your platform handle workflows that span multiple systems (ERP, CRM, document repositories, banking systems) in a single transaction?

Good answer: The vendor names the systems they integrate with, describes how state is maintained across system calls, and explains how failures in one system are handled.

Red flag: “We integrate with everything via APIs.” Push for specifics. How many pre-built connectors? Which ones?

Q5. How does your platform handle multiple AI agents acting in coordination on the same workflow?

Good answer: The vendor describes multi-agent orchestration patterns (sequential, parallel, hierarchical), how agents communicate state, and how conflicts between agents are resolved.

Red flag: “Each agent acts independently.” For complex workflows, this is a liability, not a feature.

Category 2: Audit trail and explainability (5 questions)

Q6. For any decision your platform makes, can you produce, in plain language, the specific rule or policy that was applied?

Good answer: Yes, with a worked example. The vendor shows the actual rule, the inputs that triggered it, and the resulting action.

Red flag: “We log the confidence score.” Confidence scores are not explanations. See our post on why “94% confident” is not an audit trail.

Q7. What fields are captured in your audit log for each AI-driven decision?

Good answer: The vendor's audit log captures (at minimum): NTP-synced timestamp, unique decision ID, authenticated human user identity, AI system identity and version, model identity and version, inputs with source attribution, the specific rule applied, reasoning in plain language, output produced, downstream action, human review (if applicable), and tamper-evident integrity proof. This is the 12-field schema covered in our 2026 AI audit trail checklist.

Red flag: Anything less than 8 of those fields. Particularly missing: the specific rule applied, reasoning in plain language, and authenticated human user identity (not just service account).
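
To compare a vendor's sample log entry against the 12 fields listed above, it can help to write the schema down as a structure and count what is missing. The following Python sketch is illustrative only: the field names are assumptions chosen to mirror the list in Q7, not any vendor's actual log format.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DecisionAuditRecord:
    """One audit-log entry per AI-driven decision (field names are illustrative)."""
    timestamp_utc: Optional[str] = None          # NTP-synced timestamp
    decision_id: Optional[str] = None            # unique decision ID
    human_user_id: Optional[str] = None          # authenticated human user identity
    ai_system_version: Optional[str] = None      # AI system identity and version
    model_version: Optional[str] = None          # model identity and version
    inputs_with_sources: Optional[dict] = None   # inputs with source attribution
    rule_applied: Optional[str] = None           # the specific rule applied
    reasoning: Optional[str] = None              # reasoning in plain language
    output: Optional[str] = None                 # output produced
    downstream_action: Optional[str] = None      # downstream action
    human_review: Optional[dict] = None          # human review (may be empty if not applicable)
    integrity_proof: Optional[str] = None        # tamper-evident integrity proof


def missing_fields(record: DecisionAuditRecord) -> List[str]:
    """List the schema fields that the sample record leaves empty."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def is_red_flag(record: DecisionAuditRecord) -> bool:
    """Apply the Q7 red-flag rule: fewer than 8 of the 12 fields populated."""
    populated = len(fields(record)) - len(missing_fields(record))
    return populated < 8
```

Because human review only applies to some decisions, an empty `human_review` on an auto-approved transaction is not by itself a gap; the check is most useful on a decision that was escalated.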

Q8. How does your platform ensure audit logs cannot be altered after the fact?

Good answer: The vendor describes cryptographic tamper-evidence (hash chains, append-only logs, write-once storage classes) and explains how an external auditor could independently verify log integrity.

Red flag: “Our logs are stored securely.” Push for the cryptographic mechanism.
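
For readers who want to know what "hash chain" means in practice, here is a minimal Python sketch using the standard library's SHA-256. It is a simplified illustration of the general technique, not a description of how any particular vendor implements tamper-evidence; the entry format and function names are assumptions.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, entry: Dict) -> str:
    """Hash the previous link's hash together with this entry's content."""
    body = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append_entry(chain: List[Dict], entry: Dict) -> None:
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"entry": entry, "prev_hash": prev_hash,
                  "hash": _digest(prev_hash, entry)})


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every hash; an edited, reordered, or mid-chain deleted entry fails."""
    prev_hash = GENESIS
    for link in chain:
        if link["prev_hash"] != prev_hash or link["hash"] != _digest(prev_hash, link["entry"]):
            return False
        prev_hash = link["hash"]
    return True
```

A chain like this cannot detect entries trimmed from the end unless the most recent hash is also recorded somewhere the vendor cannot rewrite, which is a reasonable follow-up to ask about when the vendor explains independent verification.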

Q9. If our external auditor picks a specific transaction from six months ago and asks how the AI handled it, can your platform reconstruct the entire decision path?

Good answer: Yes, with a defined retention period (at least 7 years for SOX-relevant systems, 6 years for HIPAA, 6 months for EU AI Act high-risk systems). The vendor describes how the reconstruction works and what specific information would be available.

Red flag: “We log everything, but reconstruction would require a support ticket.” This is not audit-ready.

Q10. How long are audit logs retained, and what controls protect them during retention?

Good answer: The vendor's retention defaults align with the longest applicable regulatory floor (typically 7 years). Controls include access logging, encryption, and tamper-evidence as standard.

Red flag: Retention defaults of 90 days or 1 year. Both fall short of the SOX and HIPAA floors, and 90 days also falls short of the EU AI Act's six-month minimum.
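
The "longest applicable regulatory floor" is a simple maximum over the regimes that apply to you. The Python sketch below uses only the retention figures quoted in Q9 (7 years for SOX, 6 years for HIPAA, 6 months for EU AI Act high-risk systems); the dictionary keys and function names are illustrative assumptions, and your counsel should confirm which floors actually apply.

```python
from typing import Dict, Iterable, List

# Retention floors in months, taken from the figures quoted in Q9.
RETENTION_FLOORS_MONTHS: Dict[str, int] = {
    "SOX": 7 * 12,
    "HIPAA": 6 * 12,
    "EU_AI_ACT_HIGH_RISK": 6,
}


def required_retention_months(applicable: Iterable[str]) -> int:
    """The longest floor among the regimes that apply to the buyer."""
    return max(RETENTION_FLOORS_MONTHS[regime] for regime in applicable)


def shortfalls(vendor_default_months: int, applicable: Iterable[str]) -> List[str]:
    """Regimes whose floor the vendor's default retention does not meet."""
    return sorted(regime for regime in applicable
                  if vendor_default_months < RETENTION_FLOORS_MONTHS[regime])
```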

Category 3: Model governance and version control (4 questions)

Q11. What AI models does your platform use, and how are model versions managed?

Good answer: The vendor names the specific models (e.g., Claude Sonnet 4.5, GPT-4o, Gemini 2.5), describes how model versions are pinned per workflow, and explains how model upgrades are tested before promotion to production.

Red flag: “We use the latest version of the model.” This will reopen every operating effectiveness conclusion under PCAOB AS 2201 every time the provider updates the model.

Q12. If the underlying model is upgraded by the provider, what is your change management process?

Good answer: The vendor describes explicit model-version events with their own change records, regression testing against documented behavior baselines, and a controlled promotion path from staging to production.

Red flag: “Model upgrades happen transparently.” This means you cannot demonstrate to your auditor that the decision logic has not changed.
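
The combination of version pinning (Q11) and a controlled promotion path (Q12) can be pictured as a gate: a candidate model replaces the pinned one only if it reproduces a documented behavior baseline, and every attempt leaves a change record. The Python sketch below is a toy illustration; the registry, the version strings, and the `run` callable are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pin registry: workflow name -> exact model version identifier.
PINNED_MODELS: Dict[str, str] = {"invoice_approval": "model-x-2026-01-15"}

Baseline = List[Tuple[dict, str]]  # (input, expected action) pairs


def regression_failures(run: Callable[[str, dict], str], candidate: str,
                        baseline: Baseline) -> List[dict]:
    """Return the baseline inputs whose action changes under the candidate model."""
    return [inp for inp, expected in baseline if run(candidate, inp) != expected]


def promote(workflow: str, candidate: str, run: Callable[[str, dict], str],
            baseline: Baseline, change_log: List[dict]) -> bool:
    """Re-pin the workflow only if the candidate reproduces every baseline case."""
    failures = regression_failures(run, candidate, baseline)
    # Every attempt is recorded, whether or not the promotion goes through.
    change_log.append({"workflow": workflow, "from": PINNED_MODELS[workflow],
                       "to": candidate, "failures": len(failures),
                       "promoted": not failures})
    if not failures:
        PINNED_MODELS[workflow] = candidate
    return not failures
```

When a vendor describes their process, the things to listen for are the same three pieces: an explicit pin, a baseline that is actually executed, and a change record that exists even for rejected upgrades.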

Q13. How does your platform detect if the AI is starting to behave differently over time (drift)?

Good answer: The vendor describes continuous monitoring of decision distributions, exception rates, escalation rates, and a defined set of canary transactions whose behavior should not change. Alerting on drift outside defined tolerances is automatic.

Red flag: “We monitor performance metrics.” Push for specifics on what is monitored and how alerts are triggered.
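
As a rough picture of what "alerting on drift outside defined tolerances" could look like, the Python sketch below checks monitored rates against a baseline and compares canary transactions against their recorded outcomes. The metric names, tolerances, and alert strings are assumptions for illustration; real monitoring would typically use statistical tests over larger windows.

```python
from typing import Dict, List


def rate_drift(baseline_rate: float, current_rate: float, tolerance: float) -> bool:
    """True if a monitored rate moved more than `tolerance` (absolute) from baseline."""
    return abs(current_rate - baseline_rate) > tolerance


def changed_canaries(expected: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """IDs of canary transactions whose outcome differs from the recorded baseline."""
    return sorted(cid for cid, action in expected.items() if observed.get(cid) != action)


def drift_alerts(baseline: Dict[str, float], current: Dict[str, float],
                 tolerances: Dict[str, float],
                 expected: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Collect one alert per drifting rate and one per changed canary."""
    alerts = [f"{name} drifted" for name in tolerances
              if rate_drift(baseline[name], current[name], tolerances[name])]
    alerts += [f"canary {cid} changed" for cid in changed_canaries(expected, observed)]
    return alerts
```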

Q14. Can we provide our own AI model (BYOM) or are we locked into your vendor stack?

Good answer: The vendor supports BYOM (Bring Your Own Model) with specific models named as supported (e.g., Claude, GPT, Gemini, open-source models). They explain the trade-offs for each.

Red flag: “We use our proprietary model.” This is not automatically disqualifying, but it concentrates risk and locks you in.

Category 4: Human oversight and HITL design (4 questions)

Q15. How does your platform support different levels of human oversight (auto-approve, async review, synchronous approval) based on decision risk?

Good answer: The vendor describes a tiered HITL architecture native to the platform: low-risk decisions auto-approved with sampling audit, medium-risk decisions allowed to proceed with asynchronous human review, high-risk decisions blocked pending synchronous approval. See our post on HITL as a bottleneck.

Red flag: “Our platform supports human-in-the-loop.” Without tiering, this is a recipe for HITL theater.
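
The three tiers described in Q15 amount to a routing rule keyed on decision risk. The Python sketch below shows one way to express it; the tier labels and the choice to treat unclassified decisions as high-risk are assumptions made for the example, not a requirement stated by any standard.

```python
from enum import Enum


class Oversight(Enum):
    AUTO_APPROVE_SAMPLED = "auto-approve with sampling audit"
    ASYNC_REVIEW = "proceed with asynchronous human review"
    SYNC_APPROVAL = "block pending synchronous approval"


_TIERS = {
    "low": Oversight.AUTO_APPROVE_SAMPLED,
    "medium": Oversight.ASYNC_REVIEW,
    "high": Oversight.SYNC_APPROVAL,
}


def oversight_for(risk_tier: str) -> Oversight:
    """Map a risk tier to its oversight mode."""
    # Assumption: unknown tiers fail closed to the strictest mode.
    return _TIERS.get(risk_tier.strip().lower(), Oversight.SYNC_APPROVAL)


def may_execute(risk_tier: str, approved_by_human: bool = False) -> bool:
    """Only decisions routed to synchronous approval wait for a human before acting."""
    return oversight_for(risk_tier) is not Oversight.SYNC_APPROVAL or approved_by_human
```

In a demo, ask the vendor to show where this mapping lives in their platform and who is allowed to change it.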

Q16. When a human reviews an AI decision, what does the reviewer see?

Good answer: The reviewer sees the specific rule the AI applied, the inputs the AI used, why the AI escalated the decision (or chose to act), and the most likely resolution paths. The interface is designed for 10–30 second decisions on routine reviews.

Red flag: “The reviewer sees the AI’s recommendation and a confidence score.” This produces rubber-stamping, not oversight.

Q17. How is the human reviewer's identity, decision, and review time captured in the audit log?

Good answer: Every review event is logged with the reviewer's authenticated identity, timestamp, the explanation shown to them, the time they spent, the decision they made, and any comment they added. The reviewer event is part of the decision audit trail, not a separate system.

Red flag: “The reviewer’s username is captured.” Push for the full event log.

Q18. How does your platform support compliance with EU AI Act Article 14 (human oversight)?

Good answer: The vendor describes specific Article 14 alignment: meaningful human capacity to verify and override AI decisions, training requirements for oversight personnel, clear interfaces that surface what the AI is doing, and the audit trail of oversight events.

Red flag: “Our platform is EU AI Act compliant” without specifics. The Act is complex; vendors who say this in one sentence usually have not done the work.

Category 5: Data lineage and security (3 questions)

Q19. For any decision your platform makes, can you tell me exactly what data the AI accessed, from which systems, with what authorization?

Good answer: Yes, with a worked example. The vendor shows the data lineage in the audit log, including which user's authorization was used for each system call.

Red flag: “We log data access at the system level.” Insufficient. ISACA’s May 2026 framework requires data lineage at the decision level.

Q20. How does your platform handle access control for AI agents themselves (not just human users)?

Good answer: The vendor describes role-based access for agents, with explicit permissions per system, scoped to the minimum necessary for each task. Agent provisioning, deprovisioning, and quarterly access reviews are documented.

Red flag: “Agents inherit user permissions.” This can be appropriate, but only if the audit trail captures both the agent identity and the human user whose session triggered the agent’s access.
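
The point in Q20 about capturing both the agent identity and the triggering human can be made concrete with a small Python sketch. The grant table, agent name, and event fields below are hypothetical; the idea is simply that every system call is checked against explicit per-agent permissions and logged with both identities, whether it was allowed or denied.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical grants: agent identity -> allowed (system, operation) pairs.
AGENT_GRANTS: Dict[str, FrozenSet[Tuple[str, str]]] = {
    "ap-invoice-agent": frozenset({("erp", "read_invoice"), ("erp", "post_payment")}),
}


@dataclass(frozen=True)
class AccessEvent:
    agent_id: str       # which agent made the call
    on_behalf_of: str   # the human user whose session triggered it
    system: str
    operation: str
    allowed: bool


def authorize(agent_id: str, on_behalf_of: str, system: str, operation: str,
              log: List[AccessEvent]) -> bool:
    """Allow only explicitly granted operations and log every attempt."""
    allowed = (system, operation) in AGENT_GRANTS.get(agent_id, frozenset())
    log.append(AccessEvent(agent_id, on_behalf_of, system, operation, allowed))
    return allowed
```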

Q21. What security certifications does your platform hold?

Good answer: SOC 2 Type II as a minimum. ISO 27001, GDPR alignment, HIPAA, and PCI DSS where applicable. For 2026, ISO/IEC 42001 alignment work in progress is a positive signal.

Red flag: “SOC 2 Type I” or no certification. Type II is the production standard.

Category 6: Regulatory and compliance alignment (3 questions)

Q22. How does your platform support SOX compliance for AI-touched financial reporting controls?

Good answer: The vendor maps their architecture to specific SOX requirements: ICFR scope mapping, walkthrough support, design effectiveness testing, operating effectiveness testing, ITGC alignment, and AS 2201 expanded benchmarking support. See our post on what SOX auditors ask about AI.

Red flag: “We are SOC 2 certified.” SOC 2 is not SOX. Different frameworks. Push for SOX-specific support.

Q23. For credit-affecting decisions (loan approvals, credit limits, vendor credit terms), how does your platform support ECOA “specific principal reasons” requirements?

Good answer: The vendor describes how the AI's reasoning produces specific principal reasons (not just confidence scores) suitable for ECOA adverse action notices, in alignment with CFPB Circular 2023-03.

Red flag: “Our credit decisions include an explanation.” Push for the specific format and whether the explanation is generated post-hoc or comes from the AI’s actual reasoning.

Q24. How does your platform support GDPR Article 22 right to explanation for automated decisions affecting EU persons?

Good answer: The vendor describes how meaningful information about the logic of the decision is captured, in a form a data subject can understand, with the right to human review of the automated decision.

Red flag: “Our platform is GDPR-compliant.” Same answer-pattern problem as the EU AI Act question.

Category 7: Implementation and operational readiness (3 questions)

Q25. What does a typical implementation look like, and how long does it take?

Good answer: The vendor describes a specific implementation methodology with named phases, typical durations per phase, customer responsibilities, and vendor responsibilities. They provide reference customers from similar industries.

Red flag: “Most customers go live in 30 days.” Possible for narrow use cases, suspicious for enterprise-wide deployments.

Q26. Who writes the workflow logic? Engineers, business users, or both?

Good answer: The vendor describes who is best suited to author and modify workflows. For platforms with English-language or no-code authoring, business users (with vendor support) write the logic. For developer-focused platforms, engineers do.

Red flag: Vagueness on this. Knowing who actually writes the logic determines who you need to staff for operational success.

Q27. How do you support customers post-deployment?

Good answer: The vendor describes a tiered support model with response times, dedicated solutions architect involvement, ongoing optimization workshops, and a customer community or knowledge base.

Red flag: “We have a customer success team.” Push for specifics: response SLAs, escalation paths, dedicated vs shared.

Category 8: Commercial and contractual terms (3 questions)

Q28. How is pricing structured? Per workflow, per agent, per decision, per seat, or per transaction?

Good answer: The vendor explains the pricing model clearly, provides a sample calculation for a hypothetical customer matching the buyer's profile, and discusses how pricing scales as usage grows.

Red flag: “Custom pricing based on your needs.” Get them to commit to a specific framework, even if the final number is custom.

Q29. What are your contractual commitments around model governance, audit trail completeness, and incident notification?

Good answer: The vendor commits in the contract to model version pinning per agreement (no silent upgrades), audit trail completeness per the 12-field schema, incident notification within a defined SLA (typically 24–72 hours), and AIBOM delivery on every material change. See our post on AIBOM in procurement.

Red flag: “All of this is covered in our standard agreement.” Ask to see the specific clauses. If they cannot point to them, the commitments do not exist in writing.

Q30. If we terminate the contract, what happens to our workflow logic, data, and audit history?

Good answer: The vendor commits to data export in a documented format, workflow logic export (where the logic itself is portable, as with English-as-code platforms), and audit history retention through the regulatory floor even after termination.

Red flag: “You can export your data.” Insufficient. Workflow logic and audit history are separate questions and most contracts do not address them by default.

How to score the responses

A 30-question RFP needs a scoring framework. The simplest workable approach:

Score each question on a 0–3 scale

  • 0: No answer, evasive answer, or answer that reveals a structural gap
  • 1: Partial answer or answer that requires significant follow-up
  • 2: Solid answer with reasonable evidence
  • 3: Strong answer with verifiable evidence and reference customer corroboration

Weight the categories by your priorities

A finance team will weight Categories 2 (audit trail), 6 (regulatory), and 4 (HITL) more heavily. An engineering team may weight Categories 1 (architecture) and 3 (model governance) more heavily.

Set a minimum threshold per category

A vendor scoring well overall but below roughly half the available points on Category 2 (8 of 15) or Category 6 (5 of 9) should not be in the final round if your use case is audit-sensitive. Total scores can hide categorical weaknesses.
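
The scoring steps above (a 0–3 score per question, category weights, and a per-category minimum) fit in a few lines of Python. This is a sketch under assumptions: the weights and minimums are placeholders you would set yourself, and minimums are expressed as a fraction of available points because the categories contain different numbers of questions.

```python
from typing import Dict, List

MAX_PER_QUESTION = 3  # top of the 0-3 scale


def category_fraction(question_scores: List[int]) -> float:
    """Share of available points earned in one category."""
    if any(not 0 <= s <= MAX_PER_QUESTION for s in question_scores):
        raise ValueError("scores must be on the 0-3 scale")
    return sum(question_scores) / (MAX_PER_QUESTION * len(question_scores))


def weighted_total(scores: Dict[int, List[int]], weights: Dict[int, float]) -> float:
    """Sum of category points, each multiplied by its weight (default weight 1.0)."""
    return sum(weights.get(cat, 1.0) * sum(qs) for cat, qs in scores.items())


def failed_categories(scores: Dict[int, List[int]],
                      minimums: Dict[int, float]) -> List[int]:
    """Categories where the vendor falls below the buyer's minimum fraction."""
    return sorted(cat for cat, floor in minimums.items()
                  if category_fraction(scores[cat]) < floor)
```

For example, a vendor scoring [3, 2, 1, 1, 0] on Category 2 earns 7 of 15 points and would fail a 50% minimum there even with a strong weighted total.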

Validate with reference customers

Make two reference calls per shortlisted vendor, with questions specifically calibrated to the categories where their RFP score was strongest. If their best category does not check out in references, the rest of their scores are suspect.

Five red-flag patterns that should make you walk away

Across hundreds of agentic AI vendor evaluations in 2026, the following patterns predict procurement regret:

1. The vendor cannot show you the actual policy that runs in production. If the answer to Q2 is “the logic is in the model” or “it’s emergent from prompting,” your auditors will not be able to verify the control.

2. The vendor logs confidence scores in place of explanations. Q6 and Q16 surface this. A platform that confuses confidence for reasoning is not audit-ready. See why “94% confident” is not an audit trail.

3. The vendor cannot pin model versions. Q11 and Q12. If the model upgrades silently, every PCAOB AS 2201 operating effectiveness conclusion is reopened.

4. The vendor’s HITL is undifferentiated. Q15. Without tiering by risk, HITL becomes a bottleneck and a source of audit-trail theater. See the hidden cost of human in the loop.

5. The vendor’s contractual commitments are vague. Q29 and Q30. Marketing promises do not bind. If the audit-trail and model-governance commitments are not in the contract, they do not exist.

If three or more of these patterns show up in the responses, the platform is not built for 2026 enterprise procurement. Do not advance the vendor regardless of demo quality or pricing.

How Kognitos answers these 30 questions

We publish this template because it is the questionnaire we want every prospect to ask. For full transparency, here is a high-level view of how Kognitos answers each category:

  • Architecture and reasoning (Category 1): Deterministic neurosymbolic architecture. Workflow logic is written in plain English (English-as-code) and is the same English an auditor reads in walkthroughs. Sample policies available on request.
  • Audit trail and explainability (Category 2): Every decision logged with the 12-field schema, in plain English, with tamper-evident integrity proofs.
  • Model governance and version control (Category 3): Model versions pinned per automation. Upgrades are explicit events with change records.
  • Human oversight and HITL design (Category 4): Tiered HITL native to the platform with documented decision-authority matrices.
  • Data lineage and security (Category 5): Decision-level data lineage. SOC 2 Type II, ISO 27001, HIPAA, GDPR aligned. ISO/IEC 42001 work in progress. See our Trust & Security portal.
  • Regulatory and compliance alignment (Category 6): Mapped to SOX, COSO February 2026 guidance, PCAOB AS 2201, ECOA, GDPR Article 22, and EU AI Act Articles 11, 13, and 14 by design. See what your SOX auditor will ask about your AI automation.
  • Implementation and operational readiness (Category 7): Collaborative implementation with solutions architects. Business users author workflows in English with our support.
  • Commercial and contractual terms (Category 8): Standard contractual commitments around model governance, audit trail completeness, incident notification, and AIBOM delivery.

We do not expect every buyer to choose Kognitos. We do expect that buyers asking these 30 questions will end up with platforms architected for 2026 audit standards rather than platforms still designed for 2022 use cases.


Last updated: May 26, 2026. This article is intended for informational purposes and does not constitute legal, audit, or procurement advice. RFP design depends on specific organizational, regulatory, and operational contexts. Engage qualified counsel and procurement specialists for guidance specific to your situation.

Frequently asked questions

A 2026 agentic AI RFP should cover eight categories: architecture and reasoning, audit trail and explainability, model governance and version control, human oversight and HITL design, data lineage and security, regulatory and compliance alignment, implementation and operational readiness, and commercial and contractual terms. Within each category, the questions that matter most surface architectural distinctions (deterministic vs probabilistic reasoning, plain-language explanations vs confidence scores, tiered vs uniform HITL) and contractual commitments (model version pinning, audit trail completeness, incident notification SLAs). Generic AI checklists from 2023-2024 do not cover these adequately; the 30-question template in this article does.
No formal standard for agentic AI RFPs exists as of 2026, though several governance bodies are converging on similar evaluation criteria. NIST AI Risk Management Framework, ISO/IEC 42001, COSO's February 2026 generative AI guidance, and the EU AI Act Article 11 documentation requirements all imply specific questions a buyer should ask, but none of them publishes a procurement-ready RFP template. The 30-question template in this article synthesizes these requirements into a usable format. Industry groups like ISACA and Gartner have published partial frameworks; combining them with vendor-specific questions produces a complete RFP.
To check whether a vendor's audit trail is ready for 2026 audit standards, run these four tests during evaluation. First, ask the vendor to produce the audit trail for a specific decision and check whether it includes the specific rule applied (not just the output and confidence score). Second, ask how the vendor demonstrates that the audit log has not been altered (tamper-evidence). Third, ask the vendor to reconstruct a decision from six months ago, end to end. Fourth, show a sample audit trail to your external auditor before signing the contract and ask whether it would satisfy a walkthrough under your control environment. If the vendor fails any of these four tests, the audit trail is not 2026-ready.
Under the EU AI Act, providers of high-risk AI systems must produce technical documentation under Article 11 with field-level requirements in Annex IV (effective August 2, 2026 under current law), support automatic logging under Article 12 and retain those logs for at least six months under Article 19, provide transparency to deployers under Article 13, ensure human oversight under Article 14, and support the right to explanation under Article 86. Buyers should ask vendors to demonstrate alignment with each of these articles, not just claim “EU AI Act compliance” generically. Article 86's right to explanation is particularly important because it requires the explanation to be clear and meaningful to the affected person, which is harder for probabilistic AI systems to provide than for deterministic ones.
A well-structured agentic AI RFP cycle runs 8-12 weeks from RFP issuance to vendor selection. A typical breakdown is two weeks for the vendor to respond, two weeks for buyer review and scoring, two weeks for vendor demos focused on the strongest responses, and two weeks for reference checks and contract negotiation, plus two weeks of buffer. Shorter cycles (4-6 weeks) typically miss audit-trail and contractual issues that surface later. Longer cycles (16+ weeks) usually indicate either a complex multi-stakeholder buying committee or a vendor evaluation that has lost focus.
Of the 30 questions in this template, five matter most because they predict procurement regret. Q2 (show me the actual policy that runs in production) surfaces whether the platform's logic is inspectable. Q7 (what fields are captured in your audit log) surfaces whether the platform is audit-ready. Q11 (how are model versions managed) surfaces whether silent model upgrades will reopen audit conclusions. Q15 (how is HITL tiered by risk) surfaces whether human oversight will scale or collapse. Q29 (what are your contractual commitments) surfaces whether the vendor's promises are binding. Any vendor that cannot answer these five strongly is unlikely to satisfy 2026 enterprise procurement standards.
Yes, you can use this template to evaluate vendors other than Kognitos, and we encourage it. The template is designed to surface architectural and operational realities that apply to any agentic AI vendor in 2026. Some questions are easier to answer for deterministic, English-as-code platforms (like Kognitos), but the questions themselves are not biased toward any single vendor. If a different vendor scores higher than Kognitos on your specific use case, that is useful procurement information. Use the template freely.
A generic AI RFP focuses on AI features (model accuracy, supported use cases, integrations) and treats AI as a feature added to a traditional software product. An agentic AI RFP focuses on AI as the platform: the decision-making capacity, the reasoning architecture, the audit trail design, the human oversight model, and the contractual commitments that bind the vendor to deliver predictable behavior over time. The shift matters because agentic AI takes actions, not just makes recommendations. RFPs for action-taking systems need to verify the action-taking is governed, explainable, and auditable in ways that recommendation-only AI did not require.
BYOM (Bring Your Own Model) support is worth requiring, particularly for enterprises with existing AI investments or regulatory sensitivities about model provenance. BYOM matters for three reasons. First, it reduces vendor lock-in: if you have already committed to Claude, GPT, or Gemini at the enterprise level, BYOM lets you preserve that investment. Second, it gives you control over model version pinning at the model level, not just the platform level. Third, it supports compliance scenarios where specific models are approved for specific data types (HIPAA, classified, jurisdiction-specific). Platforms with strong BYOM support are typically more architecturally mature than proprietary-only platforms.
Run two reference calls per shortlisted vendor, structured around the categories where their RFP response scored strongest. Ask each reference customer five questions: what specific use case did the platform solve, what was the timeline from contract to first production workflow, what surprised you (positively and negatively) during implementation, how does the platform handle exceptions in your specific workflow, and would you choose this vendor again. Avoid questions the reference cannot answer concretely (“how is their AI”). Ask questions that surface operational reality (“what is your current touchless rate, and what was it before”). The references the vendor offers are pre-screened to be positive; the operational questions surface the truth anyway.
Five protections are now standard in 2026 agentic AI contracts. First, model version pinning: the vendor commits not to silently upgrade the underlying model without your change-management process. Second, audit trail completeness: the vendor commits to the 12-field minimum schema for all decisions. Third, incident notification: the vendor commits to a defined SLA (24-72 hours) for notifying you of vulnerabilities, license issues, or material incidents. Fourth, AIBOM delivery: the vendor delivers an AI Bill of Materials on initial deployment and on every material change. Fifth, data and logic portability on termination: the vendor commits to delivering your workflow logic, data, and audit history in a usable format if the contract ends. RFPs that surface these protections during evaluation produce contracts that survive procurement review later.
Kognitos