TL;DR
Most enterprise RFPs for agentic AI in 2026 are still using checklists designed for traditional automation software. They ask about integrations, security certifications, and pricing. They don’t ask the questions that determine whether the platform will actually survive your next audit cycle, your next regulator review, or your next model upgrade.
This RFP template fixes that. It covers the 30 questions you should be asking every agentic AI vendor in 2026, organized into eight categories:
- Architecture and reasoning (5 questions)
- Audit trail and explainability (5 questions)
- Model governance and version control (4 questions)
- Human oversight and HITL design (4 questions)
- Data lineage and security (3 questions)
- Regulatory and compliance alignment (3 questions)
- Implementation and operational readiness (3 questions)
- Commercial and contractual terms (3 questions)
For each question, this template tells you what a good answer looks like, what a red-flag answer sounds like, and why the question matters under 2026 standards (COSO February 2026 guidance, PCAOB AS 2201 effective December 15, 2026, EU AI Act Article 11 enforcement beginning August 2, 2026, and CFPB Circular 2023-03 on ECOA adverse action requirements for credit-affecting decisions).
A note on the publisher. Kognitos publishes this template because it is the questionnaire we wish every prospect would send us. The questions are honest. They are designed to surface the architectural differences between agentic AI platforms, not to favor any one vendor. If you score Kognitos against the same 30 questions, we expect to do well. If a competitor scores higher on your particular use case, that is useful information. Use this template even if Kognitos is not on your shortlist.
Why agentic AI RFPs need a different template in 2026 #
The 2024-era RFP for AI vendors was a software RFP with an AI section added at the end. Three things changed in 2025–2026 that make that approach inadequate:
1. Audit trails became a procurement requirement. COSO published “Achieving Effective Internal Control Over Generative AI” on February 23, 2026, requiring that monitoring of AI-driven processes capture prompts, inputs, outputs, model and configuration versions, and evidence of human review sufficient to reconstruct what the AI acted on. PCAOB AS 2201, effective for fiscal years beginning on or after December 15, 2026, expands benchmarking provisions for automated controls but conditions reliance on the AI’s decision logic not having changed. Together, these mean RFPs must explicitly verify the vendor can produce reconstructable, attributable audit evidence. See our 2026 AI audit trail checklist for the field-level breakdown.
2. The EU AI Act moved from theory to enforcement. Full enforcement of high-risk AI provisions begins August 2, 2026 under current law. Article 11 (technical documentation), Article 12 (logs), Article 13 (transparency to deployers), Article 14 (human oversight), and Article 86 (right to explanation) all create vendor obligations that the buyer inherits if not contractually addressed. The RFP is where those obligations get caught or missed.
3. The platforms diverged architecturally. Some agentic AI platforms are deterministic (same input produces same output, with the reasoning grounded in explicit rules). Others are probabilistic (outputs vary based on model state, with reasoning emergent from model weights). The difference matters for every downstream procurement consideration, but most RFPs don’t ask about it directly. The questions below do.
The 30 questions #
Each question includes the question itself, what to look for in a good answer, and the red flag that should make you ask a follow-up.
Category 1: Architecture and reasoning (5 questions) #
Q1. Is your platform's decision logic deterministic or probabilistic? What does that mean for the same input being processed twice? #
Good answer: Deterministic. The same input produces the same output every time, grounded in explicit rules. We can demonstrate this with a sample workflow. Or: Probabilistic, with structured prompt-and-policy frameworks. Same input may produce different outputs depending on model state; here is how we control for that.
Red flag: Our AI learns from context, so each decision is unique. This is a non-answer wrapped in marketing language. Push for specifics.
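During a proof of concept, one way to test a vendor's Q1 answer is to replay the same input several times and compare the outputs. The sketch below is a minimal illustration under stated assumptions: `decide` stands in for whatever call the vendor actually exposes, and `rule_based_decide` is a hypothetical example rule, not any real platform's API.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def fingerprint(output: Dict[str, Any]) -> str:
    """Stable hash of a decision output (keys sorted so ordering does not matter)."""
    canonical = json.dumps(output, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_repeatable(decide: Callable[[Dict[str, Any]], Dict[str, Any]],
                  payload: Dict[str, Any], runs: int = 5) -> bool:
    """Return True if `decide` yields an identical output on every run."""
    seen = {fingerprint(decide(dict(payload))) for _ in range(runs)}
    return len(seen) == 1


# Hypothetical stand-in for a vendor call: an explicit rule on invoice amount.
def rule_based_decide(payload: Dict[str, Any]) -> Dict[str, Any]:
    action = "auto_approve" if payload["amount"] <= 1000 else "escalate"
    return {"action": action, "rule": "invoice_amount_threshold"}
```

A probabilistic platform can still pass a check like this if its controls work as described. The replay shows whether they do.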
Q2. In plain language, how does your platform represent the decision logic that drives an agent's actions? Show me a sample policy. #
Good answer: The vendor shows you the actual policy that runs in production, in a readable format. Plain English, a domain-specific language, or a clearly structured rule format. You should be able to understand it without engineering help.
Red flag: “The logic is in the model” or “it’s emergent from prompting.” Both mean the policy is not inspectable, which means audit walkthroughs will be difficult.
Q3. How do you handle exceptions or edge cases the AI cannot resolve on its own? #
Good answer: The vendor describes a defined exception taxonomy with documented escalation paths, plain-English explanations to human reviewers, and structured fallback behavior. Bonus points for showing the actual reviewer interface.
Red flag: “The AI escalates when confidence is low.” This is incomplete. Confidence-based escalation without structured explanations creates HITL bottlenecks (see Category 4).
Q4. Can your platform handle workflows that span multiple systems (ERP, CRM, document repositories, banking systems) in a single transaction? #
Good answer: The vendor names the systems they integrate with, describes how state is maintained across system calls, and explains how failures in one system are handled.
Red flag: “We integrate with everything via APIs.” Push for specifics. How many pre-built connectors? Which ones?
Q5. How does your platform handle multiple AI agents acting in coordination on the same workflow? #
Good answer: The vendor describes multi-agent orchestration patterns (sequential, parallel, hierarchical), how agents communicate state, and how conflicts between agents are resolved.
Red flag: “Each agent acts independently.” For complex workflows, this is a liability, not a feature.
Category 2: Audit trail and explainability (5 questions) #
Q6. For any decision your platform makes, can you produce, in plain language, the specific rule or policy that was applied? #
Good answer: Yes, with a worked example. The vendor shows the actual rule, the inputs that triggered it, and the resulting action.
Red flag: “We log the confidence score.” Confidence scores are not explanations. See our post on why “94% confident” is not an audit trail.
Q7. What fields are captured in your audit log for each AI-driven decision? #
Good answer: The vendor's audit log captures (at minimum): NTP-synced timestamp, unique decision ID, authenticated human user identity, AI system identity and version, model identity and version, inputs with source attribution, the specific rule applied, reasoning in plain language, output produced, downstream action, human review (if applicable), and tamper-evident integrity proof. This is the 12-field schema covered in our 2026 AI audit trail checklist.
Red flag: Fewer than 8 of those 12 fields. The fields most often missing are the specific rule applied, reasoning in plain language, and authenticated human user identity (not just a service account).
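If a vendor supplies a sample log record, the Q7 check can be done mechanically. The sketch below is one possible way to do it; the field names are illustrative labels for the 12 items listed above, not a schema any vendor is known to use. It assumes that a record with no human review stores an explicit value such as "not_required" instead of leaving the field empty.

```python
from typing import Any, Dict, List

# Illustrative labels for the 12 fields listed in Q7 (assumed names).
REQUIRED_FIELDS: List[str] = [
    "timestamp", "decision_id", "human_user_identity", "ai_system_version",
    "model_version", "inputs_with_source", "rule_applied", "reasoning",
    "output", "downstream_action", "human_review", "integrity_proof",
]
# The three fields Q7 singles out as the ones to watch for.
CRITICAL_FIELDS = {"rule_applied", "reasoning", "human_user_identity"}


def assess_log_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Count which required fields a sample record actually populates."""
    present = [f for f in REQUIRED_FIELDS if record.get(f) not in (None, "")]
    missing = [f for f in REQUIRED_FIELDS if f not in present]
    return {
        "present_count": len(present),
        "missing": missing,
        "missing_critical": sorted(CRITICAL_FIELDS.intersection(missing)),
        "red_flag": len(present) < 8,  # the threshold stated in Q7
    }
```

A record can clear the 8-field threshold and still lack a critical field, so `missing_critical` is worth reading even when `red_flag` is false.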
Q8. How does your platform ensure audit logs cannot be altered after the fact? #
Good answer: The vendor describes cryptographic tamper-evidence (hash chains, append-only logs, write-once storage classes) and explains how an external auditor could independently verify log integrity.
Red flag: “Our logs are stored securely.” Push for the cryptographic mechanism.
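To make "hash chain" concrete when evaluating a Q8 answer: each log entry stores the hash of the previous entry, so altering any earlier record invalidates every hash after it. The sketch below is a minimal illustration using SHA-256 from the Python standard library. The entry layout and function names are assumptions for this example, and a production system would also need write-once storage and externally anchored checkpoints.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, record: Dict[str, Any]) -> str:
    """Hash the previous entry's hash together with this record's canonical JSON."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], record: Dict[str, Any]) -> None:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev_hash,
                  "hash": _entry_hash(prev_hash, record)})


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash from the start; any edited record breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

The point to probe with the vendor is who can run the equivalent of `verify_chain`. If only the vendor can, the auditor is still taking integrity on trust.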
Q9. If our external auditor picks a specific transaction from six months ago and asks how the AI handled it, can your platform reconstruct the entire decision path? #
Good answer: Yes, with a defined retention period (at least 7 years for SOX-relevant systems, 6 years for HIPAA, 6 months for EU AI Act high-risk). The vendor describes how the reconstruction works and what specific information would be available.
Red flag: “We log everything, but reconstruction would require a support ticket.” This is not audit-ready.
Q10. How long are audit logs retained, and what controls protect them during retention? #
Good answer: The vendor's retention defaults align with the longest applicable regulatory floor (typically 7 years). Controls include access logging, encryption, and tamper-evidence as standard.
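Red flag: Retention defaults of 90 days or 1 year. A 90-day default falls below the SOX, HIPAA, and EU AI Act floors cited in Q9; a 1-year default clears the EU AI Act's six-month floor but still fails SOX and HIPAA.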
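The retention comparison in Q9 and Q10 is simple arithmetic, and it helps to write it down before reading vendor responses. The sketch below uses the floors quoted in Q9 (7 years for SOX-relevant systems, 6 years for HIPAA, 6 months for EU AI Act high-risk) converted to approximate day counts. The regime labels are assumptions for this example, and the figures should be confirmed with counsel for your situation.

```python
from typing import Dict, Iterable, List

# Floors in days, from the figures quoted in Q9 (approximate conversions).
RETENTION_FLOOR_DAYS: Dict[str, int] = {
    "SOX": 7 * 365,
    "HIPAA": 6 * 365,
    "EU_AI_ACT_HIGH_RISK": 183,  # roughly six months
}


def required_retention_days(regimes: Iterable[str]) -> int:
    """Longest floor among the regimes that apply to the buyer."""
    return max((RETENTION_FLOOR_DAYS[r] for r in regimes), default=0)


def failing_regimes(vendor_default_days: int, regimes: Iterable[str]) -> List[str]:
    """Regimes whose floor exceeds the vendor's default retention."""
    return sorted(r for r in regimes
                  if vendor_default_days < RETENTION_FLOOR_DAYS[r])
```

With these figures, a 90-day default falls below all three floors, while a one-year default clears only the six-month EU AI Act figure.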
Category 3: Model governance and version control (4 questions) #
Q11. What AI models does your platform use, and how are model versions managed? #
Good answer: The vendor names the specific models (e.g., Claude Sonnet 4.5, GPT-4o, Gemini 2.5), describes how model versions are pinned per workflow, and explains how model upgrades are tested before promotion to production.
Red flag: “We use the latest version of the model.” This will reopen every operating effectiveness conclusion under PCAOB AS 2201 every time the provider updates the model.
Q12. If the underlying model is upgraded by the provider, what is your change management process? #
Good answer: The vendor describes explicit model-version events with their own change records, regression testing against documented behavior baselines, and a controlled promotion path from staging to production.
Red flag: “Model upgrades happen transparently.” This means you cannot demonstrate to your auditor that the decision logic has not changed.
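Version pinning (Q11 and Q12) implies that someone can compare the version a workflow is pinned to with the version actually serving it, and that any mismatch becomes a change record. The sketch below illustrates that comparison. The workflow names, version strings, and `ModelChange` record are hypothetical, and how a platform reports the observed version is vendor-specific.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelChange:
    """A mismatch between the pinned and the observed model version."""
    workflow: str
    pinned: str
    observed: str


def detect_unpinned_changes(pinned: Dict[str, str],
                            observed: Dict[str, str]) -> List[ModelChange]:
    """Compare each workflow's pinned version with the version serving it."""
    changes = []
    for workflow, pinned_version in sorted(pinned.items()):
        actual = observed.get(workflow, "<unknown>")
        if actual != pinned_version:
            changes.append(ModelChange(workflow, pinned_version, actual))
    return changes
```

An empty result is the evidence an auditor wants: the decision logic's underlying model did not change without a recorded event.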
Q13. How does your platform detect if the AI is starting to behave differently over time (drift)? #
Good answer: The vendor describes continuous monitoring of decision distributions, exception rates, escalation rates, and a defined set of canary transactions whose behavior should not change. Alerting on drift outside defined tolerances is automatic.
Red flag: “We monitor performance metrics.” Push for specifics on what is monitored and how alerts are triggered.
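The two drift signals named in Q13, rate monitoring and canary transactions, can each be expressed in a few lines, which makes it easier to ask a vendor exactly what they compare and against what. The sketch below is illustrative only; the metric names, the 0.05 tolerance, and the canary IDs are assumptions, not recommended values.

```python
from typing import Dict, List


def rate_drift(baseline: Dict[str, float], current: Dict[str, float],
               tolerance: float = 0.05) -> List[str]:
    """Names of monitored rates whose absolute change exceeds `tolerance`."""
    return sorted(name for name, base in baseline.items()
                  if abs(current.get(name, 0.0) - base) > tolerance)


def canary_failures(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """IDs of canary transactions whose outcome no longer matches the baseline."""
    return sorted(tx for tx, outcome in expected.items()
                  if actual.get(tx) != outcome)
```

For example, if the escalation rate moves from 0.10 to 0.22 while the exception rate moves from 0.04 to 0.05, only the escalation rate is outside a 0.05 tolerance. Either function returning a non-empty list is what should trigger the automatic alert.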
Q14. Can we provide our own AI model (BYOM) or are we locked into your vendor stack? #
Good answer: The vendor supports BYOM (Bring Your Own Model) with specific models named as supported (e.g., Claude, GPT, Gemini, open-source models). They explain the trade-offs for each.
Red flag: “We use our proprietary model.” This is not automatically disqualifying, but it concentrates risk and locks you in.
Category 4: Human oversight and HITL design (4 questions) #
Q15. How does your platform support different levels of human oversight (auto-approve, async review, synchronous approval) based on decision risk? #
Good answer: The vendor describes a tiered HITL architecture native to the platform: low-risk decisions auto-approved with sampling audit, medium-risk decisions allowed to proceed with asynchronous human review, high-risk decisions blocked pending synchronous approval. See our post on HITL as a bottleneck.
Red flag: “Our platform supports human-in-the-loop.” Without tiering, this is a recipe for HITL theater.
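The three tiers in Q15 amount to a routing rule. The sketch below shows one way such a rule could look, assuming each decision carries a risk score between 0 and 1 derived from explicit policy (for example, amount and decision type) and not from model confidence. The score itself and the 0.3 and 0.7 thresholds are placeholders for this example.

```python
from enum import Enum


class Oversight(Enum):
    AUTO_APPROVE_WITH_SAMPLING = "auto_approve_with_sampling"
    ASYNC_REVIEW = "async_review"
    SYNC_APPROVAL = "sync_approval"


def route(risk_score: float, medium_at: float = 0.3,
          high_at: float = 0.7) -> Oversight:
    """Map a risk score in [0, 1] to an oversight tier (thresholds are placeholders)."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= high_at:
        return Oversight.SYNC_APPROVAL      # blocked pending human approval
    if risk_score >= medium_at:
        return Oversight.ASYNC_REVIEW       # proceeds, reviewed afterwards
    return Oversight.AUTO_APPROVE_WITH_SAMPLING
```

When a vendor says tiering is native, ask where the equivalent of these thresholds lives and who is allowed to change it.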
Q16. When a human reviews an AI decision, what does the reviewer see? #
Good answer: The reviewer sees the specific rule the AI applied, the inputs the AI used, why the AI escalated the decision (or chose to act), and the most likely resolution paths. The interface is designed for 10–30 second decisions on routine reviews.
Red flag: “The reviewer sees the AI’s recommendation and a confidence score.” This produces rubber-stamping, not oversight.
Q17. How is the human reviewer's identity, decision, and review time captured in the audit log? #
Good answer: Every review event is logged with the reviewer's authenticated identity, timestamp, the explanation shown to them, the time they spent, the decision they made, and any comment they added. The reviewer event is part of the decision audit trail, not a separate system.
Red flag: “The reviewer’s username is captured.” Push for the full event log.
Q18. How does your platform support compliance with EU AI Act Article 14 (human oversight)? #
Good answer: The vendor describes specific Article 14 alignment: meaningful human capacity to verify and override AI decisions, training requirements for oversight personnel, clear interfaces that surface what the AI is doing, and the audit trail of oversight events.
Red flag: “Our platform is EU AI Act compliant” without specifics. The Act is complex; vendors who say this in one sentence usually have not done the work.
Category 5: Data lineage and security (3 questions) #
Q19. For any decision your platform makes, can you tell me exactly what data the AI accessed, from which systems, with what authorization? #
Good answer: Yes, with a worked example. The vendor shows the data lineage in the audit log, including which user's authorization was used for each system call.
Red flag: “We log data access at the system level.” Insufficient. ISACA’s May 2026 framework requires data lineage at the decision level.
Q20. How does your platform handle access control for AI agents themselves (not just human users)? #
Good answer: The vendor describes role-based access for agents, with explicit permissions per system, scoped to the minimum necessary for each task. Agent provisioning, deprovisioning, and quarterly access reviews are documented.
Red flag: “Agents inherit user permissions.” This can be appropriate, but only if the audit trail captures both the agent identity and the human user whose session triggered the agent’s access.
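"Scoped to the minimum necessary" in Q20 is checkable: compare what an agent has been granted with what its tasks require, and flag the difference at each access review. The sketch below illustrates the comparison. The agent ID, systems, and actions are hypothetical, and real platforms will model permissions in their own way.

```python
from typing import Dict, Set, Tuple

Permission = Tuple[str, str]  # (system, action)

# Hypothetical grants and task requirements, for illustration only.
AGENT_GRANTS: Dict[str, Set[Permission]] = {
    "invoice-agent": {("erp", "read_invoice"), ("erp", "post_payment"),
                      ("crm", "read_account")},
}
TASK_NEEDS: Dict[str, Set[Permission]] = {
    "invoice-agent": {("erp", "read_invoice"), ("crm", "read_account")},
}


def is_allowed(agent_id: str, system: str, action: str) -> bool:
    """Deny by default: an agent may act only on an explicit grant."""
    return (system, action) in AGENT_GRANTS.get(agent_id, set())


def excess_grants(agent_id: str) -> Set[Permission]:
    """Grants the agent holds but its documented tasks do not need."""
    return AGENT_GRANTS.get(agent_id, set()) - TASK_NEEDS.get(agent_id, set())
```

Here the example agent can post payments although its documented tasks only read data, which is the kind of finding a quarterly access review should surface.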
Q21. What security certifications does your platform hold? #
Good answer: SOC 2 Type II as a minimum. ISO 27001, GDPR alignment, HIPAA, and PCI DSS where applicable. For 2026, ISO/IEC 42001 alignment work in progress is a positive signal.
Red flag: “SOC 2 Type I” or no certification. Type II is the production standard.
Category 6: Regulatory and compliance alignment (3 questions) #
Q22. How does your platform support SOX compliance for AI-touched financial reporting controls? #
Good answer: The vendor maps their architecture to specific SOX requirements: ICFR scope mapping, walkthrough support, design effectiveness testing, operating effectiveness testing, ITGC alignment, and AS 2201 expanded benchmarking support. See our post on what SOX auditors ask about AI.
Red flag: “We are SOC 2 certified.” SOC 2 is not SOX. Different frameworks. Push for SOX-specific support.
Q23. For credit-affecting decisions (loan approvals, credit limits, vendor credit terms), how does your platform support ECOA “specific principal reasons” requirements? #
Good answer: The vendor describes how the AI's reasoning produces specific principal reasons (not just confidence scores) suitable for ECOA adverse action notices, in alignment with CFPB Circular 2023-03.
Red flag: “Our credit decisions include an explanation.” Push for the specific format and whether the explanation is generated post-hoc or comes from the AI’s actual reasoning.
Q24. How does your platform support GDPR Article 22 right to explanation for automated decisions affecting EU persons? #
Good answer: The vendor describes how meaningful information about the logic of the decision is captured, in a form a data subject can understand, with the right to human review of the automated decision.
Red flag: “Our platform is GDPR-compliant.” This has the same problem as the one-sentence EU AI Act answer in Q18: a compliance claim with no specifics behind it.
Category 7: Implementation and operational readiness (3 questions) #
Q25. What does a typical implementation look like, and how long does it take? #
Good answer: The vendor describes a specific implementation methodology with named phases, typical durations per phase, customer responsibilities, and vendor responsibilities. They provide reference customers from similar industries.
Red flag: “Most customers go live in 30 days.” Possible for narrow use cases, suspicious for enterprise-wide deployments.
Q26. Who writes the workflow logic? Engineers, business users, or both? #
Good answer: The vendor describes who is best suited to author and modify workflows. For platforms with English-language or no-code authoring, business users (with vendor support) write the logic. For developer-focused platforms, engineers do.
Red flag: A vague answer. Who actually writes the logic determines who you need to staff for operational success.
Q27. How do you support customers post-deployment? #
Good answer: The vendor describes a tiered support model with response times, dedicated solutions architect involvement, ongoing optimization workshops, and a customer community or knowledge base.
Red flag: “We have a customer success team.” Push for specifics: response SLAs, escalation paths, dedicated vs shared.
Category 8: Commercial and contractual terms (3 questions) #
Q28. How is pricing structured? Per workflow, per agent, per decision, per seat, or per transaction? #
Good answer: The vendor explains the pricing model clearly, provides a sample calculation for a hypothetical customer matching the buyer's profile, and discusses how pricing scales as usage grows.
Red flag: “Custom pricing based on your needs.” Get them to commit to a specific framework, even if the final number is custom.
Q29. What are your contractual commitments around model governance, audit trail completeness, and incident notification? #
Good answer: The vendor commits in the contract to model version pinning per agreement (no silent upgrades), audit trail completeness per the 12-field schema, incident notification within a defined SLA (typically 24–72 hours), and AIBOM delivery on every material change. See our post on AIBOM in procurement.
Red flag: “All of this is covered in our standard agreement.” Ask to see the specific clauses. If they cannot point to them, the commitments do not exist in writing.
Q30. If we terminate the contract, what happens to our workflow logic, data, and audit history? #
Good answer: The vendor commits to data export in a documented format, workflow logic export (where the logic itself is portable, as with English-as-code platforms), and audit history retention through the regulatory floor even after termination.
Red flag: “You can export your data.” Insufficient. Workflow logic and audit history are separate questions and most contracts do not address them by default.
How to score the responses #
A 30-question RFP needs a scoring framework. The simplest workable approach:
Score each question on a 0–3 scale
- 0: No answer, evasive answer, or answer that reveals a structural gap
- 1: Partial answer or answer that requires significant follow-up
- 2: Solid answer with reasonable evidence
- 3: Strong answer with verifiable evidence and reference customer corroboration
Weight the categories by your priorities
A finance team will weight Categories 2 (audit trail), 6 (regulatory), and 4 (HITL) more heavily. An engineering team may weight Categories 1 (architecture) and 3 (model governance) more heavily.
Set a minimum threshold per category
A vendor scoring well overall but below roughly half the available points in Category 2 (8 of 15) or Category 6 (5 of 9) should not be in the final round if your use case is audit-sensitive. Total scores can hide categorical weaknesses.
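The scale, weights, and per-category floor described above can be combined in a short script. The sketch below is one possible implementation, not a prescribed method. Because categories have different numbers of questions (Category 2 has a 15-point maximum, Category 6 a 9-point maximum), it expresses each category as a fraction of available points so that floors and weights are comparable. The function names and the sample scores are assumptions for this example.

```python
from typing import Dict, List

# Number of questions per category, from the template's outline.
QUESTIONS_PER_CATEGORY: Dict[int, int] = {1: 5, 2: 5, 3: 4, 4: 4, 5: 3, 6: 3, 7: 3, 8: 3}
MAX_PER_QUESTION = 3  # the 0-3 scale


def category_fraction(scores: Dict[int, List[int]]) -> Dict[int, float]:
    """Share of available points earned in each category (0.0 to 1.0)."""
    result = {}
    for cat, n in QUESTIONS_PER_CATEGORY.items():
        vals = scores[cat]
        if len(vals) != n or any(not 0 <= v <= MAX_PER_QUESTION for v in vals):
            raise ValueError(f"category {cat}: expected {n} scores in 0..{MAX_PER_QUESTION}")
        result[cat] = sum(vals) / (n * MAX_PER_QUESTION)
    return result


def weighted_total(fractions: Dict[int, float], weights: Dict[int, float]) -> float:
    """Weighted average of category fractions; missing weights default to 1."""
    total_w = sum(weights.get(c, 1.0) for c in fractions)
    return sum(f * weights.get(c, 1.0) for c, f in fractions.items()) / total_w


def below_floor(fractions: Dict[int, float], floors: Dict[int, float]) -> List[int]:
    """Categories where the vendor falls under the buyer's minimum fraction."""
    return sorted(c for c, floor in floors.items() if fractions[c] < floor)


# Hypothetical vendor: strong everywhere except audit trail (Category 2).
sample = {1: [3, 3, 3, 3, 3], 2: [1, 1, 2, 1, 2], 3: [3, 3, 3, 3], 4: [3, 3, 3, 3],
          5: [3, 3, 3], 6: [3, 2, 3], 7: [3, 3, 3], 8: [3, 3, 3]}
fractions = category_fraction(sample)
flagged = below_floor(fractions, {2: 8 / 15, 6: 8 / 15})
```

In this sample the unweighted total is above 90% of available points, yet Category 2 sits at 7 of 15 and is flagged. That is the case the per-category threshold exists to catch.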
Validate with reference customers
Hold two reference calls per shortlisted vendor, with questions calibrated to the categories where the vendor's RFP score was strongest. If their best category does not check out in references, the rest of their scores are suspect.
Five red-flag patterns that should make you walk away #
Across hundreds of agentic AI vendor evaluations in 2026, the following patterns predict procurement regret:
1. The vendor cannot show you the actual policy that runs in production. If the answer to Q2 is “the logic is in the model” or “it’s emergent from prompting,” your auditors will not be able to verify the control.
2. The vendor logs confidence scores in place of explanations. Q6 and Q16 surface this. A platform that confuses confidence with reasoning is not audit-ready. See why “94% confident” is not an audit trail.
3. The vendor cannot pin model versions. Q11 and Q12. If the model upgrades silently, every PCAOB AS 2201 operating effectiveness conclusion is reopened.
4. The vendor’s HITL is undifferentiated. Q15. Without tiering by risk, HITL becomes a bottleneck and a source of audit-trail theater. See the hidden cost of human in the loop.
5. The vendor’s contractual commitments are vague. Q29 and Q30. Marketing promises do not bind. If the audit-trail and model-governance commitments are not in the contract, they do not exist.
If three or more of these patterns show up in the responses, the platform is not built for 2026 enterprise procurement. Do not advance the vendor regardless of demo quality or pricing.
How Kognitos answers these 30 questions #
We publish this template because it is the questionnaire we want every prospect to ask. For full transparency, here is a high-level view of how Kognitos answers each category:
- Architecture and reasoning (Category 1): Deterministic neurosymbolic architecture. Workflow logic is written in plain English (English-as-code) and is the same English an auditor reads in walkthroughs. Sample policies available on request.
- Audit trail and explainability (Category 2): Every decision logged with the 12-field schema, in plain English, with tamper-evident integrity proofs.
- Model governance and version control (Category 3): Model versions pinned per automation. Upgrades are explicit events with change records.
- Human oversight and HITL design (Category 4): Tiered HITL native to the platform with documented decision-authority matrices.
- Data lineage and security (Category 5): Decision-level data lineage. SOC 2 Type II, ISO 27001, HIPAA, GDPR aligned. ISO/IEC 42001 work in progress. See our Trust & Security portal.
- Regulatory and compliance alignment (Category 6): Mapped to SOX, COSO February 2026 guidance, PCAOB AS 2201, ECOA, GDPR Article 22, and EU AI Act Articles 11, 13, and 14 by design. See what your SOX auditor will ask about your AI automation.
- Implementation and operational readiness (Category 7): Collaborative implementation with solutions architects. Business users author workflows in English with our support.
- Commercial and contractual terms (Category 8): Standard contractual commitments around model governance, audit trail completeness, incident notification, and AIBOM delivery.
We do not expect every buyer to choose Kognitos. We do expect that buyers asking these 30 questions will end up with platforms architected for 2026 audit standards rather than platforms still designed for 2022 use cases.
Sources & citations #
The regulatory references, standards, and frameworks behind the 30 questions:
Regulatory and standards sources
- COSO — “Achieving Effective Internal Control Over Generative AI” (February 23, 2026).
- PCAOB AS 2201, “An Audit of Internal Control Over Financial Reporting That Is Integrated with An Audit of Financial Statements” (expanded benchmarking effective December 15, 2026).
- EU AI Act, Article 11 — Technical Documentation.
- EU AI Act, Article 12 — Record-keeping (logs).
- EU AI Act, Article 13 — Transparency to deployers.
- EU AI Act, Article 14 — Human oversight.
- EU AI Act, Article 86 — Right to explanation.
- GDPR Article 22 — Automated individual decision-making.
- CFPB Circular 2023-03 — Adverse action notification requirements.
- NIST AI Risk Management Framework (AI RMF 1.0).
- ISO/IEC 42001:2023 — AI management systems.
Last updated: May 26, 2026. This article is intended for informational purposes and does not constitute legal, audit, or procurement advice. RFP design depends on specific organizational, regulatory, and operational contexts. Engage qualified counsel and procurement specialists for guidance specific to your situation.
