A proposed standard that replaces story points with a model that defines how AI should execute, review, and deploy work.
Task: fix bug in payment calculation
C: 3 | I: 2 | R: 3 | K: 3
→ requires human review + controlled deployment
AI writes the code. CIRK governs the execution.
Story points ask
"How long will this take?"
CIRK asks
"How should this run?"
The shift
But AI changed the bottleneck. What matters now is not just how long work takes. It is how much context the work needs, how many iterations it takes to converge, how much review it demands, and how risky it is to deploy.
CIRK models that reality.
Story points governed human delivery for 20 years. CIRK is designed to govern AI delivery.
Time is a side effect of the vector — not a dimension of it.
The model
Each task is scored from 1 to 3 across four dimensions. The vector defines how it runs.
Context (C): how much context does the AI need?

| Level | Description |
| --- | --- |
| C1 | Isolated logic, local change, minimal context |
| C2 | Multiple components, shared patterns, moderate system understanding |
| C3 | Cross-system behavior, architectural reasoning, or domain-critical context |
Iteration (I): how many iterations until it works?

| Level | Description |
| --- | --- |
| I1 | Deterministic or near one-pass execution |
| I2 | Some iteration expected, usually 2–3 cycles |
| I3 | Ambiguous, exploratory, or high-complexity convergence |
Review (R): how much human review is required?

| Level | Description |
| --- | --- |
| R1 | Quick validation or spot-check |
| R2 | Rule-level or functional review required |
| R3 | Deep architectural, security, or mission-critical review |
Integration Risk (K): how dangerous is deploying this?

| Level | Description |
| --- | --- |
| K1 | Safe, additive, low-risk rollout |
| K2 | Existing behavior changes, moderate release caution |
| K3 | Coordinated rollout, migration, or cross-team dependency |
Composite score = C + I + R + K
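The model above can be sketched as a small value type. This assumes a Python implementation; the class and field names are illustrative, not part of the proposal:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CirkVector:
    """A CIRK score: each dimension ranges from 1 to 3."""
    c: int  # Context depth
    i: int  # Iteration cycles expected
    r: int  # Review intensity
    k: int  # Rollout (integration) risk

    def __post_init__(self):
        for name, value in (("C", self.c), ("I", self.i),
                            ("R", self.r), ("K", self.k)):
            if value not in (1, 2, 3):
                raise ValueError(f"{name} must be 1-3, got {value}")

    @property
    def composite(self) -> int:
        # Composite score = C + I + R + K (ranges from 4 to 12)
        return self.c + self.i + self.r + self.k

# The payment-bug example from the top of the page: C3 I2 R3 K3
assert CirkVector(c=3, i=2, r=3, k=3).composite == 11
```

Keeping the vector frozen matters: the point of CIRK is that the four numbers travel with the task, unchanged, from scoring to execution.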
"Score the execution reality, not the political preference."
What CIRK is not
CIRK is not a framework, not a tool, and not a time estimator. It is a proposed execution standard for AI-native development.
CIRK in 60 seconds
1. Pick a task. Any unit of work: a feature, a fix, a refactor, a migration.
2. Score it. Assign C, I, R, K values from 1 to 3. Ask: how much context? how many cycles? how much review? how risky is rollout?
3. Map it. The vector maps to an execution mode: autonomous, guided, draft-first, supervised, or blocked. Agents and humans follow the same rules.
4. Speak it. Teams gain a shared language: "This is high R." "Low K, we can ship." "I3 — let the agent draft first."
Rule of thumb
A composite of 4–5 can run autonomously, 6–9 needs review before merge, and 10+ calls for controlled, step-by-step execution.
Execution mapping
The vector defines what happens — not just how big the task is.
| Mode | Behavior |
| --- | --- |
| Autonomous | Agent executes without intervention. Auto-approval allowed. No checkpoints. |
| Guided | Agent executes. Human review required before merge. |
| Draft-first | Agent produces a draft. Human validates before any commit. |
| Supervised | Step-by-step execution. Approval per step. Deploy runbook required. |
| Blocked | Task must be decomposed. Execution not allowed. |
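One possible way to encode this mapping in code, as a sketch. The score bands follow the rule of thumb (4–5 automate, 6–9 review, 10+ control); treating I3 as a draft-first trigger and a malformed vector as blocked are assumptions of this sketch, not rules the proposal fixes:

```python
def execution_mode(c: int, i: int, r: int, k: int) -> str:
    """Map a CIRK vector to an execution mode.

    Bands follow the rule of thumb; the I3 draft-first override
    is based on the "I3 - let the agent draft first" guidance.
    """
    if not all(v in (1, 2, 3) for v in (c, i, r, k)):
        return "blocked"      # malformed vector: decompose and rescore
    total = c + i + r + k
    if total >= 10:
        return "supervised"   # step-by-step, approval per step
    if i == 3:
        return "draft-first"  # agent drafts, human validates pre-commit
    if total >= 6:
        return "guided"       # agent executes, review before merge
    return "autonomous"       # no checkpoints, auto-approval allowed

assert execution_mode(1, 1, 1, 1) == "autonomous"
assert execution_mode(3, 2, 3, 3) == "supervised"  # the payment-bug example
```

The precedence (supervised before draft-first before guided) is a design choice: the riskiest constraint on a task wins.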
Policy rules
R3 requires human review before merge. K3 requires a controlled rollout. I3 favors letting the agent draft first. A composite of 10+ runs supervised, step by step.
Quick start
1. Pick a task. Any task your team or agent is about to execute.
2. Score it. C(1–3) I(1–3) R(1–3) K(1–3).
3. Apply the policy. Sum the vector: 4–5 automate, 6–9 require review, 10+ control execution.
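The three policy bands (4–5 automate, 6–9 require review, 10+ control execution) can be sketched as a single function; the function name is illustrative:

```python
def apply_policy(total: int) -> str:
    """Map a composite CIRK score to its policy band.

    Bands: 4-5 automate, 6-9 require review, 10+ control execution.
    """
    if not 4 <= total <= 12:
        raise ValueError("composite must be between 4 and 12")
    if total <= 5:
        return "automate"
    if total <= 9:
        return "require review"
    return "control execution"

# The payment-bug example: C3 + I2 + R3 + K3 = 11
assert apply_policy(3 + 2 + 3 + 3) == "control execution"
```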
Examples
Each vector maps to concrete execution behavior, with reasoning for each dimension.
Backend: Fix null check in cart total; Login API; Billing webhook change.
Frontend: Update button color token; Dashboard layout revision; Auth UI redesign (login + recovery flows).
Agent: Rename internal variable and update references; Refactor repository layer for shared patterns; Migrate payment gateway provider.
Open standard
CIRK is a proposed open standard for execution governance in AI-native software development. We are looking for teams willing to test it.
It can be adopted in issue trackers, coding agents, pull request workflows, internal platforms, or governance layers such as Orbit618.
FAQ
Does CIRK replace story points?
Yes, in AI-assisted development contexts where execution governance matters more than effort estimation. Story points estimate human effort; CIRK estimates execution conditions for AI-native workflows.
Does CIRK estimate time?
Not directly. CIRK is about execution complexity, review intensity, and rollout risk. Teams may later derive time insights from calibration data, but duration is not the primary output.
Why is time not a dimension?
Time obscures the real constraint. CIRK makes it explicit.
Duration is a byproduct of complexity, iteration depth, review burden, and rollout sensitivity — not an independent variable. Adding time to the model would conflate cause and effect.
Teams that need execution windows can derive them from the vector: high R means longer review cycles, high K means wider rollout windows, high I means more iteration rounds. The vector already encodes the information — time just reads it.
Does CIRK apply when humans write the code?
Yes. Even when humans perform the implementation, CIRK still helps classify review burden, context depth, and deployment sensitivity.
Is CIRK tied to Orbit618?
No. Orbit618 is one possible implementation environment for CIRK, but CIRK is designed as a standalone proposed open standard.
Why move away from story points at all?
Because effort is no longer the most important variable for AI-assisted execution. CIRK makes the real constraints explicit instead of treating them as side rules layered on top of an effort model.
What matters more, the vector or the composite score?
The vector, not just the score. C3 I1 R1 K3 and C1 I3 R3 K1 may have the same sum but require very different execution policies. One is deploy-sensitive; the other is review-sensitive.
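The difference can be made concrete with a small helper that flags the dominant dimensions. The helper and its labels are illustrative only, not part of the proposal:

```python
def sensitivity(c: int, i: int, r: int, k: int) -> list[str]:
    """Flag which dimensions dominate a vector (illustrative helper)."""
    flags = []
    if k == 3:
        flags.append("deploy-sensitive")
    if r == 3:
        flags.append("review-sensitive")
    if i == 3:
        flags.append("iteration-heavy")
    if c == 3:
        flags.append("context-heavy")
    return flags

# Same composite (8), very different execution policies:
a = (3, 1, 1, 3)  # C3 I1 R1 K3
b = (1, 3, 3, 1)  # C1 I3 R3 K1
assert sum(a) == sum(b) == 8
assert sensitivity(*a) == ["deploy-sensitive", "context-heavy"]
assert sensitivity(*b) == ["review-sensitive", "iteration-heavy"]
```

This is why the composite score alone is a summary, not a policy: two tasks with identical sums can demand opposite safeguards.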
What exactly is CIRK?
CIRK is a proposed standard — not a framework, not a tool. It defines a shared language for execution governance. Any team, tool, or platform can implement it. MIT licensed. We are looking for teams willing to test it.
How do I get started?
Pick a task. Score it across C, I, R, K (1–3 each). Sum the vector. Apply the policy: 4–5 automate, 6–9 require review, 10+ control execution. That's it.
Does CIRK fit into existing agile processes?
Yes. CIRK replaces the estimation layer (story points), not the process. Sprints, standups, and backlogs stay the same. The difference is that each task carries an execution policy instead of an effort guess.
Open questions
These are tensions we are still debating. If you have answers or counterexamples, we want to hear them.
What happens to roadmap predictability without story points?
CIRK governs execution, not timelines. Teams that depend on velocity charts and burndowns may lose a planning signal. Is execution governance enough, or does a complementary time layer still matter?
Does CIRK cover governance beyond engineering?
CIRK models Context, Iteration, Review, and Integration Risk for software tasks. But execution governance in production systems also involves compliance, reversibility, and audit trails. Should CIRK expand or stay scoped to engineering?
How does CIRK coexist with Scrum, Kanban, and SAFe?
CIRK replaces the estimation layer, not the process. But in practice, story points are deeply embedded in sprint planning, capacity allocation, and stakeholder reporting. Can CIRK slot in without disrupting those flows?
Have a perspective? Join the discussion on GitHub →
MIT licensed. No dependencies. Works with anything. We want to see where it fails.