OpenAI (PII Redaction Guardrails)

Description

This use case shows how to govern LLM usage with decision-grade controls before any prompt leaves your network.

An internal KnowledgeOps agent may call OpenAI only when guardrails around capabilities, trust, model allow-lists, data classification, PII, region, and budget are satisfied.

Iron Book issues a one-shot, purpose-bound token and evaluates an open policy (Rego/OPA) per call: permitted calls proceed to OpenAI, denied calls return explicit reasons, and both are fully auditable via Iron Book's Audit Log.

The script runs two scenarios:

  • Allow: internal, no PII → model permitted → within budget → region US → OpenAI call proceeds.
  • Deny: PII detected (email, SSN) → classification “internal” → blocked with reason.

Extend with a redaction scenario by, for example, setting purpose="redaction" and adjusting the policy to allow PII only for redaction workflows.
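
As a minimal sketch of that extension (assuming you keep the rest of the demo unchanged and keep passing purpose in the decision context, as the script below already does), you could swap the single PII condition in POLICY_CONTENT for a small helper rule that tolerates PII only when the declared purpose is redaction:

# Hypothetical sketch: inside POLICY_CONTENT below, replace the line
#   input.context.pii_detected != true
# with a reference to `pii_ok`, and add these helper rules to the policy.
REDACTION_PII_RULES = """
pii_ok if { input.context.pii_detected != true }
pii_ok if { input.context.purpose == "redaction" }
"""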


Business Problem & Value

Enterprises love the speed of LLMs but face real risks:

  • PII/PHI exposure: prompts can accidentally leak regulated data.
  • Model governance: teams quietly switch to unapproved models.
  • Runaway spend: token costs are hard to cap at the call site.
  • Auditability: approvals and denials need defensible, consistent, compliant evidence.

Traditional IAM can grant access to an API, but not to a specific model under specific business constraints per request. Iron Book closes that gap.


Why Iron Book vs. "Just More OAuth Scopes"?

OAuth proves an app can reach OpenAI; it doesn’t prove that this specific agent, with this capability, under this business context, and within budget, should be allowed to call this model right now. Iron Book adds:

  1. Agent-level least privilege (capabilities).
  2. Per-call, one-shot authorization with open, explainable policy.
  3. Behavioral trust + context (classification, PII, region, spend) in the same decision.
  4. Unified audit across agents, apps, and clouds.

Iron Book Components

  • Verifiable agent identity (DID/VC) + capability descriptors (e.g., openai_infer).
  • One-shot tokens bound to audience, action, resource, nonce, and expiry.
  • Behavior/context-aware policy (Rego/OPA) that evaluates every call.
  • Decision-grade audit: who/what acted, inputs summarized, policy version, reason, etc.
  • Vendor-neutral: use with OpenAI today; add Bedrock, AOAI, Vertex with the same pattern.

High-Value Controls Enforced

The policy in this example allows an OpenAI call only when:

  1. Capability: agent declares openai_infer.
  2. Trust threshold: input.trust >= 60 (behaviorally adjustable; comes from Iron Book’s trust engine).
  3. Model allow-list: model ∈ {gpt-4o-mini, gpt-4o, o4-mini}.
  4. Classification: prompt is public | internal | deidentified.
  5. PII rule: deny if PII is detected (the demo uses a simple local detector).
  6. Region: "US" in the demo (extend for your data residency).
  7. Budget: estimated cost (¢) ≤ remaining daily budget (¢).

You can expand this with purpose-of-use, quiet hours, project codes, model version pins, tenant, business unit, approval ticket IDs, etc., all as simple policy inputs.
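
For orientation, this is roughly the shape of the input document those conditions evaluate. The shape is inferred from the Rego conditions themselves; Iron Book assembles the real document from the token, the agent record, and the context you pass, so treat the envelope as an assumption and the context block as the part you control:

# Illustrative policy input (values chosen so this request would be allowed)
example_policy_input = {
    "action": "infer",
    "resource": "openai://responses",
    "capabilities": ["openai_infer"],
    "trust": 72,  # from Iron Book's trust engine
    "context": {
        "model": "gpt-4o-mini",
        "data_classification": "internal",
        "pii_detected": False,
        "region": "US",
        "estimated_cost_cents": 3,
        "daily_budget_remaining_cents": 500,
    },
}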

Quick Start

pip install ironbook-sdk openai pydantic
python openai_iron.py

SDK Implementation

"""
Secure OpenAI API calls with Iron Book

Enterprise scenario:
- An internal "KnowledgeOps" agent can call OpenAI models only when:
  - The agent has capability 'openai_infer'
  - Trust score >= 60
  - The chosen model is in an allowlist
  - Data classification ∈ {'public','internal','deidentified'}
  - No PII is detected (see the Description for a redaction-purpose extension)
  - Estimated call cost is within remaining daily budget
  - Region is allowed (e.g., 'US')

What this script does:
1) Registers an agent in Iron Book with capability 'openai_infer'
2) Uploads a Rego/OPA policy that encodes the above guardrails
3) Defines `secured_openai_infer()` that:
   - runs a simple local PII check on the prompt
   - estimates token/cost for the model
   - obtains a one-shot Iron Book token and requests a policy decision
   - if allowed, calls OpenAI Responses API; if denied, raises
4) Runs two demos:
   - Allowed normal inference
   - Denied (PII present)
5) Feel free to add more demos, or to use this as a template for your own production policy:
   - Add more capabilities to the agent
   - Edit models on the allowlist
   - Edit regions on the allowlist
   - Edit classifications on the allowlist
   - Add more PII patterns to the PII detection
   - Add more cost guardrails
   - Etc.

Rego policy build helper: https://play.openpolicyagent.org
"""

import os
import re
import math
import asyncio
from typing import Dict, Any, Optional

from pydantic import BaseModel, Field

# Iron Book SDK (async)
from ironbook_sdk import (
    IronBookClient, RegisterAgentOptions, GetAuthTokenOptions,
    UploadPolicyOptions, PolicyInput
)

# OpenAI SDK (sync)
from openai import OpenAI


# ------------------------------------------------------------------------------
# Configuration
# ------------------------------------------------------------------------------
IRONBOOK_API_KEY = os.getenv('IRONBOOK_API_KEY', 'REPLACE ME')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY', 'REPLACE ME')
IRONBOOK_AUDIENCE = 'https://api.openai.com'  # default audience for OpenAI

AGENT_NAME = "knowledgeops-agent"
CAPABILITIES = ["openai_infer"]

# Simple per-model $/1K token pricing (illustrative; keep conservative)
# Adjust to your internal cost table if needed.
MODEL_PRICING_PER_1K = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},   # $/1k tok (example)
    "gpt-4o":      {"input": 2.50, "output": 5.00},
    "o4-mini":     {"input": 0.30, "output": 1.20},
}

# Default budget (in cents) for demo purposes
DEFAULT_DAILY_BUDGET_REMAINING_CENTS = 500  # $5.00 remaining


# ------------------------------------------------------------------------------
# Rego Policy: OpenAI guardrails
# ------------------------------------------------------------------------------
POLICY_CONTENT = """
default allow = false

# Constant sets
allowed_models          = {"gpt-4o-mini", "gpt-4o", "o4-mini"}
allowed_classifications = {"public", "internal", "deidentified"}

allow if {
  input.action   == "infer"
  input.resource == "openai://responses"

  # Agent has the appropriate capability
  input.capabilities[_] == "openai_infer"

  # Trust threshold
  input.trust >= 60

  # Model and classification must be allowed
  allowed_models[input.context.model]
  allowed_classifications[input.context.data_classification]

  # PII must not be present
  input.context.pii_detected != true

  # Region constraint
  input.context.region == "US"

  # Budget guardrail
  input.context.estimated_cost_cents <= input.context.daily_budget_remaining_cents
}
"""


# ------------------------------------------------------------------------------
# Simple PII detection (demo only; replace with real DLP/PII detection)
# ------------------------------------------------------------------------------
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN-like
    re.compile(r"\b\d{10}\b"),                 # 10-digit numbers (phones, etc.)
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # emails
]

def detect_pii(text: str) -> bool:
    for pat in PII_PATTERNS:
        if pat.search(text or ""):
            return True
    return False


# ------------------------------------------------------------------------------
# Token & cost estimation helpers (rough)
# ------------------------------------------------------------------------------
def estimate_tokens(text: str) -> int:
    # Back-of-the-envelope: 1 token ~ 4 chars (varies widely in practice)
    return max(1, math.ceil(len(text) / 4))

def estimate_cost_cents(model: str, prompt_tokens: int, max_output_tokens: int) -> int:
    prices = MODEL_PRICING_PER_1K.get(model)
    if not prices:
        # unknown model: treat as expensive to be safe
        prices = {"input": 5.0, "output": 10.0}
    input_cost = (prompt_tokens / 1000.0) * prices["input"]
    output_cost = (max_output_tokens / 1000.0) * prices["output"]
    total_usd = input_cost + output_cost
    return int(round(total_usd * 100))  # cents


# ------------------------------------------------------------------------------
# Iron Book bootstrap (async)
# ------------------------------------------------------------------------------
async def ib_bootstrap() -> Dict[str, Any]:

    iron = IronBookClient(api_key=IRONBOOK_API_KEY)

    # Note: in production you don't need to re-register the agent on every run
    agent_vc = await iron.register_agent(RegisterAgentOptions(
        agent_name=AGENT_NAME,
        capabilities=CAPABILITIES
    ))

    # Note: in production you don't need to re-upload the policy on every run
    policy = await iron.upload_policy(UploadPolicyOptions(
        agent_did=agent_vc["agentDid"],
        config_type="opa",
        policy_content=POLICY_CONTENT,
        metadata={"name": "openai_guardrails", "version": "1.0"}
    ))

    return {"iron": iron, "agent_vc": agent_vc, "policy": policy}


# ------------------------------------------------------------------------------
# Iron Book authorize (async)
# ------------------------------------------------------------------------------
async def ib_authorize_openai_call(agent_did: str, vc: str, policy_id: str, context: Dict[str, Any]) -> None:
    iron = IronBookClient(api_key=IRONBOOK_API_KEY)

    # Acquire a one-shot token for the OpenAI audience
    token_data = await iron.get_auth_token(GetAuthTokenOptions(
        agent_did=agent_did,
        vc=vc,
        audience=IRONBOOK_AUDIENCE
    ))
    access_token = token_data.get("access_token")
    if not access_token:
        raise RuntimeError("Failed to obtain Iron Book access token")

    # Ask for the policy decision
    decision = await iron.policy_decision(PolicyInput(
        agent_did=agent_did,
        policy_id=policy_id,
        token=access_token,
        action="infer",
        resource="openai://responses",
        context=context
    ))

    allowed = getattr(decision, "allow", False) if hasattr(decision, "allow") else decision.get("allow", False)
    if not allowed:
        reason = getattr(decision, "reason", None) if hasattr(decision, "reason") else decision.get("reason", "Denied by policy")
        raise PermissionError(f"Iron Book policy denied OpenAI call: {reason}")


# ------------------------------------------------------------------------------
# Secured OpenAI call (sync wrapper)
# ------------------------------------------------------------------------------
class InferenceRequest(BaseModel):
    prompt: str = Field(..., description="User prompt text.")
    model: str = Field(default="gpt-4o-mini")
    max_output_tokens: int = Field(default=200)
    data_classification: str = Field(default="internal")  # 'public'|'internal'|'deidentified'
    purpose: str = Field(default="assistant")             # 'assistant'|'redaction'|...
    region: str = Field(default="US")
    daily_budget_remaining_cents: int = Field(default=DEFAULT_DAILY_BUDGET_REMAINING_CENTS)

def secured_openai_infer(ib_ctx: Dict[str, Any], req: InferenceRequest) -> str:
    # 1) Local prompt inspection for PII (signal only; source of truth is policy)
    pii = detect_pii(req.prompt)

    # 2) Rough cost estimate
    prompt_tokens = estimate_tokens(req.prompt)
    est_cost_cents = estimate_cost_cents(req.model, prompt_tokens, req.max_output_tokens)

    # 3) Build policy context
    context = {
        "model": req.model,
        "region": req.region,
        "data_classification": req.data_classification,
        "purpose": req.purpose,
        "pii_detected": pii,
        "estimated_cost_cents": est_cost_cents,
        "daily_budget_remaining_cents": req.daily_budget_remaining_cents
    }

    # 4) Ask Iron Book for a decision (and fetch a one-shot token) — will raise on deny
    asyncio.run(ib_authorize_openai_call(
        agent_did=ib_ctx["agent_vc"]["agentDid"],
        vc=ib_ctx["agent_vc"]["vc"],
        policy_id=ib_ctx["policy"]["policyId"],
        context=context
    ))

    # 5) If allowed, call OpenAI
    client = OpenAI(api_key=OPENAI_API_KEY)
    response = client.responses.create(
        model=req.model,
        input=req.prompt,
        max_output_tokens=req.max_output_tokens,
        store=False,
    )
    # Normalize output string (Responses API shape may evolve)
    try:
        # new SDK: response.output_text available in many builds
        return getattr(response, "output_text", None) or str(response)
    except Exception:
        return str(response)


# ------------------------------------------------------------------------------
# Demo
# ------------------------------------------------------------------------------
def main():
    # Bootstrap Iron Book (register the agent, upload the policy)
    ib_ctx = asyncio.run(ib_bootstrap())

    print("\n=== 1) Allowed normal inference (no PII) ===")
    try:
        out = secured_openai_infer(ib_ctx, InferenceRequest(
            prompt="Summarize our Q3 internal roadmap into three bullet points for executives (no sensitive data).",
            model="gpt-4o-mini",
            data_classification="internal",
            purpose="assistant",
            region="US",
            max_output_tokens=150
        ))
        print("✅ ALLOW\n", out[:400], "...\n")
    except Exception as e:
        print("❌ DENY:", e, "\n")

    print("=== 2) Denied due to PII being present ===")
    try:
        out = secured_openai_infer(ib_ctx, InferenceRequest(
            prompt="Patient email is [email protected] and SSN is 123-45-6789. Draft a welcome letter.",
            model="gpt-4o-mini",
            data_classification="internal",
            purpose="assistant",
            region="US",
            max_output_tokens=150
        ))
        print("UNEXPECTED ALLOW:\n", out[:400], "...\n")
    except Exception as e:
        print("✅ Expected DENY:", e, "\n")


if __name__ == "__main__":
    main()

Solution Architecture

Flow (per request):

  1. Agent identity & capability: The KnowledgeOps agent is registered in Iron Book with a single general-purpose capability openai_infer.
  2. Policy upload: A Rego policy expressing the guardrails is uploaded/versioned.
  3. Local checks (signal): The script runs lightweight PII detection and a cost estimate (you may plug in a real DLP later).
  4. One-shot token: The agent requests a short-lived token with audience=https://api.openai.com.
  5. Policy decision: The script calls Iron Book policy_decision() with custom context (model, region, classification, PII flag, costs, budget, etc.).
  6. Allow → OpenAI: On allow, the script calls openai.responses.create(...).
  7. Deny → Reasoned failure: On deny, the script surfaces the exact reason (e.g., “PII detected”).
  8. Audit: Every allow/deny is logged with agent DID, policy version, context snapshot, reason, trust score, etc.

Key Implementation Elements

1) Agent Capability

Agent name: knowledgeops-agent

Primary capability: openai_infer

In production, you’ll typically register one agent per app/service or per critical workflow; capabilities let you keep least privilege.
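
A minimal sketch of that split, reusing the register_agent() call from the demo; the agent names and capability strings here are illustrative, not required by the SDK:

from ironbook_sdk import IronBookClient, RegisterAgentOptions

async def register_workflow_agents(iron: IronBookClient):
    # One agent per workflow, each with a narrowly scoped capability (names are examples)
    reports = await iron.register_agent(RegisterAgentOptions(
        agent_name="knowledgeops-reports-agent",
        capabilities=["openai_infer_reports"],
    ))
    support = await iron.register_agent(RegisterAgentOptions(
        agent_name="knowledgeops-support-agent",
        capabilities=["openai_infer_support"],
    ))
    return reports, support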

2) Policy (Rego/OPA)

The policy is a single allow rule (the demo uses the allow if { ... } style) with conditions for capability, trust, model/classification allow-lists, PII, region, and budget. It’s uploaded through upload_policy() and referenced by policyId at decision time.
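
If you change the guardrails later (for example, the redaction variant sketched in the Description), you can re-upload the edited Rego under a bumped version so audit entries reference it unambiguously. A sketch reusing the demo's upload_policy() call; the metadata "version" field as the versioning mechanism is an assumption carried over from the demo:

from ironbook_sdk import IronBookClient, UploadPolicyOptions

async def publish_policy_update(iron: IronBookClient, agent_did: str, policy_content: str):
    # Re-upload the edited policy with a new version label
    return await iron.upload_policy(UploadPolicyOptions(
        agent_did=agent_did,
        config_type="opa",
        policy_content=policy_content,
        metadata={"name": "openai_guardrails", "version": "1.1"},
    ))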

3) Context Inputs

The script builds and passes:

  • model, region, data_classification, purpose (optional in this demo).
  • pii_detected (bool) from the local detector.
  • estimated_cost_cents, daily_budget_remaining_cents (numeric).

Add anything else you need (e.g., project, ticket_id, environment, business_unit); the policy can reference the new inputs immediately.
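
As a sketch, the context dict built in secured_openai_infer() could grow like this; the extra keys and their values are hypothetical and only become meaningful once your policy references them as input.context.<key>:

context = {
    "model": req.model,
    "region": req.region,
    "data_classification": req.data_classification,
    "purpose": req.purpose,
    "pii_detected": pii,
    "estimated_cost_cents": est_cost_cents,
    "daily_budget_remaining_cents": req.daily_budget_remaining_cents,
    # Hypothetical extensions (pass whatever your policy needs)
    "project": "knowledgeops",
    "ticket_id": "CHG-12345",
    "environment": "prod",
    "business_unit": "operations",
}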

4) One-Shot Token

get_auth_token() issues a short-lived one-shot JWT tied to the agent DID/VC, audience, and purpose. Even if intercepted, it quickly expires and is unusable outside the intended call.

5) Decision & Audit

policy_decision() returns allow/deny and a reason. Iron Book records:

  • agent DID, token/jti, action/resource, policy id with version.
  • evaluated context snapshot (sanitized), agent's trust score.
  • allow/deny result + reason (useful for support and audits).
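
A small sketch of how a caller might consume the deny reason, wrapping the demo's secured_openai_infer(); the logger name is arbitrary, and the authoritative audit trail remains in Iron Book:

import logging
from typing import Optional

log = logging.getLogger("llm-governance")

def run_with_governance(ib_ctx, req: InferenceRequest) -> Optional[str]:
    try:
        return secured_openai_infer(ib_ctx, req)
    except PermissionError as e:
        # The exception message carries the policy reason returned by policy_decision()
        log.warning("OpenAI call denied: %s", e)
        return None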

Customization Checklist

You can easily extend this demo's features for production use:

  • Models: enforce approved model families and versions, and make this multi-provider by duplicating the same guardrails for Bedrock, Azure OpenAI, and Vertex; only the audience, resource, and model sets change.
  • Capabilities: split usage by team or workflow (openai_infer_reports, openai_infer_support, …).
  • Classification: drive from your DLP or gateway; map to “public/internal/secret/regulatory”.
  • PII/PHI: replace the demo regex with your enterprise DLP or a first-party PII service.
  • Budget: read remaining budget from your cost tracker; deny or downgrade the model dynamically (see the sketch after this list).
  • Region: expand to multi-region (“US”, “EU”, “CA”) based on user or tenant location.
  • Purpose-of-use: e.g., assistant, redaction, summarization, RCA - enable/deny per purpose.
  • Evidence bundles: export decision logs to SIEM and attach to control mappings (SOC2/PCI/HIPAA/AI-Act).

This pattern converts LLM usage from “best-effort hygiene” into provable governance, with least privilege, adaptive risk controls, and audits that hold up.

Use this file as a template, plug in your policy inputs, and you’ll have a reusable trust layer for all AI calls across your estate. Reach out to us via [email protected] if you'd like some help.