1. Introduction: From “Compliance as Burden” to “Compliance as Architecture”
Most organizations experience compliance as friction: ad-hoc audits, long email threads, scattered spreadsheets, and heroic last-minute efforts to assemble documents for clients and regulators. In this model, compliance is a post-hoc activity—something done after systems and processes are built.
A compliance-by-design pipeline turns this around. Instead of treating compliance as an external constraint added at the end, we engineer it into our architecture, tooling, and workflows from the very beginning. The core idea is simple but powerful:
Every meaningful action in your system should be able to tell a story: What happened? Why was it allowed? Who signed off? Where is the evidence?
To achieve this, we need an integrated compliance pipeline that:
- Collects and normalizes operational and governance data.
- Enforces policy automatically wherever possible.
- Produces structured, reusable evidence that can be recomposed into:
- Client-facing documentation
- Regulatory submissions
- Internal governance reports
- Technical and operational dashboards
The goal is not just to “be compliant,” but to:
- Minimize rework and duplication in documentation.
- Maximize reuse potential of compliance artifacts and evidence.
- Make compliance status observable and explainable across the organization.
In this article, we will walk through:
- A reference architecture for a compliance-by-design pipeline.
- Design principles and patterns for policy automation and evidence capture.
- Concrete data models that enable documentation reuse.
- A step-by-step blueprint for implementation.
- How to generate tailored outputs (clients, authorities, operations) from the same core evidence base.
- Governance, sustainability, and continuous improvement practices to keep the pipeline alive and relevant.
2. Core Concepts and Design Principles
Before we design the pipeline, it helps to align on key concepts and principles.
2.1 Compliance-By-Design: Operational Definition
Compliance-by-design means:
- Policies → Machine-Readable Controls
  Regulatory and internal requirements are translated into:
  - Testable technical controls
  - Standard operating procedures (SOPs)
  - Measurable indicators and thresholds
- Controls → Automated Enforcement & Monitoring
  Wherever possible, controls are:
  - Enforced directly in infrastructure (e.g., IaC, CI/CD gates)
  - Monitored continuously (logs, metrics, events)
- Monitoring → Evidence & Documentation-Ready Artifacts
  Every control must:
  - Produce traceable evidence (events, logs, signatures, approvals)
  - Be recordable in a structured way for reuse in documents and reports
2.2 Key Design Principles
- Single Source of Truth for Compliance Evidence
  All control states and evidence should converge into a central Compliance Evidence Store with a consistent schema.
- Separation of Concerns
  The following must be decoupled so each can evolve independently:
  - Data collection and normalization
  - Policy definition and evaluation
  - Evidence storage
  - Presentation and documentation generation
- Composable Documentation
  Documentation should be generated from smaller, atomic evidence units: “Control X is compliant for system Y, as of timestamp T, based on evidence E.” These units are then composed into:
  - Client security reports
  - Regulator submissions
  - Executive summaries
- Traceability and Explainability
  For any claim in documentation (“We encrypt data at rest”), we must be able to trace:
  - Which control implements it (e.g., C-001-Encryption-At-Rest)
  - Which systems it covers
  - Which evidence supports it (configuration snapshots, logs, test results)
  - When this evidence was last refreshed
- Reuse and Meta-Data Richness
  Every piece of evidence should be:
  - Tagged with metadata (system, control, owner, jurisdiction, criticality, validity period).
  - Stored in a reusable format (e.g., JSON, structured documents).
  This is what makes it easy to repackage the same data for different audiences.
- Lifecycle Management and Versioning
  Policies, controls, and evidence change over time. A robust pipeline must:
  - Version policies and control definitions.
  - Keep history of evidence and its validity windows.
  - Allow side-by-side views: before vs. after a new regulation or system change.
3. Reference Architecture for a Compliance-by-Design Pipeline
Let’s define an architectural blueprint you can adapt and extend.
3.1 High-Level Components
At a high level, the pipeline consists of:
- Policy & Control Management Layer
- Policy registry
- Control library
- Mapping from policies → controls → systems
- Data Collection & Normalization Layer
- Integrations with systems (cloud, CI/CD, logs, HR, identity, etc.)
- Collectors and agents
- Normalization into a common evidence schema
- Compliance Engine
- Rule evaluation (policy-as-code)
- Risk scoring and thresholding
- Compliance state computation (per control, per system)
- Evidence Store
- Structured storage for:
- Raw evidence (logs, configs, test results)
- Aggregated compliance states
- Derived metrics and trends
- Documentation & Reporting Layer
- Templated report generation (clients, regulators, internal)
- Dashboards and self-service portals
- APIs for downstream tools
- Governance, Workflow & Audit Layer
- Approvals, exceptions, and waivers
- Change management and versioning
- Meta-governance (who can change policies/controls, under what process)
3.2 Logical Data Flow
- Policy and Control Definition
- Compliance, legal, and security experts define requirements (e.g. ISO 27001, GDPR, sectoral regulations).
- These are turned into structured policies and controls.
- Mapping Controls to Systems
- Each control is linked to:
- Specific systems and services.
- Specific mechanisms (e.g., KMS configurations, access control lists).
- Collection
- Collectors gather relevant operational data:
- Cloud service configurations
- CI/CD pipeline artifacts
- Infrastructure-as-Code (IaC) templates
- Access logs and identity provider data
- HR and vendor management data (where relevant)
- Normalization
- Data is transformed into a normalized schema:
  - EvidenceItem
  - ControlState
  - Finding
- This ensures all downstream consumers see a consistent data model.
- Evaluation
- A rules engine evaluates:
- “Does this system meet the requirement of control C-001?”
- The result: compliant, non_compliant, or unknown, plus context.
- Storage
- Raw evidence and evaluation results are stored with:
- Time stamps
- System and owner tags
- Control references
- Jurisdiction/context labels
- Documentation Generation
- Report templates pull from the Evidence Store:
- For each claim in the report, the pipeline retrieves relevant evidence units.
- Reports are rendered for:
- External clients (security overviews, SOC 2-style narratives)
- Authorities (regulatory filings, incident notifications)
- Operations (technical dashboards, runbooks)
- Continuous Feedback
- Gaps, findings, and exceptions feed back into:
- Policy updates
- Control refinements
- Operational improvements
4. Designing Policies, Controls, and Evidence for Maximum Reuse
To make documentation composable and reusable, you need robust conceptual building blocks.
4.1 Policy Model
A policy is a high-level statement of intent or obligation, e.g.:
- “Customer data must be encrypted at rest and in transit.”
- “Access to production systems must follow least privilege principles.”
- “Sensitive operations must be auditable and tamper-evident.”
Structure each policy with:
- ID: POL-ENCRYPTION-001
- Title: “Encryption of Data at Rest”
- Description: Prose, aligned with regulatory standard(s).
- Sources/References: E.g., “ISO 27001 Annex A.10.1,” “GDPR Art. 32.”
- Jurisdiction/Scope: Global or region-specific.
- Owner: Role or person accountable.
- Version & Validity: Version number, effective dates.
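As a concrete illustration, a policy can be captured as a small structured record. The following is a minimal Python sketch; the field names mirror the structure above and are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    # Illustrative policy record; fields mirror the structure described above.
    policy_id: str                    # e.g. "POL-ENCRYPTION-001"
    title: str                        # e.g. "Encryption of Data at Rest"
    description: str
    sources: list[str] = field(default_factory=list)  # e.g. ["ISO 27001 Annex A.10.1", "GDPR Art. 32"]
    jurisdiction: str = "global"
    owner: str = ""                   # accountable role or person
    version: str = "1.0"
    effective_from: str = ""          # ISO 8601 date
    effective_to: str | None = None

encryption_policy = Policy(
    policy_id="POL-ENCRYPTION-001",
    title="Encryption of Data at Rest",
    description="Customer data must be encrypted at rest.",
    sources=["ISO 27001 Annex A.10.1", "GDPR Art. 32"],
    owner="CISO",
    effective_from="2024-01-01",
)
```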
4.2 Control Model
A control is an actionable mechanism that enforces or tests compliance with a policy. Example:
- Control ID: C-ENCRYPTION-AT-REST-001
- Policy Link: POL-ENCRYPTION-001
- Type: Technical control
- Mechanism: “Ensure all production databases have encryption at rest enabled.”
- Implementation Targets: “AWS RDS, GCP Cloud SQL, Azure SQL Database.”
- Verification Method: Automated configuration check via cloud APIs and/or IaC scans.
- Thresholds: “All production instances must have storage_encrypted = true.”
Controls define the “shape” of evidence you need to collect.
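In the same spirit, a control can be represented as a structured record that links back to its policy and declares how it is verified. Again, a hedged sketch with illustrative field names:

```python
from dataclasses import dataclass, field

@dataclass
class Control:
    # Illustrative control record linking back to a policy; not a prescribed schema.
    control_id: str                   # e.g. "C-ENCRYPTION-AT-REST-001"
    policy_id: str                    # e.g. "POL-ENCRYPTION-001"
    control_type: str                 # "technical", "organizational", ...
    mechanism: str                    # what the control enforces or tests
    implementation_targets: list[str] = field(default_factory=list)
    verification_method: str = ""     # e.g. "automated configuration check"
    threshold: str = ""               # e.g. "storage_encrypted == true for all prod instances"

encryption_at_rest = Control(
    control_id="C-ENCRYPTION-AT-REST-001",
    policy_id="POL-ENCRYPTION-001",
    control_type="technical",
    mechanism="Ensure all production databases have encryption at rest enabled.",
    implementation_targets=["AWS RDS", "GCP Cloud SQL", "Azure SQL Database"],
    verification_method="Automated configuration check via cloud APIs / IaC scans",
    threshold="storage_encrypted == true",
)
```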
4.3 Evidence Model
An evidence item captures a concrete observation that supports (or contradicts) a control’s effectiveness.
Core fields:
- evidence_id
- control_id
- system_id / asset_id
- timestamp_collected
- collector (what tool gathered this?)
- raw_payload (JSON, config snippet, log extract)
- normalized_fields (e.g., encrypted: true, kms_key_id: ...)
- result (pass | fail | indeterminate)
- valid_until (if time-bounded)
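For instance, a single evidence item for the encryption control above might look like the following (JSON rendered as a Python dict; all values are hypothetical):

```python
# Hypothetical evidence item for C-ENCRYPTION-AT-REST-001; values are illustrative only.
evidence_item = {
    "evidence_id": "ev-7f3a2c",
    "control_id": "C-ENCRYPTION-AT-REST-001",
    "system_id": "payments-db-prod",
    "timestamp_collected": "2024-06-01T03:15:00Z",
    "collector": "cloud-config-scanner",
    "raw_payload": {"StorageEncrypted": True, "KmsKeyId": "arn:aws:kms:..."},
    "normalized_fields": {"encrypted": True, "kms_key_id": "arn:aws:kms:..."},
    "result": "pass",
    "valid_until": "2024-06-08T03:15:00Z",
}
```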
Why this matters for reuse:
- Evidence items are the atomic units you can reuse across:
- A technical runbook (showing exact configs)
- A client security overview (aggregated compliance rates)
- A regulator’s data-protection impact assessment (DPIA)
You never want to rebuild these from scratch per audience; you want to remix them.
5. Data Collection & Normalization: Foundations of the Pipeline
To build a compliance-by-design pipeline, we must treat data collection as an engineered subsystem, not an afterthought.
5.1 Sources of Evidence
- Infrastructure & Cloud Platforms
- Cloud providers (AWS, Azure, GCP)
- Kubernetes clusters
- On-premise infrastructure
- Application & Service Layers
- Configuration stores
- Service mesh/security gateways
- API gateways and WAFs
- Identity & Access Management
- IdPs (e.g., Azure AD, Okta, Keycloak)
- RBAC/ABAC policies
- Privileged access management tools
- Development & CI/CD Tooling
- Git repositories (pull request history, code owners)
- CI pipelines (artifact logs, test results)
- IaC templates (Terraform, CloudFormation, etc.)
- Business & Governance Systems
- Ticketing and change management (Jira, ServiceNow)
- HR and vendor onboarding systems
- Risk registers and incident management tools
Each source contributes a slice of the compliance story.
5.2 Collection Patterns
- Pull-Based Collection
- Scheduled scans of:
- Cloud resource configurations
- IAM policies
- Repository configurations
- Batch jobs retrieving data at defined intervals.
- Push-Based Collection
- Streaming logs and events to a central bus (e.g., Kafka, Pub/Sub).
- Webhooks from external services on key events (deployments, approvals).
- Agent-Based Collection
- Host-level agents collecting:
- Security agent signals
- File integrity monitoring (FIM)
- Process and network metadata
- API-Based Collection
- Direct API calls to SaaS tools for:
- Audit logs
- User management events
- Compliance attestation states
5.3 Normalization Strategies
To make this data usable at scale:
- Standard Entity Model
  Define canonical entities:
  - System / Service
  - Asset (e.g., database, storage bucket)
  - User / Identity
  - Control and Policy
  - EvidenceItem
  - Finding (non-compliance event plus context)
- JSON Schemas or Protobufs
  Use an explicit schema for each entity, with:
  - Mandatory fields (IDs, timestamps, source)
  - Optional metadata (tags by business unit, environment, jurisdiction)
- Tagging and Classification
  Tag all evidence with:
  - Environment: prod, staging, dev
  - Data classification: public, internal, confidential, restricted
  - Risk tier: low, medium, high
  - Region/jurisdiction: EU, US, NO, etc.
Proper tagging is essential for:
- Generating jurisdiction-specific regulatory reports.
- Filtering evidence for a particular client or asset set.
- Supporting differential retention policies.
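To make normalization and tagging concrete, here is a minimal sketch: it maps a raw configuration snapshot (field names are illustrative, loosely modelled on a cloud database API response) onto the canonical EvidenceItem shape and attaches the tags listed above:

```python
from datetime import datetime, timedelta, timezone
from uuid import uuid4

def normalize_db_snapshot(raw: dict, control_id: str = "C-ENCRYPTION-AT-REST-001") -> dict:
    """Map a (hypothetical) database config snapshot onto the canonical
    EvidenceItem schema, attaching the tags used for filtering and reporting."""
    collected_at = datetime.now(timezone.utc)
    return {
        "evidence_id": str(uuid4()),
        "control_id": control_id,
        "system_id": raw.get("DBInstanceIdentifier", "unknown"),
        "timestamp_collected": collected_at.isoformat(),
        "collector": "cloud-config-scanner",
        "raw_payload": raw,
        "normalized_fields": {"encrypted": bool(raw.get("StorageEncrypted", False))},
        "result": "pass" if raw.get("StorageEncrypted") else "fail",
        "valid_until": (collected_at + timedelta(days=7)).isoformat(),
        "tags": {
            "environment": "prod",
            "data_classification": "confidential",
            "risk_tier": "high",
            "jurisdiction": "EU",
        },
    }
```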
6. Policy-as-Code and the Compliance Engine
The compliance engine is the “brain” of the pipeline.
6.1 Policy-as-Code: Why It Matters
Instead of having compliance requirements locked in PDFs and spreadsheets, policy-as-code converts them into machine-readable rules. This approach:
- Enables automated evaluation at scale.
- Ensures consistency across systems.
- Allows traditional software engineering practices:
- Code review
- Version control
- Testing
Typical implementations might use:
- Rule languages (e.g., Rego, CEL, homegrown DSLs).
- Constraint engines embedded in CI/CD and runtime gateways.
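The idea does not depend on any particular rule language. As a plain-Python stand-in for a policy-as-code rule (illustrative only, not a specific framework’s syntax), a control check can be a small, reviewable, versionable function:

```python
def evaluate_encryption_at_rest(evidence: dict) -> dict:
    """Policy-as-code stand-in for C-ENCRYPTION-AT-REST-001 (illustrative only).
    Returns a compliance verdict plus the context needed for evidence trails."""
    fields = evidence.get("normalized_fields", {})
    if "encrypted" not in fields:
        state = "unknown"          # evidence missing or not yet normalized
    elif fields["encrypted"] is True:
        state = "compliant"
    else:
        state = "non_compliant"
    return {
        "control_id": "C-ENCRYPTION-AT-REST-001",
        "system_id": evidence.get("system_id"),
        "state": state,
        "severity": "high",
        "evidence_ids": [evidence.get("evidence_id")],
    }
```

Because the rule is ordinary code, it goes through code review, version control, and testing like everything else in the pipeline.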
6.2 Control Evaluation Lifecycle
For a given control, the lifecycle is:
- Rule Definition
- Define rule logic:
- Inputs: e.g., asset config, tags, IAM policy.
- Conditions: e.g., encryption flag must be true.
- Define severity and risk level.
- Binding to Evidence
- Map evidence schemas to rule inputs.
- Ensure that for each target system, the necessary evidence is collected.
- Evaluation
- Periodic (e.g., every 15 minutes) or event-driven.
- Evaluate rules using:
- Latest evidence snapshots
- Historical data (if needed for trends)
- Result Recording
- Create a ControlState record:
  - control_id, system_id, state, last_evaluated_at
  - Pointers to the relevant evidence
- Alerting and Workflows
- If a control fails:
  - Create a Finding with context.
  - Hook into ticketing workflows.
  - Trigger notifications or even automatic remediation, where safe.
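Putting the evaluation, recording, and alerting steps together, a hedged sketch of the loop might look like this, reusing the illustrative rule from the previous sketch. The evidence_store and ticketing objects are placeholders for whatever storage and workflow tooling you use:

```python
from datetime import datetime, timezone

def record_evaluation(evidence: dict, evidence_store, ticketing) -> dict:
    """Evaluate one control against the latest evidence, persist the ControlState,
    and raise a Finding on failure. evidence_store and ticketing are placeholders
    for your own storage and workflow integrations."""
    verdict = evaluate_encryption_at_rest(evidence)   # rule from the previous sketch
    control_state = {
        "control_id": verdict["control_id"],
        "system_id": verdict["system_id"],
        "state": verdict["state"],
        "last_evaluated_at": datetime.now(timezone.utc).isoformat(),
        "evidence_ids": verdict["evidence_ids"],
    }
    evidence_store.save_control_state(control_state)
    if verdict["state"] == "non_compliant":
        finding = {**control_state, "severity": verdict["severity"], "status": "open"}
        evidence_store.save_finding(finding)
        ticketing.create_ticket(finding)              # hook into ticketing workflows
    return control_state
```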
6.3 Handling Exceptions and Waivers
Real-world systems always have edge cases. A compliance-by-design pipeline must handle:
- Documented Exceptions
- Exceptions with:
- Clear justification
- Validity period
- Risk owner and approver
- Linked to the underlying control and systems.
- Compensating Controls
- In cases where a primary control cannot be met, document:
- Additional monitoring
- Manual reviews
- Alternative technical measures
- Visibility in Documentation
- Exceptions should:
- Be included in internal and regulatory reports as needed.
- Be sanitized or summarized for clients depending on sensitivity.
This creates a realistic compliance posture that is honest and manageable, while still auditable.
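An exception or waiver can be captured in the same structured style, so it appears automatically wherever the affected control is reported. A sketch with hypothetical field names and values:

```python
# Illustrative exception record; field names and values are hypothetical.
exception = {
    "exception_id": "EXC-2024-017",
    "control_id": "C-ENCRYPTION-AT-REST-001",
    "system_ids": ["legacy-reporting-db"],
    "justification": "Vendor platform does not yet support encryption at rest.",
    "compensating_controls": ["Additional access logging", "Quarterly manual review"],
    "risk_owner": "Head of Platform",
    "approved_by": "CISO",
    "valid_from": "2024-06-01",
    "valid_until": "2024-12-31",
    "audience_visibility": {"internal_ops": True, "regulator": True, "client_external": "summarized"},
}
```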
7. Evidence Store: Structuring for Observability and Reuse
The Evidence Store is where the pipeline’s value accumulates.
7.1 Architectural Characteristics
- Immutable Append-Only Logs for Raw Evidence
  To ensure tamper-evidence and historical traceability.
- Queryable, Indexed Views
  For fast access by:
  - Control
  - System
  - Owner
  - Time range
  - Jurisdiction
- Separation of Raw and Derived Data
- Raw evidence is preserved for auditability.
- Derived tables/materialized views offer aggregated views (compliance scores, trends).
7.2 Example Logical Tables / Collections
- policies
- controls
- systems / services
- evidence_items
- control_states
- findings
- exceptions
- reports (metadata about generated reports, references to underlying evidence)
Each record should carry a strong identity:
- UUIDs
- Version numbers
- Soft deletion or “supersedes” semantics where needed
7.3 Metadata for Documentation Reuse
To support multiple documentation types:
- Add metadata explicitly for:
- Audience: client_external, regulator, internal_ops, management
- Sensitivity: Sanitization requirements (e.g., anonymization before external use)
- Narrative Fragments: Short explanation snippets (“This control ensures encryption at rest…”) that can be reused in multiple documents.
For example, controls might have:
- narrative_short (one-liner)
- narrative_detailed (a couple of paragraphs)
- links to user-friendly diagrams or dashboards
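Put together, the reuse metadata on a control might look like this. The field names and dashboard URL are hypothetical:

```python
# Illustrative reuse metadata attached to a control record; values are hypothetical.
control_metadata = {
    "control_id": "C-ENCRYPTION-AT-REST-001",
    "audience": ["client_external", "regulator", "internal_ops", "management"],
    "sensitivity": "sanitize_system_names_for_external_use",
    "narrative_short": "Customer data is encrypted at rest using managed keys.",
    "narrative_detailed": (
        "This control ensures encryption at rest for all production data stores. "
        "Encryption status is verified continuously via automated configuration checks, "
        "and the supporting evidence is retained in the Evidence Store."
    ),
    "links": ["https://example.internal/dashboards/encryption"],  # hypothetical dashboard URL
}
```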
8. Generating Documentation for Different Stakeholders
Once your evidence and meta-structure are in place, documentation generation becomes largely a matter of templating and orchestration.
8.1 Template-Driven Documentation
Define reusable templates for:
- Client Security Overview Report
- Overview of security posture.
- Summary metrics (e.g., percentage of controls compliant).
- Selected controls relevant to the client’s scope (e.g. data residency, incident response).
- Evidence summaries, not raw logs.
- Regulatory Submission Package
- Mapping of policies and controls to regulatory articles/clauses.
- Detailed evidence for certain controls (e.g., DPIAs, breach handling records).
- Timeline of changes and incidents.
- Operational Dashboards & Runbooks
- Dynamic views for:
- On-call engineers
- Security operations (SOC)
- Platform teams
- Include:
- Control status per system
- Open findings
- Pending exceptions approaching expiry
- Executive & Board-Level Views
- Aggregated risk and compliance scores.
- Trends over time:
- “Controls compliant over last 12 months.”
- Heat maps of critical systems.
8.2 Composable Building Blocks
Each report template should be built from:
- Control blocks: showing control state and description.
- System blocks: summarizing evidence and key metrics per system.
- Jurisdiction blocks: focusing on regulatory obligations per region.
- Incident blocks: past incidents and remedial actions.
Each block is automatically populated from the Evidence Store via APIs or query templates.
8.3 Automation Pipeline for Documentation
You can design a report generation pipeline as follows:
- Trigger
- Scheduled (e.g., monthly, quarterly).
- On-demand (client request, regulator request).
- Event-based (after a significant change or incident).
- Scope Resolution
- Identify which of the following are in scope for this report:
  - Clients
  - Systems
  - Controls
  - Jurisdictions
- Evidence Query
- Pull all relevant:
  - control_states
  - evidence_items
  - findings and exceptions
- Transformation
- Aggregate metrics (percentages, trends).
- Inject narrative text and diagrams.
- Sanitize or anonymize raw data for external audiences.
- Rendering
- Convert to:
- PDF / HTML / Markdown
- Online portals (dashboards)
- Attach relevant snapshots and appendices.
- Distribution and Archiving
- Deliver via secure channels.
- Archive a copy with:
- Version label
- Timestamp
- Audience and scope
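The whole flow can be sketched in a few dozen lines. This is a hedged illustration of scope resolution, evidence query, aggregation, and rendering; evidence_store and renderer are placeholders for your own query layer and template engine, and all method names are assumptions:

```python
from datetime import datetime, timezone

def generate_client_report(evidence_store, renderer, client_id: str, controls_in_scope: list[str]) -> dict:
    """Sketch of the report pipeline: query, aggregate, sanitize, render, archive.
    evidence_store and renderer are placeholders for your own integrations."""
    # Scope resolution and evidence query
    states = evidence_store.query_control_states(client_id=client_id, control_ids=controls_in_scope)
    # Transformation: aggregate metrics and attach audience-specific narrative fragments
    compliant = [s for s in states if s["state"] == "compliant"]
    summary = {
        "client_id": client_id,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "compliance_rate": round(100 * len(compliant) / max(len(states), 1), 1),
        "controls": [
            {
                "control_id": s["control_id"],
                "state": s["state"],
                "narrative": evidence_store.get_narrative(s["control_id"], audience="client_external"),
            }
            for s in states
        ],
    }
    # Rendering and archiving (e.g., Markdown -> PDF/HTML) with version, audience, and scope metadata
    document = renderer.render("client_security_overview", summary)
    evidence_store.archive_report(document, audience="client_external", scope=controls_in_scope)
    return summary
```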
9. Building the Pipeline: A Practical Step-by-Step Blueprint
Let’s translate the concepts into a phased implementation roadmap.
9.1 Phase 1: Discovery and Scoping
- Inventory Systems and Data Flows
- List critical systems, data stores, and integrations.
- Identify data flows, especially for personal or sensitive data.
- Identify Regulatory and Client Drivers
- ISO 27001, SOC 2, GDPR, NIS2, sector-specific rules, contracts, etc.
- Map overlapping requirements to reduce duplication.
- Define Initial Policy and Control Set
- Start with a core subset:
- Access control
- Encryption
- Logging and monitoring
- Change management
- Establish Governance Structure
- Define roles:
- Policy owner, control owner, system owner
- Define approval flows for:
- New policies and controls
- Exceptions and waivers
9.2 Phase 2: Data Collection Foundations
- Implement Connectors
- Prioritize evidence sources:
- Cloud platforms
- CI/CD systems
- IAM systems
- Build or configure agents and API integrations.
- Define Evidence Schemas
- Create canonical schemas for:
  - EvidenceItem
  - ControlState
  - Finding
- Implement schema validation and versioning.
- Centralize Storage
- Stand up the Evidence Store:
- Choose appropriate database technologies (e.g., columnar DB for analytics + object storage for raw logs).
- Implement basic indexing and tagging strategies.
9.3 Phase 3: Policy-as-Code and Evaluation Engine
- Select or Create Rule Engine
- Choose a policy-as-code framework.
- Define coding standards and repository structure.
- Encode Priority Controls
- Start with high-impact controls (e.g. encryption, authentication).
- Write tests for rules (a minimal sketch follows at the end of this phase):
- Unit tests with sample evidence.
- Integration tests with real data.
- Continuous Evaluation
- Wire the engine to:
- Periodically re-evaluate controls.
- Update the control_states table.
- Alerting & Ticketing Integration
- For failed controls, automatically:
- Create tickets.
- Assign to the right teams.
- Provide all necessary context and direct links to evidence.
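As flagged under “Encode Priority Controls,” rules should be tested like any other code. A minimal pytest-style sketch with hand-crafted sample evidence, reusing the illustrative rule from Section 6:

```python
# Minimal pytest-style unit tests for the illustrative encryption rule from Section 6.
def test_encrypted_instance_is_compliant():
    evidence = {"system_id": "db-1", "evidence_id": "ev-1",
                "normalized_fields": {"encrypted": True}}
    assert evaluate_encryption_at_rest(evidence)["state"] == "compliant"

def test_unencrypted_instance_is_non_compliant():
    evidence = {"system_id": "db-2", "evidence_id": "ev-2",
                "normalized_fields": {"encrypted": False}}
    assert evaluate_encryption_at_rest(evidence)["state"] == "non_compliant"

def test_missing_evidence_is_unknown():
    assert evaluate_encryption_at_rest({"system_id": "db-3"})["state"] == "unknown"
```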
9.4 Phase 4: Documentation and Reporting
- Define Report Templates
- Collaborate with:
- Legal, compliance, and security teams
- Client success and sales
- Engineering and operations
- Define content blocks, structure, and data sources.
- Build the Report Generation Engine
- Implement:
- Query layer (SQL, GraphQL, or other APIs)
- Template rendering (Markdown → PDF/HTML, etc.)
- Pilot Reports
- Run pilot with:
- One internal operational report.
- One client report.
- Validate:
- Accuracy
- Readability
- Auditability (every statement traceable to evidence)
- Feedback, Iterate, Expand
- Collect feedback from:
- Recipients and authors
- Refine templates, wording, and visuals.
9.5 Phase 5: Scale, Optimize, and Evolve
- Increase Coverage
- Expand control set incrementally.
- Add more evidence sources and platform integrations.
- Performance and Cost Optimization
- Optimize:
- Data retention strategies
- Storage tiers
- Query performance for frequent reports
- Advanced Analytics
- Predictive risk modeling:
- Which controls are likely to fail next?
- Trend analysis:
- Impact of new policies or regulations
- Continuous Governance
- Regular policy refresh cycles.
- Periodic audits of pipeline integrity itself.
10. Maximizing Reuse and Future-Proofing
One of your key goals is reuse potential: every investment in evidence and documentation should serve multiple purposes.
10.1 Reuse at the Control and Evidence Level
- Design controls so they are generic where possible (e.g., “encryption at rest” across multiple data stores) with platform-specific implementations.
- Store evidence in a platform-neutral schema with adapters per technology.
- Aggregate results to create:
- Platform-agnostic compliance summaries.
- Per-system or per-client drill-downs.
10.2 Reuse Across Audiences
Use targeting metadata and scopes to drive:
- Client Reports
- Filter controls and systems to those affecting the client’s data or services.
- Translate technical verbiage into accessible language.
- Authorities
- Include more formalized policy references and legal language.
- Show specific controls linked to legal clauses.
- Include incident handling records and DPIAs.
- Operations
- Offer real-time or near real-time dashboards.
- Provide direct integration with runbooks and on-call tooling.
- Show only the level of detail needed for remediation.
10.3 Future-Proofing for New Regulations and Technologies
- Meta-Modeling Regulations
- Represent regulations and standards as data:
- Articles/clauses
- Requirements
- Mappings to policies and controls
- When a new regulation appears, you can:
- Map existing controls to new requirements.
- Identify gaps that need new controls (a minimal mapping sketch follows after this list).
- Abstracted Interfaces
- Keep your evidence schema and policy model abstracted from specific technological platforms:
- E.g., “database” and “object storage,” not just “RDS” or “S3.”
- When tech platforms change, adapt the collectors, not the whole model.
- Versioned Policies and Controls
- Each policy and control has versions:
- Allow co-existence of old and new regimes during migrations.
- Historical perspective:
- Show compliance posture under “old” and “new” frameworks.
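As referenced above, once regulations are represented as data, gap analysis becomes a simple set operation. A minimal sketch; the regulation clauses, requirement labels, and control mappings are all hypothetical:

```python
# Illustrative gap analysis: which requirements of a new regulation
# are not yet covered by any existing control?
new_regulation = {
    "REG-X Art. 5": ["encryption-at-rest", "key-management"],
    "REG-X Art. 7": ["access-review"],
    "REG-X Art. 9": ["incident-notification-72h"],
}

# Existing controls mapped to the capability they implement (hypothetical).
control_capabilities = {
    "C-ENCRYPTION-AT-REST-001": "encryption-at-rest",
    "C-KMS-ROTATION-002": "key-management",
    "C-ACCESS-REVIEW-003": "access-review",
}

covered = set(control_capabilities.values())
gaps = {
    clause: [req for req in requirements if req not in covered]
    for clause, requirements in new_regulation.items()
}
gaps = {clause: missing for clause, missing in gaps.items() if missing}
print(gaps)  # -> {"REG-X Art. 9": ["incident-notification-72h"]}
```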
11. Embedding Compliance in Everyday Workflows
To truly be “by design,” compliance must be integrated into the daily experience of engineers, product teams, and business operations—not just in a separate tool.
11.1 Shift-Left Compliance in Development
- Pre-Commit / Pre-Merge Checks
- Linting for security and compliance patterns in code and IaC.
- Blocking merges on critical control violations.
- CI/CD Gates
- Policy checks as build or deployment steps.
- Automatic environment classification and enforcement (e.g. production vs non-production).
- Developer Feedback
- Simple, actionable error messages.
- Links to documentation and examples.
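A shift-left gate can be as small as a script run as a CI step: evaluate the relevant rules against the planned configuration and fail the build on critical violations, with an actionable message. A hedged sketch in which the IaC-parsing step is stubbed out; in practice you would feed in a Terraform plan or similar:

```python
import sys

def check_planned_resources(resources: list[dict]) -> list[str]:
    """Return human-readable violations for critical controls (illustrative rule only)."""
    violations = []
    for res in resources:
        if res.get("type") == "database" and not res.get("storage_encrypted", False):
            violations.append(
                f"{res['name']}: storage_encrypted must be true "
                "(control C-ENCRYPTION-AT-REST-001; see internal docs for examples)"
            )
    return violations

if __name__ == "__main__":
    # In a real pipeline these resources would be parsed from an IaC plan.
    planned = [
        {"type": "database", "name": "orders-db", "storage_encrypted": True},
        {"type": "database", "name": "analytics-db", "storage_encrypted": False},
    ]
    problems = check_planned_resources(planned)
    for p in problems:
        print(f"COMPLIANCE GATE FAILED: {p}")
    sys.exit(1 if problems else 0)  # non-zero exit blocks the merge/deploy
```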
11.2 Operational Embedment
- Dashboards in Existing Tools
- Integrate compliance views into:
- Observability tools
- Operational consoles
- On-Call and Incident Response
- During an incident, teams can:
- See relevant controls and state.
- Access historical evidence.
- Business Workflow Integration
- For vendor onboarding, HR, and process changes:
- Trigger compliance checks.
- Ensure residual risk and mitigations are captured.
12. Governance, Ethics, and Sustainability in the Compliance Pipeline
Beyond the purely technical and regulatory perspectives, a modern compliance pipeline must also address:
12.1 Ethical and Responsible Practices
- Transparency:
  Make decisions and criteria transparent internally; explain why something is non-compliant and what the trade-offs are.
- Fairness and Non-Discrimination:
  If the pipeline influences decisions about people (e.g., access control, identity management), explicitly review controls and policies for fairness and bias.
- Human Oversight:
  For high-impact areas, keep a human in the loop:
  - Automated flags → human review → documented decision.
12.2 Environmental and Resource Sustainability
- Efficient Data Retention
- Keep necessary evidence to meet legal and audit requirements—but avoid infinite retention.
- Tier storage and lifecycle policies to balance cost, performance, and sustainability.
- Instrumentation Overload
- Avoid collecting unnecessary data that overwhelms humans and systems.
- Focus on meaningful evidence rather than sheer volume.
- Tool Consolidation
- Avoid sprawling, duplicative tooling that increases complexity and resource use.
- Centralize and rationalize collection and evaluation.
12.3 Cultural Sustainability
- Training and Enablement
- Educate teams not just on what the pipeline does, but on why.
- Turn compliance from a fear-driven obligation into a shared practice of care and quality.
- Feedback Channels
- Provide easy ways to suggest improvements.
- Treat policy gaps as learning opportunities, not just failure points.
13. Conclusion: Compliance as a Living System, Not a Static Binder
A well-designed compliance-by-design pipeline reshapes how your organization interacts with risk, regulation, and trust. Instead of static PDFs and brittle ad-hoc processes, you gain:
- A living, observable system of controls and evidence.
- A single source of truth from which diverse documentation can be generated for clients, regulators, operations, and executives.
- A foundation for continuous improvement in both technical and human dimensions.
By:
- Modeling policies and controls explicitly.
- Capturing evidence automatically and structurally.
- Adopting policy-as-code and automated evaluation.
- Building reusable, composable documentation artifacts.
- Embedding compliance into everyday workflows and culture.
…you turn compliance from a drag on innovation into a platform for trustworthy, scalable growth.