EthiCompass
Technical Deep Dive

How EthiCompass
Actually Works

A reactive system that transforms raw content into auditable ethical evaluations through a deterministic, transparent, and scalable pipeline.

Deterministic
Transparent
Compliant
Scalable
01 — CORE FLOW

The Evaluation Pipeline

Watch how content flows through the EthiCompass system in real time

STEP 1

Dataset Submission

Frontend stores conversation datasets to S3 via LakeFS

STEP 2

Trigger Activation

API call to /run/{metric_id} initiates evaluation

STEP 3

Sample Creation

Normalized Sample object with metadata created

STEP 4

Kafka Queue

Async processing and job tracking

STEP 5

Metrics Analysis

All 7 metric dimensions analyze the sample independently

STEP 6

Results Synthesis

Scorecard, decisions, and explanations

STEP 7

Audit Trail

Immutable log for compliance
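
As a rough sketch, here is how a client might kick off this pipeline once a dataset has been committed. Only the /run/{metric_id} path comes from the steps above; the base URL, Bearer-token authentication, request body, and job_id response field are illustrative assumptions.

import requests

ETHICOMPASS_API = "https://ethicompass.example.com"  # assumed base URL

def trigger_evaluation(metric_id: str, commit_id: str, api_key: str) -> str:
    """Step 2 sketch: POST to /run/{metric_id} with the LakeFS commit_id from Step 1."""
    response = requests.post(
        f"{ETHICOMPASS_API}/run/{metric_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        json={"commit_id": commit_id},                   # assumed request body
        timeout=30,
    )
    response.raise_for_status()
    # The returned job_id (assumed response field) is what Steps 4-7 use for tracking.
    return response.json()["job_id"]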

Dataset Submission

Frontend stores conversation datasets to S3 via LakeFS. Each dataset contains session_id, assistant_id, language, context, and conversation Q&A pairs. LakeFS provides version control and returns commit_ids.
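
As an illustration of the fields listed above, a single submitted dataset might look like the following. The exact nesting of the Q&A pairs and all example values are assumptions.

example_dataset = {
    "session_id": "3f2a9c1e-0d47-4b6a-9d2e-7c5b1a8f0e21",  # illustrative UUID
    "assistant_id": "support-assistant-v2",                 # illustrative ID
    "language": "en",
    "context": "customer support, billing inquiry",
    "conversation": [
        {"question": "Why was I charged twice this month?",
         "answer": "I can see a duplicate charge; it will be refunded."},
        {"question": "How long does the refund take?",
         "answer": "Refunds are usually processed within 5 business days."},
    ],
}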

02 — TRIGGERS

How Evaluations Begin

Triggers detect conditions and create Samples from raw content

Frontend Dataset

Submit datasets via LakeFS with commit_ids

Scheduled Monitoring

Periodic scans of configured content sources

Webhook

External systems notify via webhook

Manual Evaluation

User-initiated through interface or API

Real-Time Detection

New content detected and converted into Samples in real time

Frontend Dataset Trigger

TRIGGER CONDITIONS

Dataset stored to S3
commit_ids returned
Valid API key

SAMPLE CREATION FLOW

1. Event Detected
2. Validate Request
3. Download Dataset
4. Create Sample
5. Queue for Processing
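
A minimal sketch of the five steps above in code. The event shape, queue topic name, and helper functions (download, queue_send) are assumptions, so no specific LakeFS or Kafka client API is implied; the step order follows the flow described here.

import json
import uuid

def handle_dataset_event(event: dict, valid_api_keys: set, download, queue_send) -> dict:
    """Sketch of the frontend dataset trigger; download and queue_send are injected I/O helpers."""
    # 1. Event detected: the caller invokes this handler with the stored-dataset event.
    # 2. Validate request: reject unknown API keys before doing any work.
    if event.get("api_key") not in valid_api_keys:
        raise PermissionError("invalid API key")
    # 3. Download dataset: fetch the committed dataset from LakeFS/S3 by commit_id.
    dataset = download(event["commit_id"])
    # 4. Create Sample: normalize into the Sample object described in the next section.
    sample = {
        "sample_id": str(uuid.uuid4()),
        "job_id": str(uuid.uuid4()),
        "dataset": dataset,
        "source": "frontend_dataset",
        "source_metadata": {"commit_id": event["commit_id"]},
        "evaluation_metadata": {"priority": "normal"},
        "status": "pending",
    }
    # 5. Queue for processing: publish to the async queue (e.g. a Kafka topic).
    queue_send("ethicompass.samples", json.dumps(sample))
    return sample
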
03 — SAMPLES

The Sample Object

Normalized content objects ready for ethical evaluation

Core Sample Fields

sample_id (string)

Unique identifier (UUID)

job_id (string)

Job identifier for tracking

dataset (array)

Conversation sessions array

source (string)

Trigger source type

source_metadata (object)

Trigger and callback info

evaluation_metadata (object)

Priority and policy versions

status (enum)

pending | evaluating | completed
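
The same fields, expressed as a typed sketch. Types follow the annotations above; the Failed state comes from the lifecycle below, and the defaults are assumptions.

from dataclasses import dataclass, field
from enum import Enum

class SampleStatus(str, Enum):
    PENDING = "pending"
    EVALUATING = "evaluating"
    COMPLETED = "completed"
    FAILED = "failed"  # from the lifecycle below

@dataclass
class Sample:
    sample_id: str                  # unique identifier (UUID)
    job_id: str                     # job identifier for tracking
    dataset: list[dict]             # conversation sessions array
    source: str                     # trigger source type
    source_metadata: dict = field(default_factory=dict)      # trigger and callback info
    evaluation_metadata: dict = field(default_factory=dict)  # priority and policy versions
    status: SampleStatus = SampleStatus.PENDING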

Sample Lifecycle

Pending: created, waiting for evaluation
Evaluating: being analyzed by Metrics
Completed: evaluation finished
Failed: error occurred (retry available)

STORAGE TIERS

Cache: Redis, 24-hour retention
Persistent: PostgreSQL, retained 7+ years
Archive: S3, immutable
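
As a configuration sketch, the three tiers and their retention periods might be expressed like this; the backends and retention values come from the table above, while the structure and key names are assumptions.

from datetime import timedelta

STORAGE_TIERS = {
    "cache":      {"backend": "Redis",      "retention": timedelta(hours=24)},
    "persistent": {"backend": "PostgreSQL", "retention": timedelta(days=365 * 7)},  # 7+ years
    "archive":    {"backend": "S3",         "retention": None},  # immutable, never expired
}
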
04 — METRICS

The 7 Evaluation Dimensions

Each metric independently analyzes samples via cloud functions

Discrimination

Fairness and bias detection

Toxicity

Harmful language identification

Explainability

Clarity and transparency

Privacy

PII and data protection

Factuality

Accuracy and verification

Robustness

Reliability and consistency

Regulatory

Compliance with regulations
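
A sketch of how the seven dimensions could analyze a sample independently. How the cloud functions are actually invoked is not specified here, so each metric is modeled as an injected callable; the concurrency approach is an assumption.

from concurrent.futures import ThreadPoolExecutor

METRICS = [
    "discrimination", "toxicity", "explainability",
    "privacy", "factuality", "robustness", "regulatory",
]

def evaluate_sample(sample: dict, invoke_metric) -> dict:
    """invoke_metric(metric_id, sample) returns that dimension's result (assumed shape)."""
    # Each dimension runs independently, so a slow or failing metric
    # does not block the others.
    with ThreadPoolExecutor(max_workers=len(METRICS)) as pool:
        futures = {m: pool.submit(invoke_metric, m, sample) for m in METRICS}
        return {m: f.result() for m, f in futures.items()}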

05 — RESULTS

Evaluation Outcomes

Clear decisions with full transparency and audit trails

APPROVED

Content passes all thresholds

CONDITIONAL

Minor issues, proceed with caution

ESCALATED

Requires human review

REJECTED

Critical issues detected

Dimensional Scorecard

Scores for each of the 7 dimensions

Flags & Explanations

Detailed reasoning for each flag

Recommendations

Actionable steps for improvement
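
One way the dimensional scorecard could map to the four decisions above. The thresholds and the worst-score rule are illustrative assumptions; only the decision labels come from this section.

def decide(scorecard: dict[str, float],
           pass_threshold: float = 0.8,
           review_threshold: float = 0.6,
           reject_threshold: float = 0.4) -> str:
    """Map per-dimension scores (higher is better, assumed 0-1 scale) to a decision."""
    worst = min(scorecard.values())
    if worst >= pass_threshold:
        return "APPROVED"      # content passes all thresholds
    if worst >= review_threshold:
        return "CONDITIONAL"   # minor issues, proceed with caution
    if worst >= reject_threshold:
        return "ESCALATED"     # requires human review
    return "REJECTED"          # critical issues detected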

Ready to Transform Your
AI Governance?

Experience deterministic, transparent, and compliant AI evaluation