Cordon uses transformer embeddings and density scoring to identify what's semantically unique in log files, filtering out repetitive noise.
The core insight: a critical error repeated 1000x is "normal" (semantically dense). A strange one-off event is anomalous (semantically isolated).
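A minimal sketch of that scoring idea, assuming sentence-transformers embeddings and mean k-nearest-neighbor cosine similarity as the density measure (the model, per-line chunking, and k here are illustrative assumptions, not necessarily what Cordon does; see the architecture doc):

    # Sketch only; Cordon's actual chunking, model, and scoring may differ.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    def anomaly_scores(lines: list[str], k: int = 10) -> np.ndarray:
        model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
        emb = model.encode(lines, normalize_embeddings=True)
        sim = emb @ emb.T                     # cosine similarity (unit vectors)
        np.fill_diagonal(sim, -np.inf)        # ignore self-matches
        k = min(k, len(lines) - 1)            # assumes >= 2 lines
        # Density = mean similarity to the k nearest neighbors: a message
        # repeated 1000x sits in a dense cluster; a one-off sits alone.
        density = np.sort(sim, axis=1)[:, -k:].mean(axis=1)
        return 1.0 - density                  # isolated = high anomaly score

A real implementation would embed multi-line chunks rather than single lines and batch the encoding; this only shows the shape of the scoring.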
Outputs XML-tagged blocks with anomaly scores, intended to shrink large logs down to their anomalous regions as a pre-processing step for LLM analysis.
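The tag schema below is hypothetical (names, attributes, and block granularity may differ from Cordon's real output); it just shows how flagged regions could be wrapped for an LLM, given indices from the percentile thresholding described under trade-offs:

    # Hypothetical output shape; Cordon's actual schema may differ.
    def tag_blocks(lines: list[str], scores, flagged: list[int]) -> str:
        out = []
        for i in flagged:
            out.append(f'<anomaly score="{scores[i]:.2f}" line="{i + 1}">')
            out.append(lines[i])
            out.append("</anomaly>")
        return "\n".join(out)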
Architecture: https://github.com/calebevans/cordon/blob/main/docs/architec...
Benchmark: https://github.com/calebevans/cordon/blob/main/benchmark/res...
Trade-offs: repetitive patterns are intentionally ignored (even a repeated critical error scores as normal), and thresholds are percentile-based, so scores are relative to each log rather than absolute; even a clean log has a top percentile that gets flagged.
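A sketch of that relative thresholding, reusing the scores from the density sketch above (the 95th-percentile cutoff is an illustrative default, not Cordon's):

    import numpy as np

    def flag_anomalies(scores: np.ndarray, pct: float = 95.0) -> list[int]:
        # Relative cutoff: the top (100 - pct)% by score is flagged
        # regardless of absolute magnitude.
        cutoff = np.percentile(scores, pct)
        return [i for i, s in enumerate(scores) if s >= cutoff]

The upside of a relative threshold is no per-system tuning; the downside, as noted, is that flagging never drops to zero on a healthy log.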