calebevans
Show HN: Cordon – Reduce large log files to anomalous sections (github.com)

Cordon uses transformer embeddings and density scoring to identify what's semantically unique in log files, filtering out repetitive noise.

The core insight: a critical error repeated 1000x is "normal" (semantically dense). A strange one-off event is anomalous (semantically isolated).
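
To make the density idea concrete, here's a minimal sketch of one way to score lines by semantic isolation, assuming sentence-transformers embeddings and a k-nearest-neighbour cosine similarity; the model name, k, and function are illustrative, not Cordon's actual pipeline:

    # Illustrative density scoring over log lines (an assumption about the
    # approach, not Cordon's code). Requires: sentence-transformers, numpy.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    def isolation_scores(lines, k=5):
        """Higher score = fewer close semantic neighbours = more anomalous."""
        model = SentenceTransformer("all-MiniLM-L6-v2")       # model choice is arbitrary here
        emb = model.encode(lines, normalize_embeddings=True)  # unit vectors, shape (n, d)
        sims = emb @ emb.T                                     # pairwise cosine similarities
        np.fill_diagonal(sims, -1.0)                           # exclude self-matches
        density = np.sort(sims, axis=1)[:, -k:].mean(axis=1)   # mean similarity to k nearest lines
        return 1.0 - density                                   # isolation = inverse of density

Under this kind of scoring, a line from a block repeated 1000x sits in a dense cluster and scores near zero, while a one-off stack trace has no close neighbours and scores high.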

Outputs XML-tagged blocks with anomaly scores. Designed as a pre-processing step that shrinks large logs before LLM analysis.
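
For illustration only, tagging the retained blocks might look like the snippet below; the <anomaly> tag and its attributes are hypothetical, not Cordon's documented schema:

    # Hypothetical tagging helper; the tag and attribute names are made up
    # for illustration, not Cordon's actual output format.
    def tag_blocks(lines, scores, keep):
        return "\n".join(
            f'<anomaly score="{score:.3f}">{line}</anomaly>'
            for line, score, kept in zip(lines, scores, keep)
            if kept
        )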

Architecture: https://github.com/calebevans/cordon/blob/main/docs/architec...

Benchmark: https://github.com/calebevans/cordon/blob/main/benchmark/res...

Trade-offs: repetitive patterns are intentionally ignored (so an error that repeats constantly won't be surfaced), and thresholds are percentile-based (relative to each log's score distribution, not absolute).
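
As a sketch of what a relative threshold means in practice (the 95th percentile and function name are assumptions, continuing the scoring sketch above):

    # Percentile-based selection: the cutoff adapts to each log's score
    # distribution, so roughly the same fraction of lines is kept no matter
    # how "bad" the log is in absolute terms.
    import numpy as np

    def keep_mask(scores, percentile=95.0):
        cutoff = np.percentile(scores, percentile)
        return np.asarray(scores) >= cutoff

The flip side of a relative cutoff is that a perfectly healthy log still surfaces its top few percent of lines, while a uniformly noisy one still drops most of them.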

