OpenLineage
Standardized Data Lineage for Modern Pipelines
by Trex Team
Description
Data lineage is only useful when it stays correct under retries, partial failures, and heterogeneous tooling, and that is where most implementations quietly break. This book is written for experienced data engineers, platform engineers, and architects who need lineage they can trust in production, not just a diagram for a slide. It focuses on the practical contracts and design choices that determine whether lineage becomes reliable operational telemetry or an expensive, fragmented graph.
You’ll build a rigorous mental model of lineage goals and scope, then dive into OpenLineage as an event standard: producers, transports, and consumers. From there, the book goes deep on the core event model—RunEvents, Jobs, Datasets, and input/output semantics—so you can produce deterministic graphs across orchestrators and execution engines. You’ll learn lifecycle correctness (START/terminal, idempotency, retries), identity and naming strategies that don’t drift, and facet design for rich metadata without interoperability loss. Practical chapters cover capture-layer decisions, mixing automatic and custom instrumentation safely, configuring transports for real networks, validating events, monitoring lineage completeness, and emitting governance-relevant metadata without creating privacy risk.
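To make the lifecycle and identity ideas above concrete, here is a minimal sketch of a START event and a terminal COMPLETE event for one run, built as plain JSON with the Python standard library. It is an illustration, not an excerpt from the book. The field names (`eventType`, `eventTime`, `run.runId`, `job`, `inputs`, `outputs`, `producer`, `schemaURL`) follow the OpenLineage RunEvent model; the producer URI, the schema version in `SCHEMA_URL`, the namespaces, the dataset names, and the helper `run_id_for_attempt` are all assumptions made for the example.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed values for illustration only; check them against your own setup
# and the spec version you target.
PRODUCER = "https://example.com/pipelines/nightly-orders"  # hypothetical producer URI
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"  # assumed version


def dataset(namespace: str, name: str) -> dict:
    """Identify a dataset by its (namespace, name) pair."""
    return {"namespace": namespace, "name": name}


def run_id_for_attempt(job_namespace: str, job_name: str, attempt_key: str) -> str:
    """Hypothetical helper: derive a stable runId from the job identity and an
    attempt key, so that re-emitting events for the same attempt reuses the
    same runId instead of creating a second run in the graph."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{job_namespace}/{job_name}/{attempt_key}"))


def run_event(event_type: str, run_id: str, job_namespace: str, job_name: str,
              inputs=(), outputs=()) -> dict:
    """Build one RunEvent as a JSON-serializable dict."""
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": list(inputs),
        "outputs": list(outputs),
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# One run: START and the terminal COMPLETE event share the same runId.
rid = run_id_for_attempt("analytics", "nightly_orders", "2026-03-09/attempt-1")
src = dataset("postgres://db.example.com:5432", "public.orders")      # hypothetical
dst = dataset("s3://example-lake", "curated/orders")                  # hypothetical
start = run_event("START", rid, "analytics", "nightly_orders", inputs=[src])
complete = run_event("COMPLETE", rid, "analytics", "nightly_orders",
                     inputs=[src], outputs=[dst])

print(json.dumps(complete, indent=2))
```

Whether a retry should reuse the runId or start a new run is a design decision; the attempt key here simply makes that choice explicit and repeatable.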
Expect a specification-aware, failure-mode-driven approach with clear decision criteria and operational guardrails. Familiarity with modern data stacks (orchestrators, warehouses/lakes, streaming, CI/CD, observability) is assumed.
Product details
| ISBN | 6610001179151 |
| Publisher | NobleTrex Press |
| Publication date | 9 March 2026 |
| Language | English |