In 2025, building a real-time data pipeline often involves hundreds of lines of Python, an orchestrator like Airflow, and weeks of configuration. What if you could define the same pipeline in 30 lines of YAML? The declarative approach changes the game.
## The Pipeline Problem in 2025
The data ecosystem is more fragmented than ever:
- Too many tools — Airflow, Prefect, Dagster, dbt, Fivetran, Airbyte... each tool covers a piece of the puzzle
- Too much boilerplate code — 80% of pipeline code is plumbing, not business logic
- Too much maintenance — Every Python dependency is a ticking time bomb (versions, conflicts, deprecations)
- Too much latency — Most tools are designed for batch, not real-time
According to a 2024 Fivetran report, data teams spend an average of 44% of their time maintaining existing pipelines rather than working on new projects.
## Imperative vs Declarative: Python vs YAML
The fundamental difference:
| Aspect | Imperative (Python) | Declarative (YAML) |
|---|---|---|
| Approach | "How to do it" | "What to do" |
| Typical code | 200-500 lines | 20-50 lines |
| Learning curve | Weeks | Hours |
| Maintenance | Python dependencies | One binary + YAML |
| Real-time | Complex to implement | Native |
| Flexibility | Unlimited | Limited to plugins |
The imperative approach gives you total control, but at the cost of complexity. The declarative approach sacrifices some flexibility for radical simplicity.
## Anatomy of a YAML Pipeline
A declarative pipeline breaks down into three sections:
1. Sources — Where does the data come from?
```yaml
source:
  type: http
  url: "https://api.example.com/events"
  method: GET
  auth:
    type: oauth2
    token_url: "https://auth.example.com/token"
  rate_limit: 100/minute
  pagination:
    type: cursor
    field: "next_cursor"
```
2. Transforms — What to do with the data?
```yaml
transforms:
  - type: sql
    engine: duckdb
    query: |
      SELECT
        user_id,
        event_type,
        timestamp,
        json_extract(payload, '$.amount') AS amount
      FROM input
      WHERE event_type IN ('purchase', 'refund')
  - type: pii_mask
    fields: [email, phone]
    method: sha256
```
3. Sinks — Where to send the results?
```yaml
sink:
  type: postgresql
  connection: "postgres://user:pass@host:5432/db"
  table: "events_processed"
  batch_size: 1000
  on_conflict: upsert
  key: [event_id]
```
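For readers less familiar with upsert semantics: conceptually, an `on_conflict: upsert` sink with `key: [event_id]` corresponds to a PostgreSQL statement like the one below. This is a sketch of the idea, not the exact SQL the tool generates; the column names are taken from the transform example above.

```sql
-- Insert a row, or update the existing row when event_id already exists.
INSERT INTO events_processed (event_id, user_id, amount)
VALUES ($1, $2, $3)
ON CONFLICT (event_id)
DO UPDATE SET user_id = EXCLUDED.user_id,
              amount  = EXCLUDED.amount;
```

The conflict key must be backed by a unique index on the target table for `ON CONFLICT` to apply.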
## Three Concrete Use Cases
### Case 1: API Sync to Database
You have a REST API emitting events and want to store them in PostgreSQL with SQL enrichment. In Python, that's 200+ lines (requests, psycopg2, error handling, retry...). In declarative YAML, it's 25 lines.
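A minimal sketch of what such a pipeline could look like, assembled from the building blocks shown above (the specific keys and option names are modeled on those examples, not copied from official documentation):

```yaml
# Illustrative sketch: REST API -> PostgreSQL with SQL enrichment.
source:
  type: http
  url: "https://api.example.com/events"
  method: GET
  pagination:
    type: cursor
    field: "next_cursor"

transforms:
  - type: sql
    engine: duckdb
    query: |
      SELECT user_id, event_type, timestamp
      FROM input
      WHERE event_type IS NOT NULL

sink:
  type: postgresql
  connection: "postgres://user:pass@host:5432/db"
  table: "events"
  on_conflict: upsert
  key: [event_id]
```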
### Case 2: CDC (Change Data Capture)
Capture changes from a source PostgreSQL database and replicate them to Snowflake in real-time. Native CDC eliminates the need for Debezium + Kafka Connect.
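A hedged sketch of how such a replication might be declared; the `postgres_cdc` source type and the Snowflake options shown here are assumptions for illustration, not confirmed configuration keys:

```yaml
# Illustrative sketch: PostgreSQL CDC -> Snowflake replication.
source:
  type: postgres_cdc
  connection: "postgres://user:pass@host:5432/db"
  tables: [orders, customers]

sink:
  type: snowflake
  account: "my_account"
  database: "ANALYTICS"
  table: "ORDERS_REPLICA"
```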
### Case 3: PII Masking
Read data containing personal information, anonymize it (SHA-256 hashing), and send it to a data lake. Masking is declared as a simple transform, not a separate service.
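To make the masking step concrete, here is a minimal, standalone Go sketch of what a SHA-256 mask does to a field value. This illustrates the technique only; it is not Mako's actual implementation, and `maskPII` is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// maskPII replaces a sensitive field value with the hex-encoded
// SHA-256 digest of that value: deterministic (the same input always
// maps to the same token, so joins still work) but one-way.
func maskPII(value string) string {
	sum := sha256.Sum256([]byte(value))
	return hex.EncodeToString(sum[:])
}

func main() {
	// The original email never reaches the sink; only the digest does.
	fmt.Println(maskPII("alice@example.com"))
}
```

Because hashing is deterministic, masked columns remain usable as join keys across datasets, which is why hashing is often preferred over random tokenization for analytics.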
## Tool Comparison
| Criteria | Airflow | Singer/Meltano | Mako |
|---|---|---|---|
| Approach | Imperative (Python DAGs) | Semi-declarative | Declarative (YAML) |
| Real-time | No (batch) | No (batch) | Yes (native) |
| Transforms | Python | dbt (SQL) | SQL + WASM |
| Installation | Complex | pip install | One Go binary |
| Observability | Web UI | Logs | Prometheus + Grafana |
## Quick Start in 5 Minutes
Here's how to get started with Mako, an open-source declarative pipeline framework written in Go:
```bash
# Clone and build
git clone https://github.com/Stefen-Taime/mako.git
cd mako
go build -o bin/mako .

# Initialize a new pipeline
./bin/mako init

# Run the pipeline
./bin/mako run pipeline.yaml
```
Mako supports HTTP/REST, Kafka, PostgreSQL CDC, DuckDB, and file sources (JSON, CSV, Parquet). Transforms include SQL via DuckDB, WASM plugins (Go/Rust), schema validation, and PII masking. For sinks: PostgreSQL, Snowflake, BigQuery, ClickHouse, S3, GCS, and more.
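Combining those building blocks, a streaming pipeline from Kafka to ClickHouse might be declared as follows. This is a sketch in the style of the earlier examples; the exact keys for the Kafka source and ClickHouse sink are assumptions.

```yaml
# Illustrative sketch: Kafka topic -> filtered -> ClickHouse table.
source:
  type: kafka
  brokers: ["localhost:9092"]
  topic: "events"

transforms:
  - type: sql
    engine: duckdb
    query: |
      SELECT * FROM input WHERE event_type = 'purchase'

sink:
  type: clickhouse
  connection: "clickhouse://localhost:9000/analytics"
  table: "purchases"
  batch_size: 1000
```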
## Conclusion
The declarative YAML approach won't replace Python for every use case. But for 80% of common data pipelines — API sync, CDC, simple ETL, PII masking — it offers a radically simpler and more maintainable alternative. Mako is an open-source framework (MIT) that embodies this philosophy: YAML in, events out.
Resources: