Excessive use of XComs, or pushing large objects (like heavy Pandas DataFrames), can significantly degrade database performance.

The "Exclusive" Advanced Setup: Custom Backends
In Apache Airflow, tasks are isolated by design. This isolation is great for reliability, but it creates a challenge when one task needs to share information (like a filename, a record count, or a status flag) with a downstream task. XCom (short for "cross-communication") is the built-in mechanism that solves this problem.

What is XCom?
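A minimal sketch of the idea, assuming Airflow 2.4 or later and the TaskFlow API: one function returns a small dictionary and a second function receives it as an argument. The payload contents, the DAG name `xcom_taskflow_demo`, and the `build_dag` helper are illustrative, not part of Airflow. The Airflow import is deferred into `build_dag` so the two functions can also be run as plain Python.

```python
from datetime import datetime


def producer_task() -> dict:
    """Return a small, JSON-friendly payload; Airflow stores it as an XCom."""
    return {"filename": "transactions.csv", "record_count": 3}


def consumer_task(payload: dict) -> str:
    """Receive the upstream return value as an ordinary argument."""
    return f"{payload['filename']} has {payload['record_count']} records"


def build_dag():
    """Wire the two functions into a DAG (needs Airflow installed when called)."""
    from airflow.decorators import dag, task

    @dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
    def xcom_taskflow_demo():
        # Passing the producer's output to the consumer sets both the
        # data flow and the dependency (producer_task >> consumer_task).
        produced = task(producer_task)()
        task(consumer_task)(produced)

    return xcom_taskflow_demo()
```

In a DAG file, the result of `build_dag()` would be assigned to a module-level variable so the scheduler can find it. Outside Airflow, `consumer_task(producer_task())` exercises the same logic.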
In this example, consumer_task implicitly pulls the exact return value of producer_task. This creates a direct, exclusive dependency.

2. Traditional Operators (xcom_push/xcom_pull)
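As a hedged sketch of the traditional style, the two callables below use `ti.xcom_push` and `ti.xcom_pull`, the methods Airflow exposes on the task instance. The task ID `push_count`, the key `record_count`, and the `FakeTaskInstance` class are assumptions for illustration. The stand-in mimics only those two methods so the callables can be tried without a scheduler.

```python
class FakeTaskInstance:
    """Local stand-in for Airflow's task instance (illustration only)."""

    def __init__(self, task_id: str, store: dict):
        self.task_id = task_id
        self._store = store  # shared dict playing the role of the metadata DB

    def xcom_push(self, key: str, value) -> None:
        self._store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids: str, key: str = "return_value"):
        return self._store.get((task_ids, key))


def push_count(ti) -> None:
    """Upstream callable: publish a value under an explicit key."""
    ti.xcom_push(key="record_count", value=42)


def pull_count(ti) -> int:
    """Downstream callable: name both the source task and the key."""
    return ti.xcom_pull(task_ids="push_count", key="record_count")


store = {}
push_count(FakeTaskInstance("push_count", store))
pulled = pull_count(FakeTaskInstance("pull_count", store))
print(pulled)
```

In a real DAG these callables would be handed to `PythonOperator(python_callable=...)`, and Airflow supplies `ti` from the task context.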
In modern Airflow, the TaskFlow API has made XComs feel more integrated than ever. Instead of manually "pushing" and "pulling" values, you simply return a value from one Python function and pass it as an argument to another. This creates an "exclusive" flow where data and dependencies are inextricably linked.

Key Characteristics

    import pandas as pd
    from airflow.decorators import task

    @task
    def validate(txn_json, **context):
        df = pd.read_json(txn_json)
        # Can pull ONLY "raw_txns" from fetch_transactions
        # Attempt to pull any other key or from a different task fails
        ...
| Setting | Default | Change in airflow.cfg |
|---------|---------|--------------------------|
| xcom_backend | airflow.models.xcom.BaseXCom | – |
| xcom_backend_kwargs | {} | – |
| Max size (SQLite/Postgres) | 1–2 GB | Not recommended to increase → use external storage for >1 MB |
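The `xcom_backend` setting points Airflow at a class that replaces the default. Custom backends commonly subclass `BaseXCom` and override its `serialize_value` and `deserialize_value` hooks so that large values go to external storage and only a reference lands in the database. The sketch below shows just that offload logic in plain Python. The `xcom-ref://` prefix, the function names, and the local directory standing in for S3 or GCS are all assumptions.

```python
import json
import tempfile
import uuid
from pathlib import Path

REF_PREFIX = "xcom-ref://"  # hypothetical marker for offloaded values
STORAGE_DIR = Path(tempfile.mkdtemp())  # local stand-in for an S3/GCS bucket


def offload(value) -> str:
    """Write the value to external storage and return a short reference."""
    path = STORAGE_DIR / f"{uuid.uuid4()}.json"
    path.write_text(json.dumps(value))
    return f"{REF_PREFIX}{path}"


def resolve(stored):
    """Load the real value if `stored` is a reference; otherwise pass it through."""
    if isinstance(stored, str) and stored.startswith(REF_PREFIX):
        return json.loads(Path(stored[len(REF_PREFIX):]).read_text())
    return stored


rows = [{"id": i, "amount": i * 10} for i in range(1000)]
reference = offload(rows)   # what a serialize hook might store in the database
restored = resolve(reference)  # what a deserialize hook might hand to the task
```

The database then holds a string of a few dozen characters regardless of how large `rows` grows.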
XComs are not a general-purpose data storage solution. They have strict limitations that define their usage.

    def pull_exclusive(ti):
        # Only allowed to pull its own execution date's key
        key = f"run_{ti.execution_date}_data"
        return ti.xcom_pull(task_ids="push_exclusive", key=key)
XComs are not meant for large datasets (like CSVs or DataFrames); these should be stored in S3 or GCS instead.

Database Bloat
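To keep oversized values out of the metadata database, a task can measure a value before returning it. The following is a sketch under assumptions: the 48 KB ceiling (`MAX_XCOM_BYTES`) is an arbitrary team policy chosen for illustration, not an Airflow limit, the bucket path is made up, and size is approximated by the JSON encoding.

```python
import json

MAX_XCOM_BYTES = 48 * 1024  # hypothetical policy threshold, not an Airflow setting


def xcom_size(value) -> int:
    """Approximate the stored size as the UTF-8 length of the JSON encoding."""
    return len(json.dumps(value).encode("utf-8"))


def ensure_small(value, limit: int = MAX_XCOM_BYTES):
    """Return the value unchanged, or raise if it is too large for an XCom."""
    size = xcom_size(value)
    if size > limit:
        raise ValueError(
            f"payload is {size} bytes (limit {limit}); "
            "store it externally and pass a path or URI instead"
        )
    return value


# A reference plus a little metadata fits comfortably (example path is made up).
small = ensure_small({"path": "s3://example-bucket/raw/2024-01-01.csv", "rows": 1200})
```

Wrapping a task's return value in `ensure_small(...)` turns a silent database problem into an immediate, readable task failure.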
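By default, Airflow uses the BaseXCom backend, which stores values in the metadata database. This means data must be serializable: in Airflow 2.x values are JSON-encoded by default, and pickling applies only when enable_xcom_pickling is turned on.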
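Assuming Airflow 2.x with pickling left disabled (its default), values are JSON-encoded, so a quick local check can catch unserializable payloads before a task runs. The helpers below are hypothetical and use the standard `json` module, which is stricter than Airflow's own encoder, so treat the check as conservative.

```python
import json
from datetime import date


def is_json_safe(value) -> bool:
    """Conservative check: can the standard json module encode this value?"""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


def to_json_safe(record: dict) -> dict:
    """Convert common non-JSON types: dates to ISO strings, sets to sorted lists."""
    converted = {}
    for key, value in record.items():
        if isinstance(value, date):
            converted[key] = value.isoformat()
        elif isinstance(value, (set, frozenset)):
            converted[key] = sorted(value)
        else:
            converted[key] = value
    return converted


raw = {"run_date": date(2024, 1, 1), "tags": {"b", "a"}, "count": 3}
clean = to_json_safe(raw)
```

Here `raw` fails the check because of the date and the set, while `clean` contains only strings, lists, and numbers.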
Start small: enable a custom XCom backend on one critical DAG, add exclusive key maps, and measure the improvement in reliability and performance. Then expand across your entire Airflow instance.