Assembling Sections and Orchestrating Data Pipeline
There should be only one orchestration section, and it must be the last section in the project; it assembles all previous sections and blocks into a complete, functional data pipeline.
Using Airflow
Currently, Splicing supports Apache Airflow for orchestrating data pipelines. By default, Splicing generates an Airflow DAG in a block:
- The DAG name will be the block name, representing the data transformation task the model performs.
- Every block is defined as a task in the DAG.
- The dependencies between tasks are defined when "Source Section" and "Source Block" are set in a block.
By default:
- A block written in Python will be defined as a PythonOperator.
- A block written in SQL (dbt) will be defined as a BashOperator, which runs the `dbt run` command to run a dbt model.
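To make these defaults concrete, here is a minimal sketch of the kind of DAG that could be generated for a project with one Python block feeding one dbt block. The block names, dbt model, and dbt project path are hypothetical placeholders, and the code Splicing actually emits may differ.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def clean_orders():
    """Placeholder for the Python block's transformation logic."""
    pass


with DAG(
    dag_id="orchestration_block",  # hypothetical block name used as the DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,  # requires Airflow 2.4+; use schedule_interval=None on older versions
    catchup=False,
) as dag:
    # A Python block becomes a PythonOperator task.
    clean_orders_task = PythonOperator(
        task_id="clean_orders",
        python_callable=clean_orders,
    )

    # A SQL (dbt) block becomes a BashOperator task that runs `dbt run` for that model.
    orders_model_task = BashOperator(
        task_id="orders_model",
        bash_command="dbt run --select orders --project-dir /path/to/dbt/project",
    )

    # Dependency derived from the dbt block's "Source Section" / "Source Block" settings.
    clean_orders_task >> orders_model_task
```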
You can customize the task type (a different operator) and the task definition (the arguments passed to the operator) to suit your needs.
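For instance, you might tune the generated dbt task by adding flags to the dbt command and retry settings on the operator. The snippet below is only an illustration using the same hypothetical names and paths as the sketch above; adjust the arguments to your environment.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orchestration_block",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Customized dbt task: extra dbt flags plus Airflow-level retry behavior.
    orders_model_task = BashOperator(
        task_id="orders_model",
        bash_command=(
            "dbt run --select orders --target prod "
            "--project-dir /path/to/dbt/project"
        ),
        retries=2,
        retry_delay=timedelta(minutes=5),
    )
```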
Exporting Data Pipeline to Airflow
You can download the code in the project and export it as an Airflow DAG by following these steps:
- Install the project dependencies by running `pip install -r requirements.txt` in the Python environment where Airflow is running.
- Uncompress the downloaded zip file and place it in the `dags` folder of your Airflow home directory (i.e., `AIRFLOW_HOME/<dags_folder>`, usually `~/airflow/dags`), or upload it to a managed Apache Airflow service (e.g., MWAA or Astronomer).
- You should now be able to see the DAG in the Airflow UI and trigger a DAG run to execute the data pipeline.
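As a quick sanity check before opening the UI, you can ask Airflow to load the `dags` folder and report any import errors. The path below assumes the default `~/airflow/dags` location; change it if your installation uses a different folder.

```python
import os

from airflow.models import DagBag

# Load the dags folder the same way the scheduler would and surface problems early.
dagbag = DagBag(
    dag_folder=os.path.expanduser("~/airflow/dags"),
    include_examples=False,
)

print("DAGs found:", sorted(dagbag.dag_ids))
print("Import errors:", dagbag.import_errors)
```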