Airflow is one of the most widely used schedulers currently in the tech industry. Initially developed at Airbnb, a few years ago it became an Apache Foundation project, quickly becoming one of the foundation's top projects. It is a direct competitor of other schedulers such as Spotify's Luigi, or newer solutions such as DigDag or Prefect (created by core Airflow developers; I'm keeping this one on my list for future projects when it matures a bit).

At my current company, Daltix, we are moving away from an older tool, Jenkins, a CI/CD tool we hacked so it could act as a job scheduler, to Airflow. The improvements we gained by using an actual job scheduler are great (DAG visualization, dynamic DAG setup, specific task triggering, among others). There is, however, one feature that Jenkins has that most schedulers do not: parameterized job runs.

Let's say I have a DAG (we can call it a job) that performs some SQL queries to generate a Persistent Derived Table (PDT) for a customer. This job will be a templated job, meaning that in order to run it we need to specify which customer database to run it for (as a parameter, customer_code for example). We can do so easily by passing configuration parameters when we trigger the Airflow DAG.

Here is what the Airflow DAG (named navigator_pdt_supplier in this example) would look like. So basically we have a first step where we parse the configuration parameters, then we run the actual PDT, and if something goes wrong, we get a Slack notification. The first step, parse_job_args_task, is a simple PythonOperator that parses the configuration parameter customer_code provided in the DAG run configuration (a DAG run is a specific trigger of the DAG):
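A minimal sketch of such a DAG, assuming Airflow 2.x imports; the `run_pdt` and `notify_slack` helpers, and the use of an `on_failure_callback` for the Slack alert, are illustrative assumptions rather than the post's actual code:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def parse_job_args(**context):
    """Read customer_code from the DAG run configuration (set at trigger time)."""
    conf = context["dag_run"].conf or {}
    # The return value is pushed to XCom so downstream tasks can pull it.
    return conf["customer_code"]


def run_pdt(**context):
    """Placeholder for the templated PDT step (illustrative)."""
    customer_code = context["ti"].xcom_pull(task_ids="parse_job_args_task")
    # Here the real job would run its SQL queries against the
    # customer database selected by customer_code.
    print(f"Generating PDT for customer {customer_code}")


def notify_slack(context):
    """Failure callback; a real version would post to a Slack webhook
    (e.g. via the Slack provider package)."""
    print(f"Task {context['task_instance'].task_id} failed, alerting Slack")


with DAG(
    dag_id="navigator_pdt_supplier",
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,  # no schedule: runs only when triggered with parameters
    catchup=False,
) as dag:
    parse_job_args_task = PythonOperator(
        task_id="parse_job_args_task",
        python_callable=parse_job_args,
        on_failure_callback=notify_slack,
    )

    run_pdt_task = PythonOperator(
        task_id="run_pdt_task",
        python_callable=run_pdt,
        on_failure_callback=notify_slack,
    )

    parse_job_args_task >> run_pdt_task
```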
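With that in place, triggering the job for a specific customer is one CLI call away, e.g. `airflow dags trigger navigator_pdt_supplier --conf '{"customer_code": "acme"}'` on Airflow 2.x (on 1.10 the equivalent was `airflow trigger_dag -c '{"customer_code": "acme"}' navigator_pdt_supplier`); the customer code "acme" is a made-up value. Wiring the Slack alert as an `on_failure_callback` is just one option; a dedicated notification task with a failure trigger rule would work as well.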