How do you test your DAGs locally before pushing to staging/prod?

Hi all,

We're preparing to upgrade our CC1-Airflow1 setup to CC2-Airflow2. For ease of development, we've been looking at how to set up an Airflow 2 environment that's close to the actual CC2 environment, and were delighted to learn that CC2 comes with a dedicated `composer-dev` CLI that creates a Docker container from Google's CC local development image. On the tin, that looks like a great offering that would save us a lot of setup time! However, things aren't quite what they seem in practice...

The image created by `composer-dev` is massively restrictive, which results in the scheduler constantly crashing on DAGs that have a lot of concurrent operations (all compute-light, just API requests).

Back in CC1-Airflow1 we had our own docker-compose that spawned the whole setup properly (with a separate SQL database image, a Redis image, etc.), so we could set up the same for CC2. However, doing so would defeat the point of using the official CC2 images in the first place, namely developing against an environment similar to the real one.

It would be nice to know how other CC2 community members develop. What does your development flow look like, and have you managed to test anything big with an instance created by `composer-dev`?

Thank you

Q

 

1 REPLY

There are multiple ways to test DAGs locally in Google Cloud before deploying them to staging or production environments.

One option is to use the composer-dev CLI, which creates a Docker container using Google's CC local development image. This allows you to test your DAGs in an environment that closely resembles the actual CC2 environment. However, it's important to note that the composer-dev image has limitations and may not handle DAGs with high concurrency well.
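For reference, a minimal `composer-dev` session looks roughly like this. This is a sketch: the environment name, image version, and DAGs path are placeholders, and the exact flags can differ between releases, so check `composer-dev create --help` on your installed version.

```shell
# Sketch only: environment name, image version, and paths are placeholders.
# Skip gracefully if the CLI is not installed (pip install composer-dev).
if ! command -v composer-dev >/dev/null 2>&1; then
  echo "composer-dev not installed; skipping"
  exit 0
fi

# Create a local environment from a specific Composer 2 image version,
# mounting a local DAGs folder into the container.
composer-dev create local-airflow2 \
  --from-image-version composer-2.1.14-airflow-2.5.1 \
  --dags-path ./dags

# Start the container; the Airflow UI becomes reachable on localhost.
composer-dev start local-airflow2
```

You can also create the local environment from an existing Composer environment with `--from-source-environment`, which pulls the same image version and PyPI packages your cloud environment uses.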

Another approach is to set up your own Docker Compose environment. This gives you more flexibility and control over the testing environment for your DAGs. However, this method may require more setup and maintenance effort.
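As a reference point, a bare-bones docker-compose sketch for an Airflow 2 CeleryExecutor stack might look like the following. All image tags, passwords, and paths are placeholders, and a production-grade compose file would also need an init step to run database migrations and create a user (see the official Airflow docker-compose example):

```yaml
# Minimal sketch only; image versions and credentials are placeholders.
services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
  redis:
    image: redis:7
  scheduler:
    image: apache/airflow:2.5.1
    command: scheduler
    environment: &airflow-env
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
    depends_on: [postgres, redis]
  webserver:
    image: apache/airflow:2.5.1
    command: webserver
    environment: *airflow-env
    ports: ["8080:8080"]
    depends_on: [postgres, redis]
  worker:
    image: apache/airflow:2.5.1
    command: celery worker
    environment: *airflow-env
    volumes:
      - ./dags:/opt/airflow/dags
    depends_on: [postgres, redis]
```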

Alternatively, you can test against a dedicated, non-production Composer environment in the cloud. A sandbox environment like this is isolated from staging and production, handles high-concurrency DAGs the same way your real environment does, and gives you a controlled place to run larger-scale tests.


The specific development flow for testing DAGs locally in Google Cloud depends on the chosen method. However, the general steps involved are as follows:

  1. Create a test environment.
  2. Install the necessary Airflow dependencies.
  3. Load your DAGs into the test environment.
  4. Run and test your DAGs to ensure their functionality.
  5. Fix any errors or issues that arise during testing.
  6. Once you are satisfied with the performance and functionality of your DAGs, you can proceed to deploy them to staging or production.
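For step 4, a cheap first gate before running anything is an import check: load the DAG folder through Airflow's `DagBag` and fail if any file fails to parse. A minimal sketch, assuming Airflow 2 is installed in the test environment (the folder path is a placeholder, and the check is skipped when Airflow is absent):

```python
# Sketch: fail fast if any DAG file in a folder fails to import.
# Assumes Airflow 2 is available; skips the check otherwise.
try:
    from airflow.models import DagBag
except ImportError:  # Airflow not installed in this environment
    DagBag = None

def check_dag_imports(dag_folder: str) -> dict:
    """Return {file_path: error_message} for DAG files that fail to import."""
    if DagBag is None:
        return {}
    bag = DagBag(dag_folder=dag_folder, include_examples=False)
    return {path: str(err) for path, err in bag.import_errors.items()}

if __name__ == "__main__":
    errors = check_dag_imports("dags/")  # placeholder path
    assert not errors, f"DAG import errors: {errors}"
```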

Here are some additional tips to enhance your DAG testing process:

  • Use a consistent naming convention for your DAGs and tasks to facilitate error tracking.
  • Include comments in your DAGs to document the purpose of each task and the expected output.
  • Implement unit tests to verify the functionality of your DAGs.
  • Leverage debugging tools to step through your DAGs and troubleshoot any errors that occur.
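To make the unit-test bullet concrete: one property worth asserting is that your task dependencies contain no cycles before Airflow ever sees them. Here is a pure-Python sketch over a plain dict (the task names are made up; in a real test you would build the dict from `dag.tasks` and each task's `downstream_task_ids`):

```python
# Hypothetical unit-test helper: detect cycles in a task-dependency graph.
# Task names below are made up for illustration.
def has_cycle(deps):
    """deps maps task_id -> list of downstream task_ids; True if a cycle exists."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / in progress / finished
    state = {task: WHITE for task in deps}

    def visit(task):
        state[task] = GREY
        for downstream in deps.get(task, []):
            if state.get(downstream, WHITE) == GREY:
                return True  # back edge: cycle found
            if state.get(downstream, WHITE) == WHITE and downstream in deps:
                if visit(downstream):
                    return True
        state[task] = BLACK
        return False

    return any(state[task] == WHITE and visit(task) for task in deps)

# A linear extract -> transform -> load pipeline has no cycle.
assert not has_cycle({"extract": ["transform"], "transform": ["load"], "load": []})
```

In a real pytest suite you would import the DAG module, derive `deps` from the DAG object, and assert `has_cycle(deps)` is false alongside your other checks.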