Know the user who executed a DAG

Hey, guys,

I need to know who ran a DAG on GCP Cloud Composer (Airflow 1 and 2).

Do you have any tips for finding out which user triggered a DAG?

1 ACCEPTED SOLUTION

Tracking the specific user who triggered a DAG run in Google Cloud Composer presents challenges due to Airflow's design and Composer's logging capabilities. However, here are some strategies and workarounds to gain insights into DAG triggering:

Challenges:

  • Airflow's Design: Apache Airflow prioritizes workflow scheduling and automation over in-depth user auditing. The range of trigger mechanisms (manual, scheduled, external) makes directly associating users with DAG runs complex.
  • Cloud Composer Logging: Composer logs DAG run events, but may not explicitly identify triggering users.

Workarounds and Best Practices:

  • Enable Cloud Audit Logs: Google Cloud Audit Logs track API calls potentially related to DAG triggers. Analyze these for Composer/Airflow events, paying close attention to IAM permissions.
  • Webserver Access Logs: For UI-triggered DAGs, webserver logs can provide IP addresses and timestamps. Be aware of limitations with shared IPs or NAT configurations.
  • Custom Logging in DAGs: Add tasks at the start of DAGs to log potential user information. Since os.getlogin() has limitations in Composer, consider:
    • Parameter Passing: Include a username parameter when triggering DAGs.
    • Webserver Authentication: Integrate user authentication for UI-based triggers (requires additional setup).
  • Naming Conventions (Use Cautiously): In Composer 2, consider including user identifiers in DAG names or descriptions, but be aware of their limited auditing value.
  • External Authentication Systems: For robust auditing, integrate external authentication directly with your DAG workflow.
  • Airflow Metadata Database: For advanced users, Airflow's metadata database offers details on DAG runs and modifications.
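Of these, parameter passing is the most direct. A minimal sketch: the triggering user is supplied in `dag_run.conf` at trigger time, then read back inside a task. The `triggered_by` key is a convention assumed here, not an Airflow built-in.

```python
def extract_trigger_user(conf):
    """Return the username supplied in dag_run.conf, or 'unknown'.

    `conf` is the dict passed at trigger time, e.g. (Airflow 2 CLI):
        airflow dags trigger example_dag --conf '{"triggered_by": "alice"}'
    Scheduled runs pass no conf, so the fallback value applies to them.
    """
    return (conf or {}).get("triggered_by", "unknown")

# Inside a PythonOperator callable, the conf dict is available from the
# task context (Airflow injects it when the callable accepts **context):
#
#     def log_trigger_info(**context):
#         user = extract_trigger_user(context["dag_run"].conf)
#         print("DAG triggered by: %s" % user)
```

This only works when everyone who triggers the DAG actually passes the parameter, so it is best enforced through a wrapper script or CI job rather than ad-hoc CLI use.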

Security and Privacy Considerations:

  • Comply with data protection regulations and internal policies when tracking user information, especially if it's personally identifiable.
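If you do log user information, one way to reduce privacy exposure is to pseudonymize the username before it reaches the logs. A sketch, assuming a deployment-specific secret salt (the salt value below is a placeholder): the same user always maps to the same pseudonym, so runs can still be correlated without storing the identity in plain text.

```python
import hashlib

def pseudonymize(username, salt="change-me"):
    """Replace a raw username with a stable pseudonym before logging.

    `salt` is a hypothetical deployment-specific secret; rotating it
    breaks correlation with older logs, which may itself be desirable
    under some retention policies.
    """
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    return "user-" + digest[:12]
```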

Example: Improved Custom Logging Task

import os
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
from airflow.utils.log.logging_mixin import LoggingMixin

def log_dag_trigger_info():
    # os.environ['USER'] reflects the worker process environment, not
    # necessarily the person who triggered the run; treat it as a
    # best-effort hint only.
    potential_username = os.environ.get('USER', 'unknown')
    # Consider capturing additional info, such as IP address or DAG ID
    logger = LoggingMixin().log
    logger.info("DAG Triggered by: %s", potential_username)

with DAG('example_dag',
         start_date=datetime(2021, 1, 1),
         catchup=False) as dag:

    log_info_task = PythonOperator(
        task_id='log_trigger_info',
        python_callable=log_dag_trigger_info
    )

    # Include the rest of your DAG tasks here

Key Improvements:

  • Clearer Trigger Mechanisms: Emphasizes how workaround suitability depends on trigger types.
  • Acknowledged os.getlogin() Limitations: Addresses issues and suggests alternatives for Composer.
  • Airflow Metadata Database: Introduces the concept for deeper analysis.
  • Enhanced Logging Example: Uses Airflow's logging for consistency and better log management.
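On the metadata database point: Airflow's `log` table records UI and CLI actions, including manual triggers, with the acting user in its `owner` column. The query below is a sketch assuming that standard schema (`dttm`, `dag_id`, `event`, `owner`); in Composer the metadata DB lives in Cloud SQL (Postgres/MySQL), so an in-memory SQLite stand-in is used here purely to make the example runnable.

```python
import sqlite3

# Query Airflow's metadata "log" table for manual trigger events on one DAG.
TRIGGER_QUERY = """
SELECT dttm, dag_id, owner
FROM log
WHERE event = 'trigger' AND dag_id = ?
ORDER BY dttm DESC
"""

def find_trigger_events(conn, dag_id):
    """Return (timestamp, dag_id, owner) rows for manual triggers."""
    return conn.execute(TRIGGER_QUERY, (dag_id,)).fetchall()

if __name__ == "__main__":
    # Stand-in for the real metadata DB, with only the relevant columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (dttm TEXT, dag_id TEXT, event TEXT, owner TEXT)")
    conn.execute("INSERT INTO log VALUES "
                 "('2024-01-01T10:00:00', 'example_dag', 'trigger', 'alice')")
    for row in find_trigger_events(conn, "example_dag"):
        print(row)
```

Note that scheduled runs are triggered by the scheduler itself, so only manual/UI triggers will show a meaningful owner here.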


2 REPLIES


Thanks for the tip. I hadn't thought of adding this kind of logging to the DAG.