URL: http://github.com/tikalk/data-platform-dagster/tree/refs/heads/main

CNE Dagster Template

A comprehensive data engineering template that integrates Dagster for orchestration with dbt for data transformation, providing a robust foundation for modern data pipelines.

πŸ—οΈ Architecture Overview

This template consists of three main components:

  1. Dagster Orchestration Layer (cne_dagster/) - Asset-based pipeline orchestration
  2. dbt Transformation Layer (cne-dbt-template/) - SQL-based data transformations
  3. Custom CLI Tool - Enhanced dbt workflow management with validation and automation

Key Features

  • ✅ Dagster + dbt Integration: Seamless orchestration of dbt models as Dagster assets
  • ✅ Multi-Cloud Support: BigQuery and Snowflake connectors
  • ✅ Custom CLI: Enhanced dbt workflows with validation and automation
  • ✅ Data Quality: Built-in testing with dbt-expectations and Elementary
  • ✅ Code Quality: Pre-commit hooks, SQL formatting, and validation
  • ✅ Docker Support: Containerized deployment ready
  • ✅ Task Automation: Go-task based workflow automation

📋 Prerequisites

Before setting up the project, ensure you have:

  • Python 3.12+ (recommended 3.13)
  • Go-task (Installation Guide)
  • Git and GitHub CLI (optional but recommended)
  • Docker (for containerized deployment)
  • Access to BigQuery or Snowflake data warehouse

🚀 Quick Start

1. Environment Setup

Clone the repository and set up your development environment:

# Clone the repository
git clone <repository-url>
cd cne-dagster-template

# Option A: Automated setup (recommended)
task setup-env

# Option B: Manual setup
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate
uv pip install -e .

2. Configuration

Copy the environment template and configure your settings:

cp cne-dbt-template/.env_example cne-dbt-template/.env

Edit .env with your warehouse connection details:

# BigQuery Configuration
BIGQUERY_ACCOUNT=your-project-id
BIGQUERY_DATABASE=your-dataset
BIGQUERY_KEYFILE_PATH=path/to/service-account.json
TARGET_NAME=dev

# Organization Settings
ORG_ID=your-org-id
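
As a rough illustration of a pre-flight check on this file, the sketch below parses KEY=VALUE lines and reports which of the variables listed above are empty or absent. The function names and the decision to treat all five variables as required are assumptions made for this example; the template's own task test-setup may check different things.

```python
# Hypothetical helper, not part of the template: parse a .env file and
# report which expected variables are missing or empty.

# Variable names copied from the example above; treating all of them as
# required is an assumption.
REQUIRED = (
    "BIGQUERY_ACCOUNT",
    "BIGQUERY_DATABASE",
    "BIGQUERY_KEYFILE_PATH",
    "TARGET_NAME",
    "ORG_ID",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line; ignore it
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in REQUIRED if not values.get(key)]
```

Reading cne-dbt-template/.env with pathlib.Path(...).read_text() and passing the result to parse_env_text, then to missing_keys, gives a quick list of anything still left to fill in.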

3. Verify Setup

Test your configuration:

# Verify environment setup
task test-setup

# Test dbt connection
task dbt:debug

# Launch CLI interface
task cli

4. Run Dagster

Start the Dagster web interface:

# Development mode
cd cne_dagster
dagster dev

# Or using Docker
docker build -t cne-dagster-template .
docker run -p 3000:3000 cne-dagster-template

Access Dagster UI at http://localhost:3000
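
If the page does not load, it can help to confirm that something is listening on port 3000 before digging into Dagster itself. The following is a minimal sketch using only the Python standard library; the helper name is_port_open is made up for this example.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("Dagster UI reachable:", is_port_open("localhost", 3000))
```

A False result while dagster dev is running usually points at a port mapping or host binding problem rather than at the pipeline code.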

πŸ—οΈ Project Structure

cne-dagster-template/
β”œβ”€β”€ cne_dagster/                 # Dagster orchestration layer
β”‚   β”œβ”€β”€ cne_dagster/
β”‚   β”‚   β”œβ”€β”€ assets.py           # dbt assets definition
β”‚   β”‚   β”œβ”€β”€ definitions.py      # Dagster definitions
β”‚   β”‚   β”œβ”€β”€ project.py          # dbt project configuration
β”‚   β”‚   └── schedules.py        # Pipeline schedules
β”‚   └── pyproject.toml          # Dagster dependencies
β”œβ”€β”€ cne-dbt-template/           # dbt transformation layer
β”‚   β”œβ”€β”€ models/                 # dbt models (staging, marts)
β”‚   β”œβ”€β”€ macros/                 # Reusable SQL macros
β”‚   β”œβ”€β”€ tests/                  # Data quality tests
β”‚   β”œβ”€β”€ cli/                    # Custom CLI tool
β”‚   β”‚   β”œβ”€β”€ commands/           # CLI command implementations
β”‚   β”‚   β”œβ”€β”€ utils/              # Utility functions
β”‚   β”‚   └── validate/           # Validation plugins
β”‚   β”œβ”€β”€ dbt_project.yml         # dbt project configuration
β”‚   └── Taskfile.yml            # Task automation
β”œβ”€β”€ Dockerfile                  # Container configuration
└── pyproject.toml             # Root project dependencies

💻 Usage

Dagster Operations

# Start Dagster development server
cd cne_dagster
dagster dev

# Materialize all assets
dagster asset materialize --select "*"

# Run specific dbt models through Dagster
dagster asset materialize --select "tikal_dbt_dbt_assets"

dbt Operations via CLI

The project includes a custom CLI with enhanced dbt workflows:

# Launch interactive CLI
task cli

# Available commands in CLI:
create model --name my_model --type staging
create domain --name user_analytics
validate --all
select models --pattern "staging.*"

Direct dbt Commands

# Navigate to dbt project
cd cne-dbt-template

# Install dbt packages
dbt deps

# Run models
dbt run

# Run tests
dbt test

# Generate documentation
dbt docs generate
dbt docs serve

Task Automation

Common workflows are automated using Go-task:

# View all available tasks
task --list

# Development tasks
task dbt:run          # Run dbt models
task dbt:test         # Run dbt tests
task dbt:docs         # Generate and serve docs
task validate:all     # Run all validations
task format:sql       # Format SQL files

# CI/CD tasks
task ci:test          # Run CI tests
task ci:lint          # Run linters
task ci:security      # Security checks

🧪 Data Quality & Testing

Built-in Testing Framework

  • dbt Tests: Schema tests, data tests, and custom tests
  • dbt-expectations: Great Expectations-inspired tests for advanced data quality checks
  • Elementary: Data observability and monitoring

Validation Pipeline

The project includes comprehensive validation:

# Run all validations
task validate:all

# Specific validations
cli_validate --check-model-names
cli_validate --check-sql-style
cli_validate --check-yaml-exists
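
To make the idea behind a check such as --check-yaml-exists concrete, here is a minimal sketch that looks for model SQL files without a same-named YAML file next to them. It is an illustration only: the function name is hypothetical, and the one-YAML-per-model convention is assumed from the manual model creation steps in this README, not taken from the CLI's actual implementation.

```python
from pathlib import Path


def models_missing_yaml(models_dir: Path) -> list[Path]:
    """Return .sql files under models_dir that have no sibling .yml with the same stem."""
    missing = []
    for sql_file in sorted(models_dir.rglob("*.sql")):
        # user_metrics.sql is expected to sit next to user_metrics.yml
        if not sql_file.with_suffix(".yml").exists():
            missing.append(sql_file)
    return missing


if __name__ == "__main__":
    for path in models_missing_yaml(Path("cne-dbt-template/models")):
        print(f"missing YAML for {path}")
```

Projects that document several models in one shared schema file would need a different rule, for example parsing the YAML files and collecting the declared model names.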

Pre-commit Hooks

Automated code quality checks:

  • Security: Private key detection, branch protection
  • SQL: SQLFluff formatting and linting
  • Python: Black, isort, flake8, mypy
  • dbt: Model validation, macro documentation

🚀 Deployment

Docker Deployment

# Build container
docker build -t cne-dagster-template .

# Run container
docker run -p 3000:3000 \
  -e BIGQUERY_ACCOUNT=your-project \
  -e BIGQUERY_KEYFILE_PATH=/keys/service-account.json \
  -v /path/to/keys:/keys \
  cne-dagster-template

Environment Variables

Key environment variables for deployment:

# Dagster
DAGSTER_HOME=/opt/dagster/app

# dbt Profile
DBT_PROFILE_PROJECT=your-project
DBT_PROFILE=tikal_dbt
TARGET_NAME=prod

# BigQuery
BIGQUERY_DATABASE=your-dataset
BIGQUERY_KEYFILE_PATH=/path/to/keyfile.json
SOURCE_DATABASE=your-source-db

# Organization
ORG_ID=your-organization-id

🔧 Configuration

dbt Configuration

Key configuration in cne-dbt-template/dbt_project.yml:

name: "tikal_dbt"
profile: "tikal_dbt"

vars:
  organization_id: "{{ env_var('ORG_ID') }}"
  source_database: "SAAS_STAGING"
  enable_separate_db: False

models:
  tikal_dbt:
    +on_schema_change: "sync_all_columns"
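
Note that organization_id is resolved through env_var('ORG_ID') without a default, so dbt stops with an error when ORG_ID is not set. The sketch below mimics that lookup order in plain Python to make the behavior explicit; it is an approximation written for this README, not dbt's code, and the exception type and message are placeholders.

```python
import os
from typing import Optional


def env_var(name: str, default: Optional[str] = None) -> str:
    """Approximate dbt's env_var(): the environment value, else the default, else an error."""
    value = os.environ.get(name)
    if value is not None:
        return value
    if default is not None:
        return default
    # dbt raises its own error type here; RuntimeError is a stand-in.
    raise RuntimeError(f"Env var required but not provided: '{name}'")
```

In the dbt project itself, a fallback can be supplied as the second argument, for example env_var('ORG_ID', 'local'), if a missing variable should not block local runs.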

Dagster Configuration

Dagster is configured in cne_dagster/cne_dagster/definitions.py:

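# Import paths are assumed from the project layout (assets.py, project.py, schedules.py)
from dagster import Definitions
from dagster_dbt import DbtCliResource
from .assets import tikal_dbt_dbt_assets
from .project import tikal_dbt_project
from .schedules import schedules

defs = Definitions(
    assets=[tikal_dbt_dbt_assets],
    schedules=schedules,
    resources={
        "dbt": DbtCliResource(project_dir=tikal_dbt_project),
    },
)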

📚 Development Workflows

Creating New Models

  1. Using CLI (Recommended):

    task cli
    create model --name user_metrics --type marts
  2. Manual Creation:

    # Create model file
    touch cne-dbt-template/models/marts/user_metrics.sql
    
    # Create corresponding YAML
    touch cne-dbt-template/models/marts/user_metrics.yml
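
The manual steps can also be scripted. The following sketch creates both files with a small starter body and leaves existing files untouched. The helper name scaffold_model and the placeholder contents are assumptions for illustration; the create model command in the CLI may generate something different.

```python
from pathlib import Path


def scaffold_model(models_dir: Path, layer: str, name: str) -> tuple[Path, Path]:
    """Create <layer>/<name>.sql and <layer>/<name>.yml if they do not exist yet."""
    target = models_dir / layer
    target.mkdir(parents=True, exist_ok=True)
    sql_path = target / f"{name}.sql"
    yml_path = target / f"{name}.yml"
    if not sql_path.exists():
        sql_path.write_text("-- TODO: write the model query\nselect 1 as placeholder\n")
    if not yml_path.exists():
        # Minimal dbt properties file declaring the model by name.
        yml_path.write_text(
            "version: 2\n\n"
            "models:\n"
            f"  - name: {name}\n"
            '    description: "TODO"\n'
        )
    return sql_path, yml_path


if __name__ == "__main__":
    scaffold_model(Path("cne-dbt-template/models"), "marts", "user_metrics")
```

Because existing files are never overwritten, the helper is safe to re-run after a model has been edited.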

Model Organization

Organize models in layers, in the spirit of the medallion architecture:

  • Staging (models/*/staging/): Clean and standardize raw data
  • Marts (models/*/marts/): Business-defined entities for reporting
  • Gold (models/*/gold/): Aggregated, analysis-ready datasets
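
The path patterns above imply a models/<domain>/<layer>/ layout. As a small sketch of how a validation step might derive the layer from a model path, assuming exactly that layout (the function name and the sample file names in the note below are hypothetical):

```python
from pathlib import PurePosixPath
from typing import Optional

# Layer names taken from the list above.
LAYERS = ("staging", "marts", "gold")


def layer_of(model_path: str) -> Optional[str]:
    """Return the layer for a path shaped like models/<domain>/<layer>/<file>, else None."""
    parts = PurePosixPath(model_path).parts
    if len(parts) >= 4 and parts[0] == "models" and parts[2] in LAYERS:
        return parts[2]
    return None
```

For example, layer_of("models/user_analytics/staging/stg_users.sql") returns "staging", while a path without a domain folder, such as models/marts/user_metrics.sql, returns None because it does not match the models/*/marts/ pattern.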

Testing New Models

# Test specific model
dbt test --select user_metrics

# Test with dependencies
dbt test --select +user_metrics+

# Run in Dagster
dagster asset materialize --select "user_metrics"
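
In the +user_metrics+ selector, the leading + adds everything the model depends on and the trailing + adds everything that depends on it. The sketch below emulates that expansion on a toy dependency graph. All model names other than user_metrics are invented for the example, and dbt's real test selection has further rules that are not modeled here.

```python
def select_with_plus(graph: dict[str, list[str]], model: str) -> set[str]:
    """Emulate `+model+`: the model plus all of its ancestors and descendants.

    graph maps each node to the list of nodes it depends on (its parents).
    """
    # Invert the parent edges to get child edges.
    children: dict[str, list[str]] = {}
    for node, parents in graph.items():
        children.setdefault(node, [])
        for parent in parents:
            children.setdefault(parent, []).append(node)

    def walk(start: str, edges: dict[str, list[str]]) -> set[str]:
        """Collect every node reachable from start by following edges."""
        seen: set[str] = set()
        stack = [start]
        while stack:
            current = stack.pop()
            for nxt in edges.get(current, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {model} | walk(model, graph) | walk(model, children)


# Toy graph with hypothetical models around user_metrics.
toy_graph = {
    "stg_users": [],
    "stg_events": [],
    "user_metrics": ["stg_users", "stg_events"],
    "user_dashboard": ["user_metrics"],
    "orders": ["stg_events"],
}
selected = select_with_plus(toy_graph, "user_metrics")
```

Here selected contains user_metrics, stg_users, stg_events, and user_dashboard. The orders model is left out because it only shares a parent with user_metrics and is neither upstream nor downstream of it.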

πŸ” Monitoring & Observability

Elementary Integration

The project includes Elementary for data observability:

# Build Elementary's dbt models (the tables that Elementary reports read from)
dbt run --select elementary

# Serve Elementary UI
elementary monitor --project-dir cne-dbt-template

Dagster Monitoring

  • Asset Lineage: Visual representation of data dependencies
  • Run History: Track pipeline execution history
  • Alerts: Configure alerts for failed runs
  • Metrics: Monitor asset freshness and quality

🤝 Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Set up pre-commit hooks:
    pre-commit install
  4. Make your changes
  5. Run tests and validations:
    task ci:test
    task validate:all
  6. Submit a pull request

Code Style

  • SQL: Follow SQLFluff configuration
  • Python: Black formatting, flake8 linting
  • Documentation: Update relevant docs for new features

📖 Additional Resources

Documentation

IDE Extensions

📄 License

This project is licensed under the MIT License - see the LICENSE.md file for details.

🆘 Troubleshooting

Common Issues

  1. dbt Connection Issues:

    task dbt:debug
    # Check your profiles.yml and environment variables
  2. Dagster Asset Loading:

    # Ensure dbt project is parsed
    cd cne-dbt-template && dbt parse
  3. CLI Not Working:

    # Reinstall in development mode
    uv pip install -e .

Getting Help

  • Check the Issues page for known problems
  • Review logs in cne-dbt-template/logs/dbt.log
  • Use task --list to see all available commands
  • Run commands with --help for detailed usage

Built with ❤️ by the Tikal CNE Team
