URL: http://github.com/vedatonuryilmaz/NMFscripts

NMF K-Optimization Pipeline for ATAC-seq Data

Overview

Python-based CLI pipeline for Non-negative Matrix Factorization (NMF) k-optimization on ATAC-seq data. Processes TCGA Z-scores to identify optimal component numbers through:

  1. Binary accessibility mapping from Z-scores
  2. Sample grouping by embryonic origin
  3. NMF evaluation across k-values (2-26)
  4. Metric calculation (F1, AUPRC, Reconstruction Error)
  5. Model weight storage and visualization
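
The steps above can be sketched end to end on synthetic data. This is a minimal illustration, not the repository's implementation: it assumes a hypothetical cutoff `Z_THRESHOLD` for the binary accessibility map, swaps in plain multiplicative-update NMF written in NumPy rather than whatever solver nmf_workflow.py uses, and sweeps a shortened k range to stay fast.

```python
import numpy as np

Z_THRESHOLD = 0.0  # hypothetical cutoff; the pipeline's actual rule may differ


def nmf_multiplicative(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factor X (features x samples) as W @ H.T with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, k)) + eps  # features x k
    H = rng.random((n_samples, k)) + eps   # samples x k
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        H *= (X.T @ W) / (H @ (W.T @ W) + eps)
        W *= (X @ H) / (W @ (H.T @ H) + eps)
    return W, H


rng = np.random.default_rng(42)
z_scores = rng.normal(size=(40, 12))          # synthetic features x samples
X = (z_scores > Z_THRESHOLD).astype(float)    # step 1: binary accessibility map

errors = {}
for k in range(2, 6):                         # step 3, shortened from 2-26
    W, H = nmf_multiplicative(X, k)
    errors[k] = float(np.linalg.norm(X - W @ H.T))  # step 4: Frobenius error

for k, err in errors.items():
    print(f"k={k}: reconstruction error = {err:.3f}")
```

With random initialization the error is not guaranteed to fall monotonically as k grows, which is one reason to compare several metrics rather than reconstruction error alone.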

Quick Start

# Setup
python3 -m venv nmf_env && source nmf_env/bin/activate
pip install -r requirements.txt

# Configure paths in config.json
# Run pipeline
python prepare_data.py
python run_group_nmf_cli.py Ectoderm
python run_allsamples_nmf_cli.py

Repository Structure

├── config.json                    # Main configuration
├── emb.json                       # Sample groupings
├── requirements.txt               # Dependencies
├── data_utils.py                  # Data processing utilities
├── nmf_evaluation.py              # Evaluation metrics
├── nmf_plotting.py                # Visualization
├── nmf_workflow.py                # Core pipeline
├── prepare_data.py                # Data preparation CLI
├── run_group_nmf_cli.py           # Group-specific NMF CLI
├── run_allsamples_nmf_cli.py      # All-samples NMF CLI
└── embryonic_group_nmf_outputs_cli/
  ├── preprocessed_data/         # Intermediate files
  ├── Ect_NMF_K_opt_2_26/        # Ectoderm results
  ├── Mes_NMF_K_opt_2_26/        # Mesoderm results
  └── AllSamples_NMF_K_opt_2_26/ # Combined results

Configuration

config.json - Essential Settings

{
  "TCGA_ZSCORES_PATH": "/path/to/TCGA_zscores.parquet",
  "K_RANGE_EMBRYONIC_START": 2,
  "K_RANGE_EMBRYONIC_END": 26,
  "TOP_N_FEATURES_CUTOFF": 117442,
  "N_JOBS_PARALLEL": -1
}
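
A sketch of how these settings might be read and turned into a k sweep. The helper names `load_config` and `k_values` are hypothetical, and treating `K_RANGE_EMBRYONIC_END` as inclusive is an assumption based on the `2_26` suffix in the output folder names.

```python
import json

# Keys taken from the example config.json above.
REQUIRED_KEYS = (
    "TCGA_ZSCORES_PATH",
    "K_RANGE_EMBRYONIC_START",
    "K_RANGE_EMBRYONIC_END",
    "TOP_N_FEATURES_CUTOFF",
    "N_JOBS_PARALLEL",
)


def load_config(text):
    """Parse config JSON text and fail early if a required key is missing."""
    cfg = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise KeyError(f"config.json is missing keys: {missing}")
    return cfg


def k_values(cfg):
    """Return the k values to evaluate (end assumed inclusive)."""
    start = cfg["K_RANGE_EMBRYONIC_START"]
    end = cfg["K_RANGE_EMBRYONIC_END"]
    if start < 1 or end < start:
        raise ValueError(f"invalid k range: {start}..{end}")
    return list(range(start, end + 1))


example = """{
  "TCGA_ZSCORES_PATH": "/path/to/TCGA_zscores.parquet",
  "K_RANGE_EMBRYONIC_START": 2,
  "K_RANGE_EMBRYONIC_END": 26,
  "TOP_N_FEATURES_CUTOFF": 117442,
  "N_JOBS_PARALLEL": -1
}"""

cfg = load_config(example)
ks = k_values(cfg)
print(f"{len(ks)} k values from {ks[0]} to {ks[-1]}")
```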

emb.json - Sample Groupings

{
  "organ_system_groupings": [
  {
    "group_name": "Ectoderm",
    "cancer_codes": ["BRCA", "SKCM", "HNSC"]
  }
  ]
}
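
To illustrate how a grouping file like this could drive sample selection, the sketch below builds a cancer-code lookup and assigns samples to groups. The sample ID format is an assumption made up for the example (cancer code, then an underscore); the real parsing lives in get_cancer_type_from_sample_id() in data_utils.py and may differ. The helper names here are hypothetical.

```python
import json

EMB_JSON = """{
  "organ_system_groupings": [
    {"group_name": "Ectoderm", "cancer_codes": ["BRCA", "SKCM", "HNSC"]}
  ]
}"""


def build_code_to_group(emb):
    """Map each cancer code to the group that lists it."""
    lookup = {}
    for group in emb["organ_system_groupings"]:
        for code in group["cancer_codes"]:
            lookup[code] = group["group_name"]
    return lookup


def cancer_code_from_sample_id(sample_id):
    # ASSUMPTION: the ID begins with the cancer code followed by "_".
    return sample_id.split("_", 1)[0]


def group_samples(sample_ids, lookup):
    """Return {group_name: [sample_id, ...]}, skipping unmapped samples."""
    grouped = {}
    for sample_id in sample_ids:
        group = lookup.get(cancer_code_from_sample_id(sample_id))
        if group is not None:
            grouped.setdefault(group, []).append(sample_id)
    return grouped


lookup = build_code_to_group(json.loads(EMB_JSON))
# Made-up IDs for illustration only.
samples = ["BRCA_s01", "SKCM_s02", "LUAD_s03"]
grouped = group_samples(samples, lookup)
print(grouped)
```

Samples whose code is not listed in any group (LUAD_s03 here) are dropped silently in this sketch; a real pipeline might prefer to log them.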

Input Requirements

  • TCGA Z-scores: Parquet file with features as rows and samples as columns (sample columns start at column 7)
  • Sample IDs: Must follow TCGA format for cancer type extraction
  • Python 3.7+ with packages from requirements.txt

Usage

Single Commands

# Data preparation (run once)
python prepare_data.py

# Group-specific analysis
python run_group_nmf_cli.py Ectoderm
python run_group_nmf_cli.py Mesoderm --n-jobs 4

# All samples analysis
python run_allsamples_nmf_cli.py
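
The group command's interface, as used above, takes a positional group name and an optional --n-jobs flag. A minimal argparse sketch of that shape follows; it is inferred from the commands shown, not copied from run_group_nmf_cli.py, and the help strings and default are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Run NMF k-optimization for one embryonic group."
    )
    # Positional group name, e.g. Ectoderm or Mesoderm.
    parser.add_argument("group", help="group_name as listed in emb.json")
    # Optional override; None here means "fall back to the config file"
    # (an assumption about how the override would be resolved).
    parser.add_argument(
        "--n-jobs",
        type=int,
        default=None,
        help="number of parallel jobs (overrides N_JOBS_PARALLEL)",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a shell.
args = build_parser().parse_args(["Mesoderm", "--n-jobs", "4"])
print(args.group, args.n_jobs)
```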

Background Execution

# Using tmux
tmux new -s nmf_pipeline
source nmf_env/bin/activate
python prepare_data.py && \
python run_group_nmf_cli.py Ectoderm && \
python run_allsamples_nmf_cli.py
# Ctrl+b, d to detach

Output Structure

Each analysis produces:

  • {k}NMF/weights/: W.npy (features×k), H.npy (samples×k) matrices
  • *_evaluation_summary.csv: Metrics for all k values
  • summary_figures/: K-selection plots
  • *_run_parameters.json: Analysis parameters

Key Metrics

  • Max Mean F1: Higher = better component separation
  • AUPRC: Higher = better reconstruction quality
  • Reconstruction Error: Lower = better fit
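
For orientation, here are generic NumPy definitions of the three underlying quantities. How nmf_evaluation.py aggregates them (for example, what "Max Mean" is taken over) is not stated here, so treat this as a sketch of the standard formulas only. AUPRC is approximated by average precision, and tied scores are not handled specially.

```python
import numpy as np


def f1_binary(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for boolean labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def average_precision(y_true, scores):
    """Mean of precision values at the rank of each positive."""
    y_true = np.asarray(y_true, dtype=bool).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = y_true[order]
    n_pos = hits.sum()
    if n_pos == 0:
        return 0.0
    precision_at_rank = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_rank[hits].sum() / n_pos)


def reconstruction_error(X, X_hat):
    """Frobenius norm of the residual X - X_hat."""
    diff = np.asarray(X, dtype=float) - np.asarray(X_hat, dtype=float)
    return float(np.linalg.norm(diff))


y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
f1 = f1_binary(y_true, np.array(scores) >= 0.5)   # 0.5 is an arbitrary cutoff
ap = average_precision(y_true, scores)
err = reconstruction_error([[1, 0], [0, 1]], [[1, 0], [0, 0]])
print(f"F1={f1:.3f}  AP={ap:.3f}  error={err:.3f}")
```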

Troubleshooting

Issue                         Solution
Wrong cancer type extraction  Modify get_cancer_type_from_sample_id() in data_utils.py
Memory issues                 Reduce N_JOBS_PARALLEL or BOOL_MAP_CHUNK_SIZE
Path errors                   Verify all paths in config.json
Polars issues                 Update Polars version, check enable_string_cache() usage
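
To show why a smaller chunk size can ease memory pressure, the sketch below binarizes a Z-score matrix a block of rows at a time, so temporary arrays never exceed one chunk. It is an illustration of the general idea only: how the pipeline actually uses BOOL_MAP_CHUNK_SIZE is not documented here, and the function name and threshold are hypothetical.

```python
import numpy as np


def binarize_in_chunks(z_scores, threshold, chunk_size):
    """Return a boolean map of z_scores > threshold, built row-chunk by row-chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    n_rows = z_scores.shape[0]
    out = np.empty(z_scores.shape, dtype=bool)
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Only this slice is compared at once, bounding temporary memory.
        out[start:stop] = z_scores[start:stop] > threshold
    return out


rng = np.random.default_rng(1)
z = rng.normal(size=(10, 4))   # synthetic features x samples
# chunk_size=3 does not divide 10, so the last chunk is shorter.
bool_map = binarize_in_chunks(z, threshold=0.0, chunk_size=3)
print(bool_map.sum(), "accessible entries")
```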

FAIR Compliance

  • Findable: Clear naming, version control support
  • Accessible: Open-source, CLI-based, standard dependencies
  • Interoperable: JSON, Parquet, NPZ, CSV formats
  • Reusable: Modular design, configurable parameters
