This project is read-only.

Project Description
Metadata-driven ELT framework to ingest data from various sources into Azure SQL Data Warehouse and transform it into a format suited to effective reporting.
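
As a rough illustration of what "metadata-driven" can mean here, the sketch below generates a load statement from a small metadata record instead of hand-written code per table. It is not the framework's actual implementation: `TableLoad`, `build_load_sql`, the field names and the table names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableLoad:
    """Hypothetical metadata record describing one table load."""
    source: str                      # staging table to read from
    target: str                      # warehouse table to load
    columns: List[str]               # columns copied one-to-one
    load_type: str                   # "full" or "delta"
    watermark_column: Optional[str] = None  # needed for delta loads only


def build_load_sql(meta: TableLoad, last_watermark: Optional[str] = None) -> str:
    """Build an INSERT ... SELECT statement from the metadata.

    Identifiers and the watermark are interpolated as plain text, so this
    sketch assumes trusted metadata; it is not safe for untrusted input.
    """
    cols = ", ".join(meta.columns)
    sql = f"INSERT INTO {meta.target} ({cols}) SELECT {cols} FROM {meta.source}"
    if meta.load_type == "delta":
        if meta.watermark_column is None or last_watermark is None:
            raise ValueError("delta load needs a watermark column and value")
        # Only pick up rows changed since the previous run.
        sql += f" WHERE {meta.watermark_column} > '{last_watermark}'"
    elif meta.load_type != "full":
        raise ValueError(f"unknown load type: {meta.load_type}")
    return sql


if __name__ == "__main__":
    dim = TableLoad("stg.Customer", "dw.DimCustomer", ["CustomerId", "Name"], "full")
    print(build_load_sql(dim))
```

Switching a table from a full load to a delta load is then a metadata change (`load_type` plus a watermark column) rather than a code change, and the same metadata could in principle also drive source-to-target documentation.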

I started this project because I am sick and tired of current ETL/ELT/EL systems where (still) many things have to be done manually - schedules, load patterns, SCD? code, etc. I know there are a lot of frameworks out there, but let's face it - each consulting company has its own "framework", depending on the way it implements BI.

Besides, it is almost impossible to generate usable documentation from current ETL/ELT systems. Instead, documentation has to be written manually and is almost always out of date even before the project is deployed to production.

So let's face reality - we don't have a good solution for the part that takes 75-85% of each Business Intelligence project! Microsoft launched its Azure Data Factory a long time ago, but I have yet to see a good real-life implementation of it. When we reviewed the product last year, it seemed like something built for one specific purpose and only afterwards branded "Data Factory", a name it apparently does not live up to, as it requires serious development skills to make it work. In summary, not what I would have expected from a product called "Data Factory".

So what would I expect a "Data Factory" to be?
  • Simple, so that any "BI person" can reuse their data design & manipulation skills
  • Easily configurable & intuitive (using GUI and SQL language where possible)
  • Distributed Ingestion
  • Ability to execute different types of Data Transformation tasks, such as SQL Statements, SSIS Packages, Commands (PowerShell), etc.
  • Automated Source to Target Mapping and other types of reports and documentation
  • Automated Dimension table loads (all SCD types)
  • Automated Fact table loads (full, delta, snapshot)
  • Master Data Management
  • In-built Data Profiling
  • etc.
If "Data Factory" is not capable of doing most of these tasks, let's just call it Needs-more-work-to-be-as-good-as-SSIS in the Cloud.

Last edited Feb 6, 2017 at 9:08 PM by knyazs, version 34