PyConDE 2016 / Building Data Processing Pipelines with Luigi
Originally developed at Spotify, Luigi is a Python module for building complex pipelines of batch jobs. It offers dependency resolution, workflow management and task visualization. Luigi is used extensively at TrustYou to run hundreds of batch jobs on two Hadoop clusters, ranging from pure ETL to data science and NLP pipelines. In this talk I will give an introduction to Luigi, showing the ideas behind it and how to use it to orchestrate large, multi-step data processing tasks. I will also present some internal use cases and share tips and tricks on how to use Luigi effectively.
The talk is aimed at Data Engineers, Data Scientists and Python programmers who need to handle and orchestrate batch jobs. Familiarity with Hadoop and/or Spark is not required, but it will be beneficial.