Building Data Processing Pipelines with Luigi

Originally developed at Spotify, Luigi is a Python module for building complex pipelines of batch jobs. It offers
dependency resolution, workflow management, and task visualization.
At TrustYou, Luigi is used extensively to run hundreds of batch jobs on two Hadoop clusters, ranging from pure ETL
to data science and NLP pipelines.
In this talk I will give an introduction to Luigi, explaining the ideas behind it and how to use it to orchestrate large, multi-step data processing
tasks. I will also show some internal use cases and share tips and tricks on how to use Luigi effectively.

The talk is aimed at Data Engineers, Data Scientists, and Python programmers who need to handle and orchestrate batch jobs. Familiarity with Hadoop and/or Spark is not required but will be beneficial.
