Before doing any data science, machine learning or AI, you need to get your data right. As the volume of data grows, having a reliable, available and scalable data pipeline becomes a challenge.
In this talk we will share lessons learned from running a data pipeline on AWS infrastructure using technologies such as Apache Spark, gRPC and Protocol Buffers.
Majid Fatemian is a Principal Software Engineer for data platforms at Red Ventures. He is passionate about the scalability and reliability of distributed systems.