Data Engineering Projects

Featured DE Project

Data Processing with Spark

Batch ETL on Airbnb London listings with the PySpark DataFrame API: type conversion, regex cleaning of price fields, filtering, and aggregations (maximum and distinct values, hosts per year). Emphasis on clear transformation steps and reproducible notebooks and scripts.

View Project
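
The regex price-cleaning step can be illustrated with a minimal sketch. It uses plain Python `re` rather than PySpark so that it runs without a Spark session. The pattern, the `clean_price` helper, and the sample strings are assumptions for illustration and are not taken from the project.

```python
import re

# Assumed pattern: drop everything except digits and the decimal point.
PRICE_NOISE = re.compile(r"[^0-9.]")


def clean_price(raw):
    """Turn a price string such as '$1,250.00' into a float, or None if empty."""
    if raw is None:
        return None
    digits = PRICE_NOISE.sub("", raw)
    return float(digits) if digits else None


# Made-up sample values, including an empty string and a missing value.
prices = ["$1,250.00", "$85.00", "", None]
cleaned = [clean_price(p) for p in prices]
```

In PySpark, the same pattern could plausibly be passed to `pyspark.sql.functions.regexp_replace` and the result cast to a numeric type, which keeps the cleaning inside the DataFrame API.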

DE Project

Demographic Data Analyzer

Pandas data wrangling on the 1994 US Census (Adult) dataset: GroupBy queries, filtering, and result validation through automated unit tests. Reproducible CLI with an optional notebook.

View
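
A GroupBy query, a filter, and a test-style check of the kind described for the Demographic Data Analyzer might look roughly like this. The sample rows and column names are invented for the sketch and may not match the project's actual schema.

```python
import pandas as pd

# Tiny made-up sample; the column names are assumed, not taken from the project.
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Bachelors", "Masters", "HS-grad"],
    "salary": [">50K", "<=50K", "<=50K", ">50K", "<=50K"],
    "hours-per-week": [40, 35, 50, 45, 20],
})

# Share of rows earning >50K within each education level, as a percentage.
high_earner_pct = (
    df["salary"].eq(">50K")
    .groupby(df["education"])
    .mean()
    .mul(100)
    .round(1)
)

# Filter: rows with the minimum number of weekly hours in the sample.
min_hours = df["hours-per-week"].min()
min_workers = df[df["hours-per-week"] == min_hours]

# The kind of check a unit test would make on the results.
assert high_earner_pct["Bachelors"] == 50.0
assert len(min_workers) == 1
```

Keeping each query as a small expression with a known answer on a hand-built sample is one way to make the results checkable in automated tests.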

DE/Analytics Project

Medical Data Visualizer

Data cleaning (BMI calculation, normalization, outlier handling) and insights through categorical charts and correlation heatmaps. Built with Matplotlib and Seaborn to communicate relationships clearly.

View
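
The cleaning steps named for the Medical Data Visualizer (BMI, normalization, outliers) can be sketched as follows. The column names, the BMI threshold of 25, the binary normalization rule, and the percentile cutoffs are all assumptions chosen for illustration, and the data is made up.

```python
import pandas as pd

# Made-up rows; column names and thresholds are assumptions for illustration.
df = pd.DataFrame({
    "height_cm": [170, 160, 180, 165, 250],
    "weight_kg": [70.0, 80.0, 75.0, 55.0, 60.0],
    "cholesterol": [1, 3, 2, 1, 1],
})

# BMI = weight (kg) / height (m)^2; flag values above 25 as overweight.
bmi = df["weight_kg"] / (df["height_cm"] / 100) ** 2
df["overweight"] = (bmi > 25).astype(int)

# Normalize an ordinal scale to a binary flag: 1 (normal) -> 0, above 1 -> 1.
df["cholesterol"] = (df["cholesterol"] > 1).astype(int)

# Keep only rows whose height lies within the 2.5th-97.5th percentile range.
low, high = df["height_cm"].quantile([0.025, 0.975])
cleaned = df[df["height_cm"].between(low, high)]
```

On a sample this small, percentile trimming removes both the smallest and the largest height. On a real dataset it would only cut the tails.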

Foundations

Mean–Variance–Std Calculator

Vectorized NumPy calculations (mean/var/std/min/max/sum) with input validation and structured error handling. A small, testable building block for numerical pipelines.

View
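
A minimal sketch of such a calculator, assuming the input is nine numbers reshaped into a 3x3 matrix. The function name `summarize`, the shape, the result layout, and the error message are hypothetical and not taken from the project.

```python
import numpy as np


def summarize(values):
    """Return mean/var/std/min/max/sum per column, per row, and overall.

    Hypothetical helper: the 3x3 shape and the error message are assumptions.
    """
    values = list(values)
    if len(values) != 9:
        raise ValueError("Expected exactly nine numbers.")
    matrix = np.asarray(values, dtype=float).reshape(3, 3)
    reducers = {
        "mean": np.mean,
        "variance": np.var,
        "standard deviation": np.std,
        "min": np.min,
        "max": np.max,
        "sum": np.sum,
    }
    # Each entry holds [column-wise, row-wise, whole-matrix] results.
    return {
        name: [
            fn(matrix, axis=0).tolist(),
            fn(matrix, axis=1).tolist(),
            float(fn(matrix)),
        ]
        for name, fn in reducers.items()
    }


stats = summarize(range(9))
```

Validating the length before reshaping means bad input fails with a clear `ValueError` instead of a less readable reshape error from NumPy.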