How to create a Spark DataFrame

A DataFrame is a collection of data organised much like a table in a relational database, with rows and named columns. It provides many methods for filtering, selecting, and aggregating the data it holds.

There are many ways to create a DataFrame. Below I show some of the common ones that I have used in PySpark.
