Delta Lake is an open source storage layer that brings ACID transactions to Apache Spark workloads. Once Delta Lake is available in the Spark session, a Spark DataFrame can be saved in Delta format simply by setting the write format to “delta”.
Category: Spark
How to create a Spark DataFrame
A DataFrame is a collection of data organised into rows and named columns, much like a table in a relational database. DataFrames provide many methods for filtering, selecting and aggregating the data they hold.
There are many ways to create a DataFrame. Below I show some of the common ones that I have used in PySpark.