Read & Write Avro files using Spark SQL

Akash Patel
3 min read · Apr 17, 2020


READ AND WRITE — Avro, Parquet, ORC, CSV, JSON, Hive tables…

Here, I cover the Spark SQL APIs you can use to read and write data to and from HDFS and local files.

Sample data is available here. [Avro, Parquet, ORC, CSV, JSON]

The Avro file format is integrated with Spark SQL and readily available in Spark 2.4.x and later, but for earlier Spark versions (< 2.4.0) it has to be configured in a slightly different way. [Reference: https://github.com/databricks/spark-avro]
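
To make the version split concrete, here is a minimal sketch of picking the data source name to pass to format(). The helper avro_format_name is a hypothetical name introduced only for illustration. It assumes the Databricks package on Spark < 2.4 and the short name "avro" on 2.4 and later, where the org.apache.spark:spark-avro module still has to be on the classpath.

```python
def avro_format_name(spark_version):
    """Return the Avro data source name for a Spark version string.

    Hypothetical helper: "avro" on Spark 2.4+, the Databricks
    package name on earlier versions.
    """
    # Only the major and minor parts matter for this decision.
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) >= (2, 4):
        return "avro"
    return "com.databricks.spark.avro"


# The setup used in this post is Spark 2.3.0.
print(avro_format_name("2.3.0"))
```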

Command:
Spark version: 2.3.0
Python version: 3.6.8
Scala Version: 2.11

$ pyspark2

$ spark-shell

Configuration to make READ/WRITE APIs available for the Avro data source

To read an Avro file from a data source, the spark-avro jar must be available in the Spark configuration (com.databricks:spark-avro_2.11:4.0.0).
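
The coordinate follows the Maven pattern groupId:artifactId_scalaVersion:version, and the Scala suffix has to match the Scala version of the Spark build (2.11 here). A small sketch that spells this out follows. The function spark_avro_coordinate is a hypothetical helper, not part of Spark or spark-avro.

```python
def spark_avro_coordinate(scala_version="2.11", package_version="4.0.0"):
    """Build the Maven coordinate for the Databricks spark-avro package.

    Hypothetical helper; the defaults mirror the versions used in this post.
    """
    return f"com.databricks:spark-avro_{scala_version}:{package_version}"


# This is the value to set for spark.jars.packages.
print(spark_avro_coordinate())
```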

Spark and Avro compatibility matrix

Here are two different methods to make the Avro format available through the Spark SQL APIs.

Pyspark — Spark-shell — Spark-submit add packages and dependency details

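from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

# METHOD 1: set the package coordinate on a SparkConf object.
# Spark resolves and downloads the jar when the session starts.
conf = SparkConf()
conf.set("spark.jars.packages", "com.databricks:spark-avro_2.11:4.0.0")

spark = SparkSession.builder.appName('AVRO-Exercises').master('yarn'). \
    config(conf=conf). \
    getOrCreate()

# METHOD 2: pass the same setting directly to the builder.
# Use one method or the other; if a session already exists,
# getOrCreate() returns it instead of building a new one.
spark = SparkSession.builder.appName('AVRO-Exercises').master('yarn'). \
    config("spark.jars.packages", "com.databricks:spark-avro_2.11:4.0.0"). \
    getOrCreate()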

AVRO — READ AND WRITE DATA
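
A minimal sketch of an Avro round trip on Spark 2.3, assuming the Databricks package is loaded as configured above. The function names and the paths in the comments are hypothetical.

```python
# Data source name registered by the Databricks package on Spark < 2.4.
AVRO_FORMAT = "com.databricks.spark.avro"


def read_avro(spark, path):
    """Load an Avro file or directory into a DataFrame."""
    return spark.read.format(AVRO_FORMAT).load(path)


def write_avro(df, path, mode="overwrite"):
    """Write a DataFrame as Avro; existing output is replaced by default."""
    df.write.format(AVRO_FORMAT).mode(mode).save(path)


# Usage with the `spark` session built earlier (paths are hypothetical):
# users = read_avro(spark, "hdfs:///data/users.avro")
# write_avro(users, "hdfs:///output/users_avro")
```

On Spark 2.4 and later, the same calls should work with the short format name "avro" in place of the Databricks one.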


PARQUET — READ AND WRITE DATA
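
Parquet support ships with Spark, so no extra package is needed. The sketch below assumes a DataFrame that may be partitioned by one or more columns on write. The function names, and the column and paths in the comments, are hypothetical.

```python
def read_parquet(spark, path):
    """Load Parquet data; the schema is read from the file metadata."""
    return spark.read.parquet(path)


def write_parquet(df, path, partition_cols=(), mode="overwrite"):
    """Write a DataFrame as Parquet, optionally partitioned by columns."""
    writer = df.write.mode(mode)
    if partition_cols:
        # Creates one sub-directory per distinct value, e.g. year=2020/.
        writer = writer.partitionBy(*partition_cols)
    writer.parquet(path)


# Usage (hypothetical paths and column):
# sales = read_parquet(spark, "hdfs:///data/sales.parquet")
# write_parquet(sales, "hdfs:///output/sales_parquet", partition_cols=["year"])
```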


ORC — READ AND WRITE DATA
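
ORC is also built into Spark SQL. A short sketch, assuming you want to choose the compression codec explicitly on write. The function names and paths are hypothetical.

```python
def read_orc(spark, path):
    """Load ORC data into a DataFrame."""
    return spark.read.orc(path)


def write_orc(df, path, codec="snappy", mode="overwrite"):
    """Write a DataFrame as ORC using the given compression codec."""
    df.write.mode(mode).option("compression", codec).orc(path)


# Usage (hypothetical paths):
# events = read_orc(spark, "hdfs:///data/events.orc")
# write_orc(events, "hdfs:///output/events_orc", codec="zlib")
```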

CSV — READ AND WRITE DATA
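
CSV files carry no schema, so the reader options matter more here than for the binary formats. The sketch below assumes files with a header row. The function names and paths are hypothetical.

```python
def read_csv(spark, path, sep=","):
    """Read delimited text, taking column names from the first row."""
    # inferSchema costs an extra pass over the data; pass an explicit
    # schema instead when the column types are already known.
    return spark.read.csv(path, header=True, inferSchema=True, sep=sep)


def write_csv(df, path, sep=",", mode="overwrite"):
    """Write a DataFrame as CSV part files, each with a header row."""
    df.write.csv(path, mode=mode, header=True, sep=sep)


# Usage (hypothetical paths):
# orders = read_csv(spark, "file:///tmp/orders.csv")
# write_csv(orders, "hdfs:///output/orders_csv", sep="|")
```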

JSON — READ AND WRITE DATA
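
By default Spark reads JSON Lines, that is, one JSON object per line. A sketch that also covers records spanning several lines through the multiLine option follows. The function names and paths are hypothetical.

```python
def read_json(spark, path, multi_line=False):
    """Read JSON data into a DataFrame.

    Set multi_line=True when a single record spans several lines,
    for example a pretty-printed file.
    """
    return spark.read.json(path, multiLine=multi_line)


def write_json(df, path, mode="overwrite"):
    """Write a DataFrame as JSON Lines, one record per line."""
    df.write.json(path, mode=mode)


# Usage (hypothetical paths):
# logs = read_json(spark, "hdfs:///data/logs.json")
# write_json(logs, "hdfs:///output/logs_json")
```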

HIVE — READ AND WRITE DATA
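
Hive tables go through the metastore instead of a file path. This sketch assumes the session was built with enableHiveSupport() and that the metastore is reachable. The function names and the table name in the comments are hypothetical.

```python
def write_hive_table(df, table_name, mode="overwrite"):
    """Save a DataFrame as a managed table in the metastore."""
    df.write.mode(mode).saveAsTable(table_name)


def read_hive_table(spark, table_name):
    """Load a metastore table as a DataFrame."""
    return spark.table(table_name)


# Usage, assuming a Hive-enabled session (table name is hypothetical):
# spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# write_hive_table(users, "default.users")
# users_again = read_hive_table(spark, "default.users")
```

spark.sql("SELECT ...") is the other common way to read, useful when the query needs filtering or joins.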

Jupyter Notebook file: Source code is available here.

Please Clap!! 👏 See you all in my next blog. Follow me to get more updates about data engineering.
