Apache Spark
Enterprise Java

Reading and Writing Deeply Partitioned Files in Apache Spark
In large-scale data engineering and analytics, files are often stored in deeply partitioned directories to improve performance and manageability. This…
Enterprise Java

Real-Time Data Streams: Building Analytics with Kafka and Spark
In today’s fast-paced digital world, businesses demand real-time insights to make critical decisions. Batch processing is no longer enough—organizations want…
Software Development

Apache Spark: Unleashing Big Data Power
Apache Spark is a powerful open-source, distributed computing system that has become a cornerstone in the world of…
Software Development

Where is Apache Spark heading?
I watched (the COVID-19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks,…
Enterprise Java

Long Live ETL
Extract, transform, load (ETL) is a process for pulling data from one data system and loading it into another. The data systems involved are called…
Enterprise Java

Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 2)
In part 1 we learned how to test data lineage info collection with Spline from a Spark shell. The same can…
Enterprise Java

Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 1)
One interesting and promising open source project that caught my attention lately is Spline, a data lineage tracking and visualization tool…
Enterprise Java

Insights from Spark UI
As a continuation of the anatomy-of-apache-spark-job post, I will share how you can use the Spark UI for tuning a job. I will continue with the same…
Enterprise Java

Anatomy of Apache Spark Job
Apache Spark is a general-purpose, large-scale data processing framework. Understanding how Spark executes jobs is very important for getting the most of…


