Spark - DEV Community


Jubin Soni

Jun 24

Apache Spark Query Optimization on Databricks: Catalyst, AQE, and Photon Engine

#databricks #spark #python #performance

10 min read

Jubin Soni

Jun 24

Real-Time AI Feature Engineering with Spark Structured Streaming and Databricks Feature Store

#databricks #spark #ai #python

10 min read

DataDriven

Jun 16

Read-Write ETL on NAS Data with EMR Serverless Spark — No Cluster, No Copy

#aws #spark #emr #amazonfsxfornetappontap

10 min read

Andrey

May 5

Stream Processing Continuum: Golang Sockets to Flink and Spark Pipelines

#dataengineering #go #spark #data

36 min read

Manish Podiyal

May 4

The Data Refinery: Why Apache Spark is the Engine Behind Real-World Big Data Use Cases

#bigdata #spark #pyspark #dataengineering

2 min read

StiiWann

May 19

Fentanyl Poverty: Building a Big Data Pipeline to Map America's Overdose Epidemic

#bigdata #elasticsearch #spark #python

3 min read

RASMIN BHALLA

Apr 11

Understanding Join Strategies in PySpark (With Real-World Insights)

#pyspark #databricks #sparkarchitecture #spark

2 min read

Alexandros Biratsis

Apr 6

Stopping Spark Structured Streaming jobs via external signals

#spark #scala #databricks #streaming

3 min read

Lee Yao

May 7

Why My Spark Container Keeps Exiting — Docker PID 1 and the Daemon Trap

#docker #spark #dataengineering #devops

5 min read

Vinicius Fagundes

Apr 13

Apache Spark in Plain English: The Engine Behind Databricks

#ai #dataengineering #spark

5 min read

Top 12 Spark Interview Problems for Data Engineers, With Answers
