Spark Scala

Explanations, details, workflow, and syntax of Spark, with Scala

Spark as a framework offers three main interfaces to make it more convenient for data professionals: Scala, Python, and R.

Spark is written in Scala, which in turn runs on the Java Virtual Machine, so you can import and use Java libraries directly. For that reason, the native Scala interface is said to be faster than PySpark or SparkR, and new Spark features land in the Scala API first. However, if you're not already familiar with Scala, there will be an initial learning curve, as it is a stricter and more particular language than Python.
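To illustrate the point about Java interoperability, here is a minimal sketch (plain Scala, no Spark session needed) that calls classes from the Java standard library directly from Scala code:

```scala
// Scala code can use Java standard-library classes with a plain import.
import java.time.LocalDate
import java.util.UUID

// java.time API, called from Scala exactly as it would be from Java
val today    = LocalDate.of(2021, 1, 15)
val nextWeek = today.plusDays(7)
println(nextWeek)            // 2021-01-22

// java.util works the same way
val id = UUID.randomUUID().toString
println(id.length)           // 36 (canonical UUID string length)
```

The same applies inside a Spark job: any Java library on the driver/executor classpath is usable from Spark Scala code.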

I initially started with PySpark, then quickly switched to Spark Scala because it made more sense to me; I didn't like the mix of Python's fluidity and Spark's preciseness. Also, at the time I was putting these notes together, there was far less online community support for PySpark than for Spark Scala.

If you do end up wanting to learn Scala, I included a page in this group/section to get you up to speed quickly with the fundamentals: functions, logic, types, loops, and if-else statements. You will also see these used throughout the Spark Scala section.
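As a quick taste of those fundamentals, here is a small hedged sketch (all names are illustrative, not from the fundamentals page itself) showing a function, an if-else expression, explicit types, and a loop:

```scala
// if-else is an expression in Scala: it returns a value
def label(n: Int): String =
  if (n % 2 == 0) "even" else "odd"

// explicit type annotation on a value
val nums: List[Int] = List(1, 2, 3, 4)

// higher-order function: apply label to every element
val labels = nums.map(label)   // List(odd, even, odd, even)

// a basic imperative loop with a mutable variable
var total = 0
for (n <- nums) total += n
println(total)                 // 10
```

Note the functional style (`map` over a collection) alongside the imperative loop; Spark Scala code leans heavily on the former.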

For all of the above reasons, this group of pages contains everything I know about Spark. The PySpark section merely contains applications and examples in PySpark.

