Writing a List of Strings To a Text File (With New Line Character)
def writeFile(filePath: String, listObject: List[String]): Unit = {
  import java.io._
  // append a newline to each element so every list item lands on its own line
  val stringSeq = listObject.map(r => r + "\n")
  val file = new File(filePath)
  val bw = new BufferedWriter(new FileWriter(file))
  for (line <- stringSeq) {
    bw.write(line)
  }
  bw.close()
}
To use it, do writeFile("path/mytextfile.txt", myList)
This function adds a newline character to the end of each element of your list; otherwise the whole list would be saved as one long string in the text file, without quotes.
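There is no readFile defined in this section; here is a minimal sketch of one, assuming it should return the file's lines as a List[String] (the name readFile and its return type are assumptions, not something defined elsewhere in this book):
import scala.io.Source
def readFile(filePath: String): List[String] = {
  val source = Source.fromFile(filePath)
  try source.getLines().toList   // one list element per line in the file
  finally source.close()
}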
To use it, do val myList = readFile("path/mytextfile.txt")
☞ String formatting is left to the "Spark Scala Fundamentals" page of this book.
Saving Without a New Line Character
By default, BufferedWriter saves everything as one long string, with no newline characters. As you saw above, we had to add the newline character manually to save a list line by line. But sometimes we do want to save one long string, as we did when we extracted a data frame's schema and saved it as JSON. Find all the details in the "Schema: Extracting, Reading, Writing to a Text File" page of this book.
import java.io._
val theString = "some string I have"
val theNewFileObject = new File("path/filename.txt")
val bw = new BufferedWriter(new FileWriter(theNewFileObject))
bw.write(theString)   // written exactly as-is, no newline added
bw.close()
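For the schema use case mentioned above, a minimal sketch, assuming a DataFrame named df already exists and an illustrative output path; df.schema.json gives the schema serialized as one long JSON string:
import java.io._
val schemaJson = df.schema.json   // the whole schema as one long JSON string
val bwSchema = new BufferedWriter(new FileWriter(new File("path/schema.json")))   // path is illustrative
bwSchema.write(schemaJson)
bwSchema.close()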
Reading One Long String, No New Line Character
import scala.io.Source
val string1 = "path/filename.txt"
val source1 = Source.fromFile(string1)
// mkString joins the lines into one long string, dropping the newline characters
val jstring1 = source1.getLines().mkString
source1.close()
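If you want to keep the line breaks while still reading into a single string, a small variation (the file path is illustrative):
import scala.io.Source
val source2 = Source.fromFile("path/filename.txt")
val withNewlines = source2.getLines().mkString("\n")   // re-join the lines with newline characters
source2.close()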
☞ If Reading a Text File Errors Out
Sometimes, when you're on a cluster and try to read a text file and then .collect() it, you might get a Hadoop-related error such as:
Name: java.lang.IllegalAccessError
Message: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
StackTrace: at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
Solve it by reading the text file through the Spark CSV reader instead, like so:
val basePath = "path_to_the_datalake"
val filePath = s"$basePath/workspace/haya_toumy/filename.txt"
val text1DF = spark.read
  .option("inferSchema", "false")
  .option("header", "false")
  .csv(filePath)
// collect and take the first row, joining its columns back into one string
val s1 = text1DF.rdd.map(_.mkString(",")).collect()(0)
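Another option, since the error above comes from Hadoop's FileInputFormat, is to read the file directly through the Hadoop FileSystem API rather than an RDD. A minimal sketch, assuming the file lives on the cluster's default filesystem and reusing filePath from above:
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val in = fs.open(new Path(filePath))              // FSDataInputStream is a plain InputStream
val contents = Source.fromInputStream(in).mkString
in.close()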
Flatten and Read JSONs
Converting a data frame into an RDD of Rows, in case you need it:
import org.apache.spark.sql._ // for Row
import org.apache.spark.rdd._ // for RDD
// convert the data frame to an RDD of Rows
val rows: RDD[Row] = df.rdd
// turn each Row into its string representation
val flatRows = rows.map(_.toString)
// OR
sc.parallelize(Seq(row_name.toString())) // an RDD whose single element is the Row's string
df.map(x => x.toString()).rdd // Dataset of strings, then its RDD (needs import spark.implicits._)