Writing and Reading a Text File
Saving & Loading a list of strings to a .txt file
def writeFile(filePath: String, listObject: List[String]): Unit = {
  import java.io._
  // append a newline to each element so the file holds one element per line
  val stringSeq = listObject.map(r => r + "\n")
  val file = new File(filePath)
  val bw = new BufferedWriter(new FileWriter(file))
  for (line <- stringSeq) {
    bw.write(line)
  }
  bw.close()
}
To use it, call writeFile("path/mytextfile.txt", myList)
This function adds a newline character to the end of each element of your list; otherwise everything would be saved as one long string in the text file, without quotes.
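As a quick illustration (the file name and list contents here are made up for the example), saving a three-element list produces a file with one element per line:
// hypothetical example using the writeFile function above
val myList = List("alpha", "beta", "gamma")
writeFile("path/mytextfile.txt", myList)
// the resulting mytextfile.txt contains:
// alpha
// beta
// gamma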
def readFile(filePath: String): List[String] = {
  // read the file through Spark, one list element per line
  spark.sparkContext.textFile(filePath).collect.toList
}
To use it, call val myList = readFile("path/mytextfile.txt")
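If you are working outside of Spark, a plain-Scala sketch of the same read (assuming the file was written one element per line, as above; readFileLocal is just a name made up here) could look like this:
import scala.io.Source

// non-Spark sketch: read the file back into a List[String], one element per line
def readFileLocal(filePath: String): List[String] = {
  val source = Source.fromFile(filePath)
  try source.getLines.toList
  finally source.close()
}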
☞ String formatting is covered in the "Spark Scala Fundamentals" page of this book.
Writing One Long String
By default, the writer saves the text as one long string, with no newline characters. As you saw above, we had to add the newline character manually to save a list one element per line. But sometimes we do want to save one long string, as we did when we extracted the schema of a DataFrame and saved it as JSON. Find all the details in the "Schema: Extracting, Reading, Writing to a Text File" page of this book.
import java.io._

val theString = "some string I have"
val theNewFileObject = new File("path/filename.txt")
val bw = new BufferedWriter(new FileWriter(theNewFileObject))
bw.write(theString)   // writes the string as-is, no newline added
bw.close()
Reading One Long String, No New Line Character
import scala.io.Source

val string1 = "path/filename.txt"
val source = Source.fromFile(string1)
// joins all lines into one string, with no separators
val jstring1 = source.getLines.mkString
source.close()
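If you want to keep the line breaks rather than collapsing everything into one string, a small variation of the same read (my own sketch, not from the code above) is to pass a separator to mkString:
import scala.io.Source

val source = Source.fromFile("path/filename.txt")
// join the lines back with "\n" to preserve the original line breaks
val withNewlines = source.getLines.mkString("\n")
source.close()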
Sometimes, when you're on a cluster and try to read a text file with .collect(), you might get a Hadoop-related error like this:
Name: java.lang.IllegalAccessError
Message: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
StackTrace: at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
Work around it by reading the text file through the DataFrame reader instead, like so:
val basePath = "path_to_the_datalake"
val filePath = s"$basePath/workspace/haya_toumy/filename.txt"

val text1DF = spark.read
  .option("inferSchema", "false")
  .option("header", "false")
  .csv(filePath)

// pull the first row back as a single comma-joined string
val s1 = text1DF.rdd.map(_.mkString(",")).collect()(0)
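Another option, assuming your Spark version has the plain text reader available, is spark.read.text, which loads each line of the file into a single column named value:
// alternative sketch: read each line into the DataFrame column "value"
val textDF = spark.read.text(filePath)

// collect back to a List[String], one entry per line
val linesList = textDF.rdd.map(_.getString(0)).collect().toList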
Converting a DataFrame's Rows to an RDD, in case you need it:
import org.apache.spark.sql._   // for Row
import org.apache.spark.rdd._   // for RDD
import spark.implicits._        // for the String encoder used by df.map below

// convert the DataFrame to an RDD of Rows
val rows: RDD[Row] = df.rdd

// turn each Row into its String representation
val flatRows = rows.map(row => row.toString)

// OR, for a single Row you already have (here called row_name), wrap its string in an RDD
sc.parallelize(Seq(row_name.toString))

// OR convert through the Dataset API, then drop down to an RDD of Strings
df.map(x => x.toString).rdd
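A quick usage sketch: if the DataFrame is small enough to collect, you can pull the stringified rows back to the driver (flatRows here is the RDD built above):
// only for small DataFrames: bring the row strings back to the driver
val collected: Array[String] = flatRows.collect()
collected.take(5).foreach(println)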