Difference between map() & flatMap() in RDD

This example below demonstrates the difference b/w map() & flatMap() operation in RDD using Scala Shell. A flatMap flattens multiple Array into one Single Array

mountain@mountain:~/sbook$ cat words.txt 
line1 word1
line2 word2 word1 
line3 word3 word4
line4 word1

scala> val lines = sc.textFile("words.txt");
...
scala> lines.map(_.split(" ")).take(3)
res4: Array[Array[String]] = Array(Array(line1, word1), Array(line2, word2, word1), Array(line3, word3, word4))

A flatMap() flattens multiple list into one single List

scala> lines.flatMap(_.split(" ")).take(3)
res5: Array[String] = Array(line1, word1, line2)

4 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
  2. Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.

    apache spark training in electronic city

    ReplyDelete
  3. Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.
    DataScience with Python Training in Bangalore

    ReplyDelete