## 05 December 2015

### mapValues() Example

When we use map() with a pair RDD, we get access to both the key and the value. There are times we might only be interested in accessing the value (and not the key). In those cases, we can use mapValues() instead of map().

In this example we use mapValues() along with reduceByKey() to calculate the average mark for each subject.

```scala
scala> val inputrdd = sc.parallelize(Seq(("maths", 50), ("maths", 60), ("english", 65)))
inputrdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD at parallelize at <console>:21

scala> val mapped = inputrdd.mapValues(mark => (mark, 1))
mapped: org.apache.spark.rdd.RDD[(String, (Int, Int))] = MapPartitionsRDD at mapValues at <console>:23

scala> val reduced = mapped.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
reduced: org.apache.spark.rdd.RDD[(String, (Int, Int))] = ShuffledRDD at reduceByKey at <console>:25

scala> val average = reduced.map { x =>
|                      val temp = x._2
|                      val total = temp._1
|                      val count = temp._2
|                      (x._1, total / count)
|                      }
average: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD at map at <console>:27

scala> average.collect()
res30: Array[(String, Int)] = Array((english,65), (maths,55))
```
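The final map() step can be written more directly with mapValues(), since only the (total, count) value is transformed and the subject key passes through unchanged. A minimal sketch, assuming the same Spark shell session as above (so sc already exists):

```scala
val inputrdd = sc.parallelize(Seq(("maths", 50), ("maths", 60), ("english", 65)))

val reduced = inputrdd
  .mapValues(mark => (mark, 1))                       // (subject, (mark, 1))
  .reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))  // (subject, (total, count))

// Key-preserving equivalent of the map() step: divide total by count.
val average = reduced.mapValues { case (total, count) => total / count }

average.collect()   // Array((english,65), (maths,55)) -- ordering may vary
```

Besides being shorter, this version lets Spark keep the partitioner produced by reduceByKey(), which the map() version discards.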

Note

Operations like map() always cause the new RDD to not retain the parent's partitioning information, because map() is free to change the key. mapValues() guarantees the keys are unchanged, so the parent's partitioner is preserved.
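This difference can be observed directly on an explicitly partitioned RDD; a sketch, again assuming a Spark shell sc (HashPartitioner is the only extra import):

```scala
import org.apache.spark.HashPartitioner

val pairs = sc.parallelize(Seq(("maths", 50), ("english", 65)))
              .partitionBy(new HashPartitioner(2))

// mapValues() cannot change the key, so the partitioner survives:
pairs.mapValues(_ + 1).partitioner.isDefined   // true

// map() could change the key, so Spark drops the partitioner:
pairs.map { case (k, v) => (k, v + 1) }.partitioner   // None
```

Preserving the partitioner matters for performance: a later reduceByKey() or join() on the mapValues() result can avoid a shuffle that the map() version would incur.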

### Reference

Learning Spark : Partitioning : 64