### mapValues() Example

When we use map() with a pair RDD, the function receives both the key and the value. Sometimes we are only interested in the value, not the key. In those cases, we can use mapValues() instead of map().
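To make the difference concrete, here is a minimal sketch meant to be pasted into spark-shell, where `sc` is already defined. The names `marks`, `viaMap` and `viaMapValues`, and the 5-mark bonus, are made up for illustration.

```scala
// Assumes spark-shell, which predefines the SparkContext as `sc`.
val marks = sc.parallelize(Seq(("maths", 50), ("english", 65)))

// With map() the function sees the whole (key, value) pair
// and has to rebuild the pair itself.
val viaMap = marks.map { case (subject, mark) => (subject, mark + 5) }

// With mapValues() the function sees only the value;
// the key is carried over unchanged.
val viaMapValues = marks.mapValues(mark => mark + 5)

viaMap.collect()        // Array((maths,55), (english,70))
viaMapValues.collect()  // Array((maths,55), (english,70))
```

Both calls produce the same pairs here; mapValues() simply removes the need to touch the key.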

The following example uses mapValues() together with reduceByKey() to calculate the average mark for each subject:

```
scala> val inputrdd = sc.parallelize(Seq(("maths", 50), ("maths", 60), ("english", 65)))
inputrdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[29] at parallelize at :21

scala> val mapped = inputrdd.mapValues(mark => (mark, 1))
mapped: org.apache.spark.rdd.RDD[(String, (Int, Int))] = MapPartitionsRDD[30] at mapValues at :23

scala> val reduced = mapped.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
reduced: org.apache.spark.rdd.RDD[(String, (Int, Int))] = ShuffledRDD[31] at reduceByKey at :25

scala> val average = reduced.map { x =>
     |   val temp = x._2          // (total, count) for this subject
     |   val total = temp._1
     |   val count = temp._2
     |   (x._1, total / count)    // x._1 is the subject key; Int / Int is integer division
     | }
average: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[32] at map at :27

scala> average.collect()
res30: Array[(String, Int)] = Array((english,65), (maths,55))
```
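One caveat in the transcript above: total and count are both Int, so total / count is integer division and any fractional part is dropped. The sketch below is one possible variation, not the original author's code, that converts to Double before dividing. It assumes spark-shell (with `sc` predefined), and the input marks are hypothetical values chosen so that the average is not a whole number.

```scala
// Assumes spark-shell, which predefines the SparkContext as `sc`.
// Hypothetical marks: maths averages to 55.5, which Int division would truncate to 55.
val marks = sc.parallelize(Seq(("maths", 50), ("maths", 61), ("english", 65)))

val averages = marks
  .mapValues(mark => (mark, 1))                          // value becomes (mark, 1)
  .reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))     // per subject: (total, count)
  .mapValues(x => x._1.toDouble / x._2)                  // Double division keeps the fraction

averages.collect().toMap  // Map(maths -> 55.5, english -> 65.0)
```

Using mapValues() for the final step also means the key never has to be rebuilt by hand.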

Note

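Operations like map() cause the new RDD to not retain the parent's partitioning information, because the function could change the key of each record. mapValues() guarantees that keys stay the same, so the result keeps the parent's partitioner.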
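The effect can be checked by inspecting the `partitioner` field of each RDD. This is a small sketch assuming spark-shell (with `sc` predefined); the choice of `HashPartitioner` with 2 partitions and the names `partitioned`, `afterMap` and `afterMapValues` are arbitrary and only for illustration.

```scala
import org.apache.spark.HashPartitioner

// Assumes spark-shell, which predefines the SparkContext as `sc`.
// Give the pair RDD an explicit partitioner so there is something to lose or keep.
val partitioned = sc
  .parallelize(Seq(("maths", 50), ("maths", 60), ("english", 65)))
  .partitionBy(new HashPartitioner(2))

// map() could change the keys, so Spark drops the partitioner.
val afterMap = partitioned.map { case (subject, mark) => (subject, mark + 1) }

// mapValues() cannot change the keys, so the partitioner is kept.
val afterMapValues = partitioned.mapValues(mark => mark + 1)

afterMap.partitioner        // None
afterMapValues.partitioner  // same Some(HashPartitioner) as partitioned.partitioner
```

Keeping the partitioner matters when a later key-based operation, such as reduceByKey() or a join, can reuse it and avoid another shuffle.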

### Reference

Learning Spark : Partitioning : 64

3. The average can also be calculated with mapValues():

val average = reduced.mapValues(x => x._1 / x._2)

where x._1 is the total marks and x._2 is the number of entries for the subject.

7. How does average.collect() know the keys, such as maths or english, given that the last transformation performed was reduced.map? Can you clarify that part, please?
