Spark Basics: top() & takeOrdered() Example

24 November 2015

top() & takeOrdered() Example

top() & takeOrdered() are actions that return the N elements based on the default ordering or the Customer ordering provided by us

Syntax

def top(num: Int)(implicit ord: Ordering[T]): Array[T]
Returns the top k (largest) elements from this RDD as defined by the specified implicit Ordering[T]. This does the opposite of takeOrdered. For example:

sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1)
// returns Array(12)

sc.parallelize(Seq(2, 3, 4, 5, 6)).top(2)
// returns Array(6, 5)
num     k, the number of top elements to return
ord     the implicit ordering for T
returns an array of top elements

Example
In this example, let us return the top 5 elements based on ascending order

scala> val inputrdd = sc.parallelize{ Seq(10, 4, 5, 3, 11, 2, 6) }
inputrdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[20] at parallelize at :44

scala>

scala> implicit val sortIntegersByString = new Ordering[Int] {
     | override def compare(a: Int, b: Int) = {
     |     //a.toString.compare(b.toString)
     |       if(a > b) {
     |          -1
     |       }else{
     |          +1
     |       }
     |     }
     | }
sortIntegersByString: Ordering[Int] = $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anon$1@4be552af

scala> inputrdd.top(5)
res28: Array[Int] = Array(2, 3, 4, 5, 6)

takeOrdered() does the opposite of top()

Reference

Learning Spark : 60
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD