top() & takeOrdered() are actions that return the N elements based on the default ordering or the Customer ordering provided by us
Syntax
Example
In this example, let us return the top 5 elements based on ascending order
takeOrdered() does the opposite of top()
Learning Spark : 60
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
Syntax
def top(num: Int)(implicit ord: Ordering[T]): Array[T] Returns the top k (largest) elements from this RDD as defined by the specified implicit Ordering[T]. This does the opposite of takeOrdered. For example: sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1) // returns Array(12) sc.parallelize(Seq(2, 3, 4, 5, 6)).top(2) // returns Array(6, 5) num k, the number of top elements to return ord the implicit ordering for T returns an array of top elements
Example
In this example, let us return the top 5 elements based on ascending order
scala> val inputrdd = sc.parallelize{ Seq(10, 4, 5, 3, 11, 2, 6) } inputrdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[20] at parallelize at:44 scala> scala> implicit val sortIntegersByString = new Ordering[Int] { | override def compare(a: Int, b: Int) = { | //a.toString.compare(b.toString) | if(a > b) { | -1 | }else{ | +1 | } | } | } sortIntegersByString: Ordering[Int] = $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anon$1@4be552af scala> inputrdd.top(5) res28: Array[Int] = Array(2, 3, 4, 5, 6)
takeOrdered() does the opposite of top()
Reference
Learning Spark : 60
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD