top() and takeOrdered() are actions that return the first N elements of an RDD, based on either the default ordering or a custom ordering that we provide.
Syntax
def top(num: Int)(implicit ord: Ordering[T]): Array[T]
Returns the top k (largest) elements from this RDD as defined by the specified implicit Ordering[T]. This does the opposite of takeOrdered. For example:
sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1) // returns Array(12)
sc.parallelize(Seq(2, 3, 4, 5, 6)).top(2) // returns Array(6, 5)
num: k, the number of top elements to return
ord: the implicit ordering for T
returns: an array of top elements
Example
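In this example, we define a custom implicit ordering that reverses the natural order of integers. With that ordering in scope, top(5) treats the smallest values as the "largest" and returns the 5 smallest elements in ascending order.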
scala> val inputrdd = sc.parallelize{ Seq(10, 4, 5, 3, 11, 2, 6) }
inputrdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[20] at parallelize at <console>:44
scala>
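scala> implicit val sortIntegersByString = new Ordering[Int] {
| // Despite its name, this ordering compares numerically in reverse (descending),
| // so top() treats the smallest values as the "largest".
| override def compare(a: Int, b: Int) = {
| //a.toString.compare(b.toString)
| if(a > b) {
| -1
| }else if(a < b) {
| +1
| }else{
| 0 // equal elements must compare as 0 to satisfy the Ordering contract
| }
| }
| }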
sortIntegersByString: Ordering[Int] = $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anon$1@4be552af
scala> inputrdd.top(5)
res28: Array[Int] = Array(2, 3, 4, 5, 6)
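An implicit val like the one above affects every later call in the same session that looks up an Ordering[Int]. As an alternative, the ordering can be passed explicitly in the second parameter list of top(). The following is a minimal sketch, not code from the original example: it assumes Spark is available in local mode, and the names conf, sc, numbers and byString are illustrative. It also tries the string comparison hinted at by the commented-out line.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode. Inside spark-shell, getOrCreate reuses the existing context.
val conf = new SparkConf().setMaster("local[*]").setAppName("top-with-explicit-ordering")
val sc = SparkContext.getOrCreate(conf)

val numbers = sc.parallelize(Seq(10, 4, 5, 3, 11, 2, 6))

// Compare integers by their string form: "10" < "11" < "2" < "3" < "4" < "5" < "6"
val byString: Ordering[Int] = Ordering.by[Int, String](_.toString)

// The ordering is passed explicitly, so only this call is affected.
val topByString = numbers.top(3)(byString)                 // Array(6, 5, 4)

// Reversed natural ordering: top() now returns the smallest values.
val smallestViaTop = numbers.top(5)(Ordering.Int.reverse)  // Array(2, 3, 4, 5, 6)

println(topByString.mkString(", "))
println(smallestViaTop.mkString(", "))
```

Passing the ordering explicitly keeps the default integer ordering intact for the rest of the session, which is usually easier to reason about than shadowing it with an implicit.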
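takeOrdered() does the opposite of top(): takeOrdered(n) returns the n smallest elements according to the ordering, sorted in ascending order, while top(n) returns the n largest elements, sorted in descending order.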
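To make the relationship between the two actions concrete, here is a small sketch that runs both on the same data. It assumes Spark in local mode, and the names conf, sc and numbers are illustrative. Ordering.Int is passed explicitly so that the result does not depend on any implicit ordering defined earlier in the session.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode. Inside spark-shell, getOrCreate reuses the existing context.
val conf = new SparkConf().setMaster("local[*]").setAppName("top-vs-takeOrdered")
val sc = SparkContext.getOrCreate(conf)

val numbers = sc.parallelize(Seq(10, 4, 5, 3, 11, 2, 6))

// top(): the n largest elements, in descending order.
val largest = numbers.top(3)(Ordering.Int)                                // Array(11, 10, 6)

// takeOrdered(): the n smallest elements, in ascending order.
val smallest = numbers.takeOrdered(3)(Ordering.Int)                       // Array(2, 3, 4)

// Reversing the ordering makes takeOrdered() behave like top().
val largestViaTakeOrdered = numbers.takeOrdered(3)(Ordering.Int.reverse)  // Array(11, 10, 6)

println(largest.mkString(", "))
println(smallest.mkString(", "))
println(largestViaTakeOrdered.mkString(", "))
```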
Reference
Learning Spark : 60
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD