takeSample() Example

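takeSample() is an action that returns a fixed-size random sample of an RDD's elements as an array. Because the whole result is loaded into the driver's memory, use it only when the requested sample is small.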

def takeSample(withReplacement: Boolean, num: Int, seed: Long = Utils.random.nextLong): Array[T]
Return a fixed-size sampled subset of this RDD in an array
withReplacement whether sampling is done with replacement
num             size of the returned sample
seed            seed for the random number generator
returns         sample of specified size in an array
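
The seed parameter described above is what makes a sample repeatable. The following is a minimal sketch, assuming a local Spark installation; the app name, the seed value 42L and the names first and second are illustrative choices, not part of the original post. It can be pasted into spark-shell, where SparkContext.getOrCreate simply returns the context that is already running.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode with two threads; inside spark-shell the existing context is reused.
val conf = new SparkConf().setMaster("local[2]").setAppName("takeSampleSeed")
val sc = SparkContext.getOrCreate(conf)

val inputrdd = sc.parallelize(Seq(10, 4, 5, 3, 11, 2, 6))

// Passing the same explicit seed twice to the same RDD.
val first  = inputrdd.takeSample(withReplacement = false, num = 3, seed = 42L)
val second = inputrdd.takeSample(withReplacement = false, num = 3, seed = 42L)

// Both arrays should hold the same three elements in the same order.
println(first.mkString(", "))
println(second.mkString(", "))
println(first.sameElements(second))
```

Leaving the seed out falls back to the default Utils.random.nextLong, so each call then draws a different sample.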

scala> val inputrdd = sc.parallelize(Seq(10, 4, 5, 3, 11, 2, 6))
inputrdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[22] at parallelize at :47

scala> inputrdd.takeSample(false, 3, System.nanoTime.toInt)
res29: Array[Int] = Array(6, 11, 10)

scala> inputrdd.takeSample(false, 3, System.nanoTime.toInt)
res30: Array[Int] = Array(5, 11, 4)

scala> inputrdd.takeSample(true, 3, System.nanoTime.toInt)
res31: Array[Int] = Array(10, 11, 5)
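
The withReplacement flag also decides how large the result can be. The sketch below is a hedged illustration under the same local-mode assumption; the names data, capped and oversampled and the seed 1L are made up for this example. It asks for 10 elements from the 7-element RDD used above: without replacement the result is capped at the 7 available elements, while with replacement all 10 are returned and some values necessarily repeat.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; inside spark-shell the existing context is reused.
val conf = new SparkConf().setMaster("local[2]").setAppName("takeSampleSize")
val sc = SparkContext.getOrCreate(conf)

val data = Seq(10, 4, 5, 3, 11, 2, 6)
val inputrdd = sc.parallelize(data)

// Without replacement each element can be picked at most once,
// so requesting 10 from 7 yields every element once, in random order.
val capped = inputrdd.takeSample(withReplacement = false, num = 10, seed = 1L)

// With replacement an element can be picked repeatedly,
// so the sample can be larger than the RDD itself.
val oversampled = inputrdd.takeSample(withReplacement = true, num = 10, seed = 1L)

println(s"without replacement: ${capped.length} elements")
println(s"with replacement:    ${oversampled.length} elements")
```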


Learning Spark : 41

