31 May 2015

RDD: Basics

An RDD (Resilient Distributed Dataset) is the basic unit of data in Spark, on which all operations are performed. An RDD is split into partitions so that it can be processed in parallel on multiple nodes in the cluster, and its intermediate results can be kept in memory.

An RDD operation is either an action or a transformation.

An action returns a result to the driver program or writes it to storage. An action normally starts a computation to produce that result, and it always returns a data type other than an RDD.

A transformation returns a pointer to a new RDD. Transformations are lazy: nothing is computed until an action is called on the result.
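To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's implementation: ToyRDD, its chunking logic, and the square helper are hypothetical names made up for illustration. The method names map, filter, collect and count mirror real RDD operations, but the code needs no Spark installation. It shows transformations returning a new object without doing any work, and actions running the computation and returning ordinary values.

```python
class ToyRDD:
    """Toy stand-in for an RDD: a list of partitions plus a deferred computation."""

    def __init__(self, partitions, compute=None):
        self.partitions = partitions
        # compute turns one partition into an iterator of results
        self.compute = compute or (lambda part: iter(part))

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        # Split the data into contiguous chunks, one per partition
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls([data[i * size:(i + 1) * size] for i in range(num_partitions)])

    # Transformations: return a new ToyRDD, compute nothing yet
    def map(self, f):
        parent = self.compute
        return ToyRDD(self.partitions, lambda part: (f(x) for x in parent(part)))

    def filter(self, pred):
        parent = self.compute
        return ToyRDD(self.partitions, lambda part: (x for x in parent(part) if pred(x)))

    # Actions: run the computation and return a non-RDD value
    def collect(self):
        return [x for part in self.partitions for x in self.compute(part)]

    def count(self):
        return sum(1 for part in self.partitions for _ in self.compute(part))


calls = []  # records every element that square() actually processes

def square(x):
    calls.append(x)
    return x * x

numbers = ToyRDD.parallelize([1, 2, 3, 4, 5, 6], num_partitions=3)
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
# Only transformations so far, so square() has not run and calls is still empty

result = evens_squared.collect()  # the action triggers the work: [4, 16, 36]
```

In this toy, calling a second action such as evens_squared.count() runs square over every element again, because nothing is cached between actions.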

Check the link here for common actions & transformations