An RDD (Resilient Distributed Dataset) is the basic unit of data in Spark on which all operations are performed. RDDs hold intermediate results, can be kept in memory, and are partitioned so that they can be processed in parallel on multiple nodes in the cluster.
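The partitioning idea can be illustrated with a toy, single-machine sketch. It is not Spark code: `partition` and `map_partitions` are hypothetical helpers, and worker threads stand in for cluster nodes. The data is split into chunks, and each chunk is processed independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def partition(data: List[T], num_partitions: int) -> List[List[T]]:
    """Split data into num_partitions contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), num_partitions)
    parts: List[List[T]] = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one leftover element.
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def map_partitions(parts: List[List[T]], fn: Callable[[T], U]) -> List[List[U]]:
    """Apply fn to every element, processing each partition in its own worker thread.

    The threads stand in for cluster nodes; no data is shared between partitions.
    """
    with ThreadPoolExecutor(max_workers=len(parts) or 1) as pool:
        # pool.map keeps the partitions in their original order.
        return list(pool.map(lambda p: [fn(x) for x in p], parts))


if __name__ == "__main__":
    parts = partition(list(range(10)), 3)
    print(parts)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    squared = map_partitions(parts, lambda x: x * x)
    print([x for p in squared for x in p])  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

In real Spark the partitions live on different executors and the scheduler decides where each one is processed; the sketch only shows why independent partitions make parallel processing possible.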
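An RDD operation is either an action or a transformation.
An action returns a result to the driver program or writes it to storage. An action triggers the computation needed to produce its result, and it returns a value of some type other than RDD (or nothing, when it only writes to storage).
A transformation returns a reference to a new RDD without modifying the original one. Transformations are evaluated lazily: nothing is computed until an action needs the result.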
Check the link here for common actions & transformations
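To make the action/transformation split concrete, here is a minimal sketch of lazy evaluation in plain Python. `ToyRDD` is a hypothetical class, not Spark's API; its `map`, `filter`, `collect`, and `count` methods are only named after the corresponding RDD operations. Transformations just record a step and return a new `ToyRDD`, while actions run the recorded steps and return an ordinary Python value.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional


class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not Spark's API).

    A ToyRDD only remembers its parent and the step to apply; nothing runs
    until an action asks for results.
    """

    def __init__(
        self,
        source: Optional[Iterable[Any]] = None,
        parent: Optional["ToyRDD"] = None,
        step: Optional[Callable[[Iterator[Any]], Iterator[Any]]] = None,
    ) -> None:
        # Materialize the source so every action can re-read it.
        self._source = list(source) if source is not None else None
        self._parent = parent
        self._step = step

    # Transformations: build and return a new ToyRDD, compute nothing.
    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(parent=self, step=lambda it: (fn(x) for x in it))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, step=lambda it: (x for x in it if pred(x)))

    def _iterate(self) -> Iterator[Any]:
        # Walk back to the source, then replay each recorded step going forward.
        if self._parent is None:
            return iter(self._source or [])
        return self._step(self._parent._iterate())

    # Actions: trigger the computation and return a plain (non-RDD) value.
    def collect(self) -> List[Any]:
        return list(self._iterate())

    def count(self) -> int:
        return sum(1 for _ in self._iterate())


if __name__ == "__main__":
    seen: List[int] = []
    # seen.append returns None, so `None or x * 10` evaluates to x * 10.
    rdd = ToyRDD(range(6)).map(lambda x: seen.append(x) or x * 10).filter(lambda x: x > 20)
    print(seen)           # [] : the transformations have not run yet
    print(rdd.collect())  # [30, 40, 50]
    print(seen)           # [0, 1, 2, 3, 4, 5]
```

For comparison, the same pipeline in PySpark would look roughly like `sc.parallelize(range(6)).map(lambda x: x * 10).filter(lambda x: x > 20).collect()`, assuming `sc` is an existing `SparkContext`.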