The following are some important components of Spark:
- Cluster Manager
- Allocates resources for running Spark Applications in Cluster Mode (e.g. Spark Standalone, YARN, Mesos, or Kubernetes)
- Application
- A user program built on Spark. Consists of,
- Driver Program
- The program that holds the SparkContext. Acts as the coordinator for the Application
- Executors
- Run computations & store Application data
- Are launched at the beginning of an Application & run for the entire lifetime of the Application
- Each Application gets its own Executors
- An Application can have multiple Executors
- An Executor is not shared by Multiple Applications
- Provide in-memory storage for RDDs
- In Standalone Mode (by default), no more than one Executor per Application runs on the same Node; other cluster managers can place multiple Executors of one Application on a Node
- Task
- Represents a unit of work in Spark
- Gets executed in an Executor
- Job
- A parallel computation consisting of multiple Tasks that is spawned in response to a Spark action (e.g. collect, save)
Check out the Diagram & Glossary here for more information on these Components.