Apache Spark – A Basic Understanding




Before diving deep into how Apache Spark works, let's understand the jargon of Apache Spark:

Job: A piece of code that reads some input from HDFS or the local file system, performs some computation on the data, and writes some output data.
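To make the read, compute, write shape of a job concrete, here is a minimal local sketch in plain Python with no Spark involved. The function name `word_count_job` and the file names are hypothetical; in Spark the same job would be expressed with distributed operators rather than a single in-memory pass.

```python
import tempfile
from collections import Counter
from pathlib import Path


def word_count_job(input_path: Path, output_path: Path) -> None:
    """Toy 'job': read input, compute on it, write output."""
    text = input_path.read_text()                # 1. read some input
    counts = Counter(text.split())               # 2. compute on the data
    lines = [f"{word}\t{n}" for word, n in sorted(counts.items())]
    output_path.write_text("\n".join(lines) + "\n")  # 3. write output


with tempfile.TemporaryDirectory() as d:
    src, dst = Path(d, "input.txt"), Path(d, "output.txt")
    src.write_text("to be or not to be\n")
    word_count_job(src, dst)
    print(dst.read_text(), end="")
```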

Stages: Jobs are divided into stages. Stages are classified as map or reduce stages (this is easier to understand if you have worked on Hadoop and want to draw a correlation). Stages are divided at computational boundaries: not all computations (operators) can be executed in a single stage, so a job runs over many stages. In practice, a new stage begins wherever data has to be shuffled between partitions, for example at reduceByKey or groupByKey.
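The stage-splitting rule can be sketched as a toy model: walk a linear chain of operators and start a new stage whenever an operator needs a shuffle. This is an illustration under simplifying assumptions (a single linear chain, each operator hand-labelled with a `shuffle` flag), not Spark's actual scheduler; the operator names are Spark RDD method names used only as labels, and `Op` and `split_into_stages` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    name: str
    shuffle: bool  # True if the operator needs data from other partitions


def split_into_stages(ops: List[Op]) -> List[List[str]]:
    """Cut a linear chain of operators into stages at shuffle boundaries."""
    stages: List[List[str]] = [[]]
    for op in ops:
        if op.shuffle and stages[-1]:
            # A shuffle must wait for every upstream partition to finish,
            # so this operator cannot share a stage with the ones before it.
            stages.append([])
        stages[-1].append(op.name)
    return stages


pipeline = [
    Op("textFile", False),
    Op("flatMap", False),
    Op("map", False),
    Op("reduceByKey", True),
    Op("saveAsTextFile", False),
]
print(split_into_stages(pipeline))
# [['textFile', 'flatMap', 'map'], ['reduceByKey', 'saveAsTextFile']]
```

Roughly, the first list corresponds to the "map" stage and the second to the "reduce" stage in the Hadoop analogy.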

Tasks: Each stage has a set of tasks, one task per partition. Each task is executed on one partition of data by one executor (a process running on a worker machine).
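The one-task-per-partition idea can be sketched locally: split the data into partitions, then submit the same task function once per partition to a small pool. Here a thread pool merely stands in for the executors' task slots, and round-robin partitioning is an assumption chosen for simplicity (Spark's real partitioning depends on the data source and partitioner). `partition` and `run_stage` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(data: List[T], num_partitions: int) -> List[List[T]]:
    """Split data round-robin into num_partitions partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def run_stage(partitions: List[List[T]],
              task: Callable[[List[T]], R],
              num_slots: int = 2) -> List[R]:
    """Run one task per partition; results come back in partition order."""
    with ThreadPoolExecutor(max_workers=num_slots) as pool:
        return list(pool.map(task, partitions))


parts = partition(list(range(10)), 4)
results = run_stage(parts, sum)
print(len(parts), results)  # 4 [12, 15, 8, 10]
```

Four partitions produce exactly four task invocations, even though only two slots run them concurrently.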

DAG: DAG stands for Directed Acyclic Graph; in the present context, it's a DAG of operators.
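A DAG of operators can be represented as a mapping from each operator to the operators it depends on; a topological order then gives a valid execution order, and a cycle would make the graph invalid. This sketch uses Python's standard `graphlib` module (Python 3.9+); the operator names are hypothetical and do not correspond to Spark internals.

```python
from graphlib import CycleError, TopologicalSorter

# Each operator maps to the set of operators it depends on (its parents).
dag = {
    "read_orders": set(),
    "read_users": set(),
    "filter_orders": {"read_orders"},
    "join": {"filter_orders", "read_users"},
    "aggregate": {"join"},
}


def is_acyclic(graph) -> bool:
    """Return True if the dependency graph has no cycles."""
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# Every operator appears after all of its parents.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

The exact order of independent operators (for example the two reads) is not fixed; only the parent-before-child constraint is guaranteed.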

Executor: The process, launched on a worker machine, that is responsible for executing tasks.

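Driver: The program/process responsible for running the job on the Spark engine. It runs the application's main program, creates the SparkContext, and schedules tasks onto the executors.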

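Master: The machine on which the Driver program runs. (In Spark's standalone cluster manager, "master" names the process that allocates cluster resources; the driver does not have to run on that machine.)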

Slave (called a worker node in current Spark documentation): The machine on which the Executor processes run.

All jobs in Spark comprise a series of operators and run on a set of data. All the operators in a job are used to construct a DAG (Directed Acyclic Graph). The DAG is optimized by rearranging and combining operators where possible. …
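One common way operators are "combined" is pipelining: consecutive per-element operators such as map and filter are fused so that each element flows through all of them in a single pass, with no intermediate collection between steps. The sketch below models that with chained Python generators; it illustrates the general idea only, and `fuse`, `map_op`, and `filter_op` are hypothetical helpers, not Spark APIs.

```python
from typing import Callable, Iterator, List

Transform = Callable[[Iterator], Iterator]


def map_op(f) -> Transform:
    return lambda it: (f(x) for x in it)


def filter_op(pred) -> Transform:
    return lambda it: (x for x in it if pred(x))


def fuse(steps: List[Transform]) -> Transform:
    """Chain per-partition transforms lazily into one combined transform."""
    def fused(it: Iterator) -> Iterator:
        for step in steps:
            it = step(it)  # wraps the generator; nothing runs yet
        return it
    return fused


stage = fuse([map_op(lambda x: x * 2), filter_op(lambda x: x % 3 == 0), map_op(str)])
print(list(stage(iter(range(10)))))  # ['0', '6', '12', '18']

# Record the order in which the steps touch each element.
trace = []


def probe(tag):
    def step(x):
        trace.append((tag, x))
        return x
    return map_op(step)


list(fuse([probe("a"), probe("b")])(iter([1, 2])))
print(trace)  # [('a', 1), ('b', 1), ('a', 2), ('b', 2)]
```

The trace shows each element passing through both steps before the next element is read, which is what lets a fused stage avoid materializing intermediate results.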

