Data distribution in Apache Spark
I'm new to Spark and have a general question. As far as I know, the whole file must be available on the worker nodes to be processed. If so, how do they know which partition they should read? The driver controls the partitions, but how does the driver tell them which partition to read?
Each RDD is divided into multiple partitions. To compute each partition, Spark generates a task and assigns it to a worker node. When the driver sends a task to a worker, it specifies the partition ID of that task.
The worker then executes the task by chaining the RDDs' iterators all the way back to the input RDD, passing the partition ID along. The input RDD determines which part of the input corresponds to the specified partition ID and returns that data.
rdditer.next -> parentrdditer.next -> grandparentrdditer.next -> ... -> inputrdditer.next
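The chain above can be imitated in a few lines of plain Python. This is only a toy model of the idea, not Spark's actual code: `ToyRDD`, `InputRDD`, `MappedRDD` and `run_task` are hypothetical names made up for illustration, and the even slicing of a list stands in for however a real input source maps a partition ID to a byte range or block.

```python
from typing import Callable, Iterator, List, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD; not Spark's real class."""

    def __init__(self, num_partitions: int, parent: Optional["ToyRDD"] = None):
        self.num_partitions = num_partitions
        self.parent = parent

    def compute(self, partition_id: int) -> Iterator[int]:
        raise NotImplementedError


class InputRDD(ToyRDD):
    """End of the chain: the only RDD that touches the input data."""

    def __init__(self, records: List[int], num_partitions: int):
        super().__init__(num_partitions)
        self.records = records

    def compute(self, partition_id: int) -> Iterator[int]:
        # Assumption: the input is cut into contiguous, near-equal slices.
        n = len(self.records)
        start = partition_id * n // self.num_partitions
        end = (partition_id + 1) * n // self.num_partitions
        return iter(self.records[start:end])


class MappedRDD(ToyRDD):
    """Derived RDD: pulls from its parent using the same partition ID."""

    def __init__(self, parent: ToyRDD, fn: Callable[[int], int]):
        super().__init__(parent.num_partitions, parent)
        self.fn = fn

    def compute(self, partition_id: int) -> Iterator[int]:
        # Lazy generator: each next() here triggers next() on the parent.
        return (self.fn(x) for x in self.parent.compute(partition_id))


def run_task(rdd: ToyRDD, partition_id: int) -> List[int]:
    """What a worker does with a task: drain one partition's iterator."""
    return list(rdd.compute(partition_id))


source = InputRDD(list(range(10)), num_partitions=3)
doubled = MappedRDD(source, lambda x: x * 2)
plus_one = MappedRDD(doubled, lambda x: x + 1)

# The "driver" creates one task per partition, each carrying only an ID.
results = [run_task(plus_one, pid) for pid in range(plus_one.num_partitions)]
print(results)  # [[1, 3, 5], [7, 9, 11], [13, 15, 17, 19]]
```

Note that each task receives nothing but a partition ID. Only `InputRDD.compute` decides which records that ID refers to, so in this model a worker reads just its own slice rather than the whole input.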