Data distribution in Apache Spark

- September 15, 2010

I'm new to Spark and have a general question. As far as I know, the whole file must be available on all worker nodes to be processed. If so, how do the workers know which partition they should read? The driver controls the partitions, but how does the driver tell each worker which partition to read?

Each RDD is divided into multiple partitions. To compute each partition, Spark generates a task and assigns it to a worker node. When the driver sends a task to a worker, it specifies the partition ID of that task.

The worker executes the task by chaining the RDDs' iterators all the way back to the input RDD, passing along the partition ID. The input RDD determines which part of the input corresponds to the specified partition ID and returns that data.

rddIter.next -> parentRDDIter.next -> grandParentRDDIter.next -> ... -> inputRDDIter.next
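
The chain above can be illustrated with a small toy model in plain Python. This is only a sketch of the idea, not Spark's actual code: the names ToyRDD, ToyInputRDD, ToyMappedRDD, run_task, and run_job are hypothetical, and the input RDD here simply slices a list of lines, whereas real Spark input RDDs work from input splits of the underlying storage.

```python
from typing import Any, Callable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD: it can compute any one of its partitions."""

    def num_partitions(self) -> int:
        raise NotImplementedError

    def iterator(self, partition_id: int) -> Iterator[Any]:
        raise NotImplementedError


class ToyInputRDD(ToyRDD):
    """Input RDD: maps a partition ID to one contiguous slice of the input."""

    def __init__(self, lines: List[str], num_partitions: int) -> None:
        self.lines = lines
        self._num_partitions = num_partitions

    def num_partitions(self) -> int:
        return self._num_partitions

    def iterator(self, partition_id: int) -> Iterator[str]:
        # Assumption for this sketch: partitions are equal-ish slices of a list.
        total = len(self.lines)
        start = partition_id * total // self._num_partitions
        end = (partition_id + 1) * total // self._num_partitions
        return iter(self.lines[start:end])


class ToyMappedRDD(ToyRDD):
    """Derived RDD: pulls records from its parent's iterator for the same partition ID."""

    def __init__(self, parent: ToyRDD, fn: Callable[[Any], Any]) -> None:
        self.parent = parent
        self.fn = fn

    def num_partitions(self) -> int:
        return self.parent.num_partitions()

    def iterator(self, partition_id: int) -> Iterator[Any]:
        # Each next() on this generator calls next() on the parent's iterator,
        # which is the rddIter.next -> parentRDDIter.next chain.
        return (self.fn(record) for record in self.parent.iterator(partition_id))


def run_task(rdd: ToyRDD, partition_id: int) -> List[Any]:
    """What a worker does in this model: compute exactly one partition."""
    return list(rdd.iterator(partition_id))


def run_job(rdd: ToyRDD) -> List[List[Any]]:
    """What the driver does in this model: one task per partition ID."""
    return [run_task(rdd, pid) for pid in range(rdd.num_partitions())]


if __name__ == "__main__":
    source = ToyInputRDD(["spark", "rdd", "task", "driver", "worker"], 2)
    lengths = ToyMappedRDD(ToyMappedRDD(source, str.upper), len)
    print(run_job(lengths))  # prints [[5, 3], [4, 6, 6]]
```

In this model a worker never needs to be told which bytes or lines to read. It receives only the partition ID, and the input RDD at the end of the chain translates that ID into a concrete portion of the input.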
