python - dask dataframe apply not executing in parallel


I have the following Python script, which creates a dask DataFrame from an existing pandas DataFrame. I'm using the multiprocessing scheduler, since the function uses pure Python. The scheduler creates 8 processes (one per partition), but they run sequentially, one at a time.

import dask
import dask.multiprocessing
import dask.dataframe as ddf

dask_data = ddf.from_pandas(data, npartitions=8)

dask_data = dask_data.assign(
    images_array_1=dask_data.images_array_1.apply(
        lambda x: [] if x == "" else [int(el) for el in x.split(',')],
        name='images_array_1'),
    images_array_2=dask_data.images_array_2.apply(
        lambda x: [] if x == "" else [int(el) for el in x.split(',')],
        name='images_array_2')
)

dask_data.compute(get=dask.multiprocessing.get)
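For reference, the per-cell transformation applied above is just this pure-Python parsing step. A minimal sketch of it as a standalone function (the name parse_cell is my own, not from the original code), which could equally be mapped over cells with the stdlib multiprocessing.Pool to check that the work itself parallelizes across processes:

```python
from multiprocessing import Pool

def parse_cell(x):
    """Parse a comma-separated string into a list of ints; '' becomes []."""
    return [] if x == "" else [int(el) for el in x.split(',')]

if __name__ == '__main__':
    cells = ["1,2,3", "", "42"]
    # Each cell is parsed in a worker process; map preserves input order.
    with Pool(processes=4) as pool:
        print(pool.map(parse_cell, cells))  # -> [[1, 2, 3], [], [42]]
```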

I'm using dask to parallelize the computation; the dataset is small enough to stay in main memory.

Is it possible to run every process in parallel?

