python - dask dataframe apply not executing in parallel
I have the following Python script, which creates a dask dataframe from an existing pandas dataframe. I'm using the multiprocessing scheduler, since the function is pure Python. The scheduler creates 8 processes (one per partition), but they run sequentially, one at a time.
import dask
import dask.multiprocessing
import dask.dataframe as ddf

dask_data = ddf.from_pandas(data, npartitions=8)
dask_data = dask_data.assign(
    images_array_1=dask_data.images_array_1.apply(
        lambda x: [] if x == "" else [int(el) for el in x.split(',')],
        meta=('images_array_1', 'object')),
    images_array_2=dask_data.images_array_2.apply(
        lambda x: [] if x == "" else [int(el) for el in x.split(',')],
        meta=('images_array_2', 'object')),
)
dask_data.compute(get=dask.multiprocessing.get)
I'm using dask to parallelize the computation; the dataset is small enough to stay in main memory.
Is it possible to run every process in parallel?