Speed up the process of importing multiple CSV files into a pandas DataFrame
I read multiple CSV files (hundreds of files, hundreds of lines each, all with the same number of columns) from a target directory into a single pandas DataFrame.
The code I wrote below works, but too slowly: it takes minutes to run on just 30 files (so how long would I have to wait for my full load of files?). What can I alter to make it faster?
Besides, in the replace function, I want to replace "_" (I don't know its encoding; it is not a normal one) with "-" (normal UTF-8). How can I do that? I use coding=latin-1 because I have French accents in the files.
# coding=latin-1
import pandas as pd
import glob

pd.set_option('expand_frame_repr', False)

path = r'd:\python27\mypfe\data_test'
allfiles = glob.glob(path + "/*.csv")

frame = pd.DataFrame()
list_ = []
for file_ in allfiles:
    df = pd.read_csv(file_, index_col=None, header=0, sep=';', dayfirst=True,
                     parse_dates=['heureprevue', 'heuredebuttrajet',
                                  'heurearriveesursite', 'heureeffective'])
    df.drop(labels=['apaye', 'methodepaiement', 'argentpercu'], axis=1, inplace=True)
    df['sens'].replace("\n", "-", inplace=True, regex=True)
    list_.append(df)
    print "fichier lu:", file_
frame = pd.concat(list_)
print frame
You may try the following: read only the columns you need, use a list comprehension, and call pd.concat([...], ignore_index=True) just once, because calling it repeatedly is pretty slow:
# There is no sense in reading columns you don't need,
# so specify the column list explicitly
# (excluding: 'apaye', 'methodepaiement', 'argentpercu')
cols = ['col1', 'col2', 'etc.']
date_cols = ['heureprevue', 'heuredebuttrajet', 'heurearriveesursite', 'heureeffective']

df = pd.concat(
    [pd.read_csv(f, sep=';', dayfirst=True, usecols=cols, parse_dates=date_cols)
     for f in allfiles],
    ignore_index=True
)
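A minimal, self-contained sketch of the same pattern, using synthetic in-memory "files" so it can be tried without a directory of CSVs (the column names and sample data here are made up, not the asker's real ones):

```python
import io
import pandas as pd

# Three identical in-memory CSVs stand in for the glob results
csv_text = "a;b;heureprevue\n1;x;01/02/2016 10:00\n2;y;01/02/2016 11:00\n"
files = [io.StringIO(csv_text) for _ in range(3)]

cols = ['a', 'heureprevue']   # read only the columns that are needed
date_cols = ['heureprevue']

df = pd.concat(
    [pd.read_csv(f, sep=';', dayfirst=True, usecols=cols, parse_dates=date_cols)
     for f in files],
    ignore_index=True
)
print(len(df))  # 6 rows: 3 "files" x 2 data lines each
```

Note that usecols skips column 'b' entirely at parse time, and ignore_index=True rebuilds a clean 0..n-1 index across all files.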
This should work, provided that you have enough memory to store the two resulting DataFrames...
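As for the replace question: since the stray character's identity is unknown, one approach is to inspect repr(df['sens'].iloc[0]) to see its escape code, then replace it. A sketch, assuming for illustration that the character is '\x96' (a Windows-1252 en dash that survives a latin-1 read; the real byte may differ):

```python
import pandas as pd

# Hypothetical data containing the assumed stray character '\x96'
df = pd.DataFrame({'sens': [u'nord\x96sud', u'est\x96ouest']})

# Replace it with a plain ASCII hyphen; regex=False treats it literally
df['sens'] = df['sens'].str.replace(u'\x96', '-', regex=False)
print(df['sens'].tolist())  # ['nord-sud', 'est-ouest']
```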