Speed up the process of importing multiple CSV files into a Python DataFrame


I read multiple CSV files (hundreds of files, hundreds of lines each, all with the same number of columns) from a target directory into a single Python pandas DataFrame.

The code I wrote below works, but it is slow. It takes minutes to run on 30 files (so how long would I have to wait if I loaded all of my files?). What can I alter to make it work faster?

Besides, in the replace function, I want to replace a "_" (I don't know its encoding, it is not the normal one) with a "-" (normal UTF-8). How can I do that? I use coding=latin-1 because I have French accents in the files.

    #coding=latin-1

    import pandas as pd
    import glob

    pd.set_option('expand_frame_repr', False)

    path = r'd:\python27\mypfe\data_test'
    allfiles = glob.glob(path + "/*.csv")
    frame = pd.DataFrame()
    list_ = []
    for file_ in allfiles:
        df = pd.read_csv(file_, index_col=None, header=0, sep=';', dayfirst=True,
                         parse_dates=['heureprevue', 'heuredebuttrajet', 'heurearriveesursite', 'heureeffective'])
        df.drop(labels=['apaye', 'methodepaiement', 'argentpercu'], axis=1, inplace=True)
        df['sens'].replace("\n", "-", inplace=True, regex=True)
        list_.append(df)

        print("fichier lu: %s" % file_)

    frame = pd.concat(list_)

    print(frame)
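On the replacement part of the question: before replacing a character of unknown origin, it may help to find out which code point it actually is. The following is a minimal sketch of that idea, not a confirmed fix for these files. The helper `describe_non_ascii` and the sample string are hypothetical, and the en dash (U+2013) only stands in for the unknown character.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in seen]


# Hypothetical cell value: the en dash is an assumption made for illustration,
# the real character in the 'sens' column is not known from the question.
sample = u"Aller \u2013 Retour"

report = describe_non_ascii(sample)
print(report)

# Once the code point is known, a plain (non-regex) replacement is enough.
cleaned = sample.replace(u"\u2013", u"-")
print(cleaned)
```

If the report names the character, the same literal replacement can probably be applied to the whole column with something like `df['sens'] = df['sens'].str.replace(u'\u2013', u'-', regex=False)`. Note also that the `#coding=latin-1` line only declares the encoding of the script itself. The encoding of the CSV files is passed separately through the `encoding` argument of `pd.read_csv`.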

You may try the following: read only the columns that you really need, use a list comprehension, and call pd.concat([ ... ], ignore_index=True) only once, because it's pretty slow:

    # there is no sense in reading columns that you don't need
    # specify the column list (excluding: 'apaye', 'methodepaiement', 'argentpercu')
    cols = ['col1', 'col2', 'etc.']
    date_cols = ['heureprevue', 'heuredebuttrajet', 'heurearriveesursite', 'heureeffective']

    df = pd.concat(
        [pd.read_csv(f, sep=';', dayfirst=True, usecols=cols,
                     parse_dates=date_cols)
         for f in allfiles
        ],
        ignore_index=True
    )

This should work if you have enough memory to store two resulting DFs (the list of per-file DataFrames and the concatenated result exist at the same time)...
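To try the pattern without the original data, here is a self-contained sketch that writes three small throwaway files and loads them with a single `pd.concat` call. The file names and the columns `id`, `heureprevue` and `note` are made up for the demo. Only the `read_csv` and `concat` arguments mirror the answer.

```python
import glob
import os
import shutil
import tempfile

import pandas as pd

# Build a throwaway directory with three small ';'-separated files.
# File names and column names are invented for this demo.
tmp_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(tmp_dir, "part%d.csv" % i), "w") as fh:
        fh.write("id;heureprevue;note\n")
        fh.write("%d;01/02/2016 10:00;a\n" % (2 * i))
        fh.write("%d;15/02/2016 11:30;b\n" % (2 * i + 1))

all_files = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

# Read only the needed columns from each file, then concatenate once.
frame = pd.concat(
    [pd.read_csv(f, sep=";", usecols=["id", "heureprevue"],
                 parse_dates=["heureprevue"], dayfirst=True)
     for f in all_files],
    ignore_index=True,
)

print(frame.shape)
print(frame.dtypes)

# Remove the temporary files once the data is in memory.
shutil.rmtree(tmp_dir)
```

The unused `note` column is never parsed, and `ignore_index=True` gives the combined frame one continuous 0..n-1 index instead of repeating each file's own row numbers.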

