Regex in R -- extracting sub-string based on two start/stop words


I have a character (text) column:

    tweets <- c(
        "drinking bud light @budweiser @ joe's crab shack http://www.joes.com",
        "drinking sam adams winter ale @samadams @ growler stop http://www.growlerstop.com",
        "drinking coco loco @nodabrewing @ corner pub http://www.cornerpub.com"
    )

As you can see, assume the tweets have a standard structure:

"drinking [name of beer] @[name of brewery] @ [name of bar, notice whitespace] http://" 

I want to use regular expressions (and substr()?) to create 3 new columns:

  1. name of beer
  2. name of brewery
  3. name of bar (note that it can contain white space, and needs to run up to "http:")

One step further: how do I handle tweets that do not have the same structure?

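It's ugly, but here is one way:

    # Capture beer, brewery, and bar. Column 1 of the match matrix holds the
    # full match, so it is dropped with [, -1L].
    # The pattern expects the beer name directly after "drinking ", as in the
    # sample tweets above (no leading "a"/"an").
    setNames(nm = c('beer', 'brewery', 'bar'), as.data.frame(do.call(rbind,
        regmatches(tweets, regexec('^drinking (.*) @(.*) @ (.*) http://.*$', tweets))
    )[, -1L]))
    ##                   beer     brewery              bar
    ## 1            bud light   budweiser joe's crab shack
    ## 2 sam adams winter ale    samadams     growler stop
    ## 3            coco loco nodabrewing       corner pub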

See regexec() and regmatches().
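
The follow-up question, about tweets that do not share the structure, is not covered by the one-liner. A tweet that fails to match comes back from regmatches() as character(0), and rbind() would then drop that row without warning. The following is only a rough sketch of one way to handle it, padding non-matching tweets with NA. The sample tweets, the optional "a"/"an" prefix, and the names `pattern`, `fields`, `rows`, and `parsed` are assumptions made for illustration. It also assumes an R version in which regexec() accepts `perl = TRUE`.

```r
# Hypothetical input: the second tweet has a leading article and the third
# does not follow the assumed structure at all.
tweets <- c(
  "drinking bud light @budweiser @ joe's crab shack http://www.joes.com",
  "drinking a coco loco @nodabrewing @ corner pub http://www.cornerpub.com",
  "just watching the game tonight"
)

# The article is optional. perl = TRUE allows the non-capturing group (?:...),
# so the optional prefix does not add an extra capture column.
pattern <- "^drinking (?:an? )?(.*) @(.*) @ (.*) http://.*$"
matches <- regmatches(tweets, regexec(pattern, tweets, perl = TRUE))

# A non-matching tweet yields character(0); replace it with a row of NAs.
# For matching tweets, drop element 1 (the full match) and keep the captures.
fields <- c("beer", "brewery", "bar")
rows <- lapply(matches, function(m) {
  if (length(m) == 0L) rep(NA_character_, length(fields)) else m[-1L]
})

parsed <- setNames(
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE),
  fields
)
print(parsed)
```

Rows whose fields are all NA can then be picked out with `is.na(parsed$beer)` and inspected or parsed with a different pattern.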

