Regex in R -- extracting sub-string based on two start/stop words

August 15, 2015

I have a character (text) column:

tweets <- c(
    "drinking bud light @budweiser @ joe's crab shack http://www.joes.com",
    "drinking sam adams winter ale @samadams @ growler stop http://www.growlerstop.com",
    "drinking coco loco @nodabrewing @ corner pub http://www.cornerpub.com"
)

As you can see, I assume the tweets all have a standard structure:

"drinking [name of beer] @[name of brewery] @ [name of bar, notice whitespace] http://"

I want to use regular expressions (and substr()?) to create 3 new columns:

name of beer
name of brewery
name of bar (note that it can contain white space, and needs to run up to "http:")

One step further: how do I handle tweets that do not have the same structure?

It's ugly:

# regexec() locates the full match plus the three capture groups;
# regmatches() extracts them. Column 1 is the full match, so it is dropped.
setNames(
    nm = c('beer', 'brewery', 'bar'),
    as.data.frame(do.call(rbind,
        regmatches(tweets, regexec('^drinking (.*) @(.*) @ (.*) http://.*$', tweets))
    )[, -1L])
)
##                   beer     brewery              bar
## 1            bud light   budweiser joe's crab shack
## 2 sam adams winter ale    samadams     growler stop
## 3            coco loco nodabrewing       corner pub
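
To make the one-liner easier to follow, here is a minimal sketch of the same idea applied to a single tweet, one step at a time. The pattern is adapted to the sample tweets exactly as they appear in the question, and the variable names (tweet, pattern, m, parts, fields) are only illustrative.

```r
# One tweet in the format assumed by the question.
tweet <- "drinking bud light @budweiser @ joe's crab shack http://www.joes.com"

# Three capture groups: beer, brewery, bar.
pattern <- "^drinking (.*) @(.*) @ (.*) http://.*$"

# regexec() returns match positions: the full match first,
# then one entry per parenthesised group.
m <- regexec(pattern, tweet)

# regmatches() turns those positions into the matched substrings.
parts <- regmatches(tweet, m)[[1]]

# parts[1] is the whole tweet; parts[2:4] are the captured fields.
fields <- setNames(parts[-1], c("beer", "brewery", "bar"))
print(fields)
```

Because the first element is always the full match, the data-frame version drops the first column before naming the remaining three.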

See regexec() and regmatches().
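
On the follow-up question about tweets that do not share this structure: regmatches() returns an empty character vector for any string the pattern does not match, so one possible approach is to replace those empty results with NA before binding the rows. The sketch below assumes that approach. The second tweet is a made-up example of a non-conforming tweet, and the matched column is an illustrative addition, not part of the original answer.

```r
tweets <- c(
  "drinking bud light @budweiser @ joe's crab shack http://www.joes.com",
  "just had a great burger, no beer today"  # hypothetical non-conforming tweet
)
pattern <- "^drinking (.*) @(.*) @ (.*) http://.*$"

matches <- regmatches(tweets, regexec(pattern, tweets))

# A matching tweet gives 4 elements (full match + 3 groups);
# a non-matching tweet gives character(0), which becomes a row of NAs.
rows <- lapply(matches, function(m) {
  if (length(m) == 4L) m[-1L] else rep(NA_character_, 3L)
})

result <- setNames(
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE),
  c("beer", "brewery", "bar")
)

# Flag which tweets followed the expected structure.
result$matched <- !is.na(result$beer)
print(result)
```

Keeping one row per tweet, with NA where the pattern failed, means the result can still be column-bound to the original data and the unmatched tweets inspected separately.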
