python - Efficient way to extract data within double quotes

- August 15, 2011

I need to extract the data enclosed in double quotes from a string.

input:

<a href="networking-denial-of-service.aspx">next page →</a>

output:

networking-denial-of-service.aspx

Currently I am using the following method, and it runs fine.

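    atag = '<a href="networking-denial-of-service.aspx">next page →</a>'
    start = 0
    end = 0
    # record the positions of the first and second double quote
    for i in range(len(atag)):
        if atag[i] == '"' and start == 0:
            start = i
        elif atag[i] == '"' and end == 0:
            end = i
    # slice out everything between the two quotes
    nxtlink = atag[start+1:end]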

So my question is: is there a more efficient way to do this task?

Thank you.
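For a single tag like the one in the question, the index loop can be condensed with the built-in str.find method. This is only a minimal sketch, assuming the first pair of double quotes in the string is the one wanted; the helper name first_quoted is made up for illustration and does not come from the original post.

```python
def first_quoted(text):
    """Return the text between the first pair of double quotes, or None."""
    start = text.find('"')
    if start == -1:
        return None  # no opening quote at all
    end = text.find('"', start + 1)
    if end == -1:
        return None  # opening quote without a closing one
    return text[start + 1:end]


atag = '<a href="networking-denial-of-service.aspx">next page →</a>'
nxtlink = first_quoted(atag)
print(nxtlink)
```

Unlike the loop, this stops scanning as soon as the second quote is found, and it also copes with a quote at index 0, which the start == 0 check in the loop would treat as "not found yet".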

You tagged this BeautifulSoup, so I don't see why you would want a regex. If you just want the href from anchors, you can use the CSS selector 'a[href]', which only finds anchor tags that have an href attribute:

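    from bs4 import BeautifulSoup

    h = '''<a href="networking-denial-of-service.aspx">next page →</a>'''
    # naming the parser explicitly avoids the "no parser specified" warning
    soup = BeautifulSoup(h, "html.parser")
    print(soup.select_one('a[href]')["href"])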

Or use find:

    print(soup.find('a', href=True)["href"])

If you have multiple anchors, use select, which returns every match (select_one returns only the first):

    for a in soup.select('a[href]'):
        print(a["href"])

Or:

    for a in soup.find_all("a", href=True):
        print(a["href"])

You can also specify that you only want hrefs whose value has a leading ":

    # the double quote has to be quoted itself inside the selector
    soup.select_one("a[href^='\"']")
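If installing BeautifulSoup is not an option, the same "every anchor that has an href" idea can be sketched with the standard library's html.parser module. This is a swapped-in technique, not part of the answer above; the class name HrefCollector and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


# sample markup (hypothetical): two anchors with an href, one without
html = '''
<a href="networking-denial-of-service.aspx">next page →</a>
<a name="top">no href here</a>
<a href="index.aspx">home</a>
'''

collector = HrefCollector()
collector.feed(html)
collector.close()
print(collector.hrefs)
```

The anchor without an href is skipped, which mirrors what 'a[href]' and href=True do in the BeautifulSoup versions.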
