Saturday, 17 August 2013

Extracting images from HTML pages with Python

Below is my code. It tries to get the src attribute of an image from an
<img> tag in HTML.
import re

for text in open('site.html'):
    matches = re.findall(r'\ssrc="([^"]+)"', text)
    matches = ' '.join(matches)
print(matches)
The problem is that when I put in something like:
<img src="asdfasdf">
it works, but when I put in an entire HTML page it returns nothing. Why
does it do that, and how do I fix it?
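
One possible explanation, assuming print runs only after the loop has finished: matches is overwritten on every line of the file, so only the last line (usually something like </html>, which has no src) is ever printed. The pattern also skips src attributes written with single quotes or in upper case. A minimal sketch of an alternative is to parse the whole file once with the standard library's html.parser and collect the src of every <img> tag. This is not the original code; the names ImgSrcCollector and extract_img_sources are made up for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this;
        # attrs is a list of (name, value) pairs.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_img_sources(html_text):
    """Return the src values of all <img> tags in html_text, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


# Small in-memory sample (hypothetical markup, not from the original page)
sample = (
    '<html><body>\n'
    '<img src="a.png">\n'
    '<IMG alt="x"\n'
    "     SRC='b.jpg' />\n"
    '<script src="app.js"></script>\n'
    '</body></html>'
)
print(' '.join(extract_img_sources(sample)))
```

For the file in the question, read it in one go and pass the text in, for example extract_img_sources(open('site.html').read()). Because the whole document is parsed at once, tags that span several lines are handled, and src attributes on other tags such as <script> are left out.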
