Extracting images from HTML pages with Python
Below is my code. It tries to get the src attribute of each img tag in
an HTML file.
import re
for text in open('site.html'):
    matches = re.findall(r'\ssrc="([^"]+)"', text)
    matches = ' '.join(matches)
print(matches)
The problem is that when I put in something like:
<img src="asdfasdf">
it works, but when I put in an entire HTML page it returns nothing. Why
does it do that, and how do I fix it?
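
One likely cause, assuming the print call runs only after the loop has finished: matches is overwritten on every line of the file, so what gets printed comes from the last line alone, which in a full page is usually something like </html> with no src in it. The pattern also only matches double-quoted, lowercase src attributes, and it picks up src on any tag, not just img. A minimal sketch of one way around all of that is to read the whole file and let the standard library's html.parser find the tags. The names ImgSrcCollector and extract_img_sources are made up for this illustration and are not part of the original code.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...>
        # is handled too. attrs is a list of (name, value) pairs.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_img_sources(html_text):
    """Return the src values of all img tags in html_text, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read site.html instead.
    sample = '<html><body>\n<img src="a.png">\n<p>text</p>\n</body></html>'
    print(' '.join(extract_img_sources(sample)))
```

To run it on the file from the question, read the whole page in one go, for example with open('site.html', encoding='utf-8') as f and then extract_img_sources(f.read()). The encoding here is an assumption about the file. Staying with re is also possible if the results are accumulated in a list across lines instead of being reassigned each time, but a parser copes better with quoting and attribute order.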