Find text from visible HTML in Python
I'm trying to do the following:
- I have a text file with values, one per line.
- A website generates a list of values based on the page number (the values xxx and yyy in the example below).
- A Python script reads the text file into a set (for efficient O(1) lookups), then searches the website page by page, incrementing the page number by 1 each time. If a matching value is found, it must print the page number.
The search must go through www.site.com/1, www.site.com/2, www.site.com/3, etc.
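A minimal sketch of the page-by-page search described above. It assumes the URL pattern `http://www.site.com/{n}` (the `http://` scheme is a guess), that a missing page (an `HTTPError` such as 404) means there are no more pages, and that `extract` is any function you supply that returns the candidate strings from one page's HTML. The names `fetch_page`, `search_pages`, `extract` and `max_pages` are hypothetical, chosen for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple
from urllib.error import HTTPError
from urllib.request import urlopen

BASE_URL = "http://www.site.com/{}"  # URL pattern from the question; scheme assumed


def fetch_page(page: int) -> str:
    """Download one page; urlopen raises HTTPError (e.g. 404) past the last page."""
    with urlopen(BASE_URL.format(page)) as resp:
        return resp.read().decode("utf-8", errors="replace")


def search_pages(
    values: Set[str],
    extract: Callable[[str], Iterable[str]],
    fetch: Callable[[int], str] = fetch_page,
    max_pages: int = 1000,
) -> List[Tuple[int, str]]:
    """Visit pages 1, 2, 3, ... and report every page where a known value appears."""
    matches: List[Tuple[int, str]] = []
    for page in range(1, max_pages + 1):
        try:
            html = fetch(page)
        except HTTPError:
            break  # assumption: a missing page means there are no more pages
        for text in extract(html):
            if text in values:  # average-case O(1) set membership test
                print(f"found {text!r} on page {page}")
                matches.append((page, text))
    return matches
```

Passing `fetch` as a parameter keeps the loop testable without network access: a dictionary of fake pages can stand in for the real site, and `max_pages` acts as a safety limit in case the site never returns an error.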
HTML source:
<pre class="values"> <strong>a</strong> <strong>b</strong> <strong>c</strong> <span id="1"> <a href="/#">+</a> <span title="1">1</span> <a href="/#">xxx</a> <a href="/#">yyy</a> </span> </pre>
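One way to pull the values out of markup like this is the standard library's `html.parser.HTMLParser`, which tolerates HTML that is not well-formed XML. This sketch assumes the values are the texts of the `<a>` tags inside `<pre class="values">`, and that the `+` link is an expander to skip; `ValueLinkParser` and `extract_values` are hypothetical names. The class check is an exact match, so it would miss an element with several classes (e.g. `class="values big"`).

```python
from html.parser import HTMLParser


class ValueLinkParser(HTMLParser):
    """Collect the text of <a> tags that appear inside <pre class="values">."""

    def __init__(self):
        super().__init__()
        self.in_values = False  # currently inside <pre class="values">
        self.in_link = False    # currently inside an <a> within that block
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "pre" and ("class", "values") in attrs:
            self.in_values = True
        elif tag == "a" and self.in_values:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_values = False
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        text = data.strip()
        # skip whitespace between tags and the "+" expander link
        if self.in_link and text and text != "+":
            self.values.append(text)


def extract_values(html: str) -> list:
    parser = ValueLinkParser()
    parser.feed(html)
    parser.close()
    return parser.values
```

Fed the snippet above, `extract_values` should return `['xxx', 'yyy']`; links outside the `<pre class="values">` block are ignored.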
Reading the text file into a set for efficient O(1) lookups:
with open("values.txt", "r") as f1:
    lines = {line.strip() for line in f1}  # efficient O(1) lookups using a set

for line in html:  # html: the values scraped from the page
    if line in lines:
        print(line)
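A common pitfall with `set(f1)` is that every line keeps its trailing newline, so `"xxx\n"` never equals `"xxx"` taken from the page. A small sketch of loading the set with the newlines stripped, then checking scraped texts against it; `load_values` and `matching` are hypothetical helper names.

```python
from typing import Iterable, List, Set


def load_values(lines: Iterable[str]) -> Set[str]:
    # Strip the trailing "\n" (and surrounding spaces) and skip blank lines,
    # so entries compare equal to text scraped from the page.
    return {line.strip() for line in lines if line.strip()}


def matching(found: Iterable[str], values: Set[str]) -> List[str]:
    # Each `in` test on a set is O(1) on average.
    return [text for text in found if text in values]
```

With a real file, `with open("values.txt") as f1: values = load_values(f1)` works because iterating a file object yields its lines.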
Parsing the saved HTML with ElementTree:
from xml.etree import ElementTree as ET

with open('/path/to/file.html') as fp:
    html = ET.fromstring(fp.read())  # only works if the HTML is well-formed XML

for node in html.iter():
    if node.tag == 'a':
        print(node.text)
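The ElementTree approach can be narrowed to the value links and combined with the set lookup. This sketch assumes the values are the `<a>` children of the `<span id="...">` element and that the `+` link should be skipped; `values_in_fragment` is a hypothetical name. `ET.fromstring` raises `ET.ParseError` on markup that is not well-formed XML, which is common for full web pages, so this suits fragments like the one in the question rather than arbitrary HTML.

```python
from xml.etree import ElementTree as ET

SNIPPET = """<pre class="values"> <strong>a</strong> <strong>b</strong> <strong>c</strong> <span id="1"> <a href="/#">+</a> <span title="1">1</span> <a href="/#">xxx</a> <a href="/#">yyy</a> </span> </pre>"""


def values_in_fragment(markup: str, wanted: set) -> list:
    root = ET.fromstring(markup)  # raises ET.ParseError if not well-formed
    found = []
    # ".//span[@id]/a": <a> elements that are direct children of any <span> with an id
    for link in root.findall(".//span[@id]/a"):
        text = (link.text or "").strip()
        if text and text != "+" and text in wanted:
            found.append(text)
    return found
```

For example, `values_in_fragment(SNIPPET, {"yyy", "zzz"})` keeps only `"yyy"`, the one value that is both on the page and in the set.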