Find text from visible HTML in Python


I'm trying to do the following:

  1. I have a text file that contains values, one per line.
  2. A website generates a list of values based on the page number. These are the values xxx & yyy in the example below.
  3. A Python script reads the text file first (using a set for efficient O(1) lookups), then searches the website page after page (+1 each time). If a matching value is found, it must print the page number.

The search must go through www.site.com/1, www.site.com/2, www.site.com/3, etc.
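
The page-by-page search described above could be sketched roughly as follows. This is only an illustration under assumptions: `fetch_page`, `link_texts`, `find_pages` and `max_page` are hypothetical names, the URL pattern is taken from the question, and each page is assumed to be well-formed enough for `xml.etree.ElementTree` to parse.

```python
import urllib.request
from xml.etree import ElementTree as ET


def fetch_page(page_number):
    """Download one page; the URL pattern is assumed from the question."""
    url = "http://www.site.com/%d" % page_number
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def link_texts(html_text):
    """Return the stripped text of every <a> element in well-formed markup."""
    root = ET.fromstring(html_text)
    return [a.text.strip() for a in root.iter("a") if a.text]


def find_pages(wanted, max_page, fetch=fetch_page):
    """Yield (page_number, value) for each page value that is in `wanted`."""
    for page_number in range(1, max_page + 1):
        for value in link_texts(fetch(page_number)):
            if value in wanted:
                yield page_number, value
```

Passing the download function as the `fetch` argument keeps the loop easy to try out against canned pages before pointing it at the real site. Calling `find_pages(wanted, max_page=3)` with a set of wanted values would visit pages 1 to 3 and yield the page number of every match.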

HTML source:

    <pre class="values">
        <strong>a</strong>
        <strong>b</strong>
        <strong>c</strong>
        <span id="1">
            <a href="/#">+</a>
            <span title="1">1</span>
            <a href="/#">xxx</a>
            <a href="/#">yyy</a>
        </span>
    </pre>

Reading the text file into a set for efficient O(1) lookups:

    with open("values.txt", "r") as f1:
        lines = set(f1)  # efficient O(1) lookups using a set
        # html: the lines of the fetched page (not defined in this snippet)
        for line in html:
            if line in lines:
                print(line)
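
Lines read from a file keep their trailing newline, so a set built straight from the file object holds `"xxx\n"` rather than `"xxx"`, and a lookup with text extracted from the page would miss. A minimal sketch of loading the values with whitespace stripped; `load_values` is a hypothetical helper name:

```python
def load_values(path):
    """Read one value per line into a set, dropping newlines and blank lines."""
    with open(path, "r") as handle:
        return {line.strip() for line in handle if line.strip()}
```

Membership tests such as `"xxx" in load_values("values.txt")` then behave as expected for values scraped from the page.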

    from xml.etree import ElementTree as ET

    # /path/to/file.html is assumed to contain the markup from the question:
    #
    # <pre class="values">
    #     <strong>a</strong>
    #     <strong>b</strong>
    #     <strong>c</strong>
    #     <span id="1">
    #         <a href="/#">+</a>
    #         <span title="1">1</span>
    #         <a href="/#">xxx</a> <a href="/#">yyy</a>
    #     </span>
    # </pre>

    with open('/path/to/file.html') as fp:
        html = ET.fromstring(fp.read())

    # Print the text of every <a> element
    for node in html.iter():
        if node.tag == 'a':
            print(node.text)
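
`ElementTree` only accepts well-formed XML, and real pages often are not. As an alternative (a swapped-in technique, not the approach from the answer), the standard library's `html.parser.HTMLParser` can collect link text from looser markup. `LinkTextCollector` and `collect_link_texts` are hypothetical names used for this sketch.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the text found inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Start a new, empty entry for this link
            self._in_link = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current entry
        if self._in_link:
            self.texts[-1] += data


def collect_link_texts(html_text):
    """Return the stripped, non-empty text of every <a> element."""
    parser = LinkTextCollector()
    parser.feed(html_text)
    parser.close()
    return [text.strip() for text in parser.texts if text.strip()]
```

For the sample markup in the question this should return the three link texts `+`, `xxx` and `yyy`, which can then be checked against the set of values from the text file.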
