Parsing list items from HTML with Go

August 15, 2011

I want to extract list items (the content of each <li></li>) with Go. Should I use regexp to get the <li> items, or is there another library for this?

My intention is to get a list or array in Go that contains the list items from a specific HTML web page. How should I do that?

You want to use the golang.org/x/net/html package. It's not in the Go standard packages; instead it lives in the Go sub-repositories. (The sub-repositories are part of the Go project but outside the main Go tree. They are developed under looser compatibility requirements than the Go core.)

There is an example in the package documentation that may be similar to what you want.

If you need to stick to the Go standard packages for some reason, then for "typical HTML" you can use encoding/xml in its non-strict mode.

Both packages take an io.Reader as input. If you have a string or []byte variable, you can wrap it with strings.NewReader or bytes.NewReader (bytes.NewBuffer also works) to get an io.Reader.

For HTML, it's more likely that you'll be reading from an http.Response body (make sure to close it when done). Perhaps something like:

    import (
        "fmt"
        "net/http"

        "golang.org/x/net/html"
    )

    // printLinks fetches someURL and prints the href of every <a> element.
    func printLinks(someURL string) error {
        resp, err := http.Get(someURL)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        doc, err := html.Parse(resp.Body)
        if err != nil {
            return err
        }
        // Recursively visit the nodes in the parse tree.
        var f func(*html.Node)
        f = func(n *html.Node) {
            if n.Type == html.ElementNode && n.Data == "a" {
                for _, a := range n.Attr {
                    if a.Key == "href" {
                        fmt.Println(a.Val)
                        break
                    }
                }
            }
            for c := n.FirstChild; c != nil; c = c.NextSibling {
                f(c)
            }
        }
        f(doc)
        return nil
    }

Of course, parsing fetched web pages won't work for pages that modify their own contents with JavaScript on the client side.
