Parsing list items from HTML with Go


I want to extract list items (the content of each <li></li>) with Go. Should I use a regexp to get the <li> items, or is there some other library for this?

My intention is to get a list or array in Go that contains each list item from a specific HTML web page. How should I do that?

You likely want to use the golang.org/x/net/html package. It's not in the Go standard packages, but instead in the Go sub-repositories. (The sub-repositories are part of the Go project but outside the main Go tree. They are developed under looser compatibility requirements than the Go core.)

There is an example in that documentation that may be similar to what you want.

If you need to stick with the Go standard packages for some reason, then for "typical HTML" you can use encoding/xml.

Both packages tend to use an io.Reader for input. If you have a string or []byte variable, you can wrap it with strings.NewReader or bytes.Buffer to get an io.Reader.

For HTML it's more likely you'll get it from an http.Response body (make sure to close it when done). Perhaps something like:

    import (
        "fmt"
        "net/http"

        "golang.org/x/net/html"
    )

    // printLinks fetches someURL and prints the href of every <a> element.
    func printLinks(someURL string) error {
        resp, err := http.Get(someURL)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        doc, err := html.Parse(resp.Body)
        if err != nil {
            return err
        }
        // Recursively visit nodes in the parse tree.
        var f func(*html.Node)
        f = func(n *html.Node) {
            if n.Type == html.ElementNode && n.Data == "a" {
                for _, a := range n.Attr {
                    if a.Key == "href" {
                        fmt.Println(a.Val)
                        break
                    }
                }
            }
            for c := n.FirstChild; c != nil; c = c.NextSibling {
                f(c)
            }
        }
        f(doc)
        return nil
    }

Of course, parsing fetched web pages won't work for pages that modify their own contents with JavaScript on the client side.

