Parsing list items from HTML with Go
I want to extract all list items (the content of each <li></li>) with Go. Should I use regexp to get the <li> items, or is there some other library for this?
My intention is to get a list or array in Go that contains each list item from a specific HTML web page. How should I do that?
You likely want to use the golang.org/x/net/html package. It's not in the Go standard packages, but instead in the Go sub-repositories. (The sub-repositories are part of the Go project but outside the main Go tree. They are developed under looser compatibility requirements than the Go core.)
There is an example in that package's documentation that may be similar to what you want.
If you need to stick with the Go standard packages for some reason, then for "typical HTML" you can use encoding/xml.
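As a rough illustration of the standard-library route, the sketch below walks the token stream from encoding/xml and collects the text inside each <li>. The function names listItems and listItemsFromString are hypothetical, and the decoder is put into its lenient mode (Strict = false with xml.HTMLAutoClose and xml.HTMLEntity) on the assumption that the page is reasonably well-formed "typical HTML"; badly broken markup may still fail to parse.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// listItems (hypothetical helper) returns the text content of every
// outermost <li> element read from r.
func listItems(r io.Reader) ([]string, error) {
	dec := xml.NewDecoder(r)
	// Lenient settings intended for HTML-like input.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var items []string
	var text strings.Builder
	depth := 0 // how many <li> elements are currently open

	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return items, nil
		}
		if err != nil {
			return items, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if strings.EqualFold(t.Name.Local, "li") {
				if depth == 0 {
					text.Reset()
				}
				depth++
			}
		case xml.EndElement:
			if strings.EqualFold(t.Name.Local, "li") && depth > 0 {
				depth--
				if depth == 0 {
					items = append(items, strings.TrimSpace(text.String()))
				}
			}
		case xml.CharData:
			// Text inside nested tags such as <b> is included as well.
			if depth > 0 {
				text.Write(t)
			}
		}
	}
}

// listItemsFromString (hypothetical helper) is a convenience wrapper
// for input that is already held in a string.
func listItemsFromString(s string) ([]string, error) {
	return listItems(strings.NewReader(s))
}

func main() {
	items, err := listItemsFromString("<ul><li>first</li><li>second <b>item</b></li></ul>")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%q\n", items)
}
```

Nested lists are folded into the text of their outermost item here; a real page may call for different handling.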
Both packages tend to use an io.Reader for input. If you have a string or []byte variable, you can wrap it with strings.NewReader or bytes.Buffer to get an io.Reader.
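A minimal sketch of that wrapping follows. The consume function is a hypothetical stand-in for any parser that accepts an io.Reader, and fromString and fromBytes are likewise illustrative names only.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// consume (hypothetical) stands in for a parser that takes an io.Reader;
// here it just reads everything and returns it as a string.
func consume(r io.Reader) (string, error) {
	b, err := io.ReadAll(r)
	return string(b), err
}

// fromString wraps a string with strings.NewReader.
func fromString(s string) (string, error) {
	return consume(strings.NewReader(s))
}

// fromBytes wraps a []byte with bytes.NewBuffer, which returns a
// *bytes.Buffer; bytes.NewReader(b) would serve equally well.
func fromBytes(b []byte) (string, error) {
	return consume(bytes.NewBuffer(b))
}

func main() {
	s, _ := fromString("<li>from a string</li>")
	b, _ := fromBytes([]byte("<li>from bytes</li>"))
	fmt.Println(s)
	fmt.Println(b)
}
```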
For HTML it's more likely you'll come from an http.Response body (make sure to close it when done). Perhaps something like:

    // Needs imports: "fmt", "net/http", "golang.org/x/net/html".
    // This fragment belongs inside a function that returns an error.
    resp, err := http.Get(someURL)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    doc, err := html.Parse(resp.Body)
    if err != nil {
        return err
    }
    // Recursively visit nodes in the parse tree.
    // This prints link targets; match "li" instead of "a" to visit list items.
    var f func(*html.Node)
    f = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            for _, a := range n.Attr {
                if a.Key == "href" {
                    fmt.Println(a.Val)
                    break
                }
            }
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    }
    f(doc)
Of course, parsing fetched web pages won't work for pages that modify their own contents with JavaScript on the client side.