regex - Unable to match double quotes inside C program
This question already has an answer here:
- My regex is matching too much. How do I make it stop? (4 answers)
I've been struggling with this for hours. I'm coding a C program that matches regex patterns read from a file, e.g.:
https?\:\/\/(\w+\.)+[a-z]{2,4}(\/.*)*
href\=\"https?\:\/\/(\w+\.)+[a-z]{2,4}(\/.*)*\"
The first line matches every link in an HTML page, but it runs on until the end of the line. To avoid that, I created the second one, which should match only hyperlinks. Unfortunately, it does not work either. My computer is running Linux, and I'm using the Linux man page as a reference:
man 3 regex
In addition, I have tried both single and double quotes in the file as well, with no positive result. Disclaimer: I know this could be done more easily in Bash or Python, but I have to use C.
You can solve this by making use of an ungreedy capture group. Keep in mind, though, that URLs are complicated: for instance, in your case you don't capture IP-address based URLs (e.g. 127.0.0.1/file), ports (e.g. 127.0.0.1:8080), queries (foo.bar/index.php?foo=bar&qux=baz), and so on.
Other people drop the protocol (thus no http://, ...) or use locally linked pages.
The best way of capturing the href part is to use an ungreedy regex as well:
href="(.*?)"
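A caveat for the C side: the POSIX interface described in `man 3 regex` does not define lazy quantifiers such as `.*?`, so the pattern above may not behave as intended there. The sketch below swaps in a negated character class, `[^"]*`, which stops at the first closing quote and so gives the same result as the ungreedy group. The helper name `first_href` and its buffer-based signature are assumptions made for illustration, not part of the original question.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the first href value found in `line` into `out`.
 * Returns 1 on a match, 0 if nothing matched, -1 if the pattern failed to compile. */
int first_href(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);

    regex_t re;
    regmatch_t m[2];            /* m[0]: whole match, m[1]: capture group */
    int found = 0;

    /* [^"]* cannot run past the closing quote, so no lazy quantifier is needed. */
    if (regcomp(&re, "href=\"([^\"]*)\"", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= out_size)
            len = out_size - 1; /* truncate rather than overflow */
        memcpy(out, line + m[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }

    regfree(&re);
    return found;
}
```

Calling `first_href` on a line that holds several links fills the buffer with the first link's target only, instead of everything up to the last quote on the line.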
If you want to capture only links with a specific protocol:
href="(https?:\/\/.*?)"
Either way, you have to hope the person has entered something valid. This question discusses how to build a regex that captures valid URLs. As you can see, the story is extremely complicated.
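To pull every http(s) link out of a line rather than just the first one, a loop over `regexec` that resumes after each match is one option. This is a minimal sketch under the same assumption as before: the POSIX ERE pattern uses `[^"]*` in place of the lazy `.*?`, since POSIX does not define lazy quantifiers. The function name `print_http_hrefs` is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: writes every http(s) href value in `text` to `out`,
 * one per line, and returns how many were found (-1 on compile failure). */
int print_http_hrefs(const char *text, FILE *out)
{
    assert(text != NULL && out != NULL);

    regex_t re;
    regmatch_t m[2];            /* m[0]: whole match, m[1]: the URL itself */
    int count = 0;

    /* In an ERE, '/' is an ordinary character, so it needs no backslash. */
    if (regcomp(&re, "href=\"(https?://[^\"]*)\"", REG_EXTENDED) != 0)
        return -1;

    const char *p = text;
    while (regexec(&re, p, 2, m, 0) == 0) {
        fprintf(out, "%.*s\n", (int)(m[1].rm_eo - m[1].rm_so), p + m[1].rm_so);
        count++;
        p += m[0].rm_eo;        /* resume after the closing quote of this match */
    }

    regfree(&re);
    return count;
}
```

Relative links such as `href="/local"` are skipped because the pattern requires the protocol. Whether the captured text is a well-formed URL is still not checked.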