regex - Unable to match double quotes inside C program


I've been struggling with this for hours. I am writing a C program that matches regex patterns read from a file, e.g.:

https?\:\/\/(\w+\.)+[a-z]{2,4}(\/.*)*
href\=\"https?\:\/\/(\w+\.)+[a-z]{2,4}(\/.*)*\"

The first pattern matches every link in an HTML page, but the match runs on until the end of the line. To avoid that, I created the second one, which should match only hyperlinks. Unfortunately, it does not work either. My computer is running Linux, and I am using the Linux man page as a reference:

man 3 regex 

In addition, I have tried single quotes as well as double quotes in the file, with no positive result. Disclaimer: I know this could be done more easily in Bash or Python, but I have to use C.

You can solve this by making use of an ungreedy capture group. A URL is complicated, though: in your case, for instance, you don't capture IP-address-based URLs (e.g. 127.0.0.1/file), ports (e.g. 127.0.0.1:8080), queries (foo.bar/index.php?foo=bar&qux=baz), and so on.

Other people drop the protocol (thus no http://, ...) or link to local pages.

The best way is to capture the href part, again using an ungreedy regex:

href="(.*?)" 
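As a rough sketch of how this could look with the POSIX functions from man 3 regex: POSIX extended regular expressions define no lazy quantifiers, so the example writes the ungreedy match as [^"]* instead of .*?, which likewise stops at the first closing quote. Note that the backslash before each quote below is only C string-literal escaping; in a pattern read from a file, a plain " should be enough. The helper name first_href and its buffer-based interface are made up for illustration.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): copies the value of the
 * first href="..." attribute in `line` into `out`.
 * Returns 0 on success, -1 if nothing matched, the pattern failed to
 * compile, or the value does not fit into `out`. */
int first_href(const char *line, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: capture group */
    int rc = -1;

    /* [^"]* cannot run past the closing quote, so no lazy quantifier
     * is needed. */
    if (regcomp(&re, "href=\"([^\"]*)\"", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outsz) {
            memcpy(out, line + m[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

Calling first_href(line, buf, sizeof buf) on a line read with fgets would leave the first link target in buf when it returns 0.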

Or, if you want to capture only links with a specific protocol:

href="(https?:\/\/.*?)" 

And hope that the person has entered something valid. This question discusses how to build a regex that captures valid URLs. As you can see, the story is extremely complicated.
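To pick up every http(s) link on a line rather than only the first, regexec can be called repeatedly, each time starting just past the previous match. This is a minimal sketch, assuming the protocol-restricted pattern above with .*? replaced by [^"]* (POSIX regcomp has no lazy quantifiers). The function name print_http_hrefs is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: writes each http:// or https:// href value found
 * in `text` to `fp`, one per line. Returns the number of links found,
 * or -1 if the pattern could not be compiled. */
int print_http_hrefs(const char *text, FILE *fp)
{
    regex_t re;
    regmatch_t m[2];
    const char *p = text;
    int count = 0;

    if (regcomp(&re, "href=\"(https?://[^\"]*)\"", REG_EXTENDED) != 0)
        return -1;

    /* Every match is non-empty, so advancing by m[0].rm_eo always
     * makes progress and the loop terminates. */
    while (regexec(&re, p, 2, m, 0) == 0) {
        fprintf(fp, "%.*s\n",
                (int)(m[1].rm_eo - m[1].rm_so), p + m[1].rm_so);
        count++;
        p += m[0].rm_eo;
    }

    regfree(&re);
    return count;
}
```

Relative links such as href="/local" are skipped by design here. Whether the captured strings are valid URLs is still not checked.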

