python - Best practice to apply several rules on 1 string
I'm getting a URL string and need to apply several rules to it. The first rule is to remove anchors, the second is to remove the '../' notation (because urljoin joins the URL incorrectly in some cases), and the last is to remove the trailing slash. I have this code:
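from urlparse import urljoin  # Python 2 module

def construct_url(parent_url, child_url):
    url = urljoin(parent_url, child_url)
    url = url.split('#')[0]       # drop the anchor
    url = url.replace('../', '')  # drop leftover '../' notation
    url = url.rstrip('/')         # drop the trailing slash
    return url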
But I don't think this is best practice. I think it can be done more simply. Can someone help me, please? Thanks.
Unfortunately, there isn't much that would make the function simpler here, since you're dealing with some pretty odd cases.
But you can make it more robust by using Python's urlparse.urlsplit() to split the URL into well-defined components, doing your processing, and putting it back together using urlparse.urlunsplit():
# Python 2 (in Python 3 these functions live in urllib.parse)
from urlparse import urljoin, urlsplit, urlunsplit

def construct_url(parent_url, child_url):
    url = urljoin(parent_url, child_url)
    # Split into well-defined components so that only the path is modified
    scheme, netloc, path, query, fragment = urlsplit(url)
    path = path.replace('../', '')
    path = path.rstrip('/')
    # Reassemble with an empty fragment, which drops the anchor
    url = urlunsplit((scheme, netloc, path, query, ''))
    return url

parent_url = 'http://user:pw@google.com'
child_url = '../../../chrome/#foo'
print construct_url(parent_url, child_url)
output:
http://user:pw@google.com/chrome
Using tools like urlparse has the advantage that you know exactly what your processing operates on (the path and the fragment in this case), and it handles things like user credentials, query strings, parameters, etc. for you.
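To illustrate the difference, here is a minimal sketch comparing whole-string processing with component-wise processing. It is written for Python 3, where these functions live in urllib.parse, and the URL is a made-up example whose query string happens to contain '../':

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL for illustration: credentials, a path, a query and a fragment.
url = 'http://user:pw@example.com/../docs/?next=../home#top'

# Whole-string processing: the replace also rewrites the query string,
# and rstrip cannot reach the path's trailing slash because the query follows it.
naive = url.split('#')[0].replace('../', '').rstrip('/')

# Component-wise processing: only the path is edited, the fragment is dropped,
# and the netloc (with credentials) and the query pass through untouched.
parts = urlsplit(url)
path = parts.path.replace('../', '').rstrip('/')
robust = urlunsplit((parts.scheme, parts.netloc, path, parts.query, ''))

print(naive)   # http://user:pw@example.com/docs/?next=home
print(robust)  # http://user:pw@example.com/docs?next=../home
```

The whole-string version silently changes the value of the next parameter, while the split version leaves it alone.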
Note: contrary to what was suggested in the comments, urljoin does in fact normalize URLs:
>>> from urlparse import urljoin
>>> urljoin('http://google.com/foo/bar', '../qux')
'http://google.com/qux'
But it does so only by strictly following RFC 1808.
From RFC 1808, Section 5.2, Abnormal Examples:

    Within an object with a well-defined base URL of

        Base: <URL:http://a/b/c/d;p?q#f>

    [...]

    Parsers must be careful in handling the case where there are more relative path ".." segments than there are hierarchical levels in the base URL's path. Note that the ".." syntax cannot be used to change the <net_loc> of a URL.

        ../../../g    = <URL:http://a/../g>
        ../../../../g = <URL:http://a/../../g>
So urljoin does the right thing by preserving the extraneous ../ segments, and therefore you need to remove them by manual processing.
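One caveat with doing that removal via str.replace is that it works on raw characters, so it can also cut into a segment that merely ends in two dots. A possible alternative, sketched here under the assumption that only segments that are exactly '..' should go, is to filter the path segment by segment. The helper name drop_parent_segments and the sample paths are made up for illustration:

```python
def drop_parent_segments(path):
    """Remove every path segment that is exactly '..' (hypothetical helper)."""
    return '/'.join(seg for seg in path.split('/') if seg != '..')

# Extraneous '..' segments, as left behind by an RFC 1808 style join.
print(drop_parent_segments('/../../chrome/'))  # /chrome/

# A segment that only ends in '..' is kept intact ...
print(drop_parent_segments('/files/v1../a'))   # /files/v1../a

# ... whereas a plain string replace would merge it with its neighbour.
print('/files/v1../a'.replace('../', ''))      # /files/v1a
```

Such a helper could stand in for the path.replace('../', '') step, with the trailing slash still stripped afterwards.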