How to fix an automatically added forward slash - Python

In my models.py I have a property that returns the thumbnail URL:
return os.environ['BASE_URL'] + self.thumbnail.url
My .env file:
BASE_URL=http://127.0.0.1:8000
But it keeps adding one more forward slash (/) after my BASE_URL:
http://127.0.0.1:8000//media/uploads/...
Does anyone know how to fix this?

Use urljoin from urllib.parse:
import os
from urllib.parse import urljoin
urljoin(os.getenv('BASE_URL'), self.thumbnail.url)
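A minimal, self-contained sketch of why urljoin avoids the double slash, assuming (as in the question) that the second argument is a root-relative path starting with '/', like a FileField's .url. The helper names are hypothetical, and base_url stands in for os.environ['BASE_URL'].

```python
from urllib.parse import urljoin


def absolute_media_url(base_url: str, path: str) -> str:
    """Hypothetical helper: join BASE_URL and a root-relative path such as '/media/...'."""
    # A path starting with '/' replaces the base's path entirely, so a trailing
    # slash on base_url (present or not) can no longer produce '//'.
    return urljoin(base_url, path)


def absolute_media_url_concat(base_url: str, path: str) -> str:
    """Plain-string alternative: trim the slash on each side, then add exactly one."""
    return base_url.rstrip('/') + '/' + path.lstrip('/')


if __name__ == '__main__':
    for base in ('http://127.0.0.1:8000', 'http://127.0.0.1:8000/'):
        print(absolute_media_url(base, '/media/uploads/thumb.jpg'))
        print(absolute_media_url_concat(base, '/media/uploads/thumb.jpg'))
```

One design caveat: because the path is root-relative, urljoin discards any path prefix on the base (a BASE_URL of http://host/app/ would lose /app), whereas the string version keeps it.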

Related

I get InvalidURL: URL can't contain control characters when I try to send a request using urllib

I am trying to get a JSON response from the link passed to urllib.request.urlopen, but it gives me an error saying the URL can't contain control characters.
How can I solve this?
start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
source = urllib.request.urlopen(start_url).read()
The error I get is:
URL can't contain control characters. '/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq=' (found at least ' ')
If the problem is the whitespace, replace each space with %20:
url = url.replace(" ", "%20")
Spaces are not allowed in a URL. I removed them and it seems to work now:
import urllib.request
start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
url = start_url.replace(" ","")
source = urllib.request.urlopen(url).read()
Solr search strings can get pretty weird. It is better to use the quote function to percent-encode characters before making the request. See the example below:
import urllib.request
from urllib.parse import quote
start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
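# safe=':/?&=' keeps the URL's structure; with the default safe='/', quote would also escape ':', '?', '&' and '=', and urlopen would reject the result
source = urllib.request.urlopen(quote(start_url, safe=':/?&=')).read()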
Better late than never...
You probably already found out by now, but let's get it written here.
There can't be any space characters in the URL, and there are three: after bundle_fq, after sm_vid_Sectors_fq=, and after dm_field_deadlineTo_fq.
Remove those and you're good to go.
As the error message says, there are control characters in your URL (which doesn't seem to be a valid one, by the way).
You need to encode them inside the URL; in particular, spaces need to be encoded as %20.
Parsing the URL first and then encoding its components works:
import urllib.request
from urllib.parse import urlparse, quote
def make_safe_url(url: str) -> str:
    """
    Returns a parsed and quoted url
    """
    _url = urlparse(url)
    # safe="=&" keeps the key=value&... structure of the query; spaces still become %20
    url = _url.scheme + "://" + _url.netloc + quote(_url.path) + "?" + quote(_url.query, safe="=&")
    return url
start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
start_url = make_safe_url(start_url)
source = urllib.request.urlopen(start_url).read()
This code returns the JSON document despite the double forward slash and the whitespace in the URL.
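A variant sketch of the same parse-then-encode idea, using urlsplit/urlunsplit so that the fragment is preserved and no stray '?' is added when there is no query. The name quote_url_parts is hypothetical, and the safe sets are my own choice, not something the endpoint requires.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_parts(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in the path and query."""
    parts = urlsplit(url)
    # '%' is kept safe so sequences that are already encoded are not double-encoded.
    path = quote(parts.path, safe="/%")
    # '=' and '&' are kept so the key=value&key=value structure survives.
    query = quote(parts.query, safe="=&%")
    # urlunsplit only adds '?' and '#' when the query/fragment are non-empty.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(quote_url_parts("https://example.com/a b/c?x =1&y= 2"))
```

Encoding does not repair the keys themselves: "bundle_fq =..." becomes "bundle_fq%20=...", so the server sees a parameter name with a trailing space. Removing the stray spaces may be what the endpoint actually expects.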

Django: ignore slashes in the URL and take them as part of a parameter

Hi, I have a URL like this:
path('api/v1/store/download/<str:ix>/', DownloadVideoAPI.as_view(), name='download'),
It accepts a long string.
I want to capture everything after the download segment in the URL above as the parameter,
but when I enter a long string that contains a slash, Django says page not found. For example, entering "/api/v1/store/download/asdasd2asdsadas/asdasd" gives me a 404 Not Found.
How can I do that?
This is my view:
class DownloadVideoAPI(APIView):
    def get(self, request, ix):
        pre = ix.split(",")
        hash = pre[0]
        dec = pre[1]
        de_hash = decode_data(hash, dec)
Well, it's possible to capture the extra path segments with the re_path function, as long as the regex also matches slashes (\w+ does not, so .+ is used here):
# urls.py
from django.urls import re_path
re_path(r'^api/v1/store/download/(?P<ix>.+)/$', DownloadVideoAPI.as_view(), name='download'),
ref: https://docs.djangoproject.com/en/2.0/ref/urls/#django.urls.re_path
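Use the path converter instead of str (str matches any non-empty string except /, while path also matches /):
path('api/v1/store/download/<path:ix>', DownloadVideoAPI.as_view(), name='download'),
without / at the end.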
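A framework-free sketch of why a str route returns 404 for a value containing a slash. It assumes Django's built-in converter regexes ([^/]+ for <str:>, .+ for <path:>, as defined in django/urls/converters.py) and only approximates how Django compiles a route. route_regex and match_ix are hypothetical names.

```python
import re

# Regexes of Django's built-in converters (django/urls/converters.py):
STR_CONVERTER = r"[^/]+"  # <str:ix>: any non-empty string without '/'
PATH_CONVERTER = r".+"    # <path:ix>: any non-empty string, '/' included


def route_regex(converter_regex: str):
    """Rough stand-in for 'api/v1/store/download/<conv:ix>/' as compiled by Django."""
    return re.compile(rf"api/v1/store/download/(?P<ix>{converter_regex})/")


def match_ix(converter_regex: str, url_path: str):
    """Return the captured ix, or None when the route would not match (a 404)."""
    m = route_regex(converter_regex).fullmatch(url_path)
    return m.group("ix") if m else None


url_path = "api/v1/store/download/asdasd2asdsadas/asdasd/"
print(match_ix(STR_CONVERTER, url_path))   # None
print(match_ix(PATH_CONVERTER, url_path))  # asdasd2asdsadas/asdasd
```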
/api/v1/store/download/asdasd2asdsadas/asdasd results in a 404 page because the str converter does not match /, so Django cannot map the URL to any route in your urls.py. To solve this, aside from using BugHunter's answer, you could URL-encode your long string before passing it in your URL.
So, given the long string, "asdasd2asdsadas/asdasd", URL encode it first to "asdasd2asdsadas%2Fasdasd". Once you have encoded it, your URL should now look like "/api/v1/store/download/asdasd2asdsadas%2Fasdasd".
To URL-encode in Python 3, you can use urllib.parse.quote:
from urllib.parse import quote
parameter = 'asdasd2asdsadas/asdasd'
encoded_string = quote(parameter, safe='')
encoded_string here should have the value "asdasd2asdsadas%2Fasdasd".
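A small round-trip sketch of the encoding step, showing why safe='' matters (the default safe='/' leaves the slash alone). encode_segment and decode_segment are hypothetical helper names. Whether the encoded %2F actually reaches Django's URL resolver depends on the deployment: WSGI servers typically pass an already-decoded PATH_INFO, which would turn %2F back into / before routing, so this is worth testing against your own server.

```python
from urllib.parse import quote, unquote


def encode_segment(value: str) -> str:
    # safe='' also escapes '/', so the value stays a single path segment
    return quote(value, safe='')


def decode_segment(value: str) -> str:
    # Reverse the percent-encoding, e.g. inside the view
    return unquote(value)


encoded = encode_segment('asdasd2asdsadas/asdasd')
print(encoded)                          # asdasd2asdsadas%2Fasdasd
print(decode_segment(encoded))          # asdasd2asdsadas/asdasd
print(quote('asdasd2asdsadas/asdasd'))  # default safe='/' keeps the slash
```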

How to get the part of a URL without the protocol or domain

I have URLs of the form
http://example.com/example/a/b/c.html
https://www.example.com/
How do I get the path from the server root, without protocol or domain name? With the examples above, the function should return:
/example/a/b/c.html
/
(I am using Django: answers relying on this framework are accepted!)
The urlparse module (urllib.parse in Python 3) can solve this:
# from urlparse import urlparse  # for Python 2
from urllib.parse import urlparse  # for Python 3
parsed_url = urlparse('http://example.com/abc/cde')
assert parsed_url.path == '/abc/cde'
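A slightly fuller sketch built on urlparse that covers both examples from the question. The '/' fallback for URLs with no path at all and the keep_query option are my own additions, and path_from_root is a hypothetical name.

```python
from urllib.parse import urlparse


def path_from_root(url: str, keep_query: bool = False) -> str:
    """Return the path from the server root, without scheme or domain."""
    parsed = urlparse(url)
    # 'https://www.example.com' has an empty path; treat it as the root.
    path = parsed.path or '/'
    if keep_query and parsed.query:
        path += '?' + parsed.query
    return path


print(path_from_root('http://example.com/example/a/b/c.html'))  # /example/a/b/c.html
print(path_from_root('https://www.example.com/'))               # /
```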
You could use the path attribute of Django's HttpRequest object:
request.path
See the Django docs for more.

How to compare Referer URL in Django Request to another URL using reverse()?

How can I compare the Referer URL with a reverse() URL?
Here is my current code:
if request.META.get('HTTP_REFERER') == reverse('dashboard'):
print 'Yeah!'
But this doesn't work because reverse() outputs /dashboard while HTTP_REFERER outputs http://localhost:8000/dashboard/
My current solution is:
if reverse('dashboard') in request.META.get('HTTP_REFERER'):
print 'Yeah!'
I don't know if this is the best way to do this. Any suggestion would be great.
You can use urlparse to get the path element from a URL. In Python 3:
from urllib import parse
path = parse.urlparse('http://localhost:8000/dashboard/').path
and in Python 2:
import urlparse
path = urlparse.urlparse('http://localhost:8000/dashboard/').path
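A sketch that applies this to the original comparison, meant to be called as referer_matches(request.META.get('HTTP_REFERER'), reverse('dashboard')); referer_matches is a hypothetical helper. It avoids two problems with the substring test from the question: a missing header makes `in` raise TypeError, and the substring also matches a referer such as http://localhost:8000/admin/dashboard/.

```python
from typing import Optional
from urllib.parse import urlparse


def referer_matches(referer: Optional[str], expected_path: str) -> bool:
    """Compare only the path of a Referer URL with a path such as reverse('dashboard')."""
    if not referer:
        # The Referer header is optional, so request.META.get() may return None.
        return False
    # Ignore a trailing slash on either side, since the question sees '/dashboard'
    # from reverse() but '/dashboard/' in the Referer.
    return urlparse(referer).path.rstrip('/') == expected_path.rstrip('/')
```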

Python: How to resolve URLs containing '..'

I need to uniquely identify and store some URLs. The problem is that they sometimes contain "..", as in http://somedomain.com/foo/bar/../../some/url, which, if I'm not wrong, is basically http://somedomain.com/some/url.
Is there a Python function or a tricky way to resolve these URLs?
There’s a simple solution using urllib.parse.urljoin:
>>> from urllib.parse import urljoin
>>> urljoin('http://www.example.com/foo/bar/../../baz/bux/', '.')
'http://www.example.com/baz/bux/'
However, if there is no trailing slash (the last component is a file, not a directory), the last component will be removed.
This fix uses the urlparse function to extract the path, then uses posixpath (the POSIX version of os.path) to normalize the components. It then compensates for normpath dropping the trailing slash and joins the URL back together. The following is doctestable:
from urllib.parse import urlparse
import posixpath
def resolve_components(url):
    """
    >>> resolve_components('http://www.example.com/foo/bar/../../baz/bux/')
    'http://www.example.com/baz/bux/'
    >>> resolve_components('http://www.example.com/some/path/../file.ext')
    'http://www.example.com/some/file.ext'
    """
    parsed = urlparse(url)
    new_path = posixpath.normpath(parsed.path)
    if parsed.path.endswith('/'):
        # Compensate for issue1707768
        new_path += '/'
    cleaned = parsed._replace(path=new_path)
    return cleaned.geturl()
Those are file paths. Look at os.path.normpath:
>>> import os
>>> os.path.normpath('/foo/bar/../../some/url')
'/some/url'
EDIT:
If this is on Windows, your input path will use backslashes instead of slashes. In this case, you still need os.path.normpath to get rid of the .. patterns (and // and /./ and whatever else is redundant), then convert the backslashes to forward slashes:
import os
def fix_path_for_URL(path):
    result = os.path.normpath(path)
    if os.sep == '\\':
        result = result.replace('\\', '/')
    return result
EDIT 2:
If you want to normalize URLs, do it (before you strip off the scheme and such) with the urlparse module, as shown in the answer to this question.
EDIT 3:
It seems that urljoin doesn't normalize the base path it's given:
>>> import urlparse
>>> urlparse.urljoin('http://somedomain.com/foo/bar/../../some/url', '')
'http://somedomain.com/foo/bar/../../some/url'
normpath by itself doesn't quite cut it either:
>>> import os
>>> os.path.normpath('http://somedomain.com/foo/bar/../../some/url')
'http:/somedomain.com/some/url'
Note the initial double slash got eaten.
So we have to make them join forces:
def fix_URL(urlstring):
    parts = list(urlparse.urlparse(urlstring))
    parts[2] = os.path.normpath(parts[2].replace('/', os.sep)).replace(os.sep, '/')
    return urlparse.urlunparse(parts)
Usage:
>>> fix_URL('http://somedomain.com/foo/bar/../../some/url')
'http://somedomain.com/some/url'
urljoin won't work, as it only resolves dot segments if the second argument isn't absolute(!?) or empty. Not only that, before Python 3.5 (when urljoin's behavior was updated to follow RFC 3986) it doesn't handle excessive ..s properly according to RFC 3986 (they should be removed; the older urljoin doesn't do so). posixpath.normpath can't be used either (much less os.path.normpath), since it resolves multiple slashes in a row to only one (e.g. ///// becomes /), which is incorrect behavior for URLs.
The following short function resolves any URL path string correctly. It shouldn't be used with relative paths, however, since additional decisions about its behavior would then need to be made (Raise an error on excessive ..s? Remove . in the beginning? Leave them both?) - instead, join URLs before resolving if you know you might handle relative paths. Without further ado:
def resolve_url_path(path):
    segments = path.split('/')
    segments = [segment + '/' for segment in segments[:-1]] + [segments[-1]]
    resolved = []
    for segment in segments:
        if segment in ('../', '..'):
            if resolved[1:]:
                resolved.pop()
        elif segment not in ('./', '.'):
            resolved.append(segment)
    return ''.join(resolved)
This handles trailing dot segments (that is, without a trailing slash) and consecutive slashes correctly. To resolve an entire URL, you can then use the following wrapper (or just inline the path resolution function into it).
try:
    # Python 3
    from urllib.parse import urlsplit, urlunsplit
except ImportError:
    # Python 2
    from urlparse import urlsplit, urlunsplit
def resolve_url(url):
    parts = list(urlsplit(url))
    parts[2] = resolve_url_path(parts[2])
    return urlunsplit(parts)
You can then call it like this:
>>> resolve_url('http://example.com/../thing///wrong/../multiple-slashes-yeah/.')
'http://example.com/thing///multiple-slashes-yeah/'
Correct URL resolution has more than a few pitfalls, it turns out!
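As a cross-check, RFC 3986 section 5.2.4 defines its own remove_dot_segments procedure, which consumes an input buffer using rules A to E. Below is a Python 3 transcription of it as a sketch; the function names are mine, not a standard-library API. Like resolve_url, it keeps consecutive slashes and drops excess ".." segments.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Transcription of RFC 3986, section 5.2.4 (remove_dot_segments)."""
    output = []  # each entry is one segment together with its leading '/', if any
    while path:
        # Rule A: drop a leading "../" or "./"
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        # Rule B: replace a leading "/./" or a final "/." with "/"
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        # Rule C: replace "/../" or a final "/.." with "/" and drop the last output segment
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if output:
                output.pop()
        # Rule D: a lone "." or ".." is removed
        elif path in (".", ".."):
            path = ""
        # Rule E: move the first segment (with its leading "/", if any) to the output
        else:
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def resolve(url: str) -> str:
    """Apply remove_dot_segments to the path of an absolute URL."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


print(resolve('http://somedomain.com/foo/bar/../../some/url'))  # http://somedomain.com/some/url
```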
I wanted to comment on the resolve_components function in the top response.
Notice that if your path is /, the code adds another slash, producing //, which can be problematic.
I therefore changed the if condition to:
if parsed.path.endswith('/') and parsed.path != '/':
According to RFC 3986, this should happen as part of the "relative resolution" process, so the answer could be urlparse.urljoin(url, ''). However, due to a bug, urlparse.urljoin does not remove dot segments when the second argument is an empty URL. You can use yurl, an alternative URL manipulation library, which does this correctly:
>>> import yurl
>>> print yurl.URL('http://somedomain.com/foo/bar/../../some/url') + yurl.URL()
http://somedomain.com/some/url
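import urlparse
import posixpath
def resolve_relative(url, rel_path):
    # Hypothetical wrapper: url is the base URL, rel_path the relative path to join and normalize
    parsed = list(urlparse.urlparse(url))
    parsed[2] = posixpath.normpath(posixpath.join(parsed[2], rel_path))
    return urlparse.urlunparse(parsed)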
First, you need the base URL; then you can use urljoin from urllib.parse.
Example:
from urllib.parse import urljoin
def resolve_urls(urls, site):
    for url in urls:
        print(urljoin(site, url))
urls = ["/aboutMytest", "#", "/terms-of-use", "https://www.example.com/Admission"]
resolve_urls(urls, 'https://example.com/')
Output:
https://example.com/aboutMytest
https://example.com/
https://example.com/terms-of-use
https://www.example.com/Admission
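One thing to watch when choosing the base URL: for a relative reference, urljoin first drops everything after the base's last '/', so a base with and without a trailing slash behaves differently. resolve_all is a hypothetical variant of resolve_urls that returns the results instead of printing them.

```python
from urllib.parse import urljoin


def resolve_all(site, urls):
    """Like resolve_urls, but returns the joined URLs as a list."""
    return [urljoin(site, url) for url in urls]


# With a trailing slash, relative references land inside /docs/ ...
print(resolve_all('https://example.com/docs/', ['page', '/page']))
# ... without it, the last path segment ('docs') is dropped before joining.
print(resolve_all('https://example.com/docs', ['page', '/page']))
```

A leading '/' makes a reference root-relative, so it ignores the base path either way.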
