variables with space in url (django) - python

I am having the same issue as "How to pass variables with spaces through URL in Django". I have tried the solutions mentioned there, but every request returns "The resource you are looking for has been removed, had its name changed, or is temporarily unavailable."
I am trying to pass a file name, for example: new 3
in urls.py:
url(r'^file_up/delete_file/(?P<oname>[0-9A-Za-z\ ]+)/$', 'app.views.delete_file' , name='delete_file'),
in views.py:
def delete_file(request,fname):
    return render_to_response(
        'app/submission_error.html',
        {'fname':fname,
        },
        context_instance=RequestContext(request)
    )
url : demo.net/file_up/delete_file/new%25203/
Thanks for the help

Thinking this over: are you stuck with having to use spaces? If not, I think you will find your patterns (and variables) easier to work with without them. A dash, an underscore, or even a forward slash will look cleaner and be more predictable.
I also found this: https://stackoverflow.com/a/497972/352452 which cites:
The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs.
You may also be able to capture your space with a literal %20. Not sure. Just leaving some thoughts here that come to mind.

demo.net/file_up/delete_file/new%25203/
This URL is double-encoded. The space is first encoded to %20, then the % character is encoded to %25. Django only decodes the URL once, so the decoded url is /file_up/delete_file/new%203/. Your pattern does not match the literal %20.
If you want to stick to spaces instead of a different delimiter, you should find the source of that URL and make sure it is only encoded once: demo.net/file_up/delete_file/new%203/.
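The double encoding can be reproduced outside Django. This is a minimal sketch using Python 3's urllib.parse; the variable names are only illustrative, and the character class is the one from the question's pattern:

```python
import re
from urllib.parse import quote, unquote

name = "new 3"

# Encoding once turns the space into %20.
once = quote(name)

# Encoding the already-encoded value turns the % itself into %25.
twice = quote(once)

# Character class from the question's URL pattern.
pattern = re.compile(r"^[0-9A-Za-z\ ]+$")

# One round of decoding (which is what happens to the request path)
# recovers the space only from the singly encoded value.
decoded_once = unquote(once)
decoded_twice = unquote(twice)

print(once, "->", decoded_once, "matches:", bool(pattern.match(decoded_once)))
print(twice, "->", decoded_twice, "matches:", bool(pattern.match(decoded_twice)))
```

The doubly encoded value still contains a literal %20 after one decode, and % is not in the character class, so the pattern cannot match it.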

Related

Regular expressions and russian symbols in Django

I have the one url like this
url(ur'^gradebook/(?P<group>[\w\-А-Яа-я])$', some_view, name='some_view')
and I expect it to process a request like
../gradebook/group='Ф-12б'
but I get an error and the server crashes.
Please help me figure out the Russian symbols
The group='…' part is more of a problem, since the equals sign = is not part of the character class.
Furthermore you should match multiple characters:
#                            quantifier ↓
url(ur'^gradebook/(?P<group>[\w\-А-Яа-я]+)$', some_view, name='some_view')
then this can match a URL:
/gradebook/Ф-12б
but if you want to match the group='…' as well, you should include the = and the ' character:
#                      extra characters ↓↓
url(ur"^gradebook/(?P<group>[\w\-А-Яа-я'=]+)$", some_view, name='some_view')
Then you can match with:
/gradebook/group='Ф-12б'
although that might accept too much, since it can also accept f'q'a=gr=f for example.
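The three patterns can be compared with plain re in Python 3, where \w already matches Unicode letters in str patterns. This is only a sketch: the leading slash is dropped as Django does before matching, and the variable names are made up for the example.

```python
import re

# Pattern from the question: the character class matches exactly one character.
single = re.compile(r"^gradebook/(?P<group>[\w\-А-Яа-я])$")

# With the + quantifier the group can span several characters.
strict = re.compile(r"^gradebook/(?P<group>[\w\-А-Яа-я]+)$")

# With ' and = added to the class, group='...' is accepted as well.
loose = re.compile(r"^gradebook/(?P<group>[\w\-А-Яа-я'=]+)$")

path = "gradebook/Ф-12б"
path_with_assignment = "gradebook/group='Ф-12б'"

print(single.match(path))                      # no quantifier, no match
print(strict.match(path).group("group"))
print(strict.match(path_with_assignment))      # = and ' are not allowed
print(loose.match(path_with_assignment).group("group"))
print(bool(loose.match("gradebook/f'q'a=gr=f")))  # the over-permissive case
```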

Python3 regex not changing \" to "

i have a json file filled with user comments (from web scraping) which I've pulled into python with pandas
import pandas as pd
data = pd.DataFrame(pd.read_json(filename, orient=columnName,encoding="utf-8"),columns=columnName)
data['full_text'] = data['full_text'].replace('^#ABC(\\u2019s)*[ ,\n]*', '', regex=True)
data['full_text'] = data['full_text'].replace('(\\u2019)', "'", regex=True)
data.to_json('new_abc_short.json',orient='records')
The messages don't completely match the respective messages online (emojis are shown as \u0234 or something similar, apostrophes as \u2019, and forward slashes in links and quote marks have a backslash in front of them).
I want to clean them up, so I learnt some regex (https://docs.python.org/3/howto/regex.html) so that I can pull them into Python, clean them up and then save them back to JSON under a different name (for now).
The second line removes the Twitter handle (only if it is at the beginning), then removes the 's if it was used (e.g. #ABC's). If the handle is not at the beginning (maybe it is used in the middle of the message), it is kept. Then it removes any spaces and commas that were left behind (again, only at the beginning of the string).
e.g. "#ABC, hi there" becomes "hi there". "hi there #ABC" stays the same. "#ABC's twitter is big" would become "twitter is big"
The third line replaces every apostrophe that could not be shown (e.g. don\u2019t changes back to don't).
I have thousands of records (not all of them have issues with apostrophes, quotes, links etc.), and based on the very small examples I've looked at, these seem to work,
but my third one doesn't work:
data['full_text'] = data['full_text'].replace('\\"', '"', regex=True)
Example message in the json: "full_text":"#ABC How can you \"accidentally close\" my account"
I want to remove the \ next to the double quotes so it looks like the real message (I assume it is an escape character, which the user obviously didn't type),
but no matter what I do, I can't remove it.
From my regex learning, " isn't a metacharacter, so the backslash shouldn't even be there. But anyway, I've tried:
\\" (which I think should be the obvious one: I have \", there is no special quirk in " but there is in \, so I need another backslash to escape that)
\\\\" (some forum posts online mention needing 4 backslashes)
\\\" (I think someone mentioned in the forum posts that they got it working with 3)
\\\(\") (I know that brackets provide groupings, so I tried different combinations)
(\\\\")
I encased all of the above expressions in single quotes, and they didn't work. I thought maybe the double quote was the problem since I only had one, so I replaced the single quotes with triple single quotes:
'''\\"'''
But none of the above worked with triple single quotes either.
I keep rechecking the newly saved JSON and I keep seeing:
"full_text":"How can you \"accidentally close\" my account"
(i.e. removing #ABC with the space worked, but not the backslash bit)
Originally, I tried looking into converting these Unicode issues (i.e. using encoding="utf-8"), although my experience with this is limited and it kept failing, so regex is my best option.
Ow, I missed the pandas hint, so pandas replace does use regexes. But, to be clear, str.replace doesn't work with regexes. re.sub does.
Now:
to match a single backslash, your regex is: \\
the Python string literal that describes that regex: "\\\\"
when using a raw string, a double backslash is enough: r'\\'
If your string really contains a \ preceding a ", a regex that would do is:
\\(?=\")
which does a lookahead for your " (Look at regex101).
You would have to use something like:
re.sub(r'\\(?=\")', "", s)
or a pandas equivalent using that regex.
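Whether the string "really contains" a backslash is worth checking first, because JSON always writes a double quote inside a string as \". The sketch below uses only the standard json and re modules (not pandas) with the sample message from the question. It shows that the backslash exists only in the serialized file, and that the lookahead regex does its job when the backslash is genuinely part of the text. The variable names are illustrative.

```python
import json
import re

# The text as a user would have typed it: no backslashes at all.
text = 'How can you "accidentally close" my account'

# Serializing to JSON escapes each inner double quote as \" in the output.
encoded = json.dumps({"full_text": text})
print(encoded)

# Parsing the JSON again gives back the text without any backslash.
decoded = json.loads(encoded)["full_text"]
print(decoded)

# If a string really does contain backslash + quote, the lookahead
# regex removes just the backslash and keeps the quote.
literal = 'How can you \\"accidentally close\\" my account'
cleaned = re.sub(r'\\(?=")', "", literal)
print(cleaned)
```

If the loaded value already prints without backslashes, there is nothing for the replace to remove, and the \" seen in the saved file is just the JSON escaping being applied again on output.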

How to escape slash in url path in python? [duplicate]

I have set up my coldfusion application to have dynamic urls on the page, such as
www.musicExplained/index.cfm/artist/:VariableName
However my variable names will sometimes contain slashes, such as
www.musicExplained/index.cfm/artist/GZA/Genius
This is causing a problem, because my application presumes that the slash in the variable name represents a different section of the website, the artists albums. So the URL will fail.
I am wondering if there is anyway to prevent this from happening? Do I need to use a function that replaces slashes in the variable names with another character?
You need to escape the slashes as %2F.
You could easily replace the forward slashes / with something like an underscore _ such as Wikipedia uses for spaces. Replacing special characters with underscores, etc., is common practice.
You need to escape those, but don't just replace them with %2F manually. You can use URLEncoder for this.
E.g. URLEncoder.encode(url, "UTF-8")
Then you can say
yourUrl = "www.musicExplained/index.cfm/artist/" + URLEncoder.encode(VariableName, "UTF-8")
Check out this w3schools page about "HTML URL Encoding Reference":
https://www.w3schools.com/tags/ref_urlencode.asp
for / you would escape with %2F
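Since the question title asks about Python, here is a rough Python 3 equivalent of the URLEncoder approach, using urllib.parse.quote. Passing safe="" matters because quote leaves / unescaped by default. The base URL is the one from the question; the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

artist = "GZA/Genius"

# By default quote() treats / as safe and leaves it alone.
default_quoted = quote(artist)

# With safe="" the slash is percent-encoded as %2F.
fully_quoted = quote(artist, safe="")

url = "www.musicExplained/index.cfm/artist/" + fully_quoted
print(url)

# Decoding the last path segment recovers the original name.
print(unquote(url.rsplit("/", 1)[1]))
```

Whether the receiving server or framework decodes %2F before splitting the path is a separate matter that has to be checked on the application side.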

Django - trailing slash resets page title

I apologise for the blatant ignorance of this question but I've been charged with fixing something in Django that I have NO experience with!
We're getting an issue with URLs and duplicated content.
If we visit "www.hello.com/services/" then we get our full page rendered, absolutely fine.
If we visit "www.hello.com/services" then we get the same content but with a default that seems to be set in a line:
class PageTitleNode(template.Node):

    def render(self, context):
        try:
            meta_info = MetaInfo.objects.get(url=context['request'].path)
        except ObjectDoesNotExist:
            return u'This is our default page title'
        return u"%s - hello.com" % meta_info.title
The main problem with this is that Google is indexing two almost identical pages and it's bad SEO according to our client's overpaid online strategy partner.
I know it's vague but if anyone can help then much rejoicing will be had.
Thanks for reading!
I think your consultant is correct. One URL = one resource. Having two URLs for one resource is quite dirty anyway. This is why Django features an automatic redirect, under certain conditions, from URLs without a trailing slash to URLs with one.
I'm pretty sure your url definition regexp for /services/ lacks the trailing slash. Anyway, you should use trailing slashes only:
Ensure APPEND_SLASH is set to True: from django.conf import settings; print(settings.APPEND_SLASH)
Ensure that all your url regexps have the trailing slash: url(r'foo' ...) is bad, url(r'foo/' ...) barely passes because of possible conflicts, and url(r'foo/$' ...) is better
Ensure all MetaInfo objects have a url with a trailing slash: MetaInfo.objects.exclude(url__endswith='/') returns the MetaInfo objects whose url lacks a trailing slash, so it should come back empty
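The last check can be pictured without a database. The sketch below is plain Python, not Django code: the list of stored URLs and both helper names are hypothetical, and the filter simply mirrors what exclude(url__endswith='/') selects.

```python
# Hypothetical stored MetaInfo.url values; the real ones live in the database.
stored_urls = ["/services/", "/about", "/contact/"]


def missing_trailing_slash(urls):
    """Return the URLs that exclude(url__endswith='/') would select."""
    return [url for url in urls if not url.endswith("/")]


def with_trailing_slash(url):
    """Return the URL with exactly one trailing slash appended if needed."""
    return url if url.endswith("/") else url + "/"


offenders = missing_trailing_slash(stored_urls)
print(offenders)

# After normalizing, the same check comes back empty.
normalized = [with_trailing_slash(url) for url in stored_urls]
print(missing_trailing_slash(normalized))
```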

Building proper link with spaces

I have the following code in Python:
linkHTML = '<a href="%s">click here</a>' % strLink
The problem is that when strLink has spaces in it the link shows up as
click here
I can use strLink.replace(" ","+")
But I am sure there are other characters which can cause errors. I tried using
urllib.quote(strLink)
But it doesn't seem to help.
Thanks!
Joel
Make sure you use the urllib.quote_plus(string[, safe]) to replace spaces with plus sign.
urllib.quote_plus(string[, safe])
Like quote(), but also replaces spaces by plus signs, as required for quoting HTML form values when building up a query string to go into a URL. Plus signs in the original string are escaped unless they are included in safe. It also does not have safe default to '/'.
from http://docs.python.org/library/urllib.html#urllib.quote_plus
Ideally you'd be using the urllib.urlencode function and passing it a sequence of key/value pairs like [("q", "with space"), ("s", "with space & other")] etc.
As well as quote_plus(*), you also need to HTML-encode any text you output to HTML. Otherwise < and & symbols will be markup, with potential security consequences. (OK, you're not going to get < in a URL, but you definitely are going to get &, so just one parameter name that matches an HTML entity name and your string's messed up.)
html= '<a href="%s">click here</a>' % cgi.escape(urllib.quote_plus(q))
*: actually plain old quote is fine too; I don't know what wasn't working for you, but it is a perfectly good way of URL-encoding strings. It converts spaces to %20 which is also valid, and valid in path parts too. quote_plus is optimal for generating query strings, but otherwise, when in doubt, quote is safest.
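In Python 3 the same functions live in urllib.parse, and html.escape takes over from cgi.escape, which was removed in Python 3.8. The following is a sketch of both steps together, URL-encoding first and HTML-escaping second. The "search?" prefix and the variable names are assumptions made for the example, not part of the question.

```python
from html import escape
from urllib.parse import quote, quote_plus, urlencode

q = "with space & other"

# quote() is suitable for path segments: spaces become %20.
path_part = quote(q)

# quote_plus() is meant for query-string values: spaces become +.
query_value = quote_plus(q)

# urlencode() builds a whole query string from key/value pairs.
query = urlencode([("q", "with space"), ("s", q)])

# Hypothetical target; only the query string comes from the data above.
href = "search?" + query

# The & separating the parameters must be HTML-escaped inside the attribute.
link_html = '<a href="%s">click here</a>' % escape(href, quote=True)

print(path_part)
print(query_value)
print(link_html)
```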
