GET search in multilanguage site - python

I've included a search form in my web2py application that submits a GET request of the following form:
myapp/controller/search?query=myquery
However, for security reasons web2py automatically replaces spaces and non-alphanumeric characters with underscores, which is okay for English-only sites but
an impediment for languages that use accent marks. For example, searching for "áéíóú" returns five underscores.
This could be solved by using POST instead of GET for the search form, but then the users wouldn't be able to bookmark the results.
Is there any option to solve this?
Thanks in advance.

Here's an idea that I've used in the past:
1. Use POST to submit the query.
2. Generate a unique string (e.g. like the video ID in a YouTube URL: https://www.youtube.com/watch?v=jX3DuS2Ak3g).
3. Associate the query with that string and store it as a key/value pair in the session, app state or DB (depending on how long you want it to live).
4. Redirect the user to a URL containing that string.
If you don't want to use extra memory/storage (these stores can grow a lot in some cases), you can replace steps 2-3 by encrypting the query into a string that you can decrypt afterwards. You can do this in a middleware class so that it's transparent to your app's logic.
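A minimal Python sketch of steps 2-4, assuming a plain in-memory dict as the store. The names `save_query` and `load_query` are hypothetical; in a real app the mapping would live in the session, a cache or a DB table, as described above. The token uses only hex digits, i.e. alphanumeric characters that, per the question, web2py leaves alone.

```python
import secrets

# Hypothetical store: token -> original search query.
# Replace with session/cache/database storage in a real application.
_saved_queries = {}


def save_query(query, nbytes=8):
    """Store `query` under a new random hex token and return the token."""
    token = secrets.token_hex(nbytes)
    while token in _saved_queries:  # regenerate on the unlikely collision
        token = secrets.token_hex(nbytes)
    _saved_queries[token] = query
    return token


def load_query(token):
    """Return the query stored for `token`, or None if the token is unknown."""
    return _saved_queries.get(token)


# POST handler: store the query, then redirect to e.g. /search?q=<token>
token = save_query('áéíóú')
# GET handler: the bookmarked URL carries only the ASCII token
original = load_query(token)  # 'áéíóú'
```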

This is a general problem people face when handling URLs.
You can use the quote/quote_plus functions in urllib to normalize the strings.
For example, with the strings you suggested:
>>> print urllib.quote('éíóú')
%C3%A9%C3%AD%C3%B3%C3%BA
>>> print urllib.unquote('%C3%A9%C3%AD%C3%B3%C3%BA')
éíóú
You will have to perform the unquote when you retrieve the value from the request on the backend.
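The session above is Python 2; in Python 3 these functions live in `urllib.parse`. A short sketch of the equivalents, also using `urlencode`/`parse_qs` to build and read a whole query string (the `query` parameter name is taken from the question's URL):

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Percent-encode a single value (UTF-8 is the default encoding)
encoded = quote('éíóú')          # '%C3%A9%C3%AD%C3%B3%C3%BA'
decoded = unquote(encoded)       # 'éíóú'

# Build the query string for a bookmarkable GET link
query_string = urlencode({'query': 'áéíóú'})
# 'query=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA'

# Decode it again on the server side
params = parse_qs(query_string)  # {'query': ['áéíóú']}
```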
There are also some other posts which might be helpful - urlencode implementation and unicode ready urls

Related

How to reduce the length of a string?

I'm trying to reduce the size of a string like this:
'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
to something that someone could type in less than 30 seconds, like this:
'aF9kYX'
and be able to turn it back to the original string too. How could I achieve that?
EDIT: I guess I wasn't clear. First of all, I don't know whether what I want is even possible.
So, my app asks for a token to log in, and that token is the JWT above. But it is way too long for someone to type manually, so I supposed there was an algorithm that could make this string smaller (compress it) so that it would be easier and faster to type. Here is an example of how I would use such an algorithm:
short_to_big(small_string) //Returns the original JWT
big_to_short(JWT_string) //Returns the smaller string
Stupid simple answer: use a dict to store the short string as key and the long one as value. Then you just have to generate the short string the way you like and make sure it's not already in the dict. If you need to persist the key/value, you can use almost any kind of database (sql, key:value, document, or even a csv file FWIW).
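A minimal sketch of this dict approach, reusing the `big_to_short`/`short_to_big` names from the question as hypothetical helpers. The in-memory dict stands in for whatever database you persist to, and the 6-character code over letters and digits is an assumption:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols
_codes = {}  # short code -> original long string (persist this in practice)


def big_to_short(long_string, length=6):
    """Map `long_string` to a new random code that is not already in use."""
    while True:
        code = ''.join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in _codes:  # only accept codes that are still free
            _codes[code] = long_string
            return code


def short_to_big(code):
    """Return the original string for `code`; raises KeyError if unknown."""
    return _codes[code]
```

With 62 symbols and 6 characters there are 62**6 (about 5.7 × 10^10) possible codes, so collisions stay rare until the table gets very large; the loop simply retries when one occurs.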
Oh, and if that doesn't solve your problem, you may want to consider giving more context ;)
You need more constraints. A 200-character string contains a lot more information than a 6-character string, so either you need to know a lot more about the original strings (e.g. that they come from some known set of strings, or have a limited character set), or you need to store the original strings somewhere and use the string the user types as a key into a map or similar.
There are lossless compression algorithms, but these depend on knowing some probabilistic information about the string (e.g. that repeated characters are likely) and will typically expand the strings if the probabilities are wrong.
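To see this in practice, here is a quick check with the standard `zlib` module (a general-purpose lossless compressor, chosen here for illustration) on the token from the question, re-encoded with URL-safe base64 so the result stays typeable. The round trip is exact, but the typeable form is nowhere near six characters, because the signature part is essentially random data that cannot be compressed:

```python
import base64
import zlib

jwt = ('eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.'
       'eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.'
       'h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk')

# Compress, then base64-encode so the result is printable/typeable again
compressed = zlib.compress(jwt.encode('ascii'), 9)
typeable = base64.urlsafe_b64encode(compressed).decode('ascii')

# Lossless: the original comes back exactly
restored = zlib.decompress(base64.urlsafe_b64decode(typeable)).decode('ascii')

# Compare the lengths yourself: the "short" form is still very long
print(len(jwt), len(typeable))
```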
UPDATE (after the question was clarified, and following suggestions in the comments)
You could implement an algorithm that maps this long string to a short representation and store that mapping in a dictionary. The following algorithm does not guarantee uniqueness, but it should give you a path to follow.
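import random
import string

def long_string_to_short(original_string, length=10):
    # A dedicated Random instance seeded with the input yields the same short
    # string for the same input, without reseeding the global random module
    rng = random.Random(original_string)
    filling_values = string.digits + string.ascii_letters
    short_string = ''.join(rng.choice(filling_values) for _ in range(length))
    return short_string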
When calling the function you can specify an appropriate length for the short string.
Then you could:
my_mapping_dict = {}
my_long_string = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE0NDU0OTk3NDUsImQiOnsiYXV0aF9kYXRhIjoiZm9vIiwib3RoZXJfYXV0aF9kYXRhIjoiYmFyIiwidWlkIjoidW5pcXVlSWQxIn0sInYiOjB9.h6LV3boj0ka2PsyOjZJb8Q48ugiHlEkNksusRGtcUBk'
short_string = long_string_to_short(my_long_string)
my_mapping_dict[short_string] = my_long_string
OK, so because I couldn't find a way to shrink the string, I tried a different approach and found a solution.
To clarify why I wanted to log in with the token, here is what I want to do with my app:
In Firebase anyone can create an account, but I don't want that, so I made a group of users who are the only ones allowed to read or write the data.
So, in order to create an account, a user has to request a registration code (which is really a JWT generated from Firebase that grants permission to add a user to the group I mentioned).
This app is for local use, meaning that only people who live here are going to use it. So, back to the original question: the token is too long for someone to type (as I've said many times), and I wanted to know whether and how I could shrink it. Without success there, I tried a different approach: generate the token (from a separate program), encrypt it with a random code, and upload it to Firebase. I then give the random code to people, who type it into the app; the app retrieves and decrypts the token and authenticates with it, so that the user finally has an account with the privilege to read or write data.
Thanks for your responses and sorry if I wasted your time.

Django: How to query terms with insensitivity to punctuation (e.g. !.;’)?

I am creating an application that needs to find Facebook usernames that I’ve stored in the database, but Facebook usernames are both case-insensitive and insensitive to periods. For example, the username Johnsmith.55 is the same as johnsmith55 or even j…O.hn.sMiTh.5.5. when sending Facebook API requests.
Obviously, I am using the __iexact lookup to handle the case insensitivity, but what can I use to handle the insensitivity to periods? I know a cop-out method is simply to save all usernames to the database stripped of their periods, strip the searched username of its periods too, and then query. However, I want to save and display people’s usernames the way they really appear in their Facebook URL (which includes periods), even though Facebook API requests technically ignore periods.
Any ideas for a simple method of doing this? Thanks in advance for any help.
You can store two user names in your DB, one to query against and one to display.
However, if you don't want to do that, it's a simple matter of stripping the characters Facebook ignores from the string before querying:
import re
# ... pull username from DB
normalized_username = re.sub(r'[,.]', '', real_username)
# query using the normalized username
Note: This example ignores dots and commas; Facebook may ignore more than that.
You'll need to store two versions of the username: one for querying against, and one for display.
You can also implement your own querying logic with custom lookups in Django 1.7 or later.
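A plain-Python sketch of the two-version idea: keep the username as entered for display and a normalized copy for lookups. `normalize_username` and `find_user` are hypothetical helpers that, like the answer above, only strip dots and commas and lowercase. In Django the normalized copy would typically be a second (indexed) model field filled in when saving; alternatively, since Django 2.1 the `Lower` and `Replace` database functions can apply similar normalization inside a query.

```python
import re


def normalize_username(username):
    """Lowercase and drop dots/commas (extend if Facebook ignores more)."""
    return re.sub(r'[,.]', '', username).lower()


# Each record keeps the display form plus a normalized lookup key
users = [
    {'display': name, 'lookup': normalize_username(name)}
    for name in ('Johnsmith.55', 'Jane.Doe')
]


def find_user(query):
    """Return the first record whose normalized key matches `query`."""
    key = normalize_username(query)
    return next((user for user in users if user['lookup'] == key), None)


match = find_user('jOhn.sMiTh.5.5.')
print(match['display'])  # Johnsmith.55
```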

Multiple URL segments in Flask and other Python frameworks

I'm building an application in both Bottle and Flask to see which I am more comfortable with as Django is too much 'batteries included'.
I have read through the routing documentation of both, which is very clear and understandable, but I am struggling to find a way of dealing with an unknown, possibly unlimited number of URL segments, i.e.:
http://www.example.com/seg1/seg2/seg3/seg4/seg5.....
I was looking at using something like @app.route('/<path:fullurl>'), using a regex to remove unwanted characters and splitting the fullurl string into a list with one item per segment, but this seems incredibly inefficient.
Most PHP frameworks seem to have a method of building an array of the segment variables regardless of their number, but neither Flask, Bottle nor Django seems to have a similar option; I seem to need to specify an exact number of segments to capture variables. A couple of PHP CMSs seem to collect the first 9 segments immediately as variables, and anything longer gets passed as a full path, which is then broken down in the way I mentioned above.
Am I not understanding the way things work in URL routing? Is the string splitting method really inefficient or the best way to do it? Or, is there a way of collecting an unknown number of segments straight into variables in Flask?
I'm pretty new to Python frameworks, so a five-year-old's explanation would help.
Many thanks.
I'm fairly new to Flask myself, but from what I've worked out so far, I'm pretty sure that the idea is that you have lots of small route/view methods, rather than one massive great switching beast.
For example, if you have urls like this:
http://example.com/unit/57/
http://example.com/unit/57/page/23/
http://example.com/unit/57/page/23/edit
You would route it like this:
from flask import Flask

app = Flask(__name__)

@app.route('/unit/<int:unit_number>/')
def display_unit(unit_number):
    ...

@app.route('/unit/<int:unit_number>/page/<int:page_number>/')
def display_page(unit_number, page_number):
    ...

@app.route('/unit/<int:unit_number>/page/<int:page_number>/edit')
def page_editor(unit_number, page_number):
    ...
Doing it this way helps to keep some kind of structure in your application and relies on the framework to route stuff, rather than grabbing the URL and doing all the routing yourself. You could then also make use of blueprints to deal with the different functions.
I'll admit, though, that I'm struggling to think of a situation where you would need a possibly unlimited number of sections in the URL.
Splitting the string doesn't introduce any noticeable inefficiency to your program. Performance-wise, it's a negligible addition to the URL processing already done by the framework. It also fits in a single line of code.
@app.route('/<path:fullurl>')
def my_view(fullurl):
    params = fullurl.split('/')
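One detail worth handling with the plain split: a leading, trailing or doubled slash in the captured path produces empty strings in the list. A small sketch (the helper name `split_segments` is hypothetical) that drops them:

```python
def split_segments(fullurl):
    """Split a captured path such as 'seg1/seg2/seg3' into its segments,
    skipping empty pieces caused by leading, trailing or doubled slashes."""
    return [segment for segment in fullurl.split('/') if segment]


print('seg1/seg2/'.split('/'))        # ['seg1', 'seg2', '']
print(split_segments('seg1/seg2/'))   # ['seg1', 'seg2']
```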
This works:
@app.route("/login/<user>/<password>")
def login(user, password):
    app.logger.error('An error occurred')
    app.logger.error(password)
    return "user : %s password : %s" % (user, password)
then:
http://example.com:5000/login/jack/hi
output:
user : jack password : hi

Some questions about Django localisation

I intend to localise my Django application and began reading up on localisation on the Django site. This put a few questions in my mind:
It seems that when you run the 'django-admin.py makemessages' command, it scans the files for embedded strings and generates a message file containing them, to which the translations are then mapped. For example, if I have a string in HTML that reads "Please enter the recipients name", Django would consider it to be the message ID. What would happen if I changed something in the string? Let's say I added the missing apostrophe to the word "recipient". Would this break the translation?
In relation to the above scenario, is it better to use full-fledged sentences in the source (which might change), or would I be better off using an identifier like "RECIPIENT_NAME", which is less likely to change and easier to map to?
Does the 'django-admin.py makemessages' command scan the Python sources as well?
Thanks.
It very probably would. In some cases, 'similar' strings are detected and the existing translation is marked as fuzzy (fuzzy entries are skipped when the catalog is compiled unless you explicitly include them, so the translation won't show until someone reviews it). But it depends on the type of string; I don't know what adding an apostrophe would do. Read the GNU gettext docs for more information about this.
However, an easy workaround would be: don't fix the typo in the source, but add an English-to-English translation where the translated string is the corrected one :). I personally wouldn't recommend this approach, but if you're afraid of breaking tens of translation files, it can be considered.
No, it isn't; it throws away all sense of context. It might look clearer for sites where only a few translation strings are required and you know the exact context by heart. But as soon as you have hundreds of strings in the translation file, short names like that will tell you nothing, and you'll always have to look up the exact context. Even worse, you may end up using the same 'short name' for things that actually have to be translated differently, which forces you to invent even weirder short names to handle both cases. Finally, if you use one natural language as the default, you no longer need to translate that language explicitly.
Yes, it does. There are multiple functions for marking strings in Python code for translation; an overview can be found here.
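On the problem of one piece of source text needing different translations: gettext supports message contexts, so identical text can still be kept apart without inventing short names. A minimal sketch with Python's standard `gettext` module (`pgettext` requires Python 3.8+); `NullTranslations` simply returns the source text, so it runs without a compiled catalog. Django offers the same call as `django.utils.translation.pgettext`.

```python
import gettext

# NullTranslations returns every message unchanged, so no .mo file is needed;
# a real app would load a catalog with gettext.translation(...) instead.
translations = gettext.NullTranslations()
_ = translations.gettext
pgettext = translations.pgettext  # signature: (context, message)

prompt = _("Please enter the recipient's name")
# Same English word, two contexts: extraction keeps them as separate entries
month = pgettext('month name', 'May')
permission = pgettext('permission', 'May')
print(prompt, month, permission)
```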

How to localize an app on Google App Engine?

What options are there for localizing an app on Google App Engine? How do you do it using Webapp, Django, web2py or [insert framework here].
1. Readable URLs and entity key names
Readable URLs are good for usability and search engine optimization (Stack Overflow is a good example of how to do it). On Google App Engine, key-based queries are recommended for performance reasons. It follows that it is good practice to use the entity key name in the URL, so that the entity can be fetched from the datastore as quickly as possible.
Some characters have special meaning in URLs (&, ", ' etc.). To be able to use key names as parts of a URL, they should not contain any of these characters. Currently I use the function below to create key names:
import re
import unicodedata

def urlify(unicode_string):
    """Translates latin1 unicode strings to url friendly ASCII.

    Converts accented latin1 characters to their non-accented ASCII
    counterparts, converts to lowercase, converts spaces to hyphens
    and removes all characters that are not alphanumeric ASCII.

    Arguments
    unicode_string: Unicode encoded string.

    Returns
    String consisting of alphanumeric (ASCII) characters and hyphens.
    """
    # Decompose accented letters (e.g. é -> e + accent), then drop non-ASCII
    slug = unicodedata.normalize('NFKD', unicode_string)
    slug = slug.encode('ascii', 'ignore').decode('ascii')
    slug = re.sub(r'[^\w\s-]', '', slug).strip().lower()
    return re.sub(r'[-\s]+', '-', slug)
This is basically a whitelist of approved characters. It works fine for English and Swedish; however, it will fail for non-Western scripts and remove letters from some Western ones (like Norwegian and Danish with their æ and ø).
Can anyone suggest a method that works with more languages? Would it be better to remove problematic characters (blacklist)?
2. Translating templates
Does Django internationalization and localization work on Google App Engine? Are there any extra steps that must be performed? Is it possible to use Django i18n and l10n for Django templates while using Webapp?
The Jinja2 template language provides integration with Babel. How well does this work, in your experience?
What options are available for your chosen template language?
3. Translated datastore content
When serving content from (or storing it to) the datastore: Is there a better way than getting the accept_language parameter from the HTTP request and matching this with a language property that you have set with each entity?
Regarding point 1, there's really no need to go to such lengths: Simply use unicode key names. They'll be encoded as UTF-8 in the datastore for you.
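If you still want readable, normalized slugs rather than raw key names, a Python 3 sketch can keep letters from any script, because `\w` in a `str` pattern matches Unicode word characters there; the slug is then percent-encoded only when it is placed in a URL. The `unicode_slug` name and the `/article/` path are hypothetical.

```python
import re
import unicodedata
from urllib.parse import quote


def unicode_slug(text):
    """Lowercase, hyphen-separated slug that keeps non-ASCII letters."""
    text = unicodedata.normalize('NFKC', text).lower()
    text = re.sub(r'[^\w\s-]', '', text).strip()  # drop punctuation only
    return re.sub(r'[-\s]+', '-', text)


slug = unicode_slug('Smørrebrød på Østerbro')  # 'smørrebrød-på-østerbro'
url_path = '/article/' + quote(slug)           # percent-encoded UTF-8
```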
Regarding point 3, there are many ways to handle language detection. Certainly accept_language should be part of it, and you'll find webob's accept_language support particularly useful here (hopefully Django or your framework-of-choice has something similar). It's quite often the case, however, that a user's browser's language configuration isn't correct, so you'll want to make sure there's some way for the user to override the detected language - for example, with a link on each page to change the language, setting a preference cookie.
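A framework-independent sketch of that order of preference: an explicit user choice (for example read from a preference cookie) first, then the `Accept-Language` header sorted by its `q` weights, then a default. The function names and the `'en'` default are hypothetical; webob's `accept_language` or Django's `LocaleMiddleware` would normally handle the header parsing for you.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    entries = []
    for position, item in enumerate(header.split(',')):
        tag, *params = item.strip().split(';')
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0
        for param in params:
            name, _, value = param.strip().partition('=')
            if name.strip().lower() == 'q':
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            entries.append((-quality, position, tag))
    entries.sort()  # highest quality first; header order breaks ties
    return [tag for _, _, tag in entries]


def choose_language(header, supported, preferred=None, default='en'):
    """Pick a supported language: explicit preference, header, then default."""
    if preferred in supported:
        return preferred
    for tag in parse_accept_language(header or ''):
        if tag in supported:
            return tag
        primary = tag.split('-')[0]  # 'fr-ch' -> 'fr'
        if primary in supported:
            return primary
    return default


print(choose_language('fr-CH, fr;q=0.9, en;q=0.8', ['en', 'fr']))  # fr
print(choose_language('fr', ['en', 'fr'], preferred='en'))          # en
```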
Concerning point 2, I asked a similar question a few months ago. I've managed to get the application internationalized, but only the content, not the URLs (I wasn't planning on doing that either).
I've also added the revision I made to my code so that people can see what changes went into i18n'ing this Google App Engine app. Look at my second comment on the accepted answer.
Good luck with your other 2 points!
