Django: How to query terms with punctuation (ie: !.;’) insensitivity? - python

I am creating an application that needs to find facebook usernames that I’ve stored in the database, but facebook usernames are both case insensitive and insensitive to periods. For example, the username Johnsmith.55 is the same as johnsmith55 or even j…O.hn.sMiTh.5.5. when sending facebook API requests.
Obviously, I am using the __iexact lookup to handle the case insensitivity, but what can I use to handle the insensitivity to periods? I know a cop-out method is simply to strip the periods from all usernames before saving them to the database, strip them from the username being searched as well, and then query. But I want to save and display people's usernames the way they really appear in their Facebook URL (which includes periods), even though Facebook API requests are technically insensitive to periods.
Any ideas for a simple method of doing this? Thanks in advance for any help

You can store two user names in your DB, one to query against and one to display.
However, if you don't want to do that, it's a simple matter of stripping the characters Facebook ignores from the string before querying:
# ... import 're' and pull username from DB
normalized_username = re.sub('[,.]', '', real_username)
# query using the normalized username
Note: this example strips dots and commas; Facebook may ignore more than that.
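A minimal sketch of the "one name to display, one to query" idea, assuming that only case and periods are ignored. The names normalize_username, display_username and lookup_username are made up for illustration and are not part of Django or the Facebook API:

```python
import re

# Assumption: only case and periods are ignored; extend the pattern if
# Facebook turns out to ignore other characters as well.
IGNORED_CHARS = re.compile(r"[.]")


def normalize_username(username):
    """Return a lookup key: ignored characters removed, then lowercased."""
    return IGNORED_CHARS.sub("", username).lower()


# Keep the original spelling for display and the normalized key for querying.
record = {
    "display_username": "Johnsmith.55",
    "lookup_username": normalize_username("Johnsmith.55"),
}

print(record["lookup_username"])  # johnsmith55
print(normalize_username("johnsmith55") == record["lookup_username"])  # True
```

With this approach the search term is passed through the same function before querying, so an exact match on the lookup column is enough.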

You'll need to store two versions of the username: one for querying against, and one for display.

You can also implement your own querying logic with custom lookups in Django 1.7 or later.

Related

LDAP search takes too long

I'm trying to get more than 50.000 records from LDAP (with python-ldap and page control tools).
I have searching filter, which is
(|(field=value_1)(field=value_2)...(field=value_50000))
But this request takes more than 15 minutes. I'm fetching 10 attributes from LDAP for these records.
Could you please tell me whether this is normal for such a large request, or should I try changing the filter?
You should refine your search base and make it as close as possible to what you are searching for. For example, instead of querying dc=company,dc=com use ou=people,dc=company,dc=com.
You can also build an index on the field you are searching, and enable caching on your LDAP server. Finally, concerning your search filter: if you are querying the same attribute, you can try something like:
(&(field>=MinValue)(field<=MaxValue))
It's much better than matching every single value.
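As a rough sketch, the range filter can be built as a plain string before it is handed to the search call. The helper name range_filter is hypothetical, the values are assumed to be already escaped for use in a filter, and range matching only helps if the attribute supports ordering comparisons on your server:

```python
def range_filter(attr, low, high):
    """Build an LDAP AND filter matching low <= attr <= high."""
    # {0} is the attribute name, {1} the lower bound, {2} the upper bound.
    return "(&({0}>={1})({0}<={2}))".format(attr, low, high)


print(range_filter("field", "MinValue", "MaxValue"))
# (&(field>=MinValue)(field<=MaxValue))
```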

Properly refresh an SQLAlchemy session to view externally updated data

After trying everything suggested here, I still can't get SQLAlchemy to display the correct results!
I've used various combinations of Nick's answer, session.commit(), flush() and expire_all(), restarted MySQL, even restarted the entire freaking server, and I still get old results from SQLAlchemy...why????
The most infuriating thing about this whole issue is that I can see from any other application, or even from a direct connection.execute() call, that the updated data is there. I just can't get it to display on the webpage!
BTW this is in a Pyramid app, not Flask, but since Pyramid is 99% Flask it shouldn't make a difference, right?
MTIA for any help on this, it's driving me nuts!!
PS: I tried to add this as an answer to the linked question, but it was deleted for not being a valid answer. So for future reference, if I just want to add something to an existing question without having to post an entirely new one, how would I go about that?
EDIT: My apologies zvone, here is my code:
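from sqlalchemy import or_
from sqlalchemy.orm import scoped_session, sessionmaker
from zope.sqlalchemy import ZopeTransactionExtension

# Item and Tag are the mapped model classes; searchTerms is the list of search phrases
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
session = DBSession()
query = session.query(Item).join(Item.tagged)
filters = []
for term in searchTerms:
    subterms = term.split(' ')
    for subterm in subterms:
        filters.append(Item.itemTitle.like('%' + subterm + '%'))
        filters.append(Tag.tagName.like('%' + subterm + '%'))
query = query.filter(or_(*filters))
matchedItems = query.all()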
And to make some more sense out of it, here's the context:
I'm building a basic CMS where users can upload and download items of any type (text files, images, etc.).
The whole idea of this page is to allow the user to search for items that have been tagged with certain expressions. Tags are entered in the search field as a comma-delimited string of search phrases, e.g. "movies, books, photos, search term with spaces". This string is split up into its counterparts to create searchTerms, a Python list of all the terms entered into the field.
You can see in the code where I'm iterating through searchTerms, splitting phrases into separate words and adding query filters for each word.
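For reference, the splitting described above could look roughly like this. It is only a sketch: parse_search_terms and like_patterns are hypothetical helper names, and it uses split() without an argument (which also drops repeated spaces) rather than split(' '):

```python
def parse_search_terms(raw):
    """Split a comma-delimited search string into stripped, non-empty phrases."""
    return [phrase.strip() for phrase in raw.split(",") if phrase.strip()]


def like_patterns(search_terms):
    """Return one '%word%' LIKE pattern per word across all phrases."""
    patterns = []
    for term in search_terms:
        for subterm in term.split():
            patterns.append("%" + subterm + "%")
    return patterns


search_terms = parse_search_terms("movies, books, photos, search term with spaces")
print(search_terms)
# ['movies', 'books', 'photos', 'search term with spaces']
print(like_patterns(parse_search_terms("big, theory")))
# ['%big%', '%theory%']
```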
The problem arises when searching for "big, theory". I know for certain that 3 users on the production site have posted Big Bang Theory episodes, but after migrating these DB records to my dev server, I only get one search result (the old amount).
Many thanks again for the help! :D

Why is there a difference in format of encrypted passwords in Django

I am using Django 1.9.7. The encrypted passwords differ significantly in format.
Some passwords are of the format <algorithm>$<iterations>$<salt>$<hash>:
pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0=
While others look like this:
!9rPYViI1oqrSMfkDCZSDeJxme4juD2niKcyvKdpB
Passwords are set using either User.objects.create_user() or user.set_password(). Is this difference expected?
You'll be fine. You just have some blank passwords in your database.
Going back as far as V0.95, django used the $ separators for delimiting algorithm/salt/hash. These days, django pulls out the algorithm first by looking at what is in front of the first $ and then passes the whole lot to the hasher to decode. This allows for a wider set of formats, including the one for PBKDF2 which adds an extra iterations parameter in this list (as per your first example).
However, it also recognises that some users may not be allowed to login and/or have no password. This is encoded using the second format you've seen. As you can see here:
If password is None then a concatenation of UNUSABLE_PASSWORD_PREFIX and a random string will be returned which disallows logins.
You can also see that the random string is exactly 40 characters long - just like your second example.
In short, then, this is all as expected.
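To make the two formats concrete, here is a small sketch that tells them apart by inspecting the stored string. It is not Django's own code: describe_password is a hypothetical helper, and it only assumes that "!" is the unusable-password prefix and that the four-part "$" layout applies to PBKDF2-style entries like the one in the question:

```python
UNUSABLE_PASSWORD_PREFIX = "!"  # marker for "no usable password"


def describe_password(encoded):
    """Classify a stored password string without verifying it."""
    if encoded.startswith(UNUSABLE_PASSWORD_PREFIX):
        return {"usable": False}
    parts = encoded.split("$")
    if len(parts) == 4:
        algorithm, iterations, salt, hash_value = parts
        return {
            "usable": True,
            "algorithm": algorithm,
            "iterations": int(iterations),
            "salt": salt,
            "hash": hash_value,
        }
    # Other hashers may use a different layout; report only the algorithm.
    return {"usable": True, "algorithm": parts[0]}


hashed = "pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0="
blank = "!9rPYViI1oqrSMfkDCZSDeJxme4juD2niKcyvKdpB"

print(describe_password(hashed)["algorithm"])  # pbkdf2_sha256
print(describe_password(blank)["usable"])      # False
```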
There is no significant difference between User.objects.create_user() and user.set_password(), since the first uses the second.
Basically, passwords are stored as strings in the format <algorithm>$<iterations>$<salt>$<hash>, according to the docs. The differences might come from the PASSWORD_HASHERS setting: maybe one password was created with one hasher and another password with a different one. But as long as you keep those hashers in that setting, all should be fine; users will be able to change their passwords, and so on. You can read about it in the short note after the bcrypt section.
The docs for the django.contrib.auth package might be helpful too.
UPDATE:
If you find the documentation for an old Django version (1.3, for example), you will see that
Previous Django versions, such as 0.90, used simple MD5 hashes without password salts. For backwards compatibility, those are still supported; they'll be converted automatically to the new style the first time check_password() works correctly for a given user.
So I think that the answer might be somewhere here. But it really depends on how legacy your project is, so you can decide if it's normal or what. Anyway you can issue check_password() to be sure. Or you can just email your user with "change password please" notification. There are many factors involved really.

GET search in multilanguage site

I've included a search form in my web2py application, in the following form:
myapp/controller/search?query=myquery
However, for security reasons web2py automatically replaces spaces and non-alphanumeric characters with underscores, which is okay for English-only sites but an impediment for languages that use accent marks. For example, searching for "áéíóú" returns five underscores.
This could be solved by using POST instead of GET for the search form, but then the users wouldn't be able to bookmark the results.
Is there any option to solve this?
Thanks in advance.
Here's an idea that I've used in the past:
1. Use POST to submit the query.
2. Generate a unique string (e.g. YouTube: https://www.youtube.com/watch?v=jX3DuS2Ak3g).
3. Associate the query with that string and store it as a key/value pair in the session, app state, or DB (depending on how long you want it to live).
4. Redirect the user to a URL containing that string.
If you don't want to occupy extra memory/space as they tend to grow a lot in some cases, you can substitute steps 2-3 with encrypting the string to something you can decrypt afterwards. You can do this in a middleware class so that it's transparent to your app's logic.
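A bare-bones sketch of the token idea, independent of web2py. The in-memory dict, the function names and the URL layout are all assumptions made for illustration; a real app would keep the mapping in the session or a database table:

```python
import secrets

# Hypothetical in-memory store mapping token -> original query text.
_queries_by_token = {}


def save_query(query):
    """Store the query and return a short URL-safe token for it."""
    token = secrets.token_urlsafe(8)
    _queries_by_token[token] = query
    return token


def load_query(token):
    """Return the query stored for this token, or None if it is unknown."""
    return _queries_by_token.get(token)


token = save_query("áéíóú")
print("myapp/controller/search/" + token)  # bookmarkable URL (layout assumed)
print(load_query(token))
```

The token contains only URL-safe characters, so the accented query never has to appear in the URL itself.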
This is a general problem people face while handling urls.
You can use the quote/quote_plus functions in urllib to encode the strings.
For example, using the string you suggested:
>>> print urllib.quote('éíóú')
%C3%A9%C3%AD%C3%B3%C3%BA
>>> print urllib.unquote('%C3%A9%C3%AD%C3%B3%C3%BA')
éíóú
You will have to unquote the value when you retrieve it from the request on the backend.
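The session above is Python 2. On Python 3 the same functions live in urllib.parse; a short sketch of the round trip, using the URL shape from the question (whether web2py then leaves the encoded value untouched is not something this example verifies):

```python
from urllib.parse import quote, unquote

query = "áéíóú"
encoded = quote(query)  # UTF-8 bytes, percent-encoded
url = "myapp/controller/search?query=" + encoded

print(url)
# myapp/controller/search?query=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA
print(unquote(encoded) == query)  # True
```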
There are also some other posts which might be helpful - urlencode implementation and unicode ready urls

How to get google ID from email

I'm using google ID as the datastore id for my user objects.
Sometimes I want to find a user by email. The gmail address can appear with dots or without, capital letters and other variations. How can I retrieve the user id from the given email?
First of all, you should always store the email property in lowercase, since case is not relevant. If you also want to account for dots or plus symbols and be able to query on them, you should store the stripped version of the email in another (hidden) property and run your queries against that one.
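A possible shape for that stripped version, as a sketch only. normalize_gmail is a hypothetical helper, and it assumes that dots and anything after a plus sign in the local part can be dropped, which is the Gmail behaviour described in the question and not a rule for email addresses in general:

```python
def normalize_gmail(email):
    """Return a lookup key: lowercased, without dots or a +suffix in the local part."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return local + "@" + domain


print(normalize_gmail("John.Smith+news@Gmail.com"))  # johnsmith@gmail.com
```

The value returned here would go in the hidden property, while the address as typed stays in the visible one.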
Google+ seems to have an API for this
https://developers.google.com/+/api/latest/people/search
