Sort lexicographically?


I am working on integrating with the Photobucket API and I came across this in their api docs:
"Sort the parameters by name lexographically [sic] (byte ordering, the standard sorting, not natural or case insensitive). If the parameters have the same name, then sort by the value."
What does that mean? How do I sort something lexicographically? byte ordering?
The rest of their docs have been ok so far, but (to me) it seems like this line bears further explanation. Unfortunately there was none to be had.
Anyway, I'm writing the application in Python (it'll eventually become a Django app) in case you want to recommend specific modules that will handle such sorting for me ^_^

I think that here lexicographic is an "alias" for ASCII sort?
Lexicographic   Natural
z1.doc          z1.doc
z10.doc         z2.doc
z100.doc        z3.doc
z101.doc        z4.doc
z102.doc        z5.doc
z11.doc         z6.doc
z12.doc         z7.doc
z13.doc         z8.doc
z14.doc         z9.doc
z15.doc         z10.doc
z16.doc         z11.doc
z17.doc         z12.doc
z18.doc         z13.doc
z19.doc         z14.doc
z2.doc          z15.doc
z20.doc         z16.doc
z3.doc          z17.doc
z4.doc          z18.doc
z5.doc          z19.doc
z6.doc          z20.doc
z7.doc          z100.doc
z8.doc          z101.doc
z9.doc          z102.doc
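Both columns can be reproduced in Python. A plain sorted() on the strings gives the lexicographic column; the natural column needs a custom key. The natural_key helper below is a hypothetical sketch that compares digit runs as integers; it is only there for contrast, since the Photobucket docs ask for the lexicographic order.

```python
import re

def natural_key(name):
    # Split "z10.doc" into ['z', '10', '.doc'] and turn digit runs into ints,
    # so 10 compares as a number rather than as the characters '1', '0'.
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]

names = ['z%d.doc' % n for n in list(range(1, 21)) + [100, 101, 102]]

lexicographic = sorted(names)             # plain string comparison
natural = sorted(names, key=natural_key)  # numbers compared by value

for lex, nat in zip(lexicographic, natural):
    print(lex.ljust(16), nat)
```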

The word should be "lexicographic"
http://www.thefreedictionary.com/Lexicographic
Dictionary order. Using the letters as they appear in the strings.
As they suggest, don't fold upper- and lower-case together. Just use the Python built-in list.sort() method.
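A short sketch of what "byte ordering, not case insensitive" means in practice. In Python 3, comparing str values compares code points, and for UTF-8 text that is the same order you get by comparing the encoded bytes, so a plain sorted() is enough. Sorting (name, value) tuples also covers the "same name, then sort by the value" rule, because tuples compare element by element. The parameter names below are only examples.

```python
words = ['b', 'B', 'a', 'A']
print(sorted(words))                 # uppercase first: ['A', 'B', 'a', 'b']
print(sorted(words, key=str.lower))  # case-insensitive, which the docs rule out

# Sorting by the UTF-8 bytes gives the same order as sorting the strings.
assert sorted(words) == sorted(words, key=lambda s: s.encode('utf-8'))

params = [('oauth_nonce', 'x'), ('b', '2'), ('a', '2'), ('a', '1')]
print(sorted(params))  # by name, then by value for repeated names
```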

This is similar to the Facebook API — the query string needs to be normalized before generating the signature hash.
You probably have a dictionary of parameters like:
params = {
'consumer_key': "....",
'consumer_secret': "....",
'timestamp': ...,
...
}
Create the query string like so:
urllib.urlencode(sorted(params.items()))  # Python 2; in Python 3 this is urllib.parse.urlencode
params.items() returns the keys and values of the dictionary as a list of tuples, sorted() sorts the list, and urllib.urlencode() concatenates them into a single string while escaping.
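One detail worth checking against the Photobucket text, which asks for RFC 3986 encoding (c=hello%20there): by default urlencode() escapes with quote_plus, which turns spaces into +. In Python 3.5+ the quote_via argument can switch it to quote. This sketch assumes Python 3 and uses made-up parameter values.

```python
from urllib.parse import quote, urlencode

params = {'c': 'hello there', 'a': '1', 'b': '2'}
pairs = sorted(params.items())

print(urlencode(pairs))                   # a=1&b=2&c=hello+there
print(urlencode(pairs, quote_via=quote))  # a=1&b=2&c=hello%20there
```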

Quote a bit more from the section:
2 Generate the Base String:
Normalize the parameters:
Add the OAuth specific parameters for this request to the input parameters, including:
oauth_consumer_key = <consumer_key>
oauth_timestamp = <timestamp>
oauth_nonce = <nonce>
oauth_version = <version>
oauth_signature_method = <signature_method>
Sort the parameters by name lexographically [sic] (byte ordering, the standard sorting, not natural or case insensitive). If the parameters have the same name, then sort by the value.
Encode the parameter values as in RFC3986 Section 2 (i.e., urlencode).
Create parameter string (). This is the same format as HTTP 'postdata' or 'querystring', that is, each parameter represented as name=value separated by &. For example, a=1&b=2&c=hello%20there&c=something%20else
I think that they are saying that the parameters must appear in the sorted order - oauth_consumer_key before oauth_nonce before ...
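The quoted normalization steps can be sketched as below, assuming Python 3. The normalize helper and the OAuth values are hypothetical placeholders (a real client would generate the nonce and timestamp); the order follows the Photobucket text: add the OAuth parameters, sort by name and then value, percent-encode, and join as name=value pairs separated by &.

```python
from urllib.parse import quote

def normalize(params, oauth):
    # Hypothetical helper: merge request and OAuth parameters as (name, value) pairs.
    pairs = list(params) + list(oauth.items())
    # Sort by name, then by value for repeated names (tuple comparison).
    pairs.sort()
    # Percent-encode per RFC 3986 (space -> %20) and join as name=value&...
    return '&'.join('%s=%s' % (quote(str(name), safe='~'), quote(str(value), safe='~'))
                    for name, value in pairs)

request = [('c', 'something else'), ('a', '1'), ('c', 'hello there'), ('b', '2')]
oauth = {
    'oauth_nonce': 'abc123',       # placeholder values, not real credentials
    'oauth_consumer_key': 'mykey',
    'oauth_version': '1.0',
}
result = normalize(request, oauth)
print(result)
```

The output starts with the documentation's own example, a=1&b=2&c=hello%20there&c=something%20else, followed by oauth_consumer_key, oauth_nonce and oauth_version in that order.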

Related

Is the order of execution guaranteed when looping over a string?

Is the program below guaranteed to always produce the same output?
s = 'fgvhlsdagfcisdghfjkfdshfsal'
for c in s:
    print(c)

Yes, it is. This is because the str type is an immutable sequence. Sequences represent a finite ordered set of elements (see Sequences in the Data model chapter of the Reference guide).
Iteration through a given string (any Sequence) is guaranteed to always produce the same results in the same order for different runs of the CPython interpreter, versions of CPython and implementations of Python.

Yes. Internally, the string you have there is stored in a C-style array (depending on the interpreter implementation), and because that is a sequential array of data, an iterator can walk it in order. To use the for ... in ... syntax, the object after in must be iterable. A string supplies its own iterator, which yields its characters in sequential order, as do all Python sequences.
The same is true for lists, and even for custom objects that you create. However, not all iterable Python objects will necessarily be in order or represent the values they store; a clear example is the dictionary. Dictionary iteration yields keys instead of sequential values like list, tuple and string do, and whether those keys come back in the order you added them depends on the Python version, among other things (insertion order is guaranteed only from Python 3.7 on), so on older versions don't assume it's ordered unless you use OrderedDict.
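What the for loop does with a string can be spelled out with iter() and next(). This is a sketch of the iterator protocol described above, not of CPython internals: the string's iterator hands back characters from index 0 upwards until it raises StopIteration.

```python
s = 'fgvhlsdagfcisdghfjkfdshfsal'

it = iter(s)          # the string supplies its own iterator
seen = []
while True:
    try:
        c = next(it)  # next character, in index order
    except StopIteration:
        break         # the for loop stops at this point too
    seen.append(c)

print(''.join(seen) == s)  # True
```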

Yes, it is. Over a string, a for-loop iterates over the characters in order. This is also true for lists and tuples -- a for-loop will iterate over the elements in order.
You may be thinking of sets and dictionaries. These don't specify a particular order, so:
for x in {"a","b","c"}:  # over a set
    print(x)
for key in {"x":1, "y":2, "z":3}:  # over a dict
    print(key)
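will iterate in some arbitrary order that you can't easily predict in advance. (Dicts are the exception from Python 3.7 on: they iterate in insertion order, which CPython 3.6 already did as an implementation detail.)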
See this Stack Overflow answer for some additional information on what guarantees are made about the order for dictionaries and sets.
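If a set's contents have to come out in a repeatable order, sorting them before the loop is the simple fix. For sets of strings the order can even change between runs, because string hashing is randomized by default in Python 3 (see PYTHONHASHSEED). The dict line assumes Python 3.7+.

```python
letters = {"a", "b", "c"}
for x in sorted(letters):   # always a, b, c regardless of hashing
    print(x)

d = {"x": 1, "y": 2, "z": 3}
print(list(d))  # ['x', 'y', 'z'] on Python 3.7+ (insertion order)
```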

Yes. The for loop is sequential.

Yes, the loop will always print each letter one by one starting from the first character and ending with the last.

Why does Django use a tuple of tuples to store static dictionaries, and should I do the same?

Why does Django use a tuple of tuples to store, for example, choices instead of a standard dict?
Example:
from django.utils.translation import gettext_lazy as _  # assumed: _ is Django's lazy gettext
ORGINAL_MARKET = 1
SECONDARY_MARKET = 2
MARKET_CHOICES = (
    (ORGINAL_MARKET, _('Orginal Market')),
    (SECONDARY_MARKET, _('Secondary Market')),
)
And should I do the same when I know the dict won't change over time?
I reckon tuples are faster, but does that matter if I still need to convert to a dict to look up a value?
UPDATE:
Clarification: if I use a tuple of tuples, I will be getting values using
dict(self.MARKET_CHOICES)[self.ORGINAL_MARKET]
Which will work faster, this or storing values in dict from the beginning?

The main reason is that ordering is preserved. If you used a dictionary, and called .items() on it to give the choices for a ChoiceField, for example, the ordering of items in the select box would be unreliable when you rendered the form.
If you want the dict, it is easy to create one from the tuple of tuples: the format is already one accepted by the constructor, so you can just call dict() on it.
I don't think the immutability is a correct reason - it is not strictly necessary for them to be a tuple of tuples, a list of tuples or even a list of lists would work as well in Django.
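On the speed question in the update: dict(self.MARKET_CHOICES)[...] rebuilds the whole dict on every lookup, so it is the slower option. A sketch that converts once at import time and then does a constant-time lookup; plain strings stand in for _() so it runs without Django, and MARKET_LABELS is a hypothetical name.

```python
ORGINAL_MARKET = 1
SECONDARY_MARKET = 2

# Ordered tuple of tuples, as Django's choices= expects.
MARKET_CHOICES = (
    (ORGINAL_MARKET, 'Orginal Market'),
    (SECONDARY_MARKET, 'Secondary Market'),
)

# Build the lookup dict once; dict() accepts the pairs directly.
MARKET_LABELS = dict(MARKET_CHOICES)

print(MARKET_LABELS[ORGINAL_MARKET])  # Orginal Market
```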

Tuples are immutable and slightly faster, and Django uses them in the choices parameter of fields because they're immutable.
If you're using Python 3.4 or later, you can also use Enums, which are better than both tuples and dictionaries (but I'm not sure whether Django supports them for the choices parameter).
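A sketch of the Enum route, assuming Python 3.4+: an IntEnum holds the constants and the choices tuple is derived from it, so the two can't drift apart. The label scheme (name.title()) is just an example. Newer Django releases (3.0+) also ship models.IntegerChoices and models.TextChoices for this purpose.

```python
from enum import IntEnum

class Market(IntEnum):
    ORIGINAL = 1
    SECONDARY = 2

# Enum iteration follows definition order, so the choices stay ordered.
MARKET_CHOICES = tuple((m.value, m.name.title()) for m in Market)
print(MARKET_CHOICES)  # ((1, 'Original'), (2, 'Secondary'))
print(Market(2).name)  # SECONDARY
```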

To be clear: I'm not going to use it in choices=; I'm looking for the most efficient method. – Lord_JABA
If you want your choices to have a particular order (which often is the case with the choices parameter) then use tuples, if you don't care use whatever literal you find easier to type (from the allowed datatypes), I doubt you will see any significant difference regarding the memory/cpu footprint for this specific use case.

How To Create a Unique Key For A Dictionary In Python

What is the best way to generate a unique key for the contents of a dictionary. My intention is to store each dictionary in a document store along with a unique id or hash so that I don't have to load the whole dictionary from the store to check if it exists already or not. Dictionaries with the same keys and values should generate the same id or hash.
I have the following code:
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
print str(a)
print hashlib.sha1(str(a)).hexdigest()
print hashlib.sha1(str(b)).hexdigest()
The last two print statements generate the same string. Is this a good implementation? Are there any pitfalls with this approach? Is there a better way to do this?
Update
Combining suggestions from the answers below, the following might be a good implementation:
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
def get_id_for_dict(d):
    unique_str = ''.join(["'%s':'%s';" % (key, val) for (key, val) in sorted(d.items())])
    return hashlib.sha1(unique_str).hexdigest()
print get_id_for_dict(a)
print get_id_for_dict(b)
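The update is Python 2 code (in Python 3, hashlib.sha1() needs bytes). A Python 3 sketch of the same idea, which also shows one pitfall of formatting every value with %s: the integer 107 and the string '107' produce the same id. The JSON approach in the next answer keeps those apart.

```python
import hashlib

def get_id_for_dict(d):
    # Sort items so insertion order doesn't matter, then format each pair.
    unique_str = ''.join("'%s':'%s';" % (key, val) for key, val in sorted(d.items()))
    # sha1() needs bytes in Python 3.
    return hashlib.sha1(unique_str.encode('utf-8')).hexdigest()

a = {'name': 'Danish', 'age': 107}
b = {'age': 107, 'name': 'Danish'}
print(get_id_for_dict(a) == get_id_for_dict(b))  # True

# Pitfall: int and str values with the same text collide.
print(get_id_for_dict({'age': 107}) == get_id_for_dict({'age': '107'}))  # True
```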

I prefer serializing the dict as JSON and hashing that:
import hashlib
import json
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
# Python 2
print hashlib.sha1(json.dumps(a, sort_keys=True)).hexdigest()
print hashlib.sha1(json.dumps(b, sort_keys=True)).hexdigest()
# Python 3
print(hashlib.sha1(json.dumps(a, sort_keys=True).encode()).hexdigest())
print(hashlib.sha1(json.dumps(b, sort_keys=True).encode()).hexdigest())
Returns:
71083588011445f0e65e11c80524640668d3797d
71083588011445f0e65e11c80524640668d3797d

No, you can't rely on a particular order of elements when converting a dictionary to a string.
You can, however, convert it to a sorted list of (key, value) tuples, convert that to a string and compute a hash like this:
a_sorted_list = [(key, a[key]) for key in sorted(a.keys())]
print hashlib.sha1( str(a_sorted_list) ).hexdigest()
It's not foolproof, as the string formatting of a list or a tuple could change in some future major Python version, but I think it can be good enough.

A possible option would be using a serialized representation of the list that preserves order. I am not sure whether the default list to string mechanism imposes any kind of order, but it wouldn't surprise me if it were interpreter-dependent. So, I'd basically build something akin to urlencode that sorts the keys beforehand.
Not that I believe your method would fail, but I'd rather play with predictable things and avoid undocumented and/or unpredictable behavior. It's true that despite being "unordered", dictionaries end up having an order that may even be consistent, but the point is that you shouldn't take that for granted.
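A sketch of the "something akin to urlencode" idea, assuming Python 3 and flat dicts with simple values: sort the items, encode them with urllib.parse.urlencode, and hash the result. Because urlencode escapes & and =, a value containing those characters can't be mistaken for a pair boundary. dict_key is a hypothetical name, and like the %s approach this turns every value into text, so 107 and '107' still collide.

```python
import hashlib
from urllib.parse import urlencode

def dict_key(d):
    # Sorting first gives the same string no matter how the dict was built.
    canonical = urlencode(sorted(d.items()))
    return hashlib.sha1(canonical.encode('ascii')).hexdigest()

a = {'name': 'Danish', 'age': 107}
b = {'age': 107, 'name': 'Danish'}
print(dict_key(a) == dict_key(b))  # True
```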

Why does CouchDb-python (or do I) confuse strings and dictionaries?

I'm trying to use the Python wrapper for CouchDB to update a database. The file is structured as a nested dictionary as follows.
doc = { ...,
        'RLSoo': {'RT_freq': 2, 'tweet': "They're going to play monopoly now. This makes me feel like an excellent mother. #Sandy #NYC"},
        'GiltCityNYC': {},
        ....}
I would like to put each entry of the larger dictionary, for example RLSoo, into its own document. However, I get an error message when I try the following code.
for key in doc:
    db.update(doc[key], all_or_nothing=True)
Error Message
TypeError: expected dict, got <type 'str'>
I don't understand why CouchDB won't accept the dictionary.

According to the Database.update() implementation and its documentation, the first argument should be a list of document objects (e.g. a list of dicts). doc[key] is a single dict, so when update() iterates over it, it actually iterates over that dict's keys, which are strings. If I understood your case right, your doc contains the nested documents as its values. So, just try:
db.update(doc.values(), all_or_nothing=True)
If all first-level values are dicts, it should work!
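The cause can be reproduced without a server: iterating over a dict yields its keys, which is what update() ends up seeing when it is handed a single dict. A sketch with the sample data (the tweet is shortened); it uses only plain Python and makes no CouchDB calls.

```python
doc = {
    'RLSoo': {'RT_freq': 2, 'tweet': "They're going to play monopoly now."},
    'GiltCityNYC': {},
}

# Passing doc['RLSoo'] where a list of documents is expected means
# iterating over it, and that yields the strings 'RT_freq' and 'tweet'.
print(list(doc['RLSoo']))  # ['RT_freq', 'tweet']

# What update() wants instead: a list whose items are dicts.
documents = list(doc.values())
print(all(isinstance(d, dict) for d in documents))  # True
```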

Python data structure design

The data structure should meet the following purpose:
each object is unique with certain key-value pairs
the keys and values are not predetermined and can contain any string value
querying for objects should be fast
Example:
object_123({'stupid':True, 'foo':'bar', ...})
structure.get({'stupid':True, 'foo':'bar', ...}) should return object_123
Optimally this structure is implemented with the standard python data structures available through the standard library.
How would you implement this?

The simplest solution I can think of is to use sorted tuple keys:
def key(d): return tuple(sorted(d.items()))
x = {}
x[key({'stupid':True, 'foo':'bar', ...})] = object_123
x.get(key({'stupid':True, 'foo':'bar', ...}))  # => object_123
Another option would be to come up with your own hashing scheme for your keys (either by wrapping them in a class or just using numeric keys in the dictionary), but depending on your access pattern this may be slower.
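A runnable version of the sorted-tuple idea, assuming all values are hashable (a tuple used as a dict key must be). The key function is the one from this answer; the stored object is a placeholder string.

```python
def key(d):
    # Sorting makes {'a': 1, 'b': 2} and {'b': 2, 'a': 1} give the same tuple.
    return tuple(sorted(d.items()))

structure = {}
structure[key({'stupid': True, 'foo': 'bar'})] = 'object_123'

print(structure.get(key({'foo': 'bar', 'stupid': True})))  # object_123
print(structure.get(key({'foo': 'baz', 'stupid': True})))  # None
```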

I think SQLite or is what you need. It may not be implemented with standard python structures but it's available through the standard library.
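A sketch of the SQLite route using the standard sqlite3 module with an in-memory database. The schema is an assumption: each object is stored under a canonical JSON string of its key-value pairs (json.dumps with sort_keys=True), and the PRIMARY KEY gives that column an index so lookups stay fast. Table and column names are hypothetical.

```python
import json
import sqlite3

def canonical(d):
    # Same pairs -> same text, whatever the insertion order.
    return json.dumps(d, sort_keys=True)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE objects (attrs TEXT PRIMARY KEY, name TEXT)')
conn.execute('INSERT INTO objects VALUES (?, ?)',
             (canonical({'stupid': True, 'foo': 'bar'}), 'object_123'))

row = conn.execute('SELECT name FROM objects WHERE attrs = ?',
                   (canonical({'foo': 'bar', 'stupid': True}),)).fetchone()
print(row[0] if row else None)  # object_123
```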

Say object_123 is a dict, which it pretty much looks like. Your structure seems to be a standard dict with keys like (('foo', 'bar'), ('stupid', True)); in other words, tuple(sorted(object_123.items())) so that they're always listed in a defined order.
The reason for the defined ordering is because dict.items() isn't guaranteed to return a list in a given ordering. If your dictionary key is (('foo', 'bar'), ('stupid', True)), you don't want a false negative just because you're searching for (('stupid', True),('foo', 'bar')). Sorting the values is probably the quickest way to protect against that.
