I have a list of dictionaries that is encoded:
[u"{'name':'Tom', 'uid':'asdlfkj223'}", u"{'name':'Jerry', 'uid':'alksd32'}", ...]
Is there any way I can create a list of just the values of the key 'name'?
Even better if someone knows the Django ORM well enough to pull down a list of a single column's values from a PostgreSQL database.
Thanks!
To get only the values of the name column from the DB table, use:
names = Person.objects.values_list('name', flat=True)
(as per https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.values_list)
otherwise, given
people = [{'name':'Tom', 'uid':'asdlfkj223'}, {'name':'Jerry', 'uid':'alksd32'},]
this should do the job:
names = [person['name'] for person in people]
Also, you should find out why your data items are strings (each containing a string representation of a dict) in the first place; that does not look like how the data is supposed to be stored.
Or, if you're actually storing dicts in your database as strings, prefer JSON over the Python string representation; if you must keep the current format, the AST parsing solution given in another answer here should do the job.
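As a minimal sketch of the JSON route (assuming you control how the rows are written), storing and reading one record might look like this:
import json

# Store the dict as JSON text instead of its Python repr...
stored = json.dumps({'name': 'Tom', 'uid': 'asdlfkj223'})

# ...so reading it back is a plain json.loads call.
record = json.loads(stored)
print(record['name'])  # Tom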
You can use ast.literal_eval:
>>> data = [u"{'name':'Tom', 'uid':'asdlfkj223'}",u"{'name':'Jerry', 'uid':'alksd32'}"]
>>> import ast
>>> [ast.literal_eval(d)['name'] for d in data]
['Tom', 'Jerry']
Related
I am presently using a for loop to collect all the required key-value pairs of a dictionary. However, is there a simpler way to select the required key-value pairs?
for i in out['elements']:
    info = i['insights'][0]['details']['socialInfo']
    out_temp.append(info)
The content of out is actually JSON with a list of dictionaries, and each dictionary in turn contains a list of dictionaries.
You can use map to generate the new list as well. But I think what you are doing is fine, it is much easier to read than the alternatives.
out_temp = list(map(lambda x: x['insights'][0]['details']['socialInfo'], out['elements']))
I cannot see an unequivocally simpler way to access the data you require. However, you can apply your logic more efficiently via a list comprehension:
out_temp = [i['insights'][0]['details']['socialInfo'] for i in out['elements']]
Whether or not this is also simpler is open to debate.
I am using Python 3.6 to process data I receive in a text file containing a dict with sorted keys. An example of such a file can be:
{"0.1":"A","0.2":"B","0.3":"C","0.4":"D","0.5":"E","0.6":"F","0.7":"G","0.8":"H","0.9":"I","1.0":"J"}
My data load and transform is simple - I load the file, and then transform the dict into a list of tuples. Simplified it looks like this:
import json
import decimal
with open('test.json') as fp:
    o = json.loads(fp.read())
l = [(decimal.Decimal(key), val) for key, val in o.items()]
is_sorted = all(l[i][0] <= l[i+1][0] for i in range(len(l)-1))
print(l)
print('Sorted:', is_sorted)
The list is always sorted in Windows and never in Linux. I know that I can sort the list manually, but since the data file is always sorted already and rather big, I'm looking for a different approach. Is there a way to somehow force json package to load the data to my dict sorted in both Windows and Linux?
For the clarification: I have no control over the structure of data I receive. My goal is to find the most efficient method to load the data into the list of tuples for further processing from what I get.
A dictionary is just a mapping between its keys and the corresponding values. It doesn't have any guaranteed order, so it doesn't make sense to say you always find it sorted. In addition, any dictionary member access is O(1), so it's probably fast enough for your needs. If you think you still need some order, an ordered dictionary (collections.OrderedDict) may be useful.
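For example, one way to have the json module hand you the pairs in the file's own order is its object_pairs_hook argument; this is only a sketch of that idea, reusing the question's variable names:
import json
from collections import OrderedDict
from decimal import Decimal

with open('test.json') as fp:
    # object_pairs_hook receives the (key, value) pairs in file order,
    # so the resulting mapping preserves that order on every platform.
    o = json.load(fp, object_pairs_hook=OrderedDict)

l = [(Decimal(key), val) for key, val in o.items()]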
Dicts are unordered objects in Python, so the problem you're running into is actually by design.
If you want to get a sorted list of tuples, you can do something like:
sorted_tuples = sorted(my_dict.items(), key=lambda x: x[0])
or
import operator
sorted_tuples = sorted(my_dict.items(), key=operator.itemgetter(0))
The dict method .items() gives the dict's (key, value) tuples and sorted() returns them as a sorted list. The key parameter to sorted specifies how to sort the list. Both lambda x: x[0] and operator.itemgetter(0) select the first element of each tuple (the key from the original dict), so the list is sorted on that element.
I want to access an element of a dictionary with a string.
For example, I have a dictionary like this:
data = {"masks": {"id": "valore"}}
I have a string campo = "masks,id". I want to split this string with campo.split(','), which gives me ['masks', 'id'], and use that to access the element data["masks"]["id"].
This dictionary is just an example; my dictionaries are more complex. The point is that I want to access the element data["masks"]["id"] with an input string "masks,id", the element data["masks"] with the string "masks", the element data["masks"]["id"]["X"] with the input string "masks,id,X", and so on.
How can I do this?
I won't recommend the following method, as a Python dict is not meant to be accessed the way you want, but since Python lets you do this kind of thing at your own risk, here is a snippet that gets the job done.
I iterate over the keys and at each step fetch the child dictionary if it is present, falling back to an empty dictionary; the .get() call returns the empty dict when the key is not found.
data = {"masks": {"id": "valore"}}
text = "masks, id"
nested_keys = text.split(", ")
nested_dict = data
for key in nested_keys:
    nested_dict = nested_dict.get(key, {})
if isinstance(nested_dict, str):
    print(nested_dict)
The point is that you are coming up with requirements that do not match the capabilities of the built-in Python dictionaries.
If you want to have nested maps that do this kind of automated "splitting" of a single key string like "masks, id, X" then ... you will have to implement that yourself.
In other words: the answer is - the built-in dictionary can't do that for you.
So, the "real" thing to do here: step back and carefully look into your requirements to understand exactly what you want to do; and why you want to do that. And going from there look for the best design to support that.
From an implementation side, I think what you "need" would roughly look like:
check if the provided "key" matches "key1,key2,key3"
if so, split that key into its sub-keys
then check if the "out dictionary" has a value for key1
then check, if the value for key1 is a dictionary
then check if that "inner" dictionary has a value for key2
...
and so on.
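A rough sketch of that checklist as a helper function (the get_nested name and the decision to raise KeyError on a missing key are my own assumptions):
def get_nested(data, key_string, sep=","):
    """Follow a comma-separated key path through nested dicts."""
    value = data
    for part in key_string.split(sep):
        value = value[part.strip()]  # raises KeyError if any step is missing
    return value

data = {"masks": {"id": "valore"}}
print(get_nested(data, "masks,id"))  # 'valore'
print(get_nested(data, "masks"))     # {'id': 'valore'}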
What is the best way to generate a unique key for the contents of a dictionary? My intention is to store each dictionary in a document store along with a unique id or hash, so that I don't have to load the whole dictionary from the store to check whether it already exists. Dictionaries with the same keys and values should generate the same id or hash.
I have the following code:
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
print str(a)
print hashlib.sha1(str(a)).hexdigest()
print hashlib.sha1(str(b)).hexdigest()
The last two print statements generate the same string. Is this a good implementation, or are there any pitfalls with this approach? Is there a better way to do this?
Update
Combining suggestions from the answers below, the following might be a good implementation
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
def get_id_for_dict(d):
    unique_str = ''.join(["'%s':'%s';" % (key, val) for (key, val) in sorted(d.items())])
    return hashlib.sha1(unique_str).hexdigest()
print get_id_for_dict(a)
print get_id_for_dict(b)
I prefer serializing the dict as JSON and hashing that:
import hashlib
import json
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
# Python 2
print hashlib.sha1(json.dumps(a, sort_keys=True)).hexdigest()
print hashlib.sha1(json.dumps(b, sort_keys=True)).hexdigest()
# Python 3
print(hashlib.sha1(json.dumps(a, sort_keys=True).encode()).hexdigest())
print(hashlib.sha1(json.dumps(b, sort_keys=True).encode()).hexdigest())
Returns:
71083588011445f0e65e11c80524640668d3797d
71083588011445f0e65e11c80524640668d3797d
No, you can't rely on a particular order of elements when converting a dictionary to a string.
You can, however, convert it to a sorted list of (key, value) tuples, convert that to a string, and compute a hash like this:
a_sorted_list = [(key, a[key]) for key in sorted(a.keys())]
print hashlib.sha1(str(a_sorted_list)).hexdigest()
It's not fool-proof: the formatting of a list or tuple converted to a string could change in some future major Python version, sort order depends on locale, etc., but I think it can be good enough.
A possible option would be using a serialized representation of the list that preserves order. I am not sure whether the default list to string mechanism imposes any kind of order, but it wouldn't surprise me if it were interpreter-dependent. So, I'd basically build something akin to urlencode that sorts the keys beforehand.
Not that I believe your method would fail, but I'd rather play with predictable things and avoid undocumented and/or unpredictable behavior. It's true that despite being "unordered", dictionaries end up having an order that may even be consistent, but the point is that you shouldn't take that for granted.
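A minimal sketch of that urlencode-style idea, assuming Python 3's urllib.parse and a flat dict with simple string/number values (the variable names are mine):
import hashlib
from urllib.parse import urlencode

a = {'name': 'Danish', 'age': 107}

# Sort the items first so the serialized form is independent of insertion order.
serialized = urlencode(sorted(a.items()))
print(hashlib.sha1(serialized.encode('utf-8')).hexdigest())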
I'm thinking of creating a checksum of a dict to know whether it was modified or not.
For the moment I have this:
>>> import hashlib
>>> import pickle
>>> d = {'k': 'v', 'k2': 'v2'}
>>> z = pickle.dumps(d)
>>> hashlib.md5(z).hexdigest()
'8521955ed8c63c554744058c9888dc30'
Perhaps a better solution exists?
Note: I want to create a unique id for a dict in order to build a good ETag.
EDIT: I can have abstract data in the dict.
Something like this:
reduce(lambda x, y: x ^ y, [hash(item) for item in d.items()])  # on Python 3, import reduce from functools
Take the hash of each (key, value) tuple in the dict and XOR them all together.
@katrielalex
If the dict contains unhashable items you could do this:
hash(str(d))
or maybe even better
hash(repr(d))
In Python 3, string hashing is initialized with a random seed that differs for each Python session. If that is not acceptable for the intended application, use e.g. zlib.adler32 to build the checksum for a dict:
import zlib

d = {'key1': 'value1', 'key2': 'value2'}
checksum = 0
for item in d.items():
    c1 = 1
    for t in item:
        c1 = zlib.adler32(bytes(repr(t), 'utf-8'), c1)
    checksum = checksum ^ c1
print(checksum)
I would recommend an approach very similar to the one you propose, but with some extra guarantees:
import hashlib, json
hashlib.md5(json.dumps(d, sort_keys=True, ensure_ascii=True).encode('utf-8')).hexdigest()
sort_keys=True: keep the same hash if the order of your keys changes
ensure_ascii=True: in case you have some non-ascii characters, to make sure the representation does not change
We use this for our ETag.
I don't know whether pickle guarantees that the same dict is serialized the same way every time.
If you only have dictionaries, I would go for a combination of keys() and sorted(): build a string based on the sorted key/value pairs and compute the checksum on that.
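A small sketch of that idea, under the assumption that keys and values are plain strings (the separator and the choice of md5 here are arbitrary):
import hashlib

d = {'k': 'v', 'k2': 'v2'}

# Build a canonical string from the sorted key/value pairs, then checksum it.
canonical = ';'.join('%s=%s' % (k, d[k]) for k in sorted(d.keys()))
print(hashlib.md5(canonical.encode('utf-8')).hexdigest())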
I think you may not realise some of the subtleties that go into this. The first problem is that the order that items appear in a dict is not defined by the implementation. This means that simply asking for str of a dict doesn't work, because you could have
str(d1) == "{'a':1, 'b':2}"
str(d2) == "{'b':2, 'a':1}"
and these will hash to different values. If you have only hashable items in the dict, you can hash them and then join up their hashes, as @Bart does, or simply
hash(tuple(sorted(hash(x) for x in d.items())))
Note the sorted, because you have to ensure that the hashed tuple comes out in the same order irrespective of which order the items appear in the dict. If you have dicts in the dict, you could recurse this, but it will be complicated.
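A rough illustration of that recursion (my own sketch, not from the answer, and it still assumes the leaf values are hashable):
def hash_nested(obj):
    """Recursively hash dicts via a sorted tuple of hashed (key, value) pairs."""
    if isinstance(obj, dict):
        return hash(tuple(sorted(hash_nested(item) for item in obj.items())))
    if isinstance(obj, (list, tuple)):
        return hash(tuple(hash_nested(x) for x in obj))
    return hash(obj)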
BUT it would be easy to break any implementation like this if you allow arbitrary data in the dictionary, since you can simply write an object with a broken __hash__ implementation and use that. And you can't use id, because then equal items could end up with different hashes.
The moral of the story is that hashing dicts isn't supported in Python for a reason.
As you said, you want to generate an ETag based on the dictionary content; an OrderedDict, which preserves the order of the dictionary, may be a better candidate here. Just iterate through the key/value pairs and construct your ETag string.
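For instance, a sketch of building the ETag from an OrderedDict (the separator and the use of md5 here are my own choices):
import hashlib
from collections import OrderedDict

d = OrderedDict([('k', 'v'), ('k2', 'v2')])

# Concatenate the key/value pairs in their preserved order, then hash that string.
etag_source = ''.join('%s=%s;' % (key, value) for key, value in d.items())
print(hashlib.md5(etag_source.encode('utf-8')).hexdigest())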