Implementing "select distinct ... from ..." over a list of Python dictionaries

Implementing "select distinct ... from ..." over a list of Python dictionaries - python

Here is my problem: I have a list of Python dictionaries of identical form, that are meant to represent the rows of a table in a database, something like this:
[ {'ID': 1,
'NAME': 'Joe',
'CLASS': '8th',
... },
{'ID': 1,
'NAME': 'Joe',
'CLASS': '11th',
... },
...]
I have already written a function to get the unique values for a particular field in this list of dictionaries, which was trivial. That function implements something like:
select distinct NAME from ...
However, I want to be able to get the list of multiple unique fields, similar to:
select distinct NAME, CLASS from ...
Which I am finding to be non-trivial. Is there an algorithm or Python included function to help me with this quandry?
Before you suggest loading the CSV files into a SQLite table or something similar, that is not an option for the environment I'm in, and trust me, that was my first thought.

If you want it as a generator:
def select_distinct(dictionaries, keys):
seen = set()
for d in dictionaries:
v = tuple(d[k] for k in keys)
if v in seen: continue
yield v
seen.add(v)
if you want the result in some other form (e.g., a list instead of a generator) it's not hard to alter this (e.g., .append to the initially-empty result list instead of yielding, and return the result list at the end).
To be called, of course, as
for values_tuple in select_distinct(thedicts, ('NAME', 'CLASS')):
...
or the like.

distinct_list = list(set([(d['NAME'], d['CLASS']) for d in row_list]))
where row_list is the list of dicts you have

One can implement the task using hashing. Just hash the content of rows that appear in the distinct query and ignore the ones with same hash.

Related

How to parse a list of dictionaries in Python 3

I'm using python(requests) to query an API. The JSON response is list of dictionaries, like below:
locationDescriptions = timeseries.publish.get('/GetLocationDescriptionList')['LocationDescriptions']
print(locationDescriptions[0])
{'Name': 'Test',
'Identifier': '000045',
'UniqueId': '3434jdfsiu3hk34uh8',
'IsExternalLocation': False,
'PrimaryFolder': 'All Locations',
'SecondaryFolders': [],
'LastModified': '2021-02-09T06:01:25.0446910+00:00',}
I'd like to extract 1 field (Identifier) as a list for further analysis (count, min, max, etc.) but I'm having a hard time figuring out how to do this.

Python has a syntax feature called "list comprehensions", and you can do something like:
identifiers = [item['Identifier'] for item in locationDescriptions]
Here is a small article that gives you more details, and also shows an alternate way using map. And here is one of the many resources detailing list comprehensions, should you need it.

You could extract them with a list comprehension:
identifiers = [i['Identifier'] for i in locationDescriptions]
You allude to needing a list of numbers (count, min, max, etc...), in which case:
identifiers = [int(i['Identifier']) for i in locationDescriptions]

You can do
ids = [locationDescription['Identifier'] for locationDescription in locationDescriptions]
You will have a list of identifiers as a string.
Best regards

find an item inside a list of dictionaries

Say I have a list of dictionaries.
each dict in the list has 3 elements.
Name, id and status.
list_of_dicts = [{'id':1, 'name':'Alice', 'status':0},{'id':2, 'name':'Bob', 'status':0},{'id':3, 'name':'Robert', 'status':1}]
so I get:
In[20]: print list_of_dicts
Out[20]:
[{'id': 1, 'name': 'Alice', 'status': 0},
{'id': 2, 'name': 'Bob', 'status': 0},
{'id': 3, 'name': 'Robert', 'status': 1}]
If i recieve a name, how can I get its status without iterating on the list?
e.g. I get 'Robert' and I want to output 1. thank you.

for example you can use pandas
import pandas as pd
list_of_dicts = [{'id':1, 'name':'Alice', 'status':0},{'id':2, 'name':'Bob', 'status':0},{'id':3, 'name':'Robert', 'status':1}]
a = pd.DataFrame(list_of_dicts)
a.loc[a['name'] == 'Robert']
and play with dataframes its very fast because write on c++ and easy (like sql queries)

As you found you have to iterate (unless you are able to change your data structure to an enclosing dict) why don't you just do it?
>>> [d['status'] for d in list_of_dicts if d['name']=='Robert']
[1]
Despite this, I recommend considering a map type (like dict) every time you see some 'id' field in a proposed data structure. If it's there you probably want to use it for general identification, instead of carrying dicts around. They can be used for relations also, and transfer easily into a relational database if you need it later.

I don't think you can do what you ask without iterating through the dictionary:
Best case, you'll find someone that suggests you a method that hides the iteration.
If what really concerns you is the speed, you may break your iteration as soon as you find the first valid result:
for iteration, elements in enumerate(list_of_dicts):
if elements['name'] == "Robert":
print "Elements id: ", elements['id']
break
print "Iterations: ", iteration
# OUTPUT: Elements id: 3, Iterations: 1
Note that numbers of iteration may vary, since dictionaries are not indexed, and if you have more "Roberts", only for one the "id" will be printed

It's not possible to do this without iteration.
However, but you can transform you dictionary into a different data structure, such as a dictionary where names are the keys:
new_dict = {person["name"]: {k: v for k, v in person.items() if k != "name"} for person in list_of_dicts}
Then you can get the status like so:
new_dict["Robert"]["status"]
# 1
Additionally, as #tobias_k mentions in the comments, you can keep the internal dictionary the same:
{person["name"]: person for person in list_of_dicts}
The only issue with the above approaches is that it can't handle multiple names. You can instead add the unique id into the key to differentiate between names:
new_dict = {(person["name"], person["id"]): person["status"] for person in list_of_dicts}
Which can be called like this:
new_dict["Robert", 3]
# 1
Even though it takes extra computation(only once) to create these data structures, the lookups afterwards will be O(1), instead of iterating the list every time when you want to search a name.

Your list_of_dicts cannot be reached without a loop so for your desire your list should be modified a little like 1 dict and many lists in it:
list_of_dicts_modified = {'name':['Alice', 'Bob', 'Robert'],'id':[1, 2, 3], 'status': [0, 0, 1]}
index = list_of_dicts_modified['name'].index(input().strip())
print('Name: {0} ID: {1} Status: {2}'.format(list_of_dicts_modified['name'][index], list_of_dicts_modified['id'][index], list_of_dicts_modified['status'][index]))
Output:
C:\Users\Documents>py test.py
Alice
Name: Alice ID: 1 Status: 0

Refactoring: Merging two dictionaries, but ignoring None values

I need a bit of Python refactoring advice.
I have a list of dict objects (new_monitors), which can be empty. When there are new monitors, however, I want to add a bunch of fields to those monitors.
For each monitor, I would like to append all not None fields from the DogDump.HIDE_FIELDS dict:
if new_monitors:
for monitor in new_monitors:
for key, value in DogDump.HIDE_FIELDS.items():
if value:
monitor[key] = value
Note: This snippet below worked very well, but it included all of the None fields. I do not want the None fields!
if new_monitors:
for monitor in new_monitors:
monitor.update(DogDump.HIDE_FIELDS)
How can I refactor this snippet that looks more pythonic, but still maintain good readability?

Not sure which is really the most "pythonic" way of handling your need to filter DogDump.HIDE_FIELDS dict before adding the relevant key / value pairs to your monitor dict. One way would be to perform the "filtering" with dict comprehension.
Also, I would think that you could "filter" your DogDump.HIDE_FIELDS dict before your loop rather than repeating this operation for each loop iteration (unless there are other operations taking place that mutate DogDump.HIDE_FIELDS while you are iterating).
Example of "filtering" with dict comprehension (dump refers to your DogDump.HIDE_FIELDS dict):
monitor = {'key': 'value'}
dump = {'a': 1, 'b': None}
dump_filtered = {k:v for (k,v) in dump.items() if v}
monitor.update(dump_filtered)
print(monitor)
# OUTPUT
# {'key': 'value', 'a': 1}

How do I turn list values into an array with an index that matches the other dic values?

Hoping someone can help me out. I've spent the past couple hours trying to solve this, and fair warning, I'm still fairly new to python.
This is a repost of a question I recently deleted. I've misinterpreted my code in the last example.The correct example is:
I have a dictionary, with a list that looks similar to:
dic = [
{
'name': 'john',
'items': ['pants_1', 'shirt_2','socks_3']
},
{
'name': 'bob',
items: ['jacket_1', 'hat_1']
}
]
I'm using .append for both 'name', and 'items', which adds the dic values into two new lists:
for x in dic:
dic_name.append(dic['name'])
dic_items.append(dic['items'])
I need to split the item value using '_' as the delimiter, so I've also split the values by doing:
name, items = [i if i is None else i.split('_')[0] for i in dic_name],
[if i is None else i.split('_')[0] for i in chain(*dic_items)])
None is used in case there is no value. This provides me with a new list for name, items, with the delimiter used. Disregard the fact that I used '_' split for names in this example.
When I use this, the index for name, and item no longer match. Do i need to create the listed items in an array to match the name index, and if so, how?
Ideally, I want name[0] (which is john), to also match items[0] (as an array of the items in the list, so pants, shirt, socks). This way when I refer to index 0 for name, it also grabs all the values for items as index 0. The same thing regarding the index used for bob [1], which should match his items with the same index.
#avinash-raj, thanks for your patience, as I've had to update my question to reflect more closely to the code I'm working with.

I'm reading a little bit between the lines but are you trying to just collapse the list and get rid of the field names, e.g.:
>>> dic = [{'name': 'john', 'items':['pants_1','shirt_2','socks_3']},
{'name': 'bob', 'items':['jacket_1','hat_1']}]
>>> data = {d['name']: dict(i.split('_') for i in d['items']) for d in dic}
>>> data
{'bob': {'hat': '1', 'jacket': '1'},
'john': {'pants': '1', 'shirt': '2', 'socks': '3'}}
Now the data is directly related vs. indirectly related via a common index into 2 lists. If you want the dictionary split out you can always
>>> dic_name, dic_items = zip(*data.items())
>>> dic_name
('bob', 'john')
>>> dic_items
({'hat': '1', 'jacket': '1'}, {'pants': '1', 'shirt': '2', 'socks': '3'})

You need a list of dictionaries because the duplicate keys name and items are overwritten:
items = [[i.split('_')[0] for i in d['items']] for d in your_list]
names = [d['name'] for d in your_list] # then grab names from list
Alternatively, you can do this in one line with the built-in zip method and generators, like so:
names, items = zip(*((i['name'], [j.split('_')[0] for j in i['items']]) for i in dic))

From Looping Techniques in the Tutorial.
for name, items in div.items():
names.append(name)
items.append(item)
That will work if your dict is structured
{'name':[item1]}

In the loop body of
for x in dic:
dic_name.append(dic['name'])
dic_items.append(dic['items'])
you'll probably want to access x (to which the items in dic will be assigned in turn) rather than dic.

Accessing elements of Python dictionary by index

Consider a dict like
mydict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
How do I access for instance a particular element of this dictionary?
for instance, I would like to print the first element after some formatting the first element of Apple which in our case is 'American' only?
Additional information
The above data structure was created by parsing an input file in a python function. Once created however it remains the same for that run.
I am using this data structure in my function.
So if the file changes, the next time this application is run the contents of the file are different and hence the contents of this data structure will be different but the format would be the same.
So you see I in my function I don't know that the first element in Apple is 'American' or anything else so I can't directly use 'American' as a key.

Given that it is a dictionary you access it by using the keys. Getting the dictionary stored under "Apple", do the following:
>>> mydict["Apple"]
{'American': '16', 'Mexican': 10, 'Chinese': 5}
And getting how many of them are American (16), do like this:
>>> mydict["Apple"]["American"]
'16'

If the questions is, if I know that I have a dict of dicts that contains 'Apple' as a fruit and 'American' as a type of apple, I would use:
myDict = {'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
print myDict['Apple']['American']
as others suggested. If instead the questions is, you don't know whether 'Apple' as a fruit and 'American' as a type of 'Apple' exist when you read an arbitrary file into your dict of dict data structure, you could do something like:
print [ftype['American'] for f,ftype in myDict.iteritems() if f == 'Apple' and 'American' in ftype]
or better yet so you don't unnecessarily iterate over the entire dict of dicts if you know that only Apple has the type American:
if 'Apple' in myDict:
if 'American' in myDict['Apple']:
print myDict['Apple']['American']
In all of these cases it doesn't matter what order the dictionaries actually store the entries. If you are really concerned about the order, then you might consider using an OrderedDict:
http://docs.python.org/dev/library/collections.html#collections.OrderedDict

As I noticed your description, you just know that your parser will give you a dictionary that its values are dictionary too like this:
sampleDict = {
"key1": {"key10": "value10", "key11": "value11"},
"key2": {"key20": "value20", "key21": "value21"}
}
So you have to iterate over your parent dictionary. If you want to print out or access all first dictionary keys in sampleDict.values() list, you may use something like this:
for key, value in sampleDict.items():
print value.keys()[0]
If you want to just access first key of the first item in sampleDict.values(), this may be useful:
print sampleDict.values()[0].keys()[0]
If you use the example you gave in the question, I mean:
sampleDict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'}
}
The output for the first code is:
American
Indian
And the output for the second code is:
American
EDIT 1:
Above code examples does not work for version 3 and above of python; since from version 3, python changed the type of output of methods keys and values from list to dict_values. Type dict_values is not accepting indexing, but it is iterable. So you need to change above codes as below:
First One:
for key, value in sampleDict.items():
print(list(value.keys())[0])
Second One:
print(list(list(sampleDict.values())[0].keys())[0])

I know this is 8 years old, but no one seems to have actually read and answered the question.
You can call .values() on a dict to get a list of the inner dicts and thus access them by index.
>>> mydict = {
... 'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
... 'Grapes':{'Arabian':'25','Indian':'20'} }
>>>mylist = list(mydict.values())
>>>mylist[0]
{'American':'16', 'Mexican':10, 'Chinese':5},
>>>mylist[1]
{'Arabian':'25','Indian':'20'}
>>>myInnerList1 = list(mylist[0].values())
>>>myInnerList1
['16', 10, 5]
>>>myInnerList2 = list(mylist[1].values())
>>>myInnerList2
['25', '20']

As a bonus, I'd like to offer kind of a different solution to your issue. You seem to be dealing with nested dictionaries, which is usually tedious, especially when you have to check for existence of an inner key.
There are some interesting libraries regarding this on pypi, here is a quick search for you.
In your specific case, dict_digger seems suited.
>>> import dict_digger
>>> d = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'}
}
>>> print(dict_digger.dig(d, 'Apple','American'))
16
>>> print(dict_digger.dig(d, 'Grapes','American'))
None

You can use mydict['Apple'].keys()[0] in order to get the first key in the Apple dictionary, but there's no guarantee that it will be American. The order of keys in a dictionary can change depending on the contents of the dictionary and the order the keys were added.

You can't rely on order of dictionaries, but you may try this:
mydict['Apple'].items()[0][0]
If you want the order to be preserved you may want to use this:
http://www.python.org/dev/peps/pep-0372/#ordered-dict-api

Simple Example to understand how to access elements in the dictionary:-
Create a Dictionary
d = {'dog' : 'bark', 'cat' : 'meow' }
print(d.get('cat'))
print(d.get('lion'))
print(d.get('lion', 'Not in the dictionary'))
print(d.get('lion', 'NA'))
print(d.get('dog', 'NA'))
Explore more about Python Dictionaries and learn interactively here...

Few people appear, despite the many answers to this question, to have pointed out that dictionaries are un-ordered mappings, and so (until the blessing of insertion order with Python 3.7) the idea of the "first" entry in a dictionary literally made no sense. And even an OrderedDict can only be accessed by numerical index using such uglinesses as mydict[mydict.keys()[0]] (Python 2 only, since in Python 3 keys() is a non-subscriptable iterator.)
From 3.7 onwards and in practice in 3,6 as well - the new behaviour was introduced then, but not included as part of the language specification until 3.7 - iteration over the keys, values or items of a dict (and, I believe, a set also) will yield the least-recently inserted objects first. There is still no simple way to access them by numerical index of insertion.
As to the question of selecting and "formatting" items, if you know the key you want to retrieve in the dictionary you would normally use the key as a subscript to retrieve it (my_var = mydict['Apple']).
If you really do want to be able to index the items by entry number (ignoring the fact that a particular entry's number will change as insertions are made) then the appropriate structure would probably be a list of two-element tuples. Instead of
mydict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
you might use:
mylist = [
('Apple', {'American':'16', 'Mexican':10, 'Chinese':5}),
('Grapes', {'Arabian': '25', 'Indian': '20'}
]
Under this regime the first entry is mylist[0] in classic list-endexed form, and its value is ('Apple', {'American':'16', 'Mexican':10, 'Chinese':5}). You could iterate over the whole list as follows:
for (key, value) in mylist: # unpacks to avoid tuple indexing
if key == 'Apple':
if 'American' in value:
print(value['American'])
but if you know you are looking for the key "Apple", why wouldn't you just use a dict instead?
You could introduce an additional level of indirection by cacheing the list of keys, but the complexities of keeping two data structures in synchronisation would inevitably add to the complexity of your code.

With the following small function, digging into a tree-shaped dictionary becomes quite easy:
def dig(tree, path):
for key in path.split("."):
if isinstance(tree, dict) and tree.get(key):
tree = tree[key]
else:
return None
return tree
Now, dig(mydict, "Apple.Mexican") returns 10, while dig(mydict, "Grape") yields the subtree {'Arabian':'25','Indian':'20'}. If a key is not contained in the dictionary, dig returns None.
Note that you can easily change (or even parameterize) the separator char from '.' to '/', '|' etc.

mydict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
for n in mydict:
print(mydict[n])

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Implementing "select distinct ... from ..." over a list of Python dictionaries - python

distinct_list = list(set([(d['NAME'], d['CLASS']) for d in row_list])) where row_list is the list of dicts you have

One can implement the task using hashing. Just hash the content of rows that appear in the distinct query and ignore the ones with same hash.

Related

How to parse a list of dictionaries in Python 3

find an item inside a list of dictionaries

Refactoring: Merging two dictionaries, but ignoring None values

How do I turn list values into an array with an index that matches the other dic values?

Accessing elements of Python dictionary by index

Categories

Resources