Group dictionary values together based on a key - python

I am looking to group values in an input set together with the first element in a tuple acting as the key. The second elements need to be grouped together to a list based on the common key. Output needs to be a list with tuples.
# Input set
values = {(304008, 2020.0), (304008, 2017.0), (250128, 2020.0), (93646, 2020.0), (93646, 2017.0)}
# Current workflow
keys = {i[0] for i in values}
id_dict = dict()
for k in keys:
    id_dict[k] = [int(i[1]) for i in values if i[0] == k]
lst2 = list(id_dict.items())
# Expected output
# [(250128, [2020]), (304008, [2017, 2020]), (93646, [2020, 2017])]
I have the expected output, but the whole process is too slow. I am looking to make it faster. I was looking at groupby functions, but I can't seem to make them work.

You can use itertools.groupby to accomplish this. Group by the first element of each tuple (groupby only merges adjacent items, which is why the input is passed through sorted first), then make a list of the second elements in each group.
>>> from itertools import groupby
>>> [(k, [i[1] for i in g]) for k, g in groupby(sorted(values), key=lambda i: i[0])]
[(93646, [2017.0, 2020.0]), (250128, [2020.0]), (304008, [2017.0, 2020.0])]

You can use setdefault to build a dict keyed on the first item of each tuple, iterating over the set once to populate it.
Then use the list constructor to get the required list. See below:
>>> values = {(304008, 2020.0), (304008, 2017.0), (250128, 2020.0), (93646, 2020.0), (93646, 2017.0)}
>>> info = {}
>>> for elements in values:
...     info.setdefault(elements[0], []).append(elements[1])
...
>>> list(info.items())
[(304008, [2017.0, 2020.0]), (93646, [2017.0, 2020.0]), (250128, [2020.0])]
>>>
This does not use groupby but avoids your second loop.
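A minimal sketch of the same single-pass idea using collections.defaultdict, with the years converted to int as in the question's expected output. The loop in the question rescans the whole set once per key, while this version touches each tuple once. The name grouped is only illustrative.

```python
from collections import defaultdict

values = {(304008, 2020.0), (304008, 2017.0), (250128, 2020.0),
          (93646, 2020.0), (93646, 2017.0)}

# Each missing key starts out as an empty list.
grouped = defaultdict(list)
for key, year in values:
    grouped[key].append(int(year))

# List of (key, [years]) tuples; the order follows set iteration order.
lst2 = list(grouped.items())
print(lst2)
```

Because values is a set, the order of the keys and of the years within each list is not guaranteed; sort them if a stable order matters.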

Related

sorting a list by names in python

I have a list of filenames. I need to group them based on the ending names after underscore ( _ ). My list looks something like this:
[
'1_result1.txt',
'2_result2.txt',
'3_result2.txt',
'4_result3.txt',
'5_result4.txt',
'6_result1.txt',
'7_result2.txt',
'8_result3.txt',
]
My end result should be:
List1 = ['1_result1.txt', '6_result1.txt']
List2 = ['2_result2.txt', '3_result2.txt', '7_result2.txt']
List3 = ['4_result3.txt', '8_result3.txt']
List4 = ['5_result4.txt']
This will come down to making a dictionary of lists, then iterating the input and adding each item to its proper list:
output = {}
for item in inlist:
    output.setdefault(item.split("_")[1], []).append(item)
print(list(output.values()))
We use setdefault to make sure there's a list for the entry, then add our current filename to the list. output.values() will return just the lists, not the entire dictionary, which appears to be what you want.
Using defaultdict from the collections module:
from collections import defaultdict
output = defaultdict(list)
for file in data:
    output[file.split("_")[1]].append(file)
print(list(output.values()))
Using groupby from the itertools module:
from itertools import groupby
data.sort(key=lambda x: x.split('_')[1])
for key, group in groupby(data, lambda x: x.split('_')[1]):
    print(list(group))
Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.
The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.
So if l is the name of your list, you could use something like:
l.sort(key=lambda s: s.split('_')[1])
More information about key functions is in the Python documentation on sorting.
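For reference, here is a self-contained sketch of the dictionary-of-lists approach run against the filenames from the question. The names filenames, groups and result are illustrative, and splitting with maxsplit=1 is an assumption so that any further underscores stay in the suffix.

```python
from collections import defaultdict

filenames = [
    '1_result1.txt', '2_result2.txt', '3_result2.txt', '4_result3.txt',
    '5_result4.txt', '6_result1.txt', '7_result2.txt', '8_result3.txt',
]

groups = defaultdict(list)
for name in filenames:
    # Everything after the first underscore is the grouping key.
    suffix = name.split("_", 1)[1]
    groups[suffix].append(name)

# One list per suffix, in order of first appearance (Python 3.7+ dicts).
result = list(groups.values())
print(result)
```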

Print both the duplicate values name from the nested list

students=[['Ash',85.25],['Kai',85.25],['Ray',75],['Jay',55.5]]
Expected output:
Ash
Kai
I'm new to Python and trying to solve this task, but I'm not getting the result I want. Can anyone explain how to do it?
One option would be to group the values into a defaultdict(list):
>>> from collections import defaultdict
>>>
>>> students = [['Ash',85.25],['Kai',85.25],['Ray',75],['Jay',55.5]]
>>> d = defaultdict(list)
>>> for value, key in students:
...     d[key].append(value)
...
>>> for value in d.values():
...     if len(value) > 1:
...         print(value)
...
['Ash', 'Kai']
I would do it like that:
students = [['Ash', 85.25], ['Kai', 85.25], ['Ray', 75], ['Jay', 55.5]]
common_names = []
for i, i_x in enumerate(students):
    for i_y in students[:i] + students[i + 1:]:
        if i_x[1] == i_y[1]:
            common_names.append(i_x[0])
print(common_names)
#['Ash', 'Kai']
# or if you want it to print every entry in a single line:
print('\n'.join(x for x in common_names))
#Ash
#Kai
Explanation:
I take an element from the original list students. That element is i_x, which is ['Ash', 85.25] on the first iteration, for example.
Then I build students[:i] + students[i + 1:], a new list in memory that contains every element of the original except i_x.
I check whether any item in the new list has the same value at index [1] as i_x. If so, I append i_x[0] to a third list that holds the results.
I repeat this for every element of the students list.
Can anybody provide a list comprehension for the above?
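One possible list comprehension, as a sketch rather than the only way to write it: keep a name whenever any other entry has the same score. Using any() is a small departure from the loop above, since it adds each name at most once even if three or more students share a score.

```python
students = [['Ash', 85.25], ['Kai', 85.25], ['Ray', 75], ['Jay', 55.5]]

common_names = [
    name
    for i, (name, score) in enumerate(students)
    # Compare against every other entry, skipping the entry itself.
    if any(score == other_score
           for j, (_, other_score) in enumerate(students) if j != i)
]
print(common_names)
```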

Extracting keys-values from dictionary

import random
dictionary = {'dog': 1,'cat': 2,'animal': 3,'horse': 4}
keys = random.shuffle(list(dictionary.keys())*3)
values = list(dictionary.values())*3
random_key = []
random_key_value = []
random_key.append(keys.pop())
random_key_value.append(???)
For random_key_value.append, I need to add the value that corresponds to the key that was popped. How can I achieve this? I need to work with repeated copies of the list, and I can't multiply a dictionary directly.
I'm assuming Python (you should specify the language in your question).
If I understand, you want to multiply the elements in the dictionary. So
list(dictionary.keys()) * 3
is not your solution: [1,2] * 3 results in [1,2,1,2,1,2]
Try a list comprehension instead:
[i * 3 for i in dictionary.keys()]
To keep keys and values aligned after shuffling, shuffle the keys before the multiplication, then create the values list (in the same order as the shuffled keys), and finally multiply the keys:
keys = list(dictionary.keys())  # a list, because random.shuffle needs a mutable sequence
random.shuffle(keys)
values = [dictionary[i]*3 for i in keys]
keys = [i * 3 for i in keys]
And finally:
random_key.append(keys.pop())
random_key_value.append(values.pop())
Also take care with random.shuffle; it doesn't work the way you are using it: it shuffles the list in place and returns None. See the documentation.
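Another reading of the question is that the list of keys should be repeated three times and each popped key paired with its value. Under that assumption, a parallel values list is not needed, because the dictionary itself can be used for the lookup. A rough sketch:

```python
import random

dictionary = {'dog': 1, 'cat': 2, 'animal': 3, 'horse': 4}

# Repeat the key list three times, then shuffle it in place.
keys = list(dictionary.keys()) * 3
random.shuffle(keys)

random_key = []
random_key_value = []

# Pop a key and look its value up in the original dictionary.
key = keys.pop()
random_key.append(key)
random_key_value.append(dictionary[key])
```

This keeps the key and its value in step no matter how the keys were shuffled.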

groupby iterator not adding to list in dictionary comprehension

I have a db query that returns a list. I then do a dictionary comprehension like so:
results = {product: [g for g in group] for product, group in groupby(db_results, lambda x: x.product_id)}
The problem is that each value in the dictionary contains only one item. I assume this is due to the fact that the group is an iterator.
The following returns each item of the group, so I know that they are there:
groups = groupby(db_results, lambda x: x.product_id)
for k, g in groups:
    if k == 1001:
        print(list(g))
I am trying to get all the values of g in the above in a list whose key is the key of dictionary.
I've tried many variations like:
blah = dict((k,list(v)) for k,v in groupby(db_results, key=lambda x: x.product_id))
but I can't get it right.
If you insist on using groupby, you need to make sure that the input is sorted by the same key that you group on. However, I would suggest using defaultdict instead:
from collections import defaultdict
blah = defaultdict(list)
for item in db_results:
    blah[item.product_id].append(item)
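To illustrate the sorting point, here is a small sketch with made-up rows. Row, its fields and the sample data are hypothetical stand-ins for the real query results. With unsorted input, groupby starts a new group each time the key changes, so in a dict comprehension a later group for the same key overwrites the earlier one.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for the rows returned by the db query.
Row = namedtuple("Row", ["product_id", "name"])
db_results = [Row(1001, "a"), Row(1002, "b"), Row(1001, "c")]


def by_product(row):
    return row.product_id


# Unsorted input: 1001 appears as two separate groups, and the second
# one replaces the first in the resulting dict.
unsorted_groups = {k: list(g) for k, g in groupby(db_results, by_product)}

# Sorting by the same key first puts all rows of a product side by side.
sorted_groups = {
    k: list(g)
    for k, g in groupby(sorted(db_results, key=by_product), by_product)
}

print(len(unsorted_groups[1001]), len(sorted_groups[1001]))
```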

Comparing Two Arrays resulting into Third Array

I have two Arrays:
firstArray=[['AF','AFGHANISTAN'],['AL','ALBANIA'],['DZ','ALGERIA'],['AS','AMERICAN SAMOA']]
secondArray=[[1,'AFGHANISTAN'],[3,'AMERICAN SAMOA']]
So I just need an Array which is like
thirdArray=[[1,'AF'],[3,'AS']]
I tried any(e[1] == firstArray[i][1] for e in secondArray)
It returns True or False depending on whether the second elements of the two arrays match, but I don't know how to build the third array.
First, convert firstArray into a dict with the country as the key and abbreviation as the value, then just look up the abbreviation for each country in secondArray using a list comprehension:
abbrevDict = {country: abbrev for abbrev, country in firstArray}
thirdArray = [[key, abbrevDict[country]] for key, country in secondArray]
If you are on a Python version without dict comprehensions (2.6 and below) you can use the following to create abbrevDict:
abbrevDict = dict((country, abbrev) for abbrev, country in firstArray)
Or the more concise but less readable:
abbrevDict = dict(map(reversed, firstArray))
It is better to store them into dictionaries:
firstDictionary = {key:value for value, key in firstArray}
# in older versions of Python:
# firstDictionary = dict((key, value) for value, key in firstArray)
then you could get the 3rd array simply by dictionary look-up:
thirdArray = [[value, firstDictionary[key]] for value, key in secondArray]
You could use an interim dict as a lookup:
firstArray=[['AF','AFGHANISTAN'],['AL','ALBANIA'],['DZ','ALGERIA'],['AS','AMERICAN SAMOA']]
secondArray=[[1,'AFGHANISTAN'],[3,'AMERICAN SAMOA']]
lookup = {snd:fst for fst, snd in firstArray}
thirdArray = [[n, lookup[name]] for n, name in secondArray]
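If a dictionary would do, there is a special purpose Counter dictionary. Note that it counts occurrences rather than joining on a key, and its elements must be hashable, so the inner lists have to be converted to tuples first.
>>> from collections import Counter
>>> Counter(map(tuple, firstArray + secondArray))
Counter({('AF', 'AFGHANISTAN'): 1 ... })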
Note that the arguments are reversed from what you requested, but that's easily remedied.
The standard way to match data to a key is a dictionary. You can convert firstArray to a dictionary using dict comprehension.
firstDict = {x: y for (y, x) in firstArray}
You can then iterate over your second array using list comprehension.
[[i[0], firstDict[i[1]]] for i in secondArray]
Use a list comprehension:
In [119]: fa=[['AF','AFGHANISTAN'],['AL','ALBANIA'],['DZ','ALGERIA'],['AS','AMERICAN SAMOA']]
In [120]: sa=[[1,'AFGHANISTAN'],[3,'AMERICAN SAMOA']]
In [121]: [[y[0],x[0]] for x in fa for y in sa if y[1]==x[1]]
Out[121]: [[1, 'AF'], [3, 'AS']]
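The dictionary-lookup answers above raise KeyError if secondArray contains a country that firstArray lacks. A hedged sketch of one way to skip such entries; the 'ATLANTIS' row is a made-up example and is not part of the question's data:

```python
firstArray = [['AF', 'AFGHANISTAN'], ['AL', 'ALBANIA'],
              ['DZ', 'ALGERIA'], ['AS', 'AMERICAN SAMOA']]
# The last entry is hypothetical, added to show the missing-key case.
secondArray = [[1, 'AFGHANISTAN'], [3, 'AMERICAN SAMOA'], [7, 'ATLANTIS']]

# Map country name -> abbreviation for constant-time lookups.
lookup = {country: abbrev for abbrev, country in firstArray}

# Keep only the rows whose country has a known abbreviation.
thirdArray = [[n, lookup[country]]
              for n, country in secondArray
              if country in lookup]
print(thirdArray)
```

Building the dictionary once also avoids comparing every pair of rows, which is what the nested comprehension does.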
