I searched for how to sort a Python dictionary by value and found various answers online. I tried a few of them and finally used the sorted function.
I have simplified the example to make it clear.
I have a dictionary, say:
temp_dict = {'1': '40', '0': '109', '3': '37', '2': '42', '5': '26', '4': '45', '7': '109', '6': '42'}
Now, to sort it by value, I did the following (using the operator module):
sorted_temp_dict = sorted(temp_dict.items(), key=operator.itemgetter(1))
The result I'm getting is (the result is a list of tuples, which is fine for me):
[('0', '109'), ('7', '109'), ('5', '26'), ('3', '37'), ('1', '40'), ('2', '42'), ('6', '42'), ('4', '45')]
The issue is that, as you can see, the first two elements of the list are not sorted. The rest of the elements are sorted perfectly by value.
I can't find the mistake here. Any help would be great. Thanks.
Those are sorted. They are strings, and are sorted lexicographically: '1' is before '2', etc.
If you want to sort by numeric value, you'll need to convert to ints in the key function. For example:
sorted(temp_dict.items(), key=lambda x: int(x[1]))
They are sorted; the issue is that the elements are strings, hence:
'109' < '26' # this is True, since both are strings
Try converting them to int in the key argument. You can use a lambda such as:
>>> sorted_temp_dict = sorted(temp_dict.items(), key=lambda x: int(x[1]))
>>> sorted_temp_dict
[('5', '26'), ('3', '37'), ('1', '40'), ('6', '42'), ('2', '42'), ('4', '45'), ('7', '109'), ('0', '109')]
The problem is trying to sort with values that are str, and not int. If you first convert the values into int and then sort, it will work.
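A minimal sketch of the difference, reusing temp_dict from the question. The names by_string and by_number are only illustrative and do not appear in the original posts:

```python
temp_dict = {'1': '40', '0': '109', '3': '37', '2': '42',
             '5': '26', '4': '45', '7': '109', '6': '42'}

# Strings are compared character by character, so '109' < '26'
# because '1' < '2'.
by_string = sorted(temp_dict.items(), key=lambda kv: kv[1])

# Converting each value to int in the key function compares numbers instead.
by_number = sorted(temp_dict.items(), key=lambda kv: int(kv[1]))

print([value for _, value in by_string])
print([value for _, value in by_number])
```

The stored values stay strings in both results; only the comparison used for ordering changes.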
My question is whether there is a way to associate the numbers in the first column with those in the second column. I want to read the numbers in the second column but keep them linked to the ones in the first column, so that I can sort them as I do in the sorted_resistances list and then, after sorting, replace each value with the first-column number that was assigned to it.
For context, the code reads the list from a file, which is why it is written this way.
1 30000
2 30511
3 30052
4 30033
5 30077
6 30055
7 30086
8 30044
9 30088
10 30019
11 30310
12 30121
13 30132
with open("file.txt") as file_in:
    list_of_resistances = []
    for line in file_in:
        list_of_resistances.append(int(line.split()[1]))
sorted_resistances = sorted(list_of_resistances)
If you want to keep the correlation between the values in the two columns, you can keep all of the values from each line in a tuple (or list), and then sort the list of tuples on a specific element by passing a lambda function as the key parameter of sorted(), telling it to use the second element of each tuple as the sort value.
In this example, I used pprint.pprint to make the output of the lists easier to read.
from pprint import pprint
with open("file.txt") as file_in:
    list_of_resistances = []
    for line in file_in:
        list_of_resistances.append(tuple(line.strip().split(' ')))
print("Unsorted values:")
pprint(list_of_resistances)
sorted_resistances = sorted(list_of_resistances, key=lambda x: x[1])
print("\nSorted values:")
pprint(sorted_resistances)
print("\nSorted keys from column 1:")
pprint([x[0] for x in sorted_resistances])
Output:
Unsorted values:
[('1', '30000'),
('2', '30511'),
('3', '30052'),
('4', '30033'),
('5', '30077'),
('6', '30055'),
('7', '30086'),
('8', '30044'),
('9', '30088'),
('10', '30019'),
('11', '30310'),
('12', '30121'),
('13', '30132')]
Sorted values:
[('1', '30000'),
('10', '30019'),
('4', '30033'),
('8', '30044'),
('3', '30052'),
('6', '30055'),
('5', '30077'),
('7', '30086'),
('9', '30088'),
('12', '30121'),
('13', '30132'),
('11', '30310'),
('2', '30511')]
Sorted keys from column 1:
['1', '10', '4', '8', '3', '6', '5', '7', '9', '12', '13', '11', '2']
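Note that the sort above compares the resistances as strings, which gives the numeric order here only because every value has five digits. A hedged variant that converts both columns to int while reading is sketched below. The lines list is a stand-in for the contents of file.txt (the first five rows of the question's data), and the names rows, sorted_rows and sorted_labels are illustrative:

```python
# Stand-in for the lines of file.txt; in real code, iterate over the file object.
lines = ["1 30000", "2 30511", "3 30052", "4 30033", "5 30077"]

rows = []
for line in lines:
    label, resistance = line.split()
    # Keep both columns together so the label travels with its resistance.
    rows.append((int(label), int(resistance)))

# Sort numerically on the second element of each tuple.
sorted_rows = sorted(rows, key=lambda row: row[1])

# Read the first-column labels back out in sorted order.
sorted_labels = [label for label, _ in sorted_rows]
print(sorted_labels)
```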
I have the following problem. I would like to convert a dataframe into a list of tuples based on a category. See the simple code below:
import pandas as pd
data = {'product_id': ['5', '7', '8', '5', '30'], 'id_customer': ['1', '1', '1', '3', '3']}
df = pd.DataFrame.from_dict(data)
#desired output is:
result = [('5', '7', '8'), ('5', '30')]
How can I do this, please? This question did not help me: Convert pandas dataframe into a list of unique tuple
Use GroupBy.agg (or GroupBy.apply) with tuple, for example:
print (df.groupby('id_customer', sort=False)['product_id'].agg(tuple).tolist())
print (df.groupby('id_customer', sort=False)['product_id'].apply(tuple).tolist())
print (list(df.groupby('id_customer', sort=False)['product_id'].agg(tuple)))
print (list(df.groupby('id_customer', sort=False)['product_id'].apply(tuple)))
[('5', '7', '8'), ('5', '30')]
Iterate over the groups directly:
>>> [tuple(v) for _, v in df.groupby('id_customer')['product_id']]
[('5', '7', '8'), ('5', '30')]
>>>
So I have this tricky dictionary of lists of tuples, which I want to filter based on the first occurrence of the informative flag in the value elements. If the flag (the element in the first position of the tuple) has already been seen under another key, I want to retain only the first key-value pair in which it occurs and skip the subsequent key-value pairs that contain the flag.
old_dict = {'abc':[('abc', '1', '5'), ('def', '1', '5'), ('abcd', '2', '5')],
'def':[('abc', '2', '5'), ('def', '1', '5'), ('abcd', '1', '5')],
'ghi':[('ghi', '1', '5'), ('jkl', '1', '4'), ('mno', '2', '4')]}
I have struggled with a lot of attempts and this latest attempt does not produce anything meaningful.
flgset = set()
new_dict = {}
for elem, tp in old_dict.items():
    for flg in tp:
        flgset.add(flg[0])
counter = 0
for elem, tp in old_dict.items():
    for (item1, item2, item3) in tp:
        for flg in flgset:
            if flg == item1:
                counter = 1
                new_dict[elem] = [(item1, item2, item3)]
                break
Expected results should be:
new_dict = {'abc':[('abc', '1', '5'), ('def', '1', '5'), ('abcd', '2', '5')],
'ghi':[('ghi', '1', '5'), ('jkl', '1', '4'), ('mno', '2', '4')]}
Thanks in advance.
If I understand you correctly, the following should do what you want:
flgset = set()
new_dict = {}
for k, tuple_list in old_dict.items():
    # if the key is not in flgset, just keep the k, tuple_list pair
    if k not in flgset:
        new_dict[k] = tuple_list
        # update the elements into flgset
        # item in this case is ('abc', '2', '5'),
        # since you only want to add the first element, use item[0]
        for item in tuple_list:
            flgset.add(item[0])
Output as such:
new_dict = {'abc': [('abc', '1', '5'), ('def', '1', '5'), ('abcd', '2', '5')],
'ghi': [('ghi', '1', '5'), ('jkl', '1', '4'), ('mno', '2', '4')]}
flgset = {'abc', 'abcd', 'def', 'ghi', 'jkl', 'mno'}
Others may have more efficient ways to do this, but here's one solution that incorporates your intuitions that you need to loop over old_dict items and use a set:
for key, val in old_dict.items():
    if val[0][0] not in set([v[0][0] for v in new_dict.values()]):
        new_dict.update({key: val})
Here's a brief explanation of what's going on: First, val[0][0] is the "informative flag" from your dictionary entry (i.e. the first item of the first tuple in the entry list). set([v[0][0] for v in new_dict.values()]) will give you the unique values of that flag in your new dictionary. The inner part is a list comprehension to get all the "flags" and then set will give a unique list. The last line just uses the update method to append to it.
REVISED ANSWER
@VinayPai raises two important issues below in the comments. First, this code is inefficient because it reconstructs the test set each time. Here's the more efficient way he suggests:
flag_list = set()
for key, val in old_dict.items():
    if val[0][0] not in flag_list:
        new_dict.update({key: val})
        flag_list.add(val[0][0])
The second issue is that this will produce inconsistent results because dictionaries are not ordered. One possible solution is to use an OrderedDict. But as @SyntaxVoid suggests, this is only necessary if you're using Python 3.5 or earlier (here is a great answer discussing the change). If you can create your data in this fashion, it would solve the problem:
from collections import OrderedDict
# Pass a list of (key, value) pairs: a plain dict literal would lose its order on Python 3.5 or earlier
old_dict = OrderedDict([('abc', [('abc', '1', '5'), ('def', '1', '5'), ('abcd', '2', '5')]),
                        ('def', [('abc', '2', '5'), ('def', '1', '5'), ('abcd', '1', '5')]),
                        ('ghi', [('ghi', '1', '5'), ('jkl', '1', '4'), ('mno', '2', '4')])])
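For reference, the set-based approach can be wrapped in a small function. This is only a sketch: the function name keep_first_per_flag is hypothetical, and it assumes Python 3.7 or later, where plain dicts preserve insertion order, so no OrderedDict is needed:

```python
def keep_first_per_flag(mapping):
    """Keep only the first entry for each flag (first item of the first tuple)."""
    seen = set()
    result = {}
    for key, tuples in mapping.items():
        flag = tuples[0][0]
        if flag not in seen:
            result[key] = tuples
            seen.add(flag)
    return result


old_dict = {'abc': [('abc', '1', '5'), ('def', '1', '5'), ('abcd', '2', '5')],
            'def': [('abc', '2', '5'), ('def', '1', '5'), ('abcd', '1', '5')],
            'ghi': [('ghi', '1', '5'), ('jkl', '1', '4'), ('mno', '2', '4')]}

new_dict = keep_first_per_flag(old_dict)
print(list(new_dict))
```

Membership tests against a set take constant time on average, so the whole pass stays linear in the number of keys.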
I'm looking for the most efficient and pythonic (mainly efficient) way to update a dictionary but keep the old values if an existing key is present. For example...
myDict1 = {'1': ('3', '2'), '3': ('2', '1'), '2': ('3', '1')}
myDict2 = {'4': ('5', '2'), '5': ('2', '4'), '2': ('5', '4')}
myDict1.update(myDict2) gives me the following:
{'1': ('3', '2'), '3': ('2', '1'), '2': ('5', '4'), '5': ('2', '4'), '4': ('5', '2')}
Notice how the key '2' exists in both dictionaries and used to have the values ('3', '1'), but now it has the values from its key in myDict2, ('5', '4')?
Is there a way to update the dictionary efficiently so that the key '2' ends up having the values ('3', '1', '5', '4')? #in no particular order
Thanks in advance.
I think the most effective way to do it would be something like this:
for k, v in myDict2.iteritems():
    myDict1[k] = myDict1.get(k, ()) + v
But there isn't an update equivalent for what you're looking to do, unfortunately.
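The loop above is written for Python 2, where dict.iteritems exists. On Python 3 the same idea can be sketched with items(), using the dictionaries from the question:

```python
myDict1 = {'1': ('3', '2'), '3': ('2', '1'), '2': ('3', '1')}
myDict2 = {'4': ('5', '2'), '5': ('2', '4'), '2': ('5', '4')}

for k, v in myDict2.items():
    # get() returns an empty tuple for keys missing from myDict1,
    # so the concatenation works for both new and existing keys.
    myDict1[k] = myDict1.get(k, ()) + v

print(myDict1['2'])
```

Because tuples are immutable, each concatenation builds a new tuple; the values already stored in myDict2 are not modified.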
What is wrong with 2 in-place update operations?
myDict2.update(myDict1)
myDict1.update(myDict2)
Explanation:
The first update will overwrite the already existing keys with the values from myDict1, and insert all key value pairs in myDict2 which don't exist.
The second update will overwrite the already existing keys in myDict1 with values from myDict2, which are actually the values from myDict1 itself due to the 1st operation. Any new key value pairs inserted will be from the original myDict2.
This of course assumes that you don't care about preserving myDict2.
Update: With Python 3.5+, you can do this without having to touch myDict2:
myDict1 = {**myDict1, **myDict2, **myDict1}
which is actually the same as
myDict1 = {**myDict2, **myDict1}
Output
{'1': ('3', '2'), '3': ('2', '1'), '2': ('3', '1'), '4': ('5', '2'), '5': ('2', '4')}
The fastest way to merge large dictionaries is to introduce an intermediate object that behaves as though the dicts are merged without actually merging them (see @Raymond Hettinger's answer):
from collections import ChainMap
class MergedMap(ChainMap):
    def __getitem__(self, key):
        result = []
        found = False
        for mapping in self.maps:
            try:
                result.extend(mapping[key])
                found = True
            except KeyError:
                pass
        return result if found else self.__missing__(key)
merged = MergedMap(myDict1, myDict2)
Whether it is applicable depends on how you want to use the combined dict later.
It uses collections.ChainMap from Python 3.3+ for convenience to provide the full MutableMapping interface; you could implement only parts that you use on older Python versions.
Perhaps a defaultdict would help
from collections import defaultdict
myDict0= {'1': ('3', '2'), '3': ('2', '1'), '2': ('3', '1')}
myDict2 = {'4': ('5', '2'), '5': ('2', '4'), '2': ('5', '4')}
myDict1 = defaultdict(list)
for (key, value) in myDict0.iteritems():
    myDict1[key].extend(value)
for (key, value) in myDict2.iteritems():
    myDict1[key].extend(value)
print myDict1
defaultdict(<type 'list'>, {'1': ['3', '2'], '3': ['2', '1'], '2': ['3', '1', '5', '4'], '5': ['2', '4'], '4': ['5', '2']})
No, there's no easy way to do it, I'm afraid.
The best way is probably iterating and merging. Something like:
for key in myDict1.iterkeys():
    # Thank you to user2246674 and Nolen Royalty to help me optimise this in their comments
    if key in myDict2:
        myDict2[key] = myDict2[key] + myDict1[key]
    else:
        myDict2[key] = myDict1[key]
print activities
activities = sorted(activities,key = lambda item:item[1])
print activities
Here, activities is a list of tuples like (start_number, finish_number). As I understand it, the output of the above code should be the list sorted in increasing order of finish_number. When I tried the above code in the shell, I got the following output. I am not sure why the second list is not sorted in increasing order of finish_number. Please help me understand this.
[('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9'), ('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16')]
[('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16'), ('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9')]
You are sorting strings instead of integers: in that case, '10' is "smaller" than '4'. To sort on integers, change it to this:
activities = sorted(activities,key = lambda item:int(item[1]))
print activities
Results in:
[('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9'), ('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16')]
Your items are being compared as strings, not as numbers. Since the character '1' comes before '4' lexicographically, it makes sense that '10' comes before '4'.
You need to cast the value to an int first:
activities = sorted(activities,key = lambda item:int(item[1]))
You are sorting strings, not numbers. Strings get sorted character by character.
So, for example, '40' is greater than '100' because the character '4' is greater than '1'.
You can fix this on the fly by simply casting the item as an integer.
activities = sorted(activities,key = lambda item: int(item[1]))
It's because you're not storing the number as a number, but as a string. The string '10' comes before the string '2'. Try:
activities = sorted(activities, key=lambda i: int(i[1]))
Look for a BROADER solution to your problem: Convert your data from str to int immediately on input, work with it as int (otherwise you'll continually be bumping into little problems like this), and format your data as str for output.
This principle applies generally, e.g. when working with non-ASCII string data, do UTF-8 -> unicode -> UTF-8; don't try to manipulate undecoded text.
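A minimal sketch of that convert-on-input, format-on-output pattern, applied to a few of the activity tuples from the question. The names raw and formatted are illustrative only:

```python
# Input arrives as strings.
raw = [('6', '10'), ('1', '4'), ('12', '16'), ('3', '9')]

# Convert to int once, at the boundary.
activities = [(int(start), int(finish)) for start, finish in raw]

# All further work uses numbers, so no conversion is needed in the key.
activities.sort(key=lambda item: item[1])

# Convert back to strings only when producing output.
formatted = [(str(start), str(finish)) for start, finish in activities]
print(formatted)
```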