Python3: sorting a list of dictionary keys - python

I have a list of 760 files, from which I extract 2 lines of data, which are then stored in a dictionary:
output = {'file number':data, '1':somedatain1, '2':somedatain2, ... '760':somedatain760}
N.B.
The numbers are strings because they have been obtained by doing an os.listdir('.') in order to get a list of the filenames and splitting the string down. [I could convert this into an integer number (using int()) if needed]
The dictionary is then printed by creating a list of the keys and iterating:
keys = output.keys()
for x in keys:
    print(x, '\t', output[x])
However the output is in a random order [because of the unordered nature of a dictionary, which is, I believe, an inherent property - although I don't know why this is] and it would be far more convenient if the output was in numerical order according to the file number. This, then throws up the question:
Given that my list of keys is either
1. keys = ['filename', '2', '555', '764' ... '10']
or, if I change the string of the file number to an integer:
2. keys = ['filename', 2, 555, 764 ... 10]
how do I sort my list of keys according to the numeric value of the file number if they are strings (as shown in 1. above), or if they are of mixed object types (i.e. 1 string and 760 integers, as shown in 2. above)?

You can give the sorted() function a key:
sorted(output, key=lambda k: int(k) if k.isdigit() else float('-inf'))
This will sort non-numeric strings before numbers, however. Note that there is no need to call dict.keys(); iterating over a dictionary already yields its keys, so just call sorted() directly on the dictionary.
Python 3 does not define ordering for strings when compared with numbers, so for any key that does not consist solely of digits, float('-inf') (negative infinity) is returned instead, which at least puts those keys at the start of the ordering.
Demo:
>>> sorted(keys, key=lambda k: int(k) if k.isdigit() else float('-inf'))
['filename', '2', '10', '555', '764']
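For the second case in the question (one string key mixed with integer keys), k.isdigit() would fail on the integers. A minimal sketch of a key function that copes with both cases, assuming every key is either convertible with int() or should be placed first; the name file_number_key is made up for illustration:

```python
def file_number_key(k):
    """Sort key: keys that are not numbers come first, then numbers by value."""
    try:
        # Works for ints and for digit strings such as '10'
        return (1, int(k))
    except (TypeError, ValueError):
        # Anything int() rejects (e.g. 'filename') sorts before all numbers
        return (0, 0)

str_keys = ['filename', '2', '555', '764', '10']
mixed_keys = ['filename', 2, 555, 764, 10]

print(sorted(str_keys, key=file_number_key))
print(sorted(mixed_keys, key=file_number_key))
```

Returning a tuple keeps every comparison between values of the same type, so Python 3 never has to compare a string with a number.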

Assign your list to another variable and reverse it with the following slice. Note that this only reverses the existing order; it does not sort by file number:
listofdict = [{'key': value1,'key': value2,.......}]
output = listofdict[::-1]
print(output)

Related

How to sort a list containing frozensets (python)

I have a list of frozensets that I'd like to sort, Each of the frozensets contains a single integer value that results from an intersection operation between two sets:
k = frozenset(w) & frozenset(string.digits)
d[k] = w # w is the value
list(d) # sorted(d) doesn't work since the keys are sets and sets are unordered.
Here is the printed list:
[frozenset({'2'}), frozenset({'1'}), frozenset({'4'}), frozenset({'3'})]
How can I sort the list using the values contained in the sets?
You need to pass sorted a key function that accepts a frozenset as its argument and returns something that can be compared. If each frozenset has exactly one element, and that element is always a single digit, then you can use the max function (it extracts that single element, since the sole element is always the biggest element of the frozenset), that is
d1 = [frozenset({'2'}), frozenset({'1'}), frozenset({'4'}), frozenset({'3'})]
d2 = sorted(d1,key=max)
print(d2)
output
[frozenset({'1'}), frozenset({'2'}), frozenset({'3'}), frozenset({'4'})]
If you want to know more read Sorting HOW TO
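The previous answers do not sort correctly when the elements are multi-digit strings, because strings compare lexicographically. Convert the element to an integer in the key instead: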
d = [frozenset({'224'}), frozenset({'346'}), frozenset({'2'}), frozenset({'22345'})]
sorted(d, key=lambda x: int(list(x)[0]))
Output:
[frozenset({'2'}),
frozenset({'224'}),
frozenset({'346'}),
frozenset({'22345'})]
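The integer-conversion key can also be written without building a temporary list. A small sketch, assuming every frozenset holds exactly one digit string; the helper name sole_int is invented here:

```python
def sole_int(fs):
    """Return the single element of fs as an int."""
    # Tuple unpacking raises ValueError unless fs has exactly one element
    (element,) = fs
    return int(element)

d = [frozenset({'224'}), frozenset({'346'}), frozenset({'2'}), frozenset({'22345'})]
result = sorted(d, key=sole_int)
print(result)
```

The unpacking doubles as a check: a frozenset with zero or several elements fails loudly instead of being sorted by an arbitrary member.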
Honestly, unless you really need to keep the elements as frozenset, the best might be to generate a list of values upstream ([2, 1, 4, 3]).
Anyway, to be able to sort the frozensets you need to make them ordered elements, for instance by converting to tuple. You can do this transparently using the key parameter of sorted
l = [frozenset({'2'}), frozenset({'1'}), frozenset({'4'}), frozenset({'3'})]
sorted(l, key=tuple)
or natsorted for strings with multiple digits:
from natsort import natsorted
l = [frozenset({'2'}), frozenset({'1'}), frozenset({'14'}), frozenset({'3'})]
natsorted(l, key=tuple)
output:
[frozenset({'1'}), frozenset({'2'}), frozenset({'3'}), frozenset({'14'})]

Sorting a list of string using a custom order stored in another list

I have a list with 2 or 3 character strings with the last character being the same.
example_list = ['h1','ee1','hi1','ol1','b1','ol1','b1']
Is there any way to sort this list using the order of another list?
order_list = ['ee','hi','h','b','ol']
So the answer should be something like example_list.sort(use_order_of=order_list)
Which should produce an output like ['ee1','hi1','h1','b1','b1','ol1','ol1']
I have found other questions on StackOverflow, but I am still unable to find an answer with a good explanation.
You could build an order_map that maps the prefixes to their sorting key, and then use that map for the key when calling sorted:
example_list = ['h1','ee1','hi1','ol1','b1','ol1','b1']
order_list = ['ee','hi','h','b','ol']
order_map = {x: i for i, x in enumerate(order_list)}
sorted(example_list, key=lambda x: order_map[x[:-1]])
This has an advantage over calling order_list.index for each element, as fetching elements from the dictionary is fast.
You can also make it work with elements that are missing from the order_list by using dict.get with a default value. If the default value is small (e.g. -1) then the values that don't appear in order_list will be put at the front of the sorted list. If the default value is large (e.g. float('inf')) then the values that don't appear in order_list will be put at the back of the sorted list.
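The dict.get variant described above might look like this. The element 'zz1' is a made-up example of a value whose prefix is missing from order_list:

```python
example_list = ['h1', 'ee1', 'zz1', 'ol1', 'b1']
order_list = ['ee', 'hi', 'h', 'b', 'ol']
order_map = {x: i for i, x in enumerate(order_list)}

# Small default: unknown prefixes sort to the front
front = sorted(example_list, key=lambda x: order_map.get(x[:-1], -1))

# Large default: unknown prefixes sort to the back
back = sorted(example_list, key=lambda x: order_map.get(x[:-1], float('inf')))

print(front)
print(back)
```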
You can use sorted with a key that looks up each element of example_list, minus its last character, in order_list:
sorted(example_list, key=lambda x: order_list.index(x[:-1]))
Output:
['ee1', 'hi1', 'h1', 'b1', 'b1', 'ol1', 'ol1']
Note that this assumes that every element of example_list, without its last character, is in order_list.
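Something like this? It has the advantage of handling duplicates. Note that zip pairs the two lists by position and stops at the shorter one, so as written this orders each element by the order_list entry that shares its index, not by its own prefix.
sorted_list = [
    i
    for i, _
    in sorted(zip(example_list, order_list), key=lambda x: x[1])
]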

Sorting a tuple list by the Counter Python

I have read and tried to implement suggestions from around Stack Overflow.
In Python 3.6+ I have a list of tuples that looks something like this:
tuple_list=[(a=3,b=gt,c=434),(a=4,b=lodf,c=We),(a=3,b=gt,c=434)]
created by
for row in result:
    tuple_list.append(var_tuple(row['d'], row['f'], row['q']))
I want to count the number of duplicates in the list and then sort the list so the number with the highest duplicates is at the top so I used
tuple_counter = collections.Counter(tuple(sorted(tup)) for tup in tuple_list)
But this returns in error because
TypeError: unorderable types: int() < str()
I've also tried this but it doesn't seem to sort by the highest counter.
tuple_counter = collections.Counter(tuple_list)
tuple_counter = sorted(tuple_counter, key=lambda x: x[1])
As well as this
tuple_counter = collections.Counter(tuple_list)
tuple_counter = tuple_counter.most_common()
Is there a better way to do this?
tuple contains different types
tuple_counter = collections.Counter(tuple(sorted(tup)) for tup in tuple_list)
This line raises an error saying that int < str cannot be ordered. Before the Counter is built, the generator expression must be evaluated, and sorted(tup) immediately breaks. Why? From the error, I am confident that tup contains both integers and strings. You can't sort integers and strings in the same list because you can't compare an integer and a string with <. If you have a way of comparing ints and strs, try sorted(tup, key=function), with your function ordering ints and strs.
Since you want to count by the number of occurrences, try this:
sorted_tuples = sorted(tuple_list, key = tuple_list.count)
This sorts the tuples using the count method of tuple_list as the key. If you want to sort descending, do sorted(tuple_list, key=tuple_list.count, reverse=True).
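Calling tuple_list.count for every element rescans the whole list each time. A sketch of the same idea with collections.Counter, which counts everything in one pass; Row is a hypothetical stand-in for the question's var_tuple, and the field values are taken from the example list:

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the question's var_tuple
Row = namedtuple('Row', ['a', 'b', 'c'])
tuple_list = [Row(3, 'gt', 434), Row(4, 'lodf', 'We'), Row(3, 'gt', 434)]

# Count whole tuples; no sorting inside each tuple is needed,
# so mixed int/str fields are never compared with each other
counts = Counter(tuple_list)

# (tuple, count) pairs, highest count first
ranked = counts.most_common()

# The full list, duplicates kept, most frequent tuples first
sorted_tuples = sorted(tuple_list, key=lambda t: counts[t], reverse=True)

print(ranked)
print(sorted_tuples)
```

If only the distinct tuples and their counts are needed, most_common() alone is enough; the sorted call is for keeping every duplicate in the result.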

repeatedly calling a method in list comprehension

Let's consider a list called 'my_list' whose contents are as follows:
['10', '100', '1,000', '10,000', '100,000']
I want to verify that my_list is a list of stringified integers, each ten times the previous one and sorted in ascending order, so here's what I do:
int_list = [int(each_int.replace(',', '')) for each_int in my_list]
boolean = all([int_list[idx] == int_list[idx-1]*10 for idx in xrange(1, len(int_list))])
My question is: will len() be called for every iteration? What is better practice in such cases?
1. assign the length to a variable and use that instead of the len() itself in the list comprehension
2. it doesn't matter, len() is executed only once for all the iterations
If 2., is it applicable to cases when, say, I am iterating through the values/keys of a dictionary of lists (or maybe just a dictionary)?
ex: d_list = [set(value) for value in my_dict.values()]
You can do it as:
my_list = ['10', '100', '1,000', '10,000', '100,000']
int_list = [int(each_int.replace(',', '')) for each_int in my_list]
>>> print all(i*10==j for i,j in zip(int_list, int_list[1:]))
True
This avoids any unnecessary repeated indexing and is also much faster, since it compares each adjacent pair directly. I have also replaced all([...]) with all(...), since all can handle generators, which saves creating a temporary list.
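On the original question of whether len() runs on every iteration: the iterable of a comprehension or generator expression is evaluated once, before the loop starts. A small check, written for Python 3 (range rather than xrange) and using a made-up wrapper, counted_len, to record the calls:

```python
calls = {'n': 0}

def counted_len(seq):
    """Behave like len() but record how often it is called."""
    calls['n'] += 1
    return len(seq)

int_list = [10, 100, 1000, 10000, 100000]

# range(1, counted_len(int_list)) is built once, then iterated
ok = all(int_list[i] == int_list[i - 1] * 10
         for i in range(1, counted_len(int_list)))

print(ok, calls['n'])
```

The same holds for my_dict.values() in the dictionary example: the call happens once, and the loop then walks the resulting object.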

concatenate the values of dictionary into single string or sequence

I have a dictionary called self.__sequences that maps each ID to a DNA sequence ("ID: DNA sequence"). The following is part of that dictionary:
{'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
'1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
'1111763': ('AGAGTTTGATCCTGGCCTT\n', '') }
I want to concatenate the values of the dictionary into one single string or sequence (no \n and no ""), that is, I want something like
"TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAAAGAGTTTGATCCTGGCTCAGATTGAAGAGTTTGATCCTGGCCTT"
I wrote the following code; however, it does not give what I want. I guess it is because each value has two elements (the DNA sequence and ""). I am struggling to improve my code. Can anyone help me make it work?
def sequence_statistics(self):
    total_len=self.__sequences.values()[0]
    for i in range(len(self.__sequences)):
        total_len += self.__sequences.values()[i]
    return total_len
This will iterate over the sorted keys of your sequences, extract the first element of each tuple in the dict and strip whitespace. Mind that dicts are unordered in Python 2.7:
''.join(self.__sequences[k][0].strip() for k in sorted(self.__sequences))
>>> d = {'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
... '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
... '1111763': ('AGAGTTTGATCCTGGCCTT\n', '') }
>>>
>>> lis = []
>>> for tup in d.values():
...     lis.append(tup[0].rstrip('\n'))
...
>>> ''.join(lis)
'AGAGTTTGATCCTGGCTCAGATTGAAGAGTTTGATCCTGGCCTTTTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA'
>>>
This is a generator that yields the first element of each value, with the "\n" stripped off:
(value[0].strip() for value in self.__sequences.values())
Since you probably want them sorted by keys, it becomes slightly more complicated:
(value[0].strip() for key, value in sorted(self.__sequences.items()))
And to turn that into a single string joined by '' (empty strings) in between, do:
''.join(value[0].strip() for key, value in sorted(self.__sequences.items()))
Try this code instead:
return "".join(v[0].strip() for k, v in self.__sequences.items())
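Pulled out of the class, the join-over-sorted-items approach can be tried on the sample data directly. This is only a sketch: the function name concat_sequences is invented, and ordering the IDs by their integer value is an assumption (plain string sorting gives the same result here because the IDs all have the same length):

```python
def concat_sequences(sequences):
    """Join the first element of every value, ordered by numeric ID."""
    ordered = sorted(sequences.items(), key=lambda item: int(item[0]))
    # value[0] is the DNA string; strip() removes the trailing newline
    return ''.join(value[0].strip() for _, value in ordered)

d = {'1111758': ('TTAGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCAGGCCTAA\n', ''),
     '1111762': ('AGAGTTTGATCCTGGCTCAGATTGA\n', ''),
     '1111763': ('AGAGTTTGATCCTGGCCTT\n', '')}

print(concat_sequences(d))
```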
