How can I improve this heavily nested for-loop? - python

I have a function which I'd like to optimize, if possible. But I cannot easily tell if there's a better way to refactor (and optimize) this...
Suppose,
keys_in_order = ['A', 'B', 'C', 'D', 'E']
key_table = {'A': {'A1': 'one', 'A2': 'two', 'A3': 'three', 'A4': 'four'},
             'B': {'B1': 'one-one', 'B2': 'two-two', 'B3': 'three-three'},
             ...  # mapping for 'C', 'D' here
             'E': {'E1': 'one-one', 'E2': 'two-two', 'E3': 'three-three', 'E6': 'six-six'}
             }
The purpose is to feed the above two parameters to the function as below:
def generate_all_possible_key_combinations(keys_in_order, key_table):
    first_key = keys_in_order[0]
    second_key = keys_in_order[1]
    third_key = keys_in_order[2]
    fourth_key = keys_in_order[3]
    fifth_key = keys_in_order[4]
    table_out = [['Demo Group', first_key, second_key, third_key, fourth_key, fifth_key]]  # just the header row so that we can write to a CSV file later
    for k1, v1 in key_table[first_key].items():
        for k2, v2 in key_table[second_key].items():
            for k3, v3 in key_table[third_key].items():
                for k4, v4 in key_table[fourth_key].items():
                    for k5, v5 in key_table[fifth_key].items():
                        demo_gp = k1 + k2 + k3 + k4 + k5
                        table_out.append([demo_gp, v1, v2, v3, v4, v5])
    return table_out
so that the goal is to have a table with all possible combinations of sub-keys (that is, 'A1B1C1D1E1', 'A1B1C1D1E2', 'A1B1C1D1E3', etc.) along with their corresponding values in key_table.
To me, the current code with five heavily nested loops through the dict key_table is ugly, not to mention inefficient computation-wise. Is there a way to improve this? I hope folks from Code Review might be able to shed some light on how I might go about it. Thank you!

I have implemented an alternative method. Consider key_table as your main dictionary.
My logic is as follows.
First, I get all the possible sub-keys from the main dict:
In [1]: [i.keys() for i in key_table.values()]
Out[1]:
[['A1', 'A3', 'A2', 'A4'],
['C3', 'C2', 'C1'],
['B1', 'B2', 'B3'],
['E6', 'E1', 'E3', 'E2'],
['D2', 'D3', 'D1']]
Then I flattened this list of lists into a single list:
In [2]: print [item for sublist in [i.keys() for i in key_table.values()] for item in sublist]
['A1', 'A3', 'A2', 'A4', 'C3', 'C2', 'C1', 'B1', 'B2', 'B3', 'E6', 'E1', 'E3', 'E2', 'D2', 'D3', 'D1']
Using itertools.combinations, I generate the combinations of all possible values. There are 5 groups, so I hard-coded the combination length as 5; you can replace that with len([i.keys() for i in key_table.values()]) if you have more groups. Here is an example of itertools.combinations, so you can see how it works:
In [83]: for i in itertools.combinations(['A1','B1','C1'],2):
....:     print i
....:
('A1', 'B1')
('A1', 'C1')
('B1', 'C1')
Here is the full code as a one-line implementation:
for item in itertools.combinations([item for sublist in [i.keys() for i in key_table.values()] for item in sublist],5):
    print ''.join(item)

Some optimizations:
The various key_table[?].items() could be computed before the nested loop
You could compute partials of demo_gp as soon as they are available: demo_gp12 = k1 + k2, demo_gp123 = demo_gp12 + k3, etc. The same could be done with the list of vs.
As @JohnColeman suggested, itertools would be a good place to look for simplifying it.
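Putting those suggestions together, here is a minimal sketch that replaces the five hand-written loops with itertools.product and computes each .items() list once up front. It assumes, as in the question, that every name in keys_in_order is a key of key_table, and it works for any number of keys rather than exactly five. Note that the result still has one row per combination, so the total work cannot drop below the size of the output; the gain is mainly in readability and flexibility.

```python
import itertools


def generate_all_possible_key_combinations(keys_in_order, key_table):
    # Header row, as in the original function (handy for writing a CSV later).
    table_out = [['Demo Group'] + list(keys_in_order)]
    # Compute each list of (sub_key, value) pairs once, before looping.
    item_lists = [list(key_table[key].items()) for key in keys_in_order]
    # product() yields tuples in the same order as the original nested loops:
    # the last key varies fastest.
    for combo in itertools.product(*item_lists):
        sub_keys, values = zip(*combo)
        table_out.append([''.join(sub_keys)] + list(values))
    return table_out


if __name__ == '__main__':
    demo_table = {'A': {'A1': 'one', 'A2': 'two'}, 'B': {'B1': 'one-one'}}
    for row in generate_all_possible_key_combinations(['A', 'B'], demo_table):
        print(row)
```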

Related

Concatenating one-dimensional numpy arrays with a variable-size numpy array in a loop

import numpy as np
nNumbers = [1,2,3]
baseVariables = ['a','b','c','d','e']
arr = np.empty(0)
for i in nNumbers:
    x = np.empty(0)
    for v in baseVariables:
        x = np.append(x, y['result'][i][v])  # y holds the parsed JSON input
    print(x)
    arr = np.concatenate((arr, x))
I have one JSON input stored in y, and I need to filter some variables out of it. The above code works in that it gives me the output in an array, but only as a one-dimensional array. I want the output in a two-dimensional array like:
[['q','qr','qe','qw','etc'],['','','','',''],['','','','','']]
I have tried various different ways but am not able to figure it out. Any feedback on how to get it to the desired output format would be greatly appreciated.
A correct basic Python way of making a nested list of strings:
In [57]: nNumbers = [1,2,3]
...: baseVariables = ['a','b','c','d','e']
In [58]: alist = []
...: for i in nNumbers:
...:     blist = []
...:     for v in baseVariables:
...:         blist.append(v+str(i))
...:     alist.append(blist)
...:
In [59]: alist
Out[59]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
That can be turned into an array if necessary - though numpy doesn't provide much added utility for strings:
In [60]: np.array(alist)
Out[60]:
array([['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']], dtype='<U2')
Or in a compact list comprehension form:
In [61]: [[v+str(i) for v in baseVariables] for i in nNumbers]
Out[61]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
You are starting with lists! And making strings! And selecting items from a JSON, with y['result'][i][v]. None of that benefits from using numpy, especially not the repeated use of np.append and np.concatenate.
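Applied to the question's own loop, the plain-list approach looks like the minimal sketch below. The question does not show the JSON, so the structure of y here (a 'result' list whose entries are dicts keyed by the variable names) is a hypothetical stand-in that only mirrors the y['result'][i][v] lookups.

```python
# Hypothetical stand-in for the parsed JSON: the real structure of y was not
# shown, so this only mirrors the y['result'][i][v] lookups in the question.
y = {'result': [
    {'a': 'q0', 'b': 'qr0', 'c': 'qe0', 'd': 'qw0', 'e': 'etc0'},
    {'a': 'q1', 'b': 'qr1', 'c': 'qe1', 'd': 'qw1', 'e': 'etc1'},
    {'a': 'q2', 'b': 'qr2', 'c': 'qe2', 'd': 'qw2', 'e': 'etc2'},
    {'a': 'q3', 'b': 'qr3', 'c': 'qe3', 'd': 'qw3', 'e': 'etc3'},
]}

nNumbers = [1, 2, 3]
baseVariables = ['a', 'b', 'c', 'd', 'e']

# One inner list per i gives the two-dimensional shape directly,
# without repeated np.append / np.concatenate calls.
rows = [[y['result'][i][v] for v in baseVariables] for i in nNumbers]
print(rows)
# If an array is really needed, convert once at the end with np.array(rows).
```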
Could you provide an example of the JSON? It sounds like you basically want to:
Filter the JSON
Flatten the JSON
Depending on what your output example means, you might want to not filter, but replace certain values with empty values, is that correct?
Please note that pandas has very powerful out-of-the-box options to handle, and in particular flatten, JSON: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-json-reader. One approach could be to first load the data into pandas and filter it from there. Flattening a JSON object can also be done by iterating over it like so:
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
I got this code from: https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10. The author explains some challenges of flattening JSON. Of course, you can add an if statement to the function for your filtering needs. I hope this can get you started at least!
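As a minimal sketch of that filtering idea, here is the same recursive flattening with an optional keep predicate on the flattened key. The keep parameter and the sample data are hypothetical illustrations, not part of the linked article; isinstance and enumerate are used in place of type() checks and a manual counter.

```python
def flatten_json(y, keep=None):
    """Flatten nested dicts/lists into one dict with '_'-joined keys.

    keep is a hypothetical optional predicate on the flattened key;
    when given, only leaves whose key satisfies it are stored.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            flat_key = name[:-1]  # drop the trailing '_'
            if keep is None or keep(flat_key):
                out[flat_key] = x

    flatten(y)
    return out


sample = {'result': [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]}
print(flatten_json(sample))
# Keep only the 'a' fields:
print(flatten_json(sample, keep=lambda k: k.endswith('_a')))
```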

Sort a sublist of elements in a list leaving the rest in place

Say I have a sorted list of strings as in:
['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
Now I want to sort based on the trailing numerical value for the Bs - so I have:
['A', 'B' , 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
One possible algorithm would be to cook up a regex like regex = re.compile(r'(B)(\d*)'), find the indices of the first and last B, slice the list, sort the slice using the regex's second group, then insert the sorted slice. However, this seems like too much of a hassle. Is there a way to write a key function that "leaves the item in place" if it does not match the regex and only sorts the items (sublists) that match?
Note: the above is just an example; I don't necessarily know the pattern (or I may want to also sort C's, or any string that has a trailing number in there). Ideally, I'm looking for an approach to the general problem of sorting only subsequences which match a given criterion (or failing that, just those that meet the specific criterion of a given prefix followed by a string of digits).
In the simple case where you just want to sort trailing digits numerically and their non-digit prefixes alphabetically, you need a key function which splits each item into non-digit and digit components as follows:
'AB123' -> ['AB', 123]
'CD' -> ['CD']
'456' -> ['', 456]
Note: In the last case, the empty string '' is not strictly necessary in CPython 2.x, as integers sort before strings – but that's an implementation detail rather than a guarantee of the language, and in Python 3.x it is necessary, because strings and integers can't be compared at all.
You can build such a key function using a list comprehension and re.split():
import re
def trailing_digits(x):
    return [
        int(g) if g.isdigit() else g
        for g in re.split(r'(\d+)$', x)
    ]
Here it is in action:
>>> s1 = ['11', '2', 'A', 'B', 'B1', 'B11', 'B2', 'B21', 'C', 'C11', 'C2']
>>> sorted(s1, key=trailing_digits)
['2', '11', 'A', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'C2', 'C11']
Once you add the restriction that only strings with a particular prefix or prefixes have their trailing digits sorted numerically, things get a little more complicated.
The following function builds and returns a key function which fulfils the requirement:
def prefixed_digits(*prefixes):
    disjunction = '|'.join('^' + re.escape(p) for p in prefixes)
    pattern = re.compile(r'(?<=%s)(\d+)$' % disjunction)
    def key(x):
        return [
            int(g) if g.isdigit() else g
            for g in re.split(pattern, x)
        ]
    return key
The main difference here is that a precompiled regex is created (containing a lookbehind constructed from the supplied prefix or prefixes), and a key function using that regex is returned.
Here are some usage examples:
>>> s2 = ['A', 'B', 'B11', 'B2', 'B21', 'C', 'C11', 'C2', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C11', 'C2', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B', 'C'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C2', 'C11', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B', 'D'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C11', 'C2', 'D2', 'D12']
If called with no arguments, prefixed_digits() returns a key function which behaves identically to trailing_digits:
>>> sorted(s1, key=prefixed_digits())
['2', '11', 'A', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'C2', 'C11']
Caveats:
Due to a restriction in Python's re module regarding lookbehind syntax (a lookbehind must match a fixed-width pattern), multiple prefixes must have the same length.
In Python 2.x, strings which are purely numeric will be sorted numerically regardless of which prefixes are supplied to prefixed_digits(). In Python 3, they'll cause an exception (except when called with no arguments, or in the special case of key=prefixed_digits('') – which will sort purely numeric strings numerically, and prefixed strings alphabetically). Fixing that may be possible with a significantly more complex regex, but I gave up trying after about twenty minutes.
If I understand correctly, your ultimate goal is to sort sub-sequences,
while leaving alone the items that are not part of the sub-sequences.
In your example, the sub-sequence is defined as items starting with "B".
Your example list happens to contain items in lexicographic order,
which is a bit too convenient,
and can be distracting from finding a generalized solution.
Let's mix things up a little by using a different example list.
How about:
['X', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'Q1', 'C11', 'C2']
Here, the items are no longer ordered (at least I tried to arrange them so that they are not): neither the ones starting with "B", nor the others.
However, the items starting with "B" still form a single contiguous sub-sequence, occupying the single range 1-6 rather than split ranges such as 0-3 and 6-7.
This again might be distracting, I will address that aspect further down.
If I understand your ultimate goal correctly, you would like this list to get sorted like this:
['X', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'Q1', 'C11', 'C2']
To make this work, we need a key function that will return a tuple, such that:
First value:
If the item doesn't start with "B", then the index in the original list (or a value in the same order)
If the item starts with "B", then the index of the last item that didn't start with "B"
Second value:
If the item doesn't start with "B", then omit this
If the item starts with "B", then the numeric value
This can be implemented like this, and with some doctests:
def order_sublist(items):
    """
    >>> order_sublist(['A', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'C1', 'C11', 'C2'])
    ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
    >>> order_sublist(['X', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'Q1', 'C11', 'C2'])
    ['X', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'Q1', 'C11', 'C2']
    """
    def key():
        ord1 = [0]
        def inner(item):
            if not item.startswith('B'):
                ord1[0] += 1
                return ord1[0],
            return ord1[0], int(item[1:] or 0)
        return inner
    return sorted(items, key=key())
In this implementation, the items get sorted by these keys:
[(1,), (1, 2), (1, 11), (1, 22), (1, 0), (1, 1), (1, 21), (2,), (3,), (4,), (5,)]
The items not starting by "B" keep their order, thanks to the first value in the key tuple, and the items starting with "B" get sorted thanks to the second value of the key tuple.
This implementation contains a few tricks that are worth explaining:
The key function returns a tuple of 1 or 2 elements, as explained earlier: the non-B items have one value, the B items have two.
The first value of the tuple is not exactly the original index, but it's good enough. The value before the first B item is 1, all the B items use the same value, and the values after the B get an incremented value every time. Since (1,) < (1, x) < (2,) where x can be anything, these keys will get sorted as we wanted them.
And now on to the "real" tricks :-)
What's up with the ord1 = [0] and ord1[0] += 1? This is a technique to change a non-local value from inside a nested function. Had I simply used ord1 = 0 and ord1 += 1, it would not work: because inner assigns to ord1, Python treats ord1 as a local variable of inner, so ord1 += 1 would raise UnboundLocalError instead of updating the outer value. But with ord1 being a list, inner never assigns to the name ord1 itself; it only reads it and modifies the list's content, which is allowed. Note that ord1 itself still cannot be reassigned: if you replaced ord1[0] += 1 with ord1 = [ord1[0] + 1], which looks equivalent, it would not work, because ord1 on the left side would again make ord1 a local variable of inner, shadowing the ord1 in the outer scope instead of modifying it.
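As an aside, Python 3 has a statement made for exactly this case: nonlocal lets inner rebind a plain integer from the enclosing scope, so the one-element list is no longer needed. A minimal Python 3-only sketch of the same key factory; like the version above, it relies on sorted() calling the key function on the items from left to right.

```python
def order_sublist(items):
    def key():
        counter = 0  # plain int, rebound inside inner() via nonlocal

        def inner(item):
            nonlocal counter
            if not item.startswith('B'):
                counter += 1
                return (counter,)
            # 'B' alone has no digits, so treat it as 0
            return (counter, int(item[1:] or 0))

        return inner

    return sorted(items, key=key())


print(order_sublist(['X', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'Q1', 'C11', 'C2']))
```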
What's up with the key and inner functions? I thought it would be neat if the key function we will pass to sorted will be reusable. This simpler version works too:
def order_sublist(items):
    ord1 = [0]
    def inner(item):
        if not item.startswith('B'):
            ord1[0] += 1
            return ord1[0],
        return ord1[0], int(item[1:] or 0)
    return sorted(items, key=inner)
The important difference is that if you wanted to use inner twice, both uses would share the same ord1 list. That can be acceptable: the counter simply keeps growing, and since Python integers have arbitrary precision, there is no risk of overflow. Still, as a matter of principle, it's nice to make the function clean and reusable by wrapping it as I did in my initial proposal. What the key function does is simply initialize ord1 = [0] in its scope, define the inner function, and return it. This way ord1 is effectively private, thanks to the closure. Every time you call key(), it returns a function with its own private, fresh ord1 value.
Last but not least, notice the doctests: the """ ... """ docstring is more than just documentation, it contains executable tests. The >>> lines are code to execute in a Python shell, and the following lines are the expected output. If you have this program in a file called script.py, you can run the tests with python -m doctest script.py. When all tests pass, you get no output. When a test fails, you get a nice report. It's a great way to verify that your program works, through demonstrated examples. You can have multiple test cases to cover interesting corner cases. In this example there are two test cases: your original sorted input, and the modified unsorted input.
However, @zero-piraeus made an interesting remark:
I can see that your solution relies on sorted() scanning the list left-to-right (which is reasonable – I can't imagine TimSort is going to be replaced or radically changed any time soon – but not guaranteed by Python AFAIK, and there are sorting algorithms that don't work like that).
I tried to be self-critical and question whether relying on left-to-right scanning is reasonable.
But I think it is.
After all, the sorting really happens based on the keys,
not the actual values.
I think most likely Python does something like this:
Take a list of the key values with [key(value) for value in input], visiting the values from left to right.
zip the list of keys with the original items
Apply whatever sorting algorithm on the zipped list, comparing items by the first value of the zip, and swapping items
At the end, return the sorted items with return [t[1] for t in zipped]
When building the list of key values,
it could work on multiple threads,
let's say two, with the first thread populating the first half and the second thread populating the second half in parallel.
That would mess up the ord1[0] += 1 trick.
But I doubt it does this kind of optimization,
as it simply seems overkill.
But to eliminate any shadow of doubt,
we can follow this alternative implementation strategy ourselves,
though the solution becomes a bit more verbose:
def order_sublist(items):
    """
    >>> order_sublist(['A', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'C1', 'C11', 'C2'])
    ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
    >>> order_sublist(['X', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'Q1', 'C11', 'C2'])
    ['X', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'Q1', 'C11', 'C2']
    """
    ord1 = 0
    zipped = []
    for item in items:
        if not item.startswith('B'):
            ord1 += 1
        zipped.append((ord1, item))
    def key(item):
        if not item[1].startswith('B'):
            return item[0],
        return item[0], int(item[1][1:] or 0)
    return [v for _, v in sorted(zipped, key=key)]
Do note that thanks to the doctests,
we have an easy way to verify that the alternative implementation still works as before.
What if you wanted this example list:
['X', 'B', 'B1', 'B11', 'B2', 'B22', 'C', 'Q1', 'C11', 'C2', 'B21']
To get sorted like this:
['X', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'Q1', 'C11', 'C2', 'B22']
That is, the items starting with "B" sorted by their numeric value,
even when they don't form a contiguous sub-sequence?
That won't be possible with a magical key function.
It certainly is possible though, with some more legwork.
You could:
Create a list with the original indexes of the items starting with "B"
Create a list with the items starting with "B" and sort it with whatever way you like
Write back the content of the sorted list at the original indexes
If you need help with this last implementation, let me know.
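Here is a minimal sketch of those three steps, assuming (as in the example) that the items to sort are the ones starting with "B" and that the rest of each such string is an optional integer; the function name is illustrative.

```python
def sort_b_items_in_place(items):
    # 1. Original indexes of the items starting with "B".
    indexes = [i for i, item in enumerate(items) if item.startswith('B')]
    # 2. Those items, sorted by their numeric suffix ('B' alone counts as 0).
    sorted_b = sorted((items[i] for i in indexes),
                      key=lambda item: int(item[1:] or 0))
    # 3. Write them back at the original indexes; other items never move.
    for i, item in zip(indexes, sorted_b):
        items[i] = item
    return items


print(sort_b_items_in_place(
    ['X', 'B', 'B1', 'B11', 'B2', 'B22', 'C', 'Q1', 'C11', 'C2', 'B21']))
```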
Most of the answers focused on the B's, while I needed a more general solution, as noted. Here's one:
import re
def _order_by_number(items):
    regex = re.compile(r'(.*?)(\d*)$')  # pass as an argument for generality
    keys = {k: regex.match(k) for k in items}
    keys = {k: (v.groups()[0], int(v.groups()[1] or 0))
            for k, v in keys.items()}
    items.sort(key=keys.__getitem__)
However, I am still looking for a magic key that would leave stuff in place.
You can use the natsort module:
>>> from natsort import natsorted
>>>
>>> a = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
>>> natsorted(a)
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C2', 'C11']
If the elements that are to be sorted are all adjacent to each other in the list:
In Python 2, you can use cmp in the sorted() function instead of key:
s1=['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
def compare(a,b):
    if (a[0],b[0])==('B','B'): #change to whichever condition you'd like
        inta=int(a[1:] or 0)
        intb=int(b[1:] or 0)
        return cmp(inta,intb) #change to whichever mode of comparison you'd like
    else:
        return 0 #if one of a, b doesn't fulfill the condition, do nothing
sorted(s1,cmp=compare)
This assumes transitivity for the comparator, which is not true for a more general case. This is also much slower than using key, but the advantage is that it can take context into account (to a small extent).
If the elements that are to be sorted are not all adjacent to each other in the list:
You could generalise the comparison-type sorting algorithms by checking every other element in the list, and not just neighbours:
s1=['11', '2', 'A', 'B', 'B11', 'B21', 'B1', 'B2', 'C', 'C11', 'C2', 'B09','C8','B19']
def cond1(a): #change this to whichever condition you'd like
    return a[0]=='B'
def comparison(a,b): #change this to whichever type of comparison you'd like to make
    inta=int(a[1:] or 0)
    intb=int(b[1:] or 0)
    return cmp(inta,intb)
def n2CompareSort(alist,condition,comparison):
    for i in xrange(len(alist)):
        for j in xrange(i):
            if condition(alist[i]) and condition(alist[j]):
                if comparison(alist[i],alist[j])==-1:
                    alist[i], alist[j] = alist[j], alist[i] #in-place swap
n2CompareSort(s1,cond1,comparison)
I don't think that any of this is less of a hassle than making a separate list/tuple, but it is "in-place" and leaves elements that don't fulfill our condition untouched.
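The code above is Python 2 only (xrange and the cmp builtin are gone in Python 3). A minimal Python 3 sketch of the same quadratic idea, with the comparison written as a plain "less than" test instead of cmp; the function names are illustrative.

```python
def is_b(item):
    # Condition selecting the elements that take part in the sort.
    return item.startswith('B')


def b_less(a, b):
    # Compare numeric suffixes; 'B' alone has no digits, so treat it as 0.
    return int(a[1:] or 0) < int(b[1:] or 0)


def n2_subset_sort(alist, condition, less):
    # For each position i, compare with every earlier position j and swap
    # whenever alist[i] < alist[j]; on the selected elements this behaves
    # like insertion sort. Unselected elements are never touched. O(n**2).
    for i in range(len(alist)):
        for j in range(i):
            if condition(alist[i]) and condition(alist[j]):
                if less(alist[i], alist[j]):
                    alist[i], alist[j] = alist[j], alist[i]


s1 = ['11', '2', 'A', 'B', 'B11', 'B21', 'B1', 'B2', 'C', 'C11', 'C2', 'B09', 'C8', 'B19']
n2_subset_sort(s1, is_b, b_less)
print(s1)
```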
You can use the following key function. It will return a tuple of the form (letter, number) if there is a number, or of the form (letter,) if there is no number. This works since ('A',) < ('A', 1).
import re
a = ['A', 'B' ,'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
regex = re.compile(r'(\d+)')
def order(e):
    num = regex.findall(e)
    if num:
        num = int(num[0])
        return e[0], num
    return e,
print(sorted(a, key=order))
>> ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C2', 'C11']
If I'm understanding your question correctly, you are trying to sort a list by two attributes: the letter and the trailing number.
You could just do something like
data = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
data.sort(key=lambda elem: (elem[0], int(elem[1:])))
but since this would throw an exception for elements without a trailing number, we can go ahead and make a function instead (we shouldn't be using lambda anyway!)
def sortKey(elem):
    try:
        attribute = (elem[0], int(elem[1:]))
    except ValueError:
        # no trailing number (e.g. 'B'): treat it as 0
        attribute = (elem[0], 0)
    return attribute
With this sorting key function, we can sort the list in place with
data.sort(key=sortKey)
Also, you could adjust the sortKey function to give priority to certain letters if you wanted to.
To answer precisely what you describe, you can do this:
l = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2', 'D']
def custom_sort(data, c):
    # index of the first item starting with c
    s = next(i for i, x in enumerate(data) if x.startswith(c))
    # index of the first later item not starting with c (end of list if none)
    e = next((i for i, x in enumerate(data) if not x.startswith(c) and i > s), len(data))
    return data[:s] + sorted(data[s:e], key=lambda d: int(d[1:] or -1)) + data[e:]
print(custom_sort(l, "B"))
If you want a complete sort, you can simply do this (as @Mike JS Choi answered, but simpler):
output = sorted(l, key=lambda elem: (elem[0], int(elem[1:] or -1)))
You can use ord() to transform, for example, 'B11' into a numerical value:
cells = ['B11', 'C1', 'A', 'B1', 'B2', 'B21', 'B22', 'C11', 'C2', 'B']
conv_cells = []
## Transform each expression into a numerical value.
for x, cell in enumerate(cells):
    val = ord(cell[0]) * (ord(cell[0]) - 65)  ## Weight the letter so that the letter order is respected.
    if len(cell) > 1:
        val += int(cell[1:])
    conv_cells.append((val, x))  ## List of tuples (num_val, index).
## Display result.
for x in sorted(conv_cells):
    print(str(cells[x[1]]) + ' - ' + str(x[0]))
If you wish to sort with different rules for different subgroups, you may use tuples as sorting keys. In this case, items are grouped and sorted layer by layer: first by the first tuple item, then within each subgroup by the second tuple item, and so on. This allows us to have different sorting rules in different subgroups. The only limitation is that keys must be comparable within each group: for example, in Python 3 you cannot have int and str keys in the same subgroup, but you can have them in different subgroups.
Let's try to apply it to the task. We will prepare tuples with element types (str, int) for B elements, and tuples with (str, str) for all others.
def sorter(elem):
    letter, num = elem[0], elem[1:]
    if letter == 'B':
        return letter, int(num or 0)  # hack - if we've got `''` as num, replace it with `0`
    else:
        return letter, num
data = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
sorted(data, key=sorter)
# returns
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
UPDATE
If you prefer it in one line:
data = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
sorted(data, key=lambda elem: (elem[0], int(elem[1:] or 0) if elem[0] == 'B' else elem[1:]))
# result
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
Anyway, these key functions are quite simple, so you can adapt them to your real needs.
import numpy as np
def sort_with_prefix(items, prefix):
    alist = np.array(items)
    # positions of the items that start with prefix
    ix = np.where([l.startswith(prefix) for l in items])
    # sort their numeric suffixes and write them back at the same positions
    alist[ix] = [prefix + str(n or '')
                 for n in np.sort([int(l.split(prefix)[-1] or 0)
                                   for l in alist[ix]])]
    return alist.tolist()
For example:
l = ['A', 'B', 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
print(sort_with_prefix(l, 'B'))
>> ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
Using just key and the precondition that the sequence is already 'sorted':
import re
s = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
def subgroup_ordinate(element):
    # Split the sequence element values into groups and ordinal values.
    # use a simple regex and int() in this case
    m = re.search('(B)(.+)', element)
    if m:
        subgroup = m.group(1)
        ordinate = int(m.group(2))
    else:
        subgroup = element
        ordinate = None
    return (subgroup, ordinate)
print sorted(s, key=subgroup_ordinate)
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
The subgroup_ordinate() function does two things: it identifies the groups to be sorted and determines the ordinal number within each group. This example uses a regular expression, but the function could be arbitrarily complex. For example, we can change the pattern to ur'(B|C)(.+)' to sort both the B and C sequences:
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C2', 'C11']
Reading the bounty question carefully I note the requirement 'sorts some values while leaving others "in place"'. Defining the comparison function to return 0 for elements that are not in subgroups would leave these elements where they were in the sequence.
s2 = ['X', 'B', 'B1', 'B2', 'B11', 'B21', 'A', 'C', 'C1', 'C2', 'C11']
def compare((_a,a),(_b,b)):
    return 0 if a is None or b is None else cmp(a,b)
print sorted(s2, compare, subgroup_ordinate)
['X', 'B', 'B1', 'B2', 'B11', 'B21', 'A', 'C', 'C1', 'C2', 'C11']
import re
from collections import OrderedDict
a = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
groups = OrderedDict()  # letter -> sorted list of digit lists
def get_str(item):
    _str = list(map(str, re.findall(r"[A-Za-z]", item)))
    return _str
def get_digit(item):
    _digit = list(map(int, re.findall(r"\d+", item)))
    return _digit
for item in a:
    _str = get_str(item)
    groups[_str[0]] = sorted([get_digit(dig) for dig in a if _str[0] in dig])
nested_result = [[("{0}{1}".format(k, v[0]) if v else k) for v in groups[k]] for k in groups.keys()]
print(nested_result)
# >>> [['A'], ['B', 'B1', 'B2', 'B11', 'B21', 'B22'], ['C', 'C1', 'C2', 'C11']]
result = []
for k in groups.keys():
    for v in groups[k]:
        result.append("{0}{1}".format(k, v[0]) if v else k)
print(result)
# >>> ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C2', 'C11']
If you want to sort an arbitrary subset of elements while leaving other elements in place, it can be useful to design a view over the original list.
The idea of a view in general is that it's like a lens over the original list, but modifying it will manipulate the underlying original list.
Consider this helper class:
class SubList:
    def __init__(self, items, predicate):
        self.items = items
        self.indexes = [i for i in range(len(items)) if predicate(items[i])]
    @property
    def values(self):
        return [self.items[i] for i in self.indexes]
    def sort(self, key):
        for i, v in zip(self.indexes, sorted(self.values, key=key)):
            self.items[i] = v
The constructor saves the original list in self.items, and the original indexes in self.indexes, as determined by predicate. In your examples, the predicate function can be this:
def predicate(item):
    return item.startswith('B')
Then, the values property is the lens over the original list,
returning a list of values picked from the original list by the original indexes.
Finally, the sort function uses self.values to sort,
and then modifies the original list.
Consider this demo with doctests:
def demo(values):
    """
    >>> demo(['X', 'b3', 'a', 'b1', 'b2'])
    ['X', 'b1', 'a', 'b2', 'b3']
    """
    def predicate(item):
        return item.startswith('b')
    sub = SubList(values, predicate)
    def key(item):
        return int(item[1:])
    sub.sort(key)
    return values
Notice how SubList is used only as a tool through which to manipulate the input values. After the sub.sort call, values is modified, with elements to sort selected by the predicate function, and sorted according to the key function, and all other elements never moved.
Using this SubList helper with appropriate predicate and key functions,
you can sort an arbitrary selection of elements of a list.
def compound_sort(input_list, natural_sort_prefixes=()):
    padding = '{:0>%s}' % len(max(input_list, key=len))
    return sorted(
        input_list,
        key = lambda li: \
            ''.join(
                [li for c in '_' if not li.startswith(natural_sort_prefixes)] or
                [c for c in li if not c.isdigit()] + \
                [c for c in padding.format(li) if c.isdigit()]
            )
    )
This sort method receives:
input_list: The list to be sorted,
natural_sort_prefixes: A string or a tuple of strings.
List items targeted by the natural_sort_prefixes will be sorted naturally. Items not matching those prefixes will be sorted lexicographically.
This method assumes that the list items are structured as one or more non-numerical characters followed by one or more digits.
It should be more performant than solutions using regex, and doesn't depend on external libraries.
You can use it like:
print compound_sort(['A', 'B' , 'B11', 'B1', 'B2', 'C11', 'C2'], natural_sort_prefixes=("A","B"))
# ['A', 'B', 'B1', 'B2', 'B11', 'C11', 'C2']

Create a table from a list of tuples in Python 3

I have homework where I have to take a list containing tuples and print out a table. For example, the list might look like this:
data = [('Item1', a1, b1, c1, d1, e1, f1),
('Item2', a2, b2, c2, d2, e2, f2),
('Item3', a3, b3, c3, d3, e3, f3)]
I would have to print out this:
Item1 Item2 Item3
DataA: a1 a2 a3
DataB: b1 b2 b3
DataC: c1 c2 c3
DataD: d1 d2 d3
DataE: e1 e2 e3
DataF: f1 f2 f3
I have initialised a list:
data_headings = ['','DataA:','DataB','DataC:','DataD:','DataE':,'DataF:']
My teacher has also given us the option to use a function he created:
display_with_padding(str):
print("{0: <15}".format(s), end = '')
Some guidance with how to do this will be much appreciated. I've been playing with this for the past day and I still am unable to work it out.
def display_with_padding(s):
    print("{0: <15}".format(s), end='')
def print_row(iterable):
    for x in iterable:
        display_with_padding(x)
    print()
def main():
    data = [
        ('Item1', 'a1', 'b1', 'c1', 'd1', 'e1', 'f1'),
        ('Item2', 'a2', 'b2', 'c2', 'd2', 'e2', 'f2'),
        ('Item3', 'a3', 'b3', 'c3', 'd3', 'e3', 'f3')
    ]
    col_headers = [''] + [x[0] for x in data]  # Build headers
    print_row(col_headers)
    labels = ['DataA:','DataB:','DataC:','DataD:','DataE:','DataF:']
    # Build each row
    rows = []
    for row_num, label in enumerate(labels, start=1):
        content = [label]
        for col in data:
            content.append(col[row_num])
        rows.append(content)
    for row in rows:
        print_row(row)
if __name__ == '__main__':
    main()
The trick is to iterate over each element of the table in some order. Rows first or columns first?
Because you're printing line by line, you must iterate over the rows first, and look up the corresponding value in each column. Observe that data is a list of columns, while the second list provides a label for each row. I've renamed it to row_labels in the following demonstration.
def display_with_padding(s):
    print("{0: <15}".format(s), end = '')
data = [('Item1', 'a1', 'b1', 'c1', 'd1', 'e1', 'f1'),
        ('Item2', 'a2', 'b2', 'c2', 'd2', 'e2', 'f2'),
        ('Item3', 'a3', 'b3', 'c3', 'd3', 'e3', 'f3')]
row_labels = ['', 'DataA:', 'DataB:', 'DataC:', 'DataD:', 'DataE:', 'DataF:']
for row_index, label in enumerate(row_labels):
    display_with_padding(label)
    for column in data:
        display_with_padding(column[row_index])
    print()
Note the use of enumerate() to get the index and value at the same time as we iterate over row_labels.
Oh, and I fixed a few bugs in the code you posted. There were a couple of problems with the function definition (missing def and bad parameter name) plus a syntax error in the row labels. I renamed the row labels, too.
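A variation on the same rows-first idea swaps in zip(*data), which regroups the list of columns into rows so you don't need row_index at all. This is only a sketch: format_table is a hypothetical helper name, and it returns the padded lines as strings (using the same "{0: <15}" format) instead of printing cell by cell.

```python
def format_table(data, row_labels):
    """Return the table as a list of strings, one per printed line.

    data is a list of columns (tuples); zip(*data) yields the i-th entry
    of every column together, i.e. one row of the printed table.
    """
    lines = []
    for label, row in zip(row_labels, zip(*data)):
        cells = [label] + list(row)
        lines.append("".join("{0: <15}".format(cell) for cell in cells))
    return lines


data = [('Item1', 'a1', 'b1', 'c1', 'd1', 'e1', 'f1'),
        ('Item2', 'a2', 'b2', 'c2', 'd2', 'e2', 'f2'),
        ('Item3', 'a3', 'b3', 'c3', 'd3', 'e3', 'f3')]
row_labels = ['', 'DataA:', 'DataB:', 'DataC:', 'DataD:', 'DataE:', 'DataF:']

for line in format_table(data, row_labels):
    print(line)
```

Returning strings rather than printing directly makes the layout easy to check, and the caller still decides where the output goes.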

Using a Python list comprehension a bit like a zip

Ok, so I'm really bad at writing Python list comprehensions with more than one "for," but I want to get better at it. I want to know for sure whether or not the line
>>> [S[j]+str(i) for i in range(1,11) for j in range(3) for S in "ABCD"]
can be amended to return something like ["A1","B1","C1","D1","A2","B2","C2","D2"...(etc.)]
and if not, if there is a list comprehension that can return the same list, namely, a list of strings of all of the combinations of "ABCD" and the numbers from 1 to 10.
You have too many loops there. You don't need j at all.
This does the trick:
[S+str(i) for i in range(1,11) for S in "ABCD"]
I like to read a list comprehension with more than one for loop as a set of nested loops: treat each subsequent for as a loop nested inside the one before it, and it becomes a whole lot easier. To add to Daniel's answer:
[S+str(i) for i in range(1,11) for S in "ABCD"]
is nothing more than:
new_loop = []
for i in range(1, 11):
    for S in "ABCD":
        new_loop.append(S + str(i))
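To check that reading, the sketch below builds the list both ways and compares the results; the variable names are just for illustration.

```python
# Comprehension: the leftmost "for" is the outermost loop.
from_comprehension = [S + str(i) for i in range(1, 11) for S in "ABCD"]

# The same thing written as explicit nested loops.
from_loops = []
for i in range(1, 11):      # outer loop, like the first "for"
    for S in "ABCD":        # inner loop, like the second "for"
        from_loops.append(S + str(i))

print(from_comprehension == from_loops)  # True
print(from_comprehension[:5])            # ['A1', 'B1', 'C1', 'D1', 'A2']
```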
You may use itertools.product like this:
import itertools
print([item[1] + str(item[0]) for item in itertools.product(range(1, 11), "ABCD")])
Output
['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'D2', 'A3', 'B3', 'C3', 'D3', 'A4',
'B4', 'C4', 'D4', 'A5', 'B5', 'C5', 'D5', 'A6', 'B6', 'C6', 'D6', 'A7', 'B7',
'C7', 'D7', 'A8', 'B8', 'C8', 'D8', 'A9', 'B9', 'C9', 'D9', 'A10', 'B10', 'C10',
'D10']
Every time you think of combining all the elements of one iterable with all the elements of another, think itertools.product. It computes the Cartesian product of two (or more) iterables.
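A short sketch of how itertools.product lines up with nested loops: the first argument plays the role of the outer loop and the rightmost iterable advances fastest. It also takes more than two iterables, which replaces deeper nesting.

```python
import itertools

numbers = range(1, 11)
letters = "ABCD"

# product(a, b) yields pairs in the same order as
# "for x in a: for y in b:"; the rightmost iterable changes fastest.
pairs = list(itertools.product(numbers, letters))
nested = [(n, s) for n in numbers for s in letters]
print(pairs == nested)  # True
print(pairs[:3])        # [(1, 'A'), (1, 'B'), (1, 'C')]

# More than two iterables: one call instead of three nested loops.
triples = list(itertools.product("AB", "xy", "12"))
print(len(triples))     # 8
print(triples[:2])      # [('A', 'x', '1'), ('A', 'x', '2')]
```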
I've found a solution that is slightly faster than the ones presented here so far, and more than 2x faster than Daniel's solution (although his solution looks far more elegant):
import itertools
[x + y for (x,y) in (itertools.product('ABCD', map(str,range(1,5))))]
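The difference here is that I convert the ints to strings with map up front. Because itertools.product reads each input iterable into a tuple before pairing, each number is converted only once, rather than once per letter as with str(i) inside the comprehension.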
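Note that the snippet above uses range(1, 5) and passes 'ABCD' first, so it yields 'A1', 'A2', ... rather than the 'A1', 'B1', ... order asked for. A sketch of the same convert-once idea that keeps the requested order and the full 1 to 10 range swaps the arguments to product and the order of concatenation (no timings are claimed here):

```python
import itertools

# Convert each number to a string once, up front. Numbers are the first
# argument to product(), so they change slowest, giving the
# 'A1', 'B1', 'C1', 'D1', 'A2', ... order from the question.
number_strings = list(map(str, range(1, 11)))
result = [letter + num
          for (num, letter) in itertools.product(number_strings, "ABCD")]

print(result[:6])   # ['A1', 'B1', 'C1', 'D1', 'A2', 'B2']
print(len(result))  # 40
```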
And a general tip when dealing with complex comprehensions:
When your comprehension has lots of for clauses and lots of conditionals, break it into several lines, like this:
[S[j]+str(i) for i in range(1,11)
             for j in range(3)
             for S in "ABCD"]
In this case the gain in readability isn't that big, but when you have lots of conditionals and lots of for clauses, it makes a big difference. It's exactly like writing nested for loops and if statements, but without the ":" and the indentation.
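For instance, here is a comprehension with two for clauses and two if clauses (the filter, which keeps even numbers and skips 'B', is made up purely for illustration), split one clause per line next to the equivalent nested statements:

```python
# One clause per line: each "for" or "if" sits inside the clause above it,
# exactly like the nested statements below.
filtered = [S + str(i)
            for i in range(1, 11)
            if i % 2 == 0
            for S in "ABCD"
            if S != "B"]

# The equivalent nested statements, in the same order.
filtered_loops = []
for i in range(1, 11):
    if i % 2 == 0:
        for S in "ABCD":
            if S != "B":
                filtered_loops.append(S + str(i))

print(filtered == filtered_loops)  # True
print(filtered[:3])                # ['A2', 'C2', 'D2']
```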
Here is the same code using regular for loops:
ans = []
for i in range(1,11):
    for j in range(3):
        for S in "ABCD":
            ans.append(S[j] + str(i))
Almost the same thing :)
Why not use itertools.product?
>>> import itertools
>>> [ i[0] + str(i[1]) for i in itertools.product('ABCD', range(1,5))]
['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'B3', 'B4', 'C1', 'C2', 'C3', 'C4', 'D1', 'D2', 'D3', 'D4']

Sorting dictionary keys based on their values

I have a Python dictionary set up like so:
mydict = { 'a1': ['g',6],
'a2': ['e',2],
'a3': ['h',3],
'a4': ['s',2],
'a5': ['j',9],
'a6': ['y',7] }
I need to write a function that returns the keys as an ordered list, depending on which column you're sorting on. For example, if we're sorting on mydict[key][1] (ascending),
I should receive a list back like so:
['a2', 'a4', 'a3', 'a1', 'a6', 'a5']
It mostly works, except when multiple keys have the same value in the sort column, e.g. 'a2': ['e',2] and 'a4': ['s',2]. In that case it returns the list like so:
['a4', 'a4', 'a3', 'a1', 'a6', 'a5']
Here's the function I've defined:
def itlist(table_dict,column_nb,order="A"):
    try:
        keys = table_dict.keys()
        values = [i[column_nb-1] for i in table_dict.values()]
        combo = zip(values,keys)
        valkeys = dict(combo)
        sortedCols = sorted(values) if order=="A" else sorted(values,reverse=True)
        sortedKeys = [valkeys[i] for i in sortedCols]
    except (KeyError, IndexError), e:
        pass
    return sortedKeys
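The duplicate 'a4' comes from valkeys = dict(combo): a dict holds only one entry per key, so when two rows share the value 2, the later pair overwrites the earlier one. A minimal demonstration, assuming Python 3.7+ so the dict iterates in insertion order (in the Python 2 code above, which key survives depends on the arbitrary dict order):

```python
mydict = {'a1': ['g', 6], 'a2': ['e', 2], 'a3': ['h', 3],
          'a4': ['s', 2], 'a5': ['j', 9], 'a6': ['y', 7]}

values = [row[1] for row in mydict.values()]  # [6, 2, 3, 2, 9, 7]
valkeys = dict(zip(values, mydict.keys()))    # value -> key

# Only one entry survives for the value 2: 'a4' overwrote 'a2'.
print(valkeys[2])    # a4
print(len(valkeys))  # 5, not 6

# Looking keys up by sorted value therefore repeats 'a4'.
print([valkeys[v] for v in sorted(values)])
# ['a4', 'a4', 'a3', 'a1', 'a6', 'a5']
```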
If I want to sort on the numbers column, for example, I call it like so:
sortedkeysasc = itmethods.itlist(table,2)
So any suggestions?
Paul
Wouldn't it be much easier to use
sorted(d, key=lambda k: d[k][1])
(with d being the dictionary)?
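Ties are not a problem with this approach because it never maps values back to keys, and Python's sort is stable: keys with equal sort values keep their relative order from the input, and reverse=True preserves that as well. A sketch, assuming Python 3.7+ where dict iteration follows insertion order:

```python
d = {'a1': ['g', 6], 'a2': ['e', 2], 'a3': ['h', 3],
     'a4': ['s', 2], 'a5': ['j', 9], 'a6': ['y', 7]}

# Iterating a dict yields its keys; key= picks the column to sort on.
ascending = sorted(d, key=lambda k: d[k][1])
print(ascending)   # ['a2', 'a4', 'a3', 'a1', 'a6', 'a5']

# 'a2' and 'a4' tie on 2 and stay in input order, even when reversed.
descending = sorted(d, key=lambda k: d[k][1], reverse=True)
print(descending)  # ['a5', 'a6', 'a1', 'a3', 'a2', 'a4']
```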
>>> L = sorted(d.items(), key=lambda item: item[1][1])
>>> L
[('a2', ['e', 2]), ('a4', ['s', 2]), ('a3', ['h', 3]), ('a1', ['g', 6]), ('a6', ['y', 7]), ('a5', ['j', 9])]
>>> list(map(lambda item: item[0], L))
['a2', 'a4', 'a3', 'a1', 'a6', 'a5']
Here you sort the dictionary items (key-value pairs) using a key callable that establishes an order on the items.
Then you extract just the keys, using map with a lambda that selects the first element of each pair. That gives you the needed list of keys.
EDIT: see this answer for a much better solution.
Although there are multiple working answers above, a slight variation / combination of them is the most pythonic to me:
[k for (k, v) in sorted(mydict.items(), key=lambda item: item[1][1])]
>>> mydict = { 'a1': ['g',6],
... 'a2': ['e',2],
... 'a3': ['h',3],
... 'a4': ['s',2],
... 'a5': ['j',9],
... 'a6': ['y',7] }
>>> sorted(mydict, key=lambda k:mydict[k][1])
['a2', 'a4', 'a3', 'a1', 'a6', 'a5']
>>> sorted(mydict, key=lambda k:mydict[k][0])
['a2', 'a1', 'a3', 'a5', 'a4', 'a6']
def itlist(table_dict, col, desc=False):
    # Sort (key, value) pairs on column col (1-based) and keep only the keys
    return [key for (key, val) in
            sorted(
                table_dict.items(),
                key=lambda x: x[1][col - 1],
                reverse=desc,
            )
           ]
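A self-contained usage check of this itlist, written for Python 3 (so it uses items() and the reverse= keyword of sorted), sorting on either column in either direction:

```python
def itlist(table_dict, col, desc=False):
    """Return the keys of table_dict ordered by column col (1-based)."""
    return [key for (key, val) in
            sorted(table_dict.items(),
                   key=lambda x: x[1][col - 1],
                   reverse=desc)]


mydict = {'a1': ['g', 6], 'a2': ['e', 2], 'a3': ['h', 3],
          'a4': ['s', 2], 'a5': ['j', 9], 'a6': ['y', 7]}

print(itlist(mydict, 2))             # ['a2', 'a4', 'a3', 'a1', 'a6', 'a5']
print(itlist(mydict, 1))             # ['a2', 'a1', 'a3', 'a5', 'a4', 'a6']
print(itlist(mydict, 2, desc=True))  # ['a5', 'a6', 'a1', 'a3', 'a2', 'a4']
```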
