Create list of lists from list based on criteria


I have a list of strings that I am trying to convert into a list of lists based on when a specific character appears in the list. Below is an example:
I am starting with the following list:
lst = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz']
I want to convert lst into a list of lists where each list begins where there is a string that starts with "a" and ends one element before the next string that starts with "a". The result should look like this:
new_lst = [['ab', 'c1', 'cd', 'd2'], ['a1', 'b1', 'c1'], ['ax', 'by', 'cz', 'dzz']]
What I tried was to find the index of every element that begins with "a", which I did with the following code: indices = [idx for idx, x in enumerate(lst) if x.startswith('a')]. This gives the position of each string that matches the criterion, i.e. [0, 4, 7].
Then I looked into splitting the list using the ranges created from the indices. So split at ranges (0,3), (4,6), and (7,10). I've been at it for hours and I can't figure out how to do this dynamically. Couldn't find any solutions online either. I was wondering if anyone could help me with this. Or perhaps my approach wasn't the most ideal from the start.

import numpy as np
lst = ["ab", "c1", "cd", "d2", "a1", "b1", "c1", "ax", "by", "cz", "dzz"]
indices = [idx for idx, x in enumerate(lst) if x.startswith("a")]
print([each_split.tolist() for each_split in np.split(lst, indices) if len(each_split)])

NumPy does the job, but your approach was also good! Besides, it's more interesting to work on your own code than to just hand you the solution.
As you said, you just have to iterate through your list of indices and create ranges. To do this, first add the end of the list to it:
idx = [i for i, x in enumerate(lst) if x.startswith('a')]
idx.append(len(lst))  # add the end of the list as the last boundary
print(idx)
>> [0, 4, 7, 11]
Then you just have to construct your ranges:
new_lst = []
# the idea is to iterate only over [0, 4, 7]
# to create the ranges [(0, 4), (4, 7), (7, 11)]
# in Python, lst[0:4] takes indexes 0, 1, 2, 3 but not 4
for i in range(len(idx) - 1):
    new_lst.append(lst[idx[i]:idx[i + 1]])
print(new_lst)
>> [['ab', 'c1', 'cd', 'd2'], ['a1', 'b1', 'c1'], ['ax', 'by', 'cz', 'dzz']]
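The same range idea can be written more compactly by pairing each boundary with the next one through zip(). This is a sketch; split_at and its starts_new parameter are illustrative names, not part of the code above. It also always treats index 0 as a boundary, so items that come before the first string starting with "a" are kept as their own group instead of being dropped.

```python
def split_at(items, starts_new):
    """Split items into sublists, starting a new sublist at each item
    for which starts_new(item) is true."""
    # Boundaries: index 0 (so leading items are never lost), every match, and the end.
    bounds = [i for i, x in enumerate(items) if i == 0 or starts_new(x)]
    bounds.append(len(items))
    # Slice between consecutive boundaries: (0, 4), (4, 7), (7, 11), ...
    return [items[s:e] for s, e in zip(bounds, bounds[1:])]


lst = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz']
print(split_at(lst, lambda s: s.startswith('a')))
# [['ab', 'c1', 'cd', 'd2'], ['a1', 'b1', 'c1'], ['ax', 'by', 'cz', 'dzz']]
```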

Try numpy.split:
import numpy as np
idx = [i for i, a in enumerate(lst) if a.lower().startswith('a')]
new_lst = np.split(lst, idx)
[array([], dtype='<U3'),
array(['ab', 'c1', 'cd', 'd2'], dtype='<U3'),
array(['a1', 'b1', 'c1'], dtype='<U3'),
array(['ax', 'by', 'cz', 'dzz'], dtype='<U3')]

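def listoflists(l):
    new_lst = []
    a = []
    c = 0  # number of items starting with 'a' seen so far
    for i in l:
        if i[0] == 'a':
            c += 1
            if c < 2:
                a.append(i)
            else:
                new_lst.append(a)
                a = []
                a.append(i)
        else:
            a.append(i)
    new_lst.append(a)
    return new_lst

l = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz', 'an', 'bw', 'ey']
print(listoflists(l))
# [['ab', 'c1', 'cd', 'd2'], ['a1', 'b1', 'c1'], ['ax', 'by', 'cz', 'dzz'], ['an', 'bw', 'ey']]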
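The counter c in listoflists() effectively numbers the groups. As a sketch of an alternative (split_on_prefix is a hypothetical name), itertools.accumulate can compute that running group number for every item, and itertools.groupby can then collect consecutive items that share it:

```python
from itertools import accumulate, groupby


def split_on_prefix(items, prefix='a'):
    """Group items so that each group starts at an item beginning with prefix."""
    # Running count of prefix matches: it goes up by one at each new group,
    # playing the same role as the counter c in listoflists().
    group_ids = accumulate(int(x.startswith(prefix)) for x in items)
    pairs = zip(group_ids, items)
    # Consecutive pairs with the same group id form one sublist.
    return [[x for _, x in grp] for _, grp in groupby(pairs, key=lambda p: p[0])]


l = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz', 'an', 'bw', 'ey']
print(split_on_prefix(l))
# [['ab', 'c1', 'cd', 'd2'], ['a1', 'b1', 'c1'], ['ax', 'by', 'cz', 'dzz'], ['an', 'bw', 'ey']]
```

Items that appear before the first match get group id 0 and therefore end up in a leading group of their own.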

Related

Concatenating one-dimensional NumPy arrays with a variable-size NumPy array in a loop

nNumbers = [1,2,3]
baseVariables = ['a','b','c','d','e']
arr = np.empty(0)
for i in nNumbers:
    x = np.empty(0)
    for v in baseVariables:
        x = np.append(x, y['result'][i][v])
    print(x)
    arr = np.concatenate((arr, x))
I have a JSON input stored in y, and I need to filter some variables out of it. The code above works in that it gives me the output in an array, but only in a one-dimensional array. I want the output in a two-dimensional array like:
[['q','qr','qe','qw','etc'], ['','','','',''], ['','','','','']]
I have tried various different ways but am not able to figure it out. Any feedback on how to get it to the desired output format would be greatly appreciated.

A correct basic Python way of making a nested list of strings:
In [57]: nNumbers = [1,2,3]
    ...: baseVariables = ['a','b','c','d','e']
In [58]: alist = []
    ...: for i in nNumbers:
    ...:     blist = []
    ...:     for v in baseVariables:
    ...:         blist.append(v+str(i))
    ...:     alist.append(blist)
    ...:
In [59]: alist
Out[59]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
That can be turned into an array if necessary, though NumPy doesn't add much utility for strings:
In [60]: np.array(alist)
Out[60]:
array([['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']], dtype='<U2')
Or in a compact list comprehension form:
In [61]: [[v+str(i) for v in baseVariables] for i in nNumbers]
Out[61]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
You are starting with lists! And making strings! And selecting items from a JSON, with y['result'][i][v]. None of that benefits from using numpy, especially not the repeated use of np.append and np.concatenate.
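Applied to the loop in the question, the same nested comprehension could look like this. It is only a sketch: the contents of y below are made up, assuming y['result'] can be indexed with the integers in nNumbers (for example a list, or a dict with integer keys as here) and that each entry is a dict keyed by the variable names.

```python
# Hypothetical stand-in for the JSON data in the question.
y = {
    'result': {
        1: {'a': 'q', 'b': 'qr', 'c': 'qe', 'd': 'qw', 'e': 'etc'},
        2: {'a': '', 'b': '', 'c': '', 'd': '', 'e': ''},
        3: {'a': '', 'b': '', 'c': '', 'd': '', 'e': ''},
    }
}

nNumbers = [1, 2, 3]
baseVariables = ['a', 'b', 'c', 'd', 'e']

# One inner list per number, one entry per variable: no np.append needed.
rows = [[y['result'][i][v] for v in baseVariables] for i in nNumbers]
print(rows)
# [['q', 'qr', 'qe', 'qw', 'etc'], ['', '', '', '', ''], ['', '', '', '', '']]
```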

Could you provide an example of the JSON? It sounds like you basically want to:
Filter the JSON
Flatten the JSON
Depending on what your output example means, you might not want to filter at all, but rather replace certain values with empty values. Is that correct?
Please note that Pandas has very powerful out-of-the-box options to handle and, in particular, flatten JSON: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-json-reader. One approach could be to first load the data into Pandas and filter it from there. Flattening a JSON object can also be done by iterating over it like so:
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
I got this code from: https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10. The author explains some of the challenges of flattening JSON. Of course, you can add if statements to the function for your filtering needs. I hope this at least gets you started!
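One way to add such a filter, as a sketch: the keep parameter below is a hypothetical addition (it is not in the linked article's code) that receives each flattened key and decides whether its value is stored. The recursion is otherwise the same as above, written with isinstance() and enumerate().

```python
def flatten_json(y, keep=lambda key: True):
    """Flatten nested dicts/lists into one dict with underscore-joined keys.

    keep(key) is a hypothetical filter hook: only keys for which it
    returns True end up in the result.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for k in x:
                flatten(x[k], name + str(k) + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            key = name[:-1]  # drop the trailing underscore
            if keep(key):
                out[key] = x

    flatten(y)
    return out


data = {'id': 7, 'result': {'a': 'q', 'b': ['x', 'y']}}
print(flatten_json(data))
# {'id': 7, 'result_a': 'q', 'result_b_0': 'x', 'result_b_1': 'y'}
print(flatten_json(data, keep=lambda key: key.startswith('result_')))
# {'result_a': 'q', 'result_b_0': 'x', 'result_b_1': 'y'}
```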

indexing multidimensional array in python

I have an array like below and I want to keep only the 3rd element for each row.
Input
array([['a', 'b', 'c'], ['a1', 'b1', 'c1'], ..., ['an', 'bn', 'cn']])
Expected output
array(['c', 'c1', ..., 'cn'])
I tried it using a for loop but it is taking too long. Is there a faster way to do this? Can someone please help?

Assuming that you mean a list when you say array, this would be the code:
>>> my_list = [['a', 'b', 'c'], ['a1', 'b1', 'c1'], ['an', 'bn', 'cn']]
>>> [x[2] for x in my_list]
['c', 'c1', 'cn']
>>>

The code
[x[2] for x in my_list]
pointed out by Gab is correct. However, if you are using a NumPy array (here called x), basic slicing will probably be more efficient:
x[:, 2]

How to remove empty strings from the end of a list of strings (only at the end)

I have some lists of strings. I want to remove empty strings from the end of each list (i.e. each list should end with a non-empty element).
input
list1= ['a1','b1','c1','d1','']
list2 = ['a2','','b2','','c2','d2','']
list3 = ['a3','','b3','','','']
list4 = ['','','','','']
output
list1= ['a1','b1','c1','d1']
list2 = ['a2','','b2','','c2','d2']
list3 = ['a3','','b3']
list4 = ['']
If all the elements are empty strings, only one empty string should remain (e.g. list4).

You can use a generator expression with enumerate over the reversed list to find the first index, counting from the end, that holds a non-empty string. By using next, we only need to iterate until the first non-empty string is found:
def trim_empty_end(l):
    last_ix = next((ix for ix, i in enumerate(l[::-1]) if i), len(l) - 1)
    return l[:len(l) - last_ix]
trim_empty_end(['a1','b1','c1','d1',''])
# ['a1', 'b1', 'c1', 'd1']
trim_empty_end(['a2','','b2','','c2','d2',''])
# ['a2', '', 'b2', '', 'c2', 'd2']
trim_empty_end(['a3','','b3','','',''])
# ['a3', '', 'b3']
trim_empty_end(['','','','',''])
# ['']

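This is one approach using str methods. It joins each list with a separator, strips the separators left at the end by the trailing empty strings, and splits again, so it assumes that no element contains the characters * or #.
Ex:
list1= ['a1','b1','c1','d1','']
list2 = ['a2','','b2','','c2','d2','']
list3 = ['a3','','b3','','','']
list4 = ['','','','','']
data = [list1, list2, list3, list4]
result = ["*#*".join(i).rstrip("*#").split("*#*") for i in data]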
print(result)
Output:
[['a1', 'b1', 'c1', 'd1'],
['a2', '', 'b2', '', 'c2', 'd2'],
['a3', '', 'b3'],
['']]

You can use recursion:
def remove_empty(l):
    # check the length first so that an empty list doesn't raise IndexError
    if len(l) <= 1 or l[-1] != "":
        return l
    return remove_empty(l[:-1])
print(remove_empty(list1)) # ['a1', 'b1', 'c1', 'd1']
print(remove_empty(list2)) # ['a2', '', 'b2', '', 'c2', 'd2']
print(remove_empty(list3)) # ['a3', '', 'b3']
print(remove_empty(list4)) # ['']

def trim_empty_strings(l):
    while len(l) > 1 and l[-1] == '':
        l = l[:-1]
    return l
trim_empty_strings(['a', 'b', '', ''])  # ['a', 'b']
trim_empty_strings(['', '', ''])        # ['']

The easiest way (in my opinion):
def remove_empty_at_end(l: list):
    i = len(l) - 1
    # `not l[i]` also treats None, 0, etc. as empty; use l[i] == "" to match only empty strings.
    # Checking i first keeps an empty list from raising IndexError.
    while i > 0 and not l[i]:
        i -= 1
    return l[:i + 1]
It's simple and avoids creating a new copy of l on every step: only the final slice is copied.
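If you don't need a new list at all, the same backwards walk can truncate the list in place with a slice deletion. A sketch (trim_in_place is an illustrative name); note that, unlike the functions above, it mutates its argument:

```python
def trim_in_place(l):
    """Remove trailing empty strings from l in place, keeping at least one item."""
    i = len(l) - 1
    # Walk back over trailing '' entries, but never past index 0.
    while i > 0 and l[i] == '':
        i -= 1
    del l[i + 1:]  # truncates the original list object
    return l


list3 = ['a3', '', 'b3', '', '', '']
trim_in_place(list3)
print(list3)  # ['a3', '', 'b3']
```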

list1 = ['a1','b1','c1','d1','']
list2 = ['a2','','b2','','c2','d2','']
list3 = ['a3','','b3','','','']
list4 = ['','','','','']
data = [list1, list2, list3, list4]
# i.e. data = [['a1','b1','c1','d1',''], ['a2','','b2','','c2','d2',''], ['a3','','b3','','',''], ['','','','','']]
for mysublist in data:
    while len(mysublist) > 1 and mysublist[-1].rstrip() == "":
        del mysublist[-1]
For every sublist in data, remove the last item if it is empty (or whitespace-only, because of rstrip()) and it is not the only* item.
Keep doing this while there are still empty items at the end of the sublist. The sublists are modified in place.
(* As the question states: if all the elements are empty strings, only one empty string should remain.)

Sort a sublist of elements in a list leaving the rest in place

Say I have a sorted list of strings as in:
['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
Now I want to sort the Bs based on their trailing numerical value, so that I get:
['A', 'B' , 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
One possible algorithm would be to hash up a regex like regex = re.compile(r'(B)(\d*)'), find the indices of the first and last B, slice the list, sort the slice using the regex's second group, then insert the sorted slice. However, this seems like too much of a hassle. Is there a way to write a key function that "leaves the item in place" if it does not match the regex and only sorts the items (sublists) that match?
Note: the above is just an example; I don't necessarily know the pattern (or I may want to also sort C's, or any string that has a trailing number in there). Ideally, I'm looking for an approach to the general problem of sorting only subsequences which match a given criterion (or failing that, just those that meet the specific criterion of a given prefix followed by a string of digits).

In the simple case where you just want to sort trailing digits numerically and their non-digit prefixes alphabetically, you need a key function which splits each item into non-digit and digit components as follows (re.split() also leaves a harmless trailing empty string after a matched number):
'AB123' -> ['AB', 123, '']
'CD' -> ['CD']
'456' -> ['', 456, '']
Note: In the last case, the leading empty string '' is not strictly necessary in CPython 2.x, as integers sort before strings, but that's an implementation detail rather than a guarantee of the language. In Python 3.x it is necessary, because strings and integers can't be compared at all.
You can build such a key function using a list comprehension and re.split():
import re

def trailing_digits(x):
    return [
        int(g) if g.isdigit() else g
        for g in re.split(r'(\d+)$', x)
    ]
Here it is in action:
>>> s1 = ['11', '2', 'A', 'B', 'B1', 'B11', 'B2', 'B21', 'C', 'C11', 'C2']
>>> sorted(s1, key=trailing_digits)
['2', '11', 'A', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'C2', 'C11']
Once you add the restriction that only strings with a particular prefix or prefixes have their trailing digits sorted numerically, things get a little more complicated.
The following function builds and returns a key function which fulfils the requirement:
def prefixed_digits(*prefixes):
    disjunction = '|'.join('^' + re.escape(p) for p in prefixes)
    pattern = re.compile(r'(?<=%s)(\d+)$' % disjunction)

    def key(x):
        return [
            int(g) if g.isdigit() else g
            for g in re.split(pattern, x)
        ]

    return key
The main difference here is that a precompiled regex is created (containing a lookbehind constructed from the supplied prefix or prefixes), and a key function using that regex is returned.
Here are some usage examples:
>>> s2 = ['A', 'B', 'B11', 'B2', 'B21', 'C', 'C11', 'C2', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C11', 'C2', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B', 'C'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C2', 'C11', 'D12', 'D2']
>>> sorted(s2, key=prefixed_digits('B', 'D'))
['A', 'B', 'B2', 'B11', 'B21', 'C', 'C11', 'C2', 'D2', 'D12']
If called with no arguments, prefixed_digits() returns a key function which behaves identically to trailing_digits:
>>> sorted(s1, key=prefixed_digits())
['2', '11', 'A', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'C2', 'C11']
Caveats:
Due to a restriction in Python's re module regarding lookbehind syntax, multiple prefixes must have the same length.
In Python 2.x, strings which are purely numeric will be sorted numerically regardless of which prefixes are supplied to prefixed_digits(). In Python 3, they'll cause an exception (except when called with no arguments, or in the special case of key=prefixed_digits('') – which will sort purely numeric strings numerically, and prefixed strings alphabetically). Fixing that may be possible with a significantly more complex regex, but I gave up trying after about twenty minutes.

If I understand correctly, your ultimate goal is to sort sub-sequences,
while leaving alone the items that are not part of the sub-sequences.
In your example, the sub-sequence is defined as items starting with "B".
Your example list happens to contain items in lexicographic order,
which is a bit too convenient,
and can be distracting from finding a generalized solution.
Let's mix things up a little by using a different example list.
How about:
['X', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'Q1', 'C11', 'C2']
Here, the items are no longer ordered (at least I tried to organize them so that they are not), neither the ones starting with "B", nor the others.
However, the items starting with "B" still form a single contiguous sub-sequence, occupying the single range 1-6 rather than being split into separate ranges such as 0-3 and 6-7.
This again might be distracting, I will address that aspect further down.
If I understand your ultimate goal correctly, you would like this list to get sorted like this:
['X', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'Q1', 'C11', 'C2']
To make this work, we need a key function that will return a tuple, such that:
First value:
If the item doesn't start with "B", then the index in the original list (or a value in the same order)
If the item starts with "B", then the index of the last item that didn't start with "B"
Second value:
If the item doesn't start with "B", then omit this
If the item starts with "B", then the numeric value
This can be implemented like this, and with some doctests:
def order_sublist(items):
    """
    >>> order_sublist(['A', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'C1', 'C11', 'C2'])
    ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
    >>> order_sublist(['X', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'Q1', 'C11', 'C2'])
    ['X', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'Q1', 'C11', 'C2']
    """
    def key():
        ord1 = [0]

        def inner(item):
            if not item.startswith('B'):
                ord1[0] += 1
                return ord1[0],
            return ord1[0], int(item[1:] or 0)

        return inner

    return sorted(items, key=key())
In this implementation, the items get sorted by these keys:
[(1,), (1, 2), (1, 11), (1, 22), (1, 0), (1, 1), (1, 21), (2,), (3,), (4,), (5,)]
The items not starting by "B" keep their order, thanks to the first value in the key tuple, and the items starting with "B" get sorted thanks to the second value of the key tuple.
This implementation contains a few tricks that are worth explaining:
The key function returns a tuple of 1 or 2 elements, as explained earlier: the non-B items have one value, the B items have two.
The first value of the tuple is not exactly the original index, but it's good enough. The value before the first B item is 1, all the B items use the same value, and the values after the B get an incremented value every time. Since (1,) < (1, x) < (2,) where x can be anything, these keys will get sorted as we wanted them.
And now on to the "real" tricks :-)
What's up with ord1 = [0] and ord1[0] += 1? This is a technique to change a value that lives in an enclosing function's scope. Using simply ord1 = 0 and ord1 += 1 would not work: because ord1 is assigned inside inner, Python treats it as a local variable of inner, so ord1 += 1 raises UnboundLocalError instead of updating the outer variable. Without a global (or, in Python 3, nonlocal) declaration, the outer name can be read but not reassigned. Since ord1 is a list, inner never reassigns the name; it only modifies the list's content, which works. Note that ord1 itself still cannot be reassigned: if you replaced ord1[0] += 1 with ord1 = [ord1[0] + 1], which computes the same value, it would not work, because ord1 on the left side would then be a local variable shadowing the ord1 in the outer scope.
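In Python 3, the nonlocal statement lets inner rebind the enclosing variable directly, which makes the list wrapper unnecessary. A sketch of the simpler version rewritten that way, with the same keys and a plain integer counter:

```python
def order_sublist(items):
    """
    >>> order_sublist(['X', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'Q1', 'C11', 'C2'])
    ['X', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'Q1', 'C11', 'C2']
    """
    ord1 = 0

    def inner(item):
        nonlocal ord1  # rebind the enclosing variable instead of creating a local one
        if not item.startswith('B'):
            ord1 += 1
            return (ord1,)
        return (ord1, int(item[1:] or 0))

    return sorted(items, key=inner)
```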
What's up with the key and inner functions? I thought it would be neat if the key function we pass to sorted were reusable. This simpler version works too:
def order_sublist(items):
    ord1 = [0]

    def inner(item):
        if not item.startswith('B'):
            ord1[0] += 1
            return ord1[0],
        return ord1[0], int(item[1:] or 0)

    return sorted(items, key=inner)
The important difference is that if you wanted to use inner twice, both uses would share the same ord1 list, so the counter would not restart from 0 for the second use. That happens to be harmless here, since only the relative order of the keys matters (and Python integers don't overflow), but as a matter of principle it's nice to make the function clean and reusable by wrapping it as I did in my initial proposal. What the key function does is simply initialize ord1 = [0] in its scope, define the inner function, and return it. This way ord1 is effectively private, thanks to the closure: every time you call key(), it returns a function with its own fresh ord1 value.
Last but not least, notice the doctests: the """ ... """ docstring is more than just documentation, it's executable tests. The >>> lines are code to execute in a Python shell, and the following lines are the expected output. If you have this program in a file called script.py, you can run the tests with python -m doctest script.py. When all tests pass, you get no output. When a test fails, you get a nice report. It's a great way to verify that your program works, through demonstrated examples. You can have multiple test cases, separated by blank lines, to cover interesting corner cases. In this example there are two test cases, with your original sorted input, and the modified unsorted input.
However, @zero-piraeus made an interesting remark:
I can see that your solution relies on sorted() scanning the list left-to-right (which is reasonable – I can't imagine TimSort is going to be replaced or radically changed any time soon – but not guaranteed by Python AFAIK, and there are sorting algorithms that don't work like that).
I tried to be self-critical and question whether assuming a left-to-right scan is reasonable.
But I think it is.
After all, the sorting really happens based on the keys,
not the actual values.
I think most likely Python does something like this:
Take a list of the key values with [key(value) for value in input], visiting the values from left to right.
zip the list of keys with the original items
Apply whatever sorting algorithm on the zipped list, comparing items by the first value of the zip, and swapping items
At the end, return the sorted items with return [t[1] for t in zipped]
When building the list of key values,
it could work on multiple threads,
let's say two, with the first thread populating the first half and the second thread populating the second half in parallel.
That would mess up the ord1[0] += 1 trick.
But I doubt it does this kind of optimization,
as it simply seems overkill.
But to eliminate any shadow of doubt,
we can follow this alternative implementation strategy ourselves,
though the solution becomes a bit more verbose:
def order_sublist(items):
    """
    >>> order_sublist(['A', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'C1', 'C11', 'C2'])
    ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
    >>> order_sublist(['X', 'B2', 'B11', 'B22', 'B', 'B1', 'B21', 'C', 'Q1', 'C11', 'C2'])
    ['X', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'Q1', 'C11', 'C2']
    """
    ord1 = 0
    zipped = []
    for item in items:
        if not item.startswith('B'):
            ord1 += 1
        zipped.append((ord1, item))

    def key(item):
        if not item[1].startswith('B'):
            return item[0],
        return item[0], int(item[1][1:] or 0)

    return [v for _, v in sorted(zipped, key=key)]
Do note that thanks to the doctests,
we have an easy way to verify that the alternative implementation still works as before.
What if you wanted this example list:
['X', 'B', 'B1', 'B11', 'B2', 'B22', 'C', 'Q1', 'C11', 'C2', 'B21']
To get sorted like this:
['X', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'Q1', 'C11', 'C2', 'B22']
That is, the items starting with "B" sorted by their numeric value,
even when they don't form a contiguous sub-sequence?
That won't be possible with a magical key function.
It certainly is possible though, with some more legwork.
You could:
Create a list with the original indexes of the items starting with "B"
Create a list with the items starting with "B" and sort it with whatever way you like
Write back the content of the sorted list at the original indexes
If you need help with this last implementation, let me know.
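A sketch of those three steps, assuming the same "B" prefix and int(item[1:] or 0) numeric key used above; sort_selected is a hypothetical helper name:

```python
def sort_selected(items, selected, key):
    """Return a copy of items in which the elements matching selected() are
    sorted among themselves by key; all other elements keep their positions."""
    # 1. Original indexes of the matching items.
    positions = [i for i, x in enumerate(items) if selected(x)]
    # 2. The matching items, sorted however you like.
    ordered = sorted((items[i] for i in positions), key=key)
    # 3. Write them back at the original indexes.
    result = list(items)
    for i, x in zip(positions, ordered):
        result[i] = x
    return result


items = ['X', 'B', 'B1', 'B11', 'B2', 'B22', 'C', 'Q1', 'C11', 'C2', 'B21']
print(sort_selected(items, lambda s: s.startswith('B'), key=lambda s: int(s[1:] or 0)))
# ['X', 'B', 'B1', 'B2', 'B11', 'B21', 'C', 'Q1', 'C11', 'C2', 'B22']
```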

Most of the answers focused on the B's, while I needed a more general solution, as noted. Here's one:
import re

def _order_by_number(items):
    regex = re.compile(r'(.*?)(\d*)$')  # pass as an argument for generality
    keys = {k: regex.match(k) for k in items}
    keys = {k: (v.groups()[0], int(v.groups()[1] or 0))
            for k, v in keys.items()}
    items.sort(key=keys.__getitem__)
However, I am still looking for a magic key function that would leave the other items in place.

You can use the natsort module:
>>> from natsort import natsorted
>>>
>>> a = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
>>> natsorted(a)
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C2', 'C11']

If the elements that are to be sorted are all adjacent to each other in the list:
You can use cmp in the sorted() function instead of key (Python 2 only):
s1=['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
def compare(a,b):
    if (a[0],b[0])==('B','B'):  #change to whichever condition you'd like
        inta=int(a[1:] or 0)
        intb=int(b[1:] or 0)
        return cmp(inta,intb)  #change to whichever mode of comparison you'd like
    else:
        return 0  #if one of a, b doesn't fulfill the condition, do nothing
sorted(s1,cmp=compare)
This assumes transitivity for the comparator, which is not true for a more general case. This is also much slower than using key, but the advantage is that it can take context into account (to a small extent).
If the elements that are to be sorted are not all adjacent to each other in the list:
You could generalise the comparison-type sorting algorithms by checking every other element in the list, and not just neighbours:
s1=['11', '2', 'A', 'B', 'B11', 'B21', 'B1', 'B2', 'C', 'C11', 'C2', 'B09','C8','B19']
def cond1(a):  #change this to whichever condition you'd like
    return a[0]=='B'
def comparison(a,b):  #change this to whichever type of comparison you'd like to make
    inta=int(a[1:] or 0)
    intb=int(b[1:] or 0)
    return cmp(inta,intb)
def n2CompareSort(alist,condition,comparison):
    for i in xrange(len(alist)):
        for j in xrange(i):
            if condition(alist[i]) and condition(alist[j]):
                if comparison(alist[i],alist[j])==-1:
                    alist[i], alist[j] = alist[j], alist[i]  #in-place swap
n2CompareSort(s1,cond1,comparison)
I don't think that any of this is less of a hassle than making a separate list/tuple, but it is "in-place" and leaves elements that don't fulfill our condition untouched.
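The snippets above are Python 2: in Python 3, sorted() no longer takes cmp, and xrange() and the cmp() builtin are gone. A Python 3 port of the in-place quadratic version could look like this, as a sketch; (a > b) - (a < b) is the usual stand-in for cmp(a, b), and everything else follows the original.

```python
def cond1(a):
    # Which items take part in the sort (here: items starting with 'B').
    return a[0] == 'B'


def comparison(a, b):
    # Compare by the trailing number; a bare 'B' counts as 0.
    inta = int(a[1:] or 0)
    intb = int(b[1:] or 0)
    return (inta > intb) - (inta < intb)  # -1, 0 or 1, like Python 2's cmp()


def n2CompareSort(alist, condition, comparison):
    # Quadratic insertion-by-swapping restricted to the selected items;
    # items failing condition are never moved.
    for i in range(len(alist)):
        for j in range(i):
            if condition(alist[i]) and condition(alist[j]):
                if comparison(alist[i], alist[j]) == -1:
                    alist[i], alist[j] = alist[j], alist[i]  # in-place swap


s1 = ['11', '2', 'A', 'B', 'B11', 'B21', 'B1', 'B2', 'C', 'C11', 'C2', 'B09', 'C8', 'B19']
n2CompareSort(s1, cond1, comparison)
print(s1)
# ['11', '2', 'A', 'B', 'B1', 'B2', 'B09', 'B11', 'C', 'C11', 'C2', 'B19', 'C8', 'B21']
```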

You can use the following key function. It will return a tuple of the form (letter, number) if there is a number, or of the form (letter,) if there is no number. This works since ('A',) < ('A', 1).
import re
a = ['A', 'B' ,'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
regex = re.compile(r'(\d+)')

def order(e):
    num = regex.findall(e)
    if num:
        num = int(num[0])
        return e[0], num
    return e,

print(sorted(a, key=order))
>> ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C2', 'C11']

If I'm understanding your question correctly, you are trying to sort a list by two attributes: the letter and the trailing number.
You could just do something like
data = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
data.sort(key=lambda elem: (elem[0], int(elem[1:])))
but since this would throw an exception for elements without a trailing number, we can go ahead and write a named function instead (which is more readable here anyway):
def sortKey(elem):
    try:
        attribute = (elem[0], int(elem[1:]))
    except ValueError:
        # no trailing number, e.g. 'B'
        attribute = (elem[0], 0)
    return attribute
With this sort key function, we can sort the list in place with
data.sort(key=sortKey)
Also, you could adjust the sortKey function to give priority to certain letters if you wanted to.

To answer precisely what you describe, you can do this:
l = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2', 'D']
def custom_sort(data, c):
    s = next(i for i, x in enumerate(data) if x.startswith(c))
    # first index after s that doesn't start with c, or the end of the list
    e = next((i for i, x in enumerate(data) if not x.startswith(c) and i > s), len(data))
    return data[:s] + sorted(data[s:e], key=lambda d: int(d[1:] or -1)) + data[e:]
print(custom_sort(l, "B"))
If you want a complete sort, you can simply do this (as @Mike JS Choi answered, but simpler):
output = sorted(l, key=lambda elem: (elem[0], int(elem[1:] or -1)))

You can use ord() to transform, for example, 'B11' into a numerical value:
cells = ['B11', 'C1', 'A', 'B1', 'B2', 'B21', 'B22', 'C11', 'C2', 'B']
conv_cells = []
# Transform each cell into a numerical value.
for x, cell in enumerate(cells):
    # Weight the letter so that its values stay apart from the next letter's
    # (this only holds while the trailing numbers stay small).
    val = ord(cell[0]) * (ord(cell[0]) - 65)
    if len(cell) > 1:
        val += int(cell[1:])
    conv_cells.append((val, x))  # list of tuples (num_val, index)
# Display the result.
for x in sorted(conv_cells):
    print(str(cells[x[1]]) + ' - ' + str(x[0]))

If you wish to sort with different rules for different subgroups, you may use tuples as sort keys. In this case items are grouped and sorted layer by layer: first by the first tuple item, then within each subgroup by the second tuple item, and so on. This allows different sorting rules in different subgroups. The only limitation is that keys must be comparable within each group: for example, you cannot have int and str keys in the same subgroup, but you can have them in different subgroups.
Let's try to apply this to the task. We will prepare tuples with element types (str, int) for B elements, and (str, str) for all others.
def sorter(elem):
    letter, num = elem[0], elem[1:]
    if letter == 'B':
        return letter, int(num or 0)  # hack - if we've got `''` as num, replace it with `0`
    else:
        return letter, num
data = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
sorted(data, key=sorter)
# returns
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
UPDATE
If you prefer it in one line:
data = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
sorted(data, key=lambda elem: (elem[0], int(elem[1:] or 0) if elem[0] == 'B' else elem[1:]))
# result
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
Anyway, these key functions are quite simple, so you can adapt them to your real needs.

import numpy as np

def sort_with_prefix(items, prefix):
    alist = np.array(items)
    ix = np.where([l.startswith(prefix) for l in items])
    # Rebuild the prefixed strings from their sorted numeric suffixes
    # (so a zero-padded item such as 'B09' would come back as 'B9').
    alist[ix] = [prefix + str(n or '')
                 for n in np.sort([int(l.split(prefix)[-1] or 0)
                                   for l in alist[ix]])]
    return alist.tolist()
For example:
l = ['A', 'B', 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
print(sort_with_prefix(l, 'B'))
>> ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']

Using just key and the precondition that the sequence is already 'sorted' (the code below is Python 2):
import re
s = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
def subgroup_ordinate(element):
    # Split the sequence element values into groups and ordinal values.
    # use a simple regex and int() in this case
    m = re.search('(B)(.+)', element)
    if m:
        subgroup = m.group(1)
        ordinate = int(m.group(2))
    else:
        subgroup = element
        ordinate = None
    return (subgroup, ordinate)
print sorted(s, key=subgroup_ordinate)
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
The subgroup_ordinate() function does two things: it identifies the groups to be sorted and determines the ordinal number within each group. This example uses a regular expression, but the function could be arbitrarily complex. For example, we can change the pattern to r'(B|C)(.+)' and sort both the B and C sequences:
['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C2', 'C11']
Reading the bounty question carefully I note the requirement 'sorts some values while leaving others "in place"'. Defining the comparison function to return 0 for elements that are not in subgroups would leave these elements where they were in the sequence.
s2 = ['X', 'B', 'B1', 'B2', 'B11', 'B21', 'A', 'C', 'C1', 'C2', 'C11']
def compare((_a,a),(_b,b)):
    return 0 if a is None or b is None else cmp(a,b)
print sorted(s2, compare, subgroup_ordinate)
['X', 'B', 'B1', 'B2', 'B11', 'B21', 'A', 'C', 'C1', 'C2', 'C11']

import re
from collections import OrderedDict

a = ['A', 'B' , 'B1', 'B11', 'B2', 'B21', 'B22', 'C', 'C1', 'C11', 'C2']
groups = OrderedDict()  # letter -> sorted list of digit lists

def get_str(item):
    _str = list(map(str, re.findall(r"[A-Za-z]", item)))
    return _str

def get_digit(item):
    _digit = list(map(int, re.findall(r"\d+", item)))
    return _digit

for item in a:
    _str = get_str(item)
    groups[_str[0]] = sorted([get_digit(dig) for dig in a if _str[0] in dig])

nested_result = [[("{0}{1}".format(k, v[0]) if v else k) for v in groups[k]] for k in groups.keys()]
print(nested_result)
# >>> [['A'], ['B', 'B1', 'B2', 'B11', 'B21', 'B22'], ['C', 'C1', 'C2', 'C11']]
result = []
for k in groups.keys():
    for v in groups[k]:
        result.append("{0}{1}".format(k, v[0]) if v else k)
print(result)
# >>> ['A', 'B', 'B1', 'B2', 'B11', 'B21', 'B22', 'C', 'C1', 'C2', 'C11']

If you want to sort an arbitrary subset of elements while leaving other elements in place, it can be useful to design a view over the original list.
The idea of a view in general is that it's like a lens over the original list, but modifying it will manipulate the underlying original list.
Consider this helper class:
class SubList:
    def __init__(self, items, predicate):
        self.items = items
        self.indexes = [i for i in range(len(items)) if predicate(items[i])]

    @property
    def values(self):
        return [self.items[i] for i in self.indexes]

    def sort(self, key):
        for i, v in zip(self.indexes, sorted(self.values, key=key)):
            self.items[i] = v
The constructor saves the original list in self.items, and the original indexes in self.indexes, as determined by predicate. In your examples, the predicate function can be this:
def predicate(item):
    return item.startswith('B')
Then, the values property is the lens over the original list,
returning a list of values picked from the original list by the original indexes.
Finally, the sort function uses self.values to sort,
and then modifies the original list.
Consider this demo with doctests:
def demo(values):
    """
    >>> demo(['X', 'b3', 'a', 'b1', 'b2'])
    ['X', 'b1', 'a', 'b2', 'b3']
    """
    def predicate(item):
        return item.startswith('b')

    sub = SubList(values, predicate)

    def key(item):
        return int(item[1:])

    sub.sort(key)
    return values
Notice how SubList is used only as a tool through which to manipulate the input values. After the sub.sort call, values has been modified in place: the elements selected by the predicate function are sorted according to the key function, and all other elements stay where they were.
Using this SubList helper with appropriate predicate and key functions,
you can sort an arbitrary selection of elements of a list.
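As a sketch of that, here is the same helper (with the @property decorator, and enumerate in the constructor) applied to the list used in the compound_sort example, so that only the items starting with 'B' are reordered. The key function natural_b_key is a hypothetical name, and it assumes every selected item is 'B' followed by optional digits:

```python
class SubList:
    """View over the elements of `items` selected by `predicate` (same helper as above)."""

    def __init__(self, items, predicate):
        self.items = items
        self.indexes = [i for i, item in enumerate(items) if predicate(item)]

    @property
    def values(self):
        return [self.items[i] for i in self.indexes]

    def sort(self, key):
        # Sorted values are written back into the selected slots only
        for i, v in zip(self.indexes, sorted(self.values, key=key)):
            self.items[i] = v


def natural_b_key(item):
    # Hypothetical key: digits after the leading letter as an int; a bare 'B' counts as 0
    return int(item[1:] or 0)


data = ['A', 'B', 'B11', 'B1', 'B2', 'C11', 'C2']
SubList(data, lambda item: item.startswith('B')).sort(natural_b_key)
print(data)
```

'A' and the C items keep their positions; the B items end up ordered as B, B1, B2, B11.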

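def compound_sort(input_list, natural_sort_prefixes=()):
    # Format spec that left-pads a string with zeros to the length of the longest item
    padding = '{:0>%s}' % len(max(input_list, key=len))
    return sorted(
        input_list,
        key=lambda li: (
            # Items without one of the prefixes are compared as plain strings
            li if not li.startswith(natural_sort_prefixes)
            # Prefixed items: their non-digit characters followed by the zero-padded digits
            else ''.join([c for c in li if not c.isdigit()] +
                         [c for c in padding.format(li) if c.isdigit()])
        ),
    )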
This function takes:
input_list: The list to be sorted,
natural_sort_prefixes: A string or a tuple of strings.
List items targeted by the natural_sort_prefixes will be sorted naturally. Items not matching those prefixes will be sorted lexicographically.
This method assumes that the list items are structured as one or more non-numerical characters followed by one or more digits.
It should be more performant than solutions using regex, and doesn't depend on external libraries.
You can use it like:
print(compound_sort(['A', 'B', 'B11', 'B1', 'B2', 'C11', 'C2'], natural_sort_prefixes=("A", "B")))
# ['A', 'B', 'B1', 'B2', 'B11', 'C11', 'C2']
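To see why compound_sort orders the items this way, it can help to pull the key out into a named function and print what it produces for each item. A sketch; compound_key and its width parameter are hypothetical names, and item.rjust(width, '0') stands in for the padding.format(li) call in the answer (both left-pad with zeros):

```python
def compound_key(item, width, natural_sort_prefixes=()):
    # Items outside the natural-sort prefixes are compared as plain strings
    if not item.startswith(natural_sort_prefixes):
        return item
    # Left-pad with zeros to the common width, then keep the non-digit
    # characters followed by the (now zero-padded) digit run
    padded = item.rjust(width, '0')
    letters = ''.join(c for c in item if not c.isdigit())
    digits = ''.join(c for c in padded if c.isdigit())
    return letters + digits


items = ['A', 'B', 'B11', 'B1', 'B2', 'C11', 'C2']
width = len(max(items, key=len))
for item in items:
    print(item, '->', compound_key(item, width, ('A', 'B')))
```

For the example list this prints A -> A00, B -> B00, B11 -> B11, B1 -> B01, B2 -> B02, C11 -> C11 and C2 -> C2, so the zero-padded B keys compare numerically while C11 still sorts before C2.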

Sorting dictionary keys based on their values

I have a Python dictionary set up like so:
mydict = { 'a1': ['g',6],
'a2': ['e',2],
'a3': ['h',3],
'a4': ['s',2],
'a5': ['j',9],
'a6': ['y',7] }
I need to write a function that returns the keys as an ordered list, depending on which column you're sorting on. For example, if we're sorting on mydict[key][1] (ascending),
I should receive a list back like so:
['a2', 'a4', 'a3', 'a1', 'a6', 'a5']
It mostly works, apart from when multiple keys have the same value in that column, e.g. 'a2': ['e',2] and 'a4': ['s',2]. In this instance it returns the list like so:
['a4', 'a4', 'a3', 'a1', 'a6', 'a5']
Here's the function I've defined:
def itlist(table_dict, column_nb, order="A"):
    try:
        keys = table_dict.keys()
        values = [i[column_nb-1] for i in table_dict.values()]
        combo = zip(values, keys)
        valkeys = dict(combo)
        sortedCols = sorted(values) if order == "A" else sorted(values, reverse=True)
        sortedKeys = [valkeys[i] for i in sortedCols]
    except (KeyError, IndexError) as e:
        pass
    return sortedKeys
And if I want to sort on the numbers column, for example, it is called like so:
sortedkeysasc = itmethods.itlist(table,2)
So any suggestions?
Paul
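The repeated 'a4' comes from the intermediate valkeys dictionary: dict(zip(values, keys)) maps each value to a key, so when two keys share a value only the last one survives. A minimal sketch reproducing that step, assuming Python 3.7+ so the dictionary keeps insertion order:

```python
mydict = {'a1': ['g', 6], 'a2': ['e', 2], 'a3': ['h', 3],
          'a4': ['s', 2], 'a5': ['j', 9], 'a6': ['y', 7]}

values = [v[1] for v in mydict.values()]    # [6, 2, 3, 2, 9, 7]
valkeys = dict(zip(values, mydict.keys()))
# The value 2 occurs twice, so 'a4' overwrites 'a2' under the key 2
print(valkeys)                               # {6: 'a1', 2: 'a4', 3: 'a3', 9: 'a5', 7: 'a6'}
print([valkeys[v] for v in sorted(values)])  # ['a4', 'a4', 'a3', 'a1', 'a6', 'a5']
```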

Wouldn't it be much easier to use
sorted(d, key=lambda k: d[k][1])
(with d being the dictionary)?
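Wrapped in a function with the signature from the question, that one-liner might look like the sketch below. Breaking ties on the key itself is an assumption, not part of this answer: adding k to the sort key makes the order of equal values deterministic instead of depending on the dictionary's insertion order.

```python
def itlist(table_dict, column_nb, order="A"):
    """Return the keys of table_dict sorted on column column_nb (1-based) of each value."""
    return sorted(
        table_dict,
        # Sort on the chosen column first, then on the key to break ties
        key=lambda k: (table_dict[k][column_nb - 1], k),
        # Anything other than "A" means descending, as in the question's version
        reverse=(order != "A"),
    )


mydict = {'a1': ['g', 6], 'a2': ['e', 2], 'a3': ['h', 3],
          'a4': ['s', 2], 'a5': ['j', 9], 'a6': ['y', 7]}
print(itlist(mydict, 2))       # ['a2', 'a4', 'a3', 'a1', 'a6', 'a5']
print(itlist(mydict, 2, "D"))  # ['a5', 'a6', 'a1', 'a3', 'a4', 'a2']
```

Because reverse=True flips the whole (value, key) tuple, ties are also broken in descending key order ('a4' before 'a2').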

>>> L = sorted(d.items(), key=lambda kv: kv[1][1])
>>> L
[('a2', ['e', 2]), ('a4', ['s', 2]), ('a3', ['h', 3]), ('a1', ['g', 6]), ('a6', ['y', 7]), ('a5', ['j', 9])]
>>> list(map(lambda kv: kv[0], L))
['a2', 'a4', 'a3', 'a1', 'a6', 'a5']
Here you sort the dictionary items (key-value pairs) using a key callable that establishes a total order on the items.
Then you extract the keys with map and a lambda that selects the first element of each pair, which gives the needed list of keys.
EDIT: see this answer for a much better solution.

Although there are multiple working answers above, a slight variation / combination of them is the most pythonic to me:
[k for (k, v) in sorted(mydict.items(), key=lambda kv: kv[1][1])]

>>> mydict = { 'a1': ['g',6],
... 'a2': ['e',2],
... 'a3': ['h',3],
... 'a4': ['s',2],
... 'a5': ['j',9],
... 'a6': ['y',7] }
>>> sorted(mydict, key=lambda k:mydict[k][1])
['a2', 'a4', 'a3', 'a1', 'a6', 'a5']
>>> sorted(mydict, key=lambda k:mydict[k][0])
['a2', 'a1', 'a3', 'a5', 'a4', 'a6']

def itlist(table_dict, col, desc=False):
    return [key for (key, val) in
            sorted(
                table_dict.items(),
                key=lambda x: x[1][col - 1],
                reverse=desc,
            )]
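A usage sketch of the corrected function on the question's mydict. It also shows that sorted keeps equal items in their original order even with reverse=True, so 'a2' stays ahead of 'a4' in both directions (that dictionary order is insertion order assumes Python 3.7+):

```python
def itlist(table_dict, col, desc=False):
    # Same as the answer above: sort (key, value) pairs on the col-th (1-based) value item
    return [key for (key, val) in
            sorted(table_dict.items(), key=lambda x: x[1][col - 1], reverse=desc)]


mydict = {'a1': ['g', 6], 'a2': ['e', 2], 'a3': ['h', 3],
          'a4': ['s', 2], 'a5': ['j', 9], 'a6': ['y', 7]}

print(itlist(mydict, 2))             # ['a2', 'a4', 'a3', 'a1', 'a6', 'a5']
print(itlist(mydict, 2, desc=True))  # ['a5', 'a6', 'a1', 'a3', 'a2', 'a4']
print(itlist(mydict, 1))             # ['a2', 'a1', 'a3', 'a5', 'a4', 'a6']
```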


Create list of lists from list based on criteria - python


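Try numpy.split:
import numpy as np
lst = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz']
# enumerate yields every matching position; lst.index(a) would return only the first occurrence of a repeated string
idx = [i for i, a in enumerate(lst) if a.lower().startswith('a')]
new_lst = np.split(lst, idx)
# [array([], dtype='<U3'), array(['ab', 'c1', 'cd', 'd2'], dtype='<U3'), array(['a1', 'b1', 'c1'], dtype='<U3'), array(['ax', 'by', 'cz', 'dzz'], dtype='<U3')]
The first array is empty because lst[0] already starts with 'a'; drop empty splits (and call .tolist() on each) if you need plain lists.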

def listoflists(l):
    new_lst = []
    a = []
    c = 0
    for i in l:
        if i[0] == 'a':
            c += 1
            if c < 2:
                a.append(i)
            else:
                new_lst.append(a)
                a = []
                a.append(i)
        else:
            a.append(i)
    new_lst.append(a)
    return new_lst
l = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz', 'an', 'bw', 'ey']
print(listoflists(l))
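The index-based approach from the question (find the positions of the items starting with "a", then slice between consecutive positions, with len(lst) appended as the final boundary) also works without NumPy. A minimal sketch; the function name split_at_starts and its prefix parameter are hypothetical, and keeping any items before the first match as their own group is an assumption the question does not specify:

```python
def split_at_starts(lst, prefix='a'):
    """Split lst into sublists, each starting at an element that begins with prefix."""
    if not lst:
        return []
    starts = [idx for idx, x in enumerate(lst) if x.startswith(prefix)]
    # Keep any leading elements that come before the first match as their own group
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    # Appending len(lst) closes the last range, so each (start, end) pair is one slice
    bounds = starts + [len(lst)]
    return [lst[start:end] for start, end in zip(bounds, bounds[1:])]


lst = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz']
print(split_at_starts(lst))
# [['ab', 'c1', 'cd', 'd2'], ['a1', 'b1', 'c1'], ['ax', 'by', 'cz', 'dzz']]
```

For the question's list the boundaries are [0, 4, 7, 11], which zip pairs into the ranges (0, 4), (4, 7) and (7, 11).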
