Custom re-arrange list using python - python

I have the following list:
lst = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
I would like to reorder the list so that the sixth element comes after the first, the eleventh after the sixth, the second after the eleventh, and so on. The output should be:
['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
What I tried so far?
lst = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
new_lst = [lst[0], lst[5], lst[10], lst[1], lst[6], lst[11], lst[2], lst[7], lst[12], lst[3], lst[8], lst[13] , lst[4], lst[9], lst[14]]
new_lst
This provides the desired output, but I am looking for an optimal script. How do I do that?

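The pattern amounts to reshaping the list into a 2D array with rows of five, transposing it, and flattening the result.
Without numpy, sum is a convenient function for the flattening step because it accepts a start value; here that is the identity, () for tuples or [] for lists.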
### sol 1
import numpy as np
print('Using numpy')
x = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
np.array(x).reshape((-1, 5)).transpose().reshape(-1)
# array(['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O'], dtype='<U1')
# Sol 2
print('One more way without numpy')
list(
    sum(
        zip(x[:5], x[5:10], x[10:]),
        ()
    )
)
# Sol 3
print('One more way without numpy')
sum(
    [list(y) for y in zip(x[:5], x[5:10], x[10:])],
    []
)
# Sol 4
print('One more way without numpy')
list(
    sum(
        [y for y in zip(x[:5], x[5:10], x[10:])],
        ()
    )
)

You can also use list comprehension if you want to avoid libraries:
lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
[x for t in zip(lst[:5], lst[5:10], lst[10:]) for x in t]
# ['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']

If you want to repeat this for every fifth and tenth element after the current one, it would be:
# Must contain at least 15 values; the offsets 5 and 10 assume three groups of five
input_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
output_list = []
for i in range(len(input_list) // 3):
    output_list.append(input_list[i])
    output_list.append(input_list[i + 5])
    output_list.append(input_list[i + 10])
print(output_list)
No libraries are used. It gives the desired result:
['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
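The fixed offsets in the answers above can be generalised with extended slicing. This is a minimal sketch, assuming the list is made of consecutive groups of equal size; the names interleave_groups and group_size are illustrative and do not come from the original answers:

```python
def interleave_groups(items, group_size):
    """Take element i of every consecutive group, for i = 0 .. group_size - 1."""
    # items[i::group_size] picks position i from each group of `group_size` items.
    return [x for i in range(group_size) for x in items[i::group_size]]


lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
print(interleave_groups(lst, 5))
# ['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
```

Passing the group size as a parameter avoids hard-coding the slice boundaries, so the same function should work for other list lengths.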

Related

Subset a list in python on pre-defined string

I have some extremely large lists of character strings I need to parse. I need to break them into smaller lists based on a pre-defined character string, and I figured out a way to do it, but I worry that this will not be performant on my real data. Is there a better way to do this?
My goal is to turn this list:
['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
Into this list:
[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
What I tried:
# List that replicates my data. `string_to_split_on` is a fixed character string I want to break my list up on
my_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
# Inspect list
print(my_list)
# Create empty lists to store data in
new_list = []
good_letters = []
# Iterate over each string in the list
for i in my_list:
    # If the string is the separator, append data to new_list, reset `good_letters` and move to the next string
    if i == 'string_to_split_on':
        new_list.append(good_letters)
        good_letters = []
        continue
    # Append letter to the list of good letters
    else:
        good_letters.append(i)
# I just like printing things this way because it's easy to read
for item in new_list:
    print(item)
    print('-'*100)
### Output
['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
['a', 'b']
----------------------------------------------------------------------------------------------------
['c', 'd', 'e', 'f', 'g']
----------------------------------------------------------------------------------------------------
['h', 'i', 'j', 'k']
----------------------------------------------------------------------------------------------------
You can also use one line of code:
original_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
split_string = 'string_to_split_on'
new_list = [sublist.split() for sublist in ' '.join(original_list).split(split_string) if sublist]
print(new_list)
This approach should be more efficient on large data sets, since it does not build and re-split an intermediate string:
import itertools
new_list = [list(j) for k, j in itertools.groupby(original_list, lambda x: x != split_string) if k]
print(new_list)
[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
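For very large inputs, a generator can hand back one sublist at a time instead of building the whole result in memory. This is a minimal sketch, assuming that empty runs should be skipped and that a trailing run without a closing separator should be kept (the loop in the question drops it); split_on is a hypothetical helper name:

```python
def split_on(items, separator):
    """Yield the runs of items found between occurrences of `separator`."""
    current = []
    for item in items:
        if item == separator:
            if current:  # skip empty runs, e.g. two separators in a row
                yield current
            current = []
        else:
            current.append(item)
    if current:  # trailing run with no closing separator (assumption: keep it)
        yield current


my_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g',
           'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
for sublist in split_on(my_list, 'string_to_split_on'):
    print(sublist)
```

Wrapping the call in list() gives the same nested list as the other answers.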

Given a Python list of lists, find all possible flat lists that keeps the order of each sublist?

I have a list of lists. I want to find all flat lists that keeps the order of each sublist. As an example, let's say I have a list of lists like this:
ll = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]
It is trivial to get one solution. I managed to write the following code which can generate a random flat list that keeps the order of each sublist.
import random
# Flatten the list of lists
flat = [x for l in ll for x in l]
# Shuffle to gain randomness
random.shuffle(flat)
for l in ll:
# Find the idxs in the flat list that belongs to the sublist
idxs = [i for i, x in enumerate(flat) if x in l]
# Change the order to match the order in the sublist
for j, idx in enumerate(idxs):
flat[idx] = l[j]
print(flat)
This can generate flat lists that looks as follows:
['F', 'D', 'O', 'C', 'A', 'G', 'I', 'S', 'T', 'H']
['C', 'D', 'F', 'O', 'G', 'I', 'S', 'A', 'T', 'H']
['C', 'D', 'O', 'G', 'F', 'I', 'S', 'A', 'T', 'H']
['F', 'C', 'D', 'I', 'S', 'A', 'H', 'O', 'T', 'G']
As you can see, 'A' always appears after 'C', 'T' always appears after 'A', 'O' always appears after 'D', and so on...
However, I want to get all possible solutions.
Please note that:
I want general code that works for any given list of lists, not just for "dog cat fish";
It does not matter whether there are duplicates or not, because every item is distinguishable.
Can anyone suggest a fast Python algorithm for this?
Suppose you are combining the lists by hand. Instead of shuffling and putting things back in order, you would select one list and take its first element, then again select a list and take its first (unused) element, and so on. So the algorithm you need is this: What are all the different ways to pick from a collection of lists with these particular sizes?
In your example you have lists of length 3, 3, 4; suppose you had a bucket with three red balls, three yellow balls and four green balls, which orderings are possible? Model this, and then just pick the first unused element from the corresponding list to get your output.
Say what? For your example, the (distinct) pick orders would be given by
set(itertools.permutations("RRRYYYGGGG"))
For any list of lists, we'll use integer keys instead of letters. The pick orders are:
import itertools
elements = []
for key, lst in enumerate(ll):
    elements.extend([key] * len(lst))
pick_orders = set(itertools.permutations(elements))
Then you just use each pick order to present the elements from your list of lists, say with pop(0) (from a copy of the lists, since pop() is destructive).
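The last step could look like the sketch below. It uses one iterator per sublist in place of pop(0) on a copy, which is a substitution made here for brevity; the function name interleavings and the small two-sublist example are illustrative. Note that set(itertools.permutations(...)) still enumerates every permutation of the keys before deduplicating, so this is only practical for short inputs:

```python
import itertools


def interleavings(list_of_lists):
    """Yield every flat list that keeps the internal order of each sublist."""
    # One key per element: key k means "take the next unused item of sublist k".
    keys = [k for k, sub in enumerate(list_of_lists) for _ in sub]
    for order in set(itertools.permutations(keys)):
        # Fresh iterators for each pick order, so the input is never modified.
        iterators = [iter(sub) for sub in list_of_lists]
        yield [next(iterators[k]) for k in order]


small = [['D', 'O'], ['C', 'A']]
results = sorted(interleavings(small))
for flat in results:
    print(flat)
```

For two sublists of length two there are 4! / (2! * 2!) = 6 pick orders, so six flat lists are printed.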
Yet another solution, but this one doesn't use any libraries.
def recurse(lst, indices, total, curr):
    done = True
    for l, (pos, index) in zip(lst, enumerate(indices)):
        if index < len(l): # can increment index
            curr.append(l[index]) # add on corresponding value
            indices[pos] += 1 # increment index
            recurse(lst, indices, total, curr)
            # backtrack
            indices[pos] -= 1
            curr.pop()
            done = False # modification made, so not done
    if done: # no changes made
        total.append(curr.copy())
    return
def list_to_all_flat(lst):
    seq = [0] * len(lst) # set up indexes
    total, curr = [], []
    recurse(lst, seq, total, curr)
    return total
if __name__ == "__main__":
    lst = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]
    print(list_to_all_flat(lst))
Try:
from itertools import permutations, chain
ll = [["D", "O", "G"], ["C", "A", "T"], ["F", "I", "S", "H"]]
x = [[(i1, i2, o) for i2, o in enumerate(subl)] for i1, subl in enumerate(ll)]
l = sum(len(subl) for subl in ll)
def is_valid(c):
    seen = {}
    for i1, i2, _ in c:
        if i2 != seen.get(i1, -1) + 1:
            return False
        else:
            seen[i1] = i2
    return True
for c in permutations(chain(*x), l):
    if is_valid(c):
        print([o for *_, o in c])
Prints:
['D', 'O', 'G', 'C', 'A', 'T', 'F', 'I', 'S', 'H']
['D', 'O', 'G', 'C', 'A', 'F', 'T', 'I', 'S', 'H']
['D', 'O', 'G', 'C', 'A', 'F', 'I', 'T', 'S', 'H']
['D', 'O', 'G', 'C', 'A', 'F', 'I', 'S', 'T', 'H']
['D', 'O', 'G', 'C', 'A', 'F', 'I', 'S', 'H', 'T']
['D', 'O', 'G', 'C', 'F', 'A', 'T', 'I', 'S', 'H']
['D', 'O', 'G', 'C', 'F', 'A', 'I', 'T', 'S', 'H']
['D', 'O', 'G', 'C', 'F', 'A', 'I', 'S', 'T', 'H']
...
['F', 'I', 'S', 'H', 'C', 'D', 'A', 'O', 'T', 'G']
['F', 'I', 'S', 'H', 'C', 'D', 'A', 'T', 'O', 'G']
['F', 'I', 'S', 'H', 'C', 'A', 'D', 'O', 'G', 'T']
['F', 'I', 'S', 'H', 'C', 'A', 'D', 'O', 'T', 'G']
['F', 'I', 'S', 'H', 'C', 'A', 'D', 'T', 'O', 'G']
['F', 'I', 'S', 'H', 'C', 'A', 'T', 'D', 'O', 'G']
You can use a recursive generator function:
ll = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]
def get_combos(d, c = []):
    if not any(d) and len(c) == sum(map(len, ll)):
        yield c
    elif any(d):
        for a, b in enumerate(d):
            for j, k in enumerate(b):
                yield from get_combos(d[:a]+[b[j+1:]]+d[a+1:], c+[k])
print(list(get_combos(ll)))
Output (first ten permutations):
[['D', 'O', 'G', 'C', 'A', 'T', 'F', 'I', 'S', 'H'], ['D', 'O', 'G', 'C', 'A', 'F', 'T', 'I', 'S', 'H'], ['D', 'O', 'G', 'C', 'A', 'F', 'I', 'T', 'S', 'H'], ['D', 'O', 'G', 'C', 'A', 'F', 'I', 'S', 'T', 'H'], ['D', 'O', 'G', 'C', 'A', 'F', 'I', 'S', 'H', 'T'], ['D', 'O', 'G', 'C', 'F', 'A', 'T', 'I', 'S', 'H'], ['D', 'O', 'G', 'C', 'F', 'A', 'I', 'T', 'S', 'H'], ['D', 'O', 'G', 'C', 'F', 'A', 'I', 'S', 'T', 'H'], ['D', 'O', 'G', 'C', 'F', 'A', 'I', 'S', 'H', 'T'], ['D', 'O', 'G', 'C', 'F', 'I', 'A', 'T', 'S', 'H']]
For simplicity, let's start with a list that contains two sublists.
from itertools import permutations
Ls = [['D', 'O', 'G'], ['C', 'A', 'T']]
L_flattened = []
for L in Ls:
    for item in L:
        L_flattened.append(item)
print("L_flattened:", L_flattened)
print(list(permutations(L_flattened, len(L_flattened))))
[('D', 'O', 'G', 'C', 'A', 'T'), ('D', 'O', 'G', 'C', 'T', 'A'), ('D', 'O', 'G', 'A', 'C', 'T'), ('D', 'O', 'G', 'A', 'T', 'C'), ('D', 'O', 'G', 'T', 'C', 'A'), ('D', 'O', 'G', 'T', 'A', 'C'), ('D', 'O', 'C', 'G', 'A', 'T'),
('D', 'O', 'C', 'G', 'T', 'A'), ('D', 'O', 'C', 'A', 'G', 'T'), ('D', 'O', 'C', 'A', 'T', 'G'),
...
Beware that the number of permutations grows very quickly.
In your example there are 10 items, and P(10, 10) = 10! = 3628800.
I suggest calculating the number of permutations first to get an idea before running the actual code (which may otherwise cause a memory error, freeze, or crash).
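To make the sizes concrete: only a small fraction of those permutations keep each sublist in order. The number that do is the multinomial coefficient n! / (n1! * n2! * ...), where n is the total number of items and n1, n2, ... are the sublist lengths. A short sketch, with count_interleavings as an illustrative name:

```python
from math import factorial


def count_interleavings(list_of_lists):
    """Number of flat orderings that keep every sublist's internal order."""
    total = factorial(sum(len(sub) for sub in list_of_lists))
    for sub in list_of_lists:
        total //= factorial(len(sub))  # divide out orderings within each sublist
    return total


ll = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]
print(factorial(10))            # 3628800 permutations in total
print(count_interleavings(ll))  # 4200 of them keep each sublist in order
```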
You can try verifying all possible permutations:
import random
import itertools
import numpy as np
ll = [['D', 'O', 'G'], ['C', 'A', 'T'], ['F', 'I', 'S', 'H']]
flat = [x for l in ll for x in l]
all_permutations = list(itertools.permutations(flat))
good_permutations = []
count = 0
for perm in all_permutations:
    count += 1
    cond = True
    for l in ll:
        idxs = [perm.index(x) for i, x in enumerate(flat) if x in l]
        # check if ordered
        if not np.all(np.diff(np.array(idxs)) >= 0):
            cond = False
            break
    if cond == True:
        good_permutations.append(perm)
    if count >= 10000:
        break
print(len(good_permutations))
It is only a basic solution as it is really slow to compute (I set the count to limit the number of permutations that are verified).

Slice a list such that the result has the 2 elements before and 2 elements after the subject and a constant result length?

Given this sorted array:
>>> x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
I want to slice up this array so that there are always 5 elements. 2 above and 2 below. I went with:
>>> [x[i-2:i+2] for i, v in enumerate(x)]
This results in:
[[], [], ['a', 'b', 'c', 'd'], ['b', 'c', 'd', 'e'], ['c', 'd', 'e', 'f'], ['d', 'e', 'f', 'g'], ['e', 'f', 'g', 'h'], ['f', 'g', 'h', 'i'], ['g', 'h', 'i', 'j'], ['h', 'i', 'j', 'k'], ['i', 'j', 'k', 'l'], ['j', 'k', 'l']]
The problems with this are:
There are 4 elements per group, not 5
Not every group has 2 above and 2 below.
The first and last groups are special cases. I do not want
blanks at the front. What I want to see is ['a', 'b', 'c', 'd', 'e'] as the first group and then ['b', 'c', 'd', 'e', 'f']
as the second group.
I also played around with clamping the slices.
First I defined a clamp function like so:
>>> def clamp(n, smallest, largest): return max(smallest, min(n, largest))
Then, I applied the function like so:
>>> [x[clamp(i-2, 0, i):clamp(i+2, i, len(x))] for i, v in enumerate(x)]
But it didn't really work out so well:
[['a', 'b'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd'], ['b', 'c', 'd', 'e'], ['c', 'd', 'e', 'f'], ['d', 'e', 'f', 'g'], ['e', 'f', 'g', 'h'], ['f', 'g', 'h', 'i'], ['g', 'h', 'i', 'j'], ['h', 'i', 'j', 'k'], ['i', 'j', 'k', 'l'], ['j', 'k', 'l']]
Am I even barking up the right tree?
I found two SO articles about this issue, but they didn't address these edge cases:
Search a list for item(s)and return x number of surrounding items in python
efficient way to find several rows above and below a subset of data
A couple of observations:
You may want to use range(len(x)) instead of enumerate; that way you avoid unpacking a value you never use.
If anyone needs a reminder about slice notation: the end index is exclusive, hence the +1 below.
Then you can filter the list inside the comprehension:
x = list('abcdefghijklmno')
[ x[i-2:i+2+1] for i in range(len(x)) if len(x[i-2:i+2+1]) == 5 ]
# [['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e', 'f'], ['c', 'd', 'e', 'f', 'g'], ['d', 'e', 'f', 'g', 'h'], ['e', 'f', 'g', 'h', 'i'], ['f', 'g', 'h', 'i', 'j'], ['g', 'h', 'i', 'j', 'k'], ['h', 'i', 'j', 'k', 'l'], ['i', 'j', 'k', 'l', 'm'], ['j', 'k', 'l', 'm', 'n'], ['k', 'l', 'm', 'n', 'o']]
# On Python 3.8+ you can use the walrus operator to avoid slicing twice
[ y for i in range(len(x)) if len(y:=x[i-2:i+2+1]) == 5 ]
# [['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e', 'f'], ['c', 'd', 'e', 'f', 'g'], ['d', 'e', 'f', 'g', 'h'], ['e', 'f', 'g', 'h', 'i'], ['f', 'g', 'h', 'i', 'j'], ['g', 'h', 'i', 'j', 'k'], ['h', 'i', 'j', 'k', 'l'], ['i', 'j', 'k', 'l', 'm'], ['j', 'k', 'l', 'm', 'n'], ['k', 'l', 'm', 'n', 'o']]
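If one window per element is wanted instead, with the window shifted inward at both ends so that it always has 5 items, the clamp idea from the question can be applied to the start index only. This is a sketch under that reading of the question, assuming the sequence has at least 5 elements; centred_windows and its parameters are illustrative names:

```python
def clamp(n, smallest, largest):
    return max(smallest, min(n, largest))


def centred_windows(seq, before=2, after=2):
    """One fixed-size window per element, shifted inward at the edges."""
    size = before + 1 + after
    last_start = len(seq) - size  # assumes len(seq) >= size
    # Clamp only the start; the end is always start + size.
    starts = (clamp(i - before, 0, last_start) for i in range(len(seq)))
    return [seq[s:s + size] for s in starts]


x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
windows = centred_windows(x)
print(windows[0])   # window for 'a'
print(windows[3])   # window for 'd'
print(windows[-1])  # window for 'l'
```

Here the windows for 'a', 'b' and 'c' are all ['a', 'b', 'c', 'd', 'e'], and the windows for 'j', 'k' and 'l' are all ['h', 'i', 'j', 'k', 'l'].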

Sliding window in Python

I used the following famous code for my sliding window through the tokenised text document:
def window(fseq, window_size):
    "Sliding window"
    it = iter(fseq)
    result = tuple(islice(it, 0, window_size, round(window_size/4)))
    if len(result) == window_size:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        result_list = list(result)
        yield result_list
When I call my function with a window size less than 6, everything is OK, but when I increase it, the beginning of the text is cut off.
For example:
c=['A','B','C','D','E', 'F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
print(list(window(c, 4)))
print(list(window(c, 8)))
Output:
[('A', 'B', 'C', 'D'), ['B', 'C', 'D', 'E'], ['C', 'D', 'E', 'F'], ['D', 'E', 'F', 'G'], ['E', 'F', 'G', 'H'], ['F', 'G', 'H', 'I'],...
[['C', 'E', 'G', 'I'], ['E', 'G', 'I', 'J'], ['G', 'I', 'J', 'K'], ['I', 'J', 'K', 'L'], ['J', 'K', 'L', 'M']...
What's wrong? And why is the first element of the first output in round brackets?
My expected output for print(list(window(c, 8))) is:
[['A','B','C', 'D', 'E', 'F','G','H'], ['C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], ['E', 'F', 'G', 'H', 'I', 'J', 'K', 'L']...
Your version is incorrect. It adds a 4th argument (the step size) to the islice() function that limits how large the first slice taken is going to be:
result = tuple(islice(it, 0, window_size, round(window_size/4)))
For 4 or 5, round(window_size/4) produces 1, the default step size. But for larger values, this produces a step size that guarantees that values will be omitted from that first window, so the next test, len(result) == window_size is guaranteed to be false.
Remove that step size argument, and you'll get your first window back again. Also see Rolling or sliding window iterator in Python.
The first result is in 'round brackets' because it is a tuple. If you wanted a list instead, use list() rather than tuple() in your code.
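With the step argument removed and list() used throughout, the function from the question could look like the sketch below. It keeps the question's function name and moves the window one element at a time:

```python
from itertools import islice


def window(fseq, window_size):
    """Sliding window that advances one element at a time."""
    it = iter(fseq)
    # No step argument: take the first window_size consecutive elements.
    result = list(islice(it, window_size))
    if len(result) == window_size:
        yield result
    for elem in it:
        result = result[1:] + [elem]  # drop the oldest element, append the newest
        yield result


print(list(window('ABCDE', 3)))
# [['A', 'B', 'C'], ['B', 'C', 'D'], ['C', 'D', 'E']]
```

Every yielded window is now a list, so the first result no longer differs in type from the rest.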
If you wanted to have your window slide along in steps larger than 1, you should not alter the initial window. You need to add and remove step size elements from the window as you iterate along. That's easier done with a while loop:
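from itertools import islice
def window_with_larger_step(fseq, window_size):
    """Sliding window
    The step size the window moves over increases with the size of the window.
    """
    it = iter(fseq)
    result = list(islice(it, 0, window_size))
    if len(result) == window_size:
        yield result
    # no smaller than 1; Python 3's round() rounds halves to even, so 10 / 4 gives a step of 2
    step_size = max(1, int(round(window_size / 4)))
    while True:
        new_elements = list(islice(it, step_size))
        if len(new_elements) < step_size:
            break
        # reuse the elements already read; reading from `it` again would skip them
        result = result[step_size:] + new_elements
        yield result
This adds step_size elements to the running result, removing step_size elements from the start to keep the window size constant.
Demo (Python 3):
>>> print(list(window_with_larger_step(c, 6)))
[['A', 'B', 'C', 'D', 'E', 'F'], ['C', 'D', 'E', 'F', 'G', 'H'], ['E', 'F', 'G', 'H', 'I', 'J'], ['G', 'H', 'I', 'J', 'K', 'L'], ['I', 'J', 'K', 'L', 'M', 'N'], ['K', 'L', 'M', 'N', 'O', 'P'], ['M', 'N', 'O', 'P', 'Q', 'R'], ['O', 'P', 'Q', 'R', 'S', 'T'], ['Q', 'R', 'S', 'T', 'U', 'V'], ['S', 'T', 'U', 'V', 'W', 'X'], ['U', 'V', 'W', 'X', 'Y', 'Z']]
>>> print(list(window_with_larger_step(c, 8)))
[['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'], ['C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], ['E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], ['G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], ['I', 'J', 'K', 'L', 'M', 'N', 'O', 'P'], ['K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R'], ['M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T'], ['O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V'], ['Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X'], ['S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']]
>>> print(list(window_with_larger_step(c, 10)))
[['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], ['C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], ['E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], ['G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P'], ['I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R'], ['K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T'], ['M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V'], ['O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X'], ['Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']]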

python 3, how to fix the frequency of occurrence

I would like to write a function named "frequency" that fixes the frequency of pairs in the second half of my output. For example, if I fix the frequency of the pair ['A', 'C'] at 0.5 and the frequency of the pair ['M', 'K'] at 0.5, I would like output like the following:
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'C']
['A', 'C']
['A', 'C']
['M', 'K']
['M', 'K']
['M', 'K']
I would like to be able to change the frequency values easily. I tried to build a function for this purpose, but I could only count the frequency of the existing pairs, not fix it.
The code I have is the following:
for i in range(int(lengthPairs/2)):
    pairs.append([aminoacids[0], aminoacids[11]])
print(int(lengthPairs/2))
for j in range(int(lengthPairs/2)+1):
    dictionary = dict()
    r1 = randrange(20)
    r2 = randrange(20)
    pairs.append([aminoacids[r1], aminoacids[r2]])
for pair in pairs:
    print (pair)
where:
aminoacids = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V']
lengthPairs = 10
pairs = list(list())
it gives me an output like this:
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'C']
['M', 'K']
['I', 'I']
['F', 'G']
['V', 'H']
['V', 'I']
thank you very much for any assistance!
I tried my best to understand what you meant. Let's see if the following does what you want:
aminoacids = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V']
pair_fq_change = [aminoacids[0], aminoacids[11]] #the pair that you'd like to change the frequency, e.g. ['A', 'K']
original_pairs = [['D', 'E'], ['S', 'F'], ['A', 'K'], ['A', 'K'], ['A', 'K'], ['A', 'K'], ['A', 'K'], ['B', 'C']]
def frequency(original_pairs, pair_fq_change, fq):
    '''fq is the maximum number of times pair_fq_change may appear in the result'''
    updated_pairs = []
    count = 0
    for pair in original_pairs:
        if pair != pair_fq_change:
            updated_pairs.append(pair)
        elif pair == pair_fq_change and count < fq:
            updated_pairs.append(pair)
            count += 1
        else:
            continue
    return updated_pairs
updated_pairs = frequency(original_pairs, pair_fq_change, 3)
print(updated_pairs)
>>>[['D', 'E'], ['S', 'F'], ['A', 'K'], ['A', 'K'], ['A', 'K'], ['B', 'C']]
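The answer above caps how often one pair appears. If the goal is instead to generate the second half directly from relative frequencies, as the question's example suggests, something like the sketch below might be closer. This is only one reading of the question; the function name pairs_with_fixed_frequencies and its parameters are hypothetical, and the fractions are assumed to sum to 1 and to produce whole counts:

```python
def pairs_with_fixed_frequencies(frequencies, total):
    """Build about `total` pairs in which each pair appears with its given fraction.

    `frequencies` maps a pair (as a tuple) to a fraction of `total`.
    """
    result = []
    for pair, fraction in frequencies.items():
        count = round(fraction * total)  # assumption: fraction * total is a whole number
        result.extend(list(pair) for _ in range(count))
    return result


second_half = pairs_with_fixed_frequencies({('A', 'C'): 0.5, ('M', 'K'): 0.5}, 6)
for pair in second_half:
    print(pair)
```

Changing a frequency then only means editing the dictionary passed in; when a fraction does not give a whole count, rounding can make the result slightly longer or shorter than `total`.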
