Merging nested lists with overlap - python

I have to merge nested lists which have an overlap. I keep thinking that there has to be an intelligent solution using list comprehensions and probably difflib, but I can't figure out how it should work.
My lists look like this:
[['C', 'x', 'F'], ['A', 'D', 'E']]
and
[['x', 'F', 'G', 'x'], ['D', 'E', 'H', 'J']].
They sit one above the other, like rows in a matrix, so they overlap (the overlap is
[['x', 'F'], ['D', 'E']]).
A merge should yield:
[['C', 'x', 'F', 'G', 'x'], ['A', 'D', 'E', 'H', 'J']].
How can I achieve this?

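You can try something like this (note that it skips any element already present in the row, so the second 'x' is dropped):
list1 = [['C', 'x', 'F'], ['A', 'D', 'E']]
list2 = [['x', 'F', 'G', 'x'], ['D', 'E', 'H', 'J']]
for x in range(len(list1)):
    for element in list2[x]:
        # Append only elements that are not already in this row
        if element not in list1[x]:
            list1[x].append(element)
    print(list1[x])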
Output:
['C', 'x', 'F', 'G']
['A', 'D', 'E', 'H', 'J']
I hope this helps you.
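If the goal is the exact output asked for (keeping the trailing 'x'), one option is to merge on the overlap itself rather than on membership. The sketch below assumes, as in the example, that each row's overlap is a suffix of the row in the first list that matches a prefix of the row in the second list; it compares slices directly instead of using difflib, and merge_rows / merge_nested are hypothetical helper names.

```python
def merge_rows(left, right):
    """Join two rows, writing the longest suffix-of-left == prefix-of-right
    overlap only once."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return left + right  # no overlap at all: plain concatenation


def merge_nested(a, b):
    """Merge two lists of rows pairwise (rows beyond the shorter list are ignored)."""
    return [merge_rows(left, right) for left, right in zip(a, b)]


list1 = [['C', 'x', 'F'], ['A', 'D', 'E']]
list2 = [['x', 'F', 'G', 'x'], ['D', 'E', 'H', 'J']]
print(merge_nested(list1, list2))
# [['C', 'x', 'F', 'G', 'x'], ['A', 'D', 'E', 'H', 'J']]
```

Trying the longest overlap first means a row like ['x', 'F'] is only written once even when a shorter overlap (just 'F') would also match.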

Related

Combine a list of list by element in python [duplicate]

This question already has answers here:
Transpose list of lists
Closed last month.
I have a list of 4 lists, shown below.
list1 = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l']]
How do I create a list of list by element position so that the new list of list is as follows?
list2 = [['a', 'd', 'g', 'j'], ['b', 'e', 'h', 'k'], ['c', 'f', 'i', 'l']]
I tried using a for loop such as
res = []
for listing in list1:
    for i in listing:
        res.append(i)
however, it just created a single flat list.
Use zip with the * operator to zip all of the sublists together. The resulting tuples will have the list contents you want, so just use list() to convert them into lists.
>>> list1 = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l']]
>>> [list(z) for z in zip(*list1)]
[['a', 'd', 'g', 'j'], ['b', 'e', 'h', 'k'], ['c', 'f', 'i', 'l']]
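For comparison, here is roughly what zip(*list1) does, written as the kind of loop the question was attempting, plus a note on rows of unequal length. The loop assumes every row is at least as long as the first one; itertools.zip_longest is a standard-library alternative for ragged data, not something the answer above relies on.

```python
from itertools import zip_longest

list1 = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l']]

# Column i of the result collects element i from every row
res = []
for i in range(len(list1[0])):
    res.append([row[i] for row in list1])
print(res)  # [['a', 'd', 'g', 'j'], ['b', 'e', 'h', 'k'], ['c', 'f', 'i', 'l']]

# zip() stops at the shortest row; zip_longest() pads missing cells instead
ragged = [['a', 'b', 'c'], ['d', 'e']]
print([list(z) for z in zip(*ragged)])          # [['a', 'd'], ['b', 'e']]
print([list(z) for z in zip_longest(*ragged)])  # [['a', 'd'], ['b', 'e'], ['c', None]]
```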

Subset a list in python on pre-defined string

I have some extremely large lists of character strings I need to parse. I need to break them into smaller lists based on a pre-defined character string, and I figured out a way to do it, but I worry that this will not be performant on my real data. Is there a better way to do this?
My goal is to turn this list:
['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
Into this list:
[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
What I tried:
# List that replicates my data. `string_to_split_on` is a fixed character string I want to break my list up on
my_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
# Inspect list
print(my_list)
# Create empty lists to store data in
new_list = []
good_letters = []
# Iterate over each string in the list
for i in my_list:
    # If the string is the separator, append data to new_list, reset `good_letters` and move to the next string
    if i == 'string_to_split_on':
        new_list.append(good_letters)
        good_letters = []
        continue
    # Append letter to the list of good letters
    else:
        good_letters.append(i)
# I just like printing things this way because it's easy to read
for item in new_list:
    print(item)
    print('-'*100)
### Output
['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
['a', 'b']
----------------------------------------------------------------------------------------------------
['c', 'd', 'e', 'f', 'g']
----------------------------------------------------------------------------------------------------
['h', 'i', 'j', 'k']
----------------------------------------------------------------------------------------------------
You can also use one line of code (this assumes no element contains whitespace):
original_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
split_string = 'string_to_split_on'
new_list = [sublist.split() for sublist in ' '.join(original_list).split(split_string) if sublist]
print(new_list)
This approach is more efficient when dealing with large data sets:
import itertools
new_list = [list(j) for k, j in itertools.groupby(original_list, lambda x: x != split_string) if k]
print(new_list)
[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
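The question's own loop can also be turned into a generator, which processes any iterable lazily (for example lines read from a file) instead of holding every chunk at once. This is a sketch, and split_on is a hypothetical name. Unlike the groupby version, consecutive separators yield an empty list (as the question's loop does), and unlike the question's loop, items after the last separator are kept.

```python
def split_on(items, sep):
    """Lazily yield the runs of items between occurrences of sep."""
    chunk = []
    for item in items:
        if item == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(item)
    if chunk:
        # Items after the last separator form a final chunk
        yield chunk


original_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g',
                 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
print(list(split_on(original_list, 'string_to_split_on')))
# [['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
```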

Slice a list such that the result has the 2 elements before and 2 elements after the subject and a constant result length?

Given this sorted array:
>>> x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
I want to slice up this array so that every group has exactly 5 elements: the element itself, 2 before it and 2 after it. I went with:
>>> [x[i-2:i+2] for i, v in enumerate(x)]
This results in:
[[], [], ['a', 'b', 'c', 'd'], ['b', 'c', 'd', 'e'], ['c', 'd', 'e', 'f'], ['d', 'e', 'f', 'g'], ['e', 'f', 'g', 'h'], ['f', 'g', 'h', 'i'], ['g', 'h', 'i', 'j'], ['h', 'i', 'j', 'k'], ['i', 'j', 'k', 'l'], ['j', 'k', 'l']]
The problems with this are:
There are 4 elements per group, not 5
Not every group has 2 above and 2 below.
The first and last groups are special cases. I do not want blanks at the front. What I want to see is ['a', 'b', 'c', 'd', 'e'] as the first group and then ['b', 'c', 'd', 'e', 'f'] as the second group.
I also played around with clamping the slices.
First I defined a clamp function like so:
>>> def clamp(n, smallest, largest): return max(smallest, min(n, largest))
Then, I applied the function like so:
>>> [x[clamp(i-2, 0, i):clamp(i+2, i, len(x))] for i, v in enumerate(x)]
But it didn't really work out so well:
[['a', 'b'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd'], ['b', 'c', 'd', 'e'], ['c', 'd', 'e', 'f'], ['d', 'e', 'f', 'g'], ['e', 'f', 'g', 'h'], ['f', 'g', 'h', 'i'], ['g', 'h', 'i', 'j'], ['h', 'i', 'j', 'k'], ['i', 'j', 'k', 'l'], ['j', 'k', 'l']]
Am I even barking up the right tree?
I found two SO articles about this issue, but they didn't address these edge cases:
Search a list for item(s)and return x number of surrounding items in python
efficient way to find several rows above and below a subset of data
A couple of observations:
You may want to use range(len(x)) instead of enumerate; then you avoid having to unpack the result.
If anyone needs to understand slice notation, this may help
Then, you can filter the list inside the comprehension:
x = list('abcdefghijklmno')
[ x[i-2:i+2+1] for i in range(len(x)) if len(x[i-2:i+2+1]) == 5 ]
# [['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e', 'f'], ['c', 'd', 'e', 'f', 'g'], ['d', 'e', 'f', 'g', 'h'], ['e', 'f', 'g', 'h', 'i'], ['f', 'g', 'h', 'i', 'j'], ['g', 'h', 'i', 'j', 'k'], ['h', 'i', 'j', 'k', 'l'], ['i', 'j', 'k', 'l', 'm'], ['j', 'k', 'l', 'm', 'n'], ['k', 'l', 'm', 'n', 'o']]
# On Python 3.8+ you can use the walrus operator to avoid slicing twice:
[ y for i in range(len(x)) if len(y:=x[i-2:i+2+1]) == 5 ]
# [['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e', 'f'], ['c', 'd', 'e', 'f', 'g'], ['d', 'e', 'f', 'g', 'h'], ['e', 'f', 'g', 'h', 'i'], ['f', 'g', 'h', 'i', 'j'], ['g', 'h', 'i', 'j', 'k'], ['h', 'i', 'j', 'k', 'l'], ['i', 'j', 'k', 'l', 'm'], ['j', 'k', 'l', 'm', 'n'], ['k', 'l', 'm', 'n', 'o']]
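The filter approach builds some slices only to throw them away. A sketch that computes the valid start positions directly gives the same windows; the second function is a variant of the question's clamping idea that returns one fixed-length window per element by shifting it inward at the ends. Both names are hypothetical, and the before/after parameters are an assumed generalisation of "2 above and 2 below".

```python
def windows(seq, before=2, after=2):
    """All full windows of before + 1 + after consecutive items."""
    size = before + 1 + after
    # range() is empty when seq is shorter than size, so no short windows appear
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def clamped_windows(seq, before=2, after=2):
    """One window per element; near the ends the window is shifted inward
    so it keeps a constant length (as long as seq is long enough)."""
    size = before + 1 + after
    last_start = max(len(seq) - size, 0)
    result = []
    for i in range(len(seq)):
        start = min(max(i - before, 0), last_start)
        result.append(seq[start:start + size])
    return result


x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
print(windows(x)[:2])         # [['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e', 'f']]
print(clamped_windows(x)[0])  # ['a', 'b', 'c', 'd', 'e']
```

windows() matches the comprehension-with-filter output; clamped_windows() is closer to the question's clamp attempt, since every element gets a group.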

Combine all elements of n lists in python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
There are a lot of questions and answers about combining and merging lists in Python, but I have not found a way to create a full combination of all elements.
If I had a list of lists like the following:
data_small = [ ['a','b','c'], ['d','e','f'] ]
data_big = [ ['a','b','c'], ['d','e','f'], ['u','v','w'], ['x','y','z'] ]
How can I get a list of lists with all combinations?
For data_small this should be:
[ [a,b,c], [d,b,c], [a,b,f], [a,e,c],
[d,e,c], [d,b,f], [a,e,f], [d,e,f], ... ]
This should also work for an arbitrary number of lists of the same length like data_big.
I am pretty sure there is a fancy itertools solution for this, right?
I think I deciphered the question:
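def so_called_combs(data):
    for sublist in data:
        for sbl in data:
            if sbl == sublist:
                # The sublist itself is yielded unchanged
                yield sbl
                continue
            # Replace one position at a time with the element from sbl
            for i in range(len(sublist)):
                c = sublist[:]
                c[i] = sbl[i]
                yield c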
This returns the required list, if I understood it correctly:
For every list in the data, every element is replaced (but only one at a time) with the corresponding element (same position) in each of the other lists.
For data_big, this returns:
[['a', 'b', 'c'], ['d', 'b', 'c'], ['a', 'e', 'c'], ['a', 'b', 'f'],
['u', 'b', 'c'], ['a', 'v', 'c'], ['a', 'b', 'w'], ['x', 'b', 'c'],
['a', 'y', 'c'], ['a', 'b', 'z'], ['a', 'e', 'f'], ['d', 'b', 'f'],
['d', 'e', 'c'], ['d', 'e', 'f'], ['u', 'e', 'f'], ['d', 'v', 'f'],
['d', 'e', 'w'], ['x', 'e', 'f'], ['d', 'y', 'f'], ['d', 'e', 'z'],
['a', 'v', 'w'], ['u', 'b', 'w'], ['u', 'v', 'c'], ['d', 'v', 'w'],
['u', 'e', 'w'], ['u', 'v', 'f'], ['u', 'v', 'w'], ['x', 'v', 'w'],
['u', 'y', 'w'], ['u', 'v', 'z'], ['a', 'y', 'z'], ['x', 'b', 'z'],
['x', 'y', 'c'], ['d', 'y', 'z'], ['x', 'e', 'z'], ['x', 'y', 'f'],
['u', 'y', 'z'], ['x', 'v', 'z'], ['x', 'y', 'w'], ['x', 'y', 'z']]
Here is another way to do it, using the itertools functions permutations and chain. You also need to check that the indexes line up, that all sublists are the same length, and whether more than one element is being replaced:
from itertools import permutations, chain
data_small = [ ['a','b','c'], ['d','e','f'] ]
data_big = [ ['a','b','c'], ['d','e','f'], ['u','v','w'], ['x','y','z'] ]
def check(data, sub):
    # Sublists that contributed at least one element of sub
    check_for_mul_repl = []
    for i in data:
        # All sublists must be the same length
        if len(i) != len(data[0]):
            return False
        for j in i:
            if j in sub:
                # Each element must stay at its original position
                if i.index(j) != sub.index(j):
                    return False
                else:
                    if i not in check_for_mul_repl:
                        check_for_mul_repl.append(i)
    # Allow elements from at most two of the original sublists
    return len(check_for_mul_repl) <= 2
print([list(x) for x in permutations(chain(*data_big), 3) if check(data_big, x)])
[['a', 'b', 'c'], ['a', 'b', 'f'], ['a', 'b', 'w'], ['a', 'b', 'z'],
['a', 'e', 'c'], ['a', 'e', 'f'], ['a', 'v', 'c'], ['a', 'v', 'w'],
['a', 'y', 'c'], ['a', 'y', 'z'], ['d', 'b', 'c'], ['d', 'b', 'f'],
['d', 'e', 'c'], ['d', 'e', 'f'], ['d', 'e', 'w'], ['d', 'e', 'z'],
['d', 'v', 'f'], ['d', 'v', 'w'], ['d', 'y', 'f'], ['d', 'y', 'z'],
['u', 'b', 'c'], ['u', 'b', 'w'], ['u', 'e', 'f'], ['u', 'e', 'w'],
['u', 'v', 'c'], ['u', 'v', 'f'], ['u', 'v', 'w'], ['u', 'v', 'z'],
['u', 'y', 'w'], ['u', 'y', 'z'], ['x', 'b', 'c'], ['x', 'b', 'z'],
['x', 'e', 'f'], ['x', 'e', 'z'], ['x', 'v', 'w'], ['x', 'v', 'z'],
['x', 'y', 'c'], ['x', 'y', 'f'], ['x', 'y', 'w'], ['x', 'y', 'z']]
This version doesn't care whether more than one element is being replaced:
from itertools import permutations, chain
data_small = [ ['a','b','c'], ['d','e','f'] ]
data_big = [ ['a','b','c'], ['d','e','f'], ['u','v','w'], ['x','y','z'] ]
def check(data, sub):
    for i in data:
        # All sublists must be the same length
        if len(i) != len(data[0]):
            return False
        for j in i:
            # Each element of sub must stay at its original position
            if j in sub:
                if i.index(j) != sub.index(j):
                    return False
    return True
# If you really want lists, just change the first x to list(x)
print([x for x in permutations(chain(*data_big), 3) if check(data_big, x)])
['a', 'b', 'c'], ['a', 'b', 'f'], ['a', 'b', 'w'], 61 more...
I use permutations instead of combinations because combinations treat ('d','b','c') and ('c','b','d') as the same selection, while permutations treat them as different.
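To make that distinction concrete, here is a small check with three items using only the standard itertools functions:

```python
from itertools import combinations, permutations

items = ['b', 'c', 'd']
perms = list(permutations(items, 3))  # every ordering is a separate result
combs = list(combinations(items, 3))  # order is ignored
print(len(perms), len(combs))         # 6 1
print(('d', 'b', 'c') in perms)       # True
print(combs)                          # [('b', 'c', 'd')]
```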
If you just want combinations, that's a lot easier. You can just do
from itertools import chain, combinations
def check(data):  # Check that all sublists are the same length
    for i in data:
        if len(i) != len(data[0]):
            return False
    return True
if check(data_small):
    print(list(combinations(chain(*data_small), 3)))
[('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'b', 'e'), ('a', 'b', 'f'),
('a', 'c', 'd'), ('a', 'c', 'e'), ('a', 'c', 'f'), ('a', 'd', 'e'),
('a', 'd', 'f'), ('a', 'e', 'f'), ('b', 'c', 'd'), ('b', 'c', 'e'),
('b', 'c', 'f'), ('b', 'd', 'e'), ('b', 'd', 'f'), ('b', 'e', 'f'),
('c', 'd', 'e'), ('c', 'd', 'f'), ('c', 'e', 'f'), ('d', 'e', 'f')]
Sorry for being late to the party, but here is the fancy "one-liner" (split over multiple lines for readability) using itertools and the extremely useful Python 3.5 unpacking generalizations (which, by the way, are a significantly faster and more readable way of converting between iterable types than, say, calling list explicitly), assuming unique elements:
>>> from itertools import permutations, repeat, chain
>>> next([*map(lambda m: [m[i][i] for i in range(a)],
...            {*permutations((*chain(*map(
...                repeat, map(tuple, l), repeat(a - 1))),), a)})]
...      for l in ([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],)
...      for a in (len(l[0]),))
[['g', 'h', 'f'], ['g', 'b', 'i'], ['g', 'b', 'f'],
['d', 'b', 'f'], ['d', 'h', 'f'], ['d', 'h', 'i'],
['a', 'e', 'c'], ['g', 'e', 'i'], ['a', 'h', 'i'],
['a', 'e', 'f'], ['g', 'e', 'c'], ['a', 'b', 'i'],
['g', 'b', 'c'], ['g', 'h', 'c'], ['d', 'h', 'c'],
['d', 'b', 'c'], ['d', 'e', 'c'], ['a', 'b', 'f'],
['d', 'b', 'i'], ['a', 'h', 'c'], ['g', 'e', 'f'],
['a', 'e', 'i'], ['d', 'e', 'i'], ['a', 'h', 'f']]
The use of next on the generator and the last two lines are, of course, just unnecessary exploitations of syntax to put the expression into one line, and I hope people will not use this as an example of good coding practice.
EDIT
I just realised that maybe I should give a short explanation. The inner part creates a - 1 copies of each sublist (converted to tuples for hashability and uniqueness testing) and chains them together so that permutations can do its magic: create every length-a permutation of those sublists. These are then converted to a set, which gets rid of the duplicates that are bound to occur, and then a map pulls out the ith element of the ith sublist within each unique permutation. Finally, a is the length of the first sublist, since all sublists are assumed to have identical lengths.
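Under a different reading of the question, where position i of each result may take the i-th element of any sublist (this yields exactly the eight lists shown for data_small), the "fancy itertools solution" the asker suspected could be itertools.product over the columns. This is a sketch of that interpretation, not of the answers above, and column_combinations is a hypothetical name; it assumes all sublists have the same length.

```python
from itertools import product


def column_combinations(rows):
    """Every list whose i-th item comes from the i-th position of some row.

    Assumes equal-length rows; zip() would otherwise silently truncate
    to the shortest row."""
    columns = zip(*rows)  # e.g. ('a', 'd'), ('b', 'e'), ('c', 'f')
    return [list(combo) for combo in product(*columns)]


data_small = [['a', 'b', 'c'], ['d', 'e', 'f']]
data_big = [['a', 'b', 'c'], ['d', 'e', 'f'], ['u', 'v', 'w'], ['x', 'y', 'z']]
print(column_combinations(data_small))
# [['a', 'b', 'c'], ['a', 'b', 'f'], ['a', 'e', 'c'], ['a', 'e', 'f'],
#  ['d', 'b', 'c'], ['d', 'b', 'f'], ['d', 'e', 'c'], ['d', 'e', 'f']]
print(len(column_combinations(data_big)))  # 64
```

For data_big this produces 4 ** 3 = 64 lists, the same count as the permutations version that does not limit how many elements are replaced, without generating and filtering all 3-permutations of the flattened data.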

python 3, how to fix the frequency of occurrence

I would like to write a function named "frequency" that can fix the frequency of pairs in the second half of my output. For example, if I fix the frequency of the pair ['A', 'C'] at 0.5 and the frequency of the pair ['M', 'K'] at 0.5, I would like an output like the following:
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'C']
['A', 'C']
['A', 'C']
['M', 'K']
['M', 'K']
['M', 'K']
I would like to be able to change the frequency values easily. I tried to build a function for this purpose, but I could only count the frequency of the existing pairs, not fix them.
The code I have is the following:
from random import randrange
# First half: a fixed pair, ['A', 'K']
for i in range(int(lengthPairs/2)):
    pairs.append([aminoacids[0], aminoacids[11]])
print(int(lengthPairs/2))
# Second half: random pairs
for j in range(int(lengthPairs/2)+1):
    dictionary = dict()
    r1 = randrange(20)
    r2 = randrange(20)
    pairs.append([aminoacids[r1], aminoacids[r2]])
for pair in pairs:
    print(pair)
where:
aminoacids = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V']
lengthPairs = 10
pairs = list(list())
It gives me an output like this:
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'K']
['A', 'C']
['M', 'K']
['I', 'I']
['F', 'G']
['V', 'H']
['V', 'I']
Thank you very much for any assistance!
I tried my best to understand what you meant; let's see if the following does what you want:
aminoacids = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V']
pair_fq_change = [aminoacids[0], aminoacids[11]]  # the pair whose frequency you'd like to change, e.g. ['A', 'K']
original_pairs = [['D', 'E'], ['S', 'F'], ['A', 'K'], ['A', 'K'], ['A', 'K'], ['A', 'K'], ['A', 'K'], ['B', 'C']]
def frequency(original_pairs, pair_fq_change, fq):
    '''fq is the maximum number of times pair_fq_change may appear in the result'''
    updated_pairs = []
    count = 0
    for pair in original_pairs:
        if pair != pair_fq_change:
            # Other pairs are always kept
            updated_pairs.append(pair)
        elif count < fq:
            # Keep the target pair until fq copies have been kept
            updated_pairs.append(pair)
            count += 1
        # Any further copies of pair_fq_change are dropped
    return updated_pairs
updated_pairs = frequency(original_pairs, pair_fq_change, 3)
print(updated_pairs)
>>>[['D', 'E'], ['S', 'F'], ['A', 'K'], ['A', 'K'], ['A', 'K'], ['B', 'C']]
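The function above can only cap how often an existing pair appears. If the aim is instead to build the second half with chosen frequencies, a sketch could look like this. It assumes the frequencies are fractions of the second half's length (6 in the question, since the second loop runs int(lengthPairs/2)+1 times), that counts are rounded with Python's round(), and that any leftover slots are filled with random pairs as in the question's second loop; fixed_frequency_pairs is a hypothetical name.

```python
import random

AMINOACIDS = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I',
              'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V']


def fixed_frequency_pairs(length, frequencies, rng=random):
    """Build `length` pairs in which each pair in `frequencies` appears
    round(fraction * length) times; leftover slots get random pairs."""
    pairs = []
    for pair, fraction in frequencies.items():
        # Separate list objects, so editing one pair later doesn't change the others
        pairs.extend(list(pair) for _ in range(round(fraction * length)))
    if len(pairs) > length:
        raise ValueError("requested frequencies exceed the available slots")
    while len(pairs) < length:
        pairs.append([rng.choice(AMINOACIDS), rng.choice(AMINOACIDS)])
    return pairs


first_half = [['A', 'K'] for _ in range(5)]
second_half = fixed_frequency_pairs(6, {('A', 'C'): 0.5, ('M', 'K'): 0.5})
for pair in first_half + second_half:
    print(pair)
```

With fractions that add up to 1 this reproduces the desired output from the question; changing the dictionary values is all that is needed to change the frequencies, and random.shuffle(second_half) can interleave the pairs if their order should not be grouped.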
