Splitting a list based on a delimiter word - python

I have a list containing various string values. I want to split the list whenever I see WORD. The result will be a list of lists (the sublists of the original list), where each new sublist starts with exactly one instance of WORD. I can do this using a loop, but is there a more Pythonic way to achieve this?
Example = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
result = [['A'], ['WORD','B','C'],['WORD','D']]
This is what I have tried, but it does not achieve what I want, since it puts WORD in a different list than the one it should be in:
def split_excel_cells(delimiter, cell_data):
    result = []
    temp = []
    for cell in cell_data:
        if cell == delimiter:
            temp.append(cell)
            result.append(temp)
            temp = []
        else:
            temp.append(cell)
    return result

import itertools
lst = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
w = 'WORD'
spl = [list(y) for x, y in itertools.groupby(lst, lambda z: z == w) if not x]
This creates a split list without the delimiters, which looks more logical to me:
[['A'], ['B', 'C'], ['D']]
If you insist on including the delimiters, this should do the trick:
spl = [[]]
for x, y in itertools.groupby(lst, lambda z: z == w):
    if x:
        spl.append([])
    spl[-1].extend(y)
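One caveat with the groupby loop that keeps the delimiters: groupby puts consecutive items with the same key into one group, so two adjacent WORDs land in the same sublist (['A', 'WORD', 'WORD', 'B'] gives [['A'], ['WORD', 'WORD', 'B']]). A possible workaround, sketched here as an alternative rather than taken from the answer, is to group by a running count of delimiters using itertools.accumulate (Python 3.2+). The helper name split_before_word is hypothetical:

```python
from itertools import accumulate, groupby


def split_before_word(lst, w):
    """Split lst so that every occurrence of w starts a new sublist.

    split_before_word is a hypothetical helper name. Each item is tagged with
    the number of delimiters seen so far (counting the item itself), and
    groupby then collects the items that share the same count.
    """
    counts = accumulate(1 if item == w else 0 for item in lst)
    return [
        [item for _, item in grp]
        for _, grp in groupby(zip(counts, lst), key=lambda pair: pair[0])
    ]


print(split_before_word(['A', 'WORD', 'B', 'C', 'WORD', 'D'], 'WORD'))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
print(split_before_word(['A', 'WORD', 'WORD', 'B'], 'WORD'))
# [['A'], ['WORD'], ['WORD', 'B']]
```

Each group is turned into a list inside the comprehension before groupby moves on, which is required because groupby's groups share the underlying iterator.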

I would use a generator:
def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g
ex = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
result = list(group(ex, 'WORD'))
print(result)
This prints
[['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
The code accepts any iterable and produces an iterator (which you don't have to turn into a list if you don't want to).
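One edge case of the generator above: when the input starts with the delimiter, the first group it yields is empty (list(group(['WORD', 'A'], 'WORD')) is [[], ['WORD', 'A']]), and an empty input gives [[]]. If empty groups are unwanted, a small variation only starts a new group when the current one already has items. This is a sketch; the name group_nonempty is hypothetical:

```python
def group_nonempty(seq, sep):
    """Like group(), but never yields an empty list.

    group_nonempty is a hypothetical name. A new group is started at sep
    only if the current group already has items, so a leading sep simply
    opens the first group.
    """
    g = []
    for el in seq:
        if el == sep and g:
            yield g
            g = []
        g.append(el)
    if g:  # an empty input yields nothing instead of [[]]
        yield g


print(list(group_nonempty(['WORD', 'A', 'WORD', 'B'], 'WORD')))
# [['WORD', 'A'], ['WORD', 'B']]
print(list(group_nonempty(['A', 'WORD', 'B', 'C', 'WORD', 'D'], 'WORD')))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
```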

NPE's solution looks very Pythonic to me. This is another one using itertools.
izip exists only in Python 2; in Python 3, replace izip with the built-in zip.
from itertools import izip, chain
example = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
indices = [i for i,x in enumerate(example) if x=="WORD"]
pairs = izip(chain([0], indices), chain(indices, [None]))
result = [example[i:j] for i, j in pairs]
This code is mainly based on this answer.

Given
import more_itertools as mit
iterable = ["A", "WORD", "B" , "C" , "WORD" , "D"]
pred = lambda x: x == "WORD"
Code
list(mit.split_before(iterable, pred))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
more_itertools is a third-party library, installable via pip install more_itertools.
See also split_at and split_after.

Related

Rearrange python list to avoid any position matches

I have two lists, list1 and list2. How would I rearrange list1 (without rearranging list2) so that there are no matches at any position? For example:
UNDESIRED: list1 = [‘A’, ‘B’, ‘C’] list2 = [‘X’, ‘B’, ‘Z’]. As you can see, B is at the same position, 1, in both lists, so I would like to rearrange list1 to list1 = [‘B’, ‘A’, ‘C’], or any other order with no positional matches with list2, WITHOUT rearranging list2.
You could do this with the help of the itertools module. There may be better or more efficient mechanisms (this version builds all n! permutations of list1 up front). Note that the process() function will return None when there is no possible solution.
import itertools

def process(L1, L2):
    for s in set(itertools.permutations(L1, len(L1))):
        if all(a != b for a, b in zip(s, L2)):
            return s
    return None
list1 = ['A', 'B', 'C']
list2 = ['X', 'B', 'Z']
print(process(list1, list2))
from itertools import permutations

def diff(list1, list2):
    for item in permutations(list1, len(list1)):
        if all(map(lambda a, b: a != b, item, list2)):
            return list(item)
    return None

list1 = ['A', 'B', 'C']
list2 = ['X', 'B', 'Z']
result = diff(list1, list2)
if result:
    print(result, 'vs', list2)
['A', 'C', 'B'] vs ['X', 'B', 'Z']
I solved it using the following code; however, as noted above, there are many possible cases where this particular solution may not work. For this specific case, you can try this:
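import random

list1 = ['A', 'B', 'C']
list2 = ['X', 'B', 'Z']
for i in list1[:]:  # iterate over a copy, since list1 is modified below
    for j in list2:
        if i == j:
            # move i to a random position (never the last one)
            # until it no longer lines up with j
            while list1.index(i) == list2.index(j):
                list1.remove(i)
                z = len(list1)
                rand = random.choice(range(z))
                list1.insert(rand, i)
print(list1, list2)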
Also, note that you use unusual apostrophes inside the lists: ‘ instead of ' or ". Python does not accept them as string quotes, so code written with them raises a SyntaxError.
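Another option, not taken from the answers above, is rejection sampling: shuffle a copy of list1 and retry until no position matches list2. This is a hedged sketch; shuffle_until_no_match and its max_tries and seed parameters are hypothetical names. It avoids generating all n! permutations, but it is randomized, so a None result after max_tries does not prove that no valid order exists, and it can be slow when valid orders are rare:

```python
import random


def shuffle_until_no_match(list1, list2, max_tries=1000, seed=None):
    """Return a shuffled copy of list1 with no positional matches in list2.

    shuffle_until_no_match is a hypothetical helper, not from the answers
    above. Returns None if no valid order is found within max_tries.
    """
    rng = random.Random(seed)  # seeded generator for reproducible runs
    candidate = list(list1)    # copy, so list1 itself is left untouched
    for _ in range(max_tries):
        rng.shuffle(candidate)
        if all(a != b for a, b in zip(candidate, list2)):
            return candidate
    return None


list1 = ['A', 'B', 'C']
list2 = ['X', 'B', 'Z']
print(shuffle_until_no_match(list1, list2, seed=0))
```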

How to generate combination of characters in a string at a particular position?

I have a list of strings:
li = ['a', 'b', 'c', 'd']
Using the following code in Python, I generated all possible combinations of characters from the list li and got 256 strings (4 ** 4).
from itertools import product
li = ['a', 'b', 'c', 'd']
for comb in product(li, repeat=4):
    print(''.join(comb))
Say, for example, that I know the characters at the second and fourth positions of the string: 'b' and 'c'.
Then the result will be a set of only 16 strings:
abac
abbc
abcc
abdc
bbac
bbbc
bbcc
bbdc
cbac
cbbc
cbcc
cbdc
dbac
dbbc
dbcc
dbdc
How to get this result? Is there a Pythonic way to achieve this?
Thanks.
Edit: In my real case, li contains the letters a to z and repeat is 13. When I tried the code above, I got a memory error!
Use a list comprehension:
from itertools import product
li = ['a', 'b', 'c', 'd']
combs = [list(x) for x in product(li, repeat=4)]
selected_combs = [comb for comb in combs if (comb[1] == 'b' and comb[3] == 'c')]
print(["".join(comb) for comb in selected_combs])
# ['abac', 'abbc', 'abcc', 'abdc', 'bbac', 'bbbc', 'bbcc', 'bbdc', 'cbac', 'cbbc', 'cbcc', 'cbdc', 'dbac', 'dbbc', 'dbcc', 'dbdc']
To save memory when you do not need the full list combs, you can filter while iterating:
from itertools import product
li = ['a', 'b', 'c', 'd']
selected_combs = [comb for comb in product(li, repeat=4) if comb[1] == 'b' and comb[3] == 'c']
print(["".join(comb) for comb in selected_combs])
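Both versions above still loop over all len(li) ** repeat tuples and discard most of them; for 26 letters and repeat=13 that is 26 ** 13 (about 2.5e18) candidates. A sketch of a different approach runs product only over the free positions and writes the known characters into place, so the work is proportional to the number of results. The function fill_fixed and its fixed dictionary (index to character) are hypothetical names:

```python
from itertools import product


def fill_fixed(alphabet, length, fixed):
    """Yield every string of `length` characters drawn from `alphabet` in which
    position i holds fixed[i] for each key i of the `fixed` dict.

    fill_fixed is a hypothetical helper. Only the free positions are
    enumerated, so nothing is generated and then thrown away.
    """
    free = [i for i in range(length) if i not in fixed]
    template = [fixed.get(i) for i in range(length)]  # None marks a free slot
    for combo in product(alphabet, repeat=len(free)):
        chars = list(template)
        for i, ch in zip(free, combo):
            chars[i] = ch
        yield ''.join(chars)


li = ['a', 'b', 'c', 'd']
# Second and fourth characters are 'b' and 'c' (indices 1 and 3).
result = list(fill_fixed(li, 4, {1: 'b', 3: 'c'}))
print(len(result))  # 16
print(result[:4])   # ['abac', 'abbc', 'abcc', 'abdc']
```

With two of 13 positions fixed over a to z there are still 26 ** 11 strings, so in that case iterate over the generator lazily instead of calling list() on it.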
def permute(s):
    out = []
    if len(s) == 1:
        return s
    else:
        for i, let in enumerate(s):
            for perm in permute(s[:i] + s[i+1:]):
                out += [let + perm]
    return out

per = permute(['a', 'b', 'c', 'd'])
print(per)
Do you want this?

How to combine two elements of a list based on a given condition

I want to combine two elements in a list based on a given condition.
For example, if I encounter the character 'a' in a list, I would like to combine it with the next element. The list:
['a', 'b', 'c', 'a', 'd']
becomes
['ab', 'c', 'ad']
Is there any quick way to do this?
One solution I have thought of is to create a new empty list and iterate through the first list. When we encounter the element 'a' in list1, we join list1[index of a] and list1[index of a + 1] and append the result to list2. However, I want to know if there is any way to do it without creating a new list and copying values into it.
This does not create a new list, just modifies the existing one.
l = ['a', 'b', 'c', 'a', 'd']
# walk backwards so that popping an element does not shift unvisited indices
for i in range(len(l)-2, -1, -1):
    if l[i] == 'a':
        l[i] = l[i] + l.pop(i+1)
print(l)
If you don't want to use a list comprehension to create a new list (maybe because your input list is huge), you could modify the list in place:
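l = ['a', 'b', 'c', 'a', 'd']
i = 0
while i < len(l):
    # merge 'a' with its successor; a trailing 'a' has no successor to pop
    if l[i] == 'a' and i + 1 < len(l):
        l[i] += l.pop(i + 1)
    i += 1
print(l)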
Use a list comprehension with an iterator over your list. When the current item is 'a', simply join it with the next item from the iterator using next:
l = ['a', 'b', 'c', 'a', 'd']
it = iter(l)
# next(it, '') avoids a StopIteration if the list ends with 'a'
l[:] = [i + next(it, '') if i == 'a' else i for i in it]
print(l)
# ['ab', 'c', 'ad']
Well, if you really don't want to create a new list, here we go:
from itertools import islice

a = list("abcdabdbac")
i = 0
for x, y in zip(a, islice(a, 1, None)):
    if x == 'a':
        a[i] = x + y
        i += 1
    elif y != 'a':
        a[i] = y
        i += 1
del a[i:]  # deleting a slice never raises, so no try/except is needed
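The zip-based loop above copies a non-'a' element only when it appears as y, so it can lose elements on inputs unlike the example: list("bab") ends up as ['ab'] (the leading 'b' is dropped), and list("ba") ends up empty. A hedged sketch of an in-place alternative uses separate read and write indices, keeps every element, and leaves a trailing 'a' as it is. The function name merge_after_a and its sep parameter are hypothetical:

```python
def merge_after_a(lst, sep='a'):
    """Merge each sep with the element after it, modifying lst in place.

    merge_after_a is a hypothetical helper. `read` scans the original items
    and `write` is the next slot to fill; since write <= read at all times,
    no unread item is overwritten.
    """
    read = write = 0
    while read < len(lst):
        if lst[read] == sep and read + 1 < len(lst):
            lst[write] = lst[read] + lst[read + 1]
            read += 2
        else:
            # ordinary items, and a trailing sep with nothing after it
            lst[write] = lst[read]
            read += 1
        write += 1
    del lst[write:]  # drop the leftover tail


l = ['a', 'b', 'c', 'a', 'd']
merge_after_a(l)
print(l)  # ['ab', 'c', 'ad']
```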
You could use itertools.groupby and group by whether:
the letter follows an 'a', or
the letter is an 'a',
using enumerate to generate the current index, which makes it possible to fetch the previous element from the list (this creates a new list, but it is a one-liner):
import itertools
l = ['a', 'b', 'c', 'a', 'd']
print(["".join(x[1] for x in v) for _,v in itertools.groupby(enumerate(l),key=lambda t: (t[0] > 0 and l[t[0]-1]=='a') or t[1]=='a')])
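result:
['ab', 'c', 'ad']
Note that groupby merges adjacent items that share the same key, so this only works when merged pairs and single letters alternate, as in the example: ['c', 'e'] would become ['ce'], and ['a', 'b', 'a', 'd'] would become ['abad'].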
This is an easy way, though maybe not the most Pythonic one.
l1 = ['a', 'b', 'c', 'a', 'd']
do_combine = False
combine_element = None
for el in l1:
    if do_combine:
        indx = l1.index(el)
        l1[indx] = combine_element + el
        do_combine = False
        l1.remove(combine_element)
    if el == 'a':
        combine_element = el
        do_combine = True
print(l1)
# ['ab', 'c', 'ad']

Separating a String

Given a string, I want to generate all possible ways of splitting it; in other words, all possible ways of putting commas between its characters.
For example:
input: ["abcd"]
output: ["abcd"]
["abc","d"]
["ab","cd"]
["ab","c","d"]
["a","bc","d"]
["a","b","cd"]
["a","bcd"]
["a","b","c","d"]
I am a bit stuck on how to generate all the possible lists. Combinations will just give me subsets of a given length, and permutations will give all possible orderings.
I can produce all the cases with only one comma by iterating through the slices, but I can't produce cases with two commas, like "ab","c","d" and "a","b","cd".
My attempt with slicing:
test = "abcd"
for x in range(len(test)):
    print(test[:x], test[x:])
How about something like:
from itertools import combinations

def all_splits(s):
    for numsplits in range(len(s)):
        for c in combinations(range(1, len(s)), numsplits):
            split = [s[i:j] for i, j in zip((0,) + c, c + (None,))]
            yield split
after which:
>>> for x in all_splits("abcd"):
...     print(x)
...
['abcd']
['a', 'bcd']
['ab', 'cd']
['abc', 'd']
['a', 'b', 'cd']
['a', 'bc', 'd']
['ab', 'c', 'd']
['a', 'b', 'c', 'd']
You can certainly use itertools for this, but I think it's easier to write a recursive generator directly:
def gen_commas(s):
    yield s
    for prefix_len in range(1, len(s)):
        prefix = s[:prefix_len]
        for tail in gen_commas(s[prefix_len:]):
            yield prefix + "," + tail
Then
print(list(gen_commas("abcd")))
prints
['abcd', 'a,bcd', 'a,b,cd', 'a,b,c,d', 'a,bc,d', 'ab,cd', 'ab,c,d', 'abc,d']
I'm not sure why I find this easier. Maybe just because it's dead easy to do it directly ;-)
You could generate the power set of the n - 1 places where you could put commas (see the question "what's a good way to combinate through a set?") and then insert commas at each chosen position.
Using itertools:
import itertools

input_str = "abcd"
for k in range(1, len(input_str)):
    for subset in itertools.combinations(range(1, len(input_str)), k):
        s = list(input_str)
        # each earlier insertion shifts later positions right by one
        for i, x in enumerate(subset):
            s.insert(x + i, ",")
        print("".join(s))
Gives:
a,bcd
ab,cd
abc,d
a,b,cd
a,bc,d
ab,c,d
a,b,c,d
Also a recursive version:
def commatoze(s, p=1):
    if p == len(s):
        print(s)
        return
    # either insert a comma at position p (and skip past it) or don't
    commatoze(s[:p] + ',' + s[p:], p + 2)
    commatoze(s, p + 1)

input_str = "abcd"
commatoze(input_str)
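The power-set idea can also be written with itertools.product: each of the n - 1 gaps between characters independently gets a comma or not, giving 2 ** (n - 1) splits. This is a sketch under that framing, not one of the answers here; comma_splits is a hypothetical name, and yielding nothing for an empty string is an arbitrary choice:

```python
from itertools import product


def comma_splits(s):
    """Yield every way to split s into consecutive non-empty pieces.

    comma_splits is a hypothetical helper. Each of the len(s) - 1 gaps
    either gets a cut (True) or not (False).
    """
    if not s:
        return  # assumption: an empty string has no splits
    for cuts in product((False, True), repeat=len(s) - 1):
        pieces, start = [], 0
        # gap number g sits between s[g - 1] and s[g]
        for gap, cut in enumerate(cuts, start=1):
            if cut:
                pieces.append(s[start:gap])
                start = gap
        pieces.append(s[start:])
        yield pieces


for parts in comma_splits("abcd"):
    print(",".join(parts))
# abcd, abc,d, ab,cd, ab,c,d, a,bcd, a,bc,d, a,b,cd, a,b,c,d (one per line)
```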
You can solve the integer composition problem and use the compositions to guide where to split the list. Integer composition can be solved fairly easily with a little bit of dynamic programming.
def composition(n):
    if n == 1:
        return [[1]]
    comp = composition(n - 1)
    return [x + [1] for x in comp] + [y[:-1] + [y[-1] + 1] for y in comp]

def split(lst, guide):
    ret = []
    total = 0
    for g in guide:
        ret.append(lst[total:total + g])
        total += g
    return ret

lst = list('abcd')
for guide in composition(len(lst)):
    print(split(lst, guide))
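Another way to generate integer compositions:
from itertools import groupby

def composition(n):
    # Write i as an n-digit binary string; the lengths of its runs of equal
    # bits form a composition of n. The leading bit is always 0 here, so
    # each composition is produced exactly once.
    for i in range(2**(n-1)):
        yield [len(list(group)) for _, group in groupby('{0:0{1}b}'.format(i, n))]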
Given
import more_itertools as mit
Code
list(mit.partitions("abcd"))
Output
[[['a', 'b', 'c', 'd']],
[['a'], ['b', 'c', 'd']],
[['a', 'b'], ['c', 'd']],
[['a', 'b', 'c'], ['d']],
[['a'], ['b'], ['c', 'd']],
[['a'], ['b', 'c'], ['d']],
[['a', 'b'], ['c'], ['d']],
[['a'], ['b'], ['c'], ['d']]]
Install more_itertools via pip install more-itertools.

list comprehension question

Is there a way to add multiple items to a list in a list comprehension per iteration? For example:
y = ['a', 'b', 'c', 'd']
x = [1,2,3]
return [x, a for a in y]
output: [[1,2,3], 'a', [1,2,3], 'b', [1,2,3], 'c', [1,2,3], 'd']
Sure there is, but not with a plain list comprehension:
EDIT: Inspired by another answer:
y = ['a', 'b', 'c', 'd']
x = [1,2,3]
return sum([[x, a] for a in y],[])
How it works: sum will add up a sequence of anything, as long as the items have an __add__ method to do the work. BUT it starts off with an initial total of 0. You can't add 0 to a list, but you can give sum() another starting value. Here we use an empty list.
If, instead of needing an actual list, you wanted just a generator, you can use itertools.chain.from_iterable, which just strings a bunch of iterators into one long iterator.
from itertools import *
return chain.from_iterable((x,a) for a in y)
or an even more itertools-friendly version (izip is Python 2 only; in Python 3 the built-in zip is already lazy):
import itertools
return itertools.chain.from_iterable(zip(itertools.repeat(x), y))
There are other ways, too, of course: To start with, we can improve Adam Rosenfield's answer by eliminating an unneeded lambda expression:
from functools import reduce  # reduce is a built-in only in Python 2
return reduce(list.__add__, ([x, a] for a in y))
since list already has a member that does exactly what we need. We could achieve the same using map and side effects in list.extend:
l = []
list(map(l.extend, [[x, a] for a in y]))  # list() forces the lazy map in Python 3
return l
Finally, let's go for a pure list comprehension that is as inelegant as possible:
return [y[i//2] if i % 2 else x for i in range(len(y)*2)]
Here's one way:
y = ['a', 'b', 'c', 'd']
x = [1,2,3]
from functools import reduce  # reduce is a built-in only in Python 2
return reduce(lambda a, b: a + b, [[x, a] for a in y])
x = [1,2,3]
y = ['a', 'b', 'c', 'd']
z = []
[z.extend([x, a]) for a in y]
(The correct value will be in z; the list comprehension itself only builds a list of None values, since extend returns None.)
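For completeness, a plain list comprehension can emit two items per iteration by using a second for clause; this is standard comprehension syntax, though not one of the answers above. It also avoids the repeated list concatenation of the sum() and reduce() versions, which copy the growing result on every step (quadratic time overall). Note that every [1, 2, 3] in the result is the same list object x, not a copy:

```python
y = ['a', 'b', 'c', 'd']
x = [1, 2, 3]

# The inner "for item in (x, a)" runs once per element of y and yields two
# items each time, so the pairs come out flattened into a single list.
result = [item for a in y for item in (x, a)]
print(result)
# [[1, 2, 3], 'a', [1, 2, 3], 'b', [1, 2, 3], 'c', [1, 2, 3], 'd']
```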
