Find all combinations of each individual list element - python

Given the following list
myList = ['A' , 'B' , 'C, D' , 'E, F, G', 'H' , 'I']
How do I get every possible combination within each element of the list that has more than 2 characters? I do not want combinations of all of the elements together, if that makes sense.
An example output using the above list would look like below:
myList = ['A' , 'B' , 'C, D' , 'E, F' , 'E, G' , 'F, G' , 'H' , 'I']
Note: I only care about finding the combinations of each element that has more than two characters.
I have tried a few times using itertools, but that seems to find all possible combinations of ALL elements in the list, as opposed to combinations of the individual parts of each element.
import itertools
for L in range(0, len(myList)+1):
    for subset in itertools.combinations(myList, L):
        print(subset)

Use itertools combinations on only those elements that have more than 2 letters after splitting.
import itertools
myList = ['A' , 'B' , 'C, D' , 'E, F, G', 'H' , 'I']
result = []
for item in myList:
    item_split = item.split(', ')  # split each item on the ', ' separator
    if len(item_split) <= 2:
        result.append(item)
    else:  # more than 2 parts after splitting: use combinations
        result.extend(", ".join(pair) for pair in itertools.combinations(item_split, 2))
print(result)
#Output:
['A', 'B', 'C, D', 'E, F', 'E, G', 'F, G', 'H', 'I']

Similar to Paritosh Singh's answer, but with more parentheses :)
from operator import methodcaller
from itertools import chain, combinations
my_list = ['A', 'B', 'C, D', 'E, F, G', 'H', 'I']
sep = ', '
splitter = methodcaller('split', sep)
def pairs(x):
    return combinations(x, 2 if len(x) > 1 else 1)
joiner = sep.join
result = list(map(joiner,
                  chain.from_iterable(map(pairs,
                                          map(splitter,
                                              my_list)))))
[DIGRESSION ALERT]
... which arguably reads a little better if you use Coconut:
from itertools import chain, combinations
my_list = ['A' , 'B' , 'C, D' , 'E, F, G', 'H' , 'I']
my_result = (my_list
             |> split_each
             |> pairs
             |> chain.from_iterable
             |> join_each
             |> list
             )
where:
split_each = map$(.split(", "))
pairs = map$((x) -> combinations(x, 2 if len(x) > 1 else 1))
join_each = map$(", ".join)
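For readers without Coconut, roughly the same split / pair / join pipeline can be sketched as a single plain-Python comprehension. This is only an illustrative rewrite of the steps above, not the answerer's code; the name `parts` is introduced here for the example.

```python
from itertools import combinations

my_list = ['A', 'B', 'C, D', 'E, F, G', 'H', 'I']

my_result = [
    ', '.join(combo)
    for item in my_list
    # single-element list trick: binds `parts` once per item
    for parts in [item.split(', ')]
    # pairs when there is more than one part, otherwise the part itself
    for combo in combinations(parts, 2 if len(parts) > 1 else 1)
]
print(my_result)
# ['A', 'B', 'C, D', 'E, F', 'E, G', 'F, G', 'H', 'I']
```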

Related

How to get certain number of alphabets from a list?

I have a list of 26 numbers. I want to print a list of letters according to those numbers. For example, I have this list (26 numbers taken from input):
[0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
I would like the output to be:
[e,e,l,s]
'e' appears twice in the output because index 4 corresponds to 'e' in the English alphabet and the number at index 4 is 2. Likewise, 'l' appears once because it is at index 11 and the number there is 1; the same goes for 's'. The other letters do not appear because their numbers are zero.
As another example, given this 26-number input:
[1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
The output should be:
[a,b,b,c,c,d,d,d,e,e,e,e,g,g,g,h,h,h,h,i,i,i,i,j,k,k,k,l,m,m,m,m,n,n,n,n,o,u,u,u,u,v,v,w,w,w,x,x,y,y,z]
Is it possible to do this in Python 3?
You can use chr(97 + index) to get the letter for each index and then multiply it by the count at that index (lst is the input list):
In [40]: [j * chr(97 + i) for i, j in enumerate(lst) if j]
Out[40]: ['ee', 'l', 's']
If you want them separate you can utilize itertools module:
In [44]: from itertools import repeat, chain
In [45]: list(chain.from_iterable(repeat(chr(97 + i), j) for i, j in enumerate(lst) if j))
Out[45]: ['e', 'e', 'l', 's']
Yes, it is definitely possible in Python 3.
First, define an example list of numbers (as you did) and an empty list to store the resulting letters.
The link between index and letter is chr(97 + index): ord("a") is 97, so conversely chr(97) is "a". The first index is 0, so 97 + 0 gives "a"; as the index increases, so does the letter.
Next, use a nested for-loop: the outer loop iterates over the list of numbers, and the inner loop appends the same letter as many times as the number says.
We could do result.append(chr(97 + i) * my_list[i]) in the first loop itself, but that would not yield every letter separately ([a,b,b,c,c,d,d,d...]); it would instead look like [a,bb,cc,ddd...].
my_list = [1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
result = []
for i in range(len(my_list)):
    if my_list[i] > 0:
        for j in range(my_list[i]):
            result.append(chr(97 + i))
print(result)
An alternative to the wonderful answer by @Kasramvd:
import string
n = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
res = [i * c for i, c in zip(n, string.ascii_lowercase) if i]
print(res) # -> ['ee', 'l', 's']
Your second example produces:
['a', 'bb', 'cc', 'ddd', 'eeee', 'ggg', 'hhhh', 'iiii', 'j', 'kkk', 'l', 'mmmm', 'nnnn', 'o', 'uuuu', 'vv', 'www', 'xx', 'yy', 'z']
Splitting the strings ('bb' into 'b', 'b') can be done with the standard flattening pattern:
[x for y in something for x in y]
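Applied to the first example, that flattening pattern might look like the sketch below. The names `counts`, `grouped` and `letters` are made up for illustration; it relies on the fact that iterating over a string yields its characters.

```python
import string

counts = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]

# One repeated-letter string per non-zero count, e.g. 'ee'
grouped = [n * c for n, c in zip(counts, string.ascii_lowercase) if n]

# Flatten: each string is itself an iterable of single characters
letters = [ch for group in grouped for ch in group]

print(grouped)  # ['ee', 'l', 's']
print(letters)  # ['e', 'e', 'l', 's']
```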
Using a slightly different approach, which gives the characters individually as in your example:
import string
import numpy as np
a = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
alphabet_lookup = np.repeat(np.arange(len(a)), a)
letter_lookup = np.array(list(string.ascii_lowercase))
res = letter_lookup[alphabet_lookup]
print(res)
To get
['e' 'e' 'l' 's']

How to turn a string of letters embedded in square brackets into nested lists

I'm trying to find a simple way to convert a string like this:
a = '[[a b] [c d]]'
into the corresponding nested list structure, where the letters are turned into strings:
a = [['a', 'b'], ['c', 'd']]
I tried to use
import ast
l = ast.literal_eval('[[a b] [c d]]')
l = [i.strip() for i in l]
as found here
but it doesn't work because the characters a, b, c, d are not within quotes.
In particular, I'm looking for something that turns:
'[[X v] -s]'
into:
[['X', 'v'], '-s']
You can use a regex to find all items between brackets and then split each result:
>>> import re
>>> [i.split() for i in re.findall(r'\[([^\[\]]+)\]',a)]
[['a', 'b'], ['c', 'd']]
The regex r'\[([^\[\]]+)\]' matches anything between square brackets except square brackets themselves, which in this case is 'a b' and 'c d'; you can then use a list comprehension to split each match on whitespace.
Note that this regex only works for cases like this one, where all the characters are inside the innermost brackets. For other cases you would need to write a corresponding regex, and note that the regex trick won't work in all cases.
>>> a = '[[a b] [c d] [e g]]'
>>> [i.split() for i in re.findall(r'\[([^\[\]]+)\]',a)]
[['a', 'b'], ['c', 'd'], ['e', 'g']]
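For inputs the regex cannot handle, such as '[[X v] -s]' where an item sits outside the innermost brackets, one possible alternative is a small stack-based parser. The sketch below is not part of the answer above; the function name `parse_nested` is hypothetical, and it assumes items are separated by whitespace and the whole string is one bracketed list.

```python
def parse_nested(text):
    # Pad brackets with spaces so that split() yields clean tokens.
    tokens = text.replace('[', ' [ ').replace(']', ' ] ').split()
    stack = [[]]  # stack[0] collects the top-level result
    for token in tokens:
        if token == '[':
            stack.append([])           # open a new sublist
        elif token == ']':
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            finished = stack.pop()     # close the current sublist
            stack[-1].append(finished)
        else:
            stack[-1].append(token)    # plain item, kept as a string
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    top = stack[0]
    if len(top) != 1:
        raise ValueError("expected exactly one top-level item")
    return top[0]

print(parse_nested('[[a b] [c d]]'))  # [['a', 'b'], ['c', 'd']]
print(parse_nested('[[X v] -s]'))     # [['X', 'v'], '-s']
```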
Use the isalpha method of str to wrap all letters in double quotes:
a = '[[a b] [c d]]'
a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() else x, a))
Now a is:
'[["a" "b"] ["c" "d"]]'
And you can use json.loads (as @a_guest suggested):
json.loads(a.replace(' ', ','))
>>> import json
>>> a = '[[a b] [c d]]'
>>> a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() else x, a))
>>> a
'[["a" "b"] ["c" "d"]]'
>>> json.loads(a.replace(' ', ','))
[[u'a', u'b'], [u'c', u'd']]
This will work with any degree of nested lists following the above pattern, e.g.
>>> a = '[[[a b] [c d]] [[e f] [g h]]]'
>>> ...
>>> json.loads(a.replace(' ', ','))
[[[u'a', u'b'], [u'c', u'd']], [[u'e', u'f'], [u'g', u'h']]]
For the specific example of '[[X v] -s]':
>>> import json
>>> a = '[[X v] -s]'
>>> a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() or x=='-' else x, a))
>>> json.loads(a.replace('[ [', '[[').replace('] ]', ']]').replace(' ', ',').replace('][', '],[').replace('""',''))
[[u'X', u'v'], u'-s']
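The chain of replace calls above is specific to that one input. A slightly more general variant of the same quote-then-json.loads idea could quote whole tokens with a regex instead of single characters. This is only a sketch under the assumption that items are separated by whitespace; `parse_brackets` is a name invented for the example.

```python
import json
import re

def parse_brackets(text):
    # Quote every run of characters that is neither a bracket nor whitespace.
    quoted = re.sub(r'[^\[\]\s]+', lambda m: json.dumps(m.group()), text)
    # Turn whitespace between two adjacent elements into a comma:
    # it must follow a closing quote/bracket and precede an opening one.
    with_commas = re.sub(r'(?<=["\]])\s+(?=["\[])', ',', quoted)
    return json.loads(with_commas)

print(parse_brackets('[[a b] [c d]]'))  # [['a', 'b'], ['c', 'd']]
print(parse_brackets('[[X v] -s]'))     # [['X', 'v'], '-s']
```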

Read all possible sequential substrings in Python

If I have a list of letters, such as:
word = ['W','I','N','E']
and need to get every possible sequence of substrings, of length 3 or less, e.g.:
W I N E, WI N E, WI NE, W IN E, WIN E etc.
What is the most efficient way to go about this?
Right now, I have:
word = ['W','I','N','E']
for idx,phon in enumerate(word):
    phon_seq = ""
    for p_len in range(3):
        if idx-p_len >= 0:
            phon_seq = " ".join(word[idx-(p_len):idx+1])
            print(phon_seq)
This just gives me the below, rather than the sub-sequences:
W
I
W I
N
I N
W I N
E
N E
I N E
I just can't figure out how to create every possible sequence.
Try this recursive algorithm:
def segment(word):
    def sub(w):
        if len(w) == 0:
            yield []
        # try every prefix of length 1 to 3, then segment the rest
        for i in range(1, min(4, len(w) + 1)):
            for s in sub(w[i:]):
                yield [''.join(w[:i])] + s
    return list(sub(word))
# And if you want a list of strings:
def str_segment(word):
    return [' '.join(w) for w in segment(word)]
Output:
>>> segment(word)
[['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']]
>>> str_segment(word)
['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E']
As there can either be a space or not in each of three positions (after W, after I and after N), you can think of this as similar to bits being 1 or 0 in a binary representation of a number ranging from 1 to 2^3 - 1.
input_word = "WINE"
for variation_number in range(1, 2 ** (len(input_word) - 1)):
    output = ''
    for position, letter in enumerate(input_word):
        output += letter
        if variation_number >> position & 1:
            output += ' '
    print(output)
Edit: To include only variations with sequences of 3 characters or less (in the general case where input_word may be longer than 4 characters), we can exclude cases where the binary representation contains 3 zeroes in a row. (We also start the range from a higher number in order to exclude the cases which would have 000 at the beginning.)
for variation_number in range(2 ** (len(input_word) - 4), 2 ** (len(input_word) - 1)):
    if '000' not in bin(variation_number):
        output = ''
        for position, letter in enumerate(input_word):
            output += letter
            if variation_number >> position & 1:
                output += ' '
        print(output)
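The same "split or don't split at each gap" idea can also be sketched without bit twiddling, by enumerating the gap choices with itertools.product and filtering on segment length directly. This is an illustrative alternative, not the answerer's code; `segmentations` and `max_len` are names assumed for the example, and the word is assumed to be non-empty.

```python
from itertools import product

def segmentations(word, max_len=3):
    results = []
    # One boolean per gap between letters: True means "split here".
    for gaps in product([False, True], repeat=len(word) - 1):
        pieces, current = [], word[0]
        for letter, split_here in zip(word[1:], gaps):
            if split_here:
                pieces.append(current)
                current = letter
            else:
                current += letter
        pieces.append(current)
        # Keep only variations whose segments all fit the length limit.
        if all(len(p) <= max_len for p in pieces):
            results.append(' '.join(pieces))
    return results

for line in segmentations("WINE"):
    print(line)
```

For "WINE" this yields the seven variations listed earlier; the unsplit "WINE" is dropped because it is longer than three letters.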
My implementation for this problem.
#!/usr/bin/env python
# this is a problem of fitting partitions in the word
# we'll use itertools to generate these partitions
import itertools
word = 'WINE'
# this loop generates all possible partitions COUNTS (up to word length)
for partitions_count in range(1, len(word)+1):
    # this loop generates all possible combinations based on count
    for partitions in itertools.combinations(range(1, len(word)), r=partitions_count):
        # because of the way python splits words, we only care about the
        # difference *between* partitions, and not their distance from the
        # word's beginning
        diffs = list(partitions)
        for i in range(len(partitions)-1):
            diffs[i+1] -= partitions[i]
        # first, the whole word is up for taking by partitions
        splits = [word]
        # partition the word's remainder (what was not already "taken")
        # with each partition
        for p in diffs:
            remainder = splits.pop()
            splits.append(remainder[:p])
            splits.append(remainder[p:])
        # print the result
        print(splits)
As an alternative answer, you can do it with the itertools module: use groupby to group the list, with combinations supplying the pairs of indices (i, j) for the grouping key (i <= word.index(x) <= j), and finally use set to get a unique list. (This relies on word.index, so it assumes the letters are distinct.)
Also note that you could deduplicate the index pairs up front: for two pairs (i1, j1) and (i2, j2), if i1 == 0, j2 == 3 and j1 + 1 == i2, as with (0, 1) and (2, 3), the resulting slices are the same, so you need to remove one of them.
All in one list comprehension:
from itertools import groupby, combinations
subs=[[''.join(i) for i in j] for j in [[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in list(combinations(range(len(word)),2))]]
set([' '.join(j) for j in subs]) # set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])
Demo in details :
>>> cl=list(combinations(range(len(word)),2))
>>> cl
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
>>> new_l=[[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in cl]
>>> new_l
[[['W', 'I'], ['N', 'E']], [['W', 'I', 'N'], ['E']], [['W', 'I', 'N', 'E']], [['W'], ['I', 'N'], ['E']], [['W'], ['I', 'N', 'E']], [['W', 'I'], ['N', 'E']]]
>>> last=[[''.join(i) for i in j] for j in new_l]
>>> last
[['WI', 'NE'], ['WIN', 'E'], ['WINE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'NE']]
>>> set([' '.join(j) for j in last])
set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])
>>> for i in set([' '.join(j) for j in last]):
...     print i
...
WIN E
W IN E
W INE
WI NE
WINE
>>>
I think it can be done like this:
word = "ABCDE"
myList = []
for i in range(1, len(word)+1,1):
    myList.append(word[:i])
    for j in range(len(word[len(word[1:]):]), len(word)-len(word[i:]),1):
        myList.append(word[j:i])
print(myList)
print(sorted(set(myList), key=myList.index))

How to split a string into characters in python

I have a string 'ABCDEFG'
I want to be able to list each character sequentially followed by the next one.
Example
A B
B C
C D
D E
E F
F G
G
Can you tell me an efficient way of doing this? Thanks
In Python, a string is already seen as an enumerable list of characters, so you don't need to split it; it's already "split". You just need to build your list of substrings.
It's not clear what form you want the result in. If you just want substrings, this works:
s = 'ABCDEFG'
[s[i:i+2] for i in range(len(s))]
#=> ['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
If you want the pairs to themselves be lists instead of strings, just call list on each one:
[list(s[i:i+2]) for i in range(len(s))]
#=> [['A', 'B'], ['B', 'C'], ['C', 'D'], ['D', 'E'], ['E', 'F'], ['F', 'G'], ['G']]
And if you want strings after all, but with something like a space between the letters, join them back together after the list call:
[' '.join(list(s[i:i+2])) for i in range(len(s))]
#=> ['A B', 'B C', 'C D', 'D E', 'E F', 'F G', 'G']
You need to keep the last character, so use izip_longest from itertools (named zip_longest in Python 3):
>>> import itertools
>>> s = 'ABCDEFG'
>>> for c, cnext in itertools.izip_longest(s, s[1:], fillvalue=''):
...     print c, cnext
...
A B
B C
C D
D E
E F
F G
G
def doit(input):
    for i in range(len(input)):
        print(input[i] + (input[i + 1] if i != len(input) - 1 else ''))
doit("ABCDEFG")
Which yields:
>>> doit("ABCDEFG")
AB
BC
CD
DE
EF
FG
G
There's an itertools pairwise recipe for exactly this use case:
import itertools
def pairwise(myStr):
    a,b = itertools.tee(myStr)
    next(b,None)
    for s1,s2 in zip(a,b):
        print(s1,s2)
Output:
In [121]: pairwise('ABCDEFG')
A B
B C
C D
D E
E F
F G
Your problem is that you have a list of strings, not a string:
with open('ref.txt') as f:
    f1 = f.read().splitlines()
f.read() returns a string. You call splitlines() on it, getting a list of strings (one per line). If your input is actually 'ABCDEFG', this will of course be a list of one string, ['ABCDEFG'].
l = list(f1)
Since f1 is already a list, this just makes l a duplicate copy of that list.
print l, f1, len(l)
And this just prints the list of lines, and the copy of the list of lines, and the number of lines.
So, first, what happens if you drop the splitlines()? Then f1 will be the string 'ABCDEFG', instead of a list with that one string. That's a good start. And you can drop the l part entirely, because f1 is already an iterable of its characters; list(f1) will just be a different iterable of the same characters.
So, now you want to print each letter with the next letter. One way to do that is by zipping 'ABCDEFG' and 'BCDEFG '. But how do you get that 'BCDEFG '? Simple; it's just f1[1:] + ' '.
So:
with open('ref.txt') as f:
    f1 = f.read()
for left, right in zip(f1, f1[1:] + ' '):
    print(left, right)
Of course for something this simple, there are many other ways to do the same thing. You can iterate over range(len(f1)) and get 2-element slices, or you can use itertools.zip_longest, or you can write a general-purpose "overlapping adjacent groups of size N from any iterable" function out of itertools.tee and zip, etc.
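As a rough illustration of the last option mentioned above, a general-purpose "overlapping adjacent groups of size N" helper built from itertools.tee and zip might look like this. The name `windows` and its parameter `n` are assumptions made for the sketch. Note that, unlike the padded zip shown earlier, it does not emit the trailing lone character.

```python
from itertools import tee

def windows(iterable, n=2):
    # Make n independent iterators over the same data...
    iterators = tee(iterable, n)
    # ...and advance the k-th one by k items so they are staggered.
    for offset, it in enumerate(iterators):
        for _ in range(offset):
            next(it, None)
    # zip stops as soon as the most-advanced iterator is exhausted.
    return zip(*iterators)

for left, right in windows('ABCDEFG'):
    print(left, right)
```

With n=3, windows('ABCDE', 3) would produce ('A', 'B', 'C'), ('B', 'C', 'D') and ('C', 'D', 'E').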
As you want a space between the characters, you can use the zip function and a list comprehension:
>>> s="ABCDEFG"
>>> l=[' '.join(i) for i in zip(s,s[1:])]
>>> l
['A B', 'B C', 'C D', 'D E', 'E F', 'F G']
>>> for i in l:
...     print i
...
A B
B C
C D
D E
E F
F G
If you don't want the space, just use a list comprehension:
>>> [s[i:i+2] for i in range(len(s))]
['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']

Splitting a list based on a delimiter word

I have a list containing various string values. I want to split the list whenever I see WORD. The result will be a list of lists (sublists of the original list), each containing exactly one instance of WORD. I can do this using a loop, but is there a more Pythonic way to achieve this?
Example = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
result = [['A'], ['WORD','B','C'],['WORD','D']]
This is what I have tried, but it does not achieve what I want, since it puts WORD in a different list than it should be in:
def split_excel_cells(delimiter, cell_data):
    result = []
    temp = []
    for cell in cell_data:
        if cell == delimiter:
            temp.append(cell)
            result.append(temp)
            temp = []
        else:
            temp.append(cell)
    return result
import itertools
lst = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
w = 'WORD'
spl = [list(y) for x, y in itertools.groupby(lst, lambda z: z == w) if not x]
This creates a split list without the delimiters, which looks more logical to me:
[['A'], ['B', 'C'], ['D']]
If you insist on delimiters to be included, this should do the trick:
spl = [[]]
for x, y in itertools.groupby(lst, lambda z: z == w):
    if x: spl.append([])
    spl[-1].extend(y)
I would use a generator:
def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g
ex = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
result = list(group(ex, 'WORD'))
print(result)
This prints
[['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
The code accepts any iterable, and produces an iterable (which you don't have to flatten into a list if you don't want to).
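To illustrate that last point, the sketch below repeats the generator and feeds it a one-shot iterator, pulling chunks lazily instead of building the full list at once. The variable names `cells`, `chunks`, `first` and `rest` are made up for the example. One caveat worth knowing: if the input starts with the separator, the first chunk yielded is an empty list.

```python
def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

# Any iterable works, including a one-shot iterator.
cells = iter(['A', 'WORD', 'B', 'C', 'WORD', 'D'])
chunks = group(cells, 'WORD')

first = next(chunks)  # only consumes input up to the first 'WORD'
rest = list(chunks)   # consumes the remainder

print(first)  # ['A']
print(rest)   # [['WORD', 'B', 'C'], ['WORD', 'D']]
```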
@NPE's solution looks very Pythonic to me. This is another one using itertools:
izip is specific to Python 2; replace izip with zip to make this work in Python 3.
from itertools import izip, chain
example = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
indices = [i for i,x in enumerate(example) if x=="WORD"]
pairs = izip(chain([0], indices), chain(indices, [None]))
result = [example[i:j] for i, j in pairs]
This code is mainly based on this answer.
Given
import more_itertools as mit
iterable = ["A", "WORD", "B" , "C" , "WORD" , "D"]
pred = lambda x: x == "WORD"
Code
list(mit.split_before(iterable, pred))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
more_itertools is a third-party library, installable via pip install more_itertools.
See also split_at and split_after.
