Python permutations

I am trying to generate pandigital numbers using the itertools.permutations function, but whenever I do it generates them as a list of separate digits, which is not what I want.
For example:
for x in itertools.permutations("1234"):
    print(x)
will produce:
('1', '2', '3', '4')
('1', '2', '4', '3')
('1', '3', '2', '4')
('1', '3', '4', '2')
('1', '4', '2', '3')
('1', '4', '3', '2'), etc.
whereas I want it to return 1234, 1243, 1324, 1342, 1423, 1432, etc. How would I go about doing this in an optimal fashion?

A list comprehension with the built-in str.join() function is what you need:
import itertools
a = [''.join(i) for i in itertools.permutations("1234") ]
print(a)
Output:
['1234', '1243', '1324', '1342', '1423', '1432', '2134', '2143', '2314', '2341', '2413', '2431', '3124', '3142', '3214', '3241', '3412', '3421', '4123', '4132', '4213', '4231', '4312', '4321']

itertools.permutations takes an iterable and returns an iterator that yields tuples.
Use str.join(), which returns a string that is the concatenation of the strings in the iterable it is given.
See the documentation for str.join() and itertools.permutations.
Use this:
import itertools
for x in itertools.permutations("1234"):
    print(''.join(x))
Output:
1234
1243
1324
1342
1423
1432
2134
2143
2314
2341
....

Note that itertools.permutations returns tuples.
The join function turns a tuple of strings into a single string:
In [1]: ''.join(('1','2','3'))
Out[1]: '123'
Try this:
import itertools
for x in itertools.permutations("1234"):
    print(''.join(x))
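
Since the question asks for pandigital numbers rather than strings, the join approach from the answers above can be pushed one step further by converting each joined string to an int. The following is only a minimal sketch; the function name pandigitals is made up for illustration, and it assumes the digit string contains no '0' (a leading zero would be dropped by int()).

```python
from itertools import permutations

def pandigitals(digits="1234"):
    """Lazily yield every permutation of `digits` as an integer.

    `pandigitals` is a hypothetical helper name, not part of itertools.
    """
    for perm in permutations(digits):
        # perm is a tuple such as ('1', '2', '3'); join it, then convert.
        yield int("".join(perm))

print(list(pandigitals("123")))
# [123, 132, 213, 231, 312, 321]
```

Because this is a generator, the numbers are produced one at a time instead of being stored in a list, which matters once the digit string grows to nine or ten characters.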

Related

How to convert dataframe into list of tuples with respect to category?

I have a following problem. I would like to convert dataframe into list of tuples based on a category. See simple code below:
import pandas as pd
data = {'product_id': ['5', '7', '8', '5', '30'], 'id_customer': ['1', '1', '1', '3', '3']}
df = pd.DataFrame.from_dict(data)
# desired output is:
result = [('5', '7', '8'), ('5', '30')]
How can I do it, please? This question did not help me: Convert pandas dataframe into a list of unique tuple
Use GroupBy.agg (or GroupBy.apply) with tuple. Each of the following lines prints the same result:
print (df.groupby('id_customer', sort=False)['product_id'].agg(tuple).tolist())
print (df.groupby('id_customer', sort=False)['product_id'].apply(tuple).tolist())
print (list(df.groupby('id_customer', sort=False)['product_id'].agg(tuple)))
print (list(df.groupby('id_customer', sort=False)['product_id'].apply(tuple)))
[('5', '7', '8'), ('5', '30')]
Alternatively, iterate over the groups directly:
>>> [tuple(v) for _, v in df.groupby('id_customer')['product_id']]
[('5', '7', '8'), ('5', '30')]
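
To make the grouping step itself visible, here is a rough standard-library sketch of what "collect product_id values per id_customer, in first-seen order" means. It is not how pandas implements groupby; the helper name group_as_tuples is invented for this illustration, and it relies on dicts preserving insertion order (Python 3.7+).

```python
def group_as_tuples(pairs):
    """Group (key, value) pairs by key, keeping first-seen key order.

    Hypothetical helper for illustration only; not a pandas API.
    """
    groups = {}
    for key, value in pairs:
        # Create the list for a new key on first sight, then append.
        groups.setdefault(key, []).append(value)
    return [tuple(values) for values in groups.values()]

data = {'product_id': ['5', '7', '8', '5', '30'],
        'id_customer': ['1', '1', '1', '3', '3']}
print(group_as_tuples(zip(data['id_customer'], data['product_id'])))
# [('5', '7', '8'), ('5', '30')]
```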

How Would You Recursively Get All The Combinations Of Items Within A List

In my attempt, I have a list: "stuff" that is supposed to be iterated over recursively to find all possible combinations. It does this by trying to recurse on all items except the first, trying to recurse on all items except the [1] index (rollover), and finally iterating over all items except the [2] index (ollie).
stuff = ['1','2','3','4','5']
def rollOver(aL):
    neuList = []
    neuList.append(aL[0])
    neuList.extend(aL[2:])
    return neuList
def ollie(aL):
    neuList = []
    neuList.extend(aL[0:1])
    neuList.extend(aL[3:])
    return neuList
def recurse(info):
    try:
        if len(info) == 3:
            print(info)
        if len(info) > 1:
            recurse(info[1:])
            recurse(rollOver(info))
            recurse(ollie(info))
    except:
        l = 0
recurse(stuff)
I manually tried this method on paper and it seemed to work. However, in the code I get the results:
['3', '4', '5']
['2', '4', '5']
['3', '4', '5']
['1', '4', '5']
['1', '4', '5']
1, 3, 5 should be a listed possibility, but it doesn't show up, which leads me to think I've done something wrong.
One way to do this is via the itertools package:
from itertools import combinations
stuff = ['1','2','3','4','5']
for i in combinations(stuff, 3):
    print(i)
Which gives you the desired output:
('1', '2', '3')
('1', '2', '4')
('1', '2', '5')
('1', '3', '4')
('1', '3', '5')
('1', '4', '5')
('2', '3', '4')
('2', '3', '5')
('2', '4', '5')
('3', '4', '5')
Alternatively, if you want to code this yourself in a recursive fashion, you could implement your own function as follows:
def combs(stuff):
    if len(stuff) == 0:
        return [[]]
    cs = []
    for c in combs(stuff[1:]):
        cs += [c, c+[stuff[0]]]
    return cs
I'll leave it to you to edit this function to only return results of a given size.
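
For readers who want to see one possible way to do that, here is a sketch of a size-limited recursive variant. It is not the answerer's code: the name combs_of_size and the parameter k are introduced here for illustration. The idea is that every combination of size k either contains the first item (plus k-1 items from the rest) or does not (k items from the rest).

```python
def combs_of_size(items, k):
    """Return all combinations of `items` with exactly `k` elements.

    Hypothetical helper; a sketch, not a replacement for
    itertools.combinations.
    """
    if k == 0:
        return [[]]          # exactly one way to choose nothing
    if len(items) < k:
        return []            # not enough items left
    first, rest = items[0], items[1:]
    # Combinations that include the first item...
    with_first = [[first] + c for c in combs_of_size(rest, k - 1)]
    # ...followed by those that skip it.
    without_first = combs_of_size(rest, k)
    return with_first + without_first

stuff = ['1', '2', '3', '4', '5']
for c in combs_of_size(stuff, 3):
    print(c)
```

For the five-item list this prints ten combinations, including ['1', '3', '5'], which the original attempt missed.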

Regex for split or findall each digit python

What is the best way to split this string into a list of runs of consecutive identical digits?
My solution:
>>> str
'2223334441214844'
>>> filter(None, re.split("(0+)|(1+)|(2+)|(3+)|(4+)|(5+)|(6+)|(7+)|(8+)|(9+)", str))
['222', '333', '444', '1', '2', '1', '4', '8', '44']
The more flexible way would be to use itertools.groupby which is made to match consecutive groups in iterables:
>>> s = '2223334441214844'
>>> import itertools
>>> [''.join(group) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
The key would be the single key that is being grouped on (in your case, the digit). And the group is an iterable of all the items in the group. Since the source iterable is a string, each item is a character, so in order to get back the fully combined group, we need to join the characters back together.
You could also repeat the key for the length of the group to get this output:
>>> [key * len(list(group)) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
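
To make the key/group pairing described above a little more concrete, this small sketch prints each key together with the length of its group, which amounts to a run-length encoding of the string. The variable name runs is just illustrative.

```python
from itertools import groupby

s = '2223334441214844'

# Each group is a one-shot iterator, so it is materialized with list()
# before its length is taken.
runs = [(key, len(list(group))) for key, group in groupby(s)]
print(runs)
# [('2', 3), ('3', 3), ('4', 3), ('1', 1), ('2', 1), ('1', 1), ('4', 1), ('8', 1), ('4', 2)]
```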
If you wanted to use regular expressions, you could make use of backreferences to find consecutive characters without having to specify them explicitly:
>>> re.findall('((.)\\2*)', s)
[('222', '2'), ('333', '3'), ('444', '4'), ('1', '1'), ('2', '2'), ('1', '1'), ('4', '4'), ('8', '8'), ('44', '4')]
For finding consecutive characters in a string, this does essentially the same thing as groupby. You can then keep only the combined match from each tuple to get the desired result:
>>> [x for x, *_ in re.findall('((.)\\2*)', s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
One solution without regex (that is not specific to digits) would be to use itertools.groupby():
>>> from itertools import groupby
>>> s = '2223334441214844'
>>> [''.join(g) for _, g in groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
If you only need to extract consecutive identical digits, you may use a matching approach using r'(\d)\1*' regex:
import re
s='2223334441214844'
print([x.group() for x in re.finditer(r'(\d)\1*', s)])
# => ['222', '333', '444', '1', '2', '1', '4', '8', '44']
Here,
(\d) - matches and captures into Group 1 any digit
\1* - a backreference to Group 1 matching the same value, 0+ repetitions.
This solution can be customized to match any specific consecutive chars (instead of \d, you may use \S - non-whitespace, \w - word, [a-fA-F] - a specific set, etc.). If you replace \d with . and use re.DOTALL modifier, it will work as the itertools solutions posted above.
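
As a quick check of that last remark, the sketch below swaps \d for . and passes re.DOTALL, then compares the result with the groupby approach on a string that contains letters and newlines. The function names and the sample string are made up for this example.

```python
import re
from itertools import groupby

def runs_regex(text):
    # With re.DOTALL, '.' also matches newlines, so any character can start a run.
    return [m.group() for m in re.finditer(r'(.)\1*', text, flags=re.DOTALL)]

def runs_groupby(text):
    return [''.join(group) for _, group in groupby(text)]

sample = 'aa\n\nb22'
print(runs_regex(sample))                          # ['aa', '\n\n', 'b', '22']
print(runs_regex(sample) == runs_groupby(sample))  # True
```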
Use a capture group and backreference.
import re
s = '2223334441214844'
print([i[0] for i in re.findall(r'((\d)\2*)', s)])
\2 matches whatever the (\d) capture group matched. The list comprehension is needed because when the RE contains capture groups, findall returns a list of the capture groups, not the whole match. So we need an extra group to get the whole match, and then need to extract that group from the result.
What about a solution that does not import any module?
You can write your own logic in pure Python without importing anything. Here is a recursive approach:
string_1='2223334441214844'
list_2=[i for i in string_1]
def con(list_1):
    group = []
    if not list_1:
        return 0
    else:
        track=list_1[0]
        for j,i in enumerate(list_1):
            if i==track[0]:
                group.append(i)
            else:
                print(group)
                return con(list_1[j:])
        return group
print(con(list_2))
Output:
['2', '2', '2']
['3', '3', '3']
['4', '4', '4']
['1']
['2']
['1']
['4']
['8']
['4', '4']
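
If the goal is a list of strings rather than printed groups, a simple loop is enough and avoids recursion altogether. This is an alternative sketch, not the code above; the name split_runs is hypothetical.

```python
def split_runs(text):
    """Split `text` into runs of identical consecutive characters."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] += ch      # same character: extend the current run
        else:
            runs.append(ch)     # different character: start a new run
    return runs

print(split_runs('2223334441214844'))
# ['222', '333', '444', '1', '2', '1', '4', '8', '44']
```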

How to turn a list of integers into a string of numbers?

I'd like to turn
[1,2,3,4,5,4,3,2,1]
into
['1','2','3','4','5','4','3','2','1']
in preparation of doing FreqDist.
Use the built-in map function to convert each item in the list (in Python 3, map returns an iterator, so wrap it in list()):
newList = list(map(str, oldList))
Or just use a simple list comprehension:
>>> myar=[1,2,3,4,5,4,3,2,1]
>>> [str(i) for i in myar]
['1', '2', '3', '4', '5', '4', '3', '2', '1']
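
As a rough illustration of the frequency-counting step the question is preparing for, the sketch below converts the integers to strings and counts them with collections.Counter. Counter is used here only as a standard-library stand-in; it is an assumption that the FreqDist mentioned in the question is NLTK's, and NLTK itself is not used in this example.

```python
from collections import Counter

old_list = [1, 2, 3, 4, 5, 4, 3, 2, 1]

# Convert each integer to its string form.
new_list = [str(i) for i in old_list]

# Count how often each string occurs (stand-in for a frequency distribution).
counts = Counter(new_list)
print(counts['1'], counts['5'])
# 2 1
```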

Create strings with all possible combinations

I am using an OCR algorithm (tesseract based) which has difficulties with recognizing certain characters. I have partially solved that by creating my own "post-processing hash-table" which includes pairs of characters. For example, since the text is just numbers, I have figured out that if there is a Q character inside the text, it should be 9 instead.
However I have a more serious problem with 6 and 8 characters since both of them are recognized as B. Now since I know what I am looking for (when I am translating the image to text) and the strings are fairly short (6~8 digits), I thought to create strings with all possible combinations of 6 and 8 and compare each one of them to the one I am looking for.
So for example, I have the following string recognized by the OCR:
L0B7B0B5
So each B here can be 6 or 8.
Now I want to generate a list like the below:
L0878085
L0878065
L0876085
L0876065
.
.
So it's a kind of binary table with 3 digits, and in this case there are 8 options. But the number of B characters in the string can be other than 3 (it can be any number).
I have tried to use the Python itertools module with something like this:
list(itertools.product(*["86"] * 3))
Which will provide the following result:
[('8', '8', '8'), ('8', '8', '6'), ('8', '6', '8'), ('8', '6', '6'), ('6', '8', '8'), ('6', '8', '6'), ('6', '6', '8'), ('6', '6', '6')]
which I assume I can then later use to swap B characters. However, for some reason I can't make itertools work in my environment. I assume it has something to do with the fact that I am using Jython and not pure Python.
I will be happy to hear any other ideas as how to complete this task. Maybe there is a simpler solution I didn't think of?
itertools.product accepts a repeat keyword that you can use:
In [92]: from itertools import product
In [93]: word = "L0B7B0B5"
In [94]: subs = product("68", repeat=word.count("B"))
In [95]: list(subs)
Out[95]:
[('6', '6', '6'),
('6', '6', '8'),
('6', '8', '6'),
('6', '8', '8'),
('8', '6', '6'),
('8', '6', '8'),
('8', '8', '6'),
('8', '8', '8')]
Then one fairly concise method to make the substitutions is to do a reduction operation with the string replace method (on Python 3, reduce must first be imported from functools):
In [97]: subs = product("68", repeat=word.count("B"))
In [98]: [reduce(lambda s, c: s.replace('B', c, 1), sub, word) for sub in subs]
Out[98]:
['L0676065',
'L0676085',
'L0678065',
'L0678085',
'L0876065',
'L0876085',
'L0878065',
'L0878085']
Another method, using a couple more functions from itertools:
In [90]: from itertools import chain, izip_longest  # izip_longest is named zip_longest on Python 3
In [91]: subs = product("68", repeat=word.count("B"))
In [92]: [''.join(chain(*izip_longest(word.split('B'), sub, fillvalue=''))) for sub in subs]
Out[92]:
['L0676065',
'L0676085',
'L0678065',
'L0678085',
'L0876065',
'L0876085',
'L0878065',
'L0878085']
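
For Python 3, where reduce and izip_longest are not available under those names, the same idea can be sketched with product alone. The function name expand_ambiguous and its parameters are made up for this illustration; the defaults mirror the B/6/8 case from the question.

```python
from itertools import product

def expand_ambiguous(word, placeholder="B", choices="68"):
    """Return every string obtained by replacing each `placeholder`
    in `word` with one of the characters in `choices`.

    Hypothetical helper; names and defaults are assumptions.
    """
    results = []
    for combo in product(choices, repeat=word.count(placeholder)):
        replacements = iter(combo)
        # Consume one replacement per placeholder, left to right.
        results.append("".join(
            next(replacements) if ch == placeholder else ch for ch in word
        ))
    return results

print(expand_ambiguous("L0B7B0B5"))
# ['L0676065', 'L0676085', 'L0678065', 'L0678085', 'L0876065', 'L0876085', 'L0878065', 'L0878085']
```

The candidate list can then be compared against the string being searched for, for example with a membership test such as `"L0878065" in expand_ambiguous("L0B7B0B5")`.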
Here is a simple recursive function for generating your strings (it is pseudocode):
permut(char[] original, char[] buff, int i) {
    if (i < original.length) {
        if (original[i] == 'B') {
            buff[i] = '6'
            permut(original, buff, i+1)
            buff[i] = '8'
            permut(original, buff, i+1)
        }
        else if (original[i] == 'Q') {
            buff[i] = '9'
            permut(original, buff, i+1)
        }
        else {
            buff[i] = original[i]
            permut(original, buff, i+1)
        }
    }
    else {
        store buff[]
    }
}
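
The pseudocode above could be translated to Python roughly as follows. This is a sketch under assumptions: the function name expand and the substitutions mapping are invented here, with B mapping to 6 or 8 and Q mapping to 9 as described in the question.

```python
def expand(original, substitutions=None):
    """Recursively build every candidate string for an OCR result.

    `substitutions` maps an ambiguous character to its possible
    replacements; the default is an assumption based on the question.
    """
    if substitutions is None:
        substitutions = {"B": "68", "Q": "9"}
    results = []

    def walk(i, buff):
        if i == len(original):
            results.append("".join(buff))   # the "store buff" step
            return
        # Unambiguous characters map to themselves.
        for replacement in substitutions.get(original[i], original[i]):
            buff.append(replacement)
            walk(i + 1, buff)
            buff.pop()                      # backtrack before the next choice

    walk(0, [])
    return results

print(expand("Q0B"))
# ['906', '908']
```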
