Create strings with all possible combinations - python

I am using an OCR algorithm (Tesseract-based) which has difficulty recognizing certain characters. I have partially solved that by creating my own "post-processing hash table" which maps pairs of characters. For example, since the text is just numbers, I have figured out that if there is a Q character in the text, it should be 9 instead.
However, I have a more serious problem with the 6 and 8 characters, since both of them are recognized as B. Since I know what I am looking for (when I am translating the image to text) and the strings are fairly short (6 to 8 digits), I thought I could create strings with all possible combinations of 6 and 8 and compare each one of them to the one I am looking for.
So for example, I have the following string recognized by the OCR:
L0B7B0B5
So each B here can be 6 or 8.
Now I want to generate a list like the below:
L0878085
L0878065
L0876085
L0876065
...
So it's a kind of binary table with 3 digits, and in this case there are 8 options. But the number of B characters in the string can be other than 3 (it can be any number).
I have tried to use the Python itertools module with something like this:
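The "binary table" idea can be taken literally: with n B characters there are 2**n rows, so counting from 0 to 2**n - 1 and reading each bit as a choice between 6 and 8 enumerates every candidate. A minimal sketch that does not use itertools at all (the function name expand_b and its parameters are hypothetical, chosen for illustration):

```python
def expand_b(word, ambiguous="B", choices="68"):
    """Return every string obtained by replacing each `ambiguous`
    character with one of the two characters in `choices`."""
    positions = [i for i, ch in enumerate(word) if ch == ambiguous]
    n = len(positions)
    results = []
    # Each integer 0 .. 2**n - 1 is one row of the binary table.
    for mask in range(2 ** n):
        chars = list(word)
        for k, pos in enumerate(positions):
            # Read bits from the most significant end so that the first B
            # varies slowest; bit 0 picks choices[0], bit 1 picks choices[1].
            bit = (mask >> (n - 1 - k)) & 1
            chars[pos] = choices[bit]
        results.append("".join(chars))
    return results

for candidate in expand_b("L0B7B0B5"):
    print(candidate)
```

Because it only uses built-ins, it may be a workaround if itertools itself is what fails in the Jython environment. A string with no B yields a single-element list containing the string unchanged.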
list(itertools.product(*["86"] * 3))
Which will provide the following result:
[('8', '8', '8'), ('8', '8', '6'), ('8', '6', '8'), ('8', '6', '6'), ('6', '8', '8'), ('6', '8', '6'), ('6', '6', '8'), ('6', '6', '6')]
which I assume I could then use to replace the B characters. However, for some reason I can't make itertools work in my environment. I assume it has something to do with the fact that I am using Jython and not pure Python.
I will be happy to hear any other ideas on how to complete this task. Maybe there is a simpler solution I didn't think of?

itertools.product accepts a repeat keyword that you can use:
In [92]: from itertools import product
In [93]: word = "L0B7B0B5"
In [94]: subs = product("68", repeat=word.count("B"))
In [95]: list(subs)
Out[95]:
[('6', '6', '6'),
('6', '6', '8'),
('6', '8', '6'),
('6', '8', '8'),
('8', '6', '6'),
('8', '6', '8'),
('8', '8', '6'),
('8', '8', '8')]
Then one fairly concise method to make the substitutions is to do a reduction operation with the string replace method:
In [97]: subs = product("68", repeat=word.count("B"))
In [98]: [reduce(lambda s, c: s.replace('B', c, 1), sub, word) for sub in subs]
Out[98]:
['L0676065',
'L0676085',
'L0678065',
'L0678085',
'L0876065',
'L0876085',
'L0878065',
'L0878085']
Another method, using a couple more functions from itertools:
In [90]: from itertools import chain, izip_longest
In [91]: subs = product("68", repeat=word.count("B"))
In [92]: [''.join(chain(*izip_longest(word.split('B'), sub, fillvalue=''))) for sub in subs]
Out[92]:
['L0676065',
'L0676085',
'L0678065',
'L0678085',
'L0876065',
'L0876085',
'L0878065',
'L0878085']
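The session above is Python 2: reduce is a builtin there and izip_longest exists in itertools. In Python 3, reduce lives in functools and izip_longest was renamed itertools.zip_longest. A sketch of both approaches under Python 3 (the wrapper function names are illustrative only):

```python
from functools import reduce
from itertools import chain, product, zip_longest

word = "L0B7B0B5"

def via_reduce(word):
    # Replace the leftmost remaining 'B' with each chosen digit in turn.
    subs = product("68", repeat=word.count("B"))
    return [reduce(lambda s, c: s.replace("B", c, 1), sub, word) for sub in subs]

def via_zip(word):
    # Interleave the pieces between the B's with the chosen digits;
    # fillvalue='' pads the digit tuple, which is one item shorter.
    pieces = word.split("B")
    subs = product("68", repeat=word.count("B"))
    return ["".join(chain(*zip_longest(pieces, sub, fillvalue=""))) for sub in subs]

print(via_reduce(word))
print(via_zip(word))
```

Both functions return the same eight strings, in the same order as the session output above.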

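Here is a simple recursive function for generating your strings:
def permut(original, buff, i, results):
    if i < len(original):
        if original[i] == 'B':
            # B may be a 6 or an 8: try both branches
            buff[i] = '6'
            permut(original, buff, i + 1, results)
            buff[i] = '8'
            permut(original, buff, i + 1, results)
        elif original[i] == 'Q':
            # Q is always a misread 9
            buff[i] = '9'
            permut(original, buff, i + 1, results)
        else:
            # any other character is copied unchanged
            buff[i] = original[i]
            permut(original, buff, i + 1, results)
    else:
        # end of the string: store the finished candidate
        results.append(''.join(buff))

results = []
word = "L0B7B0B5"
permut(word, [''] * len(word), 0, results)
print(results)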
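The recursive approach handles B and Q with separate branches. The same idea can be written as a lookup table that maps each OCR character to the digits it might stand for, with the per-position options fed to itertools.product; this also covers the "compare each one to the string I am looking for" step from the question. A sketch: the table holds only the B and Q rules from the question, and CANDIDATES, candidates and matches are hypothetical names:

```python
from itertools import product

# Hypothetical post-processing table: OCR character -> possible true digits.
# Characters not in the table are kept as-is.
CANDIDATES = {"B": "68", "Q": "9"}

def candidates(ocr_text):
    # One option string per position, e.g. "L", "0", "68", "7", ...
    options = [CANDIDATES.get(ch, ch) for ch in ocr_text]
    # product(*options) walks every combination of per-position choices.
    return ["".join(combo) for combo in product(*options)]

def matches(ocr_text, expected):
    # True if the expected string is one of the possible readings.
    return expected in candidates(ocr_text)

print(candidates("L0B7Q0B5"))
print(matches("L0B7B0B5", "L0876085"))
```

Adding another ambiguous character only means adding one entry to the table; the generation code does not change.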

Related

Attributing values to specific numbers through a file

My question is whether there's any way to associate the numbers in the first column with the ones in the second column, so that I can read the numbers in the second column but keep them connected to the ones in the first column. That way I can sort them as I do in the sorted_resistances list, and after sorting, replace them with the values from the first column that were assigned to each of them.
For context, the code reads the list from a file, which is why it's written like that:
1 30000
2 30511
3 30052
4 30033
5 30077
6 30055
7 30086
8 30044
9 30088
10 30019
11 30310
12 30121
13 30132
with open("file.txt") as file_in:
    list_of_resistances = []
    for line in file_in:
        list_of_resistances.append(int(line.split()[1]))
sorted_resistances = sorted(list_of_resistances)
If you want to keep the correlation between the values in the two columns, you can keep all of the values from each line in a tuple (or list), and then sort the list of tuples on one element by passing a lambda function to the key parameter of sorted() that tells it to use the second element of each tuple as the sort value.
In this example, I used pprint.pprint to make the output of the lists easier to read.
from pprint import pprint
with open("file.txt") as file_in:
    list_of_resistances = []
    for line in file_in:
        list_of_resistances.append(tuple(line.strip().split(' ')))
print("Unsorted values:")
pprint(list_of_resistances)
sorted_resistances = sorted(list_of_resistances, key=lambda x: x[1])
print("\nSorted values:")
pprint(sorted_resistances)
print("\nSorted keys from column 1:")
pprint([x[0] for x in sorted_resistances])
Output:
Unsorted values:
[('1', '30000'),
('2', '30511'),
('3', '30052'),
('4', '30033'),
('5', '30077'),
('6', '30055'),
('7', '30086'),
('8', '30044'),
('9', '30088'),
('10', '30019'),
('11', '30310'),
('12', '30121'),
('13', '30132')]
Sorted values:
[('1', '30000'),
('10', '30019'),
('4', '30033'),
('8', '30044'),
('3', '30052'),
('6', '30055'),
('5', '30077'),
('7', '30086'),
('9', '30088'),
('12', '30121'),
('13', '30132'),
('11', '30310'),
('2', '30511')]
Sorted keys from column 1:
['1', '10', '4', '8', '3', '6', '5', '7', '9', '12', '13', '11', '2']
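Note that key=lambda x: x[1] compares the resistances as strings. That gives the right order here only because every value has five digits; with mixed lengths, '100' would sort before '99'. A sketch that converts both columns to int and uses split() with no argument so that runs of whitespace are tolerated. It takes any iterable of lines; sort_by_resistance is a hypothetical name, and the last sample row (14, 9999) is made up purely to show a shorter value sorting correctly:

```python
from pprint import pprint

def sort_by_resistance(lines):
    rows = []
    for line in lines:
        parts = line.split()          # any amount of whitespace separates columns
        if len(parts) < 2:
            continue                  # skip blank or malformed lines
        rows.append((int(parts[0]), int(parts[1])))
    # Sort numerically on the second column; the first column travels along.
    return sorted(rows, key=lambda row: row[1])

sample = ["1 30000", "2 30511", "3 30052", "10 30019", "14 9999"]
result = sort_by_resistance(sample)
pprint(result)
print([key for key, _ in result])
```

With the real file, the open file object can be passed directly: with open("file.txt") as file_in: result = sort_by_resistance(file_in).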

Alternate way to find all lexicographic orderings of a string in Python

Problem: Find all the different ways to arrange a string
E.g. 123 can be arranged as 123, 132, 213, 231, 312, 321.
So I honestly have no idea how to go about engineering the solution to this problem, as I have not done any official Data Structures and Algorithms courses, but I came up with a more mathematical solution that I was able to turn into code:
if __name__ == '__main__':
    string = 'ABCD'
    count = 0
    for first in string:
        if first != string[0]:
            print()
        for second in string:
            if second == first:
                continue
            for third in string:
                if third in [second, first]:
                    continue
                for fourth in string:
                    if fourth in [third, second, first]:
                        continue
                    count += 1
                    print(str(first)+str(second)+str(third)+str(fourth), end=', ')
    print('\n{} possible combinations'.format(count))
But I have to manually add or remove for-loops depending on the size of the string. What methodology should I use to approach this problem?
You want to find all the permutations of the string. Python's standard library includes a function for this.
>>> from itertools import permutations
>>> list(permutations('123'))
[('1', '2', '3'), ('1', '3', '2'), ('2', '1', '3'), ('2', '3', '1'), ('3', '1', '2'), ('3', '2', '1')]
There is actually a very simple solution using only the itertools library:
from itertools import permutations
answer = [''.join(perm) for perm in permutations(s)]
That gives you a list of all the different permutations of s. For example, with:
s = 'abc'
the answer would be:
answer = ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
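As for the methodology question: each nested loop in the question fixes one position, so the number of loops has to match the string length. Recursion removes that limit: pick each character in turn for the first position, then arrange the remaining characters the same way. A sketch of that idea (permute is a hypothetical helper name; itertools.permutations remains the practical choice):

```python
def permute(s):
    # Base case: a string of length 0 or 1 has exactly one arrangement.
    if len(s) <= 1:
        return [s]
    result = []
    for i, first in enumerate(s):
        # Fix s[i] in front and recursively arrange everything else.
        rest = s[:i] + s[i + 1:]
        for tail in permute(rest):
            result.append(first + tail)
    return result

print(permute("123"))
print(len(permute("ABCD")))
```

When the input characters are already in sorted order, the results come out in lexicographic order. Like itertools.permutations, it treats repeated characters as distinct, so duplicates appear in the output.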

Sorting a dictionary based on values

I searched for how to sort a Python dictionary by value and found various answers on the internet. I tried a few of them and finally used the sorted function.
I have simplified the example to make it clear.
I have a dictionary, say:
temp_dict = {'1': '40', '0': '109', '3': '37', '2': '42', '5': '26', '4': '45', '7': '109', '6': '42'}
Now, to sort it by value, I did the following (using the operator module):
import operator
sorted_temp_dict = sorted(temp_dict.items(), key=operator.itemgetter(1))
The result I'm getting is (a list of tuples, which is fine for me):
[('0', '109'), ('7', '109'), ('5', '26'), ('3', '37'), ('1', '40'), ('2', '42'), ('6', '42'), ('4', '45')]
The issue is that, as you can see, the first two elements of the list are not sorted. The rest of the elements are sorted perfectly by value.
I'm not able to find the mistake here. Any help would be great. Thanks.
Those are sorted. They are strings, and are sorted lexicographically: '1' is before '2', etc.
If you want to sort by numeric value, you'll need to convert to ints in the key function. For example:
sorted(temp_dict.items(), key=lambda x: int(x[1]))
They are sorted; the issue is that the elements are strings, hence:
'109' < '26'  # this is True, as they are strings
Try converting them to int in the key argument; you can use a lambda such as:
>>> sorted_temp_dict = sorted(temp_dict.items(), key=lambda x: int(x[1]))
>>> sorted_temp_dict
[('5', '26'), ('3', '37'), ('1', '40'), ('6', '42'), ('2', '42'), ('4', '45'), ('7', '109'), ('0', '109')]
The problem is trying to sort with values that are str, and not int. If you first convert the values into int and then sort, it will work.
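Following that last suggestion, the values can be converted to int once, up front, after which the original operator.itemgetter(1) key sorts numerically. A sketch using the dictionary from the question:

```python
import operator

temp_dict = {'1': '40', '0': '109', '3': '37', '2': '42',
             '5': '26', '4': '45', '7': '109', '6': '42'}

# Convert every value from str to int once, before sorting.
int_dict = {key: int(value) for key, value in temp_dict.items()}

# itemgetter(1) now compares integers, so 26 sorts before 109.
sorted_pairs = sorted(int_dict.items(), key=operator.itemgetter(1))
print(sorted_pairs)
```

The relative order of pairs with equal values (the two 42s and the two 109s) depends on the dictionary's iteration order, because sorted() is stable.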

Python permutations

I am trying to generate pandigital numbers using the itertools.permutations function, but whenever I do, it generates each one as a tuple of separate digits, which is not what I want.
For example:
for x in itertools.permutations("1234"):
    print(x)
will produce:
('1', '2', '3', '4')
('1', '2', '4', '3')
('1', '3', '2', '4')
('1', '3', '4', '2')
('1', '4', '2', '3')
('1', '4', '3', '2'), etc.
whereas I want it to return 1234, 1243, 1324, 1342, 1423, 1432, etc. How would I go about doing this in an optimal fashion?
A list comprehension with the built-in str.join() function is what you need:
import itertools
a = [''.join(i) for i in itertools.permutations("1234") ]
print(a)
Output:
['1234', '1243', '1324', '1342', '1423', '1432', '2134', '2143', '2314', '2341', '2413', '2431', '3124', '3142', '3214', '3241', '3412', '3421', '4123', '4132', '4213', '4231', '4312', '4321']
itertools.permutations takes an iterable and returns an iterator yielding tuples.
Use str.join(), which returns a string that is the concatenation of the strings in the iterable.
See the documentation for str.join() and itertools.permutations.
Use this:
import itertools
for x in itertools.permutations("1234"):
    print(''.join(x))
Output:
1234
1243
1324
1342
1423
1432
2134
2143
2314
2341
....
Note that itertools.permutations returns tuples.
See the join function:
In [1]: ''.join(('1','2','3'))
Out[1]: '123'
Try this:
import itertools
for x in itertools.permutations("1234"):
    print(''.join(x))
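Since the goal is pandigital numbers rather than strings, each joined string can also be passed to int(). A sketch:

```python
import itertools

# Join each tuple of digit characters, then convert the result to an int.
numbers = [int(''.join(p)) for p in itertools.permutations("1234")]
print(numbers[:6])
print(len(numbers))
```

One caveat: if the digit set includes 0, permutations that start with 0 lose their leading zero after int(), so "0123" becomes 123.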

Why am I not getting the result of sorted function in expected order?

print(activities)
activities = sorted(activities, key=lambda item: item[1])
print(activities)
Activities in this case is a list of tuples like (start_number, finish_number). I expected the output of the above code to be the list sorted in increasing order of finish_number. When I tried the above code in the shell I got the following output. I am not sure why the second list is not sorted in increasing order of finish_number. Please help me understand this.
[('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9'), ('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16')]
[('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16'), ('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9')]
You are sorting strings instead of integers: in that case, "10" is "smaller" than "4". To sort on integers, change it to this:
activities = sorted(activities, key=lambda item: int(item[1]))
print(activities)
Results in:
[('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9'), ('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16')]
Your items are being compared as strings, not as numbers. Thus, since the 1 character comes before 4 lexicographically, it makes sense that 10 comes before 4.
You need to cast the value to an int first:
activities = sorted(activities,key = lambda item:int(item[1]))
You are sorting strings, not numbers. Strings get sorted character by character.
So, for example, '40' is greater than '100' because the character '4' is larger than '1'.
You can fix this on the fly by simply casting the item as an integer.
activities = sorted(activities,key = lambda item: int(item[1]))
It's because you're not storing the number as a number, but as a string. The string '10' comes before the string '2'. Try:
activities = sorted(activities, key=lambda i: int(i[1]))
Look for a BROADER solution to your problem: convert your data from str to int immediately on input, work with it as int (otherwise you'll continually be bumping into little problems like this), and format your data as str for output.
This principle applies generally, e.g. when working with non-ASCII string data, do UTF-8 -> unicode -> UTF-8; don't try to manipulate undecoded text.
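A sketch of that convert-on-input advice, assuming the pairs arrive as strings as in the question (the raw list below is the string-sorted output shown there): convert each tuple to ints once, after which the sort key needs no conversion.

```python
raw = [('6', '10'), ('8', '11'), ('8', '12'), ('2', '14'), ('12', '16'),
       ('1', '4'), ('3', '5'), ('0', '6'), ('5', '7'), ('3', '9'), ('5', '9')]

# Convert on input: from here on, work only with ints.
activities = [(int(start), int(finish)) for start, finish in raw]

# Sort by finish time; integers compare numerically, so 4 < 10.
activities = sorted(activities, key=lambda item: item[1])
print(activities)

# Convert back to str only when formatting output.
print(", ".join("{}-{}".format(start, finish) for start, finish in activities))
```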
