How to optimize the search algorithm for large 2d-array?

After I added the line
pathResult.append(find_max_path(arr, a + 1, b + 1, path))
the code began to run slowly, but without that line it does not work correctly. How can I optimize the code? The function looks for the path with the maximum number of points in a two-dimensional array in which the values equal to 100 lie predominantly along the main diagonal. A row can contain several values equal to 100, but each column contains at most one. Full code:
arr = [
    [000,000,000,000,000,100,000],
    [000,000,000,000,000,000,000],
    [000,000,100,000,000,000,000],
    [000,100,000,000,000,000,000],
    [100,000,000,000,000,100,000],
    [000,000,000,000,100,000,000],
    [000,000,000,000,000,000,000],
    [000,000,000,000,000,000,000]]
def find_max_path(arr, a=0, b=0, path=None):
    if path is None:
        path = []
    while (a < len(arr)) and (b < len(arr[a])):
        if arr[a][b] == 100:
            path.append({"a": a, "b": b})
            b += 1
        else:
            try:
                if arr[a + 1][b + 1] == 100:
                    a += 1
                    b += 1
                    continue
            except IndexError:
                pass
            check = []
            for j in range(b + 1, len(arr[a])):
                if arr[a][j] == 100:
                    check.append({"a": a, "b": j})
                    break
            if not check:
                a += 1
                continue
            i = a + 1
            while i < len(arr):
                if arr[i][b] == 100:
                    check.append({"a": i, "b": b})
                    break
                i += 1
            pathResult = []
            for c in check:
                pathNew = path[:]
                pathNew.append({"a": c["a"], "b": c["b"]})
                pathResult.append(find_max_path(arr, c["a"] + 1, c["b"] + 1, pathNew))
            pathResult.append(find_max_path(arr, a + 1, b + 1, path))
            maximum = 0
            maxpath = []
            for p in pathResult:
                if len(p) > maximum:
                    maximum = len(p)
                    maxpath = p[:]
            if maxpath:
                return maxpath
        a += 1
    return path
print(find_max_path(arr))
UPDATE1: added two break statements in the inner loops (execution time is halved)
Output:
[{'a': 4, 'b': 0}, {'a': 5, 'b': 4}]
UPDATE2
Usage.
I use this algorithm to synchronize two streams of information. The rows are words from the text of the book, for which the position in the text is known (L_word). The columns are words recognized from the audiobook, for which the recognized word itself and the time it was spoken in the audio stream are known (R_word).
This gives two lists of words. To synchronize them, I use something like this:
from rapidfuzz import process, fuzz
import numpy as np
window = 50
# L_word = ... # words from text book
# R_word = ... # recognize words from audiobook
L = 0
R = 0
L_chunk = L_word[L:L+window]
R_chunk = R_word[R:R+window]
scores = process.cdist(L_chunk,
                       R_chunk,
                       scorer=fuzz.ratio,
                       dtype=np.uint8,
                       score_cutoff=100)
p = find_max_path(scores)
# ... path processing ...
...
As a result of all this work, we get something like this: a video book with pagination and subtitles synchronized with the audio (3 GB download).
UPDATE3: adding this code reduces the execution time by almost ten times!
try:
    if arr[a + 1][b + 1] == 100:
        a += 1
        b += 1
        continue
except IndexError:
    pass

The Python documentation shows how to do debugging and profiling. Go through the algorithm and time the functions to see where the bottleneck is.
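As a rough illustration of that advice, the sketch below times a function with timeit and then profiles one call with cProfile from the standard library. The function slow_search and the test grid are hypothetical stand-ins, not the code from the question; substitute the function you actually want to measure.

```python
import cProfile
import io
import pstats
import timeit


def slow_search(grid):
    """Hypothetical stand-in for the function under investigation."""
    hits = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == 100:
                hits.append((r, c))
    return hits


# Hypothetical test data: a 200x200 grid with 100 on the main diagonal.
grid = [[100 if r == c else 0 for c in range(200)] for r in range(200)]

# Wall-clock timing of repeated calls.
elapsed = timeit.timeit(lambda: slow_search(grid), number=10)
print(f"10 calls took {elapsed:.4f} s")

# Function-level profile of a single call.
profiler = cProfile.Profile()
profiler.enable()
result = slow_search(grid)
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

For a recursive function such as find_max_path, the call counts in the report are often more telling than the raw times, because they show how many times the recursion is entered.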

Related

Check the difference between string i and p

How would I go about checking the difference between the strings p and i, so that the second line can be made equal to the first line?
t=int(input())
print(t)
for i in range(t):
    print(i)
    i=input()
    p=input()
    print(i,p)
    print('Case #'+(str(i+1))+': ')
    if len(i)==0:
        #print(len(p))
    else:
        #print((len(p)-len(i)))
Help Barbara find out how many extra letters she needs to remove in order to obtain I or if I cannot be obtained from P by removing letters then output IMPOSSIBLE.
input:
2
aaaa
aaaaa
bbbbb
bbbbc
output:
Case #1: 1
Case #2: IMPOSSIBLE

You can use Levenshtein distance to calculate the difference and decide what is possible and impossible yourself.
You can find more resources on YouTube to understand the concept better. E.g. https://www.youtube.com/watch?v=We3YDTzNXEk
I have provided a version of the code for your convenience as well.
import numpy as np
def calculate_edit_distance(source, target):
    '''Calculate the edit distance from source to target
    [In] source="ab" target="bc"
    [Out] return 2
    '''
    num_row = len(target) + 1
    num_col = len(source) + 1
    distance_table = np.array([[0] * num_col for _ in range(num_row)])
    # getting from an empty source string to target[0...i] requires i insertions
    distance_table[:, 0] = [i for i in range(num_row)]
    # getting from source[0...i] to an empty target string requires i deletions
    distance_table[0] = [i for i in range(num_col)]
    # loop through all the characters and calculate their respective distances
    for i in range(num_row - 1):
        for j in range(num_col - 1):
            insert = distance_table[i + 1, j]
            delete = distance_table[i, j + 1]
            substitute = distance_table[i, j]
            # if target char and source char are the same,
            # just copy the diagonal value
            if target[i] == source[j]:
                distance_table[i + 1, j + 1] = substitute
            else:
                operations = [delete, insert, substitute]
                best_operation = np.argmin(operations)
                if best_operation == 2:  # +2 if the operation is to substitute
                    distance_table[i + 1, j + 1] = substitute + 2
                else:  # same formula for both delete and insert operation
                    distance_table[i + 1, j + 1] = operations[best_operation] + 1
    return distance_table[num_row - 1, num_col - 1]
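Because the task in the question only allows removing letters from P, it can also be read as a subsequence check. This is a different technique from the edit-distance table above, shown here only as a minimal sketch. The function name extra_letters and the hard-coded test cases are assumptions for illustration.

```python
def extra_letters(i_str, p_str):
    """Return how many letters must be removed from p_str to obtain i_str,
    or None if i_str cannot be obtained by removing letters only."""
    remaining = iter(p_str)
    # `ch in remaining` consumes the iterator up to and including the first
    # match, so each letter of i_str must be found after the previous one.
    if all(ch in remaining for ch in i_str):
        return len(p_str) - len(i_str)
    return None


# The two sample cases from the question, as (I, P) pairs.
cases = [("aaaa", "aaaaa"), ("bbbbb", "bbbbc")]
for number, (i_str, p_str) in enumerate(cases, start=1):
    answer = extra_letters(i_str, p_str)
    print(f"Case #{number}: {'IMPOSSIBLE' if answer is None else answer}")
```

This runs in time linear in the length of P and needs no table.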

How to group genes regarding their id and position , python

I have a file containing genes of different genomes. Gene is denoted by NZ_CP019047.1_2993 and Genome by NZ_CP019047
They look like this :
NZ_CP019047.1_2993
NZ_CP019047.1_2994
NZ_CP019047.1_2995
NZ_CP019047.1_2999
NZ_CP019047.1_3000
NZ_CP019047.1_3001
NZ_CP019047.1_3003
KE699235.1_379
KE699235.1_1000
KE699235.1_1001
What I want to do is group the genes of a genome (if the genome has more than one gene) by distance: genes that are fewer than 4 positions apart should be grouped together. The position is the number after '_'. I want something like this:
[NZ_CP019047.1_2993,NZ_CP019047.1_2994,NZ_CP019047.1_2995]
[NZ_CP019047.1_2999,NZ_CP019047.1_3000,NZ_CP019047.1_3001,NZ_CP019047.1_3003]
[KE699235.1_1000,KE699235.1_1001]
What I have tried so far is creating a dictionary that holds, for each genome (in my case NZ_CP019047 and KE699235), all the numbers after '_'. Then I calculate their differences, and if a difference is less than 4 I try to group the genes. The problem is that I get duplicates, and it breaks when one genome has more than one group of genes, as in this case:
[NZ_CP019047.1_2993,NZ_CP019047.1_2994,NZ_CP019047.1_2995]
[NZ_CP019047.1_2999,NZ_CP019047.1_3000,NZ_CP019047.1_3001,NZ_CP019047.1_3003]
This is my code:
for key in sortedDict1:
    cassette = ''
    differences = []
    numbers = sortedDict1[key]
    differences = [x - numbers[i - 1] for i, x in enumerate(numbers)][1:]
    print(differences)
    for i in range(0,len(differences)):
        if differences[i] <= 3:
            pos = i
            el1 = key + str(numbers[i])
            el2 = key + str(numbers[i+1])
            cas = el1 + ' '
            cassette += cas
            cas = el2 + ' '
            cassette += cas
        else:
            cassette + '/n'
        i+=1
I am referring to groups with variable cassette.
Can someone please help?

Please see below. You can modify the labels and distances to your requirements.
def get_genome_groups(genome_info):
    genome_info.sort(key = lambda x: (x.split('.')[0], int(x.split('_')[-1])))
    #print(genome_info)
    genome_groups = []
    close_genome_group = []
    last_genome = ''
    position = 0
    last_position = 0
    #'NZ_CP019047.1_2995',
    for genomes in genome_info:
        genome, position = genomes.split('.')
        position = int(position.split('_')[1])
        if last_genome and (genome != last_genome):
            genome_groups.append(close_genome_group)
            close_genome_group = []
        elif close_genome_group and position and (position > last_position+3):
            genome_groups.append(close_genome_group)
            close_genome_group = []
        if genomes:
            close_genome_group.append(genomes)
        last_position = position
        last_genome = genome
    if close_genome_group:
        genome_groups.append(close_genome_group)
    return genome_groups
if __name__ == '__main__':
    genome_group = get_genome_groups(genome_info)
    print(genome_group)
user#Inspiron:~/code/general$ python group_genes.py
[['KE699235.1_379'], ['KE699235.1_1000', 'KE699235.1_1001'], ['NZ_CP019047.1_2993', 'NZ_CP019047.1_2994', 'NZ_CP019047.1_2995'], ['NZ_CP019047.1_2999', 'NZ_CP019047.1_3000', 'NZ_CP019047.1_3001', 'NZ_CP019047.1_3003']]
user#Inspiron:~/code/general$

Input:
NZ_CP019047.1_2993
NZ_CP019047.1_2994
NZ_CP019047.1_2995
NZ_CP019047.1_2999
NZ_CP019047.1_3000
NZ_CP019047.1_3001
NZ_CP019047.1_3003
KE699235.1_379
KE699235.1_1000
KE699235.1_1001
KE6992351.2_379
KE6992352.2_1000
KE6992353.2_1001
Code:
from operator import itemgetter
with open("genes.dat", "r") as msg:
    data = msg.read().splitlines()
for i, gene in enumerate(data):
    gene_name = gene.split(".")[0]
    chr_pos = gene.split(".")[1]
    data[i] = (gene_name,int(chr_pos.split("_")[0]),int(chr_pos.split("_")[1]))
data = sorted(data, key=itemgetter(1,2))
output = []
j = 0
# start at index 0 so that the first gene is not skipped
for i in range(len(data)):
    if i == 0:
        output.append([data[i]])
    elif data[i][1] == output[j][0][1]:
        if data[i][2] - output[j][0][2] < 5:
            output[j].append(data[i])
        else:
            output.append([data[i]])
            j += 1
    else:
        output.append([data[i]])
        j += 1
print (output)
Output:
[[('KE699235', 1, 379)], [('KE699235', 1, 1000), ('KE699235', 1, 1001)], [('NZ_CP019047', 1, 2993), ('NZ_CP019047', 1, 2994), ('NZ_CP019047', 1, 2995)], [('NZ_CP019047', 1, 2999), ('NZ_CP019047', 1, 3000), ('NZ_CP019047', 1, 3001), ('NZ_CP019047', 1, 3003)], [('KE6992351', 2, 379)], [('KE6992352', 2, 1000), ('KE6992353', 2, 1001)]]
This makes groups in which every element lies fewer than 5 positions after the first element of its group.
It should also work on a list of mixed genes, because the chr location is taken into account.

How to debug my Python dice game?

So I recently posted my code for a simple dice program I'm having trouble with. It is supposed to randomly generate 5 numbers in an array, then check if there are any matching values, if there are, it adds to MATCH, so once it's done checking, MATCH+1 is how many 'of a kind' there are(match=1 means two of a kind, match=2 means three of a kind etc.)
It randomly generates and then displays the numbers correctly, and the program seems to check without errors, except when the last two playerDice elements match; then it throws an out-of-bounds error. Why is it doing that? Also, it never actually displays the last print line with how many of a kind there are, even when it runs error free. Why is that?
Here is the code:
import random
playerDice = [random.randint(1,6),random.randint(1,6),random.randint(1,6),random.randint(1,6),random.randint(1,6)]
compDice = [random.randint(1,6),random.randint(1,6),random.randint(1,6),random.randint(1,6),random.randint(1,6)]
match = 0
compmatch = 0
#print player dice
print("You rolled: ",end=" ")
a = 0
while a < len(playerDice):
print(str(playerDice[a]) + ", ",end=" ")
a = a + 1
#player check matches
i = 0
while i < len(playerDice):
j = i + 1
if playerDice[i] == playerDice[j]:
match = match + 1
while playerDice[i] != playerDice[j]:
j = j + 1
if playerDice[i] == playerDice[j]:
match = match + 1
i = i + 1
print("Player has: " + str(match + 1) + " of a kind.")

There's a much easier way to look for matches: sort the dice, and then look for runs of repeated dice. You could look for those runs manually, but the standard library has a function for that: itertools.groupby. Here's a short demo.
import random
from itertools import groupby
# Seed the randomizer while testing so that the results are repeatable.
random.seed(7)
def roll_dice(num):
    return [random.randint(1,6) for _ in range(num)]
def find_matches(dice):
    matches = []
    for k, g in groupby(sorted(dice)):
        matchlen = len(list(g))
        if matchlen > 1:
            matches.append('{} of a kind: {}'.format(matchlen, k))
    return matches
for i in range(1, 6):
    print('Round', i)
    player_dice = roll_dice(5)
    #comp_dice = roll_dice(5)
    print('You rolled: ', end='')
    print(*player_dice, sep=', ')
    matches = find_matches(player_dice)
    if not matches:
        print('No matches')
    else:
        for row in matches:
            print(row)
    print()
output
Round 1
You rolled: 3, 2, 4, 6, 1
No matches
Round 2
You rolled: 1, 5, 1, 3, 5
2 of a kind: 1
2 of a kind: 5
Round 3
You rolled: 1, 5, 2, 1, 1
3 of a kind: 1
Round 4
You rolled: 4, 4, 1, 2, 1
2 of a kind: 1
2 of a kind: 4
Round 5
You rolled: 5, 4, 1, 5, 1
2 of a kind: 1
2 of a kind: 5
Here's an alternative version of find_matches that doesn't use groupby. It's probably a good idea to run through this algorithm on paper to see exactly how it works.
def find_matches(dice):
    matches = []
    dice = sorted(dice)
    prev = dice[0]
    matchlen = 1
    # Add a "dummy" entry so we can detect a group at the end of the list
    for d in dice[1:] + [0]:
        # Compare this die to the previous one
        if d == prev:
            # We're inside a run of matching dice
            matchlen += 1
        else:
            # The previous run has ended, so see if it's
            # long enough to add to the matches list
            if matchlen > 1:
                matches.append('{} of a kind: {}'.format(matchlen, prev))
            # Reset the match length counter
            matchlen = 1
        # This die will be the previous die on the next loop iteration
        prev = d
    return matches
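If only the single best "of a kind" result is needed, as in the original question, collections.Counter is another option. This is a small sketch of a different technique from the two versions above; the helper name best_of_a_kind is hypothetical, and it assumes a non-empty list of dice.

```python
from collections import Counter


def best_of_a_kind(dice):
    """Return (face, count) for the most frequent face in a non-empty roll."""
    # most_common(1) returns a one-element list of (value, count) pairs.
    face, count = Counter(dice).most_common(1)[0]
    return face, count


face, count = best_of_a_kind([1, 5, 2, 1, 1])
print('{} of a kind: {}'.format(count, face))
```

A count of 1 means no two dice match.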

Count consecutive characters

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?
At first, I thought I could do something like:
word = '1000'
counter = 0
print range(len(word))
for i in range(len(word) - 1):
    while word[i] == word[i + 1]:
        counter += 1
        print counter * "0"
    else:
        counter = 1
        print counter * "1"
So that in this manner I could see the number of times each unique digit repeats. But this, of course, falls out of range when i reaches the last value.
In the example above, I would want Python to tell me that 1 repeats 1, and that 0 repeats 3 times. The code above fails, however, because of my while statement.
How could I do this with just built-in functions?

Consecutive counts:
You can use itertools.groupby:
s = "111000222334455555"
from itertools import groupby
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
After which, result looks like:
[("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]
And you could format with something like:
", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
Total counts:
Someone in the comments is concerned that you want a total count of numbers so "11100111" -> {"1":6, "0":2}. In that case you want to use a collections.Counter:
from collections import Counter
s = "11100111"
result = Counter(s)
# {"1":6, "0":2}
Your method:
As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This leads to an off-by-one error when i is pointing at the last index of s, so i+1 raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more pythonic to generate something to iterate over.
For a string that's not absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:
counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
The only problem is that the last run is never appended, so you'll have to special-case the end of the string. That can be fixed with itertools.zip_longest:
import itertools
counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use the itertools recipe pairwise.
def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)
counts = []
count = 1
for a, b in pairwise(s):
    ...
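Putting the pieces of this answer together, a complete version might look like the sketch below. The function names pairwise_padded and consecutive_counts are chosen here for illustration. It assumes the input never contains None itself, which holds for strings.

```python
from itertools import tee, zip_longest


def pairwise_padded(iterable):
    """Yield (item, next_item) pairs; the last item is paired with None."""
    a, b = tee(iterable)
    next(b, None)
    return zip_longest(a, b, fillvalue=None)


def consecutive_counts(s):
    """Return a list of (character, run_length) tuples for s."""
    counts = []
    count = 1
    for a, b in pairwise_padded(s):
        if a == b:
            count += 1
        else:
            # The run of `a` ends here (or the input is exhausted).
            counts.append((a, count))
            count = 1
    return counts


print(consecutive_counts("1000"))
```

The final pair is always (last_character, None), which never compares equal for a string, so the last run is appended without any special-casing.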

A solution "that way", with only basic statements:
word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
    for i in range(1,len(word)):
        if word[i-1]==word[i]:
            count+=1
        else :
            length += word[i-1]+" repeats "+str(count)+", "
            count=1
    length += ("and "+word[i]+" repeats "+str(count))
else:
    i=0
    length += ("and "+word[i]+" repeats "+str(count))
print (length)
Output :
'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'
#'1 repeats 1'

Totals (without sub-groupings)
#!/usr/bin/python3 -B
charseq = 'abbcccdddd'
distros = { c:1 for c in charseq }
for c in range(len(charseq)-1):
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1
print(distros)
I'll provide a brief explanation for the interesting lines.
distros = { c:1 for c in charseq }
The line above is a dictionary comprehension, and it basically iterates over the characters in charseq and creates a key/value pair for a dictionary where the key is the character and the value is the number of times it has been encountered so far.
Then comes the loop:
for c in range(len(charseq)-1):
We go from 0 to length - 1 to avoid going out of bounds with the c+1 indexing in the loop's body.
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1
At this point, every match we encounter we know is consecutive, so we simply add 1 to the character key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):
# replacing vars for their values
if charseq[1] == charseq[1+1]:
    distros[charseq[1]] += 1
# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
    distros['b'] += 1
You can see the program output below with the correct counts:
➜ /tmp ./counter.py
{'b': 2, 'a': 1, 'c': 3, 'd': 4}

You only need to change len(word) to len(word) - 1. That said, you could also use the fact that False's value is 0 and True's value is 1 with sum:
sum(word[i] == word[i+1] for i in range(len(word)-1))
This produces the sum of (False, True, True, False) where False is 0 and True is 1 - which is what you're after.
If you want this to be safe you need to guard empty words (index -1 access):
sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))
And this can be improved with zip:
sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))

If we want to count consecutive characters without looping, we can make use of pandas:
In [1]: import pandas as pd
In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))
In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]
The key is to find the first elements that are different from their previous values and then make proper groupings in pandas:
In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))
In [7]: d.ne(d.shift())
Out[7]:
0 True
1 True
2 False
3 True
dtype: bool
In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0 1
1 2
2 2
3 3
dtype: int32

This is my simple code for finding the maximum number of consecutive 1's in a binary string in Python 3:
count = 0
maxcount = 0
for i in str(bin(13)):
    if i == '1':
        count += 1
    elif count > maxcount:
        maxcount = count
        count = 0
    else:
        count = 0
if count > maxcount: maxcount = count
maxcount

There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indices.
w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # digits
['1', '0', '2', '3', '4', '5']
print(cw) # counts
[3, 3, 3, 2, 2, 5]
w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # characters
print(cw) # counts
['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]

A one liner that returns the amount of consecutive characters with no imports:
def f(x):s=x+" ";t=[x[1] for x in zip(s[0:],s[1:],s[2:]) if (x[1]==x[0])or(x[1]==x[2])];return {h: t.count(h) for h in set(t)}
That returns the amount of times any repeated character in a list is in a consecutive run of characters.
alternatively, this accomplishes the same thing, albeit much slower:
def A(m):t=[thing for x,thing in enumerate(m) if thing in [(m[x+1] if x+1<len(m) else None),(m[x-1] if x-1>0 else None)]];return {h: t.count(h) for h in set(t)}
In terms of performance, I ran them with
import copy
import timeit
import urllib.request
site = 'https://web.njit.edu/~cm395/theBeeMovieScript/'
s = urllib.request.urlopen(site).read(100_000)
s = str(copy.deepcopy(s))
print(timeit.timeit('A(s)',globals=locals(),number=100))
print(timeit.timeit('f(s)',globals=locals(),number=100))
which resulted in:
12.528256356999918
5.351301653001428
This method can definitely be improved, but without using any external libraries, this was the best I could come up with.

In python
your_string = "wwwwweaaaawwbbbbn"
current = ''
count = 0
for index, loop in enumerate(your_string):
    current = loop
    count = count + 1
    if index == len(your_string)-1:
        print(f"{count}{current}", end ='')
        break
    if your_string[index+1] != current:
        print(f"{count}{current}",end ='')
        count = 0
    continue
This will output
5w1e4a2w4b1n

#I wrote the code using simple loops and if statements
s='feeekksssh' #len(s) = 10
count=1 #f:1, e:3, k:2, s:3, h:1
l=[]
for i in range(1,len(s)): #range(1,10)
    if s[i-1]==s[i]:
        count = count+1
    else:
        l.append(count)
        count=1
    if i == len(s)-1: #To check the last character sequence we loop in reverse order
        reverse_count=1
        for i in range(-1,-(len(s)),-1): #Looping only for the last character
            if s[i] == s[i-1]:
                reverse_count = reverse_count+1
            else:
                l.append(reverse_count)
                break
print(l)

Today I had an interview and was asked the same question. I struggled with this original solution in mind:
s = 'abbcccda'
old = ''
cnt = 0
res = ''
for c in s:
    cnt += 1
    if old != c:
        res += f'{old}{cnt}'
        old = c
        cnt = 0 # default 0 or 1 neither work
print(res)
# 1a1b2c3d1
Sadly, this solution kept producing unexpected results in edge cases (can anyone fix the code? maybe I need to post another question), and I finally ran out of time in the interview.
After the interview I calmed down and soon arrived at what I think is a stable solution (though I like the groupby one best).
s = 'abbcccda'
olds = []
for c in s:
    if olds and c in olds[-1]:
        olds[-1].append(c)
    else:
        olds.append([c])
print(olds)
res = ''.join([f'{lst[0]}{len(lst)}' for lst in olds])
print(res)
# [['a'], ['b', 'b'], ['c', 'c', 'c'], ['d'], ['a']]
# a1b2c3d1a1

Here is my simple solution:
def count_chars(s):
    size = len(s)
    count = 1
    op = ''
    for i in range(1, size):
        if s[i] == s[i-1]:
            count += 1
        else:
            op += "{}{}".format(count, s[i-1])
            count = 1
    if size:
        op += "{}{}".format(count, s[size-1])
    return op

data_input = 'aabaaaabbaaaaax'
start = 0
end = 0
temp_dict = dict()
while start < len(data_input):
    if data_input[start] == data_input[end]:
        end = end + 1
    if end == len(data_input):
        value = data_input[start:end]
        temp_dict[value] = len(value)
        break
    if data_input[start] != data_input[end]:
        value = data_input[start:end]
        temp_dict[value] = len(value)
        start = end
print(temp_dict)

PROBLEM: we need to count consecutive characters and return the characters with their counts.
def countWithString(input_string: str) -> str:
    # Guard: input_string[-1] below would fail on an empty string
    if not input_string:
        return ''
    count = 1
    output = ''
    for i in range(1, len(input_string)):
        if input_string[i] == input_string[i-1]:
            count += 1
        else:
            output += f"{count}{input_string[i-1]}"
            count = 1
    # Append the last run (the else branch never runs for it, so it would otherwise be missing from the output string)
    output += f"{count}{input_string[-1]}"
    return output
countWithString('aaabbbaabbcc')
input: 'aaabbbaabbcc'
output: '3a3b2a2b2c'
Time Complexity: O(n)
Space Complexity: O(n) for the output string

temp_str = "aaaajjbbbeeeeewwjjj"
def consecutive_charcounter(input_str):
    counter = 0
    temp_list = []
    for i in range(len(input_str)):
        if i==0:
            counter+=1
        elif input_str[i]== input_str[i-1]:
            counter+=1
            if i == len(input_str)-1:
                temp_list.extend([input_str[i - 1], str(counter)])
        else:
            temp_list.extend([input_str[i-1],str(counter)])
            counter = 1
    print("".join(temp_list))
consecutive_charcounter(temp_str)

Python Dynamic Knapsack

Right now I am attempting to code the knapsack problem in Python 3.2. I am trying to do this dynamically with a matrix. The algorithm that I am trying to use is as follows
Implements the memory function method for the knapsack problem
Input: A nonnegative integer i indicating the number of the first
       items being considered and a nonnegative integer j indicating the knapsack's capacity
Output: The value of an optimal feasible subset of the first i items
Note: Uses as global variables input arrays Weights[1..n], Values[1..n]
      and table V[0..n, 0..W] whose entries are initialized with -1's except for
      row 0 and column 0 initialized with 0's
if V[i, j] < 0
    if j < Weights[i]
        value <-- MFKnapsack(i - 1, j)
    else
        value <-- max(MFKnapsack(i - 1, j),
                      Values[i] + MFKnapsack(i - 1, j - Weights[i]))
    V[i, j] <-- value
return V[i, j]
If you run my code below, you can see that it tries to insert the weight into the list. Since this uses recursion, I am having a hard time spotting the problem. I also get an error saying that an integer cannot be added to a list with '+'. I have initialized the matrix with all 0's in the first row and first column; everything else is initialized to -1. Any help will be much appreciated.
#Knapsack Problem
def knapsack(weight,value,capacity):
    weight.insert(0,0)
    value.insert(0,0)
    print("Weights: ",weight)
    print("Values: ",value)
    capacityJ = capacity+1
    ## ------ initialize matrix F ---- ##
    dimension = len(weight)+1
    F = [[-1]*capacityJ]*dimension
    #first column zeroed
    for i in range(dimension):
        F[i][0] = 0
    #first row zeroed
    F[0] = [0]*capacityJ
    #-------------------------------- ##
    d_index = dimension-2
    print(matrixFormat(F))
    return recKnap(F,weight,value,d_index,capacity)
def recKnap(matrix, weight,value,index, capacity):
    print("index:",index,"capacity:",capacity)
    if matrix[index][capacity] < 0:
        if capacity < weight[index]:
            value = recKnap(matrix,weight,value,index-1,capacity)
        else:
            value = max(recKnap(matrix,weight,value,index-1,capacity),
                        value[index] +
                        recKnap(matrix,weight,value,index-1,capacity-(weight[index])))
        matrix[index][capacity] = value
        print("matrix:",matrix)
    return matrix[index][capacity]
def matrixFormat(*doubleLst):
    matrix = str(list(doubleLst)[0])
    length = len(matrix)-1
    temp = '|'
    currChar = ''
    nextChar = ''
    i = 0
    while i < length:
        if matrix[i] == ']':
            temp = temp + '|\n|'
        #double digit
        elif matrix[i].isdigit() and matrix[i+1].isdigit():
            temp = temp + (matrix[i]+matrix[i+1]).center(4)
            i = i+2
            continue
        #negative double digit
        elif matrix[i] == '-' and matrix[i+1].isdigit() and matrix[i+2].isdigit():
            temp = temp + (matrix[i]+matrix[i+1]+matrix[i+2]).center(4)
            i = i + 2
            continue
        #negative single digit
        elif matrix[i] == '-' and matrix[i+1].isdigit():
            temp = temp + (matrix[i]+matrix[i+1]).center(4)
            i = i + 2
            continue
        elif matrix[i].isdigit():
            temp = temp + matrix[i].center(4)
        #updates next round
        currChar = matrix[i]
        nextChar = matrix[i+1]
        i = i + 1
    return temp[:-1]
def main():
    print("Knapsack Program")
    #num = input("Enter the weights you have for objects you would like to have:")
    #weightlst = []
    #valuelst = []
    ## for i in range(int(num)):
    ##     value , weight = eval(input("What is the " + str(i) + " object value, weight you wish to put in the knapsack? ex. 2,3: "))
    ##     weightlst.append(weight)
    ##     valuelst.append(value)
    weightLst = [2,1,3,2]
    valueLst = [12,10,20,15]
    capacity = 5
    value = knapsack(weightLst,valueLst,5)
    print("\n Max Matrix")
    print(matrixFormat(value))
main()

F = [[-1]*capacityJ]*dimension
does not properly initialize the matrix. [-1]*capacityJ is fine, but [...]*dimension creates dimension references to the exact same list. So modifying one list modifies them all.
Try instead
F = [[-1]*capacityJ for _ in range(dimension)]
This is a common Python pitfall. See this post for more explanation.
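The difference is easy to see on a small table. The sketch below uses made-up dimensions (3 rows, 4 columns) purely for illustration:

```python
rows, cols = 3, 4

# Outer multiplication repeats a reference to ONE inner list.
aliased = [[-1] * cols] * rows
aliased[0][0] = 0
print(aliased)       # every row shows the change

# The comprehension builds a new inner list for each row.
independent = [[-1] * cols for _ in range(rows)]
independent[0][0] = 0
print(independent)   # only the first row shows the change

# Identity check: all aliased rows are the same object.
print(aliased[0] is aliased[1], independent[0] is independent[1])
```

This is why zeroing the first column in the question's loop appears to work row by row but is really writing to the same list each time.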

For the purpose of illustrating a cache, I generally use a default dict as follows:
from collections import defaultdict
CS = defaultdict(lambda: defaultdict(int)) # if I want the default values to be 0
### or
CACHE_1 = defaultdict(lambda: defaultdict(lambda: int(-1))) # if I want the default values to be -1 (or something else)
This keeps me from having to build 2D arrays in Python on the fly...
To see an answer to z1knapsack using this approach:
http://ideone.com/fUKZmq
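Since the linked code is not reproduced here, the following is only a sketch of how the memory-function knapsack from the question could be written with a nested defaultdict whose default value is -1. It is not the code behind the link. The function name mf_knapsack and the 0-based input lists are assumptions made for this example.

```python
from collections import defaultdict


def mf_knapsack(weights, values, capacity):
    """Memoized 0/1 knapsack; weights and values are 0-based lists."""
    # cache[i][j] is -1 until the subproblem (first i items, capacity j) is solved.
    cache = defaultdict(lambda: defaultdict(lambda: -1))

    def solve(i, j):
        # No items or no capacity left: nothing can be packed.
        if i == 0 or j == 0:
            return 0
        if cache[i][j] < 0:
            w, v = weights[i - 1], values[i - 1]
            if j < w:
                best = solve(i - 1, j)  # item i does not fit
            else:
                best = max(solve(i - 1, j),          # skip item i
                           v + solve(i - 1, j - w))  # take item i
            cache[i][j] = best
        return cache[i][j]

    return solve(len(weights), capacity)


print(mf_knapsack([2, 1, 3, 2], [12, 10, 20, 15], 5))
```

Keeping the running result in a local name (best) rather than reusing value avoids shadowing the list of values, which is a second problem in the question's recKnap.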

def zeroes(n,m):
    v=[['-' for i in range(0,n)]for j in range(0,m)]
    return v
value=[0,12,10,20,15]
w=[0,2,1,3,2]
v=zeroes(6,5)
def knap(i,j):
    global v
    if i==0 or j==0:
        v[i][j]= 0
    elif j<w[i] :
        v[i][j]=knap(i-1,j)
    else:
        v[i][j]=max(knap(i-1,j),value[i]+knap(i-1,j-w[i]))
    return v[i][j]
x=knap(4,5)
print (x)
for i in range (0,len(v)):
    for j in range(0,len(v[0])):
        print(v[i][j],end="\t\t")
    print()
print()
#now these calls are for filling all the boxes in the matrix as in the above call only few v[i][j]were called and returned
knap(4,1)
knap(4,2)
knap(4,3)
knap(4,4)
for i in range (0,len(v)):
    for j in range(0,len(v[0])):
        print(v[i][j],end="\t\t")
    print()
print()
