Check the difference between string i and p - python

How would I go about checking to see what the difference between string p and i? So the 2nd line can equal the first line.
t=int(input())
print(t)
for i in range(t):
print(i)
i=input()
p=input()
print(i,p)
print('Case #'+(str(i+1))+': ')
if len(i)==0:
#print(len(p))
else:
#print((len(p)-len(i)))
Help Barbara find out how many extra letters she needs to remove in order to obtain I or if I cannot be obtained from P by removing letters then output IMPOSSIBLE.
input:
2
aaaa
aaaaa
bbbbb
bbbbc
output:
Case #1: 1
Case #2: IMPOSSIBLE

You can use Levenshtein distance to calculate the difference and decide what is possible and impossible yourself.
You can find more resources on YouTube to understand the concept better. E.g. https://www.youtube.com/watch?v=We3YDTzNXEk
I have provided a version of code for your convenient as well.
import numpy as np
def calculate_edit_distance(source, target):
'''Calculate the edit distance from source to target
[In] source="ab" target="bc"
[Out] return 2
'''
num_row = len(target) + 1
num_col = len(source) + 1
distance_table = np.array([[0] * num_col for _ in range(num_row)])
# getting from X[0...i] to empty target string requires i deletions
distance_table[:, 0] = [i for i in range(num_row)]
# getting from Y[0...i] to empty source string requires i deletions
distance_table[0] = [i for i in range(num_col)]
# loop through all the characters and calculate their respective distances
for i in range(num_row - 1):
for j in range(num_col - 1):
insert = distance_table[i + 1, j]
delete = distance_table[i, j + 1]
substitute = distance_table[i, j]
# if target char and source char are the same,
# just copy the diagonal value
if target[i] == source[j]:
distance_table[i + 1, j + 1] = substitute
else:
operations = [delete, insert, substitute]
best_operation = np.argmin(operations)
if best_operation == 2: # +2 if the operation is to substitute
distance_table[i + 1, j + 1] = substitute + 2
else: # same formula for both delete and insert operation
distance_table[i + 1, j + 1] = operations[best_operation] + 1
return distance_table[num_row - 1, num_col - 1]

Related

Hex string compression program

I Create a program to compress 'mac number' in Python.
I am trying here to shrink consecutive zeros with under the rules of that.
But I encountered some problem with the last index of the 'string' as last field
If Last field has '0000' ; It Couldn't get the Answer Which I Supposed.
It give the Answer the last field two times with double colon(::) in between.
Please help me to solve this Problem
This is my coding part:
hex_num = 'fe80:0d5e:0d2f:f180:f542:0d5e:00f0:f45d:3d5e:0000'
first_list = []
second_list = []
dec_value = ([(int(i,16)) for i in hex_num.split(":")])
print('list :',dec_value)
for i in range(len(dec_value) - 1):
if dec_value[i] == 0:
if dec_value[i] + dec_value[i + 1] == 0:
## print(dec_value[i], dec_value[i + 1])
pass
if dec_value[i + 1] != 0:
p = dec_value.index(dec_value[i + 1])
break
else:
## print(dec_value[i], dec_value[i + 1])
p = dec_value.index(dec_value[i])
first_list.append(dec_value[i])
for h in range(p, len(dec_value) - 1):
second_list.append(dec_value[h])
non_leading_zeros1 = ":".join([str(hex(h).replace("0x",'')) for h in first_list])
non_leading_zeros2 = ":".join([str(hex(h).replace("0x",'')) for h in second_list])
print(non_leading_zeros1 +'::'+ non_leading_zeros2)
OutPut :fe80:d5e:d2f:f180:f542:d5e:f0:f45d:3d5e::3d5e
it is wrong according to the rules of mac number or ipv6 number compression
expected value:fe80:d5e:d2f:f180:f542:d5e:f0:f45d:3d5e::

Making permanent change in a dataframe using python pandas

I would like to convert y dataframe from one format (X:XX:XX:XX) of values to another (X.X seconds)
Here is my dataframe looks like:
Start End
0 0:00:00:00
1 0:00:00:00 0:07:37:80
2 0:08:08:56 0:08:10:08
3 0:08:13:40
4 0:08:14:00 0:08:14:84
And I would like to transform it in seconds, something like that
Start End
0 0.0
1 0.0 457.80
2 488.56 490.80
3 493.40
4 494.0 494.84
To do that I did:
i = 0
j = 0
while j < 10:
while i < 10:
if data.iloc[i, j] != "":
Value = (int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) *60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8: 10])/100)
NewValue = data.iloc[:, j].replace([data.iloc[i, j]], Value)
i += 1
else:
NewValue = data.iloc[:, j].replace([data.iloc[i, j]], "")
i += 1
data.update(NewValue)
i = 0
j += 1
But I failed to replace the new values in my oldest dataframe in a permament way, when I do:
print(data)
I still get my old data frame in the wrong format.
Some one could hep me? I tried so hard!
Thank you so so much!
You are using pandas.DataFrame.update that requires a pandas dataframe as an argument. See the Example part of the update function documentation to really understand what update does https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html
If I may suggest a more idiomatic solution; you can directly map a function to all values of a pandas Series
def parse_timestring(s):
if s == "":
return s
else:
# weird to use centiseconds and not milliseconds
# l is a list with [hour, minute, second, cs]
l = [int(nbr) for nbr in s.split(":")]
return sum([a*b for a,b in zip(l, (3600, 60, 1, 0.01))])
df["Start"] = df["Start"].map(parse_timestring)
You can remove the if ... else ... from parse_timestring if you replace all empty string with nan values in your dataframe with df = df.replace("", numpy.nan) then use df["Start"] = df["Start"].map(parse_timestring, na_action='ignore')
see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html
The datetimelibrary is made to deal with such data. You should also use the apply function of pandas to avoid iterating on the dataframe like that.
You should proceed as follow :
from datetime import datetime, timedelta
def to_seconds(date):
comp = date.split(':')
delta = (datetime.strptime(':'.join(comp[1:]),"%H:%M:%S") - datetime(1900, 1, 1)) + timedelta(days=int(comp[0]))
return delta.total_seconds()
data['Start'] = data['Start'].apply(to_seconds)
data['End'] = data['End'].apply(to_seconds)
Thank you so much for your help.
Your method was working. I also found a method using loop:
To summarize, my general problem was that I had an ugly csv file that I wanted to transform is a csv usable for doing statistics, and to do that I wanted to use python.
my csv file was like:
MiceID = 1 Beginning End Type of behavior
0 0:00:00:00 Video start
1 0:00:01:36 grooming type 1
2 0:00:03:18 grooming type 2
3 0:00:06:73 0:00:08:16 grooming type 1
So in my ugly csv file I was writing only the moment of the begining of the behavior type without the end when the different types of behaviors directly followed each other, and I was writing the moment of the end of the behavior when the mice stopped to make any grooming, that allowed me to separate sequences of grooming. But this type of csv was not usable for easily making statistics.
So I wanted 1) transform all my value in seconds to have a correct format, 2) then I wanted to fill the gap in the end colonne (a gap has to be fill with the following begining value, as the end of a specific behavior in a sequence is the begining of the following), 3) then I wanted to create columns corresponding to the duration of each behavior, and finally 4) to fill this new column with the duration.
My questionning was about the first step, but I put here the code for each step separately:
step 1: transform the values in a good format
import pandas as pd
import numpy as np
data = pd.read_csv("D:/Python/TestPythonTraitementDonnéesExcel/RawDataBatch2et3.csv", engine = "python")
data.replace(np.nan, "", inplace = True)
i = 0
j = 0
while j < len(data.columns):
while i < len(data.index):
if (":" in data.iloc[i, j]) == True:
Value = str((int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) *60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8: 10])/100))
data = data.replace([data.iloc[i, j]], Value)
data.update(data)
i += 1
else:
i += 1
i = 0
j += 1
print(data)
step 2: fill the gaps
i = 0
j = 2
while j < len(data.columns):
while i < len(data.index) - 1:
if data.iloc[i, j] == "":
data.iloc[i, j] = data.iloc[i + 1, j - 1]
data.update(data)
i += 1
elif np.all(data.iloc[i:len(data.index), j] == ""):
break
else:
i += 1
i = 0
j += 4
print(data)
step 3: create a new colunm for each mice:
j = 1
k = 0
while k < len(data.columns) - 1:
k = (j * 4) + (j - 1)
data.insert(k, "Duree{}".format(k), "")
data.update(data)
j += 1
print(data)
step 3: fill the gaps
j = 4
i = 0
while j < len(data.columns):
while i < len(data.index):
if data.iloc[i, j - 2] != "":
data.iloc[i, j] = str(float(data.iloc[i, j - 2]) - float(data.iloc[i, j - 3]))
data.update(data)
i += 1
else:
break
i = 0
j += 5
print(data)
And of course, export my new usable dataframe
data.to_csv(r"D:/Python/TestPythonTraitementDonnéesExcel/FichierPropre.csv", index = False, header = True)
here are the transformations:
click on the links for the pictures
before step1
after step 1
after step 2
after step 3
after step 4

Python: How to find all ways to decode a string?

I'm trying to solve this problem but it fails with input "226".
Problem:
A message containing letters from A-Z is being encoded to numbers using the following mapping:
'A' -> 1
'B' -> 2
...
'Z' -> 26
Given a non-empty string containing only digits, determine the total number of ways to decode it.
My Code:
class Solution:
def numDecodings(self, s: str) -> int:
decode =[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]
ways = []
for d in decode:
for i in s:
if str(d) == s or str(d) in s:
ways.append(d)
if int(i) in decode:
ways.append(str(i))
return len(ways)
My code returns 2. It only takes care of combinations (22,6) and (2,26).
It should be returning 3, so I'm not sure how to take care of the (2,2,6) combination.
Looks like this problem can be broken down into many subproblems thus can be solved recursively
Subproblem 1 = when the last digit of the string is valid ( i.e. non zero number ) for that you can just recur for (n-1) digits left
if s[n-1] > "0":
count = number_of_decodings(s,n-1)
Subproblem 2 = when last 2 digits form a valid number ( less then 27 ) for that you can just recur for remaining (n-2) digits
if (s[n - 2] == '1' or (s[n - 2] == '2' and s[n - 1] < '7') ) :
count += number_of_decodings(s, n - 2)
Base Case = length of the string is 0 or 1
if n == 0 or n == 1 :
return 1
EDIT: A quick searching on internet , I found another ( more interesting ) method to solve this particular problem which uses dynamic programming to solve this problem
# A Dynamic Programming based function
# to count decodings
def countDecodingDP(digits, n):
count = [0] * (n + 1); # A table to store
# results of subproblems
count[0] = 1;
count[1] = 1;
for i in range(2, n + 1):
count[i] = 0;
# If the last digit is not 0, then last
# digit must add to the number of words
if (digits[i - 1] > '0'):
count[i] = count[i - 1];
# If second last digit is smaller than 2
# and last digit is smaller than 7, then
# last two digits form a valid character
if (digits[i - 2] == '1' or
(digits[i - 2] == '2' and
digits[i - 1] < '7') ):
count[i] += count[i - 2];
return count[n];
the above solution solves the problem in complexity of O(n) and uses the similar method as that of fibonacci number problem
source: https://www.geeksforgeeks.org/count-possible-decodings-given-digit-sequence/
This seemed like a natural for recursion. Since I was bored, and the first answer didn't use recursion and didn't return the actual decodings, I thought there was room for improvement. For what it's worth...
def encodings(str, prefix = ''):
encs = []
if len(str) > 0:
es = encodings(str[1:], (prefix + ',' if prefix else '') + str[0])
encs.extend(es)
if len(str) > 1 and int(str[0:2]) <= 26:
es = encodings(str[2:], (prefix + ',' if prefix else '') + str[0:2])
encs.extend(es)
return encs if len(str) else [prefix]
This returns a list of the possible decodings. To get the count, you just take the length of the list. Here a sample run:
encs = encodings("123")
print("{} {}".format(len(encs), encs))
with result:
3 ['1,2,3', '1,23', '12,3']
Another sample run:
encs = encodings("123123")
print("{} {}".format(len(encs), encs))
with result:
9 ['1,2,3,1,2,3', '1,2,3,1,23', '1,2,3,12,3', '1,23,1,2,3', '1,23,1,23', '1,23,12,3', '12,3,1,2,3', '12,3,1,23', '12,3,12,3']

Levenshtein distance in python giving only 1 as edit distance

I have a python program to read two lists (one with errors and other with correct data). Every element in my list with error needs to be compared with every element in my correct list. After comparing i get all the edit distance between every compared pair. now i can find the least edit distance for the given error data and thru get my correct data.
I am trying to use levenshtein distance to calculate the edit distance but its returning all edit distance as 1 even which is wrong.
This means the code for calculating levenshtein distance is not correct. I am struggling to find a fix for this. HELP!
My Code
import csv
def lev(a, b):
if not a: return len(b)
if not b: return len(a)
return min(lev(a[1:], b[1:])+(a[0] != b[0]), lev(a[1:], b)+1, lev(a, b[1:])+1)
if __name__ == "__main__":
with open("all_correct_promo.csv","rb") as file1:
reader1 = csv.reader(file1)
correctPromoList = list(reader1)
#print correctPromoList
with open("all_extracted_promo.csv","rb") as file2:
reader2 = csv.reader(file2)
extractedPromoList = list(reader2)
#print extractedPromoList
incorrectPromo = []
count = 0
for extracted in extractedPromoList:
if(extracted not in correctPromoList):
incorrectPromo.append(extracted)
else:
count = count + 1
#print incorrectPromo
for promos in incorrectPromo:
for correctPromo in correctPromoList:
distance = lev(promos,correctPromo)
print promos, correctPromo , distance
There is a Python package available that implements the levenshtein distance : python-levenshtein
To install it:
pip install python-levenshtein
To use it:
>>> import Levenshtein
>>> string1 = 'dsfjksdjs'
>>> string2 = 'dsfiksjsd'
>>> print Levenshtein.distance(string1, string2)
3
The implementation is correct. I tested this:
def lev(a, b):
if not a: return len(b)
if not b: return len(a)
return min(lev(a[1:], b[1:])+(a[0] != b[0]), lev(a[1:], b)+1, lev(a, b[1:])+1)
print lev('abcde','bc') # prints 3, which is correct
print lev('abc','bc') # prints 1, which is correct
Your problem, as I noticed by your comments, is probably when you call the method:
a = ['NSP-212690']
b = ['FE SV X']
print lev(a,b) # prints 1 which is incorrect because you are comparing arrays, not strings
print lev(a[0],b[0]) # prints 10, which is correct
So, what you can do is:
Before the call to "lev(a,b)", extract the first element of each array
def lev(a, b):
if not a: return len(b)
if not b: return len(a)
return min(lev(a[1:], b[1:])+(a[0] != b[0]), lev(a[1:], b)+1, lev(a, b[1:])+1)
a = ['NSP-212690']
b = ['FE SV X']
a = a[0] # this is the key part
b = b[0] # and this
print lev(a,b) # prints 10, which is correct
Anyway, I would not recommend you that recursive implementation, because the performance is veeeery poor
I would recommend this implementation instead (source: wikipedia-levenshtein)
def lev(seq1, seq2):
oneago = None
thisrow = range(1, len(seq2) + 1) + [0]
for x in xrange(len(seq1)):
twoago, oneago, thisrow = oneago, thisrow, [0] * len(seq2) + [x + 1]
for y in xrange(len(seq2)):
delcost = oneago[y] + 1
addcost = thisrow[y - 1] + 1
subcost = oneago[y - 1] + (seq1[x] != seq2[y])
thisrow[y] = min(delcost, addcost, subcost)
return thisrow[len(seq2) - 1]
or maybe this slightly modified version:
def lev(seq1, seq2):
if not a: return len(b)
if not b: return len(a)
oneago = None
thisrow = range(1, len(seq2) + 1) + [0]
for x in xrange(len(seq1)):
twoago, oneago, thisrow = oneago, thisrow, [0] * len(seq2) + [x + 1]
for y in xrange(len(seq2)):
delcost = oneago[y] + 1
addcost = thisrow[y - 1] + 1
subcost = oneago[y - 1] + (seq1[x] != seq2[y])
thisrow[y] = min(delcost, addcost, subcost)
return thisrow[len(seq2) - 1]

Python Dynamic Knapsack

Right now I am attempting to code the knapsack problem in Python 3.2. I am trying to do this dynamically with a matrix. The algorithm that I am trying to use is as follows
Implements the memoryfunction method for the knapsack problem
Input: A nonnegative integer i indicating the number of the first
items being considered and a nonnegative integer j indicating the knapsack's capacity
Output: The value of an optimal feasible subset of the first i items
Note: Uses as global variables input arrays Weights[1..n], Values[1...n]
and table V[0...n, 0...W] whose entries are initialized with -1's except for
row 0 and column 0 initialized with 0's
if V[i, j] < 0
if j < Weights[i]
value <-- MFKnapsack(i - 1, j)
else
value <-- max(MFKnapsack(i -1, j),
Values[i] + MFKnapsack(i -1, j - Weights[i]))
V[i, j} <-- value
return V[i, j]
If you run the code below that I have you can see that it tries to insert the weight into the the list. Since this is using the recursion I am having a hard time spotting the problem. Also I get the error: can not add an integer with a list using the '+'. I have the matrix initialized to start with all 0's for the first row and first column everything else is initialized to -1. Any help will be much appreciated.
#Knapsack Problem
def knapsack(weight,value,capacity):
weight.insert(0,0)
value.insert(0,0)
print("Weights: ",weight)
print("Values: ",value)
capacityJ = capacity+1
## ------ initialize matrix F ---- ##
dimension = len(weight)+1
F = [[-1]*capacityJ]*dimension
#first column zeroed
for i in range(dimension):
F[i][0] = 0
#first row zeroed
F[0] = [0]*capacityJ
#-------------------------------- ##
d_index = dimension-2
print(matrixFormat(F))
return recKnap(F,weight,value,d_index,capacity)
def recKnap(matrix, weight,value,index, capacity):
print("index:",index,"capacity:",capacity)
if matrix[index][capacity] < 0:
if capacity < weight[index]:
value = recKnap(matrix,weight,value,index-1,capacity)
else:
value = max(recKnap(matrix,weight,value,index-1,capacity),
value[index] +
recKnap(matrix,weight,value,index-1,capacity-(weight[index]))
matrix[index][capacity] = value
print("matrix:",matrix)
return matrix[index][capacity]
def matrixFormat(*doubleLst):
matrix = str(list(doubleLst)[0])
length = len(matrix)-1
temp = '|'
currChar = ''
nextChar = ''
i = 0
while i < length:
if matrix[i] == ']':
temp = temp + '|\n|'
#double digit
elif matrix[i].isdigit() and matrix[i+1].isdigit():
temp = temp + (matrix[i]+matrix[i+1]).center(4)
i = i+2
continue
#negative double digit
elif matrix[i] == '-' and matrix[i+1].isdigit() and matrix[i+2].isdigit():
temp = temp + (matrix[i]+matrix[i+1]+matrix[i+2]).center(4)
i = i + 2
continue
#negative single digit
elif matrix[i] == '-' and matrix[i+1].isdigit():
temp = temp + (matrix[i]+matrix[i+1]).center(4)
i = i + 2
continue
elif matrix[i].isdigit():
temp = temp + matrix[i].center(4)
#updates next round
currChar = matrix[i]
nextChar = matrix[i+1]
i = i + 1
return temp[:-1]
def main():
print("Knapsack Program")
#num = input("Enter the weights you have for objects you would like to have:")
#weightlst = []
#valuelst = []
## for i in range(int(num)):
## value , weight = eval(input("What is the " + str(i) + " object value, weight you wish to put in the knapsack? ex. 2,3: "))
## weightlst.append(weight)
## valuelst.append(value)
weightLst = [2,1,3,2]
valueLst = [12,10,20,15]
capacity = 5
value = knapsack(weightLst,valueLst,5)
print("\n Max Matrix")
print(matrixFormat(value))
main()
F = [[-1]*capacityJ]*dimension
does not properly initialize the matrix. [-1]*capacityJ is fine, but [...]*dimension creates dimension references to the exact same list. So modifying one list modifies them all.
Try instead
F = [[-1]*capacityJ for _ in range(dimension)]
This is a common Python pitfall. See this post for more explanation.
for the purpose of cache illustration, I generally use a default dict as follows:
from collections import defaultdict
CS = defaultdict(lambda: defaultdict(int)) #if i want to make default vals as 0
###or
CACHE_1 = defaultdict(lambda: defaultdict(lambda: int(-1))) #if i want to make default vals as -1 (or something else)
This keeps me from making the 2d arrays in python on the fly...
To see an answer to z1knapsack using this approach:
http://ideone.com/fUKZmq
def zeroes(n,m):
v=[['-' for i in range(0,n)]for j in range(0,m)]
return v
value=[0,12,10,20,15]
w=[0,2,1,3,2]
v=zeroes(6,5)
def knap(i,j):
global v
if i==0 or j==0:
v[i][j]= 0
elif j<w[i] :
v[i][j]=knap(i-1,j)
else:
v[i][j]=max(knap(i-1,j),value[i]+knap(i-1,j-w[i]))
return v[i][j]
x=knap(4,5)
print (x)
for i in range (0,len(v)):
for j in range(0,len(v[0])):
print(v[i][j],end="\t\t")
print()
print()
#now these calls are for filling all the boxes in the matrix as in the above call only few v[i][j]were called and returned
knap(4,1)
knap(4,2)
knap(4,3)
knap(4,4)
for i in range (0,len(v)):
for j in range(0,len(v[0])):
print(v[i][j],end="\t\t")
print()
print()

Categories

Resources