Python Insertion Sort Algorithm - python

Basically i'm trying to write an insertion sort algorithm in python and I have no idea where i'm going wrong
#!/usr/bin/env python
# coding: utf-8
import random
Array = random.sample(range(30), 5)
First = 1
Last = len(Array)
PositionOfNext = Last – 1
while PositionOfNext >= First:
Next = Array(PositionOfNext)
Current = PositionOfNext
while (Current < Last) and (Next > Array[Current] + 1):
Current = Current + 1
(Array[Current] - 1) = Array[Current]
Array[Current] = Next
PositionOfNext = PositionOfNext - 1
print Array

Fixing some syntax issues and some of the indices.
Also replacing:
(Array[Current] - 1) = Array[Current]
by:
Array[Current - 1], Array[Current] = Array[Current], Array[Current - 1]
The code completed
#!/usr/bin/env python
# coding: utf-8
import random
Array = random.sample(range(30), 5)
print Array
First = 0
Last = len(Array) - 1
PositionOfNext = Last - 1
while PositionOfNext >= First:
Next = Array[PositionOfNext]
Current = PositionOfNext
while (Current < Last) and (Array[Current] > Array[Current + 1]):
Current = Current + 1
Array[Current - 1], Array[Current] = Array[Current], Array[Current - 1]
Array[Current] = Next
PositionOfNext = PositionOfNext - 1
print Array

def insertionSort(alist):
for index in range(1,len(alist)):
currentvalue = alist[index]
position = index
while position>0 and alist[position-1]>currentvalue:
alist[position]=alist[position-1]
position = position-1
alist[position]=currentvalue
alist = [54,26,93,17,77,31,44,55,20]
insertionSort(alist)
print(alist)
The Insertion Sort

How about:
def insertion_sort(x):
# insertion sort
# we can optimize for desc, asc if we want to
# advantages: online, O(nk) for nearly sorted
x_sorted = [x[0]]
x_unsorted = x[1::]
for xx in x_unsorted:
x_sorted.append(xx) # make room, and/or assume a sorted input list
for i in range(len(x_sorted)-1):
if xx < x_sorted[i]: # asc?
x_sorted[i+1::] = x_sorted[i:-1] # shift old values
x_sorted[i] = xx # insert new
break # nothing to do in the inner loop form here on out
i += 1
return x_sorted

Related

Check the difference between string i and p

How would I go about checking to see what the difference between string p and i? So the 2nd line can equal the first line.
t=int(input())
print(t)
for i in range(t):
print(i)
i=input()
p=input()
print(i,p)
print('Case #'+(str(i+1))+': ')
if len(i)==0:
#print(len(p))
else:
#print((len(p)-len(i)))
Help Barbara find out how many extra letters she needs to remove in order to obtain I or if I cannot be obtained from P by removing letters then output IMPOSSIBLE.
input:
2
aaaa
aaaaa
bbbbb
bbbbc
output:
Case #1: 1
Case #2: IMPOSSIBLE
You can use Levenshtein distance to calculate the difference and decide what is possible and impossible yourself.
You can find more resources on YouTube to understand the concept better. E.g. https://www.youtube.com/watch?v=We3YDTzNXEk
I have provided a version of code for your convenient as well.
import numpy as np
def calculate_edit_distance(source, target):
'''Calculate the edit distance from source to target
[In] source="ab" target="bc"
[Out] return 2
'''
num_row = len(target) + 1
num_col = len(source) + 1
distance_table = np.array([[0] * num_col for _ in range(num_row)])
# getting from X[0...i] to empty target string requires i deletions
distance_table[:, 0] = [i for i in range(num_row)]
# getting from Y[0...i] to empty source string requires i deletions
distance_table[0] = [i for i in range(num_col)]
# loop through all the characters and calculate their respective distances
for i in range(num_row - 1):
for j in range(num_col - 1):
insert = distance_table[i + 1, j]
delete = distance_table[i, j + 1]
substitute = distance_table[i, j]
# if target char and source char are the same,
# just copy the diagonal value
if target[i] == source[j]:
distance_table[i + 1, j + 1] = substitute
else:
operations = [delete, insert, substitute]
best_operation = np.argmin(operations)
if best_operation == 2: # +2 if the operation is to substitute
distance_table[i + 1, j + 1] = substitute + 2
else: # same formula for both delete and insert operation
distance_table[i + 1, j + 1] = operations[best_operation] + 1
return distance_table[num_row - 1, num_col - 1]

Making permanent change in a dataframe using python pandas

I would like to convert y dataframe from one format (X:XX:XX:XX) of values to another (X.X seconds)
Here is my dataframe looks like:
Start End
0 0:00:00:00
1 0:00:00:00 0:07:37:80
2 0:08:08:56 0:08:10:08
3 0:08:13:40
4 0:08:14:00 0:08:14:84
And I would like to transform it in seconds, something like that
Start End
0 0.0
1 0.0 457.80
2 488.56 490.80
3 493.40
4 494.0 494.84
To do that I did:
i = 0
j = 0
while j < 10:
while i < 10:
if data.iloc[i, j] != "":
Value = (int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) *60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8: 10])/100)
NewValue = data.iloc[:, j].replace([data.iloc[i, j]], Value)
i += 1
else:
NewValue = data.iloc[:, j].replace([data.iloc[i, j]], "")
i += 1
data.update(NewValue)
i = 0
j += 1
But I failed to replace the new values in my oldest dataframe in a permament way, when I do:
print(data)
I still get my old data frame in the wrong format.
Some one could hep me? I tried so hard!
Thank you so so much!
You are using pandas.DataFrame.update that requires a pandas dataframe as an argument. See the Example part of the update function documentation to really understand what update does https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html
If I may suggest a more idiomatic solution; you can directly map a function to all values of a pandas Series
def parse_timestring(s):
if s == "":
return s
else:
# weird to use centiseconds and not milliseconds
# l is a list with [hour, minute, second, cs]
l = [int(nbr) for nbr in s.split(":")]
return sum([a*b for a,b in zip(l, (3600, 60, 1, 0.01))])
df["Start"] = df["Start"].map(parse_timestring)
You can remove the if ... else ... from parse_timestring if you replace all empty string with nan values in your dataframe with df = df.replace("", numpy.nan) then use df["Start"] = df["Start"].map(parse_timestring, na_action='ignore')
see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html
The datetimelibrary is made to deal with such data. You should also use the apply function of pandas to avoid iterating on the dataframe like that.
You should proceed as follow :
from datetime import datetime, timedelta
def to_seconds(date):
comp = date.split(':')
delta = (datetime.strptime(':'.join(comp[1:]),"%H:%M:%S") - datetime(1900, 1, 1)) + timedelta(days=int(comp[0]))
return delta.total_seconds()
data['Start'] = data['Start'].apply(to_seconds)
data['End'] = data['End'].apply(to_seconds)
Thank you so much for your help.
Your method was working. I also found a method using loop:
To summarize, my general problem was that I had an ugly csv file that I wanted to transform is a csv usable for doing statistics, and to do that I wanted to use python.
my csv file was like:
MiceID = 1 Beginning End Type of behavior
0 0:00:00:00 Video start
1 0:00:01:36 grooming type 1
2 0:00:03:18 grooming type 2
3 0:00:06:73 0:00:08:16 grooming type 1
So in my ugly csv file I was writing only the moment of the begining of the behavior type without the end when the different types of behaviors directly followed each other, and I was writing the moment of the end of the behavior when the mice stopped to make any grooming, that allowed me to separate sequences of grooming. But this type of csv was not usable for easily making statistics.
So I wanted 1) transform all my value in seconds to have a correct format, 2) then I wanted to fill the gap in the end colonne (a gap has to be fill with the following begining value, as the end of a specific behavior in a sequence is the begining of the following), 3) then I wanted to create columns corresponding to the duration of each behavior, and finally 4) to fill this new column with the duration.
My questionning was about the first step, but I put here the code for each step separately:
step 1: transform the values in a good format
import pandas as pd
import numpy as np
data = pd.read_csv("D:/Python/TestPythonTraitementDonnéesExcel/RawDataBatch2et3.csv", engine = "python")
data.replace(np.nan, "", inplace = True)
i = 0
j = 0
while j < len(data.columns):
while i < len(data.index):
if (":" in data.iloc[i, j]) == True:
Value = str((int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) *60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8: 10])/100))
data = data.replace([data.iloc[i, j]], Value)
data.update(data)
i += 1
else:
i += 1
i = 0
j += 1
print(data)
step 2: fill the gaps
i = 0
j = 2
while j < len(data.columns):
while i < len(data.index) - 1:
if data.iloc[i, j] == "":
data.iloc[i, j] = data.iloc[i + 1, j - 1]
data.update(data)
i += 1
elif np.all(data.iloc[i:len(data.index), j] == ""):
break
else:
i += 1
i = 0
j += 4
print(data)
step 3: create a new colunm for each mice:
j = 1
k = 0
while k < len(data.columns) - 1:
k = (j * 4) + (j - 1)
data.insert(k, "Duree{}".format(k), "")
data.update(data)
j += 1
print(data)
step 3: fill the gaps
j = 4
i = 0
while j < len(data.columns):
while i < len(data.index):
if data.iloc[i, j - 2] != "":
data.iloc[i, j] = str(float(data.iloc[i, j - 2]) - float(data.iloc[i, j - 3]))
data.update(data)
i += 1
else:
break
i = 0
j += 5
print(data)
And of course, export my new usable dataframe
data.to_csv(r"D:/Python/TestPythonTraitementDonnéesExcel/FichierPropre.csv", index = False, header = True)
here are the transformations:
click on the links for the pictures
before step1
after step 1
after step 2
after step 3
after step 4

How to optimize an O(N*M) to be O(n**2)?

I am trying to solve USACO's Milking Cows problem. The problem statement is here: https://train.usaco.org/usacoprob2?S=milk2&a=n3lMlotUxJ1
Given a series of intervals in the form of a 2d array, I have to find the longest interval and the longest interval in which no milking was occurring.
Ex. Given the array [[500,1200],[200,900],[100,1200]], the longest interval would be 1100 as there is continuous milking and the longest interval without milking would be 0 as there are no rest periods.
I have tried looking at whether utilizing a dictionary would decrease run times but I haven't had much success.
f = open('milk2.in', 'r')
w = open('milk2.out', 'w')
#getting the input
farmers = int(f.readline().strip())
schedule = []
for i in range(farmers):
schedule.append(f.readline().strip().split())
#schedule = data
minvalue = 0
maxvalue = 0
#getting the minimums and maximums of the data
for time in range(farmers):
schedule[time][0] = int(schedule[time][0])
schedule[time][1] = int(schedule[time][1])
if (minvalue == 0):
minvalue = schedule[time][0]
if (maxvalue == 0):
maxvalue = schedule[time][1]
minvalue = min(schedule[time][0], minvalue)
maxvalue = max(schedule[time][1], maxvalue)
filled_thistime = 0
filled_max = 0
empty_max = 0
empty_thistime = 0
#goes through all the possible items in between the minimum and the maximum
for point in range(minvalue, maxvalue):
isfilled = False
#goes through all the data for each point value in order to find the best values
for check in range(farmers):
if point >= schedule[check][0] and point < schedule[check][1]:
filled_thistime += 1
empty_thistime = 0
isfilled = True
break
if isfilled == False:
filled_thistime = 0
empty_thistime += 1
if (filled_max < filled_thistime) :
filled_max = filled_thistime
if (empty_max < empty_thistime) :
empty_max = empty_thistime
print(filled_max)
print(empty_max)
if (filled_max < filled_thistime):
filled_max = filled_thistime
w.write(str(filled_max) + " " + str(empty_max) + "\n")
f.close()
w.close()
The program works fine, but I need to decrease the time it takes to run.
A less pretty but more efficient approach would be to solve this like a free list, though it is a bit more tricky since the ranges can overlap. This method only requires looping through the input list a single time.
def insert(start, end):
for existing in times:
existing_start, existing_end = existing
# New time is a subset of existing time
if start >= existing_start and end <= existing_end:
return
# New time ends during existing time
elif end >= existing_start and end <= existing_end:
times.remove(existing)
return insert(start, existing_end)
# New time starts during existing time
elif start >= existing_start and start <= existing_end:
# existing[1] = max(existing_end, end)
times.remove(existing)
return insert(existing_start, end)
# New time is superset of existing time
elif start <= existing_start and end >= existing_end:
times.remove(existing)
return insert(start, end)
times.append([start, end])
data = [
[500,1200],
[200,900],
[100,1200]
]
times = [data[0]]
for start, end in data[1:]:
insert(start, end)
longest_milk = 0
longest_gap = 0
for i, time in enumerate(times):
duration = time[1] - time[0]
if duration > longest_milk:
longest_milk = duration
if i != len(times) - 1 and times[i+1][0] - times[i][1] > longest_gap:
longes_gap = times[i+1][0] - times[i][1]
print(longest_milk, longest_gap)
As stated in the comments, if the input is sorted, the complexity could be O(n), if that's not the case we need to sort it first and the complexity is O(nlog n):
lst = [ [300,1000],
[700,1200],
[1500,2100] ]
from itertools import groupby
longest_milking = 0
longest_idle = 0
l = sorted(lst, key=lambda k: k[0])
for v, g in groupby(zip(l[::1], l[1::1]), lambda k: k[1][0] <= k[0][1]):
l = [*g][0]
if v:
mn, mx = min(i[0] for i in l), max(i[1] for i in l)
if mx-mn > longest_milking:
longest_milking = mx-mn
else:
mx = max((i2[0] - i1[1] for i1, i2 in zip(l[::1], l[1::1])))
if mx > longest_idle:
longest_idle = mx
# corner case, N=1 (only one interval)
if len(lst) == 1:
longest_milking = lst[0][1] - lst[0][0]
print(longest_milking)
print(longest_idle)
Prints:
900
300
For input:
lst = [ [500,1200],
[200,900],
[100,1200] ]
Prints:
1100
0

Levenshtein distance in a file

The statement says:
Modify the above program so that given the GGCCTTGCCATTGG pattern, each of the first 10 lines of the previous file indicates:
· The distance of edition that finds the substring more similar of that line.
· The substrings of that line that finds to minimum distance of edition
The above program is this:
import time
def levenshtein_distance (first, second):
if len(first) > len(second):
first, second = second, first
if len(second) == 0:
return len(fist)
first_length = len(first) + 1
second_length = len(second) + 1
distance_matrix = [[0]*second_length for x in range(first_length)]
for i in range(first_length): distance_matrix[i][0] = i
for j in range(second_length): distance_matrix[0][j] = j
for i in xrange(1, first_length):
for j in range(1, second_length):
deletion = distance_matrix[i-1][j] + 1
insertion = distance_matrix[i][j-1] + 2
substitution = distance_matrix[i-1][j-1] + 1
if first[i-1] != second[j-1]:
substitution += 1
distance_matrix[i][j] = min(insertion, deletion, substitution)
return distance_matrix[first_length-1][second_length-1]
def dna(patro):
t1 = time.clock()
f = open("HUMAN-DNA.txt")
text = f.readlines()
f.close()
distanciaMin = 100000000
distanciaPosicion = 0
distanciaLinea = 0
distanciaSubstring = ""
numeroLinea = 0
for line in text:
numeroLinea = numeroLinea + 1
for i in range(len(line)-len(patro)):
cadena = line[i:i+len(patro)]
distancia = levenshtein_distance(cadena, patro)
if distancia < distanciaMin:
distanciaMin = distancia
distanciaPosicion = 1
distanciaLinea = numeroLinea
distanciaSubstring = cadena
t2 = time.clock()
Now i put the new pattern
dna("GGCCTTGCCATTGG")
I have the distance of edition that is distanciaMin and I'm not sure about result of distanciaSubstring that is the substrings of that line(second point of statement), my question is How can i count the first ten lines in the text?
A part of the file is:
CCCATCTCTTTCTCATTCCTTGGTTGAGAACACGAACTTCAGGACTTGCCTCACACTAGGGCCCATTCTT
TGTTTCCCAGAAAGAAGAGGCTCTCCACACAGAGTCCCATGTACACCAGGCTGTCAACAAACATGAATTG
AATGAAGGAGTGGATGGTTGGGTGGAAGTGATTTAAGAAATCCTAACTGGGGAATTTCACTGGAAACTTA
GGAAATTCAATTTATATAAAGTCTATGAATCGTCCATTTTTGTGTCCGCACATTCAAATGCTGTAGCTAA
TTTCCTGCTAAACAGTAGAAATTCAGTAAGTGTTCATGTTGAAAGGATGAAATTTGAGTGCTCTTGCATC
CTCAAAGAACTCTAGTAAAATAGAAATAAAGCTTTATTTGGAAGATTAAGTCATGAGCATAATTATGAGA
AGGCGGTCATTCTAATAATAGTGTCTTCACAAGTAGATGCTACATGCTGTGTAATATTTTGACTAAAAAA
AGTTCCTCTCAACATTTCTGAAGTGAGATAATGTACAACGATCCATGTTTTTAGCTACCTTGATAAGTTT
AGTGCATCCAGGGCTCCTTTCTTACCTGCTAACCGCCGAGTTTCAAATGCTAAGAAATTCTTCATTTCCT
AACACAAATATTCAATATAATTGCTGGTTGTTTGGGAGAAGAAAAATTTAGAATTCAGAAAGAAATACAG
AATGAAATGTTCTAATCAATCGAAAAAGGATTCTATAGACTTCGACGTTGTCTGGTTTACAAAGCAGTCT
I couldn't understand your full question. But I am trying to solve How can i count the first ten lines in the text?. You can use filehandler.readlines(). It will load files in memory as a list where each row is separated by new line character.
Then you can read 10 lines from the list. You can try something like this,
>>> a = [0,1,2,3,4,5,6,7,8,9] # read file as a list of lines (a)
>>> def line(a, jump=2): # keep jump = 10 for your requirement.
lines = len(a)
i = 0
while i < lines+1:
yield a[i:i+jump]
i += jump
>>> foo = line(a)
>>> foo.next()
[0, 1]
>>> foo.next()
[2, 3]
>>> foo.next()
[4, 5]
For your code it will be,
foo = line(text, 10)
foo.next() # should return you 10 elements in each call

leading number groups between two numbers

(Python) Given two numbers A and B. I need to find all nested "groups" of numbers:
range(2169800, 2171194)
leading numbers: 21698XX, 21699XX, 2170XX, 21710XX, 217110X, 217111X,
217112X, 217113X, 217114X, 217115X, 217116X, 217117X, 217118X, 2171190X,
2171191X, 2171192X, 2171193X, 2171194X
or like this:
range(1000, 1452)
leading numbers: 10XX, 11XX, 12XX, 13XX, 140X, 141X, 142X, 143X,
144X, 1450, 1451, 1452
Harder than it first looked - pretty sure this is solid and will handle most boundary conditions. :) (There are few!!)
def leading(a, b):
# generate digit pairs a=123, b=456 -> [(1, 4), (2, 5), (3, 6)]
zip_digits = zip(str(a), str(b))
zip_digits = map(lambda (x,y):(int(x), int(y)), zip_digits)
# this ignores problems where the last matching digits are 0 and 9
# leading (12000, 12999) is same as leading(12, 12)
while(zip_digits[-1] == (0,9)):
zip_digits.pop()
# start recursion
return compute_leading(zip_digits)
def compute_leading(zip_digits):
if(len(zip_digits) == 1): # 1 digit case is simple!! :)
(a,b) = zip_digits.pop()
return range(a, b+1)
#now we partition the problem
# given leading(123,456) we decompose this into 3 problems
# lows -> leading(123,129)
# middle -> leading(130,449) which we can recurse to leading(13,44)
# highs -> leading(450,456)
last_digits = zip_digits.pop()
low_prefix = reduce(lambda x, y : 10 * x + y, [tup[0] for tup in zip_digits]) * 10 # base for lows e.g. 120
high_prefix = reduce(lambda x, y : 10 * x + y, [tup[1] for tup in zip_digits]) * 10 # base for highs e.g. 450
lows = range(low_prefix + last_digits[0], low_prefix + 10)
highs = range(high_prefix + 0, high_prefix + last_digits[1] + 1)
#check for boundary cases where lows or highs have all ten digits
(a,b) = zip_digits.pop() # pop last digits of middle so they can be adjusted
if len(lows) == 10:
lows = []
else:
a = a + 1
if len(highs) == 10:
highs = []
else:
b = b - 1
zip_digits.append((a,b)) # push back last digits of middle after adjustments
return lows + compute_leading(zip_digits) + highs # and recurse - woohoo!!
print leading(199,411)
print leading(2169800, 2171194)
print leading(1000, 1452)
def foo(start, end):
index = 0
is_lower = False
while index < len(start):
if is_lower and start[index] == '0':
break
if not is_lower and start[index] < end[index]:
first_lower = index
is_lower = True
index += 1
return index-1, first_lower
start = '2169800'
end = '2171194'
result = []
while int(start) < int(end):
index, first_lower = foo(start, end)
range_end = index > first_lower and 10 or int(end[first_lower])
for x in range(int(start[index]), range_end):
result.append(start[:index] + str(x) + 'X'*(len(start)-index-1))
if range_end == 10:
start = str(int(start[:index])+1)+'0'+start[index+1:]
else:
start = start[:index] + str(range_end) + start[index+1:]
result.append(end)
print "Leading numbers:"
print result
I test the examples you've given, it is right. Hope this will help you
This should give you a good starting point :
def leading(start, end):
leading = []
hundreds = start // 100
while (end - hundreds * 100) > 100:
i = hundreds * 100
leading.append(range(i,i+100))
hundreds += 1
c = hundreds * 100
tens = 1
while (end - c - tens * 10) > 10:
i = c + tens * 10
leading.append(range(i, i + 10))
tens += 1
c += tens * 10
ones = 1
while (end - c - ones) > 0:
i = c + ones
leading.append(i)
ones += 1
leading.append(end)
return leading
Ok, the whole could be one loop-level deeper. But I thought it might be clearer this way. Hope, this helps you...
Update :
Now I see what you want. Furthermore, maria's code doesn't seem to be working for me. (Sorry...)
So please consider the following code :
def leading(start, end):
depth = 2
while 10 ** depth > end : depth -=1
leading = []
const = 0
coeff = start // 10 ** depth
while depth >= 0:
while (end - const - coeff * 10 ** depth) >= 10 ** depth:
leading.append(str(const / 10 ** depth + coeff) + "X" * depth)
coeff += 1
const += coeff * 10 ** depth
coeff = 0
depth -= 1
leading.append(end)
return leading
print leading(199,411)
print leading(2169800, 2171194)
print leading(1000, 1453)
print leading(1,12)
Now, let me try to explain the approach here.
The algorithm will try to find "end" starting from value "start" and check whether "end" is in the next 10^2 (which is 100 in this case). If it fails, it will make a leap of 10^2 until it succeeds. When it succeeds it will go one depth level lower. That is, it will make leaps one order of magnitude smaller. And loop that way until the depth is equal to zero (= leaps of 10^0 = 1). The algorithm stops when it reaches the "end" value.
You may also notice that I have the implemented the wrapping loop I mentioned so it is now possible to define the starting depth (or leap size) in a variable.
The first while loop makes sure the first leap does not overshoot the "end" value.
If you have any questions, just feel free to ask.

Categories

Resources