Python bubble sort breaks with large numbers only


I'm trying to create a bubble sort test algorithm that generates x random integers and outputs them to the console and a text file. Both the number of integers created and the maximum value for the random integers are determined by the variable bigsize. The code seems to work until bigsize is around 2300; sometimes the limit is a bit higher and sometimes a bit lower. I can always get 2000 to work, though.
Edit: Also worth noting, the code seems to break during the sorting process, as I can get a file of unsorted numbers to output with no issues.
import random
import sys

bigsize = 2000

def main():
    sys.setrecursionlimit(7000)
    array = create_list()
    print_array(array)
    bubble_sort(array)
    display_out(array)

def create_list():
    array = [0] * bigsize
    for x in range(bigsize):
        array[x] = random.randint(0, bigsize)
    return array

def bubble_sort(array):
    increment = 0
    buffer = 0
    for x in range(len(array) - 1):
        if (array[increment + 1] <= array[increment]):
            buffer = array[increment + 1]
            array[increment + 1] = array[increment]
            array[increment] = buffer
        increment = increment + 1
    increment = 0
    for x in range(len(array) - 1):
        if (array[increment + 1] >= array[increment]):
            increment = increment + 1
        elif (array[increment + 1] < array[increment]):
            bubble_sort(array)

def display_out(array):
    for x in range(bigsize):
        print(array[x])

main()

You have a dysfunctional sort. First and foremost, there is nothing useful about your recursion: you don't reduce the task -- you simply use recursion in place of the sort's outer loop. At that, you have implemented it incorrectly. I strongly recommend that you get more practice with the more basic skills of programming before you tackle recursion.
The first (non-)problem is a simple inefficiency: your loop has x as the index, but the loop body ignores x, while it maintains increment with the same value. There is no reason to have two separate variables. You can see how this is used in almost any bubble sort on the web:
for pos in range(len(array) - 1):
    if array[pos+1] < array[pos]:
        # switch the elements
You have a similar inefficiency in the second loop:
increment = 0
for x in range(len(array) - 1):
    if (array[increment + 1] >= array[increment]):
        increment = increment + 1
Again, you ignore x and maintain increment at the same value ... up until you find elements out of order:
    elif (array[increment + 1] < array[increment]):
        bubble_sort(array)
When you do so, you recur, but without altering increment. When you return from this recursion, the array must be properly sorted (assuming that the recursion logic is correct), and then you continue with this loop, iterating through the now-sorted array, ensuring up to bigsize times that the array is in order.
This entire loop is silly: if you simply set a flag when you make a switch in the first loop, you'll know whether or not you need to sort again. You'll do one extra iteration, but that doesn't affect the time complexity.
For instance:
done = True
for pos in range(len(array) - 1):
    if array[pos+1] < array[pos]:
        array[pos], array[pos+1] = array[pos+1], array[pos]
        done = False  # a swap happened, so another pass is needed
# Replace the second loop entirely
if not done:
    bubble_sort(array)
I strongly recommend that you check the operation of your program by properly tracing the results. First, however, clean up the logic. Remove (for now) the superfluous code that writes to files, put in some basic tracing print statements, and study existing bubble sorts to see where you're making this all too "wordy". In fact, remove the recursion and simply repeat the sorting until done.
When I try this with bigsize=5000, it recurs to 3818 levels and quits. I'll leave the tracing up to you, if the problem is still there once you've cleaned up the program. There's not much point to fixing the "silent death" until you tighten the logic and trace the operation, so that you know you're fixing an otherwise working program. The current code does not "Make it easy for others to help you", as the posting guidelines say.
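As a minimal sketch of the non-recursive version suggested above (the name bubble_sort_iterative and the swapped flag are illustrative, not taken from the question), the outer loop can simply repeat passes until one completes without a swap, so the recursion limit never comes into play:

```python
import random


def bubble_sort_iterative(array):
    """Sort array in place by repeating passes until no swap occurs."""
    swapped = True
    while swapped:
        swapped = False
        for pos in range(len(array) - 1):
            if array[pos + 1] < array[pos]:
                # Swap the out-of-order neighbours and remember that we did.
                array[pos], array[pos + 1] = array[pos + 1], array[pos]
                swapped = True
    return array


# Usage: a size past the point where the recursive version was reported to fail.
bigsize = 3000
data = [random.randint(0, bigsize) for _ in range(bigsize)]
bubble_sort_iterative(data)
print(data == sorted(data))
```

This is still an O(n^2) sort; the only change is that repetition is driven by a loop rather than by the call stack.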

Related

Python list first n entries in a custom base number system

I am sorry if the title is a misnomer and/or doesn't properly describe what this is all about, you are welcome to edit the title to make it clear once you understand what this is about.
The thing is very simple, but I find it hard to describe, this thing is sorta like a number system, except it is about lists of integers.
We start with a list containing only zero. On each iteration we add one to the last element until it reaches a certain limit; then we insert 1 at the start of the list and set the second element to 0. We then keep incrementing the second element until it reaches the limit again, at which point we add 1 to the first element and reset the second element to 0. When the first element itself reaches the limit, we insert another element with a value of 1 at the start of the list and zero the two elements after it, et cetera.
In other words, when a place reaches the limit, zero that place and the places after it and increase the place before it by one; when all available places have reached the limit, add a 1 on the left. For example:
0
1
2
1, 0
1, 1
1, 2
2, 0
2, 1
2, 2
1, 0, 0
The limit doesn't have to be three.
This is what I currently have that does something similar to this:
array = []
for c in range(26):
    for b in range(26):
        for a in range(26):
            array.append((c, b, a))
I don't want leading zeroes but I can remove them, but I can't figure out how to do this with a variable number of elements.
What I want is a function that takes two arguments, limit (or base) and number of tuples to be returned, and returns the first n such tuples in order.
This must be very simple, but I just can't figure it out, and Google returns completely irrelevant results, so I am asking for help here.
How can this be done? Any help will truly be appreciated!
Hmm, I was thinking about something like this, but very unfortunately I can't make it work, please help me figure out why it doesn't work and how to make it work:
array = []
numbers = [0]
for i in range(1000):
    numbers[-1] += 1
    while 26 in numbers:
        index = numbers.index(26)
        numbers[index:] = [0] * (len(numbers) - index)
        if index != 0:
            numbers[index - 1] += 1
        else:
            numbers.insert(0, 1)
    array.append(numbers)
I don't quite understand it. My testing shows that everything inside the loop works perfectly fine outside the loop and the results are correct, but it simply will not work in a loop, and I don't know why.
I discovered that if I change the last line to print(numbers), everything prints correctly, but if I use append, every entry of array ends up equal to the last value. How so?

from math import log

def number_to_base(n,base):
    number=[]
    for digit in range(int(log(n+0.500001,base)),-1,-1):
        number.append(n//base**digit%base)
    return number

def first_numbers_in_base(n,base):
    numbers=[]
    for i in range(n):
        numbers.append(tuple(number_to_base(i,base)))
    return numbers

#tests:
print(first_numbers_in_base(10,3))
print(number_to_base(1048,10))
print(number_to_base(int("10201122110212",3),3))
print(first_numbers_in_base(25,10))
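The digit count above comes from a floating-point logarithm, and n+0.500001 converts n to a float, which raises OverflowError once n is too large for a float. As a hedged alternative sketch (the name number_to_base_int is made up for this example, and base >= 2 is assumed), the digits can be peeled off with integer arithmetic only, using repeated divmod:

```python
def number_to_base_int(n, base):
    """Return the digits of a non-negative integer n in the given base.

    Assumes base >= 2. Uses only integer arithmetic, so it works for
    arbitrarily large n.
    """
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        # divmod gives the remaining quotient and the least significant digit.
        n, remainder = divmod(n, base)
        digits.append(remainder)
    digits.reverse()  # digits were collected least significant first
    return digits


print(number_to_base_int(1048, 10))
print(number_to_base_int(int("10201122110212", 3), 3))
```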

I finally did it!
The logic is very simple; the hard part was figuring out why it wouldn't work in a loop. It turns out I need to use .copy(): array.append(numbers) stores a reference to the list object rather than a snapshot of its contents, so every entry in array referred to the same list, and each in-place modification showed up in all of them.
So here is the code:
def steps(base, num):
    array = []
    numbers = [0]
    for i in range(num):
        copy = numbers.copy()
        copy[-1] += 1
        while base in copy:
            index = copy.index(base)
            copy[index:] = [0] * (len(copy) - index)
            if index != 0:
                copy[index - 1] += 1
            else:
                copy.insert(0, 1)
        array.append(copy)
        numbers = copy
    return array
Use it like this:
steps(26, 1000)
For the first 1000 lists in base 26.
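The aliasing behaviour behind the need for .copy() can be seen in isolation. In this small sketch (the names aliased and copied are only for illustration), the same list is appended once by reference and once as a copy:

```python
numbers = [0]
aliased = []
copied = []
for _ in range(3):
    numbers[-1] += 1
    aliased.append(numbers)         # stores a reference to the same list object
    copied.append(numbers.copy())   # stores an independent snapshot

print(aliased)  # [[3], [3], [3]]
print(copied)   # [[1], [2], [3]]
print(all(item is numbers for item in aliased))  # True
```

Printing inside the loop looks correct because print shows the list's contents at that moment, whereas the appended references all reflect whatever the list holds at the end.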

Here is a function that satisfies the original requirements (it returns a list of tuples, and the first tuple represents 0) and is faster than the other functions that have been posted to this thread:
def first_numbers_in_base(n,base):
    if n<2:
        if n:
            return [(0,)]
        return []
    numbers=[(0,),(1,)]
    base-=1
    l=-1
    num=[1]
    for i in range(n-2):
        if num[-1]==base:
            num[-1]=0
            for i in range(l,-1,-1):
                if num[i]==base:
                    num[i]=0
                else:
                    num[i]+=1
                    break
            else:
                num=[1]+num
                l+=1
        else:
            num[-1]+=1
        numbers.append(tuple(num))  # replace tuple(num) with num.copy() if you want the result to contain lists instead of tuples
    return numbers
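For comparison, here is a hedged sketch of the same sequence built lazily with itertools. The names tuples_in_base and first_tuples_in_base are invented for this example, base >= 2 is assumed, and no claim is made about its speed relative to the function above:

```python
from itertools import count, islice, product


def tuples_in_base(base):
    """Yield (0,), (1,), ..., with no leading zeroes, in counting order.

    Assumes base >= 2; with a smaller base the generator would never
    produce anything after (0,).
    """
    yield (0,)
    for width in count(1):
        # The leading digit is never zero; the rest range over all digits.
        for first in range(1, base):
            for rest in product(range(base), repeat=width - 1):
                yield (first, *rest)


def first_tuples_in_base(n, base):
    """Return the first n tuples of the sequence as a list."""
    return list(islice(tuples_in_base(base), n))


print(first_tuples_in_base(10, 3))
```

Because the generator is lazy, islice can take any number of tuples without fixing the number of digits in advance.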

Breaking an iterative function in Python before a condition turns False

This is for a school assignment.
I have been tasked to define a function determining the largest square pyramidal number up to a given integer(argument). For some background, these are square pyramidal numbers:
1 = 1^2
5 = 1^2+2^2
14 = 1^2+2^2+3^2
So for a function and parameter largest_square_pyramidal_num(15), the function should return 14, because that's the largest number within the domain of the argument.
I get the idea. And here's my code:
def largest_square_pyramidal_num(n):
    sum = 0
    i = 0
    while sum < n:
        sum += i**2
        i += 1
    return sum
Logically to me, it seemed nice and rosy until I realised it doesn't stop when it's supposed to. When n = 15, sum = 14, sum < n, so the code adds one more round of i**2, and n is exceeded. I've been cracking my head over how to stop the iteration before the condition sum < n turns false, including an attempt at break and continue:
def largest_square_pyramidal_num(n):
    sum = 0
    for i in range(n+1):
        sum += i**2
        if sum >= n:
            break
        else:
            continue
    return sum
Only to realise it doesn't make any difference.
Can someone give me any advice? Where is my logical lapse? Greatly appreciated!

You can do the following:
def largest_pyr(x):
    pyr=[sum([i**2 for i in range(1,k+1)]) for k in range(int(x**0.5)+1)]
    pyr=[i for i in pyr if i<=x]
    return pyr[-1]

>>> largest_pyr(15)
14
>>> largest_pyr(150)
140
>>> largest_pyr(1500)
1496
>>> largest_pyr(15000)
14910
>>> largest_pyr(150000)
149226

Let me start by saying that the continue in the second piece of code is redundant. The statement is for when you want to skip the rest of the loop body and start the next iteration; in your case there are no more instructions in the loop body to skip.
For example, let's print every number from 1 to 100, but skip those ending with 0:
for i in range(1, 100 + 1):
    if i % 10 != 0:
        print(i)

for i in range(1, 100 + 1):
    if i % 10 == 0:
        # i don't want to continue executing the body of for loop,
        # get me to the next iteration
        continue
    print(i)
The first example is to accept all "good" numbers while the second is rather to exclude the "bad" numbers. IMHO, continue is a good way to get rid of some "unnecessary" elements in the container rather than writing an if (your code inside if becomes extra-indented, which worsens readability for bigger functions).
As for your first piece of code, let's think about it for a while. Your while loop terminates when the pyramid number is greater than or equal to n, and that is not what you really want (yes, you may end up with a pyramid number that is equal to n, but that is not always the case).
What I'd suggest is to keep generating pyramid numbers until one exceeds n and then take a step back by removing the extra term:
def largest_square_pyramidal_num(n):
    result = 0
    i = 0
    while result <= n:
        i += 1
        result += i**2
    result -= i ** 2
    return result
2 things to note:
don't use sum as a name for the variable (it might confuse people with built-in sum() function)
I swapped increment and result updating in the loop body (such that i is up-to-date when the while loop terminates)
So the function reads like this: keep adding terms until we take too much and go 1 step back.
Hope that makes some sense.
Cheers :)
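An alternative to stepping back, sketched here under the assumption that n is a non-negative integer, is to test the next term before adding it, so the running total never overshoots. The name largest_square_pyramidal_num_lookahead is made up for this example:

```python
def largest_square_pyramidal_num_lookahead(n):
    """Return the largest square pyramidal number that does not exceed n."""
    total = 0
    i = 1
    # Only add the next square if the total would still be within n.
    while total + i * i <= n:
        total += i * i
        i += 1
    return total


print(largest_square_pyramidal_num_lookahead(15))  # 14
print(largest_square_pyramidal_num_lookahead(14))  # 14
```

Both versions do the same amount of work; which reads better is a matter of taste.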

Check if differences between elements already exists in a list

I'm trying to build a heuristic for the simplest feasible Golomb ruler possible. From 0 to n, find n numbers such that all the differences between them are different. This heuristic consists of incrementing the candidate mark by 1 every time. If a difference already exists in the list, jump to the next integer. So the ruler starts with [0,1] and the list of differences = [ 1 ]. Then we try to add 2 to the ruler [0,1,2], but it's not feasible, since the difference (2-1 = 1) already exists in the list of differences. Then we try to add 3 to the ruler [0,1,3] and it is feasible, and thus the list of differences becomes [1,2,3] and so on. Here's what I've come to so far:
n = 5
positions = list(range(1,n+1))
Pos = []
Dist = []
difs = []
i = 0
while (i < len(positions)):
    if len(Pos)==0:
        Pos.append(0)
        Dist.append(0)
    elif len(Pos)==1:
        Pos.append(1)
        Dist.append(1)
    else:
        postest = Pos + [i] #check feasibility to enter the ruler
        difs = [a-b for a in postest for b in postest if a > b]
        if any (d in difs for d in Dist)==True:
            pass
        else:
            for d in difs:
                Dist.append(d)
            Pos.append(i)
    i += 1
However I can't make the differences check to work. Any suggestions?

For efficiency I would tend to use a set to store the differences, because they are good for inclusion testing, and you don't care about the ordering (possibly until you actually print them out, at which point you can use sorted).
You can use a temporary set to store the differences between the number that you are testing and the numbers you currently have, and then either add it to the existing set, or else discard it if you find any matches. (Note else block on for loop, that will execute if break was not encountered.)
n = 5
i = 0
vals = []
diffs = set()
while len(vals) < n:
    diffs1 = set()
    for j in reversed(vals):
        diff = i - j
        if diff in diffs:
            break
        diffs1.add(diff)
    else:
        vals.append(i)
        diffs.update(diffs1)
    i += 1
print(vals, sorted(diffs))
The explicit loop over values (rather than the use of any) is to avoid unnecessarily calculating the differences between the candidate number and all the existing values, when most candidate numbers are not successful and the loop can be aborted early after finding the first match.
It would work for vals also to be a set and use add instead of append (although similarly, you would probably want to use sorted when printing it). In this case a list is used, and although it does not matter in principle in which order you iterate over it, this code is iterating in reverse order to test the smaller differences first, because the likelihood is that unusable candidates are rejected more quickly this way. Testing it with n=200, the code ran in about 0.2 seconds with reversed and about 2.1 without reversed; the effect is progressively more noticeable as n increases. With n=400, it took 1.7 versus 27 seconds with and without the reversed.
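When tracing a heuristic like this, it can help to verify the result independently. The following is a small checking sketch, not part of the answer's method; the helper name has_distinct_differences is invented here. It recomputes every pairwise difference and confirms there are no repeats. The ruler [0, 1, 3, 7, 12] is what the greedy loop above yields for n = 5 when traced by hand.

```python
from itertools import combinations


def has_distinct_differences(marks):
    """Return True if all pairwise differences between marks are distinct."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))


print(has_distinct_differences([0, 1, 3, 7, 12]))  # True
print(has_distinct_differences([0, 1, 2]))         # False: 1 occurs twice
```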

Is there a Pythonic way of skipping if statements in a for loop to make my code run faster?

I'm writing a script in Python that essentially rolls a die and checks whether the die roll exceeds a number x. I want to repeat this process n times and get the probability that the die roll exceeds the number x. e.g.
Count = 0
for _ in itertools.repeat(None, Iterations):
    x = 3
    die_roll = rnd.randint(1,6)
    if die_roll > x:
        Count += 1
Probability_of_exceed = Count / Iterations
I want to modify both the die roll and x based on user input. This user input will select different routines to modify the script e.g. "Andy's_Routine" might change x to 4. Currently I implement this using if statements in the for loop to check which routines are active, then applying them e.g.
Count = 0
for _ in itertools.repeat(None, Iterations):
    x = 3
    if "Andy's_Routine" in Active_Routines:
        x = 4
    die_roll = rnd.randint(1,6)
    if "Bill's_Routine" in Active_Routines:
        die_roll += 1
    if "Chloe's_Routine" in Active_Routines:
        # do something
        pass
    if "Person_10^5's_Routine" in Active_Routines:
        # do something else
        pass
    if die_roll > x:
        Count += 1
Probability_of_exceed = Count / Iterations
In practice the routines are not so simple that they can be generalised, they might add an extra output for example. The routines can be and are concurrently implemented. The problem is that there could be thousands of different routines, such that each loop will spend the majority of its time checking the if statements, slowing down the program.
Is there a better way of structuring the code that checks which routines are in use only once, and then modifies the iteration somehow?

You're asking two things here - you want your code to be more Pythonic, and you want it to run faster.
The first one is easier to answer: make Active_Routines a list of functions instead of a list of strings, and call the functions from the list. Since these functions may need to change the local state (x and die_roll), you will need to pass them the state as parameters, and let them return a new state. The refactor might look like this:
def Andy(x, die_roll):
    return (4, die_roll)

def Bill(x, die_roll):
    return (x, die_roll + 1)

def Chloe(x, die_roll):
    # do something
    return (x, die_roll)

Active_Routines = [Andy, Bill, Chloe]
Count = 0
for i in range(Iterations):
    x = 3
    die_roll = rnd.randint(1,6)
    for routine in Active_Routines:
        x, die_roll = routine(x, die_roll)
    if die_roll > x:
        Count += 1
Probability_of_exceed = Count / Iterations
The second one is harder to answer. This refactoring now makes a lot of function calls instead of checking if conditions; so there could be fewer missed branch predictions, but more function call overhead. You would have to benchmark it (e.g. using the timeit library) to be sure. However, at least this code should be easier to maintain.
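A benchmark along those lines might look like the sketch below. Everything in it is an assumption for illustration: the REGISTRY dictionary, the two simulate_* functions, the iteration counts, and the use of the standard random module in place of rnd. It resolves the active routine names to functions once, outside the loop, and times both styles with timeit:

```python
import random
import timeit


def andy(x, die_roll):
    return 4, die_roll


def bill(x, die_roll):
    return x, die_roll + 1


# Hypothetical mapping from routine names to functions.
REGISTRY = {"Andy's_Routine": andy, "Bill's_Routine": bill}


def simulate_with_strings(active_names, iterations):
    """Check routine names with if statements on every iteration."""
    count = 0
    for _ in range(iterations):
        x = 3
        if "Andy's_Routine" in active_names:
            x = 4
        die_roll = random.randint(1, 6)
        if "Bill's_Routine" in active_names:
            die_roll += 1
        if die_roll > x:
            count += 1
    return count / iterations


def simulate_with_functions(active_names, iterations):
    """Resolve the active routines once, then just call them in the loop."""
    routines = [REGISTRY[name] for name in active_names]
    count = 0
    for _ in range(iterations):
        x = 3
        die_roll = random.randint(1, 6)
        for routine in routines:
            x, die_roll = routine(x, die_roll)
        if die_roll > x:
            count += 1
    return count / iterations


active = ["Andy's_Routine", "Bill's_Routine"]
for func in (simulate_with_strings, simulate_with_functions):
    seconds = timeit.timeit(lambda: func(active, 10_000), number=5)
    print(f"{func.__name__}: {seconds:.3f} s")
```

The timings depend on the machine and on how many routines are registered versus active, so the numbers are only meaningful for your own workload.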

Insertion Sort Python

I have implemented insertion sort in python and was wondering how to determine the complexity of the algorithm. Is this an inefficient way of implementing insertion sort? To me, this seems like the most readable algorithm.
import random as rand
source = [3,1,0,10,20,2,1]
target = []
while len(source)!=0:
    if len(target) ==0:
        target.append(source[0])
        source.pop(0)
    element = source.pop(0)
    if(element <= target[0]):
        target.reverse()
        target.append(element)
        target.reverse()
    elif element > target[len(target)-1]:
        target.append(element)
    else:
        for i in range(0,len(target)-1):
            if element >= target[i] and element <= target[i+1]:
                target.insert(i+1,element)
                break
print(target)

Instead of:
target.reverse()
target.append(element)
target.reverse()
try:
target.insert(0, element)
Also, maybe use a for loop, instead of a while loop, to avoid source.pop()?:
for value in source:
    ...
In the final else block, the first part of the if test is redundant:
else:
    for i in range(0,len(target)-1):
        if element >= target[i] and element <= target[i+1]:
            target.insert(i+1,element)
            break
Since the list is already sorted, as soon as you find an element larger than the one you're inserting, you've found the insertion location.
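Because the target list is always sorted, the insertion point can also be found by binary search. As a hedged sketch (the function name insertion_sort_bisect is invented here), the standard library's bisect.insort does the search and the insert in one call:

```python
import bisect


def insertion_sort_bisect(source):
    """Build a sorted copy of source by inserting each value in order."""
    target = []
    for value in source:
        # insort finds the position by binary search, then inserts there.
        bisect.insort(target, value)
    return target


print(insertion_sort_bisect([3, 1, 0, 10, 20, 2, 1]))  # [0, 1, 1, 2, 3, 10, 20]
```

The search takes O(log n) comparisons per element, but each insert still shifts the elements after it, so the overall worst case remains O(n^2).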

I would say it is rather inefficient. How can you tell? Your approach creates a second array, but you don't need one in an insertion sort. You use a lot of operations -- an in-place insertion sort requires only lookups and exchanges, but you have lookups, appends, pops, inserts, and reverses. So you know that you can probably do better.

def insertionsort( aList ):
    for i in range( 1, len( aList ) ):
        tmp = aList[i]
        k = i
        while k > 0 and tmp < aList[k - 1]:
            aList[k] = aList[k - 1]
            k -= 1
        aList[k] = tmp
This code is taken from geekviewpoint.com. It is an O(n^2) algorithm in the worst case, since the inner while loop can run up to i times for each iteration of the outer for loop. If the input is already sorted, however, it is O(n), since the while loop is then always skipped because tmp < aList[k - 1] fails.
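One way to see both cases is to count how often the inner loop shifts an element. The sketch below is the same algorithm with a counter added; the name insertion_sort_counting and the counter are additions for illustration only:

```python
def insertion_sort_counting(a_list):
    """Sort a_list in place and return the number of element shifts."""
    shifts = 0
    for i in range(1, len(a_list)):
        tmp = a_list[i]
        k = i
        while k > 0 and tmp < a_list[k - 1]:
            a_list[k] = a_list[k - 1]  # shift the larger element right
            k -= 1
            shifts += 1
        a_list[k] = tmp
    return shifts


n = 100
print(insertion_sort_counting(list(range(n))))         # 0 shifts: already sorted
print(insertion_sort_counting(list(range(n, 0, -1))))  # 4950 shifts: n*(n-1)/2
```

Sorted input never enters the while loop, while reversed input shifts every earlier element for each i, giving 1 + 2 + ... + (n - 1) shifts.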
