Efficient way of keeping numbers in order

This is a problem that could apply to any language, but I'll use Python to show it.
Say you have a list of numbers, ls = [0,100,200,300,400]
You can insert an element at any index, but the elements must always stay in numerical order. Duplicates are not allowed.
For example, ls.insert(2, 150) results in ls = [0,100,150,200,300,400]. The elements are in the correct order, so this is correct.
However, ls.insert(3, 190) results in ls = [0,100,200,190,300,400]. This is incorrect.
For any index i, what is the best number x to use in ls.insert(i,x) to minimize the number of sorts?
My first intuition was to add half the difference between the previous and next numbers to the previous one. So to insert a number at index 3, x would equal 200 + (300-200)/2, or 250. However, this approaches the asymptote far too quickly. When the differences get too close to 0, I could restore them by looping through and changing each number to produce a larger difference. I want to choose the best number for x so as to minimize the number of times I need to reset.
EDIT
The specific problem I'm applying this to is an iOS app with a list view. The items in the list are represented in a Set, and each object has an attribute orderingValue. I can't use an Array to represent the list (due to issues with cache-server syncing), so I have to sort the set each time I display the list to the user. In order to do this, the orderingValue must be stored on the ListItem object.
One additional detail: due to the nature of the UI, the user is more likely to add an item to the top or bottom of the list than to insert it in the middle.

You can generate sort keys indefinitely if you use strings rather than integers. That's because a lexicographical ordering of strings puts an infinite number of values between any two strings (as long as the larger isn't the smaller one followed by "a").
Here's a function to generate a lowercase string key between two other keys:
def get_key_str(low="a", high="z"):
    if low == "":
        low = "a"
    assert(low < high)
    for i, (a, b) in enumerate(zip(low, high)):
        if a < b:
            mid = chr((ord(a) + ord(b))//2) # get the character half-way between a and b
            if mid != a:
                return low[:i] + mid
            else:
                return low[:i+1] + get_key_str(low[i+1:], "z")
    return low + get_key_str("a", high[len(low):])
It always returns a string s such that "a" <= low < s < high <= "z". "a" and "z" are never used themselves as keys, they're special values to indicate the boundaries of the possible results.
You'd call it with get_key_str(lst[i-1], lst[i]) to get a value to insert before the value at index i. You can generate and insert a value in one go with lst.insert(i, get_key_str(lst[i-1], lst[i])). The ends of the list need special handling, though.
The default values are set so that you can omit an argument to get a value to insert at the start or the end. That is, call get_key_str(high=lst[0]) to get a value to put at the start of your list, or get_key_str(lst[-1]) to get a value to append at the end. You can also explicitly pass "a" as low or "z" as high, if that's easier. With no arguments, it will return "m", which is a reasonable first value to put in an empty list.
It's possible that you could tune this a bit to give shorter keys when you're mostly adding at the start or end, but that would be a bit more complicated. This version should have its keys grow roughly evenly if you're inserting randomly.
Here's an example of doing some random inserts:
>>> import random
>>> lst = []
>>> for _ in range(10):
    index = random.randint(0, len(lst))
    print("inserting at", index)
    if index == 0:
        low = "a"
    else:
        low = lst[index-1]
    if index == len(lst):
        high = "z"
    else:
        high = lst[index]
    lst.insert(index, get_key_str(low, high))
    print(lst)
inserting at 0
['m']
inserting at 1
['m', 's']
inserting at 2
['m', 's', 'v']
inserting at 2
['m', 's', 't', 'v']
inserting at 2
['m', 's', 'sm', 't', 'v']
inserting at 0
['g', 'm', 's', 'sm', 't', 'v']
inserting at 3
['g', 'm', 's', 'sg', 'sm', 't', 'v']
inserting at 2
['g', 'm', 'p', 's', 'sg', 'sm', 't', 'v']
inserting at 2
['g', 'm', 'n', 'p', 's', 'sg', 'sm', 't', 'v']
inserting at 3
['g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v']
And here's how it behaves if we then do a bunch of inserts at the start and end:
>>> for _ in range(10):
    lst.insert(0, get_key_str(high=lst[0])) # start
    lst.insert(len(lst), get_key_str(low=lst[-1])) # end
    print(lst)
['d', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x']
['b', 'd', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x', 'y']
['am', 'b', 'd', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x', 'y', 'ym']
['ag', 'am', 'b', 'd', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x', 'y', 'ym', 'ys']
['ad', 'ag', 'am', 'b', 'd', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x', 'y', 'ym', 'ys', 'yv']
['ab', 'ad', 'ag', 'am', 'b', 'd', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x', 'y', 'ym', 'ys', 'yv', 'yx']
['aam', 'ab', 'ad', 'ag', 'am', 'b', 'd', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x', 'y', 'ym', 'ys', 'yv', 'yx', 'yy']
['aag', 'aam', 'ab', 'ad', 'ag', 'am', 'b', 'd', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x', 'y', 'ym', 'ys', 'yv', 'yx', 'yy', 'yym']
['aad', 'aag', 'aam', 'ab', 'ad', 'ag', 'am', 'b', 'd', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x', 'y', 'ym', 'ys', 'yv', 'yx', 'yy', 'yym', 'yys']
['aab', 'aad', 'aag', 'aam', 'ab', 'ad', 'ag', 'am', 'b', 'd', 'g', 'm', 'n', 'o', 'p', 's', 'sg', 'sm', 't', 'v', 'x', 'y', 'ym', 'ys', 'yv', 'yx', 'yy', 'yym', 'yys', 'yyv']
So at the start you may end up with keys prefixed by "a"s, and at the end you'll get keys prefixed by "y"s.
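The special handling for the ends of the list can be wrapped in a small helper. This is only a sketch: insert_with_key is a hypothetical name, not part of the answer, and get_key_str is repeated from above so that the block runs on its own. The loop at the end does a few hundred random inserts and checks that the keys stay sorted and unique.

```python
import random

def get_key_str(low="a", high="z"):
    # Same function as in the answer above, repeated so this block is self-contained.
    if low == "":
        low = "a"
    assert low < high
    for i, (a, b) in enumerate(zip(low, high)):
        if a < b:
            mid = chr((ord(a) + ord(b)) // 2)  # character half-way between a and b
            if mid != a:
                return low[:i] + mid
            return low[:i + 1] + get_key_str(low[i + 1:], "z")
    return low + get_key_str("a", high[len(low):])

def insert_with_key(lst, index):
    """Hypothetical helper: insert a fresh key at `index`, using "a"/"z" at the ends."""
    low = lst[index - 1] if index > 0 else "a"
    high = lst[index] if index < len(lst) else "z"
    key = get_key_str(low, high)
    lst.insert(index, key)
    return key

random.seed(0)  # fixed seed only so the run is repeatable
keys = []
for _ in range(300):
    insert_with_key(keys, random.randint(0, len(keys)))

assert keys == sorted(keys)         # keys are still in order
assert len(set(keys)) == len(keys)  # and there are no duplicates
```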

As far as the 'best' value is concerned, it is always going to be halfway between the previous and the next element, and it will eventually reach the asymptote.
One way to delay arrival at the asymptote if there are repeated insertions at a particular index is to decrement the previous and increment the next value (I'm assuming you are allowed to do this) every time you perform the insert.
So, for ls.insert(2,150), after insertion
ls[1] = ls[1] - (ls[1] - ls[0])/2
ls[3] = ls[3] + (ls[4] - ls[3])/2
For every other insertion, this rule will hold, and assuming insertions are at random indices, you would have a fair amount of time before you need to increase each number's value.
Also, the moment you encounter two adjacent numbers with a difference of 1, you would, of course, have to loop through the numbers and increase them.
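To make the midpoint rule and the reset step concrete, here is a minimal sketch. The names midpoint and respace and the step of 100 are assumptions chosen for illustration, and it does not implement the neighbour-adjusting trick described above. It shows how quickly repeated inserts at one position use up an integer gap.

```python
def midpoint(prev, nxt):
    """Return an integer strictly between prev and nxt, or None when no gap is left."""
    if nxt - prev < 2:
        return None
    return prev + (nxt - prev) // 2

def respace(values, step=100):
    """Reset step: hand out evenly spaced values again, keeping the same order."""
    return [i * step for i in range(len(values))]

ls = [0, 100, 200, 300, 400]
ls.insert(3, midpoint(ls[2], ls[3]))  # inserts 250 between 200 and 300

# Keep inserting right after the first element until the gap is exhausted.
inserts = 0
x = midpoint(ls[0], ls[1])
while x is not None:
    ls.insert(1, x)
    inserts += 1
    x = midpoint(ls[0], ls[1])

respaced = respace(ls)  # the loop over all numbers that restores the gaps
```

Starting from a gap of 100, only a handful of inserts at the same spot fit before a reset is needed, which is the behaviour the question describes.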

Related

Checking For Every Letter

I am trying to write code that breaks a password I create. At first I had it generate random guesses until one eventually matched.
Then I realized I could step through the guesses in order instead: cycle one letter through every value, then advance the neighbouring letter and cycle again.
Ex: AA AB AC ... AY AZ BA BB BC.
I understand that I could write a loop to print every single letter, but how would I change the first letter after I have gone through every letter in the second position?
I also need this to work for a password of any length, so the loop has to adapt to the number of letters. I also need to get rid of the brackets and quotes in the output.
lower_upper_alphabet = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
while done == 0:
    for i in range (int(passwordlen)):
        for i in range(52):
            for i in range(len(lower_upper_alphabet)):
                characters2 = []
                characters2.append(str(lower_upper_alphabet[next1]))
                next1 += 1
                print(characters2)
Output:
["A"]
["B"]
["C"]
["D"]
["E"]
["F"]
["G"]
["H"]
["I"]
["J"]
["K"]
["L"]
["M"]
["N"]
["O"]
["P"]
["Q"]
["R"]
["S"]
["T"]
["U"]
["V"]
["W"]
["X"]
["Y"]
["Z"]
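One way to get the AA, AB, ..., AZ, BA ordering for any length is itertools.product, which behaves like an arbitrary number of nested loops. The sketch below is an illustration, not a posted answer: candidates, crack and the max_length parameter are hypothetical names, and joining each tuple into a string is what removes the brackets and quotes.

```python
from itertools import product
import string

def candidates(alphabet, length):
    """Yield every string of `length` characters, last position changing fastest."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)  # join turns ('A', 'B') into 'AB'

def crack(secret, alphabet=string.ascii_letters, max_length=4):
    """Try lengths 1..max_length in turn; return the matching guess or None."""
    for length in range(1, max_length + 1):
        for guess in candidates(alphabet, length):
            if guess == secret:
                return guess
    return None

print(list(candidates("AB", 2)))
print(crack("hi", max_length=2))
```

The number of guesses grows as len(alphabet) ** length, so this is only practical for short passwords.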

Printing a list from a random sample using join()

I am trying to print a Python list using join after it has randomly selected a specified number of characters. I want it to print all the characters beside each other instead of printing each character on a separate line. Everything works fine up until my for statement: if I print out password_letters, it prints the specified number of letters based on nr_letters (on separate lines). All I want is to join/concatenate the selected letters onto one line. I have followed the documentation on here and some on Google, but I still can't find where I have gone wrong.
Please help me find where I have gone wrong in the below code:
import random
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
nr_letters= int(input("How many letters would you like in your password?\n"))
password_letters = random.sample(letters, nr_letters )
for letter in password_letters:
    print("".join(letter))

No need for a loop, just join the list.
print("".join(password_letters))

Python script to generate a word with specific structure and letter combinations

I want to write a really short script that will help me generate a random/nonsense word with the following qualities:
-Has 8 letters
-First letter is "A"
-Second and Fourth letters are random letters
-Fifth letter is a vowel
-Sixth and Seventh letters are random letters and are the same
-Eighth letter is a vowel that's not "a"
This is what I have tried so far (using all the info I could find and understand online)
firsts = 'A'
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
print [''.join(first, second, third, fourth, fifth)
       for first in firsts
       for second in seconds
       for third in thirds
       for fourth in fourths
       for fifth in fifths
       for sixth in sixths
       for seventh in sevenths
       for eighth in eighths]
However it keeps showing a SyntaxError: invalid syntax after the for and now I have absolutely no idea how to make this work. If possible please look into this for me, thank you so much!

The function you need to pick a random letter is random.choice. You can pass a list to it and it will return a random element from that list. It also works with strings, because a string is a sequence of characters. To make your life easier, use the string module: string.ascii_lowercase is a string containing all the letters from a to z, so you don't have to type them out. Lastly, you don't need loops to join strings together. Keep it simple and just add them together.
import string
from random import choice
first = 'A'
second = choice(string.ascii_lowercase)
third = choice(string.ascii_lowercase)
fourth = choice(string.ascii_lowercase)
fifth = choice("aeiou")
sixthSeventh = choice(string.ascii_lowercase)
eighth = choice("eiou")
word = first + second + third + fourth + fifth + sixthSeventh + sixthSeventh + eighth
print(word)

Try this:
import random
sixth=random.choice(sixths)
s='A'+random.choice(seconds)+random.choice(thirds)+random.choice(fourths)+random.choice(fifths)+sixth+sixth+random.choice(eighths)
print(s)
Output:
Awixonno
Ahiwojjy
etc

There are several things to consider. First, the str.join() method takes in an iterable (e.g. a list), not a bunch of individual elements. Doing
''.join([first, second, third, fourth, fifth])
fixes the program in this respect. If you are using Python 3, print() is a function, and so you should add parentheses around the entire list comprehension.
With the syntax out of the way, let's get to a more interesting problem: your program constructs every possible word, all 82,255,680 of them. This takes a lot of time and memory. What you probably want is to pick just one. You can of course do this by first constructing them all and then picking one at random, but it's far cheaper to pick one letter at random from each of firsts, seconds, etc. and then join these. All together then:
import random
firsts = ['A']
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
sixth = random.choice(sixths)  # picked once so the sixth and seventh letters match
result = ''.join([
    random.choice(firsts),
    random.choice(seconds),
    random.choice(thirds),
    random.choice(fourths),
    random.choice(fifths),
    sixth,
    sixth,
    random.choice(eighths),
])
print(result)
To improve the code from here, try to:
Find a way to generate the "data" in a neater way than writing it out explicitly. As an example:
import string
seconds = list(string.ascii_lowercase) # you don't even need list()!
Instead of having a separate variable firsts, seconds, etc., collect these into a single variable, e.g. a single list containing each original list as a single str with all characters included.

This will implement what you describe. You can make the code neater by putting the choices into an overall list rather than have several different variables, but you will have to explicitly deal with the fact that the sixth and seventh letters are the same; they will not be guaranteed to be the same simply because there are the same choices available for each of them.
The list choices_list could contain sub-lists per your original code, but as you are choosing single characters it will work equally with strings when using random.choice and this also makes the code a bit neater.
import random
choices_list = [
    'A',
    'abcdefghijklmnopqrstuvwxyz',
    'aeiouy',
    'abcdefghijklmnopqrstuvwxyz',
    'aeiouy',
    'abcdefghijklmnopqrstuvwxyz',
    'eiouy'
]
letters = [random.choice(choices) for choices in choices_list]
word = ''.join(letters[:6] + letters[5:]) # here the 6th letter gets repeated
print(word)
Some example outputs:
Alaeovve
Aievellu
Ategiwwo
Aeuzykko

Here's the syntax fix:
print(["".join([first, second, third])
       for first in firsts
       for second in seconds
       for third in thirds])
This method might take up a lot of memory.
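To see how much memory is at stake, the number of words can be counted without building them, and itertools.product can produce them lazily. This is a sketch under the assumption that the eight positions use the same letter sets as the question's lists (with the sixth and seventh positions independent, as in the original comprehension); the variable names are illustrative.

```python
import math
import string
from itertools import islice, product

vowels = "aeiouy"
positions = ["A", string.ascii_lowercase, vowels, string.ascii_lowercase,
             vowels, string.ascii_lowercase, string.ascii_lowercase, "eiouy"]

# Number of words the full list comprehension would have to hold in memory.
total = math.prod(len(p) for p in positions)

# product() is lazy, so taking the first few words never builds the whole list.
first_three = ["".join(w) for w in islice(product(*positions), 3)]

print(total)
print(first_three)
```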

Using .join() function on a set incorrectly reorders it [duplicate]

I have a set of characters (x) that is ordered as I need it:
{'a',
'b',
'c',
'd',
'e',
'f',
'g',
'h',
'i',
'j',
'k',
'l',
'm',
'n',
'o',
'p',
'q',
'r',
's',
't',
'u',
'v',
'w',
'x',
'y',
'z'}
However, when I attempt to convert these back to a string using the .join() function:
return ' '.join(x)
The characters are being randomly reordered:
'c g e w i z n t l a q h p d f v m k b x u r j o y'
Any ideas as to what's going on here?

Sets don't "promise" to maintain order. Sometimes they appear to, but you shouldn't depend on it. Instead, consider using a list:
alpha = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
Then:
return " ".join(alpha)
However, if you only care about the result being in alphabetical order and want to keep using a set, you can sort it before calling join:
return " ".join(sorted(x))
Good luck!

Sets are unordered in every Python version (dictionaries preserve insertion order only from Python 3.7 on). Their implementation involves hash tables and can be a little complicated. Suffice it to say that the order in which you put elements into a set does not determine the order in which they are stored.
You can use OrderedDict or you can convert the set to a list, sort, and go from there.
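As a small illustration of the two options, assuming Python 3.7 or later so that plain dicts keep insertion order (on older versions collections.OrderedDict.fromkeys plays the same role); the variable names are made up for the example:

```python
text = "hello world"

# Alphabetical order: convert to a set, then sort.
unique_sorted = "".join(sorted(set(text)))

# First-seen order: dict keys are unique and remember insertion order.
unique_in_order = "".join(dict.fromkeys(text))

print(repr(unique_sorted))
print(repr(unique_in_order))
```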

Rosalind: Open Reading Frame

I am working through the 'Rosalind' problems and I've become stuck on what the issue with my code is... The problem is:
Either strand of a DNA double helix can serve as the coding strand for
RNA transcription. Hence, a given DNA string implies six total reading
frames, or ways in which the same region of DNA can be translated into
amino acids: three reading frames result from reading the string
itself, whereas three more result from reading its reverse complement.
An open reading frame (ORF) is one which starts from the start codon
and ends by stop codon, without any other stop codons in between.
Thus, a candidate protein string is derived by translating an open
reading frame into amino acids until a stop codon is reached.
Given: A DNA string s of length at most 1 kbp in FASTA format.
Return: Every distinct candidate protein string that can be translated
from ORFs of s. Strings can be returned in any order.
Here is my code (Python):
DNA_Codons = {
    'TTT': 'F', 'CTT': 'L', 'ATT': 'I', 'GTT': 'V',
    'TTC': 'F', 'CTC': 'L', 'ATC': 'I', 'GTC': 'V',
    'TTA': 'L', 'CTA': 'L', 'ATA': 'I', 'GTA': 'V',
    'TTG': 'L', 'CTG': 'L', 'ATG': 'M', 'GTG': 'V',
    'TCT': 'S', 'CCT': 'P', 'ACT': 'T', 'GCT': 'A',
    'TCC': 'S', 'CCC': 'P', 'ACC': 'T', 'GCC': 'A',
    'TCA': 'S', 'CCA': 'P', 'ACA': 'T', 'GCA': 'A',
    'TCG': 'S', 'CCG': 'P', 'ACG': 'T', 'GCG': 'A',
    'TAT': 'Y', 'CAT': 'H', 'AAT': 'N', 'GAT': 'D',
    'TAC': 'Y', 'CAC': 'H', 'AAC': 'N', 'GAC': 'D',
    'TAA': '-', 'CAA': 'Q', 'AAA': 'K', 'GAA': 'E',
    'TAG': '-', 'CAG': 'Q', 'AAG': 'K', 'GAG': 'E',
    'TGT': 'C', 'CGT': 'R', 'AGT': 'S', 'GGT': 'G',
    'TGC': 'C', 'CGC': 'R', 'AGC': 'S', 'GGC': 'G',
    'TGA': '-', 'CGA': 'R', 'AGA': 'R', 'GGA': 'G',
    'TGG': 'W', 'CGG': 'R', 'AGG': 'R', 'GGG': 'G'
}
bases={"A":"T",
"T":"A",
"G":"C",
"C":"G"}
def Pro(DNA, start, Rev):
    #Calculates the Reverse compliment if using
    if Rev == True:
        reverse=DNA[::-1]
        compliment=[]
        for base in reverse:
            compliment+=bases[base]
        Seq="".join(compliment)
    elif Rev== False:
        Seq=DNA
    Protein=[]
    #Finds a start codon
    for i in range(start, len(Seq),3):
        codon=Seq[i:i+3]
        if codon=="ATG":
            #Starting from that start codon, returns a protein, breaks if stop codon
            #-2 included so that it's always in blocks of 3
            for j in range(i,len(Seq)-2,3):
                new_codon=Seq[j:j+3]
                if DNA_Codons[new_codon]!="-":
                    Protein+=[DNA_Codons[new_codon]]
                else:
                    #Adds in the '-' to split proteins that start within the same Reading Frame
                    Protein+=[DNA_Codons[new_codon]]
                    break
    return Protein
f = open('rosalind_orf.txt','r').read()
#Puts each FASTA String into an array
strings=f.split(">")
#removes the FASTA ID from the string in array and new line characters
for i in range(len(strings)):
    strings[i]=strings[i].strip("Rosalind_0123456789")
    strings[i]=strings[i].replace("\n","")
DNA=strings[1]
#Adds proteins from all Open Reading Frames
Proteins=[]
for i in range(len(DNA)):
    Proteins+="".join(Pro(DNA,i,False)).split('-')
    Proteins+="".join(Pro(DNA,i,True)).split('-')
#Makes a list of Unique Proteins and prints them
Unique_Proteins=[]
for p in Proteins:
    if (p not in Unique_Proteins and p!=""):
        Unique_Proteins+=[p]
        print p
Using the sample data:
>Rosalind_99
AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG
My code works fine, however for every question dataset I've been given it fails...
Here is one of the question datasets that I've failed on:
>Rosalind_1485
GACCAGAATGCGTTAGTCGGCCTCAGAGCGCACAAAAACCAGTATTTACAAAGTGGGACG
TAGCGCCCCGCGGCGTCCTTTTGCCCTATCGAAAGTATAGGCATCAGCTTTTTACCACCT
TGTCATAGGTAAACTGCCCGACCCAGGTCCGGCCCTCAGCCCAACGCAGATAAACCAAGG
TTATAGATGTGGCCTGTAGGCATATTGCTCTTAATGTTATAAAGAGCGAAGCGTGGTCTC
GGTTTGTAAACATTAATCAAATTCCCAGGCACTAAGCCATGGTCGCCCCGGATTGGTTTT
CCGGTGTACGCATCGGTGGCAGCTGGAGGGGACAGTTTAGGTGCTGCAATTGAACATGAA
ACTGCACGAAAGGTGGGGTGGGCCGGATCTTGCGGGCCTCGAAAGGGTAGTGTTCCTCTG
CTATCTAGTCCAATTACCTGTAGTATATATGATCAGGCCGTCGGTTACTTAGCTAAGTAA
CCGACGGCCTGATCATCTCCTAGGAAATGGTCCTGAATGCGAACTAGGTTCCGTGGAATG
ATGGGGCCCAGAGGAAACCTGTACGCAATGGATCCCGGACAGATAGACCGGGAGGTCTTG
CAACCTCTTGTGGGAGTTACAGGCCGTACCTGAATTGCCCTCGTACCATTTGAAATGGTG
CGACGCCTGTACGCAACAATCGTTCGCCTGGATAATACAGACGGCCATTTCTGTAGGAAC
GATACCGTAACGCGACGTCAGGCATGACGTTAACTGCGTCACGTTTCATACCACTATGTG
AGGTACCCACTCCTTCATTTACCGCGAGATAAAGAGCCACCACCACCTTCTCTTGGTTTC
CATGCGCCGATCGGCTAAACGTGCATCACATTCAGGCGAAGAGTCAAATGGAAGCTCGCA
ATTTTAGGCCTTTATGGCGAATATCCCGCAAGCCTTAGGCGCGT
Obviously this code is nowhere near efficient and there's a lot that could be improved upon; I'm just curious as to why it's not working.
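One possible cause, offered as a guess rather than a confirmed diagnosis: in the code above, a frame that starts at ATG but runs to the end of the sequence without meeting a stop codon is still added to the results, whereas the problem statement only counts ORFs that end at a stop codon. The following is a minimal Python 3 sketch that keeps a protein only when a stop codon is reached. The names CODON_TABLE, reverse_complement and candidate_proteins are illustrative and not from the original post, and the codon table is built from the standard genetic code with '*' marking stops.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reverse_complement(dna):
    """Reverse the strand and swap A<->T, C<->G."""
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))

def candidate_proteins(dna):
    """Return the set of proteins from ORFs that start at ATG and end at a stop codon."""
    found = set()
    for strand in (dna, reverse_complement(dna)):
        for start in range(len(strand) - 2):
            if strand[start:start + 3] != "ATG":
                continue
            protein = []
            for j in range(start, len(strand) - 2, 3):
                aa = CODON_TABLE[strand[j:j + 3]]
                if aa == "*":
                    found.add("".join(protein))  # kept only because a stop closed the frame
                    break
                protein.append(aa)
            # If the loop ends without a stop codon, the partial protein is discarded.
    return found

sample = ("AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTT"
          "GGAATAAGCCTGAATGATCCGAGTAGCATCTCAG")
for p in sorted(candidate_proteins(sample)):
    print(p)
```

Using a set also removes the need for the separate uniqueness pass.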
