Is it possible in Python to subtract from one part of a dotted version number?
For example,
given 8.0.18, I'm attempting to find the previous version, 8.0.17.
Is there any way or method to subtract 1 to get 8.0.17?
I was thinking of using a regex to pull out the 18, subtract 1, then build the result from "8.0." plus 17 :), something like this:
import re
version_found = "8.0.18"
version = re.search(r'^\d+\.\d+\.(\d+)$', version_found).group(1)
prev_version = int(version) - 1
So prev_version would end up being 17; then I could convert it back to a string and append it to "8.0.".
But I was wondering if there's some method I don't know about or am not considering? Thanks
Here is a tiny little script I wrote, it should be fairly easy to implement in your code:
#!/usr/bin/env python3.6
version = "8.0.18"
version = version.split(".")
if int(version[-1]) > 0:
    version[-1] = str(int(version[-1]) - 1)
    version = '.'.join(version)
    print(version)
else:
    print("Error, version number ended in a zero!")
This works by splitting the string into a list on each period, resulting in ["8", "0", "18"]. Then it gets the last element in the list by accessing index -1. Then we subtract 1 from the value of that index and assign it back to the same index. Lastly, join the list into a string with periods in between each element then print the outcome.
I think the best way to do this would be to count the number of periods in the string and split the text at the specific period where you'd like to subtract. Then turn that part of the string into an integer, subtract 1 from it, and re-add it to the version number.
There are several ways of doing this, but that's the way I'd do it. Also, keep it in a function so you can call it multiple times, at different period positions.
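A sketch of that function, assuming the version is a plain dot-separated string of integers. It uses str.split rather than counting periods by hand, and decrement_part and its position argument are hypothetical names:

```python
def decrement_part(version, position=-1):
    """Return `version` with the dot-separated part at `position` reduced by 1.

    `position` uses list indexing: -1 is the last part, 0 the first.
    Raises ValueError if that part is already 0 (there is no lower value).
    """
    parts = version.split(".")
    value = int(parts[position])
    if value == 0:
        raise ValueError(f"part {position} of {version!r} is already 0")
    parts[position] = str(value - 1)
    return ".".join(parts)


print(decrement_part("8.0.18"))     # 8.0.17
print(decrement_part("8.2.18", 1))  # 8.1.18
```

Raising on a zero part is one design choice; another is to roll over, as the "x" placeholder answer further down does.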
version = "8.0.18"
index = version.rindex(".") + 1
version = version[:index] + str(int(version[index:])-1)
Just use rindex to find your last period.
Then, convert everything after that to a number, subtract one, turn it back into a string, and you're done.
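This becomes more complicated if you want any value other than the last version number. You'd have to rindex again, searching only to the left of the position returned each time. E.g., to change the value after the second-from-last period (i.e. the first period of a three-part version), it gets uglier:
# steps = how many periods to move left of the last one (0 = last part, 1 = the part before it)
steps = 1
end_index = len(version)
start_index = version.rindex(".")
for _ in range(steps):
    end_index = start_index
    # str.rindex takes start and end positionally, not as keyword arguments
    start_index = version.rindex(".", 0, end_index)
# Subtract 1 from the part between the two periods (assumes that part is > 0)
version = (version[:start_index + 1]
           + str(int(version[start_index + 1:end_index]) - 1)
           + version[end_index:])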
lst = version.split('.') # make a list from individual parts
last_part = lst.pop() # returns the last element, deleting it from the list
last_part = int(last_part) - 1 # converts to an integer and decrements it
last_part = str(last_part) # and converts back to string
lst.append(last_part) # appends it back (now decremented)
version = '.'.join(lst) # convert lst back to string with period as delimiter
Based on Steampunkery's answer:
version = "6.4.2"
nums = version.split(".")
skip = 0  # skip from right, e.g. to go directly to 6.3.2, skip=1
for ind in range(skip, len(nums)):
    curr_num = nums[-1-ind]
    if int(curr_num) > 0:
        nums[-1-ind] = str(int(curr_num) - 1)
        break
    else:
        nums[-1-ind] = "x"
oldversion = '.'.join(nums)
print(oldversion)
Sample outputs:
8.2.0 --> 8.1.x
8.2.1 --> 8.2.0
8.0.0 --> 7.x.x
0.0.0 --> x.x.x
8.2.0 --> 8.1.0 (with skip=1)
I wrote Python 3 code to manipulate lists of strings, but it gives a runtime error for long strings. Here is my code for the problem:
string = "BANANA"
slist = list(string)
mark = list(range(len(slist)))
vowel_substrings = list()
consonants_substrings = list()
#print(mark)
for i in range(len(slist)):
    if slist[i]=='A' or slist[i]=='E' or slist[i]=='I' or slist[i]=='O' or slist[i]=='U':
        mark[i] = 1
    else:
        mark[i] = 0
#print(mark)
for j in range(len(slist)):
    if mark[j] == 1:
        for l in range(j, len(string)):
            vowel_substrings.append(string[j:l+1])
            #print(string[j:l+1])
    else:
        for l in range(j, len(string)):
            consonants_substrings.append(string[j:l+1])
#print(consonants_substrings)
unique_consonants = list(set(consonants_substrings))
unique_vowels = list(set(vowel_substrings))
##add two lists
all_substrings = consonants_substrings+(vowel_substrings)
#print(all_substrings)
##Find points earned by vowel guy and consonant guy
vowel_guy_score = 0
consonant_guy_score = 0
for strng in unique_vowels:
    vowel_guy_score += vowel_substrings.count(strng)
for strng in unique_consonants:
    consonant_guy_score += consonants_substrings.count(strng)
#print(vowel_guy_score) #Kevin
#print(consonant_guy_score) #Stuart
if vowel_guy_score > consonant_guy_score:
    print("Kevin ",vowel_guy_score)
elif vowel_guy_score < consonant_guy_score:
    print("Stuart ",consonant_guy_score)
else:
    print("Draw")
This gives the right answer for "BANANA", but for a long string like the one below, it fails:
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
I think initialization or memory allocation might be a problem but I don't know how to allocate memory before even knowing how much memory the code will need. Thank you in advance for any help you can provide.
In the middle there, you generate a data structure of size O(n³): for each starting position × each ending position × length of the substring. That's probably where your memory problems appear (you haven't posted a traceback).
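To put a number on that O(n³) estimate: a string of length n has n - k + 1 substrings of length k, so the two lists together hold the sum of k(n - k + 1) for k = 1..n, which is n(n + 1)(n + 2)/6 characters. A quick check, where stored_characters is a hypothetical helper and Python's per-string object overhead (which only makes things worse) is ignored:

```python
def stored_characters(n):
    """Total characters held by a list of every substring of an n-character string.

    There are n - k + 1 substrings of length k, so the total is
    sum(k * (n - k + 1) for k in 1..n) == n * (n + 1) * (n + 2) // 6.
    """
    return n * (n + 1) * (n + 2) // 6


def stored_characters_brute(n):
    # Direct count of the same quantity, for checking the formula on small n
    return sum(j - i for i in range(n) for j in range(i + 1, n + 1))


for n in (6, 100, 1000):
    print(n, stored_characters(n))
```

For "BANANA" that is 56 characters, but for a 1,000-letter string it is already 167,167,000 characters spread over about half a million separate string objects.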
One possible optimisation: instead of building a list of substrings and then generating a set from it, use a collections.Counter. That tells you how many times each substring appears without storing every duplicate copy:
import collections

vowel_substrings = collections.Counter()
consonants_substrings = collections.Counter()
for j in range(len(slist)):
    if mark[j] == 1:
        for l in range(j, len(string)):
            vowel_substrings[string[j:l+1]] += 1
            #print(string[j:l+1])
    else:
        for l in range(j, len(string)):
            consonants_substrings[string[j:l+1]] += 1
Even better would be to calculate the scores as you go along, without storing any of the substrings. If I'm reading the code correctly, the substrings aren't actually used for anything — each letter is effectively scored based on its distance from the end of the string, and the scores are added up. This can be calculated in a single pass through the string, without making any additional copies or keeping track of anything other than the cumulative scores and the length of the string.
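A sketch of that single pass, assuming the scoring rule the question's code implements: every substring earns one point for the player who owns its first letter (vowels for Kevin, consonants for Stuart). A letter at index i begins len(string) - i substrings, so it is worth exactly that many points. minion_scores is a hypothetical name:

```python
def minion_scores(string):
    """Return (vowel_score, consonant_score) computed in a single pass."""
    vowels = set("AEIOU")
    n = len(string)
    vowel_score = 0
    consonant_score = 0
    for i, ch in enumerate(string):
        # Substrings starting at index i: string[i:i+1], ..., string[i:n]
        if ch in vowels:
            vowel_score += n - i
        else:
            consonant_score += n - i
    return vowel_score, consonant_score


kevin, stuart = minion_scores("BANANA")
if kevin > stuart:
    print("Kevin", kevin)
elif kevin < stuart:
    print("Stuart", stuart)
else:
    print("Draw")
```

For "BANANA" this prints Stuart 12, the same result as the substring-list version, using constant extra memory and time linear in the length of the string.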
For this problem I had to create a program that takes two arguments: a CSV database like this:
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
And a DNA sequence like this:
TAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG
My program works by first getting the "Short Tandem Repeat" (STR) headers from the database (AGATC, etc.), then counting the highest number of times each STR repeats consecutively within the sequence. Finally, it compares these counted values to the values of each row in the database, printing out a name if a match is found, or "No match" otherwise.
The program definitely works, but it is extremely slow when run on the larger database provided, to the point where the terminal pauses for an entire minute before returning any output. Unfortunately, this causes the check50 marking system to time out and report a failure when testing with this large database.
I'm presuming the slowdown is caused by the nested loops within the 'STR_count' function:
def STR_count(sequence, seq_len, STR_array, STR_array_len):
    # Creates a list to store max recurrence values for each STR
    STR_count_values = [0] * STR_array_len
    # Temp value to store current count of STR recurrence
    temp_value = 0
    # Iterates over each STR in STR_array
    for i in range(STR_array_len):
        STR_len = len(STR_array[i])
        # Iterates over each sequence element
        for j in range(seq_len):
            # Ensures it's still physically possible for STR to be present in sequence
            while (seq_len - j >= STR_len):
                # Gets sequence substring of length STR_len, starting from jth element
                sub = sequence[j:(j + (STR_len))]
                # Compares current substring to current STR
                if (sub == STR_array[i]):
                    temp_value += 1
                    j += STR_len
                else:
                    # Ensures current STR_count_value is highest
                    if (temp_value > STR_count_values[i]):
                        STR_count_values[i] = temp_value
                    # Resets temp_value to break count, and pushes j forward by 1
                    temp_value = 0
                    j += 1
        i += 1
    return STR_count_values
And the 'DNA_match' function:
# Searches database file for DNA matches
def DNA_match(STR_values, arg_database, STR_array_len):
    with open(arg_database, 'r') as csv_database:
        database = csv.reader(csv_database)
        name_array = [] * (STR_array_len + 1)
        next(database)
        # Iterates over one row of database at a time
        for row in database:
            name_array.clear()
            # Copies entire row into name_array list
            for column in row:
                name_array.append(column)
            # Converts name_array number strings to actual ints
            for i in range(STR_array_len):
                name_array[i + 1] = int(name_array[i + 1])
            # Checks if a row's STR values match the sequence's values, prints the row name if match is found
            match = 0
            for i in range(0, STR_array_len, + 1):
                if (name_array[i + 1] == STR_values[i]):
                    match += 1
            if (match == STR_array_len):
                print(name_array[0])
                exit()
        print("No match")
        exit()
However, I'm new to Python, and haven't really had to consider speed before, so I'm not sure how to improve upon this.
I'm not particularly looking for people to do my work for me, so I'm happy for any suggestions to be as vague as possible. And honestly, I'll value any feedback, including stylistic advice, as I can only imagine how disgusting this code looks to those more experienced.
Here's a link to the full program, if helpful.
Thanks :) x
Thanks for providing a link to the entire program. It seems needlessly complex, but I'd say it's just a lack of knowing what features are available to you. I think you've already identified the part of your code that's causing the slowness - I haven't profiled it or anything, but my first impulse would also be the three nested loops in STR_count.
Here's how I would write it, taking advantage of the Python standard library. Every entry in the database corresponds to one person, so that's what I'm calling them. people is a list of dictionaries, where each dictionary represents one line in the database. We get this for free by using csv.DictReader.
To find the matches in the sequence, for every short tandem repeat in the database, we create a regex pattern (the current short tandem repeat, repeated one or more times). If there is a match in the sequence, the total number of repetitions is equal to the length of the match divided by the length of the current tandem repeat. For example, if AGATCAGATCAGATC is present in the sequence, and the current tandem repeat is AGATC, then the number of repetitions will be len("AGATCAGATCAGATC") // len("AGATC") which is 15 // 5, which is 3.
count is just a dictionary that maps short tandem repeats to their corresponding number of repetitions in the sequence. Finally, we search for a person whose short tandem repeat counts match those of count exactly, and print their name. If no such person exists, we print "No match".
def main():
    import argparse
    from csv import DictReader
    import re
    parser = argparse.ArgumentParser()
    parser.add_argument("database_filename")
    parser.add_argument("sequence_filename")
    args = parser.parse_args()
    with open(args.database_filename, "r") as file:
        reader = DictReader(file)
        short_tandem_repeats = reader.fieldnames[1:]
        people = list(reader)
    with open(args.sequence_filename, "r") as file:
        sequence = file.read().strip()
    count = dict(zip(short_tandem_repeats, [0] * len(short_tandem_repeats)))
    for short_tandem_repeat in short_tandem_repeats:
        # We need the LONGEST consecutive run, not just the first one re.search finds.
        # The zero-width lookahead makes finditer report the run starting at every
        # position, so max() picks the longest even if occurrences could overlap.
        pattern = f"(?=((?:{short_tandem_repeat})+))"
        runs = [len(match.group(1)) for match in re.finditer(pattern, sequence)]
        if not runs:
            continue
        count[short_tandem_repeat] = max(runs) // len(short_tandem_repeat)
    try:
        person = next(person for person in people if all(int(person[k]) == count[k] for k in short_tandem_repeats))
        print(person["name"])
    except StopIteration:
        print("No match")
    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
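One point worth checking with any regex version: re.search reports only the first run of repeats, while the counts in the database are the longest consecutive run (the question's STR_count also keeps the maximum). Below is a standalone sketch of the counting and matching steps that checks every starting position with str.startswith and reads the sample database from the question out of a string via io.StringIO, so it runs without files. longest_run and find_match are hypothetical names, and the test sequence is made up for illustration:

```python
import csv
import io


def longest_run(sequence, unit):
    """Largest number of back-to-back copies of `unit` anywhere in `sequence`."""
    size = len(unit)
    best = 0
    for start in range(len(sequence)):
        copies = 0
        # Count consecutive copies of `unit` beginning at `start`
        while sequence.startswith(unit, start + copies * size):
            copies += 1
        best = max(best, copies)
    return best


def find_match(database_text, sequence):
    """Return the name whose STR counts all match `sequence`, else None."""
    reader = csv.DictReader(io.StringIO(database_text))
    strs = reader.fieldnames[1:]  # every column after "name"
    counts = {s: longest_run(sequence, s) for s in strs}
    for person in reader:
        if all(int(person[s]) == counts[s] for s in strs):
            return person["name"]
    return None


database = "name,AGATC,AATG,TATC\nAlice,2,8,3\nBob,4,1,5\nCharlie,3,2,5\n"
seq = "AGATC" * 2 + "TT" + "AGATC" * 4 + "AATG" + "TATC" * 5
print(longest_run(seq, "AGATC"))  # 4: the second run, not the first
print(find_match(database, seq))  # Bob
```

The scan is simple rather than fast (roughly sequence length times run length per STR), which is still far less work than restarting the inner while loop from every position as STR_count does.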
The code is supposed to take a 5-digit zip code as input and output its bar code. The bar code for each digit is:
{1:'...!!',2:'..!.!',3:'..!!.',4:'.!..!',5:'.!.!.',6:'.!!..',7:'!...!',8:'!..!.',9:'!.!..',0:'!!...'}
For example, the zip code 95014 is supposed to produce:
!!.!.. .!.!. !!... ...!! .!..! ...!!!
There is an extra ! at the start and end, which marks where the bar code starts and stops. Notice that at the end of the bar code there is an extra ...!!, which is a 1. This is the check digit, and you get it by:
Adding up all the digits in the zipcode to make the sum Z
Choosing the check digit C so that Z + C is a multiple of 10
For example, the zipcode 95014 has a sum of Z = 9 + 5 + 0 + 1 + 4 = 19, so the check digit C is 1 to make the total sum Z + C equal to 20, which is a multiple of 10.
def printDigit(digit):
    digit_dict = {1:'...!!',2:'..!.!',3:'..!!.',4:'.!..!',5:'.!.!.',6:'.!!..',7:'!...!',8:'!..!.',9:'!.!..',0:'!!...'}
    return digit_dict[digit]
def printBarCode(zip_code):
    sum_digits=0
    num=zip_code
    while num!=0:
        sum_digits+=(num%10)
        num//=10
    rem = 20-(sum_digits%20)
    answer=[]
    for i in str(zip_code):
        answer.append(printDigit(int(i)))
    final='!'+' '.join(answer)+'!'
    return final
print(printBarCode(95014))
The code I currently have produces an output of
!!.!.. .!.!. !!... ...!! .!..!!
for the zip code 95014, which is missing the check digit. Is there something missing in my code that causes it not to output the check digit? Also, what should I add to my code to have it ask the user for the zip code?
Your code computes rem based on the sum of the digits, but you never use it to add the check-digit bars to the output (answer and final). You need to add code to do that in order to get the right answer. I suspect you're also not computing rem correctly, since you're using %20 rather than %10.
I'd replace the last few lines of your function with:
    rem = (10 - sum_digits) % 10 # correct computation for the check digit
    answer=[]
    for i in str(zip_code):
        answer.append(printDigit(int(i)))
    answer.append(printDigit(rem)) # add the check digit to the answer!
    final='!'+' '.join(answer)+'!'
    return final
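The second part of the question, asking the user for the zip code, can be sketched like this in Python 3 (in Python 2, raw_input plays the role of input). It keeps the zip code as a string, so a leading zero such as in 02134 is not lost the way it would be with int. BARS, zip_to_barcode and prompt_for_zip are hypothetical names, and the read parameter exists only so the prompt can be tested without typing:

```python
BARS = {1: '...!!', 2: '..!.!', 3: '..!!.', 4: '.!..!', 5: '.!.!.',
        6: '.!!..', 7: '!...!', 8: '!..!.', 9: '!.!..', 0: '!!...'}


def zip_to_barcode(zip_code):
    """Bar code for a zip code given as a string of digits, plus check digit."""
    digits = [int(ch) for ch in zip_code]
    check = (10 - sum(digits) % 10) % 10  # makes the digit sum a multiple of 10
    groups = [BARS[d] for d in digits + [check]]
    return '!' + ' '.join(groups) + '!'


def prompt_for_zip(read=input):
    """Ask until the user types exactly five digits; return them as a string."""
    while True:
        text = read("Enter a 5-digit zip code: ").strip()
        if len(text) == 5 and all(ch in "0123456789" for ch in text):
            return text
        print("Please enter exactly five digits.")
```

Typical use is print(zip_to_barcode(prompt_for_zip())), which for 95014 prints !!.!.. .!.!. !!... ...!! .!..! ...!!!.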
Interesting problem. I noticed that you solved the problem as a C-style programmer; I'm guessing your background is in C/C++. I'd like to offer a more Pythonic way:
def printBarCode(zip_code):
    digit_dict = {1:'...!!',2:'..!.!',3:'..!!.',4:'.!..!',5:'.!.!.',
                  6:'.!!..',7:'!...!',8:'!..!.',9:'!.!..',0:'!!...'}
    zip_code_list = [int(num) for num in str(zip_code)]
    bar_code = ' '.join([digit_dict[num] for num in zip_code_list])
    # The outer % 10 turns a digit sum that is already a multiple of 10 into check digit 0 (not 10)
    check_code = digit_dict[(10 - sum(zip_code_list) % 10) % 10]
    return '!{} {}!'.format(bar_code, check_code)
print(printBarCode(95014))
I used list comprehensions to work with each digit rather than an explicit loop. I could have used the map() function to make it more readable, but list comprehensions are more Pythonic. I also used the newer str.format() style of string formatting. Here is the output:
!!.!.. .!.!. !!... ...!! .!..! ...!!!
I have a long list of data that I'm working with, containing 'timestamp' versus 'quantity'. However, the timestamps in the list are not all in order (for example, timestamp[x] can be 140056 while timestamp[x+1] can be 560). I don't want to sort them; instead, when this happens, I want to add the value of timestamp[x] to timestamp[x+1].
P.S. The quantities need to stay in the same order as in the list when plotting.
I have been trying this with the following code, where timestamp is the list containing all the timestamp values:
for t in timestamp:
    previous = timestamp[t-1]
    increment = 0
    if previous > timestamp[t]:
        increment = previous
    t += increment
    delta = datetime.timedelta(0, (t - startTimeStamp) / 1000);
    timeAtT = fileStartDate + (delta + startTime)
    print("time at t=" + str(t) + " is: " + str(timeAtT));
    previous = t
However, it fails with TypeError: list indices must be integers, not tuples. How can I solve this, or is there another way to do this task? Thanks!
The problem is that you're treating t as if it is an index of the list. In your case, t holds the actual values of the list, so constructions like timestamp[t] are not valid. You either want:
for t in range(len(timestamp)):
Or if you want both an index and the value:
for (t, value) in enumerate(timestamp):
When you write for t in timestamp, you are making t take on the value of each item in timestamp. But then you try to use t as an index to get previous. To do this, try:
for i, t in enumerate(timestamp):
    if i == 0:
        continue  # the first item has no previous item
    previous = timestamp[i - 1]
    current = t
Also, when you get TypeErrors like this, try printing out the intermediate steps so you can see exactly what is going wrong.
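A sketch of the adjustment the question describes, assuming timestamp is a flat list of numbers and that a value smaller than the one before it means the counter restarted, so the last adjusted value is carried forward as an offset. unwrap_timestamps is a hypothetical name; it returns a new list and never touches the quantities, so their order (and their pairing with each timestamp) is preserved for plotting:

```python
def unwrap_timestamps(timestamps):
    """Return a new list in which each restart of the counter is offset.

    Assumption: whenever a raw value is smaller than the raw value before it,
    the counter restarted, so the previous *adjusted* value is added to it and
    to everything after it (offsets accumulate across later restarts).
    """
    adjusted = []
    offset = 0
    previous_raw = None
    for raw in timestamps:
        if previous_raw is not None and raw < previous_raw:
            offset = adjusted[-1]  # carry the last adjusted value forward
        adjusted.append(raw + offset)
        previous_raw = raw
    return adjusted


timestamp = [100, 140056, 560, 900]
print(unwrap_timestamps(timestamp))  # [100, 140056, 140616, 140956]
```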
Question: write a program that first defines functions minFromList(list) and maxFromList(list). The program should initialize an empty list, then prompt the user for an integer and keep prompting for integers, adding each integer to the list, until the user enters a single period character. The program should then call minFromList and maxFromList with the list of integers as an argument and print the results returned by the function calls.
I can't figure out how to get the min and max returned from each function separately. And now I've added extra code so I'm totally lost. Anything helps! Thanks!
What I have so far:
def minFromList(list)
    texts = []
    while (text != -1):
        texts.append(text)
    high = max(texts)
    return texts
def maxFromList(list)
    texts []
    while (text != -1):
        texts.append(text)
    low = min(texts)
    return texts
text = raw_input("Enter an integer (period to end): ")
list = []
while text != '.':
    textInt = int(text)
    list.append(textInt)
    text = raw_input("Enter an integer (period to end): ")
print "The lowest number entered was: " , minFromList(list)
print "The highest number entered was: " , maxFromList(list)
I think the part of the assignment that might have confused you was about initializing an empty list and where to do it. Your main body that collects data is good and does what it should, but you ended up doing too much in your max and min functions. Another misleading part of the assignment is that it suggests you write custom routines for these functions, even though max() and min() already exist in Python and return exactly what you need.
It's another story if you are required to write your own max and min and are not permitted to use the built-in functions. In that case you would need to loop over each value in the list, track the biggest or smallest value seen so far, and then return the final value.
Without directly giving you too much of the specific answer, here are some individual examples of the parts you may need...
# looping over the items in a list
aList = [3, 1, 2]
value = 1
for item in aList:
    if item == value:
        print("value is 1!")
# basic function with arguments and a return value
def aFunc(start):
    end = start + 1
    return end
print(aFunc(1))
# result: 2
# some useful comparison operators
print(1 > 2)  # False
print(2 > 1)  # True
That should hopefully be enough general information for you to piece together your custom min and max functions. While there are some more advanced and efficient ways to do min and max, I think to start out, a simple for loop over the list would be easiest.
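For checking your own attempt afterwards, here is one way those pieces can combine, sketched in Python 3 without the built-in min() and max(). The parameter is called numbers rather than list to avoid shadowing the built-in list type, and the sample values stand in for what the user would type:

```python
def minFromList(numbers):
    """Smallest value in `numbers`, found with a plain loop (no built-in min)."""
    if not numbers:
        raise ValueError("minFromList() got an empty list")
    smallest = numbers[0]
    for value in numbers[1:]:
        if value < smallest:
            smallest = value
    return smallest


def maxFromList(numbers):
    """Largest value in `numbers`, found with a plain loop (no built-in max)."""
    if not numbers:
        raise ValueError("maxFromList() got an empty list")
    largest = numbers[0]
    for value in numbers[1:]:
        if value > largest:
            largest = value
    return largest


values = [7, -2, 15, 4]
print("The lowest number entered was:", minFromList(values))
print("The highest number entered was:", maxFromList(values))
```

Raising ValueError on an empty list mirrors what the built-in min() and max() do, which matters here because the user can type a period immediately.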