Python using different module + package, and counts

yut.py is shown below.
I need to define two functions:
throw_yut1(), which randomly returns '배' (belly) with 60% probability or '등' (back) with 40% probability.
throw_yut4(), which returns the 4 results of calling throw_yut1() four times, such as '배등배배' or '배배배배'.
import random
random.seed(10)
def throw_yut1():
    if random.random() <= 0.6 :
        return '배'
    else:
        return '등'
def throw_yut4():
    result = ''
    for i in range(4):
        result = result + throw_yut1()
    return result
main.py is shown below.
Here, I need to call throw_yut4 1000 times and print the count and percentage of each outcome listed in the if statements.
import yut
counts = {}
for i in range(1000):
    result = yut.throw_yut4
    back = yut.throw_yut4().count('등')
    belly = yut.throw_yut4().count('배')
    if back == 3 and belly == 1:
        counts['도'] = counts.get('도', 0) + 1
    elif back == 2 and belly == 2:
        counts['개'] = counts.get('개', 0) + 1
    elif back == 1 and belly == 3:
        counts['걸'] = counts.get('걸', 0) + 1
    elif back == 0 and belly == 4:
        counts['윷'] = counts.get('윷', 0) + 1
    elif back == 4 and belly == 0:
        counts['모'] = counts.get('모', 0) + 1
for key in ['도','개','걸','윷','모']:
    print(f'{key} - {counts[key]} ({counts[key] / 1000 * 100:.1f}%)')
I keep getting
도 - 33 (3.3%)
개 - 115 (11.5%)
걸 - 131 (13.1%)
윷 - 22 (2.2%)
모 - 1 (0.1%)
but I am meant to get
도 - 157 (15.7%)
개 - 333 (33.3%)
걸 - 349 (34.9%)
윷 - 135 (13.5%)
모 - 26 (2.6%)
How can I fix my error?

I think the problem is this part of the code:
for i in range(1000):
    result = yut.throw_yut4
    back = yut.throw_yut4().count('등')
    belly = yut.throw_yut4().count('배')
It seems like it should be:
for i in range(1000):
    result = yut.throw_yut4()
    back = result.count('등')
    belly = result.count('배')
Otherwise you are counting back and belly from two independent yut.throw_yut4() calls, so combinations that none of your branches handle (and that you presumably did not intend) become possible, such as back == 4 and belly == 4. This is why your counts add up to less than 1000 and your percentages to less than 100%.
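A minimal sketch of the corrected tally, assuming the same 60/40 split as yut.py. It throws once per iteration and derives the outcome name from the number of '배' in that single result, so every throw lands in exactly one bucket. The names NAMES and tally, and the use of a random.Random instance instead of seeding the module-level generator, are choices made for this illustration and are not from the original code.

```python
import random
from collections import Counter

# Outcome name keyed by the number of bellies ('배') among the 4 sticks.
NAMES = {1: '도', 2: '개', 3: '걸', 4: '윷', 0: '모'}


def throw_yut1(rng):
    """Return '배' with probability 0.6, otherwise '등'."""
    return '배' if rng.random() <= 0.6 else '등'


def throw_yut4(rng):
    """Return the concatenated results of four single throws."""
    return ''.join(throw_yut1(rng) for _ in range(4))


def tally(n, seed=10):
    """Throw n times and count outcomes, using ONE result per iteration."""
    rng = random.Random(seed)  # hypothetical: a local generator, not random.seed
    counts = Counter()
    for _ in range(n):
        result = throw_yut4(rng)
        counts[NAMES[result.count('배')]] += 1
    return counts


counts = tally(1000)
for key in ['도', '개', '걸', '윷', '모']:
    print(f'{key} - {counts[key]} ({counts[key] / 1000 * 100:.1f}%)')
```

Because each throw is counted exactly once, the five counts always add up to the number of throws.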

It seems your counts dictionary isn't accumulating 1000 results in total. That is likely because you call yut.throw_yut4() twice per iteration, which gives two different results, so that sometimes none of your conditional checks pass.
Try this instead:
result = yut.throw_yut4()
back = result.count('등')
belly = result.count('배')
You're welcome.

Related

Make This Input Function Faster

I'm practicing some exam questions and I've run into a time limit issue that I can't figure out. I think it has to do with how I'm iterating through the inputs.
It's the famous Titanic dataset, so I won't bother printing a sample of the df, as I'm sure everyone is familiar with it.
The function computes the similarity between two passengers, which are provided as input. I am also mapping the Sex column to integers in order to compare passengers, as you'll see below.
I was also thinking it could be how I'm indexing and locating the values for each passenger, but again I'm not sure.
The function is as follows and the time limit is 1 second but when no_of_queries == 100 the function takes 1.091s.
import pandas as pd
df = pd.read_csv("titanic.csv")
mappings = {'male': 0, 'female':1}
df['Sex'] = df['Sex'].map(mappings)
def function_similarity(no_of_queries):
    for num in range(int(no_of_queries)):
        x = input()
        passenger_a, passenger_b = x.split()
        passenger_a, passenger_b = int(passenger_a), int(passenger_b)
        result = 0
        if int(df[df['PassengerId'] == passenger_a]['Pclass']) == int(df[df['PassengerId'] == passenger_b]['Pclass']):
            result += 1
        if int(df[df['PassengerId'] ==passenger_a]['Sex']) == int(df[df['PassengerId'] ==passenger_b]['Sex']):
            result += 3
        if int(df[df['PassengerId'] ==passenger_a]['SibSp']) == int(df[df['PassengerId'] ==passenger_b]['SibSp']):
            result += 1
        if int(df[df['PassengerId'] == passenger_a]['Parch']) == int(df[df['PassengerId'] == passenger_b]['Parch']):
            result += 1
        result += max(0, 2 - abs(float(df[df['PassengerId'] ==passenger_a]['Age']) - float(df[df['PassengerId'] ==passenger_b]['Age'])) / 10.0)
        result += max(0, 2 - abs(float(df[df['PassengerId'] ==passenger_a]['Fare']) - float(df[df['PassengerId'] ==passenger_b]['Fare'])) / 5.0)
        print(result / 10.0)
function_similarity(input())

Look up each passenger's row by id once for passengers a and b, instead of filtering the DataFrame again for every column.
import pandas as pd
df = pd.read_csv("titanic.csv")
mappings = {'male': 0, 'female':1}
df['Sex'] = df['Sex'].map(mappings)
def function_similarity(no_of_queries):
    for num in range(int(no_of_queries)):
        x = input()
        passenger_a, passenger_b = x.split()
        passenger_a, passenger_b = df[df['PassengerId'] == int(passenger_a)], df[df['PassengerId'] == int(passenger_b)]
        result = 0
        if int(passenger_a['Pclass']) == int(passenger_b['Pclass']):
            result += 1
        if int(passenger_a['Sex']) == int(passenger_b['Sex']):
            result += 3
        if int(passenger_a['SibSp']) == int(passenger_b['SibSp']):
            result += 1
        if int(passenger_a['Parch']) == int(passenger_b['Parch']):
            result += 1
        result += max(0, 2 - abs(float(passenger_a['Age']) - float(passenger_b['Age'])) / 10.0)
        result += max(0, 2 - abs(float(passenger_a['Fare']) - float(passenger_b['Fare'])) / 5.0)
        print(result / 10.0)
function_similarity(input())
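The same idea can be pushed one step further: do all the row lookups up front by turning the table into a plain dictionary keyed by PassengerId, so each query costs two dictionary lookups. The sketch below is only an illustration under that assumption. The records values are made up for the example, and the names records, similarity and answer_queries are hypothetical. With the DataFrame from the question, a dictionary of this shape could be built once with df.set_index('PassengerId').to_dict('index').

```python
# Illustrative, made-up records keyed by PassengerId (Sex already mapped to 0/1).
records = {
    1: {'Pclass': 3, 'Sex': 0, 'SibSp': 1, 'Parch': 0, 'Age': 22.0, 'Fare': 7.25},
    2: {'Pclass': 1, 'Sex': 1, 'SibSp': 1, 'Parch': 0, 'Age': 38.0, 'Fare': 71.28},
}


def similarity(a, b):
    """Score two passenger records with the weights used in the question."""
    score = 0.0
    if a['Pclass'] == b['Pclass']:
        score += 1
    if a['Sex'] == b['Sex']:
        score += 3
    if a['SibSp'] == b['SibSp']:
        score += 1
    if a['Parch'] == b['Parch']:
        score += 1
    score += max(0, 2 - abs(a['Age'] - b['Age']) / 10.0)
    score += max(0, 2 - abs(a['Fare'] - b['Fare']) / 5.0)
    return score / 10.0


def answer_queries(lines, records):
    """Return one score per 'id_a id_b' query line."""
    scores = []
    for line in lines:
        id_a, id_b = map(int, line.split())
        scores.append(similarity(records[id_a], records[id_b]))
    return scores


print(answer_queries(['1 2', '1 1'], records))
```

Whether this is fast enough for the exam's time limit is not something the sketch can guarantee; it only removes the repeated DataFrame filtering from the query loop.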

Trying to use zfill and increment characters with Python

Hello lovely stackoverflowians!
I am fairly new to programming. I have only been programming for a little under 2 months, using CS50 (which uses C) and MITx Python. I went on Codewars and am trying to solve a problem where you get an id and produce a license plate number like this: aaa001...aaa999, aab001...zzz999,
if you catch my drift. My code compiles, but when I run it I get this error:
File "/Users/pauliekennedy/Desktop/temp.py", line 9, in find_the_number_plate
a_str = (a_numb.zfill(3), range(0, 10))
AttributeError: 'int' object has no attribute 'zfill'
Because of this I am not able to test my code. I would much appreciate any help with this problem. I would also welcome any comments on my code in general: tips, advice on how to make it better, and whether it will achieve this goal at all. Here is my code. Thanks again, all.
#set number to 1 to start
a_numb = 1
#incrementing loop when 999 go back set back 0
while a_numb <1001:
    a_numb += 1
    a_str = str(a_numb)
    # giving the number 00 or just 0 in front
    if a_numb < 100:
        a_str = (a_numb.zfill(3), range(0, 10))
    #resetting the number back to 1
    if a_numb == 999:
        a_numb = 1
# Setting i to 0 and incrementing the characters
i = 0
ch = 'a'
ch2 = 'a'
ch3 = 'a'
#looping through alphabet
for i in range(26):
    ch = chr(ord(ch) + 1)
    print(ch)
    if i == 26:
        i = 0
    if ch == 'z':
        ch2 = chr(ord(ch) + 1)
    if ch == 'z' & ch2 == 'z':
        ch3(ord(ch) + 1)
# Adding results together and returning the string of them all
letter_plate = str(ch3 + ch2 + ch)
plate = str(a_numb) + str(letter_plate)
return plate

Maybe you could consider using f-string string formatting instead:
def find_the_number_plate(customer_id):
    number_part = customer_id % 999 + 1
    customer_id //= 999
    letter_part = ['a', 'a', 'a']
    i = 0
    while customer_id:
        letter_part[i] = chr(ord('a') + customer_id % 26)
        customer_id //= 26
        i += 1
    return f"{''.join(letter_part)}{number_part:03}"
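For reference, the AttributeError in the question comes from calling zfill on an int: zfill is a str method, so the number has to be converted first. A small sketch comparing the two ways of zero-padding follows; the helper name pad3 is made up for this example.

```python
def pad3(number):
    """Return number as a string padded with leading zeros to width 3."""
    return str(number).zfill(3)  # convert to str first, then pad


for value in (1, 42, 999):
    # Both columns show the same padded text: str.zfill and the 03 format spec.
    print(pad3(value), f"{value:03}")
```

Either form works; the format spec is convenient when the padded number is being embedded in a larger string anyway.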

You could use product from itertools to form the license plate numbers from 3 letters and numbers from 1 to 999 formatted with leading zeros:
from itertools import product
letter = "abcdefghijklmnopqrstuvwxyz"
numbers = (f"{n:03}" for n in range(1,1000))
plates = [*map("".join,product(letter,letter,letter,numbers))]
for plate in plates: print(plate)
aaa001
aaa002
aaa003
aaa004
aaa005
aaa006
aaa007
aaa008
...
If you only need to access a license plate at a specific index, you don't have to generate the whole list. You can figure out which plate number will be at a given index by decomposing the index in chunks of 999,26,26,26 corresponding to the available options at each position/chunk of the number.
def plate(i):
    letter = "abcdefghijklmnopqrstuvwxyz"
    result = f"{i%999+1:03}"
    i //= 999
    for _ in range(3):
        result = letter[i%26] + result
        i //= 26
    return result
output:
for i in range(10):print(plate(i))
aaa001
aaa002
aaa003
aaa004
aaa005
aaa006
aaa007
aaa008
aaa009
aaa010
plate(2021) # aac024

If, else return else value even when the condition is true, inside a for loop

Here is the function I defined:
def count_longest(field, data):
    l = len(field)
    count = 0
    final = 0
    n = len(data)
    for i in range(n):
        count = 0
        if data[i:i + l] is field:
            while data[i - l: i] == data[i:i + l]:
                count = count + 1
                i = i + 1
        else:
            print("OK")
        if final == 0 or count >= final:
            final = count
    return final
a = input("Enter the field - ")
b = input("Enter the data - ")
print(count_longest(a, b))
It works in some cases but gives incorrect output in most. I checked by printing the strings being compared, and even when they match, the loop prints "OK", which should only happen when the condition is false. I don't get it!
Take the simplest example: if I enter 'as' when prompted for the field and 'asdf' when prompted for the data, I should get count = 1, because the longest run of the substring 'as' in 'asdf' is a single occurrence. But final is still 0 at the end of the program. I added the else branch only to check whether the if condition was being satisfied, and the program printed 'OK', which tells me it was not, even though right at the start data[0 : 0 + 2] is equal to 'as' (2 being the length of the field).

There are a few things I notice when looking at your code.
First, use == rather than is to test for equality. The is operator checks if the left and right are referring to the very same object, whereas you want to properly compare them.
The following code shows that even numerical results that are equal might not be one and the same Python object:
print(2 ** 31 is 2 ** 30 + 2 ** 30) # <- False
print(2 ** 31 == 2 ** 30 + 2 ** 30) # <- True
(note: the first expression could either be False or True—depending on your Python interpreter).
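The same effect can be shown with the strings from the question. This is a small sketch; the variable name piece is made up here. A slice builds a new string object, so it compares equal to field while usually not being the same object.

```python
field = 'as'
data = 'asdf'
piece = data[0:2]  # slicing creates a new str with the same characters

print(piece == field)  # True: the characters are equal
# Identity depends on the interpreter; in CPython this is typically False.
print(piece is field)
```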
Second, the while-loop looks rather suspicious. If you know you have found your sequence "as" at position i, you are repeating the while-loop as long as it is the same as in position i-1—which is probably something else, though. So, a better way to do the while-loop might be like so:
while data[i: i + l] == field:
    count = count + 1
    i = i + l # <- increase by l (length of field) !
Finally, something that might be surprising: changing the variable i inside the while-loop has no effect on the for-loop. That is, in the following example, the output will still be 0, 1, 2, 3, ..., 9, although it looks like it should skip every other element.
for i in range(10):
    print(i)
    i += 1
It does not affect the outcome of the function, but when debugging you might observe that the function seems to go backward after having found a run and go through parts of it again, resulting in additional "OK"s printed out.
UPDATE: Here is the complete function according to my remarks above:
def count_longest(field, data):
    l = len(field)
    count = 0
    final = 0
    n = len(data)
    for i in range(n):
        count = 0
        while data[i: i + l] == field:
            count = count + 1
            i = i + l
        if count >= final:
            final = count
    return final
Note that I made two additional simplifications. With my changes, you end up with an if and while that share the same condition, i.e:
if data[i:i+l] == field:
    while data[i:i+l] == field:
        ...
In that case, the if is superfluous since it is already included in the condition of while.
Secondly, the condition if final == 0 or count >= final: can be simplified to just if count >= final:.
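One case the function above does not cover is an empty field: data[i:i + 0] is always the empty string, so the while condition would stay true forever. A hedged variant that guards against this and keeps the loop variable separate from the scanning position could look like the sketch below. The names start, pos and best are choices made for this sketch, and str.startswith with a start offset is used in place of slicing.

```python
def count_longest(field, data):
    """Return the longest run of back-to-back copies of field in data."""
    if not field:
        return 0  # an empty field would otherwise match forever
    step = len(field)
    best = 0
    for start in range(len(data)):
        count = 0
        pos = start  # separate from the loop variable, so the for loop is unaffected
        while data.startswith(field, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


print(count_longest('as', 'asdf'))
```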

More efficient way to pull time data from a string (or DataFrame object)

I'm learning Python on my own and this is my first question here. Until now I have always been able to find what I needed already answered. This time I have a more specific task, and I don't even know what to search for.
One of our machines generates a log file that needs a lot of cleaning after it is loaded into a DataFrame and before it can be used. Without going into too much detail, the log file contains time records in a very odd format, built from minutes, seconds and milliseconds. I was able to decode them to seconds with the function shown below (and then convert the result into a time format with another one). It works fine, but it is a very basic function with a lot of if statements.
My goal is to rewrite it into something less amateurish, but the log time format imposes some limitations that are challenging, at least for me. It doesn't help that the units are all combinations of the same two letters.
Here are samples of all possible time record combinations:
test1 = 'T#3853m10s575ms' # 231190.575 [seconds]
test2 = 'T#10s575ms' # 10.575
test3 = 'T#3853m575ms' # 231180.575
test4 = 'T#575ms' # 0.575
test5 = 'T#3853m10s' # 231190
test6 = 'T#10s' # 10
test7 = 'T#3853m' # 231180
test8 = 'T#0ms' # 0
I've tried to write it in regular expression format as:
T#[0-9]*m?[0-9]*s?[0-9]*ms?
however, a real record always contains at least one digit and at least one unit.
Here is the logic I'm using inside the function:
[Image: function diagram]
And here is the function I apply to a raw time column in a DataFrame:
def convert_time(string):
    if string == 'T#0ms':
        return 0
    else:
        ms_ = False if string.find('ms') == -1 else True
        string = string[2:-2] if ms_ else string[2:]
        s_ = False if string.find('s') == -1 else True
        m_ = False if string.find('m') == -1 else True
        if m_ and s_ and ms_:
            m, temp = string.split('m')
            s, ms = temp.split('s')
            return int(m)*60 + int(s) + int(ms)*0.001
        elif not m_ and s_ and ms_:
            s, ms = string.split('s')
            return int(s) + 0.001 * int(ms)
        elif m_ and not s_ and ms_:
            m, ms = string.split('m')
            return 60*int(m) + 0.001 * int(ms)
        elif not m_ and not s_ and ms_:
            return int(string) * 0.001
        elif m_ and s_ and not ms_:
            m, s = string.split('m')
            return 60*int(m) + int(s[:-1])
        elif not m_ and s_ and not ms_:
            return int(string[:-1])
        elif m_ and not s_ and not ms_:
            return int(string[:-1]) * 60
        elif not m_ and not s_ and not ms_:
            return -1
As mentioned above, my lack of experience keeps me from writing a better function that produces similar output (or better output, e.g. directly in a time format).
I hope this is interesting enough to get some hints for improvement. Thanks.

Using regex:
import re
def f(x):
    x = x[2:]
    time = re.findall(r'\d+', x)
    timeType = re.findall(r'[a-zA-Z]+',x)
    #print(time,timeType)
    total = 0
    for i,j in zip(time,timeType):
        if j == 'm':
            total += 60*float(i)
        elif j =='s':
            total+=float(i)
        elif j == 'ms':
            total += float(i)/1000
    return total
test1 = 'T#3853m10s575ms' # 231190.575 [seconds]
test2 = 'T#10s575ms' # 10.575
test3 = 'T#3853m575ms' # 231180.575
test4 = 'T#575ms' # 0.575
test5 = 'T#3853m10s' # 231190
test6 = 'T#10s' # 10
test7 = 'T#3853m' # 231180
test8 = 'T#0ms' # 0
arr = [test1,test2,test3,test4,test5,test6,test7,test8]
for t in arr:
    print(f(t))
Output:
231190.575
10.575
231180.575
0.575
231190.0
10.0
231180.0
0.0
Or you can make the code shorter, which helps if you have more time units such as hours or days.
Use map for it:
import re
def symbol(j):
    if j == 'm':
        return 60
    elif j =='s':
        return 1
    elif j == 'ms':
        return .001
def f(x):
    x = x[2:]
    time = list(map(float,re.findall(r'\d+', x)))
    timeType = list(map(symbol,re.findall(r'[a-zA-Z]+',x)))
    #print(time,timeType)
    return sum([a*b for a,b in zip(timeType,time)])
test1 = 'T#3853m10s575ms' # 231190.575 [seconds]
test2 = 'T#10s575ms' # 10.575
test3 = 'T#3853m575ms' # 231180.575
test4 = 'T#575ms' # 0.575
test5 = 'T#3853m10s' # 231190
test6 = 'T#10s' # 10
test7 = 'T#3853m' # 231180
test8 = 'T#0ms' # 0
arr = [test1,test2,test3,test4,test5,test6,test7,test8]
for t in arr:
    print(f(t))

def str_to_sec(time_str):
    return_int = 0
    cur_int = 0
    # remove start characters and replace 'ms' with a single character as unit
    time_str = time_str.replace('T#','').replace('ms', 'p')
    # build multiplier matrix
    split_order = ['m', 's', 'p']
    multiplier = [60, 1, 0.001]
    calc_multiplier_dic = dict(zip(split_order, multiplier))
    # loop through string and update the cumulative time
    for ch in time_str:
        if ch.isnumeric():
            cur_int = cur_int * 10 + int(ch)
            continue
        if ch.isalpha():
            return_int += cur_int * calc_multiplier_dic[ch]
            cur_int = 0
    return return_int
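Another possible approach, closer to the single regular expression attempted in the question, is one pattern with three optional named groups plus an explicit check that at least one of them matched. This is a sketch under the assumption that the units always appear in the order m, s, ms. The names parse_time and _TIME_RE are made up for this example. The arithmetic is done in whole milliseconds to limit floating-point drift.

```python
import re

# Each unit is optional; (?!s) stops the minutes group from taking the 'm' of 'ms'.
_TIME_RE = re.compile(
    r"T#(?:(?P<m>\d+)m(?!s))?(?:(?P<s>\d+)s)?(?:(?P<ms>\d+)ms)?"
)


def parse_time(text):
    """Convert a record such as 'T#3853m10s575ms' to seconds."""
    match = _TIME_RE.fullmatch(text)
    # Reject strings that do not fit the pattern or contain no unit at all.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unrecognised time record: {text!r}")
    parts = {unit: int(value or 0) for unit, value in match.groupdict().items()}
    total_ms = parts["m"] * 60_000 + parts["s"] * 1_000 + parts["ms"]
    return total_ms / 1000


for sample in ["T#3853m10s575ms", "T#10s575ms", "T#575ms", "T#3853m", "T#0ms"]:
    print(sample, parse_time(sample))
```

On a DataFrame column this could be applied with Series.apply(parse_time). Raising ValueError on unexpected input, where the original function returns -1, is a design choice of this sketch.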

Finding the longest repetitive piece in a string python

I want to write a function "longest" whose doctests look like this (Python):
"""
>>> longest('1211')
1
>>> longest('1212')
2
>>> longest('212111212112112121222222212212112121')
2
>>> longest('1')
0
>>> longest('121')
0
>>> longest('12112')
0
"""
In the first case, for example, the "1" is repeated at the back as "11", so the repeated part is "1". It is 1 character long, and that length is what the function should return.
In the second case you have "1212", so the repeated part is "12", which is 2 characters long.
The tricky part is the third case: the longest repeated run is "2222222", but it doesn't count because it is at neither the front nor the back. The answer there is 2, because "21" is repeated at the back.
The code I have written so far is the following:
import re
def repetitions(s):
    r = re.compile(r"(.+?)\1+")
    for match in r.finditer(s):
        yield (match.group(1), len(match.group(0)) / len(match.group(1)))
def longest(s):
    """
    >>> longest('1211')
    1
    """
    nummer_hoeveel_keer = dict(repetitions(s)) #gives a dictionary with as key the number (for doctest 1 this be 1) and as value the length of the key
    if nummer_hoeveel_keer == {}: #if there are no repetitive nothing should be returnd
        return 0
    sleutels = nummer_hoeveel_keer.keys() #here i collect the keys to see which has has the longest length
    lengtes = {}
    for sleutel in sleutels:
        lengte = len(sleutel)
        lengtes[lengte] = sleutel
    while lengtes != {}: #as long there isn't a match and the list isn't empty i keep looking for the longest repetitive which is or in the beginning or in the back
        maximum_lengte = max(lengtes.keys())
        lengte_sleutel = {v: k for k, v in lengtes.items()}
        x= int(nummer_hoeveel_keer[(lengtes[maximum_lengte])])
        achter = s[len(s) - maximum_lengte*x:]
        voor = s[:maximum_lengte*x]
        combinatie = lengtes[maximum_lengte]*x
        if achter == combinatie or voor == combinatie:
            return maximum_lengte
        del lengtes[str(maximum_lengte)]
    return 0
When the following doctest is run against this code:
"""
>>> longest('12112')
0
"""
there is a KeyError at the line "del lengtes[str(maximum_lengte)]".
After a suggestion from @theausome, I used his code as a base to work from (see his answer). My code now looks like this:
def longest(s):
    if len(s) == 1:
        return 0
    longest_patt = []
    k = s[-1]
    longest_patt.append(k)
    for c in s[-2::-1]:
        if c != k:
            longest_patt.append(c)
        else:
            break
    rev_l = list(reversed(longest_patt))
    character = ''.join(rev_l)
    length = len(rev_l)
    s = s.replace(' ','')[:-length]
    if s[-length:] == character:
        return len(longest_patt)
    else:
        return 0
l = longest(s)
print(l)
Some doctests are still giving me trouble, for example:
>>> longest('211211222212121111111')
3 #I get 1
>>> longest('2111222122222221211221222112211')
4 #I get 1
>>> longest('122211222221221112111')
4 #I get 1
>>> longest('121212222112222112')
6 #I get 1
Does anyone have ideas on how to approach this problem, or maybe a more graceful way to solve it?

Try the code below. It works for the doctests in your question.
def longest(s):
    if len(s) == 1:
        return 0
    longest_patt = []
    k = s[-1]
    longest_patt.append(k)
    for c in s[-2::-1]:
        if c != k:
            longest_patt.append(c)
        else:
            break
    rev_l = list(reversed(longest_patt))
    character = ''.join(rev_l)
    length = len(rev_l)
    s = s.replace(' ','')[:-length]
    if s[-length:] == character:
        return len(longest_patt)
    else:
        return 0
l = longest(s)
print(l)
Output:
longest('1211')
1
longest('1212')
2
longest('212111212112112121222222212212112121')
2
longest('1')
0
longest('121')
0
longest('12112')
0
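For the additional doctests in the question, a brute-force check may be easier to reason about. The sketch below assumes the intended rule is "the longest block X such that the string starts with XX or ends with XX". That rule is inferred from the doctests and is not stated explicitly in the question. The function tries block lengths from the largest possible downwards and returns the first that fits.

```python
def longest(s):
    """Return the length of the longest block doubled at the front or the back of s."""
    for k in range(len(s) // 2, 0, -1):
        doubled_at_front = s[:k] == s[k:2 * k]
        doubled_at_back = s[-2 * k:-k] == s[-k:]
        if doubled_at_front or doubled_at_back:
            return k
    return 0  # no block is doubled at either end


print(longest('1211'))
print(longest('121212222112222112'))
```

Each candidate length costs two slice comparisons, so the whole search is quadratic in the length of the string in the worst case, which should be fine for inputs of the size shown.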
