So, really I'm just confused, I've been learning python, and I was given an exercise to find the performance speed of a function, however after finishing the code I received an error in the time output, it was 3.215000000000856e-06, this value varies with every time I run the program though so you probably won't get the same output.(in reality it was less then a second.) I went through the video where it explained how to write how they did it and changed a how I wrote a statement, now my code is Identical, to theirs but with different variable names, I ran the program and the same problem, however they didn't experience this issue, heres the code:
import time
SetContainer = {I for I in range(1002)}
ListContainer = [I for I in range(1002)]
def Search(Value, Container):
if Value in Container:
return True
else:
return False
def Function_Speed(Func, HMT = 1, **arg):
sum = 0
for I in range(HMT):
start = time.perf_counter()
print (Func(**arg))
end = time.perf_counter()
sum = sum + (end - start)
return (sum, )
print (Function_Speed(Search, Value = 402,
Container = SetContainer))
Possible Answers?:
Could it be my hardware? My version of Python is no longer supported(The video is over a year old I'm using 3.6) or it turns out I screwed up.
(Edit:) By the way it does work when the function is printed so this example works, but without print (Func(**arg)) and instead Func(**arg) it doesn't work.
First, your capitalization of variable and function violates convention. Though not a syntax error, it makes it difficult for Python programmers to follow your code.
Second, the result you got makes sense. A single iteration of the search on a modern computer takes very little time. If your printed result was in the form 1.23456e-05, then that is a valid number so small that the default representation shifted to scientific notation.
Add a value for HMT, starting with 100000, and see what is output.
Related
I have a class that does different mathematical calculations on a cycle
I want to speed up its processing with Numba
And now I'm trying to apply Namba to init functions
The class itself and its init function looks like this:
class Generic(object):
##numba.vectorize
def __init__(self, N, N1, S1, S2):
A = [['Time'],['Time','Price'], ["Time", 'Qty'], ['Time', 'Open_interest'], ['Time','Operation','Quantity']]
self.table = pd.DataFrame(pd.read_csv('Datasets\\RobotMath\\table_OI.csv'))
self.Reader()
for a in A:
if 'Time' in a:
self.df = pd.DataFrame(pd.read_csv(Ex2_Csv, usecols=a, parse_dates=[0]))
self.df['Time'] = self.df['Time'].dt.floor("S", 0)
self.df['Time'] = pd.to_datetime(self.df['Time']).dt.time
if a == ['Time']:
self.Tik()
elif a == ['Time','Price']:
self.Poc()
self.Pmm()
self.SredPrice()
self.Delta_Ema(N, N1)
self.Comulative(S1, S2)
self.M()
elif a == ["Time", 'Qty']:
self.Volume()
elif a == ['Time', 'Open_interest']:
self.Open_intrest()
elif a == ['Time','Operation','Quantity']:
self.D()
#self.Dataset()
else:
print('Something went wrong', f"Set Error: {0} ".format(a))
All functions of the class are ordinary column calculations using Pandas.
Here are two of them for example:
def Tik(self):
df2 = self.df.groupby('Time').value_counts(ascending=False)
df2.to_csv('Datasets\\RobotMath\\Tik.csv', )
def Poc(self):
g = self.df.groupby('Time', sort=False)
out = (g.last()-g.first()).reset_index()
out.to_csv('Datasets\\RobotMath\\Poc.csv', index=False)
I tried to use Numba in different ways, but I got an error everywhere.
Is it possible to speed up exactly the init function? Or do I need to look for another way and I can 't do without rewriting the class ?
So there's nothing special or magical about the init function, at run time, it's just another function like any other.
In terms of performance, your class is doing quite a lot here - it might be worth breaking down the timings of each component first to establish where your performance hang-ups lie.
For example, just reading in the files might be responsible for a fair amount of that time, and for Ex2_Csv you repeatedly do that within a loop, which is likely to be sub-optimal depending on the volume of data you're dealing with, but before targeting a resolution (like Numba for example) it'd be prudent to identify which aspect of the code is performing within expectations.
You can gather that information in a number of ways, but the simplest might be to add in some tagged print statements that emit the elapsed time since the last print statement.
e.g.
start = datetime.datetime.now()
print("starting", start)
## some block of code that does XXX
XXX_finish = datetime.datetime.now()
print("XXX_finished at", XXX_finish, "taking", XXX_finish-start)
## and repeat to generate a runtime report showing timings for each block of code
Then you can break the runtime of your program down into feature-aligned chunks, then when tweaking you'll be able to directly see the effect. Profiling the runtime of your code like this can really help when making performance tweaks, and it helps sharpen focus on what tweaks are benefiting/harming specific areas of your code.
For example, in the section that's performing the group-bys, with some timestamp outputs before and after you can compare running it with the numba turned on, and then again with it turned off.
From there, with a little careful debugging, sensible logging of times and a bit of old-fashioned sleuthing (I tend to jot these kinds of things down with paper and pencil) you (and if you share your findings, we) ought to be in a better position to answer your question more precisely.
Introduction
Today I found a weird behaviour in python while running some experiments with exponentiation and I was wondering if someone here knows what's happening. In my experiments, I was trying to check what is faster in python int**int or float**float. To check that I run some small snippets, and I found a really weird behaviour.
Weird results
My first approach was just to write some for loops and prints to check which one is faster. The snipper I used is this one
import time
# Run powers outside a method
ti = time.time()
for i in range(EXPERIMENTS):
x = 2**2
tf = time.time()
print(f"int**int took {tf-ti:.5f} seconds")
ti = time.time()
for i in range(EXPERIMENTS):
x = 2.**2.
tf = time.time()
print(f"float**float took {tf-ti:.5f} seconds")
After running it I got
int**int took 0.03004
float**float took 0.03070 seconds
Cool, it seems that data types do not affect the execution time. However, since I try to be a clean coder I refactored the repeated logic in a function power_time
import time
# Run powers in a method
def power_time(base, exponent):
ti = time.time()
for i in range(EXPERIMENTS):
x = base ** exponent
tf = time.time()
return tf-ti
print(f"int**int took {power_time(2, 2):.5f} seconds")
print(f"float**float took {power_time(2., 2.):5f} seconds")
And what a surprise of mine when I got these results
int**int took 0.20140 seconds
float**float took 0.05051 seconds
The refactor didn't affect a lot the float case, but it multiplied by ~7 the time required for the int case.
Conclusions and questions
Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
Also, if I run the same experiments but change ** by * or + the weird results disappear, and all the approaches give more or less the same results
Does someone know why is this happening? Am I missing something?
Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
It would be really weird if it was not like this! You can write your class that has it's own ** operator (through implementing the __pow__(self, other) method), and you could, for example, sleep 1s in there. Why should that take as long as taking a float to the power of another?
So, yeah, Python is a dynamically typed language. So, the operations done on data depend on the type of that data, and things can generally take different times.
In your first example, the difference never arises, because a) most probably the values get cached, because right after parsing it's clear that 2**2 is a constant and does not need to get evaluated every loop. Even if that's not the case, b) the time it costs to run a loop in python is hundreds of times that it takes to actually execute the math here – again, dynamically typed, dynamically named.
base**exponent is a whole different story. None about this is constant. So, there's actually going to be a calculation every iteration.
Now, the ** operator (__rpow__ in the Python data model) for Python's built-in float type is specified to do the float exponent (which is something implemented in highly optimized C and assembler), as exponentiation can elegantly be done on floating point numbers. Look for nb_power in cpython's floatobject.c. So, for the float case, the actual calculation is "free" for all that matters, again, because your loop is limited by how much effort it is to resolve all the names, types and functions to call in your loop. Not by doing the actual math, which is trivial.
The ** operator on Python's built-in int type is not as neatly optimized. It's a lot more complicated – it needs to do checks like "if the exponent is negative, return a float," and it does not do elementary math that your computer can do with a simple instruction, it handles arbitrary-length integers (remember, a python integer has as many bytes as it needs. You can save numbers that are larger than 64 bit in a Python integer!), which comes with allocation and deallocations. (I encourage you to read long_pow in CPython's longobject.c; it has 200 lines.)
All in all, integer exponentiation is expensive in python, because of python's type system.
I've been facing strange results in a numerical problem I've been working on and since I started to program in python recently, I would like to see if you can help me.
Basically, the program has a function that is minimized for different values in a nested loop. I'll skip the details of the function for simplicity but I checked it several times and it is working correctly. Basically, the code looks like that:
def function(mins,args):
#I'll skip those details for simplicity
return #Value
ranges = ((0+np.pi/100),(np.pi/2-np.pi/100),np.pi/100)
while Ri[0] < R:
Ri[1] = 0; Ri[n-1] = R-(sum(Ri)-Ri[n-1])
while Ri[1] < (R-Ri[0]):
res = opt.brute(function, ranges, args=[args], finish=None)
F = function(res,args)
print(f'F = {F}')
Ri[1] += dR
Ri[2] = R-(sum(Ri)-Ri[n-1])
Ri[0] += dR
So, ignoring the Ri[] meaning in the loop (which is a variable of the function), for every increment in Ri[] the program makes a minimization of mins by the scipy.optimize.brute obtaining res as the answer, then it should run the function with this answer and print the result F. The problem is that it always get the same answer, no matter what the parameters I get in the minimization (which is working fine, I checked). It's strange because if I get the values from the minimization (which is an n-sized array, being n an input) and create a new program to run the same function and just get the result of the function, it returns me the right answer.
Anyone can give me a hint why I'm getting this? If I didn't make myself clear please tell and I could provide more details about the function and variables. Thanks in advance!
I've been working on solving Problem 31 from Project Euler. I'm almost finished, but my code is giving me what I know is the wrong answer. My dad showed the right answer, and mine is giving me something completely wrong.
coins = (1,2,5,10,20,50,100,200)
amount = 200
def ways(target, avc):
target = amount
if avc <= 1:
return 1
res = 0
while target >= 0:
res = res + ways(target, avc-1)
target = target - coins[avc]
return res
print(ways(amount, 7))
This is the answer I get.
284130
The answer is supposed to 73682.
EDIT: Thank you to everyone who answered my question. Thanks to all of you, I have figured it out!
I see several ways how you can improve the way you're working on problems:
Copy the task description into your source code. This will make you more independent from Internet resources. That allows you to work offline. Even if the page is down or disappears completely, it will enable someone else understanding what problem you're trying to solve.
Use an IDE that does proper syntax highlighting and gives you hints what might be wrong with your code. The IDE will tell you what #aronquemarr mentioned in the comments:
There's also a thing called magic numbers. Many of the numbers in your code can be explained, but why do you start with an avc of 7?. Where does that number come from? 7 does not occur in the problem description.
Use better naming, so you understand better what your code does. What does avc stand for? If you think it's available_coins, that's not correct. There are 8 different coins, not 7.
If it still does not work, reduce the problem to a simpler one. Start with the most simple one you can think of, e.g. make only 1 type of coin available and set the amount to 2 cent. There should only be 1 solution.
To make sure this simple result will never break, introduce an assertion or unit test. They will help you when you change the code to work with larger datasets. They will also change the way you write code. In your case you'll find that accessing the variable coins from an outer scope is probably not a good idea, because it will break your assertion when you switch to the next level. The change that is needed will make your methods more self-contained and robust.
Then, increase the difficulty by having 2 different coins etc.
Examples for the first assertions that I used on this problem:
# Only 1 way of making 1 with only a 1 coin of value 1
assert(find_combinations([1], 1) == 1)
# Still only 1 way of making 1 with two coins of values 1 and 2
assert(find_combinations([1,2], 1) == 1)
# Two ways of making 2 with two coins of values 1 and 2
assert(find_combinations([1,2], 2) == 2)
As soon as the result does no longer match your expectation, use the debugger and step through your code. After every line, check the values in the debugger against the values that you think they should be.
One time, the debugger will be your best friend. And you can never imagine how you did stuff without it. You just need to learn how to use it.
Firstly, read Thomas Weller's answer. They give some excellent suggestions as to how to improve your coding and problem solving.
Secondly, your code works, and gives the correct answer after the following changes:
As suggested, remove the target = amount line. You're resigning the argument of the function to global amount on each call (even on the recursive calls). Having removed that the answer comes up to 3275 - still not the right answer.
The other thing you have to remember is that Python is 0-indexed. Therefore, your simplest case condition ought to read if avc <= 0: ... (not <=1).
Having made these two changes, your code gives the correct answer. Here is your code with these changes:
coins = (1,2,5,10,20,50,100,200)
amount = 200
def ways(target, avc):
if avc <= 0:
return 1
res = 0
while target >= 0:
res = res + ways(target, avc-1)
target = target - coins[avc]
return res
print(ways(amount, 7))
Lastly, there are plenty of Project Euler answers out there. Having solved the yourself, it might be useful to have a look at what others did. For reference, I have not actually solved this Project Euler before, so I had to do that first. Here is my solution. I've added a pile of comments on top of it to explain what it does.
EDIT
I've just noticed something quite worrying: Your code (after fixes) works only if the first element of coins is 1. All the other elements can be shuffled:
# This still works ok
coins = (1,2,200,10,20,50,100,5)
# But this does not
coins = (2,1,5,10,20,50,100,200)
To ensure that this is always the case, you can just do the following:
coins = ... # <- Some not necessarily sorted tuple
coins = tuple(sorted(coins))
In principle there are a few other issues. Your solution breaks for non-unique values coins, and which don't include 1. The former you could fix with the use of sets, and the latter by modifying your if avc <= 0: case (check for the divisibility of the target by the remaining coin). Here is you piece of code with these changes implemented. I've also renamed the variables and the function to be a bit easier to read, and used coins as the argument, rather that avc pointer (which, by the way, I could not stop reading as avec):
unique = lambda x: sorted(set(x)) # Sorted unique list
def find_combinations(target, coins):
''' Find the number of ways coins can make up the target amount '''
coins = unique(coins)
if len(coins)==1: # Only one coin, therefore just check divisibility
return 1 if not target%coins[0] else 0
combinations = 0
while target >= 0:
combinations += find_combinations(target, coins[:-1])
target -= coins[-1]
return combinations
coins = [2,1,5,10,20,50,100,200] # <- Now works
amount = 200
print(find_combinations(amount, coins))
So, I have a function which basically does this:
import os
import json
import requests
from openpyxl import load_workbook
def function(data):
statuslist = []
for i in range(len(data[0])):
result = performOperation(data[0][i])
if result in satisfying_results:
print("its okay")
statuslist.append("Pass")
else:
print("absolutely not okay")
statuslist.append("Fail" + result)
return statuslist
Then, I invoke the function like this (I've added error handling to check what will happen after stumbling upon the reason for me asking this question), and was actually amazed by the results, as the function returns None, and then executes:
statuslist = function(data)
print(statuslist)
try:
for i in range(len(statuslist)):
anotherFunction(i)
print("Confirmation that it is working")
except TypeError:
print("This is utterly nonsense I think")
The output of the program is then as follows:
None
This is utterly nonsense I think
its okay
its okay
its okay
absolutely not okay
its okay
There is only single return statement at the end of the function, the function is not recursive, its pretty straightforward and top-down(but parses a lot of data in the meantime).
From the output log, it appears that the function first returns None, and then is properly executed. I am puzzled, and I were unable to find any similar problems over the internet (maybe I phrase the question incorrectly).
Even if there were some inconsistency in the code, I'd still expect it to return [] instead.
After changing the initial list to statuslist = ["WTF"], the return is [].
To rule out the fact that I have modified the list in some other functions performed in the function(data), I have changed the name of the initial list several times - the results are consistently beyond my comprehension
I will be very grateful on tips in debugging the issue. Why does the function return the value first, and is executed after?
While being unable to write the code which would at the same time present what happened in my code in full spectrum, be readable, and wouldn't interfere with no security policies of the company, I have re-wrote it in a simpler form (the original code has been written while I had 3 months of programming experience), and the issue does not reproduce anymore. I guess there had be some level of nesting of functions that I have misinterpreted, and this re-written code, doing pretty much the same, correctly returns me the expected list.
Thank you everyone for your time and suggestions.
So, the answer appears to be: You do not understand your own code, make it simpler.