I've looked at the Python Time module and can't find anything that gives the integer of how many seconds since 1970 as PHP does with time().
Am I simply missing something here or is there a common way to do this that's simply not listed there?
import time
print int(time.time())
time.time() does it, but it might be float instead of int which i assume you expect. that is, precision can be higher than 1 sec on some systems.
I recommend reading "Date and Time Representation in Python". I found it very enlightening.
Related
Can someone help me understand how process_time() works?
My code is
from time import process_time
t = process_time()
def fibonacci_of(n):
if n in cache: # Base case
return cache[n]
# Compute and cache the Fibonacci number
cache[n] = fibonacci_of(n - 1) + fibonacci_of(n - 2) # Recursive case
return cache[n]
cache = {0: 0, 1: 1}
fib = [fibonacci_of(n) for n in range(1500)]
print(fib[-1])
print(process_time() - t)
And last print is always 0.0.
My expected result is something like 0.764891862869
Docs at https://docs.python.org/3/library/time.html#time.process_time don't help newbie me :(
I tried some other functions and reading docs. But without success.
I'd assume this is OS dependent. Linux lets me get down to ~5 microseconds using process_time, but other operating systems might well not deal well with differences this small and return zero instead.
It's for this reason that Python exposes other timers that are designed to be more accurate over shorter time scales. Specifically, perf_counter is specified as using:
the highest available resolution to measure a short duration
Using this lets me measure down to ~80 nanoseconds, whether I'm using perf_counter or perf_counter_ns.
As documentation suggests:
time.process_time() → float
Return the value (in fractional seconds) of the sum of the system and user
CPU time of the current process. It does not include time
elapsed during sleep. It is process-wide by definition. The reference
point of the returned value is undefined, so that only the difference
between the results of two calls is valid.
Use process_time_ns() to avoid the precision loss caused by the float type.
This last sentence is the most important and differentiates between the very precise function: process_time_ns() and one that is less precise but more appropriate for the long-running processes - time.process_time().
It turns out that when you measure a couple nanoseconds (nano means 10**-9) and try to express it in seconds dividing by 10**9, you often go out of the precision of float (64 bits) and end up rounding up to zero. The float limitations are described in the Python documentation.
To get to know more you can also read a general introduction to precision in floating point arithmetic (ambitious) and its peril (caveats).
I am continuing to practice on Data Camp and the current session covers Datetimes, timezone, dateutil, etc. However, there is a function I am not sure about. The function mentioned in the below code is .enfold() and I cannot seem to locate an easy explanation in the documentation. What is its general purpose and what is it doing in the Data Camp practice code below?
trip_durations = []
for trip in onebike_datetimes:
# When the start is later than the end, set the fold to be 1
if trip['start'] > trip['end']:
trip['end'] = tz.enfold(trip['end'])
# Convert to UTC
start = trip['start'].astimezone(tz.gettz('UTC'))
end = trip['end'].astimezone(tz.gettz('UTC'))
# Subtract the difference
trip_length_seconds = (end-start).total_seconds()
trip_durations.append(trip_length_seconds)
# Take the shortest trip duration
print("Shortest trip: " + str(min(trip_durations)))
I tried finding the documenatation for enfold directly under datetime documentation for Python but it kept mentioning "fold" under each of the functions/methods for the datetime package. Not super explanatory, so more information regarding .enfold() and "fold" in general would be helpful.
I am assuming that enfold in your code is the function from the tz module in the third-party timezone package: https://dateutil.readthedocs.io/en/stable/tz.html#dateutil.tz.enfold
As the documentation mentions, "fold" is an attribute of datetime objects whose implementation was changed in Python 3.6 by PEP 495. I cannot explain better what "fold" means than what is described in that document (unless my example below counts).
Apparently "fold" existed before Python 3.6, but was used/accessed in a different way. The tz.enfold function exists to unify the programming interface to the "fold" attribute regardless of the Python version used.
The specific application of "fold" in this code example seems to be the calculation of trip durations with "start" and "end" times. If an "end" time is found which seems to be earlier than the "start" time, it is deduced that the "end" time was recorded after a timezone change has occurred where the clocks were turned back by 1 hour, and the time is to be interpreted as the later of the two possible "real" times with that "local" or "wall" time. enfold(trip["end"]) returns a datetime object with the same local time as trip["end"] but with "fold" set to 1.
Example: Clocks are turned back at 3 AM to 2 AM
1 AM 2 AM 3 AM (4 AM)
-------|-------|--F-S--| - - - | - - > old local time
/
/
/
/
- - - | - - - |--E----|-------|-----> new local time
(1 AM) 2 AM 3 AM 4 AM
S: start - 2:40 AM (old local time)
E: end - 2:20 AM (new local time, fold=1, 40 minutes later than S)
F: false end - 2:20 AM (old local time, fold=0, 20 minutes earlier than S)
Introduction
Today I found a weird behaviour in python while running some experiments with exponentiation and I was wondering if someone here knows what's happening. In my experiments, I was trying to check what is faster in python int**int or float**float. To check that I run some small snippets, and I found a really weird behaviour.
Weird results
My first approach was just to write some for loops and prints to check which one is faster. The snipper I used is this one
import time
# Run powers outside a method
ti = time.time()
for i in range(EXPERIMENTS):
x = 2**2
tf = time.time()
print(f"int**int took {tf-ti:.5f} seconds")
ti = time.time()
for i in range(EXPERIMENTS):
x = 2.**2.
tf = time.time()
print(f"float**float took {tf-ti:.5f} seconds")
After running it I got
int**int took 0.03004
float**float took 0.03070 seconds
Cool, it seems that data types do not affect the execution time. However, since I try to be a clean coder I refactored the repeated logic in a function power_time
import time
# Run powers in a method
def power_time(base, exponent):
ti = time.time()
for i in range(EXPERIMENTS):
x = base ** exponent
tf = time.time()
return tf-ti
print(f"int**int took {power_time(2, 2):.5f} seconds")
print(f"float**float took {power_time(2., 2.):5f} seconds")
And what a surprise of mine when I got these results
int**int took 0.20140 seconds
float**float took 0.05051 seconds
The refactor didn't affect a lot the float case, but it multiplied by ~7 the time required for the int case.
Conclusions and questions
Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
Also, if I run the same experiments but change ** by * or + the weird results disappear, and all the approaches give more or less the same results
Does someone know why is this happening? Am I missing something?
Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
It would be really weird if it was not like this! You can write your class that has it's own ** operator (through implementing the __pow__(self, other) method), and you could, for example, sleep 1s in there. Why should that take as long as taking a float to the power of another?
So, yeah, Python is a dynamically typed language. So, the operations done on data depend on the type of that data, and things can generally take different times.
In your first example, the difference never arises, because a) most probably the values get cached, because right after parsing it's clear that 2**2 is a constant and does not need to get evaluated every loop. Even if that's not the case, b) the time it costs to run a loop in python is hundreds of times that it takes to actually execute the math here – again, dynamically typed, dynamically named.
base**exponent is a whole different story. None about this is constant. So, there's actually going to be a calculation every iteration.
Now, the ** operator (__rpow__ in the Python data model) for Python's built-in float type is specified to do the float exponent (which is something implemented in highly optimized C and assembler), as exponentiation can elegantly be done on floating point numbers. Look for nb_power in cpython's floatobject.c. So, for the float case, the actual calculation is "free" for all that matters, again, because your loop is limited by how much effort it is to resolve all the names, types and functions to call in your loop. Not by doing the actual math, which is trivial.
The ** operator on Python's built-in int type is not as neatly optimized. It's a lot more complicated – it needs to do checks like "if the exponent is negative, return a float," and it does not do elementary math that your computer can do with a simple instruction, it handles arbitrary-length integers (remember, a python integer has as many bytes as it needs. You can save numbers that are larger than 64 bit in a Python integer!), which comes with allocation and deallocations. (I encourage you to read long_pow in CPython's longobject.c; it has 200 lines.)
All in all, integer exponentiation is expensive in python, because of python's type system.
I think I may have implemented this incorrectly because the results do not make sense. I have a Go program that counts to 1000000000:
package main
import (
"fmt"
)
func main() {
for i := 0; i < 1000000000; i++ {}
fmt.Println("Done")
}
It finishes in less than a second. On the other hand I have a Python script:
x = 0
while x < 1000000000:
x+=1
print 'Done'
It finishes in a few minutes.
Why is the Go version so much faster? Are they both counting up to 1000000000 or am I missing something?
One billion is not a very big number. Any reasonably modern machine should be able to do this in a few seconds at most, if it's able to do the work with native types. I verified this by writing an equivalent C program, reading the assembly to make sure that it actually was doing addition, and timing it (it completes in about 1.8 seconds on my machine).
Python, however, doesn't have a concept of natively typed variables (or meaningful type annotations at all), so it has to do hundreds of times as much work in this case. In short, the answer to your headline question is "yes". Go really can be that much faster than Python, even without any kind of compiler trickery like optimizing away a side-effect-free loop.
pypy actually does an impressive job of speeding up this loop
def main():
x = 0
while x < 1000000000:
x+=1
if __name__ == "__main__":
s=time.time()
main()
print time.time() - s
$ python count.py
44.221405983
$ pypy count.py
1.03511095047
~97% speedup!
Clarification for 3 people who didn't "get it". The Python language itself isn't slow. The CPython implementation is a relatively straight forward way of running the code. Pypy is another implementation of the language that does many tricky (especiallt the JIT) things that can make enormous differences. Directly answering the question in the title - Go isn't "that much" faster than Python, Go is that much faster than CPython.
Having said that, the code samples aren't really doing the same thing. Python needs to instantiate 1000000000 of its int objects. Go is just incrementing one memory location.
This scenario will highly favor decent natively-compiled statically-typed languages. Natively compiled statically-typed languages are capable of emitting a very trivial loop of say, 4-6 CPU opcodes that utilizes simple check-condition for termination. This loop has effectively zero branch prediction misses and can be effectively thought of as performing an increment every CPU cycle (this isn't entirely true, but..)
Python implementations have to do significantly more work, primarily due to the dynamic typing. Python must make several different calls (internal and external) just to add two ints together. In Python it must call __add__ (it is effectively i = i.__add__(1), but this syntax will only work in Python 3.x), which in turn has to check the type of the value passed (to make sure it is an int), then it adds the integer values (extracting them from both of the objects), and then the new integer value is wrapped up again in a new object. Finally it re-assigns the new object to the local variable. That's significantly more work than a single opcode to increment, and doesn't even address the loop itself - by comparison, the Go/native version is likely only incrementing a register by side-effect.
Java will fair much better in a trivial benchmark like this and will likely be fairly close to Go; the JIT and static-typing of the counter variable can ensure this (it uses a special integer add JVM instruction). Once again, Python has no such advantage. Now, there are some implementations like PyPy/RPython, which run a static-typing phase and should fare much better than CPython here ..
You've got two things at work here. The first of which is that Go is compiled to machine code and run directly on the CPU while Python is compiled to bytecode run against a (particularly slow) VM.
The second, and more significant, thing impacting performance is that the semantics of the two programs are actually significantly different. The Go version makes a "box" called "x" that holds a number and increments that by 1 on each pass through the program. The Python version actually has to create a new "box" (int object) on each cycle (and, eventually, has to throw them away). We can demonstrate this by modifying your programs slightly:
package main
import (
"fmt"
)
func main() {
for i := 0; i < 10; i++ {
fmt.Printf("%d %p\n", i, &i)
}
}
...and:
x = 0;
while x < 10:
x += 1
print x, id(x)
This is because Go, due to it's C roots, takes a variable name to refer to a place, where Python takes variable names to refer to things. Since an integer is considered a unique, immutable entity in python, we must constantly make new ones. Python should be slower than Go but you've picked a worst-case scenario - in the Benchmarks Game, we see go being, on average, about 25x times faster (100x in the worst case).
You've probably read that, if your Python programs are too slow, you can speed them up by moving things into C. Fortunately, in this case, somebody's already done this for you. If you rewrite your empty loop to use xrange() like so:
for x in xrange(1000000000):
pass
print "Done."
...you'll see it run about twice as fast. If you find loop counters to actually be a major bottleneck in your program, it might be time to investigate a new way of solving the problem.
#troq
I'm a little late to the party but I'd say the answer is yes and no. As #gnibbler pointed out, CPython is slower in the simple implementation but pypy is jit compiled for much faster code when you need it.
If you're doing numeric processing with CPython most will do it with numpy resulting in fast operations on arrays and matrices. Recently I've been doing a lot with numba which allows you to add a simple wrapper to your code. For this one I just added #njit to a function incALot() which runs your code above.
On my machine CPython takes 61 seconds, but with the numba wrapper it takes 7.2 microseconds which will be similar to C and maybe faster than Go. Thats an 8 million times speedup.
So, in Python, if things with numbers seem a bit slow, there are tools to address it - and you still get Python's programmer productivity and the REPL.
def incALot(y):
x = 0
while x < y:
x += 1
#njit('i8(i8)')
def nbIncALot(y):
x = 0
while x < y:
x += 1
return x
size = 1000000000
start = time.time()
incALot(size)
t1 = time.time() - start
start = time.time()
x = nbIncALot(size)
t2 = time.time() - start
print('CPython3 takes %.3fs, Numba takes %.9fs' %(t1, t2))
print('Speedup is: %.1f' % (t1/t2))
print('Just Checking:', x)
CPython3 takes 58.958s, Numba takes 0.000007153s
Speedup is: 8242982.2
Just Checking: 1000000000
Problem is Python is interpreted, GO isn't so there's no real way to bench test speeds. Interpreted languages usually (not always have a vm component) that's where the problem lies, any test you run is being run in interpreted bounds not actual runtime bounds. Go is slightly slower than C in terms of speed and that is mostly due to it using garbage collection instead of manual memory management. That said GO compared to Python is fast because its a compiled language, the only thing lacking in GO is bug testing I stand corrected if I'm wrong.
It is possible that the compiler realized that you didn't use the "i" variable after the loop, so it optimized the final code by removing the loop.
Even if you used it afterwards, the compiler is probably smart enough to substitute the loop with
i = 1000000000;
Hope this helps =)
I'm not familiar with go, but I'd guess that go version ignores the loop since the body of the loop does nothing. On the other hand, in the python version, you are incrementing x in the body of the loop so it's probably actually executing the loop.
I've been trying to find a way to get the time since 1970-01-01 00:00:00 UTC in seconds and nanoseconds in python and I cannot find anything that will give me the proper precision.
I have tried using time module, but that precision is only to microseconds, so the code I tried was:
import time
print time.time()
which gave me a result like this:
1267918039.01
However, I need a result that looks like this:
1267918039.331291406
Does anyone know a possible way to express UNIX time in seconds and nanoseconds? I cannot find a way to set the proper precision or get a result in the correct format. Thank you for any help
Since Python 3.7 it's easy to achieve with time.time_ns()
Similar to time() but returns time as an integer number of nanoseconds since the epoch.
All new features that includes nanoseconds in Python 3.7 release:
PEP 564: Add new time functions with nanosecond resolution
Your precision is just being lost due to string formatting:
>>> import time
>>> print "%.20f" % time.time()
1267919090.35663390159606933594
The problem is probably related to your OS, not Python. See the documentation of the time module: http://docs.python.org/library/time.html
time.time()
Return the time as a floating point
number expressed in seconds since the
epoch, in UTC. Note that even though
the time is always returned as a
floating point number, not all
systems provide time with a better
precision than 1 second. While this
function normally returns
non-decreasing values, it can return a
lower value than a previous call if
the system clock has been set back
between the two calls.
In other words: if your OS can't do it, Python can't do it. You can multiply the return value by the appropriate order of magnitude in order to get the nanosecond value, though, imprecise as it may be.
EDIT: The return is a float variable, so the number of digits after the comma will vary, whether your OS has that level of precision or not. You can format it with "%.nf" where n is the number of digits you want, though, if you want a fixed point string representation.
It depends on the type of clock and your OS and hardware wether or not you can even get nanosecond precision at all. From the time module documentation:
The precision of the various real-time functions may be less than suggested by the units in which their value or argument is expressed. E.g. on most Unix systems, the clock “ticks” only 50 or 100 times a second.
On Python 3, the time module gives you access to 5 different types of clock, each with different properties; some of these may offer you nanosecond precision timing. Use the time.get_clock_info() function to see what features each clock offers and what precision time is reported in.
On my OS X 10.11 laptop, the features available are:
>>> for name in ('clock', 'monotonic', 'perf_counter', 'process_time', 'time'):
... print(name, time.get_clock_info(name), sep=': ')
...
clock: namespace(adjustable=False, implementation='clock()', monotonic=True, resolution=1e-06)
monotonic: namespace(adjustable=False, implementation='mach_absolute_time()', monotonic=True, resolution=1e-09)
perf_counter: namespace(adjustable=False, implementation='mach_absolute_time()', monotonic=True, resolution=1e-09)
process_time: namespace(adjustable=False, implementation='getrusage(RUSAGE_SELF)', monotonic=True, resolution=1e-06)
time: namespace(adjustable=True, implementation='gettimeofday()', monotonic=False, resolution=1e-06)
so using the time.monotonic() or time.perf_counter() functions would theoretically give me nanosecond resolution. Neither clock gives me wall time, only elapsed time; the values are otherwise arbitrary. They are however useful for measuring how long something took.
It is unlikely that you will actually get nanosecond precision from any current machine.
The machine can't create precision, and displaying significant digits where not appropriate is not The Right Thing To Do.
I don't think there's a platform-independent way (maybe some third party has coded one, but I can't find it) to get time in nanoseconds; you need to do it in a platform-specific way. For example, this SO question's answer shows how to do it on platform that supply a librt.so system library for "realtime" operations.