Today I wrote a program using an array/list with 64,000,000 entries. Writing sigma=[1]*64000000 in Python runs fine, but later on, as the program computes, my Ubuntu freezes - no reaction to input, not even mouse movement. I tried twice and the result was the same.
When implemented in C++, long long sigma[64000000] holds up fine and runs very fast.
Is there any reason my program would freeze in the middle of running, rather than crashing at the beginning?
EDIT: To reply to Chris below, my code did not freeze until a couple of loops later.
Thank you all!
For those who are interested in seeing the code, here is the program, a brute-force attempt at Project Euler 211:
from math import sqrt

def e211():
    ans = 0
    sigma = [1] * 64000000          # sigma[n] will hold the sum of squared divisors of n
    for i in range(2, 64000000):
        j = i
        if (j % 1000 == 0) or (j < 100):
            print(j)                # progress output
        q = i * i
        while j < 64000000:         # add i**2 to every multiple of i
            sigma[j] += q
            j += i
    for i in range(1, 64000000):
        j = int(sqrt(sigma[i]))
        if j * j == sigma[i]:       # sigma[i] is a perfect square
            ans += i
    return ans

if __name__ == '__main__':
    print(e211())
Python lists are lists of references to objects. A number in Python is an object in its own right, and takes up considerably more storage than the 64 bits needed to represent a long long in C++. In particular, Python transparently handles numbers larger than a machine word, and those end up taking a lot more space than a simple integer.
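As a rough illustration (a minimal sketch; exact sizes vary by Python version and build), you can measure the per-object overhead directly with sys.getsizeof:

import sys

print(sys.getsizeof(1))            # one small int object: roughly 24-28 bytes on a 64-bit build
print(sys.getsizeof([1] * 1000))   # the list itself adds ~8 bytes of pointer per entry on top of that

Note that [1]*64000000 starts out as 64 million references to one shared int object, so the big allocations only happen once the loop begins storing distinct large values in sigma; that would fit with the machine freezing (most likely swapping) a couple of loops in rather than at the start.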
You may find the standard Python array module useful. It provides Python-compatible access to uniform arrays of integers of a specified size. (However, I note that it doesn't offer a 64-bit integer type.)
range(1,64000000):
This line builds a full list of 64,000,000 integers, so now you have two lists of that size in memory at once. Use xrange instead, which produces the values lazily (the same goes for the range(2,64000000) loop above).
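A hedged sketch of how the sieve loop might look with those two changes (Python 2 here because of xrange; the array typecode is an assumption and must be wide enough for the sums, which exceed 32 bits in this problem):

from array import array

N = 64000000
sigma = array('l', [1]) * N     # one machine integer per entry instead of a pointer plus an int object
                                # ('l' is only safe if the platform long is 64-bit)
for i in xrange(2, N):          # xrange yields values lazily instead of building another huge list
    q = i * i
    j = i
    while j < N:
        sigma[j] += q
        j += i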
So I'm making a billiard simulation in Python that needs to do a lot of updates per second. I know that division (or multiplying by a decimal) is one of the slower operations, and I was wondering whether that only applies to more "abstract" divisions (e.g. 173/82) or also to "easier" ones, like halving a float or dividing by 10.
For extra info, it is for microstepping (getting a more accurate point of collision), so I'll be dividing the speed. If dividing by 2 and by 10 is costly, I'm thinking about precalculating the smaller speeds (whenever a ball's speed changes), but please do suggest a better way if there is one.
Thanks for reading:)
Python is a very high level language which abstracts away floating point numbers as full objects. This kind of micro-optimization does not make sense in plain Python code.
If you are down to an algorithm that you have to optimize in a few plain operations, one of the steps you can take is to promote the function containing the calculation to a helper framework that runs that code as native code, such as Cython or Numba. Cython, for example, uses essentially the same syntax as Python and is callable from ordinary Python code, but it can use the CPU's native floating-point implementation for the operations. Numba may be even simpler, requiring only that the most critical functions be properly decorated.
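For instance, a minimal Numba sketch (the function and its parameters are made up for illustration; the point is just that the decorated function gets compiled to native code on first call):

from numba import njit

@njit
def microstep(x, y, vx, vy, dt, steps):
    # advance a position in several small substeps; inside the compiled function
    # the divisions and multiplications are plain native floating-point operations
    h = dt / steps
    for _ in range(steps):
        x += vx * h
        y += vy * h
    return x, y

print(microstep(0.0, 0.0, 1.0, 0.5, 1.0 / 60.0, 10))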
If the results are read from and written back into an array, you won't even have the language overhead of converting each data point to and from a Python float instance.
Best way is to try it, just write a few lines of test code and loop over it a few million times. That's the beauty of Python, you can try things quickly.
Under the covers, the Python interpreter is doing a lot of work and the actual division itself will likely be a small component of the time.
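If you want to see the numbers yourself, a quick sketch with timeit (results will vary with machine and Python version, but the spread between the different divisors is usually tiny compared to the interpreter overhead):

import timeit

setup = "x = 173.0"
for expr in ("x / 82.0", "x / 2.0", "x / 10.0", "x * 0.5"):
    print(expr, timeit.timeit(expr, setup=setup, number=10**7))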
Once the algorithm is right, you might try writing custom functions or classes for Python, in C. I've done this for Monte-Carlo simulation that has to handle millions of events per second.
I'm parallelizing a piece of software that does a bunch of independent calculations, so that it no longer takes the institute 6 hours to compute one run. The results are saved in a list of arrays. The array length is static. This list is then dumped with pickle.dump(obj).
The difference: the single-threaded dump is 6.5 KiB in size and the multi-threaded one is 20.4 KiB.
Firstly:
I did my research and yes, you should not use pickle, but University is University. I also tested my multi-threaded implementation; I did that over the last few days and even compared a smaller sample by hand to be sure, so it doesn't help me when you comment that I should check my multi-threaded implementation.
Now what I did:
First, comparing all elements between the single- and multi-threaded lists: they are the same. Comparing the length, shape, and sys.getsizeof(obj): also the same.
Then I had a look into pickle.dump(obj). It chooses the protocol on its own, so I tried the different protocols explicitly. I got different results, but not the smaller size I expected.
Lastly, I triple-checked that I really dump only the list, and yes, only the list is dumped.
As written above, one would expect to get the exact same dump file size for the exact same list, so why is this not happening?
Yes, I'm new here and don't get all the rules so please give proper feedback on how to improve the question.
So the solution is rather trivial...
I should also have checked the types of the elements in the arrays. During the late-night coding I used numpy to partition the list of values for the calculation and forgot about it completely.
With a simple array.tolist() the problem was fixed.
Conclusion:
Even in Python, check your types!
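A minimal sketch of the effect (the exact byte counts depend on the pickle protocol and numpy version, but the direction should hold): numpy scalars drag per-element type information into the pickle stream, while plain Python floats serialize compactly.

import pickle
import numpy as np

values = np.random.rand(1000)

as_numpy = list(values)        # a list of numpy.float64 scalars
as_python = values.tolist()    # a list of plain Python floats

print(len(pickle.dumps(as_numpy)))    # noticeably larger
print(len(pickle.dumps(as_python)))   # roughly 9 bytes per float plus list overhead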
Beginner here, looked for an answer, but can't find one.
I know (or rather suspect) that part of the problem with the following code is how big the list of combinations gets.
(Maybe, too, the last line seems like an error, in that if I just run 'print ...' rather than 'comb += ...' it runs quickly and quits. Would 'append' be more graceful?)
I'm not 100% sure if the system hang is due to disk I/O (swapping?), CPU use, or memory... running it under Windows seems to result in a rather large disk I/O by 'System', while under Linux, top was showing high CPU and memory use before it was killed. In both cases though, the rest of the system was unusable while this operation was going (tried it in the Python interpreter directly, as well as in PyCharm).
So two part question: 1) is there some 'safe' way to test code like this that won't affect the rest of the system negatively, and 2) for this specific example, how should I rewrite it?
Trying this code (which I do not recommend!):
from itertools import combinations_with_replacement as cwr
comb = []
iterable = [1,2,3,4]
for x in xrange(4,100):
    comb += cwr(iterable, x)
Thanks!
EDIT: I should have specified, but it is Python 2.7 code here (I guess the xrange makes it obvious it's not 3 anyway). The Windows machine that's hanging has 4 GB of RAM, but it looks like the hang is on disk I/O. The original problem I was (and still am) working on is a question at codewars.com about how many ways there are to make change, given a list of possible coins and an amount to make. The solution I'd come up with worked for small amounts but not big ones. Obviously, I need to come up with a better algorithm to solve that problem... so this is non-essential code, certainly. However, I would like to know if there's something I can do to set up the programming environment so that bugs in my code don't propagate and choke my system this way.
FURTHER EDIT:
I was working on the problem again tonight, and realized that I didn't need to append to a master list (as some of you hinted to me in the comments), but just work on the subset that was collected. I hadn't really given enough of the code to make that obvious, but my key problem here was the line:
comb += cwr(iterable, x)
which should have been
comb = cwr(iterable, x)
Since you are trying to compute combinations with replacement, the number of ordered selections that could be considered is 4^n (4 because your iterable has 4 items).
More generally speaking, the number of orderings is the number of elements that can occupy any spot in the list, raised to the power of the list's length.
You are trying this for n between 4 and 99; 4^99 is about 4.02 × 10^59.
I'm afraid not even a quantum computer would be much help computing that.
This isn't a very powerful laptop (3.7 GiB RAM, Intel® Celeron® CPU N2820 @ 2.13GHz × 2, 64-bit Ubuntu), but it did it in 15 s or so (it did slow noticeably; top showed 100% CPU (dual core) and 35% memory). It took about 15 s to release the memory when it finished.
len(comb) was 4,421,240
I had to change your code to
from itertools import combinations_with_replacement as cwr
comb = []
iterable = [1,2,3,4]
for x in xrange(4,100):
    comb.extend(list(cwr(iterable, x)))
EDIT: just re-tried with your original code and it does run OK. My mistake. It looks as though the memory requirement is the real issue. If you really need to do this, you could write the results to a file.
Re-EDIT: being curious about the back-of-an-envelope complexity calculation above not squaring with my experience, I tried plotting n (X axis) against the length of the list returned by combinations_with_replacement() (Y axis) for iterable lengths i = 2, 3, 4, 5. The result seems to stay below n**(i-1), which ties in with the figure I got for i = 4, n = 99 above. The exact count is (i+n-1)! / (n! * (i-1)!), which approximates to n**(i-1) / (i-1)! for n much bigger than i.
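A small sketch checking that closed form against itertools for modest n (the binomial is written out with factorials so it also runs on Python 2, which has no math.comb):

from itertools import combinations_with_replacement as cwr
from math import factorial

def cwr_count(i, n):
    # combinations with replacement of length n from i items: C(n + i - 1, i - 1)
    return factorial(n + i - 1) // (factorial(n) * factorial(i - 1))

for n in range(4, 10):
    assert len(list(cwr(range(4), n))) == cwr_count(4, n)

print(cwr_count(4, 99))                             # 171700, nowhere near 4**99
print(sum(cwr_count(4, n) for n in range(4, 100)))  # 4421240, matching len(comb) above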
Also, when running the plot I didn't keep the full comb list in memory, and this improved computer performance quite a bit, so maybe that's the relevant point: rather than producing a giant list and then working on it afterwards, do the calculations inside the loop.
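A hedged sketch of that pattern (the test inside the loop is a stand-in for whatever you actually need to compute; the point is that nothing forces you to materialize all 4.4 million tuples at once):

from itertools import combinations_with_replacement as cwr

iterable = [1, 2, 3, 4]
total = 0
for x in range(4, 100):
    for combo in cwr(iterable, x):   # tuples are consumed one at a time, never stored
        if sum(combo) % 7 == 0:      # stand-in predicate
            total += 1
print(total)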
I've come to work with a rather big simulation code which needs to store up to 189,383,040 floating point numbers. I know this is a lot, but there isn't much that can be done to avoid it, such as looking at only a portion of them or processing them one by one.
I've written a short script, which reproduces the error so I could quickly test it in different environments:
noSnapshots = 1830
noObjects = 14784
objectsDict = {}
for obj in range(0, noObjects):
    objectsDict[obj] = [[], [], []]
    for snapshot in range(0, noSnapshots):
        objectsDict[obj][0].append([1.232143454, 1.232143454, 1.232143454])
        objectsDict[obj][1].append([1.232143454, 1.232143454, 1.232143454])
        objectsDict[obj][2].append(1.232143454)
It represents the structure of the actual code, where some parameters (2 lists of length 3 each and 1 float) have to be stored for each of the 14784 objects at each of 1830 distinct locations. Obviously the numbers would be different for every object, but in this example I just went for some randomly-typed number.
The thing, which I find rather surprising, is that it fails on Windows 7 Enterprise and Home Premium with a MemoryError. Even if I run the code on a machine with 16 GB of RAM it still fails, even though there's still plenty of memory left on the machine. So the first question would be: why does this happen? I'd like to think that the more RAM I've got, the more I can store in memory.
I ran the same code on my colleague's Ubuntu 12.04 machine (again with 16 GB of RAM) and it finished with no problem. So another thing I'd like to know is: is there anything I can do to make Windows happy with this code, i.e. give my Python process more heap and stack memory?
Finally: Does anyone have any suggestions as to how to store plenty of data in memory in a manner similar to the one in the example code?
EDIT
After the answer I changed the code to:
import numpy
noSnapshots = 1830
noObjects = int(14784 * 1.3)
objectsDict = {}
for obj in range(0, noObjects):
    objectsDict[obj] = [[], [], []]
    objectsDict[obj][0].append(numpy.random.rand(noSnapshots, 3))
    objectsDict[obj][1].append(numpy.random.rand(noSnapshots, 3))
    objectsDict[obj][2].append(numpy.random.rand(noSnapshots, 1))
and it works despite the larger amount of data, which has to be stored.
In Python, every float is an object on the heap, with its own reference count, etc. For storing this many floats, you really ought to use a dense representation of lists of floats, such as numpy's ndarray.
Also, because you are reusing the same float objects, you are not estimating the memory use correctly. You have lists of references to the same single float object. In a real case (where the floats are different) your memory use would be much higher. You really ought to use ndarray.
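As a rough illustration of the difference (a sketch; exact numbers depend on the build and platform), compare the footprint of a numpy array with the equivalent list of Python float objects:

import sys
import numpy as np

n = 1000000
arr = np.random.rand(n)
floats = arr.tolist()

print(arr.nbytes)                                # 8 bytes per value: 8000000
print(sys.getsizeof(floats)                      # the list's pointer storage alone...
      + sum(sys.getsizeof(f) for f in floats))   # ...plus a ~24-byte object per float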
I am going through a link about generators that someone posted. In the beginning he compares the two functions below. On his setup he showed a speed increase of 5% with the generator.
I'm running Windows XP, Python 3.1.1, and cannot seem to reproduce the results. I keep seeing the "old way" (logs1) as slightly faster when tested with the provided logs and with up to 1 GB of duplicated data.
Can someone help me understand what's happening differently?
Thanks!
def logs1():
    wwwlog = open("big-access-log")
    total = 0
    for line in wwwlog:
        bytestr = line.rsplit(None, 1)[1]
        if bytestr != '-':
            total += int(bytestr)
    return total

def logs2():
    wwwlog = open("big-access-log")
    bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
    getbytes = (int(x) for x in bytecolumn if x != '-')
    return sum(getbytes)
*edit, spacing messed up in copy/paste
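For reference, a minimal way to time the two versions on your own machine (this assumes the big-access-log file from the presentation is in the working directory, and that logs1/logs2 are defined as above):

import timeit

print(timeit.timeit(logs1, number=5))   # the plain-loop version
print(timeit.timeit(logs2, number=5))   # the generator version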
For what it's worth, the main purpose of the speed comparison in the presentation was to point out that using generators does not introduce a huge performance overhead. Many programmers, when first seeing generators, might start wondering about the hidden costs. For example, is there all sorts of fancy magic going on behind the scenes? Is using this feature going to make my program run twice as slow?
In general that's not the case. The example is meant to show that a generator solution can run at essentially the same speed, if not slightly faster in some cases (although it depends on the situation, version of Python, etc.). If you are observing huge differences in performance between the two versions though, then that would be something worth investigating.
In David Beazley's slides that you linked to, he states that all tests were run with "Python 2.5.1 on OS X 10.4.11," and you say you're running tests with Python 3.1 on Windows XP. So, realize you're doing some apples to oranges comparison. I suspect of the two variables, the Python version matters much more.
Python 3 is a different beast than Python 2. Many things have changed under the hood, (even within the Python 2 branch). This includes performance optimizations as well as performance regressions (see, for example, Beazley's own recent blog post on I/O in Python 3). For this reason, the Python Performance Tips page states explicitly,
"You should always test these tips with your application and the version of Python you intend to use and not just blindly accept that one method is faster than another."
I should mention that one area where you can count on generators helping is in reducing memory consumption rather than CPU consumption. If you have a large amount of data where you calculate or extract something from each individual piece, and you don't need the data afterwards, generators will shine. See generator expressions for more details.
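A minimal sketch of the memory point (sizes are illustrative and vary by Python version):

import sys

squares_list = [x * x for x in range(1000000)]   # the whole result is built up front
squares_gen = (x * x for x in range(1000000))    # values are produced on demand

print(sys.getsizeof(squares_list))   # several megabytes
print(sys.getsizeof(squares_gen))    # a small, constant-size generator object
print(sum(squares_gen))              # still usable for aggregation, one value at a time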
You didn't have an answer after almost half an hour, so I'm posting something that makes sense to me, not necessarily the right answer. I figure this is better than nothing:
The first algorithm iterates over the file object directly; the file iterator reads a buffered chunk at a time and keeps yielding lines until there is nothing left to read from the input.
The second algorithm layers two generator expressions on top of that file iterator, so each line passes through two extra levels of iteration (plus the if test in the second expression), as opposed to the first algorithm's single loop body.
Also, the second algorithm calls the sum function at the end, as opposed to the first algorithm, which simply keeps adding the relevant integers as it encounters them.
As such, for sufficiently large inputs, the second algorithm does more per-line work and makes an extra function call compared to the first. This could possibly explain why it takes longer to finish than the first algorithm.
Hope this helps