Python - Garbage collection very slow, can't disable gc


I am developing a program that uses pandas dataframes and large dictionaries. The dataframe is read from a CSV file of approximately 700 MB.
I am using Python 3.7.3 on Windows.
I noticed that the program is extremely slow, and it gets slower after each iteration of the algorithm.
The program reads every row of the dataframe, checks some conditions on every item of every row, and if those conditions are met, it stores the item and its state in a dictionary. This dictionary can get pretty big.
I have tried profiling my code with cProfile, and I found that the garbage collector accounts for about 90% of the execution time.
I have seen similar problems solved by calling gc.disable(), but it had no effect for me.
Oddly (I have no idea whether this is normal), if I print(len(gc.get_objects())) as the first line of the code, I get 51053, which seems like a lot considering no function has been called yet.
My cProfile attempt (on a small chunk of the CSV, as running it on the full CSV would take hours):
import cProfile
import pstats
from pstats import SortKey
# get_pfs_errors is the program's own entry point
cProfile.run('get_pfs_errors("Logs/L5/L5_2000.csv")', 'restats.txt')
p = pstats.Stats('restats.txt')
p.sort_stats(SortKey.CUMULATIVE).print_stats(10)
p.sort_stats(SortKey.TIME).print_stats(10)
Here are the stats from cProfile:
Tue Jun 18 15:40:19 2019 restats.txt
1719320 function calls (1459451 primitive calls) in 7.569 seconds
Ordered by: cumulative time
List reduced from 819 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 7.569 7.569 {built-in method builtins.exec}
1 0.001 0.001 7.569 7.569 <string>:1(<module>)
1 0.000 0.000 7.568 7.568 C:/Users/BC744818/Documents/OPTISS_L1_5/test_profile.py:6(get_pfs_errors)
1 0.006 0.006 7.503 7.503 C:\Users\BC744818\Documents\OPTISS_L1_5\utils\compute_pfs_rules.py:416(compute_pfs_rules)
1 0.197 0.197 7.498 7.498 C:\Users\BC744818\Documents\OPTISS_L1_5\utils\compute_pfs_rules.py:323(test_logs)
264 0.001 0.000 6.532 0.025 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\series.py:982(__setitem__)
529 0.010 0.000 6.158 0.012 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\generic.py:3205(_check_setitem_copy)
528 6.125 0.012 6.125 0.012 {built-in method gc.collect}
264 0.004 0.000 3.430 0.013 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\series.py:985(setitem)
264 0.004 0.000 3.413 0.013 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\indexing.py:183(__setitem__)
Tue Jun 18 15:40:19 2019 restats.txt
1719320 function calls (1459451 primitive calls) in 7.569 seconds
Ordered by: internal time
List reduced from 819 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
528 6.125 0.012 6.125 0.012 {built-in method gc.collect}
264 0.405 0.002 0.405 0.002 {built-in method gc.get_objects}
1 0.197 0.197 7.498 7.498 C:\Users\BC744818\Documents\OPTISS_L1_5\utils\compute_pfs_rules.py:323(test_logs)
71280/33 0.048 0.000 0.091 0.003 C:\Users\BC744818\AppData\Local\Programs\Python\Python37\lib\copy.py:132(deepcopy)
159671 0.033 0.000 0.056 0.000 {built-in method builtins.isinstance}
289 0.026 0.000 0.026 0.000 {built-in method nt.stat}
167191/83791 0.024 0.000 0.040 0.000 C:\Users\BC744818\AppData\Local\Programs\Python\Python37\lib\json\encoder.py:333(_iterencode_dict)
8118/33 0.019 0.000 0.090 0.003 C:\Users\BC744818\AppData\Local\Programs\Python\Python37\lib\copy.py:236(_deepcopy_dict)
167263/83794 0.017 0.000 0.048 0.000 C:\Users\BC744818\AppData\Local\Programs\Python\Python37\lib\json\encoder.py:277(_iterencode_list)
1067/800 0.017 0.000 0.111 0.000 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\indexes\base.py:253(__new__)
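To see which function is actually issuing those gc.collect calls, pstats can list the callers of a given entry. The sketch below uses a hypothetical noisy_setter that forces a full collection on every assignment, standing in for the real code; run against the real restats.txt, the same print_callers("gc.collect") call should point at the caller in the listing.

```python
import cProfile
import gc
import io
import pstats


def noisy_setter(n):
    # Hypothetical stand-in for code that triggers an explicit
    # full collection on every assignment.
    data = {}
    for i in range(n):
        data[i] = i
        gc.collect(2)
    return data


profiler = cProfile.Profile()
profiler.enable()
noisy_setter(5)
profiler.disable()

buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
# Restrictions are regular expressions matched against entries such as
# "{built-in method gc.collect}"; each matching entry is listed with its callers.
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_callers("gc.collect")
report = buf.getvalue()
print(report)
```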

Thank you @user9993950, I solved it thanks to you.
When I tested this program, I got a SettingWithCopyWarning, but I wanted to fix the speed of the program before fixing the warning.
As it turns out, fixing the warning also greatly increased the speed of the program, and gc no longer takes up most of the running time.
I don't know what caused this, though. If someone knows, please share.
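A hedged explanation, based only on the profile above: pandas' _check_setitem_copy (the check behind SettingWithCopyWarning) has a cumulative time of 6.158 s, almost all of which is the 528 calls to gc.collect (6.125 s). That pattern suggests the chained assignments were making pandas trigger explicit collections, which would also explain why gc.disable() did not help: it only switches off automatic collection, while explicit gc.collect() calls still run. The standard-library sketch below demonstrates that last point; the callback name is illustrative.

```python
import gc

collections_run = []


def record_collection(phase, info):
    # gc invokes callbacks with phase "start" and "stop" around each collection.
    if phase == "start":
        collections_run.append(info["generation"])


gc.disable()  # switches off *automatic* collection only
gc.callbacks.append(record_collection)
try:
    enabled_during = gc.isenabled()
    gc.collect(2)  # an explicit full collection still runs
finally:
    gc.callbacks.remove(record_collection)
    gc.enable()

print(enabled_during)   # False
print(collections_run)  # [2]
```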

Related

Why does python call builtins.compile when importing numpy?

I ran this code with Python 3.7 to see what happens when I import numpy.
import cProfile, pstats
profiler = cProfile.Profile()
profiler.enable()
import numpy
profiler.disable()
# Get and print table of stats
stats = pstats.Stats(profiler).sort_stats('time')
stats.print_stats()
The first few lines of output look like this:
79557 function calls (76496 primitive calls) in 0.120 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
32/30 0.015 0.000 0.017 0.001 {built-in method _imp.create_dynamic}
318 0.015 0.000 0.015 0.000 {built-in method builtins.compile}
115 0.011 0.000 0.011 0.000 {built-in method marshal.loads}
648 0.006 0.000 0.006 0.000 {built-in method posix.stat}
119 0.004 0.000 0.005 0.000 <frozen importlib._bootstrap_external>:914(get_data)
246/244 0.004 0.000 0.007 0.000 {built-in method builtins.__build_class__}
329 0.002 0.000 0.012 0.000 <frozen importlib._bootstrap_external>:1356(find_spec)
59 0.002 0.000 0.002 0.000 {built-in method posix.getcwd}
It spends a lot of time in builtins.compile. Is it compiling NumPy's bytecode for __pycache__? Why would that happen every time?
I'm on macOS. What I really want is to speed up the import, and it seems to me that compile should not be necessary.

User L3viathan pointed out in a comment that NumPy's source code contains explicit calls to compile(). This explains why builtins.compile is being called. Thanks!
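The answer's point can be checked with a small experiment: any explicit compile() call made by Python code shows up in the profile as builtins.compile, independently of bytecode caching. The build_function helper below is hypothetical and stands in for library code that generates functions at runtime.

```python
import cProfile
import pstats


def build_function(src, name):
    # Hypothetical helper: compiles source text at runtime, the way some
    # libraries generate code while they are being imported.
    code = compile(src, "<generated>", "exec")
    namespace = {}
    exec(code, namespace)
    return namespace[name]


profiler = cProfile.Profile()
profiler.enable()
funcs = [build_function(f"def f{i}():\n    return {i}\n", f"f{i}") for i in range(3)]
profiler.disable()

stats = pstats.Stats(profiler)
# Built-in functions are keyed as ("~", 0, "<built-in method ...>").
compile_calls = sum(value[1] for key, value in stats.stats.items()
                    if key[0] == "~" and "compile" in key[2])
print(compile_calls)  # 3
stats.sort_stats("tottime").print_stats(5)
```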

How to time disassembled representation of a Python code source

So far, when I want to find out why some piece of code runs faster than a very similar one, I use the dis module. However, my only way of narrowing down the cause after that is to add or remove lines and compare.
Is there a more sophisticated way of listing what the biggest offenders are?

What kind of code do you want to analyze? If you want to analyze pure Python code, you can use a profiler such as cProfile. For example:
import cProfile
cProfile.run("x=1")
Or you can run a function: cProfile.run("function()")
Then it will show you something like the following:
4 function calls in 0.013 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.013 0.013 0.013 0.013 <ipython-input-7-8201fb940887>:1(fun)
1 0.000 0.000 0.013 0.013 <string>:1(<module>)
1 0.000 0.000 0.013 0.013 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
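To list the biggest offenders rather than read the table by eye, the Stats object's stats dictionary can be ranked directly. A minimal sketch, assuming two hypothetical variants of the same routine (a list lookup and a set lookup) that differ only in the container used:

```python
import cProfile
import pstats


def list_lookup(n):
    # Membership test on a list: O(n) per lookup.
    items = list(range(n))
    hits = 0
    for i in range(n):
        if i in items:
            hits += 1
    return hits


def set_lookup(n):
    # Same logic with a set: O(1) on average per lookup.
    items = set(range(n))
    hits = 0
    for i in range(n):
        if i in items:
            hits += 1
    return hits


def workload():
    return list_lookup(3000), set_lookup(3000)


profiler = cProfile.Profile()
profiler.runcall(workload)
stats = pstats.Stats(profiler)

# stats.stats maps (filename, lineno, funcname) -> (cc, nc, tottime, cumtime, callers).
cumulative = {key[2]: value[3] for key, value in stats.stats.items()
              if key[2] in ("list_lookup", "set_lookup")}
ranking = sorted(cumulative, key=cumulative.get, reverse=True)
print(ranking)  # the variant with the largest cumulative time comes first
stats.sort_stats("cumulative").print_stats(5)
```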

What does cProfile's output "{built-in method load}" mean?

I can't understand what the {built-in method load} output in cProfile means.
I know this question has been asked before, but I saw no real answer, and I couldn't figure it out by myself either.
The executed script (Go_IA_vs_IA.py) imports functions from Go_settings.py and uses them. You can find them here if that is useful: https://github.com/Ashargin/Go
Here is what I get:
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
25 3.793 0.152 3.793 0.152 {built-in method load}
1 0.071 0.071 3.938 3.938 Go_IA_vs_IA.py:1(<module>)
28481/205 0.027 0.000 0.061 0.000 copy.py:137(deepcopy)
5930/918 0.010 0.000 0.046 0.000 copy.py:215(_deepcopy_list)
... more lines
Obviously I want to optimize that line, as it accounts for 96% of the time spent.
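One hedged way to identify which load is involved is to ask pstats for its callers: the caller entry points at the line of code doing the loading. The example below uses pickle.load purely as a hypothetical stand-in, and load_settings is an invented helper; the script in the question may be calling a different load function.

```python
import cProfile
import io
import pickle
import pstats


def load_settings(blob):
    # Hypothetical loader: deserializes a pickled object from bytes.
    return pickle.load(io.BytesIO(blob))


blob = pickle.dumps({"board_size": 19, "weights": list(range(10000))})

profiler = cProfile.Profile()
profiler.enable()
for _ in range(25):
    load_settings(blob)
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
# Show who calls every entry whose name contains "load".
stats.print_callers("load")
report = out.getvalue()
print(report)

# Programmatic check: built-in entries are keyed as ("~", 0, "<built-in ...>").
builtin_loads = [key for key in stats.stats if key[0] == "~" and "load" in key[2]]
```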

Why is this lambda involving a list.index() call so slow?

Using cProfile:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 17.834 17.834 <string>:1(<module>)
1 0.007 0.007 17.834 17.834 basher.py:5551(_refresh)
1 0.000 0.000 10.522 10.522 basher.py:1826(RefreshUI)
4 0.024 0.006 10.517 2.629 basher.py:961(PopulateItems)
211 1.494 0.007 7.488 0.035 basher.py:1849(PopulateItem)
231 0.074 0.000 6.734 0.029 {method 'sort' of 'list' objects}
215 0.002 0.000 6.688 0.031 bosh.py:4764(getOrdered)
1910 3.039 0.002 6.648 0.003 bosh.py:4770(<lambda>)
253 0.178 0.001 5.600 0.022 bosh.py:3325(getStatus)
1 0.000 0.000 5.508 5.508 bosh.py:4327(refresh)
1911 3.051 0.002 3.330 0.002 {method 'index' of 'list' objects}
The 1910 3.039 0.002 6.648 0.003 bosh.py:4770(<lambda>) line puzzles me. At bosh.py:4770 I have modNames.sort(key=lambda a: (a in data) and data.index(a)), data and modNames being lists. Notice 1911 3.051 0.002 3.330 0.002 {method 'index' of 'list' objects} which seems to come from this line.
So why is this so slow? Is there any way I can rewrite this sort() so that it runs faster?
EDIT: a final ingredient I was missing to grok this lambda:
>>> True and 3
3

As YardGlassOfCode stated, it's not the lambda per se which is slow, it is the O(n) operation inside the lambda which is slow. Both a in data and data.index(a) are O(n) operations, where n is the length of data. And as an additional affront to efficiency, the call to index repeats much of the work done in a in data too. If the items in data are hashable, then you can speed this up considerably by first preparing a dict:
weight = dict(zip(data, range(len(data))))
modNames.sort(key=weight.get) # Python2, or
modNames.sort(key=lambda a: weight.get(a, -1)) # works in Python3
This is much quicker because each dict lookup is an O(1) operation.
Note that modNames.sort(key=weight.get) relies on None comparing as less than integers, which holds in Python 2:
In [39]: None < 0
Out[39]: True
In Python 3, None < 0 raises a TypeError, so lambda a: weight.get(a, -1) is used to return -1 when a is not in weight.
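The sketch below checks that a dict-based key can reproduce the original lambda's ordering exactly. Two details differ from the answer's version: it keeps the first index of duplicated entries (matching list.index, whereas dict(zip(...)) keeps the last), and it maps missing names to 0, which is what False and ... evaluates to in the original. The function names and sample data are made up for the test.

```python
import random


def original_sort(mod_names, data):
    names = list(mod_names)
    # The lambda from the question: O(len(data)) work per key.
    names.sort(key=lambda a: (a in data) and data.index(a))
    return names


def fast_sort(mod_names, data):
    # Record the first index of each name, matching list.index semantics.
    first_index = {}
    for i, name in enumerate(data):
        first_index.setdefault(name, i)
    names = list(mod_names)
    # Missing names get 0, mirroring `False and ...` evaluating to False (== 0).
    names.sort(key=lambda a: first_index.get(a, 0))
    return names


rng = random.Random(0)
data = [f"mod{rng.randrange(50)}" for _ in range(40)]        # contains duplicates
mod_names = [f"mod{rng.randrange(60)}" for _ in range(30)]   # some not in data
print(original_sort(mod_names, data) == fast_sort(mod_names, data))  # True
```

Because Python's sort is stable and both keys produce the same comparison results, ties (including missing names) keep their original relative order in both versions.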

Does effective Cython cProfiling imply writing many sub functions?

I am trying to optimize some code with Cython, but cProfile is not providing enough information.
To do a good job of profiling, should I create many subroutines func2, func3, ..., func40?
Note below that I have a function func1 in mycython.pyx, but it has many for loops and internal manipulations, and cProfile does not give me stats for those loops.
2009 function calls in 81.254 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 81.254 81.254 <string>:1(<module>)
2 0.000 0.000 0.021 0.010 blah.py:1495(len)
2000 0.000 0.000 0.000 0.000 blah.py:1498(__getitem__)
1 0.214 0.214 0.214 0.214 mycython.pyx:718(func2)
1 80.981 80.981 81.216 81.216 mycython.pyx:743(func1)
1 0.038 0.038 81.254 81.254 {mycython.func1}
2 0.021 0.010 0.021 0.010 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

Yes, it does. The finest granularity available to cProfile is a function call. You must split up func1 into multiple functions. (Note that you can make them functions defined inside func1 and thus only available to func1.)
If you want finer-grained profiling (line-level), then you need a different profiler. Take a look at this line-level profiler, but I don't think it works for Cython.
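In plain Python, the split suggested above can be sketched as follows; the helpers build_table, row_sums and total are hypothetical. Each nested helper gets its own row in the cProfile output even though it is only visible inside func1. For Cython code, these entries only appear if profiling support is enabled for the module.

```python
import cProfile
import pstats


def func1(n):
    # Helpers defined inside func1 are private to it, but cProfile
    # still records each one as a separate entry.
    def build_table(n):
        return [[i * j for j in range(n)] for i in range(n)]

    def row_sums(table):
        return [sum(row) for row in table]

    def total(sums):
        return sum(sums)

    table = build_table(n)
    return total(row_sums(table))


profiler = cProfile.Profile()
result = profiler.runcall(func1, 300)
stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats(6)
names = {key[2] for key in stats.stats}
```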

You need to enable profiling support for your Cython code. Use
# cython: profile=True
http://docs.cython.org/src/tutorial/profiling_tutorial.html
