I have a Python script that reads files in multiple threads and saves their content to a DB.
This works fine on Windows and stays at around 500 MB of memory usage.
However, the same program builds up memory usage to the maximum available (14 GB) and essentially kills the machine.
Could this be a garbage collection problem?
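Before blaming the collector: in a threaded readers-to-DB-writer pipeline, unbounded growth is often an unbounded in-memory queue, since readers can outpace the writer. A minimal sketch of a bounded queue (the file names and the "DB insert" below are placeholders, not from the question):

```python
import queue
import threading

q = queue.Queue(maxsize=100)  # bounded: readers block instead of piling up items

def reader(paths):
    for path in paths:
        # Placeholder for reading the real file's content.
        q.put(f"content of {path}")  # blocks when the queue is full
    q.put(None)  # sentinel: tell the writer to stop

def writer(results):
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item)  # stand-in for a DB insert

results = []
t_r = threading.Thread(target=reader, args=(["a.txt", "b.txt"],))
t_w = threading.Thread(target=writer, args=(results,))
t_r.start(); t_w.start()
t_r.join(); t_w.join()
print(results)
```

With `maxsize` set, memory stays roughly proportional to the queue size regardless of how fast the readers run.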
Related
I have a Python program that was getting killed due to an out-of-memory error. I was using some deep recursion, so I decided to call gc.collect() at the beginning of the method.
This solved the problem and the program is no longer killed. Does anyone know why I had to call the garbage collector manually and it wasn't taken care of automatically?
Now I worry that this is slowing down my program, and I wonder: is there a way to configure how/how often garbage collection runs from outside my application (without calling gc.collect() from the code)?
PS: this was not a problem on my macOS machine, only when I deployed the code to a Linux VM on GCP.
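As far as I know, CPython exposes no environment variable for tuning the collector from outside the process, but you can adjust how often it runs with a single gc.set_threshold() call near startup (the values below are illustrative, not recommendations):

```python
import gc

# Current generation thresholds; the defaults are usually (700, 10, 10).
print(gc.get_threshold())

# Raise the generation-0 threshold so collections run less often.
# This trades higher transient memory for fewer GC pauses.
gc.set_threshold(10_000, 15, 15)

# gc.collect() returns the number of unreachable objects it found.
unreachable = gc.collect()
print(unreachable)
```

Note that the cyclic collector only matters for reference cycles; ordinary objects are freed immediately by reference counting.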
I have a Python script that works fine on my main computer. But when I uploaded it to an Ubuntu server it started crashing. I puzzled over the problem for a long time and finally looked at the system logs: it turned out Ubuntu was forcibly terminating the script due to lack of memory (the server has 512 MB of RAM). How can I profile the program's memory consumption under different workloads?
Have a look at something like Guppy3, which includes heapy, a 'heap analysis toolset' that can help you find where the memory's being used/held. Some links to information on how to use it are in the project's README.
If you have a core, consider using https://github.com/vmware/chap, which will allow you to look at both python and native allocations.
Once you have opened the core, probably "summarize used" is a good place to start.
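If installing nothing is preferable, the stdlib tracemalloc module is another starting point: it reports current and peak traced memory and the source lines doing the allocating. A sketch (the list comprehension is just a placeholder workload):

```python
import tracemalloc

tracemalloc.start()

# Stand-in workload: allocate a few megabytes of strings.
data = [str(i) * 100 for i in range(10_000)]

current, peak = tracemalloc.get_traced_memory()
print(f"current={current} bytes, peak={peak} bytes")

# Top allocation sites, grouped by source line.
for stat in tracemalloc.take_snapshot().statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```

Run the real workload between start() and get_traced_memory() and the top statistics usually point straight at the offending lines.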
Have a question that I can't seem to find the answer to. I have been using a Raspberry Pi to automate some scripts that pull data from SQL databases. One issue that came up a few times is that my Python process gets killed, and from the logs it looks like it's due to insufficient RAM. This is a Raspberry Pi 3B+, so only 1 GB of RAM. My question is: would there be any difference running it on, say, a 1 GB macOS system? Does another operating system / CPU architecture have better RAM management in this scenario, like writing swap files to disk? Or is a Python process something the operating system cannot affect directly like that?
Note: this is really just for my understanding of how these factors work. I'm pretty sure rewriting the code to process the data in chunks would work around the issue on the RPi.
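Chunked processing along those lines might look like the following sketch, using an in-memory sqlite3 database as a stand-in for the real SQL server (table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(i, i * 0.5) for i in range(10)])

cur = conn.execute("SELECT id, value FROM readings")
total = 0.0
while True:
    rows = cur.fetchmany(4)  # pull at most 4 rows into memory at a time
    if not rows:
        break
    total += sum(v for _, v in rows)  # process the chunk, then let it go
print(total)
```

The peak memory is bounded by the chunk size rather than the full result set, which is usually what keeps a 1 GB machine out of the OOM killer's sights.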
I am working on a Python script that pulls data from an Access database via ODBC into a SQLite database managed by Django.
The script takes a fair while to run, so I was investigating where the bottlenecks are and noticed in Task Manager that, while the script is running, python has relatively small CPU usage (<5% on a 6-core/12-thread system) but "Antimalware Service Executable" and "Windows Explorer" jump from virtually nothing to 16% and 10% respectively.
I tried adding Windows Defender exclusions for the Python directory, the source code directory, and the location of the Access DB file, but this did not have any noticeable effect.
As a bit of background, the script runs thousands of queries, so disk I/O happens quite frequently.
Is there any troubleshooting I can do to diagnose why this is happening and/or whether it is affecting performance?
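One thing worth ruling out: each individual statement and commit touches the SQLite file, and every file operation is a candidate for the antimalware filter to inspect. Batching the inserts into a single transaction with executemany cuts the number of file operations dramatically. A sketch (sqlite3 shown; the Access/ODBC side and the table are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real script would use a file path
conn.execute("CREATE TABLE parts (id INTEGER, name TEXT)")

# Pretend these rows came from Access via ODBC.
rows = [(i, f"part-{i}") for i in range(1000)]

# One executemany inside a single transaction, instead of 1000
# separate INSERT statements each with their own commit.
with conn:
    conn.executemany("INSERT INTO parts VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM parts").fetchone()[0]
print(count)
```

If the Defender CPU usage drops after batching, that points at per-statement file churn rather than the queries themselves.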
I'm looking at using inotify to watch about 200,000 directories for new files. On creation, the watching script will process the file, which will then be removed. Because this is part of a more complex system with many processes, I want to benchmark it and gather system performance statistics (CPU, memory, disk, etc.) while the tests run.
I'm planning on running the inotify script as a daemon and having a second script generating test files in several of the directories (randomly selected before the test).
I'm after suggestions for the best way to benchmark the performance of something like this, especially the impact it has on the Linux server it's running on.
I would try to remove as many other processes as possible in order to get a repeatable benchmark. For example, I would set up a separate, dedicated server with an NFS mount to the directories. This server would only run inotify and the Python script. For simple server measurements, I would use top or ps to monitor CPU and memory.
The real test is how quickly your script "drains" the directories, which depends entirely on your process. You could profile the script and see where it's spending the time.
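A minimal drain-rate harness might look like this sketch, which uses os.scandir polling as a portable stand-in for the inotify watcher (the file counts and the empty "processing" step are made up for illustration):

```python
import os
import tempfile
import time

def drain(directory):
    """Process and delete every file currently in the directory."""
    handled = 0
    for entry in os.scandir(directory):
        if entry.is_file():
            # Placeholder for the real per-file processing.
            os.unlink(entry.path)
            handled += 1
    return handled

with tempfile.TemporaryDirectory() as d:
    # Generate a batch of test files, as the second script would.
    for i in range(500):
        with open(os.path.join(d, f"f{i}"), "w") as f:
            f.write("x")

    start = time.perf_counter()
    n = drain(d)
    elapsed = time.perf_counter() - start
    print(f"drained {n} files in {elapsed:.4f}s "
          f"({n / elapsed:.0f} files/s)")
```

Swapping the polling loop for the real inotify handler and scaling up the file generator gives a files-per-second number you can compare across runs while top/ps records CPU and memory.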