In my Python code I use the grab library and try to open 3 sites from one Python script.
My current code:
from grab import Grab
...
g = Grab(interface=ip, headers=headers)
a = g.go('http://site1.com')
g = Grab(interface=ip, headers=headers)
b = g.go('http://site2.com')
g = Grab(interface=ip, headers=headers)
c = g.go('http://site3.com')
This code works fine even if I run 10 of these Python scripts at once.
But I decided it would be better to open all connections at the same time (not waiting for site "a" to load before opening site "b"), so I tried to use processes:
pa = Process(target=m_a, args=(ip,))
pb = Process(target=m_b, args=(ip,))
pc = Process(target=m_c, args=(ip,))
pa.start()
pb.start()
pc.start()
But when I try to run more than 5 Python processes, I see a "could not allocate memory" message.
Why does this code work when everything is in one Python file, but fail with "could not allocate memory" when I try to run it with a separate process for each site request?
Actually, I already use a Python process to run this script, and its __name__ != '__main__'.
In the first Python script (the one that runs this script) I use this code:
import multiprocessing

if __name__ == '__main__':
    jobs = []
    for f in [exit_error, exit_ok, return_value, raises, terminated]:
        print 'Starting process for', f.func_name
        j = multiprocessing.Process(target=f, name=f.func_name)
        jobs.append(j)
        j.start()
I am using an OpenVZ VPS with 512 MB of RAM.
Error Report:
Process Process-18:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/root/_scripts/bf/check_current.py", line 140, in worker
p.start()
File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 120, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
If the processes are running in parallel then you may indeed be running out of RAM. Open Task Manager or its equivalent and check your total allocated memory as you run.
I found a useful command to limit the stack size (and thus the memory reserved) by the Python processes:
ulimit -s 2000
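If you would rather apply the same limit from inside the Python script before any workers are forked, here is a minimal sketch using the standard resource module (my assumption; roughly equivalent to ulimit -s 2000, which takes kilobytes):

import resource

# Lower the soft stack limit to about 2000 KB before forking workers, so each
# child process reserves less virtual memory. setrlimit expects bytes.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
resource.setrlimit(resource.RLIMIT_STACK, (2000 * 1024, hard))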
Related
I'm having multiple problems with a Python (v3.7) script using multiprocessing (imported as mp hereafter). One of them is that my computations end with an "OSError: [Errno 24] Too many open files". My scripts and modules are complex, so I've broken the problem down to the following code:
import time
import multiprocessing as mp

def worker(n):
    time.sleep(1)

n = 2000
procs = [mp.Process(target=worker, args=(i,)) for i in range(n)]
nprocs = 40

i = 0
while i < n:
    if len(mp.active_children()) <= nprocs:
        print('Starting proc {:d}'.format(i))
        procs[i].start()
        i += 1
    else:
        time.sleep(1)

[p.join() for p in procs]
This code fails when approximately 1020 processes have been executed. I've always used multiprocessing in a similar fashion without running into this problem. I'm running this on a server with ~120 CPUs. Lately I've switched from Python 2.7 to 3.7; I don't know if that can be an issue.
Here's the full trace:
Traceback (most recent call last):
File "test_toomanyopen.py", line 18, in <module>
procs[i].start()
File "/p/jqueryrel/local_install/conda_envs/trois/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/p/jqueryrel/local_install/conda_envs/trois/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/p/jqueryrel/local_install/conda_envs/trois/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/p/jqueryrel/local_install/conda_envs/trois/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/p/jqueryrel/local_install/conda_envs/trois/lib/python3.7/multiprocessing/popen_fork.py", line 69, in _launch
parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files
I've seen a similar issue here, but I don't see how I can solve this.
Thanks
To put the comments into an answer, there are several options to fix this:
Increase the limit of possible open file handles. Edit /etc/security/limits.conf. E.g. see here.
Don't spawn so many processes. If you have 120 CPUs, it doesn't really make sense to spawn more than 120 procs.
Using a Pool might be helpful to restructure your code, as in the sketch below.
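For example, a minimal sketch (my assumption, reusing the worker function and the counts from the question) that replaces the manual start/join loop with a Pool capped at 40 workers:

import time
import multiprocessing as mp

def worker(n):
    time.sleep(1)

if __name__ == '__main__':
    # A Pool keeps a fixed number of worker processes (and their pipes and
    # file descriptors) alive and reuses them, so only ~40 exist at any time
    # instead of 2000.
    with mp.Pool(processes=40) as pool:
        pool.map(worker, range(2000))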
I'm facing a strange situation and I've searched on Google without any good results.
I'm running a Python script as a subprocess from a parent process, with nohup, using the subprocess package:
import os
import subprocess
import sys

cmd = list()
cmd.append("nohup")
cmd.append(sys.executable)
cmd.append(os.path.abspath(script))
cmd.append(os.path.abspath(conf_path))

_env = os.environ.copy()
if env:
    _env.update({k: str(v) for k, v in env.items()})

p = subprocess.Popen(cmd, env=_env, cwd=os.getcwd())
After some time the parent process exits and the subprocess (the one with the nohup) continues to run.
After another minute or two the process with the nohup exits and, for obvious reasons, becomes a zombie.
When running it on my local PC with Python 3.6 and Ubuntu 18.04, I can run the following code and everything works like a charm:
import psutil

comp_process = psutil.Process(pid)
if comp_process.status() == "zombie":
    comp_status_code = comp_process.wait(timeout=10)
As I said, everything works like a charm: the zombie process is removed and I get the status code of the mentioned process.
But for some reason, when doing the SAME in a Docker container with the SAME Python version and Ubuntu version, it fails after the timeout (it doesn't matter if it's 10 seconds or 10 minutes).
The error:
psutil.TimeoutExpired timeout after 60 seconds (pid=779)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/psutil/_psposix.py", line 84, in wait_pid
    retpid, status = waitcall()
  File "/usr/local/lib/python3.6/dist-packages/psutil/_psposix.py", line 75, in waitcall
    return os.waitpid(pid, os.WNOHANG)
ChildProcessError: [Errno 10] No child processes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".py", line 41, in run
    comp_status_code = comp_process.wait(timeout=60)
  File "/usr/local/lib/python3.6/dist-packages/psutil/__init__.py", line 1383, in wait
    return self._proc.wait(timeout)
  File "/usr/local/lib/python3.6/dist-packages/psutil/_pslinux.py", line 1517, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/psutil/_pslinux.py", line 1725, in wait
    return _psposix.wait_pid(self.pid, timeout, self._name)
  File "/usr/local/lib/python3.6/dist-packages/psutil/_psposix.py", line 96, in wait_pid
    delay = check_timeout(delay)
  File "/usr/local/lib/python3.6/dist-packages/psutil/_psposix.py", line 68, in check_timeout
    raise TimeoutExpired(timeout, pid=pid, name=proc_name)
psutil.TimeoutExpired: psutil.TimeoutExpired timeout after 60 seconds (pid=779)
One possibility may be the lack of an init process to reap zombies. You can fix this by running with docker run --init, or using e.g. tini. See https://hynek.me/articles/docker-signals/
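If changing how the container is started is not an option, a possible workaround (my assumption, not part of the suggestion above) is to stop calling wait() and instead treat a zombie as "already terminated", since only the parent or an init process can actually reap it and recover its exit code:

import time
import psutil

def wait_for_termination(pid, timeout=60):
    # Poll instead of psutil.Process.wait(): we are not the parent of `pid`,
    # so we cannot reap it. A zombie status already means it has terminated,
    # although its exit code stays unavailable to us.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if psutil.Process(pid).status() == psutil.STATUS_ZOMBIE:
                return True
        except psutil.NoSuchProcess:
            return True
        time.sleep(1)
    return False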
I'm queuing a task with the sbatch command and the following configuration:
#SBATCH --job-name=dask-test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=80G
#SBATCH --time=00:30:00
#SBATCH --tmp=10G
#SBATCH --partition=normal
#SBATCH --qos=normal
python ./dask-test.py
The python script is, more or less, as follows:
import pandas as pd
import dask.dataframe as dd
import numpy as np
from dask.distributed import Client, LocalCluster

print("Generating LocalCluster...")
cluster = LocalCluster()
print("Generating Client...")
client = Client(cluster, processes=False)
print("Scaling client...")
client.scale(8)

data = dd.read_csv(
    BASE_DATA_SOURCE + '/Data-BIGDATFILES-*.csv',
    delimiter=';',
)

def get_min_dt():
    min_dt = data.datetime.min().compute()
    print("Min is {}".format(min_dt))

print("Getting min dt...")
get_min_dt()
The first problem is that the text "Generating LocalCluster..." is printed 6 times, which makes me wonder whether the script is running multiple times concurrently.
Secondly, after some minutes of printing nothing, I receive the following messages:
/anaconda3/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37396 instead
http_address["port"], self.http_server.port
many times, and finally the next one, also many times:
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /cluster/home/user/anaconda3/lib/python3.7/asyncio/tasks.py:592> exception=RuntimeError('\n An attempt has been made to start a new process before the\n current process has finished its bootstrapping phase.\n\n This probably means that you are not using fork to start your\n child processes and you have forgotten to use the proper idiom\n in the main module:\n\n if __name__ == \'__main__\':\n freeze_support()\n ...\n\n The "freeze_support()" line can be omitted if the program\n is not going to be frozen to produce an executable.')>
Traceback (most recent call last):
File "/cluster/home/user/anaconda3/lib/python3.7/asyncio/tasks.py", line 599, in _wrap_awaitable
return (yield from awaitable.__await__())
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/core.py", line 290, in _
await self.start()
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
response = await self.instantiate()
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
result = await self.process.start()
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 575, in start
await self.process.start()
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 34, in _call_and_set_future
res = func(*args, **kwargs)
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 202, in _start
process.start()
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
I already tried adding more cores, adding more memory, setting processes=False when instantiating the Client, and many other things, but I can't figure out what the problem is.
The used library/software versions are:
Python 3.7
Pandas 1.0.5
Dask 2.19.0
slurm 17.11.7
Am I setting something up wrong? Is this the correct way to use the LocalCluster and Client structures?
After some research, I found a solution. I'm not entirely sure about the cause, but I am sure it works.
The instantiation of LocalCluster, Client and all the code after it (the code that will be executed in a distributed way) must NOT be at the module level of the Python script. Instead, this code must be in a function or inside the __main__ block, as follows:
import pandas as pd
import dask.dataframe as dd
import numpy as np
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    print("Generating LocalCluster...")
    cluster = LocalCluster()
    print("Generating Client...")
    client = Client(cluster, processes=False)
    print("Scaling client...")
    client.scale(8)

    data = dd.read_csv(
        BASE_DATA_SOURCE + '/Data-BIGDATFILES-*.csv',
        delimiter=';',
    )

    def get_min_dt():
        min_dt = data.datetime.min().compute()
        print("Min is {}".format(min_dt))

    print("Getting min dt...")
    get_min_dt()
This simple change makes the difference. The solution was found in this issue thread: https://github.com/dask/distributed/issues/2520#issuecomment-470817810
I have a complex Python script. Inside a loop I call a function with multiprocessing, and inside that function I call an external program (pdfinfo) with subprocess.Popen.
My program runs for a while and I can see the VIRT memory steadily increasing (with the top command), until after some time the system runs out of memory and shows this message:
Traceback (most recent call last):
File "classify_pdf.py", line 603, in <module>
preprocessing_list[loop] = da.get_preprocessing_data(batch_files, metadata, cores)
File "/home/student/.../src/data.py", line 87, in get_preprocessing_data
properties = fp.pre_extract_pdf_properties(batch_files, cores)
File "/home/student/.../src/features/pdf_properties.py", line 73, in pre_extract_pdf_properties
pool = Pool(num_cores)
File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
context=self.get_context())
File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
self._repopulate_pool()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
w.start()
File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
After interrupting the process with Ctrl-C there are still many of the Python processes running, like this (I show them with ps aux | grep python). Thousands even, and they remain even when I close the session with the server and log back in.
user1+ 53872 0.0 0.0 5444552 0 ? S Aug29 0:00 python classify_pdf.py -fp /data/allfiles/ -repo
user1+ 53873 0.0 0.0 5444552 0 ? S Aug29 0:00 python classify_pdf.py -fp /data/allfiles/ -repo
user1+ 53876 0.0 0.0 5444552 0 ? S Aug29 0:00 python classify_pdf.py -fp /data/allfiles/ -repo
But why are there still so many processes alive even after I interrupt my script? Does it have something to do with using multiprocessing and a subprocess inside a loop? Is the fork for Popen creating additional processes? And why won't they end?
BTW, the part of the code where this happens is
pool = Pool(num_cores)
res = pool.map(pdfinfo_get_pdf_properties, files)
pool.close()
pool.join()

res_fix = {}
for x in res:
    res_fix[splitext(basename(x[1]))[0]] = x[0]
return res_fix
and inside pdfinfo_get_pdf_properties this is called
output = subprocess.Popen(["pdfinfo", file_path],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE).communicate()[0].decode(errors='ignore')
I have a Python script which runs on a server with RHEL5. The server has 32 GB of memory and 8 Intel Xeon CPUs at 2.83 GHz. I think the hardware resources should not be a problem, but when I attempt to upload and process a 15-million-line text file, the program gives me an error message:
Traceback (most recent call last):
File "./test.py", line 511, in <module>
startup()
File "./test.py", line 249, in startup
cmdoutput = commands.getoutput(cmd_file)
File "/usr/local/lib/python2.6/commands.py", line 46, in getoutput
return getstatusoutput(cmd)[1]
File "/usr/local/lib/python2.6/commands.py", line 55, in getstatusoutput
pipe = os.popen('{ ' + cmd + '; } 2>&1', 'r')
OSError: [Errno 12] Cannot allocate memory
I have investigated this problem and did not find any answers that exactly match my situation. Those answers were focused on the "popen" subroutine, but I do not use that routine; I just use "commands.getoutput()" to display the file type of a document.
It should be noted that if I try to process a 10-million-line text file, this problem does not happen. It only happens when the text file is larger.
Can anyone help me out with this issue? The answer could be a better module than "commands.getoutput()". Thanks!
Your command might consume too much memory. To check, run it with the large file from a console, without Python, to see if you get any errors.
Your command might generate too much output. To check, run:
subprocess.check_call(["cmd", "arg1", "arg2"])
If it succeeds, then you should read the output incrementally and discard the processed output, e.g. line by line:
p = subprocess.Popen(["cmd", "arg1", "arg2"], stdout=subprocess.PIPE)
for line in iter(p.stdout.readline, ''):
# do something with line
print line,
p.stdout.close()
exit_code = p.wait() # wait for the process to exit