Multiprocessing memory limitation - python

Can anyone please shed some light on the maximum amount of memory that can be handled through multiprocessing?
I am running into an issue where multiprocessing.Queue fails with the following error when I run a program with arrays of roughly 40 GB. The same program works for smaller arrays:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
send(obj)
SystemError: NULL result without error in PyObject_Call
So can anyone please share some information regarding the limits when passing very large data?
Thank you
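One common way to sidestep sending an array this large through the queue at all is to back it with a numpy.memmap on disk and pass only the filename, dtype, and shape. A minimal sketch, with illustrative sizes and a file name of my own choosing:
import numpy as np
from multiprocessing import Process, Queue

def worker(q):
    # Re-open the memory-mapped array from its metadata instead of
    # receiving the data itself through the queue.
    filename, dtype, shape = q.get()
    arr = np.memmap(filename, dtype=dtype, mode="r", shape=shape)
    print(arr[:5])

if __name__ == "__main__":
    # Write the (here, tiny) array to disk once.
    arr = np.memmap("big_array.dat", dtype="float64", mode="w+", shape=(1000,))
    arr[:] = np.arange(1000)
    arr.flush()

    # Only the metadata crosses the queue, so its size no longer matters.
    q = Queue()
    q.put(("big_array.dat", "float64", (1000,)))
    p = Process(target=worker, args=(q,))
    p.start()
    p.join()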

Related

Running chameleon clustering in Python and getting "Segmentation fault (core dumped)", with detailed faulthandler info. Need help please~

I'm new to Python and programming. I've recently been working on the chameleon clustering algorithm. After running the sample code:
python -i main.py
it builds the kNN graph, but when the clustering starts it gives this error:
Building kNN graph (k = 20)...
100%|████████████████████████████████████████| 788/788 [00:03<00:00, 214.07it/s]
Begin clustering...
Segmentation fault (core dumped)
Then I imported faulthandler in main.py. It gives:
Building kNN graph (k = 20)...
100%|████████████████████████████████████████| 788/788 [00:03<00:00, 217.10it/s]
Begin clustering...
Fatal Python error: Segmentation fault
Thread 0x00007f5b5d4f4700 (most recent call first):
File "/home/alex/anaconda3/lib/python3.8/threading.py", line 306 in wait
File "/home/alex/anaconda3/lib/python3.8/threading.py", line 558 in wait
File "/home/alex/anaconda3/lib/python3.8/site-packages/tqdm/_monitor.py", line 60 in run
File "/home/alex/anaconda3/lib/python3.8/threading.py", line 932 in _bootstrap_inner
File "/home/alex/anaconda3/lib/python3.8/threading.py", line 890 in _bootstrap
Current thread 0x00007f5b7cee9740 (most recent call first):
File "/home/alex/anaconda3/lib/python3.8/site-packages/metis.py", line 676 in _METIS_PartGraphKway
File "/home/alex/anaconda3/lib/python3.8/site-packages/metis.py", line 800 in part_graph
File "/home/alex/Downloads/chameleon_cluster-master/graphtools.py", line 63 in pre_part_graph
File "/home/alex/Downloads/chameleon_cluster-master/chameleon.py", line 82 in cluster
File "main.py", line 16 in <module>
Segmentation fault (core dumped)
I checked graphtools.py and chameleon.py but couldn't fix the problem. The code used to run well on another computer with the same version of Ubuntu (20.04.3 LTS, 64-bit).
The original code can be found at https://github.com/Moonpuck/chameleon_cluster.
Any help would be highly appreciated~
I was having the same issue. I think this GitHub repo computes edge weights as 1 / distance. If your dataset values are very large (I was using a 2-dimensional dataset with 5000 rows and very large values), the edge weights will probably become too small after a few iterations. I solved this by applying mean normalization to the dataset before starting the clustering, and I can recommend trying the same solution.
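A minimal sketch of that kind of mean normalization (the function and variable names are my own, not from the repo):
import numpy as np

def mean_normalize(X):
    # Subtract each column's mean and divide by its range.
    X = np.asarray(X, dtype=float)
    col_range = X.max(axis=0) - X.min(axis=0)
    col_range[col_range == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - X.mean(axis=0)) / col_range

# Hypothetical usage: normalize the raw points before building the kNN graph.
# data = mean_normalize(np.loadtxt("dataset.csv", delimiter=","))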

'No Space Left on Device' when creating Semaphore?

I have recently started observing the following error with Python on a MacBook Pro (OS X 10.10). According to Disk Utility, about half of the 120 GB SSD remains available, so I suspect this is related not to the disk but to some other filesystem property.
What factors control the amount of space available for semaphores? What can I do to fix this problem?
$ python -c 'import multiprocessing; multiprocessing.Semaphore()'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/rmcgibbo/miniconda/envs/3.4.2/lib/python3.4/multiprocessing/context.py", line 81, in Semaphore
return Semaphore(value, ctx=self.get_context())
File "/Users/rmcgibbo/miniconda/envs/3.4.2/lib/python3.4/multiprocessing/synchronize.py", line 127, in __init__
SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX, ctx=ctx)
File "/Users/rmcgibbo/miniconda/envs/3.4.2/lib/python3.4/multiprocessing/synchronize.py", line 60, in __init__
unlink_now)
OSError: [Errno 28] No space left on device
(Note: this doesn't appear to depend on the version of Python; I get the same error with 2.7.9.)
According to the OS X manual pages, the sem_open system call used to create a semaphore can fail with ENOSPC:
[ENOSPC] O_CREAT is specified, the file does not exist, and there is insufficient space available to create the semaphore.
I suggest you use dtruss to find out where Python tries to create this semaphore.
Old post, but for me the culprit was the tmpfs partition /dev/shm, which was full.
Solution: increase its size (https://www.golinuxcloud.com/change-tmpfs-partition-size-redhat-linux/) or release some space.
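On Linux you can confirm that diagnosis from Python itself; a minimal check, assuming a tmpfs mounted at /dev/shm as in the answer above:
import shutil

# Named POSIX semaphores created by multiprocessing live on the tmpfs mounted
# at /dev/shm on most Linux systems; if it is full, sem_open fails with ENOSPC.
usage = shutil.disk_usage("/dev/shm")
print("total: %d MiB, used: %d MiB, free: %d MiB"
      % (usage.total // 2**20, usage.used // 2**20, usage.free // 2**20))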

Error calling Python module function in MySQL Workbench

I'm kind of at my wits' end here, and so far have had no feedback from the MySQL Workbench bug reporting site, so I thought I'd throw this question/problem out to more sites.
I'm attempting to migrate from an MSSQL server on a Windows Server 2003 machine to a MySQL server running on a CentOS 6.5 VM. I can connect to the source and target databases, select the schemata, and it runs through one pass for retrieving tables. After this the process fails and throws the following errors:
Traceback (most recent call last):
File "/usr/lib64/mysql-workbench/modules/db_mssql_grt.py", line 409, in reverseEngineer
reverseEngineerProcedures(connection, schema)
File "/usr/lib64/mysql-workbench/modules/db_mssql_grt.py", line 1016, in reverseEngineerProcedures
for idx, (proc_count, proc_name, proc_definition) in enumerate(cursor):
MemoryError
Traceback (most recent call last):
File "/usr/share/mysql-workbench/libraries/workbench/wizard_progress_page_widget.py", line 192, in thread_work
self.func()
File "/usr/lib64/mysql-workbench/modules/migration_schema_selection.py", line 160, in task_reveng
self.main.plan.migrationSource.reverseEngineer()
File "/usr/lib64/mysql-workbench/modules/migration.py", line 353, in reverseEngineer
self.state.sourceCatalog = self._rev_eng_module.reverseEngineer(self.connection, self.selectedCatalogName, self.selectedSchemataNames, self.state.applicationData)
SystemError: MemoryError(""): error calling Python module function DbMssqlRE.reverseEngineer
ERROR: Reverse engineer selected schemata: MemoryError(""): error calling Python module function DbMssqlRE.reverseEngineer
Failed
I initially thought this was a memory error, so I upped the memory on the box to 16 GiB. The error also occurs with databases of any size; I've tried very small ones with hardly any tables.
Any thoughts? Thanks for looking
Just in case anyone else runs into this: I had the same problem and fixed it by getting rid of non-ASCII characters in schemas, tables... basically in all MSSQL object names. This was compounded by the fact that I had SQL# (www.sqlsharp.com) installed, which adds a number of functions and stored procedures under a schema called SQL#. You can remove it with this command:
EXEC SQL#.SQLsharp_Uninstall
Once you get rid of the non-ASCII characters, the migration works.
The OP (I assume) closed their bug report with this message:
[...] I figured out a work around, or the flaw in the system perhaps. Turns out that Null values were not allowed inside the Datetime fields when doing a migration. I turned every Datetime field in my database to a default value and migrated it successfully after that.
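A sketch of that workaround on the MSSQL side, assuming pyodbc and a placeholder connection string and default date of my own (none of this comes from the bug report):
import pyodbc

# Hypothetical connection details; adjust for your server and driver.
conn = pyodbc.connect("DRIVER={SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")
cur = conn.cursor()

# Find nullable datetime columns, then replace their NULLs with a default
# value so the migration no longer trips over them.
cur.execute("""
    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE DATA_TYPE IN ('datetime', 'smalldatetime') AND IS_NULLABLE = 'YES'
""")
for schema, table, column in cur.fetchall():
    cur.execute("UPDATE [%s].[%s] SET [%s] = '1900-01-01' WHERE [%s] IS NULL"
                % (schema, table, column, column))
conn.commit()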

Applying SVD throws a Memory Error instantaneously?

I am trying to apply SVD to my matrix (3241 x 12596), obtained after some text processing (with the ultimate goal of performing Latent Semantic Analysis), and I cannot understand why this is happening, as my 64-bit machine has 16 GB of RAM. The moment svd(self.A) is called, it throws an error. The precise error is given below:
Traceback (most recent call last):
File ".\SVD.py", line 985, in <module>
_svd.calc()
File ".\SVD.py", line 534, in calc
self.U, self.S, self.Vt = svd(self.A)
File "C:\Python26\lib\site-packages\scipy\linalg\decomp_svd.py", line 81, in svd
overwrite_a = overwrite_a)
MemoryError
So I tried using
self.U, self.S, self.Vt = svd(self.A, full_matrices= False)
and this time, it throws the following error:
Traceback (most recent call last):
File ".\SVD.py", line 985, in <module>
_svd.calc()
File ".\SVD.py", line 534, in calc
self.U, self.S, self.Vt = svd(self.A, full_matrices= False)
File "C:\Python26\lib\site-packages\scipy\linalg\decomp_svd.py", line 71, in svd
return numpy.linalg.svd(a, full_matrices=0, compute_uv=compute_uv)
File "C:\Python26\lib\site-packages\numpy\linalg\linalg.py", line 1317, in svd
work = zeros((lwork,), t)
MemoryError
Is this matrix really so large that NumPy cannot handle it, and is there something I can do at this stage without changing the methodology itself?
Yes, the full_matrices parameter to scipy.linalg.svd is important: your input is highly rank-deficient (rank max 3,241), so you don't want to allocate the entire 12,596 x 12,596 matrix for V!
More importantly, matrices coming from text processing are likely very sparse. scipy.linalg.svd works on dense arrays and doesn't offer truncated SVD, which results in a) tragic performance and b) lots of wasted memory.
Have a look at the sparseSVD package from PyPI, which works on sparse input and lets you ask for only the top K factors. Or try scipy.sparse.linalg.svds, though that's not as efficient and is only available in newer versions of scipy.
Or, to avoid the gritty details completely, use a package that does efficient LSA for you transparently, such as gensim.
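A minimal sketch of the truncated, sparse route (the matrix here is random placeholder data and k is an arbitrary choice):
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Placeholder for the real term-document matrix; keep it in a sparse format.
A = sparse_random(3241, 12596, density=0.001, format="csr")

# Ask for only the top k singular triplets instead of the full decomposition.
k = 100
U, S, Vt = svds(A, k=k)

# svds returns singular values in ascending order; reverse to the usual convention.
order = np.argsort(S)[::-1]
U, S, Vt = U[:, order], S[order], Vt[order, :]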
Apparently, as it turns out (thanks to @Ferdinand Beyer), I did not notice that I was using a 32-bit version of Python on my 64-bit machine.
Using a 64-bit version of Python and reinstalling all the libraries solved the problem.

Does sending a dictionary through a multiprocessing.queue mutate it somehow?

I have a setup where I send a dictionary through a multiprocessing.Queue and do some stuff with it. I was getting an odd "dictionary changed size during iteration" error even though I wasn't changing anything in the dictionary. Here's the traceback, although it's not terribly helpful:
Traceback (most recent call last):
File "/usr/lib/python2.6/multiprocessing/queues.py", line 242, in _feed
send(obj)
RuntimeError: dictionary changed size during iteration
So I tried changing the dictionary to an immutable dictionary to see where it was getting altered. Here's the traceback I got:
Traceback (most recent call last):
File "/home/jason/src/interface_dev/jiva_interface/jiva_interface/delta.py", line 54, in main
msg = self.recv()
File "/home/jason/src/interface_dev/jiva_interface/jiva_interface/process/__init__.py", line 65, in recv
return self.inqueue.get(timeout=timeout)
File "/usr/lib/python2.6/multiprocessing/queues.py", line 91, in get
res = self._recv()
File "build/bdist.linux-i686/egg/pysistence/persistent_dict.py", line 22, in not_implemented_method
raise NotImplementedError, 'Cannot set values in a PDict'
NotImplementedError: Cannot set values in a PDict
This is a bit odd, because as far as I can tell, I'm not doing anything other than getting it from the queue. Could someone shed some light on what's happening here?
There was a bug fixed quite recently where a garbage collection run could change the size of a dictionary that contained weak references, and that could trigger the "dictionary changed size during iteration" error. I don't know if that is your problem, but the multiprocessing package does use weak references.
See http://bugs.python.org/issue7105
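If you just need to see where the failure really happens, one workaround is to pickle the dictionary yourself and put the resulting bytes on the queue; that way serialization happens in your own code rather than in the queue's background feeder thread (a sketch, not the fix from the bug report):
import pickle
from multiprocessing import Process, Queue

def worker(q):
    # Receive raw bytes and unpickle them ourselves.
    msg = pickle.loads(q.get())
    print(msg)

if __name__ == "__main__":
    q = Queue()
    data = {"a": 1, "b": 2}
    # Pickling in the producer makes any serialization error surface here,
    # instead of being reported from the feeder thread's traceback.
    q.put(pickle.dumps(data))
    p = Process(target=worker, args=(q,))
    p.start()
    p.join()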
