Process exits with status 139 when trying to instantiate a SentenceTransformer - python

Description
Hi all, I'm running into a strange error while trying to run the following:
import sentence_transformers as st
encoder = st.SentenceTransformer("all-mpnet-base-v2") # also tried sentence-t5-base and all-MiniLM-L6-v2
I get an error: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV), and if I run it as a script, I get zsh: segmentation fault python. I also get a warning: multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appears to be 1 leaked semaphore objects to clean up at shutdown.
Reproducibility
I'm afraid I don't know how to reproduce this problem, it was working until it stopped working. Here are my specs:
MacBook Pro 2019, Intel Core i9, 32GB RAM
macOS Monterey 12.0.1
python 3.10.2
sentence_transformers==2.2.0
torch==1.11.0
What I've tried so far
clear the cache from ~/.cache/torch/sentence_transformers
reinstall sentence_transformers (also tried downgrading to 2.1.0)
reinstalling torch
recreating my virtual environment
rebooting
I've also managed to trace this problem to a specific point in the torch codebase, and I've started a discussion in a GitHub issue.
Any help would be appreciated, thanks in advance!
Update: I managed to fix the problem by downgrading to Python 3.9.11. I'm not actually sure whether the Python version is to blame here, or whether just having a fresh Python environment did the trick (perhaps the virtual environment wasn't enough). Anyway, this isn't really a solution but a workaround, so if anyone has any better suggestions, please let us know!
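One general way to get more context on a crash like this is the stdlib faulthandler module, which dumps a Python-level traceback when the process receives SIGSEGV. This is a generic debugging sketch, not something from the original report; the idea is to enable it at the very top of the script, before the import that crashes:

```python
import faulthandler

# Dump a Python traceback to stderr if the process later segfaults.
# Call this before importing the library that crashes
# (sentence_transformers in the case above).
faulthandler.enable()

print(faulthandler.is_enabled())  # True
```

The traceback it prints on a crash usually points at the extension-module call where the segfault happened, which narrows down which package to report the bug against.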

Related

Can't use .ipynb files in Jupyter/VS Code

It's my first post, so do tell me if you require more specifics.
Some details may not be too relevant here, but I want to be as detailed as possible with the timeline:
I had been using Jupyter for my .ipynb files for quite some time until I discovered TensorFlow. At first everything was fine after installing the module, but ever since I tried to use TensorFlow to detect and utilise my GPU, everything went south. I tried things like downloading some NVIDIA software that my laptop does in fact support, and eventually got TensorFlow to detect my GPU. But the moment I tried to train a CNN model, however simple the layers, my kernel would crash. Eventually I used Kaggle/Colab as a temporary solution, but now I want to fix it.
After trying to fix the TensorFlow issue, or at least revert to when TensorFlow ran just fine on my CPU alone, to no avail, I eventually decided to do a hard reset and deleted Python/Anaconda entirely from my computer.
After reinstalling Anaconda, I booted up Jupyter and saw a python3 ipykernel that was most likely preinstalled with Anaconda, and I could run a simple hello world just fine. However, I realised that after pip-installing TensorFlow, my 'old' TensorFlow settings were still there and it could still detect my GPU, so the kernel crashed yet again.
So I thought, why not make a completely new environment so I can install a completely fresh version of TensorFlow? Then I realised that Jupyter couldn't detect the new environment I made (I don't know if it's because of ipykernel, but I did pip install ipykernel in the correct environment and it's still not detected).
My next idea was VS Code. It did detect the new environment, but when running print('hello world') I was told: 'The kernel failed to start as a dll could not be loaded. View Jupyter log for further details.'
I'm really lost as to what to do now. All I want at this point is to use TensorFlow (CPU or GPU, I really don't care anymore) in either VS Code or Jupyter. As long as my files are .py, I should be able to run them in any environment just fine (though I didn't test the TensorFlow module in .py files, because I don't see a point in training a model in a .py file).
I use Windows 10, if that helps.
I'm sorry if I gave unnecessary details. I'd appreciate advice on anything I'm doing wrong or misunderstanding, and please do dumb it down for me with appropriate explanations if possible. Thanks! I can also be contacted on a voice call in Discord if you think typing it all out is too much of a hassle.

Using the non-macOS version of TensorFlow on a Mac M1

I recently purchased a new Mac laptop with the M1 chip. I've previously done work using TensorFlow and am trying to continue on my new laptop. I'm aware that Apple has a tensorflow_macos fork of TensorFlow, but unfortunately my project needs higher-order derivatives, which the fork is not currently capable of computing. Is there a way to run the non-macOS-specific version of TensorFlow on an M1?
If I attempt to import tensorflow from an arm terminal, I get the error:
python3
import tensorflow
zsh: illegal hardware instruction python3
The same happens if I run from a Rosetta terminal.
I've also tried running from PyCharm (with a Python 3.8 virtual environment), but I receive a different problem on trying to import tensorflow:
Process finished with exit code 132 (interrupted by signal 4: SIGILL)
I have Python 3.8.5 installed. I've seen on other related questions that there should be two versions of python3 installed by default on a Mac (arm64 and x86_64), but it looks like I only have one version:
file $(which python3)
/Users/xxx/opt/miniconda3/bin/python3: Mach-O 64-bit executable x86_64
I'm inexperienced when it comes to handling different core architectures, so any advice would be appreciated. I'm happy to provide clarification for any specifics.
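A quick way to confirm which architecture the currently running interpreter was built for (a generic sketch, not from the original post) is the stdlib platform module:

```python
import platform

# Reports the machine type of the running interpreter:
# 'arm64' for a native Apple Silicon build, 'x86_64' for an
# Intel or Rosetta build. This mirrors what
# `file $(which python3)` reports for the binary on disk.
print(platform.machine())
```

Running this inside PyCharm or each terminal shows whether a given environment is actually the arm64 or x86_64 build, which is the root of most "illegal hardware instruction" errors on M1 machines.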

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

I'm trying to execute a Python script, but I am getting the following error:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
I'm using Python 3.5.2 on Linux Mint 18.1 Serena.
Can someone tell me why this happens, and how I can solve it?
The SIGSEGV signal indicates a "segmentation violation" or a "segfault". More or less, this equates to a read or write of a memory address that's not mapped in the process.
This indicates a bug in your program. In a Python program, this is either a bug in the interpreter or in an extension module being used (and the latter is the most common cause).
To fix the problem, you have several options. One option is to produce a minimal, self-contained, complete example which replicates the problem and then submit it as a bug report to the maintainers of the extension module it uses.
Another option is to try to track down the cause yourself. gdb is a valuable tool in such an endeavor, as is a debug build of Python and all of the extension modules in use.
After you have gdb installed, you can use it to run your Python program:
gdb --args python <more args if you want>
And then use gdb commands to track down the problem. If you use run then your program will run until it would have crashed and you will have a chance to inspect the state using other gdb commands.
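If you want a known-bad program to practice this gdb workflow on, a deliberate null-pointer read via ctypes reliably segfaults CPython. The sketch below (my own illustration, not from the answer) runs the crash in a child interpreter so the parent survives:

```python
import subprocess
import sys

# ctypes.string_at(0) dereferences a null pointer inside the
# interpreter, killing the child process with SIGSEGV. On POSIX,
# subprocess reports this as a negative returncode (-signal number).
crash = "import ctypes; ctypes.string_at(0)"
proc = subprocess.run([sys.executable, "-c", crash])
print(proc.returncode)
```

Running `gdb --args python -c "import ctypes; ctypes.string_at(0)"` directly gives you a crashed process to inspect with `bt` and the other gdb commands mentioned above.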
Another possible cause (which I encountered today) is that you're trying to read/write a file that is already open. In this case, simply closing the file and rerunning the script solved the issue.
After some time I discovered that I was running a new TensorFlow version that errors out on older computers. I solved the problem by downgrading TensorFlow to 1.4.
When I encountered this problem, I realised there were some memory issues. Rebooting the PC solved it for me.
This can also be the case if your Cython code is trying to access a variable out of bounds:
ctypedef struct ReturnRows:
    double[10] your_value

cdef ReturnRows s_ReturnRows  # Allocate memory for the struct
s_ReturnRows.your_value = [0] * 12  # 12 values into a 10-element array
will fail with
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
For me, I was using the OpenCV library to apply SIFT.
In my code, I replaced cv2.SIFT() with cv2.SIFT_create() and the problem was gone.
Deleting the Python interpreter and the 'venv' folder solved my error.
I got this error in PHP while running PHPUnit. The reason was a circular dependency.
I received the same error when trying to connect to an Oracle DB using the pyodbc module:
connection = pyodbc.connect()
The error occurred on these occasions:
The DB connection had been opened multiple times in the same Python file
While in debug mode, a breakpoint was reached while the connection to the DB was still open
The error could be avoided with the following approaches:
Open the DB only once and reuse the connection at all needed places
Properly close the DB connection after using it
Hope that helps anyone!
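The open-once, close-deterministically pattern can be sketched with a context manager. This example uses the stdlib sqlite3 module purely as a stand-in (pyodbc connections follow the same idea); the table and function names are made up for illustration:

```python
import sqlite3
from contextlib import closing

def total(rows):
    # Open the connection once, reuse it for all statements, and let
    # contextlib.closing() guarantee it is closed even on exceptions.
    # (sqlite3's own `with conn:` only manages transactions, not closing.)
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE t (v INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(r,) for r in rows])
        return conn.execute("SELECT SUM(v) FROM t").fetchone()[0]

print(total([1, 2, 3]))  # 6
```

Keeping a single connection behind one helper like this avoids both failure modes listed above: repeated opens and connections left dangling.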
11 : SIGSEGV - this signal arises when a memory segment is illegally accessed.
There is a module named signal in Python through which you can handle this kind of OS signal.
If you want to ignore the SIGSEGV signal, you can do this:
import signal
signal.signal(signal.SIGSEGV, signal.SIG_IGN)
However, ignoring the signal can cause inappropriate behaviour in your code, so it is better to handle SIGSEGV with your own handler, like this:
def SIGSEGV_signal_arises(signalNum, stack):
    print(f"{signalNum} : SIGSEGV arises")
    # Your code

signal.signal(signal.SIGSEGV, SIGSEGV_signal_arises)
I encountered this problem when I was trying to run my code on an external GPU which was disconnected. I had set os.environ['PYOPENCL_CTX'] = '2', where GPU 2 was not connected, so I just needed to change the code to os.environ['PYOPENCL_CTX'] = '1'.
For me these three lines of code already reproduced the error, no matter how much free memory was available:
import numpy as np
from sklearn.cluster import KMeans
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=1, random_state=0).fit(X)
I could solve the issue by removing and reinstalling the scikit-learn package. A very similar solution to this one.
This can also occur when nesting thread pools with concurrent.futures - for example, calling .map inside another .map call.
It can be solved by removing one of the .map calls.
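The "remove one of the .map calls" idea can be sketched by flattening the two nested levels into a single iterable and a single executor (the `work` function here is hypothetical, just to make the sketch runnable):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def work(pair):
    a, b = pair
    return a * b

# Instead of one .map per level of nesting, combine the two input
# sequences up front and submit everything to a single .map call.
with ThreadPoolExecutor() as ex:
    results = list(ex.map(work, product([1, 2], [3, 4])))

print(results)  # [3, 4, 6, 8]
```

A single flat pool also avoids the classic deadlock where inner tasks wait on workers that the outer tasks are already occupying.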
I had the same issue working with KMeans from scikit-learn.
Upgrading from scikit-learn 1.0 to 1.0.2 solved it for me.
This issue is often caused by incompatible libraries in your environment. In my case, it was the pyspark library.
In my case, reverting my most recent conda installs fixed the situation.
I got this error when importing monai. It was solved after I created a new conda environment. Possible reasons I could imagine were either that there was some conflict between different packages, or that my environment name was the same as the name of the package I wanted to import (monai).
Found on another page.
Interpreter: Python 3.8
cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
This solved the issue for me.
I was getting SIGSEGV with Python 2.7; after upgrading to 3.8 I got a different error with OpenCV, and found the answer in "OpenCV 4.0.0 SystemError: <class 'cv2.CascadeClassifier'> returned a result with an error set".
But eventually one line of code fixed it.

Centos 6.6 segmentation fault in Anaconda3 with glibc 2.14

I've been trying to understand this situation:
I want to use Python packages in Anaconda3 that require glibc 2.14. As CentOS 6.x only ships glibc 2.12, I've compiled glibc 2.14 and installed it to /opt/glibc-2.14.
I'm installing Anaconda3. The test I run looks like this:
With system default glibc it works:
/opt/anaconda3/bin/python -c "import pandas"
but with compiled glibc
export LD_LIBRARY_PATH=/opt/glibc-2.14/lib/:$LD_LIBRARY_PATH
/opt/anaconda3/bin/python -c "import pandas"
It works on some machines... I set up over 20 VMs; on some machines it always works, on some it never works, and I receive: Segmentation fault (core dumped). On most of the machines it doesn't work.
Does anyone have any idea why this strange situation occurs, or has anyone experienced problems like this?
Does anyone have any idea why this strange situation occurs
As this answer explains, what you are doing is not supposed to work: you have a mismatch between ld-linux and libc.so.6.
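As a quick sanity check on this kind of setup (my own sketch, not part of the original answer), the stdlib platform module reports which libc the running interpreter was linked against, which helps verify what LD_LIBRARY_PATH actually changed:

```python
import platform

# Returns e.g. ('glibc', '2.31') on a glibc-based Linux system;
# may return ('', '') on platforms where it cannot be determined.
lib, version = platform.libc_ver()
print(lib, version)
```

Comparing this output with and without the LD_LIBRARY_PATH override shows whether the interpreter really picked up the compiled glibc 2.14.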
After some more investigating I've noticed that assigning more memory to the lab machines (from 2/4 GB to 6 GB and more) makes the segmentation fault go away. However, the problem still exists on a production machine with 32 GB of RAM. Really strange.
Right now I've found a workaround: new Python packages from Anaconda that are compatible with glibc 2.12 (they became available a few days ago), whose dependencies also don't require a newer glibc.
@Employed Russian:
Thanks, but it's probably not the problem with multiple glibc libraries on a single host. In my case Python works with the additional glibc. The problem is that the segmentation fault shows up on random machines while only using the new glibc. Also, I'm using other Python packages that require glibc 2.14 to work, so I'm aware which version of glibc I'm using at the moment.
Also, if there were some kind of mismatch in libraries then it shouldn't work at all (...probably).
Also, as I mentioned at the beginning, I've noticed that the problem is related to memory (though I'm still not sure what is happening on the machine with 32 GB of RAM).
One more thing: I'm not compiling the Python packages myself, so changing the compiler options of 'myapp' (a Python package) is not an option.
Appreciate your answers though.

urllib3 segfault (core dumped)

I'm getting a segfault ("Illegal operation (core dumped)") for a Python program that I've run every week without fault for ages. I'm running Ubuntu on Nitrous. I recall dealing with these yonks ago when coding in C, and I haven't had to deal with them much recently.
Importing the library urllib3 seems to be causing the problem. Does anyone know a fix?
Also, can someone advise or link to the best workflow for diagnosing these problems in future?
Thanks!
"Illegal operation"
This usually means that you are running code compiled for a more capable processor (e.g. Haswell) on a less capable one (e.g. Ivy Bridge).
Importing the library urllib3 seems to be causing the problem.
On my Ubuntu machine, import urllib3 loads libssl.so.1.0.0, libcrypto.so.1.0.0 and _ssl.x86_64-linux-gnu.so. These crypto libraries are very likely to be compiled with AVX, AVX2, etc. instructions which your processor may not support.
best workflow for diagnosing these problems
Your first step should be to find out which instruction is causing the SIGILL. To do so, run:
gdb python
(gdb) run
>>> import urllib3 # do whatever is necessary to reproduce SIGILL
(gdb) x/i $pc
(gdb) info sym $pc
The last two commands above should give you the instruction that is causing the SIGILL, and the library in which that instruction is used. Once you know what that instruction is, you can verify that your processor doesn't support it, and contact the distributor of the "guilty" library to get a different compilation (one without using instructions that are not supported by your CPU).
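To double-check the processor side of this, one rough sketch (Linux-only; it reads /proc/cpuinfo and returns an empty set elsewhere) is to list the feature flags the CPU advertises and look for the instruction family gdb identified:

```python
def cpu_flags():
    """Return the CPU feature flags advertised in /proc/cpuinfo (Linux)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()  # non-Linux or unreadable: no information

# If gdb pointed at an AVX2 instruction, check whether the CPU has it:
print("avx2" in cpu_flags())
```

If the flag for the faulting instruction family is absent, that confirms the SIGILL diagnosis: the library was compiled for a more capable CPU than the one it is running on.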
