Importing Large iGraph Memory Exhausted - python

I'm trying to import a large LGL file (~2GB) and I am attempting to import this in igraph using
graph = Graph.Read_Lgl("Biggraph.lgl")
The error it is throwing is
Traceback (most recent call last):
File "graph.py", line8, in <module>
graph = Graph.Read_Lgl("Biggraph.lgl")
igraph.core.InternalError: Error at foreign.c:359: Parse error in LGL file, line 9997 (memory exhausted), Parse Error
I'm unsure what exactly is going on here. The memory-exhausted error makes me think that the memory allocated to Python (or the underlying C library) is being used up while reading the file, but it happens almost instantly, like it isn't even trying to do much. Maybe it's looking at the file size and saying 'whoa, can't do that.'
Seriously though, I have no idea what is happening. My understanding was that igraph can handle extremely large graphs, and I don't think my graph is too large for it.
I did generate the LGL file myself, but I believe I have the syntax correct. This error doesn't really seem to indicate a problem with my LGL file, but I could be wrong ("Parse error" kind of scares me).
I just figured I'd try here and see if anyone more familiar with how igraph operates would know how to quickly solve this problem (or extend the memory). Thanks.

For the record, the poster has found a bug in the igraph library and we are working on a fix right now. The problem is caused by a right-recursive rule in the bison parser specification for the LGL format: a right-recursive rule forces the parser to shift every remaining token before it can reduce anything, so the parser stack grows in proportion to the input until bison gives up with a "memory exhausted" error (bison's default stack limit is 10000 entries, which is consistent with the failure around line 9997). Once we have an official patch for it in the trunk of the project, I will post the URL of the patch here should others run into the same problem.
Update:
The URLs to the patches are:
http://bazaar.launchpad.net/~igraph/igraph/0.5-main/revision/1696 (for igraph 0.5.x)
http://bazaar.launchpad.net/~igraph/igraph/0.6-main/revision/2543 (for igraph 0.6)
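Until you can apply the patch, one workaround is to bypass the built-in LGL reader and build the graph from an edge list parsed in pure Python. A minimal sketch, assuming the file contains only "# vertex" header lines followed by neighbour lines (edge weights, if present, are ignored; iter_lgl_edges is just a name for this example, and Graph.TupleList requires a reasonably recent python-igraph):

import igraph

def iter_lgl_edges(path):
    """Yield (source, target) name pairs from an LGL file."""
    src = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                # "# name" starts a new source vertex block
                src = line[1:].strip()
            else:
                # neighbour line: first token is the target; a second
                # token, if present, would be an edge weight (ignored)
                yield src, line.split()[0]

graph = igraph.Graph.TupleList(iter_lgl_edges("Biggraph.lgl"))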

Related

how do I know what is the source of Bus error in Python?

I am having a Bus error in a Python script.
I could believe there is some issue with the memory there, but I don't know exactly what the source is.
I would expect it to be possible somehow to trace the line of Python code where this happens (even without a full stack).
It does say "core dumped", but no core is dumped, and I am not sure if Python core dumps can easily be used with gdb or the like to trace the line of code where the error happened.
What are my options? The error at the moment is cryptic in the sense that I don't know where the faulty memory access happens or why.
(I should mention that I did try to investigate for answers online before asking this, but didn't find anything useful. Just small pieces of "I am stuck" kind of things. I am guessing this error is very rare.)
EDIT:
$ python3 --version
Python 3.9.12
Yes, I am using C/C++ libraries, I believe, like numpy, copy (not sure if it is C/C++), torch.
I am not sure it is a good idea to post the code here, as it is quite a long .py file, and I am not sure exactly where the code gives that error, especially without a core dump actually being written to the disk.
I will mention that there are several parts where I am slightly concerned about unnecessary memory use that accumulates (this part is called a lot):
This loop:
for key in self.parameters.keys():
    new = self.parameters[key]['x'].parameter
    self.parser.parameters[key]['x'].parameter = new.detach().clone()
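One standard-library option that directly addresses "trace the line of Python code where this happens" is the faulthandler module: it dumps the Python-level traceback of every thread to stderr when a fatal signal (SIGSEGV, SIGBUS, SIGFPE, SIGABRT, SIGILL) kills the process. A minimal sketch:

import faulthandler

# install handlers for fatal signals; on a bus error this prints the
# Python stack of every thread before the process dies
faulthandler.enable()

# ... rest of the script ...

The same thing can be enabled without touching the code by running python3 -X faulthandler script.py, or by setting the PYTHONFAULTHANDLER environment variable.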

shelve module not even creating shelf

I am new to Stack Overflow and experimenting with Python, currently just trying tutorial examples. It has been a wonderful learning curve, but I got completely stuck with the following (working under Windows 10):
import shelve
s = shelve.open("test")
Traceback (most recent call last):
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\dbm\dumb.py", line 82, in _create
f = _io.open(self._datfile, 'r', encoding="Latin-1")
FileNotFoundError: [Errno 2] No such file or directory: 'test.dat'
During handling of the above exception, another exception occurred:
It would be great to get some help to resolve this.
In Python 3, by default, shelve.open tries to open an existing shelf for reading. You have to pass an explicit flag to create a new shelf if it doesn't already exist.
s = shelve.open("test", "c")
This is in contrast to Python 2, where the default flag was "c" instead of "r".
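For illustration, a minimal round trip with explicit flags (the file name and key are just for the example):

import shelve

# "c" creates the shelf if it doesn't exist and opens it read/write
with shelve.open("test", "c") as s:
    s["answer"] = 42

# "r" opens the now-existing shelf read-only
with shelve.open("test", "r") as s:
    print(s["answer"])  # 42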
How to read an error message
In general, error messages will do their best to tell you what's wrong. In the case of Python, you'll typically start at the bottom; here
No such file or directory: 'test.dat'
tells you exactly why the error's being thrown: test.dat doesn't exist.
Next, read upward through the stack trace until you reach something that you either understand or wrote recently, and try to make sense of the error message from there.
How to troubleshoot an error
Is the stated problem intelligible?
Yes, we asked the software to do something with a (.dat?) file called "test", so we at least know what the hell the error message is talking about.
Do we agree with the underlying premise of the error?
Specifically, does it make sense that it should matter if test.dat exists or not? Chepner covers this.
Do we agree with the specific problem as stated?
For example, it wouldn't be weird at all to get such an error message when there was in fact such a file. Then we would have a more specific question: "Why can't the software find the file?" That's progress.
(Usually the answer would be either "Because it's looking in the wrong place" or "Because it doesn't have permission to access that file".)
Read the documentation for the tools and functions in question.
How can we validate either our own understanding of the situation, or the situation described in the error message?
Depending on the context, this may involve some trial and error of re-writing our code to:
print out (log) its state during execution (see the sketch after this list)
do something similar but different from what it was doing, which we're more certain should work.
do something similar but different from what it was doing, which we're more certain should not work.
Ask for help.
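For the "print out (log) its state" step, a minimal sketch using the standard logging module, applied to the shelve example from this question (the path and messages are placeholders):

import logging
import os
import shelve

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

path = "test"
# record where we are and what we are about to do, so a later failure
# can be matched against the program's actual state
log.debug("cwd=%s, opening shelf %r", os.getcwd(), path)
s = shelve.open(path, "c")
log.debug("shelf opened, keys=%r", list(s.keys()))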
It seems shelve sometimes falls back to dumbdbm for serialization. You can use the dbm module directly instead:
import dbm

# "n" always creates a new, empty database, open for reading and writing
with dbm.open("test", "n") as db:
    db["key"] = "value"  # str keys and values are encoded to bytes

Configure Django's test debugging to print shorter paths.

I'm not sure how to approach this, whether it's a Django, Python, or even Terminal solution that I can tweak.
The thing is, I'm learning Django following a book (reference here, really like it), and whenever I run the tests, I get really long output in the terminal for debugging purposes. Obviously there are many traceback functions that get called one after another, but what started bugging me is that the file paths are very long and they all share the same project folder... which is long by itself, and then it adds all the virtualenv stuff, like this:
Traceback (most recent call last):
File "home/user/code/projects/type_of_projects_like_hobby/my_project_application/this_django_version/virtualenv/lib/python3.6/site-packages/django/db/models/base.py", line 808, in save
force_update=force_update, update_fields=update_fields)
Since the paths take two or more lines, I can't focus on what functions I should be looking at clearly.
I have looked at the verbosity option when calling manage.py test but it doesn't help with the paths. If anyone has an idea on how to ~fix~ go about this issue, it'd be cool.
Thanks guys.
There's really not a way to change the behavior (this is how Python displays tracebacks). But you can pipe the output into something that will reformat it; there are tools that take traceback output and apply various types of formatting.
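In that spirit, a minimal sketch of such a filter (shorten_paths.py is a hypothetical name; it assumes the long prefix always ends in site-packages/, as in the traceback above):

# shorten_paths.py - read traceback output on stdin, trim long path
# prefixes up to and including "site-packages/", write to stdout
import sys

MARKER = "site-packages/"

for line in sys.stdin:
    if 'File "' in line and MARKER in line:
        head, _, tail = line.partition('File "')
        _, _, short = tail.partition(MARKER)
        line = head + 'File ".../' + short
    sys.stdout.write(line)

Used as: python manage.py test 2>&1 | python shorten_paths.py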

Python crashes in rare cases when running code - how to debug?

I have a problem that I have seriously spent months on now!
Essentially I am running code that needs to read from and save to HDF5 files. I am using h5py for this.
It's very hard to debug because the problem (whatever it is) only occurs in about 5% of the cases (each run takes several hours), and when it happens it crashes Python completely, so debugging with Python itself is impossible. Using simple logs it's also impossible to pinpoint the exact crashing situation - it appears to be very random, crashing at different points within the code, or with a lag.
I tried using OllyDbg to figure out whats happening and can safely conclude that it consistently crashes at the following location: http://i.imgur.com/c4X5W.png
It seems to be shortly after calling the Python native PyObject_ClearWeakRefs, with an access violation error message. The weird thing is that the file is successfully written to. What would cause the access violation error? Or is that Python-internal (e.g. the stack?) and not related to my file-handling code?
Does anyone have an idea what's happening here? If not, is there a smarter way of finding out what exactly is happening? Maybe some hidden Python logs or something I don't know about?
Thank you
PyObject_ClearWeakRefs is in the Python interpreter itself. But if it only happens in a small number of runs, it could be hardware related. Things you could try:
Run your program on a different machine. If it doesn't crash there, it is probably a hardware issue.
Reinstall Python, in case the installed version has somehow become corrupted.
Run a memory test program.
Thanks for all the answers. I ran two versions this time, one with a new python install and my same program, another one on my original computer/install, but replacing all HDF5 read/write procedures with numpy read/write procedures.
The program continued to crash on my second computer at odd times, but on my primary computer I had zero crashes with the changed code. I think it is thus safe to conclude that the problems were HDF5, or more specifically h5py, related. It appears that other people have encountered issues with h5py in that respect too. Given that any error in my application translates to potentially large financial losses, I decided to dump HDF5 completely in favor of other, stable solutions.
Use a try/except statement. This can be put into the program in order to stop it from crashing when erroneous data is entered.
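For completeness, a minimal sketch of that suggestion (the file name and data are placeholders); note that try/except catches Python-level exceptions such as bad input, but it cannot catch a hard access violation inside the C library:

import h5py

values = [1.0, 2.0, 3.0]  # placeholder data

try:
    with h5py.File("data.h5", "a") as f:
        f.create_dataset("results", data=values)
except OSError as exc:
    # ordinary I/O and HDF5 errors surface here as exceptions
    print(f"HDF5 write failed: {exc}")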

writing large netCDF4 file with python?

I am trying to use the netCDF4 package with Python. I am ingesting close to 20 million records of data, 28 bytes each, and then I need to write the data to a netCDF4 file. Yesterday, I tried doing it all at once, and after an hour or so of execution, Python stopped running the code with the very helpful error message:
Killed.
Anyway, doing this with subsections of the data, it becomes apparent that somewhere between 2,560,000 and 5,120,000 records, the code doesn't have enough memory and has to start swapping. Performance is, of course, greatly reduced. So, two questions:
1) Anyone know how to make this work more efficiently? One thing I am thinking is to somehow put subsections of the data in incrementally, instead of doing it all at once. Anyone know how to do that? 2) I presume the "Killed" message happened when memory finally ran out, but I don't know. Can anyone shed any light on this?
Thanks.
Addendum: netCDF4 provides an answer to this problem, which you can see in the answer I have given to my own question. So for the moment, I can move forward. But here's another question: The netCDF4 answer will not work with netCDF3, and netCDF3 is not gone by a long shot. Anyone know how to resolve this problem in the framework of netCDF3? Thanks again.
It's hard to tell what you are doing without seeing code, but you could try using the sync method to flush the in-memory data to disk after some amount of data has been written to the file:
http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4.Dataset-class.html
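A sketch of what that could look like, writing in batches and calling sync() after each one (the batch size, file name, and zero-filled stand-in data are placeholders):

import numpy as np
from netCDF4 import Dataset

N_RECORDS = 20000000   # roughly the 20 million records from the question
BATCH = 100000         # placeholder batch size

ds = Dataset("out.nc", "w")
ds.createDimension("record", None)                # unlimited dimension
var = ds.createVariable("data", "f4", ("record",))

for start in range(0, N_RECORDS, BATCH):
    stop = min(start + BATCH, N_RECORDS)
    chunk = np.zeros(stop - start, dtype="f4")    # stand-in for real records
    var[start:stop] = chunk
    ds.sync()                                     # flush buffered data to disk
ds.close()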
There is a ready answer in netCDF4: declare the netCDF4 variable with a specified "chunksize". I used 10000, and everything proceeded very nicely. As I indicated in the edit to my question, I would like to find a way to resolve this in netCDF3 also, since netCDF3 is far from dead.
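In netCDF4-python the knob is the chunksizes argument of createVariable (one entry per dimension); a minimal sketch using the 10000 value mentioned above:

from netCDF4 import Dataset

ds = Dataset("chunked.nc", "w")
ds.createDimension("record", None)
# store the variable in chunks of 10000 records along the record dimension
var = ds.createVariable("data", "f4", ("record",), chunksizes=(10000,))
ds.close()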
