How to solve OdbError in Abaqus Python script?

I am running a 3D solid model in an Abaqus Python script, which is supposed to be analyzed 200 times, as the model has been arranged in a for loop (for i in range(0,199):). Sometimes I receive the following error and the analysis terminates. I can't figure out the reason.
Odb_0 = session.openOdb(name='Job-1' + '.odb')
OdbError: The .lck file for the output database D:/abaqus/Model/Job-1.odb indicates that the analysis Input File Processor is currently modifying the database. The database cannot be opened at this time.
Note that all the variables, including "Odb_0" and the rest, are deleted at the end of each loop iteration before starting the next one.

I don't believe your problem will be helped by a change in element type.
The message and the .lck file say that there's an access conflict on the database: the output process lost out and cannot update the .odb file.
I'm not sure what database Abaqus uses. I would have guessed that the input stream would have scanned the input file and written whatever records were necessary to the database before the solution and output processing began.

From the Abaqus documentation
The lock file (job_name.lck) is written whenever an output database file is opened with write access, including when an analysis is running and writing output to an output database file. The lock file prevents you from having simultaneous write permission to the output database from multiple sources. It is deleted automatically when the output database file is closed or when the analysis that creates it ends.
When you are deleting your previous analysis, you should make sure that all processes connected with that simulation have been terminated. There are several possibilities to do so:
Launching the simulation through subprocess.Popen, which gives you much more control over the process (e.g. waiting until it ends, writing a specific log, etc.); see the sketch after this list;
Naming your simulations differently (e.g. 'Job-1', 'Job-2', etc.) and deleting old ones with a delay (e.g. deleting 'Job-1' once 'Job-3' has started);
Less preferable: using the time module to wait before reopening the database.
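A minimal sketch of the first option, assuming the abaqus command-line driver is on your PATH and the jobs keep the 'Job-<i>' naming used above (the exact executable name and flags depend on your installation):

import os
import subprocess
import time

def run_job_and_wait(job_name, timeout=3600):
    # 'abaqus job=... interactive' is the usual command line; adjust to your
    # installation (on Windows you may need shell=True, since abaqus is a
    # .bat wrapper).
    proc = subprocess.Popen(['abaqus', 'job=' + job_name, 'interactive'])
    proc.wait()                                 # block until the solver exits

    lck = job_name + '.lck'
    waited = 0
    while os.path.exists(lck) and waited < timeout:
        time.sleep(5)                           # a writer still holds the database
        waited += 5
    return not os.path.exists(lck)

if run_job_and_wait('Job-1'):
    # 'session' is the Abaqus session object available in Abaqus scripting
    Odb_0 = session.openOdb(name='Job-1.odb')   # safe to open now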

Related

Routing Python Logs to Databases Efficiently

I want to route some logs from my application to a database. Now, I know that this isn't exactly the ideal way to store logs, but my use case requires it.
I have also seen how one can write their own database logger as explained here,
python logging to database
This looks great, but given that an application generates a large number of logs, I feel like sending that many requests to the database could overwhelm it, and it may not be the most efficient solution.
Assuming that concern is valid, what are some efficient methods for achieving this?
Some ideas that come to mind are:
Write the logs out to a log file during application run time and develop a script that will parse the file and make bulk inserts to a database.
Build some kind of queue architecture that the logs will be routed to, where each record will be inserted to the database in sequence.
Develop a type of reactive program, that will run in the background and route logs to the database.
etc.
What are some other possibilities that can be explored? Are there any best practices?
The rule of thumb is that DB throughput will be greater if you can batch N row inserts into a single commit, rather than doing N separate commits.
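For example, a minimal sqlite3 sketch of batching (the table name and columns are made up for illustration):

import sqlite3

conn = sqlite3.connect('logs.db')
conn.execute('CREATE TABLE IF NOT EXISTS log (ts TEXT, level TEXT, msg TEXT)')

rows = [('2024-01-01T12:00:00', 'INFO', 'started'),
        ('2024-01-01T12:00:01', 'WARNING', 'low disk')]

with conn:                     # one BEGIN ... COMMIT around the whole batch
    conn.executemany('INSERT INTO log VALUES (?, ?, ?)', rows)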
Have your app append to a structured log file, such as a .csv or another easily parsed logfile format. Be sure to .flush() before sleeping for a while, so recent output will be visible to other processes. Consider making a call to .fsync() every now and again if durability following a power failure matters to the app.
Now you have timestamped structured logs that are safely stored in the filesystem. Clearly there are other ways, such as 0mq or Kafka, but the FS is simplest and plays nicely with unit tests. During interactive debugging you can tail -f the file.
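A sketch of such a writer (the filename, fields, and sleep interval are all illustrative):

import csv
import os
import time
from datetime import datetime, timezone

def append_records(path, records):
    with open(path, 'a', newline='') as f:
        w = csv.writer(f)
        for level, msg in records:
            w.writerow([datetime.now(timezone.utc).isoformat(), level, msg])
        f.flush()                # make the rows visible to other processes
        os.fsync(f.fileno())     # optional: survive a power failure

while True:
    append_records('app-2024-01-01.csv', [('INFO', 'heartbeat')])
    time.sleep(5)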
Now write a daemon that tail -f's the file and copies new records to the database. Upon reboot it will .seek() to the end, after perhaps copying any trailing lines that are missing from the DB. Use kqueue-style events, or poll every K seconds and then sleep. You can .stat() the file to learn its current length. Beware of partial lines, where the last character in the file is not a newline. Consume all unseen lines, BEGIN a transaction, INSERT each line, COMMIT the DB transaction, and resume the loop.
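A polling version of that loop might look like this (paths, table layout, and poll interval are assumptions):

import os
import time
import sqlite3

def follow_into_db(log_path, db_path, poll_seconds=5):
    """Copy new log lines from log_path into a sqlite table, forever."""
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS log (ts TEXT, level TEXT, msg TEXT)')
    offset = os.stat(log_path).st_size          # .seek() to the end on startup
    while True:
        size = os.stat(log_path).st_size
        if size > offset:
            with open(log_path, 'rb') as f:     # binary, so offsets are bytes
                f.seek(offset)
                chunk = f.read()
            last_newline = chunk.rfind(b'\n')
            if last_newline == -1:              # only a partial line so far
                time.sleep(poll_seconds)
                continue
            complete = chunk[:last_newline + 1]
            offset += last_newline + 1
            # Naive split; use the csv module if fields may contain commas.
            rows = [line.decode().split(',', 2)
                    for line in complete.splitlines() if line]
            with conn:                          # BEGIN ... INSERT ... COMMIT
                conn.executemany('INSERT INTO log VALUES (?, ?, ?)', rows)
        time.sleep(poll_seconds)

follow_into_db('app-2024-01-01.csv', 'logs.db')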
When you do log rolling, avoid renaming logs. Prefer log filenames that contain ISO 8601 timestamps; perhaps you settle on daily logs. The writer won't append lines past midnight, and will move on to the next filename. The daemon will notice the newly created file and will .close() the old one, with optional deletion of ancient logs more than a week old.
Log writers might choose to prepend a hashed checksum to each message, so the reader can verify it received the whole message intact.
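One way of prepending such a checksum (the 8-character prefix length is arbitrary):

import hashlib

def with_checksum(message):
    digest = hashlib.sha256(message.encode()).hexdigest()[:8]
    return digest + ' ' + message

def verify(line):
    digest, _, message = line.partition(' ')
    return digest == hashlib.sha256(message.encode()).hexdigest()[:8]

line = with_checksum('2024-01-01T12:00:00,INFO,started')
assert verify(line)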
A durable queue like Kafka certainly holds some attraction, but has more moving pieces. Maybe implement FS logging, with unit tests, and then use what you've already learned about the application when you refactor to employ a more sophisticated message-queueing API.

Good practice for parallel tasks in python

I have one Python script which is generating data and one which is training a neural network with TensorFlow and Keras on this data. Both need an instance of the neural network.
Since I haven't set the "allow growth" flag, each process takes the full GPU memory. Therefore I simply give each process its own GPU. (Maybe not a good solution for people with only one GPU... yet another unsolved problem.)
The actual problem is as follows: both instances need access to the network's weights file. I recently had a bunch of crashes because both processes tried to access the weights at the same time. A flag or something similar should stop each process from accessing the file while the other process is accessing it. Hopefully this doesn't create a bottleneck.
I tried to come up with a solution like semaphores in C, but today I found this post on Stack Exchange.
The idea of renaming seems quite simple and effective to me. Is this good practice in my case? I'll just create the weights file with my own function
self.model.save_weights(filepath='weights.h5$$$')
in the learning process, rename them after saving with
os.rename('weights.h5$$$', 'weights.h5')
and load them in my data generating process with function
self.model.load_weights(filepath='weights.h5')
?
Will this renaming overwrite the old file? And what happens if the other process is currently loading it? I would appreciate other ideas on how I could multithread/multiprocess my script. I just realized that generating data, learning, generating data, ... in a sequential script is not really performant.
EDIT 1: Forgot to mention that the weights are stored in an .h5 file by Keras' save function.
The multiprocessing module has an RLock class that you can use to regulate access to a shared resource. This also works for files if you remember to acquire the lock before reading and writing and release it afterwards. Using a lock implies that some of the time one of the processes cannot read or write the file. How much of a problem this is depends on how much both processes have to access the file.
Note that for this to work, one of the scripts has to start the other script as a Process after creating the lock.
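A minimal sketch of that arrangement; the plain file writes and reads stand in for the Keras save_weights/load_weights calls, and the function names are made up:

import multiprocessing as mp
import time

def trainer(lock, weights_path):
    """Stand-in for the training process: writes the weights file."""
    for step in range(3):
        time.sleep(1)                         # pretend to train
        with lock:                            # block readers while writing
            with open(weights_path, 'wb') as f:
                f.write(f'weights at step {step}'.encode())

def generator(lock, weights_path):
    """Stand-in for the data-generating process: reads the weights file."""
    for _ in range(3):
        with lock:                            # block the writer while reading
            try:
                with open(weights_path, 'rb') as f:
                    print('loaded', f.read())
            except FileNotFoundError:
                pass                          # nothing saved yet
        time.sleep(1)

if __name__ == '__main__':
    lock = mp.RLock()                         # create the lock first ...
    child = mp.Process(target=trainer, args=(lock, 'weights.h5'))
    child.start()                             # ... then start the other process
    generator(lock, 'weights.h5')
    child.join()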
If the weights are a Python data structure, you could put that under control of a multiprocessing.Manager. That will manage access to the objects under its control for you. Note that a Manager is not meant for use with files, just in-memory objects.
Additionally on UNIX-like operating systems Python has os.lockf to lock (part of) a file. Note that this is an advisory lock only. That is, if another process calls lockf, the return value indicates that the file is already locked. It does not actually prevent you from reading the file.
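And a small UNIX-only lockf sketch (advisory, so both processes must cooperate and call it):

import os

fd = os.open('weights.h5', os.O_RDWR | os.O_CREAT)
try:
    os.lockf(fd, os.F_TLOCK, 0)   # try to lock the whole file; raises if held
    # ... read or write the file via fd ...
    os.lockf(fd, os.F_ULOCK, 0)   # release the lock
except OSError:
    print('file is locked by another process')
finally:
    os.close(fd)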
Note:
Files can be read and written. Only when two processes are reading the same file (read/read) does this work well. Every other combination (read/write, write/read, write/write) can and eventually will result in undefined behavior and data corruption.
Note2:
Another possible solution involves inter process communication.
Process 1 writes a new .h5 file (with a random filename), closes it, and then sends a message (using a Pipe or Queue) to Process 2: "I've written a new parameter file \path\to\file".
Process 2 then reads the file and deletes it. This can work both ways but requires that both processes check for and process messages every so often. This prevents file corruption because the writing process only notifies the reading process after it has finished the file.
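A sketch of that hand-off with a multiprocessing.Queue; the byte strings written and read here stand in for the real Keras save_weights/load_weights calls:

import multiprocessing as mp
import os
import tempfile

def trainer(queue):
    for step in range(3):
        # Write the new weights to a uniquely named file, close it, then
        # announce it to the reader.
        fd, path = tempfile.mkstemp(suffix='.h5')
        with os.fdopen(fd, 'wb') as f:
            f.write(f'weights {step}'.encode())
        queue.put(path)
    queue.put(None)                      # sentinel: no more files

def generator(queue):
    while True:
        path = queue.get()               # block until a new file is announced
        if path is None:
            break
        with open(path, 'rb') as f:
            print('loaded', f.read())
        os.remove(path)                  # the reader deletes it when done

if __name__ == '__main__':
    q = mp.Queue()
    child = mp.Process(target=trainer, args=(q,))
    child.start()
    generator(q)
    child.join()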

External input to Python program during runtime

I am creating a test automation which uses an application without any interfaces. However, the application calls a batch script when it changes modes, so I am able to catch the mode transitions.
What I want to do is have the batch script give an input to my Python script (I have a state machine running in Python) during runtime, so that I can monitor the state of the application from Python instead of from the batch file.
I am using a similar state machine to the one of Karn Saheb:
https://dev.to/karn/building-a-simple-state-machine-in-python
However, instead of changing states statically like:
device.on_event('event')
I want the python script to do something similar to:
while True:
    device.on_event(input())  # where the input is passed from the batch script:

REM state.bat
set CurrentState=%1
"magic code to pass CurrentState to python input()" %CurrentState%
I see that a solution would be to start the python script from the batch file every time it is called with the "event" and then save the current event in another file upon termination of the python script... But I want to avoid such handling and rather evaluate this during runtime.
Thank you in advance!
A reasonably portable way of doing this without ugly polling on temporary files is to use a socket: have the main process listen and have the batch file(s) start a small program that connects to the server and writes a message.
There are security considerations here: you can start by listening only to the loopback interface, with further authentication if the local machine should not be trusted.
If you have more than one of these processes, or if you need to handle the child dying before it issues its next report, you’ll have to use threads or something like select to unify the news from different input channels (e.g., waiting on the child to exit vs. waiting on news from the next batch file).
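A minimal loopback sketch under those assumptions (the port number and message format are up to you): the state machine listens, and state.bat invokes a tiny one-shot client to deliver the event.

# listener side, running next to the state machine
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(('127.0.0.1', 5005))        # loopback only; the port is an assumption
srv.listen()

while True:
    conn, _ = srv.accept()           # one connection per mode transition
    with conn:
        event = conn.recv(1024).decode().strip()
    device.on_event(event)           # 'device' is the state machine from the question

And the one-shot client the batch file runs, e.g. python send_event.py %CurrentState%:

# send_event.py
import socket
import sys

with socket.create_connection(('127.0.0.1', 5005)) as s:
    s.sendall(sys.argv[1].encode())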

File flush needed after process exit?

I'm writing files from one process using open and write (i.e. direct kernel calls). After the write, I simply close and exit the application without flushing. Now, the application is started from a Python wrapper which, immediately after the application exits, reads the files. Sometimes, however, the Python wrapper reads incorrect data, as if I'm still reading an old version of the file (i.e. the wrapper reads stale data).
I thought that no matter whether the file metadata and contents are written to disk, the user visible contents would be always valid & consistent (i.e. buffers get flushed to memory at least, so subsequent reads get the same content, even though it might not be committed to disk.) What's going on here? Do I need to sync on close in my application; or can I simply issue a sync command after running my application from the Python script to guarantee that everything has been written correctly? This is running on ext4.
On the Python side:
# Called for lots of files
o = subprocess.check_output(['./App.BitPacker', inputFile])  # writes indices.bin and dict.bin
indices = open('indices.bin', 'rb').read()
dictionary = open('dict.bin', 'rb').read()
with open('output-file', 'wb') as output:
    output.write(dictionary)  # invalid content in output-file ...
# 'output-file' is a placeholder; there is one output file per inputFile, of course
I've never had your problem and always found a call to close() to be sufficient. However, from the man entry on close(2):
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)
As, at the time of writing, you haven't included the code for the writer process, I can only suggest adding a call to fsync in that process and seeing if this makes a difference.
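For illustration, here is what that suggestion looks like in Python with raw file descriptors; the real writer is presumably C, where the equivalent calls are open/write/fsync/close (the filename is taken from the question, the data is a stand-in):

import os

packed_indices = b'...'        # stand-in for the data the application produces

fd = os.open('indices.bin', os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
os.write(fd, packed_indices)
os.fsync(fd)                   # force the data to stable storage before close
os.close(fd)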

Terminate Python Program, but Recover Data

I have an inefficient simulation running (it has been running for ~24 hours).
It can be split into 3 independent parts, so I would like to cancel the simulation, and start a more efficient one, but still recover the data that has already been calculated for the first part.
When an error happens in a program, for example, you can still access the data that the script was working with, and examine it to see where things went wrong.
Is there a way to kill the process manually without losing the data?
You could start a debugger such as winpdb, or any of several IDE debuggers, in a separate session and attach it to the running process (this halts it). Set a breakpoint in a section of the code that has access to your data, resume until you reach the breakpoint, and then save your data to a file. Your new process can then load that data as a starting point.
