I wrote a script to create several models in Abaqus and then run the created jobs using a simple Python loop, but when running the script the program runs all the jobs at the same time, the computer's memory isn't enough, and it aborts the jobs. I want to know how to create a script where the next job is submitted only after the previous one has ended.
It depends on how you are invoking Abaqus. If you are creating the Abaqus processes directly, you can add the -interactive argument to your command so that the solver does not run in a background process and return immediately. For example:
abq2018 -j my_job_name -interactive
On the other hand, if you are using the Abaqus API and the Job object to create and run jobs you can use the waitForCompletion method to wait until a Job completes. Here is the excerpt from the Abaqus documentation:
waitForCompletion() This method interrupts the execution of
the script until the end of the analysis. If you call the
waitForCompletion method and the status member is neither SUBMITTED
nor RUNNING, Abaqus assumes the analysis has either completed or
aborted and returns immediately.
Here's a short example of how to create Job objects and use the waitForCompletion method:
from abaqus import *
# Create a Job from a Model definition
j1 = mdb.Job(name='my_job_name', model=mdb.models['my_model_name'])
# or create a Job from an existing input file
j2 = mdb.JobFromInputFile(name='my_second_job', inputFileName='my_second_job.inp')
# Submit the first job - this returns immediately
j1.submit()
# Now wait for the first job - this will block until the job completes
j1.waitForCompletion()
# Same process for the second Job
j2.submit()
j2.waitForCompletion()
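If you have several models, the same submit/wait pattern extends naturally to a loop. Here is a minimal sketch, assuming the model names below (which are placeholders) already exist in your mdb:
from abaqus import *

# Hypothetical model names; replace with the models your script actually creates.
model_names = ['model_1', 'model_2', 'model_3']

for name in model_names:
    job = mdb.Job(name='job_' + name, model=mdb.models[name])
    job.submit()
    # Block here until this job finishes before submitting the next one,
    # so only one analysis runs at a time.
    job.waitForCompletion()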
I developed a graphical user interface (GUI) to improve the queueing process of Abaqus analyses.
It's available on GitHub here.
Install:
1. Download or clone the package from GitHub.
2. Find the core.py file and edit lines 31, 36, 37 and 38 according to your machine and your configuration.
3. Run it.
If you have any issues using it, please submit them on the GitHub repository.
I have a simple script that is responsible for fetching data from an external API; let's call it connector.py.
That script takes some params as input, does its job, writes the result to a file, and returns the output.
I want to implement a scheduler that creates and manages two instances of that script, each with its own input (different settings), and makes them run at configured intervals, with the following constraints:
Input: pass the parameters of the connector from the settings to the sub-process via the stdin channel (not as process args)
Output: pass the connector output from the sub-process to the service via the stdout channel
I have to implement the constant loop cycle myself (not use a Scheduler, for example)
What mechanism should I use to achieve that goal: processes, threads, or sub-processes?
I'm mainly struggling to understand how to deal with the stdin/stdout issue for the different connector instances.
Any advice would be appreciated.
You have two possibilities for the scheduling of tasks.
Make your script a factory that runs every time until something stops it. You will then have the choice between threads and processes (subprocess uses processes). Here is a little description of threads and processes (if I used this method I would go with sub-processes):
What is the difference between a process and a thread?
https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/
However, I don't see the utility of using threads or subprocesses in your case, because you're telling us that you will make them run at configured intervals. You could just integrate the connector into your own program and run the instances separately.
For task scheduling you also have the option of cron jobs. They allow commands to be executed depending on the date, repetition, user, etc. Here is some detail on how to set up a cron job:
https://phoenixnap.com/kb/set-up-cron-job-linux
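That said, if you do go with sub-processes, the stdin/stdout constraint from the question can be handled with pipes. Here is a minimal sketch, assuming connector.py reads a JSON settings object from stdin and writes its result to stdout (the script name, settings format, and interval are all assumptions):
import json
import subprocess
import time

# Hypothetical settings for the two connector instances.
SETTINGS = [
    {"name": "connector_a", "api_key": "..."},
    {"name": "connector_b", "api_key": "..."},
]


def run_connector(params):
    """Run one connector instance, feeding params on stdin and reading stdout."""
    proc = subprocess.Popen(
        ["python", "connector.py"],   # assumed script name
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # communicate() writes the input, closes stdin, and collects stdout.
    output, _ = proc.communicate(input=json.dumps(params))
    return output


if __name__ == "__main__":
    # The hand-rolled loop the question asks for; a single fixed interval
    # is used here for simplicity.
    while True:
        for params in SETTINGS:
            result = run_connector(params)
            print(params["name"], "returned:", result)
        time.sleep(60)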
I am currently using Airflow to run a DAG (say dag.py) which has a few tasks and then a Python script to execute (done via a bash_operator). The Python script (say report.py) basically takes data from a cloud (S3) location as a dataframe, does a few transformations, and then sends it out as a report over email.
But the issue I'm having is that Airflow is basically running this Python script, report.py, every time Airflow scans the repository for changes (i.e. every 2 minutes). So the script is being run every 2 minutes (and hence the email is being sent out every two minutes!).
Is there any workaround for this? Can we use something other than a bash operator (bear in mind that we need to do a few dataframe transformations before sending out the report)?
Thanks!
Just make sure you do everything serious in the tasks, not in the Python script itself. The script will be executed often by the scheduler, but it should simply create tasks and build the dependencies between them. The actual work is done in the 'execute' methods of the tasks.
For example, rather than sending the email in the script, you should add an 'EmailOperator' as a task with the right dependencies, so the operator's execute method runs not when the file is parsed by the scheduler, but when all of its dependencies (the other tasks) have completed.
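In other words, dag.py should only build the DAG; the transformation and the email become tasks. Here is a minimal sketch of that shape, assuming Airflow 2.x import paths and a hypothetical build_report callable (the dag_id, schedule, and recipient are placeholders):
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.email import EmailOperator


def build_report(**context):
    # Hypothetical: read the data from S3 into a dataframe, transform it,
    # and stash the result somewhere the email task can pick it up.
    pass


with DAG(
    dag_id="daily_report",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # runs once a day, not on every parse
    catchup=False,
) as dag:
    transform = PythonOperator(
        task_id="build_report",
        python_callable=build_report,
    )
    send_report = EmailOperator(
        task_id="send_report",
        to="team@example.com",          # hypothetical recipient
        subject="Daily report",
        html_content="Report body goes here",
    )
    transform >> send_report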
Is there any python library that would provide a (generic) job state journaling and recovery functionality?
Here's my use case:
1. data received to start a job
2. job starts processing
3. job finishes processing
I then want to be able to restart a job that is past step 1 if the process aborts or the power fails. Jobs would write to a journal file when a job starts, and mark the job done when it completes. So when the process starts, it checks the journal file for uncompleted jobs and uses the journal data to restart any jobs that did not complete. So what Python tools exist to solve this? (Or other Python solutions for fault tolerance and recovery for critical jobs that must complete.) I know a job queue like RabbitMQ would work quite well for this case, but I want a solution that doesn't need an external service. I searched PyPI for "journaling" and didn't get much. So, any solutions? It seems like a library for this would be useful, since there are multiple concerns when using a journal that are hard to get right but that a library could handle (such as multiple async writes, file splitting and truncating, etc.).
I think you can do this using either crontabs or APScheduler; I think the latter has all the features you need, but even with cron you can do something like:
1: schedule process A to run at a specific interval
2: process A checks whether a job is already running
3: if no job is running, start one
4: the job does its work and saves its state to disk or a DB (see the journaling sketch below)
5: if it fails or finishes, step 3 will take over again on the next run
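Step 4 is where the journaling lives. Here is a minimal sketch of an append-only job journal, assuming one JSON record per line and a jobs.journal file name (both are made up for illustration):
import json
import os

JOURNAL = "jobs.journal"  # hypothetical journal file path


def log_event(job_id, event):
    # Append one JSON record per line and flush/fsync so it survives a crash.
    with open(JOURNAL, "a") as f:
        f.write(json.dumps({"job": job_id, "event": event}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def record_start(job_id, payload):
    log_event(job_id, {"type": "started", "payload": payload})


def record_done(job_id):
    log_event(job_id, {"type": "done"})


def unfinished_jobs():
    # Replay the journal: any job with a "started" record but no "done" record
    # still needs to be (re)run.
    if not os.path.exists(JOURNAL):
        return {}
    started, done = {}, set()
    with open(JOURNAL) as f:
        for line in f:
            rec = json.loads(line)
            ev = rec["event"]
            if ev["type"] == "started":
                started[rec["job"]] = ev["payload"]
            elif ev["type"] == "done":
                done.add(rec["job"])
    return {job: payload for job, payload in started.items() if job not in done}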
APScheduler is likely what you're looking for; its feature list is extensive, and it is also extensible if it doesn't fulfill your requirements.
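For the scheduling side, here is a minimal APScheduler sketch; the interval and the job function are placeholders, and max_instances=1 keeps a new run from starting while the previous one is still in progress:
from apscheduler.schedulers.blocking import BlockingScheduler


def process_pending_jobs():
    # Hypothetical: replay the journal and (re)run anything not marked done.
    pass


sched = BlockingScheduler()
# Run every 5 minutes; skip a run if the previous one is still in progress.
sched.add_job(process_pending_jobs, "interval", minutes=5, max_instances=1)
sched.start()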
I have a recurring cron job that runs a Django management command. The command interacts with the ORM, sends email with sendmail, and sends SMS with Twilio. It's possible that the cron jobs will begin to overlap. In other words, the job (that runs this command) might still be executing when the next job starts to run. Will this cause any issues?
(I don't want to wait for the management command to finish executing before running the management command again with cron).
EDIT:
The very beginning of the management command gets a timestamp of when the command was run. At a minimum, this timestamp needs to be accurate. It would be nice if the rest of the command didn't wait for the previous cron job to finish running, but that's non-critical.
EDIT 2:
The cron job only reads from the DB, it doesn't write to it. The application has to continue to work while the cron job is running. The application reads and writes from the DB.
My understanding of cron is that it will fork off a job as a background process, allowing multiple jobs to run at the same time. This can be problematic if the second job depends on the first job being done (e.g. if the second runs a daily report of aggregated data produced by the first). If you don't want them to run concurrently, there are workarounds for that:
How to prevent the cron job execution, if it is already running.
Will Cron start a new job if the current job is not complete?
Yes. This could definitely cause issues. You have a race condition. If you wish, you could acquire a lock on a critical section, which would prevent the next invocation from entering that section of code until the first invocation of the command has finished. You may be able to use a row lock or a table lock on the underlying data.
Let's presume you're using MySQL, which has its own lock syntax (this is DB dependent), and that you have this model:
from django.db import models

class Email(models.Model):
    sent = models.BooleanField(default=False)
    subj = models.CharField(max_length=140)
    msg = models.TextField()
You can create a lock object like this:
from django.db import connection

[...]

class EmailLocks(object):
    def __init__(self):
        self.c = connection.cursor()

    def __enter__(self):
        self.c.execute('''lock tables my_app_email write''')

    def __exit__(self, *err):
        self.c.execute('unlock tables')
Then lock all of your critical sections like:
with EmailLocks():
    # read the email table and decide if you need to process it
    for e in Email.objects.filter(sent=False):
        # send the email
        # mark the email as sent
        e.sent = True
        e.save()
The lock object will automatically unlock the table on exit. Also, if you throw an exception in your code, the table will still be unlocked.
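One assumption baked into the example above is the table name my_app_email, which follows Django's default app_label_modelname convention. If you'd rather not hard-code it, a small (hypothetical) variant can derive the table name from the model's metadata:
from django.db import connection


class ModelLock(object):
    """Hypothetical variant of EmailLocks that looks up the table name."""

    def __init__(self, model):
        # Django stores the actual table name on the model's _meta options.
        self.table = model._meta.db_table
        self.c = connection.cursor()

    def __enter__(self):
        # Table names cannot be passed as query parameters, hence the formatting.
        self.c.execute('lock tables %s write' % self.table)

    def __exit__(self, *err):
        self.c.execute('unlock tables')
You would then write with ModelLock(Email): instead of with EmailLocks():.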
So you have a cron job that runs a Django management command and you don't want the runs to overlap.
You can use flock, which takes a lock on a lock file for the duration of the command. If the second cron job starts before the first one has ended, it will see that the lock is already held and will not execute.
Below is the cron entry I used:
* * * * * /usr/bin/flock -n /tmp/fcj.lockfile /usr/bin/python /home/txuser/dev/Project1/projectnew/manage.py flocktest
There is a lot more you can do with this.
more on this
Summary: I have a Python script which collects tweets using the Twitter API, and I have a PostgreSQL database in the backend which stores all the streamed tweets. I have custom code which works around the rate-limit issue, and I have had it running 24/7 for months.
Issue: Sometimes the stream breaks and the script sleeps for the given number of seconds, but that is not helpful. I do not want to have to check it manually.
def on_error(self, status):  # tweepy method
    self.mailMeIfError(['me <me@localhost>'], 'listen.py <root@localhost>',
                       'Error occurred in on_error method', str(status))
    time.sleep(300)
    return True
Assume mailMeIfError is a method which takes care of sending me a mail.
I want a simple cron script which always checks the process and restarts the Python script if it is not running, has errored, or has broken. I have gone through some answers on Stack Overflow where they use the process ID. In my case the process ID still exists, because the script just sleeps on error.
Thanks in advance.
Using Process ID is much easier and safer. Try using watchdog.
This can all be done in your one script. Cron would need to be configured to start your script periodically, say every minute. At startup, your script then just needs to determine whether it is the only copy of itself running on the machine. If it spots that another copy is running, it silently terminates; otherwise it continues to run.
This behaviour is called a singleton pattern. There are a number of ways to achieve it; see for example Python: single instance of program.
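Here is a minimal sketch of that single-instance check using an advisory file lock (Unix-only; the lock file path is arbitrary):
import fcntl
import sys

LOCKFILE = "/tmp/tweet_listener.lock"  # hypothetical path


def ensure_single_instance():
    """Exit silently if another copy of this script already holds the lock."""
    fh = open(LOCKFILE, "w")
    try:
        # Non-blocking exclusive lock; released automatically if the process dies.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit(0)
    return fh  # keep the file object alive so the lock stays held


if __name__ == "__main__":
    _lock = ensure_single_instance()
    # ... start or restart the streaming script here ...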