python concurrent.futures does not work for additional runs - python

Could someone help me with the following problem? I would appreciate it.
I am trying to run several processes at the same time. The first time I run the code it works fine. However, when I run it a second time, it does not work properly: it hangs for a while and finishes in about 5 minutes, whereas the first run took 30 seconds. When I look at the resource monitor, I see that after the first run several "python 2.7" processes are left over, each showing a value between 0 and 10 in the CPU column. If I restart Python and run the code again, it works fine the first time, then again stops working properly. How can I make this work reliably across runs?
import gc
import nltk
import concurrent.futures

def try_my_operation(k):
    sentences = nltk.sent_tokenize(k)
    return sentences

# data is a pandas DataFrame with a 'text' column, defined elsewhere
executor = concurrent.futures.ProcessPoolExecutor()
futures = [executor.submit(try_my_operation, data['text'][i]) for i in range(0, data.shape[0])]
concurrent.futures.wait(futures)
executor = concurrent.futures.ProcessPoolExecutor(1)
executor.shutdown(wait=False)
gc.collect()
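One likely culprit is the shutdown sequence: the code above rebinds executor to a brand-new pool and shuts that one down, so the original pool's worker processes are never shut down and linger between runs. A minimal sketch of the usual pattern, with a literal list standing in for data['text'] (in the question this comes from a DataFrame): use the executor as a context manager and guard pool creation with an if __name__ == '__main__': block, so the workers are cleaned up after every run.

import nltk
import concurrent.futures

def try_my_operation(k):
    return nltk.sent_tokenize(k)

if __name__ == '__main__':
    # Stand-in for data['text']; in the question this comes from a DataFrame.
    texts = ["First sentence. Second sentence.", "Another document. More text."]

    # The context manager calls shutdown(wait=True) on exit, so no worker
    # processes are left behind between runs.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(try_my_operation, texts))
    print(results)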

Related

Running scheduled task in python

I have a Python script where a certain job needs to be done at, say, 8 AM every day. What I was thinking was to have a while loop to keep the program running all the time, and inside the while loop use a scheduler-type package to specify the time at which a specific subroutine needs to start. That way, other routines that run at different times of the day would also work.
import schedule

def job(t):
    print "I'm working...", t
    return

schedule.every().day.at("08:00").do(job, 'It is 08:00')
Then let the Windows scheduler run this program and be done. But I was wondering whether this is terribly inefficient, since the while loop wastes CPU cycles and could freeze the computer as the program grows larger in the future. Could you please advise whether there is a more efficient way to schedule tasks that need to execute down to the second, without having to run a while loop?
I noted that you have a hard time requirement for executing your script. Just set your Windows Scheduler to start the script a few minutes before 8 AM. Once the script starts, it will begin running your schedule code. When your task is done, exit the script. This entire process will start again the next day.
Here is the correct way to use the Python schedule module:
from time import sleep
import schedule

def schedule_actions():
    # Every day, job() is called at 08:00
    schedule.every().day.at('08:00').do(job, variable="It is 08:00")
    # Check whether a scheduled task is pending to run
    while True:
        schedule.run_pending()
        # Set the sleep time to fit your needs
        sleep(1)

def job(variable):
    print(f"I'm working...{variable}")
    return

schedule_actions()
Here are other answers of mine on this topic:
How schedule a job run at 8 PM CET using schedule package in python
How can I run task every 10 minutes on the 5s, using BlockingScheduler?
Execute logic every X minutes (without cron)?
Why a while loop? Why not just let Windows Scheduler, or a cron job on Linux, run your simple Python script to do whatever it needs to, and then stop?
Maintenance tends to become a big problem over time, so try to keep things as lightweight as possible.
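A minimal sketch of that lighter-weight approach (the file name and path are assumptions for illustration): the OS scheduler owns the timing, and the Python script just does the work once and exits.

# daily_job.py -- run once per invocation, then exit.
# Example crontab entry on Linux (path assumed for illustration):
#   0 8 * * * /usr/bin/python3 /home/user/daily_job.py
# On Windows, a Task Scheduler task started daily at 08:00 plays the same role.

def job():
    print("I'm working... It is 08:00")

if __name__ == '__main__':
    job()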

How can I make python busy-wait for an exact duration?

More out of curiosity, I was wondering how I might make a Python script sleep for 1 second without using the time module.
Is there a computation that can be conducted in a while loop which takes a machine of n processing power a designated and indexable amount of time?
As mentioned in the comments, for the second part of your question:
The processing time depends on the machine (the computer and its configuration) you are working with and the processes active on it. There is no fixed amount of time for an operation.
It's been a long time since you could get a reliable delay out of just trying to execute code that would take a certain time to complete. Computers don't work like that any more.
But to answer your first question: you can make a system call and open an OS process to sleep for 1 second, like:
import subprocess
subprocess.run(["sleep", "1"])
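Note that this relies on an external sleep command being available (as on Linux or macOS). If the goal really is to wait inside Python without the time module, one possible sketch is a busy-wait on the wall clock via the datetime module; it burns a full core for the duration, which is exactly why sleep is normally preferred.

from datetime import datetime, timedelta

def busy_wait(seconds):
    # Spin until the wall clock has advanced by the requested duration.
    deadline = datetime.now() + timedelta(seconds=seconds)
    while datetime.now() < deadline:
        pass

busy_wait(1)  # roughly one second, at the cost of 100% CPU on one core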

issue in trying to execute a certain number of python scripts at certain intervals

I am trying to execute a certain number of Python scripts at certain intervals. Each script takes a long time to execute, and hence I do not want to waste time waiting to run them sequentially. I tried this code, but it is not executing them simultaneously; it is executing them one by one:
Main_file.py
import time

def func(argument):
    print 'Starting the execution for argument:', argument
    execfile('test_' + argument + '.py')

if __name__ == '__main__':
    arg = ['01', '02', '03', '04', '05']
    for val in arg:
        func(val)
        time.sleep(60)
What I want is to kick off by starting the execution of the first file (test_01.py). This will keep executing for some time. After 1 minute has passed, I want to start the simultaneous execution of the second file (test_02.py). This will also keep executing for some time. In this way, I want to start the execution of all the scripts at gaps of 1 minute.
With the above code, I notice that the execution happens one file after the other and not simultaneously, since the print statements in these files appear one after the other rather than interleaved.
How can I achieve the needed functionality?
Using Python 2.7 on my computer, the following seems to work with small Python scripts as test_01.py, test_02.py, etc., using threading:
import time
import thread

def func(argument):
    print('Starting the execution for argument:', argument)
    execfile('test_' + argument + '.py')

if __name__ == '__main__':
    arg = ['01', '02', '03']
    for val in arg:
        thread.start_new_thread(func, (val,))
        time.sleep(10)
However, you indicated that you kept getting a memory exception error. This is likely due to your scripts using more stack memory than was allocated to them, as each thread is allocated 8 kb by default (on Linux). You could attempt to give them more memory by calling
thread.stack_size([size])
which is outlined here: https://docs.python.org/2/library/thread.html
Without knowing the number of threads that you're attempting to create or how memory-intensive they are, it's difficult to say whether a better solution should be sought. Since you seem to be looking at executing multiple scripts essentially independently of one another (no shared data), you could also look into the multiprocessing module here:
https://docs.python.org/2/library/multiprocessing.html
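A minimal sketch of the multiprocessing variant, keeping the execfile call and the staggered starts from the original Python 2.7 code (the test_01.py-style script names are the asker's):

import time
from multiprocessing import Process

def func(argument):
    print('Starting the execution for argument:', argument)
    execfile('test_' + argument + '.py')

if __name__ == '__main__':
    processes = []
    for val in ['01', '02', '03', '04', '05']:
        # Each script runs in its own process, so they execute in parallel.
        p = Process(target=func, args=(val,))
        p.start()
        processes.append(p)
        time.sleep(60)  # stagger the starts by one minute
    for p in processes:
        p.join()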
If you need them to run in parallel, you will need to look into threading. Take a look at https://docs.python.org/3/library/threading.html or https://docs.python.org/2/library/threading.html, depending on the version of Python you are using.

Causes of a delayed Python sched execution

We are using the sched module in Python 2.6 to run a function every 60 seconds. Each call issues sched.enter() after executing, with a delay of 60 and a priority of 1. This has been working fine.
However, we have found a situation where the next execution of the sched function doesn't happen for several minutes, sometimes as much as 5-6 minutes later. This has been observed on a virtual machine.
What could be causing this? Is there any workaround to ensure the task gets executed regularly?
How long does the processing that happens before sched.enter is called take? I would suggest monitoring this, and perhaps taking that processing time into account in the delay parameter of sched.enter.
The load of the VM and of the VM host may also be important factors. I've seen performance degradation in some apps caused by the VM host being under too high a load and swapping.
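A minimal sketch of that first suggestion (the function names are placeholders, not the asker's actual code): time the work and subtract it from the delay passed to sched.enter, so the effective period stays close to 60 seconds even when the processing itself is slow.

import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
PERIOD = 60

def do_work():
    # Placeholder for the real job that runs every minute.
    pass

def run_job():
    start = time.time()
    do_work()
    elapsed = time.time() - start
    # Subtract the processing time so the next run starts ~60s after this one began.
    scheduler.enter(max(0, PERIOD - elapsed), 1, run_job, ())

scheduler.enter(PERIOD, 1, run_job, ())
scheduler.run()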

Python: Script works, but seems to deadlock after some time

I have the following script, which is working for the most part (Link to PasteBin). The script's job is to start a number of threads, each of which in turn starts a subprocess with Popen. The output from each subprocess is as follows:
1
2
3
.
.
.
n
Done
Basically, the subprocess is transferring 10M records from tables in one database to different tables in another database, with a lot of data massaging/manipulation in between because of the different schemas. If the subprocess fails at any point in its execution (bad records, duplicate primary keys, etc.), or if it completes successfully, it will output "Done\n". If there are no more records to select for transfer, it will output "NO DATA\n".
My intent was to create my script "tableTransfer.py", which would spawn a number of these processes, read their output, and in turn output information such as the number of updates completed, time remaining, time elapsed, and the number of transfers per second.
I started the process running last night and checked in this morning to see that it had deadlocked. There were no subprocesses running, there were still records to be updated, and the script had not exited. It was simply sitting there, no longer outputting the current information, because no subprocesses were running to update the total number complete, which is what drives updates to the output. This is running on OS X.
I am looking for three things:
I would like to get rid of the possibility of this deadlock occurring so I don't need to check in on it as frequently. Is there some issue with locking?
Am I doing this in a bad way (a gThreading variable to control the loop that spawns additional threads, etc.)? I would appreciate some suggestions for improving my overall methodology.
How should I handle a ctrl-c exit? Right now I need to kill the process, but I assume I should be able to use the signal module (or similar) to catch the signal and stop the threads; is that right?
I am not sure whether I should be pasting my entire script here, since I usually just paste snippets. Let me know if I should paste it here as well.
You have a few places in your script where you return without releasing your locks; that could cause the problem (lines 97 and 99 of your script). This is where try/finally blocks can help you a lot, as you can then ensure that the release is always called.
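A minimal sketch of that pattern (the lock and the early-return condition are placeholders, since the actual script is only on PasteBin): the finally clause guarantees the lock is released even when the function returns early or raises.

import threading

lock = threading.Lock()
completed = 0  # shared counter updated by worker threads

def record_progress(success):
    global completed
    lock.acquire()
    try:
        if not success:
            # The early return no longer leaks the lock: finally still runs.
            return
        completed += 1
    finally:
        lock.release()

Using the lock as a context manager (with lock:) is an equivalent, shorter way to get the same guarantee.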
