I am writing a Python script to transfer large files via SFTP with the pysftp module. I have a massive amount of data to transfer, around 36 TB in total, divided into 54 runs, or batches.
I only want to carry out these transfers between certain hours of the day, for this example between 6pm and 7am. So my idea is to use a for loop to iterate over all the runs/batches. On each iteration, I would check what hour it is. If it is between 6pm and 7am, I would transfer. Otherwise the script would sleep until at least 6pm. The code that I wrote looks like this:
import datetime as dt
import time

runsList = 'runA runB runC'.split()  # these are directories

# time constraints: no uploads from 07:00 up to (but not including) 18:00
bottomLimit = 7
upperLimit = 18
doNotUploadRange = range(bottomLimit, upperLimit)

for run in runsList:
    hour = dt.datetime.now().hour
    while hour in doNotUploadRange:
        print('do not upload now')
        time.sleep(1800)  # check again in 30 minutes
        hour = dt.datetime.now().hour
    # when I leave the while condition above
    # do the transfer via pysftp (large amount of data) per run
The question here does not concern the code itself, nor do I want to check whether or not the script is running (which can be checked with htop). Rather, I am concerned that my script will crash, for whatever reason, before it finishes (it may run for a full week if nothing crashes).
I do sometimes run scripts that take a very long time, and they occasionally crash for no obvious reason.
So my question is whether there is any obvious reason the script would crash after running for 6-7 days, or can I expect it to finish provided that there is no error in the code itself? My idea is to run this script in the background inside tmux, with python script.py &
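The time-window check described in this question can also be written as a small helper that computes how long to sleep, rather than polling every 30 minutes. This is only a sketch: the names `in_upload_window` and `seconds_until_window` are hypothetical, and the 18:00 to 07:00 window is taken from the question.

```python
import datetime as dt

WINDOW_START = 18  # uploads allowed from 18:00 ...
WINDOW_END = 7     # ... until 07:00 the next morning


def in_upload_window(now):
    """Return True if `now` falls inside the overnight upload window."""
    return now.hour >= WINDOW_START or now.hour < WINDOW_END


def seconds_until_window(now):
    """Return how many seconds to wait until the window opens (0 if open)."""
    if in_upload_window(now):
        return 0.0
    # Outside the window the hour is between 07 and 17, so the next
    # opening is 18:00 on the same calendar day.
    opening = now.replace(hour=WINDOW_START, minute=0, second=0, microsecond=0)
    return (opening - now).total_seconds()
```

Inside the loop, one could then call `time.sleep(seconds_until_window(dt.datetime.now()))` before each transfer.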
Related
I have a Python script that has an infinite loop in it.
while True:
    doStuff()
Now I need to check from some external script whether the program has frozen: for example, if doStuff() has not been executed for five minutes, reboot the system. My idea is to save the current time to a file every time doStuff runs, then read it from another script, and reboot if the saved time is more than 5 minutes older than now. Is there a better and more elegant solution to this?
Edit: and no, I'm not trying to check whether the program is running. I need to detect the case where it is still running but has got stuck somewhere.
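The file-based idea from this question might look roughly like the sketch below. The file name `heartbeat.txt` and the function names `beat` and `is_stale` are assumptions made for illustration; the reboot step itself is left out.

```python
import os
import time

HEARTBEAT_FILE = "heartbeat.txt"  # hypothetical path shared by both scripts
MAX_AGE_SECONDS = 5 * 60


def beat(path=HEARTBEAT_FILE):
    """Worker side: call after each doStuff() to record the current time."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(time.time()))
    # Rename atomically so the watchdog never reads a half-written file.
    os.replace(tmp, path)


def is_stale(path=HEARTBEAT_FILE, max_age=MAX_AGE_SECONDS, now=None):
    """Watchdog side: True if the heartbeat is missing, unreadable, or too old."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            last = float(f.read())
    except (OSError, ValueError):
        return True
    return now - last > max_age
```

The watchdog script would call `is_stale()` periodically and trigger the reboot when it returns True.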
I had a problem with Python 3.7 running a single .py script. To solve it I split the program into several .py files, but I still have a memory leak.
I have a web scraping program that runs a script at intervals, so it has to run 24/7, but with memory growing by 16 MB per hour that gets hard.
while True:
    with open('scrapy.py') as op:
        exec(op.read())
    time.sleep(5)
scrapy.py uses requests, pandas, etc.
I thought this code would close 'scrapy.py' every time the loop ends, but apparently it does not, since the program keeps eating memory.
So I wrote a script, it's collecting a timeseries of the last year via an API call (requests library) and then calculating the average. It's an infinite loop ( while True: ) with a sleep time of a couple of minutes and is requesting the new data, cutting off that which is older than 1 year and concatenating the new one. Then the average gets recalculated. The current timeseries (of a whole year) and the average are stored in a class object that was created before the loop begins. So it's basically:
Obj = Obj_Class()
while True:
    updateAverage()
    postAverage()
    time.sleep(60)
This all runs fine and the memory required is relatively constant at around 60-80MB.
Now, for deployment, it has been put in a Docker container and run on the server. For that, the API calls were edited to grab the data directly from the Influx database that's hosted on the same server. Also, "postAverage()" now not only prints the data on the monitor but also writes it into the Influx database.
This is all that was changed. But suddenly, the memory (RAM) continuously grows (I ended the process after it reached 1 GB). I do not have a clue why this could be happening. Does someone have an idea what the reason could be? Where I could look or what I could look into? I know it is most likely impossible to tell without going through my code, but I figured someone here might have experienced something similar before and could offer some advice.
What are the best methods to set a .py file to run at one specific time in the future? Ideally, I'd like to do everything within a single script.
Details: I often travel for business so I built a program to automatically check me in to my flights 24 hours prior to takeoff so I can board earlier. I currently am editing my script to input my confirmation number and then setting up cron jobs to run said script at the specified time. Is there a better way to do this?
Options I know of:
• current method
• put code in the script to delay until x time. Run the script immediately after booking the flight and it would stay open until the specified time, then check me in and close. This would prevent me from shutting down my computer, though, and my machine is prone to overheating.
Ideal method: input my confirmation number & flight date, run the script, have it set up whatever cron automatically, be done with it. I want to make sure whatever method I use doesn't include keeping a script open and running in the background.
cron is best for jobs that you want to repeat periodically. For one-time jobs, use at or batch.
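As a rough sketch of the `at` approach driven from Python, assuming a Unix system with the `at` command installed and its daemon running: the job is passed on standard input and the run time is given with the `-t` option. The function names `build_at_args` and `schedule_checkin` are hypothetical.

```python
import datetime as dt
import subprocess


def build_at_args(run_at):
    """Build the argv for `at -t`, whose timestamp format is [[CC]YY]MMDDhhmm."""
    return ["at", "-t", run_at.strftime("%Y%m%d%H%M")]


def schedule_checkin(command, run_at):
    """Queue `command` (a shell command line) to run once at `run_at`."""
    # `at` reads the commands for the job from standard input.
    subprocess.run(
        build_at_args(run_at),
        input=command + "\n",
        text=True,
        check=True,
    )
```

A call such as `schedule_checkin("python3 checkin.py ABC123", takeoff - dt.timedelta(hours=24))` would queue the job and return immediately, so no script has to stay open; `checkin.py`, `ABC123`, and `takeoff` are placeholders here.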
I have a program that counts pulses (Hall effect sensor) on a rain gauge to measure precipitation. It runs continuously and counts the number of pulses every 5 minutes, which then translates into a rain amount. After an hour (twelve 5-minute measurements), I add up the total, and this is the hourly rainfall. I have structured this program so that, once an hour has passed, it drops the oldest measurement and adds the new one every 5 minutes, so I have a running hourly rain output, termed "totalrainlasthour".
My problem is that I want to upload this data to weather underground using a separate program that includes other data such as wind speed, temp, etc. This upload takes place every 5 minutes. I want to include the current value of "totalrainlasthour", and use it in the upload.
I tried a "from import" command but the more I read, that doesn't look like it would work.
from rainmodule import totalrainlasthour
print totalrainlasthour
Is there a way I can pull in the current value of a variable from a separate program?
As far as I know, there's no good way for a python script that just starts up to access the values from inside an already-running Python instance. However, there are a few workarounds that you can try.
If it's acceptable for your weather uploading script to be running constantly, you could structure it to look something like this:
import time
import rainmodule
import windmodule
# etc

def start():
    # instantiate classes so you can keep track of state
    rain = rainmodule.RainCollection()
    wind = windmodule.WindCollection()
    # etc

    prev_time = time.time()
    while True:
        rain.loop()
        wind.loop()
        # etc

        now = time.time()
        # upload every 5 minutes (60 seconds * 5)
        if now - prev_time > (60 * 5):
            prev_time = now
            totalrainlasthour = rain.totalrainlasthour
            winddata = wind.data
            # upload code here

if __name__ == '__main__':
    start()
This method assumes that every one of your data collection modules can be modified to run iteratively within a "master" while loop.
If you can't wrangle your code to fit this format, (or the loop methods for some modules take a long time to execute) then you could perhaps launch each of your modules as a process using the multiprocessing or threading modules, and communicate using some synchronized data structure or a queue.
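A minimal sketch of the threading-plus-queue variant is shown below. The worker here is only a stand-in that replays a list of values; `rain_worker` and `collect` are hypothetical names, and a real collector would read the sensor instead.

```python
import queue
import threading


def rain_worker(out_queue, readings):
    """Stand-in for the rain module: push each new hourly total onto the queue."""
    for total in readings:
        out_queue.put(("totalrainlasthour", total))
    out_queue.put(None)  # sentinel: no more data


def collect(in_queue):
    """Uploader side: drain the queue and keep the latest value per name."""
    latest = {}
    while True:
        item = in_queue.get()  # blocks until the worker sends something
        if item is None:
            break
        name, value = item
        latest[name] = value
    return latest
```

Starting the worker with `threading.Thread(target=rain_worker, args=(q, values)).start()` and calling `collect(q)` in the uploader gives the uploader the most recent value without importing the other script's state.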
An alternative solution might be to create a database of some sort (Python comes bundled with sqlite, which could work), and have each of the scripts write to that database. That way, any arbitrary script could run and grab what it needs to from the database without having to tie in to the other data collection modules.
The only potential issue with using sqlite is that since it's lightweight, it supports only one writer at a time, so if you're making a huge amount of changes and additions to the database, it may end up being a bottleneck.
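The database approach could be sketched as follows, using the bundled `sqlite3` module. The table layout and the helper names `init_db`, `write_reading`, and `read_reading` are assumptions chosen for illustration, not part of either existing script.

```python
import sqlite3
import time
from contextlib import closing


def init_db(path):
    """Create the shared table if it does not exist yet."""
    with closing(sqlite3.connect(path)) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings "
            "(name TEXT PRIMARY KEY, value REAL, updated REAL)"
        )


def write_reading(path, name, value):
    """Collector side: store the latest value under `name`."""
    with closing(sqlite3.connect(path)) as conn, conn:
        conn.execute(
            "INSERT OR REPLACE INTO readings VALUES (?, ?, ?)",
            (name, value, time.time()),
        )


def read_reading(path, name):
    """Uploader side: fetch the latest value, or None if nothing was stored."""
    with closing(sqlite3.connect(path)) as conn:
        row = conn.execute(
            "SELECT value FROM readings WHERE name = ?", (name,)
        ).fetchone()
    return None if row is None else row[0]
```

The rain script would call `write_reading(path, "totalrainlasthour", total)` every 5 minutes, and the upload script would call `read_reading(path, "totalrainlasthour")` just before uploading.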