Python: Time of day to execute

I have written a small script to fetch instant stock prices.
#script to get stock data
from __future__ import print_function
import urllib
import lxml.html
from datetime import datetime
import sys
import time

stocks = ["stock1", "stock2", "stock3", "stock4", "stock5"]

while True:
    f = open('./out.txt', 'a+')
    for x in stocks:  # the original looped over "stock", which is undefined
        url = "http://someurltofetchdata/" + x
        code = urllib.urlopen(url).read()
        html = lxml.html.fromstring(code)
        # attribute tests in XPath use @class, not #class
        result = html.xpath('//td[@class="LastValue"][position() = 1]')
        result = [el.text_content() for el in result]
        f.write(datetime.now().strftime("%Y-%m-%d %H:%M:%S") + ' ' + x + ' ' + result[0])
        f.write("\n")
    f.close()
I want the code to fetch data only during the hours the stock market is open, i.e. trading hours (09:00 to 12:30 and 13:30 to 17:30).
Could you please suggest a way to perform the scheduling inside the code itself (not at the OS level)?

If you cannot use cron (which is the simplest way to accomplish the task), you can add this to your code. It will download the data if the current time is within the given range, sleep for 60 seconds, and then check again.
while True:
    now = datetime.now().strftime('%H%M')
    if '0900' <= now <= '1230' or '1330' <= now <= '1730':
        pass  # your code, starting with f = open('./out.txt', 'a+')
    time.sleep(60)
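A minimal, self-contained sketch of that pattern (the session boundaries come from the question; the actual fetching is left as a placeholder callback you would supply):

```python
import time
from datetime import datetime, time as dtime

# Trading sessions from the question: 09:00-12:30 and 13:30-17:30.
SESSIONS = [(dtime(9, 0), dtime(12, 30)), (dtime(13, 30), dtime(17, 30))]

def market_open(moment=None):
    """Return True if `moment` (default: now) falls inside a trading session."""
    t = (moment or datetime.now()).time()
    return any(start <= t <= end for start, end in SESSIONS)

def poll(fetch, interval=60):
    """Run `fetch()` every `interval` seconds, but only while the market is open."""
    while True:
        if market_open():
            fetch()
        time.sleep(interval)
```

Comparing `datetime.time` objects avoids the string-comparison trick entirely, and `market_open` can be unit-tested with fixed datetimes.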

Have a look at APScheduler:
from apscheduler.scheduler import Scheduler

sched = Scheduler()

@sched.interval_schedule(hours=3)
def some_job():
    print("Decorated job")

sched.configure(options_from_ini_file)
sched.start()
You can also schedule a job for a specific date and time:
job = sched.add_date_job(my_job, datetime(2009, 11, 6, 16, 30, 5), ['text'])
Obviously you'll have to write some code to turn the scheduler on and off (sched.start() / sched.stop()) at the relevant times, but then it will go and fetch the data as often as the decorator specifies, automatically. You could even schedule the schedule!

If you want to schedule this script on Windows, use Task Scheduler. It has a GUI for configuration and is pretty easy to use. On Linux, crontab is the better choice. Most importantly, you don't need to modify your code, and it is much more stable for long-term running.
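For reference, a crontab entry for the trading-hours case might look like this (Mon-Fri; cron can only express whole-hour ranges, so the half-hour session edges would still need a check inside the script, and the script path is a placeholder):

```
# every minute, Mon-Fri, during the two (approximate) trading windows
* 9-12 * * 1-5 /usr/bin/python /path/to/script.py
* 13-17 * * 1-5 /usr/bin/python /path/to/script.py
```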

Related

How can I download multiple files with the web addresses found in a local .txt file?

import wget

with open('downloadhlt.txt') as file:
    urls = file.read()
for line in urls.split('\n'):
    wget.download(line, 'localfolder')
(For some reason the post wouldn't format properly, so I put the code above.)
What I'm trying to do: I have a text file with ~2 million lines like these.
http://halitereplaybucket.s3.amazonaws.com/1475594084-2235734685.hlt
http://halitereplaybucket.s3.amazonaws.com/1475594100-2251426701.hlt
http://halitereplaybucket.s3.amazonaws.com/1475594119-2270812773.hlt
I want to grab each line and request it, downloading in groups of ten or more at a time. Currently my code downloads one item at a time, which is very time-consuming.
I tried looking at "Ways to read/edit multiple lines in python", but the iteration there seems to be for editing, while mine is for multiple executions of wget.
I have not tried other methods, simply because this is the first time I have ever needed to make over 2 million download calls.
This should work fine. I'm a total newbie, so I can't really advise you on the number of threads to start, lol. These are my 2 cents anyway; hope it somehow helps.
I timed yours and mine over 27 downloads:
(base) MBPdiFrancesco:stack francesco$ python3 old.py
Elapsed Time: 14.542160034179688
(base) MBPdiFrancesco:stack francesco$ python3 new.py
Elapsed Time: 1.9618661403656006
And here is the code; you have to create a "downloads" folder first.
import wget
from multiprocessing.pool import ThreadPool
from time import time as timer

s = timer()
thread_num = 8

def download(url):
    try:
        wget.download(url, 'downloads/')
    except Exception as e:
        print(e)

if __name__ == "__main__":
    with open('downloadhlt.txt') as file:
        urls = file.read().split("\n")
    # use thread_num here; the original hard-coded 8 a second time
    results = ThreadPool(thread_num).imap_unordered(download, urls)
    c = 0
    for i in results:
        c += 1
        print("Downloaded {} file{} so far".format(c, "" if c == 1 else "s"))
    print("Elapsed Time: {} seconds\nDownloaded {} files".format(timer() - s, c))

Run python script like a service with Twisted

I would like to run this script as an automatic service that runs every minute, every day, with Twisted (I first tried to make it a daemon, but that seemed too difficult and I didn't find good tutorials; I have already tried crontab, but that's not what I'm looking for).
Has anyone done this with Twisted? I can't find a tutorial for my kind of script (getting data from a db table and putting it into another table of the same db). I also have to keep the logs in a file, but that will not be the most difficult part.
from twisted.enterprise import adbapi
from twisted.internet import task
import logging
from datetime import datetime
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks

"""
Test DB: this file does the database connection and basic operations.
"""

log = logging.getLogger("Test DB")

dbpool = adbapi.ConnectionPool("MySQLdb", db="xxxx", user="guza",
                               passwd="vQsx7gbblal8aiICbTKP", host="192.168.15.01")

class MetersCount():

    def getTime(self):
        log.info("Get Current Time from System.")
        time = str(datetime.now()).split('.')[0]
        return time

    def getTotalMeters(self):
        log.info("Select operation in Database.")
        getMetersQuery = """ SELECT count(met_id) as totalMeters FROM meters WHERE DATE(met_last_heard) = DATE(NOW()) """
        return dbpool.runQuery(getMetersQuery).addCallback(self.getResult)

    def getResult(self, result):
        # General-purpose method to receive a result from a Deferred.
        print("Receive Result : ")
        print(result)
        return result

    def insertMetersCount(self, meters_count):
        log.info("Insert operation in Database.")
        insertMetersQuery = """ INSERT INTO meter_count (mec_datetime, mec_count) VALUES (NOW(), %s) """
        return dbpool.runQuery(insertMetersQuery, [meters_count])

    def checkDB(self):
        d = self.getTotalMeters()
        d.addCallback(self.insertMetersCount)
        return d

a = MetersCount()
a.checkDB()
reactor.run()
If you want to run a function once a minute, have a look at LoopingCall. It takes a function and runs it at intervals until told to stop.
You would use it something like this (which I haven't tested):
from twisted.internet.task import LoopingCall

looper = LoopingCall(a.checkDB)
looper.start(60)
The documentation is at the link.

Python schedule with commandline

I have a problem: I want to automate a script. In past projects I've used the Python scheduler for this, but for this project I'm unsure how to handle it.
The problem is that the code works with login details that live outside the code and are entered on the command line when launching the script, e.g.:
python scriptname.py email@youremail.com password
How can I automate this with the Python scheduler?
The code in 'scriptname.py' is:
# LinkedBot.py
import argparse, os, time
import urlparse, random
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

def getPeopleLinks(page):
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'profile/view?id=' in url:
                links.append(url)
    return links

def getJobLinks(page):
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if '/jobs' in url:
                links.append(url)
    return links

def getID(url):
    pUrl = urlparse.urlparse(url)
    return urlparse.parse_qs(pUrl.query)['id'][0]

def ViewBot(browser):
    visited = {}
    pList = []
    count = 0
    while True:
        # Sleep to make sure everything loads; add randomness to look human.
        time.sleep(random.uniform(3.5, 6.9))
        page = BeautifulSoup(browser.page_source)
        people = getPeopleLinks(page)
        if people:
            for person in people:
                ID = getID(person)
                if ID not in visited:
                    pList.append(person)
                    visited[ID] = 1
        if pList:  # if there are people to look at, look at them
            person = pList.pop()
            browser.get(person)
            count += 1
        else:  # otherwise find people via the job pages
            jobs = getJobLinks(page)
            if jobs:
                job = random.choice(jobs)
                root = 'http://www.linkedin.com'
                roots = 'https://www.linkedin.com'
                # "and", not "or": prepend the host only when neither form is present
                if root not in job and roots not in job:
                    job = 'https://www.linkedin.com' + job
                browser.get(job)
            else:
                print "I'm Lost Exiting"
                break
        # Output (make an option for this)
        print "[+] "+browser.title+" Visited! \n("\
            +str(count)+"/"+str(len(pList))+") Visited/Queue)"

def Main():
    parser = argparse.ArgumentParser()
    parser.add_argument("email", help="linkedin email")
    parser.add_argument("password", help="linkedin password")
    args = parser.parse_args()
    browser = webdriver.Firefox()
    browser.get("https://linkedin.com/uas/login")
    emailElement = browser.find_element_by_id("session_key-login")
    emailElement.send_keys(args.email)
    passElement = browser.find_element_by_id("session_password-login")
    passElement.send_keys(args.password)
    passElement.submit()
Running this on OSX.
I can see at least two different ways of automating the triggering of your script. Since you mention that your script is started this way:
python scriptname.py email@youremail.com password
it means that you start it from a shell. As you want to have it scheduled, a crontab sounds like the perfect answer (see https://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/ for example).
If you really want to use the Python scheduler, you can use subprocess. In the file using the Python scheduler:
import subprocess

subprocess.call("python scriptname.py email@youremail.com password", shell=True)
See also: What is the best way to call a Python script from another Python script?
About the code itself
LinkedIn REST API
Have you tried using LinkedIn's REST API instead of retrieving heavy pages, filling in a form, and sending it back?
Your code is prone to break whenever LinkedIn changes some element in their pages, whereas the API is a contract between LinkedIn and its users.
Check https://developer.linkedin.com/docs/rest-api and https://developer.linkedin.com/docs/guide/v2/concepts/methods
Credentials
So that you don't have to pass your credentials on the command line (especially your password, which would be readable in clear text in your shell history), you should either
use a config file (with your API key) and read it with ConfigParser (or anything else, depending on the format of your config file: JSON, Python, etc.),
or set them in environment variables.
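A sketch of the environment-variable option (the variable names LINKEDIN_EMAIL and LINKEDIN_PASSWORD are made up for illustration; set whichever names you like in your shell profile or in the crontab entry itself):

```python
import os

def get_credentials():
    """Read login details from the environment instead of sys.argv."""
    try:
        return os.environ['LINKEDIN_EMAIL'], os.environ['LINKEDIN_PASSWORD']
    except KeyError as missing:
        # Fail loudly with the name of the variable that is not set.
        raise SystemExit('Missing environment variable: {}'.format(missing))
```

The scheduler can then invoke the script with no arguments, and the password never appears in `ps` output or shell history.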
For the scheduling
Using Cron
Moreover, for the scheduling part, you can use cron.
Using Celery
If you're looking for a 100% Python solution, you can use the excellent Celery project. Check its periodic tasks.
You can pass the args to the Python scheduler (the standard-library sched module):
scheduler.enter(delay, priority, action, argument=(), kwargs={})
Schedule an event for delay more time units. Other than the relative time, the other arguments, the effect and the return value are the same as those for enterabs().
Changed in version 3.3: argument parameter is optional.
New in version 3.3: kwargs parameter was added.
>>> import sched, time
>>> s = sched.scheduler(time.time, time.sleep)
>>> def print_time(a='default'):
...     print("From print_time", time.time(), a)
...
>>> def print_some_times():
...     print(time.time())
...     s.enter(10, 1, print_time)
...     s.enter(5, 2, print_time, argument=('positional',))
...     s.enter(5, 1, print_time, kwargs={'a': 'keyword'})
...     s.run()
...     print(time.time())
...
>>> print_some_times()
930343690.257
From print_time 930343695.274 positional
From print_time 930343695.275 keyword
From print_time 930343700.273 default
930343700.276

'Listening' for a file in Python

I have a python script that does some updates on my database.
The files that this script needs are saved in a directory at around 3AM by some other process.
So I'm going to schedule a cron job to run daily at 3 AM, but I want to handle the case where the file is not available exactly at 3 AM; it could be delayed by some interval.
So I basically need to keep checking every 5 minutes, starting from 3 AM, whether a file with a particular name exists. I'll try for around 1 hour and give up if it doesn't appear.
How can I achieve this sort of thing in Python?
Try something like this (you'll need to change the print statements to function calls if you are using Python 3):
#!/usr/bin/env python
import os
import time

def watch_file(filename, time_limit=3600, check_interval=60):
    '''Return True if filename exists; if not, keep checking once every
    check_interval seconds for time_limit seconds.
    time_limit defaults to 1 hour
    check_interval defaults to 1 minute
    '''
    now = time.time()
    last_time = now + time_limit
    while time.time() <= last_time:
        if os.path.exists(filename):
            return True
        else:
            # Wait check_interval seconds, then check again.
            time.sleep(check_interval)
    return False

if __name__ == '__main__':
    filename = '/the/file/Im/waiting/for.txt'
    time_limit = 3600     # one hour from now
    check_interval = 60   # seconds between checks for the file
    if watch_file(filename, time_limit, check_interval):
        print "File present!"
    else:
        print "File not found after waiting:", time_limit, " seconds!"
For this sort of task you can use watchdog, a library for listening to and monitoring system events.
One of the things it can monitor is file system events, via the FileSystemEventHandler class, which has an on_created() method.
You'll end up writing a "wrapper" script that runs continuously. It will use watchdog to listen on that particular directory. The moment a file is created, the script is notified; you then check whether the created file matches the pattern of your target file and execute your custom code.
Luckily, as this is a common task, there is a PatternMatchingEventHandler already available, which inherits from FileSystemEventHandler but watches only for files matching a pattern.
Your wrapper script then becomes:
import time

from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

class FileWatcher(PatternMatchingEventHandler):
    patterns = ["*.dat"]  # adjust as required

    def process(self, event):
        # Your actual code goes here.
        # event.src_path will be the full file path.
        # event.event_type will be 'created', 'moved', etc.
        print('{} observed on {}'.format(event.event_type, event.src_path))

    def on_created(self, event):
        self.process(event)

if __name__ == '__main__':
    obs = Observer()  # This is what manages the running of your code
    obs.schedule(FileWatcher(), path='/the/target/dir')
    obs.start()  # Start watching
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        obs.stop()
    obs.join()
That is what comes to my mind first; pretty straightforward:
from time import sleep

counter = 0
working = True
while counter < 11 and working:
    try:
        # Open the file and do whatever you need
        working = False
    except IOError:
        counter += 1
        sleep(5 * 60)
Better solution:
import os
from time import sleep

counter = 0
working = True
while counter < 11 and working:
    if os.path.isfile('path/to/your/file'):
        # Open the file and do whatever you need
        working = False
    else:
        counter += 1
        sleep(5 * 60)
In Python you can check whether the file exists:
import os.path
os.path.isfile(filename)
Then set your cron to run every 5 minutes from 3 AM:
*/5 3 * * * /path-to-your/script.py
You can write to a simple file to keep track of whether you have already read the data from the file (or use a database if you are already using one).
You can use Twisted and its reactor; it is much better than an infinite loop! You can use reactor.callLater(myTime, myFunction), and when myFunction gets called you can adjust myTime and add another callback with the same callLater() API.

Write timestamp to file every hour in Python

I have a python script that is constantly grabbing data from Twitter and writing the messages to a file. The question I have is: every hour, I want my program to write the current time to the file. Below is my script. Currently, it gets into the timestamp function and just keeps printing the time every 10 seconds.
#! /usr/bin/env python
import tweetstream
import simplejson
import urllib
import time
import datetime
import sched

class twit:
    def __init__(self, uname, pswd, filepath):
        self.uname = uname
        self.password = pswd
        self.filepath = open(filepath, "wb")

    def main(self):
        i = 0
        s = sched.scheduler(time.time, time.sleep)
        output = self.filepath
        # Grab every tweet using the Streaming API
        with tweetstream.TweetStream(self.uname, self.password) as stream:
            for tweet in stream:
                if tweet.has_key("text"):
                    try:
                        # Write the tweet to the file and print it to STDOUT
                        message = tweet['text'] + "\n"
                        output.write(message)
                        print tweet['user']['screen_name'] + ": " + tweet['text'], "\n"
                        ################################
                        # Timestamp code
                        # Timestamps should be placed once every hour
                        s.enter(10, 1, t.timestamp, (s,))
                        s.run()
                    except KeyError:
                        pass

    def timestamp(self, sc):
        now = datetime.datetime.now()
        current_time = now.strftime("%Y-%m-%d %H:%M")
        print current_time
        self.filepath.write(current_time + "\n")

if __name__ == '__main__':
    t = twit("rohanbk", "cookie", "tweets.txt")
    t.main()
Is there any way for my script to do this without constantly checking the time with an if statement to see how much time has elapsed? Can I use a scheduled task, as I've done above, with a slight modification to my current implementation?
Your code
sc.enter(10, 1, t.timestamp, (sc,))
asks to be scheduled again in 10 seconds. If you want it scheduled once an hour,
sc.enter(3600, 1, t.timestamp, (sc,))
seems better, since an hour is 3600 seconds, not 10!
Also, the line
s.enter(10, 1, t.timestamp, (s,))
schedules a timestamp shortly after every tweet written; what's the point of that? Just schedule the first invocation of timestamp once, outside the loop, as well as changing its periodicity from 10 seconds to 3600.
