I have a Python script that constantly grabs data from Twitter and writes the messages to a file. What I want is for the program to also write the current time to the file once every hour. Below is my script. Currently, it gets into the timestamp function and just keeps printing the time every 10 seconds.
#! /usr/bin/env python
import tweetstream
import simplejson
import urllib
import time
import datetime
import sched

class twit:
    def __init__(self, uname, pswd, filepath):
        self.uname = uname
        self.password = pswd
        self.filepath = open(filepath, "wb")

    def main(self):
        i = 0
        s = sched.scheduler(time.time, time.sleep)
        output = self.filepath
        # Grab every tweet using Streaming API
        with tweetstream.TweetStream(self.uname, self.password) as stream:
            for tweet in stream:
                if tweet.has_key("text"):
                    try:
                        # Write tweet to file and print it to STDOUT
                        message = tweet['text'] + "\n"
                        output.write(message)
                        print tweet['user']['screen_name'] + ": " + tweet['text'], "\n"
                        ################################
                        # Timestamp code
                        # Timestamps should be placed once every hour
                        s.enter(10, 1, t.timestamp, (s,))
                        s.run()
                    except KeyError:
                        pass

    def timestamp(self, sc):
        now = datetime.datetime.now()
        current_time = now.strftime("%Y-%m-%d %H:%M")
        print current_time
        self.filepath.write(current_time + "\n")

if __name__ == '__main__':
    t = twit("rohanbk", "cookie", "tweets.txt")
    t.main()
Is there any way for my script to do this without constantly checking the time with an IF statement every other minute to see how much time has elapsed? Can I use a scheduled task, as I've done above, with a slight modification to my current implementation?
Your code

s.enter(10, 1, t.timestamp, (s,))

is asking to be scheduled again in 10 seconds. If you want to be scheduled once an hour,

s.enter(3600, 1, t.timestamp, (s,))

seems better, since an hour is 3600 seconds, not 10!

Also, that s.enter call sits inside the tweet loop, so a new timestamp is scheduled after every tweet written -- what's the point of that? Just schedule the first invocation of timestamp once, outside the loop, as well as changing its periodicity from 10 seconds to 3600.
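For instance, here is a minimal sketch of that rescheduling idea (my own illustration, not the asker's code): timestamp re-enters itself every hour, the scheduler runs in a background thread since s.run() blocks, and the tweet loop stays free of scheduling calls.

import sched
import time
import datetime
import threading

s = sched.scheduler(time.time, time.sleep)
output = open("tweets.txt", "w")  # same file name as in the question

def timestamp(sc, out):
    current_time = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
    out.write(current_time + "\n")
    sc.enter(3600, 1, timestamp, (sc, out))  # reschedule in one hour

s.enter(3600, 1, timestamp, (s, output))  # schedule the first run once
worker = threading.Thread(target=s.run)
worker.daemon = True  # don't keep the process alive just for the scheduler
worker.start()
# ... the tweet-reading loop runs here, with no scheduling calls inside it ...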
Here is how I would like it to work: the chat action is dispatched, and as soon as it ends the file is sent. But in practice the action lasts 5 seconds, and then it takes another 5 seconds to send the file, and during that time the user cannot tell whether the bot is frozen or the file is still being sent. How can I extend the duration of the action until the file is actually sent?
import telebot
...
def send_file(m: Message, file):
    bot.send_chat_action(m.chat.id, action='upload_document')
    bot.send_document(m.chat.id, file)
As Tibebes. M said, this is not possible directly, because all actions are sent via the API. But threads helped me solve the problem. The solution looks like this:
from threading import Thread

def send_action(id, ac):
    bot.send_chat_action(id, action=ac)

def send_doc(id, f):
    bot.send_document(id, f)

def send_file(m: Message):
    file = open(...)
    Thread(target=send_action, args=(m.chat.id, 'upload_document')).start()
    Thread(target=send_doc, args=(m.chat.id, file)).start()
...
send_file(m)
This way, as soon as the action ends, the file is sent immediately, with no gap in between.
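One caveat worth knowing: Telegram clears a chat action after about 5 seconds, so for uploads longer than that the action has to be re-sent periodically. Here is a hedged variation on the answer above (bot and Message are the same objects as in the snippets; keep_action is a name I made up):

from threading import Thread, Event

def keep_action(chat_id, action, done):
    # Re-send the chat action until the upload finishes, refreshing it
    # just before Telegram's ~5-second expiry.
    while not done.is_set():
        bot.send_chat_action(chat_id, action=action)
        done.wait(4)

def send_file(m: Message, file):
    done = Event()
    Thread(target=keep_action, args=(m.chat.id, 'upload_document', done)).start()
    try:
        bot.send_document(m.chat.id, file)
    finally:
        done.set()  # stop the action thread once the document is sent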
import wget

with open('downloadhlt.txt') as file:
    urls = file.read()
for line in urls.split('\n'):
    wget.download(line, 'localfolder')
For some reason the post wouldn't work, so I put the code above.
What I'm trying to do: I have a text file with ~2 million lines like these:
http://halitereplaybucket.s3.amazonaws.com/1475594084-2235734685.hlt
http://halitereplaybucket.s3.amazonaws.com/1475594100-2251426701.hlt
http://halitereplaybucket.s3.amazonaws.com/1475594119-2270812773.hlt
I want to grab each line and request it, downloading in groups of 10 or more at a time. Currently my code downloads one item at a time, which is very time-consuming.
I tried looking at "Ways to read/edit multiple lines in python", but the iteration there is aimed at editing, while mine is about multiple concurrent executions of wget.
I have not tried other methods simply because this is the first time I have ever been in the need to make over 2 million download calls.
This should work fine. I'm a total newbie, so I can't really advise you on the number of threads to start, but these are my 2 cents anyway; hope it helps.
I tried timing yours and mine over 27 downloads:
(base) MBPdiFrancesco:stack francesco$ python3 old.py
Elapsed Time: 14.542160034179688
(base) MBPdiFrancesco:stack francesco$ python3 new.py
Elapsed Time: 1.9618661403656006
And here is the code; you have to create a "downloads" folder first:
import wget
from multiprocessing.pool import ThreadPool
from time import time as timer

s = timer()
thread_num = 8

def download(url):
    try:
        wget.download(url, 'downloads/')
    except Exception as e:
        print(e)

if __name__ == "__main__":
    with open('downloadhlt.txt') as file:
        # skip empty lines so wget isn't called with an empty URL
        urls = [u for u in file.read().split("\n") if u]
    results = ThreadPool(thread_num).imap_unordered(download, urls)
    c = 0
    for i in results:
        c += 1
        print("Downloaded {} file{} so far".format(c, "" if c == 1 else "s"))
    print("Elapsed Time: {} seconds\nDownloaded {} files".format(timer() - s, c))
Here is the code:
import os
import asyncio

async def func_placing_sell_orders():
    prev_final_stocks_list_state = os.path.getmtime('stock_data//final_stocks_list.json')
    print('i run once')
    while True:
        if (prev_final_stocks_list_state != os.path.getmtime('stock_data//final_stocks_list.json')):
            prev_final_stocks_list_state = os.path.getmtime('stock_data//final_stocks_list.json')
            print('here')

asyncio.get_event_loop().run_until_complete(func_placing_sell_orders())
Simplified version:
import os

def simple():
    state = os.path.getmtime('file.json')
    print('i run once')
    while True:
        if (state != os.path.getmtime('file.json')):
            state = os.path.getmtime('file.json')
            print('here')

simple()
This is the print out:
i run once
here
here
"here" gets printed twice every time I save the file. I compared the previous and current modified times and they are always different, which implies it should only print once per save.
This is so basic that I don't understand why I'm getting this result. Please send help.
If the file is large enough, the first "here" may fire while the file is still being written, and the last "here" after the saving is done. Also, if you're using something like open("file", "w") to write the edits, the file is first truncated (first "here") and then written with the new data (second "here").
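A tiny experiment (my own sketch, using the same file.json name) makes the truncate-then-write behaviour visible: the mtime changes once when the file is opened for writing and again when the buffered data is flushed on close.

import os
import time

with open('file.json', 'w') as f:   # truncation: first mtime change
    t1 = os.path.getmtime('file.json')
    time.sleep(2)                   # exaggerate the gap for the demo
    f.write('{}')                   # buffered; hits disk on close
t2 = os.path.getmtime('file.json')  # flush on close: second mtime change
print(t1, t2)                       # two different timestamps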
You can ignore reports that arrive too quickly (< 1 s apart) with a simple timer:
import os
import time

state = os.path.getmtime('file.json')
lastEdit = time.time()
while True:
    if (state != os.path.getmtime('file.json')):
        state = os.path.getmtime('file.json')
        if time.time() - lastEdit > 1:
            print('here')
        lastEdit = time.time()
I have a Python script that does some updates on my database.
The files that this script needs are saved in a directory at around 3 AM by some other process.
So I'm going to schedule a cron job to run daily at 3 AM, but I want to handle the case where the file is not available exactly at 3 AM; it could be delayed by some interval.
So I basically need to keep checking whether a file of a particular name exists, every 5 minutes, starting from 3 AM. I'll try for around 1 hour and give up if it doesn't show up.
How can I achieve this sort of thing in Python?
Try something like this (you'll need to change the print statements to be function calls if you are using Python 3).
#!/usr/bin/env python
import os
import time

def watch_file(filename, time_limit=3600, check_interval=60):
    '''Return true if filename exists; if not, keep checking once every
    check_interval seconds for time_limit seconds.
    time_limit defaults to 1 hour
    check_interval defaults to 1 minute
    '''
    now = time.time()
    last_time = now + time_limit
    while time.time() <= last_time:
        if os.path.exists(filename):
            return True
        else:
            # Wait for check_interval seconds, then check again.
            time.sleep(check_interval)
    return False

if __name__ == '__main__':
    filename = '/the/file/Im/waiting/for.txt'
    time_limit = 3600    # one hour from now.
    check_interval = 60  # seconds between checking for the file.
    if watch_file(filename, time_limit, check_interval):
        print "File present!"
    else:
        print "File not found after waiting:", time_limit, " seconds!"
For this sort of task you can use watchdog, a library for listening to and monitoring system events.
One of the event categories it monitors is file system events, via the FileSystemEventHandler class, which has an on_created() method.
You'll end up writing a "wrapper" script that can run continuously. It uses watchdog to listen on the particular directory. The moment a file is created, the script is notified; you then check whether the created file matches the pattern of the target file, and run your custom code.
Luckily, as this is a common task, there is a PatternMatchingEventHandler already available; it inherits from FileSystemEventHandler but watches only for files matching a pattern.
Your wrapper script then becomes:
import time

from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

class FileWatcher(PatternMatchingEventHandler):
    patterns = ["*.dat"]  # adjust as required

    def process(self, event):
        # your actual code goes here
        # event.src_path will be the full file path
        # event.event_type will be 'created', 'moved', etc.
        print('{} observed on {}'.format(event.event_type, event.src_path))

    def on_created(self, event):
        self.process(event)

if __name__ == '__main__':
    obs = Observer()  # This is what manages running of your code
    obs.schedule(FileWatcher(), path='/the/target/dir')
    obs.start()  # Start watching
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        obs.stop()
    obs.join()
This is what comes to my mind first; it's pretty straightforward:
from time import sleep

counter = 0
working = True
while counter < 11 and working:
    try:
        # Open the file and do whatever you need
        working = False
    except IOError:
        counter += 1
        sleep(5 * 60)
Better solution
import os
from time import sleep

counter = 0
working = True
while counter < 11 and working:
    if os.path.isfile('path/to/your/file'):
        # Open the file and do whatever you need
        working = False
    else:
        counter += 1
        sleep(5 * 60)
In Python you can check whether the file exists:
import os.path
os.path.isfile(filename)
Then you set your cron to run every 5 minutes from 3am:
*/5 3 * * * /path-to-your/script.py
You can write to a simple file (or a database, if you are already using one) to track whether you have already read the data from the file.
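A minimal sketch of how those pieces could fit together (all paths here are hypothetical): cron runs the script every 5 minutes between 3 and 4 AM, and a marker file ensures the data is processed only once.

import os.path

DATA = '/data/incoming/update.csv'  # hypothetical input file
MARKER = '/tmp/update.processed'    # hypothetical "already done" flag

if os.path.isfile(DATA) and not os.path.isfile(MARKER):
    # ... run the database updates here ...
    open(MARKER, 'w').close()  # remember that the file was handled
    # (a real script would date-stamp the marker so it resets daily)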
You can use Twisted and its reactor; it is much better than an infinite loop! You can use reactor.callLater(myTime, myFunction), and when myFunction gets called you can adjust myTime and add another callback with the same callLater() API.
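For illustration, a minimal sketch of that callLater pattern (check_for_file and the path are my own placeholder names):

import os.path
from twisted.internet import reactor

CHECK_INTERVAL = 5 * 60  # seconds between checks

def check_for_file():
    if os.path.isfile('/the/file/Im/waiting/for.txt'):
        # ... process the file here ...
        reactor.stop()
    else:
        # not there yet: re-arm the callback for another check
        reactor.callLater(CHECK_INTERVAL, check_for_file)

reactor.callLater(0, check_for_file)  # first check as soon as the reactor runs
reactor.run()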
I have written a small script to fetch instant stock prices.
# script to get stock data
from __future__ import print_function
import urllib
import lxml.html
from datetime import datetime
import sys
import time

stocks = ["stock1", "stock2", "stock3", "stock4", "stock5"]

while True:
    f = open('./out.txt', 'a+')
    for x in stocks:
        url = "http://someurltofetchdata/" + x
        code = urllib.urlopen(url).read()
        html = lxml.html.fromstring(code)
        result = html.xpath('//td[@class="LastValue"][position() = 1]')
        result = [el.text_content() for el in result]
        f.write(datetime.now().strftime("%Y-%m-%d %H:%M:%S") + ' ' + x + ' ' + result[0])
        f.write("\n")
    f.close()
I want the code to fetch data only during the hours the stock market is open, i.e. trading hours (09:00 to 12:30 and 13:30 to 17:30).
Could you please suggest a way to do the scheduling inside the code itself (not at the OS level)?
If you cannot use cron (which is the simplest way to accomplish the task), you can add this to your code. It downloads the data if the current time is within the given range, sleeps for 60 seconds, and then runs again.
while True:
    now = datetime.now().strftime('%H%M')
    if '0900' <= now <= '1230' or '1330' <= now <= '1730':
        pass  # your code, starting with f = open('./out.txt', 'a+'), goes here
    time.sleep(60)
Have a look at APScheduler
from apscheduler.scheduler import Scheduler

sched = Scheduler()

@sched.interval_schedule(hours=3)
def some_job():
    print "Decorated job"

sched.configure(options_from_ini_file)
sched.start()
You can also specify a date and time:
job = sched.add_date_job(my_job, datetime(2009, 11, 6, 16, 30, 5), ['text'])
Obviously you'll have to write some code to turn these on and off (sched.start(), sched.stop()) at the relevant times, but then it will go and fetch the data as often as you have set on the decorator, automatically. You could even schedule the schedule!
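As a hedged sketch of what "scheduling the schedule" could look like with the same old 2.x API, cron-style jobs can restrict the fetch to the trading windows directly (fetch_stocks is a hypothetical wrapper around the question's fetch loop):

from apscheduler.scheduler import Scheduler

sched = Scheduler()

def fetch_stocks():
    pass  # the urllib/lxml fetch code from the question goes here

# 09:00-12:30 window, one fetch per minute
sched.add_cron_job(fetch_stocks, hour='9-11', minute='*')
sched.add_cron_job(fetch_stocks, hour='12', minute='0-30')
# 13:30-17:30 window
sched.add_cron_job(fetch_stocks, hour='13', minute='30-59')
sched.add_cron_job(fetch_stocks, hour='14-16', minute='*')
sched.add_cron_job(fetch_stocks, hour='17', minute='0-30')

sched.start()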
If you want to schedule this script on Windows, use the Task Scheduler.
It has a GUI for configuration and is pretty easy to set up. For Linux, crontab is the better fit. Most importantly, you don't need to modify your code, and it is much more stable for long-term running.