How to increase the running time of send_action in pyTelegramBotApi? - python

What I would like to happen: the action is dispatched, and when it ends the file is sent. But as it stands the action lasts 5 seconds, and then it takes another 5 seconds to send the file, and during that second interval the user can't tell whether the bot has frozen or the file is still being sent. How can I make the action last longer, so it covers the time until the file is actually sent?
import telebot
...

def send_file(m: Message, file):
    bot.send_chat_action(m.chat.id, action='upload_document')
    bot.send_document(m.chat.id, file)

As Tibebes. M said, this is not possible directly, because all actions are sent via the API. But threads helped me solve the problem. The solution looks like this:
from threading import Thread

def send_action(id, ac):
    bot.send_chat_action(id, action=ac)

def send_doc(id, f):
    bot.send_document(id, f)

def send_file(m: Message):
    file = open(...)
    Thread(target=send_action, args=(m.chat.id, 'upload_document')).start()
    Thread(target=send_doc, args=(m.chat.id, file)).start()

...
send_file(m)
This way the file is sent the moment the action ends, without a gap in between.
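Since Telegram only displays a chat action for about five seconds, a variation of the same idea is to keep re-sending the action on a background thread until the upload has finished. This is only a rough sketch (the file name is hypothetical, and it reuses the bot and Message objects from the question):

from threading import Thread, Event
import time

def keep_action_alive(chat_id, action, done):
    # Telegram drops a chat action after ~5 seconds, so re-send it
    # until the upload signals that it has finished.
    while not done.is_set():
        bot.send_chat_action(chat_id, action=action)
        time.sleep(4)

def send_file(m: Message):
    done = Event()
    Thread(target=keep_action_alive,
           args=(m.chat.id, 'upload_document', done)).start()
    with open('big_file.pdf', 'rb') as f:  # hypothetical file name
        bot.send_document(m.chat.id, f)
    done.set()  # stop re-sending the action once the document is sent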

Related

File atomicity with luigi python library

Do I need to worry about file atomicity in luigi with the following code, which pickles a dataframe and returns it as the output of a task? I don't get the atomicity part, as I would hope luigi would simply wait for the task to finish writing the file before declaring the task complete.
import pickle

import luigi
import pandas as pd
from luigi import format

class readSQLtoPickle(luigi.Task):
    sql = luigi.Parameter()
    pickle = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(self.pickle, format=format.Nop)

    def run(self):
        # ariel is the database connection, defined elsewhere
        data = pd.read_sql(self.sql, ariel)
        with self.output().open('w') as f:
            pickle.dump(data, f)

class grabData(luigi.Task):  # standard Luigi Task class
    sql = luigi.Parameter(default="SELECT * FROM DIM_DRUG_PRODUCT")
    pickle = luigi.Parameter(default="drug_product.pkl")

    def requires(self):
        # we need to read the log file before we can process it
        return readSQLtoPickle(sql=self.sql, pickle=self.pickle)

    def run(self):
        with self.input().open('r') as f:
            df = pickle.load(f)
        print(type(df))
        print(df.head(100))
        print(len(df))
Writing to LocalTarget is atomic. Behind the scenes, luigi first writes to a temp file and then moves the temp file onto your actual target. Look for atomic_file in the source code.
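For illustration, the write-to-a-temp-file-then-rename pattern that atomic_file relies on looks roughly like this (a simplified sketch, not luigi's actual code):

import os
import tempfile

def atomic_write(path, data):
    # Write to a temporary file in the same directory, then rename it
    # into place. The rename is atomic on POSIX, so a reader never sees
    # a half-written file at `path`.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomically move over the final name
    except BaseException:
        os.unlink(tmp_path)
        raise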
I don't get the atomicity part, as I would hope luigi would just wait for the task to complete writing a file before stating the task is complete.
If you use a local scheduler to run your task (--local-scheduler) and have only one worker, then you should be fine.
It becomes a problem if you have several workers working on the same tasks and trying to identify which tasks are currently available to run.
In your example one worker could be checking whether grabData is ready to run and see that the file is available, while another worker is still in the middle of readSQLtoPickle, writing to the file.

Discover what is blocking the event loop

I have thousands of asyncio tasks running.
Something is taking about 10 seconds to complete (some CPU intensive work).
This breaks the program, because some tasks need to answer a message on their network connection within, say, 5 seconds.
My current idea is to somehow intercept the event loop.
There must be some place in the asyncio module where it executes all currently active tasks in the event loop, between the epoll()/select() calls. If I could insert an "elapsed = time.time()" before and an "elapsed = time.time() - elapsed" after each task is resumed, I think that would be enough to find the tasks that are taking too much time.
I think the relevant code may be here, at line 79:
https://github.com/python/cpython/blob/master/Lib/asyncio/events.py
def _run(self):
    try:
        self._context.run(self._callback, *self._args)
    except (SystemExit, KeyboardInterrupt):
        raise
    except BaseException as exc:
        cb = format_helpers._format_callback_source(
            self._callback, self._args)
        msg = f'Exception in callback {cb}'
        context = {
            'message': msg,
            'exception': exc,
            'handle': self,
        }
        if self._source_traceback:
            context['source_traceback'] = self._source_traceback
        self._loop.call_exception_handler(context)
    self = None  # Needed to break cycles when an exception occurs.
But I don't know what to do here to print any useful info; I need a way to identify which line of my code this "self._context.run(...)" call is about to execute.
I have spent the last 5 sleepless months trying to fix my code, with no success so far.
I have tried cProfile and line_profiler, but neither helped.
They tell me how long a function takes and how much time is spent on each line. What I need to find out is how much time the code takes between event loop iterations.
All those profiling/debugging tools gave me no clue about what to fix, and after rewriting the same program about 15 times in different ways I still can't get it working.
I'm just a non-professional programmer and still a newbie in Python, but if I can't solve this problem my next step will be learning Rust, which will itself be a huge pain, and probably three years after starting I'll finally have this thing working, when it was supposed to take no more than two months.
By the way, there is a cool built-in feature in asyncio (you can see the source code here) which tells you if there is a "blocking" function.
You just need to enable debug mode (good for load tests).
How to enable debug mode: you can find all the options here.
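For reference, turning on debug mode and tuning the threshold looks roughly like this (the 0.25 second threshold is just an example value):

import asyncio
import logging

logging.basicConfig(level=logging.DEBUG)

async def main():
    loop = asyncio.get_running_loop()
    loop.set_debug(True)                 # enable asyncio debug mode
    loop.slow_callback_duration = 0.25   # warn about callbacks blocking longer than this
    ...                                  # run your tasks here

asyncio.run(main())
# Alternatively: asyncio.run(main(), debug=True), or set PYTHONASYNCIODEBUG=1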
I just edited the file /usr/lib/python3.7/asyncio/events.py and added:
import time
import signal
import traceback

START_TIME = 0

def handler(signum, frame):
    print('##########', time.time() - START_TIME)
    traceback.print_stack()

signal.signal(signal.SIGALRM, handler)
And on line 79:
def _run(self):
    global START_TIME
    try:
        signal.alarm(3)
        START_TIME = time.time()
        self._context.run(self._callback, *self._args)
        signal.alarm(0)
    except Exception as exc:
        cb = format_helpers._format_callback_source(
            self._callback, self._args)
        msg = f'Exception in callback {cb}'
        context = {
            'message': msg,
            'exception': exc,
            'handle': self,
        }
        if self._source_traceback:
            context['source_traceback'] = self._source_traceback
        self._loop.call_exception_handler(context)
    self = None  # Needed to break cycles when an exception occurs.
Now every time some asynchronous code blocks the event loop for more than 3 seconds, it prints a message and the stack trace.
It turned out my problem was a simple BeautifulSoup(page, 'html.parser') call, where page was a 1 MB HTML file containing a big table.
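Once the blocking call has been identified, the usual fix is to push it into an executor so the event loop keeps running. A minimal sketch of that, assuming the BeautifulSoup call from above:

import asyncio
from bs4 import BeautifulSoup

def parse(page):
    # CPU-heavy, synchronous work
    return BeautifulSoup(page, 'html.parser')

async def handle(page):
    loop = asyncio.get_running_loop()
    # Run the parser in the default thread pool so the event loop
    # stays responsive while the big page is being parsed.
    soup = await loop.run_in_executor(None, parse, page)
    return soup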

libSpotifySDK : Timeout when loading playlist

I'm using libSpotify 12.1.51 (linux-libc6 x86_64) and pyspotify to make requests to Spotify from Python.
We have been using this code for a long time, but timeouts suddenly started a couple of weeks ago. Every time I try to load a playlist, I get a timeout (I have tried with many playlists).
Here's some code that replicates the issue:
import spotify
import logging
import os

class SpotifyClient(object):
    def __init__(self):
        config = spotify.Config()
        config.load_application_key_file(
            filename=os.path.join(os.path.dirname(os.path.abspath(__file__)),
                                  'spotify_appkey.key'))
        if spotify.session_instance:
            self.session = spotify.session_instance
        else:
            self.session = spotify.Session(config=config)
        if not self.session.connection_state == spotify.ConnectionState.LOGGED_IN:
            self.session.login('OUR_USERNAME', 'OUR_PASSWORD')
        while not self.session.user:
            self.session.process_events()

    def load_playlist(self, playlist_uri):
        self.playlist = spotify.Link(playlist_uri).as_playlist()
        self.playlist.load(timeout=20)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    client = SpotifyClient()
    client.load_playlist('spotify:user:melek136:playlist:32Gl8vkJmvJCHejGTEgM1t')
The playlist was just one I chose at random from the list of ones I am trying.
Here is what gets output to the console:
DEBUG:spotify.session:Notify main thread
DEBUG:spotify.session:Log message from Spotify: 16:18:40.516 E [ap:4172] ChannelError(0, 1, playlist)
DEBUG:spotify.session:Notify main thread
DEBUG:spotify.playlist:Playlist state changed
DEBUG:spotify.session:Notify main thread
DEBUG:spotify.session:Notify main thread
DEBUG:spotify.session:Notify main thread
DEBUG:spotify.session:Notify main thread
DEBUG:spotify.session:Notify main thread
DEBUG:spotify.session:Notify main thread
Traceback (most recent call last):
  File "x.py", line 27, in <module>
    client.load_playlist('spotify:user:melek136:playlist:32Gl8vkJmvJCHejGTEgM1t')
  File "x.py", line 20, in load_playlist
    self.playlist.load(timeout=20)
  File "/home/entura/env/lib/python2.7/site-packages/spotify/playlist.py", line 103, in load
    return utils.load(self, timeout=timeout)
  File "/home/entura/env/lib/python2.7/site-packages/spotify/utils.py", line 222, in load
    raise spotify.Timeout(timeout)
spotify.error.Timeout: Operation did not complete in 20.000s
libspotify itself doesn't have a timeout for loading playlists. Indeed, in a perfect storm of conditions (no local cache, large account, playlist service acting slow) it can take many minutes for playlists to load.
I'm not well versed in the Python bindings for libspotify, but the timeout is certainly introduced there. So, to fix it:
Increase the timeout value
Remove the timeout entirely (or, I guess, set it to some crazy high number)
Sometimes the Spotify playlist service has a bad day and slows right down or goes down altogether. If the application you're making is user-facing, you should just tell the user that the playlist is loading and leave it at that, rather than erroring out.
As for the timing, well, it's possible that your cache is broken and it's causing libspotify to need more time to load playlists. Maybe the playlists you were loading were really close to the timeout and now they trigger it. Perhaps your libspotify connections are being load-balanced out to a Spotify server further away from your physical location than before. You can't affect any of these things, apart from deleting your cache.
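In pyspotify terms, the two options above amount to something like this (a sketch only; the large timeout value is arbitrary, and the fallback loop will spin until the playlist actually loads):

def load_playlist(self, playlist_uri):
    self.playlist = spotify.Link(playlist_uri).as_playlist()
    try:
        # Give the playlist service much longer before giving up.
        self.playlist.load(timeout=600)
    except spotify.Timeout:
        # Or keep pumping events ourselves and tell the user the
        # playlist is still loading instead of erroring out.
        while not self.playlist.is_loaded:
            self.session.process_events()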

'Listening' for a file in Python

I have a python script that does some updates on my database.
The files that this script needs are saved in a directory at around 3AM by some other process.
So I'm going to schedule a cron job to run daily at 3AM, but I want to handle the case where the file is not available at exactly 3AM; it could be delayed by some interval.
So I basically need to keep checking, every 5 minutes starting from 3AM, whether a file with a particular name exists. I'll try for around 1 hour and give up if it doesn't turn up.
How can I achieve this sort of thing in Python?
Try something like this (you'll need to change the print statements to be function calls if you are using Python 3).
#!/usr/bin/env python
import os
import time

def watch_file(filename, time_limit=3600, check_interval=60):
    '''Return True if filename exists; if not, keep checking once every
    check_interval seconds for time_limit seconds.
    time_limit defaults to 1 hour
    check_interval defaults to 1 minute
    '''
    now = time.time()
    last_time = now + time_limit
    while time.time() <= last_time:
        if os.path.exists(filename):
            return True
        else:
            # Wait for check_interval seconds, then check again.
            time.sleep(check_interval)
    return False

if __name__ == '__main__':
    filename = '/the/file/Im/waiting/for.txt'
    time_limit = 3600     # one hour from now.
    check_interval = 60   # seconds between checks for the file.
    if watch_file(filename, time_limit, check_interval):
        print "File present!"
    else:
        print "File not found after waiting:", time_limit, " seconds!"
For this sort of task you can use watchdog, a library for listening to and monitoring system events.
One of the things it can monitor is file system events, via the FileSystemEventHandler class, which has an on_created() method.
You'll end up writing a "wrapper" script that runs continuously. It uses watchdog to listen on that particular directory. The moment a file is created, the script is notified; you then check whether the created file matches the pattern of the target file, and execute your custom code.
Luckily, as this is a common task, there is a PatternMatchingEventHandler already available; it inherits from FileSystemEventHandler but only reacts to files matching a pattern.
Your wrapper script then becomes:
import time

from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

class FileWatcher(PatternMatchingEventHandler):
    patterns = ["*.dat"]  # adjust as required

    def process(self, event):
        # your actual code goes here
        # event.src_path will be the full file path
        # event.event_type will be 'created', 'moved', etc.
        print('{} observed on {}'.format(event.event_type, event.src_path))

    def on_created(self, event):
        self.process(event)

if __name__ == '__main__':
    obs = Observer()  # This is what manages running of your code
    obs.schedule(FileWatcher(), path='/the/target/dir')
    obs.start()  # Start watching
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        obs.stop()
    obs.join()
That is what comes to my mind first; pretty straightforward:
from time import sleep

counter = 0
working = True
while counter < 11 and working:
    try:
        # Open file and do whatever you need
        working = False
    except IOError:
        counter += 1
        sleep(5 * 60)
A better solution:

import os.path
from time import sleep

counter = 0
working = True
while counter < 11 and working:
    if os.path.isfile('path/to/your/file'):
        # Open file and do whatever you need
        working = False
    else:
        counter += 1
        sleep(5 * 60)
In Python you can check if the file exists
import os.path
os.path.isfile(filename)
Then you set your cron to run every 5 minutes from 3am:
*/5 3 * * * /path-to-your/script.py
You can write to a simple file (or to a database, if you are already using one) to keep track of whether you have already read the data from the file.
You can use Twisted and its reactor; it's much better than an infinite loop! You can use reactor.callLater(myTime, myFunction), and when myFunction gets called you can adjust myTime and add another callback with the same callLater() API.
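A rough sketch of that approach (assuming the same 5-minute polling interval and roughly one-hour limit; process_file stands in for your actual update code):

import os
from twisted.internet import reactor

CHECK_INTERVAL = 5 * 60   # seconds between checks
MAX_CHECKS = 12           # give up after about an hour

def check_file(path, attempts=0):
    if os.path.isfile(path):
        process_file(path)   # hypothetical: your actual update code
        reactor.stop()
    elif attempts >= MAX_CHECKS:
        reactor.stop()       # give up
    else:
        # Schedule the next check without blocking the reactor.
        reactor.callLater(CHECK_INTERVAL, check_file, path, attempts + 1)

reactor.callLater(0, check_file, '/the/file/Im/waiting/for.txt')
reactor.run()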

Read from a log file as it's being written using python

I'm trying to find a nice way to read a log file in real time using python. I'd like to process lines from a log file one at a time as it is written. Somehow I need to keep trying to read the file until it is created and then continue to process lines until I terminate the process. Is there an appropriate way to do this? Thanks.
Take a look at this PDF starting at page 38, ~slide I-77 and you'll find all the info you need. Of course the rest of the slides are amazing, too, but those specifically deal with your issue:
import time

def follow(thefile):
    thefile.seek(0, 2)  # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)  # Sleep briefly
            continue
        yield line
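Using the generator is then just a loop over it; a minimal example, assuming the log file already exists (the path and the process function are placeholders):

with open('/var/log/app.log') as logfile:   # hypothetical path
    for line in follow(logfile):
        process(line)                        # hypothetical: handle one log line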
You could try with something like this:
import time

while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print line,  # already has newline
Example was extracted from here.
As this question is tagged with both python and logging, there is another way to do this.
I assume it is based on a Python logger, i.e. something logging.Handler based.
You can just create a class that gets the (named) logger instance and overrides the emit function to send each record to a GUI (if you also need console output, just add a console handler in addition to the file handler).
Example:
import logging

class log_viewer(logging.Handler):
    """ Class to redistribute python logging data """

    # have a class member to store the existing logger
    logger_instance = logging.getLogger("SomeNameOfYourExistingLogger")

    def __init__(self, *args, **kwargs):
        # Initialize the Handler
        logging.Handler.__init__(self, *args)

        # optionally take a format
        # the setFormatter function is inherited from logging.Handler
        for key, value in kwargs.items():
            if "{}".format(key) == "format":
                self.setFormatter(value)

        # make the logger send data to this class
        self.logger_instance.addHandler(self)

    def emit(self, record):
        """ Overload of logging.Handler method """
        record = self.format(record)

        # ---------------------------------------
        # Now you can send it to a GUI or similar
        # "Do work" starts here.
        # ---------------------------------------

        # just as an example of what e.g. a console
        # handler would do:
        print(record)
I am currently using similar code to add a TkinterTreectrl.Multilistbox for viewing logger output at runtime.
Aside: the logger only receives data from the moment it is initialized, so if you want all of your data available, you need to initialize it at the very beginning. (I know this is what is expected, but I think it is worth mentioning.)
Maybe you could do a system call to
tail -f
using os.system()
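If you want the lines back in Python rather than only printed to the terminal, subprocess is a bit more convenient than os.system for this; a sketch (the log path is a placeholder):

import subprocess

# Spawn tail -f and read its output line by line.
proc = subprocess.Popen(['tail', '-f', '/var/log/app.log'],
                        stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    print(line, end='')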
