Script from cron sends >50 mails on errors, 1 on success - python

I have an issue which I really cannot figure out.
The following snippets from my Python script zip a directory and send a mail on success. They also send a mail if an error occurred. And here is the issue:
When I execute the script manually, everything works fine:
1 mail on success, 1 mail if an error occurred.
If the script is run from cron, though, I receive over 50 emails if an error occurs (on success, only one)! All mails have the same content (the error message), and all mails are sent at the same time (identical down to the "hh:mm").
This is the script snippet:
import sys
from email.mime.text import MIMEText  # imports assumed by these snippets
from smtplib import SMTP

def backup(pathMedia, pathZipMedia):
    [...]
    try:
        createArchive(pathMedia, pathZipMedia)
    except Exception as e:
        sendMail('Error in zipping the media dir: ' + str(e))
        sys.exit()
    sendMail('Backup successfully created!')

def sendMail(msg):
    sent = 0
    SMTPserver = '[...]'
    sender = '[...]'
    destination = ['...']
    USERNAME = '[...]'
    PASSWORD = '[...]'

    text_subtype = 'plain'
    subject = 'Backup notification'
    content = msg

    try:
        msg = MIMEText(content, text_subtype)
        msg['Subject'] = subject
        msg['From'] = sender

        conn = SMTP(SMTPserver)
        conn.set_debuglevel(False)
        conn.login(USERNAME, PASSWORD)
        try:
            if sent == 0:
                conn.sendmail(sender, destination, msg.as_string())
                sent = 1
        finally:
            conn.quit()
    except Exception as e:
        sys.exit()
My crontab is the following:
## run the backup script every 3 days at 4am
* 4 */3 * * /root/backup.py >/dev/null 2>&1
I have fixed the errors that were occurring, but they might well happen again.
And I'm really curious about why this issue occurs!
Thanks!

The * at the beginning of your crontab line says "run this job every minute".
Presumably a successful run of the first job at 4:00 causes the following 59 runs to find that no work needs to be done, therefore they don't attempt to create a backup and they exit quietly without sending email. But an unsuccessful run at 4:00 will leave work to be done by the next job at 4:01, and again the minute after that, and so on until 4:59. All of those jobs try to create a backup and all of them fail, so you get something like 60 failure emails. (Or fewer if one of the jobs manages to succeed, breaking the chain of failures.)
To fix the crontab line to run the job only one time at 4:00am, change the first * to a 0.
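That is, keeping the every-3-days schedule from the comment:
## run the backup script every 3 days at 4am
0 4 */3 * * /root/backup.py >/dev/null 2>&1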
I don't know why your failure emails all have the same timestamp. Are you certain that they're all exactly the same? If so, perhaps they're being batched by your mail system and are assigned a Date header at the time the batch is processed. Or perhaps all of the jobs are started by cron and then they all wait, blocked until some system timeout or other event occurs, and then they all experience the failure simultaneously and all send emails at the same time.

Related

Google pubsub late acknowledgement

I have an app deployed on GKE, separated in different microservices.
One of the microservices, let's call it "worker", receives tasks to execute from pubsub messages.
The tasks can take up to 1 hour to execute. Since the regular acknowledgement deadline for Google Pub/Sub messages is pretty short, we renew the deadline 10 seconds before it expires. Here is the piece of code responsible for that:
import time
from google.cloud.pubsub_v1 import SubscriberClient  # imports assumed by this snippet

def watchdog(businessDoneEvent, subscription, ack_deadline, message, ack_id):
    '''
    Prevents the message from being republished as long as the computation
    is running
    '''
    while True:
        # Wait (ack_deadline - 10) seconds before renewing if ack_deadline
        # is > 10 seconds; renew every second otherwise
        sleepTime = ack_deadline - 10 if ack_deadline > 10 else 1
        startTime = time.time()
        while time.time() - startTime < sleepTime:
            LOGGER.info('Sleeping time: {} - ack_deadline: {}'.format(time.time() - startTime, ack_deadline))
            if businessDoneEvent.isSet():
                LOGGER.info('Business done!')
                return
            time.sleep(1)
        subscriber = SubscriberClient()
        LOGGER.info('Modifying ack deadline for message ' +
                    str(message.data) + ' processing to ' +
                    str(ack_deadline))
        subscriber.modify_ack_deadline(subscription, [ack_id], ack_deadline)
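For context, the watchdog runs in its own thread while the task executes; a simplified sketch of the wiring (illustrative only, not our exact code):
import threading

# Illustrative wiring: run the watchdog beside the long-running task
businessDoneEvent = threading.Event()
watchdogThread = threading.Thread(
    target=watchdog,
    args=(businessDoneEvent, subscription, ack_deadline, message, ack_id))
watchdogThread.start()
try:
    # the long-running computation (up to 1 hour)
    callback(message.data, endpoint, context, **message.attributes)
finally:
    businessDoneEvent.set()  # tell the watchdog to stop renewing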
Once the execution is over, we reach this piece of code:
import traceback
# imports assumed by this snippet
from google.api_core.retry import Retry, if_exception_type
from google.api_core.exceptions import ServiceUnavailable

def callbackWrapper(callback,
                    subscription,
                    message,
                    ack_id,
                    endpoint,
                    context,
                    subscriber,
                    postAcknowledgmentCallback=None):
    '''
    Pub/Sub message acknowledgment if everything ran correctly
    '''
    try:
        callback(message.data, endpoint, context, **message.attributes)
    except Exception as e:
        LOGGER.info(message.data)
        LOGGER.error(traceback.format_exc())
        raise e
    else:
        LOGGER.info("Trying to acknowledge...")
        my_retry = Retry(predicate=if_exception_type(ServiceUnavailable), deadline=3600)
        subscriber.acknowledge(subscription, [ack_id], retry=my_retry)
        LOGGER.info(str(ack_id) + ' has been acknowledged')
        if postAcknowledgmentCallback is not None:
            postAcknowledgmentCallback(message.data,
                                       **message.attributes)
Note that we also use this code in most of our microservices and it works just fine.
My problem is, even though I do not get any error from this code and the acknowledgement request seems to be sent properly, the message is actually acknowledged later. For example, according to the GCP console, right now I have 8 unacknowledged messages when I should only have 3, and earlier it showed 12 when I should only have had 5, for about an hour.
I have a horizontal pod autoscaler using the Pub/Sub metric. When the pods are done, they are not scaled down, or only an hour or more later. This creates some useless costs that I would like to avoid.
Does anyone have an idea about why this is happening?

Checking FTP connection is valid using NOOP command

I'm having trouble with one of my scripts seemingly disconnecting from my FTP server during long batches of jobs. To counter this, I've attempted to write the module shown below:
import sys
import time  # imports assumed by this snippet

def connect_ftp(ftp):
    print "ftp1"
    starttime = time.time()
    retry = False
    try:
        ftp.voidcmd("NOOP")
        print "ftp2"
    except:
        retry = True
        print "ftp3"
    print "ftp4"
    while (retry):
        try:
            print "ftp5"
            ftp.connect()
            ftp.login('LOGIN', 'CENSORED')
            print "ftp6"
            retry = False
            print "ftp7"
        except IOError as e:
            print "ftp8"
            retry = True
            sys.stdout.write("\rTime disconnected - " + str(time.time() - starttime))
            sys.stdout.flush()
    print "ftp9"
I call the function using only:
ftp = ftplib.FTP('CENSORED')
connect_ftp(ftp)
However, I've traced how the code runs using the print lines, and on the first use of the module (before the FTP is even connected to) my script runs ftp.voidcmd("NOOP") without raising an exception, so no attempt is made to connect to the FTP initially.
The output is:
ftp1
ftp2
ftp4
ftp success  # this is printed after the function is called
I admit my code isn't the best or prettiest, and I haven't implemented anything yet to make sure I'm not reconnecting constantly if I keep failing to reconnect, but I can't work out why this isn't working for the life of me, so I don't see a point in expanding the module yet. Is this even the best approach for connecting/reconnecting to an FTP server?
Thank you in advance
This connects to the server:
ftp = ftplib.FTP('CENSORED')
So, naturally the NOOP command succeeds, as it does not need an authenticated connection.
Your connect_ftp is correct, except that you need to specify a hostname in your connect call.
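A condensed sketch of that fix (keeping the placeholders from the question, and catching ftplib's error types rather than everything):
import ftplib

def connect_ftp(ftp, host='CENSORED'):
    try:
        ftp.voidcmd("NOOP")        # probe: is the connection still alive?
    except ftplib.all_errors:
        ftp.connect(host)          # the reconnect needs the hostname
        ftp.login('LOGIN', 'CENSORED')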

Telnet connection to TS3 ServerQuery keeps getting slower and slower

I wrote a bot for TeamSpeak 3 that runs over ServerQuery (a telnet interface).
But the bot keeps responding later and later: in the beginning it takes about 0.1 sec, and after about a minute it takes about 10 seconds to respond, and using commands makes it even faster.
Any idea why?
So basically the telnet interface sends data from the TS3 server to my Python script, the ts3 module receives and processes the data, and then the script decides what action to take.
As modules I am using MySQLdb and ts3(https://github.com/benediktschmitt/py-ts3)
My sourcecode is here: https://pastebin.com/cJuyB9ZH
Another script, which just takes all clients and pushes them into a database every 5 min, runs multiple days without any issues.
I checked the code multiple times now and even deleted variables right after they have been used, but it still has the same issue.
My guess would be that it sort of clogs up the RAM, so I looked through the code multiple times, but couldn't find out why or where.
Side note: I know I sometimes call commit() when it's totally not necessary. I don't know if that might cause problems, but I don't see how.
Short(er) version of my code:
import ts3
import MySQLdb
from MySQLdb import OperationalError      # import assumed by the except below
from ts3.query import TS3TimeoutError     # import assumed by the except below
# Some other imports like time and threading and such

## Connect to TS3
tsConn = ts3.query.TS3Connection(tsAddr, tsPort)
try:
    tsConn.login(client_login_name=tsUser, client_login_password=tsPass)
    tsConn.use(sid=tsSID, virtual=True)
    print(" ==>> CONNECTED TO TS3 SERVER: " + tsAddr)
except ts3.query.TS3QueryError as e:
    print("Login to TS Server failed! Aborting...")
    exit(1)

## Connect to mySQL
try:
    qConn = MySQLdb.connect(host=qHost, user=qUser, passwd=qPass, db=qDB)
    qServer = qConn.cursor()
    print(" ==>> CONNECTED TO mySQL SERVER: " + qHost)
except OperationalError:
    print("Cannot connect to mySQL Database! Aborting...")
    exit(1)

running = True
while running:
    tsConn.send_keepalive()
    qServer.execute("SELECT 1")  # keepalive
    try:
        event = tsConn.wait_for_event(timeout=1)
    except TS3TimeoutError:
        pass
    else:
        try:
            # <some command processing here>
        except KeyError:
            try:
                if event[0]["reasonid"] == "0":
                    tsConn.sendtextmessage(targetmode=1, target=event[0]["clid"], msg=greetingmsg.format(event[0]["client_nickname"]))
            except:
                pass

Python Imap4 server logout issue

I wrote a Python script to check my email every 10 seconds. It's working fine on my system, but it throws some errors on another system.
What is this error?
My Python program:
#!/usr/bin/env python
import imaplib, os, time, sys
from subprocess import call

if len(sys.argv) < 4:
    print "\n\nEnter required credentials in following format..\n\n"
    print "python mailcheck.py <email_id> <PASSWORD> <time>\n\n"
    sys.exit()

USERNAME = str(sys.argv[1])
PASSWORD = str(sys.argv[2])
MAIL_CHECK_FREQ = int(sys.argv[3])

while True:
    obj = imaplib.IMAP4_SSL('imap.mail.yahoo.com', '993')
    obj.login(USERNAME, PASSWORD)
    obj.select('INBOX')
    status, response = obj.status('INBOX', "(UNSEEN)")
    unreadcount = int(response[0].split()[2].strip(').,]'))
    if unreadcount > 0:
        call(["zenity", "--info", "--title='New Mail'", "--text='Check your mail'"])
    time.sleep(MAIL_CHECK_FREQ)
This is the error I get:
How can I fix this? Help me, guys.
The remote server disconnected you. Yahoo has rate limiting: don't check every 10 seconds! Try every 5 minutes, at most.
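A minimal sketch of a gentler loop (same imaplib calls as in the question, but checking every 5 minutes and logging out after each check; USERNAME and PASSWORD as before):
import imaplib, time

MAIL_CHECK_FREQ = 300  # 5 minutes between checks, per the rate-limiting advice

while True:
    obj = imaplib.IMAP4_SSL('imap.mail.yahoo.com', 993)
    obj.login(USERNAME, PASSWORD)
    obj.select('INBOX')
    status, response = obj.status('INBOX', "(UNSEEN)")
    # ... inspect the unread count as before ...
    obj.close()    # close the selected mailbox
    obj.logout()   # end the session cleanly instead of leaking connections
    time.sleep(MAIL_CHECK_FREQ)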

paramiko ssh client does not work with HP switches

I've been using my script on a Unix server and it works perfectly. However, when I use the same script (with some minor command changes) to connect to HP ProCurve switches, the script crashes with an error. Part of the script is below:
import paramiko  # import assumed by this snippet

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(address, username=userna, password=passwd)
stdin, stdout, stderr = ssh.exec_command("show ver")
for line in stdout:
    print '... ' + line.strip('\n')
ssh.close()
This gives the error:
Traceback (most recent call last):
  File "C:/Users/kucar/Desktop/my_python/switchmodel", line 34, in <module>
    stdin,stdout,stderr= ssh.exec_command("show ver")
  File "C:\Python27\lib\site-packages\paramiko\client.py", line 379, in exec_command
    chan.exec_command(command)
  File "C:\Python27\lib\site-packages\paramiko\channel.py", line 218, in exec_command
    self._wait_for_event()
  File "C:\Python27\lib\site-packages\paramiko\channel.py", line 1122, in _wait_for_event
    raise e
SSHException: Channel closed.
I've found similar complaints on the web, but it seems no solution has been provided. The switch is open to SSH and works fine with PuTTY. I'd appreciate any ideas that could help me; I cannot run the "show ver" command manually on 100 switches.
As @dobbo mentioned above, you have to invoke_shell() on the channel so that you can execute multiple commands. Also, HP ProCurve puts ANSI escape codes in the output, so you have to strip those out. Finally, HP ProCurve throws up a "Press any key to continue" message which you have to get past, at least on some devices.
I have an HP ProCurve handler in this library https://github.com/ktbyers/netmiko
Set device_type to "hp_procurve".
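For example (a minimal sketch using netmiko's ConnectHandler; the address and credentials are placeholders):
from netmiko import ConnectHandler

procurve = {
    'device_type': 'hp_procurve',  # selects the ProCurve handler
    'ip': '10.0.0.1',              # placeholder address
    'username': 'admin',           # placeholder credentials
    'password': 'password',
}
net_connect = ConnectHandler(**procurve)
print net_connect.send_command('show ver')
net_connect.disconnect()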
Exscript also has some sort of a ProCurve handler though I haven't dug into it enough to get it to work.
I had the same experience connecting to my Samsung S4 phone with an SSH server.
I had no problem connecting to a SUSE VM or a Raspberry Pi, and I also tried MobaXterm (PuTTY is SO last week).
I have not found the answer, but I will share my research.
I had a look at the source and found line 1122 in channel.py (copied below).
With my phone (and possibly your HP switch) I have noticed that there is no login message or MOTD at all, and when exiting (with PuTTY/MobaXterm) the session doesn't end properly.
In some other reading, I have found that Paramiko is not getting much support from the author any more, but others are working on porting it to Python 3.x.
Here is the source code I found.
def _wait_for_send_window(self, size):
    """
    (You are already holding the lock.)
    Wait for the send window to open up, and allocate up to C{size} bytes
    for transmission. If no space opens up before the timeout, a timeout
    exception is raised. Returns the number of bytes available to send
    (may be less than requested).
    """
    # you are already holding the lock
    if self.closed or self.eof_sent:
        return 0
    if self.out_window_size == 0:
        # should we block?
        if self.timeout == 0.0:
            raise socket.timeout()
        # loop here in case we get woken up but a different thread has filled the buffer
        timeout = self.timeout
        while self.out_window_size == 0:
            if self.closed or self.eof_sent:
                return 0
            then = time.time()
            self.out_buffer_cv.wait(timeout)
            if timeout != None:
                timeout -= time.time() - then
                if timeout <= 0.0:
                    raise socket.timeout()
    # we have some window to squeeze into
It seems that if you don't clean up the connection buffer, Paramiko goes nuts when working with HP ProCurves. First off, you need to invoke a shell or Paramiko will simply drop the connection after the first command (normal behavior, but confusing):
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(switch_ip, username=switch_user, password=switch_pass, look_for_keys=False)
conn = ssh.invoke_shell()
recieveData()  # <-- see below
It's important to actually handle the data; as I've learned, you need to make sure Paramiko has actually received all the data before you ask it to do stuff with it. I do this using the following function. You can adjust the sleep as needed; in some cases 0.050 will work fine.
def recieveData():
    tCheck = 0
    while not conn.recv_ready():
        time.sleep(1)
        tCheck += 1
        if tCheck >= 10:
            print "time out"
            break  # give up instead of spinning forever
    cleanThatStuffUp(conn.recv(1024))  # <-- see below
This is an example of the garbage that comes back to your SSH client:
[1;24r[24;1H[24;1H[2K[24;1H[?25h[24;1H[24;1HProCurve Switch 2650# [24;1H[24;23H[24;1H[?5h[24;23H[24;23Hconfigure[24;23H[?25h[24;32H[24;0HE[24;1H[24;32H[24;1H[2K[24;1H[?5h[24;1H[1;24r[24;1H[1;24r[24;1H[24;1H[2K[24;1H[?25h[24;1H[24;1H
There are also escape codes to deal with before each "[". So, to deal with that, I figured out some regex to clean all of that "stuff" up:
import re  # import assumed by this snippet

procurve_re1 = re.compile(r'(\[\d+[HKJ])|(\[\?\d+[hl])|(\[\d+)|(\;\d+\w?)')
procurve_re2 = re.compile(r'([E]\b)')
procurve_re3 = re.compile(ur'[\u001B]+')  # remove stupid escapes

def cleanThatStuffUp(message):
    message = procurve_re1.sub("", message)
    message = procurve_re2.sub("", message)
    message = procurve_re3.sub("", message)
    print message
Now you can go about entering commands; just make sure you clear out the buffer each time using recieveData().
conn.send("\n") # Get past "Press any key"
recieveData()
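From there, running an actual command follows the same send-then-drain pattern (the command here is just an example):
conn.send("show ver\n")  # run a command in the shell
recieveData()            # wait for the output and strip the ANSI noise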
