Python uploading to ftplib.FTP (we think) resulting in EOFError

Python newbie here.
We have a script that downloads data from an API (in the form of JSON), converts it into a CSV and uploads to FTP using ftplib.FTP(); script below:
session = ftplib.FTP('ftpserver','username','password')
file = open('filename.csv','rb')
session.storbinary('STOR filename.csv', file)
file.close()
session.quit()
This whole process is set to occur every 11 minutes using:
schedule.every(11).minutes.do(download_json_from_api)
while True:
    schedule.run_pending()
    time.sleep(1)
The script runs without problems until, every now and then (no pattern or set time), we receive an EOFError.
A quick Google search suggested this may have something to do with the FTP server connection dropping out; would that be correct? What can I add to the existing script to prevent it? For example, can I have it immediately open a new connection every time the connection gets terminated, or tell Python to close the current console and rerun the script in a new one (because that seems to fix the issue)?
Guidance is much appreciated!
Thank you.
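For reference, one way to act on the question's own idea of reconnecting whenever the connection drops is to wrap the upload in a retry loop. This is only a sketch, reusing the hostname, credentials and filename from the snippet above; note that ftplib.all_errors already includes EOFError:
import ftplib
import time
def upload_with_retry(local_path, remote_name, attempts=3):
    # Open a fresh FTP session for each attempt and retry when the
    # connection drops mid-transfer.
    for attempt in range(1, attempts + 1):
        try:
            session = ftplib.FTP('ftpserver', 'username', 'password')
            with open(local_path, 'rb') as f:
                session.storbinary('STOR ' + remote_name, f)
            session.quit()
            return
        except ftplib.all_errors as exc:
            print("Upload attempt %d failed: %s" % (attempt, exc))
            time.sleep(5)  # short pause before reconnecting
    raise RuntimeError("Upload failed after %d attempts" % attempts)
The scheduled job could then call upload_with_retry('filename.csv', 'filename.csv') instead of relying on a single long-lived session.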

Related

Pyngrok: getting the tunnel to stay connected continuously

I have just started using ngrok. With the standard procedure I can start the tunnel using ./ngrok tcp 22 and see that tunnel open in my dashboard.
But I would like to use pyngrok, and when I use:
from pyngrok.conf import PyngrokConfig
from pyngrok import ngrok
ngrok.set_auth_token("<NGROK_AUTH_TOKEN>")
pyngrok_config = PyngrokConfig(config_path="/opt/ngrok/ngrok.yml")
ngrok.get_tunnels(pyngrok_config=pyngrok_config)
ssh_url = ngrok.connect()
It connects and generates a tunnel, but I can't see anything open in the dashboard. Why?
Maybe it's because the Python script executes, generates the URL and then exits, but then how do I keep it running? Or how do I even start a tunnel using Python or the API? Please suggest the correct script, using Python or the API.
The thread with the ngrok tunnel will terminate as soon as the Python process terminates. So you are correct: this is happening because your script is not long-lived. The easiest way to accomplish that is by following the example in the documentation.
Another issue is how you're setting the authtoken. Since you're not using the default config_path, you need to set this before setting the authtoken so it gets updated in the correct file (you'd also need to pass it to connect()). There are a couple ways to do this, but the easiest way from the docs is to just update the default config (since that's what will be used if you don't pass a pyngrok_config to any future method calls).
I also see that your response variable is ssh_url, so you probably want to start a TCP tunnel to a port other than 80 (the default). Perhaps you've configured this in your ngrok.yml, but if not, I've updated the call to connect() to ensure this is the type of tunnel started for you, and in case others try to use this same code snippet.
Full disclosure, I am the developer of pyngrok. Here is your code snippet updated with my changes.
import os, time
from pyngrok.conf import PyngrokConfig
from pyngrok import ngrok, conf

conf.get_default().config_path = "/opt/ngrok/ngrok.yml"
ngrok.set_auth_token(os.environ.get("NGROK_AUTH_TOKEN"))

ssh_tunnel = ngrok.connect(22, "tcp")

ngrok_process = ngrok.get_ngrok_process()

try:
    # Block until CTRL-C or some other terminating event
    ngrok_process.proc.wait()
except KeyboardInterrupt:
    print(" Shutting down server.")
    ngrok.kill()

Keeping ports open in a python script that is run continuously

I'm trying to develop a server script using Python 3.4 that runs perpetually and responds to client requests on up to 5 separate ports. My preferred platform is Debian 8.0. The platform currently runs on a virtual machine in the cloud. My script works fine when I run it off the command line - I now need to (1) keep it running once I log off the server and (2) keep several ports open through the script so that a Windows client can connect to them.
For (1),
After trying several options that didn't seem to work [upstart, adding the script to rc.local, running it off the terminal with nohup and &, etc.], I eventually found something that does seem to keep the script running, even if it's not very elegant: an hourly cron job that checks whether the script appears in the process list and, if not, executes it.
Whenever I log in to the VM now, I see the following output when I type 'ps -ef':
root 22007 21992 98 Nov10 14-12:52:59 /usr/bin/python3.4 /home/userxyz/cronserver.py
I assume that the script is running based on the fact that there is an active process in the system. I mention this part because I suspect that there could be a correlation with part (2) of my issue.
For (2),
The script is supposed to open ports 49100 - 49105 and listen for connection requests, etc. When I run the script from the terminal, zenmap from my client machine verifies that these ports are open. However, when the cron job initiates the script, these ports don't seem to stay open. My windows client program can't connect to the script either.
The python code I use for listening to a port:
f = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
f.bind((serviceIP, 49101))
f.listen(5)
while True:
    scName, address = f.accept()
    # [code to handle request]
    scName.shutdown(socket.SHUT_WR)
    scName.close()
Any insight or assistance would be greatly appreciated!
What you ask is not easy because it depends on a variety of factors:
What is the frequency of the data received?
How many clients are expected to connect to this server?
Is there a chance two clients try to connect at the same time?
How long does it take to handle the received data?
What do you need to do with your data?
Write to a database?
Write to a file?
Calculate something?
Etc.
Depending on your answer you'll have some design decisions to make for your solution.
But since you need an answer, here's a hack that represents one way to do it:
import socketserver
import threading
import datetime

class SleepyGaryReceptionHandler(socketserver.BaseRequestHandler):
    log_file_name = "/tmp/sleepygaryserver.log"

    def handle(self):
        # self.request is defined in BaseRequestHandler
        data_received = self.request.recv(1024)

        # self.client_address is also defined in BaseRequestHandler
        sender_address = self.client_address[0]

        # This is where you are supposed to do something with your data
        # This is an example
        self.write_to_log('Someone from {} sent us "{}"'.format(sender_address,
                                                                data_received))

        # A way to stop the server from going on forever
        # You could do this other ways, depending on what condition
        # should cause the shutdown
        if data_received.startswith(b"QUIT"):
            finishing_thread = threading.Thread(target=self.finish_in_another_thread)
            finishing_thread.start()

    # This will be called in another thread to terminate the server
    # self.server is also defined in BaseRequestHandler
    def finish_in_another_thread(self):
        self.write_to_log("Shutting down the server")
        self.server.shutdown()

    # Write something (with a timestamp) to a text file so that we
    # know something is happening
    def write_to_log(self, message):
        timestamp = datetime.datetime.now()
        timestamp_text = timestamp.isoformat(sep=' ', timespec='seconds')
        with open(self.log_file_name, mode='a') as log_file:
            log_file.write("{}: {}\n".format(timestamp_text, message))

service_address = "localhost"
port_number = 49101
server = socketserver.TCPServer((service_address, port_number),
                                SleepyGaryReceptionHandler)
server.serve_forever()
I'm using the socketserver module here instead of listening directly on a socket. This standard library module was written to simplify writing a server, so use it!
All I do here is write what has been received to a text file. You would have to adapt that to your use.
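If it helps for testing, here is a minimal client sketch (not part of the original answer) that sends a message to the server above and then a QUIT message to shut it down, assuming the same localhost address and port 49101:
import socket
def send_message(message, host="localhost", port=49101):
    # Open a connection, send the bytes and close; the handler above
    # logs whatever it receives.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message)
send_message(b"hello from the client")
send_message(b"QUIT")  # triggers the server's shutdown branch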
To have it running continuously, use a cron job again, but this time to start the script when the computer boots. Since the script blocks until the server is stopped, we have to run it in the background. The crontab entry would look something like this:
@reboot /usr/bin/python3 /home/sleepygary/sleppys_server.py &
I have tested it and after 5 hours it still does its thing.
Now, like I said, it is a hack. If you want to go all the way and have it behave like any other service on your computer, you have to program it in a certain way. You can find more information on this page: https://www.freedesktop.org/software/systemd/man/daemon.html
I'm really tired so there may be some errors here and there.

can't get commands on remote host after fixed amount of send commands

I have a program with 2 threads. Each thread sends different commands to a remote host and redirects the output to a file. The threads use different remote hosts. I've created a connection with pxssh and am trying to send commands to the remote hosts with 'sendline':
s = pxssh.pxssh()
try:
    s.login(ip, user, pswd)
except:
    logging.error("login: error")
    return
logging.debug("login: success")
s.sendline("ls / >> tmpfile.log")
s.prompt()
I can send a fixed number of commands (about 500 on every host) and after that 'sendline' stops working. The connection is OK, but the commands no longer reach the remote hosts. It looks like some resource runs out... what can it be?
Reposting as an answer, since it solved the issue:
Are you reading in between each write? If the host is producing output and you're not reading it, sooner or later a buffer will fill up and it will block until there's room to write some more. Make sure that before each write, you read any data that's available in the terminal, even if you don't want to do anything with it.
If you really don't care about the output at all, you could create a thread that constantly reads in a loop, so that your main thread can skip reading altogether. If your code needs to do anything with any part of the output, though, don't do this.
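A rough sketch of that last suggestion, assuming the pxssh session s from the question and that none of the output is needed (pexpect raises TIMEOUT when there is nothing to read, which the loop simply ignores):
import threading
import pexpect
def drain_output(session):
    # Background reader: constantly pulls whatever the remote host writes
    # so the terminal buffer can never fill up and block sendline().
    while session.isalive():
        try:
            session.read_nonblocking(size=1024, timeout=1)
        except pexpect.TIMEOUT:
            pass  # nothing to read right now
        except pexpect.EOF:
            break  # connection closed, stop reading
reader = threading.Thread(target=drain_output, args=(s,), daemon=True)
reader.start()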

quick download files using python

I am currently trying to download files from more than 800,000 URLs. Each URL represents one .txt file.
I am using a dataframe to store all the URL information:
index   Filename
4       .../data/1000015/0001104659-05-006777.txt
5       .../data/1000015/0000950123-05-003812.txt
......
Code:
for i in m.index:
    download = 'ftp:/.../' + m['Filename'][i]
    print download
    urllib.urlretrieve(download, '%s''%s.txt' % (m['Co_name'][i], m['Date'][i]))
This method works; however, the speed is quite low: it downloads 15 files in 7 minutes. Considering I have more than 800,000 files, that would take more than 9 months... So I was wondering, could anyone help me improve this? Thank you so much.
After some really helpful comments, I made some changes. Is the following a good way to do multiprocessing?
Code:
def download(file):
    import ftplib
    ftp = ftplib.FTP('XXXX')
    ftp.login()
    for i in m.index:
        a = m['Filename'][i]
        local_file = os.path.join("local_folder", '%s %s.txt' % (m['Co_name'][i], m['Data'][i]))
        fhandle = open(local_file, 'wb')
        print fhandle
        ftp.retrbinary('RETR ' + a, fhandle.write)
        fhandle.close()
m = pd.read_csv('XXXX.csv', delimiter=',', index_col='index')
pool = Pool(10)
pool.map(download, m)
This way, you establish a new connection for every file, which means that for every file you lose a few seconds during which nothing is downloaded.
You can reduce this by using ftplib (https://docs.python.org/2/library/ftplib.html), which allows you to establish a single connection and retrieve the files one by one over it.
Still, there is time when no data is transferred. To use the maximum bandwidth, use threads to download several files in parallel. But note that some servers limit the number of parallel connections.
However, the time overhead should not exceed a few seconds, let's say 5 in the worst case. Then, about 25 s for a 100 kB file is very, very slow.
I guess your connection is very slow, or the server is. If FTP is not the standard access method, maybe the FTP server on your mainframe is shut down when a connection is terminated and started again when a connection is established? Then ftplib should help.
Still, an overhead of half a second per file means 400,000 seconds of waiting. So downloading in parallel makes sense.
Maybe first try an FTP client like FileZilla and check what bandwidth is achievable with it.
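To make the suggestion concrete, here is a rough sketch only: it assumes the same dataframe m, a placeholder host name, and a thread pool of 5 workers (since some servers limit parallel connections), with each worker keeping one FTP connection open and downloading its share of the files over it:
import ftplib
import os
from multiprocessing.dummy import Pool  # thread-based pool
FTP_HOST = 'ftp.example.com'  # placeholder, replace with the real server
def download_chunk(filenames):
    # One connection per worker thread; files are retrieved one by one over it.
    ftp = ftplib.FTP(FTP_HOST)
    ftp.login()
    for remote_name in filenames:
        local_name = os.path.join('local_folder', os.path.basename(remote_name))
        with open(local_name, 'wb') as fhandle:
            ftp.retrbinary('RETR ' + remote_name, fhandle.write)
    ftp.quit()
filenames = list(m['Filename'])
workers = 5
chunks = [filenames[i::workers] for i in range(workers)]
pool = Pool(workers)
pool.map(download_chunk, chunks)
pool.close()
pool.join()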

Streaming of a log text file with constant updates

I made a program which saves sensor data in a log file (server side).
This file is stored in a temporary directory (Ram-Disk).
Each line contains a timestamp and a JSON string.
The update rate depends on the sensor data, but at its fastest a line is written every 0.5 s.
What I want to do is stream every update to this file to a client application.
I have several approaches in mind:
maybe a shared folder on the server side (Samba) with a script on the client side that just checks the file every 0.5 s
maybe another server program running on the server, checking for updates (but this I don't want to do, because the Raspberry Pi is slow)
Has anyone done something like this before and can share some ideas? Is there maybe a Python module for this already (one that opens a file like a stream and emits whatever has changed)? Is it smart to check a file constantly for updates?
To stream the log file to an application you can use
tail -n 1000000 -f /path/to/logfile | application
(This will continuously check the file for new lines and then stream them to the application, then hang again until new lines are present.)
But this will of course put load on your server, since the check for new lines is executed on the Raspberry Pi. A small program (written in C, with a decent sleep) running on the server itself might in fact put less load on it than querying for new lines over the network.
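If a pure Python equivalent is preferred (the question asks about a module that exposes the file as a stream), a simple polling loop does the same job as tail -f. This is just a sketch; the log path is hypothetical and the 0.5 s interval matches the fastest update rate from the question:
import time
def follow(path, interval=0.5):
    # Generator that yields each new line appended to the file,
    # polling roughly at the sensor's fastest update rate.
    with open(path, 'r') as log_file:
        log_file.seek(0, 2)  # jump to the end of the file
        while True:
            line = log_file.readline()
            if line:
                yield line
            else:
                time.sleep(interval)
for entry in follow('/tmp/sensor.log'):
    print(entry.strip())  # replace with sending the update to the client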
I'm doing something like that.
I have a server running on my Raspberry Pi, plus a client that parses the output of the server and sends it to another server on the web.
What I'm doing is that the local server program writes its data in chunks.
Every time it writes the data (by the way, also on tmpfs), it writes to a different file, so I don't get errors from trying to parse a file while something else is writing to it.
After it writes the file, it starts the client program to parse and send the data (using subprocess with the name of the file as a parameter).
Works great for me.
