Why is my image losing data through my Python FTP program?

My Python program sends an image to my web server through FTP, but occasionally, upon arrival, partial data is lost from the transferred image. The program takes a screenshot every x seconds and then uploads the image to the web server.
My web hosting provider thinks it must be coming from the Python program itself, so please let me know what I'm doing wrong to cause this issue.
Image (what it looks like when pulled from the web server):
Code:
def ftp(self): # Screen grab and FTP transfer
    new = ImageGrab.grab(bbox=(0, 50, 1366, 720))
    new = new.resize((1366, 700), PIL.Image.ANTIALIAS)
    new.save("C:\\Users\\Owner\\Desktop\\screenshots\\capture.jpg")
    newOpen = PIL.Image.open("C:\\Users\\Owner\\Desktop\\screenshots\\capture.jpg")
    newOpen.save("C:\\Users\\Owner\\Desktop\\screenshots\\capture.jpg", format="JPEG", quality=40)
    tries = 10 # maximum number of retry attempts
    for i in range(tries):
        try:
            # FTP the image to the web server
            session = ftplib.FTP('server', 'user', 'pass')
            file = open('C:\\Users\\Owner\\Desktop\\screenshots\\capture.jpg', 'rb') # file to send
            session.storbinary('STOR capture.jpg', file) # send the file
            file.close() # close file and FTP
            session.quit()
            value = "Updated. \nFailed " + str(i) + " Times\n" + str(self.tick)
            print value
            self.tick += 1
        except KeyError as e:
            if i < tries - 1: # i is zero indexed
                continue
            else:
                raise
        break
    threading.Timer(5, self.ftp).start()

So, the actual reason for this was that I was opening the image on the web server before the transfer had actually completed. My solution was to add a PHP filter on the web server that only serves the image once it is above a specific size, so the file is never viewed before the whole transfer has finished.
It works perfectly now, and I am glad the problem had a simple programmatic fix.
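A server-side size check works, but a common alternative (a sketch only, with placeholder host, credentials and paths) is to upload the screenshot under a temporary name and rename it once the transfer finishes, so the published name never points at a half-written file:
import ftplib

def upload_atomically(local_path, remote_name):
    session = ftplib.FTP('server', 'user', 'pass')  # placeholder credentials
    try:
        f = open(local_path, 'rb')
        try:
            # upload under a temporary name first
            session.storbinary('STOR ' + remote_name + '.part', f)
        finally:
            f.close()
        # most FTP servers support RNFR/RNTO, so the final name only ever
        # points at a completely transferred file
        session.rename(remote_name + '.part', remote_name)
    finally:
        session.quit()

upload_atomically('C:\\Users\\Owner\\Desktop\\screenshots\\capture.jpg', 'capture.jpg')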

Related

Store file in binary transfer mode with ftplib in python does not finish [duplicate]

I am trying to upload a file to an FTP site using FTPS, but when I attempt to store the file, it just hangs after the file is fully transferred.
global f_blocksize
global total_size
global size_written

f_blocksize = 1024
total_size = os.path.getsize(file_path)
size_written = 0
file = open(file_path, "rb")

try:
    ftps = FTP_TLS("ftp.example.com")
    ftps.auth()
    ftps.sendcmd("USER username")
    ftps.sendcmd("PASS password")
    ftps.prot_p()
    print(ftps.getwelcome())
    try:
        print("File transfer started...")
        ftps.storbinary("STOR myfile.txt", file, callback=handle, blocksize=f_blocksize)
        print("File transfer complete!")
    except OSError as ex:
        print(ftps.getresp())
except Exception as ex:
    print("FTP transfer failed.")
    print("%s: %s" % (type(ex), str(ex)))

def handle(block):
    global size_written
    global total_size
    global f_blocksize
    size_written = size_written + f_blocksize if size_written + f_blocksize < total_size else total_size
    percent_complete = size_written / total_size * 100
    print("%s percent complete" % str(percent_complete))
I get the following output:
220 Microsoft FTP Service
File transfer started...
3.5648389904264577 percent complete
7.129677980852915 percent complete
10.694516971279374 percent complete
14.25935596170583 percent complete
17.824194952132288 percent complete
21.389033942558747 percent complete
24.953872932985206 percent complete
28.51871192341166 percent complete
32.083550913838124 percent complete
35.648389904264576 percent complete
39.213228894691035 percent complete
42.778067885117494 percent complete
46.342906875543946 percent complete
49.90774586597041 percent complete
53.472584856396864 percent complete
57.03742384682332 percent complete
60.60226283724979 percent complete
64.16710182767625 percent complete
67.7319408181027 percent complete
71.29677980852915 percent complete
74.8616187989556 percent complete
78.42645778938207 percent complete
81.99129677980854 percent complete
85.55613577023499 percent complete
89.12097476066144 percent complete
92.68581375108789 percent complete
96.25065274151436 percent complete
99.81549173194082 percent complete
100.0 percent complete
After which there is no further progress until the connection times out...
FTP transfer failed.
<class 'ftplib.error_temp'>: 425 Data channel timed out due to not meeting the minimum bandwidth requirement.
While the program is running I can see an empty myfile.txt in the FTP site if I connect and look manually, but when I either cancel it or the connection times out, this empty file disappears.
Is there something I'm missing that I need to invoke to close the file after it has been completely transferred?
This appears to be an issue with Python's SSLSocket class, which is waiting for data from the server when running unwrap. Since it never receives this data from the server, it is unable to unwrap SSL from the socket and therefore times out.
Judging by the welcome message, this particular server is some Microsoft FTP server, which fits well with the issue written about in this blog post.
The "fix" (if you can call it that) was to stop the SSLSocket from attempting to unwrap the connection altogether by editing ftplib.py and amending the FTP_TLS.storbinary() method.
def storbinary(self, cmd, fp, blocksize=8192, callback=None, rest=None):
    self.voidcmd('TYPE I')
    with self.transfercmd(cmd, rest) as conn:
        while 1:
            buf = fp.read(blocksize)
            if not buf:
                break
            conn.sendall(buf)
            if callback:
                callback(buf)
        # shutdown ssl layer
        if isinstance(conn, ssl.SSLSocket):
            # HACK: instead of attempting to unwrap the connection, just pass here
            pass
    return self.voidresp()
I faced this issue with storbinary when using Python's ftplib.FTP_TLS with prot_p against a Microsoft FTP server.
Example:
ftps = FTP_TLS(host, username, password)
ftps.prot_p()
ftps.storbinary(...)
The error indicated a timeout in the unwrap function.
It is related to the following issues:
https://www.sami-lehtinen.net/blog/python-32-ms-ftps-ssl-tls-lockup-fix
https://bugs.python.org/issue10808
https://bugs.python.org/issue34557
Resolution:
Open the Python documentation page for ftplib: https://docs.python.org/3/library/ftplib.html
Click through to the source code, which will take you to something like this: https://github.com/python/cpython/blob/3.10/Lib/ftplib.py
Create a copy of this code in your project (for example: my_lib\my_ftplib.py).
In the method that is failing, in your case storbinary, the problem is the line that calls conn.unwrap(). Comment that line out and put the keyword pass in its place; otherwise the now-empty if block will give a syntax error.
Import the above library in the file where you instantiate FTP_TLS. You will no longer face this error.
Reasoning:
The code in the ntransfercmd method (in the FTP_TLS class) wraps the conn object in an SSL session. The line you have commented out is responsible for tearing down that SSL session after the transfer. For some reason, when using Microsoft's FTP server, the code blocks on that line and results in a timeout. This can be because the server drops the connection after the transfer, or maybe the server unwraps the SSL from its side; I am not sure. Commenting out that line is harmless because the connection is eventually closed anyway; see below for details.
In ftplib's Python code, you will notice that the conn object in the storbinary function is enclosed in a with block, and that it is created using socket.create_connection. This means that .close() is automatically called when the code exits the with block (you can confirm this by looking at the __exit__ method in the source code of Python's socket class).
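If you would rather not copy the whole module, an equivalent approach (a sketch of the same workaround, not an official fix) is to subclass FTP_TLS and override only storbinary so that it skips the unwrap step; the host, user name, password and file name below are placeholders:
import ftplib
import ssl

class PatchedFTP_TLS(ftplib.FTP_TLS):
    """FTP_TLS that does not try to unwrap the TLS data connection."""

    def storbinary(self, cmd, fp, blocksize=8192, callback=None, rest=None):
        self.voidcmd('TYPE I')
        with self.transfercmd(cmd, rest) as conn:
            while True:
                buf = fp.read(blocksize)
                if not buf:
                    break
                conn.sendall(buf)
                if callback:
                    callback(buf)
            # deliberately skip conn.unwrap() here; some Microsoft FTP servers
            # never send the TLS close_notify that unwrap waits for
        return self.voidresp()

ftps = PatchedFTP_TLS('ftp.example.com', 'username', 'password')  # placeholder credentials
ftps.prot_p()
with open('myfile.txt', 'rb') as f:
    ftps.storbinary('STOR myfile.txt', f)
ftps.quit()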

Show Python FTP file upload messages

I have a remote FTP server where I want to upload new firmware images. When using the Linux ftp client I can do this using put <somefile> the server then responds with status messages like:
ftp> put TS252P005.bin flash
local: TS252P005.bin remote: flash
200 PORT command successful
150 Connecting to port 40929
226-File Transfer Complete. Starting Operation:
Checking file integrity...
Updating firmware...
Result: Success
Rebooting...
421 Service not available, remote server has closed connection
38563840 bytes sent in 6.71 secs (5.4779 MB/s)
ftp>
Now I can upload the file using Python as well using ftplib:
fw_path = ....
ip = ....
user = ...
pw = ....
with open(fw_path, "rb") as f:
    with ftplib.FTP(ip) as ftp:
        ftp.login(user=user, passwd=pw)
        ftp.storbinary("stor flash", f)
But I can't see a way for me to get the status messages that I can see using the ftp utility. This is important for me because I need to check that the update actually succeeded.
How can I get this output in my Python program? I'm also willing to use a different library if ftplib can't do it.
Any help is appreciated!
If you want to check the response programmatically, check the result of FTP.storbinary:
print(ftp.storbinary("STOR flash", f))
Though, as your server actually closes the connection before even sending a complete response, FTP.storbinary will throw an exception.
If you want to read the partial response, you will have to re-implement what FTP.storbinary does, for example:
ftp.voidcmd('TYPE I')
with ftp.transfercmd("STOR flash") as conn:
    while 1:
        buf = f.read(8192)
        if not buf:
            break
        conn.sendall(buf)
line = ftp.getline()
print(line)
if line[3:4] == '-':
    code = line[:3]
    while 1:
        nextline = ftp.getline()
        print(nextline)
        if nextline[:3] == code and nextline[3:4] != '-':
            break
If you want to check the response manually, enable logging using FTP.set_debuglevel.
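For the manual route, one call is enough. A minimal sketch (the host and credentials are placeholders; the file and command names are taken from the question) that prints every command and server response as it arrives, even if storbinary later raises because the server drops the connection:
import ftplib

ftp = ftplib.FTP('ftp.example.com', 'user', 'password')  # placeholder host/credentials
ftp.set_debuglevel(1)  # echo each FTP command and response to stdout
with open('TS252P005.bin', 'rb') as f:
    ftp.storbinary('STOR flash', f)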

Any way to increase speed of closing a file?

I'm trying to write an HTML file and then upload it to my website using the following code:
webpage = open('testfile.html',"w")
webpage.write(contents)
webpage.close
server = 'ftp.xxx.be'
username = 'userxxx'
password = 'topsecret'
ftp_connection = ftplib.FTP(server, username, password)
remote_path = "/"
ftp_connection.cwd(remote_path)
fh = open("testfile.html", 'rb')
ftp_connection.storbinary('STOR testfile.html', fh)
fh.close()
The problem is that the .close command seems to be slower than the FTP connection, and the file that is sent over FTP is empty. A few seconds after the FTP transfer runs, I do see the file correctly on my local PC.
Any hints to be certain the .close is finished before the ftp starts (apart from using time.sleep())?
Running Python 3.xx on W7pro
Try blocking on the close call:
Blocking until a file is closed in python
By the way, are the parentheses missing on your close call?
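That last hint is most likely the whole problem: webpage.close without parentheses never actually calls close(), so the buffered HTML is still unwritten when the upload starts. A minimal sketch of the fix, using a with block so the file is guaranteed to be flushed and closed before the FTP transfer begins (server, username, password and contents as in the question):
import ftplib

with open('testfile.html', 'w') as webpage:
    webpage.write(contents)  # flushed and closed when the block exits

ftp_connection = ftplib.FTP(server, username, password)
ftp_connection.cwd('/')
with open('testfile.html', 'rb') as fh:
    ftp_connection.storbinary('STOR testfile.html', fh)
ftp_connection.quit()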

creating daemon using Python libtorrent for fetching meta data of 100k+ torrents

I am trying to fetch metadata for around 10k+ torrents per day using Python libtorrent.
This is the current flow of the code:
Start a libtorrent session.
Get the total count of torrents we need metadata for that were uploaded within the last day.
Get torrent hashes from the DB in chunks.
Create magnet links from those hashes and add the magnet URIs to the session, creating a handle for each magnet URI.
Sleep for a second while metadata is fetched, and keep checking whether the metadata has been found.
If metadata is received, add it to the DB; otherwise check whether we have been looking for this torrent's metadata for around 10 minutes, and if so remove the handle, i.e. stop looking for its metadata for now.
Do the above indefinitely, and save the session state for the future.
So far I have tried this:
#!/usr/bin/env python
# this file will run as client or daemon and fetch torrent meta data i.e. torrent files from magnet uri
import libtorrent as lt # libtorrent library
import tempfile # for settings parameters while fetching metadata as temp dir
import sys # getting arguments from the shell, or exiting the script
from time import sleep #sleep
import shutil # removing directory tree from temp directory
import os.path # for getting pwd and other things
from pprint import pprint # for debugging, showing object data
import MySQLdb # DB connectivity
import os
from datetime import date, timedelta
session = lt.session(lt.fingerprint("UT", 3, 4, 5, 0), flags=0)
session.listen_on(6881, 6891)
session.add_extension('ut_metadata')
session.add_extension('ut_pex')
session.add_extension('smart_ban')
session.add_extension('metadata_transfer')
session_save_filename = "/magnet2torrent/magnet_to_torrent_daemon.save_state"
if os.path.isfile(session_save_filename):
    fileread = open(session_save_filename, 'rb')
    session.load_state(lt.bdecode(fileread.read()))
    fileread.close()
    print('session loaded from file')
else:
    print('new session started')

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)

session.start_dht()
session.start_lsd()
session.start_upnp()
session.start_natpmp()

alive = True
while alive:

    db_conn = MySQLdb.connect(host='', user='', passwd='', db='', unix_socket='/mysql/mysql.sock') # open database connection
    #print('reconnecting')

    # get all records where enabled = 0 and uploaded within yesterday
    subset_count = 100

    yesterday = date.today() - timedelta(1)
    yesterday = yesterday.strftime('%Y-%m-%d %H:%M:%S')
    #print(yesterday)

    total_count_query = ("SELECT COUNT(*) as total_count FROM content WHERE upload_date > '" + yesterday + "' AND enabled = '0' ")
    #print(total_count_query)
    try:
        total_count_cursor = db_conn.cursor() # prepare a cursor object using cursor() method
        total_count_cursor.execute(total_count_query) # execute the SQL command
        total_count_results = total_count_cursor.fetchone() # fetch the single count row
        total_count = total_count_results[0]
        print(total_count)
    except:
        print "Error: unable to select data"

    total_pages = total_count / subset_count
    #print(total_pages)

    current_page = 1
    while current_page <= total_pages:
        from_count = (current_page * subset_count) - subset_count

        #print(current_page)
        #print(from_count)

        hashes = []

        get_mysql_data_query = ("SELECT hash FROM content WHERE upload_date > '" + yesterday + "' AND enabled = '0' ORDER BY record_num DESC LIMIT " + str(from_count) + " , " + str(subset_count) + " ")
        #print(get_mysql_data_query)
        try:
            get_mysql_data_cursor = db_conn.cursor() # prepare a cursor object using cursor() method
            get_mysql_data_cursor.execute(get_mysql_data_query) # execute the SQL command
            get_mysql_data_results = get_mysql_data_cursor.fetchall() # fetch all the rows as a list of tuples
            for row in get_mysql_data_results:
                hashes.append(row[0].upper())
        except:
            print "Error: unable to select data"

        #print(hashes)

        handles = []

        for hash in hashes:
            tempdir = tempfile.mkdtemp()
            add_magnet_uri_params = {
                'save_path': tempdir,
                'duplicate_is_error': True,
                'storage_mode': lt.storage_mode_t(2),
                'paused': False,
                'auto_managed': True,
                'duplicate_is_error': True
            }
            magnet_uri = "magnet:?xt=urn:btih:" + hash.upper() + "&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80"
            #print(magnet_uri)
            handle = lt.add_magnet_uri(session, magnet_uri, add_magnet_uri_params)
            handles.append(handle) # push handle into handles list

        #print("handles length is :")
        #print(len(handles))

        while len(handles) != 0:
            for h in handles:
                #print("inside handles for each loop")
                if h.has_metadata():
                    torinfo = h.get_torrent_info()
                    final_info_hash = str(torinfo.info_hash())
                    final_info_hash = final_info_hash.upper()
                    torfile = lt.create_torrent(torinfo)
                    torcontent = lt.bencode(torfile.generate())
                    tfile_size = len(torcontent)
                    try:
                        insert_cursor = db_conn.cursor() # prepare a cursor object using cursor() method
                        insert_cursor.execute("""INSERT INTO dht_tfiles (hash, tdata) VALUES (%s, %s)""", [final_info_hash, torcontent])
                        db_conn.commit()
                        #print "data inserted in DB"
                    except MySQLdb.Error, e:
                        try:
                            print "MySQL Error [%d]: %s" % (e.args[0], e.args[1])
                        except IndexError:
                            print "MySQL Error: %s" % str(e)

                    shutil.rmtree(h.save_path()) # remove temp data directory
                    session.remove_torrent(h)    # remove torrent handle from session
                    handles.remove(h)            # remove handle from list
                else:
                    if h.status().active_time > 600: # check if handle is more than 10 minutes old, i.e. 600 seconds
                        #print('remove_torrent')
                        shutil.rmtree(h.save_path()) # remove temp data directory
                        session.remove_torrent(h)    # remove torrent handle from session
                        handles.remove(h)            # remove handle from list
            sleep(1)
            #print('sleep1')

        #print('sleep10')
        #sleep(10)

        current_page = current_page + 1

        # save session state
        filewrite = open(session_save_filename, "wb")
        filewrite.write(lt.bencode(session.save_state()))
        filewrite.close()

    print('sleep60')
    sleep(60)

# save session state
filewrite = open(session_save_filename, "wb")
filewrite.write(lt.bencode(session.save_state()))
filewrite.close()
I kept the above script running overnight and found that only around 1200 torrents' metadata was found in the overnight session, so I am looking to improve the performance of the script.
I have even tried decoding the save_state file and noticed there are 700+ DHT nodes I am connected to, so it's not as if DHT is not running.
What I am planning to do is keep the handles active in the session indefinitely while metadata is not fetched, rather than removing a handle after 10 minutes without metadata, as I am currently doing.
I have a few questions regarding the libtorrent Python bindings:
How many handles can I keep running? Is there any limit on running handles?
Will running 10k+ or 100k handles slow down my system or eat up resources? If yes, then which resources? I mean RAM, network?
I am behind a firewall; can a blocked incoming port be causing the slow speed of metadata fetching?
Can a DHT server like router.bittorrent.com or any other ban my IP address for sending too many requests?
Can other peers ban my IP address if they find out I am making too many requests only for fetching metadata?
Can I run multiple instances of this script, or maybe use multi-threading? Will it give better performance?
If using multiple instances of the same script, each script will get a unique node ID depending on the IP and port I am using; is this a viable solution?
Is there any better approach for achieving what I am trying to do?
I can't answer questions specific to libtorrent's APIs, but some of your questions apply to bittorrent in general.
Will running 10k+ or 100k handles slow down my system or eat up resources? If yes, then which resources? I mean RAM, network?
Metadata downloads shouldn't use many resources, since they are not full torrent downloads yet, i.e. they can't allocate the actual files or anything like that. But they will need some RAM/disk space for the metadata itself once they grab the first chunk of it.
I am behind a firewall; can a blocked incoming port be causing the slow speed of metadata fetching?
Yes: by reducing the number of peers that can establish connections, it becomes more difficult to fetch metadata (or establish any connection at all) on swarms with a low peer count.
NATs can cause the same issue.
Can a DHT server like router.bittorrent.com or any other ban my IP address for sending too many requests?
router.bittorrent.com is a bootstrap node, not a server per se. Lookups don't query a single node; they query many different nodes (among millions). But yes, individual nodes can ban you, or more likely rate-limit you.
This can be mitigated by looking for randomly distributed IDs to spread the load across the DHT keyspace.
Can I run multiple instances of this script, or maybe use multi-threading? Will it give better performance?
AIUI libtorrent is sufficiently non-blocking or multi-threaded that you can schedule many torrents at once.
I don't know if libtorrent has a rate-limit for outgoing DHT requests.
If using multiple instances of the same script, each script will get a unique node ID depending on the IP and port I am using; is this a viable solution?
If you mean the DHT node ID, then they're derived from the IP (as per BEP 42), not the port. Although some random element is included, only a limited number of IDs can be obtained per IP.
Some of this might also be applicable to your scenario: http://blog.libtorrent.org/2012/01/seeding-a-million-torrents/
And another option is my own DHT implementation which includes a CLI to bulk-fetch torrents.

urlopen error 10048, 'Address already in use' while downloading in Python 2.5 on Windows

I'm writing code that will run on Linux, OS X, and Windows. It downloads a list of approximately 55,000 files from the server, then steps through the list of files, checking if the files are present locally. (With SHA hash verification and a few other goodies.) If the files aren't present locally or the hash doesn't match, it downloads them.
The server-side is plain-vanilla Apache 2 on Ubuntu over port 80.
The client side works perfectly on Mac and Linux, but gives me this error on Windows (XP and Vista) after downloading a number of files:
urllib2.URLError: <urlopen error <10048, 'Address already in use'>>
This link: http://bytes.com/topic/python/answers/530949-client-side-tcp-socket-receiving-address-already-use-upon-connect points me to TCP port exhaustion, but "netstat -n" never showed me more than six connections in "TIME_WAIT" status, even just before it errored out.
The code (called once for each of the 55,000 files it downloads) is this:
request = urllib2.Request(file_remote_path)
opener = urllib2.build_opener()
datastream = opener.open(request)
outfileobj = open(temp_file_path, 'wb')
try:
    while True:
        chunk = datastream.read(CHUNK_SIZE)
        if chunk == '':
            break
        else:
            outfileobj.write(chunk)
finally:
    outfileobj = outfileobj.close()
    datastream.close()
UPDATE: I find by grepping the log that it enters the download routine exactly 3998 times. I've run this multiple times and it fails at 3998 each time. Given that the linked article states that the available ports are 5000 - 1025 = 3975 (and some are probably expiring and being reused), it's starting to look a lot more like the linked article describes the real issue. However, I'm still not sure how to fix this. Making registry edits is not an option.
If it is really a resource problem (freeing OS socket resources), try this:
request = urllib2.Request(file_remote_path)
opener = urllib2.build_opener()
retry = 3 # 3 tries
while retry:
    try:
        datastream = opener.open(request)
    except urllib2.URLError, ue:
        if ue.reason.find('10048') > -1:
            if retry:
                retry -= 1
            else:
                raise urllib2.URLError("Address already in use / retries exhausted")
        else:
            retry = 0
    if datastream:
        retry = 0

outfileobj = open(temp_file_path, 'wb')
try:
    while True:
        chunk = datastream.read(CHUNK_SIZE)
        if chunk == '':
            break
        else:
            outfileobj.write(chunk)
finally:
    outfileobj = outfileobj.close()
    datastream.close()
If you want, you can insert a sleep, or make it OS-dependent.
On my Win XP the problem doesn't show up (I reached 5000 downloads).
I watch my processes and network with Process Hacker.
Thinking outside the box, the problem you seem to be trying to solve has already been solved by a program called rsync. You might look for a Windows implementation and see if it meets your needs.
You should seriously consider copying and modifying this pyCurl example for efficient downloading of a large collection of files.
Instead of opening a new TCP connection for each request you should really use persistent HTTP connections - have a look at urlgrabber (or alternatively, just at keepalive.py for how to add keep-alive connection support to urllib2).
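To make the connection-reuse idea concrete with nothing but the standard library of that Python 2 era, here is a minimal sketch using httplib rather than urlgrabber or keepalive.py; it assumes all files live on one host, and the host name and paths are placeholders:
import httplib

conn = httplib.HTTPConnection('www.example.com')  # one TCP connection, reused for every request
for path in ['/files/a.bin', '/files/b.bin']:     # hypothetical paths; build these from your file list
    conn.request('GET', path)
    resp = conn.getresponse()
    data = resp.read()                            # the body must be fully read before the next request
    outfile = open(path.rsplit('/', 1)[-1], 'wb')
    outfile.write(data)
    outfile.close()
conn.close()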
All indications point to a lack of available sockets. Are you sure that only 6 are in TIME_WAIT status? If you're running so many download operations, it's very likely that netstat overruns your terminal buffer. I find that netstat overruns my terminal during normal usage periods.
The solution is to either modify the code to reuse sockets or to introduce a timeout. It also wouldn't hurt to keep track of how many open sockets you have, to optimize waiting. The default timeout on Windows XP is 120 seconds, so you want to sleep for at least that long if you run out of sockets. Unfortunately it doesn't look like there's an easy way to check from Python when a socket has closed and left the TIME_WAIT status.
Given the asynchronous nature of the requests and the timeouts, the best way to do this might be in a thread. Make each thread sleep for 2 minutes before it finishes. You can either use a semaphore or limit the number of active threads to ensure that you don't run out of sockets.
Here's how I'd handle it. You might want to add an exception clause to the inner try block of the fetch section, to warn you about failed fetches.
import time
import threading
import Queue
import urllib2

# assumes url_queue is a Queue object populated with tuples in the form of (url_to_fetch, temp_file)
# also assumes that TotalUrls is the size of the queue before any threads are started.

class urlfetcher(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        try: # needed to handle the Empty exception raised by an empty queue.
            file_remote_path, temp_file_path = self.queue.get()
            request = urllib2.Request(file_remote_path)
            opener = urllib2.build_opener()
            datastream = opener.open(request)
            outfileobj = open(temp_file_path, 'wb')
            try:
                while True:
                    chunk = datastream.read(CHUNK_SIZE)
                    if chunk == '':
                        break
                    else:
                        outfileobj.write(chunk)
            finally:
                outfileobj = outfileobj.close()
                datastream.close()
            time.sleep(120)
            self.queue.task_done()
        except Queue.Empty:
            pass

# elsewhere, in the main code:
while not url_queue.empty():            # keep spawning fetchers while URLs remain
    if threading.active_count() < 3975: # hard limit of available ports
        t = urlfetcher(url_queue)
        t.start()
    else:
        time.sleep(2)
url_queue.join()
Sorry, my python is a little rusty, so I wouldn't be surprised if I missed something.
