Python ftplib cannot use STOR in callback function following RETR

Here is what I need to accomplish:
- connect to FTP
- get contents of test.txt
- write new contents into test.txt right after getting the results
In the real-world scenario I need to get the previous modification time, stored in a txt file, and then upload to FTP only those files which were modified after that time, without checking every file individually (there are thousands of them; that would take too long).
Here is where I'm stuck.
def continueTest(data, ftp):
    print(data, ftp)
    with open('test.txt', 'w+') as file:
        file.write('test')
    with open('test.txt', 'rb') as file:
        ftp.storbinary('STOR htdocs/test.txt', file)

def test():
    host_data = FTP_HOSTS['planz-norwegian']
    ftp = ftplib.FTP(host=host_data['server'],
                     user=host_data['username'],
                     passwd=host_data['password'])
    print('connected to ftp')
    ftp.retrbinary('RETR htdocs/test.txt', lambda data:continueTest(data, ftp))

if __name__ == '__main__':
    test()
This outputs:
connected to ftp
b'test' <ftplib.FTP object at 0x0322FAB0>
Traceback (most recent call last):
  File "C:\Python33\Plan Z Editor SL\redistdb.py", line 111, in <module>
    test()
  File "C:\Python33\Plan Z Editor SL\redistdb.py", line 107, in test
    ftp.retrbinary('RETR htdocs/test.txt', lambda data:continueTest(data, ftp))
  File "C:\Python33\lib\ftplib.py", line 434, in retrbinary
    callback(data)
  File "C:\Python33\Plan Z Editor SL\redistdb.py", line 107, in <lambda>
    ftp.retrbinary('RETR htdocs/test.txt', lambda data:continueTest(data, ftp))
  File "C:\Python33\Plan Z Editor SL\redistdb.py", line 99, in continueTest
    ftp.storbinary('STOR htdocs/test.txt', file)
  File "C:\Python33\lib\ftplib.py", line 483, in storbinary
    with self.transfercmd(cmd, rest) as conn:
  File "C:\Python33\lib\ftplib.py", line 391, in transfercmd
    return self.ntransfercmd(cmd, rest)[0]
  File "C:\Python33\lib\ftplib.py", line 351, in ntransfercmd
    host, port = self.makepasv()
  File "C:\Python33\lib\ftplib.py", line 329, in makepasv
    host, port = parse227(self.sendcmd('PASV'))
  File "C:\Python33\lib\ftplib.py", line 873, in parse227
    raise error_reply(resp)
ftplib.error_reply: 200 Type set to I.
If I don't use STOR in a callback, everything works fine. But then, how am I supposed to get the data from the RETR command?
I know of possible solutions, but I'm sure there must be a more elegant one:
- use urllib.request instead of RETR (what if there's no HTTP on the server?)
- reinitialize the FTP connection in the callback function (may be slower than expected because of waiting for the server to reconnect)
- use ftp.set_pasv(False) (the callback runs, but the script does not terminate and cannot call ftp.quit() or ftp.close())

According to the documentation of retrbinary:
The callback function is called for each block of data received, with a single string argument giving the data block.
This suggests that the callback is called while the data connection to retrieve the file is still open and the RETR command has not yet completed. It is not possible in FTP to create a new data connection (in the same FTP session) while another is still active. Additionally, it looks like ftplib gets confused and treats the response to TYPE I as the response to PASV:
File "C:\Python33\lib\ftplib.py", line 873, in parse227
raise error_reply(resp)
ftplib.error_reply: 200 Type set to I.
What you should do instead is call STOR only after the RETR has completed, i.e. let the callback store everything in the file, and open the file only after retrbinary has returned.
But then, how am I supposed to get data from RETR command?
In your current callback you store the data in a file and then you read the file back. The callback should still store the data in the file, but reading it and calling STOR should be done outside the callback, right after retrbinary returns, as in the sketch below. You cannot RETR and STOR data in parallel.
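For illustration, a minimal sketch of that sequential approach, reusing the FTP_HOSTS dict and paths from the question (those names are the question's own, not verified here):

import ftplib

def test():
    host_data = FTP_HOSTS['planz-norwegian']  # FTP_HOSTS as defined in the question
    ftp = ftplib.FTP(host=host_data['server'],
                     user=host_data['username'],
                     passwd=host_data['password'])
    print('connected to ftp')
    # 1. RETR: the callback only writes the received blocks to a local file.
    with open('test.txt', 'wb') as f:
        ftp.retrbinary('RETR htdocs/test.txt', f.write)
    # 2. retrbinary has returned, so the data connection is closed and a new
    #    transfer may be started on the same session.
    with open('test.txt', 'w') as f:
        f.write('test')
    # 3. STOR: upload the new contents as a second, separate transfer.
    with open('test.txt', 'rb') as f:
        ftp.storbinary('STOR htdocs/test.txt', f)
    ftp.quit()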

Related

Airflow with gcp python: ValueError: Stream must be at beginning

I am using Python along with Airflow and the GCP Python library. I automated the process of sending files to GCP using Airflow DAGs. The code is as follows:
for fileid, filename in files_dictionary.items():
    if ftp.size(filename) <= int(MAX_FILE_SIZE):
        data = BytesIO()
        ftp.retrbinary('RETR ' + filename, callback=data.write)
        f = client.File(client, fid=fileid)
        size = sys.getsizeof(data.read())
        # Another option is to use FileIO but not sure how
        f.send(data, filename, size)  # This method is in another library
The code that triggers the upload is in our repo (as shown above), but the real upload is done by another dependency that is not under our control. The documentation of that method is:
def send(self, fp, filename, file_bytes):
    """Send file to cloud
    fp file object
    filename is the name of the file.
    file_bytes is the size of the file in bytes
    """
    data = self.initiate_resumable_upload(self.getFileid())
    _, blob = self.get_gcs_blob_and_bucket(data)
    # Set attachment filename. Does this work with datasets with folders
    original_filename = filename.rsplit(os.sep, 1)[-1]
    blob.content_disposition = "attachment;filename=" + original_filename
    blob.upload_from_file(fp)
    self.finish_resumable_upload(self.getFileid())
I am getting the error below:
[2020-04-23 09:43:17,239] {{models.py:1788}} ERROR - Stream must be at beginning.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1657, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 103, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 108, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/transfer_data.py", line 241, in upload
    f.send(data, filename, size)
  File "/usr/local/lib/python3.6/site-packages/client/utils.py", line 53, in wrapper_timer
    value = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/client/client.py", line 518, in send
    blob.upload_from_file(fp)
  File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1158, in upload_from_file
    client, file_obj, content_type, size, num_retries, predefined_acl
  File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1068, in _do_upload
    client, stream, content_type, size, num_retries, predefined_acl
  File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1011, in _do_resumable_upload
    predefined_acl=predefined_acl,
  File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 960, in _initiate_resumable_upload
    stream_final=False,
  File "/usr/local/lib/python3.6/site-packages/google/resumable_media/requests/upload.py", line 343, in initiate
    stream_final=stream_final,
  File "/usr/local/lib/python3.6/site-packages/google/resumable_media/_upload.py", line 415, in _prepare_initiate_request
    raise ValueError(u"Stream must be at beginning.")
ValueError: Stream must be at beginning.
The upload_from_file function has a parameter that handles the seek(0) call for you:
I would modify your upload_from_file call to:
blob.upload_from_file(file_obj=fp, rewind=True)
That should do the trick, and you don't need to include an additional seek() call.
When reading a binary file, you can navigate through it using seek operations. In other words, you can move the reference from the beginning of the file to any other position. The error ValueError: Stream must be at beginning. is basically saying: "your reference is not pointing to the beginning of the stream, and it must be".
Given that, you need to set your reference back to the beginning of the stream. You can do that using the function seek.
In your case, you would do something like:
data = BytesIO()
ftp.retrbinary('RETR ' + filename, callback=data.write)
f = client.File(client, fid=fileid)
size = sys.getsizeof(data.read())
data.seek(0)
f.send(data, filename, size)
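As an aside (my observation, not part of the original answer): at that point the cursor is already at the end of the buffer, so data.read() returns b'' and sys.getsizeof() measures an empty bytes object rather than the payload. len(data.getvalue()) gives the actual number of bytes downloaded:

data = BytesIO()
ftp.retrbinary('RETR ' + filename, callback=data.write)
f = client.File(client, fid=fileid)
size = len(data.getvalue())  # actual payload size, regardless of cursor position
data.seek(0)                 # rewind before handing the stream to send()
f.send(data, filename, size)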

How to write on top of pandas HDF5 'read-only mode' files?

I am storing data using pandas built-in HDF5 methods.
Somehow these HDF5 files turned into 'read-only' files: I get a lot of 'Opening xxx in read-only mode' messages when I open them in write mode, and I can't write to them, which is something I really need to do.
What I really don't understand is how those files became read-only, as I am not aware of any code I wrote that could cause that behavior. (I checked whether the data stored in the HDF5 files is corrupt, but I am able to read and manipulate it, so it seems to be fine.)
I have 2 questions:
How can I append data to those 'read-only mode' HDF5 files? (Can I convert them back to write mode or any other clever solution?)
Is there any pandas method that would turn an HDF5 file 'read-only' by default, so I can avoid those files becoming read-only in the first place?
Code:
The piece of code that raises this issue, which is the code I use to save the output I generated, is:
with pd.HDFStore('data/observer/' + self._currency + '_' + str(ts)) as hdf:
    hdf.append(key='observers', value=df, format='table', data_columns=True)
I also use this piece of code to manipulate the outputs that were generated previously:
for the_file in list_dir:
    if currency in the_file:
        temp_df = pd.read_hdf(folder + the_file)
        ...
I use some select commands as well to get specific columns from the data files:
with pd.HDFStore('data/observer/' + self.currency + '_' + timestamp) as hdf:
    df = hdf.select(key='observers', columns=[x, y])
Error Traceback:
File ".../data_processing/observer_data.py", line 52, in save_obs_to_pandas
hdf.append(key='observers', value=df, format='table', data_columns=True)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 963, in append
**kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 1341, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3930, in write
self.set_info()
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3163, in set_info
self.attrs.info = self.info
File ".../venv/lib/python3.5/site-packages/tables/attributeset.py", line 464, in __setattr__
nodefile._check_writable()
File ".../venv/lib/python3.5/site-packages/tables/file.py", line 2119, in _check_writable
raise FileModeError("the file is not writable")
tables.exceptions.FileModeError: the file is not writable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".../general_manager.py", line 144, in <module>
gm.run()
File ".../general_manager.py", line 114, in run
list_of_observer_managers = self.load_all_observer_managers()
File ".../general_manager.py", line 64, in load_all_observer_managers
observer = currency_pool.map(self.load_observer_manager, list_of_currencies)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
tables.exceptions.FileModeError: the file is not writable
The issue at hand was that I had messed up the OS file permissions. The files I was trying to access belonged to root (as I had run the code that generated them as root), and I was trying to access them with a regular user account.
I am running debian, and the following command (as root) solved my issues:
chown -R user.user folder
This command recursively changes the ownership of all files inside that folder to user.user.
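If you want to catch this situation from Python before pandas/PyTables raises, a small sketch using os.access could look like this (the path below is a hypothetical example following the question's naming pattern):

import os

path = 'data/observer/EURUSD_1514764800'  # hypothetical example path
if not os.access(path, os.W_OK):
    # No write permission, e.g. the file is owned by root: fix ownership
    # at the OS level (chown) before opening the store in append mode.
    raise PermissionError('HDF5 file is not writable: ' + path)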

'NoneType' object has no attribute 'sendall' PYTHON

My current python script:
import os
import ftplib
import hashlib
import glob
hashing = "123"
m = hashlib.md5()
m.update(hashing)
dd = m.hexdigest()
ftp = ftplib.FTP('localhost','kevin403','S$ip1234')
ftp.cwd('/var/www/html/image')
for image in glob.glob(os.path.join('Desktop/images/test*.png')):
    with open(image, 'rb') as file:
        ftp.storbinary('STOR ' + dd + '.png', file)
ftp.close()
ftp.quit()
Does anybody know this error? I am trying to send a file over to another folder via FTP.
The error I got when running the script:
Traceback (most recent call last):
  File "/home/kevin403/wwq.py", line 21, in <module>
    ftp.quit()
  File "/usr/lib/python2.7/ftplib.py", line 591, in quit
    resp = self.voidcmd('QUIT')
  File "/usr/lib/python2.7/ftplib.py", line 253, in voidcmd
    self.putcmd(cmd)
  File "/usr/lib/python2.7/ftplib.py", line 181, in putcmd
    self.putline(line)
  File "/usr/lib/python2.7/ftplib.py", line 176, in putline
    self.sock.sendall(line)
AttributeError: 'NoneType' object has no attribute 'sendall'
FTP.quit()
Send a QUIT command to the server and close the connection. This is the “polite” way to close a connection, but it may raise an exception if the server responds with an error to the QUIT command. This implies a call to the close() method which renders the FTP instance useless for subsequent calls (see below).
FTP.close()
Close the connection unilaterally. This should not be applied to an already closed connection such as after a successful call to quit(). After this call the FTP instance should not be used any more (after a call to close() or quit() you cannot reopen the connection by issuing another login() method).
Just remove ftp.close(): since ftp.quit() implies a call to close(), you are telling it to close twice, and quit() fails because the socket was already closed.
FTPlib Documentation
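Applied to the script above, the ending would look like this (a sketch; quit() alone is enough):

for image in glob.glob(os.path.join('Desktop/images/test*.png')):
    with open(image, 'rb') as file:
        ftp.storbinary('STOR ' + dd + '.png', file)
ftp.quit()  # sends QUIT and then closes the socket itself; no ftp.close() needed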

Getting an EOFError when getting large files with Paramiko

I'm trying to write a quick python script to grab some logs with sftp. My first inclination was to use Pysftp, since it seemed like it made it very simple. It worked great, until it got to a larger file. I got an error while getting any file over about 13 MB. I then decided to try writing what I needed directly in Paramiko, rather than relying on the extra layer of Pysftp. After figuring out how to do that, I ended up getting the exact same error. Here's the Paramiko code, as well as the trace from the error I get. Does anyone have any idea why this would have an issue pulling any largish files? Thanks.
# Create transport and connect
transport = paramiko.Transport((host, 22))
transport.connect(username=username, password=password)
sftp = paramiko.SFTPClient.from_transport(transport)

# List of the log files in c:
files = sftp.listdir('c:/logs')

# Now pull them, logging as you go
for f in files:
    if f[0].lower() == 't' or f[:3].lower() == 'std':
        logger.info('Pulling {0}'.format(f))
        sftp.get('c:/logs/{0}'.format(f), output_dir + '/{0}'.format(f))

# Close the connection
sftp.close()
transport.close()
And here's the error:
No handlers could be found for logger "paramiko.transport"
Traceback (most recent call last):
  File "pull_logs.py", line 420, in <module>
    main()
  File "pull_logs.py", line 410, in main
    pull_logs(username, host, password, location)
  File "pull_logs.py", line 142, in pull_logs
    sftp.get('c:/logs/{0}'.format(f), output_dir + '/{0}'.format(f))
  File "/Users/me/my_site/site_packages/paramiko/sftp_client.py", line 676, in get
    size = self.getfo(remotepath, fl, callback)
  File "/Users/me/my_site/site_packages/paramiko/sftp_client.py", line 645, in getfo
    data = fr.read(32768)
  File "/Users/me/my_site/site_packages/paramiko/file.py", line 153, in read
    new_data = self._read(read_size)
  File "/Users/me/my_site/site_packages/paramiko/sftp_file.py", line 152, in _read
    data = self._read_prefetch(size)
  File "/Users/me/my_site/site_packages/paramiko/sftp_file.py", line 132, in _read_prefetch
    self.sftp._read_response()
  File "/Users/me/my_site/site_packages/paramiko/sftp_client.py", line 721, in _read_response
    raise SSHException('Server connection dropped: %s' % (str(e),))
paramiko.SSHException: Server connection dropped:

Why does my code raise an exception?

Hi, I am trying to create an FTP server, and to aid development I'm using pyftpdlib. What I want to do is perform some file operations when a user downloads a specific file, but sometimes it raises an exception and I don't really know why.
I wrote my own handler in pyftpdlib after this tutorial: http://code.google.com/p/pyftpdlib/wiki/Tutorial#3.8_-_Event_callbacks
But sometimes something goes terribly wrong when the user downloads the log file (which I intend to do some file operations on), and I don't really understand why. I have another class which basically reads from a configuration file, and the error message says it couldn't find 'FTP Section'. That's strange, because I clearly have it in my configuration file, and sometimes it works perfectly.
Could this error appear because I have two "Connection" objects? That's the only guess I have, so I would be very glad if someone could explain what's going wrong. Here is the troubled code (never mind the file.name check; that was added very recently):
class ArchiveHandler(ftpserver.FTPHandler):
    def on_login(self, username):
        # do something when user logs in
        pass

    def on_logout(self, username):
        # do something when user logs out
        pass

    def on_file_sent(self, file):
        "What to do when the file the class is watching over has been retrieved"
        attr = Connection()
        if attr.getarchive() == 'true':
            t = datetime.now()
            if file.name == "log.log":
                try:
                    shutil.copy2(file, attr.getdir() + ".archive/" + str(t.strftime("%Y-%m-%d_%H:%M:%S") + '.log'))
                except OSError:
                    print 'Could not copy file'
                    raise
        if attr.getremain() == 'false':
            try:
                os.remove(file)
            except OSError:
                print 'Could not remove file'
                raise
The full source:
http://pastie.org/3552079
Source of the config-file:
http://pastie.org/3552085
EDIT-> (and of course the error):
[root]#85.230.122.159:40659 unhandled exception in instance <pyftpdlib.ftpserver.DTPHandler object at 0xb75f49ec>
Traceback (most recent call last):
  File "/usr/lib/python2.6/asyncore.py", line 84, in write
    obj.handle_write_event()
  File "/usr/lib/python2.6/asyncore.py", line 435, in handle_write_event
    self.handle_write()
  File "/usr/lib/python2.6/asynchat.py", line 174, in handle_write
    self.initiate_send()
  File "/usr/lib/python2.6/asynchat.py", line 215, in initiate_send
    self.handle_close()
  File "/usr/local/lib/python2.6/dist-packages/pyftpdlib/ftpserver.py", line 1232, in handle_close
    self.close()
  File "/usr/local/lib/python2.6/dist-packages/pyftpdlib/ftpserver.py", line 1261, in close
    self.cmd_channel.on_file_sent(filename)
  File "ftp.py", line 87, in on_file_sent
  File "ftp.py", line 12, in __init__
  File "/usr/lib/python2.6/ConfigParser.py", line 311, in get
    raise NoSectionError(section)
NoSectionError: No section: 'FTP Section'
The problem is in a section you didn't include. It says:
File "ftp.py", line 12, in __init__
File "/usr/lib/python2.6/ConfigParser.py", line 311, in get
raise NoSectionError(section)
NoSectionError: No section: 'FTP Section'
So from the first line, we know that whatever is on line 12 of ftp.py is the problem (since everything below that isn't our code, so we assume that it's correct).
Line 12 is this:
self.ip = config.get('FTP Section', 'hostname')
And the error message says "No section: 'FTP Section'".
From this we can assume there's an error in the config file (that it doesn't have a "FTP Section").
Are you pointing at the correct config file? Is it in the directory that you're running the script from? Being in the same folder as the script will not work; it must be in the folder from which you run the script.
I think this is the problem you're having, since according to the documentation:
If none of the named files exist, the ConfigParser instance will contain an empty dataset.
You can confirm this by trying to open the file.
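For example (a sketch, assuming the config file is named config.cfg as in the later code): ConfigParser's read() returns the list of files it successfully parsed, so an empty list means the file was not found:

import ConfigParser  # Python 2, matching the traceback above

config = ConfigParser.RawConfigParser()
parsed = config.read('config.cfg')  # read() returns the list of files it could parse
if not parsed:
    raise IOError('config.cfg not found relative to the current working directory')
print config.get('FTP Section', 'hostname')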
The problem in this case was that the way I was reading the file left it open.
I changed to this and it's working much better:
config = ConfigParser.RawConfigParser()
fp = open('config.cfg')
config.readfp(fp)
And then when I'm finished reading in the constructor I add:
#Close the file
fp.close()
And voilà, you can open as many objects of the class as you want and it won't show any errors. :)
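For what it's worth, a with statement achieves the same thing without an explicit close(), even if parsing raises (a sketch under the same assumptions as above):

import ConfigParser

config = ConfigParser.RawConfigParser()
with open('config.cfg') as fp:
    config.readfp(fp)  # fp is closed automatically when the block exits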
