Read a file into a buffer from FTP in Python

I am trying to read a file from an FTP server. The file is a .gz file. I would like to know if I can perform actions on this file while the socket is open. I tried to follow what was mentioned in two StackOverflow questions on reading files without writing to disk and reading files from FTP without downloading but was not successful.
I know how to extract data/work on the downloaded file but I'm not sure if I can do it on the fly. Is there a way to connect to the site, get data in a buffer, possibly do some data extraction and exit?
When trying StringIO I got the error:
>>> from ftplib import FTP
>>> from StringIO import StringIO
>>> ftp = FTP('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    ftp = FTP('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')
  File "C:\Python27\lib\ftplib.py", line 117, in __init__
    self.connect(host)
  File "C:\Python27\lib\ftplib.py", line 132, in connect
    self.sock = socket.create_connection((self.host, self.port), self.timeout)
  File "C:\Python27\lib\socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno 11004] getaddrinfo failed
I just need to know how I can get the data into a variable and loop over it until the whole file has been read from FTP.
I appreciate your time and help. Thanks!

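The FTP constructor takes a host name, not a URL; passing 'ftp://...' makes the DNS lookup fail, which is the getaddrinfo error above. Connect to the host and log in first. After that, use retrbinary, which downloads the file in binary mode and calls a callback for each chunk of the file. You can use that callback to collect the data into a string.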
from ftplib import FTP
ftp = FTP('ftp.ncbi.nlm.nih.gov')  # host name only, not an ftp:// URL
ftp.login()  # anonymous login: user 'anonymous', password 'anonymous@'
# Set up a cheap way to catch the data (could use StringIO too)
data = []
def handle_binary(more_data):
    data.append(more_data)
resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
data = "".join(data)
Bonus points: how about we decompress the string while we're at it?
Easy mode, using the data string above:
import gzip
import StringIO
zippy = gzip.GzipFile(fileobj=StringIO.StringIO(data))
uncompressed_data = zippy.read()
A little better, the full solution:
from ftplib import FTP
import gzip
import StringIO
ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()  # anonymous login: user 'anonymous', password 'anonymous@'
sio = StringIO.StringIO()
def handle_binary(more_data):
    sio.write(more_data)
resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
ftp.quit()
sio.seek(0)  # Go back to the start
zippy = gzip.GzipFile(fileobj=sio)
uncompressed = zippy.read()
In reality, it would be much better to decompress on the fly, but I don't see a way to do that with the built-in libraries (at least not easily).
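One standard-library way to decompress on the fly is zlib.decompressobj with wbits set to 16 + zlib.MAX_WBITS, which makes zlib accept the gzip header and trailer. The sketch below (Python 3, unlike the Python 2 code above) pushes each retrbinary chunk through such a decompressor and hands every complete line to a callback, so the whole file never has to sit in memory. GzipLineStream and stream_remote_gzip are hypothetical names, and the code assumes the file is UTF-8 text with \n line endings.

```python
import zlib
from ftplib import FTP


class GzipLineStream:
    """Incrementally gunzip chunks and pass each complete text line to on_line."""

    def __init__(self, on_line, encoding="utf-8"):
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        self._inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self._pending = b""  # decompressed bytes after the last newline seen so far
        self._on_line = on_line
        self._encoding = encoding

    def feed(self, chunk):
        """Accept one compressed chunk; usable directly as a retrbinary callback."""
        self._pending += self._inflater.decompress(chunk)
        # Everything before the last newline is complete; keep the rest for later
        *complete, self._pending = self._pending.split(b"\n")
        for line in complete:
            self._on_line(line.decode(self._encoding))

    def close(self):
        """Flush zlib's internal buffer and emit any final unterminated line."""
        tail = self._pending + self._inflater.flush()
        self._pending = b""
        if tail:
            for line in tail.split(b"\n"):
                self._on_line(line.decode(self._encoding))


def stream_remote_gzip(host, path, on_line):
    """Process a remote .gz file line by line while it downloads (anonymous login)."""
    with FTP(host) as ftp:
        ftp.login()
        stream = GzipLineStream(on_line)
        ftp.retrbinary("RETR " + path, stream.feed)
        stream.close()
```

For example, stream_remote_gzip("ftp.ncbi.nlm.nih.gov", "pub/pmc/PMC-ids.csv.gz", print) should print the decompressed CSV as it arrives; this has not been checked against the live server.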

There are two easy ways I can think of to download a file using FTP and store it locally:
Using ftplib:
from ftplib import FTP
ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()
ftp.cwd('pub/pmc')
with open('PMC-ids.csv.gz', 'wb') as f:  # closes (and flushes) the local file afterwards
    ftp.retrbinary('RETR PMC-ids.csv.gz', f.write)
ftp.quit()
Using urllib:
from urllib import urlretrieve
urlretrieve("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz", "PMC-ids.csv.gz")
If you don't want to download and store it to a file, but you want to process it gradually as it comes, I suggest using urllib2:
from urllib2 import urlopen
u = urlopen("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/readme.txt")
for line in u:
    print line,  # trailing comma: each line already ends with a newline
which prints your file line by line.
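urllib2 does not exist in Python 3; the same idea there uses urllib.request.urlopen, which also handles ftp:// URLs. A minimal sketch, assuming UTF-8 text; iter_lines and its gzipped flag are hypothetical names. With gzipped=True the response is wrapped in gzip.open, which decompresses sequentially as it reads, so a .gz file such as PMC-ids.csv.gz could also be processed line by line without saving it first.

```python
import gzip
from urllib.request import urlopen


def iter_lines(url, encoding="utf-8", gzipped=False):
    """Yield decoded text lines from url as they arrive, without saving to disk."""
    with urlopen(url) as response:
        if gzipped:
            # gzip.open accepts a file object and reads it front to back
            with gzip.open(response, "rt", encoding=encoding) as text:
                for line in text:
                    yield line.rstrip("\r\n")
        else:
            # Iterating the response yields raw byte lines
            for raw in response:
                yield raw.decode(encoding).rstrip("\r\n")
```

Usage would look like: for line in iter_lines("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz", gzipped=True): followed by your per-line processing.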

That is not possible. To process data on the server, you need to have some sort of execution permissions, be it for a shell script you would send or SQL access.
FTP is pure file transfer; no execution is allowed. You will need to either enable SSH access, load the data into a database and query it, or download the file with urllib and process it locally, like this:
import urllib
handle = urllib.urlopen('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')
# Use data, maybe: buffer = handle.read()
In particular, I think the third one is the only zero-effort solution.

Related

read big files from SFTP server with python 3

I want to read multiple big files that live on a CentOS server with Python. I wrote simple code for that and it works, but the entire file comes through a paramiko object (paramiko.sftp_file.SFTPFile) before I can process the lines. Performance is poor, and I want to process the file and write it to CSV piece by piece, because processing the entire file at once hurts performance. Is there a way to solve this?
import paramiko
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(host, port=port, username=username, password=password)
sftp_client = ssh.open_sftp()
remote_file = sftp_client.open(r'/root/bigfile.csv')
try:
    for line in remote_file:
        pass  # process each line here
finally:
    remote_file.close()
This could solve your problem:
import shutil
def lazy_loading_ftp_file(sftp_host_conn, filename):
    """
    Lazily download an SFTP file, streaming it instead of making a plain sftp.get call.
    :param sftp_host_conn: callable returning an SFTP connection that works as a context manager
    :param filename: name of the file to download
    :return: dict with "status" and "msg"; the file is written to the current directory
    """
    try:
        with sftp_host_conn() as host:
            with host.open(filename, 'rb') as sftp_file_instance, open(filename, 'wb') as out_file:
                # copyfileobj reads and writes in fixed-size chunks
                shutil.copyfileobj(sftp_file_instance, out_file)
        return {"status": "success", "msg": "successfully downloaded file: {}".format(filename)}
    except Exception as ex:
        return {"status": "failed", "msg": "Exception in lazy reading: {}".format(ex)}
This will avoid reading the whole thing into memory at once.
Reading in chunks will help you here:
import pandas as pd
chunksize = 1000000
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)  # process() stands for your own per-chunk logic
Update:
Yes, I'm aware that my answer is based on a local file; it's just an example of reading a file in chunks.
To answer the question, check out this one:
paramiko.sftp_client.SFTPClient.getfo (putfo is its upload counterpart)
There are also helper functions for working with remote files using pandas and paramiko (SFTP/SSH); pass the chunk size as I mentioned above.
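Iterating an SFTPFile should already pull the file from the server in buffered pieces, so one way to get the "process and write to CSV piece by piece" behaviour from the question is to feed that iterator straight into csv.reader and write each transformed row immediately. A Python 3 sketch, assuming paramiko 2.x (where SSHClient, SFTPClient and SFTPFile work as context managers and a file opened in text mode yields str lines); transform_csv_stream, transform_remote_csv and the transform callback are hypothetical names.

```python
import csv


def transform_csv_stream(lines, out_file, transform):
    """Parse CSV rows lazily from an iterable of text lines, write
    transform(row) for each row to out_file, and return the row count.
    Only one row is held in memory at a time."""
    writer = csv.writer(out_file)
    count = 0
    for row in csv.reader(lines):
        writer.writerow(transform(row))
        count += 1
    return count


def transform_remote_csv(host, port, username, password, remote_path, local_path, transform):
    """Stream a CSV from an SFTP server into a transformed local CSV."""
    import paramiko  # local import: the helper above works without paramiko installed

    with paramiko.SSHClient() as ssh:
        # AutoAddPolicy is copied from the question; prefer known_hosts in production
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(host, port=port, username=username, password=password)
        with ssh.open_sftp() as sftp, sftp.open(remote_path, "r") as remote:
            with open(local_path, "w", newline="") as out:
                return transform_csv_stream(remote, out, transform)
```

Because the per-row logic only sees an iterable of lines, it can be tested against a local file or io.StringIO before pointing it at the server.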

Downloading file from FTP server with ftplib: Always 0 bytes/empty

I'm trying to download a file from an FTPS server using Python's ftplib.
But the downloaded file always has 0 bytes (it is empty).
If I look at the file on the server with WinSCP, it has data (about 1 KB).
In WinSCP I'm using the options "Encryption: Explicit TLS" and "PassiveMode=False".
What is wrong with the code?
Thanks!!
This is the code I am using:
import ftplib
server='10.XX.XX.XX'
username='username'
password='password'
session = ftplib.FTP_TLS(server)
session.login(user=username,passwd=password)
session.prot_p()
session.set_pasv(False)
session.nlst()
session.cwd("home")
print(session.pwd())
filename = "test.txt"
# Open a local file to store the downloaded file
my_file = open(r'c:\temp\ftpTest.txt', 'wb')
session.retrbinary('RETR ' + filename, my_file.write, 1024)
session.quit()
You are not closing the local file after the download, so data still sitting in its write buffer may not reach the disk until the file is closed. You should use a context manager for that, and similarly for the FTP session:
with ftplib.FTP_TLS(server) as session:
    session.login(user=username, passwd=password)
    session.prot_p()
    session.set_pasv(False)
    session.nlst()
    session.cwd("home")
    print(session.pwd())
    filename = "test.txt"
    # Open a local file to store the downloaded file
    with open(r'c:\temp\ftpTest.txt', 'wb') as my_file:
        session.retrbinary('RETR ' + filename, my_file.write, 1024)
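To confirm the fix, it can help to have the download report how many bytes actually reached the local file. A Python 3 sketch, where download_file and ftps_download are hypothetical helpers and the FTP_TLS settings mirror the question (explicit TLS, protected data channel, active mode):

```python
import ftplib


def download_file(session, remote_name, local_path, blocksize=8192):
    """Save remote_name to local_path via retrbinary and return the bytes written.

    The local file is opened in a with-block, so it is flushed and closed
    before this function returns."""
    written = 0
    with open(local_path, "wb") as out:

        def write_chunk(chunk):
            nonlocal written
            out.write(chunk)
            written += len(chunk)

        session.retrbinary("RETR " + remote_name, write_chunk, blocksize)
    return written


def ftps_download(server, username, password, remote_dir, remote_name, local_path):
    """Hypothetical wrapper using the same FTPS settings as the question."""
    with ftplib.FTP_TLS(server) as session:
        session.login(user=username, passwd=password)
        session.prot_p()         # encrypt the data channel as well
        session.set_pasv(False)  # active mode, as in the WinSCP settings
        session.cwd(remote_dir)
        return download_file(session, remote_name, local_path)
```

If this still returns 0 after the with-block has closed the file, the problem more likely lies elsewhere (for example in the data connection) than in unflushed local buffers.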

Cannot access file on Samba server via Python

I'm trying to access a file on our Samba server using Python. I found out I need to use a Samba client for this, so I started using PySmbClient. Even though there are many examples online of how to do this, mine just does not want to work. See below.
smb = smbclient.SambaClient(server="192.168.0.320", share="DATA", domain="WORKGROUP",username="admin", password="abc123")
f = smb.open('test.json', 'r')
This produces the following error:
OSError: [Errno 2] No such file or directory
with the following trace:
Traceback (most recent call last):
  File "create_dataset.py", line 35, in <module>
    f = smb.open('serverSaver.txt', 'r')
  File "/home/grant/Development/create_dataset/env/local/lib/python2.7/site-packages/smbclient.py", line 408, in open
    f = _SambaFile(self, path, mode)
  File "/home/grant/Development/create_dataset/env/local/lib/python2.7/site-packages/smbclient.py", line 448, in __init__
    connection.download(remote_name, self._tmp_name)
  File "/home/grant/Development/create_dataset/env/local/lib/python2.7/site-packages/smbclient.py", line 393, in download
    result = self._runcmd('get', remote_path, local_path)
  File "/home/grant/Development/create_dataset/env/local/lib/python2.7/site-packages/smbclient.py", line 184, in _runcmd
    return self._raw_runcmd(fullcmd)
  File "/home/grant/Development/create_dataset/env/local/lib/python2.7/site-packages/smbclient.py", line 168, in _raw_runcmd
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
I've read and implemented many "solutions", but so far nothing has worked for me. I can access the Samba server with the given credentials through my file manager just fine, so I know those values should be fine. I even spoke to our sys admin and he doesn't know what could be wrong.
It must be more than the simple code I wrote. Do you think there's an issue on the server side of things? Something with the values I input into SambaClient? At this point I'm pretty much open to anything that leads to a solution.
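Your traceback ends inside subprocess: PySmbClient runs the external smbclient command-line program, so "[Errno 2] No such file or directory" here usually means that program is not installed or not on your PATH.
As an alternative, here's some code that works for me, transferring a file from a Linux Samba share to my Windows laptop. It's also known to work fine in the other direction (Linux client, Windows server).
I'm using the pysmb library version 1.1.19 (the latest at the time of writing) and Python 2.7.1.
See the pysmb site for the pysmb package; I actually downloaded and installed it directly from its tarball and setup.py, as pip was throwing an error.
The pysmb package is less user-friendly, but it works well for Windows clients.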
I set up a share called "my_share" on the Linux machine for user "edwards" using the following entry in smb.conf:
[my_share]
    path = /home/edwards
    valid users = edwards
    read only = no
    guest ok = yes
    browseable = yes
And then used the following code to list the files on the share, and download a file called "rti_license.dat" to my laptop:
import tempfile
import smb
import shutil
from smb.SMBConnection import SMBConnection
share_name = "my_share"
user_name = "edwards"
password = "######" # secret :-)
local_machine_name = "laptop" # arbitrary
server_machine_name = "edwards-Yocto" # MUST match correctly
server_IP = "192.162.2.1" # as must this
# create and establish connection
conn = SMBConnection(user_name, password, local_machine_name, server_machine_name, use_ntlm_v2 = True)
assert conn.connect(server_IP, 139)
# print list of files at the root of the share
files = conn.listPath(share_name, "/")
for item in files:
    print item.filename
# check if the file we want is there
sf = conn.getAttributes(share_name, "rti_license.dat")
print sf.file_size
print sf.filename
# create a temporary file for the transfer
file_obj = tempfile.NamedTemporaryFile(mode='w+b', delete=False)  # binary mode, so Windows does not translate line endings
file_name = file_obj.name
file_attributes, copysize = conn.retrieveFile(share_name, "rti_license.dat", file_obj)
print copysize
file_obj.close()
# copy temporary file
shutil.copy(file_name, "rti_license.dat")
# close connection
conn.close()
Note that the server name must be correct or the connection won't work (on a Linux machine, it's the output of the hostname command).
Hope this may be useful.
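If you only need the file's contents in memory (for example, to parse test.json from the question), the temporary file may not be needed at all: pysmb's retrieveFile writes into any file-like object with a write method and returns a (file_attributes, bytes_written) tuple. A Python 3 sketch, assuming conn is an SMBConnection already connected as in the code above; read_share_file is a hypothetical helper name.

```python
import io


def read_share_file(conn, share_name, path):
    """Return the bytes of path on share_name using an already-connected
    pysmb SMBConnection (conn). Nothing is written to local disk."""
    buffer = io.BytesIO()
    # retrieveFile streams the remote file into buffer until EOF
    _attributes, _bytes_written = conn.retrieveFile(share_name, path, buffer)
    return buffer.getvalue()
```

For instance, json.loads(read_share_file(conn, "DATA", "test.json").decode("utf-8")) would parse the question's JSON file, assuming it is UTF-8 encoded.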

urlretrieve hangs when downloading file

I have a very simple script that uses urllib to retrieve a zip file and place it on my desktop. The zip file is only a couple of MB in size and doesn't take long to download. However, the script doesn't seem to finish; it just hangs. Is there a way to force urlretrieve to close, or a better solution?
The URL points to a public FTP site. Could FTP be the cause?
I'm using python 2.7.8.
import urllib
url = r'ftp://ftp.ngs.noaa.gov/pub/DS_ARCHIVE/ShapeFiles/IA.ZIP'
zip_path = r'C:\Users\***\Desktop\ngs.zip'
urllib.urlretrieve(url, zip_path)
Thanks in advance!
---Edit---
Was able to use ftplib to accomplish the task...
import os
from ftplib import FTP
import zipfile
ftp_site = 'ftp.ngs.noaa.gov'
ftp_file = 'IA.ZIP'
download_folder = '//folder to place file'
download_file = 'name of file'
download_path = os.path.join(download_folder, download_file)
# Download file from ftp
ftp = FTP(ftp_site)
ftp.login()
ftp.cwd('pub/DS_ARCHIVE/ShapeFiles') #change directory
ftp.retrlines('LIST') #show me the files located in directory
download = open(download_path, 'wb')
ftp.retrbinary('RETR ' + ftp_file, download.write)
ftp.quit()
download.close()
# Unzip if .zip file is downloaded
with zipfile.ZipFile(download_path, "r") as z:
    z.extractall(download_folder)
urllib has very poor support for error handling and debugging; urllib2 is a much better choice. The urlretrieve equivalent in urllib2 is:
import urllib2
resp = urllib2.urlopen(url, timeout=60)  # timeout in seconds, so a stalled connection should raise instead of hanging
with open(zip_path, 'wb') as f:
    f.write(resp.read())
The errors to catch are:
urllib2.HTTPError, urllib2.URLError, httplib.HTTPException
HTTPError is a subclass of URLError, so catch it first if you handle it separately. You can also catch socket.error in case the network is down.
You can use the Python requests library with the requests-ftp module. It provides an easier API and handles exceptions better. See: https://pypi.python.org/pypi/requests-ftp and http://docs.python-requests.org/en/latest/
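In Python 3, urllib2 became urllib.request, and urlopen accepts a timeout in seconds that is also passed on for ftp:// URLs. A sketch of a urlretrieve replacement that streams to disk and should raise rather than hang on a stalled transfer; download is a hypothetical name and the 60-second default is an arbitrary choice.

```python
import os
import shutil
import urllib.error
from urllib.request import urlopen


def download(url, dest_path, timeout=60.0):
    """Copy url to dest_path in chunks and return the number of bytes saved.

    timeout (seconds) applies to blocking socket operations, so a stalled
    transfer should raise instead of hanging forever."""
    try:
        with urlopen(url, timeout=timeout) as response, open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)  # fixed-size reads, low memory use
    except urllib.error.URLError as exc:
        # Failures while opening: DNS, refused connection, FTP login/RETR errors, ...
        raise RuntimeError(f"could not fetch {url}: {exc.reason}") from exc
    # A timeout during the copy itself propagates as TimeoutError (socket.timeout)
    return os.path.getsize(dest_path)
```

With the question's values this would be download('ftp://ftp.ngs.noaa.gov/pub/DS_ARCHIVE/ShapeFiles/IA.ZIP', zip_path).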

Download specific file from FTP using python

Quick and simple:
I have the following function, which works well if I specify the file name.
import os
import ftplib
def ftpcon(self, host, port, login, pwd, path):
    ftp = ftplib.FTP()
    ftp.connect(host, port, 20)
    try:
        ftp.login(login, pwd)
        ftp.cwd(path)
        for files in ftp.nlst():
            if files.endswith('.doc') or files.endswith('.DOC'):
                ftp.retrbinary('RETR ' + files, open(file, 'wb').write)
                print files
But when I use a for loop over ftp.nlst() to match a specific file type, I receive the error:
coercing to Unicode: need string or buffer, type found
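Since I'm not sure this is the best approach, what is the "correct" way to download a file?
The error comes from open(file, 'wb'): in Python 2, file is the built-in file type, not your loop variable files, which is exactly what "need string or buffer, type found" complains about. Use open(files, 'wb') instead. For the download itself, maybe try: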
from ftplib import FTP
server = FTP("ip/serveradress")
server.login("user", "password")
server.retrlines("LIST") # Will show an FTP directory listing.
server.cwd("Name_Of_Folder_in_FTP_to_browse_to") # browse to folder containing your file for DL
then:
server.sendcmd("TYPE I")  # binary mode (retrbinary also sends this itself)
with open("DESTINATIONPATH+EXT", "wb") as local_file:
    server.retrbinary("RETR %s" % "FILENAME+EXT to DL", local_file.write)  # transfer the selected file to the selected local path
I believe this is about as correct as it gets.
You can call server.set_debuglevel(1) or server.set_debuglevel(2) (the default is 0) to get a more detailed log of the conversation with the server.
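A Python 3 sketch of the question's loop with both fixes applied: the loop variable is what gets passed to open (the question's code passes the name file instead), and each local file is closed by a with-block. It also matches the extension case-insensitively. download_matching and fetch_docs are hypothetical names.

```python
import os
from ftplib import FTP


def download_matching(ftp, suffixes, dest_dir="."):
    """Download every file in the current remote directory whose name ends
    with one of suffixes (case-insensitive). Returns the local paths."""
    wanted = tuple(s.lower() for s in suffixes)
    saved = []
    for name in ftp.nlst():
        if not name.lower().endswith(wanted):
            continue
        local_path = os.path.join(dest_dir, os.path.basename(name))
        with open(local_path, "wb") as out:  # closed even if the transfer fails
            ftp.retrbinary("RETR " + name, out.write)
        saved.append(local_path)
    return saved


def fetch_docs(host, port, user, password, remote_dir, dest_dir="."):
    """Hypothetical wrapper mirroring the question's ftpcon()."""
    with FTP() as ftp:
        ftp.connect(host, port, timeout=20)
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        return download_matching(ftp, [".doc"], dest_dir)
```

Keeping the matching loop separate from the connection setup makes it easy to reuse for other extensions, e.g. download_matching(ftp, [".doc", ".docx"]).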
