How do I read a CSV from Secure FTP Server - python

I have a script that gets a .csv file, applies some data corrections, and saves the result to my Django database. However, I can't get the .csv file from the FTP server. I tried the following code, but I got a different error each time.
import pandas as pd
import pysftp as sftp
with sftp.connect(your_host, your_user, your_pw) as conn:
    with conn.open("path_and_file.csv", "r") as f:
        df = pd.read_csv(f)
Error: "AttributeError: module 'pysftp' has no attribute 'connect'"
from ftplib import FTP
ftp = FTP('your_host')
ftp.login('your_user', 'your_pw')
ftp.set_pasv(False)
I couldn't get any further.
How can I read a .csv file from an FTP server with pandas?
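I solved my problem as follows: I downloaded the files to disk first and then opened them with pandas.
from ftplib import FTP
import pandas as pd
# host, user and password are placeholders for your server's details
with FTP(host) as ftp:
    ftp.login(user=user, passwd=password)
    print(ftp.getwelcome())
    # Download each file in binary mode, 1024 bytes per block
    with open("proj.csv", "wb") as f:
        ftp.retrbinary("RETR " + "proj.csv", f.write, 1024)
    with open("pers.csv", "wb") as f:
        ftp.retrbinary("RETR " + "pers.csv", f.write, 1024)
# Leaving the with block sends QUIT and closes the connection
proj = pd.read_csv("proj.csv")
pers = pd.read_csv("pers.csv")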

import pysftp
import pandas as pd
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None  # disables host key checking: insecure, use only for testing
with pysftp.Connection(host='hostname', username='username', password='password', cnopts=cnopts) as conn:
    conn.get('filename')  # copies the remote file into the local working directory
with open('filename') as f:
    df = pd.read_csv(f)
This should give you a DataFrame with the contents of the CSV.
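A variant that skips the local copy: pysftp's Connection.open() returns a file-like object, and pd.read_csv accepts file-like objects, so the CSV can be parsed straight from the server. This is a minimal sketch; the helper name read_remote_csv is hypothetical, and the connection is only assumed to behave like a pysftp.Connection. Note also that cnopts.hostkeys = None in the answer above disables host key verification; loading a known_hosts file with pysftp.CnOpts(knownhosts=...) keeps the check in place.

```python
import pandas as pd


def read_remote_csv(conn, remote_path, **read_csv_kwargs):
    """Read a CSV straight from an SFTP connection into a DataFrame.

    `conn` is assumed to behave like a pysftp.Connection: its open() method
    returns a file-like object that can be used as a context manager.
    """
    with conn.open(remote_path, "rb") as remote_file:
        # pandas accepts any binary file-like object, so no local copy is made
        return pd.read_csv(remote_file, **read_csv_kwargs)


# Usage (needs a reachable server; host, user and password are placeholders):
#
# import pysftp
# cnopts = pysftp.CnOpts(knownhosts="known_hosts")
# with pysftp.Connection(host="hostname", username="username",
#                        password="password", cnopts=cnopts) as conn:
#     df = read_remote_csv(conn, "path_and_file.csv")
```

Extra keyword arguments (for example sep=";") are passed through to pd.read_csv unchanged.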

Related

How to read a CSV file on FTP that is compressed in a zip folder

I'm trying to:
read a .csv file (compressed in a zip file that is stored on FTP) by using ftplib
store the .csv file in a virtual file in memory by using io
transform the virtual file into a DataFrame by using pandas
For that I'm using the code below, and it works fine for the first scenario (path1, see image above):
CODE:
import ftplib
import zipfile
import io
import pandas as pd
ftp = ftplib.FTP("theserver_name")
ftp.login("my_username","my_password")
ftp.encoding = "utf-8"
ftp.cwd('folder1/folder2')
filename = 'zipFile1.zip'
download_file = io.BytesIO()
ftp.retrbinary("RETR " + filename, download_file.write)
download_file.seek(0)
zfile = zipfile.ZipFile(download_file)
df = pd.read_csv(zfile.namelist()[0], delimiter=';')
display(df)
But in the second scenario (path2), after changing my code, I get the error below:
CODE UPDATE:
ftp.cwd('folder1/folder2/')
filename = 'zipFile2.zip'
ERROR AFTER UPDATE:
FileNotFoundError: [Errno 2] No such file or directory: 'folder3/csvFile2.csv'
It seems like Python doesn't recognize folder3 (contained in zipFile2). Is there an explanation for that? How can I fix it? I tried ftp.cwd('folder3') right before pd.read_csv(), but it doesn't work.
Thanks to Serge Ballesta's post here, I finally figured out how to turn csvFile2.csv into a DataFrame:
import ftplib
import zipfile
import io
import pandas as pd
ftp = ftplib.FTP("theserver_name")
ftp.login("my_username","my_password")
ftp.encoding = "utf-8"
flo = io.BytesIO()
ftp.retrbinary('RETR /folder1/folder2/zipFile2.zip', flo.write)
flo.seek(0)
with zipfile.ZipFile(flo) as archive:
    with archive.open('folder3/csvFile2.csv') as fd:
        df = pd.read_csv(fd, delimiter=';')
display(df)
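Why the first version failed for path2: pd.read_csv(zfile.namelist()[0], ...) receives only the member's name, a plain string, so pandas opens that path on the local file system rather than inside the archive. That is why 'folder3/csvFile2.csv' is reported as missing, and why ftp.cwd() cannot help. ZipFile.open(), as used in the answer above, reads the member from the downloaded archive instead. Below is a sketch that wraps this in a reusable helper; read_csv_from_zip_bytes is a hypothetical name, and falling back to the first .csv entry when no member is given is an assumption.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip_bytes(zip_bytes, member=None, **read_csv_kwargs):
    """Return a DataFrame for one CSV member of a zip archive held in memory.

    If `member` is None, the first entry ending in '.csv' is used, whatever
    sub-folder (e.g. 'folder3/') it sits in inside the archive.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if member is None:
            csv_members = [n for n in archive.namelist()
                           if n.lower().endswith(".csv")]
            if not csv_members:
                raise ValueError("no .csv entry found in the archive")
            member = csv_members[0]
        # archive.open() reads from the zip, not from the local file system
        with archive.open(member) as fd:
            return pd.read_csv(fd, **read_csv_kwargs)


# Usage with the in-memory download from the answer above:
# flo = io.BytesIO()
# ftp.retrbinary("RETR /folder1/folder2/zipFile2.zip", flo.write)
# df = read_csv_from_zip_bytes(flo.getvalue(), delimiter=";")
```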

Reading CSV file downloaded from FTP in Python not reading all rows

I am trying to read a CSV file from a folder on FTP. The file has 3072 rows. However, when I run the code, it does not read all the rows; some rows at the bottom are missing.
import ftplib
import pandas as pd
## FTP host name and credentials
ftp = ftplib.FTP('IP', 'username','password')
## Go to the required directory
ftp.cwd("Folder_Name")
names = ftp.nlst()
final_names= [line for line in names if '.csv' in line]
latest_time = None
latest_name = None
#os.chdir(filepath)
for name in final_names:
    time1 = ftp.sendcmd("MDTM " + name)
    if (latest_time is None) or (time1 > latest_time):
        latest_name = name
        latest_time = time1
file = open(latest_name, 'wb')
ftp.retrbinary('RETR '+ latest_name, file.write)
dat = pd.read_csv(latest_name)
[Screenshot: the CSV file to be read from FTP]
[Screenshot: the output from the code, with rows missing at the bottom]
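Make sure you close the file before you try to read it, either with file.close() or, better, with a with statement. Until the file is closed, the last block of downloaded data can still be sitting in Python's write buffer instead of on disk, which is why rows at the bottom go missing: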
with open(latest_name, 'wb') as file:
    ftp.retrbinary('RETR '+ latest_name, file.write)
dat = pd.read_csv(latest_name)
If you do not actually need to store the file on the local file system, and the file is not too large, you can download it to memory only; see "Reading files from FTP server to DataFrame in Python".
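A minimal sketch of that in-memory approach, assuming the file fits comfortably in RAM: retrbinary writes each received block into an io.BytesIO buffer, which is rewound and handed to pandas. The helper names newest_csv_name and read_csv_from_ftp are hypothetical; the first repeats the question's MDTM logic but matches names ending in .csv rather than names merely containing it.

```python
import io

import pandas as pd


def newest_csv_name(ftp):
    """Return the most recently modified .csv file in the current FTP
    directory, using the MDTM command as in the question."""
    latest_name, latest_time = None, None
    for name in ftp.nlst():
        if not name.lower().endswith(".csv"):
            continue
        # MDTM replies look like "213 YYYYMMDDHHMMSS", so comparing the
        # strings orders them chronologically
        stamp = ftp.sendcmd("MDTM " + name)
        if latest_time is None or stamp > latest_time:
            latest_name, latest_time = name, stamp
    return latest_name


def read_csv_from_ftp(ftp, name, **read_csv_kwargs):
    """Download `name` into memory and parse it; nothing touches the disk."""
    buffer = io.BytesIO()
    ftp.retrbinary("RETR " + name, buffer.write)
    buffer.seek(0)  # rewind so pandas starts reading at the first byte
    return pd.read_csv(buffer, **read_csv_kwargs)


# Usage (server, credentials and folder are placeholders):
# import ftplib
# with ftplib.FTP("IP", "username", "password") as ftp:
#     ftp.cwd("Folder_Name")
#     dat = read_csv_from_ftp(ftp, newest_csv_name(ftp))
```

Because no local file is written, the unclosed-file problem described above cannot occur with this variant.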

Removing files using python from a server using FTP

I’m having a hard time with this simple script. It gives me a "file or directory not found" error, but the file is there. In the script below I’ve masked the user, password, and FTP site.
Here is my script:
from ftplib import FTP
ftp = FTP('ftp.domain.ca')
pas = str('PASSWORD')
ftp.login(user = 'user', passwd=pas)
ftp.cwd('/public_html/')
filepaths = open('errorstest.csv', 'rb')
for j in filepaths:
    print(j)
    ftp.delete(str(j))
ftp.quit()
The funny thing, though, is that if I slightly change the script to call ftp.delete() with the path directly, it finds the file and deletes it. Modified like this:
from ftplib import FTP
ftp = FTP('ftp.domain.ca')
pas = str('PASSWORD')
ftp.login(user = 'user', passwd=pas)
ftp.cwd('/public_html/')
ftp.delete(<file path>)
ftp.quit()
I’m trying to read the file paths from a CSV file. What am I doing wrong?
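What you have shown seems mostly fine, except that the file is opened in binary mode ('rb'), so each j is a bytes object ending in a newline, and str(j) is not the bare path. Could you try opening it in text mode and stripping each line?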
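from ftplib import FTP
ftp = FTP(host)  # host, username and password are placeholders
ftp.login(username, password)
ftp.cwd('/public_html/')
print(ftp.pwd())
print(ftp.nlst())
# Text mode yields str lines; strip() removes the trailing newline
with open('errorstest.csv') as file:
    for line in file:
        if line.strip():
            ftp.delete(line.strip())
print(ftp.nlst())
ftp.quit()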
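The same idea can be factored into small helpers, so the parsing of the list can be checked without a server. This is a sketch assuming one remote path per line of the CSV; read_remote_paths and delete_listed_files are hypothetical names, and `ftp` is assumed to be an ftplib.FTP (or anything with a compatible delete method).

```python
def read_remote_paths(csv_path):
    """Return the non-blank lines of `csv_path` as plain strings.

    Opening in text mode (not 'rb') yields str rather than bytes; with 'rb',
    str(b"a.txt\n") would give the text "b'a.txt\\n'", which is not a path.
    """
    with open(csv_path, encoding="utf-8") as file:
        return [line.strip() for line in file if line.strip()]


def delete_listed_files(ftp, csv_path):
    """Delete every remote path listed in `csv_path`; return those paths."""
    deleted = []
    for path in read_remote_paths(csv_path):
        ftp.delete(path)  # raises ftplib.error_perm if the server refuses
        deleted.append(path)
    return deleted
```

Usage would be delete_listed_files(ftp, 'errorstest.csv') after ftp.cwd('/public_html/').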

Recursively get meta data of FTP folder and all sub folders

I am trying to figure out how to retrieve metadata from an FTP folder and all its subfolders. I want to get the file name, file size, and date/time (of when the file was modified). I found the sample code below online. I entered my credentials, ran the code, and received this error: No hostkey for host ftp.barra.com found.
Is there a quick fix for this?
from __future__ import print_function
import os
import time
import pysftp
ftp_username='xxx'
ftp_password='xxx'
ftp_host='xxx'
year = time.strftime("%Y")
month = time.strftime("%m")
day = time.strftime("%d")
ftp_dir = 'data/'+year+'/'+month
filename = time.strftime('ftp_file_lists.txt')
fout = open(filename, 'w')
wtcb = pysftp.WTCallbacks()
with pysftp.Connection(ftp_host, username=ftp_username, password=ftp_password) as sftp:
    sftp.walktree(ftp_dir, fcallback=wtcb.file_cb, dcallback=wtcb.dir_cb, ucallback=wtcb.unk_cb)
    print(len(wtcb.flist))
    for fpath in wtcb.flist:
        print(fpath, file=fout)
    sftp.close()
Code from here: http://alvincjin.blogspot.com/2014/09/recursively-fetch-file-paths-from-ftp.html
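The "No hostkey for host ... found" error comes from pysftp checking the server's host key against a known_hosts file (by default ~/.ssh/known_hosts) and finding no entry. One way to fix it is to add the server's key to that file (for example by connecting once with an ssh client, or with ssh-keyscan) or to pass pysftp.CnOpts(knownhosts=...) pointing at a file that contains it; cnopts.hostkeys = None also silences the error but turns off host key verification. Separately, the sample only collects file paths. The sketch below also records size and modification time via stat(); collect_metadata is a hypothetical name, and `sftp` is assumed to behave like a pysftp.Connection.

```python
from datetime import datetime, timezone


def collect_metadata(sftp, root):
    """Walk `root` recursively and return (path, size, modified) tuples.

    `sftp` is assumed to behave like a pysftp.Connection: walktree() calls
    the file callback with each remote file path, and stat() returns an
    object with st_size (bytes) and st_mtime (seconds since the epoch).
    """
    entries = []

    def on_file(path):
        attrs = sftp.stat(path)
        modified = datetime.fromtimestamp(attrs.st_mtime, tz=timezone.utc)
        entries.append((path, attrs.st_size, modified))

    def ignore(path):
        pass  # directories and unknown entries are not recorded

    sftp.walktree(root, fcallback=on_file, dcallback=ignore, ucallback=ignore)
    return entries


# Usage (placeholders as in the question; known_hosts must hold the key):
# import pysftp
# cnopts = pysftp.CnOpts(knownhosts="known_hosts")
# with pysftp.Connection(ftp_host, username=ftp_username,
#                        password=ftp_password, cnopts=cnopts) as sftp:
#     for path, size, modified in collect_metadata(sftp, ftp_dir):
#         print(path, size, modified.isoformat())
```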

FTP Python pandas dataframe result set

Instead of FTPing a file from a local server to a remote server, I am interested in sending the content of a pandas DataFrame directly to the remote server.
Suppose I have the following in a DataFrame:
df.head()
Country City
A New-York
B France
C Londo
I want to write the content of the pandas DataFrame directly to FTP, without having to write it to a file on disk and read it back before the FTP upload.
Thanks
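import ftplib
file = "file.txt"  # hypothetical: name of the local file to upload
ftp = ftplib.FTP('myserver.host.com')
ftp.login("", "")  # fill in your user name and password
# The command string must start with "STOR"; open the file in binary mode
with open(file, "rb") as fp:
    ftp.storbinary("STOR " + file, fp)
ftp.quit()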
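The answer above still uploads a file from disk. To send the DataFrame directly, as the question asks, one option is to render it to CSV in memory and pass an io.BytesIO to storbinary, which only needs an object with a read() method. This is a sketch assuming an ftplib.FTP connection; upload_dataframe and the remote file name are hypothetical.

```python
import io

import pandas as pd


def upload_dataframe(ftp, df, remote_name, **to_csv_kwargs):
    """Upload `df` as CSV to the FTP server without writing a local file.

    `ftp` is assumed to be an ftplib.FTP instance (or anything with a
    compatible storbinary method).
    """
    # to_csv() with no path returns the CSV as a string; storbinary needs bytes
    csv_text = df.to_csv(index=False, **to_csv_kwargs)
    buffer = io.BytesIO(csv_text.encode("utf-8"))
    # storbinary reads the buffer block by block and sends it with STOR
    ftp.storbinary("STOR " + remote_name, buffer)


# Usage (server and credentials are placeholders):
# import ftplib
# ftp = ftplib.FTP("myserver.host.com")
# ftp.login("user", "password")
# upload_dataframe(ftp, df, "cities.csv")
# ftp.quit()
```

index=False leaves out the DataFrame's row index; drop it if the index should be uploaded too.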
