I am looking to extract all the information from this page:
Text data in FTP
I understand that the requests library doesn't work for FTP, so I have resorted to using ftplib.
However, the documentation only seems to cover downloading files from directories. How do I download this file, which has no file extension?
Thanks in advance.
If you want to download a text file's contents to memory, without using any temporary file, use retrlines like:

contents = ""

def collectLines(s):
    global contents
    contents += s + "\n"

ftp.retrlines("RETR " + filename, collectLines)
Or collect the lines in a list:
lines = []
ftp.retrlines("RETR " + filename, lines.append)
For binary files, see Read a file in buffer from FTP python.
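The same in-memory pattern works for binary data via retrbinary, which delivers raw byte chunks to its callback. A minimal sketch of just the callback side, with hard-coded chunks standing in for what a real connection would deliver (the chunk values are illustrative only):

```python
import io

# ftplib's retrbinary calls its callback once per chunk of raw bytes,
# so io.BytesIO.write can serve as that callback directly.
buf = io.BytesIO()

# Stand-in chunks; with a real connection this loop would be replaced by:
#   ftp.retrbinary("RETR " + filename, buf.write)
for chunk in (b'%PDF-', b'rest of file'):
    buf.write(chunk)

data = buf.getvalue()  # the whole file as one bytes object
```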
I think my question sounds kind of stupid, but I'm pretty new to Python programming.
I just want a text variable that gets a string from a .txt file on an FTP server.
So in conclusion: there is a .txt file stored on an FTP server, and I want the content of this file stored in a variable...
This is what I have so far... Can anybody help me? I use Python 3.6.3 :) Thanks in advance!
from ftplib import FTP
ftp = FTP('1bk2t.ddns.net')
ftp.login(user='User', passwd = 'Password')
ftp.cwd('/path/')
filename = 'filename.txt'
ftp.retrbinary("RETR " + filename, open(filename, 'wb').write)
ftp.quit()
var = localfile.read
If you want to download a text file's contents to memory, without using any temporary file, use FTP.retrlines like:

contents = ""

def collectLines(s):
    global contents
    contents += s + "\n"

ftp.retrlines("RETR " + filename, collectLines)
Or collect the lines in a list:
lines = []
ftp.retrlines("RETR " + filename, lines.append)
You can use io.StringIO as a buffer for the retrlines RETR:

import io

with io.StringIO() as buffer_io:
    # retrlines strips line endings, so add them back in the callback
    ftp.retrlines('RETR ' + filename, lambda line: buffer_io.write(line + '\n'))
    content = buffer_io.getvalue()
I'm not sure if this can be done or not but I thought I'd ask!
I'm running a Windows 10 PC with Python 2.7. I want to download a file from SharePoint to a folder on my C: drive.
OpenPath = "https://office.test.com/sites/Rollers/New improved files/"
OpenFile = "ABC UK.xlsb"
The downside is that the file is uploaded externally and, due to human error, it can be saved as a .xlsx or as ABC_UK. Therefore I only want to use the first three characters with a wildcard (ABC*) to open that file. Thankfully the first three characters are unique, and there should only be one file in the path that matches.
To find the file in your dir:

import os, requests, fnmatch

# loop over dir
for filename in os.listdir(OpenPath):
    # find the file
    if fnmatch.fnmatch(filename, 'ABC_UK*'):
        # download the file: open a file handler
        with open(r'C:\dwnlfile.xls', 'wb') as fh:
            # try to get it
            result = requests.get(OpenPath + filename)
            # check we got it
            if not result.ok:
                print result.reason  # or result.text
                exit(1)
            # save it
            fh.write(result.content)
            print 'got it and saved'
I'm trying to retrieve a zip folder (or folders) from an FTP site and save them to my local machine using Python (ideally I'd like to specify where they are saved on my C: drive).
The code below connects to the FTP site, and then something happens in the PyScripter window that looks like about 1000 lines of random characters... but nothing actually gets downloaded to my hard drive.
Any tips?
import ftplib
import sys

def gettext(ftp, filename, outfile=None):
    # fetch a text file
    if outfile is None:
        outfile = sys.stdout
    # use a lambda to add newlines to the lines read from the server
    ftp.retrlines("RETR " + filename, lambda s, w=outfile.write: w(s + "\n"))

def getbinary(ftp, filename, outfile=None):
    # fetch a binary file
    if outfile is None:
        outfile = sys.stdout
    ftp.retrbinary("RETR " + filename, outfile.write)

ftp = ftplib.FTP("FTP IP Address")
ftp.login("username", "password")
ftp.cwd("/MCPA")

#gettext(ftp, "subbdy.zip")
getbinary(ftp, "subbdy.zip")
Well, it seems that you simply forgot to open the file you want to write into.
Something like:
getbinary(ftp, "subbdy.zip", open(r'C:\Path\to\subbdy.zip', 'wb'))
Just wondered if anyone could help. I'm trying to download a NetCDF file from the internet within my code. The website I wish to download from is:
http://www.esrl.noaa.gov/psd/cgi-bin/db_search/DBListFiles.pl?did=3&tid=38354&vid=20
The file which I would like to download is air.sig995.2013.nc,
and if it's downloaded manually the link is:
ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/air.sig995.2013.nc
Thanks
I would use urllib to retrieve the file, like this:
urllib.urlretrieve(url, filename)
where url is the URL of the download and filename is what you want to name the saved file.
You can try this:
#!/usr/bin/env python
# Read data from an opendap server
import netCDF4
# specify an url, the JARKUS dataset in this case
url = 'http://dtvirt5.deltares.nl:8080/thredds/dodsC/opendap/rijkswaterstaat/jarkus/profiles/transect.nc'
# for local windows files, note that '\t' defaults to the tab character in python, so use prefix r to indicate that it is a raw string.
url = r'f:\opendap\rijkswaterstaat\jarkus\profiles\transect.nc'
# create a dataset object
dataset = netCDF4.Dataset(url)
# lookup a variable
variable = dataset.variables['id']
# print the first 10 values
print variable[0:10]
from
https://publicwiki.deltares.nl/display/OET/Reading%2Bdata%2Bfrom%2BOpenDAP%2Busing%2Bpython
I have a little python script that I am using to download a whole bunch of PDF files for archiving. The problem I have is that when I download the files, they appear correctly under the correct title, but they are the wrong size and they can't be opened by Acrobat, which fails with an error message saying Out of memory or Insufficient data for an image or some other arbitrary Acrobat error. Viewing the content of the page in a text editor looks a bit like a PDF document, by which I mean it is incomprehensible in general but with a few fragments of text and markup, including PDF identifiers.
The code to download the file is this:
def download_file(file_id):
    folder_path = ".\\pdf_files\\"
    file_download = "http://myserver/documentimages.asp?SERVICE_ID=RETRIEVE_IMAGE&documentKey="
    file_content = urllib.urlopen(file_download + file_id, proxies={})
    file_local = open(folder_path + file_id + '.pdf', 'w')
    file_local.write(file_content.read())
    file_content.close()
    file_local.close()
If the same file is downloaded through a browser it looks fine, and is also larger on disk. I am guessing the problem might have to do with the encoding of the file when it is saved?
You need to write it as a binary file, so:
file_local = open( folder_path + file_id + '.pdf', 'wb' )
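To see why the mode matters: PDF is a binary format, and in text mode (on Windows, which the asker is likely using) newline bytes get translated, corrupting the file. A small sketch; the header bytes and the filename are illustrative:

```python
# A typical PDF begins with a header like this, as raw bytes. In text mode
# on Windows the b'\n' bytes would be rewritten as b'\r\n' and the file
# would no longer open in Acrobat. Binary mode writes every byte verbatim.
data = b'%PDF-1.4\n%\xe2\xe3\xcf\xd3\n'

with open('sample.pdf', 'wb') as f:   # 'wb', not 'w'
    f.write(data)

with open('sample.pdf', 'rb') as f:   # read back in binary mode too
    assert f.read() == data           # round-trips exactly
```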