Download multiple files from FTP share? [duplicate] - python

This question already has answers here:
Downloading a directory tree with ftplib
(6 answers)
Closed 2 years ago.
I know this question has been asked multiple times, but none of the solutions actually worked so far.
I would like to pull some files into a web tool, based on a URL.
This seems to be an FTP share, but using
import ftplib
url = 'ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS1167'
ftp = ftplib.FTP(url)
fails with
gaierror: [Errno -2] Name or service not known
It is easy to download single files with wget:
wget.download(url+'/'+filename, out=ms_dir)
However, the Python wget package does not implement all the features of the Linux tool, so something like wget.download(url+'/*.*', out=ms_dir) does not work.
Therefore, I need to pull the list of files I want to download first and then download them one by one. I tried BeautifulSoup, requests, and urllib, but all the solutions seem over-complicated for a problem that was probably solved a million times ten years ago, or they don't work at all.
For example:
import requests
response = requests.get(url, params=params)
InvalidSchema: No connection adapters were found for...
import urllib3
http = urllib3.PoolManager()
r = http.request('GET', url)
URLSchemeUnknown: Not supported URL scheme ftp
And so on. I am not sure what I am doing wrong here.

import ftplib
from urllib.parse import urlparse

def get_files_from_ftp_directory(url):
    # Split the URL into host (netloc) and directory path
    url_parts = urlparse(url)
    domain = url_parts.netloc
    path = url_parts.path
    # Connect to the host only, log in anonymously, then change into the directory
    ftp = ftplib.FTP(domain)
    ftp.login()
    ftp.cwd(path)
    filenames = ftp.nlst()
    ftp.quit()
    return filenames

get_files_from_ftp_directory(URL)
Thanks, I was using the whole URL instead of just the domain to log in. I use this function to get the filenames and then download them with the more comfortable wget API.
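For completeness, a minimal sketch of that combination, assuming the wget package is installed and ms_dir is an existing output directory:
import wget

url = 'ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS1167'
ms_dir = '.'  # assumed output directory

# Fetch the directory listing, then download each file over FTP one by one
for filename in get_files_from_ftp_directory(url):
    wget.download(url + '/' + filename, out=ms_dir)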

Related

Download a giant file with HTTP and upload to FTP server without storing it

I have a project where I need to download some giant files over HTTP and upload them to an FTP server. The simplest way is to first download the file and then upload it to FTP, treating it as two independent stages.
But can we use streams to upload the file while it is being downloaded? That seems more efficient. Any examples, especially in Python, are welcome.
Use the requests module to obtain a file-like object representing the HTTP download and pass it to ftplib's FTP.storbinary:
from ftplib import FTP
import requests

url = "https://www.example.com/file.zip"
# stream=True keeps the body unread so r.raw can be consumed incrementally
r = requests.get(url, stream=True)

ftp = FTP(host, user, passwd)
# storbinary reads the file-like object in blocks, so the file is never fully buffered
ftp.storbinary("STOR /ftp/path/file.zip", r.raw)

SSL: CERTIFICATE_VERIFY_FAILED request.get [duplicate]

This question already has answers here:
"SSL: certificate_verify_failed" error when scraping https://www.thenewboston.com/
(7 answers)
Closed 4 years ago.
I am trying to download this URL, which is loaded as a frame on another page.
I have tried it like this:
import urllib.request
url = 'https://tips.danskespil.dk/tips13/?#/tps/poolid/2954'
response = urllib.request.urlopen(url)
html = response.read()
and also this way:
import requests
page = requests.get(url)
but both ways give me the error: SSL: CERTIFICATE_VERIFY_FAILED
Any help would be much appreciated.
If you're not worried about safety (which you should be), your best bet is to pass verify=False to the request call.
page = requests.get(url, verify=False)
You can also point verify at a CA bundle file or a directory of certificates from trusted CAs, like so:
verify = '/path/to/certfile'
Refer to the requests documentation on SSL certificate verification for all the ways to work around it.
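If you want to stay with urllib.request from the question, a roughly equivalent (and equally unsafe) sketch passes a custom SSL context that skips verification; this is an illustration, not a recommendation:
import ssl
import urllib.request

url = 'https://tips.danskespil.dk/tips13/?#/tps/poolid/2954'

# Build a context that does not verify the server certificate (testing only)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

response = urllib.request.urlopen(url, context=ctx)
html = response.read()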

How to download file using Python? [duplicate]

This question already has answers here:
How to download a file over HTTP?
(30 answers)
Closed 4 years ago.
I'm completely new to Python. I want to download a file by sending a request to the server. When I type the URL into my browser, the CSV file is downloaded, but when I send a GET request it does not return anything. For example:
import urllib2
response = urllib2.urlopen('https://publicwww.com/websites/%22google.com%22/?export=csv')
data = response.read()
print 'data: ', data
It does not show anything; how can I handle that? When I search the web, all the questions are about how to send a GET request. I can send the GET request, but I have no idea how the file can be downloaded, as it is not in the response of the request.
I have no idea how to find a solution for this.
You can use urlretrieve to download the file, for example:
import urllib.request

u = "https://publicwww.com/websites/%22google.com%22/?export=csv"
urllib.request.urlretrieve(u, "Ktest.csv")
You can also download the file using the requests module:
import shutil
import requests

url = "https://publicwww.com/websites/%22google.com%22/?export=csv"
response = requests.get(url, stream=True)
with open('file.csv', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response
You could also try wget, if you have it installed:
import os
os.system('wget "https://publicwww.com/websites/%22google.com%22/?export=csv"')

How can I use speech recognition in Python on a proxy network?

It works quite well on a proxy-free network, but whenever I try to run it on a proxy network it gives this error:
Could not request results from Google STT; recognition connection failed: [Errno 11001] getaddrinfo failed
Github link for the code
Please help.
I was having similar issues for the last few days, and after a lot of research I found out that, sadly, the Google Speech API currently does not support proxies.
https://github.com/GoogleCloudPlatform/java-docs-samples/issues/1061#issuecomment-373478268
After many days working on this issue I realized that the speech_recognition Python package (for desktop applications, not the cloud API) is based on urllib, which does not support proxies.
So I had to change a few lines inside the speech_recognition package to use the requests package (which supports proxies) instead of urllib. To make it work:
In your <Home_python>\Lib\site-packages\speech_recognition folder, open the __init__.py file and create a new method named recognize_google_proxy().
Then copy the body of recognize_google() into recognize_google_proxy().
Now you have two distinct methods that do the same thing.
Next, in recognize_google_proxy(), change the lines as described here:
Comment out the line below with #:
#request = Request(url, data=flac_data, headers={"Content-Type": "audio/x-flac; rate={}".format(audio_data.sample_rate)})
Insert this code snippet in its place:
import requests
http_proxy = "http://<your proxy url>:port"
https_proxy = "https://<your proxy url>:port"
proxyDict = {
    "http": http_proxy,
    "https": https_proxy,
}
Comment out the line below with #:
#response = urlopen(request, timeout=self.operation_timeout)
Insert this code snippet in its place:
response = requests.post(url, data=flac_data, headers={"Content-Type": "audio/x-flac; rate={}".format(audio_data.sample_rate)}, proxies=proxyDict)
Save __init__.py, then import speech_recognition again.
That's all.
Now you have one method without proxy support that uses urllib, recognize_google(), and another with proxy support that uses requests, recognize_google_proxy().
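A minimal usage sketch, assuming the patch above added recognize_google_proxy() alongside recognize_google() in speech_recognition/__init__.py and that a working microphone/PyAudio setup is available:
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)

# recognize_google_proxy() is the patched copy that routes the request through requests + proxies
text = r.recognize_google_proxy(audio)
print(text)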

Newbie: update changing IP using urlopen with https and do login

This is a newbie problem with python, advice is much appreciated.
no-ip.com provides an easy way to update a computer's changing IP address: simply open the URL
http://user:password@dynupdate.no-ip.com/nic/update?hostname=my.host.name
...both http and https work when entered in Firefox. I tried to implement that in a script residing in /etc/NetworkManager/dispatcher.d, to be used by NetworkManager on a recent version of Ubuntu.
What works is the python script:
from urllib import urlopen
urlopen("http://user:password@dynupdate.no-ip.com/nic/update?hostname=my.host.name")
What I want to have is the same with "https", which does not work as easily. Could anyone, please,
(1) show me what the script should look like for https,
(2) give me some keywords, which I can use to learn about this.
(3) perhaps even explain why it does not work any more when the script is changed to using "urllib2":
from urllib2 import urlopen
urlopen("http://user:password@dynupdate.no-ip.com/nic/update?hostname=my.host.name")
Thank you!
The user:password part isn't part of the actual URL; it's a shortcut for HTTP authentication, and the browser's URL parsing filters it out. In urllib2, you want:
import base64, urllib2

user, password = 'john_smith', '123456'
request = urllib2.Request('https://dynupdate.no-ip.com/nic/update?hostname=my.host.name')
# Build the HTTP Basic Authorization header by hand
auth = base64.b64encode(user + ':' + password)
request.add_header('Authorization', 'Basic ' + auth)
urllib2.urlopen(request)
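For reference, a rough Python 3 equivalent, assuming the requests package and the same placeholder credentials and hostname as above:
import requests

resp = requests.get(
    'https://dynupdate.no-ip.com/nic/update',
    params={'hostname': 'my.host.name'},
    auth=('john_smith', '123456'),  # HTTP Basic auth replaces the user:password@ shortcut
)
print(resp.status_code, resp.text)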
