Downloading images using requests in Python3

I need to download an image from a url using Python. I'm using this to do so:
import requests
with requests.get(url, stream=True) as r:
    with open(img_path, "wb") as f:
        f.write(r.content)
In order to see the image in the browser, I need to be logged into my account on that site. The image may have been sent by someone else or by me.
The issue is that I am able to download some images successfully, but for other ones, I get an authentication error, i.e. that I'm not logged in.
In the failing cases, it sometimes downloads a file whose content is this:
{"result":"error","msg":"Not logged in: API authentication or user session required"}
And sometimes it downloads the HTML of the page that asks me to log in to view the image.
Why am I getting this error for just some cases and not others? And how should I fix it?

Use Response.content to get the image data as bytes and then write it to a file opened in wb (write binary) mode:
import requests
image_url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"
img_data = requests.get(image_url).content
with open('image_name.jpg', 'wb') as f:
    f.write(img_data)
Note, for authorization:
from requests.auth import HTTPBasicAuth
img_data = requests.get(image_url, auth=HTTPBasicAuth('user', 'pass')).content
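Note that HTTP Basic auth only helps if the site actually uses it. If, as in the question, you have to be logged into an account in a browser, the site more likely uses a session cookie; here is a minimal sketch, assuming you copy the cookie's name and value out of your browser's dev tools (both are placeholders here):
import requests
# Placeholder cookie: look up the real name and value in your browser's dev tools.
cookies = {"sessionid": "value-copied-from-your-browser"}
img_data = requests.get(image_url, cookies=cookies).content
with open('image_name.jpg', 'wb') as f:
    f.write(img_data)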

You can either use the response.raw file object, or iterate over the response.
import requests
import shutil
from requests.auth import HTTPBasicAuth
r = requests.get(url, auth=HTTPBasicAuth('user', 'pass'), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
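Iterating over the response instead of using r.raw looks like this; a minimal sketch using requests' iter_content, with the same placeholder url, path, and credentials as above:
import requests
from requests.auth import HTTPBasicAuth

r = requests.get(url, auth=HTTPBasicAuth('user', 'pass'), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        # iter_content yields the body in fixed-size chunks and handles
        # gzip/deflate decoding for you.
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)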

Related

Python: open a video file from a url the same way open("filepath", "rb") does for local files

I want to upload short videos through an API connection (which one is not relevant to the question). The videos that will be uploaded are already on a server (publicly accessible, so that is not the issue) with a direct link (e.g. 'https://nameofcompany.com/uploads/videoname.mp4').
I am using the Requests library, so the post request looks like this:
requests.post(url, files={'file': OBJECT_GOES_HERE}, headers=headers)
The object should be a 'bytes-like object', so with a local file we can do:
requests.post(url, files={'file': open('localfile.mp4', 'rb')}, headers=headers)
I tested this with a local file and it works. However, as mentioned, I need to upload it from the link, so how do I do that? Is there some method (or some library with a method) that returns the same kind of object that open() does for local files? If not, how could I create one myself?
import requests
from io import BytesIO
url = 'https://nameofcompany.com/uploads/videoname.mp4'
r = requests.get(url)
video = r.content
# This is probably enough:
requests.post(url2, files={'file': video}, headers=headers)
# But if not, here's an example of using BytesIO to treat bytes as a file:
requests.post(url2, files={'file': BytesIO(video)}, headers=headers)
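For large videos you can also avoid holding the whole file in memory by streaming the download and handing the file-like r.raw object straight to the upload; a sketch, with url2 and headers the same placeholders as above:
import requests

with requests.get(url, stream=True) as r:
    r.raise_for_status()
    # requests accepts any file-like object; r.raw reads the body lazily.
    requests.post(url2, files={'file': ('videoname.mp4', r.raw)}, headers=headers)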

Can't download csv.gz from website using Python

I'm currently trying to download a csv.gz file from the following link: https://www.cryptoarchive.com.au/bars/pair. As you can see, opening the link with a browser simply opens the save file dialogue. However, passing the link to requests or urllib simply downloads HTML as opposed to the actual file.
This is the current approach I'm trying:
EDIT: Updated to reflect changes I've made.
url = "https://www.cryptoarchive.com.au/bars/pair"
file_name = "test.csv.gz"
headers = {"PLAY_SESSION": play_session}
r = requests.get(url, stream=True, headers=headers)
with open(file_name, "wb") as f:
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)
            f.flush()
The only saved cookie I can find is the PLAY_SESSION. Setting that as a header doesn't change the result I'm getting.
Further, I've tried posting a request to the login page like this:
login = "https://www.cryptoarchive.com.au/signup"
data = {"email": email,
"password": password,
"accept": "checked"}
with requests.Session() as s:
p = s.post(login, data=data)
print(p.text)
However, this also doesn't seem to work and I especially can't figure out what to pass to the login page or how to actually check the checkbox...
Just browsing that url in a private browsing window shows the error:
Please login/register first.
To get that file, you need to log into the site first. The login will probably give you a session token, a cookie, or something similar that you then need to include in the request.
Both @Daniel Argüelles' and @Abhyudaya Sharma's answers have helped me. The solution was simply getting the PLAY_SESSION cookie after logging into the website and passing it to the request function.
cookies = {"PLAY_SESSION": play_session}
url = "https://www.cryptoarchive.com.au/bars/pair"
r = requests.get(url, stream=True, cookies=cookies)
with open(file_name, "wb") as f:
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)
            f.flush()
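If you'd rather not copy the cookie out of the browser at all, a requests.Session can perform the login and carry PLAY_SESSION for you automatically; a sketch assuming the form fields from the question are correct (email and password are the same placeholders as in the question):
import requests

login = "https://www.cryptoarchive.com.au/signup"
url = "https://www.cryptoarchive.com.au/bars/pair"
# email and password are placeholders, as in the question above.
data = {"email": email, "password": password, "accept": "checked"}

with requests.Session() as s:
    s.post(login, data=data)  # the session's cookie jar now holds PLAY_SESSION
    r = s.get(url, stream=True)
    with open("test.csv.gz", "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)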

Why does requests fail to download an excel file from the web?

The url is a direct link to a web file (an xlsb file) which I am trying to download. The code below runs with no error and the file is created at the given path, but once I try to open it, Excel pops up a corrupt-file message. The response status is 400, so it is a bad request. Any advice on this?
url = 'http://rigcount.bakerhughes.com/static-files/55ff50da-ac65-410d-924c-fe45b23db298'
file_name = r'local path with xlsb extension'
with open(file_name, "wb") as file:
response = requests.request(method="GET", url=url)
file.write(response.content)
Seems to work for me. Try this out:
from requests import get
url = 'http://rigcount.bakerhughes.com/static-files/55ff50da-ac65-410d-924c-fe45b23db298'
# make HTTP request to fetch data
r = get(url)
# check if request is success
r.raise_for_status()
# write out byte content to file
with open('out.xlsb', 'wb') as out_file:
    out_file.write(r.content)
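If the 400 persists on your machine, one guess worth testing is that the server rejects requests without a browser-like User-Agent; a sketch of the same download with that one header added (the header value is just a common placeholder):
import requests

url = 'http://rigcount.bakerhughes.com/static-files/55ff50da-ac65-410d-924c-fe45b23db298'
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
r.raise_for_status()  # fail loudly instead of writing an error page to disk
with open('out.xlsb', 'wb') as out_file:
    out_file.write(r.content)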

Beautiful Soup - urllib.error.HTTPError: HTTP Error 403: Forbidden

I am trying to download a GIF file with urllib, but it is throwing this error:
urllib.error.HTTPError: HTTP Error 403: Forbidden
This does not happen when I download from other blog sites. This is my code:
import requests
import urllib.request
url_1 = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
source_code = requests.get(url_1,headers = {'User-Agent': 'Mozilla/5.0'})
path = 'C:/Users/roysu/Desktop/src_code/Python_projects/python/web_scrap/myPath/'
full_name = path + ".gif"
urllib.request.urlretrieve(url_1,full_name)
Don't use urllib.request.urlretrieve. Instead, use the requests library like this:
import requests
url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
path = "D:\\Test.gif"
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
with open(path, "wb") as file:
    file.write(response.content)
Hope that this helps!
Solution:
The remote server is apparently checking the user agent header and rejecting requests from Python's urllib.
urllib.request.urlretrieve() doesn't allow you to change the HTTP headers; however, you can use urllib.request.URLopener.retrieve():
import urllib.request
url_1='https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
path='/home/piyushsambhi/Downloads/'
full_name= path + "testimg.gif"
opener = urllib.request.URLopener()
opener.addheader('User-Agent', 'Mozilla/5.0')
filename, headers = opener.retrieve(url_1, full_name)
print(filename)
NOTE: You are using Python 3 and these functions are now considered part of the "Legacy interface", and URLopener has been deprecated. For that reason you should not use them in new code.
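A non-deprecated way to set the header with plain urllib is to build a urllib.request.Request and pass it to urlopen; a minimal sketch with the same URL and a placeholder download path:
import urllib.request
import shutil

url_1 = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
full_name = '/home/piyushsambhi/Downloads/testimg.gif'

# Request lets you attach headers, unlike urlretrieve.
req = urllib.request.Request(url_1, headers={'User-Agent': 'Mozilla/5.0'})
with urllib.request.urlopen(req) as response, open(full_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)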
Your code imports requests but never uses it; you should, because it is much easier than urllib. The code snippet below works for me:
import requests
url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
path='/home/piyushsambhi/Downloads/'
full_name= path + "testimg1.gif"
r = requests.get(url)
with open(full_name, 'wb') as outfile:
    outfile.write(r.content)
NOTE: CHANGE THE PATH VARIABLE ACCORDING TO YOUR MACHINE AND ENVIRONMENT

How do I download a pdf file over https with python

I am writing a Python script which will save a PDF file locally, following the format given in the URL, e.g.
https://Hostname/saveReport/file_name.pdf #saves the content in PDF file.
I am opening this URL through a Python script:
import webbrowser
webbrowser.open("https://Hostname/saveReport/file_name.pdf")
The URL's page contains lots of images and text. Once this URL is opened, I want to save it as a file in PDF format using the Python script.
This is what I have done so far.
Code 1:
import requests
url="https://Hostname/saveReport/file_name.pdf" #Note: It's https
r = requests.get(url, auth=('usrname', 'password'), verify=False)
file = open("file_name.pdf", 'w')
file.write(r.read())
file.close()
Code 2:
import urllib2
import ssl
url="https://Hostname/saveReport/file_name.pdf"
context = ssl._create_unverified_context()
response = urllib2.urlopen(url, context=context) #How should i pass authorization details here?
html = response.read()
In above code i am getting: urllib2.HTTPError: HTTP Error 401: Unauthorized
If i use Code 2, how can i pass authorization details?
I think this will work
import requests
import shutil
url="https://Hostname/saveReport/file_name.pdf" #Note: It's https
r = requests.get(url, auth=('usrname', 'password'), verify=False,stream=True)
r.raw.decode_content = True
with open("file_name.pdf", 'wb') as f:
shutil.copyfileobj(r.raw, f)
One way you can do that is:
import urllib3
urllib3.disable_warnings()
url = r"https://websitewithfile.com/file.pdf"
fileName = r"file.pdf"
with urllib3.PoolManager() as http:
    r = http.request('GET', url)
    with open(fileName, 'wb') as fout:
        fout.write(r.data)
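To pass authorization details with urllib3, you can build a Basic auth header with urllib3.make_headers; a sketch with placeholder credentials, skipping certificate verification the way verify=False does in requests:
import urllib3

urllib3.disable_warnings()
url = r"https://websitewithfile.com/file.pdf"
headers = urllib3.make_headers(basic_auth='user:pass')  # placeholder credentials

with urllib3.PoolManager(cert_reqs='CERT_NONE') as http:
    r = http.request('GET', url, headers=headers)
    with open('file.pdf', 'wb') as fout:
        fout.write(r.data)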
You can try something like:
import requests
response = requests.get('https://websitewithfile.com/file.pdf', verify=False, auth=('user', 'pass'))
with open('file.pdf', 'wb') as fout:
    fout.write(response.content)
For some files, at least tar archives (and possibly other files), you can use pip:
import sys
from subprocess import call, PIPE
url = "https://blabla.bla/foo.tar.gz"
call([sys.executable, "-m", "pip", "download", url], stdout=PIPE, stderr=PIPE)
But you should confirm that the download was successful some other way, as pip will raise an error for any file that is not an archive containing setup.py, hence stderr=PIPE (or maybe you can determine whether the download was successful by parsing the subprocess error message).
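One way to capture and inspect pip's output instead of discarding it; a sketch using subprocess.run, with the same placeholder URL:
import sys
from subprocess import run, PIPE

url = "https://blabla.bla/foo.tar.gz"
result = run([sys.executable, "-m", "pip", "download", url],
             stdout=PIPE, stderr=PIPE, text=True)
if result.returncode != 0:
    # pip complains about files that are not installable archives even when
    # the bytes arrived, so read the message before trusting the exit code.
    print(result.stderr)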

Categories

Resources