I am trying to download a PDF file using requests. Here is the code:
import requests

url = 'https://unistream.ru/upload/iblock/230/230283e15180d590198137eba4e70644.PDF'
r = requests.get(url, allow_redirects=False)
pdf_url = r.url
with open('C:\\Users\\piolv\\Desktop\\pdf_folder\\work_file.PDF', 'wb') as f:
    f.write(r.content)
print(r.content)
The link opens in the browser, but I cannot download the file with this code. As I found out, the site responds with a redirect (status code 302). With allow_redirects=False, the file is not downloaded. What am I doing wrong? Where is the mistake? Thanks.
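One possible fix, sketched under the assumption that the 302 simply points at the real file location: let requests follow the redirect (the default for GET) and check the final status before writing anything to disk. The helper name `download_file` and the output file name are made up for illustration, and `session` is expected to be a `requests.Session`.

```python
def download_file(session, url, dest_path, chunk_size=8192):
    """Download url to dest_path, following redirects; return the final URL.

    Hypothetical helper: `session` is assumed to be a requests.Session.
    """
    # allow_redirects=True is the default for GET. It is spelled out here
    # because allow_redirects=False returns the 302 response itself,
    # whose body is not the PDF.
    response = session.get(url, allow_redirects=True, stream=True, timeout=30)
    # Raise an exception on 4xx/5xx instead of saving an error page.
    response.raise_for_status()
    with open(dest_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
    # response.url is the address reached after any redirects.
    return response.url


# Usage (assumes the requests package is installed):
#   import requests
#   with requests.Session() as s:
#       final_url = download_file(s, url, "work_file.pdf")
```

If the returned URL differs from the one requested, the redirect was followed. Whether the target actually serves the PDF still depends on the site.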
I get a response from an external service. I then need to extract a URL from this response and download a file from that URL.
When I pass the URL taken from response.text to urlretrieve, urlretrieve raises an error.
But when I manually copy the URL and assign it to a variable in Python, url = 'https://My_service_site.com/9a57v4db5_2023-02-14.csv.gz',
urlretrieve works fine and downloads the file to my computer from this link.
response = requests.post(url, json=payload, headers=headers)
# Method 1: raises an error
url = response.text[17:-2]  # gives a link like 'https://my_provide_name.com/csv_exports/5704d5.csv.gz'
urlrtv = urllib.request.urlretrieve(url=url, filename='C:\\Users\\UserName\\Downloads\\test4.csv.gz')
>>return error: HTTP Error 404 Not Found
# Method 2: works fine
url2 = 'https://my_provide_name.com/csv_exports/5704d5.csv.gz'
urlrtv = urllib.request.urlretrieve(url=url2, filename='C:\\Users\\UserName\\Downloads\\test4.csv.gz')
>>works fine
When I copy the URL from method 1 and paste it into a browser, it works fine.
Edit:
To be more precise, I have also tried getting the URL without slicing response.text[17:-2], using json.loads to parse it from the response instead, but I still got the error.
a = json.loads(response.text)
>>{'csv_file_url': 'https://service_name.com/csv_exports/746d6.csv.gz'}
url = a['csv_file_url']
print(url)
>>https://service_name.com/csv_exports/746d6.csv.gz
Solved: just add time.sleep(3) before downloading the file.
url = response.json()['csv_file_url']
time.sleep(3)
urlrtv = urllib.request.urlretrieve(url=url, filename=f'{storage_path}{filename}')
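A fixed time.sleep(3) only helps if the export is ready within three seconds. Assuming the 404 means the service has not finished writing the file yet (the thread does not confirm the cause), a retry loop with a growing delay is a slightly more robust sketch. `retrieve_with_retry` and its parameters are hypothetical names, not part of urllib.

```python
import time
import urllib.error
import urllib.request


def retrieve_with_retry(url, filename, attempts=5, delay=1.0,
                        retrieve=urllib.request.urlretrieve, sleep=time.sleep):
    """Call urlretrieve, retrying on HTTP 404 with a doubling delay.

    Hypothetical helper: assumes a 404 right after the POST means the
    file is not ready yet. Any other HTTP error is raised immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return retrieve(url, filename)
        except urllib.error.HTTPError as exc:
            # Give up on non-404 errors or when attempts are exhausted.
            if exc.code != 404 or attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

With the defaults, the waits are 1, 2, 4, and 8 seconds before the fifth and last attempt. The `retrieve` and `sleep` parameters exist only so the loop can be exercised without a network.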
I'm currently trying to download a csv.gz file from the following link: https://www.cryptoarchive.com.au/bars/pair. As you can see, opening the link with a browser simply opens the save file dialogue. However, passing the link to requests or urllib simply downloads HTML as opposed to the actual file.
This is the current approach I'm trying:
EDIT: Updated to reflect changes I've made.
url = "https://www.cryptoarchive.com.au/bars/pair"
file_name = "test.csv.gz"
headers = {"PLAY_SESSION": play_session}
r = requests.get(url, stream=True, headers=headers)
with open(file_name, "wb") as f:
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)
            f.flush()
The only saved cookie I can find is the PLAY_SESSION. Setting that as a header doesn't change the result I'm getting.
Further, I've tried posting a request to the login page like this:
login = "https://www.cryptoarchive.com.au/signup"
data = {"email": email,
        "password": password,
        "accept": "checked"}
with requests.Session() as s:
    p = s.post(login, data=data)
    print(p.text)
However, this also doesn't seem to work and I especially can't figure out what to pass to the login page or how to actually check the checkbox...
Opening that URL in a private browsing window shows the error:
Please login/register first.
To get that file, you need to log in to the site first. Logging in will probably give you a session token, a cookie, or something similar that you need to send with the request.
Both @Daniel Argüelles' and @Abhyudaya Sharma's answers helped me. The solution was simply to get the PLAY_SESSION cookie after logging in to the website and pass it to requests.get.
cookies = {"PLAY_SESSION": play_session}
url = "https://www.cryptoarchive.com.au/bars/pair"
r = requests.get(url, stream=True, cookies=cookies)
with open(file_name, "wb") as f:
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)
            f.flush()
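When a protected download silently returns the login page, the saved file has the right name but the wrong content. A minimal sanity check, assuming the expected file is gzip-compressed as the .csv.gz name suggests: gzip streams start with the two bytes 1f 8b, while an HTML page does not. `looks_like_gzip` and `describe_payload` are illustrative helper names.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(first_bytes):
    """Return True if the payload starts with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def describe_payload(content_type, first_bytes):
    """Classify a response body as 'gzip', 'html', or 'unknown'.

    content_type is the value of the Content-Type header (may be None);
    first_bytes is the beginning of the raw response body.
    """
    if looks_like_gzip(first_bytes):
        return "gzip"
    header_says_html = "text/html" in (content_type or "").lower()
    body_start = first_bytes.lstrip().lower()
    body_says_html = body_start.startswith((b"<!doctype html", b"<html"))
    if header_says_html or body_says_html:
        return "html"
    return "unknown"
```

With requests, the two arguments could come from r.headers.get("Content-Type") and the first chunk read from the response. A result of "html" suggests the cookie was missing or expired.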
The URL is a direct link to a web file (an xlsb file) which I am trying to download. The code below runs with no error and the file is created at the given path, but when I try to open it, Excel shows a corrupt-file message. The response status is 400, so it is a bad request. Any advice on this?
url = 'http://rigcount.bakerhughes.com/static-files/55ff50da-ac65-410d-924c-fe45b23db298'
file_name = r'local path with xlsb extension'
with open(file_name, "wb") as file:
    response = requests.request(method="GET", url=url)
    file.write(response.content)
This seems to work for me. Try this out:
from requests import get
url = 'http://rigcount.bakerhughes.com/static-files/55ff50da-ac65-410d-924c-fe45b23db298'
# make HTTP request to fetch data
r = get(url)
# check if request is success
r.raise_for_status()
# write out byte content to file
with open('out.xlsb', 'wb') as out_file:
    out_file.write(r.content)
I have tried to upload a PDF by sending a POST request to an API, in both R and Python, but I am not having much success.
Here is my code in R
library(httr)
url <- "https://envoc-apply-api.azurewebsites.net/api/apply"
POST(url, body = upload_file("filename.pdf"))
The status I received is 500, but I want a status of 202.
I have also tried the exact path instead of just the filename, but that fails with a "file does not exist" error.
My code in Python
import requests
url ='https://envoc-apply-api.azurewebsites.net/api/apply'
files = {'file': open('filename.pdf', 'rb')}
r = requests.post(url, files=files)
Error I received
FileNotFoundError: [Errno 2] No such file or directory: 'filename.pdf'
I have been trying to use these two guides as examples.
R https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html
Python http://requests.readthedocs.io/en/latest/user/quickstart/
Please let me know if you need any more info.
Any help will be appreciated.
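You need to specify the full path to the file. On Windows, write it as a raw string (or double the backslashes), because in a normal string literal \U starts a Unicode escape and the path fails to parse:
import requests
url = 'https://envoc-apply-api.azurewebsites.net/api/apply'
# The raw string keeps the Windows backslashes literal.
with open(r'C:\Users\me\filename.pdf', 'rb') as f:
    r = requests.post(url, files={'file': f})
or something like that. Otherwise open() never finds filename.pdf, because a relative name is looked up in the current working directory.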
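To see where Python is actually looking, the path can be resolved before the upload. This is a small sketch using pathlib; `resolve_existing_file` is a hypothetical helper name, and the directory argument stands in for wherever the PDF really lives.

```python
from pathlib import Path


def resolve_existing_file(name, base_dir=None):
    """Return the absolute path of `name`, or raise FileNotFoundError.

    Relative names are resolved against base_dir (default: the current
    working directory, which is where open('filename.pdf') looks).
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = (base / name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist (searched from {base})")
    return path
```

The returned path can be passed straight to open(path, 'rb') for the files argument. The error message shows the directory that was searched, which is usually enough to spot a wrong working directory.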
I am using the Python requests library to log in to the site:
http://www.zsrcpod.aviales.ru/login/login_auth2.pl
and then trying to get a file, but authentication fails and I am redirected to the login page.
I already have some experience using the requests library with other sites, and it works fine there with requests.session(), but this script is not working:
login_to_site_URL = r'http://www.zsrcpod.aviales.ru/login/login_auth2.pl'
URL = r"http://www.zsrcpod.aviales.ru/modistlm-cgi/seances.pl?db=modistlm"
payload = {r'login': r'XXXXXX',
           r'second': r'XXXXXX'}
with requests.session() as s:
    s.post(login_to_site_URL, payload)
    load = s.get(URL, stream=True)
    # download
    with open(r'G:\!Download\!TEST.html', "wb") as save_command:
        for chunk in load.iter_content(chunk_size=1024):
            if chunk:
                save_command.write(chunk)
                save_command.flush()
I need help adapting it.