Python Requests .csv 401 Client Error: Unauthorized for URL

I am trying to download a csv file from a website that requires authorization.
I am able to get a response code of 200 with the URL https://workspace.xxx.com/abc/ (clicking in this web page downloads the csv), but a response code of 401 at url = 'https://workspace.xxx.com/abc/abc.csv'.
This is my code:

import requests

url = 'https://workspace.xxx.com/abc/abc.csv'
r = requests.get(url, auth=('myusername', 'mybasicpass'))

I tried adding headers and using a session, but I still get a response code of 401.

First of all, you have to investigate how the website accepts the password.
They might be using HTTP authentication or an Authorization header in the request.
You can log in through their website, download the file, and study how they pass authorization.
They are almost certainly not accepting plain passwords in the Authorization header; they might be encoding them in base64 or another encoding scheme.
My advice is to open the developer console and study their requests in the Network tab. If you post more information, someone can help you further.
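For example, HTTP Basic auth just base64-encodes username:password into an Authorization header, and requests builds that header for you when you pass auth=. A minimal sketch, reusing the URL from the question, that also sends the same header manually so you can compare it against what the browser shows in the Network tab:

import base64
import requests

url = 'https://workspace.xxx.com/abc/abc.csv'

# requests builds the base64 Authorization header itself from auth=
r = requests.get(url, auth=('myusername', 'mybasicpass'))

# Equivalent manual header, useful for comparing with the Network tab
token = base64.b64encode(b'myusername:mybasicpass').decode('ascii')
r = requests.get(url, headers={'Authorization': 'Basic ' + token})
print(r.status_code)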

Related

How do I get the response cookie after logging in with the Requests library?

I am trying to scrape a website to get the shipping information for my company. I am able to log in to the website using Python's requests library. The issue I am facing is that after I log in and navigate to a different URL that has the information I need, the cookies change and log me out.
When I look at the Network tab in the dev tools, I see that the cookies it changes to are the response cookies. When I use .cookies to see if they were getting picked up, it only shows the request cookies.
I tried setting up a persistent session, but that did not help. I then tried saving the cookies and got nowhere with that. I am not sure what else to do.
import requests

url = 'http://website/login'
creds = {'_method': '****', 'username': '*****', 'password': '*****'}
response = requests.post(url, data=creds)
token = response.cookies  # only holds the cookies set on this one response
response = requests.get('http://website/reports/view/17', cookies=token)
You can try token = response.headers['Set-Cookie'].
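If the raw header does not help, it may be worth revisiting requests.Session: a session stores every cookie it sees, including ones set on intermediate redirect responses, and sends them back automatically. A minimal sketch with the placeholder URLs from the question:

import requests

with requests.Session() as s:
    # The session records every Set-Cookie it receives, even on redirects
    s.post('http://website/login',
           data={'_method': '****', 'username': '*****', 'password': '*****'})
    # The cookies collected above are attached to this request automatically
    r = s.get('http://website/reports/view/17')
    print(r.status_code)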

How to get CSV file from URL with authentication in Python

I'm working on a project on PhysioNet, and I have PhysioNet credentials.
I'm trying to get the csv file directly into my Python notebook instead of downloading it to my local machine.
I tried many ways, but all of them give me a 403.
I tried with a User-Agent header as well, but it didn't work.
Can anyone suggest a way to do it?
import requests
from requests.auth import HTTPDigestAuth

url = 'https://physionet.org/files/nch-sleep/3.1.0/'
r = requests.get(url, auth=HTTPDigestAuth('username', 'pass'))
print(r.status_code)  # 403
If this is an HTML login form, I don't think you can mimic it with a plain requests call unless you know all the request header parameters, even if you provide the username and password. If you really want an API-style login, you need to hit the actual login endpoint with your basic auth credentials and tokens, and then use the response token to hit the endpoint that serves your csv files.
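A minimal sketch of that two-step token flow, under loudly stated assumptions: the /login endpoint and the 'token' JSON field below are illustrative placeholders, not PhysioNet's documented API:

import requests

BASE = 'https://physionet.org'

# Step 1: authenticate. '/login' and the 'token' field are assumptions
# for illustration, not a documented PhysioNet endpoint.
resp = requests.post(BASE + '/login',
                     data={'username': 'username', 'password': 'pass'})
token = resp.json()['token']

# Step 2: present the token when requesting the protected files
r = requests.get(BASE + '/files/nch-sleep/3.1.0/',
                 headers={'Authorization': 'Bearer ' + token})
print(r.status_code)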

How to properly login to reddit via python requests?

I am trying to log in to my reddit account using Python, not PRAW. I successfully extracted the CSRF token, but I still don't know why it is not functional and I am unable to log in properly. The desired behavior is for this Python script to log in to reddit and then confirm whether the login was a success.
When you try to log in in your browser, you can preview in the developer tools (Network tab) how the request should look. The mistake you made here is that the POST request should be sent not to https://www.reddit.com/ but to https://www.reddit.com/login.
To verify the result, you can check the response to your POST request in the debugger. When I ran your code with the mentioned change of URL, the server returned status code 400 with the explanation "incorrect username or password". It means that your code should be enough to log in to Reddit once you input the correct credentials.
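A minimal sketch of the corrected flow, assuming the CSRF token is scraped from the login form as in the question; the form field names and the use of BeautifulSoup are assumptions based on how the login page typically looks, not a documented API:

import requests
from bs4 import BeautifulSoup  # assumption: used here to scrape the CSRF token

with requests.Session() as s:
    login_page = s.get('https://www.reddit.com/login')
    soup = BeautifulSoup(login_page.text, 'html.parser')
    csrf = soup.find('input', {'name': 'csrf_token'})['value']

    # The POST goes to /login, not to the site root
    r = s.post('https://www.reddit.com/login', data={
        'csrf_token': csrf,
        'username': 'myusername',
        'password': 'mypassword',
    })
    print(r.status_code)  # 400 here usually means bad credentials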

Python "requests" module errors sending follow request on Instagram

I am using the following script:
import requests
import json
import os

# Paste your EditThisCookie export (a JSON array) between the quotes
COOKIES = json.loads("")

COOKIEDICTIONARY = {}
for i in COOKIES:
    COOKIEDICTIONARY[i['name']] = i['value']

def follow(id):
    post = requests.post("https://instagram.com/web/friendships/" + id + "/follow/",
                         cookies=COOKIEDICTIONARY)
    print(post.text)

follow('309438189')
os.system("pause")
This script is supposed to send a follow request to the user, '3049438189' on Instagram. However, if the code is run, the post.text outputs some HTML code, including
"This page could not be loaded. If you have cookies disabled in your
browser, or you are browsing in Private Mode, please try enabling
cookies or turning off Private Mode, and then retrying your action."
The loop is supposed to copy the cookies into the variable COOKIEDICTIONARY in a format the requests module can read. If you print the dictionary (that is what the structure is called in Python), it shows all of the cookies and their values.
The cookies I put in are valid, and the requests syntax is (I believe) correct.
I have fixed it. The problem was that certain headers I needed were not present, such as Origin (I will get the full list soon). For anybody who wants to imitate any Instagram POST request: you need those headers or it will error.
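A minimal sketch of what the fixed request might look like; the header set below is an assumption to be verified against the browser's Network tab, and COOKIEDICTIONARY is the dictionary built from the EditThisCookie export as in the question:

import requests

COOKIEDICTIONARY = {}  # fill from the EditThisCookie export, as in the question

HEADERS = {
    # Assumed header set; verify the full list in the Network tab
    'Origin': 'https://www.instagram.com',
    'Referer': 'https://www.instagram.com/',
    'User-Agent': 'Mozilla/5.0',
    'X-Requested-With': 'XMLHttpRequest',
    # Instagram expects the csrftoken cookie echoed back in this header
    'X-CSRFToken': COOKIEDICTIONARY.get('csrftoken', ''),
}

def follow(id):
    post = requests.post('https://instagram.com/web/friendships/' + id + '/follow/',
                         cookies=COOKIEDICTIONARY, headers=HEADERS)
    print(post.text)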

Why doesn't urllib2 throw a 404?

I have a public folder in Google Drive, in which I store pictures.
In Python, I am trying to detect whether a picture with a particular name exists. I am using this code:
import urllib2
url = "http://googledrive.com/host/0B7K23HtYjKyBfnhYbkVyUld3YUVqSWgzWm1uMXdrMzQ0NlEwOXVUd3o0MWVYQ1ZVMlFSNms/0000.png"
resp = urllib2.urlopen(url)
print resp.getcode()
And even though there is no file with this name in that folder, this code does not throw an exception and prints "200" as the return code. I have checked in my browser, and this URL (http://googledrive.com/host/0B7K23HtYjKyBfnhYbkVyUld3YUVqSWgzWm1uMXdrMzQ0NlEwOXVUd3o0MWVYQ1ZVMlFSNms/0000.png) does return a 404, after a few redirects.
Why doesn't urllib2 detect that this file actually doesn't exist?
When you make the request, it goes to Google's web servers and is processed there. If and only if Google's servers returned a 404 would you see a 404 on your end; urllib2 simply encapsulates the underlying handshaking and data-transfer logic.
In this particular case, Google's server-side code requires the request to be authenticated, and your request URL is simply unauthenticated. As such, the request is redirected to the login page, and since that is a valid, existing page/response, urllib2 correctly reports 200. You can get the same page if you open the link in a private window.
However, if you are authenticated and then open the URL (basically logged in to your Gmail/Google Docs account), you will get the 404 error.
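One way to detect this from Python is to compare the final URL after urllib2 has followed the redirects with the URL you asked for; a minimal sketch, assuming that any redirect means you did not get the picture:

import urllib2

url = "http://googledrive.com/host/0B7K23HtYjKyBfnhYbkVyUld3YUVqSWgzWm1uMXdrMzQ0NlEwOXVUd3o0MWVYQ1ZVMlFSNms/0000.png"
resp = urllib2.urlopen(url)
print resp.getcode()  # 200, because the page that loads (login) is valid
print resp.geturl()   # the final URL after all redirects
# If we ended up somewhere else, we were redirected (e.g. to a login page)
if resp.geturl() != url:
    print "redirected - the picture is probably missing or not public"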
