I'm attempting to access data by first logging in to the following site:
https://accessns.nscorp.com/accessNS/login/
It looks like it sends this requests to the backend:
Request URL: https://accessns.nscorp.com/accessNS/rest/auth/login
Request Method: POST
Status Code: 200 OK
Remote Address:167.121.11.85:443
Referrer Policy: no-referrer-when-downgrade
When I send that request in python, I'm able to get a response back with a CSRF Token.
Next, I would like to access:
Request URL:
https://accessns.nscorp.com/accessNS/rest/backend/ServicesIndustrial/services/industrial2/v2/onsite/details
Request Method: POST
Status Code: 200 OK Remote Address:
167.121.11.85:443
Referrer Policy: no-referrer-when-downgrade
Which takes the following headers:
Accept: /
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Content-Length: 58
Content-Type: application/json
Cookie: [some cookies]
CSRFTOKEN: [from auth request above]
Host: accessns.nscorp.com
Origin: https://accessns.nscorp.com
Referer: https://accessns.nscorp.com/accessNS/legacy/
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36
Payload:
{userId: "username", classCode: "classcode", stationCode: "stationcode"}
My code fails when I attempt to use CSRF Token and cookies from the authorization request into the secondary request, but it doesn't fail when I manually login to the site, get the token/cookies, and hard code them into python. In fact, all I need in terms of headers for the secodary request is CSRF Token and cookies to get a viable response.
with requests.Session() as s:
login = {'Id': 'username', 'pwd': 'password'}
auth = s.post('https://accessns.nscorp.com/accessNS/rest/auth/login',
json=login)
headers = {}
headers['CSRFTOKEN'] = auth.json()['response']['token']
headers['Cookie'] = '; '.join('='.join((i.name, i.value)) for i in
auth.cookies)
payload = {'userId': 'username', 'classCode': 'classcode',
'stationCode': 'stationcode'}
url =
'https://accessns.nscorp.com/accessNS/rest/backend/ServicesIndustrial/
services/industrial2/v2/onsite/details'
inv= s.post(url, json=payload, headers=headers)
print(inv.json())
I expect to get a json response with the data I've requests, however, using my code, I get:
{'time': 1559217902355, 'message': 'Object reference not set to an instance of an object.. Reference number: 1559217902355', 'cause': 'Invalid user input', 'isError': True}, which to me seems like it's an issue with my headers/payload, or I'm missing an intermediate step.
When I hardcode the csrf token/cookies and don't use a requests session, I get the response I want.
Figured it out, my code works fine. Turns out the secondary request was being made to the wrong URL, which is strange because that's what's listed in every developer console I've checked.
Related
I'm trying to scrap this website https://triller.co/ , so I want to get information from profile pages like this https://triller.co/#warnermusicarg , what I do is trying to request the json url that contains the information, in this case it's https://social.triller.co/v1.5/api/users/by_username/warnermusicarg
When I use requests.get() it works normally and I can retrieve all the information.
import requests
import urllib.parse
from urllib.parse import urlencode
url = 'https://social.triller.co/v1.5/api/users/by_username/warnermusicarg'
headers = {'authority':'social.triller.co',
'method':'GET',
'path':'/v1.5/api/users/by_username/warnermusicarg',
'scheme':'https',
'accept':'*/*',
'accept-encoding':'gzip, deflate, br',
'accept-language':'ar,en-US;q=0.9,en;q=0.8',
'authorization': 'Bearer eyJhbGciOiJIUzI1NiIsImlhdCI6MTY0MDc4MDc5NSwiZXhwIjoxNjkyNjIwNzk1fQ.eyJpZCI6IjUyNjQ3ODY5OCJ9.Ds-acbfcGSeUrGDSs47pBiT3b13Eb9SMcB8BF8OylqQ',
'origin':'https://triller.co',
'sec-ch-ua':'" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
'sec-ch-ua-mobile':'?0',
'sec-ch-ua-platform':'"Windows"',
'sec-fetch-dest':'empty',
'sec-fetch-mode':'cors',
'sec-fetch-site':'same-site',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
response = requests.get(url, headers=headers)
The problem arises when I try to use an API proxy providers as Webscraping.ai, ScrapingBee, etc
api_key='my_api_key'
api_url='https://api.webscraping.ai/html?'
params = {'api_key': api_key, 'timeout': '20000', 'url':url}
proxy_url = api_url + urlencode(params)
response2 = requests.get(proxy_url, headers=headers)
This gives me this error
2022-01-08 22:30:59 [urllib3.connectionpool] DEBUG: https://api.webscraping.ai:443 "GET /html?api_key=my_api_key&timeout=20000&url=https%3A%2F%2Fsocial.triller.co%2Fv1.5%2Fapi%2Fusers%2Fby_username%2Fwarnermusicarg&render_js=false HTTP/1.1" 502 91
{'status_code': 403, 'status_message': '', 'message': 'Unexpected HTTP code on the target page'}
What I tried to do is:
1- I searched for the meaning of 403 code in the documentation of my API proxy provider, it said that api_key is wrong, but I'm 100% sure it's correct,
Also, I changed to another API proxy provider but the same issue,
Also, I had the same issue with twitter.com
And I don't know what to do?
Currently, the code on the question successfully returns a response with code 200, but there are 2 possible issues:
Some sites block datacenter proxies, try to use proxy=residential API parameter (params = {'api_key': api_key, 'timeout': '20000', proxy: 'residential', 'url':url}).
Some of the headers on your headers parameter are unnecessary. Webscraping.AI uses its own set of headers to mimic the behaviors of normal browsers, so setting custom user-agent, accept-language, etc., may interfere with them and cause 403 responses from the target site. Use only the necessary headers. Looks like it will be only the authorization header in your case.
I don't know exactly what caused this error but I tried using their webscraping_ai.ApiClient() instance as in here and it worked,
configuration = webscraping_ai.Configuration(
host = "https://api.webscraping.ai",
api_key = {
'api_key': 'my_api_key'
}
)
with webscraping_ai.ApiClient(configuration) as api_client:
# Create an instance of the API class
api_instance = webscraping_ai.HTMLApi(api_client)
url_j = url # str | URL of the target page
headers = headers
timeout = 20000
js = False
proxy = 'datacenter'
api_response = api_instance.get_html(url_j, headers=headers, timeout=timeout, js=js, proxy=proxy)
I'm trying to provide an "mfa-code" parameter to a post request using python requests, but the response I'm getting is that the parameter "mfa-code" is missing, even though I try and provide it via requests.post(url, data={"mfa-code": "0000"}) and also tried requests.post(url, json={"mfa-code": "0000"}).
Here is what I'm trying to send.
POST /login2 HTTP/1.1
Host: redacted.net
Cookie: session=qexMWyQnLtSlBI8B005qnVW4OYvEwEV2; verify=wiener
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://redacted.net/login2
Content-Type: application/x-www-form-urlencoded
Content-Length: 13
Origin: https://redacted.net
Upgrade-Insecure-Requests: 1
Dnt: 1
Sec-Gpc: 1
Te: trailers
Connection: close
mfa-code=0000
And this is the request I'm sending with my python script
import requests
url = "redacted.net"
data={"mfa-code": "0000"}
r = reqeusts.post(url, data=data)
print(r.text)
This results in a response only stating:
"Missing parameter 'mfa-code'"
I took note in the response and how mfa-code is surrounded with ', so I went to burp repeater and put single quotes on mfa-code and sure enough received the same error.
I then tried with other options like json=json.dumps(data), but to the same result as the requests needs a POST body parameter of the type variable=data and not a json object.
What am I missing here?
Or is this something python requests cannot do?
I want to get the data points from the graph below (https://index.mysteel.com/price/getChartMultiCity_1_0.html)
I found out from Developer Tools -> Network -> XHR that a request is made when I explore the graph. The response has all the data needed, like date and price.
If I just copy the request URL and fetch the data with python's requests, I get the correct response. Example:
import requests
r = requests.get('https://index.mysteel.com/zs/newprice/getBaiduChartMultiCity.ms?catalog=%25E8%259E%25BA%25E7%25BA%25B9%25E9%2592%25A2_%3A_%25E8%259E%25BA%25E7%25BA%25B9%25E9%2592%25A2&city=%25E4%25B8%258A%25E6%25B5%25B7%3A15278&spec=HRB400E%252020MM_%3A_HRB400E_20MM&startTime=2021-08-10&endTime=2021-08-12&callback=json')
r.json()
{'marketDatas': .........}
However, if I change the values of endTime or startTime query parameters, I get a 401 error:
Missing authentication parameters!
It looks like I can only send requests that are already sent from my browser (I'm running a jupyter notebook in the same browser). Even if I add request headers like in the Network tab, I still get the same error:
params= {'catalog':'%E8%9E%BA%E7%BA%B9%E9%92%A2_:_%E8%9E%BA%E7%BA%B9%E9%92%A2' ,'city':'%E4%B8%8A%E6%B5%B7:15278' ,
'spec':'HRB400E%2020MM_:_HRB400E_20MM' ,'startTime':'2021-08-10', 'endTime':'2021-08-12', 'callback':'json'}
headers = {'Host': 'index.mysteel.com',
'User-Agent': 'Mozilla/5.0',
'Referer': 'https://index.mysteel.com/price/getChartMultiCity_1_0.html',
'appKey': '47EE3F12CF0C443F851EFDA73AC815',
'Cookie':'href=https%3A%2F%2Findex.mysteel.com%2Fprice%2FgetChartMultiCity_1_0.html; accessId=5d36a9e0-919c-11e9-903c-ab24d11b; pageViewNum=2'}
url = 'https://index.mysteel.com/zs/newprice/getBaiduChartMultiCity.ms'
r = requests.get(url, params=params, headers=headers)
What am I missing here? I am not authenticating myself in the browser, so there shouldn't be any authentication issues. Is there some other parameter missing in my headers dict?
PS: These are all the request headers:
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,sq;q=0.8,de;q=0.7
appKey: 47EE3F12CF0C443F851EFDA73AC815
Connection: keep-alive
Cookie: href=https%3A%2F%2Findex.mysteel.com%2Fprice%2FgetChartMultiCity_1_0.html; accessId=5d36a9e0-919c-11e9-903c-ab24d11b; pageViewNum=2
dnt: 1
Host: index.mysteel.com
Referer: https://index.mysteel.com/price/getChartMultiCity_1_0.html
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"
sec-ch-ua-mobile: ?0
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
sec-gpc: 1
sign: D404184969BF3B8C081A9F0C913AF68E
timestamp: 1629136315665
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36
version: 1.0.0
X-Requested-With: XMLHttpRequest
I am trying to use Python Requests library to POST a zipped file as multipart/form-data. I have currently used the Chrome Extension Advanced REST Client that is able to upload the file without a problem. However, I face difficulties while trying to do the same from the console using Python Requests.
The general information for the request is:
Remote Address:IP/Address/to/Host:Port
Request URL:/path/to/host/with/parameters/
Request Method:POST
The request headers from Advanced REST Client are:
Accept:*/*
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8
Authorization:Basic/Authentication
Connection:keep-alive
Content-Length:1893
Content-Type:multipart/form-data; boundary=----WebKitFormBoundaryu3rhOVbU2LpT89zi
Host:/host/name
Origin:chrome-extension://hgmloofddffdnphfgcellkdfbfbjeloo
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
The payload is as follows:
------WebKitFormBoundaryu3rhOVbU2LpT89zi
Content-Disposition: form-data; name="fileUpload"; filename="filename.zip"
Content-Type: application/x-zip-compressed
------WebKitFormBoundaryu3rhOVbU2LpT89zi--
I formatted this query in Python as follows:
import requests
authentication = requests.auth.HTTPBasicAuth(username=user, password=pass)
parameters = {} # with the appropriate parameters
url = '' # base URL
files = {'file': ('fileUpload', 'application/x-zip-compressed', {})}
response = requests.post(url, params = parameters, auth=authentication, files=files)
While the Chrome App, Advanced REST Client gives me a 200 OK response, I get a 400 response (bad query). What am I doing wrong?
Thanks!
I think I'm on the right track for ASP.NET authentication. I'm trying to use requests to pass credentials to a website. Here are the headers and network info I pulled from chrome:
Remote Address: REMOVED
Request URL: https://REMOVED/default.aspx
Request Method: POST
Status Code: 302 Found
Request Headers:
POST /default.aspx HTTP/1.1
Host: REMOVED
Connection: keep-alive
Content-Length: 928
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: https://REMOVED
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://REMOVED/default.aspx
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: ASP.NET_SessionId=REMOVED; BIGipServerpool_REMOVED_dmz_80=REMOVED.REMOVED.0000; AUTHCDB=**REMOVED**
Form Data:
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE: /wEP**REMAINDER REMOVED**
__EVENTVALIDATION: /wEd**REMAINDER REMOVED**
jsCheck:
ddlEngine:REMOVED:13008
Username:
Password:
btnLogin.x: 42
btnLogin.y: 9
btnLogin: Login
Response Headers:
Cache-Control: private
Content-Length: 132
Content-Type: text/html; charset=utf-8
Date: Fri, 13 Jun 2014 00:59:13 GMT
Location: /Dashboard.aspx
Server: Microsoft-IIS/7.5
Set-Cookie: AUTHCDB=**REMOVED**; path=/; HttpOnly
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Here is the script I wrote:
import requests
FORM_DATA = {
"__EVENTTARGET:":,
"__EVENTARGUMENT:",
"__VIEWSTATE:/wEPDwUKMTA5NTA5ODU1MQ9kFgJmD2QWAgIGDxBkDxYFZgIBAgICAwIEF***REMAINDER REMOVED***",
"__EVENTVALIDATION:/wEdAAp4d3BHvSTs+Kv6cxGP3xEbBr8xrgRYad2tj4YCyRIw5qUAjimf****REMAINDER REMOVED****",
"jsCheck:",
"ddlEngine: REMOVED:13008",
"Username: ****",
"Password: ****",
"btnLogin.x: 42",
"btnLogin.y: 9",
"btnLogin: Login",
}
HEADER = {
"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Encoding":"gzip,deflate,sdch",
"Accept-Language":"en-US,en;q=0.8",
"Cache-Control":"max-age=0",
"Connection":"keep-alive",
"Content-Type":"application/x-www-form-urlencoded",
"Host":"REMOVED",
"Origin":"REMOVED",
"Referer":"REMOVED",
"User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36"
}
LOGIN_URL = "REMOVED"
#requests session to handle cookies.
s = requests.Session()
#Send a POST request with the form data/header info
r = s.post(LOGIN_URL, data=FORM_DATA, headers=HEADER)
if r.status_code == 302:
print "Successfully logged in."
else:
print "Error logging in."
Am I able to use Python Requests to log into a webpage that uses ASP.NET? If so, is this the correct way to pass the credentials into the website? For reference, the website I'm trying to log into is a company server monitor.
Looks like a similar issue was posted here and here. I've been using RoboBrowser and it's made messing with ASPX so much simpler.
from robobrowser import RoboBrowser
login_url = 'http://example.com/Login.aspx'
username = 'JohnDoe'
password = 'passwd'
browser = RoboBrowser(history=True)
# This gets all the ASPX stuff, __VIEWSTATE and friends
browser.open(login_url)
signin = brower.get_form(id='aspnetForm')
signin["jsCheck"].value = ''
signin["ddlEngine"].value = "REMOVED:13008"
signin["Username"].value = username
signin["Password"].value = password
signin["btnLogin.x"].value = "42"
signin["btnLogin.y"].value = "9"
signin["btnLogin"].value = "Login"
browser.submit_form(signin)
I'm also working with some ASP.net pages right now, and being somewhat familiar with the requests module, I thought I'd try to help out a bit.
It is my understanding that requests supports basic authentication in this fashion:
from requests.auth import HTTPBasicAuth
requests.get('https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass'))
It could be the case that you'll need to import a different authentication library that works with ASP.net and plug that directly into the auth function of requests.
Hope this helps!