I've been trying to use my browser's developer tools to find ways of pulling information out of pages, so, for fun, I am attempting to get a large list of Steam users' names via an HTTP request.
The problem is that I can't use the Network tab of the 'Inspect Element' feature to track my search query, because it is sent via JavaScript when the Enter key is pressed. Fair enough... I looked through the JS file and found the GET request. It requires a session ID, so I opened the console and grabbed my session ID.
Here is where I am stuck. This is the GET request from the JS file:
url: 'https://steamcommunity.com/search/SearchCommunityAjax',
type: 'GET',
data: {
    text: search_text,
    filter: search_filter,
    sessionid: g_sessionID,
    steamid_user: g_steamID,
    page: this.m_nPage
}
I have tried passing all the arguments via Python, as if I were actually using the browser (the session ID is blanked out here in case it is sensitive):
import requests

data = {'sessionid': '********', 'text': 'test', 'filter': 'users', 'steamid_user': 'false', 'page': 1}
requests.get('https://steamcommunity.com/search/SearchCommunityAjax', data=data)
And the request yields a 401 response; I am not authorized. I tried another request with JUST the session ID, in case there was a formatting error with the other parameters, but to no avail.
I should note that I am able to do this via the web interface without logging in. The only form of authentication for this request seems to be the session ID.
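One thing I'm genuinely unsure about (this is just a guess on my part): since this is a GET, maybe the fields belong in the query string via params= rather than data=, and maybe the session ID also has to travel as the sessionid cookie so it matches the parameter. A minimal sketch of what I mean:

import requests

# Sketch only: params= puts the fields in the query string of the GET,
# and the sessionid cookie is sent alongside so it matches the parameter.
# The session ID value here is a placeholder, not a real one.
params = {
    'text': 'test',
    'filter': 'users',
    'sessionid': '********',
    'steamid_user': 'false',
    'page': 1,
}
resp = requests.get(
    'https://steamcommunity.com/search/SearchCommunityAjax',
    params=params,
    cookies={'sessionid': '********'},
)
print(resp.status_code)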
Like I said, I really have no need for this to work, but I felt it would be nice to know why it isn't working, so I can improve my skills! Thanks for your help.
Related
I'm trying to make a script to auto-login to this website and I'm having some trouble. I was hoping I could get assistance with making this work. I have assembled the code below, but I get 'Your request cannot be processed at this time\n' at the bottom of what's returned to me, when I should be getting different HTML if it were successful:
from pyquery import PyQuery
import requests

url = 'https://licensing.gov.nl.ca/miriad/sfjsp?interviewID=MRlogin'
values = {'d_1553779889165': 'email#email.com',
          'd_1553779889166': 'thisIsMyPassw0rd$$$',
          'd_1618409713756': 'true',
          'd_1642075435596': 'Sign in'
          }

r = requests.post(url, data=values)
print(r.content)
I do this in .NET, but I think the logic can be written in Python as well.
Firstly, I always use Fiddler to capture the requests that a webpage sends, then identify the request you want to replicate and add all the cookies and headers that are sent with it to your code.
After sending the login request you will get some cookies that identify that you're logged in, and you use those cookies to proceed further on the site. For example, if you want to retrieve a user's info after logging in, you first need to trick the server into thinking that you are logged in, and that is where those login cookies help you.
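As a rough sketch of that flow in Python (the URLs, form field names, and credentials here are placeholders, not the real ones for this site; replace them with what Fiddler shows you):

import requests

session = requests.Session()  # keeps cookies between requests automatically

# Placeholder URL and form fields; copy the real ones from the captured request.
login_url = 'https://example.com/login'
payload = {'username': 'someone@example.com', 'password': 'not-a-real-password'}
headers = {'User-Agent': 'Mozilla/5.0'}  # plus any other headers Fiddler captured

resp = session.post(login_url, data=payload, headers=headers)

# The session now holds the login cookies, so further requests look "logged in".
profile = session.get('https://example.com/account', headers=headers)
print(profile.status_code)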
Also, I don't think the login will be quite so simple through a script, because if you're trying to automate a government site, they may have some anti-bot security lying there: some kind of fingerprinting or captcha.
Hope this helps!
I am trying to use Python requests to log into amazon.se. To do so, I first make a GET request to one of the pages, get redirected to the sign-in page, and make a POST request using the data from the login form + my credentials.
The problem is that in response I just get the sign-in page back again, with an error.
I am of course sure that both the email and password are valid, but the login still fails in both Python and Postman. I tried comparing the browser's requests to my manufactured ones, and they seem almost identical, except that I am missing a couple of what I believe are non-essential headers. Nevertheless, there must be something going on behind the scenes that I am missing.
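For reference, the flow I am trying to reproduce looks roughly like this (a sketch only: the parsing is simplified, the credentials are placeholders, and in my real attempt the form values are copied exactly from the page returned by the GET):

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})

# Step 1: GET a page and follow the redirect to the sign-in form.
start = session.get('https://www.amazon.se/')

# Step 2: collect every input of the sign-in form, hidden fields included.
soup = BeautifulSoup(start.text, 'html.parser')
form = soup.find('form')
payload = {i.get('name'): i.get('value', '') for i in form.find_all('input') if i.get('name')}
payload['email'] = 'me@example.com'           # placeholder credentials
payload['password'] = 'not-my-real-password'

# Step 3: POST the collected form data to the form's action URL.
resp = session.post(urljoin(start.url, form['action']), data=payload)
print(resp.status_code)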
[Postman POST request headers and body (form data from the previous GET request), and browser POST request headers and body, are not included here.]
I'm trying to retrieve the list of videos from a YouTube channel, say "https://www.youtube.com/user/YouTube/videos", to get the first n videos (via the "videoId" key).
It used to work like a charm until a few days ago, when it started to ask for my consent.
I tried many things from SO with no luck; I still see the message asking me to accept the cookies in order to see the videos.
import requests
import re
url='https://www.youtube.com/user/YouTube/videos'
s1 = requests.session()
s1.get(url)
print("Original Cookies")
print(s1.cookies)
cookieValueNum = (re.findall(r'\d+', str(s1.cookies)))[0]
cookieValue = ('YES+cb.20210328-17-p0.en-GB+FX+'+str(cookieValueNum))
cookie = {'name': 'CONSENT', 'value': cookieValue, 'domain': '.youtube.com'}
print("==========")
print("After new Cookie added")
s1.cookies.update(cookie)
print(s1.cookies)
print(s1.get(url, cookies=cookie).text)
It still returns the same message asking for my consent to cookies (in HTML, obviously; it is the same consent page you get when opening YouTube in a private session).
My idea was to replicate the consent cookie and send it back in order to access the page content.
Any idea of what I'm doing wrong?
The idea is not to use the YouTube API, but only requests (plus BeautifulSoup if needed).
You need to drop the cookies from the first response. I'm not sure how to do that with requests.Session, but either of the following works for me (note that the second one needs import random):
requests.get('https://www.youtube.com/user/YouTube/videos', cookies={'CONSENT': 'PENDING+999'})
requests.get('https://www.youtube.com/user/YouTube/videos', cookies={'CONSENT': 'YES+cb.20210328-17-p0.en-GB+FX+{}'.format(random.randint(100, 999))})
I faced the same problem - here's a solution that should work just fine for your case.
With browsers like Chrome you can always check what data you need to pass to accept cookies. You find this information in DevTools -> Application -> Cookies.
[screenshot of the Google Chrome cookie view]
Doing this, you'll see that YouTube expects YES or NO plus any integer > 0.
Pass this information in your request, and that's it.
requests.get('https://www.youtube.com/user/YouTube/videos', cookies={'CONSENT': 'YES+1'})
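If you want that consent cookie to stick for every request you make afterwards (for example while paging through a channel), setting it on a requests.Session is one way to do it. A small sketch, reusing the same 'YES+1' value as above:

import requests

session = requests.Session()
# Pre-set the consent cookie so every request from this session carries it.
session.cookies.set('CONSENT', 'YES+1', domain='.youtube.com')

resp = session.get('https://www.youtube.com/user/YouTube/videos')
print('still on the consent page' if 'Before you continue' in resp.text else 'got the videos page')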
Google is a hassle and tries to identify you with these techniques. There seems to be no way around it other than keeping the consent cookie, or else you have to give consent every time.
Set the headers of your request like this:
headers = {
    'Authorization': 'authorization',
    'cookie': 'hl=en'
}
And use Tor so that your IP changes across requests.
After sending a request, check the response: if 'Before you continue' appears in response.text, sleep for a few seconds (during which your IP will change) and then send the request again.
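A rough sketch of that retry loop (assuming Tor or something similar is rotating your IP in the background; the delay and attempt count are arbitrary):

import time
import requests

url = 'https://www.youtube.com/user/YouTube/videos'
headers = {'cookie': 'hl=en'}

for attempt in range(5):                      # give up after a few tries
    resp = requests.get(url, headers=headers)
    if 'Before you continue' not in resp.text:
        break                                 # we got past the consent page
    time.sleep(10)                            # wait while the IP rotates
print(resp.status_code)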
I'm new to web scraping, and I'm attempting to log in to imagingrewardsprogram.com using requests.Session(). I've been able to successfully log in to other websites, and I'm stumped why I haven't been able to log into this one.
When I log in to the site in Google Chrome and view the form data in developer tools, I can see that the form data I pass in my code is identical to the form data the browser sends ("user" and "password"). I'm sure there's something else I should be passing in that I'm missing, but I'm not sure what it is.
Here is my code:
import requests

loginURL = 'https://imagingrewardsprogram.com'
requestURL = 'https://imagingrewardsprogram.com/merlin/pnaimaging?command=get&style=home'

payload = {
    'user': myusername,
    'password': mypassword,
    'command': 'get',
    'style': 'home'
}

with requests.Session() as session:
    post = session.post(loginURL, data=payload)
    r = session.get(requestURL)
    print(r.text)
The output I get is a page that says, "Either your session has expired or an error occurred while obtaining your account information."
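One thing I'm wondering is whether I should be posting to the form's actual action URL rather than the site root, and whether the login POST itself even succeeds. Something like this sketch is what I have in mind (the '/login' path is purely a guess on my part, not taken from the site):

import requests

payload = {'user': myusername, 'password': mypassword}  # same credentials as above

with requests.Session() as session:
    # Hypothetical login endpoint: the real action URL would come from the
    # <form action="..."> visible in Chrome's developer tools.
    post = session.post('https://imagingrewardsprogram.com/login', data=payload)
    print(post.status_code)   # did the login POST itself succeed?
    print(session.cookies)    # did we actually get a session cookie back?

    r = session.get('https://imagingrewardsprogram.com/merlin/pnaimaging?command=get&style=home')
    print(r.text[:500])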
Any guidance is appreciated!
Maybe one reason is that the website you are trying to access uses stronger security that does not allow an automated process to log in.
So, that's why you are unable to create a session using a script.
Security measures like captcha and reCAPTCHA are used to prevent automated logins.
Hello, I'm trying to get information from a website that requires a log-in.
I already get a 200 response from the request URL where I POST the ID, password, and other request fields.
The headers dict holds the request headers that can be seen in the Chrome developer Network tab; the form_data dict holds the ID and password.
login_site = requests.post(requestUrl, headers=headers, data=form_data)
status_code = login_site.status_code
print(status_code)
I got 200
The code below shows what I've tried.
1. Session
When I tried to set cookies with a session, I failed. I've heard that a session can carry the cookies over when I scrape other pages that need the log-in.
session = requests.Session()
session.post(requestUrl, headers=headers, data=form_data)
test = session.get('~~') #the website that I want to scrape
print(test.status_code)
I got 403
2. Manually set cookies
I manually built the cookies dict from the values I can see:
cookies = {'wcs_bt':'...','_production_session_id':'...'}
r = requests.post('http://engoo.co.kr/dashboard', cookies = cookies)
print(r.status_code)
I also got 403
Actually, I don't know what I should put in the cookies dict. When I get 'wcs_bt=AAA; _production_session_id=BBB; _ga=CCC;', should I change it into a dict like {'wcs_bt': 'AAA', ...}?
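In other words, the conversion I have in mind is something like this (a small sketch using the placeholder values from above):

# Turn a raw "Cookie" header string into the dict form that requests expects.
raw = 'wcs_bt=AAA; _production_session_id=BBB; _ga=CCC'
cookies = dict(pair.split('=', 1) for pair in raw.split('; '))
print(cookies)  # {'wcs_bt': 'AAA', '_production_session_id': 'BBB', '_ga': 'CCC'}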
When I get the cookies with
login_site = requests.post(requestUrl, headers=headers, data=form_data)
print(login_site.cookies)
in this code, I can only get
RequestsCookieJar[Cookie _production_session_id=BBB]
Somehow, that failed as well.
How can I scrape the site with the cookie?
Scraping a modern (circa 2017 or later) web site that requires a login can be very tricky, because it's likely that some important portion of the login process is implemented in JavaScript.
Unless you execute that JavaScript exactly as a browser would, you won't be able to complete the login. Unfortunately, the basic Python libraries won't help.
Consider Selenium with Python, which is used for testing Web sites but can be used to automate any interaction with a Web site.
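A minimal sketch of what a Selenium-driven login might look like (the URL, element IDs, and credentials here are placeholders, not the real ones for the site above):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # needs Chrome installed; Selenium 4 manages the driver itself
driver.get('https://example.com/login')  # placeholder URL

# Placeholder element IDs: inspect the real form to find the right locators.
driver.find_element(By.ID, 'username').send_keys('my-user')
driver.find_element(By.ID, 'password').send_keys('my-password')
driver.find_element(By.ID, 'login-button').click()

# The browser has executed the site's JavaScript, so the page source now
# reflects the logged-in state and can be handed to a parser if needed.
html = driver.page_source
driver.quit()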