I'm trying to retrieve a list of YouTube videos from a YouTube channel, say "https://www.youtube.com/user/YouTube/videos", to get the first n videos (via the key "videoId").
It used to work like a charm until a few days ago, when it started asking for my consent.
I tried many suggestions from SO with no luck; I still see the message asking me to accept the cookies in order to see the videos.
import requests
import re

url = 'https://www.youtube.com/user/YouTube/videos'
s1 = requests.session()
s1.get(url)
print("Original Cookies")
print(s1.cookies)

# Grab the numeric part from the cookie jar and build a "YES" consent value from it
cookieValueNum = re.findall(r'\d+', str(s1.cookies))[0]
cookieValue = 'YES+cb.20210328-17-p0.en-GB+FX+' + str(cookieValueNum)
cookie = {'name': 'CONSENT', 'value': cookieValue, 'domain': '.youtube.com'}

print("==========")
print("After new Cookie added")
s1.cookies.update(cookie)
print(s1.cookies)
print(s1.get(url, cookies=cookie).text)
It still returns the same message asking for my consent to cookies (as HTML, obviously; it is the same consent page you get when opening YouTube in a private session).
My idea was to replicate the consent cookie and send it back to be able to access the page content.
Any idea what I'm doing wrong?
The idea is not to use the YouTube API, only requests/BeautifulSoup if needed.
You need to delete the cookies from the first response. I'm not sure how to do that with requests.Session, but either of the following works for me:
import random
import requests

requests.get('https://www.youtube.com/user/YouTube/videos', cookies={'CONSENT': 'PENDING+999'})
requests.get('https://www.youtube.com/user/YouTube/videos',
             cookies={'CONSENT': 'YES+cb.20210328-17-p0.en-GB+FX+{}'.format(random.randint(100, 999))})
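If you do want to stay with a Session, a minimal sketch (assuming one of the CONSENT values above is still accepted) is to empty the jar before setting your own cookie:
import requests

s = requests.Session()
s.get('https://www.youtube.com/user/YouTube/videos')  # first response sets the pending consent cookie
s.cookies.clear()                                     # drop those response cookies
s.cookies.set('CONSENT', 'PENDING+999', domain='.youtube.com')
html = s.get('https://www.youtube.com/user/YouTube/videos').text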
I faced the same problem; here's a solution that should work just fine for your case.
With browsers like Chrome you can always check what data you need to pass to accept cookies; you'll find this information in dev tools -> Application -> Cookies.
(Screenshot: the Google Chrome cookie view.)
Doing this, you'll see that YouTube expects YES or NO plus any integer > 0.
Pass this information in your request, and that's it:
requests.get('https://www.youtube.com/user/YouTube/videos', cookies={'CONSENT': 'YES+1'})
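With consent accepted, the returned HTML contains the video IDs, so (as a sketch, assuming "videoId" still appears in the page's embedded JSON) the first n can be pulled with a regex:
import re
import requests

html = requests.get('https://www.youtube.com/user/YouTube/videos',
                    cookies={'CONSENT': 'YES+1'}).text

# "videoId" can appear several times per video, so dedupe while keeping order
seen = []
for vid in re.findall(r'"videoId":"([^"]+)"', html):
    if vid not in seen:
        seen.append(vid)
print(seen[:10])  # first 10 video IDs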
Google is a hassle and tries to identify you with these techniques. There seems to be no way around keeping the consent cookie, or you have to give consent every time.
Set the headers of your request like this:
headers = {
'Authorization': 'authorization',
'cookie': 'hl=en'
}
And use Tor to change your IP across requests.
After sending a request, check the response: if 'Before you continue' appears in response.text, sleep for a few seconds (in that time your IP will change) and then send the request again.
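A minimal sketch of that retry loop (assumptions: Tor's SOCKS proxy listens on 127.0.0.1:9050, requests is installed with its socks extra, and the exit node rotates between attempts):
import time
import requests

proxies = {'http': 'socks5h://127.0.0.1:9050',
           'https': 'socks5h://127.0.0.1:9050'}
headers = {'cookie': 'hl=en'}
url = 'https://www.youtube.com/user/YouTube/videos'

while True:
    r = requests.get(url, headers=headers, proxies=proxies)
    if 'Before you continue' not in r.text:
        break  # got the real page
    time.sleep(10)  # give Tor time to switch to a new circuit/IP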
Related
I am trying to scrape a website to get the shipping information for my company. I am able to log in to the website using Python's requests library. The issue I am facing is that after I log in and try to navigate to a different URL that has the information I need, the cookies change and log me out.
When I look at the network in the dev tools, I see that the cookies it changes to are the response cookies. When I use .cookies to see if they were getting picked up, it only shows the request cookies.
I tried setting up a persistent session, but that did not help. I then tried saving the cookies and got nowhere with that. I am not sure what else to do.
import requests

url = 'http://website/login'
creds = {'_method': '****', 'username': '*****', 'password': '*****'}
response = requests.post(url, data=creds)
token = response.cookies
response = requests.get('http://website/reports/view/17', cookies=token)
You can try token = response.headers['Set-Cookie'].
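For example (a sketch; note that requests folds multiple Set-Cookie headers into one comma-joined string, so real code may need smarter parsing than this naive split):
import requests

creds = {'_method': '****', 'username': '*****', 'password': '*****'}
response = requests.post('http://website/login', data=creds)

raw = response.headers.get('Set-Cookie', '')  # everything the server tried to set
print(raw)

# Forward just the first name=value pair on the next request
token = raw.split(';')[0]
r = requests.get('http://website/reports/view/17', headers={'Cookie': token})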
I'm trying to make a script to auto-login to this website, and I'm having some trouble. I was hoping I could get assistance with making this work. I have the code below assembled, but I get 'Your request cannot be processed at this time\n' at the bottom of what's returned to me, when I should be getting different HTML if it was successful:
from pyquery import PyQuery
import requests
url = 'https://licensing.gov.nl.ca/miriad/sfjsp?interviewID=MRlogin'
values = {'d_1553779889165': 'email@email.com',
'd_1553779889166': 'thisIsMyPassw0rd$$$',
'd_1618409713756': 'true',
'd_1642075435596': 'Sign in'
}
r = requests.post(url, data=values)
print (r.content)
I do this in .NET, but I think the logic can be written in Python as well.
Firstly, I always use Fiddler to capture the requests that a webpage sends, then identify the request you want to replicate and add all the cookies and headers that are sent with it to your code.
After sending the login request you will get some cookies that identify that you've logged in, and you use those cookies for all further requests to the site. For example, if you want to retrieve the user's info after logging in, you first need to convince the server that you are logged in, and that is where those login cookies help you.
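In Python, that pattern looks roughly like this (a sketch; the URL, field names, and headers are placeholders for whatever Fiddler shows you):
import requests

# Replicate the captured login request, including its headers
login = requests.post('https://example.com/login',
                      data={'username': '...', 'password': '...'},
                      headers={'User-Agent': 'Mozilla/5.0'})

# Reuse the cookies the server issued to prove we are logged in
profile = requests.get('https://example.com/account', cookies=login.cookies)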
Also, I don't think the login would be so simple to script: if you're trying to automate a government site, they may have some anti-bot security lying in wait, some kind of fingerprinting or captcha.
Hope this helps!
I'm not sure how else to describe this. I'm trying to log in to a website using the requests library with Python, but it doesn't seem to be capturing all the cookies from when I log in, and subsequent requests to the site go back to the login page.
The code I'm using is as follows: (with redactions)
with requests.Session() as s:
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
Looking at the developer tools in Chrome, I see the following:
After checking r.cookies, it seems only PHPSESSID was captured; there's no sign of the amember_nr cookie.
The value in PyCharm only shows:
{RequestsCookieJar: 1}<RequestsCookieJar[<Cookie PHPSESSID=kjlb0a33jm65o1sjh25ahb23j4 for .website.co.uk/>]>
Why does this code fail to save 'amember_nr' and is there any way to retrieve it?
SOLUTION:
It appears the only way I can get this code to work properly is using Selenium, selecting the elements on the page and automating the typing/clicking. The following code produces the desired result.
from seleniumrequests import Chrome

driver = Chrome()
driver.get('http://www.website.co.uk')
username = driver.find_element_by_xpath("//input[@name='amember_login']")
password = driver.find_element_by_xpath("//input[@name='amember_pass']")
username.send_keys("username")
password.send_keys("password")
driver.find_element_by_xpath("//input[@type='submit']").click()  # page is logged in and all relevant cookies saved
You can try this:
with requests.Session() as s:
    s.get('https://www.website.co.uk/login')
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
The get request will set the required cookies.
FYI, I would use something like Burp Suite to capture ALL the data being sent to the server and sort out which headers etc. are required; sometimes servers do referrer checking, set cookies via JavaScript or wonky scripting (I've even seen JavaScript obfuscation and blocking of agent tags not in a whitelist). It's likely the server is missing something in your headers and so won't give you the cookie.
Also, you can have Python use Burp as a proxy so you can see exactly what gets sent to the server and the response.
https://github.com/freeload101/Python/blob/master/CS_HIDE/CS_HIDE.py (proxy support)
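Pointing requests at Burp is just a proxies dict (a sketch, assuming Burp listens on its default 127.0.0.1:8080; verify=False is needed because Burp re-signs TLS):
import requests

burp = {'http': 'http://127.0.0.1:8080',
        'https': 'http://127.0.0.1:8080'}

# Every request and response now shows up in Burp's HTTP history
r = requests.get('https://www.website.co.uk/login', proxies=burp, verify=False)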
Hello, I'm trying to get information from a website that requires logging in.
I already get a 200 response from the request URL where I POST my ID, password, and the other request fields.
The headers dict holds the request headers that can be seen in the Chrome developer tools' Network tab; the form data dict holds the ID and password.
login_site = requests.post(requestUrl, headers=headers, data=form_data)
status_code = login_site.status_code
print(status_code)
I got 200
The code below shows what I've tried.
1. Session.
When I tried to set cookies with a session, I failed. I've heard that a session can carry the cookies over when I scrape other pages that need the login.
session = requests.Session()
session.post(requestUrl, headers=headers, data=form_data)
test = session.get('~~') #the website that I want to scrape
print(test.status_code)
I got 403
2. Manually set cookie
I manually built the cookie dict from what I could get:
cookies = {'wcs_bt':'...','_production_session_id':'...'}
r = requests.post('http://engoo.co.kr/dashboard', cookies = cookies)
print(r.status_code)
I also got 403
Actually, I don't know what I should write in the cookies dict. When I get 'wcs_bt=AAA; _production_session_id=BBB; _ga=CCC;', should I change it to a dict like {'wcs_bt': 'AAA', ...}?
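(For what it's worth, a sketch of converting such a raw cookie string to a dict:)
raw = 'wcs_bt=AAA; _production_session_id=BBB; _ga=CCC'
cookies = dict(pair.strip().split('=', 1) for pair in raw.split(';') if '=' in pair)
# {'wcs_bt': 'AAA', '_production_session_id': 'BBB', '_ga': 'CCC'}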
When I get the cookies:
login_site = requests.post(requestUrl, headers=headers, data=form_data)
print(login_site.cookies)
in this code, I can only get
RequestsCookieJar[Cookie _production_session_id=BBB]
Somehow, that failed as well.
How can I scrape it with the cookie?
Scraping a modern (circa 2017 or later) website that requires a login can be very tricky, because it's likely that some important portion of the login process is implemented in JavaScript.
Unless you execute that JavaScript exactly as a browser would, you won't be able to complete the login. Unfortunately, the basic Python libraries won't help.
Consider Selenium with Python, which is built for testing websites but can automate any interaction with one.
I'm trying to get contest data from the url: "https://www.draftkings.com/contest/gamecenter/32947401"
If you go to this URL and aren't logged in, it'll just re-direct you to the lobby. If you're logged in, it'll actually show you the contest results.
Here's some things I tried:
- First, I used Chrome's dev networking tools to watch requests while I manually logged in.
- I then tried copying the cookie that I thought contained the authentication info; it was of the form:
'ajs_anonymous_id=%123123123123123, mlc=true; optimizelyEndUserId'
- I then stored that cookie as an environment variable and ran this code:
HEADERS = {'cookie': os.environ['MY_COOKIE']}
requests.get(draft_kings_url, headers=HEADERS)
No luck, this just gave me the lobby.
I then tried requests' built-in:
- HTTPBasicAuth
- HTTPDigestAuth
No luck here either.
I'm no Python expert by far, and I've pretty much exhausted what I know and the search results I've found. Any ideas?
The tool that you want is Selenium. Something along the lines of:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get("https://www.draftkings.com/contest/gamecenter/32947401")
username = browser.find_element_by_id("user")
username.send_keys("username")
password = browser.find_element_by_id("password")
password.send_keys("top_secret")
login = browser.find_element_by_name("login")
login.click()
Use Fiddler to see the exact request they make when you try to log in. Then use the Session class in the requests package:
import requests
session = requests.Session()
session.get('YOUR_URL_LOGIN_PAGE')
This will save all the cookies from your URL in your session variable (like when you use a browser).
Then make a POST request to the login URL with the appropriate data.
You don't have to pass cookie data manually, as it is picked up automatically when you first visit a website. However, you can set some headers explicitly, like User-Agent etc., with:
session.headers.update({'header_name':'header_value'})
HTTPBasicAuth & HTTPDigestAuth might not work, depending on the website.
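Putting those steps together, a minimal sketch (the login URL and form field names here are assumptions; read the real ones out of Fiddler):
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})

session.get('YOUR_URL_LOGIN_PAGE')          # collect initial cookies
session.post('YOUR_URL_LOGIN_PAGE',         # field names are placeholders
             data={'username': 'user', 'password': 'top_secret'})

r = session.get('https://www.draftkings.com/contest/gamecenter/32947401')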