Error when loading cookies into a Python requests session

I am trying to load cookies into my requests session in Python from Selenium-exported cookies; however, when I do, it returns the following error:
"'list' object has no attribute 'extract_cookies'"
def load_cookies(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)

initial_state = requests.Session()
initial_state.cookies = load_cookies(time_cookie_file)
search_requests = initial_state.get(search_url)
Everywhere I look this should work; however, my cookies are a list of dictionaries, which is what I understand all cookies are, and why I assume this works with Selenium. For some reason it does not work with requests. Any and all help in this regard would be really appreciated; it feels like I am missing something obvious!
Cookies have been dumped from Selenium using:
with open("Filepath.pkl", 'wb') as f:
pickle.dump(driver.get_cookies(), f)
An example of the cookies would be (slightly obfuscated):
[{'domain': '.website.com',
  'expiry': 1640787949,
  'httpOnly': False,
  'name': '_ga',
  'path': '/',
  'secure': False,
  'value': 'GA1.2.1111111111.1111111111'},
 {'domain': 'website.com',
  'expiry': 1585488346,
  'httpOnly': False,
  'name': '__pnahc',
  'path': '/',
  'secure': False,
  'value': '0'}]
I have now managed to load in the cookies as per the answer below; however, it does not seem like the cookies are loaded properly, as they do not remember anything. If I load the same cookies while browsing through Selenium, they work fine.

Cookie
The Cookie HTTP request header contains stored HTTP cookies previously sent by the server with the Set-Cookie header. An HTTP cookie is a small piece of data that a server sends to the user's web browser. The browser may store the cookie and send it back with the next request to the same server. Typically, cookies are used to tell if two requests came from the same browser, e.g. to keep the user logged in.
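As a minimal sketch of that round trip with a requests session (httpbin.org is used here purely as an illustrative endpoint, it is not part of the original question):
import requests

s = requests.Session()
# The server answers with a Set-Cookie header; the session stores it.
s.get("https://httpbin.org/cookies/set/sessioncookie/123456789")
# On the next request the session sends the stored cookie back automatically.
r = s.get("https://httpbin.org/cookies")
print(r.json())  # -> {'cookies': {'sessioncookie': '123456789'}}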
Demonstration using Selenium
To demonstrate the usage of cookies with Selenium, we stored the cookies using pickle once the user had logged into the website http://demo.guru99.com/test/cookie/selenium_aut.php. In the next step, we opened the same website, added the cookies, and were able to land as a logged-in user.
Code Block to store the cookies:
from selenium import webdriver
import pickle
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('http://demo.guru99.com/test/cookie/selenium_aut.php')
driver.find_element_by_name("username").send_keys("abc123")
driver.find_element_by_name("password").send_keys("123xyz")
driver.find_element_by_name("submit").click()
pickle.dump(driver.get_cookies(), open("cookies.pkl", "wb"))
Code Block to use the stored cookies for automatic authentication:
from selenium import webdriver
import pickle
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('http://demo.guru99.com/test/cookie/selenium_aut.php')
cookies = pickle.load(open("cookies.pkl", "rb"))
for cookie in cookies:
    driver.add_cookie(cookie)
driver.get('http://demo.guru99.com/test/cookie/selenium_cookie.php')
Demonstration using Requests
To demonstrate the usage of cookies with a requests session, we accessed the site https://www.google.com and added a new dictionary of cookies:
{'name':'my_own_cookie','value': 'debanjan' ,'domain':'.stackoverflow.com'}
Next, we used the same requests session to send another request, which was successful, as follows:
Code Block:
import requests
s1 = requests.session()
s1.get('https://www.google.com')
print("Original Cookies")
print(s1.cookies)
print("==========")
cookie = {'name':'my_own_cookie','value': 'debanjan' ,'domain':'.stackoverflow.com'}
s1.cookies.update(cookie)
print("After new Cookie added")
print(s1.cookies)
Console Output:
Original Cookies
<RequestsCookieJar[<Cookie 1P_JAR=2020-01-21-14 for .google.com/>, <Cookie NID=196=NvZMMRzKeV6VI1xEqjgbzJ4r_3WCeWWjitKhllxwXUwQcXZHIMRNz_BPo6ujQduYCJMOJgChTQmXSs6yKX7lxcfusbrBMVBN_qLxLIEah5iSBlkdBxotbwfaFHMd-z5E540x02-YZtCm-rAIx-MRCJeFGK2E_EKdZaxTw-StRYg for .google.com/>]>
==========
After new Cookie added
<RequestsCookieJar[<Cookie domain=.stackoverflow.com for />, <Cookie name=my_own_cookie for />, <Cookie value=debanjan for />, <Cookie 1P_JAR=2020-01-21-14 for .google.com/>, <Cookie NID=196=NvZMMRzKeV6VI1xEqjgbzJ4r_3WCeWWjitKhllxwXUwQcXZHIMRNz_BPo6ujQduYCJMOJgChTQmXSs6yKX7lxcfusbrBMVBN_qLxLIEah5iSBlkdBxotbwfaFHMd-z5E540x02-YZtCm-rAIx-MRCJeFGK2E_EKdZaxTw-StRYg for .google.com/>]>
Conclusion
Clearly, the newly added dictionary of cookies {'name':'my_own_cookie','value': 'debanjan' ,'domain':'.stackoverflow.com'} is in use within the second request.
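Note that, as the console output above shows, cookies.update() with a plain dict adds jar entries keyed by the dict's keys ('name', 'value', 'domain'). If the intent is a single cookie literally named my_own_cookie, a small sketch using RequestsCookieJar.set() (which exists in requests) would look like this:
import requests

s1 = requests.session()
# set() creates one cookie with the given name/value and optional domain/path.
s1.cookies.set('my_own_cookie', 'debanjan', domain='.stackoverflow.com')
print(s1.cookies)
# -> <RequestsCookieJar[<Cookie my_own_cookie=debanjan for .stackoverflow.com/>]>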
Passing Selenium Cookies to Python Requests
Now, if your use case is to pass Selenium cookies to Python requests, you can use the following solution:
from selenium import webdriver
import pickle
import requests
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('http://demo.guru99.com/test/cookie/selenium_aut.php')
driver.find_element_by_name("username").send_keys("abc123")
driver.find_element_by_name("password").send_keys("123xyz")
driver.find_element_by_name("submit").click()
# Storing cookies through Selenium
pickle.dump(driver.get_cookies(), open("cookies.pkl", "wb"))
driver.quit()
# Passing cookies to Session
session = requests.session() # or an existing session
with open('cookies.pkl', 'rb') as f:
    session.cookies.update(pickle.load(f))
search_requests = session.get('https://www.google.com/')
print(session.cookies)
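To check whether the transferred session is really authenticated (which is what the question's follow-up is about), a quick sketch is to request a page that only a logged-in user can see and look for something user-specific in the response; the marker string below is a placeholder, not part of the original answer:
# Hypothetical check: adjust the URL and the marker string to your own site.
profile = session.get('http://demo.guru99.com/test/cookie/selenium_cookie.php')
if 'Logged In' in profile.text:  # placeholder marker string
    print("Session carries a valid login")
else:
    print("Cookies were sent, but the server does not treat us as logged in")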

Since you are replacing session.cookies (a RequestsCookieJar) with a list, which doesn't have those attributes, it won't work.
You can import those cookies one by one by using:
for c in your_cookies_list:
    initial_state.cookies.set(name=c['name'], value=c['value'])
I've tried loading the whole cookie dict, but it seems like requests doesn't recognize some of the keys and returns:
TypeError: create_cookie() got unexpected keyword arguments: ['expiry', 'httpOnly']
requests accepts expires instead, and HttpOnly goes nested within rest.
Update:
We can also rename the dict keys for expiry and httpOnly so that requests loads them correctly instead of throwing an exception. dict.pop() deletes an item from the dict by key and returns the value of the deleted key, so we re-add each value under the key requests expects and then unpack and pass the whole dict as kwargs:
for c in your_cookies_list:
    c['expires'] = c.pop('expiry')
    c['rest'] = {'HttpOnly': c.pop('httpOnly')}
    initial_state.cookies.set(**c)
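Putting both fixes together, a small helper along these lines (the function name is illustrative, and the sameSite handling is an assumption for newer Selenium versions) converts a pickled Selenium cookie list into cookies on a requests session:
import pickle
import requests

def load_selenium_cookies(session, filename):
    # Illustrative helper, assuming the pickle holds a list of Selenium cookie dicts.
    with open(filename, 'rb') as f:
        for c in pickle.load(f):
            if 'expiry' in c:
                c['expires'] = c.pop('expiry')
            c['rest'] = {'HttpOnly': c.pop('httpOnly', False)}
            c.pop('sameSite', None)  # newer Selenium adds this key; create_cookie() rejects it
            session.cookies.set(**c)
    return session

initial_state = load_selenium_cookies(requests.Session(), "cookies.pkl")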

You can get the cookies and use only their name/value pairs. You'll also need headers. You can get them from dev tools or by using a proxy.
Basic example:
driver.get('https://website.com/')
# ... login or do anything

cookies = {}
for cookie in driver.get_cookies():
    cookies[cookie['name']] = cookie['value']

# Write to a file if needed, or do something else
# import json
# with open("cookies.txt", 'w') as f:
#     f.write(json.dumps(cookies))
And usage:
# Read cookies from file as Dict
# with open('cookies.txt') as reader:
#     cookies = json.loads(reader.read())
# use cookies
response = requests.get('https://website.com/', headers=headers, cookies=cookies)
Stack Overflow headers example; some headers may be required, some not. You can see which ones your browser sends using the dev tools Network tab:
headers = {
    'authority': 'stackoverflow.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36',
    'sec-fetch-user': '?1',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'referer': 'https://stackoverflow.com/questions/tagged?sort=Newest&tagMode=Watched&uqlId=8338',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'ru,en-US;q=0.9,en;q=0.8,tr;q=0.7',
}
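A minimal sketch of tying the two together, reusing the cookies dict and the headers from above on a single session (the URL is the same placeholder as in the answer):
import requests

session = requests.Session()
session.headers.update(headers)   # headers dict from above
session.cookies.update(cookies)   # name/value dict extracted from Selenium
response = session.get('https://website.com/')
print(response.status_code)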

You can create a session. The Session class handles cookies between requests.
s = requests.Session()
login_resp = s.post('https://example.com/login', data=login_data)
# The session now stores the cookies returned by the login response.
# If you need them as a plain dict:
cookiedictreceived = requests.utils.dict_from_cookiejar(login_resp.cookies)
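Because the session keeps those cookies, any follow-up request through the same session sends them automatically; a sketch (the example.com URL is a placeholder, as above):
# Subsequent requests on the same session reuse the stored cookies.
profile_resp = s.get('https://example.com/profile')
print(s.cookies)  # the jar now holds everything set by the login and later responses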

So requests wants all "values" in your cookie to be strings, and possibly the "keys" too. It also does not want a list, which is what your load_cookies function returns. A cookie jar can be created for the session with cookies = requests.utils.cookiejar_from_dict(...).
Let's say I go to "https://stackoverflow.com/" with Selenium and save the cookies as you have done.
from selenium import webdriver
import pickle
import requests

# Go to the website
driver = webdriver.Chrome(executable_path=r'C:\Path\To\Your\chromedriver.exe')
driver.get('https://stackoverflow.com/')

# Save the cookies in a file
with open(r"C:\Path\To\Your\Filepath.pkl", 'wb') as f:
    pickle.dump(driver.get_cookies(), f)

driver.quit()

# Your function to get the cookies from the file
def load_cookies(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)

saved_cookies_list = load_cookies(r"C:\Path\To\Your\Filepath.pkl")

# Set up the requests session
initial_state = requests.Session()

# Function to fix cookie values and add the cookies to the requests session
def fix_cookies_and_load_to_requests(cookie_list, request_session):
    for index in range(len(cookie_list)):
        for item in cookie_list[index]:
            if type(cookie_list[index][item]) != str:
                print("Fix cookie value:", cookie_list[index][item])
                cookie_list[index][item] = str(cookie_list[index][item])
        cookies = requests.utils.cookiejar_from_dict(cookie_list[index])
        request_session.cookies.update(cookies)
    return request_session

initial_state_with_cookies = fix_cookies_and_load_to_requests(cookie_list=saved_cookies_list, request_session=initial_state)
search_requests = initial_state_with_cookies.get("https://stackoverflow.com/")
print("search_requests:", search_requests)

Requests also accepts http.cookiejar.CookieJar objects:
https://docs.python.org/3.8/library/http.cookiejar.html#cookiejar-and-filecookiejar-objects
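For example (a sketch, assuming a Netscape-format cookies.txt file exists; the filename is a placeholder, and the last related question below shows this pattern in full):
import requests
from http.cookiejar import MozillaCookieJar

jar = MozillaCookieJar("cookies.txt")  # placeholder filename
jar.load(ignore_discard=True, ignore_expires=True)

s = requests.Session()
s.cookies = jar  # any cookielib-compatible jar works here
r = s.get("https://stackoverflow.com/")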

Related

Log into a website using Selenium, but continue working (while logged in) with requests

I am using Selenium and the Chrome web driver to log into my account on a website, but after the login, I want to use other libraries (such as requests) to interact with the website.
I am using Selenium to attempt to bypass reCAPTCHA v3, but I want to use the requests and beautifulsoup libraries to scrape data at the URL that comes after the login page (the URL that the login page redirects to after logging in).
Here is the code I've written for logging in, and a little snippet at the bottom which I plan to use for scraping the website post-login.
import requests
import os
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome("chromedriver", options=chrome_options)
action = ActionChains(driver)

url_1 = "https://ais.usvisa-info.com/en-am/niv/users/sign_in"
url_2 = "https://ais.usvisa-info.com/en-am/niv/account/settings/update_email"
email = "email"
password = 'password'
Headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36"
}

def login():
    driver.get(url_1)
    driver.find_element_by_id("user_email").send_keys(email)
    driver.find_element_by_id("user_password").send_keys(password)
    driver.find_elements_by_class_name("icheckbox")[0].click()
    driver.find_elements_by_name("commit")[0].click()
    time.sleep(1)
    print(driver.current_url)

login()
test = requests.get(url, headers=Headers)
What logging in is actually doing is modifying your cookies to add a key, which verifies that you are logged in. What we can do with this info is to take the cookie data and reuse it for the Python requests module. Let's start by extracting the cookies from the webdriver like so:
driver_cookies = driver.get_cookies()
Now that you have your cookies, you can inject them into future requests in the cookies parameter, like so:
test = requests.get(url, headers=Headers, cookies=driver_cookies)
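Note that driver.get_cookies() returns a list of dicts, while requests' cookies= parameter expects a dict or a CookieJar; a small conversion sketch (keeping only name/value, as in one of the answers above) would be:
# Convert Selenium's list of cookie dicts into the dict shape requests expects.
cookies = {c['name']: c['value'] for c in driver.get_cookies()}
test = requests.get(url, headers=Headers, cookies=cookies)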

How to get request headers in Selenium

https://www.sahibinden.com/en
If you open it in an incognito window and check the headers in Fiddler, you can see the two main requests; clicking the last one shows its request headers.
I want to get these headers in Python. Is there any way that I can get them using Selenium? I'm a bit clueless here.
You can use Selenium Wire. It is a Selenium extension which has been developed for this exact purpose.
https://pypi.org/project/selenium-wire/
An example after pip install:
## Import webdriver from Selenium Wire instead of Selenium
from seleniumwire import webdriver

## Get the URL
driver = webdriver.Chrome("my/path/to/driver", options=options)
driver.get("https://my.test.url.com")

## Print request headers
for request in driver.requests:
    print(request.url)               # <--------------- Request url
    print(request.headers)           # <----------- Request headers
    print(request.response.headers)  # <-- Response headers
You can run a JS command like this:
var req = new XMLHttpRequest()
req.open('GET', document.location, false)
req.send(null)
return req.getAllResponseHeaders()
In Python:
driver.get("https://t.me/codeksiyon")
headers = driver.execute_script("var req = new XMLHttpRequest();req.open('GET', document.location, false);req.send(null);return req.getAllResponseHeaders()")
# type(headers) == str
headers = headers.splitlines()
The bottom line is: no, you can't retrieve the request headers using Selenium.
Details
It has been a long-standing demand from Selenium users to add WebDriver methods for reading the HTTP status code and headers from an HTTP response. Implementing this feature through Selenium has been discussed at length in WebDriver lacks HTTP response header and status code methods.
However, Jason Leyba (Selenium contributor) stated plainly in his comment:
We will not be adding this feature to the WebDriver API as it falls outside of our current scope (emulating user actions).
Ashley Leyba further added that attempting to make WebDriver the ideal web testing tool would hurt its overall quality, since driver.get(url) blocks until the browser has loaded the page and returns the response for the final loaded page. So in the case of a login redirect, status codes and headers will always end up as a 200 instead of the 302 you're looking for.
Finally, Simon M Stewart (WebDriver creator) concluded in his comment that:
This feature isn't going to happen. The recommended approach is to either extend the HtmlUnitDriver to access the information you require or to make use of an external proxy that exposes this information such as the BrowserMob Proxy
It's not possible to get headers using Selenium.
However, you might use other libraries such as requests or BeautifulSoup to get headers.
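For instance, with requests you can see both the headers you sent and the headers the server returned; a minimal sketch (note these belong to the plain requests call, not to the browser's own traffic):
import requests

r = requests.get("https://www.sahibinden.com/en")
print(r.request.headers)  # the request headers that were actually sent
print(r.headers)          # the response headers returned by the server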
Maybe you can use BrowserMob Proxy for this. Here is an example:
import settings
from browsermobproxy import Server
from selenium.webdriver import DesiredCapabilities
config = settings.Config
server = Server(config.BROWSERMOB_PATH)
server.start()
proxy = server.create_proxy()
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % proxy.proxy)
chrome_options.add_argument('--headless')
capabilities = DesiredCapabilities.CHROME.copy()
capabilities['acceptSslCerts'] = True
capabilities['acceptInsecureCerts'] = True
driver = webdriver.Chrome(options=chrome_options,
                          desired_capabilities=capabilities,
                          executable_path=config.CHROME_PATH)
proxy.new_har("sahibinden", options={'captureHeaders': True})
driver.get("https://www.sahibinden.com/en")
entries = proxy.har['log']["entries"]
for entry in entries:
    if 'request' in entry.keys():
        print(entry['request']['url'])
        print(entry['request']['headers'])
        print('\n')
proxy.close()
driver.quit()
js_headers = '''
const _xhr = new XMLHttpRequest();
_xhr.open("HEAD", document.location, false);
_xhr.send(null);
const _headers = {};
_xhr.getAllResponseHeaders().trim().split(/[\\r\\n]+/).map((value) => value.split(/: /)).forEach((keyValue) => {
    _headers[keyValue[0].trim()] = keyValue[1].trim();
});
return _headers;
'''
page_headers = driver.execute_script(js_headers)
type(page_headers) # -> dict
You can use https://pypi.org/project/selenium-wire/, a drop-in replacement for webdriver that adds request/response inspection and manipulation, even for HTTPS, by using its own local SSL certificate.
from seleniumwire import webdriver
d = webdriver.Chrome() # make sure chrome/chromedriver is in path
d.get('https://en.wikipedia.org')
vars(d.requests[-1].headers)
will list the headers of the last request in the requests object list:
{'policy': Compat32(), '_headers': [('content-length', '1361'),
('content-type', 'application/json'), ('sec-fetch-site', 'none'),
('sec-fetch-mode', 'no-cors'), ('sec-fetch-dest', 'empty'),
('user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36'),
('accept-encoding', 'gzip, deflate, br')],
'_unixfrom': None, '_payload': None, '_charset': None,
'preamble': None, 'epilogue': None, 'defects': [], '_default_type': 'text/plain'}

Scraping JSON from AJAX calls

Background
Considering this url:
base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"
I want to make the ajax call for the telephone number:
ajax_url = "https://www.olx.bg/ajax/misc/contact/phone/7XarI/?pt=e3375d9a134f05bbef9e4ad4f2f6d2f3ad704a55f7955c8e3193a1acde6ca02197caf76ffb56977ce61976790a940332147d11808f5f8d9271015c318a9ae729"
Wanted results
If I press the button through the site in my chrome browser in the console I would get the wanted result:
{"value":"088 *****"}
debugging
If I open a new tab and paste the ajax_url I would always get empty values:
{"value":"000 000 000"}
If I try something like:
Bash:
wget $ajax_url
Python:
import requests
json_response = requests.get(ajax_url)
I would just receive the HTML of the site's error-handling page saying that there is an error.
Ideas
I have something extra when I open the request in the browser. What more do I have? Maybe a cookie?
How do I get the wanted result with Bash/Python ?
Edit
the status code of the HTML response is 200
I have tried with curl and I get the same HTML problem.
Kind of a fix.
I have noticed that if I copy the cookie of the browser, and make a request with all the headers INCLUDING the cookie from the browser, I get the correct result
# I think the most important header is the cookie
headers = DICT_WITH_HEADERS_FROM_BROWSER
json_response = requests.get(next_url, headers=headers)
Final question
The only question left is how can I generate a cookie through a Python script?
First, you should create a requests Session to store cookies.
Then send an HTTP GET request to the page that actually triggers the ajax request. If any cookie is created by the website, it is sent in the GET response and your session stores the cookie.
Then you can easily use the session to call the ajax API.
Important Note 1:
The ajax url you are calling on the original website is an HTTP POST request! You should not send a GET request to that url.
Important Note 2:
You also must extract phoneToken from the website's JS code, where it is stored in a variable like var phoneToken = 'here is the pt';
Sample code:
import re
import requests

my_session = requests.Session()

# call html website
base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"
base_response = my_session.get(url=base_url)
assert base_response.status_code == 200

# extract phone token from base url response
phone_token = re.findall(r'phoneToken\s=\s\'(.+)\';', base_response.text)[0]

# call ajax api
ajax_path = "/ajax/misc/contact/phone/81i3H/?pt=" + phone_token
ajax_url = "https://www.olx.bg" + ajax_path
ajax_headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'Referer': 'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
ajax_response = my_session.post(url=ajax_url, headers=ajax_headers)
print(ajax_response.text)
When you run the code above, the result below is displayed:
{"value":"088 558 9937"}
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
import time

options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get(
    'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html')
number = driver.find_element_by_xpath(
    "/html/body/div[3]/section/div[3]/div/div[1]/div[2]/div/ul[1]/li[2]/div/strong").click()
time.sleep(2)
source = driver.page_source
soup = BeautifulSoup(source, 'html.parser')
phone = soup.find("strong", {'class': 'xx-large'}).text
print(phone)
Output:
088 558 9937

Python: how to fill out a web form and get the resulting page source

I am trying to write a python script that will scrape http://www.fakenewsai.com/ and tell me whether or not a news article is fake news. I want the script to input a given news article into the website's url input field and hit the submit button. Then, I want to scrape the website to determine whether the article is "fake" or "real" news, as displayed on the website.
I was successful in accomplishing this using selenium and ChromeDriver, but the script was very slow (>2 minutes) and did not run on Heroku (using flask). For reference, here is the code I used:
from selenium import webdriver
import time

def fakeNews(url):
    if url.__contains__("https://"):
        url = url[8:-1]
    if url.__contains__("http://"):
        url = url[7:-1]
    browser = webdriver.Chrome("static/chromedriver.exe")
    browser.get("http://www.fakenewsai.com")
    element = browser.find_element_by_id("url")
    element.send_keys(url)
    button = browser.find_element_by_id("submit")
    button.click()
    time.sleep(1)
    site = "" + browser.page_source
    result = ""
    if site[site.index("opacity: 1") - 10] == "e":
        result = "Fake News"
    else:
        result = "Real News"
    browser.quit()
    return result

print(fakeNews('https://www.nytimes.com/2019/11/02/opinion/sunday/instagram-social-media.html'))
I have attempted to replicate this code using other python libraries, such as mechanicalsoup, pyppeteer, and scrapy. However, as a beginner at python, I have not found much success. I was hoping someone could point me in the right direction with a solution.
For the stated purpose, in my opinion it would be much simpler to analyze the website, understand its functionality, and then automate the requests the browser makes instead of simulating the user's behavior.
Try hitting F12 in your browser while on the website, open the Network tab, paste a URL into the input box and hit submit; you will see that it sends an HTTP OPTIONS request and then a POST request to a URL. The server then returns a JSON response as the result.
So, you can use Python's requests module (docs) to automate that very POST request, instead of having very complex code that simulates clicks and scrapes the result.
A very simple example you can build on is:
import json
import requests

def fake_news():
    url = 'https://us-central1-fake-news-ai.cloudfunctions.net/detect/'
    payload = {'url': 'https://www.nytimes.com/'}
    headers = {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, br', 'Accept-Language': 'en-US,en;q=0.5',
               'Connection': 'keep-alive', 'Content-Length': '103', 'Content-type': 'application/json; charset=utf-8',
               'DNT': '1', 'Host': 'us-central1-fake-news-ai.cloudfunctions.net', 'Origin': 'http://www.fakenewsai.com',
               'Referer': 'http://www.fakenewsai.com/', 'TE': 'Trailers',
               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0'}
    response_json = requests.post(url, data=json.dumps(payload), headers=headers).text
    response = json.loads(response_json)
    is_fake = int(response['fake'])
    if is_fake == 0:
        print("Not fake")
    elif is_fake == 1:
        print("Fake")
    else:
        print("Invalid response from server")

if __name__ == "__main__":
    fake_news()
PS: It would be fair to contact the owner of the website to discuss using his or her infrastructure for your project.
The main slowdown occurs when starting a Chrome browser and locating the first URL.
Note that you are launching a browser for each request.
You can launch the browser in an initialization step and only do the automation parts per request.
This will greatly increase the performance.
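A sketch of that restructuring, reusing the names and element IDs from the question's snippet above (the page-source check is copied from it, not verified independently):
import time
from selenium import webdriver

# Initialization step: launch the browser once, outside the per-request code.
browser = webdriver.Chrome("static/chromedriver.exe")

def fakeNews(url):
    # Per-request part: reuse the already-running browser instead of starting a new one.
    browser.get("http://www.fakenewsai.com")
    browser.find_element_by_id("url").send_keys(url)
    browser.find_element_by_id("submit").click()
    time.sleep(1)
    site = browser.page_source
    return "Fake News" if site[site.index("opacity: 1") - 10] == "e" else "Real News"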

requests.Session() load cookies from CookieJar

How can I load a CookieJar to a new requests.Session object?
cj = cookielib.MozillaCookieJar("mycookies.txt")
s = requests.Session()
This is what I create, now the session will store cookies, but I want it to have my cookies from the file (The session should load the cookieJar).
How can this be achieved?
I searched the documentation but I can only find code examples and they are never loading a cookieJar, just saving cookies during one session.
Python 3.x code, fully working and well-implemented example. The code is self-explanatory.
This code properly handles "session cookies", preserving them between runs. By default, those are not saved to disk, which means that most websites would require you to constantly log in between runs. But with the technique below, all session cookies are kept too!
This is the code you are looking for.
import os
import pathlib
import requests
from http.cookiejar import MozillaCookieJar

cookiesFile = str(pathlib.Path(__file__).parent.absolute() / "cookies.txt")  # Places "cookies.txt" next to the script file.
cj = MozillaCookieJar(cookiesFile)
if os.path.exists(cookiesFile):  # Only attempt to load if the cookie file exists.
    cj.load(ignore_discard=True, ignore_expires=True)  # Loads session cookies too (expirydate=0).

s = requests.Session()
s.headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    "Accept-Language": "en-US,en"
}
s.cookies = cj  # Tell the Requests session to use the cookiejar.

# DO STUFF HERE WHICH REQUIRES THE PERSISTENT COOKIES...
s.get("https://www.somewebsite.com/")

cj.save(ignore_discard=True, ignore_expires=True)  # Saves session cookies too (expirydate=0).
In Python 3.x:
import requests
import http.cookiejar

s = requests.Session()
s.cookies = http.cookiejar.MozillaCookieJar("anything.txt")
For example, I will access the Google site and save the cookiejar to the file "anything.txt":
s.get("https://google.com")
s.cookies.save()
And in the future, I can access Google again with my cookiejar:
s.cookies.load()
s.get("https://google.com")
There's an optional cookies= that can be provided for a requests.Session (as well as request) objects:
cookies = None
A CookieJar containing all currently outstanding cookies set on this
session. By default it is a RequestsCookieJar, but may be any other cookielib.CookieJar compatible object.
see: https://2.python-requests.org/en/latest/api/#requests.Session.cookies
So it becomes:
s = requests.Session(cookies=cj)
Update: I was confusing this with requests.get, requests.post etc., as correctly pointed out by mata in the comments: cookies is an attribute of the session object, not an init parameter, so the above won't work. Setting s.cookies = cj after constructing the session will:
Therefore, use:
s = requests.Session()
s.cookies = cj
