I've run into a problem using my cookie file.
I have a simple script that logs in to a system and makes several requests.
After the login I make the requests, and before logging out I save the cookies to a file:
import pickle

def __save_user_cookies_file__(self) -> None:
    file_name = f'{self.user}_cookies'
    with open(file_name, 'wb') as f:
        pickle.dump(self.api.session.cookies, f)
The next time, instead of logging in, I just load this cookies file and use it:
def __load_user_cookies_file__(self) -> None:
    with open(f'{self.user}_cookies', 'rb') as f:
        self.api.session.cookies.update(pickle.load(f))
Then I use this object:
self.api.session
The first time, this works fine and I can do my stuff using the cookies file instead of logging in first. The problem starts when I try to use the file again the next day.
I tried saving the file again and got the same result: it looks like my user is not logged in.
Any suggestions?
Cookies expire, and the authentication cookies you receive from the URL you request carry an expires field. In your case, those may only be valid for a day.
You can check those expiry dates with:
from datetime import datetime

for cookie in self.api.session.cookies:
    if cookie.expires:  # session cookies carry no expiry timestamp
        print(f"Cookie {cookie.name} expires on {datetime.fromtimestamp(cookie.expires)}")
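If the saved cookies are stale, the fix is simply to log in again. A minimal sketch of a loader with that fallback, reusing the methods above (self.login() is a hypothetical stand-in for whatever your existing login routine is):

import os
import pickle
import time

def __load_user_cookies_or_login__(self) -> None:
    file_name = f'{self.user}_cookies'
    if os.path.exists(file_name):
        with open(file_name, 'rb') as f:
            jar = pickle.load(f)
        # Reuse the jar only if no persistent cookie has passed its expiry.
        if jar and all(c.expires is None or c.expires > time.time() for c in jar):
            self.api.session.cookies.update(jar)
            return
    self.login()  # hypothetical: your existing login flow
    self.__save_user_cookies_file__()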
Related
I'm trying to download some videos from a website using Selenium.
Unfortunately I can't download it from the source, because the video is stored in a directory with restricted access; trying to retrieve it using urllib, requests, or ffmpeg returns a 403 Forbidden error, even after injecting my user data into the website.
I was thinking of playing the video in its entirety and storing the media file from the cache.
Would that be a possibility? Where can I find the cache folder in a custom profile? How do I discriminate among the files in the cache?
EDIT: This is what I attempted to do using requests
import requests

def main():
    s = requests.Session()
    login_page = '<<login_page>>'
    login_data = dict()
    login_data['username'] = '<<username>>'
    login_data['password'] = '<<psw>>'
    login_r = s.post(login_page, data=login_data)  # send the credentials with the POST
    video_src = '<<video_src>>'
    cookies = dict(login_r.cookies)  # contains the session cookie
    # static cookies for every session
    cookies['_fbp'] = 'fb.1.1630500067415.734723547'
    cookies['_ga'] = 'GA1.2.823223936.1630500067'
    cookies['_gat'] = '1'
    cookies['_gid'] = 'GA1.2.1293544716.1631011551'
    cookies['user'] = '66051'
    video_r = s.get(video_src, cookies=cookies)
    print(video_r.status_code)

if __name__ == '__main__':
    main()
The print() call outputs:
403
[Screenshot: the browser's network tab for the video request.]
Regarding video_r = s.get(video_src, cookies=cookies): have you tried streaming the response? With stream=True the body is downloaded in chunks as you iterate over it instead of in one go; most websites prevent downloading the file as "one" block.
with open('...', 'wb') as f:  # fill in your output path
    response = s.get(video_src, stream=True)
    for chunk in response.iter_content(chunk_size=512):
        if chunk:  # filter out keep-alive chunks
            f.write(chunk)
You can send a HEAD request first if you want; that way you can build a progress bar, since the response header gives you the full content length.
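A minimal sketch of that idea, assuming the same s, cookies, and video_src as in the question (the output path is a placeholder):

head = s.head(video_src, cookies=cookies)
total = int(head.headers.get('Content-Length', 0))

downloaded = 0
with open('video.mp4', 'wb') as f:  # placeholder output path
    response = s.get(video_src, cookies=cookies, stream=True)
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
        downloaded += len(chunk)
        if total:
            print(f'\rdownloaded {100 * downloaded // total}%', end='')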
Also, a 403 is commonly returned by anti-bot systems; maybe your Selenium is being detected.
You are blocked because you forgot about headers.
You must use:
s.get('https://httpbin.org/headers', headers={'user-agent': <the user agent value (for example, the last line of your uploaded image)>})
or:
s.headers.update({'user-agent': <the user agent value (for example, the last line of your uploaded image)>})
before sending a request.
There's a website that has a button which downloads an Excel file. After I click, it takes around 20 seconds for the server API to generate the file and send it back to my browser for download.
If I monitor the communication after I click the button, I can see how the browser sends a POST request to a server with a series of headers and form values.
Is there a way that I can simulate a similar POST request programmatically using Python, and retrieve the Excel file after the server sends it over?
Thank you in advance
The requests module is used for sending all kinds of request types.
requests.post sends the POST request synchronously.
The payload data can be set using data=.
The response body can be accessed using .content.
Be sure to check .status_code and only save the file on a successful response code.
Also note the use of "wb" inside open(), because we want to save the file as binary instead of text.
Example:
import requests

payload = {"dao": "SampleDAO",
           "condigId": 1,
           # ... remaining form fields from the browser's POST
           }

r = requests.post("http://url.com/api", data=payload)
if r.status_code == 200:
    with open("file.xlsx", "wb") as f:
        f.write(r.content)
Requests Documentation
I guess you could similarly do this:

file_info = requests.get(url)
with open('file_name.extension', 'wb') as file:
    file.write(file_info.content)

I honestly don't know how to explain this, though, since I have little understanding of how it works.
I'm trying to get contest data from the url: "https://www.draftkings.com/contest/gamecenter/32947401"
If you go to this URL and aren't logged in, it'll just re-direct you to the lobby. If you're logged in, it'll actually show you the contest results.
Here are some things I tried:
- First, I used Chrome's dev networking tools to watch requests while I manually logged in.
- I then tried copying the cookie that I thought contained the authentication info; it was of the form:
'ajs_anonymous_id=%123123123123123, mlc=true; optimizelyEndUserId'
- I then stored that cookie as an environment variable and ran this code:
HEADERS = {'cookie': os.environ['MY_COOKIE']}
requests.get(draft_kings_url, headers=HEADERS)
No luck, this just gave me the lobby.
I then tried requests' built-in:
HTTPBasicAuth
HTTPDigestAuth
No luck here either.
I'm no python expert by far, and I've pretty much exhausted what I know and the search results I've found. Any ideas?
The tool that you want is Selenium. Something along the lines of:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get(r"https://www.draftkings.com/contest/gamecenter/32947401")
username = browser.find_element_by_id("user")
username.send_keys("username")
password = browser.find_element_by_id("password")
password.send_keys("top_secret")
login = browser.find_element_by_name("login")
login.click()
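If you ultimately want the page via requests rather than the browser, one option (a sketch, not part of the original answer) is to copy Selenium's cookies into a requests session once the login has succeeded:

import requests

session = requests.Session()
for c in browser.get_cookies():  # export the logged-in browser's cookies
    session.cookies.set(c['name'], c['value'], domain=c.get('domain'))

resp = session.get("https://www.draftkings.com/contest/gamecenter/32947401")
print(resp.status_code)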
Use Fiddler to see the exact request the browser makes when you log in. Then use the Session class in the requests package.
import requests
session = requests.Session()
session.get('YOUR_URL_LOGIN_PAGE')
This will save all the cookies from your URL in your session variable (like when you use a browser).
Then make a POST request to the login URL with the appropriate data.
You don't have to pass cookie data manually, as it is handled automatically once you first visit a website. However, you can set some headers explicitly, like User-Agent, with:
session.headers.update({'header_name':'header_value'})
HTTPBasicAuth and HTTPDigestAuth might not work, depending on the website; most login forms use session cookies rather than HTTP auth.
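Putting those steps together, a minimal sketch (the URLs and form field names are placeholders; take the real ones from what Fiddler shows you):

import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})
session.get('https://example.com/login')  # the initial visit sets baseline cookies
session.post('https://example.com/login',  # placeholder login endpoint
             data={'username': 'me', 'password': 'secret'})

# Cookies from the login are now stored on the session automatically.
resp = session.get('https://www.draftkings.com/contest/gamecenter/32947401')
print(resp.status_code)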
I am using the following script:
import requests
import json
import os
COOKIES = json.loads("")  # paste your EditThisCookie export (JSON) between the quotes
COOKIEDICTIONARY = {}
for i in COOKIES:
    COOKIEDICTIONARY[i['name']] = i['value']

def follow(id):
    post = requests.post("https://instagram.com/web/friendships/" + id + "/follow/",
                         cookies=COOKIEDICTIONARY)
    print(post.text)

follow('309438189')
os.system("pause")
This script is supposed to send a follow request to the user '309438189' on Instagram. However, when the code is run, post.text outputs some HTML, including:
"This page could not be loaded. If you have cookies disabled in your
browser, or you are browsing in Private Mode, please try enabling
cookies or turning off Private Mode, and then retrying your action."
The loop is supposed to copy the cookies into the COOKIEDICTIONARY variable (a dict) in a format the requests module can read. If you print the dictionary, it shows all of the cookies and their values.
The cookies put in are valid, and the requests syntax is, I believe, correct.
I have fixed it. The problem was that certain headers I needed were not present, such as Origin (I will get the full list soon). For anybody who wants to imitate an Instagram POST request: you need those headers or it will error.
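As a sketch of what that change looks like in the script above (only Origin is confirmed by the answer; the other header names here are assumptions to verify against your browser's network tab):

def follow(id):
    headers = {
        'Origin': 'https://www.instagram.com',  # the header found to be missing
        # the entries below are assumptions; check the real request in your browser
        'Referer': 'https://www.instagram.com/',
        'User-Agent': 'Mozilla/5.0',
        'X-CSRFToken': COOKIEDICTIONARY.get('csrftoken', ''),
    }
    post = requests.post("https://instagram.com/web/friendships/" + id + "/follow/",
                         cookies=COOKIEDICTIONARY, headers=headers)
    print(post.text)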
I used this piece of code to log in:
cj = cookielib.LWPCookieJar()
cookie_support = urllib2.HTTPCookieProcessor(cj)
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)
# ..... log in with username and password
# urllib2.urlopen() to get the stuff I need
Now, how do I preserve the cookies and set their expiration dates to forever, so that next time I don't have to log in with username and password again and can directly use urllib2.urlopen()?
By "next time" I mean after the program ends: when I start a new run, I can just reload the cookies from disk and use them.
Thanks a lot.
I would highly recommend using the Requests HTTP library. It will handle all this for you.
http://docs.python-requests.org/en/latest/
import requests

sess = requests.session()
sess.post("http://somesite.com/someform.php", data={"username": "me", "password": "pass"})
# Everything after that POST will retain the login session
print(sess.get("http://somesite.com/otherpage.php").text)
edit: To save the session to disk, there are a lot of ways. You could start by converting the session's cookie jar into a plain dict:

from requests.utils import dict_from_cookiejar

cookies = dict_from_cookiejar(sess.cookies)
Then read the following documentation: you could convert the cookies to a FileCookieJar and save them to a text file, then load them at the start of the program (a sketch follows the link).
http://docs.python.org/2/library/cookielib.html#cookiejar-and-filecookiejar-objects
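A minimal sketch of the FileCookieJar route, using LWPCookieJar (the file name is arbitrary):

import cookielib  # http.cookiejar in Python 3

jar = cookielib.LWPCookieJar('cookies.txt')
for c in sess.cookies:
    jar.set_cookie(c)
jar.save(ignore_discard=True)  # also keep cookies marked as session-only

# At the start of the next run:
jar.load(ignore_discard=True)
sess.cookies.update(jar)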
Alternatively, you could pickle the dict and save that data to a file, then load it with pickle.load(file); see the sketch after the link.
http://docs.python.org/2/library/pickle.html
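A sketch of the pickle route (file name arbitrary):

import pickle
from requests.utils import cookiejar_from_dict, dict_from_cookiejar

# Save at the end of a run:
with open('cookies.pkl', 'wb') as f:
    pickle.dump(dict_from_cookiejar(sess.cookies), f)

# Restore at the start of the next run:
with open('cookies.pkl', 'rb') as f:
    sess.cookies = cookiejar_from_dict(pickle.load(f))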
edit 2: To handle expiration, you can iterate over the CookieJar as follows (cj is assumed to be a CookieJar obtained in some fashion):
for cookie in cj:
    if cookie.is_expired():
        pass  # re-attain the session (log in again)
To check if any of the cookies are expired, it may be more convenient to do if any(c.is_expired() for c in cj).