Chrome Cookies Aren't Being Stored In The Chrome Cookies File - python

So essentially I've been looking into how Chrome (and web browsers in general) stores its cookies. I've discovered that Chrome's system works as follows:
All Chrome files are stored in: %USERPROFILE%\AppData\Local\Google\Chrome\User Data
There are multiple profiles, named as follows: if no extra profiles have been made there is a single one called "Default"; otherwise they follow the pattern "Profile 1", "Profile 2", etc.
Each profile has its own folder which stores all of its data: cache, login info, and of course cookies.
The login info is stored in a file called Login Data, the cache in \Cache\Cache_Data, and the cookies in \Network\Cookies. Login Data and Cookies are both SQLite databases.
The problem: take a random site, for example minecraft.net. It sets a lot of cookies, but when I open the Cookies file in a database browser, there is only one cookie for it. Why is this? A week ago I was helping fix something with my brother's Roblox account and found a similar thing: I noticed there was a cookie called .RBLXSECURITY that held his token, but when I looked in his Cookies file it wasn't there. I then found a file in the Roblox AppData that stored the token. So does Chrome read tokens from files, and if so, is there a way to get access to them as well?
I've tried searching for information on this but couldn't find anything, and I also looked through a lot of my files with no luck.
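
For what it's worth, Chrome really does keep all of its cookies in that SQLite file; the catch is that the values sit in the encrypted_value BLOB column (the plain value column is left empty), so a casual look in a database browser can be misleading. Below is a minimal sketch of reading and decrypting them on Windows, assuming the pywin32 and pycryptodome packages are installed. It covers the v10/v11 AES-GCM scheme used since Chrome 80; very recent Chrome builds add app-bound encryption on top, so it may not work on the newest versions:

import base64
import json
import os
import shutil
import sqlite3

import win32crypt                     # pywin32: Windows DPAPI
from Crypto.Cipher import AES         # pycryptodome: AES-GCM

user_data = os.path.expandvars(r"%USERPROFILE%\AppData\Local\Google\Chrome\User Data")

# The AES key is base64-encoded in the Local State file, prefixed with
# b"DPAPI", and protected with the Windows Data Protection API.
with open(os.path.join(user_data, "Local State"), encoding="utf-8") as f:
    encrypted_key = base64.b64decode(json.load(f)["os_crypt"]["encrypted_key"])
aes_key = win32crypt.CryptUnprotectData(encrypted_key[5:], None, None, None, 0)[1]

# Chrome locks the live database while running, so work on a copy.
shutil.copy2(os.path.join(user_data, "Default", "Network", "Cookies"), "cookies_copy.db")

con = sqlite3.connect("cookies_copy.db")
for host, name, blob in con.execute(
        "SELECT host_key, name, encrypted_value FROM cookies WHERE host_key LIKE ?",
        ("%minecraft.net%",)):
    if blob[:3] in (b"v10", b"v11"):
        # layout: 3-byte version | 12-byte nonce | ciphertext | 16-byte GCM tag
        cipher = AES.new(aes_key, AES.MODE_GCM, nonce=blob[3:15])
        value = cipher.decrypt_and_verify(blob[15:-16], blob[-16:]).decode()
    else:
        # pre-Chrome-80 cookies were encrypted with raw DPAPI
        value = win32crypt.CryptUnprotectData(blob, None, None, None, 0)[1].decode()
    print(host, name, value)
con.close()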

Related

Selenium - Login to google ads with cookies and account suspicious activity

I have been using Selenium to scrape data from a Google Ads account. I log in manually using my profile data first, then save the cookies to be used later to log in automatically and run the scraping task; sometimes the script runs on a VM, not the machine the cookies were created on.
Everything was going fine, but roughly every 10 days I get an email that there's suspicious activity in my account; Google signs me out automatically and I have to change my Google account password and recreate the cookies manually.
I'm thinking the problem comes from the fact that the cookies were created on one machine and are being used on another, but I'm not sure.
I'm thinking of creating the cookies on the VM and only using them there, but I'm not sure if that would work.
That might be because Google detects that you're using the same cookie in a browser with a different user agent and device metrics.
To bypass that, you could try using Selenium-Profiles and starting it with a profile previously exported from your local browser.
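If you stick with plain Selenium, the usual pattern is to export the cookies once after a manual login and pin the same user agent on every machine that reuses them. A rough sketch; the user-agent string and URLs are placeholders:

import pickle
from selenium import webdriver

# Pin the SAME user agent everywhere the cookies are reused;
# this string is a placeholder, copy it from the original browser.
UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ..."

def make_driver():
    options = webdriver.ChromeOptions()
    options.add_argument(f"--user-agent={UA}")
    return webdriver.Chrome(options=options)

def save_cookies():                        # run once, where you log in manually
    driver = make_driver()
    driver.get("https://ads.google.com")
    input("Log in manually, then press Enter...")
    pickle.dump(driver.get_cookies(), open("cookies.pkl", "wb"))

def load_cookies(driver):                  # run later, e.g. on the VM
    driver.get("https://ads.google.com")   # must be on the domain before adding cookies
    for cookie in pickle.load(open("cookies.pkl", "rb")):
        cookie.pop("sameSite", None)       # some driver versions reject this key
        driver.add_cookie(cookie)
    driver.refresh()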

Login parameters of unknown origin?

I want to log in to a website with Python but could not find where the parameters in the login URL come from.
I've checked all the URLs before the login, but none of them (headers, cookies, etc.) contain these parameters.
The login URL looks like this: https://www.example.com/auth/login?key=iH3_8aYEZZy7iQJliEospQ&expires=1598750085
The key and expires parameters arrive automatically with this URL.
Is there something I'm missing, or do these parameters come from a hidden API?
I looked into the site you sent.
The key and expires parameters are generated on the front end in JavaScript. You can use the Chrome dev tools to search the web page's script files for a specific term; in this case, I searched for 'login'. It looks like those parameters are generated by a couple of functions buried in the script.
I see three options you can try from here:
1. Dig through the script and find the logic that creates the parameters.
2. Download the scripts and use JavaScript to generate the parameters yourself.
3. Use an automation tool like Selenium to log in through the browser (see the sketch below).
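
A minimal sketch of option 3: let a real browser execute the site's JavaScript, so the key/expires parameters are generated for you. The field locators here are assumptions; inspect the actual login form to get the right selectors:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.example.com/auth/login")

wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.NAME, "username"))).send_keys("me")
driver.find_element(By.NAME, "password").send_keys("secret")
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

# After the redirect, the session cookies can be reused, e.g. with requests
wait.until(EC.url_changes(driver.current_url))
print({c["name"]: c["value"] for c in driver.get_cookies()})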

Python: download csv files of historical data from a web that requires login

After an extensive web search, I could not figure out how to solve my problem: I want to download daily CSV files of historical data from the SunnyPortal website, which requires a login. This is the login page: https://www.sunnyportal.com/Login
After logging in, the CSV can be downloaded from the Analysis page (selected from the left-hand menu). Below the big graph there is a date picker to select a date of the year, and in the bottom right corner there is a download button; clicking it downloads the CSV for that specific date.
My aim is to download the CSV for each day (or a period) across many years. I know my effort below is still far from the objective, but I have no idea how to proceed.
import requests

s = requests.Session()
site_url = 'https://www.sunnyportal.com/Login'
s.get(site_url)   # fetch the login page (and pick up any session cookies)

# attempted login; the form field names here are guesses
s.post(site_url, data={'_username': 'myusername', '_password': 'mypassword'})

file_url = 'https://www.sunnyportal.com/FixedPages/AnalysisTool.aspx'
s.get(file_url)   # the page the CSV download lives on
When I try to log in, the parameters used look like this (you can use the F12 developer console in Chrome or Firefox to watch the POST request):
ctl00$ContentPlaceHolder1$Logincontrol1$txtUserName = my#email.com
ctl00$ContentPlaceHolder1$Logincontrol1$txtPassword = test
I first thought the parameters were hidden behind some JavaScript encoding and generated on the fly, but it seems these are the real parameters in the POST request, aside from the obscure names.
If this works, then you have to find out how the website identifies already-logged-in users. This could be a cookie, some kind of session ID in the URL, or an HTTP request header. Then you have to emulate that.
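
As a concrete sketch of that flow with requests and BeautifulSoup: ASP.NET forms usually require the hidden __VIEWSTATE/__EVENTVALIDATION fields to be posted back, so echoing all hidden inputs is an assumption to verify against the real POST in the dev tools:

import requests
from bs4 import BeautifulSoup

LOGIN_URL = "https://www.sunnyportal.com/Login"

with requests.Session() as s:
    soup = BeautifulSoup(s.get(LOGIN_URL).text, "html.parser")

    # Echo back every hidden input the server rendered into the form.
    data = {i["name"]: i.get("value", "")
            for i in soup.select("input[type=hidden]") if i.get("name")}
    data["ctl00$ContentPlaceHolder1$Logincontrol1$txtUserName"] = "my#email.com"
    data["ctl00$ContentPlaceHolder1$Logincontrol1$txtPassword"] = "test"

    r = s.post(LOGIN_URL, data=data)
    r.raise_for_status()

    # The session cookie set during login now authenticates follow-up requests.
    page = s.get("https://www.sunnyportal.com/FixedPages/AnalysisTool.aspx")
    print(page.status_code, len(page.text))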

Downloading URL To file... Not returning JSON data but Login HTML instead

I am writing a web scraping application. When I enter the URL directly into a browser, it displays the JSON data I want.
However, if I use Python's requests library, or URLDownloadToFile in C++, it simply downloads the HTML for the login page.
The site I am trying to scrape (DraftKings.com) requires a login; the other sites I scrape don't.
I am 100% sure this is related: if I paste the URL while logged out, I get the login page rather than the JSON data, and once I log in, pasting the URL again gives me the JSON data.
The thing is, even if I remain logged in in the browser, the Python script or C++ app still downloads the login HTML instead of the JSON data.
Does anyone know how I can fix this issue?
Please don't ask us to help with an activity that violates the terms of service of the site you are trying to (ab-)use:
Using automated means (including but not limited to harvesting bots, robots, parser, spiders or screen scrapers) to obtain, collect or access any information on the Website or of any User for any purpose.
Even if that kind of usage were allowed, the answer would be boring:
You'd need to implement the login functionality in your scraper.

Is it possible to use Python to read html from an open browser window? [duplicate]

Say I browse to a website (possibly on an intranet) that requires a login to access its contents. I fill in the required fields, e.g. username, password, and any captcha required for logging in, from the browser itself.
Once I have logged in to the site, there are lots of goodies that can be scraped from several links and tabs on the first page.
Now, from this point forward (that is, after logging in from the browser), I want to control the page and the downloads from urllib2: going through it page by page, downloading the PDFs and images on each page, etc.
I understand that we can use urllib2 (or mechanize) directly for everything (that is, log in to the page and do the whole thing).
But for some sites it is a real pain to work out the login mechanism, required hidden parameters, referrers, captchas, cookies, and pop-ups.
Please advise. I hope my question makes sense.
In summary, I want to do the initial login manually using the web browser, and then take over the automation for scraping through urllib2.
Did you consider Selenium? It's about browser automation rather than raw HTTP requests (urllib2), and you can manipulate the browser between steps.
You want to use the cookielib module.
http://docs.python.org/library/cookielib.html
You can log in using your browser, then export the cookies into a Netscape-style cookie.txt file. Then from Python you'll be able to load this and fetch the resource you require. The cookie will be good until the website expires your session (often around 30 days).
import cookielib, urllib2

# load the Netscape-format cookie file exported from the browser
cj = cookielib.MozillaCookieJar()
cj.load('cookie.txt')

# build an opener that sends those cookies with every request
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/resource")
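(This answer predates Python 3, where cookielib became http.cookiejar and urllib2 became urllib.request; a minimal equivalent:)

import http.cookiejar
import urllib.request

cj = http.cookiejar.MozillaCookieJar()
cj.load('cookie.txt')
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/resource")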
There are add-ons for Chrome and Firefox that will export the cookies in this format. For example:
https://chrome.google.com/webstore/detail/lopabhfecdfhgogdbojmaicoicjekelh
https://addons.mozilla.org/en-US/firefox/addon/export-cookies/
