I want to use Python Selenium to change a cookie that belongs to a different domain than the site I'm visiting, the way it appears in my regular Chrome tab (not the Selenium browser).
The site I opened: https://2captcha.com/tr/demo/hcaptcha?difficulty=difficult
The cookie belongs to a domain loaded in an iframe on that page:
newassets.hcaptcha.com
At first I tried to do something like:
import time
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://newassets.hcaptcha.com')
cookie = {"name":"hc_accessibility","value":""}
driver.add_cookie(cookie)
driver.refresh()
time.sleep(2)
driver.get("https://2captcha.com/tr/demo/hcaptcha?difficulty=difficult")
time.sleep(2)
But when I opened the site, the hc_accessibility cookie was not there (in the Selenium browser tab).
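One approach not covered in the question, assuming Selenium 4 with Chrome: `driver.add_cookie()` only accepts cookies for the page that is currently loaded, while the Chrome DevTools Protocol command `Network.setCookie` (sent through `driver.execute_cdp_cmd`) takes an explicit `domain`. The helper names below are hypothetical, and the `SameSite=None`/`Secure` attributes are an assumption based on Chrome not sending `SameSite=Lax` cookies to cross-site iframes. Chrome may still block such cookies if third-party cookies are disabled.

```python
def cdp_cookie_params(name, value, domain, path="/", secure=True, same_site="None"):
    """Build the parameter dict for the CDP Network.setCookie command.

    secure=True with same_site="None" is an assumption: Chrome only sends cookies
    to a cross-site iframe when they are marked SameSite=None and Secure.
    """
    return {
        "name": name,
        "value": value,
        "domain": domain,
        "path": path,
        "secure": secure,
        "sameSite": same_site,
    }


def set_cross_domain_cookie(driver, name, value, domain):
    """Set a cookie for `domain` without navigating to that domain first.

    Unlike driver.add_cookie(), Network.setCookie accepts an explicit domain.
    Requires a Chromium-based Selenium 4 driver, which provides execute_cdp_cmd().
    """
    params = cdp_cookie_params(name, value, domain)
    return driver.execute_cdp_cmd("Network.setCookie", params)
```

Usage, under the same assumptions: create `driver = webdriver.Chrome()`, call `set_cross_domain_cookie(driver, "hc_accessibility", value, "newassets.hcaptcha.com")`, then `driver.get(...)` the demo URL. The value is up to you (the question leaves it empty), and the real cookie may be scoped to a parent domain such as `.hcaptcha.com`, so check it in DevTools first.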
Related
My goal is to scrape information about people/users.
Here is my code. I'm trying to open the URL so I can eventually scrape data from the search results.
However, when I run the code, it shows the login page. This is where I'm currently stuck.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
productlinks=[]
test1=[]
options = Options()
# Run Chrome without a visible window
options.add_argument("--headless=new")
# Pass the options to the driver (they were created but never used before)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
url = "https://www.linkedin.com/search/results/people/?currentCompany=%5B%221252860%22%5D&geoUrn=%5B%22103644278%22%5D&keywords=sales&origin=FACETED_SEARCH&page=2"
driver.get(url)
time.sleep(3)
# find_element_by_class_name() was removed in Selenium 4; use find_element(By..., ...)
username = driver.find_element(By.CLASS_NAME, 'login-email')
username.send_keys('example123@gmail.com')
password = driver.find_element(By.CLASS_NAME, 'login-password')
password.send_keys('Password123')
log_in_button = driver.find_element(By.CLASS_NAME, 'login-submit')
log_in_button.click()
There are three ways to handle this:
1. Add login logic to your code: send the credentials with send_keys() and click the login button.
2. Disable headless mode (remove the headless option) and log in manually yourself.
3. Since LinkedIn uses cookies to validate the session, you can log in once, store the cookies somewhere, and inject them back into your session every time you launch the Selenium driver.
To get the cookies:
# Go to the correct domain
driver.get("https://www.example.com")
# Get all the cookies for this domain
cookies = driver.get_cookies()
# Store them somewhere, e.g. in a text file
Or do it manually and copy them from Chrome DevTools.
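To restore the cookies:
# Go to the correct domain first; add_cookie only accepts cookies for the current domain
driver.get("https://www.example.com")
# Add the cookies back, one dict per cookie
cookie = {'name': 'foo', 'value': 'bar'}
driver.add_cookie(cookie)
# Reload so the page is requested with the restored cookie
driver.refresh()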
Reference: LinkedIn Cookies Policy
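The save and restore steps above can be wrapped in two small helpers. This is a sketch that stores the cookies as JSON rather than a plain text file; the function names and file path are hypothetical, and it only assumes a Selenium WebDriver object with `get`, `get_cookies`, `add_cookie` and `refresh`.

```python
import json


def save_cookies(driver, path):
    """Write the cookies visible on the current page to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f)


def restore_cookies(driver, path, url):
    """Open `url` first (add_cookie only accepts cookies for the current domain),
    add every saved cookie, then reload so the server receives them.

    Returns the number of cookies added.
    """
    driver.get(url)
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
    driver.refresh()
    return len(cookies)
```

For example, log in manually once and call `save_cookies(driver, "linkedin_cookies.json")`; on later runs call `restore_cookies(driver, "linkedin_cookies.json", "https://www.linkedin.com")` before opening the search URL. Saved cookies expire like any session, so the manual login will need repeating occasionally.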
I'm using Selenium and ChromeDriver to scrape data from a website.
I need my account to stay logged in after closing the driver; for this purpose I always use the default Chrome profile.
Here you can see my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
urlpage = 'https://example.com/'
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=C:\\Users\\MyName\\AppData\\Local\\Google\\Chrome\\User Data")
driver = webdriver.Chrome(options=options)
driver.get(urlpage)
The problem is that for some websites (e.g. https://projecteuler.net/) it works and I'm still logged in during the next session, but for others (like https://www.fundraiso.ch, the one I need) it doesn't, although in the "normal" browser I'm still logged in after I close the window.
Does anyone know how to fix this problem?
EDIT:
I should add that I can't automate the login, because the website has a maximum number of logins, and if I exceed it the website will block my account.
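One possible explanation, which is an assumption and not confirmed by the question: the site's login cookie may be a session cookie (no expiry), which Chrome normally discards when the Selenium-launched browser closes. The sketch below (hypothetical helper names) lists such cookies and makes copies with an expiry, so they could be re-added with `driver.add_cookie()` and stored persistently in the profile. The server may still end the session on its side.

```python
import time


def session_cookies(cookies):
    """Return the cookies that have no "expiry" key, i.e. session cookies."""
    return [c for c in cookies if "expiry" not in c]


def make_persistent(cookies, days=30, now=None):
    """Return copies of the session cookies with an expiry `days` from `now`.

    The input dicts are not modified; `now` defaults to the current Unix time.
    """
    now = time.time() if now is None else now
    expiry = int(now + days * 24 * 60 * 60)
    return [dict(c, expiry=expiry) for c in session_cookies(cookies)]
```

For example, while still logged in on the site, run `for c in make_persistent(driver.get_cookies()): driver.add_cookie(c)` before `driver.quit()`, then check whether the next run with the same `user-data-dir` stays logged in.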
I'm implementing a TikTok crawler using Selenium and Scrapy:
start_urls = ['https://www.tiktok.com/trending']
....
def parse(self, response):
    options = webdriver.ChromeOptions()
    from fake_useragent import UserAgent
    ua = UserAgent()
    user_agent = ua.random
    options.add_argument(f'user-agent={user_agent}')
    # Chrome expects the window size as width,height
    options.add_argument('window-size=800,841')
    # chrome_options= was removed in newer Selenium releases; use options=
    driver = webdriver.Chrome(options=options)
    driver.get(response.url)
The crawler opens Chrome, but the videos don't load.
The same problem also happens with Firefox.
The same problem occurs with a simple Selenium script:
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get("https://www.tiktok.com/trending")
time.sleep(10)
driver.close()
driver = webdriver.Chrome()
driver.get("https://www.tiktok.com/trending")
time.sleep(10)
driver.close()
Did you try to navigate further within the Selenium browser window? If a 404 error appears on the following pages, here is a solution that worked for me:
I simply changed my User-Agent to "Naverbot", which is "allowed" by TikTok's robots.txt file.
After that change, all pages and videos loaded properly.
Other user agents that robots.txt allows should work too, if you want to rotate them.
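The rotation idea can be sketched with Python's standard `urllib.robotparser`: parse the robots.txt lines, keep only the candidate user agents allowed to fetch the target URL, and pick one at random. The function names are hypothetical, and you would fetch TikTok's real robots.txt yourself rather than rely on any sample. The chosen string would then go into `options.add_argument(f'user-agent={ua}')` as in the question's code.

```python
import random
import urllib.robotparser


def allowed_user_agents(robots_lines, candidates, url):
    """Return the candidate user agents that robots.txt allows to fetch `url`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return [ua for ua in candidates if parser.can_fetch(ua, url)]


def pick_user_agent(robots_lines, candidates, url, rng=random):
    """Pick one allowed user agent at random (a simple rotation)."""
    allowed = allowed_user_agents(robots_lines, candidates, url)
    if not allowed:
        raise ValueError("no candidate user agent is allowed for this URL")
    return rng.choice(allowed)
```

For example, with the made-up lines `["User-agent: Naverbot", "Allow: /", "", "User-agent: *", "Disallow: /"]`, only "Naverbot" is returned for `https://www.tiktok.com/trending`. To check the live file, read it (e.g. with `urllib.request.urlopen`) and pass `text.splitlines()`.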
You can use Internet Explorer on Windows instead of Chrome or Firefox.
Videos load in IE, but IE lays out the feed somewhat differently from Chrome and Firefox.
Reasons why your page is not loading:
Some advanced web apps check your browser history, profile data, and cache to verify that the user is genuine.
Another thing you can try is running your default browser profile within Selenium; that may help.
I'm trying to log in to http://login.live.com and stay logged in after closing the browser, using pickle and cookies.
import pickle
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://login.live.com')
# I log in here
with open("login_live.pkl", "wb") as f:
    pickle.dump(browser.get_cookies(), f)
browser.quit()
browser = webdriver.Chrome()
browser.get('https://google.com')
with open("login_live.pkl", "rb") as f:
    for cookie in pickle.load(f):
        browser.add_cookie(cookie)
browser.get('https://login.live.com')
The problem is that after navigating to live.com, I'm not logged in to my account. If I perform the same flow manually (obviously without loading cookies), I stay logged in. I can't figure out what is wrong; any help would be appreciated.
login.live.com is a redirect page, and the cookies are not associated with it. Use the page the cookies belong to, i.e. https://account.microsoft.com.
So when reloading the session, load that page first and then add the cookies:
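import pickle
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
browser = webdriver.Chrome(service=Service("./chromedriver"))
browser.get('https://login.live.com')
# Log in here before saving the cookies
with open("login_live.pkl", "wb") as f:
    pickle.dump(browser.get_cookies(), f)
browser.quit()
browser = webdriver.Chrome(service=Service("./chromedriver"))
browser.get('https://account.microsoft.com')
with open("login_live.pkl", "rb") as f:
    for cookie in pickle.load(f):
        browser.add_cookie(cookie)
# Reload so the restored cookies are sent with the request
browser.refresh()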
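Because `get_cookies()` only returns cookies visible to the page open when it is called, and `add_cookie()` only accepts cookies whose domain covers the page open when it is called, saving on one host and restoring on another can fail with an invalid-cookie-domain error. A hedged sketch that filters before adding, assuming Selenium-style cookie dicts with an optional `domain` key; the function names are hypothetical and the domain check is a simplified version of browser cookie matching.

```python
from urllib.parse import urlsplit


def cookie_matches_host(cookie, host):
    """True if the cookie's domain covers `host` (simplified matching).

    A leading dot (".microsoft.com") is ignored, so subdomains also match.
    Cookies without a "domain" key are treated as matching the current page.
    """
    domain = cookie.get("domain")
    if not domain:
        return True
    domain = domain.lstrip(".").lower()
    host = host.lower()
    return host == domain or host.endswith("." + domain)


def restore_cookies_for(driver, cookies, url):
    """Load `url`, add only the cookies whose domain covers its host, then reload.

    Returns the list of cookies that were added.
    """
    host = urlsplit(url).hostname or ""
    driver.get(url)
    added = []
    for cookie in cookies:
        if cookie_matches_host(cookie, host):
            driver.add_cookie(cookie)
            added.append(cookie)
    driver.refresh()
    return added
```

With the answer's pickle file, this would be called as `restore_cookies_for(browser, pickle.load(f), 'https://account.microsoft.com')` instead of the plain loop.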
When I view the page source after manually navigating to the site in Chrome, I can see the full HTML, but when I load the page source via Selenium, I don't get the complete page source.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome(service=Service(r"C:\Python27\Scripts\chromedriver.exe"))
driver.get('http://www.magicbricks.com/')
driver.find_element(By.ID, "buyTab").click()
time.sleep(5)
driver.find_element(By.ID, "keyword").send_keys("Navi Mumbai")
time.sleep(5)
driver.find_element(By.ID, "btnPropertySearch").click()
time.sleep(30)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content, "lxml")
# Python 3 print function (the original used the Python 2 print statement)
print(soup.prettify())
The website may be blocking or restricting Selenium's user agent. An easy test is to change the user agent and see whether that helps. More info in this question:
Change user agent for selenium driver
Quoting:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
opts.add_argument("user-agent=whatever you want")
# chrome_options= was removed in newer Selenium releases; use options=
driver = webdriver.Chrome(options=opts)
Try something like:
import time
time.sleep(5)
content = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
instead of driver.page_source.
Dynamic web pages often have to be rendered by JavaScript before their content appears.
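The fixed `time.sleep` calls in the question either wait too long or not long enough. A sketch of an explicit wait instead: poll until an element that only appears after rendering is present, then read the DOM with `execute_script`, as in the answer above. It uses only the standard library plus the driver's `execute_script`; the helper names are hypothetical, and the CSS selector is up to you (the question doesn't give one). Selenium's own `WebDriverWait` with `expected_conditions` is the usual library equivalent.

```python
import time


def wait_for(condition, timeout=30.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value; raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


def rendered_html(driver, css_selector, timeout=30.0):
    """Wait until `css_selector` matches an element, then return the current DOM."""
    wait_for(
        lambda: driver.execute_script(
            "return document.querySelector(arguments[0]) !== null;", css_selector
        ),
        timeout=timeout,
    )
    return driver.execute_script("return document.documentElement.outerHTML;")
```

After clicking the search button, `BeautifulSoup(rendered_html(driver, selector), "lxml")` would replace the 30-second sleep, where `selector` matches something in the search results.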