Webscrape LinkedIn users - Python

The goal is to scrape information from people/users.
Here is my code. I'm trying to load the search URL so I can eventually scrape data from the results.
However, when the code runs, LinkedIn shows the login page instead. This is where I'm currently stuck.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

productlinks = []
test1 = []

options = Options()
options.headless = True
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)

url = "https://www.linkedin.com/search/results/people/?currentCompany=%5B%221252860%22%5D&geoUrn=%5B%22103644278%22%5D&keywords=sales&origin=FACETED_SEARCH&page=2"
driver.get(url)
time.sleep(3)

username = driver.find_element_by_class_name('login-email')
username.send_keys('example123#gmail.com')
password = driver.find_element_by_class_name('login-password')
password.send_keys('Password123')
log_in_button = driver.find_element_by_class_name('login-submit')
log_in_button.click()

There are three methods:
1. Add login logic to your code: send the credentials with send_keys() and click the login button.
2. Disable headless mode by removing options.headless = True and log in manually yourself.
3. Since LinkedIn uses cookies to validate the session, you can log in once, store the cookies somewhere, and inject them back into your session every time you launch the Selenium driver.
For getting the cookies:
# Go to the correct domain
driver.get("https://www.example.com")
# get all the cookies from this domain
cookies = driver.get_cookies()
# store them somewhere, maybe a text file
Or do it manually and copy them from Chrome DevTools.
For restoring the cookies:
# Go to the correct domain
driver.get("https://www.example.com")
# add back the saved cookies, one dict per cookie
cookie = {'name': 'foo', 'value': 'bar'}
driver.add_cookie(cookie)
Reference: LinkedIn Cookies Policy
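Putting the two snippets above together, here is a minimal save/restore sketch; the helper names and the use of pickle are my own choices, not part of the original answer:

```python
import pickle

def save_cookies(driver, path):
    # Dump the current session's cookies to disk.
    with open(path, "wb") as f:
        pickle.dump(driver.get_cookies(), f)

def load_cookies(driver, path):
    # Re-inject saved cookies one at a time; the driver must already
    # be on the matching domain, or add_cookie() will reject them.
    with open(path, "rb") as f:
        for cookie in pickle.load(f):
            driver.add_cookie(cookie)
```

On the first run you would log in manually and call save_cookies(driver, "linkedin.pkl"); on later runs, call load_cookies(driver, "linkedin.pkl") right after driver.get("https://www.linkedin.com") and then reload the page.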

Related

Open browser with selenium without creating new instance

I am automating a form-filler using Selenium; however, the user needs to be logged in to their Google account. Selenium opens a new browser instance where the user is not logged in, and I cannot automate the login process due to two-factor authentication.
So far I've found
import webbrowser
webbrowser.open('www.google.com', new = 2)
which will open the window the way I want, with the user logged in; however, I am unable to interact with the page, unlike with Selenium. Is there a way I can get Selenium to open a window the way webbrowser does? Or is there a way to interact with the page through webbrowser? I have checked the docs of both and have not seen an answer to this.
You don't need to make the user log in to their account again. You can use the same Chrome profile you use for your regular browser. This will enable you to use all your Chrome accounts without logging in explicitly.
Here is how you can do this:
First, get the Chrome user profile path.
Mine was : C:\Users\hpoddar\AppData\Local\Google\Chrome\User Data
If you have multiple profiles you might need to get the Profile id for your chrome google account as well.
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium import webdriver
chrome_path = r"C:\Users\hpoddar\Desktop\Tools\chromedriver_win32\chromedriver.exe"
options = webdriver.ChromeOptions()
options.add_argument(r"--user-data-dir=C:\Users\hpoddar\AppData\Local\Google\Chrome\User Data")
# Specify this if multiple profiles in chrome
# options.add_argument('--profile-directory=Profile 1')
s = Service(chrome_path)
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.google.co.in")
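One caveat worth knowing (my addition, not part of the answer above): Chrome locks the profile while a normal Chrome window has it open, so Selenium can fail with a "user data directory is already in use" error. A common workaround is to copy the profile to a scratch directory first. A sketch, with the helper name my own and assuming Python 3.8+ for dirs_exist_ok:

```python
import shutil
import tempfile

def clone_profile(user_data_dir):
    # Copy the Chrome profile to a throwaway directory so Selenium
    # doesn't fight a running Chrome instance for the profile lock.
    dest = tempfile.mkdtemp(prefix="selenium-profile-")
    shutil.copytree(user_data_dir, dest, dirs_exist_ok=True)
    return dest
```

You would then pass the clone instead of the live profile, e.g. options.add_argument("--user-data-dir=" + clone_profile(r"C:\Users\hpoddar\AppData\Local\Google\Chrome\User Data")).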

Selenium Chromedriver doesn't keep me logged in in some websites

I'm using Selenium and ChromeDriver to scrape data from a website.
I need to stay logged in to my account after closing the driver; for this purpose I use the default Chrome profile every time.
Here you can see my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
urlpage = 'https://example.com/'
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=C:\\Users\\MyName\\AppData\\Local\\Google\\Chrome\\User Data")
driver = webdriver.Chrome(options=options)
driver.get(urlpage)
The problem is that for some websites (e.g. https://projecteuler.net/) it works, so I stay logged in in the following session, but for others (like https://www.fundraiso.ch, the one I need) it doesn't, even though in a normal browser I remain logged in after closing the window.
Does anyone know how to fix this problem?
EDIT:
I didn't mention that I can't automate the login because the website has a maximum login count, and if I exceed it the website will block my account.

Remain logged into account using selenium

I'm trying to log in to http://login.live.com and stay logged in after closing the browser, using pickle and cookies.
import pickle
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://login.live.com')
# i do my login here
pickle.dump(browser.get_cookies(), open("login_live.pkl", "wb"))
browser.quit()

browser = webdriver.Chrome()
browser.get('https://google.com')
for cookie in pickle.load(open("login_live.pkl", "rb")):
    browser.add_cookie(cookie)
browser.get('https://login.live.com')
The problem is that after directing to live.com, I don't remain logged in to my account, although when I perform the same flow manually (obviously without loading cookies) I do stay logged in. I can't seem to figure out what is wrong; any help would be appreciated.
login.live.com is a redirection page, and the cookies are not associated with it. Use the domain the cookies actually belong to, i.e. https://account.microsoft.com.
So when restoring the session, load that page first and then load the cookies:
import pickle
from selenium import webdriver

browser = webdriver.Chrome("./chromedriver")
browser.get('https://login.live.com')
# log in here
pickle.dump(browser.get_cookies(), open("login_live.pkl", "wb"))
browser.quit()

browser = webdriver.Chrome("./chromedriver")
browser.get('https://account.microsoft.com')
for cookie in pickle.load(open("login_live.pkl", "rb")):
    browser.add_cookie(cookie)
browser.refresh()  # reload so the injected cookies take effect
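Note that add_cookie() raises InvalidCookieDomainException for any cookie whose domain doesn't match the page the driver is currently on, so if the saved cookies span several Microsoft domains it can help to filter them first. A small sketch; the helper is my own, not from the answer:

```python
def cookies_for_host(cookies, host):
    # Keep only cookies whose domain matches the given host; a cookie
    # domain of ".live.com" matches "login.live.com" and "live.com".
    kept = []
    for c in cookies:
        domain = c.get("domain", "").lstrip(".")
        if host == domain or host.endswith("." + domain):
            kept.append(c)
    return kept
```

Usage would look like: for cookie in cookies_for_host(saved_cookies, "account.microsoft.com"): browser.add_cookie(cookie).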

Python issues, mechanize bots

I'm trying to make a bot in order to reset my router easily, so I'm using mechanize for this task.
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)
response = br.open("http://192.168.0.1/")
br.select_form(nr=0)
br.form['loginUsername']='support'
br.form['loginPassword']='71689637'
response=br.submit()
if response.read().find("wifi") != -1:
    pass  # ?????
If it finds the string 'wifi', the bot has logged in. But here is where I get stuck: the restart button is on another tab (another page; I'd guess that pointing the same browser object at the new URL should follow it without logging off). However, the control on that tab is a button, not a form.
And here's the source:
https://github.com/SharkiPy/Code-stackoverflow/blob/master/Source
Here is a starting point using Selenium with a hidden browser. You just have to add the actions you would take when browsing through your router's pages. I hope it gets you started!
import time
from selenium import webdriver
from selenium.common.exceptions import WebDriverException, NoSuchElementException, InvalidElementStateException, ElementNotInteractableException, StaleElementReferenceException, ElementNotVisibleException
from selenium.webdriver.common.keys import Keys
# There may be some unnecessary imports above
from selenium.webdriver.chrome.options import Options

options_chrome = webdriver.ChromeOptions()
options_chrome.binary_location = 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe' # path to your Chrome binary (you can also use Firefox or any other browser, but the options below will not be exactly the same)
prefs = {"profile.default_content_setting_values.notifications": 2} # disable notifications by default
options_chrome.add_experimental_option("prefs", prefs)
#### below are options for headless chrome
#options_chrome.add_argument('headless')
#options_chrome.add_argument("disable-gpu")
#options_chrome.add_argument("--start-maximized")
#options_chrome.add_argument("--no-sandbox")
#options_chrome.add_argument("--disable-setuid-sandbox")
#### You should uncomment these lines once your code is working
# start the browser:
browser = webdriver.Chrome(options=options_chrome)
# go to the router page:
browser.get("http://192.168.0.1/")
# log in
elem = browser.find_element_by_id("loginUsername")
elem.send_keys('support')
elem = browser.find_element_by_id("loginPassword")
elem.send_keys('71689637')
elem.send_keys(Keys.RETURN)
# here you need to find your button and click it
button = browser.find_element_by_id("...")  # use whatever locator works for you
button.click()
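Since the restart button's markup isn't shown here, one practical approach is to try several candidate locators and take the first one that matches. A sketch using the find_elements(by, value) call; the locator values in the usage example are hypothetical:

```python
def find_first(driver, locators):
    # Try (by, value) locators in order; return the first element found.
    for by, value in locators:
        elements = driver.find_elements(by, value)
        if elements:
            return elements[0]
    raise LookupError("no locator matched: %r" % (locators,))
```

For example: button = find_first(browser, [("id", "restart"), ("css selector", "input[value='Restart']")]), followed by button.click().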

Use and retain the information of the current login session with Selenium

I am automating certain tasks in a web browser with Selenium. When I open a webpage, say Facebook or Quora, using the webdriver, the page asks for my username and password again even though I am still logged in in my regular browser.
from selenium import webdriver
b = webdriver.Chrome()
b.get("https://www.quora.com/")
I want the webdriver to use and retain the information of the current session so that I am able to land on my profile without having to enter my username and password again. How can I achieve this? Thanks.
Edit 1: I tried pointing it to the Chrome user data directory, but it isn't working.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_option = Options()
chrome_option.add_argument('user-data-dir=~/Library/Application Support/Google/Chrome/Default')
b = webdriver.Chrome(executable_path="/Users/mymac/Downloads/chromedriver",chrome_options=chrome_option)
b.get("https://www.quora.com/")
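One likely culprit in the snippet above (my guess; the asker didn't confirm the cause): Chrome does not expand ~ itself, and user-data-dir should normally point at the profile root (.../Google/Chrome), with the profile chosen via --profile-directory. A sketch that expands the path in Python first; the helper name is mine:

```python
import os

def profile_args(user_data_dir, profile="Default"):
    # Chrome won't expand "~", so expand it in Python and pass the
    # profile root plus an explicit profile directory.
    return [
        "user-data-dir=" + os.path.expanduser(user_data_dir),
        "profile-directory=" + profile,
    ]
```

Usage would be: for arg in profile_args("~/Library/Application Support/Google/Chrome"): chrome_option.add_argument(arg).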
