I am trying to scrape the LinkedIn profiles of some well-known people. The code takes a list of LinkedIn profile URLs, then uses Selenium and scrape_linkedin to collect the information and save it to a folder as .json files.
The problem I am running into is that LinkedIn naturally blocks the scraper from collecting some profiles. I am always able to get the first profile in the list of URLs. I put this down to the fact that it opens a new Google Chrome window and then goes to the LinkedIn page. (I could be wrong on this point however.)
What I would like to do is add a line to the for loop that opens a new Google Chrome session and, once the scraper has collected the data, closes it, so that the next iteration of the loop starts with a fresh Google Chrome session.
From the package website here it states:
driver {selenium.webdriver}: driver type to use
default: selenium.webdriver.Chrome
Looking at the Selenium package website here I see:
driver = webdriver.Firefox()
...
driver.close()
So Selenium does have a close() option.
How can I add an open and close Google Chrome browser to the for loop?
I have tried other ways to collect the data, such as changing time.sleep() to 10 minutes and changing scroll_increment and scroll_pause, but it still does not download the whole profile after the first one has been collected.
Code:
from datetime import datetime
from scrape_linkedin import ProfileScraper
import pandas as pd
import json
import os
import re
import time
my_profile_list = ['https://www.linkedin.com/in/williamhgates/', 'https://www.linkedin.com/in/christinelagarde/', 'https://www.linkedin.com/in/ursula-von-der-leyen/']
# To get LI_AT key
# Navigate to www.linkedin.com and log in
# Open browser developer tools (Ctrl-Shift-I or right click -> inspect element)
# Select the appropriate tab for your browser (Application on Chrome, Storage on Firefox)
# Click the Cookies dropdown on the left-hand menu, and select the www.linkedin.com option
# Find and copy the li_at value
myLI_AT_Key = 'INSERT LI_AT Key'
with ProfileScraper(cookie=myLI_AT_Key, scroll_increment = 50, scroll_pause = 0.8) as scraper:
    for link in my_profile_list:
        print('Currently scraping: ', link, 'Time: ', datetime.now())
        profile = scraper.scrape(url=link)
        dataJSON = profile.to_dict()
        profileName = re.sub('https://www.linkedin.com/in/', '', link)
        profileName = profileName.replace("?originalSubdomain=es", "")
        profileName = profileName.replace("?originalSubdomain=pe", "")
        profileName = profileName.replace("?locale=en_US", "")
        profileName = profileName.replace("?locale=es_ES", "")
        profileName = profileName.replace("?originalSubdomain=uk", "")
        profileName = profileName.replace("/", "")
        with open(os.path.join(os.getcwd(), 'ScrapedLinkedInprofiles', profileName + '.json'), 'w') as json_file:
            json.dump(dataJSON, json_file)
        time.sleep(10)
print('The first observation scraped was:', my_profile_list[:1])
print('The last observation scraped was:', my_profile_list[-1:])
print('END')
Here is a way to open and close tabs/browser.
from datetime import datetime
from scrape_linkedin import ProfileScraper
import random #new import made
from selenium import webdriver #new import made
import pandas as pd
import json
import os
import re
import time
my_profile_list = ['https://www.linkedin.com/in/williamhgates/', 'https://www.linkedin.com/in/christinelagarde/',
'https://www.linkedin.com/in/ursula-von-der-leyen/']
myLI_AT_Key = 'INSERT LI_AT Key'
for link in my_profile_list:
    my_driver = webdriver.Chrome() #if chromedriver is not on your PATH, use the next line instead of this one
    #my_driver = webdriver.Chrome(executable_path=r"C:\path\to\chromedriver.exe")
    #pass our own driver as the driver to be used by scrape_linkedin
    #you can also create driver options and pass them as an argument
    ps = ProfileScraper(cookie=myLI_AT_Key, scroll_increment=random.randint(10,50), scroll_pause=0.8 + random.uniform(0.8,1),driver=my_driver) #changed name and default driver; scroll_pause and scroll_increment made a little random
    print('Currently scraping: ', link, 'Time: ', datetime.now())
    profile = ps.scrape(url=link) #changed name
    dataJSON = profile.to_dict()
    profileName = re.sub('https://www.linkedin.com/in/', '', link)
    profileName = profileName.replace("?originalSubdomain=es", "")
    profileName = profileName.replace("?originalSubdomain=pe", "")
    profileName = profileName.replace("?locale=en_US", "")
    profileName = profileName.replace("?locale=es_ES", "")
    profileName = profileName.replace("?originalSubdomain=uk", "")
    profileName = profileName.replace("/", "")
    with open(os.path.join(os.getcwd(), 'ScrapedLinkedInprofiles', profileName + '.json'), 'w') as json_file:
        json.dump(dataJSON, json_file)
    time.sleep(10 + random.randint(0,5)) #added randomness to the sleep time
    #this will close your browser at the end of every iteration
    my_driver.quit()
print('The first observation scraped was:', my_profile_list[:1])
print('The last observation scraped was:', my_profile_list[-1:])
print('END')
By default this scraper uses Chrome as the browser, but it also lets you choose which browser to use wherever a scraper is created (CompanyScraper, ProfileScraper, etc.).
I have only changed the default arguments passed when initializing the ProfileScraper() class, so that your own driver opens the browser and closes it instead of the default one, and added some randomness to the wait/sleep intervals as you requested (you can tweak the random noise I have added to suit your needs).
There is no need to use scrape_in_parallel() as I had suggested in my comments, but if you want to, you can define the number of browser instances (num_instances) to run, along with your own dictionary of drivers, each with its own options (in another dictionary):
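One way to make the "fresh browser per profile" idea robust is to close the driver in a finally block, so the window is closed even when a profile fails to load. The sketch below is only an illustration: scrape_each_with_fresh_driver, make_driver and scrape_one are hypothetical names, not part of scrape_linkedin or Selenium, and the pause values simply mirror the randomized sleep used above.

```python
import random
import time


def scrape_each_with_fresh_driver(links, make_driver, scrape_one,
                                  base_pause=10.0, jitter=5.0, sleep=time.sleep):
    """Scrape every link in its own browser session.

    make_driver() and scrape_one(driver, link) are hypothetical callables
    supplied by the caller; they are not part of any library.
    """
    results = {}
    for link in links:
        driver = make_driver()  # new browser for this link
        try:
            results[link] = scrape_one(driver, link)
        except Exception as exc:  # record the failure and keep going
            results[link] = exc
        finally:
            driver.quit()  # always close the browser, even after an error
        sleep(base_pause + random.uniform(0, jitter))  # randomized pause
    return results
```

Here make_driver could be webdriver.Chrome, and scrape_one would wrap whatever scraping call you already use for a single profile.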
from scrape_linkedin import scrape_in_parallel, CompanyScraper
from selenium import webdriver
driver1 = webdriver.Chrome()
driver2 = webdriver.Chrome()
driver3 = webdriver.Chrome()
driver4 = webdriver.Chrome()
my_drivers = [driver1,driver2,driver3,driver4]
companies = ['facebook', 'google', 'amazon', 'microsoft', ...]
driver_dict = {}
for i in range(1,len(my_drivers)+1):
    driver_dict[i] = my_drivers[i-1]
#Scrape all companies, output to 'companies.json' file, use 4 browser instances
scrape_in_parallel(
    scraper_type=CompanyScraper,
    items=companies,
    output_file="companies.json",
    num_instances=4,
    driver= driver_dict
)
It's open source, and since it's written solely in Python you can understand the source code very easily. It's quite an interesting scraper, thank you for letting me know about it too!
NOTE:
There are some concerning unresolved issues in this module, as listed in its GitHub Issues tab. If this doesn't work properly, I would wait for a few more forks and updates if I were you.
I have been trying to scrape an Airbnb listing page to obtain the price, without much luck. I have successfully been able to bring in the other areas of interest (home description, home location, reviews, etc.). Below is what I've tried unsuccessfully. I think my issue is that the "price" on the web page is a 'span class' whereas the others are 'div class', but I'm speculating.
The URL I'm using is: https://www.airbnb.com/rooms/52361296?category_tag=Tag%3A8173&adults=4&children=0&infants=0&check_in=2022-12-11&check_out=2022-12-18&federated_search_id=6174a078-a823-4fad-827a-7ca652b5e786&source_impression_id=p3_1645454076_foOVSAshSYvdbpbS
This can be placed as the input in the below code.
Any assistance would be greatly appreciated.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
from bs4 import BeautifulSoup
import requests
from IPython.display import IFrame
input_string = input("""Enter URLs for AirBnB sites that you want webscraped AND separate by a ',' : """)
airbnb_list = []
try:
    airbnb_list = input_string.split(",")
    x = 0
    y = len(airbnb_list)
    while y >= x:
        print(x+1 , '.) ' , airbnb_list[x])
        x=x+1
        if y == x:
            break
    #print(airbnb_list[len(airbnb_list)])
except:
    print("""Please separate list by a ','""")
a = pd.DataFrame([{"Title":'', "Stars": '', "Size":'', "Check In":'', "Check Out":'', "Rules":'',
"Location":'', "Home Type":'', "House Desc":''}])
for x in range(len(airbnb_list)):
    url = airbnb_list[x]
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    stars = soup.find(class_='_c7v1se').get_text()
    desc = soup.find(class_='_12nksyy').get_text()
    size = soup.find(class_='_jro6t0').get_text()
    #checkIn = soup.find(class_='_1acx77b').get_text()
    checkIn = soup.find(class_='_12aeg4v').get_text()
    #checkOut = soup.find(class_='_14tl4ml5').get_text()
    checkOut = soup.find(class_='_12aeg4v').get_text()
    Rules = soup.find(class_='cihcm8w dir dir-ltr').get_text()
    #location = soup.find(class_='_9ns6hl').get_text()
    location = soup.find(class_='_152qbzi').get_text()
    HomeType = soup.find(class_='_b8stb0').get_text()
    title = soup.title.string
    print('Stars: ', stars)
    print('')
    #Home Type
    print('Home Type: ', HomeType)
    print('')
    #Space Description
    print('Description: ', desc)
    print('')
    print('Rental size: ',size)
    print('')
    #CheckIn
    print('Check In: ', checkIn)
    print('')
    #CheckOut
    print('Check Out: ', checkOut)
    print('')
    #House Rules
    print('House Rules: ',Rules)
    print('')
    #print(soup.find("button", {"id":"#Id name of the button"}))
    #Home Location
    print('Home location: ', location)
    #Dates available
    #print('Dates available: ', soup.find(class_='_1yhfti2').get_text())
    print('===================================================================================')
    df = pd.DataFrame([{"Title":title, "Stars": stars, "Size":size, "Check In":checkIn, "Check Out":checkOut, "Rules":Rules,
                        "Location":location, "Home Type":HomeType, "House Desc":desc}])
    a = a.append(df)
#Attemping to print the price tag on the website
print(soup.find_all('span', {'class': '_tyxjp1'}))
print(soup.find(class_='_tyxjp1').get_text())
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-2d9689dbc836> in <module>
1 #print(soup.find_all('span', {'class': '_tyxjp1'}))
----> 2 print(soup.find(class_='_tyxjp1').get_text())
AttributeError: 'NoneType' object has no attribute 'get_text'
I see you are using the requests module to scrape airbnb.
That module is extremely versatile and works on websites that have static content.
However, it has one major drawback: it doesn't render content created by javascript.
This is a problem, as most of the websites these days create additional html elements using javascript once the user lands on the web page.
The airbnb price block is created exactly like that - using javascript.
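Because the price element is missing from the static HTML, soup.find() returns None, and calling get_text() on None is what raises the AttributeError in the traceback. A small guard makes that case explicit instead of crashing. This is a minimal sketch; text_or_default is a hypothetical helper name, not a BeautifulSoup function.

```python
def text_or_default(element, default=None):
    """Return element.get_text(), or default when the lookup found nothing.

    element is whatever soup.find(...) returned: a tag object or None.
    """
    if element is None:
        return default
    return element.get_text()
```

With the code from the question it could be called as text_or_default(soup.find(class_='_tyxjp1'), default='not in static HTML'), which reports the missing price rather than raising.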
There are many ways to scrape that kind of content.
My favourite way is to use selenium.
It's basically a library that allows you to launch a real browser and communicate with it using your programming language of choice.
Here's how you can easily use selenium.
First, set it up. Notice the headless option, which can be toggled on and off.
Toggle it off if you want to see how the browser loads the webpage.
# setup selenium (I am using chrome here, so chrome has to be installed on your system)
chromedriver_autoinstaller.install()
options = Options()
# set this to False if you want to see how the chrome window loads airbnb - useful for debugging
options.headless = True
driver = webdriver.Chrome(options=options)
Then, navigate to the website
# navigate to airbnb
driver.get(url)
Next, wait until the price block loads.
It might appear near instantaneous to us, but depending on the speed of your internet connection it might take a few seconds
# wait until the price block loads
timeout = 10
expectation = EC.presence_of_element_located((By.CSS_SELECTOR, '._tyxjp1'))
price_element = WebDriverWait(driver, timeout).until(expectation)
And finally, print the price
# print the price
print(price_element.get_attribute('innerHTML'))
I added my code to your example so you could play around with it
import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd
from bs4 import BeautifulSoup
import requests
from selenium.webdriver.common.by import By
input_string = input("""Enter URLs for AirBnB sites that you want webscraped AND separate by a ',' : """)
airbnb_list = []
try:
    airbnb_list = input_string.split(",")
    x = 0
    y = len(airbnb_list)
    while y >= x:
        print(x+1 , '.) ' , airbnb_list[x])
        x=x+1
        if y == x:
            break
    #print(airbnb_list[len(airbnb_list)])
except:
    print("""Please separate list by a ','""")
a = pd.DataFrame([{"Title":'', "Stars": '', "Size":'', "Check In":'', "Check Out":'', "Rules":'',
"Location":'', "Home Type":'', "House Desc":''}])
# setup selenium (I am using chrome here, so chrome has to be installed on your system)
chromedriver_autoinstaller.install()
options = Options()
# set this to False if you want to see how the chrome window loads airbnb - useful for debugging
options.headless = True
driver = webdriver.Chrome(options=options)
for x in range(len(airbnb_list)):
    url = airbnb_list[x]
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    # navigate to airbnb
    driver.get(url)
    # wait until the price block loads
    timeout = 10
    expectation = EC.presence_of_element_located((By.CSS_SELECTOR, '._tyxjp1'))
    price_element = WebDriverWait(driver, timeout).until(expectation)
    # print the price
    print(price_element.get_attribute('innerHTML'))
Keep in mind that your IP might eventually get banned for scraping AirBnb.
To work around that it is always a good idea to use proxy IPs and rotate them.
Follow this rotating proxies tutorial to avoid getting blocked.
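As a rough illustration of what rotating proxies means in code, the sketch below cycles through a list of proxy addresses and yields a dictionary in the shape that the proxies argument of requests expects. The addresses are placeholders from a documentation-only IP range, and proxy_rotation is a hypothetical helper name.

```python
from itertools import cycle


def proxy_rotation(proxy_urls):
    """Yield requests-style proxies dicts, cycling through proxy_urls forever."""
    for proxy in cycle(proxy_urls):
        yield {"http": proxy, "https": proxy}


# Placeholder addresses (203.0.113.0/24 is reserved for documentation)
PROXIES = ["http://203.0.113.10:8080", "http://203.0.113.11:8080"]

rotation = proxy_rotation(PROXIES)
first = next(rotation)   # uses the first proxy
second = next(rotation)  # uses the second proxy
third = next(rotation)   # wraps around to the first proxy again
```

Each request would then take the next entry, for example requests.get(url, proxies=next(rotation)). For a Selenium-driven Chrome, the proxy is instead set per browser through a --proxy-server argument on the options object.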
Hope that helps!
I got a Python Selenium project that does what I want (yay!) but for every instance it opens a new browser window. Is there any way to prevent that?
I've gone through the Selenium documentation, but it only refers to driver.get(url). It's most likely because the call is inside the for loop, but I can't seem to get the URL to change with the queries and params if it's outside the for loop.
So, for example, I want to open these URLs:
https://www.google.com/search?q=site%3AParameter1+%22Query1%22
https://www.google.com/search?q=site%3AParameter2+%22Query1%22
https://www.google.com/search?q=site%3AParameter3+%22Query1%22
etc..
from selenium import webdriver
import time
from itertools import product
params = ['Parameter1', 'Parameter2', 'Parameter3', 'Parameter4']
queries = ['Query1', 'Query2', 'Query3', 'Query4',]
for (param, query) in product(params,queries):
    url = f'https://www.google.com/search?q=site%3A{param}+%22{query}%22' # google as an example
    driver = webdriver.Chrome('G:/Python Projects/venv/Lib/site-packages/chromedriver.exe')
    driver.get(url)
    #does stuff
You are creating the Chrome driver inside the loop, so every iteration opens a new browser window. Create it once before the loop and reuse it:
from itertools import product
from selenium import webdriver
params = ['Parameter1', 'Parameter2', 'Parameter3', 'Parameter4']
queries = ['Query1', 'Query2', 'Query3', 'Query4',]
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
for (param, query) in product(params,queries):
    url = f'https://www.google.com/search?q=site%3A{param}+%22{query}%22'
    driver.get(url)
    # driver.close()
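Since the driver is created once, the only thing that changes per iteration is the URL. One option is to build all the URLs up front and let urllib.parse.quote_plus do the percent-encoding instead of writing %3A and %22 by hand. This is a sketch; build_search_urls is a hypothetical helper name.

```python
from itertools import product
from urllib.parse import quote_plus


def build_search_urls(params, queries):
    """Return one Google search URL per (param, query) combination."""
    urls = []
    for param, query in product(params, queries):
        search = f'site:{param} "{query}"'  # raw search text, not yet encoded
        urls.append('https://www.google.com/search?q=' + quote_plus(search))
    return urls
```

The loop then becomes: for url in build_search_urls(params, queries): driver.get(url), with the single driver created before the loop.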
I've been following along this guide to web scraping LinkedIn and google searches. There have been some changes in the HTML of google's search results since the guide was created so I've had to tinker with the code a bit. I'm at the point where I need to grab the links from the search results but have run into an issue where the program doesn't return anything even after implementing a code fix from this post due to an error. I'm not sure what I'm doing wrong here.
import Parameters
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from parsel import Selector
import csv
# defining new variable passing two parameters
writer = csv.writer(open(Parameters.file_name, 'w'))
# writerow() method to the write to the file object
writer.writerow(['Name', 'Job Title', 'Company', 'College', 'Location', 'URL'])
# specifies the path to the chromedriver.exe
driver = webdriver.Chrome('/Users/.../Python Scripts/chromedriver')
driver.get('https://www.linkedin.com')
sleep(0.5)
# locate email form by_class_name then send_keys() to simulate key strokes
username = driver.find_element_by_id('session_key')
username.send_keys(Parameters.linkedin_username)
sleep(0.5)
password = driver.find_element_by_id('session_password')
password.send_keys(Parameters.linkedin_password)
sleep(0.5)
sign_in_button = driver.find_element_by_class_name('sign-in-form__submit-button')
sign_in_button.click()
sleep(3)
driver.get('https://www.google.com')
sleep(3)
search_query = driver.find_element_by_name('q')
search_query.send_keys(Parameters.search_query)
sleep(0.5)
search_query.send_keys(Keys.RETURN)
sleep(3)
################# HERE IS WHERE THE ISSUE LIES ######################
#linkedin_urls = driver.find_elements_by_class_name('iUh30')
linkedin_urls = driver.find_elements_by_css_selector("yuRUbf > a")
for url_prep in linkedin_urls:
    url_prep.get_attribute('href')
#linkedin_urls = [url.text for url in linkedin_urls]
sleep(0.5)
print('Supposed to be URLs')
print(linkedin_urls)
The search parameter is
search_query = 'site:linkedin.com/in/ AND "python developer" AND "London"'
Results in an empty list:
Snippet of the HTML section I want to grab:
EDIT: This is the output if I go by .find_elements_by_class_name or by Sector97's 1st edits.
Found an alternative solution that might make it a bit easier to achieve what you're after. Credit to A.Pond at
https://stackoverflow.com/a/62050505
Use the google search api to get the links from the results.
You may need to install the library first
pip install google
You can then use the api to quickly extract an arbitrary number of links:
from googlesearch import search
links = []
query = 'site:linkedin.com/in AND "python developer" AND "London"'
for j in search(query, tld = 'com',start = 0,stop = 100,pause=4):
    links.append(j)
I got the first 100 results but you can play around with the parameters to get more or less as you need.
You can see more about this api here:
https://www.geeksforgeeks.org/performing-google-search-using-python-code/
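I think I found the error in your code. The selector "yuRUbf > a" has no leading dot, so it looks for a tag named yuRUbf rather than for elements with the class yuRUbf, and matches nothing.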
Instead of using
linkedin_urls = driver.find_elements_by_css_selector("yuRUbf > a")
Try this instead:
web_elements = driver.find_elements_by_class_name("yuRUbf")
That gets you the parent elements. You can then extract the url text using a simple list comprehension:
linkedin_urls = [elem.find_element_by_css_selector('a').get_attribute('href') for elem in web_elements]
I am new to Python.
This is my practice code.
After logging in:
1. Determine whether the page contains a certain keyword.
2. If the page contains the keyword, execute file.exe from my local machine.
3. Refresh the page and repeat the steps above, again and again.
After logging in, the page only reloads twice instead of again and again.
I cannot figure out where it goes wrong.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
webdriver_path = r'C:\Python\chromedriver.exe'
options = Options()
driver = webdriver.Chrome(executable_path=webdriver_path, options=options)
driver.get("https://www.TestLogin.com") #Here is a LoginPage
driver.find_element_by_id('_userId').send_keys('userid') #Here is account for login
driver.find_element_by_id('_userPass').send_keys('userpass') #Here is password for login
driver.find_element_by_id('btn-login').click()
driver.find_elements_by_id('table-orders')
import time
time.sleep(3)
driver.refresh()
#Array for keyword
import_searchs = [
'rose',
'tulip'
]
for i in import_searchs:
    matches = driver.find_elements_by_xpath('//*[contains(text(), "' + i + '")]')
    for item in matches:
        import os
        os.system(r'C:\file.exe')
while True works!
Thank you everybody
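For anyone reading later, the while True fix amounts to wrapping the check-and-refresh steps in a loop. The sketch below shows that shape with the browser details passed in as callables, so the loop logic can be read and tested on its own. The names watch_page, get_text, refresh and on_match are hypothetical, and max_rounds is added only so the loop can be stopped; leaving it as None behaves like a plain while True.

```python
import time


def watch_page(get_text, refresh, on_match, keywords, pause=3.0,
               max_rounds=None, sleep=time.sleep):
    """Check the page for keywords, then refresh, over and over.

    Calls on_match(keyword) for every keyword found in the page text.
    Runs forever when max_rounds is None.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        text = get_text()            # current page content
        for keyword in keywords:
            if keyword in text:
                on_match(keyword)    # e.g. launch the local program
        sleep(pause)
        refresh()                    # reload before the next check
        rounds += 1
    return rounds
```

With Selenium this could be called with get_text=lambda: driver.page_source and refresh=driver.refresh, while on_match would start file.exe.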
The below request finds the contest id's for the day. I am trying to pass that str into the driver.get url so it will go to each individual contest url and download each contests CSV. I would imagine you have to write a loop but I'm not sure what that would look like with a webdriver.
import time
from selenium import webdriver
import requests
import datetime
req = requests.get('https://www.draftkings.com/lobby/getlivecontests?sport=NBA')
data = req.json()
for ids in data:
    contest = ids['id']
driver = webdriver.Chrome() # Optional argument, if not specified will search path.
driver.get('https://www.draftkings.com/account/sitelogin/false?returnurl=%2Flobby')
time.sleep(2) # Let DK Load!
search_box = driver.find_element_by_name('username')
search_box.send_keys('username')
search_box2 = driver.find_element_by_name('password')
search_box2.send_keys('password')
submit_button = driver.find_element_by_xpath('//*[@id="react-mobile-home"]/section/section[2]/div[3]/button/span')
submit_button.click()
time.sleep(2) # Let Page Load, If not it will go to Account!
driver.get('https://www.draftkings.com/contest/exportfullstandingscsv/' + str(contest) + '')
Try it in the following order, logging in once and then looping over the contest ids:
import time
from selenium import webdriver
import requests
import datetime
req = requests.get('https://www.draftkings.com/lobby/getlivecontests?sport=NBA')
data = req.json()
driver = webdriver.Chrome() # Optional argument, if not specified will search path.
driver.get('https://www.draftkings.com/account/sitelogin/false?returnurl=%2Flobby')
time.sleep(2) # Let DK Load!
search_box = driver.find_element_by_name('username')
search_box.send_keys('username')
search_box2 = driver.find_element_by_name('password')
search_box2.send_keys('password')
submit_button = driver.find_element_by_xpath('//*[@id="react-mobile-home"]/section/section[2]/div[3]/button/span')
submit_button.click()
time.sleep(2) # Let Page Load, If not it will go to Account!
for ids in data:
    contest = ids['id']
    driver.get('https://www.draftkings.com/contest/exportfullstandingscsv/' + str(contest) + '')
You do not need to launch Selenium x times to download x files. Requests and Selenium can share cookies. This means you can log in to the site with Selenium, retrieve the login cookies, and share them with requests or any other application. Take a moment to check out httpie, https://httpie.org/doc#sessions; it seems to let you control sessions manually, as requests does.
For requests look at: http://docs.python-requests.org/en/master/user/advanced/?highlight=sessions
For selenium look at: http://selenium-python.readthedocs.io/navigating.html#cookies
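To make the cookie hand-off concrete: driver.get_cookies() returns a list of dictionaries, each with at least a name and a value, while a requests session accepts a plain name-to-value mapping. A minimal conversion could look like the sketch below. The function name selenium_cookies_to_dict and the sample cookie values are made up for illustration.

```python
def selenium_cookies_to_dict(selenium_cookies):
    """Convert the list returned by driver.get_cookies() into {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Sample data in the shape Selenium returns (values are invented)
sample = [
    {"name": "sessionid", "value": "abc123", "domain": ".example.com", "path": "/"},
    {"name": "csrftoken", "value": "xyz789", "domain": ".example.com", "path": "/"},
]
cookie_map = selenium_cookies_to_dict(sample)
```

The result can be loaded into a session with session.cookies.update(cookie_map), after which session.get(...) sends the same login cookies the browser holds.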
Looking at the Webdriver block, you can add proxies and load the browser headless or live. Just comment out the headless line and the browser loads live, which makes debugging easier because you can follow what the page does and spot changes to the site's API/HTML.
import shutil

import requests
from selenium import webdriver
from selenium.common.exceptions import WebDriverException

LOGIN = 'https://www.draftkings.com/account/sitelogin/false?returnurl=%2Flobby'
BASE_URL = 'https://www.draftkings.com/contest/exportfullstandingscsv/'
USER = ''
PASS = ''

try:
    data = requests.get('https://www.draftkings.com/lobby/getlivecontests?sport=NBA').json()
except Exception as e:
    print(e)
    raise SystemExit(1)

ids = [str(item['id']) for item in data]

# Webdriver block
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('window-size=800x600')
# options.add_argument('--proxy-server= IP:PORT')
# options.add_argument('--user-agent=' + USER_AGENT)
driver = webdriver.Chrome(options=options)

try:
    driver.get(LOGIN)
    driver.implicitly_wait(2)
except WebDriverException:
    raise SystemExit(1)


def login(user, password):
    '''
    Login to draftkings.
    Retrieve authentication/authorization.
    http://selenium-python.readthedocs.io/waits.html#implicit-waits
    http://selenium-python.readthedocs.io/api.html#module-selenium.common.exceptions
    '''
    search_box = driver.find_element_by_name('username')
    search_box.send_keys(user)
    search_box2 = driver.find_element_by_name('password')
    search_box2.send_keys(password)
    submit_button = driver.find_element_by_xpath('//*[@id="react-mobile-home"]/section/section[2]/div[3]/button/span')
    submit_button.click()
    driver.implicitly_wait(2)
    # cookies that prove we are logged in
    return driver.get_cookies()


site_cookies = login(USER, PASS)


def get_csv_files(contest_id):
    '''
    get each id and download the file.
    '''
    session = requests.Session()
    # copy the browser's login cookies into the requests session
    for cookie in site_cookies:
        session.cookies.set(cookie['name'], cookie['value'])
    try:
        response = session.get(BASE_URL + contest_id, stream=True)
        response.raw.decode_content = True  # undo gzip/deflate while copying
        with open(contest_id + '.csv', 'wb') as f:
            shutil.copyfileobj(response.raw, f)
    except requests.RequestException:
        return


# map() is lazy in Python 3, so loop explicitly
for contest_id in ids:
    get_csv_files(contest_id)
Will this help?
for ids in data:
    contest = ids['id']
    driver.get('https://www.draftkings.com/contest/exportfullstandingscsv/' + str(contest) + '')
Maybe it's time to decompose it a bit.
Create a few isolated functions:
0. (optional) Provide authorisation to the target URL.
1. Collect all the needed ids (first part of your code).
2. Export the CSV for a specific id (second part of your code).
3. Loop through the list of ids and call function #2 for each.
Pass the chromedriver as an input argument to each of them to keep the driver state and auth cookies.
It works fine and makes the code clear and readable.
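The decomposition described above could be sketched roughly as follows. The function names are hypothetical, the optional login step is left out, and the export URL is the one from the question. The driver is passed in, so one logged-in browser is reused for every contest.

```python
EXPORT_URL = 'https://www.draftkings.com/contest/exportfullstandingscsv/'


def collect_contest_ids(data):
    """Step 1: pull the contest ids out of the decoded JSON list."""
    return [str(item['id']) for item in data]


def export_csv(driver, contest_id):
    """Step 2: point the shared driver at the export URL of one contest."""
    driver.get(EXPORT_URL + contest_id)


def export_all(driver, contest_ids):
    """Step 3: loop over the ids and call step 2 for each."""
    for contest_id in contest_ids:
        export_csv(driver, contest_id)
```

Usage would be export_all(driver, collect_contest_ids(req.json())) after logging in with the same driver.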
I think you can set the URL of a contest to an a element in the landing page, and then click on it. Then repeat the step with other ID.
See my code below.
import time

import requests
from selenium import webdriver

req = requests.get('https://www.draftkings.com/lobby/getlivecontests?sport=NBA')
data = req.json()
contests = []
for ids in data:
    contests.append(ids['id'])
driver = webdriver.Chrome() # Optional argument, if not specified will search path.
driver.get('https://www.draftkings.com/account/sitelogin/false?returnurl=%2Flobby')
time.sleep(2) # Let DK Load!
search_box = driver.find_element_by_name('username')
search_box.send_keys('username')
search_box2 = driver.find_element_by_name('password')
search_box2.send_keys('password')
submit_button = driver.find_element_by_xpath('//*[@id="react-mobile-home"]/section/section[2]/div[3]/button/span')
submit_button.click()
time.sleep(2) # Let Page Load, If not it will go to Account!
for contest_id in contests:
    # reuse the first link on the page: point it at the export URL, then click it
    element = driver.find_element_by_css_selector('a')
    script1 = "arguments[0].setAttribute('download',arguments[1]);"
    driver.execute_script(script1, element, str(contest_id) + '.csv')
    script2 = "arguments[0].setAttribute('href',arguments[1]);"
    driver.execute_script(script2, element, 'https://www.draftkings.com/contest/exportfullstandingscsv/' + str(contest_id))
    time.sleep(1)
    element.click()
    time.sleep(3)