I am new to Python. This is my practice code. After logging in, it should:
Step 1: Determine whether the page contains a certain keyword
Step 2: If the page contains the keyword, execute file.exe from my local machine
Step 3: Refresh the page and repeat Steps 1-3 again and again
After logging in, the page only reloads twice instead of again and again. I cannot figure out where it goes wrong.
import os
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

webdriver_path = r'C:\Python\chromedriver.exe'
options = Options()
driver = webdriver.Chrome(executable_path=webdriver_path, options=options)

driver.get("https://www.TestLogin.com") # Here is a login page
driver.find_element_by_id('_userId').send_keys('userid') # Here is the account for login
driver.find_element_by_id('_userPass').send_keys('userpass') # Here is the password for login
driver.find_element_by_id('btn-login').click()
driver.find_elements_by_id('table-orders')
time.sleep(3)
driver.refresh()

# List of keywords
import_searchs = [
    'rose',
    'tulip'
]
for i in import_searchs:
    matches = driver.find_elements_by_xpath('//*[contains(text(), "' + i + '")]')
    for item in matches:
        os.system(r'C:\file.exe')
Update: wrapping the steps in a while True loop works!
Thank you, everybody.
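For completeness, here is a minimal sketch of that fix, assuming the login code above has already run (the keyword list and file path are the ones from the question):

while True:
    for i in import_searchs:
        matches = driver.find_elements_by_xpath('//*[contains(text(), "' + i + '")]')
        if matches:
            os.system(r'C:\file.exe') # run the local program on a keyword match
    time.sleep(3)
    driver.refresh() # step 3: reload and start over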
I have a Python Selenium project that does what I want (yay!), but it opens a new browser window for every iteration. Is there any way to prevent that?
I've gone through the Selenium documentation, but it only refers to driver.get(url). The problem is most likely that the driver is created inside the for loop, but I can't seem to get the URL to change with the queries and params if it's outside of the for loop.
So, for example, I want to open these URLs:
https://www.google.com/search?q=site%3AParameter1+%22Query1%22
https://www.google.com/search?q=site%3AParameter2+%22Query1%22
https://www.google.com/search?q=site%3AParameter3+%22Query1%22
etc..
from selenium import webdriver
import time
from itertools import product

params = ['Parameter1', 'Parameter2', 'Parameter3', 'Parameter4']
queries = ['Query1', 'Query2', 'Query3', 'Query4']
for (param, query) in product(params, queries):
    url = f'https://www.google.com/search?q=site%3A{param}+%22{query}%22' # google as an example
    driver = webdriver.Chrome('G:/Python Projects/venv/Lib/site-packages/chromedriver.exe')
    driver.get(url)
    # does stuff
You are creating the Chrome driver inside the loop. Create it once and reuse it:
from itertools import product
from selenium import webdriver

params = ['Parameter1', 'Parameter2', 'Parameter3', 'Parameter4']
queries = ['Query1', 'Query2', 'Query3', 'Query4']
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
for (param, query) in product(params, queries):
    url = f'https://www.google.com/search?q=site%3A{param}+%22{query}%22'
    driver.get(url)
    # driver.close()
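Once the loop finishes, you would typically end the single session yourself; a one-line follow-up (sketch):

driver.quit() # close the browser after all URLs have been processed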
I am trying to scrape some LinkedIn profiles of well known people. The code takes a bunch of LinkedIn profile URLS and then uses Selenium and scrape_linkedin to collect the information and save it into a folder as a .json file.
The problem I am running into is that LinkedIn naturally blocks the scraper from collecting some profiles. I am always able to get the first profile in the list of URLs. I put this down to the fact that it opens a new Google Chrome window and then goes to the LinkedIn page. (I could be wrong on this point however.)
What I would like to do is add a line to the for loop that opens a new Google Chrome session and, once the scraper has collected the data, closes that session, so that the next iteration of the loop opens a fresh new Google Chrome session.
From the package website here, it states:
driver {selenium.webdriver}: driver type to use
default: selenium.webdriver.Chrome
Looking at the Selenium package website here I see:
driver = webdriver.Firefox()
...
driver.close()
So Selenium does have a close() option.
How can I add an open and close Google Chrome browser to the for loop?
I have tried alternative methods to collect the data, such as increasing the time.sleep() to 10 minutes and changing the scroll_increment and scroll_pause, but it still does not download the whole profile after the first one has been collected.
Code:
from datetime import datetime
from scrape_linkedin import ProfileScraper
import pandas as pd
import json
import os
import re
import time
my_profile_list = ['https://www.linkedin.com/in/williamhgates/', 'https://www.linkedin.com/in/christinelagarde/', 'https://www.linkedin.com/in/ursula-von-der-leyen/']
# To get LI_AT key
# Navigate to www.linkedin.com and log in
# Open browser developer tools (Ctrl-Shift-I or right click -> inspect element)
# Select the appropriate tab for your browser (Application on Chrome, Storage on Firefox)
# Click the Cookies dropdown on the left-hand menu, and select the www.linkedin.com option
# Find and copy the li_at value
myLI_AT_Key = 'INSERT LI_AT Key'
with ProfileScraper(cookie=myLI_AT_Key, scroll_increment=50, scroll_pause=0.8) as scraper:
    for link in my_profile_list:
        print('Currently scraping: ', link, 'Time: ', datetime.now())
        profile = scraper.scrape(url=link)
        dataJSON = profile.to_dict()
        profileName = re.sub('https://www.linkedin.com/in/', '', link)
        profileName = profileName.replace("?originalSubdomain=es", "")
        profileName = profileName.replace("?originalSubdomain=pe", "")
        profileName = profileName.replace("?locale=en_US", "")
        profileName = profileName.replace("?locale=es_ES", "")
        profileName = profileName.replace("?originalSubdomain=uk", "")
        profileName = profileName.replace("/", "")
        with open(os.path.join(os.getcwd(), 'ScrapedLinkedInprofiles', profileName + '.json'), 'w') as json_file:
            json.dump(dataJSON, json_file)
        time.sleep(10)

print('The first observation scraped was:', my_profile_list[0])
print('The last observation scraped was:', my_profile_list[-1])
print('END')
Here is a way to open and close tabs/browser.
from datetime import datetime
from scrape_linkedin import ProfileScraper
import random #new import made
from selenium import webdriver #new import made
import pandas as pd
import json
import os
import re
import time
my_profile_list = ['https://www.linkedin.com/in/williamhgates/', 'https://www.linkedin.com/in/christinelagarde/',
                   'https://www.linkedin.com/in/ursula-von-der-leyen/']
myLI_AT_Key = 'INSERT LI_AT Key'
for link in my_profile_list:
    # If you don't have chromedriver on the environment path, use the next line instead:
    # my_driver = webdriver.Chrome(executable_path=r"C:\path\to\chromedriver.exe")
    my_driver = webdriver.Chrome()
    # Sending our driver as the driver to be used by scrape_linkedin.
    # You can also create driver options and pass them as an argument (see the sketch below).
    # Changed name and default driver; made scroll_pause and scroll_increment a little random.
    ps = ProfileScraper(cookie=myLI_AT_Key, scroll_increment=random.randint(10, 50),
                        scroll_pause=0.8 + random.uniform(0.8, 1), driver=my_driver)
    print('Currently scraping: ', link, 'Time: ', datetime.now())
    profile = ps.scrape(url=link) # changed name
    dataJSON = profile.to_dict()
    profileName = re.sub('https://www.linkedin.com/in/', '', link)
    profileName = profileName.replace("?originalSubdomain=es", "")
    profileName = profileName.replace("?originalSubdomain=pe", "")
    profileName = profileName.replace("?locale=en_US", "")
    profileName = profileName.replace("?locale=es_ES", "")
    profileName = profileName.replace("?originalSubdomain=uk", "")
    profileName = profileName.replace("/", "")
    with open(os.path.join(os.getcwd(), 'ScrapedLinkedInprofiles', profileName + '.json'), 'w') as json_file:
        json.dump(dataJSON, json_file)
    time.sleep(10 + random.randint(0, 5)) # added randomness to the sleep time
    # This will close your browser at the end of every iteration.
    my_driver.quit()

print('The first observation scraped was:', my_profile_list[0])
print('The last observation scraped was:', my_profile_list[-1])
print('END')
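As the comment in the loop mentions, you can also build your own driver options and hand the configured driver to the scraper. A minimal sketch (the start-maximized flag is just an illustrative choice, not something the question requires):

from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("--start-maximized") # illustrative option; use whatever you need
my_driver = webdriver.Chrome(options=opts)
ps = ProfileScraper(cookie=myLI_AT_Key, driver=my_driver)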
This scraper uses Chrome as the browser by default, but it also gives you the freedom to choose which browser to use everywhere it accepts a driver, e.g. CompanyScraper, ProfileScraper, etc.
I have changed the default arguments passed when initializing the ProfileScraper() class so that your own driver opens and closes the browser rather than the default one, and I added some randomness to the wait/sleep intervals as you requested (you can tweak the random noise to your comfort).
There is no need to use scrape_in_parallel() as I had suggested in my comments, but if you want to, you can define the number of browser instances (num_instances) you want to run, along with your own dictionary of drivers, each with its own options (in another dictionary):
from scrape_linkedin import scrape_in_parallel, CompanyScraper
from selenium import webdriver
driver1 = webdriver.Chrome()
driver2 = webdriver.Chrome()
driver3 = webdriver.Chrome()
driver4 = webdriver.Chrome()
my_drivers = [driver1,driver2,driver3,driver4]
companies = ['facebook', 'google', 'amazon', 'microsoft', ...]
driver_dict = {}
for i in range(1, len(my_drivers) + 1):
    driver_dict[i] = my_drivers[i-1]
#Scrape all companies, output to 'companies.json' file, use 4 browser instances
scrape_in_parallel(
    scraper_type=CompanyScraper,
    items=companies,
    output_file="companies.json",
    num_instances=4,
    driver=driver_dict
)
It's open source, and since it's written solely in Python you can understand the source code very easily. It's quite an interesting scraper; thank you for letting me know about it!
NOTE:
There are some concerning unresolved issues in this module, as noted in its GitHub Issues tab. If this doesn't work properly, I would wait for a few more forks and updates.
I'm using the Facebook Graph API to update/edit the text of a specific post on my page:
import facebook
page_token = '...'
fb = facebook.GraphAPI(access_token = page_token, version="2.12")
page_id = '...'
post_id = '...'
fb.put_object(parent_object=page_id + '_' + post_id,
              connection_name='',
              message='new text')
Now I'm trying to add a local image (stored in the same folder as the Python script) to this post, but I don't understand how to do it properly. I tried
fb.put_object(parent_object=page_id+'_'+post_id, connection_name='', message='new text', source = open('out.png', 'rb'))
and
fb.put_object(parent_object=page_id+'_'+post_id, connection_name='', message='new text', object_attachment = open('out.png', 'rb'))
but neither of them works. Any hints?
P.S. These are the permissions of my app:
pages_show_list
pages_read_engagement
pages_read_user_content
pages_manage_posts
pages_manage_engagement
EDIT: I tried the put_photo function, but it creates a new post with the image attached, while I need to add an image to an already existing post.
https://developers.facebook.com/docs/graph-api/reference/page/photos/
https://developers.facebook.com/docs/graph-api/reference/photo/
The documentation states you cannot perform updates.
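For reference, a minimal sketch of put_photo, which (as the question's edit notes) publishes a new photo post rather than attaching an image to an existing post; page_token is the page access token from the question:

import facebook

fb = facebook.GraphAPI(access_token=page_token, version="2.12")
with open('out.png', 'rb') as image:
    fb.put_photo(image=image, message='new text') # creates a *new* post containing the photo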
I made a complete Selenium script in Python which can handle the same task:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
import time
import pyautogui as pag
option = Options()
option.add_argument("--disable-infobars")
option.add_argument("start-maximized")
option.add_argument("--disable-extensions")
# Pass the argument 1 to allow and 2 to block
option.add_experimental_option("prefs", {
    "profile.default_content_setting_values.notifications": 1
})
#########CHANGE################
EMAIL_ADDRESS="" # Email Address of your facebook account
PASSWORD="" # Password of your facebook account
POST_URL="" # URL of your post ex:- https://www.facebook.com/USERNAME/posts/ID
IMG_PATH="C:\\...\\test.jpg" # complete path of image
re_edit=True # set True when you already edited the post otherwise set it to False if you are editing the post for the first time
##########CHANGE###############
re_edit_val="/html/body/div[1]/div/div[1]/div[1]/div[3]/div/div/div[2]/div/div/div[1]/div[1]/div/div/div[1]/div/div[1]/div/div[2]"
re_edit_val2="/html/body/div[1]/div/div[1]/div[1]/div[4]/div/div/div[1]/div/div[2]/div/div/div/form/div/div[1]/div/div/div[3]/div[1]/div[2]/div/div[1]/span/div/div/div/div/div[1]"
driver = webdriver.Chrome(options=option, executable_path='chromedriver.exe')
driver.set_window_position(-10000,0) # hiding the windows
print("Accessing facebook.com")
driver.get("https://www.facebook.com/")
print("checking credentials")
driver.find_element_by_xpath("//input[#id='email']").send_keys(EMAIL_ADDRESS)
driver.find_element_by_xpath("//input[#id='pass']").send_keys(PASSWORD)
driver.find_element_by_xpath("//input[#id='pass']").send_keys(Keys.RETURN)
print("credentials checked successfully")
time.sleep(1)
print("Accessing the URL of the post")
driver.get(POST_URL)
time.sleep(0.5)
print("getting the edit access from facebook.com")
driver.find_element_by_xpath("/html/body/div[1]/div/div[1]/div[1]/div[3]/div/div/div[1]/div[1]/div/div/div/div/div/div/div/div/div/div/div/div/div/div[2]/div/div[2]/div/div[3]/div").click() # clicking three dots
time.sleep(3)
if re_edit:
    re_edit_val="/html/body/div[1]/div/div[1]/div[1]/div[3]/div/div/div[2]/div/div/div[1]/div[1]/div/div/div[1]/div/div[1]/div/div[3]"
    re_edit_val2="/html/body/div[1]/div/div[1]/div[1]/div[4]/div/div/div[1]/div/div[2]/div/div/div/form/div/div[1]/div/div/div[3]/div[1]/div[2]/div/div[1]/div/span/div/div/div/div/div[1]"
driver.find_element_by_xpath(re_edit_val).click() # clicking the edit post
time.sleep(3)
print("Uploading the given image...")
driver.find_element_by_xpath(re_edit_val2).click() # clicking Image upload
time.sleep(0.5)
pag.write(IMG_PATH) # uploading file path
pag.press('enter')
time.sleep(1.5)
print("Finishing up...")
driver.find_elements_by_xpath("/html/body/div[1]/div/div[1]/div[1]/div[4]/div/div/div[1]/div/div[2]/div/div/div/form/div/div[1]/div/div/div[3]/div[2]/div")[0].click() # clicking save button
print("Photo uploaded")
All you need to do is change the variables given above.
Note: this was working as of 9 Dec 2020.
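Since the script already imports WebDriverWait and expected_conditions but never uses them, the fixed time.sleep() calls could be replaced with explicit waits. A minimal sketch using the login field above (the 10-second timeout is an arbitrary choice):

wait = WebDriverWait(driver, 10) # wait up to 10 seconds for the element
wait.until(EC.element_to_be_clickable((By.XPATH, "//input[@id='email']"))).send_keys(EMAIL_ADDRESS)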
My purpose is to download a zip file from https://www.shareinvestor.com/prices/price_download_zip_file.zip?type=history_all&market=bursa
It is a link on this webpage: https://www.shareinvestor.com/prices/price_download.html#/?type=price_download_all_stocks_bursa. I then want to save it into the directory "/home/vinvin/shKLSE/" (I am using PythonAnywhere), unzip it, and extract the CSV file into that directory.
The code runs to the end with no error, but nothing is downloaded.
The zip file downloads automatically when https://www.shareinvestor.com/prices/price_download_zip_file.zip?type=history_all&market=bursa is clicked manually.
My code uses a working username and password; the real credentials are included so that it is easier to reproduce the problem.
#!/usr/bin/python
print "hello from python 2"
import urllib2
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from pyvirtualdisplay import Display
import requests, zipfile, os
display = Display(visible=0, size=(800, 600))
display.start()
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', "/home/vinvin/shKLSE/")
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', '/zip')
for retry in range(5):
    try:
        browser = webdriver.Firefox(profile)
        print "firefox"
        break
    except:
        time.sleep(3)
time.sleep(1)
browser.get("https://www.shareinvestor.com/my")
time.sleep(10)
login_main = browser.find_element_by_xpath("//*[@href='/user/login.html']").click()
print browser.current_url
username = browser.find_element_by_id("sic_login_header_username")
password = browser.find_element_by_id("sic_login_header_password")
print "find id done"
username.send_keys("bkcollection")
password.send_keys("123456")
print "log in done"
login_attempt = browser.find_element_by_xpath("//*[@type='submit']")
login_attempt.submit()
browser.get("https://www.shareinvestor.com/prices/price_download.html#/?type=price_download_all_stocks_bursa")
print browser.current_url
time.sleep(20)
dl = browser.find_element_by_xpath("//*[@href='/prices/price_download_zip_file.zip?type=history_all&market=bursa']").click()
time.sleep(30)
browser.close()
browser.quit()
display.stop()
zip_ref = zipfile.ZipFile('/home/vinvin/sh/KLSE', 'r')
zip_ref.extractall('/home/vinvin/sh/KLSE')
zip_ref.close()
os.remove(zip_ref)
HTML snippet:
<li><a href="/prices/price_download_zip_file.zip?type=history_all&amp;market=bursa">All Historical Data <span>About 220 MB</span></a></li>
Note that &amp; is shown when I copy the snippet; it was hidden in view-source, so I guess it is written by JavaScript.
Observations I found:
The directory /home/vinvin/shKLSE/ is not created even though the code runs with no error.
I tried to download a much smaller zip file, which should complete in a second, but it still does not download after a wait of 30s: dl = browser.find_element_by_xpath("//*[@href='/prices/price_download_zip_file.zip?type=history_daily&date=20170519&market=bursa']").click()
I don't see any major drawback in your code block as such, but here are a few recommendations, worked into the solution and the execution of this automated test script:
This code works perfectly during off-market hours. During market hours a lot of JavaScript and Ajax calls are in play, and handling those is beyond the scope of this question.
You may consider checking for the intended download directory first and, if it does not exist, creating it. The code block for this functionality below is written Windows-style and works on the Windows platform.
Once you click on "Login", induce some wait for the HTML DOM to render properly.
To see the download through, you need to set a few more preferences in the FirefoxProfile, as shown in my code below.
Always consider maximizing the browser window through browser.maximize_window().
Once the download starts, you need to wait a sufficient amount of time for the file to download completely.
If you are using browser.quit() at the end, you don't need browser.close().
You may consider replacing all the time.sleep() calls with an implicit wait, an explicit wait, or a fluent wait.
Here is your own code block with some simple tweaks in it:
#!/usr/bin/python
print "hello from python 2"
import urllib2
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from pyvirtualdisplay import Display
import requests, zipfile, os
display = Display(visible=0, size=(800, 600))
display.start()
newpath = 'C:\\home\\vinvin\\shKLSE'
if not os.path.exists(newpath):
    os.makedirs(newpath)
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.dir",newpath);
profile.set_preference("browser.download.folderList",2);
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/zip");
profile.set_preference("browser.download.manager.showWhenStarting",False);
profile.set_preference("browser.helperApps.neverAsk.openFile","application/zip");
profile.set_preference("browser.helperApps.alwaysAsk.force", False);
profile.set_preference("browser.download.manager.useWindow", False);
profile.set_preference("browser.download.manager.focusWhenStarting", False);
profile.set_preference("browser.helperApps.neverAsk.openFile", "");
profile.set_preference("browser.download.manager.alertOnEXEOpen", False);
profile.set_preference("browser.download.manager.showAlertOnComplete", False);
profile.set_preference("browser.download.manager.closeWhenDone", True);
profile.set_preference("pdfjs.disabled", True);
for retry in range(5):
    try:
        browser = webdriver.Firefox(profile)
        print "firefox"
        break
    except:
        time.sleep(3)
time.sleep(1)
browser.maximize_window()
browser.get("https://www.shareinvestor.com/my")
time.sleep(10)
login_main = browser.find_element_by_xpath("//*[@href='/user/login.html']").click()
time.sleep(10)
print browser.current_url
username = browser.find_element_by_id("sic_login_header_username")
password = browser.find_element_by_id("sic_login_header_password")
print "find id done"
username.send_keys("bkcollection")
password.send_keys("123456")
print "log in done"
login_attempt = browser.find_element_by_xpath("//*[@type='submit']")
login_attempt.submit()
browser.get("https://www.shareinvestor.com/prices/price_download.html#/?type=price_download_all_stocks_bursa")
print browser.current_url
time.sleep(20)
dl = browser.find_element_by_xpath("//*[@href='/prices/price_download_zip_file.zip?type=history_all&market=bursa']").click()
time.sleep(900)
browser.close()
browser.quit()
display.stop()
zip_ref = zipfile.ZipFile('/home/vinvin/sh/KLSE', 'r')
zip_ref.extractall('/home/vinvin/sh/KLSE')
zip_ref.close()
os.remove(zip_ref)
Let me know if this answers your question.
I rewrote your script, with comments explaining why I made the changes I made. I think your main problem might have been a bad MIME type; however, your script had a lot of systemic issues that would have made it unreliable at best. This rewrite uses explicit waits, which completely removes the need for time.sleep(), allowing it to run as fast as possible while also eliminating errors that arise from network congestion.
You will need to do the following to make sure all modules are installed:
pip install requests explicit selenium retry pyvirtualdisplay
The script:
#!/usr/bin/python
from __future__ import print_function  # Makes your code portable
import os
import glob
import zipfile
from contextlib import contextmanager

import requests
from retry import retry
from explicit import waiter, XPATH, ID
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait

DOWNLOAD_DIR = "/tmp/shKLSE/"


def build_profile():
    profile = webdriver.FirefoxProfile()
    profile.set_preference('browser.download.folderList', 2)
    profile.set_preference('browser.download.manager.showWhenStarting', False)
    profile.set_preference('browser.download.dir', DOWNLOAD_DIR)
    # I think your `/zip` mime type was incorrect. This works for me
    profile.set_preference('browser.helperApps.neverAsk.saveToDisk',
                           'application/vnd.ms-excel,application/zip')
    return profile


# Retry is an elegant way to retry the browser creation
# Though you should narrow the scope to whatever the actual exception is you are
# retrying on
@retry(Exception, tries=5, delay=3)
@contextmanager  # This turns get_browser into a context manager
def get_browser():
    # Use a context manager with Display, so it will be closed even if an
    # exception is thrown
    profile = build_profile()
    with Display(visible=0, size=(800, 600)):
        browser = webdriver.Firefox(profile)
        print("firefox")
        try:
            yield browser
        finally:
            # Let a try/finally block manage closing the browser, even if an
            # exception is raised
            browser.quit()


def main():
    print("hello from python 2")
    with get_browser() as browser:
        browser.get("https://www.shareinvestor.com/my")

        # Click the login button
        # waiter is a helper function that makes it easy to use explicit waits
        # with it you don't need to use time.sleep() calls at all
        login_xpath = '//*/div[@class="sic_logIn-bg"]/a'
        waiter.find_element(browser, login_xpath, XPATH).click()
        print(browser.current_url)

        # Log in
        username = "bkcollection"
        username_id = "sic_login_header_username"
        password = "123456"
        password_id = "sic_login_header_password"
        waiter.find_write(browser, username_id, username, by=ID)
        waiter.find_write(browser, password_id, password, by=ID, send_enter=True)

        # Wait for login process to finish by locating an element only found
        # after logging in, like the Logged In Nav
        nav_id = 'sic_loggedInNav'
        waiter.find_element(browser, nav_id, ID)
        print("log in done")

        # Load the target page
        target_url = ("https://www.shareinvestor.com/prices/price_download.html#/?"
                      "type=price_download_all_stocks_bursa")
        browser.get(target_url)
        print(browser.current_url)

        # Click download button
        all_data_xpath = ("//*[@href='/prices/price_download_zip_file.zip?"
                          "type=history_all&market=bursa']")
        waiter.find_element(browser, all_data_xpath, XPATH).click()

        # This is a bit challenging: you need to wait until the download is complete.
        # This file is 220 MB, so it takes a while to complete. This method waits until
        # there is at least one file in the dir, then waits until there are no
        # filenames that end in `.part`.
        # Note that this is problematic if there is already a file in the target dir. I
        # suggest looking into using the tempfile module to create a unique, temporary
        # directory for downloading every time you run your script
        print("Waiting for download to complete")
        at_least_1 = lambda x: len(x("{0}/*.zip*".format(DOWNLOAD_DIR))) > 0
        WebDriverWait(glob.glob, 300).until(at_least_1)

        no_parts = lambda x: len(x("{0}/*.part".format(DOWNLOAD_DIR))) == 0
        WebDriverWait(glob.glob, 300).until(no_parts)
        print("Download Done")

        # Now do whatever it is you need to do with the zip file
        # zip_ref = zipfile.ZipFile(DOWNLOAD_DIR, 'r')
        # zip_ref.extractall(DOWNLOAD_DIR)
        # zip_ref.close()
        # os.remove(zip_ref)

    print("Done!")


if __name__ == "__main__":
    main()
Full disclosure: I maintain the explicit module. It is designed to make using explicit waits much easier, for exactly these kinds of situations, where websites slowly load dynamic content based on user interactions. You could replace all of the waiter.XXX calls above with direct explicit waits.
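For example, a plain-Selenium equivalent of the first waiter call above might look like this (a sketch reusing the login_xpath defined in the script; the 30-second timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

# Wait up to 30 seconds for the login link to appear, then click it
WebDriverWait(browser, 30).until(
    EC.presence_of_element_located((By.XPATH, login_xpath))
).click()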
Take it outside the scope of Selenium. Change the preference settings so that when the link is clicked (first check that the link is valid) it gives you a pop-up asking to save; then use Sikuli (http://www.sikuli.org/) to click on the pop-up.
MIME types do not always work, and there is no black-and-white answer as to why.
The reason is that the webpage loads slowly. I added a wait of 20 seconds after opening the webpage link:
login_attempt.submit()
browser.get("https://www.shareinvestor.com/prices/price_download.html#/?type=price_download_all_stocks_bursa")
print browser.current_url
time.sleep(20)
dl = browser.find_element_by_xpath("//*[@href='/prices/price_download_zip_file.zip?type=history_all&market=bursa']").click()
It returns no error.
Additionally:
/zip is an incorrect MIME type. Change it to profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/zip')
The final correction:
#!/usr/bin/python
print "hello from python 2"
import urllib2
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from pyvirtualdisplay import Display
import requests, zipfile, os
display = Display(visible=0, size=(800, 600))
display.start()
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', "/home/vinvin/shKLSE/")
# application/zip not /zip
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/zip')
for retry in range(5):
    try:
        browser = webdriver.Firefox(profile)
        print "firefox"
        break
    except:
        time.sleep(3)
time.sleep(1)
browser.get("https://www.shareinvestor.com/my")
time.sleep(10)
login_main = browser.find_element_by_xpath("//*[@href='/user/login.html']").click()
print browser.current_url
username = browser.find_element_by_id("sic_login_header_username")
password = browser.find_element_by_id("sic_login_header_password")
print "find id done"
username.send_keys("bkcollection")
password.send_keys("123456")
print "log in done"
login_attempt = browser.find_element_by_xpath("//*[@type='submit']")
login_attempt.submit()
browser.get("https://www.shareinvestor.com/prices/price_download.html#/?type=price_download_all_stocks_bursa")
print browser.current_url
time.sleep(20)
dl = browser.find_element_by_xpath("//*[@href='/prices/price_download_zip_file.zip?type=history_all&market=bursa']").click()
time.sleep(30)
browser.close()
browser.quit()
display.stop()
zip_ref = zipfile.ZipFile('/home/vinvin/shKLSE/file.zip', 'r')
zip_ref.extractall('/home/vinvin/shKLSE')
zip_ref.close()
# remove with correct path
os.remove('/home/vinvin/shKLSE/file.zip')
I haven't tried it on the site you mentioned; however, the following code works perfectly and downloads the ZIP. If you are not able to download the zip, the MIME type could be different. You can use the Chrome browser and network inspection to check the MIME type of the file you are trying to download.
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', "/home/vinvin/shKLSE/")
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/zip')
browser = webdriver.Firefox(profile)
browser.get("http://www.colorado.edu/conflict/peace/download/peace.zip")
from selenium import webdriver
import random
url = "https://www.youtube.com/"
list_of_drivers = [webdriver.Firefox(), webdriver.Chrome(), webdriver.Edge()]
Driver = random.choice(list_of_drivers)
Driver.get(url)
I'm trying to pick a random webdriver from a list using Selenium.
It does a good job of picking a random webdriver and opening the URL; however, it also opens the other webdrivers with a blank page.
How do I stop this from happening?
I am running Python 2.7 in a virtualenv.
list_of_drivers = [webdriver.Firefox(), webdriver.Chrome(), webdriver.Edge()]
You already create three instances on this line; that's why all three browsers show up with a blank page at the very beginning.
Driver = random.choice(list_of_drivers)
Driver.get(url)
And then you randomly choose one to open a webpage, leaving the rest doing nothing.
Instead of creating three instances, just create one:
list_of_drivers = ['Firefox', 'Chrome', 'Edge']
Driver = getattr(webdriver, random.choice(list_of_drivers))()
Driver.get(url)
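An equivalent sketch that avoids getattr by storing the classes themselves (not instances) in the list:

list_of_drivers = [webdriver.Firefox, webdriver.Chrome, webdriver.Edge] # classes, not instances
Driver = random.choice(list_of_drivers)() # only the chosen class is instantiated
Driver.get(url)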