Selenium + Chrome + Python

I want to save a webpage. This looks simple. I used the code below; it opens the browser, but the page is not saved.
Why?
And when this does work, where will the file be saved?
Thanks
Details:
Chrome 68.0.3440.106 - 64 bits
ChromeDriver 2.41
Code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\Selenium\chromedriver.exe")
browser.get('https://automatetheboringstuff.com')
ActionChains(browser).key_down(Keys.CONTROL).send_keys('s').key_up(Keys.CONTROL).perform()

If you are looking to save the HTML of a page, you can get it from the page source:
html = browser.page_source
and if you want to write this to a file, you can do this:
with open('some_file_name.html', 'w', encoding='utf-8') as html_file:
    html_file.write(html)

Related

Get full data from HTML page python

I am trying to download thousands of HTML pages in order to parse them. I tried it with Selenium, but the downloaded file does not contain all the text visible in the browser.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
chrome_options.add_argument("--headless")
browser = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)

for url in URL_list:
    browser.get(url)
    content = browser.page_source
    with open(DOWNLOAD_PATH + file_name + ".html", "w", encoding='utf-8') as file:
        file.write(str(content))
browser.close()
But the HTML file I get doesn't contain all the content I see in the browser on the same page. For example, text I see on the screen is not found in the HTML file. Only when I right-click the page in the browser and choose "Save As" do I get the full page.
URL example - https://www.camoni.co.il/411788/1Jacob
Thank you
Be aware that running the webdriver in headless mode may not produce the same results. For a quick fix, I suggest scraping the page source without the --headless option.
Another approach is to wait for certain elements to be located before reading the source. I suggest reading up on expected conditions and waits for that.
Here's a function I prepared to illustrate:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def await_certain_elements_and_get_source():
    wait = WebDriverWait(driver, 5)
    # wait until the elements that are crucial for you are visible
    wait.until(EC.visibility_of_element_located((By.XPATH, "//*[text() = 'some text that is crucial for you']")))
    wait.until(EC.visibility_of_element_located((By.XPATH, "//*[@id='some-id']")))
    return driver.page_source
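As a side note for the "thousands of pages" part: deriving each file name from the page URL avoids collisions when saving many files to one folder. A minimal stdlib-only sketch (the helper name url_to_filename is my own, not part of Selenium):

```python
import re
from urllib.parse import urlparse

def url_to_filename(url):
    # Build a filesystem-safe name from the host and path of the URL,
    # replacing any run of unsafe characters with a single underscore.
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return safe + ".html"

print(url_to_filename("https://www.camoni.co.il/411788/1Jacob"))
# www.camoni.co.il_411788_1Jacob.html
```

You could then write each page to DOWNLOAD_PATH + url_to_filename(url) inside the loop instead of a fixed file_name.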

Selenium cannot find elements

I am trying to automate retrieving data from "SAP Business Client" using Python and Selenium.
Since I cannot find the element I want even though I am sure the locator is correct, I printed out the HTML content with the following code:
from time import sleep
from selenium import webdriver
from selenium.webdriver.edge.service import Service
from selenium.webdriver.edge.options import Options
from bs4 import BeautifulSoup as soup
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

EDGE_PATH = r"C:\Users\XXXXXX\Desktop\WPy64-3940\edgedriver_win64\msedgedriver"
service = Service(executable_path=EDGE_PATH)
options = Options()
options.use_chromium = True
options.add_argument("headless")
options.add_argument("disable-gpu")
cc_driver = webdriver.Edge(service=service, options=options)
cc_driver.get('https://saps4.sap.XXXX.de/sap/bc/ui5_ui5/ui2/ushell/shells/abap/FioriLaunchpad.html#Z_APSuche-display')
sleep(5)
cc_html = cc_driver.page_source
cc_content = soup(cc_html, 'html.parser')
print(cc_content.prettify())
cc_driver.close()
Now I am surprised, because the printed content is different from what Firefox's "Inspect" function shows. For example, I can find the word "Nachname" in the Firefox HTML content, but no such word exists in the HTML printed by the code above.
Does anyone have an idea why the printed content is different?
Thank you for any help... Gunardi
The source you get from Selenium is the code without the JavaScript processing applied to it; you should instead fetch the rendered markup through Selenium's JavaScript interaction:
String javascript = "return arguments[0].innerHTML";
String pageSource = (String) ((JavascriptExecutor) driver)
        .executeScript(javascript, driver.findElement(By.tagName("html")));
pageSource = "<html>" + pageSource + "</html>";
System.out.println(pageSource);
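Since the question uses Python, the same idea can be sketched there too (the helper name rendered_source is my own; "tag name" is the Selenium locator strategy string equivalent to By.TAG_NAME):

```python
JS_INNER_HTML = "return arguments[0].innerHTML"

def rendered_source(driver):
    # Ask the browser for the current, JavaScript-rendered markup of <html>,
    # then wrap it back in the outer tag. `driver` is any Selenium WebDriver.
    inner = driver.execute_script(JS_INNER_HTML, driver.find_element("tag name", "html"))
    return "<html>" + inner + "</html>"
```

With a live driver you would then use print(rendered_source(cc_driver)) in place of the page_source lines above.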

How can I use Selenium (Python) to do a Google Search and then open the results of the first page in new tabs?

As the title says, I'd like to perform a Google search using Selenium and then open all results of the first page in separate tabs.
Please have a look at the code; I can't get any further (it's just my 3rd day learning Python).
Thank you for your help!!
Code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pyautogui
query = 'New Search Query'
browser = webdriver.Chrome('/Users/MYUSERNAME/Desktop/Desktop-Files/Chromedriver/chromedriver')
browser.get('http://www.google.com')
search = browser.find_element_by_name('q')
search.send_keys(query)
search.send_keys(Keys.RETURN)
element = browser.find_element_by_class_name('LC20lb')
element.click()
The reason I imported pyautogui is that I tried simulating a right click and then "open in new tab" for each result, but it was a little confusing :)
Forget about pyautogui, as what you want to do can be done in Selenium alone. The same goes for most of the other imports; you just do not need them. See if this code meets your needs.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

query = 'sins of a solar empire'  # my query about a video game
browser = webdriver.Chrome()
browser.get('http://www.google.com')
search = browser.find_element_by_name('q')
search.send_keys(query)
search.send_keys(Keys.RETURN)
links = browser.find_elements_by_class_name('r')  # I went on Google Search and found the container class for the link
for link in links:
    url = link.find_element_by_tag_name('a').get_attribute("href")  # this extracts the url of the HTML link
    browser.execute_script('''window.open("{}","_blank");'''.format(url))  # this uses JavaScript to open a new tab and load the url in it
    print(link.find_element_by_tag_name('a').get_attribute("href"))

How to download complete webpage using Python Selenium

I have to write Python code that takes a URL, opens a Chrome/Firefox browser using Selenium, and downloads it as a "Complete Webpage", meaning with the CSS assets, for example.
I know the basics of using Selenium, like:
from selenium import webdriver
ff = webdriver.Firefox()
ff.get(URL)
ff.close()
How can I perform the downloading action (like automatically pressing CTRL+S in the browser)?
You can try the following code to save the HTML page as a file:
from selenium import webdriver
ff = webdriver.Firefox()
ff.get(URL)
with open('/path/to/file.html', 'w') as f:
    f.write(ff.page_source)
ff.close()
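Note that saving page_source as above captures only the HTML, not the CSS assets the question asks about. One way to get those too is to collect the asset URLs from the saved HTML and download them separately. A stdlib-only sketch of the collection step (the AssetCollector class is my own, not part of Selenium):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AssetCollector(HTMLParser):
    # Collect stylesheet, script, and image URLs from an HTML document,
    # resolving relative references against the page URL.
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
            self.assets.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("script", "img") and "src" in attrs:
            self.assets.append(urljoin(self.base_url, attrs["src"]))

parser = AssetCollector("https://example.com/page.html")
parser.feed('<link rel="stylesheet" href="/style.css"><img src="logo.png">')
print(parser.assets)
# ['https://example.com/style.css', 'https://example.com/logo.png']
```

You would feed it ff.page_source and then fetch each collected URL with any HTTP client to sit alongside the saved HTML file.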

Python: Tab gets stuck using Selenium and BeautifulSoup

I am trying to get the source code for a couple of links using Selenium and BeautifulSoup. I open the first tab to get the source code, which works fine, but the second tab gets stuck. I think it's something with BeautifulSoup. Does anyone know why, or of an alternative to BeautifulSoup? Here is the code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

links = []
driver = webdriver.Firefox()
driver.get('about:blank')
for link in links:
    driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 'w')
    driver.get(link)
    source = str(BeautifulSoup(driver.page_source))
    driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 'w')
driver.close()
