I'm used to requests, where I can just print the response after I make a GET request. I often find myself unsure whether parts of the page are in the response or not, particularly when the website uses React or jQuery.
Is there a way I can do the same with Selenium?
Like this?
from selenium import webdriver
DRIVER_PATH = '/usr/bin/chromedriver'
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=DRIVER_PATH, options=options)
driver.get('https://example.com') # driver.get() needs a full URL, including the scheme
# Print the DOM
driver.quit()
You are looking for driver.page_source.
from selenium import webdriver
DRIVER_PATH = '/usr/bin/chromedriver'
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=DRIVER_PATH, options=options)
driver.get('https://google.com')
# Print the DOM
print(driver.page_source)
driver.quit()
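The same idea can be wrapped in a small helper so the browser is always closed, even if loading fails. This is only a sketch: `ensure_scheme` and `rendered_html` are hypothetical names, and it assumes a chromedriver that Selenium can find on its own (for example on PATH), so no driver path is passed.

```python
def ensure_scheme(url, default_scheme="https"):
    """Return url unchanged if it already has a scheme, else prepend one."""
    return url if "://" in url else f"{default_scheme}://{url}"


def rendered_html(url, headless=True):
    """Load url in Chrome and return the DOM as Selenium sees it after scripts ran."""
    # Imported here so ensure_scheme() stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
    # Assumption: chromedriver is discoverable, so no explicit path is given.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(ensure_scheme(url))
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

Calling `print(rendered_html("example.com"))` would then play the role that `print(response.text)` plays with requests.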
Related
Why does my "webdriver.Remote" not work?
from selenium import webdriver
options = webdriver.ChromeOptions()
driver = webdriver.Remote(
command_executor='http://127.0.0.1:4444/wd/hub',
options=options
)
driver.get("http://www.google.com")
driver.quit()
I tried running "webdriver.Chrome" locally and it worked:
options = webdriver.ChromeOptions()
# options.add_argument("--headless")
# options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=options)
driver.get("http://www.google.com")
I noticed that it kept printing "Starting ChromeDriver 100.0.4896.60" while running, and then I found another ChromeDriver binary in the same directory as selenium-server.jar. How stupid of me.
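One way to spot stray copies like this is to list every chromedriver that could be picked up. The sketch below uses only the standard library; `find_executables` is a hypothetical helper, and which directories the Selenium server actually searches depends on your setup, so pass any directory you suspect (such as the one holding selenium-server.jar) through `extra_dirs`.

```python
import os


def find_executables(name, extra_dirs=()):
    """List every executable file called `name` on PATH and in extra_dirs, in search order."""
    search_dirs = os.environ.get("PATH", "").split(os.pathsep) + list(extra_dirs)
    found = []
    for directory in search_dirs:
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        is_executable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_executable and candidate not in found:
            found.append(candidate)
    return found
```

For example, `find_executables("chromedriver", extra_dirs=["."])` lists the candidates (on Windows the file name would be `chromedriver.exe`). Running each hit with `--version` shows which one matches the installed Chrome.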
How do I hide the browser, perform some actions, and then show the browser in Selenium with Python?
Some code:
driver = webdriver.Chrome('./chromedriver') # connecting driver
options.add_argument('headless') # that's how I hide browser
driver = webdriver.Chrome(chrome_options=options)
driver.get("google.com")
Now I need to show the browser to the user.
You won't be able to do it with your current code, because you have started chromedriver in headless mode, and a headless browser has no user interface. Also, the URL in your example is not correct: it is missing the scheme. Try the code below.
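from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")
# Headless session: runs without a visible window
driver = webdriver.Chrome(executable_path=r" path of chromedriver.exe",chrome_options=options)
# Visible session: a new driver created without the headless option
driver = webdriver.Chrome(executable_path=r"C:\New folder\chromedriver.exe")
base = "https://www.google.com/"
driver.get(base)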
Another example
import time
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
headless_page = "https://www.google.com/"
driver.get(headless_page)
url = driver.current_url
print(url) # print headless url
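time.sleep(2)
driver.quit() # close the headless session before opening a visible one
driver = webdriver.Chrome() # new driver without the headless option, so the window is visible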
driver.get(url)
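A new driver starts with a fresh profile, so anything done in the hidden session (for example logging in) is not carried over just by reopening the URL. If that matters, one option is to copy the cookies across while both sessions are still open. This is a rough sketch: `hand_over` is a hypothetical helper, it relies only on the `current_url`, `get_cookies()`, `add_cookie()` and `get()` members of a WebDriver, and some sites will reject copied cookies or keep state elsewhere.

```python
def hand_over(hidden_driver, visible_driver):
    """Open the hidden session's current page in the visible driver, copying cookies."""
    url = hidden_driver.current_url
    cookies = hidden_driver.get_cookies()

    # Cookies can only be added for the domain that is currently loaded.
    visible_driver.get(url)
    for cookie in cookies:
        try:
            visible_driver.add_cookie(cookie)
        except Exception:
            # Cookies for other domains or with unsupported fields may be
            # rejected; this sketch simply skips them.
            continue

    # Reload so the page is requested again with the copied cookies.
    visible_driver.get(url)
    return url
```

Here the hidden driver has to stay open until the handover is done; call `hidden_driver.quit()` afterwards.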
I am having a weird issue with Python and Selenium. I am accessing the URL https://www.biggerpockets.com/users/JarridJ1. When you click "more", it shows further content. As far as I can tell, it is a React-based website. When I open it in a browser and do a View Source, I can see the required data in a React element: <div data-react-class="Profile/Header/Header" data-react-props="{". I tried to automate Firefox via Selenium, but I could not get the data that way either.
Check the screenshot:
Below is the code I tried:
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def parse(u):
    print('Processing... {}'.format(u))
    driver.get(u)
    sleep(2)
    html = driver.page_source
    driver.save_screenshot('bp.png')
    print(html)

if __name__ == '__main__':
    options = Options()
    options.add_argument("--headless") # Runs Chrome in headless mode.
    options.add_argument('--no-sandbox') # Bypass OS security model
    options.add_argument('--disable-gpu') # applicable to windows os only
    options.add_argument('start-maximized') #
    options.add_argument('disable-infobars')
    options.add_argument("--disable-extensions")
    driver = webdriver.Firefox()
    parse('https://www.biggerpockets.com/users/JarridJ1')
This is a tricky one, but I found a way to get to the element you have highlighted. I am still not sure why driver.page_source is not returning what you are looking for.
def parse(u):
    print('Processing... {}'.format(u))
    driver.get(u)
    sleep(2)
    get_everything = driver.find_elements_by_xpath("//*")
    for element in get_everything:
        print(element.get_attribute('innerHTML'))
    #html = driver.page_source
    #driver.save_screenshot('bp.png')
    #print(html)
Below is my standalone example:
from selenium import webdriver
import time
driver = webdriver.Chrome(r"C:\Path\To\chromedriver.exe") # raw string so the backslashes are kept literally
driver.get("https://www.biggerpockets.com/users/JarridJ1")
time.sleep(5)
# XPath selects attributes with @, not #
a = driver.find_element_by_xpath("//div[@data-react-class='Profile/Header/Header']")
b = a.get_attribute("data-react-props")
print(b)
c = driver.find_elements_by_xpath("//*")
for i in c:
    print(i.get_attribute('innerHTML'))
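Instead of a fixed sleep, an explicit wait can be used to block until the React element is present, and the attribute value can then be decoded. The following is a sketch, not a tested recipe for this site: `parse_react_props` and `read_react_props` are hypothetical helpers, and it assumes that `data-react-props` holds a JSON object, which the leading `{` in the page source suggests but does not prove.

```python
import json


def parse_react_props(raw):
    """Decode a data-react-props attribute value, assumed to be a JSON object."""
    return json.loads(raw)


def read_react_props(driver, react_class, timeout=10):
    """Wait for the div with the given data-react-class and return its props as a dict."""
    # Imported here so parse_react_props() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, f"//div[@data-react-class='{react_class}']")
    # Polls until the element exists in the DOM or the timeout expires.
    element = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
    return parse_react_props(element.get_attribute("data-react-props"))
```

With the driver from the example above, `read_react_props(driver, "Profile/Header/Header")` would return the props as a Python dictionary, if the assumption about JSON holds.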
I am having trouble getting this code to work. I want to scrape a website. So far I have used this:
import time # needed for time.sleep() below
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get('https://ma-mbp.ra.rockwell.com/mediabin/action/iw.ui')
time.sleep(25) # give the page time to load
print(wd.page_source) # results
The output is always the same, and I am not able to log in:
<html><head></head><body></body></html>
I cannot select any element.
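An empty body like this does not say why the page failed to render, so it can help to collect a few facts about what the browser actually loaded. The helper below is a hypothetical diagnostic sketch; it only reads `current_url`, `title` and `page_source` and runs a one-line script through `execute_script()`.

```python
def summarize_page(driver):
    """Collect basic facts about the page the driver currently holds."""
    return {
        "url": driver.current_url,    # shows redirects, e.g. to a login or error page
        "title": driver.title,
        "ready_state": driver.execute_script("return document.readyState"),
        "html_length": len(driver.page_source),
    }
```

Printing `summarize_page(wd)` after `wd.get(...)`, once in headless mode and once with a visible window, shows whether the two runs end up on the same page.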
I have the following code:
options = Options()
options = options.set_headless( headless=True)

class Sel_Driver():
    def __init__(self):
        self.driver = webdriver.Firefox(firefox_options=options)
I can then use self.driver.get(url) as part of a method to open URLs I feed in. This works: I can feed in and open the URLs, but they don't open in headless mode.
(I initially defined the driver as self.driver = webdriver.Firefox(firefox_options=Options().set_headless(headless=True)), but that didn't work, so I tried it as above.)
What am I missing? I don't understand why the driver is able to open pages, but the options aren't enabled.
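A likely explanation, assuming set_headless() changes the options object in place and returns nothing (as Python setter methods usually do): the line `options = options.set_headless(headless=True)` rebinds `options` to None, so Firefox is started with default options. The sketch below shows the pattern with a stand-in class; `StandInOptions` is hypothetical and is not part of Selenium.

```python
class StandInOptions:
    """Stand-in for an options object whose setters modify it in place."""

    def __init__(self):
        self.arguments = []

    def add_argument(self, argument):
        # No return statement, so calling this method evaluates to None.
        self.arguments.append(argument)


options = StandInOptions()
result = options.add_argument("--headless")

print(result)             # None
print(options.arguments)  # ['--headless']
```

Keeping the object and calling the setter on its own line, without assigning the result back, avoids the problem.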
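Please try the following code:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument("--headless") # modifies options in place; do not assign the result
driver = webdriver.Firefox(firefox_options=options)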
This should work for you. Try it.
Please specify the path of the driver. The example below is for Chrome; change it to Firefox as needed.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=options, executable_path="C:\\Users\\Username\\Downloads\\chromedriver.exe")
print("Chrome Headless Browser Invoked")
driver.get('https://www.facebook.com/')
# Read the class attribute of the email field to confirm the page loaded
jks = driver.find_element_by_id("email").get_attribute("class")
print(jks)
driver.quit()