I am using Python 2.7 with BeautifulSoup 4 and the Selenium WebDriver. My automation script opens a URL and lands on the home page. From there I click anchor links to navigate to other pages; that much works. When I arrive on a new page, I need to get the new URL from the browser so I can pass it to BeautifulSoup 4 for scraping. How can I get these URLs dynamically?
Please advise!
Use the current_url attribute on the driver:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://www.google.com')
print(browser.current_url)
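If the end goal is to feed the current page into BeautifulSoup, you don't even need to re-fetch the URL: you can hand driver.page_source straight to the parser. A minimal sketch, where the static html string below stands in for driver.page_source and the link markup is invented for illustration:

```python
from bs4 import BeautifulSoup

# In a real script this would come from the browser after navigation:
# html = browser.page_source
html = '<html><body><a href="/jobs/1">Job 1</a><a href="/jobs/2">Job 2</a></body></html>'

soup = BeautifulSoup(html, "html.parser")
links = [a["href"] for a in soup.find_all("a")]
print(links)  # -> ['/jobs/1', '/jobs/2']
```

Parsing page_source directly avoids a second HTTP request and keeps any content that JavaScript rendered after the page loaded.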
To open a page that requires HTTP authentication directly in Selenium, the code below works:
driver = webdriver.Firefox()
driver.get("https://username:password@testwebsite.com/testpage.html")
driver.implicitly_wait(30)
But the above only works when the page is opened directly in Selenium as the initial page.
In my project, I click a link that opens a page requiring this kind of authentication.
How can I handle that? I cannot simply use driver.get(...) because this page cannot be opened directly.
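One possible workaround, assuming the protected page's URL can be read from the link before clicking (e.g. via get_attribute('href')): rebuild that URL with the credentials embedded, as in the snippet above, and pass it to driver.get(). A standard-library sketch; the helper name and example URL/credentials are made up for illustration:

```python
from urllib.parse import quote, urlsplit, urlunsplit

def with_credentials(url, user, password):
    """Insert user:password@ into a URL's netloc, percent-encoding both."""
    parts = urlsplit(url)
    netloc = "%s:%s@%s" % (quote(user, safe=""), quote(password, safe=""), parts.netloc)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

url = with_credentials("https://testwebsite.com/testpage.html", "username", "p@ss")
print(url)  # -> https://username:p%40ss@testwebsite.com/testpage.html
# driver.get(url)  # then continue navigating as usual
```

Note that some modern browsers restrict credentials embedded in URLs, so whether this works depends on the browser and version you drive.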
I am trying to scrape data from this website, and I need to log in first.
Here is my code.
I don't really know how to do that.
How can I do this?
... Sincerely, thanks. <3
To log in, you can go directly to the login page and then navigate to the page you want to scrape.
You have to download chromedriver from here and specify its path in the script below.
This is how you can do it:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(executable_path=r'PATH')#PUT YOUR CHROMEDRIVER PATH HERE
driver.get("https://secure.vietnamworks.com/login/vi?client_id=3") #LOGIN URL
driver.find_element_by_id("email").send_keys("YOUR-EMAIL@gmail.com") #PUT YOUR EMAIL HERE
driver.find_element_by_id('login__password').send_keys('PASSWORD') #PUT YOUR PASSWORD HERE
driver.find_element_by_id("button-login").click()
driver.get("https://www.vietnamworks.com/technical-specialist-scientific-instruments-lam-viec-tai-hcm-chi-tuyen-nam-tuoi-tu-26-32-chi-nhan-cv-tieng-anh-1336108-jv/?source=searchResults&searchType=2&placement=1336109&sortBy=date") #THE WEB PAGE YOU NEED TO SCRAPE
And then you can get the data from the web page.
I don't think you need to start from the main page and navigate through it. You can just open the link https://www.vietnamworks.com/dang-nhap?from=home-v2&type=login and then enter your credentials on the page that loads.
After the page loads, use:
driver.find_element_by_xpath('//*[@id="email"]').send_keys("youremail")
password_element = driver.find_element_by_xpath('//*[@id="login__password"]')
password_element.send_keys("yourpassword")
password_element.submit()
The XPath is simply the path to the element you need; .submit() has the same effect as pressing Enter.
I want to scrape data from Instagram and Twitter, but I can't do it with requests alone. So I want to use a WebDriver like Selenium to access the page in a browser. How can I get the content of the page that Selenium opens?
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://example.com")
html_source = driver.page_source
You can use the driver.page_source property.
I am trying to use Python to get the inspect-element (rendered) HTML of a page that I already have open.
I tried using selenium
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("mywebsite")
html = driver.execute_script("return document.documentElement.innerHTML;")
It works and gives me the code, but it opens a new Chrome window instead of using the one that is already open.
Thanks.
I use Python Selenium with the PhantomJS driver to scrape website landing pages.
One site PhantomJS fails on is http://stance.com. It looks like the site uses AngularJS and Express.
My code is simply:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('http://www.stance.com')
The response is basically empty. Can anyone provide hints on why it fails, and what I can do to make it work?