I am trying to scrape data from this website, but I need to log in first.
Here is my code.
I don't really know how to do the login step. How can I do this?
Thanks in advance.
To log in, you can go directly to the login page and then navigate to the page you want to scrape.
You have to download chromedriver from here and specify its path in the script below.
This is how you can do it:
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'PATH')  # put your chromedriver path here
driver.get("https://secure.vietnamworks.com/login/vi?client_id=3")  # login URL
driver.find_element_by_id("email").send_keys("YOUR-EMAIL@gmail.com")  # put your email here
driver.find_element_by_id("login__password").send_keys("PASSWORD")  # put your password here
driver.find_element_by_id("button-login").click()
driver.get("https://www.vietnamworks.com/technical-specialist-scientific-instruments-lam-viec-tai-hcm-chi-tuyen-nam-tuoi-tu-26-32-chi-nhan-cv-tieng-anh-1336108-jv/?source=searchResults&searchType=2&placement=1336109&sortBy=date")  # the page you need to scrape
And then you can get the data from the web page.
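For example, once the job page has loaded, a minimal sketch of pulling data out (the h1 tag is an assumption; inspect the page for the real selector):
html = driver.page_source  # full HTML of the logged-in page, e.g. to feed a parser
title = driver.find_element_by_tag_name("h1").text  # assumed: the job title sits in an <h1>
print(title)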
I don't think you need to start from the main page and navigate through it. You can just open the link https://www.vietnamworks.com/dang-nhap?from=home-v2&type=login and then enter your credentials on the page that loads.
After the page loads, you can use:
driver.find_element_by_xpath('//*[@id="email"]').send_keys("youremail")
password_element = driver.find_element_by_xpath('//*[@id="login__password"]')
password_element.send_keys("yourpassword")
password_element.submit()
The XPath is simply the path to the element you need; .submit() is the same as pressing Enter. Note that send_keys() returns None, so keep a reference to the element if you want to call submit() on it afterwards.
I'm creating a bot to download a PDF from a website. I used Selenium to open Google Chrome, and I can open the website window and select the XPath of the first item in the grid, but the click to download the PDF never happens. I believe I'm getting the wrong XPath.
The site I'm accessing and my code are below. Could you tell me what I am doing wrong? Am I using the correct XPath? Thank you very much in advance.
This is an open government data site from my country, Brazil. For those trying to access it from outside, the IP may be blocked, but the page is the one opened in the code below.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

service = Service(ChromeDriverManager().install())
navegador = webdriver.Chrome(service=service)
try:
    navegador.get("https://www.tce.ce.gov.br/cidadao/diario-oficial-eletronico")
    time.sleep(2)
    elem = navegador.find_element(By.XPATH, '//*[@id="formUltimasEdicoes:consultaAvancadaDataTable:0:j_idt101"]/input[1]')
    elem.click()
    time.sleep(2)
finally:
    # quit() closes every window, so a separate close() call is redundant;
    # the original bare except also silently swallowed any locator errors
    navegador.quit()
I think you'll need this PDF, right?:
<a class="maximenuck " href="https://www.tce.ce.gov.br/downloads/Jurisdicionado/CALENDARIO_DAS_OBRIGACOES_ESTADUAIS_2020_N.pdf" target="_blank"><span class="titreck">Estaduais</span></a>
You'll need to locate that element by XPath, then download the PDF via its "href" value with requests.get("your_href_url").
The XPath in your source code is //*[@id="menu-principal"]/div[2]/ul/li[5]/div/div[2]/div/div[1]/ul/li[14]/div/div[2]/div/div[1]/ul/li[3]/a, but that might not always be the same.
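Putting both steps together, a minimal sketch (it reuses the XPath above, which may have changed, and the output filename is just an example):
import requests
from selenium.webdriver.common.by import By

# locate the link, read its href, then fetch the PDF outside the browser
link = navegador.find_element(By.XPATH, '//*[@id="menu-principal"]/div[2]/ul/li[5]/div/div[2]/div/div[1]/ul/li[14]/div/div[2]/div/div[1]/ul/li[3]/a')
pdf_url = link.get_attribute("href")
response = requests.get(pdf_url)
with open("calendario_estaduais_2020.pdf", "wb") as f:  # example filename
    f.write(response.content)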
I am trying to make an Instagram bot that can perform various functions. InstaPy kept timing out on me, so I decided to use Selenium, but the issue is: I can't seem to get past the first hurdle of actually logging into IG.
I am not getting any errors in the console, but it won't get me past the additional-cookies acceptance page. I have played with the XPath and made a few tweaks, but still nothing. Any ideas on a fix here?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time, urllib.request
import requests
PATH = r"/Users/PycharmProjects/pythonProject13/chromedriver"
driver = webdriver.Chrome(PATH)
driver.get('https://www.instagram.com')
#login
time.sleep(5)
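# NOTE: the XPath string on the next line is malformed -- likely why the cookie banner is never dismissed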
notnow = driver.find_element_by_xpath("/html/body/div[4]/div/div/button[2], 'Allow Essential and Optional Cookies')]").click()
username = driver.find_element_by_css_selector("input[name='username']")  # aria-label in devtools: "Phone number, username or email address"
password=driver.find_element_by_css_selector("input[name='password']")
username.clear()
password.clear()
username.send_keys("testacct1")
password.send_keys("testpassword123")
login = driver.find_element_by_css_selector("button[type='submit']").click()
One of the most common mistakes people make is writing absolute XPaths, or copying the XPath straight from the browser. Instead, write smarter XPaths using id, class, and other attributes.
I recently logged in to Instagram, and here is a simple approach:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver.get('https://www.instagram.com/')
wait = WebDriverWait(driver, 30)
wait.until(EC.visibility_of_element_located((By.XPATH, '//input[@name="username"]')))
driver.find_element_by_xpath('//input[@name="username"]').send_keys('your_login')
driver.find_element_by_xpath('//input[@type="password"]').send_keys('your_password')
driver.find_element_by_xpath('//input[@type="password"]').submit()
Once you're past the login page, you can call
driver.get('https://instagram.com/')
and it will reload to your home page.
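Applying the same idea to the cookie banner from the question, a hedged sketch (the button text varies by region, so the "Allow" substring here is an assumption):
# match the button by its visible text instead of an absolute path
cookie_btn = wait.until(EC.element_to_be_clickable(
    (By.XPATH, '//button[contains(., "Allow")]')  # assumed button text -- check your banner
))
cookie_btn.click()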
So I'm trying to learn Selenium for automated testing. I have the Selenium IDE and the WebDrivers for Firefox and Chrome, both on my PATH, on Windows. I've been able to get basic testing working, but this part is eluding me. I've switched to Python because the IDE doesn't have enough features; you can't even click the back button.
I'm pretty sure this has been answered elsewhere, but none of the recommended links provided an answer that worked for me, and I've searched Google and YouTube with no relevant results.
I'm trying to find every link on a page, which I've been able to accomplish, and even list them; I would think this would be a standard test. I even got it to print the text of each link, but when I try to click a link, it doesn't work. I've tried waits of various sorts, including visibility_of_any_elements_located and time.sleep(5), before trying to click.
I've also tried clicking the link after waiting with self.driver.find_element(By.LINK_TEXT, ("lnktxt")).click(), but that doesn't work either. The code below does work: it lists the URL text, the URL, and the URL text again via a variable.
I guess I'm not sure how to get a variable into the By.LINK_TEXT or ...by_link_text call, assuming that would work. I figured if I got the text into a variable I could use it again. That worked for print() but not for click().
I basically want to be able to load a page, list all links, click a link, go back and click the next link, etc.
The only post this site recommended that might be helpful was...
How can I test EVERY link on the WEBSITE with Selenium
But it's Java-based, and I've been learning Python for the past month, so I'm not ready to learn Java just to make this work. The IDE doesn't seem to have an easy option for this, or if it does, it's not documented well.
Here is my current Selenium code in Python.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait  # kept for the wait attempts mentioned above
from selenium.webdriver.support import expected_conditions

wait_time_out = 15

class TestPazTestAll2():
    def setup_method(self, method):
        self.driver = webdriver.Firefox()  # a single browser; the original opened a second, unused one
        self.vars = {}

    def teardown_method(self, method):
        self.driver.quit()

    def test_pazTestAll(self):
        self.driver.get('https://poetaz.com/poems/')
        lnks = self.driver.find_elements(By.TAG_NAME, "a")
        print("Total Links", len(lnks))
        # traverse the list and print each link's text and href
        for lnk in lnks:
            lnktxt = lnk.get_attribute("text")
            print(lnktxt)
            print(lnk.get_attribute("href"))
        # no quit() here -- teardown_method closes the browser
Again, I'm sure I missed something in my searches, but after hours of looking I'm reaching out.
Any help is appreciated.
I basically want to be able to load a page, list all links, click a link, go back and click the next link, etc.
I don't recommend doing this. Selenium and manipulating the browser are slow, and you're not really using the browser for anything where you'd actually need one.
What I recommend is simply sending requests to those scraped links and asserting response status codes.
import requests
from selenium.webdriver.common.by import By

link_elements = self.driver.find_elements(By.TAG_NAME, "a")
urls = map(lambda l: l.get_attribute("href"), link_elements)
for url in urls:
    response = requests.get(url)
    assert response.status_code == 200
(You also might need to prepend some base url to those strings found in href attributes.)
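For the base-URL part, urllib.parse.urljoin from the standard library handles both relative and absolute hrefs (the base here is the page from the question):
from urllib.parse import urljoin

base = "https://poetaz.com/poems/"
print(urljoin(base, "/about"))                 # -> https://poetaz.com/about
print(urljoin(base, "https://example.com/x"))  # absolute hrefs pass through unchanged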
With the following code, I am able to open a web page and retrieve its contents.
Based on this web page's contents, I would like to execute a POST on the page where I supply some form data.
How can this be done with the Selenium / chromedriver API?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome(executable_path=r"/usr/local/share/chromedriver")
url = 'https://somewebpage.com'
result = browser.get(url)
I don't think this is possible with selenium alone.
What you could do is fill in the form and click the submit button with something like this:
input_a = driver.find_element_by_id("input_a")
input_b = driver.find_element_by_id("input_b")
input_a.send_keys("some data")
input_b.send_keys("some data")
driver.find_element_by_name("submit").click()
If you really want to create the POST request yourself, look into the https://github.com/cryzed/Selenium-Requests package, which lets you create POST requests just like the Requests package, but driven through Selenium.
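A minimal sketch of that package, assuming a hypothetical endpoint and field names (install with pip install selenium-requests):
from seleniumrequests import Chrome

browser = Chrome()
# request() mirrors the requests API but reuses the browser's cookies and session
response = browser.request(
    'POST',
    'https://somewebpage.com/form-endpoint',                # hypothetical endpoint
    data={'input_a': 'some data', 'input_b': 'some data'},  # assumed field names
)
print(response.status_code)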
I am using Python 2.7 with Beautiful Soup 4 and the Selenium WebDriver. In my web-automation script, I open a link/URL and land on the home page, then click some anchor labels to navigate to other pages; that much works. When I arrive at a new page, I need to get its URL from the browser so I can pass it to Beautiful Soup 4 for scraping. How can I get these URLs dynamically?
Please advise!
Use the current_url attribute on the driver:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://www.google.com')
print(browser.current_url)
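And since the goal is to hand the page to Beautiful Soup, you can also parse the browser's rendered HTML directly without re-downloading the URL (a small sketch; "html.parser" is just the stdlib parser):
from bs4 import BeautifulSoup

# parse whatever page Selenium is currently on -- no extra HTTP request needed
soup = BeautifulSoup(browser.page_source, 'html.parser')
print(soup.title.string)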