React locator example - python

I am trying to understand how React selectors work, following https://playwright.dev/docs/selectors#react-selectors, so I am experimenting in the Playwright sandbox. It seems the React component cannot be found:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.glassdoor.co.uk/Job/qa-engineer-jobs-SRCH_KO0,11.htm")
    page.locator("_react=q[key='1007467366491']").click()
    browser.close()
Error:
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for selector "_react=q[key='1007467366491']"
============================================================
Are there any more detailed examples of React selectors out there?

Playwright does not support filtering on the React key at the moment, but you can filter on job.id, which is part of the component's props:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.glassdoor.co.uk/Job/qa-engineer-jobs-SRCH_KO0,11.htm")
    page.locator("_react=q[job.id=1007630619432]").click()
    browser.close()
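React selectors match a component by its name and filter on its props with attribute-style brackets; nested props are addressed with a dotted path, which is why job.id works here. The component name q is presumably a minified production name, so it may change between deployments. The helper below is a hypothetical convenience function (react_selector is not part of Playwright) that assembles such a selector string, assuming the documented _react=Name[prop = value] form in which strings are quoted, numbers and booleans are written bare, and a prop name alone checks for a truthy value:

```python
import json


def react_selector(component="", props=None):
    """Build a Playwright "_react=" selector string (hypothetical helper).

    props maps a prop path (dots for nested props, e.g. "job.id") to a value.
    A value of None emits a bare truthiness check such as [enabled].
    """
    parts = [f"_react={component}"]
    for path, value in (props or {}).items():
        if value is None:
            parts.append(f"[{path}]")
        else:
            # json.dumps quotes strings and renders True/False as true/false
            parts.append(f"[{path} = {json.dumps(value)}]")
    return "".join(parts)


selector = react_selector("q", {"job.id": 1007630619432})
print(selector)  # _react=q[job.id = 1007630619432]
```

Usage with the page from the answer would be page.locator(selector).click(). Passing an empty component name yields a props-only selector such as _react=[author = "Steven King"].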


Error generating PDF (blank or format error) - Playwright Python

Context:
Playwright Version: 1.29.1
Operating System: Windows
Python version: 3.8.2
Browser: Chromium
Describe the bug
This error happens in specific situations, usually when a PDF preview page is opened, directly or indirectly.
def test():
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        # With headless=False a blank PDF is produced; with headless=True it does not work at all. Any suggestion?
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto('https://nyc3.digitaloceanspaces.com/midia/wp-content/uploads/2023/01/4tmrioel-sample.pdf')
        page.pdf(path='Test.pdf', format='A4')
test()
For example, with the code snippet above, the PDF is generated blank (the screenshots are black with no content). I noticed that this happens because the page is captured too quickly, so I used time.sleep(), since wait_for_load_state() does not help in this case.
However, the newly generated PDFs come out with the wrong formatting (the attached image shows a print; I hid the content, but the layout is the same, just without the black).
My theory is that the page is rendered like this because of the Chromium PDF viewer summary, so I tried to disable it with this code:
def test():
    from playwright.sync_api import sync_playwright
    from time import sleep
    with sync_playwright() as p:
        # browser = p.chromium.launch(headless=False)
        browser = p.chromium.launch_persistent_context(user_data_dir=r'C:\Users\pedro\AppData\Local\Temp\playwright_chromiumdev_profile-AidV4Q\Default', args=['--print-to-pdf', '--disable-extensions', '--print-to-pdf-no-header'], headless=False)
        page = browser.new_page()
        page.goto('https://nyc3.digitaloceanspaces.com/midia/wp-content/uploads/2023/01/4tmrioel-sample.pdf')
        sleep(5)
        page.pdf(path='test.pdf', format='A4')
        input()
Still, I couldn't solve the problem.
Details: I am unable to run this code in headless mode, and in headed mode Chromium appears to be automated (which makes detection easier). Does anyone have a solution to my problem?
You can't, because headless Chromium does not support navigating to a PDF document, while page.pdf() only works in headless Chromium. See the official docs: https://playwright.dev/python/docs/api/class-page#page-goto

Uploading an image to a website with Playwright

I'm trying to click the upload button to upload an image to this website: https://prnt.sc/
But there doesn't seem to be a [button] at all, so can I even click anything? Is this even possible? I'm super confused.
There's lots of documentation on how to do this with Selenium, but unfortunately not much for Playwright.
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, slow_mo=50)
    page = browser.new_page()
    page.goto("https://prnt.sc/")
    page.locator("class=uploader__browse_button").click()
I am not using page.click because there is no button (from what I can see). I still get errors using this code.
I've gone through the website's code and found:
<form action="https://prntscr.com/upload.php" method="post" id="fileupload">
Hopefully that helps.
Just use set_input_files. Here is an example:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.webkit.launch()
    page = browser.new_page()
    page.goto('https://prnt.sc/')
    # click AGREE on the privacy banner
    page.click('button[mode="primary"]')
    # set the file on the form's file input
    page.set_input_files('input[type="file"]', 'FULL_PATH_TO_FILE_HERE')
    # wait for the link that appears after the upload
    link = page.wait_for_selector('#link-textbox', state='visible').inner_text()
    print(f'file link: {link}')
    page.screenshot(path='example.png')
    browser.close()

Trouble in Accepting the Dialog Box using Playwright Python

How can I accept a dialog using Python Playwright? I have already tried the code below, but it doesn't seem to work for me. Any other solution would be appreciated. Thanks
from playwright.sync_api import sync_playwright
def handle_dialog(dialog):
    print(dialog.message)
    dialog.dismiss()
def run(playwright):
    chromium = playwright.chromium
    browser = chromium.launch()
    page = browser.new_page()
    page.on("dialog", handle_dialog)
    page.evaluate("alert('1')")
    browser.close()
with sync_playwright() as playwright:
    run(playwright)

In Playwright for Python, how do I get elements relative to ElementHandle (children, parent, grandparent, siblings)?

In playwright-python I know I can get an elementHandle using querySelector().
Example (sync):
from playwright import sync_playwright
with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch()
        page = browser.newPage()
        page.goto('https://duckduckgo.com/')
        element = page.querySelector('input[id=\"search_form_input_homepage\"]')
How do I get elements relative to this ElementHandle, i.e. handles for its parent, grandparent, siblings, and children?
Original answer:
Using querySelector() / querySelectorAll() with XPath (XML Path Language) lets you retrieve the element handle (or a collection of handles, respectively). Generally speaking, XPath can be used to navigate through elements and attributes in an XML document.
from playwright import sync_playwright
with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch(headless=False)
        page = browser.newPage()
        page.goto('https://duckduckgo.com/')
        element = page.querySelector('input[id=\"search_form_input_homepage\"]')
        parent = element.querySelector('xpath=..')
        grandparent = element.querySelector('xpath=../..')
        siblings = element.querySelectorAll('xpath=following-sibling::*')
        children = element.querySelectorAll('xpath=child::*')
        browser.close()
Update (2022-07-22):
It seems that browser.newPage() is deprecated, so in newer versions of Playwright the function is called browser.new_page() (note the different function name).
Optionally create a browser context first (and close it afterwards) and call new_page() on that context.
The way the children/parent/grandparent/siblings are accessed stays the same.
from playwright import sync_playwright
with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto('https://duckduckgo.com/')
        element = page.querySelector('input[id=\"search_form_input_homepage\"]')
        parent = element.querySelector('xpath=..')
        grandparent = element.querySelector('xpath=../..')
        siblings = element.querySelectorAll('xpath=following-sibling::*')
        children = element.querySelectorAll('xpath=child::*')
        context.close()
        browser.close()
The accepted answer uses an older version of Playwright. Use the following format for the current version:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto('https://duckduckgo.com/')
        element = page.query_selector('input[id=\"search_form_input_homepage\"]')
        parent = element.query_selector('xpath=..')
        grandparent = element.query_selector('xpath=../..')
        siblings = element.query_selector_all('xpath=following-sibling::*')
        children = element.query_selector_all('xpath=child::*')
        context.close()
        browser.close()

How to get the new open page source?

I'm writing a Python crawler using the Selenium library and the PhantomJS browser. I trigger a click event on a page to open a new page, and then read browser.page_source, but I get the original page's source instead of the newly opened page's source. How can I get the source of the newly opened page?
Here's my code:
import requests
from selenium import webdriver
url = 'https://sf.taobao.com/list/50025969__2__%D5%E3%BD%AD.htm?auction_start_seg=-1&page=150'
browser = webdriver.PhantomJS(executable_path='C:\\ProgramData\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe')
browser.get(url)
browser.find_element_by_xpath("//*[@class='pai-item pai-status-done']").click()
html = browser.page_source
print(html)
browser.quit()
You need to switch to the new window first:
browser.find_element_by_xpath("//*[@class='pai-item pai-status-done']").click()
browser.switch_to_window(browser.window_handles[-1])
html = browser.page_source
I believe you need to add a wait before getting the page source.
I've used an implicit wait in the code below.
from selenium import webdriver
url = 'https://sf.taobao.com/list/50025969__2__%D5%E3%BD%AD.htm?auction_start_seg=-1&page=150'
browser = webdriver.PhantomJS(executable_path='C:\\ProgramData\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe')
browser.get(url)
browser.find_element_by_xpath("//*[@class='pai-item pai-status-done']").click()
browser.implicitly_wait(5)
html = browser.page_source
browser.quit()
An explicit wait is better (an implicit wait only applies to find_element lookups, so it does not delay reading page_source), but it requires a condition such as EC.element_to_be_clickable((By.ID, 'someid')).
