I'm trying to click the button to upload an image to this website: https://prnt.sc/
But it seems like there isn't even a [button] element, so can I even click anything? Is this even possible? Super confused.
There's lots of documentation on how to do this with selenium, but not much for Playwright unfortunately.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, slow_mo=50)
    page = browser.new_page()
    page.goto("https://prnt.sc/")
    page.locator("class=uploader__browse_button").click()
I am not using page.click because there is no button (from what I can see).
I still get errors using this code.
I've gone through the website's code and found:
<form action="https://prntscr.com/upload.php" method="post" id="fileupload">
Hopefully that helps.
Just use set_input_files. Here is an example:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.webkit.launch()
    page = browser.new_page()
    page.goto('https://prnt.sc/')
    # click on AGREE privacy
    page.click('button[mode="primary"]')
    # set file to form field
    page.set_input_files('input[type="file"]', 'FULL_PATH_TO_FILE_HERE')
    # wait for a link after upload
    link = page.wait_for_selector('#link-textbox', state='visible').inner_text()
    print(f'file link: {link}')
    page.screenshot(path='example.png')
    browser.close()
Context:
Playwright Version: 1.29.1
Operating System: Windows
Python version: 3.8.2
Browser: Chromium
Describe the bug
This error happens in some specific situations, usually when directly or indirectly opening a pdf preview page.
def test():
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        # In this situation, with headless set to False, a blank pdf is
        # produced, but when set to True, it cannot be generated at all.
        # Any suggestion?
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto('https://nyc3.digitaloceanspaces.com/midia/wp-content/uploads/2023/01/4tmrioel-sample.pdf')
        page.pdf(path='Test.pdf', format='A4')

test()
For example, with the code snippet above, the pdf is generated empty and blank (the screenshots are black, with no content). I noticed that this error happened because of timing, so I used the sleep library, since wait_for_load_state() does not work in this case.
However, the newly generated pdfs come out with the wrong formatting (attached is an image showing a print; I hid the content, but the layout is the same without the black).
My theory is that the page renders like this because of the Chromium PDF viewer's summary pane. So I tried to disable it in this code:
def test():
    from playwright.sync_api import sync_playwright
    from time import sleep
    with sync_playwright() as p:
        # browser = p.chromium.launch(headless=False)
        browser = p.chromium.launch_persistent_context(
            user_data_dir=r'C:\Users\pedro\AppData\Local\Temp\playwright_chromiumdev_profile-AidV4Q\Default',
            args=['--print-to-pdf', '--disable-extensions', '--print-to-pdf-no-header'],
            headless=False,
        )
        page = browser.new_page()
        page.goto('https://nyc3.digitaloceanspaces.com/midia/wp-content/uploads/2023/01/4tmrioel-sample.pdf')
        sleep(5)
        page.pdf(path='test.pdf', format='A4')
        input()
Still, I couldn't solve the problem.
Details: I am unable to run this code in headless mode; Chromium appears to reveal that it is being automated (which makes detection easier). So, does anyone have a solution to my problem?
You can't do this because it is not allowed in headless mode, as you can read in the official docs: https://playwright.dev/python/docs/api/class-page#page-goto
How can I accept the dialog using Python Playwright? For your information, I have already tried this code, but it doesn't seem to work for me. Any other solution would also be appreciated. Thanks.
from playwright.sync_api import sync_playwright

def handle_dialog(dialog):
    print(dialog.message)
    dialog.dismiss()

def run(playwright):
    chromium = playwright.chromium
    browser = chromium.launch()
    page = browser.new_page()
    page.on("dialog", handle_dialog)
    page.evaluate("alert('1')")
    browser.close()

with sync_playwright() as playwright:
    run(playwright)
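Since the goal is to accept the dialog rather than dismiss it, one likely fix (a sketch, not a confirmed answer) is to call `dialog.accept()` in the handler instead of `dialog.dismiss()`; everything else in the script can stay the same:

```python
def handle_dialog(dialog):
    # Log the dialog text, then accept it instead of dismissing it.
    print(dialog.message)
    dialog.accept()

# Registration is unchanged from the question:
#   page.on("dialog", handle_dialog)
#   page.evaluate("alert('1')")
```

Note that for a plain `alert()` accepting and dismissing look the same to the page; the difference matters for `confirm()` and `prompt()` dialogs.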
I am trying to understand how the React selectors work according to https://playwright.dev/docs/selectors#react-selectors . So I am trying some things in the Playwright sandbox. It seems that the React component cannot be found.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.glassdoor.co.uk/Job/qa-engineer-jobs-SRCH_KO0,11.htm")
    page.locator("_react=q[key='1007467366491']").click()
    browser.close()
Error:
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for selector "_react=q[key='1007467366491']"
============================================================
sandbox example
Are there any more detailed examples for react out there?
Playwright does not support filtering by `key` at the moment, but you can filter on `job.id`, which is part of the props:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.glassdoor.co.uk/Job/qa-engineer-jobs-SRCH_KO0,11.htm")
    page.locator("_react=q[job.id=1007630619432]").click()
    browser.close()
I have a daily task at work to download some files from internal company website. The site requires a login. But the main url is something like:
https://abcd.com
But when I open that in the browser, it redirects to something like:
https://abcdGW/ln-eng.aspx?lang=eng&lnid=e69d5d-xxx-xxx-1111cef&regl=en-US
My task normally is to open this site, log in, click some links back and forth, and download some files. This takes me 10 minutes every day, but I want to automate it using Python. Using my basic knowledge, I have written the code below:
import http.cookiejar

import requests
import urllib3
from bs4 import BeautifulSoup

url = "https://abcd.com"
# follow the redirect to get the real login URL
redirectURL = requests.get(url).url
jar = http.cookiejar.CookieJar(policy=None)
pool = urllib3.PoolManager()  # renamed so it no longer shadows the http module
acc_pwd = {'datasouce': 'Data1', 'user': 'xxxx', 'password': 'xxxx'}
response = pool.request('GET', redirectURL)
soup = BeautifulSoup(response.data, 'html.parser')
r = requests.get(redirectURL, cookies=jar)
r = requests.post(redirectURL, cookies=jar, data=acc_pwd)
print("RData %s" % r.text)
This shows that I am able to log in successfully. The next step is where I am stuck. On the page after login there are some links on the left side, one of which I need to click. When I inspect them in Chrome, I see:
href="javascript:__doPostBack('myAppControl$menu_itm_proj11','')"><div class="menu-cell">
<img class="menu-image" src="images/LiteMenu/projects.png" style="border-width:0px;"><span class="menu-text">Projects</span> </div></a>
This is probably a JavaScript link. I need to click this, then on the new page another link, then another to download a file, and then go back to the main page and repeat all of this to download different files.
I would be grateful to anyone who can help or suggest.
Thanks to chris, I was able to complete this.
First, using the requests library, I got the redirect URL:
redirectURL = requests.get(url).url
After that I used scrapy and selenium to click the links and download the files.
By adding selenium to the browser as an add-in/plugin, it was quite simple.
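For completeness: an href of the form `javascript:__doPostBack('target','argument')` is an ASP.NET postback, which submits the page's form with the hidden `__EVENTTARGET`/`__EVENTARGUMENT` fields filled in. If you want to stay browser-free, here is a sketch (assuming you have already scraped the form's hidden inputs such as `__VIEWSTATE` from the page HTML; the names below follow the ASP.NET convention, not anything confirmed about this internal site) of building that payload with the standard library:

```python
import re

def dopostback_payload(href, hidden_fields):
    """Build the form data that the __doPostBack JavaScript would submit.

    href is e.g. "javascript:__doPostBack('myAppControl$menu_itm_proj11','')".
    hidden_fields is a dict of the form's hidden inputs (__VIEWSTATE,
    __EVENTVALIDATION, ...) scraped from the logged-in page.
    """
    m = re.match(r"javascript:__doPostBack\('([^']*)','([^']*)'\)", href)
    if m is None:
        raise ValueError("not a __doPostBack link: %s" % href)
    data = dict(hidden_fields)
    data["__EVENTTARGET"] = m.group(1)
    data["__EVENTARGUMENT"] = m.group(2)
    return data

# The payload would then be POSTed back to the same page URL with the
# session cookies, e.g. requests.post(redirectURL, data=payload, cookies=jar).
```

Whether this works depends on the site not validating anything browser-specific, which is why a real browser (Selenium/Playwright) is often the more robust route.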
I am trying to get some comments off the car blog, Jalopnik. They don't come with the web page initially; instead, the comments are retrieved with some JavaScript, and you only get the featured ones. I need all the comments, so I would click "All" (between "Featured" and "Start a New Discussion") to get them.
To automate this, I tried learning Selenium. I modified their script from PyPI, guessing the code for clicking a link was link.click() with link = browser.find_element_by_xpath(...). It doesn't look like the "All" button (displaying all comments) was pressed.
Ultimately I'd like to download the HTML of that version to parse.
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.Firefox()  # get a local session of Firefox
browser.get("http://jalopnik.com/5912009/prius-driver-beat-up-after-taking-out-two-bikers/")  # load the page
time.sleep(0.2)
# '@class', not '#class', is the XPath syntax for matching an attribute
link = browser.find_element_by_xpath("//a[@class='tc cn_showall']")
link.click()
browser.save_screenshot('screenie.png')
browser.close()
Using Firefox with the Firebug plugin, I browsed to http://jalopnik.com/5912009/prius-driver-beat-up-after-taking-out-two-bikers.
I then opened the Firebug console and clicked on ALL; it obligingly showed a single AJAX call to http://jalopnik.com/index.php?op=threadlist&post_id=5912009&mode=all&page=0&repliesmode=hide&nouser=true&selected_thread=null
Opening that url in a new window gets me the comment feed you are seeking.
More generally, if you substitute the appropriate article-ID into that url, you should be able to automate the process without Selenium.
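That substitution can be sketched as a small helper; the query parameters are copied verbatim from the AJAX call observed in Firebug, and it is an assumption that they remain valid for other articles:

```python
def comment_feed_url(post_id, page=0):
    # Rebuild the AJAX endpoint seen in Firebug with a different article ID.
    return (
        "http://jalopnik.com/index.php?op=threadlist"
        "&post_id={post_id}&mode=all&page={page}"
        "&repliesmode=hide&nouser=true&selected_thread=null"
    ).format(post_id=post_id, page=page)

# The returned URL can then be fetched with urllib.request or requests and
# the HTML parsed directly, with no Selenium involved.
```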