I am trying to download a Google Doc as a PDF using Selenium in Python. Unfortunately, my HTML knowledge is minimal, so I don't know which elements I need to target to have it click File and then Download as PDF. I realize that I can use the browser's developer tools to inspect the HTML, but that isn't working so well for me.
Here is what I have tried so far:
from selenium import webdriver
url = 'https://docs.google.com/document/d/1Y1n-RR5j_FQ9WFMG8E_ajO0OpWLNANRu4lQCxTw9T5g/edit?pli=1'
browser = webdriver.Firefox()
browser.get(url)
Any help would be appreciated; thanks!
As you mention in your comment, Google Drive doesn't like being scraped.
The drive command looks like the right tool for this sort of job. It'll do what you're trying to do, but not the way you want to do it. According to the docs (i.e. I haven't tested it), this command looks like it would download your file:
drive pull --export docx --id 1Y1n-RR5j_FQ9WFMG8E_ajO0OpWLNANRu4lQCxTw9T5g
(Also, in general, I find the easiest way to use Selenium is to use the Selenium IDE to tell Selenium what you want to do, then export the resulting test case by going to File > Export Test Case As... > Python 2 / unittest / Web Driver.)
Hope that helps.
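If driving the File menu proves brittle, another route is to skip the menu entirely and request the document from Google Docs' export endpoint. The sketch below is not from the answers above: it assumes the commonly used `/export?format=pdf` URL pattern, assumes the document is shared so that it can be read without signing in, and uses the hypothetical helper names `export_url` and `download`. It is written for Python 3 and uses only the standard library.

```python
import re
import urllib.request


def export_url(edit_url, fmt="pdf"):
    """Turn a Google Docs edit URL into an export URL for the given format."""
    match = re.search(r"/document/d/([^/]+)", edit_url)
    if match is None:
        raise ValueError("not a Google Docs document URL: %r" % edit_url)
    doc_id = match.group(1)
    return "https://docs.google.com/document/d/%s/export?format=%s" % (doc_id, fmt)


def download(edit_url, dest_path, fmt="pdf"):
    """Fetch the exported document and write it to dest_path."""
    # Assumption: the document is readable without logging in. A private
    # document would need the browser session's cookies as well.
    with urllib.request.urlopen(export_url(edit_url, fmt)) as response:
        with open(dest_path, "wb") as out:
            out.write(response.read())
```

Calling `download(url, "doc.pdf")` with the `url` from the question would then save the PDF next to the script, provided the sharing assumption holds.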
I have a working solution, though I don't know whether Google will update the page in a way that breaks it. It is in C#, but the Selenium functionality is basically the same in other languages.
Show all the menu items except the Download as submenu, and return the Download as WebElement. Use Selenium to click it, then select a format and return that WebElement to click as well. I couldn't perform the click using JavaScript alone because I was unable to figure out how they trigger it, but clicking through the Selenium driver worked just fine.
Make most of the menus visible and return the Download as WebElement:
document.querySelector(`#docs-file-menu`).className = 'menu-button goog-control goog-inline-block goog-control-open docs-menu-button-open-below';
document.querySelector(`#docs-file-menu`).setAttribute('aria-expanded', 'true');
document.querySelectorAll(`.goog-menu:not(.goog-menu-noaccel)`)[0].className = 'goog-menu goog-menu-vertical docs-material docs-menu-hide-mnemonics docs-menu-attached-button-above';
document.querySelectorAll(`.goog-menu:not(.goog-menu-noaccel)`)[0].setAttribute('style', 'user-select: none; visibility: visible; left: 64px; top: 64px;');
// download as
// 2 parents above
document.querySelector(`[aria-label='Download as d']`).parentElement.parentElement.className = 'goog-menuitem apps-menuitem goog-submenu goog-submenu-open goog-menuitem-highlight'
return document.querySelector(`[aria-label='Download as d']`).parentElement.parentElement;
Click download as btn:
IWebElement btn = (IWebElement)((IJavaScriptExecutor)driver).ExecuteScript(btnClickJs);
btn.Click();
Select format:
var formatCss = document.querySelectorAll(`.goog-menu.goog-menu-noaccel`)[6].querySelectorAll(`.goog-menuitem.apps-menuitem`)
var format = 'injectformathere' ? 'injectformathere' : '.html'
for (let i = 0; i < formatCss.length; i++) {
    if (formatCss[i].innerText.indexOf(format) != -1)
        return formatCss[i]
}
return null
Click format:
btn = (IWebElement)((IJavaScriptExecutor)driver).ExecuteScript(btnClickJs);
if (btn != null)
btn.Click();
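Since the original question was about Python, here is a rough Python counterpart of the two C# click steps above. It is only a sketch: `click_script_result` and `format_script` are hypothetical helper names, and the JavaScript passed in is assumed to be the snippets shown above, each ending in a `return` of the element to click. `execute_script` is the Python binding's equivalent of `ExecuteScript` and returns a WebElement when the script returns a DOM node.

```python
def format_script(template, fmt):
    """Fill the 'injectformathere' placeholder used in the format-selection script."""
    return template.replace("injectformathere", fmt)


def click_script_result(driver, script):
    """Run `script` in the page and click the element it returns, if any.

    Returns True when an element was returned and clicked, False otherwise.
    """
    element = driver.execute_script(script)
    if element is None:
        return False
    # The click goes through the driver, not through JavaScript, as in the C# code.
    element.click()
    return True
```

With the two JavaScript snippets stored in strings, the flow would be `click_script_result(driver, menu_js)` followed by `click_script_result(driver, format_script(format_js, "PDF"))`, where `menu_js` and `format_js` are placeholder variable names.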
I am stuck on a problem and have been searching for a couple of days.
I am testing a website using Selenium in Python, and I want to check that the "About Us" button works. On that website, clicking "About Us" scrolls the page smoothly to the "About Us" section.
Now I want to confirm in code whether that click took me to the right section.
The first approach that came to mind was to check whether the main div of the "About Us" section is in the viewport after clicking the button, but I don't know how to check that.
I have gone through the documentation and found the is_displayed() method, but that only tells you whether the element is visible at all (based on its opacity etc.), not whether it is in the viewport.
Kindly help me.
Regards
There are many ways to assert that the correct page loaded; the most common are asserting on the URL and on the page title.
Assert for Correct URL Loaded:
String expectedUrl = "https://www.google.com";
WebDriver driver = new FirefoxDriver();
driver.get(expectedUrl);
try {
    Assert.assertEquals(expectedUrl, driver.getCurrentUrl());
    System.out.println("Navigated to correct webpage");
} catch (Throwable pageNavigationError) {
    System.out.println("Didn't navigate to correct webpage");
}
Assert for page title:
String expectedTitle = "Google";
String expectedUrl = "https://www.google.com";
WebDriver driver = new FirefoxDriver();
driver.get(expectedUrl);
try {
    Assert.assertEquals(expectedTitle, driver.getTitle());
    System.out.println("Navigated to correct webpage");
} catch (Throwable pageNavigationError) {
    System.out.println("Didn't navigate to correct webpage");
}
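For an in-page scroll like the one in the question, the URL and title usually do not change, so a viewport check may be closer to what was asked. The following is a minimal Python sketch under that assumption: it asks the browser for the element's bounding rectangle and the window size, then compares them in Python. The names `overlaps_viewport` and `is_in_viewport` are hypothetical, and "in the viewport" here means "at least partly visible in the window".

```python
# JavaScript that reports the element's position relative to the viewport.
_RECT_JS = """
var r = arguments[0].getBoundingClientRect();
return {top: r.top, bottom: r.bottom, left: r.left, right: r.right,
        height: window.innerHeight, width: window.innerWidth};
"""


def overlaps_viewport(m):
    """True if the rectangle described by m intersects the visible window."""
    return (m["bottom"] > 0 and m["top"] < m["height"]
            and m["right"] > 0 and m["left"] < m["width"])


def is_in_viewport(driver, element):
    """Check whether a WebElement is at least partly inside the viewport."""
    return overlaps_viewport(driver.execute_script(_RECT_JS, element))
```

Because the scroll is animated, the check is best wrapped in a wait, for example `WebDriverWait(driver, 10).until(lambda d: is_in_viewport(d, section))`, where `section` is the "About Us" element located beforehand.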
I am using Selenium WebDriver to automate IE11 and am stuck on automatically downloading a file while the screen is locked.
The file download starts after pressing a button. That button does not link to a URL for the file, but seems to call a JavaScript function. I have managed everything up to pressing the button, but am stuck on the prompt bar that IE11 shows at the bottom (Open or Save). For security reasons I have to use IE11 and run the automation with the screen locked. I have tried sending Alt+S using WScript.Shell, but it only seems to work after I unlock the screen. Here's what I've tried:
shell = win32.Dispatch("WScript.Shell")
config = Path(localpath + localfile)
# keep sending Alt+S until the file exists
while not config.is_file():
    shell.SendKeys("%s", 0)
    time.sleep(2)
Is there a way to bypass the IE prompt and automatically save the file to the download folder while the screen is locked?
As far as I know, WebDriver has no capability to access the download dialog boxes that IE presents when you click a download link or button. However, we can bypass these dialogs by using a separate program called wget.
We can call wget from the command line to download the file. The code is below (it is C#, but you could convert it to Python):
var options = new InternetExplorerOptions()
{
    InitialBrowserUrl = URL, // "http://demo.guru99.com/test/yahoo.html";
    IntroduceInstabilityByIgnoringProtectedModeSettings = true
};
//IE_DRIVER_PATH: @"D:\Downloads\webdriver\IEDriverServer_x64_3.14.0";
var driver = new InternetExplorerDriver(IE_DRIVER_PATH, options);
driver.Navigate();
Thread.Sleep(5000);
//get the download file link.
String sourceLocation = driver.FindElementById("messenger-download").GetAttribute("href");
Console.WriteLine(sourceLocation);
//use cmd.exe to run wget and download the file ("/c" runs the command and then exits).
String wget_command = @"/c D:\temp\wget.exe -P D:\temp --no-check-certificate " + sourceLocation;
try
{
    Process exec = Process.Start("CMD.exe", wget_command);
    exec.WaitForExit();
    Console.WriteLine("success");
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
driver.Close(); // closes browser
driver.Quit(); // closes IEDriverServer process
For more details, you could refer to this article.
Edit: Since you are using Python, you could use the find_element_by_id method to find the element and then the get_attribute method to get the value. For more details, see these articles:
Get Element Attribute
Python + Selenium for Dummies like Me
WebDriver API
Get value of an input box using Selenium (Python)
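A rough Python version of the C# approach above might look like the sketch below. The helper names `wget_command` and `download_with_wget` are hypothetical, the wget path and destination directory are whatever applies on your machine, and the whole approach only works when the element actually exposes an `href`, which may not be the case for a button that calls a JavaScript function. `find_element_by_id` is the Selenium 3 API mentioned above; newer Selenium releases replace it with `find_element(By.ID, ...)`.

```python
import subprocess


def wget_command(url, dest_dir, wget_path="wget"):
    """Build the wget argument list used to fetch `url` into `dest_dir`."""
    # Same flags as the C# example: -P sets the target directory and
    # --no-check-certificate skips TLS certificate validation.
    return [wget_path, "-P", dest_dir, "--no-check-certificate", url]


def download_with_wget(driver, element_id, dest_dir, wget_path="wget"):
    """Read the link's href through Selenium, then let wget download it."""
    href = driver.find_element_by_id(element_id).get_attribute("href")
    # Passing a list avoids going through cmd.exe and its quoting rules.
    subprocess.check_call(wget_command(href, dest_dir, wget_path))
    return href
```

For the demo page in the C# code this would be called as `download_with_wget(driver, "messenger-download", r"D:\temp", r"D:\temp\wget.exe")`. Because wget runs outside the browser, no IE prompt is involved, but any login cookies held by the browser are not sent either.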
This is my very first time trying to scrape data from a website using Selenium. Fortunately, I have got Selenium and Chrome to cooperate, and the desired website opens. Once it opens, I want to tell Python to click 'SEARCH', leaving the box next to 'contains' blank, and then export the results and save the xlsx file as result_file. I do not know why the snippet is blowing up. Please provide your kind assistance.
from selenium import webdriver
driver = webdriver.Chrome("C:\Python27\Scripts\chromedriver.exe")
driver.get("https://etrakit.friscotexas.gov/Search/permit.aspx")
number_option = driver.find_element_by_id("cplMain_btnSearch")
number_option.click()
search_button = driver.find_element_by_id("cplMain_btnExportToExcel")
search_button.click()
result_file = open("result_file.xlsx", "w")
driver.close()
result_file.close()
Looking at the source of that page, the ID of the search button is "cplMain_btnSearch", not "SEARCH", and the ID of the Export button is "cplMain_btnExportToExcel".
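To expand on Daniel Roseman's answer, you also need to specify the download location. In Chrome this is a preference rather than a command-line argument, so it is set through the "prefs" experimental option:
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {"download.default_directory": "C:\\Python27"})
driver = webdriver.Chrome(chrome_options=options)  # newer Selenium versions use options=options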
The file will then be stored in your C:\Python27 directory under the name RadGridExport.csv.
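Clicking the export button only starts the download, so closing the browser immediately afterwards can cut it short. A small polling helper, sketched below with the hypothetical name `wait_for_file`, can be called before `driver.close()`. It assumes the browser writes to a temporary name while downloading and only creates the final file name once the download is complete, which is how Chrome normally behaves.

```python
import os
import time


def wait_for_file(path, timeout=30.0, poll=0.5):
    """Wait until `path` exists and is non-empty; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        time.sleep(poll)
    return False
```

With the download directory set as above, `wait_for_file(r"C:\Python27\RadGridExport.csv")` would block for up to 30 seconds and report whether the export arrived.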
I've been grappling with using PhantomJS/Selenium/python-selenium to download a file to the filesystem.
I'm able to easily navigate through the DOM and click, hover etc. Downloading a file is, however, proving to be quite troublesome. I've tried a headless approach with Firefox and pyvirtualdisplay, but that wasn't working well either and was unbelievably slow. I know that CasperJS allows for file downloads. Does anyone know how to integrate CasperJS with Python, or how to use PhantomJS to download files? Much appreciated.
Although this question is quite old, downloading files through PhantomJS is still a problem. However, we can use PhantomJS to find the download link and collect all the cookies we need, such as CSRF tokens, and then use requests to perform the actual download:
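import requests
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get('page_with_download_link')
# Read the URL from the link element; requests needs a string, not a WebElement.
download_link = driver.find_element_by_id('download_link').get_attribute('href')

# Copy the browser's cookies (CSRF tokens etc.) into a requests session.
session = requests.Session()
cookies = driver.get_cookies()
for cookie in cookies:
    session.cookies.set(cookie['name'], cookie['value'])
response = session.get(download_link)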
The actual file content should now be in response.content. We can then write it to disk with open or do whatever else we want with it.
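For large files it can be preferable to stream the response to disk instead of holding it all in memory. The sketch below shows the same cookie hand-off using only the Python 3 standard library in place of requests, which is a swapped-in technique and not part of the answer above. The names `cookie_header` and `save_download` are hypothetical, and the cookie list is assumed to be in the format returned by `driver.get_cookies()`.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join the dicts returned by driver.get_cookies() into a Cookie header value."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in selenium_cookies)


def save_download(url, selenium_cookies, dest_path, chunk_size=64 * 1024):
    """Download `url` with the browser's cookies and stream it to dest_path."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(selenium_cookies)})
    with urllib.request.urlopen(request) as response:
        with open(dest_path, "wb") as out:
            # Copy in chunks so the whole file never has to fit in memory.
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

A call such as `save_download(href, driver.get_cookies(), "report.bin")` would then write the file, where `href` is the link's URL and `report.bin` is a placeholder name.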
PhantomJS doesn't currently support file downloads. Relevant issues with workarounds:
File download
How to handle file save dialog box using Selenium webdriver and PhantomJS?
As far as I understand, you have at least 3 options:
switch to casperjs (and you should leave python here)
try with headless on xvfb
switch to normal non-headless browsers
Here are also some links that might help too:
Selenium Headless Automated Testing in Ubuntu
XWindows for Headless Selenium (with further links inside)
How to run browsers(chrome, IE and firefox) in headless mode?
Tutorial: How to use Headless Firefox for Scraping in Linux
My use case required a form submission to retrieve the file. I was able to accomplish this using the driver's execute_async_script() function.
js = '''
var callback = arguments[0];
var theForm = document.forms['theFormId'];
data = new FormData();
data.append('eventTarget', "''' + target + '''"); // this is the id of the file clicked
data.append('otherFormField', theForm.otherFormField.value);
var xhr = new XMLHttpRequest();
xhr.open('POST', theForm.action, true);
'''
for cookie in driver.get_cookies():
    js += ' xhr.setRequestHeader("' + cookie['name'] + '", "' + cookie['value'] + '"); '
js += '''
xhr.onload = function () {
callback(this.responseText);
};
xhr.send(data);
'''
driver.set_script_timeout(30)
file = driver.execute_async_script(js)
It is not possible that way. You can use alternatives such as wget or curl to download files.
Use Firefox to find the right request, use Selenium to get the values for it, and finally use an out-of-the-box tool to download the file:
import subprocess
curlCall = "curl 'http://www_sitex_org/descarga.jsf' -H '...allCurlRequest....' > file.xml"
subprocess.call(curlCall, shell=True)
I am trying to use Python to open an IE instance, navigate to a particular site, and enter login credentials. I am currently trying to make IEC work, but am open to other options that do the same thing.
I am having trouble with the last part (login) because the "button" does not seem to be recognized as such. It appears to be some sort of trigger that acts as a button (role="button")? I am not very familiar with this:
<a title="Click here to log in." class="focus-parent ajax-request need-focus-pageobject" role="button" href="../MyAccount/MyAccountUserLogin.asp?Referrer=&AjaxRequest=true">
Here is the code I have tried:
import IEC
ie = IEC.IEController()
ie.Navigate('https://efun.toronto.ca/torontofun/Start/Start.asp')
ie.PollWhileBusy()
# code after here does not work properly
ie.Navigate('https://efun.toronto.ca/torontofun/MyAccount/MyAccountUserLogin.asp?Referrer=&AjaxRequest=true')
ie.ClickButton(caption='toolbar-login')
ie.SetInputValue('ClientBarcode', '123')
ie.SetInputValue('AccountPIN', 'XYZ')
ie.ClickButton(name='Enter')
I would appreciate tips on how to open the login menu in this case.
Did you try Selenium?
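from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Ie()
# Open the start page from the question before looking for elements.
driver.get("https://efun.toronto.ca/torontofun/Start/Start.asp")
login_btn = driver.find_element_by_id("toolbar-login")
login_btn.send_keys(Keys.RETURN)
user_name = driver.find_element_by_id("ClientBarcode")
# The Python method is send_keys; sendKeys is the Java spelling.
user_name.send_keys("user")
user_name.send_keys(Keys.RETURN)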