Download files via selenium headless chrome on python

Downloading files with Selenium in headless Chrome still seems to be a problem; the same question was asked here over a month ago and received no answer. I don't understand how the JavaScript in the bug thread is supposed to be used. Is there an option I can add, or a current fix for this? The original bug page is located here.
Everything on my machine is up to date as of today, 10/22/17.
In Python:
from selenium import webdriver
options = webdriver.ChromeOptions()
prefs = {"download.default_directory": "C:/Stuff",
         "download.prompt_for_download": False,
         "download.directory_upgrade": True,
         "plugins.always_open_pdf_externally": True
         }
options.add_experimental_option("prefs", prefs)
options.add_argument('headless')
driver = webdriver.Chrome(r'C:/Users/aaron/chromedriver.exe', chrome_options=options)
# test file to download which doesn't work
driver.get('http://ipv4.download.thinkbroadband.com/5MB.zip')
If the headless option is removed, this works without any problem.
The actual files I'm trying to download are PDFs located at .aspx URLs. I download them with a .click(), which works great except in headless mode. The hrefs are JavaScript do_postback scripts.

Why not locate the anchor's href and then use a GET request to download the file? That way it will work in headless mode and will be much faster. I have done that in C#.
import requests

def download_file(url):
    # Use the last path segment of the URL as the local file name
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F.Sebastian
    return local_filename
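A plain GET only works if the server accepts the request outside the browser; for pages behind a login, the browser's session cookies usually have to be sent along. Below is a minimal sketch of that idea, assuming the cookies are available as a list of dicts with 'name' and 'value' keys (the shape returned by Selenium's driver.get_cookies()). The helper names build_cookie_header, build_request and download_with_cookies are hypothetical, and the standard library's urllib is used here instead of requests.

```python
import urllib.request


def build_cookie_header(cookies):
    """Join cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_request(url, cookies):
    """Create a GET request that carries the browser session's cookies."""
    request = urllib.request.Request(url)
    if cookies:
        request.add_header("Cookie", build_cookie_header(cookies))
    return request


def download_with_cookies(url, cookies, local_filename):
    """Stream the response body to a local file in 1 KB chunks."""
    with urllib.request.urlopen(build_request(url, cookies)) as response, \
            open(local_filename, "wb") as f:
        while True:
            chunk = response.read(1024)
            if not chunk:
                break
            f.write(chunk)
    return local_filename
```

With a live driver this might be called as download_with_cookies(href, driver.get_cookies(), "file.pdf"). It does not help when the link is a JavaScript postback rather than a real URL, as in the question.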

I believe now that Chromium supports this feature (as you linked to the bug ticket), it falls to the chromedriver team to add support for the feature. There is an open ticket here, but it does not appear to have a high priority at the moment. Please, everyone who needs this feature, go give it a +1!

For those of you who aren't following the Chromium ticket linked above or haven't found a solution: this is working for me. Chrome is updated to v65, and chromedriver and selenium are both up to date as of 4/16/18.
from selenium import webdriver
options = webdriver.ChromeOptions()
prefs = {'download.prompt_for_download': False,
         'download.directory_upgrade': True,
         'safebrowsing.enabled': False,
         'safebrowsing.disable_download_protection': True}
options.add_argument('--headless')
options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome('chromedriver.exe', chrome_options=options)
# Register chromedriver's send_command endpoint so DevTools commands can be sent
driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
driver.desired_capabilities['browserName'] = 'ur mum'
# Tell headless Chrome to allow downloads and where to put them
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': r'C:\chickenbutt'}}
driver.execute("send_command", params)
If you get a "Failed-file path too long" error when downloading, make sure the download path doesn't have a trailing space, slash, or backslash. The path must also use backslashes only. I have no idea why.
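The path rules described above (backslashes only, no trailing space, slash, or backslash) can be applied before the command is sent. Below is a minimal sketch, assuming a Windows-style path; normalize_download_path and download_behavior_params are hypothetical helper names, and the payload mirrors the params dict shown above.

```python
def normalize_download_path(path):
    """Use backslashes only and drop trailing spaces, slashes and backslashes."""
    return path.strip().replace("/", "\\").rstrip(" \\")


def download_behavior_params(download_path):
    """Build the payload for the send_command call shown above."""
    return {
        "cmd": "Page.setDownloadBehavior",
        "params": {
            "behavior": "allow",
            "downloadPath": normalize_download_path(download_path),
        },
    }
```

It would be used as driver.execute("send_command", download_behavior_params("C:/chickenbutt/")). Note that a bare drive root such as "C:\" would be reduced to "C:", so this sketch is only meant for paths that name a folder.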

Related

Custom Download folder for chromedriver in python

I want to save the files I get with a scraper into custom folders. I looked around and none of the solutions I found worked for me. Here is my configuration:
options = webdriver.ChromeOptions()
prefs = {
    'profile.default_content_settings.popups': 0,
    'download.default_directory': my_data_folder,
    "download.directory_upgrade": True,
    "download.prompt_for_download": False,
    "safebrowsing.enabled": False,
}
options.add_argument('--remote-debugging-port=9222')
options.add_experimental_option("useAutomationExtension", False)
desired_caps = {
    'prefs': {
        'savefile': {
            'default_directory': my_data_folder,
            "directory_upgrade": True,
            "extensions_to_open": ""
        }
    }
}
options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(chrome_options=options, desired_capabilities=desired_caps)
But when I try downloading, the file goes to ~/Downloads/ instead of my_data_folder.
I have tried prefs and desired_caps independently, to no avail.
I am using Chromium 108.0.5359.22 (snap).
Help is appreciated!
I have tried:
How to download to a specific folder with Chromedriver?
Define download directory for chromedriver selenium with python
and many other posts and blogs. All of these solutions are summarised in the script above.
Thanks!
UPDATE
The folder option works if I add
options.add_argument("--headless")
but running headless is not desirable for other reasons. Is there a better way to fix this problem?

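Have you thought about using shutil to move the file after the download?
Here's how I implemented that in another project I was working on:
import os
import shutil

# Pick the most recently created file in the download folder
filename = max(
    [os.path.join(download_folder, f) for f in os.listdir(download_folder)],
    key=os.path.getctime)
# Move (or rename) it; "filename.format" is a placeholder for the target name
shutil.move(
    filename,
    os.path.join(download_folder, "filename.format")
)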
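The snippet above can be wrapped in a function that moves the newest file into the custom folder. Below is a minimal sketch under a few assumptions: move_latest_download is a hypothetical helper name, files ending in .crdownload are treated as Chrome's unfinished downloads and skipped, and modification time is used instead of creation time to pick the newest file.

```python
import os
import shutil


def move_latest_download(download_folder, target_folder, new_name=None):
    """Move the newest finished file from download_folder to target_folder."""
    candidates = [
        os.path.join(download_folder, name)
        for name in os.listdir(download_folder)
        if not name.endswith(".crdownload")  # skip Chrome's partial downloads
    ]
    candidates = [path for path in candidates if os.path.isfile(path)]
    if not candidates:
        raise FileNotFoundError(f"no finished downloads in {download_folder}")
    newest = max(candidates, key=os.path.getmtime)
    destination = os.path.join(target_folder, new_name or os.path.basename(newest))
    os.makedirs(target_folder, exist_ok=True)
    shutil.move(newest, destination)
    return destination
```

For the question above this would be called as move_latest_download(os.path.expanduser("~/Downloads"), my_data_folder) once the download has finished.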

Selenium Webdriver requests a different url from the one set

After updating Selenium and Visual Studio I have the following problem. When I try to open a URL, for example
thestore = "http://shop.oki.gr/shop/store/diathesimotita_new.asp"
a window opens instead with
http://www.puttop.top/object.php?u=http://shop.oki.gr/shop/store/customerauthenticateform.asp?redirectUrl=http://shop.oki.gr/shop/store/diathesimotita_new.asp&title=Login%20Page
which of course does not work.
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {"download.default_directory": downloads_path,
                                          "profile.default_content_settings.popups": 0,
                                          "download.prompt_for_download": False,
                                          "directory_upgrade": True,
                                          "safebrowsing.enabled": True})
browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
thestore = "http://shop.oki.gr/shop/store/diathesimotita_new.asp"
browser.get(thestore)
The original URL opens normally from another PC.
What is happening?

I can't open either of these, but it seems like it's redirecting you to a login page first.
I have a couple of programs that do something similar. I keep a login function on hand, call it, and then get the original URL again.
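One way to detect that situation is to check whether the URL the browser actually landed on carries a redirect parameter pointing back at the page that was requested. Below is a minimal sketch, assuming the login page uses a redirectUrl query parameter as in the URL quoted in the question; login_redirect_target is a hypothetical helper name, and the extra puttop.top wrapper seen in the question is not handled.

```python
from urllib.parse import parse_qs, urlsplit


def login_redirect_target(current_url, param="redirectUrl"):
    """Return the page a login URL will send the user back to, or None."""
    values = parse_qs(urlsplit(current_url).query).get(param)
    return values[0] if values else None
```

After browser.get(thestore), comparing login_redirect_target(browser.current_url) with thestore tells the script that it should log in first and then call browser.get(thestore) again.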

Python and Selenium download 0 KB excel files

I have already been to this link with the same question, but I could not find an answer there:
Although my question is the same as that one, I am posting a new one that includes my code.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from fake_useragent import UserAgent  # assumed source of UserAgent

WINDOW_SIZE = "1920,1080"  # placeholder; the original does not show this value
url = 'https://example.com/'
download_url = "https://example.com/Download"
chromedriver = 'path\\to\\chromedriver.exe'
options = Options()
ua = UserAgent()
userAgent = ua.random
print(userAgent)
options.add_argument(f'user-agent={userAgent}')
options.add_experimental_option("prefs", {
    "download.default_directory": r"C:\Users\helia\Desktop\Test",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
})
options.add_argument("--headless")
options.add_argument("--window-size=%s" % WINDOW_SIZE)
driver = webdriver.Chrome(chrome_options=options, executable_path=chromedriver)
# Log in
driver.get(url)
user_name = driver.find_element_by_name('User')
pass_word = driver.find_element_by_name("Pass")
user_name.send_keys("my_username")
pass_word.send_keys("my_password")
driver.find_element_by_class_name("btnn.btnn-default.b").click()
# Open the download page, click the download button and confirm the alert
driver.get(download_url)
driver.find_element_by_class_name("btn.btn-app").click()
driver.switch_to.alert.accept()
The code successfully downloads the file, but the file is 0 KB, both on the website and locally; the file on the site has never been 0 KB before.
(The program finishes while the file is still being downloaded. Could that be the cause? Do I need to add some waits?)

Your question is not entirely clear, but I guess your problem is this:
After clicking the download button and accepting the alert, your code finishes immediately, so the file has not had enough time to be downloaded.
To get the file downloaded completely, you should keep the browser from closing until the download is complete.
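One way to keep the script alive is to poll the download folder until the file has arrived. Below is a minimal sketch, assuming Chrome marks unfinished downloads with a .crdownload suffix and that the folder is used only for this download; wait_for_downloads is a hypothetical helper name, and the timeout values are arbitrary.

```python
import os
import time


def wait_for_downloads(directory, timeout=30.0, poll_interval=0.5):
    """Block until `directory` holds at least one file and no partial ones.

    Returns the sorted list of file names, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        partial = [n for n in names if n.endswith(".crdownload")]
        if names and not partial:
            return sorted(names)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"download not finished after {timeout} s")
        time.sleep(poll_interval)
```

In the script from the question, wait_for_downloads(r"C:\Users\helia\Desktop\Test") would go right after driver.switch_to.alert.accept(), before the driver is closed or the program exits.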

Python Web Scraping saving Tik Tok video from url

I am trying to save videos from this url:
Original:
https://api2.musical.ly/aweme/v1/play/?video_id=v09044a20000beeff4c108gs7sflfdug
Link changes to this:
http://v16.muscdn.com/3d238aa3e1c34000ce53792155cd0e15/5bcf3070/video/tos/maliva/tos-maliva-v-0068/e5a1ab74d0b54f97b3578924a428e58d/
The video is from TikTok. When you go to the url, it instantly redirects you to another url. The other url is the one I want in order to save the video. However, the url it directs you to does not have a "view html source" option. I can inspect the element and that shows it has a video tag, but I cannot find a way to save the url between the tag. I am using python and beautifulsoup. I tried to do this with selenium, but to no effect.

Edit:
The link that it redirects to changes all the time! As of 27/08/2019, the link below works.
If you get "Access denied", check the link once again.
I think you should use other libraries for saving videos.
For example (in Python 3+):
import urllib.request
vid_url = "http://v19.muscdn.com/21b98c731608b8aa296ec31468c26dd1/5d652a88/video/tos/maliva/tos-maliva-v-0068/e5a1ab74d0b54f97b3578924a428e58d/?rc=amdvdnY7NDdpaDMzNTczM0ApdSlINzU2NTM0MzM2MzM1MzQ1b2k5ZmU5Z2c1ZGY5ZmQzPGZAaUBoNnYpQGczdilAZjY1QHJjYzRkLWBjYl8tLV4xNnNzOmk0NTU1LjQtLi4uMTQ0NTYtOiM2MDAtXjQzXzMxMTFeMWEzYSNvIzphLW8jOmAtbyMwLl4%3D"
urllib.request.urlretrieve(vid_url, "your_video_name.mp4")
If you insist on using Selenium, you can add options like this:
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {
    "download.default_directory": r"C:\Users\xxx\downloads\Test",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
})
driver = webdriver.Chrome(chrome_options=options)
Hope this helps you!

Chromedriver, Selenium - Automate downloads

I am using Selenium 2.43.0 with Python 2.7.5. At one point, the test clicks on a button which sends form information to the server. If the request is successful, the server responds with
1) A successful message
2) A PDF with the form information merged in
I don't care to test the PDF, my test is just looking for a successful message. However the PDF is part of the package response from the server that I, as the tester, cannot change.
Until recently, this was never an issue using Chromedriver, since Chrome would automatically download pdfs into its default folder.
However, a few days ago one of my test environments started popping a separate window with a "Print" screen for the pdf, which derails my tests.
I don't want or need this dialog. How do I suppress it programmatically using chromedriver's options? (Something equivalent to Firefox's pdfjs.disable option in about:config.)
Here is my current attempt to bypass the dialog, which does not work (that is, it does not disable or suppress the print PDF dialog window):
dc = DesiredCapabilities.CHROME
dc['loggingPrefs'] = {'browser': 'ALL'}
chrome_profile = webdriver.ChromeOptions()
profile = {"download.default_directory": "C:\\SeleniumTests\\PDF",
           "download.prompt_for_download": False,
           "download.directory_upgrade": True}
chrome_profile.add_experimental_option("prefs", profile)
chrome_profile.add_argument("--disable-extensions")
chrome_profile.add_argument("--disable-print-preview")
self.driver = webdriver.Chrome(executable_path="C:\\SeleniumTests\\chromedriver.exe",
                               chrome_options=chrome_profile,
                               service_args=["--log-path=C:\\SeleniumTests\\chromedriver.log"],
                               desired_capabilities=dc)
All component versions are the same in both testing environments:
Selenium 2.43.0, Python 2.7.5, Chromedriver 2.12, Chrome (browser) 38.0.02125.122

I had to dig into the source code on this one; I couldn't find any docs listing the full set of Chrome user preferences.
The key is "plugins.plugins_disabled": ["Chrome PDF Viewer"]
FULL CODE:
dc = DesiredCapabilities.CHROME
dc['loggingPrefs'] = {'browser': 'ALL'}
chrome_profile = webdriver.ChromeOptions()
profile = {"download.default_directory": "C:\\SeleniumTests\\PDF",
           "download.prompt_for_download": False,
           "download.directory_upgrade": True,
           "plugins.plugins_disabled": ["Chrome PDF Viewer"]}
chrome_profile.add_experimental_option("prefs", profile)
# Helpful command line switches
# http://peter.sh/experiments/chromium-command-line-switches/
chrome_profile.add_argument("--disable-extensions")
self.driver = webdriver.Chrome(executable_path="C:\\SeleniumTests\\chromedriver.exe",
                               chrome_options=chrome_profile,
                               service_args=["--log-path=C:\\SeleniumTests\\chromedriver.log"],
                               desired_capabilities=dc)
Interestingly, the blanket switch chrome_profile.add_argument("--disable-plugins") did not solve this problem. But I prefer the more surgical approach anyway.
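The download-related prefs can be collected in one helper so that each test builds them the same way. Below is a minimal sketch, assuming only preference keys that already appear on this page; build_pdf_download_prefs is a hypothetical helper name, and whether Chrome honors a given key depends on the Chrome version (the answer above was written against Chrome 38).

```python
def build_pdf_download_prefs(download_dir, disable_pdf_viewer=True,
                             open_pdf_externally=False):
    """Build a Chrome prefs dict that saves PDFs instead of displaying them."""
    prefs = {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
    }
    if disable_pdf_viewer:
        # Key from the answer above
        prefs["plugins.plugins_disabled"] = ["Chrome PDF Viewer"]
    if open_pdf_externally:
        # Key used in the first question on this page
        prefs["plugins.always_open_pdf_externally"] = True
    return prefs
```

It would be passed on as chrome_profile.add_experimental_option("prefs", build_pdf_download_prefs("C:\\SeleniumTests\\PDF")).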
