I've been trying to write a function that iterates over URLs read from a text file and opens them with the webbrowser package. It works fine when I create an empty list and append the URLs to it literally, as in:
import webbrowser
urls = []
urls.append(url1)
urls.append(url2)
def webbrowsing(urls):
    for i in range(0, len(urls)):
        webbrowser.open(urls[i])
where url1 and url2 are any valid URLs. In that case webbrowser.open() opens the URLs in Chrome, which is what I want.
However, when I try to do the same thing with inputs from a text file of URLs, webbrowser opens the URLs from the file in Internet Explorer. I tried webbrowser.get(), explicitly directing it to use Chrome, but that didn't work.
I am not sure why it does not open the URLs in Chrome when almost everything is the same as with the list above. Chrome is set as my default web browser, and I rarely use IE. I'd really appreciate any tips on this issue.
How do you define the 'webbrowser' object? With Selenium's webdriver, I use something like this:
driver = webdriver.Chrome(driverPath) #driverPath contains the path to the 'chromedriver.exe' file
driver.get(url)
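For the file-based case in the question, a minimal sketch (not the asker's code) could read the URLs, strip the whitespace around each line, and pass them to an explicit browser controller. The file name urls.txt, the helper names load_urls and open_all, and the Chrome path in the comments are assumptions for illustration. Whether stray whitespace is what sends the URLs to Internet Explorer is not confirmed by the question.

```python
import webbrowser


def load_urls(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def open_all(urls, browser=None):
    """Open each URL with the given controller, or the system default."""
    controller = browser if browser is not None else webbrowser
    for url in urls:
        controller.open(url)


# Hypothetical usage; the Chrome location is an assumption, adjust it locally.
# CHROME_PATH = r"C:\Program Files\Google\Chrome\Application\chrome.exe"
# webbrowser.register("chrome", None, webbrowser.BackgroundBrowser(CHROME_PATH))
# open_all(load_urls("urls.txt"), webbrowser.get("chrome"))
```

Registering the browser explicitly avoids relying on whatever webbrowser detects as the default on the machine.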
How can I download a file using Python from a link like the one below? I searched around for a while and didn't find anything that works.
I have a link to a file, something like this:
https://w3.google.com/tools/cio/forms/anon/org/contentload?content=https://w3.ibm.com/tools/cio/forms/secure/org/data/f48f2294-495b-48f5-8d4e-e418f4b25a48/F_Form1/attachment/bba4ddfd-837d-47a6-87ef-2114f6b3da08 (link doesn't work, just showing you how it should look)
Clicking it opens a browser, which then starts opening the file.
I don't know what the file will be named or what format it will have; I only have a URL that links to the file.
I tried this:
import requests
def Download(link):
    r = requests.get(link)
    with open('filename.docx', 'wb') as f:
        f.write(r.content)
But this doesn't work. As you can see, I hard-coded the file name out of desperation, but that doesn't help either: it creates a file, but it is only 1 KB in size and has nothing in it.
How can I write code that automatically downloads the file from links like this? Can you help?
Use urlretrieve from urllib.
You can use urllib.request.urlretrieve to download the file. Pass the target file name as the second argument: the function saves the response there itself and returns a (filename, headers) tuple, so there is nothing to write manually.
Example:
import urllib.request
url = "https://w3.google.com/tools/cio/forms/anon/org/contentload?content=https://w3.ibm.com/tools/cio/forms/secure/org/data/f48f2294-495b-48f5-8d4e-e418f4b25a48/F_Form1/attachment/bba4ddfd-837d-47a6-87ef-2114f6b3da08"
urllib.request.urlretrieve(url, 'filename.docx')
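Since the question says the file name and format are unknown in advance, one option is to take the name from the Content-Disposition response header when the server sends one. The sketch below assumes that header is present for this kind of link, which the question does not confirm; pick_filename, download, and the download.bin fallback are hypothetical names. A tiny output file can also mean the server returned an error or login page instead of the document, so it may be worth checking the response first.

```python
import os
import shutil
import urllib.request


def pick_filename(headers, fallback="download.bin"):
    """Use the name from Content-Disposition when the server sends one."""
    name = headers.get_filename()  # None when the header is missing
    # Keep only the last path component so the server cannot pick a directory.
    base = os.path.basename(name or "")
    return base or fallback


def download(url, fallback="download.bin"):
    """Stream the response body to disk and return the chosen file name."""
    with urllib.request.urlopen(url) as response:
        name = pick_filename(response.headers, fallback)
        with open(name, "wb") as out:
            shutil.copyfileobj(response, out)
    return name
```

Calling download(link) would then save the file under the server-supplied name, or under the fallback when no name is offered.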
I need to access the source code of a locally saved file, but I need to automate this because there are multiple files in one folder. I've looked at the inspect module and the selenium module, but I still don't understand what to do. After accessing the source code, I need to use bs4 to extract data from it.
I've read several posts on here and elsewhere with similar problems, but the thing is that my file does not open as source code (it's written in XML, and so far everything needs to be source code before you can use these modules). If I open the file, it just uses my browser to open a regular page, and then I have to click "view page source".
How can I automate this so that it will open the page, go to the source code, and save it so I can stick it into a soup for later parsing?
path_g_jurt = r'C:\Users\g\Desktop\t\SDU\jurt htmls\jurt\meta jurt'
file = r'C:\Users\g\Desktop\t\SDU\jurt htmls\jurt\meta jurt' + "/" + file
for file in path_g_jurt:
    if file.endswith(".xhtml"):
        with open(file, encoding = "utf-8") as mdata_jurt:
            soup = BeautifulSoup(mdata_jurt)
            main = file.find("jcid").get_text()
            misc_links = []
            for item in file.find_all("regelgeving"):
                misc = item.find("misc:link")
                misc_links.append(misc.get("misc:jcid"))
Any help would be appreciated.
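Reading a file with open() already yields its source text, so no browser step should be needed. Note also that the loop in the question iterates over the characters of the path string rather than over the files in the folder, and calls find on the file name instead of on the soup. A minimal sketch of the folder loop, using only the standard library, might look like this; read_sources is a hypothetical helper name.

```python
from pathlib import Path


def read_sources(folder, suffix=".xhtml"):
    """Map each matching file name in `folder` to its raw source text."""
    sources = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix == suffix:
            sources[path.name] = path.read_text(encoding="utf-8")
    return sources
```

Each value can then be passed to BeautifulSoup, with find and find_all called on the resulting soup object.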
I am attempting to check for active web site folders against a list that was created using robots.txt (this is for learning security; I'm doing this on a server that I own and control). I am using Python 2.7 on Kali Linux.
My code works if I just do one web address at a time, as I get a proper 200 or 404 response for folders that are active and not working, respectively.
When I attempt to do this against the entire list, I get a string of 404 errors. When I print out the actual addresses that the script is creating, everything looks correct.
Here is the code I am using:
import requests
attempt = open('info.txt', 'r')
folders = attempt.readlines()
for line in folders:
    host = 'http://10.0.1.66/mutillidae'+line
    attempt = requests.get(host)
    print attempt
This results in a string of 404 errors. If I take the loop out, and try each one individually, I get a 200 response back showing that it is up and running.
I have also printed out the address using the same loop against the text document that contains the correct folders, and the addresses seem to look fine which I verified through copy and pasting. I have tried this with a file containing multiple folders and a single folder listed, and always get a 404 when attempting to read from the file.
The info.txt file contains the following:
/passwords/
/classes/
/javascript/
/config
/owasp-esapi-php/
/documentation/
Any advice is appreciated.
Lines returned by file.readlines() contain trailing newlines, which you must remove before passing them to requests.get. Replace the statement:
host = 'http://10.0.1.66/mutillidae'+line
with:
host = 'http://10.0.1.66/mutillidae' + line.rstrip()
and the problem will go away.
Note that your code would be easier to read if you refrained from using the same generic variable name, such as attempt, for different purposes in the same scope. Also, one should try to use variable names that reflect their usage: for example, host would be better named url, as it doesn't hold the host name, but the entire URL.
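To make the trailing-newline point concrete, here is a small Python 3 sketch (the question uses Python 2.7) that builds the URLs with rstrip() and prints them with repr() so that any hidden whitespace would be visible. build_urls is a hypothetical helper, and the sample lines mimic what readlines() returns for the file in the question.

```python
def build_urls(base, lines):
    """Join `base` with each folder name, dropping trailing whitespace."""
    urls = []
    for line in lines:
        folder = line.rstrip()
        if folder:  # skip blank lines
            urls.append(base + folder)
    return urls


# Lines as readlines() would return them, newline included.
raw_lines = ["/passwords/\n", "/classes/\n", "/config\n", "\n"]
for url in build_urls("http://10.0.1.66/mutillidae", raw_lines):
    print(repr(url))
```

Printing with repr() rather than a plain print is what exposes the newline; a plain print makes the broken and the correct address look identical.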
I see that you can set where to download a file to through Webdriver, as follows:
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir",getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/csv")
browser = webdriver.Firefox(firefox_profile=fp)
But I was wondering if there is a similar way to give the file a name when it is downloaded. Preferably not something tied to the profile, as I will be downloading ~6000 files through one browser instance and do not want to have to reinitialize the driver for each download.
I would suggest a slightly unusual approach: do not download files with Selenium if possible.
I mean: get the file URL and use the urllib library to download the file and save it to disk 'manually'. The issue is that Selenium doesn't have a tool to handle Windows dialogs, such as the 'Save As' dialog. I'm not sure, but I doubt that it can handle any OS dialogs at all; please correct me if I'm wrong. :)
Here's a tiny example:
import urllib
urllib.urlretrieve( "http://www.yourhost.com/yourfile.ext", "your-file-name.ext")
The only job for us here is to make sure that we handle all the urllib Exceptions. Please see http://docs.python.org/2/library/urllib.html#urllib.urlretrieve for more info.
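The snippet above is Python 2. As a rough Python 3 equivalent with the exception handling the answer calls for, a sketch might look like the following; fetch is a hypothetical wrapper name, and returning a boolean is just one possible way to report failure.

```python
import urllib.error
import urllib.request


def fetch(url, target):
    """Download `url` to `target`; return True on success, False on failure."""
    try:
        urllib.request.urlretrieve(url, target)
    except (urllib.error.URLError, OSError) as exc:
        # HTTPError and ContentTooShortError are subclasses of URLError.
        print("download failed:", exc)
        return False
    return True
```

Because the caller picks target, each of the downloads can be given its own name without touching the browser profile.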
I do not know if there is a pure Selenium handler for this, but here is what I have done when I needed to do something with the downloaded file.
Set up a loop that polls your download directory for the latest file that does not have a .part extension (this indicates a partial download and would occasionally trip things up if not accounted for). Put a timer on this to ensure that you don't go into an infinite loop if a timeout or other error keeps the download from completing. I used the output of the ls -t <dirname> command in Linux (my old code uses the commands module, which is deprecated, so I won't show it here :) ) and got the first file by using
# result = output of ls -t
result = result.split('\n')[1].split(' ')[-1]
If the while loop exits successfully, the topmost file in the directory will be your file, which you can then modify using os.rename (or anything else you like).
Probably not the answer you were looking for, but hopefully it points you in the right direction.
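The polling idea can be sketched in pure Python without shelling out to ls. This is an illustration under assumptions, not the answerer's original code: wait_for_download and its parameters are hypothetical, it waits until no .part file remains rather than merely skipping them, and already_seen lets the caller exclude files from earlier downloads.

```python
import os
import time


def wait_for_download(directory, timeout=30.0, poll_interval=0.5, already_seen=()):
    """Return the newest finished file in `directory`, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        paths = [os.path.join(directory, n) for n in os.listdir(directory)
                 if n not in already_seen]
        files = [p for p in paths if os.path.isfile(p)]
        # A .part file means the browser is still writing a download.
        in_progress = any(p.endswith(".part") for p in files)
        finished = [p for p in files if not p.endswith(".part")]
        if finished and not in_progress:
            return max(finished, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            return None  # give up instead of looping forever
        time.sleep(poll_interval)
```

The returned path can then be passed to os.rename, as described above.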
Solution with code as suggested by the selected answer. Rename the file after each one is downloaded.
import os
os.chdir(SAVE_TO_DIRECTORY)
files = filter(os.path.isfile, os.listdir(SAVE_TO_DIRECTORY))
files = [os.path.join(SAVE_TO_DIRECTORY, f) for f in files] # add path to each file
files.sort(key=lambda x: os.path.getmtime(x))
newest_file = files[-1]
os.rename(newest_file, docName + ".pdf")
This answer was posted as an edit to the question "naming a file when downloading with Selenium Webdriver" by the OP user1253952 under CC BY-SA 3.0.
I know similar questions have been asked before, but I've tried many times and it still doesn't work for me.
I only have a default profile in firefox (called c1r3g2wi.default) and no other profiles. I want my firefox browser to start with this profile when I launch it using the selenium webdriver. How do I do this in Python?
I did this:
fp = webdriver.FirefoxProfile('C:\Users\admin\AppData\Roaming\Mozilla\Firefox\Profiles\c1r3g2wi.default')
browser = webdriver.Firefox(fp)
But I got an error:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect:
'C:\\Users\x07dmin\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\c1r3g2wi.default/*.*'
Help, or pointers in the right direction, would be very much appreciated.
Ok, I just solved this by simply changing all the slashes in my file path from "\" to "/".
Never knew this would make a difference.
C:/Users/admin/AppData/Roaming/Mozilla/Firefox/Profiles/c1r3g2wi.default
Moreover, you can use double backslashes in the path:
fp = webdriver.FirefoxProfile('C:\\Users\\admin\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\c1r3g2wi.default')
browser = webdriver.Firefox(fp)
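The reason the original path failed is visible in the error message: in an ordinary string literal, \a is an escape sequence for the BEL control character, so \admin became \x07dmin. A short illustration follows; the variable names are hypothetical, and ntpath is used only so that the comparison behaves the same on any operating system.

```python
import ntpath

# '\a' in an ordinary string literal is the BEL control character (0x07),
# which matches the '\x07dmin' shown in the error message.
broken = 'Users\admin'

# Three spellings that all keep the path intact.
raw = r'C:\Users\admin\AppData\Roaming\Mozilla\Firefox\Profiles\c1r3g2wi.default'
escaped = 'C:\\Users\\admin\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\c1r3g2wi.default'
forward = 'C:/Users/admin/AppData/Roaming/Mozilla/Firefox/Profiles/c1r3g2wi.default'

print(broken == 'Users\x07dmin')
print(raw == escaped)
print(ntpath.normpath(forward) == raw)
```

A raw string (the r prefix) is often the least error-prone choice for Windows paths, since nothing in it needs to be doubled.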