Python selenium script acts weird with Crontab on Linux


Hi, so I have this Selenium script (using Firefox and geckodriver on Raspbian) that uses an external site to download the active stories for any given user on Instagram:
import datetime
import time
import urllib  # Python 2; in Python 3 this is urllib.request.urlretrieve

def download_user_stories(self, user_name):
    driver = self.driver
    driver.get("https://storiesig.com/stories/" + user_name)
    time.sleep(2)
    for i in range(1, 50):
        try:
            xpath = "//div[@class='jsx-1407646540 container']//article[" + str(i) + "]//div[3]//a[1]"
            print(xpath)
            link_location = driver.find_element_by_xpath(xpath)
            link = link_location.get_attribute('href')
            current_time = datetime.datetime.now()
            corrected_time = current_time.strftime("%Y-%b-%d")
            if '.mp4' not in link:
                extension = '.jpg'
            else:
                extension = '.mp4'
            location = '/Users/"my name"/desktop/' + user_name + '/' + corrected_time + '-' + str(i) + extension
            print(location)
            urllib.urlretrieve(link, location)
        except Exception as e:
            print("went to except")
            print(e)
            break
I initially ran it from the terminal and it worked fine. Then I scheduled it with crontab on the Raspberry Pi; it ran, writing its output and errors to a text file, and I got this:
working on "friends name" now
//div[@class='jsx-1407646540 container']//article[1]//div[3]//a[1]
/home/pi/Desktop/InstaScraper/Script/"friends name"/2019-Feb-21-1.jpg
//div[@class='jsx-1407646540 container']//article[2]//div[3]//a[1]
went to except
Message: Unable to locate element: //div[@class='jsx-1407646540 container']//article[2]//div[3]//a[1]
So it finds the first link to download but doesn't manage to find the rest, even though I confirmed that this particular user had more to download. I also confirmed that the element it says it's unable to locate at the end is EXACTLY the XPath of the next element to download, so I'm baffled as to why it's not finding a story exactly where it should be. Stranger still, it no longer runs fine from the terminal either, even though it initially did. I don't understand what changed, or why it didn't work in crontab like it did in the terminal.
Another interesting note: it downloads all the links properly for the first user in User_Name_List.txt, but not for the rest of them (the function is called in a loop from another function in the class that has the username list).
I've googled this and thought about it, and I just can't figure out what's wrong here.
Any help and explanation would be appreciated.
Note: you can go to storiesig.com and check the relative XPaths of the download links for various active stories (not the highlights) yourself if you think that's where the issue is.

Switching from relative XPaths to absolute XPaths solved this issue.
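The answer does not say why the absolute XPath worked. One possible explanation, offered only as an assumption, is how XPath position predicates behave: in //div//article[2], the [2] selects an article that is the second article child of its own parent, not the second article in the whole page. If each story's article sits inside its own wrapper element, article[1] matches every story and article[2] matches nothing. The sketch below demonstrates this with the standard library's xml.etree.ElementTree, whose [n] predicate has the same sibling-relative meaning; the markup, the class name c and the section wrappers are hypothetical, not storiesig.com's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each story's <article> is wrapped in its own <section>.
NESTED = """
<body>
  <div class="c">
    <section><article><a href="story1.jpg"/></article></section>
    <section><article><a href="story2.mp4"/></article></section>
  </div>
</body>
"""

# Hypothetical markup: the <article> elements are direct siblings.
FLAT = """
<body>
  <div class="c">
    <article><a href="story1.jpg"/></article>
    <article><a href="story2.mp4"/></article>
  </div>
</body>
"""


def count_matches(markup, path):
    """Return how many elements the ElementTree XPath subset selects."""
    return len(ET.fromstring(markup).findall(path))


def story_links(markup):
    """Collect every story link once, then index in Python instead of in XPath."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.findall(".//div[@class='c']//article//a")]


if __name__ == "__main__":
    # [2] means "second <article> child of its parent": none in the nested layout.
    print(count_matches(NESTED, ".//div[@class='c']//article[2]"))
    # [1] matches the first <article> of each parent, i.e. both stories.
    print(count_matches(NESTED, ".//div[@class='c']//article[1]"))
    # With sibling articles, [2] finds the second story as expected.
    print(count_matches(FLAT, ".//div[@class='c']//article[2]"))
    print(story_links(NESTED))
```

With the Selenium 3 API used in the question, the analogous approach would be a single driver.find_elements_by_xpath call whose returned list is indexed in Python. XPath 1.0 itself also accepts (//div[...]//article)[2] for the second match in document order, a form ElementTree's subset does not support.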

Related

Website returning response 200 in found directories

I am currently building a recursive URL fuzzer and have run into a peculiar problem that I can't find a solution to online.
I've already built the fuzzer so that it records a directory as found when the server returns a 200 response for it. Without the recursion, this works well.
However, the program starts behaving strangely once I add recursion. For example, the program finds a working directory http://localhost/website/images, appends a / to get http://localhost/website/images/, and then repeats the process, appending each wordlist entry to http://localhost/website/images/ to get, for example, http://localhost/website/images/images.
The problem starts here: when the website receives http://localhost/website/images/images, it returns a response code of 200.
Even a path like http://localhost/website/images/images/images/images/images/images/images/hello/hello/hello returns a response code of 200.
Is this the result of a security mechanism set up by the website?
My code is shown below:
import requests

foundsites = []

def recursiveURL(URLtoTake):
    with open('wordlist.txt') as f:
        for line in f:
            # strip the trailing newline that file iteration leaves on each entry
            website = URLtoTake + "/" + line.strip()
            URL = requests.get(website)
            if URL.status_code == 200:
                foundsites.append(website)
                print("Found :" + website)
                recursiveURL(website)
            else:
                print("Not Found")
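The question is unanswered here, but two guards that recursive fuzzers commonly use can be sketched as an illustration only: a depth limit, so the recursion cannot run indefinitely, and a baseline request to a random path that should not exist, so a directory that answers 200 for every sub-path (one possible reason for the behaviour described) is detected instead of recursed into. The names fuzz, get_status and max_depth are hypothetical; the HTTP call is passed in (for example, a wrapper around requests.get) so the logic can run without a server.

```python
import uuid


def fuzz(base_url, words, get_status, max_depth=3):
    """Recursively probe base_url/word paths, stopping at max_depth.

    get_status is any callable that takes a URL and returns an HTTP status
    code (hypothetical hook, e.g. lambda u: requests.get(u).status_code).
    """
    found = []

    def walk(url, depth):
        if depth > max_depth:
            return
        # Baseline: a random path that should not exist. If it also returns 200,
        # a 200 below this URL says nothing about which paths really exist.
        probe = url + "/" + uuid.uuid4().hex
        if get_status(probe) == 200:
            return
        for word in words:
            word = word.strip()  # drop the newline left by reading a wordlist file
            if not word:
                continue
            candidate = url + "/" + word
            if get_status(candidate) == 200:
                found.append(candidate)
                walk(candidate, depth + 1)

    walk(base_url.rstrip("/"), 1)
    return found
```

A possible call with requests would be fuzz("http://localhost/website", open("wordlist.txt").read().splitlines(), lambda u: requests.get(u).status_code).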

Python Webbrowser Opening URLs with Chrome instead of IE

I've been attempting to create a function that iterates over inputs from a text file that contains URLs, using the webbrowser package. It works fine when I create an empty list to which URLs are manually appended, as in:
import webbrowser

urls = []
urls.append(url1)  # url1, url2: any valid URLs
urls.append(url2)

def webbrowsing(urls):
    for url in urls:
        webbrowser.open(url)
Here url1 and url2 are any valid URLs, and webbrowser.open() opens them in Chrome, which is what I want.
However, when I try to do the same thing with inputs from a text file of URLs, webbrowser opens the URLs from the file in Internet Explorer. I tried webbrowser.get() to explicitly direct it to use Chrome, but that didn't work.
I am not sure why it does not open the URLs in Chrome when almost everything seems the same as in the list version above. Chrome is set as my default web browser, and I rarely use IE. I'd really appreciate any tips on this issue.
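One difference between the two versions is worth ruling out, although the question does not confirm it as the cause: iterating over a file yields each line with its trailing newline, so the strings passed to webbrowser.open() are not identical to hand-typed URLs. A minimal sketch that strips whitespace and skips blank lines before opening; the file name urls.txt and the helper names read_urls and open_all are hypothetical.

```python
import webbrowser


def read_urls(lines):
    """Return clean URLs from an iterable of lines (e.g. an open text file)."""
    urls = []
    for line in lines:
        url = line.strip()  # remove the trailing "\n" and surrounding spaces
        if url:             # skip blank lines
            urls.append(url)
    return urls


def open_all(path="urls.txt"):
    """Open every URL listed, one per line, in the (hypothetical) file."""
    with open(path) as f:
        for url in read_urls(f):
            webbrowser.open(url)
```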

How do you define the 'webbrowser' object? I use something like this:
from selenium import webdriver

driver = webdriver.Chrome(driverPath)  # driverPath contains the path to the 'chromedriver.exe' file
driver.get(url)

Python Selenium: Firefox set_preference to overwrite files on download?

I am using these Firefox preference settings for Selenium in Python 2.7:
from selenium import webdriver

ff_profile = webdriver.FirefoxProfile(profile_dir)
ff_profile.set_preference("browser.download.folderList", 2)  # 2 = use browser.download.dir
ff_profile.set_preference("browser.download.manager.showWhenStarting", False)
ff_profile.set_preference("browser.download.dir", dl_dir)
ff_profile.set_preference('browser.helperApps.neverAsk.saveToDisk', "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
With Selenium, I want to repeatedly download the same file and overwrite it, keeping the same filename, without having to confirm the download.
With the settings above, it downloads without asking for a location, but on macOS each download creates a duplicate named filename (1).ext, filename (2).ext, etc.
I'm guessing there might not be a setting to allow overwriting from within Firefox, to prevent accidents(?).
(In that case, I suppose the solution would be to handle the overwriting on the disk with other Python modules; another topic).

This is something that is outside Selenium's scope and is handled by the operating system.
Judging by the context of this and your previous question, you know (or can determine from the link text) the filename beforehand. If this is really the case, before hitting the "download" link, make sure you remove the existing file:
import os

filename = "All-tradable-ETFs-ETCs-and-ETNs.xlsx"  # or extract it dynamically from the link
filepath = os.path.join(dl_dir, filename)
if os.path.exists(filepath):
    os.remove(filepath)
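If the duplicates have already been created, the overwriting can instead be done on disk afterwards, as the question suggests. A minimal sketch, assuming the newest non-partial file in the download directory is the one just downloaded: it moves that file onto the fixed target name with os.replace, which overwrites an existing destination file on both POSIX and Windows (Python 3.3+). The function name keep_latest_as and the .part filter (Firefox's in-progress downloads) are assumptions.

```python
import os


def keep_latest_as(dl_dir, filename):
    """Rename the newest completed download in dl_dir to filename, overwriting it."""
    candidates = [
        os.path.join(dl_dir, name)
        for name in os.listdir(dl_dir)
        if not name.endswith(".part")  # skip Firefox's in-progress downloads
        and os.path.isfile(os.path.join(dl_dir, name))
    ]
    if not candidates:
        raise FileNotFoundError("no completed download in " + dl_dir)
    newest = max(candidates, key=os.path.getmtime)
    target = os.path.join(dl_dir, filename)
    if newest != target:
        os.replace(newest, target)  # overwrites target if it already exists
    return target
```

For example, keep_latest_as(dl_dir, "All-tradable-ETFs-ETCs-and-ETNs.xlsx") after each download would leave a single file with that name.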

Naming a file when downloading with Selenium Webdriver

I see that you can set where to download a file to through Webdriver, as follows:
from os import getcwd
from selenium import webdriver

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)  # 2 = use browser.download.dir
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")
browser = webdriver.Firefox(firefox_profile=fp)
But is there a similar way to give the file a name when it is downloaded? Preferably it would not be tied to the profile, as I will be downloading ~6000 files through one browser instance and do not want to reinitialize the driver for each download.

I would suggest a slightly unusual approach: do not download files with Selenium if possible.
I mean: get the file URL and use the urllib library to download the file and save it to disk "manually". The issue is that Selenium doesn't have a tool to handle Windows dialogs, such as the "Save As" dialog. I'm not sure, but I doubt that it can handle any OS dialogs at all; please correct me if I'm wrong. :)
Here's a tiny example:
import urllib  # Python 2; in Python 3 use urllib.request.urlretrieve

urllib.urlretrieve("http://www.yourhost.com/yourfile.ext", "your-file-name.ext")
The only job for us here is to make sure that we handle all the urllib Exceptions. Please see http://docs.python.org/2/library/urllib.html#urllib.urlretrieve for more info.
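urllib.urlretrieve is the Python 2 spelling. A minimal Python 3 sketch of the same idea with the exception handling the answer recommends; the helper name download_as is hypothetical, and the URL and file name in the usage line are the answer's placeholders.

```python
import urllib.error
import urllib.request


def download_as(url, filename):
    """Save url to filename; return True on success, False on a urllib error.

    In Python 3, urlretrieve lives in urllib.request, and failures surface as
    urllib.error.URLError (HTTPError is a subclass of it).
    """
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.URLError as exc:
        print("download failed:", url, exc)
        return False
    return True
```

Usage would look like download_as("http://www.yourhost.com/yourfile.ext", "your-file-name.ext"); because the target name is chosen by the caller, each of the ~6000 files can be named individually without touching the browser profile.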

I do not know if there is a pure Selenium handler for this, but here is what I have done when I needed to do something with the downloaded file.
Set up a loop that polls your download directory for the latest file that does not have a .part extension (a .part extension indicates a partial download and would occasionally trip things up if not accounted for). Put a timer on this loop so that you don't end up in an infinite loop when a timeout or other error prevents the download from completing. I used the output of the ls -t <dirname> command in Linux (my old code uses the commands module, which is deprecated, so I won't show it here :) ) and got the first file by using:
# result = output of ls -t
result = result.split('\n')[1].split(' ')[-1]
If the while loop exits successfully, the topmost file in the directory will be your file, which you can then modify using os.rename (or anything else you like).
Probably not the answer you were looking for, but hopefully it points you in the right direction.
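The polling loop described in this answer can be sketched in Python 3 without shelling out to ls: os.scandir lists the directory and time.monotonic bounds the wait. This is a minimal sketch under the answer's assumption that an unfinished download still carries a .part extension; the function names, the timeout and poll defaults, and the since parameter (to ignore files that existed before the download started) are hypothetical.

```python
import os
import time


def _candidates(dl_dir, since):
    """List (mtime, name, path) for regular files modified at or after since."""
    found = []
    for entry in os.scandir(dl_dir):
        try:
            if entry.is_file() and entry.stat().st_mtime >= since:
                found.append((entry.stat().st_mtime, entry.name, entry.path))
        except FileNotFoundError:  # renamed or removed while scanning
            continue
    return found


def wait_for_download(dl_dir, timeout=60.0, poll=0.5, since=0.0):
    """Return the newest file's path once it is not a .part file, else None.

    since is a time.time() value; older files are ignored. Gives up after
    timeout seconds so a failed download cannot hang the script.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = _candidates(dl_dir, since)
        if found:
            _, name, path = max(found)
            if not name.endswith(".part"):  # .part means still downloading
                return path
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

The returned path can then be passed to os.rename, as the answer suggests.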

Solution with code as suggested by the selected answer. Rename the file after each one is downloaded.
import os

os.chdir(SAVE_TO_DIRECTORY)
# keep only regular files (listdir returns bare names, hence the chdir above)
files = filter(os.path.isfile, os.listdir(SAVE_TO_DIRECTORY))
files = [os.path.join(SAVE_TO_DIRECTORY, f) for f in files]  # add path to each file
files.sort(key=os.path.getmtime)  # oldest first
newest_file = files[-1]
os.rename(newest_file, docName + ".pdf")  # relative target, so it stays in SAVE_TO_DIRECTORY
This answer was posted as an edit to the question naming a file when downloading with Selenium Webdriver by the OP user1253952 under CC BY-SA 3.0.

Using the default firefox profile with selenium webdriver in python

I know similar questions have been asked before, but I've tried many times and it still doesn't work for me.
I only have a default profile in firefox (called c1r3g2wi.default) and no other profiles. I want my firefox browser to start with this profile when I launch it using the selenium webdriver. How do I do this in Python?
I did this:
fp = webdriver.FirefoxProfile('C:\Users\admin\AppData\Roaming\Mozilla\Firefox\Profiles\c1r3g2wi.default')
browser = webdriver.Firefox(fp)
But I got an error:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect:
'C:\\Users\x07dmin\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\c1r3g2wi.default/*.*'
Help, or pointers in the right direction, would be very much appreciated.

Ok, I just solved this by simply changing all the slashes in my file path from "\" to "/".
Never knew this would make a difference.
C:/Users/admin/AppData/Roaming/Mozilla/Firefox/Profiles/c1r3g2wi.default

Alternatively, you can use double backslashes in the path:
fp = webdriver.FirefoxProfile('C:\\Users\\admin\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\c1r3g2wi.default')
browser = webdriver.Firefox(fp)
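The error message shows why the forward slashes help: in an ordinary Python string literal, \a in \admin is the escape for the BEL control character (\x07), which is exactly what appears in 'C:\\Users\x07dmin'. In Python 3 the original literal would not even compile, because \U starts a Unicode escape. Raw strings, doubled backslashes and forward slashes all avoid the problem. A small standard-library check using the profile path from the question:

```python
from pathlib import PureWindowsPath

RAW = r"C:\Users\admin\AppData\Roaming\Mozilla\Firefox\Profiles\c1r3g2wi.default"
DOUBLED = "C:\\Users\\admin\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\c1r3g2wi.default"
FORWARD = "C:/Users/admin/AppData/Roaming/Mozilla/Firefox/Profiles/c1r3g2wi.default"

# "\a" is an escape sequence for BEL, which is how "\admin" became "\x07dmin".
assert "\a" == "\x07"

# A raw string and a doubled-backslash string produce identical text.
assert RAW == DOUBLED

# Windows path semantics treat "/" and "\" as equivalent separators.
assert PureWindowsPath(FORWARD) == PureWindowsPath(RAW)
assert str(PureWindowsPath(FORWARD)) == RAW
```

Any of the three spellings could then be passed to webdriver.FirefoxProfile.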
