Selenium Python simple automation task - python

I have a truckload of trace files I'm trying to catalog. The idea is to open each one with "chrome://tracing" then save a screenshot. Screenshots are easy to catalog.
Here is the process:
1. Start Chrome: works
2. Open chrome://tracing: works
3. Open the file: the missing part I need help with
4. Save a screenshot: works
There are two ways to open a file in chrome://tracing:
a) Use the "Load" button, navigate to the file and open it.
Update: I was able to locate and click the "Load" button using Selenium. Now I need to handle the file-open dialog and the loading.
b) Drag and drop a trace file onto the main part of the window, which opens it.
[No idea how to do this.]
Here is the actual code I have so far:
import datetime
import time

from selenium import webdriver

driver = webdriver.Chrome()  # optional argument; if not specified, chromedriver is looked up on PATH
driver.get("chrome://tracing")
time.sleep(2)  # let the user actually see something
# Find load button
# or drop file to main window?
# Send the file location to the button
file_location = r'C:\........json'
# don't know where to send it (idea from https://towardsdatascience.com/controlling-the-web-with-python-6fceb22c5f08)
# note: WebDriver itself has no send_keys() method, so this line raises AttributeError;
# send_keys() has to be called on a located element
driver.send_keys(file_location)
time.sleep(15)  # some files are big - might take 15 seconds to load
date_stamp = str(datetime.datetime.now()).split('.')[0]
date_stamp = date_stamp.replace(" ", "_").replace(":", "_").replace("-", "_")
file_name = date_stamp + ".png"
driver.save_screenshot(file_name)
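The three-step date stamp (str, split, replace) can be produced with a single strftime call. A minimal sketch; `timestamped_png_name` is a hypothetical helper name, and it yields the same YYYY_MM_DD_HH_MM_SS.png layout as the lines above:

```python
import datetime


def timestamped_png_name(now=None):
    """Return a screenshot filename such as 2024_01_31_09_05_07.png.

    Same underscore-separated layout as str(now).split('.')[0] followed by
    the replace() calls; `now` can be passed in to make the result testable.
    """
    if now is None:
        now = datetime.datetime.now()
    return now.strftime("%Y_%m_%d_%H_%M_%S") + ".png"
```

The names only have one-second resolution, so two screenshots saved within the same second would share a name; with the waits used in this script that should not happen.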
After some research and trial and error, here is my final(?) working code. It:
- locates the "Load" button and opens the file Open dialog
- uses pywinauto to handle communication with the Open dialog
- saves a screenshot using a unique filename generated from a date stamp
import datetime
import time

from pywinauto.application import Application
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
driver = webdriver.Chrome(options=options)  # Selenium 4 takes options=; chrome_options= is gone
driver.get("chrome://tracing")
time.sleep(2)
# Find load button (it sits inside the shadow DOM of <tr-ui-timeline-view>)
sdomele = driver.find_element(By.TAG_NAME, "tr-ui-timeline-view")
ele = driver.execute_script("return arguments[0].shadowRoot;", sdomele)
button_found = ele.find_element(By.ID, "load-button")
button_found.click()  # let's load that file
time.sleep(3)
# here comes the pywinauto part to take care of communication with the Open file dialog
app = Application().connect(title='Open')  # connect to an existing window
dlg = app.window(title='Open')  # communicate with this window
#file_location = os.path.join(submission_dir, folder, file_name)
file_location = "C:\\FILES2OPEN\\file01.json"
# text goes to the "File name" box; with_spaces=True keeps spaces in paths
dlg.type_keys(file_location, with_spaces=True)
time.sleep(2)  # typing is slow - this is just for safety
dlg.OpenButton.click()  # click the Open button
time.sleep(15)  # some files might be big
# generate filename based on current time
date_stamp = str(datetime.datetime.now()).split('.')[0]
date_stamp = date_stamp.replace(" ", "_").replace(":", "_").replace("-", "_")
file_name = date_stamp + ".png"
driver.save_screenshot(file_name)  # save screenshot (just the "inner" part of the browser window / not a full screenshot)
time.sleep(2)
driver.quit()
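Cataloging many trace files means repeating the load-and-screenshot steps once per file. The sketch below covers only the bookkeeping, assuming the traces are .json files in one folder: it lists them in a stable order and names each screenshot after its trace (file01.json becomes file01.png), so every image maps back to its source. `plan_screenshots` and its output-folder argument are hypothetical; the Selenium and pywinauto steps from the script above would run inside a loop over its result.

```python
from pathlib import Path


def plan_screenshots(trace_dir, out_dir, pattern="*.json"):
    """Pair every trace file in trace_dir with a screenshot path in out_dir.

    Returns (trace_path, screenshot_path) tuples sorted by trace name.
    The screenshot keeps the trace's stem, e.g. file01.json -> file01.png.
    A missing trace_dir simply yields an empty list.
    """
    trace_dir, out_dir = Path(trace_dir), Path(out_dir)
    return [
        (trace, out_dir / (trace.stem + ".png"))
        for trace in sorted(trace_dir.glob(pattern))
        if trace.is_file()
    ]
```

For example, a loop over `plan_screenshots(r"C:\FILES2OPEN", r"C:\SCREENSHOTS")` (the second folder is made up) would type `str(trace)` into the Open dialog and finish each iteration with `driver.save_screenshot(str(png))`.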

The reason you were not able to find the Load button is that it is inside a shadow DOM. So first you need to get the shadow root using execute_script, then locate the "Load" button as usual. The following code worked for me:
from selenium.webdriver.common.by import By

sdomele = _driver.find_element(By.TAG_NAME, "tr-ui-timeline-view")
ele = _driver.execute_script("return arguments[0].shadowRoot;", sdomele)
ele.find_element(By.ID, "load-button").click()

Related

How to open the latest downloaded file

This is the code immediately before I get stuck:
import time
from selenium.webdriver.common.by import By

# Start a new post
driver.find_element(By.XPATH, "//span[normalize-space()='Start a post']").click()
time.sleep(2)
driver.find_element(By.XPATH, "//li-icon[@type='image-icon']//*[name()='svg']").click()
time.sleep(2)
The above code works well, but the next step has me puzzled.
I would like to upload the latest image file from my Downloads folder. When I click the link above in LinkedIn, the file dialog opens in my user folder (Melissa).
So I'm looking for the next lines of code to navigate from that default folder to the Downloads folder (C:\Users\Melissa\Downloads), select the newest file, and then click 'Open' so it attaches to the LinkedIn post.
You can skip the file dialog entirely and send the path of the newest file straight to the page's file <input> element:
import getpass, glob, os
from selenium.webdriver.common.by import By

# Replace this with the code to get the upload input tag
upload_button = driver.find_element(By.XPATH, "//input[@type='file']")
# Get downloads
downloads = glob.glob("C:\\Users\\{}\\Downloads\\*".format(getpass.getuser()))
# Find the latest downloaded file
latest_download = max(downloads, key=os.path.getctime)
# Enter it into the upload input tag
upload_button.send_keys(latest_download)
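`glob(...\\*)` also matches sub-folders and downloads that are still in progress (Chrome writes these as .crdownload files, Firefox as .part), and either can win the `max()`. A slightly more defensive sketch; the helper name `latest_file` and the list of skipped suffixes are assumptions. It compares modification times instead of `os.path.getctime`, because ctime is the creation time only on Windows (on Unix it is the last metadata change).

```python
from pathlib import Path

# Suffixes of downloads that have not finished yet (Chrome: .crdownload,
# Firefox: .part). An assumed list; extend it if your browser differs.
PARTIAL_SUFFIXES = {".crdownload", ".part"}


def latest_file(folder, pattern="*"):
    """Return the absolute path of the newest finished file in folder, or None."""
    candidates = [
        p for p in Path(folder).glob(pattern)
        if p.is_file() and p.suffix.lower() not in PARTIAL_SUFFIXES
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    # send_keys() on a file <input> needs an absolute path string
    return str(newest.resolve())
```

Used with the answer above, this would be `upload_button.send_keys(latest_file(os.path.expanduser("~/Downloads")))`.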

Selenium Python download and rename file

I'm trying to download and rename files (around 60 per page) using Selenium and have hit a hard bump.
Here is what I have tried:
1. I tried the solution offered by supputuri, which goes through the chrome://downloads download manager. I used the code provided but ran into two issues: the opened tab does not close properly (which I can fix), and, more importantly, the helper function keeps returning None as the file name even though I can find the downloaded files in my download directory. This approach could work but probably needs some modification in the Chrome console command part, which I know nothing about.
import time

# method to get the downloaded file name
def getDownLoadedFileName(waitTime):
    driver.execute_script("window.open()")
    # switch to new tab
    driver.switch_to.window(driver.window_handles[-1])
    # navigate to chrome downloads
    driver.get('chrome://downloads')
    # define the endTime
    endTime = time.time() + waitTime
    while True:
        try:
            # get downloaded percentage
            downloadPercentage = driver.execute_script(
                "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
            # check if downloadPercentage is 100 (otherwise the script will keep waiting)
            if downloadPercentage == 100:
                # return the file name once the download is completed
                return driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text")
        except Exception:
            # every error is swallowed here, e.g. a selector that does not match
            # this Chrome version's downloads page, so the loop only times out
            pass
        time.sleep(1)
        if time.time() > endTime:
            break
    # reached only on timeout: the caller gets None
    return None
Selenium give file name when downloading
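Both JavaScript strings in the code above hop through `.shadowRoot` at every level, and since the `except` swallows every error, any step that does not match the installed Chrome's downloads page ends in a timeout and a None. A small sketch that builds the chain from a list of selectors, so the path lives in one place and can be tested one level at a time; `shadow_query_js` is a hypothetical helper, and the selectors passed to it are simply the ones from the code above, not a verified description of the current chrome://downloads page.

```python
def _js_string(text):
    """Quote text as a single-quoted JavaScript string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def shadow_query_js(selectors, prop=None):
    """Build a 'return ...' script that walks through nested shadow roots.

    selectors[0] is queried on document; each following selector is queried
    on the previous element's shadowRoot. prop, if given, is read at the end.
    """
    if not selectors:
        raise ValueError("need at least one selector")
    expr = "document.querySelector(" + _js_string(selectors[0]) + ")"
    for sel in selectors[1:]:
        expr += ".shadowRoot.querySelector(" + _js_string(sel) + ")"
    if prop:
        expr += "." + prop
    return "return " + expr
```

`driver.execute_script(shadow_query_js(["downloads-manager", "#downloadsList downloads-item", "#progress"], "value"))` reproduces the progress query exactly; running it with only the first one or two selectors shows which level stops matching.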
2. The second approach I looked at was offered by Red in the post below. Since I'm downloading one file at a time, I figured I could always find the most recent file, rename it after the download completes, and repeat. With this approach I have the following issue: once I have grabbed the file object, I cannot find a way to get the file name; I checked the Python methods for file objects and none of them returns the file name.
import os
import time

def latest_download_file(num_file, path):
    while True:
        # sort by modification time using full paths, so no os.chdir() is needed
        files = sorted(os.listdir(path), key=lambda f: os.path.getmtime(os.path.join(path, f)))
        # wait for the file to finish downloading
        if len(files) < num_file:
            time.sleep(1)
            print('waiting for download to be initiated')
        else:
            newest = files[-1]
            if ".crdownload" in newest:
                time.sleep(1)
                print('waiting for download to complete')
            else:
                return newest  # a plain string: the name of the newest file
python selenium, find out when a download has completed?
Let me know if you have any suggestions. Thanks.
The second approach worked: just download, monitor the directory, and call os.rename once the download finishes.
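The download, wait, rename loop that worked can be packaged into one function. A minimal sketch under these assumptions: one download at a time into a dedicated folder, in-progress files carry Chrome's .crdownload or Firefox's .part suffix, and the caller records the folder's names before clicking the download link. `wait_and_rename` and its defaults are hypothetical. It uses `os.replace` rather than `os.rename`, because on Windows `os.rename` raises if the target name already exists.

```python
import os
import time

# In-progress download suffixes (Chrome, Firefox); an assumption, adjust as needed.
PARTIAL_SUFFIXES = (".crdownload", ".part")


def wait_and_rename(folder, before, new_name, timeout=60.0, poll=0.5):
    """Wait for exactly one new, finished file in folder, then rename it.

    `before` holds the names present before the download started.
    Returns the new full path; raises TimeoutError if nothing finishes in time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        added = set(os.listdir(folder)) - set(before)
        partial = [n for n in added if n.endswith(PARTIAL_SUFFIXES)]
        done = [n for n in added if not n.endswith(PARTIAL_SUFFIXES)]
        # Firefox keeps a placeholder next to the .part file, so also
        # require that no partial file is left
        if done and not partial:
            if len(done) > 1:
                raise RuntimeError(f"expected one new file, found {sorted(done)}")
            target = os.path.join(folder, new_name)
            os.replace(os.path.join(folder, done[0]), target)
            return target
        time.sleep(poll)
    raise TimeoutError(f"no finished download in {folder} after {timeout} s")
```

Per file, that would be `before = set(os.listdir(download_dir))`, then the Selenium click, then `wait_and_rename(download_dir, before, f"page{i:02}.pdf")`, where `download_dir` and the naming scheme are placeholders.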

Python Uploading on webbrowser

I am writing a script that will upload files from my local machine to a web page. This is the URL: https://tutorshelping.com/bulkask, and there is an upload option, but I can't figure out how to upload to it.
My current script:
import os
import webbrowser

def fileUploader(dirname):
    mydir = os.path.join(os.getcwd(), dirname)
    filelist = os.listdir(mydir)
    for file in filelist:
        filepath = os.path.join(mydir, file)  # this is the file's absolute path
        print(filepath)
    url = "https://tutorshelping.com/bulkask"
    webbrowser.open_new(url)  # open in default browser
    webbrowser.get('firefox').open_new_tab(url)

if __name__ == '__main__':
    dirname = 'testdir'  # relative name: os.path.join discards the cwd for '/testdir'
    fileUploader(dirname)
A quick solution would be to use something like AppRobotic personal macro software to either interact with the Windows pop-ups and apps directly, or just use X,Y coordinates to move the mouse, click on buttons, and then to send keyboard keys to type or tab through your files.
Something like this would work when tweaked, so that it runs at the point when you're ready to click the upload button and browse for your file:
import webbrowser

import win32com.client

x = win32com.client.Dispatch("AppRobotic.API")
# specify URL
url = "https://www.google.com"
# open with default browser
webbrowser.open_new(url)
# wait a bit for page to open
x.Wait(3000)
# use UI Item Explorer to find the X,Y coordinates of Search box
x.MoveCursor(438, 435)
# click inside Search box
x.MouseLeftClick()
x.Type("AppRobotic")
x.Type("{ENTER}")
I don't think the Python webbrowser package can do anything other than open a browser or tab at a specific URL.
If I understand your question correctly, you want to open the page, set the file to upload and then simulate a button click. You can try pyppeteer for this.
Disclaimer: I have never used the Python version, only the JS version (puppeteer).
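Whichever tool ends up driving the browser, it needs absolute paths to the files, and `os.getcwd() + dirname` style concatenation silently drops the path separator. A stdlib-only sketch of that preparation step; `absolute_file_paths` and `as_multi_file_keys` are hypothetical names. The newline-joined form is how several files are commonly passed to Selenium's `send_keys()` on an `<input type="file" multiple>` element, assuming the upload form on that site has such an input (not verified).

```python
from pathlib import Path


def absolute_file_paths(dirname, base=None):
    """Return sorted absolute paths of the regular files in dirname.

    dirname is resolved relative to base (default: the current directory);
    pathlib inserts the separator that plain string concatenation leaves out.
    """
    folder = (Path.cwd() if base is None else Path(base)) / dirname
    return [str(p.resolve()) for p in sorted(folder.iterdir()) if p.is_file()]


def as_multi_file_keys(paths):
    """Join paths with newlines, one path per file to upload."""
    return "\n".join(paths)
```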

Download a file with selenium on Heroku

I am attempting to download a file from a link, parse it, then save specific data to my Heroku database. I have successfully set up my Selenium Chrome webdriver and I am able to log in. Normally, when I get the URL, the download begins automatically. I have set up a new directory on Heroku for the file to be saved to, but the file does not appear there or anywhere else.
I have tried different ways of setting the download directory and of logging in to the website, and I have it working locally, but not in Heroku production.
# importing libraries
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
import datetime
from datetime import timedelta
import os
import json
import csv
# temporary credentials to later be stored
# as env vars
user = "user"
psw = "pasw"
account = 'account'
# this is the directory to download the file
file_directory = os.path.abspath('files')
# making this directory the default chrome web driver directory
options = webdriver.ChromeOptions()
prefs = {
"download.default_directory": file_directory
}
options.add_experimental_option('prefs',prefs)
# setting up web driver
driver = webdriver.Chrome(options=options)
# logging in to pinterest
url_login = 'https://www.pinterest.com/login/?referrer=home_page'
driver.get(url_login)
username = driver.find_element(By.ID, "email")
username.send_keys(user)
password = driver.find_element(By.ID, "password")
password.send_keys(psw)
password.send_keys(Keys.ENTER)
# sleep 20 sec so page loads fully
time.sleep(20)
# collect metrics for yesterday
yesterday = datetime.date.today() - datetime.timedelta(days=1)
yesterday = str(yesterday)
# download link for metrics
url = "https://analytics.pinterest.com/analytics/profile/" + account + "/export/?application=all&tab=impressions&end_date=" + yesterday + '&start_date=' + yesterday
driver.get(url)
# setting up file identification for pinterest CSV file
date = datetime.date.today() - datetime.timedelta(days=2)
date = str(date)[:10]
file_location = os.path.join(file_directory,'profile-'+account+'-impressions-all-'+date+'.csv')
# opening up file
test_list = []
with open(file_location,newline = '', encoding = 'utf-8') as f:
reader = csv.reader(f)
for row in reader:
test_list.append(row)
# gathering relevant metrics for yesterday (second CSV row: headers, third row: values)
this_list = test_list[1:3]
# re-organizing metrics: map each header to its value
this_dict = dict(zip(this_list[0], this_list[1]))
# quit the browser before using the result; code placed after a return never runs
driver.quit()
print(this_dict)
I expect that driver.get(url), with the export URL built above, will download the CSV to the directory I specified. It does not. I have used heroku run bash to search for it, but it isn't there.
UPDATE: I do NOT need to store the file permanently; I only need to store it temporarily and parse it. I understand that on a dyno restart it will all be lost.
UPDATE: I have done this another way. I passed the cookies and headers to a requests session, using the User-Agent of a Chrome browser on Linux. I then assigned the file to a variable (csv_file = s.get(url)), split the lines into an array, used ''.join() to combine the lines into one massive string, and parsed that string by the identifiers that would normally separate the lines in a CSV. I now have the relevant metrics.
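Splitting and re-joining the lines by hand is not needed: `csv.reader` accepts any iterable of lines, so the response text can be parsed through `io.StringIO`, which also keeps quoted fields such as "1,234" intact. A sketch that reproduces the header-row/values-row mapping of `test_list[1:3]`; `metrics_from_csv_text` and the sample data in the test are made up, and `.text` is the requests Response attribute holding the decoded body.

```python
import csv
import io


def metrics_from_csv_text(text, header_row=1):
    """Map the header row to the row right after it, like test_list[1:3].

    csv.reader handles quoting, so a field such as "1,234" stays one value.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, values = rows[header_row], rows[header_row + 1]
    return dict(zip(header, values))
```

With the session from the update, that would be `metrics = metrics_from_csv_text(s.get(url).text)`.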
The thing you're missing is that heroku run bash will start a different dyno, with no access to the filesystem of the one that downloaded the file.
It's fine to use the Heroku filesystem as temporary storage for actions within the same process. But if you need access to stored files from a separate process, you should use something else, e.g. S3.

Downloading csv file using selenium python and deleting it

Is it possible to download a CSV file using Selenium with Python and then delete it, or to download the file only temporarily?
This is the code I am using to download the CSV file:
import os
from selenium import webdriver

# Selenium 4: preferences are set on FirefoxOptions instead of a FirefoxProfile
options = webdriver.FirefoxOptions()
options.set_preference("browser.download.folderList", 2)  # 2 = use browser.download.dir
options.set_preference("browser.download.manager.showWhenStarting", False)
options.set_preference("browser.download.dir", os.getcwd())
options.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/csv,text/csv,application/pdfss, text/csv, application/excel")
options.set_preference("pdfjs.disabled", True)
browser = webdriver.Firefox(options=options)
You can pass the path of the downloaded file to os.remove() to delete it.
EDIT: If you want the name of the file you downloaded (I don't think Selenium has added this functionality yet), you can compare the directory listing before and after the download using os.listdir().
import os
before = os.listdir('/home/jason/Downloads')
# Download the file using Selenium here
after = os.listdir('/home/jason/Downloads')
change = set(after) - set(before)
if len(change) == 1:
    file_name = change.pop()  # file name stored as string
else:
    print("More than one file or no file downloaded")
I added one line of code, and the following solution works:
import os
download_dir = '/home/jason/Downloads'
before = os.listdir(download_dir)
# Download the file using Selenium here
after = os.listdir(download_dir)
change = set(after) - set(before)
if len(change) == 1:
    file_name = change.pop()  # file name stored as string
    # os.listdir() returns bare names, so join with the directory before deleting
    os.remove(os.path.join(download_dir, file_name))
else:
    print("More than one file or no file downloaded")
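For the "temporary only" part of the question, the before/after comparison can be wrapped in a context manager that deletes whatever appeared in the folder when the with block ends, even if parsing raised an exception. A sketch; `temporary_downloads` is a hypothetical name, and it assumes nothing else writes to that folder while the block runs (anything that does would be deleted too).

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_downloads(folder):
    """Delete files that appear in folder during the with block.

    Yields a function that returns the full paths of the files added so far,
    so the caller can open and parse them before they are removed.
    """
    before = set(os.listdir(folder))

    def new_files():
        return sorted(os.path.join(folder, n) for n in set(os.listdir(folder)) - before)

    try:
        yield new_files
    finally:
        for path in new_files():
            if os.path.isfile(path):
                os.remove(path)
```

Inside `with temporary_downloads(download_dir) as new_files:`, the Selenium download would run first, then a wait for it to finish (for example, until no .part file remains), and then `new_files()[0]` can be parsed.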
