This is the code immediately before I get stuck:
#Start a new post
driver.find_element_by_xpath("//span[normalize-space()='Start a post']").click()
time.sleep(2)
driver.find_element_by_xpath("//li-icon[@type='image-icon']//*[name()='svg']").click()
time.sleep(2)
The above code works well. The next step though has me puzzled.
I would like to upload the latest image file from my downloads folder. When I click on the link above in LinkedIn it navigates to my user folder (Melissa). (see image)
So... I'm looking for the next lines of code to navigate from the default folder to the Downloads folder (C:\Users\Melissa\Downloads), then, select the newest file, then click 'open' so it attaches to the LinkedIn post.
You can upload the image using this method:
import getpass, glob, os
# Replace this with the code to get the upload input tag
upload_button = driver.find_element_by_xpath("//input[@type='file']")
# Get downloads
downloads = glob.glob("C:\\Users\\{}\\Downloads\\*".format(getpass.getuser()))
# Find the latest downloaded file
latest_download = max(downloads, key=os.path.getctime)
# Enter it into the upload input tag
upload_button.send_keys(latest_download)
I'm trying to download and rename files (around 60 per page) using selenium and have hit a hard bump.
Here is what I have tried:
1. I tried the solution offered by supputuri, which goes through the chrome://downloads download manager. I used the code provided but ran into two issues: the opened tab does not close properly (which I can fix) and, more importantly, the helper function keeps returning None as the file name even though I can find the downloaded files in my download directory. This approach could work, but it probably needs some changes to the Chrome console command part, which I have no knowledge of.
# method to get the downloaded file name
def getDownLoadedFileName(waitTime):
    driver.execute_script("window.open()")
    # switch to new tab
    driver.switch_to.window(driver.window_handles[-1])
    # navigate to chrome downloads
    driver.get('chrome://downloads')
    # define the endTime
    endTime = time.time()+waitTime
    while True:
        try:
            # get downloaded percentage
            downloadPercentage = driver.execute_script(
                "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
            # check if downloadPercentage is 100 (otherwise the script will keep waiting)
            if downloadPercentage == 100:
                # return the file name once the download is completed
                return driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text")
        except:
            pass
        time.sleep(1)
        if time.time() > endTime:
            break
Selenium give file name when downloading
2. The second approach I looked at is offered by Red in the post below. Since I'm downloading one file at a time, I figured I could always find the most recent file, rename it after the download completes, and repeat this process. My issue with this approach: once I have grabbed the file object, I cannot find a way to get the file name. I checked the Python methods for file objects and none of them returns the name of the file.
import os
import time
def latest_download_file(num_file,path):
    os.chdir(path)
    while True:
        files = sorted(os.listdir(os.getcwd()), key=os.path.getmtime)
        #wait for file to be finish download
        if len(files) < num_file:
            time.sleep(1)
            print('waiting for download to be initiated')
        else:
            newest = files[-1]
            if ".crdownload" in newest:
                time.sleep(1)
                print('waiting for download to complete')
            else:
                return newest
python selenium, find out when a download has completed?
Let me know if you have any suggestions. Thanks.
The second approach worked: download the file, monitor the directory, and call os.rename once the download finishes.
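A minimal sketch of that download, monitor, rename loop, under a few assumptions: the helper names wait_for_new_file and rename_download are made up for illustration, the browser is Chrome (so unfinished downloads carry the .crdownload suffix), and the list of files already in the folder is captured before each download is triggered.

```python
import os
import time


def wait_for_new_file(directory, known_files, timeout=30.0, poll=0.5):
    """Return the path of a finished download that was not in known_files.

    Names ending in .crdownload (Chrome's partial-download suffix) are skipped.
    Raises TimeoutError if nothing new shows up within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while True:
        new_names = set(os.listdir(directory)) - set(known_files)
        finished = [n for n in new_names if not n.endswith(".crdownload")]
        if finished:
            # If several new files appeared, take the most recently modified one.
            newest = max(
                finished,
                key=lambda n: os.path.getmtime(os.path.join(directory, n)),
            )
            return os.path.join(directory, newest)
        if time.time() > deadline:
            raise TimeoutError("no finished download appeared in " + directory)
        time.sleep(poll)


def rename_download(path, new_name):
    """Rename the downloaded file in place, keeping its original extension."""
    folder, old_name = os.path.split(path)
    extension = os.path.splitext(old_name)[1]
    target = os.path.join(folder, new_name + extension)
    os.rename(path, target)
    return target
```

In the selenium loop this would be used by taking os.listdir(download_dir) before each click or driver.get call, then passing that snapshot as known_files. Note that os.listdir already returns plain file names as strings, so no separate "file object" is involved.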
Can I make an excel file open in the browser to be viewed instead of being downloaded, and have a download button, similar to how PDFs are?
I'm using Python Flask, for PDFs I do:
@blueprint_name.route("/download_some_pdf", methods=["GET"])
def download_pdf():
    return send_file(file_path, cache_timeout=1)
This opens the PDF in a browser tab with the download PDF button
and for Excel files:
@blueprint_name.route("/download_some_xlsx", methods=["GET"])
def download_xlsx():
    return send_from_directory(dir_path, filename, as_attachment=True, cache_timeout=1)
If for the Excel I remove the as_attachment parameter, or I use send_file instead of send_from_directory, it still downloads the file but with the name of the method ("download_xlsx") instead of the filename!!
I'm using Python 3.8.3 and Flask 1.1.2
As far as I know, whether the file is opened or downloaded depends on the client browser. I have come across a browser that could not display a PDF and downloaded it directly instead. So once you serve the file from a GET route, it is up to the receiving browser to open it or simply save it to a directory. Moreover, "open PDF" only happens after the file has been fully downloaded to the temporary directory set by the browser.
I am writing a script that will upload a file from my local machine to a webpage. This is the URL: https://tutorshelping.com/bulkask, and it has an upload option, but I can't figure out how to upload the file.
My current script:
import webbrowser, os
def fileUploader(dirname):
    mydir = os.getcwd() + dirname
    filelist = os.listdir(mydir)
    for file in filelist:
        filepath = mydir + file #This is the absolute file path
        print(filepath)
    url = "https://tutorshelping.com/bulkask"
    webbrowser.open_new(url) # open in default browser
    webbrowser.get('firefox').open_new_tab(url)
if __name__ == '__main__':
    dirname = '/testdir'
    fileUploader(dirname)
A quick solution would be to use something like AppRobotic personal macro software to either interact with the Windows pop-ups and apps directly, or just use X,Y coordinates to move the mouse, click on buttons, and then to send keyboard keys to type or tab through your files.
Something like this would work when tweaked, so that it runs at the point when you're ready to click the upload button and browse for your file:
import win32com.client
x = win32com.client.Dispatch("AppRobotic.API")
import webbrowser
# specify URL
url = "https://www.google.com"
# open with default browser
webbrowser.open_new(url)
# wait a bit for page to open
x.Wait(3000)
# use UI Item Explorer to find the X,Y coordinates of Search box
x.MoveCursor(438, 435)
# click inside Search box
x.MouseLeftClick
x.Type("AppRobotic")
x.Type("{ENTER}")
I don't think the Python webbrowser package can do anything else than open a browser / tab with a specific url.
If I understand your question well, you want to open the page, set the file to upload and then simulate a button click. You can try pyppeteer for this.
Disclaimer: I have never used the Python version, only the JS version (puppeteer).
I am using selenium to log in to a page and download some tiff files.
I now have a variable downloadurl, which contains an array of URL links that I scraped from the website. I am using the code below to download the files:
driver = webdriver.Chrome();
driver.get(downloadurl)
I do get all the files downloaded, but with no meaningful names, e.g. img(1), img(2) ...
Now my problem is: I want driver.get(downloadurl) to download the files one by one in the order of the downloadurl array, and to rename each file right after it is downloaded according to the title variable, which is also an array, then download the next file, rename it, and so on.
P.S. I am avoiding requests because the login procedure is very complicated and requires authorization cookies.
Many thanks for the help!
To elaborate on my comment:
import os
import time
from selenium import webdriver
driver = webdriver.Chrome() # create the browser once, not once per link
for downloadlink, uniqueName in my_list_of_links_and_names:
    driver.get(downloadlink) # load this link, not the whole downloadurl list
    time.sleep(5) # give it time to download (not sure if this is necessary)
    # the file is now downloaded
    os.rename("img(1).png", uniqueName) # the name is now changed
This will work assuming that "img(1).png" will be renamed and then the next download will come in as "img(1).png" yet again.
The hardest part would be making my_list_of_links_and_names but if you have the data in separate lists, just zip() them together. You can also generate your own title every loop based on some criteria...
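As a small illustration of the zip() step, assuming the links and titles live in two separate lists of equal length (the URLs and names below are placeholders, not taken from the question):

```python
# Placeholder data standing in for the scraped links and their titles
links = ["https://example.com/a.tiff", "https://example.com/b.tiff"]
titles = ["first_scan", "second_scan"]

# zip() pairs items by position: links[0] with titles[0], and so on
my_list_of_links_and_names = list(zip(links, titles))

for downloadlink, uniqueName in my_list_of_links_and_names:
    print(downloadlink, "->", uniqueName)
```

zip() stops at the shorter of the two lists, so a missing title silently drops the matching link; checking that both lengths agree first is probably worthwhile.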
First we will create a function (Rename_file) that renames the downloaded image from its folder.
def Rename_file(new_name, Dl_path): #Renames Downloaded Files in the path
    filename = max([f for f in os.listdir(Dl_path)])
    if 'image.png' in filename: #Finds 'image.png' name in said path
        time.sleep(2) #you can change the value in here depending on your requirements
        os.rename(os.path.join(Dl_path, filename), os.path.join(Dl_path, new_name+'.png')) #can be changed to .jpg etc
Then we apply this function to the array of URL links:
for link, new_name in zip(downloadurl, title): #Pairs each link with its title instead of looping over every combination
    driver.get(link) #download the said image in link
    Rename_file(new_name,Dl_path)
Sample code:
import os
import time
# driver is the already logged-in selenium webdriver from the question
downloadurl = ['www.sample2.com','www.sample2.com']
Dl_path = "//location//of//image_downloaded"
title = ['Title 1', 'Title 2']
def Rename_file(new_name, Dl_path):
    filename = max([f for f in os.listdir(Dl_path)])
    if 'image.png' in filename:
        time.sleep(2)
        os.rename(os.path.join(Dl_path, filename), os.path.join(Dl_path, new_name+'.png'))
# Pair each link with its title so every download is renamed exactly once
for link, new_name in zip(downloadurl, title):
    driver.get(link)
    time.sleep(2)
    Rename_file(new_name,Dl_path)
I'm fairly confident about the Rename_file function I created, but I haven't tested it with an array of URL links since I couldn't think of anywhere to test it. Hopefully this works for you. Please let me know :-)
I'm using a script to download images from a website and up until recently it has been working fine. Of note the page is https, but per the urllib docs that shouldn't be an issue. It first requests the page and uses a regex to pull the download links from the page. From there the script goes into a loop to download the file and the inner loop looks like so:
dllink = m[0].replace('\">Download','')
print dllink
#m = re.findall('[a-f0-9]+.[\w]+',dllink)
extension = re.findall('.[\w]+$',dllink)[0]
fname = post_id + extension
urllib.urlretrieve(dllink,cpath + "/" + fname)
printLine(post_id + " ")
delay = random.uniform(32.0,64.0)
dlcount = dlcount + 1
time.sleep(delay)
Again, it downloads a file, but the files I'm downloading are on the order of 200k-4m, and every file has started coming back as 4k. I've copy-pasted the download links into browsers and they pull the right image, and wget also downloads it just fine, so I'm not sure what's wrong with my code that makes it download only 4k of the file. If this is a server-side issue, is there a way to call wget from Python to accomplish the same thing without urlretrieve? Thanks in advance for any help!
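On the last point, one way to call wget from Python is through the standard subprocess module. This is only a sketch: it assumes wget is installed and on the PATH, and the helper names build_wget_command and download_with_wget are made up for illustration. It does not explain why urlretrieve returns 4k files.

```python
import subprocess


def build_wget_command(url, destination):
    """Build the argument list for wget: -q is quiet, -O sets the output file."""
    return ["wget", "-q", "-O", destination, url]


def download_with_wget(url, destination):
    """Run wget and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.check_call(build_wget_command(url, destination))
```

In the loop above, the urllib.urlretrieve(dllink, cpath + "/" + fname) call would then become download_with_wget(dllink, cpath + "/" + fname). Passing the arguments as a list avoids shell quoting problems with unusual characters in the URL.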