I'm using a script to download images from a website and up until recently it has been working fine. Of note the page is https, but per the urllib docs that shouldn't be an issue. It first requests the page and uses a regex to pull the download links from the page. From there the script goes into a loop to download the file and the inner loop looks like so:
dllink = m[0].replace('\">Download','')
print dllink
#m = re.findall('[a-f0-9]+.[\w]+',dllink)
extension = re.findall('.[\w]+$',dllink)[0]
fname = post_id + extension
urllib.urlretrieve(dllink,cpath + "/" + fname)
printLine(post_id + " ")
delay = random.uniform(32.0,64.0)
dlcount = dlcount + 1
time.sleep(delay)
Again, it downloads a file, but the files I'm downloading are on the order of 200k-4m and every file has started returning 4k. I've copy-pasted the download links into browsers and it pulls the right image and wget also downloads it just fine, so I'm not sure what's wrong with my code where it's only downloading 4k of the file. If this is a server side issue is there a way to call wget from python to accomplish the same thing without urlretreive? Thanks in advance for any help!
Related
This is the code immediately before I get stuck:
#Start a new post
driver.find_element_by_xpath("//span[normalize-space()='Start a post']").click()
time.sleep(2)
driver.find_element_by_xpath("//li-icon[#type='image-icon']//*[name()='svg']").click()
time.sleep(2)
The above code works well. The next step though has me puzzled.
I would like to upload the latest image file from my downloads folder. When I click on the link above in LinkedIn it navigates to my user folder (Melissa). (see image)
So... I'm looking for the next lines of code to navigate from the default folder to the Downloads folder (C:\Users\Melissa\Downloads), then, select the newest file, then click 'open' so it attaches to the LinkedIn post.
You can upload the image using this method:
import getpass, glob, os
# Replace this with the code to get the upload input tag
upload_button = driver.find_element_by_xpath("//input[#type='file']")
# Get downloads
downloads = glob.glob("C:\\Users\\{}\\Downloads\\*".format(getpass.getuser()))
# Find the latest downloaded file
latest_download = max(downloads, key=os.path.getctime)
# Enter it into the upload input tag
upload_button.send_keys(latest_download)
I have a list of Url of files that open as Download Dialogue box with an option to save and open.
I'm using the python requests module to download the files. While using Python IDLE I'm able to download the file with the below code.
link = fileurl
r = requests.get(link,allow_redirects=True)
with open ("a.torrent",'wb') as code:
code.write(r.content)
But when I use this code along with for loop, the file which gets downloaded is corrupted or says unable to open.
for link in links:
name = str(links.index(link)) ++ ".torrent"
r = requests.get(link,allow_redirects=True)
with open (name,'wb') as code:
code.write(r.content)
If you are trying to download a video from a website, try
r = get(video_link, stream=True)
with open ("a.mp4",'wb') as code:
for chunk in r.iter_content(chunk_size=1024):
file.write(chunk)
I'm trying to download and rename files(around 60 per page) using selenium and hit a hard bump.
Here is what I have tried:
1.try to use the solution offered by supputuri, go through the chrome://downloads download manager, I used the code provided but encountered 2 issues: the opened tab does not close properly(which I can fix), most importantly, the helper function provided keeps returning 'None' as file name despite the fact that I can find the files downloaded in my download directory. This approach can work but prolly need some modification in the chrome console command part which I have no knowledge with.
# method to get the downloaded file name
def getDownLoadedFileName(waitTime):
driver.execute_script("window.open()")
# switch to new tab
driver.switch_to.window(driver.window_handles[-1])
# navigate to chrome downloads
driver.get('chrome://downloads')
# define the endTime
endTime = time.time()+waitTime
while True:
try:
# get downloaded percentage
downloadPercentage = driver.execute_script(
"return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
# check if downloadPercentage is 100 (otherwise the script will keep waiting)
if downloadPercentage == 100:
# return the file name once the download is completed
return driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text")
except:
pass
time.sleep(1)
if time.time() > endTime:
break
Selenium give file name when downloading
the second approach I looked at is offered by Red from the post below. I figured since I'm downloading 1 file at a time, I could always find the most recent file and then change the file name after download completes and repeat this process. For this approach I have the following issue: once I grabbed the file object, I cannot seem to find a way to get the file name, I checked the python methods for file object and it doesn't have one that returns the name of the file.
import os
import time
def latest_download_file(num_file,path):
os.chdir(path)
while True:
files = sorted(os.listdir(os.getcwd()), key=os.path.getmtime)
#wait for file to be finish download
if len(files) < num_file:
time.sleep(1)
print('waiting for download to be initiated')
else:
newest = files[-1]
if ".crdownload" in newest:
time.sleep(1)
print('waiting for download to complete')
else:
return newest
python selenium, find out when a download has completed?
Let me know if you have any suggestions. Thanks.
The second approach worked, just download, monitor the directory and use os.rename after download finishes.
This is my first time posting on stack overflow, and I look forward to getting more involved with the community. I need to download, rename, and save many Excel files from an ASPX website, but I cannot access these Excel files directly via a URL (i.e., the URL does not end with "excelfilename.csv"). What I can do is go to a URL which initiates the download of the file. An example of the URL is below.
https://www.websitename.com/something/ASPXthing.aspx?ReportName=ExcelFileName&Date=SomeDate&reportformat=csv
The inputs that I want to vary via loops are "ExcelFileName" and "SomeDate". I know one can fetch these files with urllib when the Excel files can be accessed directly via a URL, but how can I do it with a URL like this one?
Thanks in advance for helping out!
Using the requests library, you can fetch the file and iterate over chunks to write to file
import requests
report_names = ["Filename1","Filename2"]
dates = ['2016-02-22','2016-02-23'] # as strings
for report_name in report_names:
for date in dates:
with open('%s_%s_fetched.csv' % (report_name.split('.')[0],date,), 'wb') as handle:
response = requests.get('https://www.websitename.com/something/ASPXthing.aspx?ReportName=%s&Date=%s&reportformat=csv' % (report_name,date,), stream=True)
if not response.ok:
# Something went wrong
for block in response.iter_content(1024):
handle.write(block)
Perhaps I'm misunderstanding how .torrent files work, but is there a way in python to download the actual referenced content the .torrent file points to that would be downloaded say using a torrent client such as utorrent, but from the shell/command line using python?
The following works for simply downloading a .torrent file and sure I could open the torrent client as well do download the .torrent, but I'd rather streamline the process in the command line. Can't seem to find much online about doing this...
torrent = torrentutils.parse_magnet(magnet)
infohash = torrent['infoHash']
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})
url = "http://torcache.net/torrent/" + infohash + ".torrent"
answer = session.get(url)
torrent_data = answer.content
buffer = BytesIO(torrent_data)
gz = gzip.GzipFile(fileobj = buffer)
output = open(torrent['name'], "wb")
output.write(torrent_data)
As far as I know i can't use libtorrent for python3 on a 64 bit windows os.
If magnet: links work in your web browser then a simple way to start a new torrent download from your Python script is to open the url using your web browser:
import webbrowser
webbrowser.open(magnet_link)
Or from the command-line:
$ python -m webbrowser "magnet:?xt=urn:btih:ebab37b86830e1ed624c1fdbb2c59a1800135610&dn=StackOverflow201508.7z"
The download is performed by your actual torrent client such as uTorrent.
BitTornado works on Windows and has a command line interface. Take a look at btdownloadheadless.py. But this written in Python 2.
http://www.bittornado.com/download.html