Download Excel File via button on website with Python - python

I am currently working on code that downloads an Excel file from a website. The file is hidden behind an Export button (see website: https://www.amundietf.co.uk/retail/product/view/LU1437018838). However, I have already identified the link behind it, which is the following: https://www.amundietf.co.uk/retail/ezaap/service/ExportDataProductSheet/Fwd/en-GB/718/LU1437018838/object/export/repartition?idFonctionnel=export-repartition-pie-etf&exportIndex=3&hasDisclaimer=1. Since the link does not lead directly to the file but rather executes some Java widget, I am not able to download the file via Python. I have tried the following code:
import re
import requests
link = 'https://www.amundietf.co.uk/retail/ezaap/service/ExportDataProductSheet/Fwd/en-GB/718/LU1437018838/object/export/repartition?idFonctionnel=export-repartition-pie-etf&exportIndex=3&hasDisclaimer=1'
r = requests.get(link, verify= False)
However, I am not able to connect to the file. Does anybody have an idea how to do this?

I would recommend using HTML:
<html lang=en>
<body>
Click here to download
</body>
</html>
In the href attribute of the <a> tag, you can put the path to your own Excel file. I used an external link to an example file I found on Google. To open it in a new tab, use target="_blank" as an attribute of <a>.
Hope it works!
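On the Python side, one thing worth checking before saving anything is whether the bytes returned by the export URL are really a spreadsheet or just an HTML page. A minimal sketch, assuming the export is an ordinary .xlsx or .xls file; the helper name looks_like_excel is hypothetical, and the check relies only on the standard file signatures of those two formats:

```python
# File signatures: .xlsx is a ZIP archive, legacy .xls is an OLE2 compound file.
XLSX_MAGIC = b"PK\x03\x04"
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_excel(body: bytes) -> bool:
    """Return True if the payload starts with an .xlsx or .xls signature."""
    return body.startswith(XLSX_MAGIC) or body.startswith(XLS_MAGIC)


# Hypothetical usage with the request from the question:
#   r = requests.get(link)
#   if looks_like_excel(r.content):
#       with open("export.xlsx", "wb") as fh:
#           fh.write(r.content)
#   else:
#       print(r.headers.get("Content-Type"), r.text[:200])  # inspect what came back
```

If the check fails, the body is most likely an HTML or error page, which would suggest the request needs the cookies or headers that the browser sends along.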

Related

trying to download full HTML pages

I am trying to download a few hundred HTML pages in order to parse them and calculate some measures.
I tried it with Linux wget, and with a loop of the following code in Python:
import urllib.request
url = "https://www.camoni.co.il/411788/168022"
html = urllib.request.urlopen(url).read()
But the HTML file I got doesn't contain all the content I see in the browser on the same page. For example, text I see on the screen is not found in the HTML file. Only when I right-click the page in the browser and choose "Save As" do I get the full page.
The problem: I need a large number of pages and cannot do it by hand.
URL example: https://www.camoni.co.il/411788/168022 (the last number changes)
Thank you
That's because the site is not static. It uses JavaScript (in this example, the jQuery library) to fetch additional data from the server and insert it into the page.
So instead of trying to GET the raw HTML, you should inspect the requests in the browser's developer tools. There's a POST request to https://www.camoni.co.il/ajax/tabberChangeTab with the following data:
tab_name=tab_about
memberAlias=ד-ר-דינה-ראלט-PhD
currentURL=/411788/ד-ר-דינה-ראלט-PhD
The response is HTML that is then inserted into the page.
So instead of just downloading the page, you should inspect the page and its requests to get the data, or use a headless browser such as Google Chrome to emulate the 'Save As' button and save the data.
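A rough sketch of replaying that POST request from Python, using only the standard library. The function names are hypothetical, and it assumes the server accepts a plain form-encoded body without extra cookies or headers, that currentURL always follows the /411788/<alias> pattern, and that the response is UTF-8; none of this is confirmed for the site:

```python
import urllib.parse
import urllib.request

AJAX_URL = "https://www.camoni.co.il/ajax/tabberChangeTab"


def build_tab_request(member_alias: str, tab_name: str = "tab_about") -> urllib.request.Request:
    """Build (but do not send) the POST request that asks for one tab's HTML."""
    form = {
        "tab_name": tab_name,
        "memberAlias": member_alias,
        # Assumption: currentURL always follows the /411788/<alias> pattern seen above.
        "currentURL": f"/411788/{member_alias}",
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8, so the result is pure ASCII.
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        AJAX_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_tab_html(member_alias: str) -> str:
    """Send the request and return the HTML fragment the server responds with."""
    with urllib.request.urlopen(build_tab_request(member_alias), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling fetch_tab_html("ד-ר-דינה-ראלט-PhD") in a loop over the member aliases would then replace the manual "Save As" step, provided the assumptions above hold.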

How to get the download link of html a tag which has no explicit true link?

I encountered a web page that has many download icons.
If I click on any of these download icons, the browser starts downloading a zip file.
However, it seems that these download icons are just images with no explicit download link that can be copied.
I looked into the HTML source and found that each download icon belongs to a tr block like the one below.
<tr>
<td title="aaa">
<span class="centerFile">
<img src="/images/downloadCenter/pic.png" />
</span>aaa
</td>
<td>2021-09-10 13:42</td>
<td>bbb</td>
<td><a id="4099823" data='{"clazzId":37675171,"libraryId":"98689831","relationId":1280730}' recordid="4099823" target="_blank" class="download_ic checkSafe" style="line-height:54px;"><img src="/images/down.png" /></a></td>
</tr>
Click this link will download a zip file with download link
So my problem is how to get the download links behind these download icons without actually clicking them in the browser. In particular, I want to know how to do this in Python by analyzing the HTML source, so that I can do batch downloading.
If you want to batch download those files and cannot work out the links by analyzing the HTML and JavaScript (probably because a JavaScript function creates the link, or makes a call to the backend), you can use Selenium to simulate a user's actions.
You will need to do something like the code below, where I'm using the class names from the HTML you posted, which I think are what trigger the JavaScript download function:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
# URL of website
url = "https://www.yourwebsitelinkhere.com/"
driver.get(url)
# use a CSS selector matching both class names to find the anchor links
download_links = driver.find_elements(By.CSS_SELECTOR, ".download_ic.checkSafe")
for link in download_links:
    link.click()
Example of how it works for Stack Overflow (as of the day of writing this answer):
driver = webdriver.Chrome()
driver.get("https://stackoverflow.com")
elements = driver.find_elements(By.CSS_SELECTOR, '.-marketing-link.js-gps-track')
elements[0].click()
This should lead you to the Stack Overflow About page.
[EDIT] Answer edited, as it seems compound classes are not supported by Selenium; example for Stack Overflow added.
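Before resorting to a browser, it may be worth pulling out the metadata each anchor already carries. The sketch below uses only the standard library to collect the id and the JSON in the data attribute of every a.download_ic element; the class and function names are hypothetical. How the site turns these IDs into a download URL is not known from the question, so this stops at extraction, and the actual request would have to be found in the site's JavaScript or the browser's network tab:

```python
import json
from html.parser import HTMLParser


class DownloadAnchorParser(HTMLParser):
    """Collect the id and JSON `data` attribute of every a.download_ic element."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "download_ic" not in classes:
            return
        record = {"id": attributes.get("id")}
        # The data attribute holds a JSON object with the IDs attached to this row.
        record.update(json.loads(attributes.get("data") or "{}"))
        self.records.append(record)


def extract_download_records(html: str) -> list:
    """Return one dict per download anchor found in the given HTML."""
    parser = DownloadAnchorParser()
    parser.feed(html)
    return parser.records
```

For the row shown in the question, this yields a single record with id, clazzId, libraryId and relationId, which are the values a backend download call would most plausibly need.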

Django FileResponse don't download HTML file

I'm trying to download a pre-generated HTML file, but nothing I've tried works.
Searching Stack Overflow, I found that return FileResponse(open(file_path, 'rb')) will download a file, but instead of being downloaded, the HTML is just rendered in the tab. I think the problem is that the browser receives the HTML and, instead of displaying the "Save as" dialog, just renders it in the current tab.
In my main template I have a form with the target="_blank" attribute, and a submit button that opens (without AJAX) a new tab that is supposed to download the file automatically.
What I want: after I submit the form, a new tab appears, the view for that URL runs some code (not related to the download), and after that process (which is working fine) an HTML file is downloaded to the device. The HTML file exists and has no problems; the only problem is that I want to DOWNLOAD the file but the browser displays it instead.
Note 1: Yes, I know that with right click -> download I can download the HTML that I see, but this system is for non-IT people, so I need to do it the easiest way possible.
Note 2: I mention "without AJAX" because I found in another post that FileResponse doesn't work from AJAX.
You should set a special header on your response:
Content-Disposition: attachment; filename="cool.html"
response = FileResponse(open(file_path, 'rb'))
response['Content-Disposition'] = 'attachment; filename="cool.html"'
return response

How to fix HTML downloading instead of image file

I'm trying to download a file from a link using urllib in Python 3.7 and it downloads the HTML file and not the Image File.
So I'm trying to receive information from a Google Form, the information is sent to a Google Sheet. I'm able to receive the information in the sheet no problem. However the Form requires an Image submission which appears in the sheet as a URL. (Example: https://drive.google.com/open?id=1YCBmEOz6_l7WDQw5t6AYBSb9B5XXKTuX)
This is my code:
import urllib.request
import random
Then I create a download function:
def downloader(image_url):
    file_name = random.randrange(1,10000)
    full_file_name = str(file_name) + '.png'
    print(full_file_name)
    urllib.request.urlretrieve(image_url,full_file_name)
I get the URL and isolate the ID of the image:
ImgId="https://drive.google.com/open?id=1Mp5XYoyyEfWJryz8ojLbHuZ6V0IzERIV"
ImgId=ImgId[33:]
Then I put the ID in a download link:
ImgId="https://drive.google.com/uc?authuser=0&id="+ImgId+"&export=download"
Which results in (in the above example) "https://drive.google.com/uc?authuser=0&id=1YCBmEOz6_l7WDQw5t6AYBSb9B5XXKTuX&export=download".
Next I run the download function:
downloader(ImgId)
So after this I expected the PNG file to be downloaded into the program's folder; however, it downloaded an HTML file of the Google Drive log-in page instead of an image file, or even an HTML file of the image. Note that viewing or downloading the image in the browser requires being signed in to Google, so could authorization be the issue?
(Note: If I manually paste the download link as generated by the program into my browser it downloads the image correctly)
(P.S I'm an absolute noob, so yeah)
(Thanks in advance for any answers)
Instead of using urllib for downloading, use requests: fetch the page contents with a GET request, parse the response content with BeautifulSoup, and locate the element you want to download. The download control in the HTML should have a download link associated with it, so send a second GET request to that link.
import requests
import bs4
response = requests.get(your_url)  # your_url: the URL of the page to fetch
soup = bs4.BeautifulSoup(response.content, 'html5lib')
# Get the download link and supply all the necessary values to the link
# Initiate Requests again
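For the URL handling in the question, a small sketch that extracts the file ID with urllib.parse instead of slicing at a fixed offset, rebuilds the same uc?...&export=download link, and adds a check for whether a response is an HTML page rather than an image. All three function names are hypothetical. If the file is not shared publicly, the download will probably still return the log-in page, in which case the request needs authorization rather than a different URL:

```python
from urllib.parse import parse_qs, urlencode, urlparse


def drive_file_id(share_url: str) -> str:
    """Extract the `id` query parameter from a Drive 'open?id=...' link."""
    ids = parse_qs(urlparse(share_url).query).get("id")
    if not ids:
        raise ValueError(f"no id parameter in {share_url!r}")
    return ids[0]


def drive_download_url(share_url: str) -> str:
    """Build the uc?...&export=download URL used in the question."""
    query = urlencode({"authuser": 0, "id": drive_file_id(share_url), "export": "download"})
    return f"https://drive.google.com/uc?{query}"


def is_html(content_type: str) -> bool:
    """True when a Content-Type header says the body is an HTML page, not an image."""
    return content_type.split(";")[0].strip().lower() == "text/html"
```

Checking is_html(response.headers.get("Content-Type", "")) before writing the .png file makes the log-in page case visible immediately instead of leaving a mislabeled HTML file on disk.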

Download file associated to export button in webpage from terminal

I would like to download the file that is produced by clicking on the "EXCEL Document" button in the bottom right of this page from the terminal.
Is it possible to do that from a Unix bash shell?
Doing it within R or with Python would also be OK.
http://www.vivc.de/index.php?r=eva-analysis-mikrosatelliten-vivc%2Fresultmsatvar&EvaAnalysisMikrosatellitenVivcSearch%5Bleitname_list%5D=&EvaAnalysisMikrosatellitenVivcSearch%5Bleitname_list%5D%5B%5D=ABADI&EvaAnalysisMikrosatellitenVivcSearch%5BName_in_bibliography%5D=
Thanks
The requests library may be what you're looking for. You'd need the URL to pass in to requests.get:
import requests
r = requests.get('http://google.com')
r.raise_for_status() # Will error out if there's an issue with the get request
print(r.content)
outputs
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world\'s information, including webpages, images, videos and more. Google has many special features to help you find exactly what you\'re looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><scr...
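Printing the content only displays the bytes; to keep the exported file, the body has to be written to disk in binary mode. A minimal sketch of that step, with the standard library's urllib.request swapped in for requests so it has no third-party dependency. The function name save_url_to_file is hypothetical, and the actual export URL behind the "EXCEL Document" button is not given here, so it would first have to be found in the browser's developer tools:

```python
import shutil
import urllib.request


def save_url_to_file(url: str, destination: str, timeout: float = 30.0) -> None:
    """Stream the response body to `destination` without decoding it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(destination, "wb") as out_file:  # binary mode keeps the bytes intact
            shutil.copyfileobj(response, out_file)
```

With the export URL in hand, save_url_to_file(export_url, "result.xlsx") can be run from a script invoked in the terminal, where export_url is a placeholder for that URL.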
