How to make a dataframe download through browser using python

How to make a dataframe download through browser using python - python

I have a function, which generates a dataframe, which I am exporting as an excel sheet, at the end of the function.
df.to_excel('response.xlsx')
This excel file is being saved in my working directory.
Now I'm hosting this in Streamlit on heroku as a web app, but I want this excel file to be downloaded in user's local disk (a normal browser download) once this function is called. Is there a way to do it ?

Snehan Kekre, from streamlit, wrote the following solution in this thread.
streamlit as st
import pandas as pd
import io
import base64
import os
import json
import pickle
import uuid
import re
def download_button(object_to_download, download_filename, button_text, pickle_it=False):
"""
Generates a link to download the given object_to_download.
Params:
------
object_to_download: The object to be downloaded.
download_filename (str): filename and extension of file. e.g. mydata.csv,
some_txt_output.txt download_link_text (str): Text to display for download
link.
button_text (str): Text to display on download button (e.g. 'click here to download file')
pickle_it (bool): If True, pickle file.
Returns:
-------
(str): the anchor tag to download object_to_download
Examples:
--------
download_link(your_df, 'YOUR_DF.csv', 'Click to download data!')
download_link(your_str, 'YOUR_STRING.txt', 'Click to download text!')
"""
if pickle_it:
try:
object_to_download = pickle.dumps(object_to_download)
except pickle.PicklingError as e:
st.write(e)
return None
else:
if isinstance(object_to_download, bytes):
pass
elif isinstance(object_to_download, pd.DataFrame):
#object_to_download = object_to_download.to_csv(index=False)
towrite = io.BytesIO()
object_to_download = object_to_download.to_excel(towrite, encoding='utf-8', index=False, header=True)
towrite.seek(0)
# Try JSON encode for everything else
else:
object_to_download = json.dumps(object_to_download)
try:
# some strings <-> bytes conversions necessary here
b64 = base64.b64encode(object_to_download.encode()).decode()
except AttributeError as e:
b64 = base64.b64encode(towrite.read()).decode()
button_uuid = str(uuid.uuid4()).replace('-', '')
button_id = re.sub('\d+', '', button_uuid)
custom_css = f"""
<style>
#{button_id} {{
display: inline-flex;
align-items: center;
justify-content: center;
background-color: rgb(255, 255, 255);
color: rgb(38, 39, 48);
padding: .25rem .75rem;
position: relative;
text-decoration: none;
border-radius: 4px;
border-width: 1px;
border-style: solid;
border-color: rgb(230, 234, 241);
border-image: initial;
}}
#{button_id}:hover {{
border-color: rgb(246, 51, 102);
color: rgb(246, 51, 102);
}}
#{button_id}:active {{
box-shadow: none;
background-color: rgb(246, 51, 102);
color: white;
}}
</style> """
dl_link = custom_css + f'<a download="{download_filename}" id="{button_id}" href="data:application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;base64,{b64}">{button_text}</a><br></br>'
return dl_link
vals= ['A','B','C']
df= pd.DataFrame(vals, columns=["Title"])
filename = 'my-dataframe.xlsx'
download_button_str = download_button(df, filename, f'Click here to download {filename}', pickle_it=False)
st.markdown(download_button_str, unsafe_allow_html=True)
I'd recommend searching the thread on that discussion forum. There seem to be at least 3-4 alternatives to this code.

Mark Madson has this workaround posted on github. I lifted it from the repo and am pasting here as an answer.
import base64
import pandas as pd
import streamlit as st
def st_csv_download_button(df):
csv = df.to_csv(index=False) #if no filename is given, a string is returned
b64 = base64.b64encode(csv.encode()).decode()
href = f'Download CSV File'
st.markdown(href, unsafe_allow_html=True)
usage:
st_csv_download_button(my_data_frame)
right click + save-as.
I think you can do the same by doing to_excel instead of to_csv.

Related

How to send prettytable through email using python

I have created a prettytable in python and I have to send the output of prettytable through email
env = "Dev"
cost = 25.3698
line = [env, "${:,.2f}".format(cost)]
totalcostofenv = PrettyTable(['Environment', 'Cost'])
totalcostofenv.add_row(line)
print(totalcostofenv)
Below attached is the output :
Table Output
Can anyone help me to solve this?
This was my question asked and I found an solution , Below displayed is my code:
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import os
from prettytable import PrettyTable
env = "Dev"
cost = 25.3698
line = [env, "${:,.2f}".format(cost)]
totalcostofenv = PrettyTable(['Environment', 'Cost'])
totalcostofenv.add_row(line)
print(totalcostofenv)
print(totalcostofenv.get_html_string())
def trigger_email():
my_message = totalcostofenv.get_html_string()
text = "Hi!"
html = """\
<html>
<head>
<style>
table, th, td {
border: 1px solid black;
border-collapse: collapse;
}
th, td {
padding: 5px;
text-align: left;
}
</style>
</head>
<body>
<p>Cost Usage of Plantd Environemnts<br>
%s
</p>
</body>
</html>
""" % (my_message)
part1 = MIMEText(text, 'plain')
part2 = MIMEText(html, 'html')
msg = MIMEMultipart()
from_addr = "from-address"
mail_password = os.environ.get('gmail-pass')
to_addr = "to-address"
msg.attach(part1)
msg.attach(part2)
try:
smtp = smtplib.SMTP('smtp.gmail.com',587)
smtp.starttls()
smtp.login(from_addr , mail_password)
smtp.sendmail(from_addr , to_addr , msg.as_string())
print('Mail sent')
except:
print('Mail not sent')
trigger_email()

You can use MJML templating like this
<mjml>
<mj-head>
<mj-title>Set the title, usually for accessibility tools</mj-title>
<mj-preview>Set inbox preview text here, otherwise it might be something nonsensical</mj-preview>
<mj-attributes>
<mj-all font-family="Helvetica, Arial, sans-serif"></mj-all>
<mj-text font-weight="400" font-size="16px" color="#4A4A4A" line-height="24px" />
<mj-section padding="0px"></mj-section>
</mj-attributes>
</mj-head>
<mj-body>
{{table}}
</mj-body>
</mjml>
Code:
import pystache
# read in the email template, remember to use the compiled HTML version!
email_template = (Path() / 'email_template.html').read_text()
# Logic
env = "Dev"
cost = 25.3698
line = [env, "${:,.2f}".format(cost)]
totalcostofenv = PrettyTable(['Environment', 'Cost'])
totalcostofenv.add_row(line)
# Pass in values for the template using a dictionary
template_params = {'table': totalcostofenv }
# Attach the message to the Multipart Email
final_email_html = pystache.render(email_template, template_params)
message.attach(MIMEText(final_email_html), 'html')
"""Continue with sending..."""

Pycharm python AttributeError: module 'urllib3' has no attribute 'Request'

When I run script I get this error:
AttributeError: module 'urllib3' has no attribute 'Request'
How to install/import urllib properly to run this script?
Here's some code:
import urllib3
import csv
import re
from bs4 import BeautifulSoup
rank_page = 'https://socialblade.com/youtube/top/50/mostviewed'
request = urllib3.Request(rank_page, headers={'User-Agent': 'your user-agent'})
page = urllib3.urlopen(request)
soup = BeautifulSoup(page, 'html.parser')
channels = soup.find('div', attrs={'style': 'float: right; width: 900px;'}).find_all('div', recursive=False)[4:]
file = open('topyoutubers.csv', 'wb')
writer = csv.writer(file)
# write title row
writer.writerow(['Username', 'Uploads', 'Views'])
for channel in channels:
username = channel.find('div', attrs={'style': 'float: left; width: 350px; line-height: 25px;'}).a.text.strip()
uploads = channel.find('div', attrs={'style': 'float: left; width: 80px;'}).span.text.strip()
views = channel.find_all('div', attrs={'style': 'float: left; width: 150px;'})[1].span.text.strip()
print
username + ' ' + uploads + ' ' + views
writer.writerow([username.encode('utf-8'), uploads.encode('utf-8'), views.encode('utf-8')])
file.close()

It seem attribute 'Request' only work in Python2. Just use:
import urllib.request
urllib.request.urlopen(url, headers)

How do you pull text in dd attribute with no name using selenium and python?

I have a webpage with a dd value. Here is the code:
<dd itemprop="youSave"
class="saleprice pricingSummary-priceList-value"
style="width: 50%; height: 20px; text-align: left; padding-left: 8px; line-height: 16px;">
$467.99</dd>
How do I pull the price ($467.99) using selenium and python and also remove the dollar sign?
Here is my current code which pulls the price_saved but can't get attribute:
try:
while True:
price_saved = driver.find_elements_by_xpath("//dl[#class='dl']//dd")
print("ok")
saved_print = price_saved.get_attribute("*Don't know what to put here*")
print("k")
print(saved_print)
i = i + 1
except:
print("failed")

Here are some possibilities depending on overall html
#class selector
driver.find_element_by_css_selector('.pricingSummary-priceList-value').text.replace('$','')
#attribute selector
driver.find_element_by_css_selector('[itemprop=youSave]').text.replace('$','')
#combined
driver.find_element_by_css_selector('[itemprop=youSave].pricingSummary-priceList-value').text.replace('$','')
#combined with type selector
driver.find_element_by_css_selector('dd[itemprop=youSave].pricingSummary-priceList-value').text.replace('$','')
Example with list comprehension if multiple matches wanted (find_elements returns a list):
[i.text.replace('$','') for i in driver.find_elements_by_css_selector('[itemprop=youSave]')]
class is faster than type which is faster than attribute. Shorter combinations can be faster than long dependant to some extent on rules given before and ordering but that is beyond scope of here.
I have used a single class of the multi-valued class. You can add in other class
driver.find_element_by_css_selector('.saleprice.pricingSummary-priceList-value').text.replace('$','')

Error with encode / decode in html generated for outlook

I have 2 functions, the first prepares the html and writes to a .txt file. The second function opens this file and manages an email through Outlook.
In the body of the message, the content of this html will be placed with the correct formatting. Everything goes perfectly, the .txt comes with the html without any errors, but when Outlook is opening it, it is closed and the Error / Exception is generated below:
'ascii' codec can't encode character u'\xe7' in position 529: ordinal not in range(128)
I know that this \xe7 is ç, but I can not solve it, I've already tried to define it by .decode("utf-8") and .encode("utf-8"), in the variable email_html_leitura, but the error of codec persists. Follow the code of the 2 functions to see if I did something wrong:
Function 1:
import sys
import codecs
import os.path
def gerar_html_do_email(self):
texto_solic = u'Solicitação Grupo '
with codecs.open('html.txt', 'w+', encoding='utf8') as email_html:
try:
for k, v in self.dicionario.iteritems():
email_html.write('<h1>'+k+'</h1>'+'\n')
for v1 in v:
if (v1 in u'Consulte o documento de orientação.') or (v1 in u'Confira o documento de orientação.'):
for x, z in self.tit_nome_pdf.iteritems():
if x in k:
email_html.write('<a href='+'%s/%s'%(self.hiperlink,z+'>')+'Mais detalhes'+'</a>'+'\n')
else:
email_html.write('<p>'+v1+'</p>'+'\n')
email_html.write('<p><i>'+texto_solic+'</i></p>'+'\n')
email_html.close()
except Exception as erro:
self.log.write('gerar_html_para_o_email: \n%s\n'%erro)
Function 2:
def gerar_email(self):
import win32com.client as com
try:
outlook = com.Dispatch("Outlook.Application")
mail = outlook.CreateItem(0)
mail.To = u"Lista Liberação de Versões Sistema"
mail.CC = u"Lista GCO"
mail.Subject = u"Atualização Semanal Sistema Acrool"
with codecs.open('html.txt', 'r+', encoding='utf8') as email_html_leitura:
mail.HTMLBody = """
<html>
<head></head>
<body>
<style type=text/css>
h1{
text-align: center;
font-family: "Arial";
font-size: 1.1em;
font-weight: bold;
}
p{
text-align: justify;
font-family: "Arial";
font-size: 1.1em;
}
a{
font-family: "Arial";
font-size: 1.1em;
}
</style>
%s
</body>
</html>
"""%(email_html_leitura.read().decode("utf-8"))
email_html_leitura.close()
mail.BodyFormat = 2
mail.Display(True)
except Exception as erro:
self.log.write('gerar_email: \n%s\n'%erro)

Extract URL from webpage and save to disk

I am trying to write a script to automaotmcally query sci-hub.io with an article's title and save a PDF copy of the articles full text to my computer with a specific file name.
To do this I have written the following code:
url = "http://sci-hub.io/"
data = read_csv("C:\\Users\\Sangeeta's\\Downloads\\distillersr_export (1).csv")
for index, row in data.iterrows():
try:
print('http://sci-hub.io/' + str(row['DOI']))
res = requests.get('http://sci-hub.io/' + str(row['DOI']))
print(res.content)
except:
print('NO DOI: ' + str(row['ref']))
This opens a CSV file with a list of DOI's and names of the file to be saved. For each DOI, it then queries sci-hub.io for the full-text. The presented page embeds the PDF in however I am now unsure how to extract the URL for the PDF and save it to disk.
An example of the page can be seen in the image below:
In this image, the desired URL is http://dacemirror.sci-hub.io/journal-article/3a257a9ec768d1c3d80c066186aba421/pajno2010.pdf.
How can I automatically extract this URL and then save the PDF file to disk?
When I print res.content, I get this:
b'<!DOCTYPE html>\n<html>\n <head>\n <title></title>\n <meta charset="UTF-8">\n <meta name="viewport" content="width=device-width">\n </head>\n <body>\n <style type = "text/css">\n body {background-color:#F0F0F0}\n div {overflow: hidden; position: absolute;}\n #top {top:0;left:0;width:100%;height:50px;font-size:14px} /* 40px */\n #content {top:50px;left:0;bottom:0;width:100%}\n p {margin:0;padding:10px}\n a {font-size:12px;font-family:sans-serif}\n a.target {font-weight:normal;color:green;margin-left:10px}\n a.reopen {font-weight:normal;color:blue;text-decoration:none;margin-left:10px}\n iframe {width:100%;height:100%}\n \n p.agitation {padding-top:5px;font-size:20px;text-align:center}\n p.agitation a {font-size:20px;text-decoration:none;color:green}\n\n .banner {position:absolute;z-index:9999;top:400px;left:0px;width:300px;height:225px;\n border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px}\n .banner img {border:0}\n \n p.donate {padding:0;margin:0;padding-top:5px;text-align:center;background:green;height:40px}\n p.donate a {color:white;font-weight:bold;text-decoration:none;font-size:20px}\n\n #save {position:absolute;z-index:9999;top:180px;left:8px;width:210px;height:36px;\n border-radius: 4px; border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px;background:#F0F0F0;color:#333}\n\n #save a {text-decoration:none;color:white;font-size:inherit;color:#666}\n\n #save p { margin: 0; padding: 0; margin-top: 8px}\n\n #reload {position:absolute;z-index:9999;top:240px;left:8px;width:210px;height:36px;\n border-radius: 4px; border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px;background:#F0F0F0;color:#333}\n\n #reload a {text-decoration:none;color:white;font-size:inherit;color:#666}\n\n #reload p { margin: 0; padding: 0; margin-top: 8px}\n\n\n #saveastro {position:absolute;z-index:9999;top:360px;left:8px;width:230px;height:70px;\n border-radius: 4px; border: solid 1px #ccc; background: white; text-align:center}\n #saveastro p { margin: 0; padding: 0; margin-top: 16px}\n \n \n #donate {position:absolute;z-index:9999;top:170px;right:16px;width:220px;height:36px;\n border-radius: 4px; border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px;background:white;color:#333}\n \n #donate a {text-decoration:none;color:green;font-size:inherit}\n\n #donatein {position:absolute;z-index:9999;top:220px;right:16px;width:220px;height:36px;\n border-radius: 4px; border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px;background:green;color:#333}\n\n #donatein a {text-decoration:none;color:white;font-size:inherit}\n \n #banner {position:absolute;z-index:9999;top:50%;left:45px;width:250px;height:250px; padding: 0; border: solid 1px white; border-radius: 4px}\n \n </style>\n \n \n \n <script type = "text/javascript">\n window.onload = function() {\n var url = document.getElementById(\'url\');\n if (url.innerHTML.length > 77)\n url.innerHTML = url.innerHTML.substring(0,77) + \'...\';\n };\n </script>\n <div id = "top">\n \n <p class="agitation" style = "padding-top:12px">\n \xd0\xa1\xd1\x82\xd1\x80\xd0\xb0\xd0\xbd\xd0\xb8\xd1\x87\xd0\xba\xd0\xb0 \xd0\xbf\xd1\x80\xd0\xbe\xd0\xb5\xd0\xba\xd1\x82\xd0\xb0 Sci-Hub \xd0\xb2 \xd1\x81\xd0\xbe\xd1\x86\xd0\xb8\xd0\xb0\xd0\xbb\xd1\x8c\xd0\xbd\xd1\x8b\xd1\x85 \xd1\x81\xd0\xb5\xd1\x82\xd1\x8f\xd1\x85 \xe2\x86\x92 <a target="_blank" href="https://vk.com/sci_hub">vk.com/sci_hub</a>\n </p>\n \n </div>\n \n <div id = "content">\n <iframe src = "http://moscow.sci-hub.io/202d9ebdfbb8c0c56964a31b2fdfe8e9/roerdink2016.pdf" id = "pdf"></iframe>\n </div>\n \n <div id = "donate">\n <p><a target = "_blank" href = "//sci-hub.io/donate">\xd0\xbf\xd0\xbe\xd0\xb4\xd0\xb4\xd0\xb5\xd1\x80\xd0\xb6\xd0\xb0\xd1\x82\xd1\x8c \xd0\xbf\xd1\x80\xd0\xbe\xd0\xb5\xd0\xba\xd1\x82 →</a></p>\n </div>\n <div id = "donatein">\n <p><a target = "_blank" href = "//sci-hub.io/donate">support the project →</a></p>\n </div>\n <div id = "save">\n <p>\xe2\x87\xa3 \xd1\x81\xd0\xbe\xd1\x85\xd1\x80\xd0\xb0\xd0\xbd\xd0\xb8\xd1\x82\xd1\x8c \xd1\x81\xd1\x82\xd0\xb0\xd1\x82\xd1\x8c\xd1\x8e</p>\n </div>\n <div id = "reload">\n <p>↻ \xd1\x81\xd0\xba\xd0\xb0\xd1\x87\xd0\xb0\xd1\x82\xd1\x8c \xd0\xb7\xd0\xb0\xd0\xbd\xd0\xbe\xd0\xb2\xd0\xbe</p>\n </div>\n \n \n<!-- Yandex.Metrika counter --> <script type="text/javascript"> (function (d, w, c) { (w[c] = w[c] || []).push(function() { try { w.yaCounter10183018 = new Ya.Metrika({ id:10183018, clickmap:true, trackLinks:true, accurateTrackBounce:true, ut:"noindex" }); } catch(e) { } }); var n = d.getElementsByTagName("script")[0], s = d.createElement("script"), f = function () { n.parentNode.insertBefore(s, n); }; s.type = "text/javascript"; s.async = true; s.src = "https://mc.yandex.ru/metrika/watch.js"; if (w.opera == "[object Opera]") { d.addEventListener("DOMContentLoaded", f, false); } else { f(); } })(document, window, "yandex_metrika_callbacks"); </script> <noscript><div><img src="https://mc.yandex.ru/watch/10183018?ut=noindex" style="position:absolute; left:-9999px;" alt="" /></div></noscript> <!-- /Yandex.Metrika counter -->\n </body>\n</html>\n'
Which does include the URL, however I am unsure how to extract it.
Update:
I am now able to extract the URL but when I try to access the page with the PDF (through urllib.request) I get a 403 response even though the URL is valid. Any ideas on why and how to fix? (I am able to access through my browser so not IP blocked)

You can use urllib library to access the html of the page and even download files, and regex to find the url of the file you want to download.
import urllib
import re
site = urllib.urlopen(".../index.html")
data = site.read() # turns the contents of the site into a string
files = re.findall('(http|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?(.pdf)', data) # finds the url
for file in files:
urllib.urlretrieve(file, filepath) # "filepath" is where you want to save it

Here is the Solution:-
url = re.search('<iframe src = "\s*([^"]+)"', res.content)
url.group(1)
urllib.urlretrieve(url.group(1),'C:/.../Docs/test.pdf')
I ran it and it is working :)
For Python 3:
Change urrlib.urlretrive to urllib.request.urlretrieve

You can do it with a clunky code requiring selenium, requests and scrapy.
Use selenium to request either an article title or DOI.
>>> from selenium import webdriver
>>> driver.get("http://sci-hub.io/")
>>> input_box = driver.find_element_by_name('request')
>>> input_box.send_keys('amazing scientific results\n')
An article by the title 'amazing scientific results' doesn't seem to exist. As a result, the site returns a diagnostic page in the browser window which we can ignore. It also puts 'http://sci-hub.io/' in webdriver's current_url property. This is helpful because it's an indication that the requested result isn't available.
>>> driver.current_url
'http://sci-hub.io/'
Let's try again, looking for the item that you know exists.
>>> driver.get("http://sci-hub.io/")
>>> input_box = driver.find_element_by_name('request')
>>> input_box.send_keys('DOI: 10.1016/j.anai.2016.01.022\n')
>>> driver.current_url
'http://sci-hub.io/10.1016/j.anai.2016.01.022'
This time the site returns a distinctive url. Unfortunately, if we load this using selenium we will get the pdf and, unless you're more able than I am, you will find it difficult to download this to a file on your machine.
Instead, I download it using the requests library. Loaded in this form you will find that the url of the pdf becomes visible in the HTML.
>>> import requests
>>> r = requests.get(driver.current_url)
To ferret out the url I use scrapy.
>>> from scrapy.selector import Selector
>>> selector = Selector(text=r.text)
>>> pdf_url = selector.xpath('.//iframe/#src')[0].extract()
Finally I use requests again to download the pdf so that I can save it to a conveniently named file on local storage.
>>> r = requests.get(pdf_url).content
>>> open('article_name', 'wb').write(r)
211853

I solved this using a combination of the answers above - namely SBO7 & Roxerg.
I use the following to extract the URL from the page and then download the PDF:
res = requests.get('http://sci-hub.io/' + str(row['DOI']))
useful = BeautifulSoup(res.content, "html5lib").find_all("iframe")
urls = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', str(useful[0]))
response = requests.get(urls[0])
with open("C:\\Users\\Sangeeta's\\Downloads\\ref\\" + str(row['ref']) + '.pdf', 'wb') as fw:
fw.write(response.content)
Note: This will not work for all articles - some link to webpages (example) and this doesn't correctly work for those.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to make a dataframe download through browser using python - python

Related

How to send prettytable through email using python

Pycharm python AttributeError: module 'urllib3' has no attribute 'Request'

How do you pull text in dd attribute with no name using selenium and python?

Error with encode / decode in html generated for outlook

Extract URL from webpage and save to disk

Categories

Resources