Python sorting html table

I am looping through a list of servers and connecting to each with OpenSSL to retrieve its SSL certificate. For each one I record the server name and the certificate's expiration date, and calculate the number of days until it expires. I then build an HTML table from the data, with the columns Host, Hostname, Expiration Date, and Remaining Days. What is the best way to sort the table by the "Remaining Days" column?
# Update the hosts entry
ssl_results[str(ip)][0] = host
ssl_results[str(ip)][1] = server_name
ssl_results[str(ip)][2] = exp_date
ssl_results[str(ip)][3] = days_to_expire
# Loop through the ssl_results entries and generate an email + results file
try:
    # variable to hold html for email
    SSLCertificates = """<html>
<head>
<style>
table{width: 1024px;}
table, th, td {
border: 1px solid black;
border-collapse: collapse;
}
th, td {
padding: 5px;
text-align: left;
}
ul:before{
content:attr(data-header);
font-size:120%;
font-weight:bold;
margin-left:-15px;
}
</style>
</head>
<body>
<p><h2>Blah, </h2>
<h3>SSL Expiration Summary:</h3>
<span style="color:red;"><b>Blah Blah Blah.</b></span><br><br>
<table id=\"exp_ssls\"><tr><th>Host</th><th>Hostname</th><th>Expiration Date</th><th>Remaining Days</th></tr>
"""
    for entries in ssl_results:
        SSLCertificates += "<tr><td>" + str(entries) + "</td><td>" + str(ssl_results[entries][1]) + "</td><td>" + str(
            ssl_results[entries][2]) + "</td><td>" + str(ssl_results[entries][3]) + "</td></tr>"
    # close the table before closing the body
    SSLCertificates += """</table>
</body>
</html>"""
    f = open('SSLCertificates.html', 'w')
    f.write(SSLCertificates)
    f.close()
    filename = 'SSLCertificates.html'
    attachment = open(filename, 'rb')

Sort the dict before you build the HTML, then iterate through the sorted result and emit the rows. Use sorted() on the dict's items before you loop over them:
import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(1))
sorted_x will be a list of tuples sorted by the second element in each tuple. dict(sorted_x) == x.
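Applied to the table in the question, the same idea might look like the sketch below. The sample data is hypothetical; it only mirrors the shape of ssl_results from the question (IP as the key, days_to_expire at index 3). The int() call is an assumption that guards against the day count being stored as a string.

```python
# Hypothetical sample data in the same shape as ssl_results in the question:
# key = IP string, value = [host, server_name, exp_date, days_to_expire]
ssl_results = {
    "10.0.0.1": ["10.0.0.1", "a.example.com", "2030-01-01", 90],
    "10.0.0.2": ["10.0.0.2", "b.example.com", "2029-06-01", 12],
    "10.0.0.3": ["10.0.0.3", "c.example.com", "2029-09-01", 45],
}

# Sort the (key, value) pairs by the fourth field, smallest day count first
sorted_results = sorted(ssl_results.items(), key=lambda item: int(item[1][3]))

# Build the table rows from the sorted list instead of from the dict itself
rows = ""
for ip, fields in sorted_results:
    rows += "<tr><td>{}</td><td>{}</td><td>{}</td><td>{}</td></tr>\n".format(
        ip, fields[1], fields[2], fields[3])
print(rows)
```

Certificates closest to expiry come first; passing reverse=True to sorted() would flip the order.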

Related

Scrape "Button" tag with Selenium

import requests
from selenium import webdriver
import bs4
PATH = 'C:\Program Files (x86)\chromedriver.exe'
driver = webdriver.Chrome(PATH)
oLat = 33.8026087
oLong = -84.3369491999999
dLat = 33.79149
dLong = -84.32312
url = "https://ride.lyft.com/ridetype?origin=" + str(oLat) + "%2C" + str(oLong) + "&destination=" + str(dLat) + "%2C" + str(dLong) + "&ride_type=&offerProductId=standard"
driver.get(url)
content = driver.page_source
soup = bs4.BeautifulSoup(content)
print(soup)
print(url)
Here is my current code. I am trying to scrape the Lyft price estimate.
The data is in a "button" tag, but that tag does not appear in the HTML that the code above prints. How can I get this data to show up?
import requests
from selenium import webdriver
import bs4
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
PATH = 'C:\Program Files (x86)\chromedriver.exe'
driver = webdriver.Chrome(PATH)
oLat = 33.7885662
oLong = -84.326684
dLat = 33.4486296
dLong = -84.4550443
url = "https://ride.lyft.com/ridetype?origin=" + str(oLat) + "%2C" + str(oLong) + "&destination=" + str(dLat) + "%2C" + str(dLong) + "&ride_type=&offerProductId=standard"
driver.get(url)
spanThing = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR , "span.sc-7e9e68d9-0 lctkqn")))
print(spanThing)
driver.quit()
I also tried the additional code above, but it doesn't find the span with that class, and I'm not sure why.

To get the page source, first use WebDriverWait with visibility_of_element_located() to wait until a static element on the page is visible. For example, with an XPath locator:
oLat = 33.8026087
oLong = -84.3369491999999
dLat = 33.79149
dLong = -84.32312
url = "https://ride.lyft.com/ridetype?origin=" + str(oLat) + "%2C" + str(oLong) + "&destination=" + str(dLat) + "%2C" + str(dLong) + "&ride_type=&offerProductId=standard"
driver.get(url)
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., 'Sign up / Log in to request ride')]")))
print(driver.page_source)
driver.quit()
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
<html lang="en-US" class="js-focus-visible" data-js-focus-visible=""><head><meta name="viewport" content="width=device-width"><script type="module">
if (window.performance) {
const toSnake = (str) => str.replace(/([A-Z])/g, function($1) {return '_' + $1.toLowerCase();});
const measure = () => {
const { timing } = window.performance;
if (!timing.navigationStart) return;
const al = [
'event_name','sending_service','connection_end','connection_start','dom_complete',
'dom_content_loaded_event_end','dom_content_loaded_event_start','dom_interactive',
'dom_loading','domain_lookup_end','domain_lookup_start','fetch_start','load_event_end',
'load_event_start','navigation_start','redirect_end','redirect_start','request_start',
'response_end','response_start','secure_connection_start','unload_event_end',
'unload_event_start','connect_start','connect_end','ms_first_paint','source','uri_path',
'request_end','code','track_id','uri_href'
];
const { href = '', pathname = '' } = window.location;
const sE = { event_name: 'navigation_timing_absolute', uri_href: href, uri_path: pathname, sending_service: 'riderweb', source: 'riderweb' };
for (let eN in timing) {
const sEN = toSnake(eN);
if (al.includes(sEN)) { sE[sEN] = timing[eN]; }
}
// iOS 11 supports ES modules, but sendBeacon not available until 11.3.
if (navigator.sendBeacon) {
navigator.sendBeacon('https://www.lyft.com/api/track', JSON.stringify(sE));
}
};
try {
if (document.readyState === 'complete') {
measure();
} else {
window.addEventListener('load', measure);
}
} catch(e) {}
}
</script><script>
var _i18n_extends = Object.assign || function (target) { for (var i = 1; i < arguments.length; i++) { var source = arguments[i]; for (var key in source) { if (Object.prototype.hasOwnProperty.call(source, key)) { target[key] = source[key]; } } } return target; };
;if(!window.__TRANSLATIONS__) window.__TRANSLATIONS__ = {};
window.__TRANSLATIONS__.locale = "en-US";
window.__TRANSLATIONS__.bundleName = "common";
if (!window.__TRANSLATIONS__.data) window.__TRANSLATIONS__.data = {};
_i18n_extends(window.__TRANSLATIONS__.data, {"%;":{"s":"OK"},"#":{"s":"Sorry, we can't find that page"},"$":{"s":"Sorry, there was an error"},"%":{"s":"Back"},"A":{"s":"No tip"},"T":{"s":"Lyft: Request a ride on the web"},"p":{"s":"Current location"},"q":{"s":"You set your pickup as \"Your Location\"{originatingAppMsg}"},"r":{"s":" in Google Maps"},"s":{"s":"To use the same pickup location, Lyft needs access to your current location."},"t":{"s":"Share your location"},"u":{"s":"Location sharing is denied"},"w":{"s":"Submit"},"x":{"s":"Save"},"y":{"s":"Confirm"},"z":{"s":"Unknown error"},"{":{"s":"Close"},"|":{"s":"Cancel"},"}":{"s":"Edit"},"~":{"s":"Delete"},"! ":{"s":"Done"},"!!":{"s":"Log out"},"!#":{"s":"Are you sure you want to log out?"},"!%":{"s":"Payment defaults"},"!&":{"s":"Add a payment method to get started."},"!(":{"s":"Add new card"},"!)":{"s":"Could not update payment method"},"!*":{"s":"Payment"},"!+":{"s":"manage your payment methods"},"!,":{"s":"Payment method"},"!-":{"s":"Card failed!"},"!.":{"s":"Payment method not supported on ride.lyft.com."},"!\u002F":{"s":"Payment method updated across Lyft apps."},"!0":{"s":"You cannot delete your only valid payment method."},"!1":{"s":"Gift cards"},"!2":{"s":"redeem gift cards"},"!3":{"s":"This field is required"},"!4":{"s":"Something went wrong. Please try again."},"!5":{"s":"Click to log out or switch accounts"},"!6":{"s":"Go back"},"!Z":{"s":"Schedule"},"!k":{"s":"schedule a ride"},"(6":{"s":"Ride"},"(7":{"s":"Rent"},"(8":{"s":"Rent a car through Lyft or our partner Sixt"},"(9":{"s":"Help"},"(:":{"s":"Business"},"(;":{"s":"Upcoming rides"},"(\u003C":{"s":"Install on Phone"},"(=":{"s":"Sign up \u002F Log in"},"(\u003E":{"s":"Log in"},"(l":{"s":"Install app"},"(m":{"s":"Free"},")z":{"s":"Not now"},"){":{"s":"Get the Lyft app"},")|":{"s":"More travel options from the palm of your hand"},")}":{"s":"From bikes to rentals and everything in between. 
If it gets you there, it's in the app."},"*\u003E":{"s":"Install on Desktop"},"*?":{"s":"Install on Desktop. It's free and takes up no space on your device"},"*C":{"s":"Text me a link"},"*D":{"s":"We'll send you a text with a link to download the app."},"*E":{"s":"Enter mobile phone number"},"*F":{"s":"Phone invalid"},"*G":{"s":"Refresh"},"*H":{"s":"An update is available"},",+":{"s":"View profile"},",,":{"s":"Get a ride"},",-":{"s":"Rides"},",.":{"s":"Gift cards"},",\u002F":{"s":"Promos"},",0":{"s":"Donate"},",1":{"s":"Invite friends"},",2":{"s":"Help"},",3":{"s":"Settings"},",4":{"s":"Safety Tools"},",5":{"s":"Lyft Rentals"},")d":{"s":"Log in \u002F Sign up"},")e":{"s":"You will need to log in to {action}!"},")f":{"s":"Log in"},")g":{"s":"Cancel"},"a":{"s":"Lyft and OpenStreetMap watermark"},"#L":{"s":"add promotions"},"&^":{"s":"Just now"},"&`.zero":{"s":"{minutes} minutes ago"},"&_.one":{"s":"{minutes} minute ago"},"&`.two":{"s":"{minutes} minutes ago"},"&`.few":{"s":"{minutes} minutes ago"},"&`.many":{"s":"{minutes} minutes ago"},"&`.other":{"s":"{minutes} minutes ago"},"&b.zero":{"s":"{hours} hours ago"},"&a.one":{"s":"{hours} hour ago"},"&b.two":{"s":"{hours} hours ago"},"&b.few":{"s":"{hours} hours ago"},"&b.many":{"s":"{hours} hours ago"},"&b.other":{"s":"{hours} hours ago"},"&d.zero":{"s":"{days} days ago"},"&c.one":{"s":"{days} day ago"},"&d.two":{"s":"{days} days ago"},"&d.few":{"s":"{days} days ago"},"&d.many":{"s":"{days} days ago"},"&d.other":{"s":"{days} days ago"},"&e":{"s":"Less than a minute"},"&g.zero":{"s":"{minutes} Total minutes"},"&f.one":{"s":"{minutes} Total minute"},"&g.two":{"s":"{minutes} Total minutes"},"&g.few":{"s":"{minutes} Total minutes"},"&g.many":{"s":"{minutes} Total minutes"},"&g.other":{"s":"{minutes} Total minutes"},"&i.zero":{"s":"{hours} Total hours"},"&h.one":{"s":"{hours} Total hour"},"&i.two":{"s":"{hours} Total hours"},"&i.few":{"s":"{hours} Total hours"},"&i.many":{"s":"{hours} Total hours"},"&i.other":{"s":"{hours} 
Total hours"},"&k.zero":{"s":"{days} Total days"},"&j.one":{"s":"{days} Total day"},"&k.two":{"s":"{days} Total days"},"&k.few":{"s":"{days} Total days"},"&k.many":{"s":"{days} Total days"},"&k.other":{"s":"{days} Total days"},"(a":{"s":"Any fare exceeding your Lyft Cash balance will be charged to your default payment method."},"(|":{"s":"Total"},"(}":{"s":"You'll pay this price unless you add a stop, change your destination, or if credit expires."},"(~":{"s":"This is an estimated range for your trip."},") ":{"s":"\u003CLink\u003ELog in\u003C\u002FLink\u003E or sign up to lock in your price and request a ride."},")?":{"s":"Driver Name:"},")#":{"s":"Driver's car image"},")A":{"s":"License Plate Number:"},")B":{"s":"Pick up"},")C":{"s":"Picked up"},")D":{"s":"Drop-off"},")E":{"s":"Dropped off"},")F":{"s":"Current location"},")c":{"s":"Close banner"},")y":{"s":"Riders"},"*I":{"s":"Add card"},"*J":{"s":"Edit {cardLabel}"},"+2":{"s":"$10"},"+3":{"s":"$8"},"+4":{"s":"$10"},"+5":{"s":"Unlimited 180-min classic rides for 24 hours"},"+6":{"s":"$15"},"+7":{"s":"Unlimited 30-min classic rides for 24 hours"},"+8":{"s":"Your payment info will be stored securely."},",#":{"s":"Please follow \u003CSupportLink\u003Ethese instructions\u003C\u002FSupportLink\u003E to allow this site to show notifications."},",$":{"s":"Notifications are blocked"},",\u003C":{"s":"Session expired"},",=":{"s":"You have been logged out. Please log back in to continue."},"!u":{"s":"Click to edit your pickup location"},"%\u002F":{"s":"You must \u003CLink\u003Elog in\u003C\u002FLink\u003E to {action}."},"&J":{"s":"Something went wrong. Unable to load your referral history. 
Please try again."},"7f523512b795a02fd9b9b05a1e22ff9b":{"s":"Card number"},"3effb3a930ea2ce61705bffc624e19b6":{"s":"Expiration"},"755c8f863223ae3f7ac0ac1cfe8b3072":{"s":"Name on card"},"22b715147b81b76566fa183406659069":{"s":"Country"},"4b3d5e03b24b6bbc630d15ad2251755f":{"s":"Billing address"},"e0a8872668d31bb76156a8d80a5d7a6c":{"s":"City"},"f420cf2cf310bbff1ead064745e66ec1":{"s":"State"},"8e9d206ff46216065a42a3953a63bd9f":{"s":"Province \u002F Territory"},"9dca7ddd59d7aca64aae58c7a99e16ce":{"s":"State \u002F Province"},"50be4be10369e747d757e7b2db2c9ed3":{"s":"Zip code"},"11ceb56a912fd18cc9ea1054c5405c13":{"s":"Postal code"},"5a0a89ab4fd1ceebfd9f68b88d27e685":{"s":"Save"},"45c9b92858c6ce6b50c1967661063ae8":{"s":"Cancel"},"29fc403cabcebe790ddd09c592f7e7cd":{"s":"There was a problem reading your card details. Please try again."},"1ae24aeff3771f629b2f865074b68050":{"s":"You must be logged in to add a payment method."},"275c89584bcddfbf0019d8d5a2ce6128":{"s":"You must be logged in to edit a payment method."},"2a420e791e0ec6d47cb64d5fab8376a9":{"s":"Please fill out all required fields"},"a966a08942254351695c6993e781301e":{"s":"Something went wrong. Please check your information and try again"}});
</script><meta charset="utf-8"><meta content="IE=Edge" http-equiv="X-UA-Compatible"><meta name="google" content="notranslate"><meta http-equiv="Accept-CH" content="DPR, Viewport-Width, Width, Downlink, Save-Data, Content-DPR"><link rel="home" href="https://ride.lyft.com"><link rel="canonical" href="https://ride.lyft.com"><link rel="icon" href="https://cdn.lyft.com/static/www-meta-assets/favicon.ico"><link rel="shortcut icon" sizes="192x192" href="https://cdn.lyft.com/static/riderweb/images/icons/icon-192x192.png"><link rel="apple-touch-startup-image" href="https://cdn.lyft.com/static/riderweb/images/icons/icon-192x192.png"><link rel="apple-touch-icon" href="https://cdn.lyft.com/static/riderweb/images/icons/icon-192x192.png"><meta name="apple-mobile-web-app-capable" content="yes"><meta name="apple-mobile-web-app-status-bar-style" content="black-translucent"><meta property="og:title" content="Lyft: Request a ride on the web"><meta property="og:url" content="https://ride.lyft.com"><meta name="twitter:card" content="summary_large_image"><meta name="twitter:site" content="#lyft"><meta name="msapplication-starturl" content="https://ride.lyft.com"><link rel="stylesheet" href="https://cdn.lyft.com/coreui/base.4.6.5.css"><meta name="google-site-verification" content="V9fk-oLTj9Ewu7Kc6Vetf94qp8HZ3gfjxFMkn8LmZ3Y"><link rel="manifest" href="/manifest.json" crossorigin="use-credentials"><meta name="theme-color" content="#FFFFFF"><meta name="description" content="Request a Lyft ride in a web browser on your phone, tablet, or laptop – no app download required. Get a ride from a friendly driver in minutes."><meta property="og:description" content="Request a Lyft ride in a web browser on your phone, tablet, or laptop – no app download required. Get a ride from a friendly driver in minutes."><meta property="og:image" content="/images/share.png">
.
,
<next-route-announcer><p aria-live="assertive" id="__next-route-announcer__" role="alert" style="border: 0px; clip: rect(0px, 0px, 0px, 0px); height: 1px; margin: -1px; overflow: hidden; padding: 0px; position: absolute; width: 1px; white-space: nowrap; overflow-wrap: normal;"></p></next-route-announcer><iframe name="__privateStripeMetricsController9540" frameborder="0" allowtransparency="true" scrolling="no" role="presentation" allow="payment *" src="https://js.stripe.com/v3/m-outer-93afeeb17bc37e711759584dbfc50d47.html#url=https%3A%2F%2Fride.lyft.com%2Fridetype%3Forigin%3D33.8026087%252C-84.3369491999999%26destination%3D33.79149%252C-84.32312%26ride_type%3D%26offerProductId%3Dstandard&title=Lyft%3A%20Price%20estimate&referrer=&muid=NA&sid=NA&version=6&preview=false" aria-hidden="true" tabindex="-1" style="border: none !important; margin: 0px !important; padding: 0px !important; width: 1px !important; min-width: 100% !important; overflow: hidden !important; display: block !important; visibility: hidden !important; position: fixed !important; height: 1px !important; pointer-events: none !important; user-select: none !important;"></iframe></body></html>
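Returning to the selector in the question: one likely reason it matched nothing is that a space in a CSS selector is a descendant combinator, so "span.sc-7e9e68d9-0 lctkqn" looks for an element named lctkqn inside the span. To match one element that carries both classes, the class names are chained with dots. The helper below is a hypothetical illustration of that rule, not part of Selenium; generated class names like these may also change between site builds, so treat the resulting selector as fragile.

```python
def css_for_classes(tag, class_attr):
    """Build a selector matching `tag` elements that carry every class in `class_attr`.

    `class_attr` is the raw value of the HTML class attribute, e.g. "a b".
    (Hypothetical helper for illustration only.)
    """
    classes = class_attr.split()
    return tag + "".join("." + name for name in classes)

selector = css_for_classes("span", "sc-7e9e68d9-0 lctkqn")
print(selector)  # span.sc-7e9e68d9-0.lctkqn
```

The resulting string can be passed as the By.CSS_SELECTOR locator in place of the one used in the question.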

How to send prettytable through email using python

I have created a PrettyTable in Python and need to send its output by email.
env = "Dev"
cost = 25.3698
line = [env, "${:,.2f}".format(cost)]
totalcostofenv = PrettyTable(['Environment', 'Cost'])
totalcostofenv.add_row(line)
print(totalcostofenv)
The output was attached as an image ("Table Output").
Can anyone help me solve this?
I asked this question and later found a solution. Here is my code:
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import os
from prettytable import PrettyTable
env = "Dev"
cost = 25.3698
line = [env, "${:,.2f}".format(cost)]
totalcostofenv = PrettyTable(['Environment', 'Cost'])
totalcostofenv.add_row(line)
print(totalcostofenv)
print(totalcostofenv.get_html_string())
def trigger_email():
    # Render the table as an HTML <table> so it survives in an HTML email
    my_message = totalcostofenv.get_html_string()
    text = "Hi!"
    html = """\
<html>
<head>
<style>
table, th, td {
border: 1px solid black;
border-collapse: collapse;
}
th, td {
padding: 5px;
text-align: left;
}
</style>
</head>
<body>
<p>Cost Usage of Plantd Environments<br>
%s
</p>
</body>
</html>
""" % (my_message)
    part1 = MIMEText(text, 'plain')
    part2 = MIMEText(html, 'html')
    msg = MIMEMultipart()
    from_addr = "from-address"
    mail_password = os.environ.get('gmail-pass')
    to_addr = "to-address"
    msg.attach(part1)
    msg.attach(part2)
    try:
        smtp = smtplib.SMTP('smtp.gmail.com',587)
        smtp.starttls()
        smtp.login(from_addr , mail_password)
        smtp.sendmail(from_addr , to_addr , msg.as_string())
        print('Mail sent')
    except Exception as exc:
        # Report the reason instead of silently swallowing every error
        print('Mail not sent:', exc)
trigger_email()
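A variation on the code above, as a minimal sketch using only the standard library: it builds a multipart/alternative message, so a mail client shows either the plain-text part or the HTML table rather than both. The function name, the subject line and the example addresses are assumptions made for illustration, and the table is built by hand here so the sketch does not depend on prettytable. Sending is left out; the resulting object could be passed to smtplib's send_message().

```python
from email.message import EmailMessage
from html import escape

def build_table_email(rows, sender, recipient):
    """Return an EmailMessage with a plain-text part and an HTML table part.

    `rows` is a list of (environment, cost) string pairs. (Hypothetical helper.)
    """
    body_rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(escape(env), escape(cost))
        for env, cost in rows
    )
    html = (
        "<html><body><table border='1'>"
        "<tr><th>Environment</th><th>Cost</th></tr>"
        + body_rows +
        "</table></body></html>"
    )
    msg = EmailMessage()
    msg["Subject"] = "Cost usage"  # assumed subject line
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text fallback first, then the HTML alternative
    msg.set_content("\n".join("{}: {}".format(env, cost) for env, cost in rows))
    msg.add_alternative(html, subtype="html")
    return msg

msg = build_table_email([("Dev", "${:,.2f}".format(25.3698))],
                        "from@example.com", "to@example.com")
print(msg.get_content_type())
```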

You can use an MJML template like this:
<mjml>
<mj-head>
<mj-title>Set the title, usually for accessibility tools</mj-title>
<mj-preview>Set inbox preview text here, otherwise it might be something nonsensical</mj-preview>
<mj-attributes>
<mj-all font-family="Helvetica, Arial, sans-serif"></mj-all>
<mj-text font-weight="400" font-size="16px" color="#4A4A4A" line-height="24px" />
<mj-section padding="0px"></mj-section>
</mj-attributes>
</mj-head>
<mj-body>
{{table}}
</mj-body>
</mjml>
Code:
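from email.mime.text import MIMEText
from pathlib import Path
import pystache
from prettytable import PrettyTable
# read in the email template, remember to use the compiled HTML version!
email_template = (Path() / 'email_template.html').read_text()
# Logic
env = "Dev"
cost = 25.3698
line = [env, "${:,.2f}".format(cost)]
totalcostofenv = PrettyTable(['Environment', 'Cost'])
totalcostofenv.add_row(line)
# Pass in values for the template using a dictionary.
# The table is rendered to HTML first; note that Mustache escapes {{table}},
# so the template needs {{{table}}} for the markup to be inserted unescaped.
template_params = {'table': totalcostofenv.get_html_string()}
# Attach the message to the Multipart Email
# (`message` is the MIMEMultipart object created elsewhere in your script)
final_email_html = pystache.render(email_template, template_params)
message.attach(MIMEText(final_email_html, 'html'))
"""Continue with sending..."""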

Extract URL from webpage and save to disk

I am trying to write a script that automatically queries sci-hub.io with an article's title and saves a PDF copy of the article's full text to my computer under a specific file name.
To do this I have written the following code:
import requests
from pandas import read_csv  # assumption: read_csv is pandas' (iterrows() is a DataFrame method)
url = "http://sci-hub.io/"
data = read_csv("C:\\Users\\Sangeeta's\\Downloads\\distillersr_export (1).csv")
for index, row in data.iterrows():
    try:
        print('http://sci-hub.io/' + str(row['DOI']))
        res = requests.get('http://sci-hub.io/' + str(row['DOI']))
        print(res.content)
    except:
        print('NO DOI: ' + str(row['ref']))
This opens a CSV file containing a list of DOIs and the names under which the files should be saved. For each DOI, it then queries sci-hub.io for the full text. The resulting page embeds the PDF in an iframe; however, I am unsure how to extract the URL of the PDF and save it to disk.
In one example page (shown as a screenshot in the original post), the desired URL is http://dacemirror.sci-hub.io/journal-article/3a257a9ec768d1c3d80c066186aba421/pajno2010.pdf.
How can I automatically extract this URL and then save the PDF file to disk?
When I print res.content, I get this:
b'<!DOCTYPE html>\n<html>\n <head>\n <title></title>\n <meta charset="UTF-8">\n <meta name="viewport" content="width=device-width">\n </head>\n <body>\n <style type = "text/css">\n body {background-color:#F0F0F0}\n div {overflow: hidden; position: absolute;}\n #top {top:0;left:0;width:100%;height:50px;font-size:14px} /* 40px */\n #content {top:50px;left:0;bottom:0;width:100%}\n p {margin:0;padding:10px}\n a {font-size:12px;font-family:sans-serif}\n a.target {font-weight:normal;color:green;margin-left:10px}\n a.reopen {font-weight:normal;color:blue;text-decoration:none;margin-left:10px}\n iframe {width:100%;height:100%}\n \n p.agitation {padding-top:5px;font-size:20px;text-align:center}\n p.agitation a {font-size:20px;text-decoration:none;color:green}\n\n .banner {position:absolute;z-index:9999;top:400px;left:0px;width:300px;height:225px;\n border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px}\n .banner img {border:0}\n \n p.donate {padding:0;margin:0;padding-top:5px;text-align:center;background:green;height:40px}\n p.donate a {color:white;font-weight:bold;text-decoration:none;font-size:20px}\n\n #save {position:absolute;z-index:9999;top:180px;left:8px;width:210px;height:36px;\n border-radius: 4px; border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px;background:#F0F0F0;color:#333}\n\n #save a {text-decoration:none;color:white;font-size:inherit;color:#666}\n\n #save p { margin: 0; padding: 0; margin-top: 8px}\n\n #reload {position:absolute;z-index:9999;top:240px;left:8px;width:210px;height:36px;\n border-radius: 4px; border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px;background:#F0F0F0;color:#333}\n\n #reload a {text-decoration:none;color:white;font-size:inherit;color:#666}\n\n #reload p { margin: 0; padding: 0; margin-top: 8px}\n\n\n #saveastro {position:absolute;z-index:9999;top:360px;left:8px;width:230px;height:70px;\n border-radius: 4px; border: solid 1px #ccc; background: white; text-align:center}\n 
#saveastro p { margin: 0; padding: 0; margin-top: 16px}\n \n \n #donate {position:absolute;z-index:9999;top:170px;right:16px;width:220px;height:36px;\n border-radius: 4px; border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px;background:white;color:#333}\n \n #donate a {text-decoration:none;color:green;font-size:inherit}\n\n #donatein {position:absolute;z-index:9999;top:220px;right:16px;width:220px;height:36px;\n border-radius: 4px; border: solid 1px #ccc; padding: 5px;\n text-align:center;font-size:18px;background:green;color:#333}\n\n #donatein a {text-decoration:none;color:white;font-size:inherit}\n \n #banner {position:absolute;z-index:9999;top:50%;left:45px;width:250px;height:250px; padding: 0; border: solid 1px white; border-radius: 4px}\n \n </style>\n \n \n \n <script type = "text/javascript">\n window.onload = function() {\n var url = document.getElementById(\'url\');\n if (url.innerHTML.length > 77)\n url.innerHTML = url.innerHTML.substring(0,77) + \'...\';\n };\n </script>\n <div id = "top">\n \n <p class="agitation" style = "padding-top:12px">\n \xd0\xa1\xd1\x82\xd1\x80\xd0\xb0\xd0\xbd\xd0\xb8\xd1\x87\xd0\xba\xd0\xb0 \xd0\xbf\xd1\x80\xd0\xbe\xd0\xb5\xd0\xba\xd1\x82\xd0\xb0 Sci-Hub \xd0\xb2 \xd1\x81\xd0\xbe\xd1\x86\xd0\xb8\xd0\xb0\xd0\xbb\xd1\x8c\xd0\xbd\xd1\x8b\xd1\x85 \xd1\x81\xd0\xb5\xd1\x82\xd1\x8f\xd1\x85 \xe2\x86\x92 <a target="_blank" href="https://vk.com/sci_hub">vk.com/sci_hub</a>\n </p>\n \n </div>\n \n <div id = "content">\n <iframe src = "http://moscow.sci-hub.io/202d9ebdfbb8c0c56964a31b2fdfe8e9/roerdink2016.pdf" id = "pdf"></iframe>\n </div>\n \n <div id = "donate">\n <p><a target = "_blank" href = "//sci-hub.io/donate">\xd0\xbf\xd0\xbe\xd0\xb4\xd0\xb4\xd0\xb5\xd1\x80\xd0\xb6\xd0\xb0\xd1\x82\xd1\x8c \xd0\xbf\xd1\x80\xd0\xbe\xd0\xb5\xd0\xba\xd1\x82 →</a></p>\n </div>\n <div id = "donatein">\n <p><a target = "_blank" href = "//sci-hub.io/donate">support the project →</a></p>\n </div>\n <div id = "save">\n <p>\xe2\x87\xa3 
\xd1\x81\xd0\xbe\xd1\x85\xd1\x80\xd0\xb0\xd0\xbd\xd0\xb8\xd1\x82\xd1\x8c \xd1\x81\xd1\x82\xd0\xb0\xd1\x82\xd1\x8c\xd1\x8e</p>\n </div>\n <div id = "reload">\n <p>↻ \xd1\x81\xd0\xba\xd0\xb0\xd1\x87\xd0\xb0\xd1\x82\xd1\x8c \xd0\xb7\xd0\xb0\xd0\xbd\xd0\xbe\xd0\xb2\xd0\xbe</p>\n </div>\n \n \n<!-- Yandex.Metrika counter --> <script type="text/javascript"> (function (d, w, c) { (w[c] = w[c] || []).push(function() { try { w.yaCounter10183018 = new Ya.Metrika({ id:10183018, clickmap:true, trackLinks:true, accurateTrackBounce:true, ut:"noindex" }); } catch(e) { } }); var n = d.getElementsByTagName("script")[0], s = d.createElement("script"), f = function () { n.parentNode.insertBefore(s, n); }; s.type = "text/javascript"; s.async = true; s.src = "https://mc.yandex.ru/metrika/watch.js"; if (w.opera == "[object Opera]") { d.addEventListener("DOMContentLoaded", f, false); } else { f(); } })(document, window, "yandex_metrika_callbacks"); </script> <noscript><div><img src="https://mc.yandex.ru/watch/10183018?ut=noindex" style="position:absolute; left:-9999px;" alt="" /></div></noscript> <!-- /Yandex.Metrika counter -->\n </body>\n</html>\n'
This does include the URL, but I am unsure how to extract it.
Update:
I am now able to extract the URL, but when I try to access the page with the PDF (through urllib.request) I get a 403 response even though the URL is valid. Any ideas on why this happens and how to fix it? (I can access the URL through my browser, so my IP is not blocked.)

You can use the urllib library to fetch the page's HTML and to download files, and a regular expression to find the URL of the file you want to download.
# Python 3: urlopen and urlretrieve live in urllib.request
import re
import urllib.request
site = urllib.request.urlopen(".../index.html")
data = site.read().decode("utf-8", errors="replace") # turns the contents of the site into a string
# Non-capturing groups make findall return whole URLs instead of tuples of groups
files = re.findall(r'https?://[\w-]+(?:\.[\w-]+)+(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?\.pdf', data) # finds the url
for file in files:
    urllib.request.urlretrieve(file, filepath) # "filepath" is where you want to save it

Here is the solution:
url = re.search('<iframe src = "\s*([^"]+)"', res.content)
url.group(1)
urllib.urlretrieve(url.group(1),'C:/.../Docs/test.pdf')
I ran it and it works.
For Python 3:
Change urllib.urlretrieve to urllib.request.urlretrieve, and search res.text instead of res.content, because a str pattern cannot be matched against bytes.
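As an alternative to a regular expression, the iframe's src can be read with an HTML parser, which is less sensitive to spacing around the attribute. The following is a minimal sketch using only the standard library; the class name IframeSrcParser is made up for illustration, and the page fragment is copied from the response shown in the question.

```python
from html.parser import HTMLParser

class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

# Fragment copied from the response shown in the question
page = ('<div id = "content">\n'
        '<iframe src = "http://moscow.sci-hub.io/202d9ebdfbb8c0c56964a31b2fdfe8e9/roerdink2016.pdf" id = "pdf"></iframe>\n'
        '</div>')

parser = IframeSrcParser()
parser.feed(page)
print(parser.sources[0])
```

With requests, the text to feed would be res.text rather than res.content.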

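You can do it with somewhat clunky code that requires selenium, requests and scrapy.
Use selenium to request either an article title or DOI.
>>> from selenium import webdriver
>>> driver = webdriver.Firefox()  # assumption: the original omits creating the driver; any WebDriver should do
>>> driver.get("http://sci-hub.io/")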
>>> input_box = driver.find_element_by_name('request')
>>> input_box.send_keys('amazing scientific results\n')
An article by the title 'amazing scientific results' doesn't seem to exist. As a result, the site returns a diagnostic page in the browser window which we can ignore. It also puts 'http://sci-hub.io/' in webdriver's current_url property. This is helpful because it's an indication that the requested result isn't available.
>>> driver.current_url
'http://sci-hub.io/'
Let's try again, looking for the item that you know exists.
>>> driver.get("http://sci-hub.io/")
>>> input_box = driver.find_element_by_name('request')
>>> input_box.send_keys('DOI: 10.1016/j.anai.2016.01.022\n')
>>> driver.current_url
'http://sci-hub.io/10.1016/j.anai.2016.01.022'
This time the site returns a distinctive url. Unfortunately, if we load this using selenium we will get the pdf and, unless you're more able than I am, you will find it difficult to download this to a file on your machine.
Instead, I download it using the requests library. Loaded in this form you will find that the url of the pdf becomes visible in the HTML.
>>> import requests
>>> r = requests.get(driver.current_url)
To ferret out the url I use scrapy.
>>> from scrapy.selector import Selector
>>> selector = Selector(text=r.text)
>>> pdf_url = selector.xpath('.//iframe/@src')[0].extract()
Finally I use requests again to download the pdf so that I can save it to a conveniently named file on local storage.
>>> r = requests.get(pdf_url).content
>>> open('article_name', 'wb').write(r)
211853

I solved this using a combination of the answers above, namely those from SBO7 and Roxerg.
I use the following to extract the URL from the page and then download the PDF:
res = requests.get('http://sci-hub.io/' + str(row['DOI']))
useful = BeautifulSoup(res.content, "html5lib").find_all("iframe")
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', str(useful[0]))
response = requests.get(urls[0])
with open("C:\\Users\\Sangeeta's\\Downloads\\ref\\" + str(row['ref']) + '.pdf', 'wb') as fw:
    fw.write(response.content)
Note: This will not work for all articles. Some DOIs lead to web pages rather than an embedded PDF, and this code does not handle those.
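On the 403 mentioned in the question's update: one common cause, though not confirmed for this site, is that a server rejects the default User-Agent sent by urllib.request while accepting a browser's. A minimal sketch of sending a browser-like header follows; the function names, the "Mozilla/5.0" value and the example URL are assumptions for illustration.

```python
import urllib.request

def build_pdf_request(pdf_url, user_agent="Mozilla/5.0"):
    """Return a Request carrying a browser-like User-Agent header."""
    return urllib.request.Request(pdf_url, headers={"User-Agent": user_agent})

def save_pdf(pdf_url, path):
    """Download pdf_url and write the bytes to path (performs network I/O)."""
    with urllib.request.urlopen(build_pdf_request(pdf_url)) as response:
        data = response.read()
    with open(path, "wb") as out:
        out.write(data)

# Building the request does not touch the network
req = build_pdf_request("http://example.com/paper.pdf")
print(req.get_header("User-agent"))
```

If the download still fails with this header, the server may be checking something else, such as a Referer header or cookies.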

Display JSON data from a python file into HTML

I currently have an HTML file and a Python file. The Python file uses Yelp's API and returns JSON data. How do I display that data on my webpage through HTML? Is there something like JavaScript's document.getElementById("id").innerHTML = JSONDATA?
Please let me know if you need any more details; this is my first time posting and my first time using an API or making a website. I understand the JSON data is not going to look nice, but I will put it into a dictionary and sort it later. Right now I am just wondering how to display data from a Python file in an HTML file. Also, feel free to link any helpful tutorials.
It was suggested that I use JavaScript instead, and I found the following Node.js code. Where in this would I put my tokens/secrets? And how would I then call it from my HTML file? Thank you.
/* require the modules needed */
var oauthSignature = require('oauth-signature');
var n = require('nonce')();
var request = require('request');
var qs = require('querystring');
var _ = require('lodash');
/* Function for yelp call
* ------------------------
* set_parameters: object with params to search
* callback: callback(error, response, body)
*/
var request_yelp = function(set_parameters, callback) {
/* The type of request */
var httpMethod = 'GET';
/* The url we are using for the request */
var url = 'http://api.yelp.com/v2/search';
/* We can setup default parameters here */
var default_parameters = {
location: 'San+Francisco',
sort: '2'
};
/* We set the required parameters here */
var required_parameters = {
oauth_consumer_key : process.env.oauth_consumer_key,
oauth_token : process.env.oauth_token,
oauth_nonce : n(),
oauth_timestamp : n().toString().substr(0,10),
oauth_signature_method : 'HMAC-SHA1',
oauth_version : '1.0'
};
/* We combine all the parameters in order of importance */
var parameters = _.assign(default_parameters, set_parameters, required_parameters);
/* We set our secrets here */
var consumerSecret = process.env.consumerSecret;
var tokenSecret = process.env.tokenSecret;
/* Then we call Yelp's Oauth 1.0a server, and it returns a signature */
/* Note: This signature is only good for 300 seconds after the oauth_timestamp */
var signature = oauthSignature.generate(httpMethod, url, parameters, consumerSecret, tokenSecret, { encodeSignature: false});
/* We add the signature to the list of parameters */
parameters.oauth_signature = signature;
/* Then we turn the parameters object into a query string */
var paramURL = qs.stringify(parameters);
/* Add the query string to the url */
var apiURL = url+'?'+paramURL;
/* Then we use request to make the API request */
request(apiURL, function(error, response, body){
return callback(error, response, body);
});
};

I had a similar situation: I had to show the IAM users of an AWS account in an HTML page. I used Python to list all IAM users (the code below shells out to the AWS CLI) and write them to a JSON file. The HTML file then reads that JSON file and shows all users in a table.
Here is the Python code IAM.PY:
import boto3
import subprocess
# Note: this client is created but not used below; the user list comes from the AWS CLI.
iam_client = boto3.client('iam')
def list_user_cli():
    # Run the AWS CLI and return its JSON output as a string
    list_cmd = "aws iam list-users"
    output = subprocess.check_output(list_cmd, shell=True)
    return output.decode('utf-8')
def write_json_file(filename, data):
    # Write the raw JSON text to disk
    try:
        with open(filename, "w") as f:
            f.write(data)
        print(filename + " has been created.")
    except Exception as e:
        print(str(e))
if __name__ == "__main__":
    filename = "iam.json"
    data = list_user_cli()
    write_json_file(filename, data)
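If you would rather not shell out to the AWS CLI, the same JSON file can probably be produced with boto3 directly. The following is a minimal sketch, assuming credentials are already configured; the helper names collect_users and write_users_json are made up for this example, and the output keeps the top-level "Users" key that the HTML page reads.

```python
import json


def collect_users(iam_client):
    """Return every IAM user, following pagination."""
    users = []
    paginator = iam_client.get_paginator("list_users")
    for page in paginator.paginate():
        users.extend(page["Users"])
    return users


def write_users_json(filename, users):
    # default=str turns the datetime stored in CreateDate into a string
    with open(filename, "w") as f:
        json.dump({"Users": users}, f, default=str, indent=2)


def main():
    import boto3  # imported here so the helpers above work without boto3 installed

    write_users_json("iam.json", collect_users(boto3.client("iam")))
```

Calling main() writes iam.json. Note that CreateDate is written using Python's default datetime string format, which may differ slightly from what the CLI prints.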
Here is the HTML file IAM.HTML:
<!DOCTYPE html>
<html>
<head>
<!-- Latest compiled and minified CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
<title>IAM User List</title>
<style type="text/css">
body{
margin: 20px;
}
</style>
</head>
<body>
<div class="container">
<table class="table table-responsive table-hover table-bordered">
<thead>
<tr>
<th>User ID</th>
<th>User Name</th>
<th>Path</th>
<th>Create Date</th>
<th>Arn</th>
</tr>
</thead>
<tbody id="iam_tbody">
</tbody>
</table>
</div>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.0/jquery.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
  $.ajax({
    method: "GET",
    url: "http://localhost/iam/iam.json",
  }).done(function(response){
    var user_list = response.Users;
    for(var i = 0; i<user_list.length; i++){
      var tr = "<tr>";
      tr += "<td>";
      tr += user_list[i]["UserId"];
      tr += "</td>";
      tr += "<td>";
      tr += user_list[i]["UserName"];
      tr += "</td>";
      tr += "<td>";
      tr += user_list[i]["Path"];
      tr += "</td>";
      tr += "<td>";
      tr += user_list[i]["CreateDate"];
      tr += "</td>";
      tr += "<td>";
      tr += user_list[i]["Arn"];
      tr += "</td>";
      tr += "</tr>";
      $("#iam_tbody").append(tr);
    }
  });
});
</script>
</body>
</html>

You can use jQuery Ajax to call your API; include jQuery in your HTML file.
$.ajax({
  method: "GET",
  url: "api_url",
}).done(function( response ) {
  $('#divId').append(response);
});
In your HTML file:
<div id="divId"></div>
See the jQuery Ajax documentation.

Nested for-loop iteration stops

I have two input files: an HTML file and a CSS file for it. I want to perform some operations on the HTML file based on the contents of the CSS file.
My HTML looks like this:
<html>
<head>
<title></title>
</head>
<body>
<p class = "cl1" id = "id1"> <span id = "span1"> blabla</span> </p>
<p class = "cl2" id = "id2"> <span id = "span2"> blablabla</span> <span id = "span3"> qwqwqw </span> </p>
</body>
</html>
Styles for the span ids are defined in the CSS file (individually for each span id!).
Before doing the real work (deleting spans based on their style), I was just trying to print the ids from the HTML together with the style description from the CSS that corresponds to each id.
Code:
from lxml import etree
tree = etree.parse("file.html")
filein = "file.css"
def f1():
    with open(filein, 'rU') as f:
        for span in tree.iterfind('//span'):
            for line in f:
                if span and span.attrib.has_key('id'):
                    x = span.get('id')
                    if "af" not in x and x in line:
                        print x, line
def main():
    f1()
So there are two for-loops that each iterate perfectly on their own, but when nested in this function the iteration stops after the first pass of the outer loop:
>> span1 span#span1 { font-weight: bold; font-size: 11.0pt; font-style: normal; letter-spacing: 0em }
How can I fix this?

If, as I think, tree is completely loaded in memory, you could try swapping the loops. That way, you only read through the file filein once:
def f1():
    with open(filein, 'rU') as f:
        for line in f:
            for span in tree.iterfind('//span'):
                if span and span.attrib.has_key('id'):
                    x = span.get('id')
                    if "af" not in x and x in line:
                        print x, line
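Along the same lines, the CSS lines can be read into a list once and then searched as often as needed. Below is a minimal Python 3 sketch of that idea; the function name match_ids_to_css and the sample data are made up for illustration, and in the real script span_ids would come from span.get('id') and css_lines from the opened file.

```python
def match_ids_to_css(span_ids, css_lines):
    """Map each span id to the CSS lines that mention it."""
    matches = {}
    for span_id in span_ids:
        if "af" in span_id:  # same filter as in the question's code
            continue
        matches[span_id] = [line for line in css_lines if span_id in line]
    return matches


# Hypothetical sample data standing in for file.css and the ids found in file.html
css_lines = [
    "span#span1 { font-weight: bold }",
    "span#span2 { font-style: italic }",
]
span_ids = ["span1", "span2", "span3"]

for span_id, lines in match_ids_to_css(span_ids, css_lines).items():
    for line in lines:
        print(span_id, line)
```

Because a list can be iterated any number of times, the order of the loops no longer matters. Keep in mind that the substring test is loose: an id such as span1 would also match a rule for span10.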

This happens because the inner loop has already read every line of filein by the time the outer loop starts its second iteration, so there is nothing left to iterate over.
To make it work, add f.seek(0) before starting the inner loop over filein:
with open(filein, 'rU') as f:
    for span in tree.iterfind('//span'):
        f.seek(0)
        for line in f:
            if span and span.attrib.has_key('id'):
                x = span.get('id')
                if "af" not in x and x in line:
                    print x, line
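The effect is easy to reproduce in isolation. The following Python 3 sketch uses an in-memory io.StringIO object in place of the real file (the CSS text is made up) to show that a second pass yields nothing until the position is reset with seek(0).

```python
import io

# In-memory stand-in for the CSS file
css = io.StringIO("#span1 { font-weight: bold }\n#span2 { font-style: italic }\n")

first_pass = [line for line in css]   # consumes every line
second_pass = [line for line in css]  # already at end of file: yields nothing

css.seek(0)                           # rewind to the start
third_pass = [line for line in css]   # all lines are available again

print(len(first_pass), len(second_pass), len(third_pass))
```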
