Get download URL from html source to download file from content-disposition - python

I'm trying to download a file with Python from a site. The issue is the download automatically starts after submitting the form on the page. Using Mechanize, I am able to log in, get to the page where the download lives, fill out the form, and submit the form (which kicks off the download of an xls file).
Looking at the Content-Disposition header, I can see the attachment name:
attachment {'filename': 'policytransactions.xls'}
but I cannot figure out how to download this file locally.
Looking at the page source, I can see that the answer to my question is somewhere in here:
<td><div id="form1:j_idt37" class="ui-datatable ui-widget dataTable"><table role="grid"><thead><tr role="row"><th id="form1:j_idt37:j_idt38" class="ui-state-default" role="columnheader"><div class="ui-dt-c"><span></span></div></th></tr></thead><tfoot></tfoot><tbody id="form1:j_idt37_data" class="ui-datatable-data ui-widget-content"><tr data-ri="0" class="ui-widget-content ui-datatable-even" role="row"><td role="gridcell"><div class="ui-dt-c">
<script type="text/javascript" src="/policy/app/javax.faces.resource/jsf.js?ln=javax.faces"></script>
<span class="outputText">XLS</span></div></td></tr></tbody></table></div><script id="form1:j_idt37_s" type="text/javascript">PrimeFaces.cw('DataTable','widget_form1_j_idt37',{id:'form1:j_idt37'});</script></td>
<td><table>
<tbody>
<tr>
<td><span id="form1:dateField3"><input id="form1:dateField3_input" name="form1:dateField3_input" type="text" value="04/01/2017" class="ui-inputfield ui-widget ui-state-default ui-corner-all" /></span><script id="form1:dateField3_s" type="text/javascript">$(function(){PrimeFaces.cw('Calendar','widget_form1_dateField3',{id:'form1:dateField3',popup:true,locale:'en_US',dateFormat:'mm/dd/yy',defaultDate:'04/01/2017'});});</script></td>
<td><span id="form1:dateField4"><input id="form1:dateField4_input" name="form1:dateField4_input" type="text" value="04/28/2017" class="ui-inputfield ui-widget ui-state-default ui-corner-all" /></span><script id="form1:dateField4_s" type="text/javascript">$(function(){PrimeFaces.cw('Calendar','widget_form1_dateField4',{id:'form1:dateField4',popup:true,locale:'en_US',dateFormat:'mm/dd/yy',defaultDate:'04/28/2017'});});</script></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<input type="hidden" name="javax.faces.ViewState" id="javax.faces.ViewState" value="e2s1" />
</form>
</div>
Any suggestions on how to grab this? Thanks
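One possible approach, sketched under assumptions: the response returned by submitting the form already carries the file bytes, so there may be no separate download URL to extract from the page. The helper names below (`filename_from_content_disposition`, `save_download`) are hypothetical, and the Mechanize calls mentioned in the comments are assumed to behave like a standard urllib-style response.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value, default="download.bin"):
    """Return the filename parameter of a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()
    # basename() guards against a server-supplied path such as "../x.xls".
    return os.path.basename(name) if name else default


def save_download(header_value, body, directory="."):
    """Write the response body (bytes) to a file named after the header."""
    path = os.path.join(directory, filename_from_content_disposition(header_value))
    with open(path, "wb") as fh:
        fh.write(body)
    return path


# Assumed usage with Mechanize (not executed here):
#   response = br.submit()
#   save_download(response.info()["Content-Disposition"], response.read())
```

The key point is that the body of the submit response is written in binary mode; the header is only used to pick a local filename.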

Related

Selenium can't find element by name or id (Python)

Consider:
from selenium import webdriver
import os
from selenium.webdriver import chrome
driver = webdriver.Chrome()
driver.get("http://nmcunited.me.showenter.com/%D7%9C%D7%94-%D7%9C%D7%94-%D7%9C%D7%A0%D7%93.html")
driver.implicitly_wait(15)
christmasTag = driver.find_element_by_id('f_2561406_1')
christmasTag.click()
driver.close()
I used the Python code above in order to practice on some basic Selenium operations.
I couldn't find any element, neither by name nor by id. No matter what I tried, I always got the same error, which stated that the element I'm looking for doesn't exist (the idea was to press one of the buttons, if it matters...).
This is a part of the HTML code of the website:
<!-- Main page -->
<td
valign="top">
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<!-- Page Content -->
<!-- Page Content -->
<tr>
<td valign="top">
<table border="0" width="100%" cellpadding="3" cellspacing="0">
<tr>
<!-- The form -->
<td valign="top">
<form id="form2" name="form2" method="post" action="/site/form/sendForm.asp" OnSubmit="return CheckSubmit();" enctype="multipart/form-data">
<!-- Form Field List -->
<table id='sop_form_594602' class='sop_all_form' align="center" border="0" width="100%" cellpadding="3" cellspacing="0"
>
<tr>
<td style="color:;" class="changeText14">
<span style="font-weight:bold;">באיזה חג מפוטר ראין גוסלינג?</span><br>
<input type="radio" name="f_2561406" id="f_2561406_0" value="1. חג הפסחא" class="rachbox" checked>
<label for="f_2561406_0">1. חג הפסחא</label>
<br>
<input type="radio" name="f_2561406" id="f_2561406_1" value="2. חג המולד" class="rachbox">
<label for="f_2561406_1">2. חג המולד</label>
<br>
<input type="radio" name="f_2561406" id="f_2561406_2" value="3. סליבסטר" class="rachbox">
<label for="f_2561406_2">3. סליבסטר</label>
<br>
<input type="radio" name="f_2561406" id="f_2561406_3" value="4. חג ראש השנה" class="rachbox">
<label for="f_2561406_3">4. חג ראש השנה</label>
<br>
</td>
</tr>
This is the said website (broken link)
By the way, this is the first program that uses Selenium I'm trying to run. Is there a possibility that I need to change some settings or that I need to install something else?
The items you are trying to find are inside of an iframe. You need to switch the context of the webdriver to the frame first.
from selenium import webdriver
import os
from selenium.webdriver import chrome
driver = webdriver.Chrome()
driver.get("http://nmcunited.me.showenter.com/%D7%9C%D7%94-%D7%9C%D7%94-%D7%9C%D7%A0%D7%93.html")
driver.implicitly_wait(15)
frame = driver.find_element_by_css_selector('div.tool_forms iframe')
driver.switch_to.frame(frame)
christmasTag = driver.find_element_by_id('f_2561406_1')
christmasTag.click()
driver.close()
Refresh the page and check whether the "id" changes, which would mean it is generated dynamically. If that is the case, use the XPath expression below to find the element by its value instead:
"//input[@value='2. חג המולד']"

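XPath attribute tests are written with `@`. As a minimal sketch, the same expression can be tried outside the browser with the standard library's ElementTree, which supports a limited XPath subset. The markup here is a simplified, well-formed copy of two radio buttons from the question, not the real page.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of two radio buttons from the question.
MARKUP = """
<form id="form2">
  <input type="radio" name="f_2561406" id="f_2561406_0" value="1. חג הפסחא" />
  <input type="radio" name="f_2561406" id="f_2561406_1" value="2. חג המולד" />
</form>
"""

root = ET.fromstring(MARKUP)
# "@value" selects on the attribute named "value".
match = root.find(".//input[@value='2. חג המולד']")
print(match.get("id"))  # prints f_2561406_1
```

In Selenium the expression would be passed to an XPath locator after switching into the iframe, as in the answer above.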
python-selenium returning element is not interactable error

I am using selenium-python binding. I am getting the following error while trying to select and manipulate an element. (using Chromedriver)
Message: invalid element state: Element is not currently interactable and may not be manipulated
I think the element is selected successfully with any of the locators shown below, but I cannot manipulate it with, for example, clear() or send_keys("some value"). I would like to fill in the text field, but I cannot make it work. If you have run into a similar problem, please share your thoughts. Thank you.
UPDATE: I noticed that the HTML changes to style="display: none" as I type manually, which might be the reason for this error. I have modified the code below. Can you point out a solution?
driver.find_element(by='xpath', value="//table[@class='input table']//input[@id='gwt-debug-url-suggest-box']")
or
driver.find_element(by='xpath', value="//input[@id='gwt-debug-url-suggest-box']")
or
driver.find_element_by_id("gwt-uid-47")
or
driver.find_element(by='xpath', value="//div[contains(@class, 'sppb-b')][normalize-space()='www.example.com/page']")
Here is the html source code:
<div>
<div class="spH-c" id="gwt-uid-64"> Your landing page </div>
<div class="spH-f">
<table class="input-table" width="100%">
<tbody>
<tr>
<td class="spA-e">
<div class="sppb-a" id="gwt-uid-47">
<div class="sppb-b spA-b" aria-hidden="true" style="display: none;">www.example.com/page</div>
<input type="text" class="spC-a sppb-c" id="gwt-debug-url-suggest-box" aria-labelledby="gwt-uid-64 gwt-uid-47" dir="">
</div>
<div class="error" style="display:none" id="gwt-debug-invalid-url-error-message" role="alert"> Please enter a valid URL. </div>
</td>
<td class="spB-b">
<div class="spB-a" aria-hidden="true" style="display: none;"></div>
</td>
</tr>
</tbody>
</table>
</div>
</div>
Have you tried selecting by:
element = driver.find_element_by_id("gwt-debug-url-suggest-box")
element.send_keys("Your input")
This way you are selecting the input directly.
In any case, a link to the page would help.

Python: Log in to a website using Requests module

I am trying to use the Requests module to log in to a site and get the HTML of the landing page. I am new to this and can't find a decent tutorial for it.
Here's the information that I have about that page
HTML of the form for login (url:http://14.139.251.99:8080/jopacv06/html/checkouts)
<FORM NAME="form" METHOD="POST" ACTION="./memberlogin" onsubmit="this.onsubmit= function(){return false;}">
<table class='loginTbl' border='1' align="center" cellspacing='3' cellpadding='3' width='60%'>
<input type="hidden" name="hdnrequesttype" value="1" />
<thead>
<tr>
<td colspan='3' align="middle" class='loginHead'>Login</td>
</tr>
</thead>
<tbody class='loginBody'>
<tr>
<td class='loginBodyTd1' nowrap="nowrap">Employee ID</td>
<td class='loginBodyTd2'><input type='text' name='txtmemberid' id='txtmemberid' value='' class='loginTextBox' size='30' maxlength='8'/></td>
<td class='loginBodyTd3' rowspan='2'><input type="submit" class="goclearbutton" value=" Go "></td>
</tr><input type='hidden' name='txtmemberpwd' id='txtmemberpwd' value='' />
</tbody>
<tfoot>
<tr>
<td colspan='3' class='loginFoot'>
<font class='loginRed'>New Visitor?</font>
Send your registration request to library !
</td>
</tr>
</tfoot>
</table>
</form>
I have learned that I may need to set a cookie; the cookie name on the landing page is JSESSIONID (in case that's required). I also found that once I log in successfully, I will have to use BeautifulSoup to get the details. Please help me combine these pieces.
You will have to do something like this:
import requests
response = requests.post("http://14.139.251.99:8080/jopacv06/html/checkouts/memberlogin", data={'txtmemberid': '1'})
if response.status_code == 200:
    html_code = response.text
    # Do whatever you want to do further with this HTML now.
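A slightly fuller sketch, under assumptions: the form also carries hidden fields (`hdnrequesttype`, `txtmemberpwd`), and its `action` is relative, so it may help to read both from the page instead of hard-coding them. The class and function names below are hypothetical and use only the standard library; the resulting URL and payload could then be posted with a `requests.Session()` so that the JSESSIONID cookie is kept between requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldParser(HTMLParser):
    """Collect the first form's action and every named <input> value."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_request(page_url, page_html, overrides):
    """Return (post_url, payload) for the login form found in page_html."""
    parser = FormFieldParser()
    parser.feed(page_html)
    payload = dict(parser.fields)
    payload.update(overrides)  # e.g. the employee ID typed by the user
    return urljoin(page_url, parser.action or ""), payload


# Assumed usage with Requests (not executed here):
#   session = requests.Session()
#   page = session.get(page_url)
#   url, payload = build_login_request(page_url, page.text, {"txtmemberid": "1"})
#   landing = session.post(url, data=payload)
```

Note that under standard URL resolution `./memberlogin` relative to a page URL ending in `/html/checkouts` resolves to `/html/memberlogin`; it resolves to `/html/checkouts/memberlogin` only if the page is served with a trailing slash, so the target is worth checking in the browser's network tab.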

python + flask passing data from html back to python script (specifically the value of a label)

I am using Python and Flask to make a web app. I am new to it, but have gotten most of what I am trying to accomplish done. Where I am stuck is that I have a label whose value is a Python variable ({{id}}). This id is the id of a row I need to update in a SQLite database. My code is below. When I click the Approve button, it takes me to a route that runs the update query, but I have no way to pass the {{id}} with it. This would have been much easier if I could have used JavaScript for the update query, but everything I've found using JavaScript is for Web SQL, not SQLite, even though some of it claims to be for SQLite.
</script>
<table border='1' align="center">
{% for post in posts %}
<tr>
<td>
<label>{{post.id}}</label>
<h1 id ='grill1'>{{post.photoName}}</h1>
<span>
<img id = 'photo1' src='{{post.photo}}' alt="Smiley face" height="200" width="200">
</span><br>
<h5 id ='blurb1'>
{{post.blurb}}
</h5>
<br>
<div style=" padding:10px; text-align:center;">
<input type="button" value="Approve" name="approve" onclick="window.location='/approve';">
<input type="button" value="Deny" onclick="window.location='/deny';"><br>
</div>
</td>
</tr>
{% endfor %}
</table>
Why not just do:
...
<input type="button" value="Approve" name="approve" onclick="window.location='/approve/{{post.id}}';">
<input type="button" value="Deny" onclick="window.location='/deny/{{post.id}}';">
...
Then your flask route for approve and / or deny can just take a parameter for the post to approve or deny. i.e.:
@app.route("/approve/<int:post_id>")
def approve(post_id):
    """approve this post!"""
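What the route body might do with `post_id` can be sketched with the standard `sqlite3` module. The table name `posts`, the column `status`, and the helper `set_post_status` are assumptions for illustration, since the question does not show the schema; the Flask wiring appears only in comments.

```python
import sqlite3


def set_post_status(conn, post_id, status):
    """Update one row by id and return the number of rows changed."""
    # Parameter placeholders keep the id out of the SQL string itself.
    cur = conn.execute(
        "UPDATE posts SET status = ? WHERE id = ?", (status, post_id)
    )
    conn.commit()
    return cur.rowcount


# Inside the Flask view this might look like (not executed here;
# get_db() is a hypothetical helper returning a sqlite3 connection):
#   @app.route("/approve/<int:post_id>")
#   def approve(post_id):
#       set_post_status(get_db(), post_id, "approved")
#       return redirect("/")

# Small in-memory demonstration with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO posts (id, status) VALUES (1, 'pending')")
set_post_status(conn, 1, "approved")
```

Since approving changes data, a POST request from a small form would usually be a safer choice than a plain link, though the GET-style URL above matches the answer as written.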

Passing html tables to a new page in Google App Engine

I tried to convert an App Engine generated output page into a PDF and ran into some problems.
First: I select the contents with jQuery.
Second: I send this JavaScript variable to a new Python script.
Third: In the new Python script, I use xhtml2pdf to do the conversion.
However, I got stuck on the second step. Below is my approach:
HTML:
<div class="articles">
<h2 class="model_header">PFAM Output</h2>
<form>
<table align="center">
<!--end 04uberoutput_start-->
<table class="out_chemical" width="550" border="1">
<tr>
<th scope="col" colspan="5">
<div align="center">Chemical Inputs</div>
</th>
</tr>
<tr>
<th scope="col" width="250">
<div align="center">Variable</div>
</th>
<th scope="col" width="150">
<div align="center">Unit</div>
</th>
<th scope="col" width="150">
<div align="center">Value</div>
</th>
</tr>
<tr>
<td>
<div align="center">Water Column Half life @20 &#8451</div>
</td>
<td>
<div align="center">days</div>
</td>
<td>
<div align="center">11</div>
</td>
</tr>
</table>
</table>
</form>
</div>
JS
$(document).ready(function () {
var jq_html = $("div.articles").html();
console.log(jq_html);
$('.getpdf').append('<tr style="display:none"><td><input name="extract" value="' + jq_html + '"></input></td></tr>');
$('.getpdf').append('<tr><td><input type="submit" value="Generate PDF"/></td></tr>');
})
New Python script to do the conversion:
def post(self):
    form = cgi.FieldStorage()
    extract = form.getvalue('extract')
    print extract
    self.response.out.write(html)
When I tried to check if variable extract is transferred correctly, I got an empty page. It seems like this variable is ignored... The whole framework seems fine if I feed extract with a number. So could anyone help me to identify if my approach is correct? Thanks!
This line of code does not handle escaping HTML correctly. Additionally, it is a text field rather than a hidden field:
$('.getpdf').append('<tr style="display:none"><td><input name="extract" value="' + jq_html + '"></input></td></tr>');
A better way to do it would be like this:
$('<tr style="display:none"><td><input type="hidden" name="extract"></td></tr>')
.appendTo('.getpdf')
.find('input')
.val(jq_html);
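The escaping problem is not specific to jQuery. As a rough illustration in Python (an assumption-laden sketch, not part of the original answer), the standard library's `html.parser` can stand in for the browser to show that a value containing double quotes is cut short when concatenated into an attribute, and survives when escaped first. The names `InputValueParser` and `value_seen_by_parser` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class InputValueParser(HTMLParser):
    """Record the value attribute of the first <input> tag encountered."""

    def __init__(self):
        super().__init__()
        self.value = None
        self.seen_input = False

    def handle_starttag(self, tag, attrs):
        if tag == "input" and not self.seen_input:
            self.seen_input = True
            self.value = dict(attrs).get("value")


def value_seen_by_parser(markup):
    parser = InputValueParser()
    parser.feed(markup)
    return parser.value


fragment = '<h2 class="model_header">PFAM Output</h2>'

# String concatenation: the first '"' inside the fragment ends the attribute.
raw = '<input name="extract" value="' + fragment + '">'
# Escaped: quotes and angle brackets become character references.
safe = '<input name="extract" value="' + escape(fragment, quote=True) + '">'

print(value_seen_by_parser(raw) == fragment)
print(value_seen_by_parser(safe) == fragment)
```

Setting the value through `.val()`, as in the answer, avoids the issue because the string is assigned as a property and never parsed as markup.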
