Is there a way to get a value using BeautifulSoup - python

I'm trying to add all the values from an HTML table to a list. I figured out how to do it here using
soup.find_all('a'), which gives me CSCI 101
<td><font size="-1" face="Verdana" color="#000080">CSCI 101</font></td>
Now I need to do the same thing here. I need to get the number 22481, but I couldn't find a way to do so.
<input type="hidden" name="sel_term" value="202120">
<input type="hidden" name="del_crn" value="00000">
<input type="hidden" name="save_crn" value="">
<td><input type="submit" name="sel_crn" value="22481" style="background-color:transparent;cursor:hand;border:none;color:#8A2BE2"></td>
Any ideas?

soup.find('input', {'type': 'submit'})['value']
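As a minimal sketch of how that lookup fits into a full script, assuming the page contains the markup from the question (the surrounding table tags and the variable names are illustrative, not from the original page): selecting by the name attribute instead of the type, and using find_all, collects every matching value into a list.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; the <table>/<tr> wrapper is an assumption.
html = """
<input type="hidden" name="sel_term" value="202120">
<input type="hidden" name="del_crn" value="00000">
<input type="hidden" name="save_crn" value="">
<table><tr>
<td><input type="submit" name="sel_crn" value="22481"></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Single value: the first submit button on the page.
first_crn = soup.find("input", {"type": "submit"})["value"]

# All values: every input named sel_crn, collected into a list.
crns = [tag["value"] for tag in soup.find_all("input", attrs={"name": "sel_crn"})]

print(first_crn)
print(crns)
```

Filtering on name="sel_crn" is likely to be more robust than type="submit" if the page has other submit buttons.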

Related

How to parse a html string using python scrapy

I have a list of html input elements as below.
lists=[<input type="hidden" name="csrf_token" value="jZdkrMumEBeXQlUTbOWfInDwNhtVHGSxKyPvaipoAFsYqCgRLJzc">,
<input type="text" class="form-control" id="username" name="username">,
<input type="password" class="form-control" id="password" name="password">,
<input type="submit" value="Login" class="btn btn-primary">]
From these I need to extract the values of the name, type, and value attributes.
For example:
Consider the input <input type="hidden" name="csrf_token" value="jZdkrMumEBeXQlUTbOWfInDwNhtVHGSxKyPvaipoAFsYqCgRLJzc">
then I need the output in the following dictionary format:
{'csrf_token':('hidden',"jZdkrMumEBeXQlUTbOWfInDwNhtVHGSxKyPvaipoAFsYqCgRLJzc")}
Could anyone please give me some guidance on how to solve this?
I recommend using the Beautiful Soup Python library (https://pypi.org/project/beautifulsoup4/) to parse the HTML and read the attribute values of the elements. It already provides functions for that purpose.
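A rough sketch of that suggestion, assuming the inputs are available as an HTML string (the variable names are illustrative). It builds a dictionary keyed by the name attribute with a (type, value) tuple, and skips inputs that have no name, such as the submit button:

```python
from bs4 import BeautifulSoup

html = """
<input type="hidden" name="csrf_token" value="jZdkrMumEBeXQlUTbOWfInDwNhtVHGSxKyPvaipoAFsYqCgRLJzc">
<input type="text" class="form-control" id="username" name="username">
<input type="password" class="form-control" id="password" name="password">
<input type="submit" value="Login" class="btn btn-primary">
"""

soup = BeautifulSoup(html, "html.parser")

fields = {}
for tag in soup.find_all("input"):
    name = tag.get("name")      # .get() returns None when the attribute is missing
    if name is None:
        continue                # the submit button has no name, so it is skipped
    fields[name] = (tag.get("type"), tag.get("value"))

print(fields)
```

Inputs without a value attribute (username and password here) end up with None as the second tuple element.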

Need a generic xpath for the following html code

Following is the HTML code for which I need a unique XPath.
Firebug gives me XPaths like
.//*[@id='service_dlg']/form/p[4]/span/input[1] [For Essential]
.//*[@id='service_dlg']/form/p[4]/span/input[2] [For Enhanced] etc.
I need something like [@name = 'Essential'] so that I need not write multiple XPaths in my code. I want to pass values like Essential, Enhanced and Premium from a function.
<p>
<label style="vertical-align:top">Feature Pack:</label>
<span style="display:inline-block">
<input name="pack" value="Essential" type="radio">
<label class="feature">Essential</label>
<input name="pack" value="Enhanced" type="radio">
<label class="feature">Enhanced</label>
<input name="pack" value="Premium" type="radio">
<label class="feature">Premium</label>
<br>
<input name="featurepack" value="u" type="checkbox">
<label class="feature">URL Filtering</label>
<br>
<input name="featurepack" value="t" type="checkbox">
<label class="feature">Threat Prevention</label>
<br>
<input name="featurepack" value="w" type="checkbox">
<label class="feature">Wildfire</label>
<br>
</span>
Try the following for Enhanced:
//input[@value='Enhanced']
To make it generic and receive the value as a function parameter, try the following:
driver.find_element_by_xpath("//input[@value='%s']" % value) # where value is the function parameter; it can be Essential, Enhanced or Premium
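The parameterized expression can be checked without a browser. The sketch below uses lxml instead of Selenium to evaluate it against a trimmed copy of the markup from the question; pack_xpath is a hypothetical helper name, and the extra @name='pack' condition is an assumption added to avoid matching unrelated inputs.

```python
from lxml.html import fromstring

# Trimmed copy of the markup from the question.
html = """<span style="display:inline-block">
<input name="pack" value="Essential" type="radio">
<input name="pack" value="Enhanced" type="radio">
<input name="pack" value="Premium" type="radio">
<input name="featurepack" value="u" type="checkbox">
</span>"""


def pack_xpath(value):
    """Hypothetical helper: build the XPath for one radio button.

    Note: a value containing an apostrophe would break this simple quoting.
    """
    return "//input[@name='pack' and @value='%s']" % value


tree = fromstring(html)
matches = {v: tree.xpath(pack_xpath(v)) for v in ("Essential", "Enhanced", "Premium")}
for value, found in matches.items():
    print(value, len(found))

# In Selenium 4 the same string would be passed as:
#   from selenium.webdriver.common.by import By
#   driver.find_element(By.XPATH, pack_xpath(value))
```

The find_element_by_xpath helper used above was removed in later Selenium 4 releases, so find_element(By.XPATH, ...) is the form to use with current versions.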

Keep text from escaping / add watchers to Jira ticket

So, I've hacked this together from a few sources, so if I'm going about it totally the wrong way I welcome feedback. It also occurs to me that this may not be possible, as it's probably a security check designed to prevent this behavior from being used maliciously.
But anyway:
I have a form on our Django site where people can request to change the name of one of our items, which should automatically create a Jira ticket. Here's the form:
<form target="_blank" action='http://issues.dowjones.net/secure/CreateIssueDetails!init.jspa' method='get' id='create_jira_ticket_form'>
<a id='close_name_change_form' class="close">×</a>
<label for="new_name">New name: </label>
<input id="new_name" type="text" name="new_name" value="{{item.name}}">
<input type="hidden" value="10517" name="pid">
<input type="hidden" value="3" name="issuetype">
<input type="hidden" value="5" name="priority">
<input type="hidden" value="Change name of {{item.name}} to " name="summary" id='summary'>
<input type="hidden" value="{{request.user}}" name="reporter">
<input type="hidden" value="user123" name="assignee">
<input type="hidden" value="" name="description" id="description">
<input id='name_change_submit' class="btn btn-primary btn-sm" type="submit" value="Create JIRA ticket">
</form>
Then I have a little JS to amend the fields with the new values:
$(document).ready(function(){
    $('#create_jira_ticket_form').submit(function(){
        var watchers = ' \[\~watcher1\] \[\watcher2\]';
        var new_name = $('#new_name').val();
        var summary = $('#summary').val();
        $('#summary').val(summary + new_name);
        $('#description').val(summary + new_name + watchers);
    })
})
It comes very close to working, but the description field is escaped, leaving it looking like:
Change name of OLDNAME to NEWNAME %5B%7Ewatcher1t%5D %5B%7Ewatcher2%5D
Which is less than helpful. How can I keep it as is so I can add watchers?
This happens because the browser URL-encodes the field names and values when it submits your form.
You can see this with a simple snippet:
console.log($('form').serialize());
You should see something like
description=ejdd+%5B~watcher1%5D+%5Bwatcher2%5D
To prevent this, change method='get' to method='post'.
The encoding happens because it is part of how form data is sent over HTTP. For details, see section 17.13.3, "Processing form data", of the HTML 4.01 specification.
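To illustrate what that encoding does, here is a small standard-library sketch in Python (the description text is made up for the example). It shows that percent-encoding is reversible: whatever decodes the query string gets the original brackets back. How Jira itself treats the decoded value is not covered here.

```python
from urllib.parse import urlencode, parse_qs

# Illustrative value, modelled on the description field in the question.
description = "Change name of OLDNAME to NEWNAME [~watcher1] [~watcher2]"

# What a form submission puts on the wire: brackets become %5B / %5D,
# spaces become '+'.
query = urlencode({"description": description})
print(query)

# What the receiving side sees after decoding the query string.
decoded = parse_qs(query)["description"][0]
print(decoded)
```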

python - XPath syntax for second occurrence

<input name="utf8" type="hidden" value="✓" />
<input name="ohboy" type="hidden" value="I_WANT_THIS" />
<label for="user_email">Email</label>
<input class="form-control" id="user_email" name="user[email]" size="30" type="email" value="" />
I'm kind of stuck here. I was originally going to use find() instead of xpath() because the input tag is in several places in the source, but I figured out that find() only returns the first occurrence in the source.
Use find(), passing an XPath expression that specifies the integer index of the element:
from lxml.html import fromstring
html_data = """<input name="utf8" type="hidden" value="✓" />
<input name="ohboy" type="hidden" value="I_WANT_THIS" />
<label for="user_email">Email</label>
<input class="form-control" id="user_email" name="user[email]" size="30" type="email" value="" />"""
tree = fromstring(html_data)
print(tree.find('.//input[2]').attrib['value'])
prints:
I_WANT_THIS
But even better (and cleaner) would be to find the input by its name attribute:
print(tree.find('.//input[@name="ohboy"]').attrib['value'])
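One caveat about the positional form: input[2] means "the second input child of its parent", so it only works when the inputs are siblings, as they are in the snippet above. When the inputs sit in different parents, parenthesizing the path, (//input)[2], picks the second match in document order instead. A small sketch with made-up markup:

```python
from lxml.html import fromstring

# Illustrative markup: each input has a different parent element.
html_data = """<form>
<div><input name="utf8" type="hidden" value="x"></div>
<div><input name="ohboy" type="hidden" value="I_WANT_THIS"></div>
</form>"""

tree = fromstring(html_data)

# No input is the *second input child* of its parent here, so this is empty.
per_parent = tree.xpath('//input[2]')

# Second input in document order, regardless of nesting.
in_document_order = tree.xpath('(//input)[2]/@value')

print(per_parent)
print(in_document_order)
```

Selecting by the name attribute, as suggested above, avoids this distinction altogether.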

Handle Multiple Checkboxes with a Single Serverside Variable

I have the following HTML code:
<form method="post">
<h5>Sports you play:</h5>
<input type="checkbox" name="sports_played" value="basketball"> basketball<br>
<input type="checkbox" name="sports_played" value="football"> football<br>
<input type="checkbox" name="sports_played" value="baseball"> baseball<br>
<input type="checkbox" name="sports_played" value="soccer"> tennis<br>
<input type="checkbox" name="sports_played" value="mma"> MMA<br>
<input type="checkbox" name="sports_played" value="hockey"> hockey<br>
<br>
<input class="btn" type="submit">
</form>
And then ideally I would like to have the following python serverside code:
class MyHandler(ParentHandler):
    def post(self):
        sports_played = self.request.get('sports_played')
        # sports_played is a list or array of all the selected checkboxes that I can iterate through
I tried doing this by making the HTML sports_played name an array, sports_played[], but that didn't do anything, and right now it always returns just the first selected item.
Is this possible? Really, I just don't want to have to do a self.request.get('HTML_item') for each and every checkbox; in case I need to alter the HTML, I don't want to have to change the Python.
Thanks!
The answer is shown in the webapp2 docs for the request object:
self.request.get('sports_played', allow_multiple=True)
Alternatively you can use
self.request.POST.getall('sports_played')
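For context on why these calls can return a list: when several checkboxes with the same name are ticked, the browser sends the name once per ticked box in the request body. The sketch below uses only the Python standard library (not webapp2) and a hand-written body string to show that shape:

```python
from urllib.parse import parse_qs

# Hand-written example of a POST body with two boxes ticked.
body = "sports_played=basketball&sports_played=mma"

# parse_qs groups repeated keys into a list of values.
fields = parse_qs(body)
sports_played = fields.get("sports_played", [])

for sport in sports_played:
    print(sport)
```

The framework calls above do the equivalent grouping on the real request, so the HTML can gain or lose checkboxes without changes to the handler.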
The names of the inputs should have [] at the end so that they are sent to the server as an array. Right now, your multiple checkboxes are being sent to the server as many variables with the same name, so only one is recognized. It should look like this:
<form method="post">
<h5>Sports you play:</h5>
<input type="checkbox" name="sports_played[]" value="basketball"> basketball<br>
<input type="checkbox" name="sports_played[]" value="football"> football<br>
<input type="checkbox" name="sports_played[]" value="baseball"> baseball<br>
<input type="checkbox" name="sports_played[]" value="soccer"> tennis<br>
<input type="checkbox" name="sports_played[]" value="mma"> MMA<br>
<input type="checkbox" name="sports_played[]" value="hockey"> hockey<br>
<br>
<input class="btn" type="submit">
</form>
Now, if you select more than one, the values will be sent as an array.
Although this answer is not directly related to the question, it may help Django developers who land here.
In Django, request.POST is a QueryDict object, so you can get all the values as a list in the following way:
request.POST.getlist('sports_played')
N.B: This only works in Django
