I am looking at scraping the below information using both selenium and bs4, and was wondering if I find the below div tag, is it possible to scrape the data inside the quotation marks? for exmaple: data-room-type-code="SUK"
<div
class="sl-flexbox room-price-item hidden-top-border"
data-room-name="Superior Shard Room"
data-bed-type="K"
data-bed-name="King"
data-pay-type-tag-filter="No Prepayment"
data-cancel-tag-filter=""
data-breakfast-tag-filter=""
data-room-type-code="SUK"
data-rate-code="ZBAR"
data-price="430"
>
<div class="room-price-basic-info">
<div class="room-price-title title-regular">Flexible Rate / CustomStay</div>
<ul class="abstract text-regular">
<li>No Prepayment</li>
</ul>
<div
class="show-detail text-btn js-show-detail"
data-index="0-productRates-0"
>
OFFER DETAILS
</div>
</div>
<div class="room-price-book-info">
<div class="number text-medium">GBP 430</div>
</div>
<div class="boot-btn text-medium js-booking-room" data-type="PRICE">
Book Now
</div>
</div>
I have a list of items. So do make it simple let's say my list is
[id, title, picturePath]
I'm trying to achieve flashing out a Bootstrap Card Deck using from Flask import Markup.
Here's my code to flash the card out.
payload = Markup(f"<div class='card'>
<div class='card-body'>
<img src='{item[2]}' class='card-img-top'>
<h5 class='card-title'>{item[1]}</h5>
<p class='card-text'>asd</p>
</div>
</div>")
flash(payload)
And here's my code in the HTML page using Jinja2
<div class="container">
<div class="card-deck">
<div class="flashes">
{% for message in get_flashed_messages()%}
{{ message|safe }}
{% endfor %}
</div>
</div>
</div>
And... This is the result I get:
However, My expected Results is:
And that expected result is done by just putting in the html elements in the html code. It suppose to be in a 3x3 grid.
It seems as if flashing out these things using Markup causes it to loose its html class or i guess the ... formating? Not sure. But Does anyone have any idea?
I need help with my main.py file, I can't manage to make my webapp to post comments and save them with the user name.
This is my main code.
main.py
import os
import jinja2
import webapp2
from google.appengine.ext import ndb
template_dir = os.path.join(os.path.dirname(__file__), 'templates')
jinja_env = jinja2.Environment(loader = jinja2.FileSystemLoader(template_dir), autoescape = True)
DEFAULT_WALL = 'Public'
def wall_key(wall_name = DEFAULT_WALL):
"""This will create a datastore key"""
return ndb.Key('Wall', wall_name)
class CommentContainer(ndb.Model):
"""This function contains the name, the content and date of the comments"""
name = ndb.StringProperty(indexed=False)
content = ndb.StringProperty(indexed=False)
date = ndb.DateTimeProperty(auto_now_add = True)
class Handler(webapp2.RequestHandler):
"""This handler process the request, manipulates data and define a response to be returned to the client"""
def write(self, *a, **kw):
self.response.out.write(*a, **kw)
def render_str(self, template, **params):
t = jinja_env.get_template(template)
return t.render(params)
def render(self, template, **kw):
self.write(self.render_str(template, **kw))
class MainPage(Handler):
def get(self):
wall_name = self.request.get('wall_name', DEFAULT_WALL)
if wall_name == DEFAULT_WALL.lower(): wall_name = DEFAULT_WALL
comments_query = CommentContainer.query(ancestor = wall_key(wall_name)).order(-CommentContainer.date)
#comments = comments_query.fetch()
self.render("content.html")
class Post(Handler):
def post(self):
wall_name = self.request.get('wall_name',DEFAULT_WALL)
comment_container = CommentContainer(parent = wall_key(wall_name))
comment_container.name = self.request.get('name')
comment_container.content = self.request.get('content')
if comment_container.content == '':
self.redirect("/error")
else:
comment_container.put()
self.redirect('/#comment_section')
class Error_Page(Handler):
"""This controls empty comments"""
def get(self):
self.render("error.html")
app = webapp2.WSGIApplication([
("/", MainPage),
("/comments", Post),
("/error", Error_Page)
],
debug = True)
This is my template content.html:
{% extends "index.html" %}
<html>
{% block content %}
<div class="container">
<div class="row header">
<div class="col-md-6">
<img class="title-photo" src="images/ed.png" alt="Photo of Ed">
</div>
<div class="col-md-6 text-right">
<h1>Eduardo González Robles.</h1>
<h2>Portfolio.</h2>
</div>
</div>
<div class="row">
<div class="col-md-12">
<hr>
</div>
</div>
<div class="row text-center">
<div class="col-md-4">
<h3 class="text-body"><u>Block vs Inline</u>
</h3>
<p class="p-text"><span>Block Elements</span> are those who take the complete line and full width of the page creating a "box".<br>
<span>Inline Elements</span> are those who doesn´t affect the layout, just the element inside the tag.
</p>
</div>
<div class="col-md-4">
<h3 class="text-body"><u>Selectors</u></h3>
<p class="p-text"><span>Class selectors</span> are used to target elements with specific attributes<br>On the other hand, <span>id selectors</span> are just for unique elements.</p>
</div>
<div class="col-md-4">
<h3 class="text-body"><u>Responsive Layout</u></h3>
<p class="p-text"><span>Responsive Layout</span> is the combination of html and css design to make the website look good in terms of enlargement, shrink and width in any screen (<em>computers, laptops, netbooks, tablets, phones</em>). </p>
</div>
</div>
<div class="row text-center">
<div class="col-md-6">
<article>
<h3><u>The Importance of Avoiding Repetition</u></h3>
<p class="p-text">For a better reading and understanding of the code, it is important not to repeat elements and group them with selectors. It makes it <span>easier to read</span> and understand, the code does not get longer and with that it´s easier to <span>find errors</span> in the syntax.</p>
</article>
</div>2
<div class="col-md-6">
<h3><u>Tree like Structure</u></h3>
<p class="p-text">All the elements in html have a <span><em>parent</em></span> element. Elements that are inside another element are called <span><em>child</em></span><br>For example, the tag <u>html</u> is the parent of all the structure, body and head are the children of html.</p>
</div>
</div>
<div class="row">
<div class="col-md-12 text-center">
<img class= "tree-image" src="http://www.w3schools.com/xml/nodetree.gif" alt="Tree Like Structure"><br>
<q class="img-quote">http://www.w3schools.com/html/html_quotation_elements.asp</q>
</div>
</div>
<div class="row">
<div class="col-md-12">
<hr>
</div>
</div>
<div class="row">
<div class="col-md-12 text-center">
<h2><span class="python"><u>Python</u></span></h2>
<p class="p-text">The reason why we cannot write in a common language in Python is beacuse computers are "stupid" and they have to take the orders from a language they understand so they can follow the exact command and execute the function correct.</p>
</div>
</div>
<div class ="row text-center">
<div class = "col-md-4">
<h3><span class="python"><u>Procedual Thinking</u></span></h3>
<p class="p-text"> It refers to be capable of understand and give clear commands to a computer so this one can understand and execute them.</p>
</div>
<div class="col-md-4">
<h3><span class="python"><u>Abstract Thinking</u></span></h3>
<p class="p-text"> It means you have the abstraction ability to avoid the unnecessary repetitions in the code.
</p>
</div>
<div class="col-md-4">
<h3><span class="python"><u>Technological Empathy</u></span></h3>
<p class="p-text">This term makes reference that you understand what is a computers, it's functions. A computer is a tool that we use to write programming languages.</p>
</div>
</div>
<div class="row text-center">
<div class="col-md-6">
<h3><span class="python"><u>System Thinking</u></span></h3>
<p class="p-text">This refers when you break a big problem into smaller problems, it makes easier to understand the big picture. Solving problems is a part of being able to system thinking. The most difficult part of solving problems is to know where to start. The very first thing you have to do is to try to understand the problem. All computer problems have two things in common, they have inputs and desired outputs. Udacity's Pythonist Guide goes like this:</p>
<ol class="ol-text" start = "0">
<li><span>Don't Panic</span></li>
<li>What are the inputs?</li>
<li>What are the outputs?</li>
<li>Solve the problem</li>
<li>Simple mechanical solution</li>
<li>Develop incrementally and test as you go</li>
</ol>
</div>
<div class="col-md-6">
<h3><span class="python"><u>Debbugging</u></span></h3>
<p class="p-text"> Action of identifying the causes that produces errors in the code. There are many strategies to identify the errors, here are 5:<br></p>
<ol class="ol-text">
<li>Analize the error messages when the programs crash.</li>
<li>Work with the example code and compare with yours.</li>
<li>Make sure that the example code works.</li>
<li>Check with the print statement that your code is working properly. Sometimes it doesn't crash but that doesn't mean that is working correct, so you have to make sure it's doing what you are commanding.</li>
<li>Save and compare your old code versions so you can back and analize them if needed.</li>
</ol>
</div>
</div>
<div class ="row text-center">
<div class="col-md-4">
<h3><span class="python"><u>Variables</u></span></h3>
<p class="p-text">Variable are used as a data storage that saves information like integers, decimals and characters. Variables give names to the values and you can change those values for new one if desire. Variables work because they increase the code reading when we use names that makes sense to humans.</p>
</div>
<div class="col-md-4">
<h3><span class="python"><u>Strings</u></span></h3>
<p class="p-text">The strings are one of the most common things in python, you have to put the word between quotes. If you start just with one quote the you have to finish with one too, same thing with 2 or 3 quotes.</p>
</div>
<div class="col-md-4">
<h3><span class="python"><u>Procedures/Functions</u></span></h3>
<p class="p-text">Are "commands" that takes data as input and transform them into an output. They help programmers to avoid repetition because once you define the procedure, you can use it forever and you don't have to write it again.
</div>
</div>
<div class="row text-center">
<div class="col-md-12">
<h3><span class="python"><u>Comparatives</u></span></h3>
<p class="p-text">They are usually Booleans so they can only mean True or False. They are used to check real vs non-real parameters.You can use them in numbers and strings. These signs are used for the comparatives: <,>,<=,>=,==,!=<br><br>If Statements are other comparatives that are involved with the conditional. The syntax in Python for an if statement is like this:<br>if [condition]:<br>[indentedStatementBlock]<br><br>There is another comparative form named IF-ELSE statement, it has two indented blocks, one for if and one for else. If activates itself when the condition is True, the else block it's activated when the condition is False. <br> if condition:<br>indentedStatementBlockForTrueCondition<br>else:<br>indentedStatementBlockForFalseCondition]<br><br>The <strong>"or"</strong> value analize one of the expressions, if the result is true, then the other expression is not evaluated.</p></p>
</div>
</div>
<div class="row text-center">
<div class="col-md-4">
<h3><span class="python"><u>Lists</u></span></h3>
<p class="p-text">Lists can contain many objects in certain order; you can access to that list and add, modify, remove objects. The difference between <strong>Strings</strong> and <strong>Lists</strong> is that lists can contain strings and supports mutation. Lists start with a bracket and end with a bracket [LIST].</p>
</div>
<div class="col-md-4">
<h3><span class="python"><u>Mutation</u></span></h3>
<p class="p-text">Mutation allows us to modify the list after we created it. There's another term named <u>Aliasing</u> and it is when two different names refer to the same object.</p>
</div>
<div class="col-md-4">
<h3><span class="python"><u>List Operations</u></span></h3>
<p class= "p-text">These are some examples of list operations</p>
<ol class="ol-text">
<li>Append.- adds new elements at the end of the list. [list].append([element]).</li>
<li>Plus (+).- Acts similar to concatenation of strings. This one produces new lists. [list1] + [list2] (result) [list1list2].</li>
<li>Length.- Indicates how many elements the list contain. <br>len ("Eduardo") = 7.</li>
</ol>
</div>
</div>
<div class= "row text-center">
<div class="col-md-6">
<h3><span class="python"><u>While Loop</u></span></h3>
<p class="p-text">The While Loops are used to repeat codes. It has a test expression followed by a block, when the expression is true then the block executes, when the expression is false, the the code skips the True block to go to the False one. On the other hand, loops can be infinite if the expression does not correspond to False.<br><u>while</u>[test expression]:<br>[True block]<br>if [test expression]:<br>[False block]</p>
</div>
<div class="col-md-6">
<h3><span class="python"><u>For Loop</u></span></h3>
<p class="p-text">For loops are easier to use because you need to write less code than the while loops. They are written like this: <br><u>for</u> [name] <u>in</u> [list]:<br>[block]<br>The Loop goes through each element of the list and evaluates the block. In other words, for loop is used to repeat an action/sentence certain number of times within a range. We can use For Loops on list to, syntax looks like this:<br>for [name] in [list]:<br>[block]
</div>
</div>
<div class="row text-center">
<div class="col-md-4">
<h3><span class="python"><u>Find</u></span></h3>
<p class ="p-text"> Find is a method not an operand because it's a process created by Python. It helps you to find strings in strings. Syntax:<br>str.find(str, beg=0 end=len(string))</p>
</div>
</div>
<div class="row">
<div class="col-md-12">
<hr>
</div>
</div>
<div class="row text-center">
<div class="col-md-12">
<h2><span class="stage-3"><u>Stage 3: Create a Movie Website</u></span></h2>
</div>
</div>
<div class = "row text-center">
<div class = "col-md-4">
<h3><span class="stage-3"><u>import System</u></span></h3>
<p class = "p-text"> The import System in Python is used to bring codes to a desired file, you can import time, webbrowser, turtle, even packages outside the Pyhton Standard Library like Twilio or frameworks like fresh tomatoes</p>
</div>
<div class = "col-md-4">
<h3><span class="stage-3"><u>Built-in Functions</u></span></h3>
<p class = "p-text"> In the stage 3 of IPND we saw different types of methods like de open method that opens files inside files or websites. <br>The rename method renames files and/or directories src indicates de name of the actual file or directory, dst indicates the new name of the file or directory. <br> The translate method allows you to put 2 arguments.</p>
</div>
<div class = "col-md-4">
<h3><span class="stage-3"><u>Python Class-Intances/Objects</u></span></h3>
<p class = "p-text"> Classes in Python are like building blueprints, they contain information that creates instances. Instances or Objects are examples of a Class. Another way to describe a Class would be as a box that contains sorted files that access to different files, once a Class enter to it's files, it let us use all the functions that those files contain.</p>
</div>
</div>
<div class = "row text-center">
<div class = "col-md-3">
<h3><span class="stage-3"><u>Constructor</u></span></h3>
<p class = "p-text"> When we create instances, we invoke the constructor method init inside the class, it is here where all the data asociated with the instance starts.</p>
</div>
<div class = "col-md-3">
<h3><span class="stage-3"><u>self</u></span></h3>
<p class = "p-text"> The constructor uses the keyword "self" to access to the instance attribute.</p>
</div>
<div class = "col-md-3">
<h3><span class="stage-3"><u> Instance Variable</u></span></h3>
<p class = "p-text"> All the variables asociated with an specific instance are called Instance Variables, they are unique to the object and you access to them using the kew word slef inside the class and the instance name outside the class.</p>
</div>
<div class = "col-md-3">
<h3><span class="stage-3"><u> Instance Method</u></span></h3>
<p class = "p-text"> The instance method are all the functions inside the class asociated with the with the instances and that have self as their first argument.</p>
</div>
</div>
<div class="row footer">
<div class="col-md-6 text-left">
<p>Walnut Creek Ca.
94596<br>
USA.</p>
</div>
<div class="col-md-6 text-right">
gonzandrobles#gmail.com
</div>
</div>
<div class="lesson">
<div class="concept" id="comment_section">
<div class="concept-title">Write your comments below thanks!!
</div>
<br>
<form align="center" action="/comments" method="post">
<label>Your name:
<div>
<textarea name="name" rows="1" cols="50"></textarea>
</div>
</label>
<label>Your comment:
<div>
<textarea name="content" rows="5" cols="100"></textarea>
</div>
</label>
<div>
<input type="submit" value="Post Your Comment">
</div>
</form>
<div class="part-concept">
<div class="part-concept-title">
Previous Comments:
</div>
<br>
{% for comment in comments %}
{% if comment.content != '' %}
{% if comment.name != '' %}
<b>{{ comment.name }}</b> wrote:
{% else %}
<b>Anonymous</b> wrote:
{% endif %}
<blockquote>{{ comment.content }}</blockquote>
{% endif %}
<div>{{ error }}</div>
{% endfor %}
</div>
</div>
</div>
{% endblock %}
</html>
Here is my template error.html:
{% extends "index.html" %}
<html>
{% block content %}
<div>
<span>Please, add a comment.</span>
<button = onclick="goBack()">Return to comments</button>
<script>
function goBack() {
window.history.back();
}
</script>
</div>
{% endblock %}
</html>
And finally my index.html:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Portfolio GonzandRobles</title>
<link href='http://fonts.googleapis.com/css?family=Amatic+SC' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="http://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
<link rel="stylesheet" href="css/main.css">
</head>
<body>
{% block content %}
{% endblock %}
</body>
</html>
Im also having problems deploying my app with google app engine, it says that my app name doesn't exist.
Thank you very much.
You need to pass the comments query into your template like this:
class MainPage(Handler):
def get(self):
wall_name = self.request.get('wall_name', DEFAULT_WALL)
if wall_name == DEFAULT_WALL.lower(): wall_name = DEFAULT_WALL
comments_query = CommentContainer.query(ancestor = wall_key(wall_name)).order(-CommentContainer.date)
#comments = comments_query.fetch()
self.render("content.html", comments = comments_query)
This makes comments available to iterate through as you are in your template.
As for your deployment problem, open app.yaml and look at the application: field of your project. It needs to be exactly the same as the one you created online (Project ID) in the Google Developer Console. I'm going to guess that the names don't match or, you didn't create the app online yet.
I am trying to scrape the details like Contact, Location, Phone and Rate. The html is as below. The list is a dynamic one so sometimes only few of the items like Contact and Location may appear on the page while sometimes all of them can appear. I am thinking I can use the icon tag to get the required text but am unable to find any documentation on this. Any help would be highly appreciated.
Thanks in advance.
<div class="detail-all-label">
<i class="abc-Contact"></i>
<div class="detail-all-text"><b>Contact</b>: Ram Bahadur</div>
</div>
<div class="detail-all-label">
<i class="abc-font abc-Location"></i>
<div class="detail-all-text"><b>Location</b>: Kathmandu</div>
</div>
<div class="detail-all-label">
<i class="abc-font abc-Website"></i>
<div class="detail-all-text"><b>Website</b>: itworkremotely</div>
</div>
<div class="detail-all-label">
<i class="abc-font abc-Phone"></i>
<div class="detail-all-text"><b>Phone</b>: 3283550121</div>
</div>
<div class="detail-all-label">
<i class="abc-font abc-Rate"></i>
<div class="detail-all-text"><b>Rate</b>: €700 - 10000</div>
</div>
You can get all of the detail values that have a preceding b element inside the div with class="detail-all-text":
for detail in response.xpath("//div[#class='detail-all-text']/b"):
name = detail.xpath("text()").extract()[0]
value = detail.xpath("following-sibling::text()")[0]
print name, value
So as the title states I have some HTML code from http://chem.sis.nlm.nih.gov/chemidplus/name/acetone that I am parsing and want to extract some data like the Acetone under MeSH Heading from my similar post How to set up XPath query for HTML parsing?
<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<button class="toggle1Col" title="Toggle display between 1 column of wider results and multiple columns.">↔</button>
<h3>Name of Substance</h3>
<div class="yui3-g-r">
<div class="yui3-u-1-4">
<ul>
<li id="ds2">
<div>2-Propanone</div>
</li>
</ul>
</div>
<div class="yui3-u-1-4">
<ul>
<li id="ds3">
<div>Acetone</div>
</li>
</ul>
</div>
<div class="yui3-u-1-4">
<ul>
<li id="ds4">
<div>Acetone [NF]</div>
</li>
</ul>
</div>
<div class="yui3-u-1-4">
<ul>
<li id="ds5">
<div>Dimethyl ketone</div>
</li>
</ul>
</div>
</div>
<h3>MeSH Heading</h3>
<ul>
<li id="ds6">
<div>Acetone</div>
</li>
</ul>
</div>
</div>
Previously in other pages I would do mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content() to extract the data because other pages had similar structures, but now I see that is not the case as I didn't account for inconsistency. So, is there a way of after going to the node that I want and then obtaining it's subchild, allowing for consistency across different pages?
Would doing tree.xpath('//*[text()="MeSH Heading"]//preceding-sibling::text()[1]') work?
From what I understand, you need to get the list of items by a heading title.
How about making a reusable function that would work for every heading in the "Names and Synonyms" container:
from lxml.html import parse
tree = parse("http://chem.sis.nlm.nih.gov/chemidplus/name/acetone")
def get_contents_by_title(tree, title):
return tree.xpath("//h3[. = '%s']/following-sibling::*[1]//div/text()" % title)
print get_contents_by_title(tree, "Name of Substance")
print get_contents_by_title(tree, "MeSH Heading")
Prints:
['2-Propanone', 'Acetone', 'Acetone [NF]', 'Dimethyl ketone']
['Acetone']