Python request.get not getting all <div>

I am making a GET request to https://racing.appledaily.com.hk/race-day/race-position?raceDay=1865&race=15632 and, in Chrome's developer tools, I can see the full HTML page. However, when I make the GET request from Python, only part of the HTML is returned.
HTML:
<html class="gr__racing_appledaily_com_hk" style="overflow: initial;">
<head> ... </head>
<body data-gr-c-s-loaded="true">
<!-- Google Tag Manager (noscript) -->
<noscript> ...</noscript>
<!-- End Google Tag Manager (noscript) -->
<div data-v-6223d6a8 id="app" class="web"> ... </div>
</body>
</html>
The <div data-v-6223d6a8 id="app" class="web"> ... </div> part is missing.
Code used:
import requests
content = requests.get('https://racing.appledaily.com.hk/race-day/race-position?raceDay=1865&race=15632')

requests.get only downloads the HTML that the server sends; it does not run the page's JavaScript. In the inspector you see the full HTML because your browser executes those scripts, which render the rest of the page. To get the full HTML, look into Scrapy Splash or Selenium.
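As a rough illustration of that difference, the sketch below first checks whether an element is missing, empty, or populated in a given HTML string (standard library only), and then shows one possible way to fetch the rendered page with Selenium. The helper names (mount_point_status, fetch_rendered_html), the two sample HTML strings, and the wait selector "#app *" are assumptions made for this example. The Selenium part assumes Selenium 4 and a local Chrome installation; it is defined here but not called.

```python
from html.parser import HTMLParser


class MountPointInspector(HTMLParser):
    """Counts the tags nested inside the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.target_tag = None  # tag name of the matched element
        self.depth = 0          # > 0 while we are inside the matched element
        self.found = False
        self.child_tags = 0

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.child_tags += 1
            if tag == self.target_tag:
                self.depth += 1  # nested element with the same tag name
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.target_tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.target_tag:
            self.depth -= 1


def mount_point_status(html, element_id="app"):
    """Return 'missing', 'empty' or 'populated' for the element with element_id."""
    inspector = MountPointInspector(element_id)
    inspector.feed(html)
    if not inspector.found:
        return "missing"
    return "populated" if inspector.child_tags else "empty"


def fetch_rendered_html(url, wait_css="#app *", timeout=15):
    """Load url in headless Chrome and return the HTML after scripts have run.

    Assumes Selenium 4 and Chrome are installed; imported lazily so the rest
    of this example works without them.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the container has at least one child element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


# Made-up samples: what a plain HTTP client might receive vs. what a browser shows.
raw_html = '<body><div data-v-6223d6a8 id="app" class="web"></div><script src="/app.js"></script></body>'
rendered_html = '<body><div data-v-6223d6a8 id="app" class="web"><table><tr><td>1</td></tr></table></div></body>'

print(mount_point_status(raw_html))       # empty
print(mount_point_status(rendered_html))  # populated
```

Passing the result of fetch_rendered_html(url) to mount_point_status would be one way to confirm that the browser-rendered page contains the content, though this has not been verified against the site in the question.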

Related

How do you input and output text with Pyscript?

I'm learning PyScript, which lets you use <py-script></py-script> in an HTML5 file to write Python code. As a Python coder, I would like to try web development while still using Python, so it would be helpful to know how to input and output information using PyScript.
For example, could someone explain how to get this function to work:
<html>
<head>
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
</head>
<body>
<div>Type an sample input here</div>
<input id="test_input">
<!-- How would you get this button to display the text you typed into the input in the div with the id "test"? -->
<button id="submit-button" onClick="py-script-function"></button>
<div id="test"></div>
<py-script>
</py-script>
</body>
</html>
I would appreciate it and I hope this will also help the other py-script users.

I checked the source code on GitHub and found the examples folder.
Using the files todo.html and todo.py, I created this index.html
(which I tested using a local server, python -m http.server).
I figured out some of the elements because I have some experience with JavaScript and CSS, so it could be worth learning JavaScript and CSS to work with HTML elements.
index.html
<!DOCTYPE html>
<html>
<head>
<!--<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />-->
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
</head>
<body>
<div>Type an sample input here</div>
<input type="text" id="test-input"/>
<button id="submit-button" type="submit" pys-onClick="my_function">OK</button>
<div id="test-output"></div>
<py-script>
from js import console
def my_function(*args, **kwargs):
    #print('args:', args)
    #print('kwargs:', kwargs)
    console.log(f'args: {args}')
    console.log(f'kwargs: {kwargs}')
    text = Element('test-input').element.value
    #print('text:', text)
    console.log(f'text: {text}')
    Element('test-output').element.innerText = text
</py-script>
</body>
</html>
I watched the JavaScript console in Firefox's DevTools while the page loaded.
It took a while to load all the modules
(from Create pyodide runtime to Collecting nodes...).
After that you can see the output from console.log().
You may also use print(), but it shows the text with an extra error: writing to undefined ....

An alternative way to display the output would be to replace
Element('test-output').element.innerText = text
with
pyscript.write('test-output', text)

Head element only renders on first page load

I have a strange issue with my very first Flask app: for my index.html template, the head tag only renders the first time I visit the page. As soon as I do a hard refresh, it doesn't render. My template is:
<html>
<head>
<link href="{{ url_for('static', filename='css/bootstrap.min.css') }}" rel="stylesheet" media="screen">
</head
<body>
<p>{{name}}</p>
<p>Python versions:</p>
{% for ver in example_list %}
<p>{{ver}}</p>
{% endfor %}
<span class="glyphicon glyphicon-search"></span>
</body>
</html>
I suspect this has something to do with caching, but I'm not sure. Why does the page load properly the first time, every time? I have to open a new tab, though; just visiting the same URL again in the same tab also shows no head tag at all.
For bonus points, why does my glyphicon span not render on any page load?

Python web scraped HTML code do not show login form

I am trying to log in to a website using Python, but the HTML source that I got from urllib doesn't contain a login form. I checked in Chrome, and it shows the same HTML.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>cApexWEB 1.1</title>
</head>
<frameset border=false frameborder=0 framespacing=0>
<frameset>
<frame name="main" src="capexmain_middle.htm" scrolling="no" target="_top">
</frameset>
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>
</html>

Some modern websites load their login forms dynamically, for example from external authentication services. You can use Selenium to drive a real browser and work around this problem.
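It may also be worth noting that the HTML in the question is a frameset whose frame points at capexmain_middle.htm, so the form could simply live in that framed document. As a minimal sketch under that assumption, the standard-library code below collects the frame sources and resolves them against the page URL so that each one can be fetched separately. The page URL is a placeholder, and frame_urls is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSourceCollector(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(page_url, html):
    """Return absolute URLs for all frames found in html."""
    collector = FrameSourceCollector()
    collector.feed(html)
    return [urljoin(page_url, src) for src in collector.sources]


# Placeholder URL; substitute the address the frameset was downloaded from.
page_url = "http://example.com/app/index.htm"

frameset_html = """
<frameset border=false frameborder=0 framespacing=0>
<frameset>
<frame name="main" src="capexmain_middle.htm" scrolling="no" target="_top">
</frameset>
</frameset>
"""

print(frame_urls(page_url, frameset_html))
# ['http://example.com/app/capexmain_middle.htm']
```

Each returned URL can then be requested with urllib in the same way as the outer page and searched for the form.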

django templates displaying the html tags as it is

Below is the index.html file in my workspace/projectname/templates/appname directory:
<!DOCTYPE html>
<html>
<head>
<title>my news</title>
</head>
<body>
<h1>look below for news</h1>
{%if categories%}
<ul>
{%for category in categories%}
<li>{{category.name}}</li>
{%endfor%}
</ul>
{%endif%}
{%if headings%}
<p>
{%for heading in headings%}
{{heading.title}}
<br>
{{heading.content}}
{%endfor%}
</p>
{%endif%}
</body>
</html>
The problem is that the <ul> and <li> tags work and display the list as they should, and the <a> tag also displays a hyperlink, but the <p> and <br> tags are not rendered and are displayed as text. I can't think what the problem might be. I am fairly new to Django.

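Try using {{heading.content|safe}}, or turn autoescaping off for that part of the template with the {% autoescape off %} ... {% endautoescape %} tags (see the Django template documentation).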
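To see why the tags show up as text: with autoescaping on, Django replaces HTML-special characters in variable output with entities, much like the standard library's html.escape does. The sketch below uses html.escape only as a stand-in for that behaviour (it is not Django's own code), and the sample string is made up.

```python
import html

# Made-up value standing in for heading.content.
heading_content = "<p>First paragraph<br>Second line</p>"

# Roughly what an autoescaped {{ heading.content }} would emit:
escaped = html.escape(heading_content)
print(escaped)
# &lt;p&gt;First paragraph&lt;br&gt;Second line&lt;/p&gt;
```

The browser shows those entities as literal angle brackets, which is why the markup appears as text. The safe filter tells Django to skip this step for that value.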

Although the other answer solves your problem, that approach is not always safe.
If you know that only trustworthy people are going to write the article or post, you can simply turn Django's autoescaping off (as pointed out in the other answer).
But if you display HTML from an untrusted source, you are exposed to XSS attacks. In that case you should use an application like django-bleach, which escapes specific HTML tags such as <script> and any other tags that you want to escape.

Dynamically generated webpage scraping

I'm trying to build a parser that can download data from a web page. The problem is that the page is probably dynamically generated. There is some code in curly braces which probably generates the HTML. It looks like Django template code.
Here is the pattern:
<script charset="utf-8" type="text/javascript">var browseDefaultColumn = 4; var browse5ColumnLength= '15,24'; var browse4ColumnLength = '20,28'; var browse3ColumnLength = '25,42';var priceFilterSliderEnabled = true;var browseLowPageLength = 24;var browseHighPageLength = 100;</script>
<script id="products-template" type="text/template">
{{#products}}
<li class="{{RowCssClass}}" style="{{RowStyle}}" li-productid="{{ItemCode}}">
<div class="s-productthumbbox">
<div class="productimage s-productthumbimage col-xs-6 col-sm-12 col-md-12">
<a href="{{PrdUrl}}" class="s-product-sache">{{#ImgSashVisible}}
<img src="{{ImgSashUrl}}" class="rtSashImg img-responsive">
{{/ImgSashVisible}}
</a>
<a href="{{PrdUrl}}" class="ProductImageList">
<div>
<img class="rtimg img-responsive" src='{{MainImage}}' alt='{{Brand}} {{DisplayName}}' />
</div>
{{#EnableAltImages}}
<div class="AlternateImageContainerDiv">
<img class="rtimg ProductImageListAlternateImage img-responsive" src='{{AltImage}}' alt='{{Brand}} {{DisplayName}}' />
</div>
{{/EnableAltImages}}
</a>
<div class="QuickBuyAndWishListContainerDiv hidden-xs {{QuickBuyAndWishListCss}}">
{{#IsQuickBuyEnabled}}
I'm looking for a way to get the complete HTML, including the generated parts, so that I can parse it, for example with Beautiful Soup. Any other efficient way to get the data would work too.

The HTML you have is probably a template. It needs to be processed by a template engine to populate the content, after which you should be able to get the final HTML and parse that.
You do not normally get template HTML from a server; is this perhaps an offline file?
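The {{#products}} section syntax in the question resembles Mustache-style templates, which are usually filled in by JavaScript in the browser from data fetched separately. Assuming that is the case here, one tentative first step is to list the fields the template expects, so you know what to look for in the page's data. The sketch below uses only the standard library; template_fields is a hypothetical helper, and the template string is an abridged version of the snippet in the question with closing tags added so that it is complete.

```python
import re

# Abridged from the question; the closing tags were added for this example.
template = """
{{#products}}
<li class="{{RowCssClass}}" style="{{RowStyle}}" li-productid="{{ItemCode}}">
<a href="{{PrdUrl}}" class="ProductImageList">
<img class="rtimg img-responsive" src='{{MainImage}}' alt='{{Brand}} {{DisplayName}}' />
</a>
</li>
{{/products}}
"""


def template_fields(text):
    """Return (section names, variable names) used in a Mustache-style template."""
    # Group 1 is the optional sigil (# opens a section, / closes it, ^ inverts it);
    # group 2 is the field name.
    tags = re.findall(r"\{\{\s*([#/^]?)\s*([\w.]+)\s*\}\}", text)
    sections = sorted({name for sigil, name in tags if sigil == "#"})
    variables = sorted({name for sigil, name in tags if sigil == ""})
    return sections, variables


sections, variables = template_fields(template)
print(sections)
# ['products']
print(variables)
# ['Brand', 'DisplayName', 'ItemCode', 'MainImage', 'PrdUrl', 'RowCssClass', 'RowStyle']
```

If the page does fill this template in the browser, the product data would likely arrive as a list of objects carrying these field names, and parsing that data directly may be simpler than parsing the generated HTML.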
