I am trying to fetch product information from an online store. It doesn't require any kind of authentication, and I can see the product information in the browser. I inspected the page in the developer tools and found the request that fetches the data from the URL. My Python code is below:
import requests

def coles():
    URL = "https://shop.coles.com.au/search/resources/store/20520/productview/bySeoUrlKeyword/mutti-tomato-passata-2349503p"
    PARAMS = {
        "catalogId": 12064
    }
    res = requests.get(URL, PARAMS)
    print(res.text)

#coles()
if __name__ == '__main__':
    coles()
But I'm getting this instead of the product info
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="shortcut icon" href="about:blank">
</head>
<body>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/j.js"></script>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/f.js"></script>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&token=dcc221cb-d87a-e9e3-5316-7c7a20910bf8"></script>
</body>
</html>
In the browser's developer tools, though, the response to this request contains the product data. Maybe I need to add some header information, or I'm missing something.
Here is the request info and parameters.
If I put this request URL in the browser, I get the data.
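One way to tell from code which of the two responses came back is to check whether the body parses as JSON. This is only a minimal sketch: the helper name `parse_json_or_none` is made up for illustration, and it assumes the endpoint returns a JSON object when it serves product data, which the question does not show.

```python
import json


def parse_json_or_none(body):
    """Return the decoded JSON object, or None if the body is not a JSON object.

    Hypothetical helper: it assumes product data arrives as a JSON object,
    while an HTML page (like the one with the fingerprint scripts) does not parse.
    """
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    return data if isinstance(data, dict) else None
```

In `coles()`, a `None` result from `parse_json_or_none(res.text)` would mean the script received the HTML page above rather than product data.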
I have a string variable (name) that contains the title of the song.
(Python)
from pytube import YouTube
yt = YouTube("https://www.youtube.com/watch?v=6BYIKEH0RCQ")
name = yt.title #Contains the title of the song
Here is the HTML for the page that lets the user download the MP3:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Download</title>
</head>
<body>
<button>Click here </button>
</body>
</html>
With this code, I'd like the file to be saved under the exact title of the song when the user downloads it.
I want to use the name variable from Python in place of Song_name in the HTML code.
Please suggest any way to make this work.
You can try this:
from pytube import YouTube
yt = YouTube("https://www.youtube.com/watch?v=6BYIKEH0RCQ")
name = yt.title
HTML=f"""
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Download</title>
</head>
<body>
<button>Click here </button>
</body>
</html>
"""
with open("test.html", "w") as f:
    f.write(HTML)
This puts the title of the song in the download attribute. You can put it anywhere else in the page the same way. Just don't forget the f prefix on the string (f"") and the {variable} placeholder.
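One caveat with dropping a title straight into an f-string: titles can contain characters such as `"` or `&` that would break the attribute. Here is a minimal sketch that escapes the value first. The function name `build_download_page` and the `mp3_url` argument are assumptions made for illustration, since the link target is not shown above.

```python
from html import escape


def build_download_page(title, mp3_url):
    """Return an HTML page whose link suggests `title` as the download filename.

    Hypothetical helper: `mp3_url` stands in for wherever the MP3 is served from.
    """
    safe_title = escape(title, quote=True)  # escapes &, <, > and both quote characters
    safe_url = escape(mp3_url, quote=True)
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Download</title>
</head>
<body>
<a href="{safe_url}" download="{safe_title}.mp3">Click here</a>
</body>
</html>
"""
```

With pytube this would be called as `build_download_page(yt.title, "song.mp3")`, where `"song.mp3"` is again just a placeholder path.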
When making a request using the requests library to https://stackoverflow.com
page = requests.get(url='https://stackoverflow.com')
print(page.content)
I get the following:
<!DOCTYPE html>
<html class="html__responsive html__unpinned-leftnav">
<head>
<title>Stack Overflow - Where Developers Learn, Share, & Build Careers</title>
<link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196">
<link rel="apple-touch-icon" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a">
<link rel="image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a">
..........
The source code here has absolute paths. But when I request the same URL using requests-html with JS rendering
from requests_html import HTMLSession

with HTMLSession() as session:
    page = session.get('https://stackoverflow.com')
    page.html.render()
    print(page.content)
I get the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>StackOverflow.org</title>
<script type="text/javascript" src="lib/jquery.js"></script>
<script type="text/javascript" src="lib/interface.js"></script>
<script type="text/javascript" src="lib/window.js"></script>
<link href="lib/dock.css" rel="stylesheet" type="text/css" />
<link href="lib/window.css" rel="stylesheet" type="text/css" />
<link rel="icon" type="image/gif" href="favicon.gif"/>
..........
The links here are relative paths.
How can I get the source code with absolute paths like requests when using requests-html with js rendering?
This should probably be a feature request for the requests-html developers. For now, however, we can achieve it with this hackish solution:
from requests_html import HTMLSession
from lxml import etree

with HTMLSession() as session:
    html = session.get('https://stackoverflow.com').html
    html.render()

    # iterate over all links
    for link in html.pq('a'):
        if "href" in link.attrib:
            # Make links absolute
            link.attrib["href"] = html._make_absolute(link.attrib["href"])

    # Print html with only absolute links
    print(etree.tostring(html.lxml).decode())
We change the html object's underlying lxml tree by iterating over all links and rewriting each location to an absolute one using the html object's private _make_absolute function.
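For reference, resolving a relative link against the page URL is ordinary URL joining, which the standard library can do on its own. The sketch below is not how `_make_absolute` is implemented (it is private and I have not relied on its details); `absolutize` is a hypothetical name used only for illustration.

```python
from urllib.parse import urljoin


def absolutize(base_url, links):
    """Resolve each link against base_url; already-absolute links pass through."""
    return [urljoin(base_url, link) for link in links]


if __name__ == "__main__":
    sample = ["lib/jquery.js", "favicon.gif", "https://cdn.sstatic.net/apple-touch-icon.png"]
    for url in absolutize("https://stackoverflow.com/", sample):
        print(url)
```

Note that the loop in the answer only touches `a` elements. The relative paths in the question sit on `script` and `link` tags, so their `src` and `href` attributes would need the same treatment.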
The requests-html documentation distinguishes between absolute and relative links.
Quote:
Grab a list of all links on the page, in absolute form (anchors excluded):
r.html.absolute_links
Could you try this statement?
I am trying to go to http://192.168.1.235/status.cgi after logging into the website, but I am unsure how to do this. As can be seen in my code below, I tried to access the next page immediately after logging in, but this results in a redirect to the login page.
import requests
#login
payload = {'password': "password"}
netGearSiteLogin = requests.post("http://192.168.1.235/login.cgi",params=payload)
netGearSitePortStatus = requests.get("http://192.168.1.235/status.cgi")
print(netGearSitePortStatus.text)
Results
<head>
<title>Redirect to Login</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<link rel="stylesheet" type="text/css" href="/style.css">
<script type="text/javascript" language="JavaScript">
function RedirectToLoginPage()
{
top.location.href = "/login.cgi";
}
</script>
</head>
<body onload="RedirectToLoginPage();">
</body>
</html>
I solved it by using Sessions:
import requests
#login
payload = {'password': "password"}
netGearSiteLogin = requests.post("http://192.168.1.235/login.cgi",data=payload,allow_redirects=True)
with requests.Session() as session:
    post = session.post("http://192.168.1.235/login.cgi", data=payload)
    r = session.get("http://192.168.1.235/status.cgi")
    print(r.text) #or whatever else you want to do with the request data!
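Why this works, as far as I can tell: a session keeps the cookies from the login response and sends them back on the next request, while separate top-level calls start from scratch each time. The sketch below illustrates that with the standard library only. `FakeDevice` is a made-up stand-in for the real device, and the cookie name `SID` is an assumption, not something taken from the actual login page.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeDevice(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the device: login sets a cookie, status needs it."""

    def do_POST(self):
        # Read and discard the form body, then hand out a session cookie.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Set-Cookie", "SID=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Serve the status page only if the login cookie came back.
        logged_in = "SID=abc123" in self.headers.get("Cookie", "")
        body = b"port status" if logged_in else b"Redirect to Login"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def login_then_status(opener, base):
    """POST the login form, then GET the status page with the same opener."""
    opener.open(base + "/login.cgi", data=b"password=password").close()
    with opener.open(base + "/status.cgi") as response:
        return response.read().decode()


def compare_clients():
    """Return (status without cookies, status with a cookie jar)."""
    server = HTTPServer(("127.0.0.1", 0), FakeDevice)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        # ProxyHandler({}) makes both clients talk to localhost directly.
        stateless = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with_cookies = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),
            urllib.request.HTTPCookieProcessor(CookieJar()),
        )
        return (login_then_status(stateless, base),
                login_then_status(with_cookies, base))
    finally:
        server.shutdown()
        server.server_close()
```

Calling `compare_clients()` returns the page each client ends up with: the first one forgets the cookie between calls, the second one behaves like `requests.Session()`.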
My Django app calls a Python script (query.cgi), but when I run it, the website shows the HTML source printed by that script instead of rendering the output as a web page.
import subprocess
from django.http import HttpResponse

def query(request):
    if request.method == 'GET':
        output = subprocess.check_output(['python', 'query.cgi']).decode('utf-8')
        return HttpResponse(output, content_type="text/plain")
The webpage shows:
Content-Type: text/html
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<link type="text/css" rel="stylesheet" href="css/css_4.css" media="screen" />
<title>....</title>
</head><body>.....</body></html>
Thanks for any help!!
return HttpResponse(output, content_type="text/plain")
The page shows the raw HTML because you have content_type='text/plain', which tells the browser you are sending plain text, so it displays the markup instead of rendering it.
Try changing it to 'text/html'.
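One more thing to watch for: the script's output begins with its own `Content-Type: text/html` line, which would otherwise end up at the top of the rendered page. A minimal sketch of separating that header block from the body follows. It assumes the script prints standard CGI output (header lines, a blank line, then the document), and `split_cgi_output` is a hypothetical helper name.

```python
def split_cgi_output(raw):
    """Split CGI-style output into (headers, body).

    Assumes header lines of the form "Name: value" followed by a blank line.
    If the text does not start with such a block, it is returned untouched
    with an empty header dict.
    """
    normalized = raw.replace("\r\n", "\n")  # note: also normalizes body line endings
    head, separator, body = normalized.partition("\n\n")
    if not separator:
        return {}, raw
    headers = {}
    for line in head.split("\n"):
        name, colon, value = line.partition(":")
        if not colon:
            return {}, raw  # first block is not a header block
        headers[name.strip().lower()] = value.strip()
    return headers, body
```

In the view, this would look roughly like `headers, body = split_cgi_output(output)` followed by `return HttpResponse(body, content_type=headers.get("content-type", "text/html"))`.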
I am just looking into using Jinja2 with a python application I have already written. I may be going about this in the wrong way, but here is what I would like to do.
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML
env = Environment(loader=FileSystemLoader('.'))
template = env.get_template("really.html")
template_vars = {"title":"TITLE","graph":'total.png'}
html_out = template.render(template_vars)
HTML(string=html_out).write_pdf("report.pdf")
This nearly produces what I want: I get a PDF called report.pdf, but instead of the image it contains the literal string total.png. This is my first attempt at using Jinja, so hopefully attaching an image like this is possible. Thanks.
This is the template. There is not much to it yet; I am just trying to get this piece working first.
<!DOCTYPE html>
<html>
<head lang="en">
<meta charset="UTF-8">
<title>{{ title }}</title>
</head>
<body>
<h2>Graph Goes Here</h2>
{{ graph }}
</body>
</html>
I have an answer to my own question: I was able to put the image URL directly into the template, without trying to pass it in as a variable.
<!DOCTYPE html>
<html>
<head lang="en">
<meta charset="UTF-8">
<title>{{ title }}</title>
</head>
<body>
<h2>Graph Goes Here</h2>
<img src="graph.png">
</body>
</html>
Guess I was overcomplicating it a bit...
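For completeness, the variable can still be passed in as long as the template wraps it in an img tag, since `{{ graph }}` on its own only prints the string. A minimal sketch with an inline template; the `render_report` name and the inline `REPORT` template are made up for illustration, not taken from the original script.

```python
from jinja2 import Template

# Inline stand-in for the template file; only the body lines that matter are kept.
REPORT = Template(
    "<h2>{{ title }}</h2>\n"
    '<img src="{{ graph }}" alt="{{ title }}">\n'
)


def render_report(title, graph):
    """Render the report body with the image path substituted into the src attribute."""
    return REPORT.render(title=title, graph=graph)


if __name__ == "__main__":
    print(render_report("TITLE", "total.png"))
```

If WeasyPrint then cannot find the file, passing `base_url` to `HTML(...)` is the usual way to tell it where relative paths start, though I have not tested that with this script.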