Rendering dynamically generated HTML through pyramid Response - python

I am new to Python's Pyramid framework, so kindly help me.
I have some dynamically generated HTML. It is produced by a Python script that extracts tags/tables from 'xyz.html' (using BeautifulSoup) and writes them to another file, 'abc.html'.
Now I need to send this HTML page ('abc.html') back as a Response object from pyramid.response.
How can I do this? I tried the following:
from pyramid.response import Response

_resp = Response()
_resp.headerlist = [('Content-Type', 'text/html; charset=UTF-8')]
_resp.app_iter = open('abc.html', 'rb')  # app_iter must yield bytes
return _resp
and also
with open('abc.html', 'rb') as f:
    data = f.read()  # the with block closes f, so no f.close() is needed
return Response(body=data, content_type='text/html')
Neither worked.
PS: I cannot use renderer="package:subpack/abc.html" or any similar renderer, because the generated HTML is stored in a dynamically generated location every time, so I cannot know the final storage location of the file in advance.
Thanks in advance for your help.

I'm a little surprised your first example doesn't work. Check out this cookbook entry on it from the Pyramid docs and see if that helps.
http://docs.pylonsproject.org/projects/pyramid_cookbook/en/latest/static_assets/files.html#serving-file-content-dynamically

Related

python-docx does not add picture

I'm trying to insert a picture into a Word document using python-docx but running into errors.
The code is simply:
document.add_picture("test.jpg", width = Cm(2.0))
From looking at the python-docx documentation I can see that the following XML should be generated:
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
  <pic:nvPicPr>
    <pic:cNvPr id="1" name="python-powered.png"/>
    <pic:cNvPicPr/>
  </pic:nvPicPr>
  <pic:blipFill>
    <a:blip r:embed="rId7"/>
    <a:stretch>
      <a:fillRect/>
    </a:stretch>
  </pic:blipFill>
  <pic:spPr>
    <a:xfrm>
      <a:off x="0" y="0"/>
      <a:ext cx="859536" cy="343814"/>
    </a:xfrm>
    <a:prstGeom prst="rect"/>
  </pic:spPr>
</pic:pic>
This does in fact get generated in my document.xml file (when unzipping the .docx file). However, looking into the OOXML format, I can see that the image should also be saved under the media folder, and the relationship should be mapped in word/_rels/document.xml.rels:
<Relationship Id="rId20"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="media/image20.png"/>
None of this happens, however, and when I open the Word document I'm met with a "The picture can't be displayed" placeholder.
Can anyone help me understand what is going on?
It looks like the image is not embedded the way it should be, and that I would need to put it into the media folder and add the mapping for it myself. However, as a well-documented feature, this should work as expected.
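To check this without unzipping by hand, here is a small sketch using only the standard library that lists what actually landed in word/media/ and which image relationships exist in word/_rels/document.xml.rels (the function name is hypothetical; the namespace and relationship-type URIs are the standard OOXML ones).

```python
import zipfile
import xml.etree.ElementTree as ET

RELS_PART = 'word/_rels/document.xml.rels'
REL_TAG = '{http://schemas.openxmlformats.org/package/2006/relationships}Relationship'
IMAGE_TYPE = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/image'


def inspect_docx_images(docx):
    """Return (files under word/media/, {relationship Id: target} for images).

    docx may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        media = sorted(n for n in zf.namelist() if n.startswith('word/media/'))
        rels_root = ET.fromstring(zf.read(RELS_PART))
    image_rels = {
        rel.get('Id'): rel.get('Target')
        for rel in rels_root.iter(REL_TAG)
        if rel.get('Type') == IMAGE_TYPE
    }
    return media, image_rels
```

If the r:embed id referenced by the <a:blip> element (rId7 in the XML above) is missing from the returned mapping, or its target is not among the media files, Word has no image data to show, which would plausibly explain the placeholder.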
UPDATE:
Testing with an empty .docx file, the image does get added as expected, which leads me to believe the problem might have something to do with the python-docx-template library (https://github.com/elapouya/python-docx-template).
It uses python-docx and Jinja to provide templating, but otherwise works the same way python-docx does. I added the image to a subdoc, which then gets inserted into the full document at a given place.
A sample code can be seen below (from https://github.com/elapouya/python-docx-template/blob/master/tests/subdoc.py):
from docxtpl import DocxTemplate
from docx.shared import Inches
tpl=DocxTemplate('test_files/subdoc_tpl.docx')
sd = tpl.new_subdoc()
sd.add_paragraph('A picture :')
sd.add_picture('test_files/python_logo.png', width=Inches(1.25))
context = {
'mysubdoc' : sd,
}
tpl.render(context)
tpl.save('test_files/subdoc.docx')
I'll keep this up in case anyone else manages to make the same mistake as I did :) I managed to debug it in the end.
The problem was in how I used the python-docx-template library. I opened up a DocxTemplate like so:
report_output = DocxTemplate(template_path)
DoThings(value,template_path)
report_output.render(dictionary)
report_output.save(output_path)
But I accidentally opened it twice: instead of passing the template object to the function that works with it, I passed its path and opened it again when creating and building the subdocs.
def DoThings(data, template_path):
    doc = DocxTemplate(template_path)
    temp_finding = doc.new_subdoc()
    # DO THINGS
Finally, after I had built the subdocs, I rendered the first template, which seemed to work fine for paragraphs and such, but I'm guessing the images were added to the "second" opened template and not to the first one that I was actually rendering. After I passed the template object to the function, it started working as expected!
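A sketch of the corrected flow, assuming the caller creates the DocxTemplate once and passes that object around (function names and the 'finding' context key are hypothetical; the key must match a placeholder in your template). The point is that new_subdoc(), render() and save() are all called on the same template object.

```python
def do_things(tpl, data):
    """Build a subdoc from the template object that will actually be rendered."""
    finding = tpl.new_subdoc()
    finding.add_paragraph(str(data))
    return finding


def build_report(tpl, context, value, output_path):
    # tpl is created once by the caller, e.g. DocxTemplate(template_path),
    # and the same object is used for the subdocs, render() and save().
    context['finding'] = do_things(tpl, value)
    tpl.render(context)
    tpl.save(output_path)
```

Usage would look like `report_output = DocxTemplate(template_path)` followed by `build_report(report_output, dictionary, value, output_path)`.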
I came across this problem, and it was solved once I removed the parameter width=(1.0) from the add_picture call.
With width=(1.0), I could not see the picture in test.docx,
so it MIGHT have been caused by an inappropriate size being set for the picture.
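A likely explanation: python-docx measures lengths in English Metric Units (EMU), 914,400 per inch and 360,000 per centimetre, and helpers such as Cm() and Inches() simply produce that integer count. A bare width=(1.0) is therefore about one EMU wide, far too small to see. The arithmetic, sketched with plain constants (the helper names are hypothetical, not python-docx functions):

```python
EMU_PER_INCH = 914400
EMU_PER_CM = 360000


def cm_to_emu(cm):
    """Centimetres to EMU, truncated to an integer."""
    return int(cm * EMU_PER_CM)


def inches_to_emu(inches):
    """Inches to EMU, truncated to an integer."""
    return int(inches * EMU_PER_INCH)


def emu_to_cm(emu):
    """EMU back to centimetres."""
    return emu / EMU_PER_CM


print(cm_to_emu(2.0))           # 720000, the size Cm(2.0) stands for
print(inches_to_emu(1.25))      # 1143000, the size Inches(1.25) stands for
print(f"{emu_to_cm(1):.7f}")    # 0.0000028 cm, roughly what width=1.0 asks for
```

So passing width=Cm(...) or width=Inches(...), or leaving width out to keep the native size, avoids the invisible picture.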
To add pictures, headings and paragraphs to an existing document:
from docx import Document
from docx.shared import Cm

doc = Document(full_path)  # open an existing document with existing styles
for row in tableData:  # list from the JSON API ...
    print('row {}'.format(row))
    level = row['level']
    levelStyle = 'Heading ' + str(level)
    title = row['title']
    heading = doc.add_heading(title, level)
    heading.style = doc.styles[levelStyle]
    p = doc.add_paragraph(row['description'])
    if row['img_http_path']:
        ip = doc.add_paragraph()
        r = ip.add_run()
        r.add_text(row['img_name'])
        r.add_text("\n")
        # add_picture() needs a local file path or a file-like object, not a URL
        r.add_picture(row['img_http_path'], width=Cm(15.0))
doc.save(full_path)

How to use Python to pipe a .htm file to a website

I have a file, gather.htm, which is a valid HTML file with a header, a body and forms. If I double-click the file on the Desktop, it opens in a web browser, auto-submits the form data (via <SCRIPT LANGUAGE="Javascript">document.forms[2].submit();</SCRIPT>), and the page refreshes with the requested data.
I want Python to make a requests.post(url) call using gather.htm. However, neither my research nor my trial and error has produced a solution.
How is this accomplished?
I've tried things along these lines (based on examples found on the web). I suspect I'm missing something simple here!
import requests

myUrl = 'http://www.somewhere.com'
filename = '/Users/John/Desktop/gather.htm'
with open(filename, 'rb') as f:
    r = requests.post(url=myUrl, data={'title': 'test_file'}, files={'file': f})
print(r.status_code)
print(r.text)
And:
htmfile = 'file:///Users/John/Desktop/gather.htm'
files = {'file':open('gather.htm')}
webbrowser.open(url,new=2)
response = requests.post(url)
print(response.text)
Note that in the 2nd example above, the webbrowser.open() call works correctly but the requests.post does not.
Everything I tried seems to fail in the same way: the URL is opened and the page returns default data. The website never appears to receive the gather.htm file.
Since your request is returning 200 OK, there is nothing wrong getting your post request to the server. It's hard to give you an exact answer, but the problem lies with how the server is handling the request. Either your post request is being formatted in a way that the server doesn't recognise, or the server hasn't been set up to deal with them at all. If you're managing the website yourself, some additional details would help.
Just as a final check, try the following:
r = requests.post(url=myUrl, data={'title':'test_file', 'file':f})
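One more thing worth checking: requests does not run JavaScript or interpret gather.htm, so posting the file uploads it as a file instead of submitting the form it contains. A different approach, sketched here with the standard library's html.parser, is to read the third form (document.forms[2]) out of gather.htm and post its fields to its action URL yourself. This assumes the fields are plain <input> elements with static values; <select>, <textarea> and anything computed by script are not handled.

```python
from html.parser import HTMLParser

# Inputs that are not sent as plain name=value fields when a script calls form.submit()
SKIPPED_TYPES = {'submit', 'button', 'image', 'reset', 'file'}


class FormCollector(HTMLParser):
    """Record each <form>'s action, method and the name/value pairs of its <input>s."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self._current = {
                'action': attrs.get('action') or '',
                'method': (attrs.get('method') or 'get').lower(),
                'fields': {},
            }
            self.forms.append(self._current)
        elif tag == 'input' and self._current is not None:
            name = attrs.get('name')
            input_type = (attrs.get('type') or 'text').lower()
            if not name or input_type in SKIPPED_TYPES:
                return  # unnamed controls are never submitted
            is_box = input_type in ('checkbox', 'radio')
            if is_box and 'checked' not in attrs:
                return  # unchecked boxes are not submitted
            value = attrs.get('value')
            # browsers send "on" for a checked box that has no value attribute
            self._current['fields'][name] = value if value is not None else ('on' if is_box else '')

    def handle_endtag(self, tag):
        if tag == 'form':
            self._current = None


def extract_forms(html_text):
    """Return a list of {'action', 'method', 'fields'} dicts in document order."""
    parser = FormCollector()
    parser.feed(html_text)
    parser.close()
    return parser.forms
```

Then, assuming page_url is the address gather.htm was originally saved from, something like `form = extract_forms(open(filename).read())[2]` followed by `requests.post(urllib.parse.urljoin(page_url, form['action']), data=form['fields'])` mimics what the browser does (use requests.get with params= if the form's method is GET).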

Basic file uploading via website form using POST Requests in Python

I am trying to upload a file to a random website using Python and HTTP requests. For this, I use the handy library named Requests.
According to the documentation, and some answers on StackOverflow here and there, I just have to add a files parameter in my application, after studying the DOM of the web page.
The method is simple:
Look in the source code for the URL of the form ("action" attribute);
Look in the source code for the "name" attribute of the upload button;
Look in the source code for the "name" and "value" attributes of the form's submit button;
Complete the Python code with the required parameters.
Sometimes this works fine. Indeed, I managed to upload a file to this site: http://pastebin.ca/upload.php
After looking in the source code, the URL of the form is upload.php, the button names are file and s, and the value is Upload, so I get the following code:
import requests

url = "http://pastebin.ca/upload.php"
myFile = open("text.txt", "rb")
r = requests.get(url, data={'s': 'Upload'}, files={'file': myFile})
print(r.text.find("The uploaded file has been accepted."))
# ≠ -1
But now, let's look at that site: http://www.pictureshack.us/
The corresponding code is as follows:
url = "http://www.pictureshack.us/index2.php"
myFile = open("text.txt","rb")
r = requests.get(url,data={'Upload':'upload picture'},files={'userfile':myFile})
print(r.text.find("Unsupported File Type!"))
# = -1
In fact, the only difference I see between these two sites is that, for the first, the URL that handles the form submission is the same page that contains the form and where the files are uploaded.
But that does not solve my problem, because I still do not know how to submit my file in the second case.
I tried to make my request on the main page instead of the .php, but of course it does not work.
In addition, I have another question.
Suppose that some form elements do not have a "name" attribute. How am I supposed to refer to them in my Python request?
For example, this site: http://imagesup.org/
The submitting form button looks like this: <input type="submit" value="Héberger !">
How can I use it in my data parameters?
The forms have another component you must honour: the method attribute. You are using GET requests, but the forms you are referring to use method="post". Use requests.post to send a POST request.
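A minimal sketch of a POST upload with Requests, using the field names from the question (the helper names are hypothetical). It also covers the second question: under the HTML form-submission rules, a control without a name attribute is never included in the submitted data, so an unnamed button such as "Héberger !" can simply be left out. Building a PreparedRequest first lets you inspect the multipart body before anything is sent.

```python
import os

import requests


def prepare_upload(action_url, file_field, fileobj, filename, fields=None):
    """Build (without sending) a multipart/form-data POST, as a browser would."""
    request = requests.Request(
        'POST',
        action_url,
        data=fields or {},
        # (filename, fileobj) so the part carries a filename like a browser upload
        files={file_field: (filename, fileobj)},
    )
    return request.prepare()


def upload(action_url, file_field, path, fields=None):
    """Open the file in binary mode and actually send the POST request."""
    with open(path, 'rb') as fileobj, requests.Session() as session:
        prepared = prepare_upload(action_url, file_field, fileobj,
                                  os.path.basename(path), fields)
        return session.send(prepared)
```

For the second site this would be `upload("http://www.pictureshack.us/index2.php", "userfile", "text.txt", {"Upload": "upload picture"})`; the one-line equivalent is `requests.post(url, data=..., files=...)`. Whether the site accepts a .txt file is a separate matter.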

Python 3, How To Use Python To Save Data From This Page?

I am trying to save price data from this page using Python 3.x.
I want my script to go through every option under the Fund Provider dropdown, and then save the resulting table to a local file.
Unfortunately, when I look at the source code, it appears that all the menu options and table data come from JSON files, and I am not sure where to begin, as I can't seem to read the files from a browser. I know how to use urlretrieve, and have used it for simple static web pages, but my skills are not advanced enough to navigate complex, multi-resource pages.
Any advice on how I can achieve my goal would be most appreciated.
Sorry for the incorrect copy and paste of the URL. Anyway, I found a solution. What I needed to do was:
use Firebug (an extension for Firefox) to identify the location of the JSON files, along with the POST data sent with each request;
then use urlretrieve to download the data, including the POST data with each request.
Example code:
from urllib.parse import urlencode
from urllib.request import urlretrieve

url = 'http://www.example.com'
values = {'example_param1': 'example value 1',
          'example_param2': 'example value 2'}
data = urlencode(values).encode('utf-8')  # POST data must be bytes
# save_root and fund_provider are defined elsewhere (e.g. in a loop over providers)
save_path = save_root + fund_provider + '.json'
urlretrieve(url, save_path, data=data)  # passing data makes this a POST request

How do you extract feed urls from an OPML file exported from Google Reader?

I have a piece of software called Rss-Aware that I'm trying to use. It is basically a desktop feed checker that checks whether RSS feeds have been updated and gives a notification through Ubuntu's Notify-OSD system.
However, to know which feeds to check, you have to list the feed URLs in a text file, ~/.rss-aware/rssfeeds.txt, one URL per line. Something like:
http://example.com/feed.xml
http://othersite.org/feed.xml
http://othergreatsite.net/rss.xml
...Seems pretty simple, right? Well, the feeds I'd like to use are exported from Google Reader as an OPML file (a type of XML), and I have no clue how to parse it to output just the feed URLs. It seems like it should be pretty straightforward, yet I'm stumped.
I'd love it if anyone could give an implementation in Python or Ruby, or something I could run quickly from a prompt. A bash script would be awesome.
Thank you so much for the help. I'm a really weak programmer and would love to learn how to do this basic parsing.
EDIT: Also, here is the OPML file I'm trying to extract the feed urls from.
I wrote a subscription list parser for this very purpose. It's called listparser, and it's written in Python. I just tested your OPML file, and it appears to parse the file perfectly. It will also make your feeds' labels available.
If you've ever used feedparser, the interface should be familiar:
>>> import listparser as lp
>>> d = lp.parse('https://dl.dropbox.com/u/670189/google-reader-subscriptions.xml')
>>> len(d.feeds)
112
>>> d.feeds[100].url
u'http://longreads.com/rss'
>>> d.feeds[100].tags
[u'reading']
It's possible to create the file with feed URLs using a script similar to:
import listparser as lp

d = lp.parse('https://dl.dropbox.com/u/670189/google-reader-subscriptions.xml')
with open('/home/USERNAME/.rss-aware/rssfeeds.txt', 'w') as f:
    for i in d.feeds:
        f.write(i.url + '\n')
Just replace USERNAME with your actual username. Done!
XML parsing was so easy to implement and worked great for me.
from xml.etree import ElementTree

def extract_rss_urls_from_opml(filename):
    urls = []
    with open(filename, 'rt') as f:
        tree = ElementTree.parse(f)
    for node in tree.findall('.//outline'):
        url = node.attrib.get('xmlUrl')
        if url:
            urls.append(url)
    return urls

urls = extract_rss_urls_from_opml('your_file')
Since it's an XML file, you can use an XPath query to extract the URLs.
In the XML file, the RSS feed URLs appear to be stored in xmlUrl attributes. The XPath expression //@xmlUrl will select all values of that attribute.
If you want to test this out in your web-browser, you can use an online XPath tester. If you want to perform this XPath query in Python, this question explains how to use XPath in Python. Additionally, the lxml docs have a page on using XPath in lxml that might be helpful.
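With lxml, the full expression works directly as `tree.xpath('//@xmlUrl')`. If you would rather stay in the standard library, ElementTree supports a limited XPath subset that includes attribute-presence predicates, so a sketch like the following (function names are hypothetical) selects every outline carrying an xmlUrl and writes the one-URL-per-line file Rss-Aware expects:

```python
import xml.etree.ElementTree as ET


def extract_feed_urls(opml_text):
    """Return the xmlUrl of every <outline> that has one, in document order."""
    root = ET.fromstring(opml_text)
    # './/outline[@xmlUrl]': any descendant <outline> that has an xmlUrl attribute
    return [node.get('xmlUrl') for node in root.iterfind('.//outline[@xmlUrl]')]


def write_feed_list(urls, path):
    """Write one URL per line, the format of ~/.rss-aware/rssfeeds.txt."""
    with open(path, 'w') as f:
        for url in urls:
            f.write(url + '\n')
```

Folder outlines (the ones that only group feeds under a label) have no xmlUrl, so the predicate skips them automatically.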
You could also use a regex. I used the following search-and-replace regex to convert my Google Reader OPML export to a Firefox HTML live-bookmark import:
^\s+<outline.*?title="(.*?)".*?xmlUrl="(.*?)".*?htmlUrl="(.*?)".*?/>
<DT><A FEEDURL="$2" HREF="$3">$1</A>
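The same search-and-replace can be run from Python with the re module; the main differences are that the replacement uses \1, \2, \3 instead of $1, $2, $3, and re.MULTILINE is needed so that ^ matches at each line start. Like any regex over XML, this is a sketch that assumes each outline sits on its own line with the attributes in title, xmlUrl, htmlUrl order, as in a Google Reader export (the function name is hypothetical).

```python
import re

OUTLINE_RE = re.compile(
    r'^\s+<outline.*?title="(.*?)".*?xmlUrl="(.*?)".*?htmlUrl="(.*?)".*?/>',
    re.MULTILINE,
)


def opml_to_bookmarks(opml_text):
    """Replace each feed <outline .../> line with a live-bookmark <DT><A> entry."""
    return OUTLINE_RE.sub(r'<DT><A FEEDURL="\2" HREF="\3">\1</A>', opml_text)
```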
