I have the following simple HTML file.
<html data-noop=="http://www.w3.org/1999/xhtml">
<head>
<title>Hello World</title>
</head>
<body>
SUMMARY1
hello world
</body>
</html>
I want to read this into a Python script and replace SUMMARY1 with the text "hi there" (say). I do the following in Python:
with open('test.html','r') as htmltemplatefile:
    htmltemplate = htmltemplatefile.read().replace('\n','')
htmltemplate.replace('SUMMARY1','hi there')
print(htmltemplate)
The above code reads the file into the variable htmltemplate.
Next I call the replace() method of the string object to replace SUMMARY1 with "hi there". But the output still contains SUMMARY1 instead of "hi there". Here is what I'm getting:
<html data-noop=="http://www.w3.org/1999/xhtml"><head><title>Hello World</title></head><body>SUMMARY1hello world</body></html>
Could someone point out what I'm doing wrong here?
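open() does not return a str, it returns a file object. Additionally, you are only opening it for reading ('r'), not for writing. Note also that str.replace() returns a new string and leaves the original unchanged, so its result has to be assigned to a name or written out.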
What you want to do is something like:
new_lines = []
with open('test.html', 'r') as f:
    new_lines = f.readlines()
with open('test.html', 'w') as f:
    f.writelines([x.replace('a', 'b') for x in new_lines])
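For the template in the question, the substitution only shows up if the string returned by replace() is kept. Here is a minimal sketch of that point; the short template string is a stand-in for the file contents, and the names unchanged and filled are only for illustration.

```python
# Stand-in for the text read from test.html (assumption for this sketch).
template = '<body>SUMMARY1hello world</body>'

# Calling replace() without keeping the result has no visible effect,
# because Python strings are immutable:
template.replace('SUMMARY1', 'hi there')
unchanged = template  # still contains SUMMARY1

# Binding a name to the returned string keeps the substitution:
filled = template.replace('SUMMARY1', 'hi there')
print(filled)  # <body>hi therehello world</body>
```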
The fileinput library makes this a lot easier.
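A rough sketch of the fileinput approach follows, assuming Python 3. The helper name replace_in_file is made up for this example and is not part of the library.

```python
import fileinput


def replace_in_file(path, old, new):
    """Rewrite the file at path in place, replacing old with new on each line.

    replace_in_file is a hypothetical helper name used only in this sketch.
    """
    # With inplace=True, fileinput redirects standard output into the file
    # being read, so whatever is printed becomes the file's new content.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            # end="" because each line already carries its own newline.
            print(line.replace(old, new), end="")
```

Calling replace_in_file('test.html', 'SUMMARY1', 'hi there') would then update the file without a separate read pass and write pass.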
I'm trying to open a template, modify it, and save it under a different name. I have done this with CSV but never with HTML.
import csv
import pandas as pd
with open('filename.csv', 'r') as infile, open('new_filename.csv', 'w') as outfile:
    stripped = (line.strip() for line in infile)
    lines = (line.split(";") for line in stripped if line)
    writer = csv.writer(outfile)
    writer.writerows(lines)
My script for HTML is:
f = open('filename.html','w')
message= """<html>
<head></head>
<body><p>My code</p></body>
</html>"""
f.write(message)
f.close()
Can I save the output under a different name in that last snippet, the way I do with outfile in the CSV code? Or is there a better way to do it?
Thank you for your help and advice.
Here is an example of how to read an HTML file and write the result to another file:
input.html:
<html>
<head></head>
<body><p>My code</p></body>
</html>
process_html.py:
def process_html(inhtml):
    # Change this function for different processing method!
    processed_html = inhtml.replace('My code', 'My Beautiful codes')
    return processed_html
with open('input.html', 'r') as infile, open('output.html', 'w') as outfile:
    inhtml = infile.read()
    processed_html = process_html(inhtml)
    outfile.write(processed_html)
After running process_html.py, you should get an HTML file called output.html containing the processed version of input.html.
Below is the code
urls = []
urls.append('http://google.com')
urls.append('http://stackoverflow.com')
whole = """<html>
<head>
<title>output -</title>
</head>
<body>Below are the list of URLS
%s // here I want to write both urls.
</body>
</html>"""
for x in urls:
    print(x)
f = open('myfile.html', 'w')
f.write(whole)
f.close()
So this is the code for saving the file in HTML format, but I can't find a way to get the output of the for loop into the HTML file. In other words, I want to write the elements of the list, i.e. http://google.com and http://stackoverflow.com, into my HTML file. As you can see, I create myfile.html as the HTML file, and I want both URLs from the list to be written into it.
I hope I have explained it better this time.
How can I do this? Any suggestions would be a really big help.
Try the code below:
urls = []
urls.append('http://google.com')
urls.append('http://stackoverflow.com')
whole = """<html>
<head>
<title>output -</title>
</head>
<body>Below are the list of URLS
%s
</body>
</html>"""
f = open('myfile.html', 'w')
f.write(whole % ", ".join(urls))
f.close()
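If the URLs should appear as clickable links rather than a comma-separated string, the same %s substitution can take a block of list items. This is only a sketch, assuming Python 3. The names PAGE and render_url_page and the added ul wrapper are illustrative and not part of the original code.

```python
from html import escape

# Same page as above, with a <ul> wrapper added around the placeholder
# (the wrapper is an assumption made for this sketch).
PAGE = """<html>
<head>
<title>output -</title>
</head>
<body>Below are the list of URLS
<ul>
%s
</ul>
</body>
</html>"""


def render_url_page(urls):
    """Return the page text with one <li><a> entry per URL."""
    items = "\n".join(
        # escape() guards against characters such as & or " in a URL.
        '<li><a href="%s">%s</a></li>' % (escape(u, quote=True), escape(u))
        for u in urls
    )
    return PAGE % items


page = render_url_page(['http://google.com', 'http://stackoverflow.com'])
```

The resulting string can be written to myfile.html with f.write(page) exactly as before.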
I am using Python 3.5 and in some cases, when I call tidylib.tidy_document
on an HTML file, the '/' character at the end of the <link ../> tag in the
header is getting removed. Tidylib does not give any errors or warnings when
it removes this character.
The HTML file I am using is part of an Epub generated with writer2epub. The
error occurs in almost all files in this Epub. The only exceptions are very
short ones (e.g. titlepage of the document). The error is the same in all
affected files.
I suspected a problem with the use of carriage returns (0x0d) instead of
linefeeds (0x0a), but changing them doesn't make a difference. I also see that the file contains various other non-ASCII characters, so maybe they're to blame. Googling for unicode problems with tidylib didn't turn up anything that seems to relate to this problem.
I have uploaded a test file that reproduces the problem with the following code:
import re
from tidylib import tidy_document
def printLink(html):
    """ Print the <link> tag from the HTML header """
    for line in html.split('\n'):
        match = re.search('<link[^>]+>', line)
        if match is not None:
            print(match.group(0))
if __name__ == '__main__':
    fname = 'test04.xhtml'
    print(fname)
    with open(fname, 'r') as fh:
        html = fh.read()
    print('checkpoint 01')
    printLink(html)
    newHtml, errors = tidy_document(html)
    print('checkpoint 02')
    printLink(newHtml)
If the problem is reproduced, the output will be:
<link rel="stylesheet" href="../styles/style001.css" type="text/css" />
at checkpoint 01 and
<link rel="stylesheet" href="../styles/style001.css" type="text/css">
at checkpoint 02.
What is causing tidylib to remove this one '/' character?
I'm new to Python and currently trying to use Mako templating.
I want to be able to take an html file and add a template to it from another html file.
Let's say I got this index.html file:
<html>
<head>
<title>Hello</title>
</head>
<body>
<p>Hello, ${name}!</p>
</body>
</html>
and this name.html file:
world
(yes, it just has the word world inside).
I want the ${name} in index.html to be replaced with the content of the name.html file.
I've been able to do this without the name.html file, by stating in the render method what name is, using the following code:
@route(':filename')
def static_file(filename):
    mylookup = TemplateLookup(directories=['html'])
    mytemplate = mylookup.get_template('hello/index.html')
    return mytemplate.render(name='world')
This is obviously not useful for larger pieces of text. Now all I want is to simply load the text from name.html, but haven't yet found a way to do this. What should I try?
return mytemplate.render(name=open(<path-to-file>).read())
Thanks for the replies.
The idea is to use the mako framework since it does things like cache and check if the file has been updated...
This code seems to work in the end:
@route(':filename')
def static_file(filename):
    mylookup = TemplateLookup(directories=['.'])
    mytemplate = mylookup.get_template('index.html')
    temp = mylookup.get_template('name.html').render()
    return mytemplate.render(name=temp)
Thanks again.
Did I understand correctly that all you want is to read the content of a file? If you want to read the complete content, use something like this (Python >= 2.5):
from __future__ import with_statement
with open(my_file_name, 'r') as fp:
    content = fp.read()
Note: The from __future__ line has to be the first line in your .py file (or right after the content encoding specification that can be placed in the first line)
Or the old approach:
fp = open(my_file_name, 'r')
try:
    content = fp.read()
finally:
    fp.close()
If your file contains non-ascii characters, you should also take a look at the codecs page :-)
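As a small sketch of the non-ASCII case: this uses io.open rather than the codecs module mentioned above, because io.open accepts an encoding argument and behaves the same on Python 2.6+ and Python 3. The helper name read_text and the UTF-8 default are assumptions, so adjust the encoding to whatever your file actually uses.

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a whole text file and decode it with the given encoding.

    read_text is a hypothetical helper; utf-8 is an assumed default.
    """
    # io.open decodes while reading, so the result is a unicode string
    # rather than raw bytes.
    with io.open(path, "r", encoding=encoding) as fp:
        return fp.read()
```

The returned text can be passed to render() the same way as the content variable in the other examples.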
Then, based on your example, the last section could look like this:
from __future__ import with_statement
@route(':filename')
def static_file(filename):
    mylookup = TemplateLookup(directories=['html'])
    mytemplate = mylookup.get_template('hello/index.html')
    content = ''
    with open('name.html', 'r') as fp:
        content = fp.read()
    return mytemplate.render(name=content)
You can find more details about the file object in the official documentation :-)
There is also a shortcut version:
content = open('name.html').read()
But I personally prefer the long version with the explicit closing :-)
I am currently working on an assignment for creating an HTML file using python. I understand how to read an HTML file into python and then edit and save it.
table_file = open('abhi.html', 'w')
table_file.write('<!DOCTYPE html><html><body>')
table_file.close()
The problem with the above piece is that it just replaces the whole HTML file with the string passed to write(). How can I edit the file and at the same time keep its content intact? I mean writing something like this, but inside the body tags:
<link rel="icon" type="image/png" href="img/tor.png">
I need the link to automatically go in between the opening and closing body tags.
You probably want to read up on BeautifulSoup:
import bs4
# load the file
with open("existing_file.html") as inf:
    txt = inf.read()
# name the parser explicitly; bs4 warns when it has to guess one
soup = bs4.BeautifulSoup(txt, "html.parser")
# create new link
new_link = soup.new_tag("link", rel="icon", type="image/png", href="img/tor.png")
# insert it into the document
soup.head.append(new_link)
# save the file again
with open("existing_file.html", "w") as outf:
    outf.write(str(soup))
Given a file like
<html>
<head>
<title>Test</title>
</head>
<body>
<p>What's up, Doc?</p>
</body>
</html>
this produces
<html>
<head>
<title>Test</title>
<link href="img/tor.png" rel="icon" type="image/png"/></head>
<body>
<p>What's up, Doc?</p>
</body>
</html>
(note: it has munched the whitespace, but gotten the html structure correct).