I am currently working on an assignment to create an HTML file using Python. I understand how to read an HTML file into Python and then edit and save it.
table_file = open('abhi.html', 'w')
table_file.write('<!DOCTYPE html><html><body>')
table_file.close()
The problem with the code above is that it replaces the whole HTML file with the string passed to write(). How can I edit the file while keeping its existing content intact? For example, I want to write something like this, but inside the body tags:
<link rel="icon" type="image/png" href="img/tor.png">
I need the link to automatically go in between the opening and closing body tags.
You probably want to read up on BeautifulSoup:
import bs4
# load the file
with open("existing_file.html") as inf:
    txt = inf.read()
# name the parser explicitly so bs4 does not have to guess one
soup = bs4.BeautifulSoup(txt, "html.parser")
# create new link
new_link = soup.new_tag("link", rel="icon", type="image/png", href="img/tor.png")
# insert it into the document
soup.head.append(new_link)
# save the file again
with open("existing_file.html", "w") as outf:
    outf.write(str(soup))
Given a file like
<html>
<head>
<title>Test</title>
</head>
<body>
<p>What's up, Doc?</p>
</body>
</html>
this produces
<html>
<head>
<title>Test</title>
<link href="img/tor.png" rel="icon" type="image/png"/></head>
<body>
<p>What's up, Doc?</p>
</body>
</html>
(note: it has munched the whitespace, but gotten the html structure correct).
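If installing BeautifulSoup is not an option, a rougher alternative is plain string slicing. The sketch below is not what the answer above does: it uses a hypothetical helper, `insert_before_closing_tag`, and assumes the closing tag is written in lowercase and appears at least once in the file.

```python
def insert_before_closing_tag(html_text, tag_name, snippet):
    """Return html_text with snippet inserted just before the last </tag_name>."""
    closing = "</%s>" % tag_name
    # rfind is case-sensitive, so "</BODY>" would not be found (assumption: lowercase tags)
    index = html_text.rfind(closing)
    if index == -1:
        raise ValueError("no %s tag found" % closing)
    return html_text[:index] + snippet + "\n" + html_text[index:]


page = (
    "<html>\n<head>\n<title>Test</title>\n</head>\n"
    "<body>\n<p>What's up, Doc?</p>\n</body>\n</html>"
)
link = '<link rel="icon" type="image/png" href="img/tor.png">'

# Put the tag inside the body, as the question asks; pass "head" to target the head instead.
updated = insert_before_closing_tag(page, "body", link)
print(updated)
```

This does no real parsing, so it is only reasonable for simple, well-formed files; for anything messier the parser-based approach is safer.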
I have a YAML file stored at a URL. How do I load it into Python for processing?
This is the code I use to read it and then print it out to verify. However, the output is not in YAML format; it looks like HTML to me.
Code:
import yaml
import urllib
from urllib import request
x = urllib.request.urlopen("https://git.myplace.net/projects/groups%2users.yaml")
User_Object = yaml.load(x)
print(User_Object)
...
The output looks like:
anch:create-branch-action":{"serverCondition":false}});}(_PageDataPlugin));</script><meta name="application-name" content="Bitbucket"><link rel="shortcut icon" type="image/x-icon" href="/s/-1051105741/5ab4b55/261/1.0/_/download/resources/com.atlassian.bitbucket.server.bitbucket-webpack-INTERNAL:favicon/favicon.ico" /><link rel="search" href="https://git.cnvrmedia.net/plugins/servlet/opensearch-descriptor" type="application/opensearchdescription+xml" title="Bitbucket code search"/></head><body class="aui-page-sidebar bitbucket-theme"><ul id="assistive-skip-links" class="assistive"><li>Skip to sidebar navigation</li><li>Skip to content</li></ul><div id="page"><!-- start
The file name is "groups+users.yaml". What is the best way to read a YAML file from a URL so that Python can parse and process it?
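The output above contains Bitbucket page markup, which suggests the server returned the repository's web page rather than the raw file. The raw-file URL depends on how the server is set up, so it is not guessed here. As a minimal sketch, a check like the following can reject an HTML response before it reaches the YAML parser. `looks_like_html` and `fetch_text` are hypothetical helper names, the check is only a heuristic, and UTF-8 is assumed.

```python
import urllib.request


def looks_like_html(text):
    """Heuristic: True if the text appears to be an HTML page rather than YAML."""
    sample = text.lstrip()[:1000].lower()
    return (
        sample.startswith("<!doctype html")
        or "<html" in sample
        or "<head" in sample
    )


def fetch_text(url):
    """Download url and return its body as text (assumes UTF-8 content)."""
    with urllib.request.urlopen(url) as response:
        body = response.read().decode("utf-8")
    if looks_like_html(body):
        raise ValueError("got an HTML page instead of the raw file: %s" % url)
    return body


# Small offline samples to show what the heuristic accepts and rejects.
yaml_sample = "groups:\n  - admins\nusers:\n  - alice\n"
html_sample = "<!DOCTYPE html><html><head><title>Bitbucket</title></head></html>"
print(looks_like_html(yaml_sample), looks_like_html(html_sample))
```

Once `fetch_text` returns real YAML text, it can be passed to `yaml.safe_load`, which is generally preferable to `yaml.load` for data you do not fully control.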
Below is the code
urls = []
urls.append('http://google.com')
urls.append('http://stackoverflow.com')
whole = """<html>
<head>
<title>output -</title>
</head>
<body>Below are the list of URLS
%s // here I want to write both urls.
</body>
</html>"""
for x in urls:
    print(x)
f = open('myfile.html', 'w')
f.write(whole)
f.close()
This code saves the file in HTML format, but I can't find a way to get the contents of the for loop into the HTML file. In other words, I want to write the elements of the list, i.e. http://google.com and http://stackoverflow.com, into myfile.html.
I hope I have explained it better this time.
How can I do this? Any suggestions would be a big help.
Try the code below:
urls = []
urls.append('http://google.com')
urls.append('http://stackoverflow.com')
whole = """<html>
<head>
<title>output -</title>
</head>
<body>Below are the list of URLS
%s
</body>
</html>"""
# the with block closes the file even if write() fails
with open('myfile.html', 'w') as f:
    f.write(whole % ", ".join(urls))
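If you would rather have each URL as a clickable list item than a comma-separated string, something along these lines might work. It is only a sketch: `render_url_page` is a hypothetical helper, and the `<ul>` markup is my own addition to the template from the question.

```python
import html

TEMPLATE = """<html>
<head>
<title>output -</title>
</head>
<body>Below is the list of URLs
<ul>
%s
</ul>
</body>
</html>"""


def render_url_page(urls):
    """Return the page with one <li> link per URL."""
    items = []
    for url in urls:
        # Escape &, <, > and quotes so the URL is safe inside an attribute and as text.
        safe = html.escape(url, quote=True)
        items.append('<li><a href="%s">%s</a></li>' % (safe, safe))
    return TEMPLATE % "\n".join(items)


page = render_url_page(["http://google.com", "http://stackoverflow.com"])
print(page)
```

The returned string can be written to myfile.html with `f.write(page)` in the same way as above.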
I am using Python 3.5 and in some cases, when I call tidylib.tidy_document
on an HTML file, the '/' character at the end of the <link ../> tag in the
header is getting removed. Tidylib does not give any errors or warnings when
it removes this character.
The HTML file I am using is part of an Epub generated with writer2epub. The
error occurs in almost all files in this Epub. The only exceptions are very
short ones (e.g. titlepage of the document). The error is the same in all
affected files.
I suspected a problem with the use of carriage returns (0x0d) instead of
linefeeds (0x0a), but changing them doesn't make a difference. I also see that the file contains various other non-ASCII characters, so maybe they're to blame. Googling for unicode problems with tidylib didn't turn up anything that seems to relate to this problem.
I have uploaded a test file that reproduces the problem with the following code:
import re
from tidylib import tidy_document
def printLink(html):
    """ Print the <link> tag from the HTML header """
    for line in html.split('\n'):
        match = re.search('<link[^>]+>', line)
        if match is not None:
            print(match.group(0))
if __name__ == '__main__':
    fname = 'test04.xhtml'
    print(fname)
    with open(fname, 'r') as fh:
        html = fh.read()
    print('checkpoint 01')
    printLink(html)
    newHtml, errors = tidy_document(html)
    print('checkpoint 02')
    printLink(newHtml)
If the problem is reproduced, the output will be:
<link rel="stylesheet" href="../styles/style001.css" type="text/css" />
at checkpoint 01 and
<link rel="stylesheet" href="../styles/style001.css" type="text/css">
at checkpoint 02.
What is causing tidylib to remove this one '/' character?
I have the following simple HTML file.
<html data-noop=="http://www.w3.org/1999/xhtml">
<head>
<title>Hello World</title>
</head>
<body>
SUMMARY1
hello world
</body>
</html>
I want to read this into a Python script and replace SUMMARY1 with the text "hi there" (say). I do the following in Python:
with open('test.html','r') as htmltemplatefile:
    htmltemplate = htmltemplatefile.read().replace('\n','')
htmltemplate.replace('SUMMARY1','hi there')
print htmltemplate
The above code reads the file into the variable htmltemplate.
Next I call the replace() method of the string object to replace the pattern SUMMARY1 with "hi there". But the output still contains SUMMARY1 instead of "hi there". Here is what I'm getting.
<html data-noop=="http://www.w3.org/1999/xhtml"><head><title>Hello World</title></head><body>SUMMARY1hello world</body></html>
Could someone point out what I'm doing wrong here?
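open() does not return a str, it returns a file object. Additionally, you are only opening it for reading ('r'), not for writing. Note also that str.replace() returns a new string rather than modifying the original, so its result has to be assigned to a variable or written out.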
What you want to do is something like:
new_lines = []
# read all lines first, then reopen the same file for writing
with open('test.html', 'r') as f:
    new_lines = f.readlines()
with open('test.html', 'w') as f:
    f.writelines([x.replace('SUMMARY1', 'hi there') for x in new_lines])
The fileinput library makes this a lot easier.
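To see why the code in the question prints the text unchanged, here is a minimal sketch using an in-memory string instead of test.html (the sample text is made up for illustration). Python strings are immutable, so `replace()` never changes the string it is called on.

```python
template = "<body>\nSUMMARY1\nhello world\n</body>"

# The return value is discarded here, so template still contains SUMMARY1.
template.replace("SUMMARY1", "hi there")
print("SUMMARY1" in template)

# Assigning the result keeps the new string.
filled = template.replace("SUMMARY1", "hi there")
print(filled)
```

In the question's script, writing `htmltemplate = htmltemplate.replace('SUMMARY1', 'hi there')` would be the equivalent change.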
I want to browse for an image and upload it to a folder in my application using Python.
When I click the submit button, the browser shows http://www.domain.com/store_mp3_view and the image is not uploaded.
HTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<form action="/store_mp3_view" method="post" accept-charset="utf-8"
enctype="multipart/form-data">
<label for="mp3">Mp3</label>
<input id="mp3" name="mp3" type="file" value="" />
<input type="submit" value="submit" />
</form>
</body>
</html>
Python file code:
import os
import uuid
from pyramid.response import Response
def store_mp3_view(request):
    # filename contains the name of the file in string format.
    #
    # WARNING: this example does not deal with the fact that IE sends an
    # absolute file path as the filename. This example is naive; it
    # trusts user input.
    filename = request.POST['mp3'].filename
    # ``input_file`` contains the actual file data which needs to be
    # stored somewhere.
    input_file = request.POST['mp3'].file
    # Note that we are generating our own filename instead of trusting
    # the incoming filename since that might result in insecure paths.
    # Please note that in a real application you would not use /tmp,
    # and if you write to an untrusted location you will need to do
    # some extra work to prevent symlink attacks.
    # The directory must be a quoted string; a bare /files is a syntax error.
    file_path = os.path.join('/files', '%s.mp3' % uuid.uuid4())
    # We first write to a temporary file to prevent incomplete files from
    # being used.
    temp_file_path = file_path + '~'
    output_file = open(temp_file_path, 'wb')
    # Finally write the data to a temporary file
    input_file.seek(0)
    while True:
        data = input_file.read(2<<16)
        if not data:
            break
        output_file.write(data)
    # If your data is really critical you may want to force it to disk first
    # using output_file.flush(); os.fsync(output_file.fileno())
    output_file.close()
    # Now that we know the file has been fully saved to disk move it into place.
    os.rename(temp_file_path, file_path)
    return Response('OK')
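The write-to-a-temporary-file-then-rename step in the view can be tried on its own, without Pyramid. The following is a minimal sketch under a few assumptions: `save_upload` is a hypothetical helper, an `io.BytesIO` object stands in for the uploaded file, and a throwaway temporary directory stands in for the real destination folder.

```python
import io
import os
import shutil
import tempfile
import uuid


def save_upload(input_file, dest_dir, suffix=".mp3"):
    """Copy a binary file-like object into dest_dir under a generated name."""
    final_path = os.path.join(dest_dir, uuid.uuid4().hex + suffix)
    temp_path = final_path + "~"
    input_file.seek(0)
    # Write to a temporary name first so a partial file is never visible
    # under the final name.
    with open(temp_path, "wb") as output_file:
        shutil.copyfileobj(input_file, output_file)
    # Move the finished file into place.
    os.replace(temp_path, final_path)
    return final_path


demo_dir = tempfile.mkdtemp()  # stand-in for the real upload folder
saved_path = save_upload(io.BytesIO(b"fake mp3 bytes"), demo_dir)
print(saved_path)
```

If this works but the view still saves nothing, the destination directory (its existence and write permissions) and the route configuration for /store_mp3_view would be the next things to check.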
You can use a FileField on a model. In models.py:
from django.db import models
class Document(models.Model):
    docfile = models.FileField(upload_to='documents/', max_length=5234, blank=True, null=True)
The corresponding forms.py:
from django import forms
class DocumentForm(forms.Form):
    docfile = forms.FileField(label='', show_hidden_initial='none', required=True)
Inside views.py:
# dict.has_key() no longer exists in Python 3; use the in operator instead
if 'your_fileName' in request.FILES:
    newdoc = Document(docfile=request.FILES['your_fileName'])
    newdoc.save()
I got it working with the above code; I hope it helps.
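With this code you can upload multiple files:
from django.http import HttpResponse
def insert_file(request):
    for uploaded in request.FILES.getlist('mp3'):
        # NOTE: uploaded.name comes from the client; sanitize it before real use
        # open in binary mode, since uploads are bytes
        with open(uploaded.name, 'wb') as out_file:
            # chunks() avoids reading a large upload into memory at once
            for chunk in uploaded.chunks():
                out_file.write(chunk)
    return HttpResponse('Inserted Successfully')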
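Since the file name above comes straight from the client, it is worth stripping any directory part before opening a file with it. Below is a rough sketch of one way to do that with the standard library only; `safe_filename` is a hypothetical helper, not a Django API, and it deliberately handles nothing beyond path components.

```python
import posixpath


def safe_filename(raw_name):
    """Return only the final path component of a client-supplied file name."""
    # Some browsers send Windows-style absolute paths, so normalise separators first.
    normalised = raw_name.replace("\\", "/")
    name = posixpath.basename(normalised)
    if name in ("", ".", ".."):
        raise ValueError("unusable file name: %r" % raw_name)
    return name


print(safe_filename("C:\\Users\\me\\Music\\song.mp3"))
print(safe_filename("../../etc/passwd"))
```

Generating your own name (for example with uuid, as the Pyramid view does) avoids the problem entirely and also prevents two uploads with the same name from overwriting each other.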