How to load a YAML file from a URL to process in Python? - python

I have a YAML file stored at a URL. How do I load it into Python for processing?
This is the code I use to read it and then simply print it out to verify. But the output is not in YAML format; it looks like HTML to me.
Code:
import yaml
import urllib
from urllib import request
x = urllib.request.urlopen("https://git.myplace.net/projects/groups%2users.yaml")
User_Object = yaml.load(x)
print(User_Object)
...
The output looks like:
anch:create-branch-action":{"serverCondition":false}});}(_PageDataPlugin));</script><meta name="application-name" content="Bitbucket"><link rel="shortcut icon" type="image/x-icon" href="/s/-1051105741/5ab4b55/261/1.0/_/download/resources/com.atlassian.bitbucket.server.bitbucket-webpack-INTERNAL:favicon/favicon.ico" /><link rel="search" href="https://git.cnvrmedia.net/plugins/servlet/opensearch-descriptor" type="application/opensearchdescription+xml" title="Bitbucket code search"/></head><body class="aui-page-sidebar bitbucket-theme"><ul id="assistive-skip-links" class="assistive"><li>Skip to sidebar navigation</li><li>Skip to content</li></ul><div id="page"><!-- start
The file name is "groups+users.yaml". What is the best way to read the YAML so that Python can parse and process it?
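Two things are worth checking here. First, `%2u` is not a valid percent-escape; a literal `+` in a file name is written `%2B`. Second, the output is a Bitbucket web page, which suggests the URL points at the browser view (or a login page) rather than at the raw file. A minimal sketch, assuming PyYAML is installed and that the server offers a raw-file URL you are allowed to read; the helper names are made up for illustration:

```python
import urllib.parse
import urllib.request


def encode_filename(name):
    """Percent-encode a file name for use as a single URL path segment."""
    return urllib.parse.quote(name, safe="")


def looks_like_html(body):
    """Heuristic: does the payload start like an HTML page rather than YAML?"""
    head = body.lstrip()[:100].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def load_yaml_from_url(url):
    """Fetch url and parse it as YAML, refusing obvious HTML responses."""
    # PyYAML is third-party (assumed installed); it is imported here so the
    # two helpers above work with the standard library alone.
    import yaml

    with urllib.request.urlopen(url) as response:
        body = response.read()
    if looks_like_html(body):
        raise ValueError("server returned an HTML page, not the raw file")
    return yaml.safe_load(body)


print(encode_filename("groups+users.yaml"))
```

`yaml.safe_load` is used instead of `yaml.load` because it only builds plain Python objects, which is the safer choice for a file fetched over the network.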

Related

convert html file to BytesIO then pass as Flask variable

I'm trying to convert an HTML file to BytesIO so that I don't need to write the file to the filesystem and can get it from memory instead. I've read about converting an image to BytesIO, but I can't apply that to an HTML file.
I'm using Flask as my framework.
What I have tried:
buffer = io.BytesIO()
merged_df.to_html(buffer, encoding = 'utf-8', table_uuid = 'seasonality_table')
buffer.seek(0)
html_memory = base64.b64encode(buffer.getvalue())
return render_template('summary_01.html', html_data = html_memory.decode('utf-8'))
Then, in the HTML template where I want to output the HTML file:
<img id="picture" src="data:html;base64, {{ html_data }}">
The error message I got:
TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO
First: using <img> to display HTML is the wrong idea.
The <img> tag is only for images like PNG, JPG, etc.
You can get the HTML directly as a string by calling to_html() without a filename:
html = merged_df.to_html(table_id='seasonality_table')
and send it to the template as HTML:
return render_template('summary_01.html', html_data=html)
In the template you need the safe filter so that it is displayed as HTML instead of being escaped:
{{ html_data | safe }}
BTW:
If you want to offer the data as a file for download, you should use <a> instead of <img>, and the data URI may need application/octet-stream instead of html so that the browser starts downloading it.
html = merged_df.to_html(table_id='seasonality_table')
html = base64.b64encode(html.encode('utf-8')).decode('utf-8')
return render_template('summary_01.html', html_data=html)
<a href="data:application/octet-stream;base64,{{ html_data }}">DOWNLOAD</a>
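The base64 step in the download variant can be checked on its own, without Flask or pandas. A small sketch using only the standard library; `html_to_data_uri` is a made-up helper name and the table string is sample data:

```python
import base64


def html_to_data_uri(html, mime="application/octet-stream"):
    """Encode an HTML string as a base64 data URI usable in an <a href>."""
    payload = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return f"data:{mime};base64,{payload}"


# Sample markup standing in for the output of DataFrame.to_html().
table_html = "<table id='seasonality_table'><tr><td>1</td></tr></table>"
uri = html_to_data_uri(table_html)
print(uri[:60])
```

Decoding the part after the comma gives back the original HTML, which is a quick way to confirm that nothing was lost in the encoding.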
Minimal working example
from flask import Flask, render_template_string
import pandas as pd
import base64
app = Flask(__name__)
@app.route('/')
def index():
    data = {
        'A': [1,2,3],
        'B': [4,5,6],
        'C': [7,8,9]
    }
    df = pd.DataFrame(data)
    html = df.to_html(table_id='seasonality_table')
    html_base64 = base64.b64encode(html.encode()).decode()
    return render_template_string('''<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
{{ html_data | safe }}
<a href="data:application/octet-stream;base64,{{ html_base64 }}">DOWNLOAD</a>
</body>
</html>''', html_data=html, html_base64=html_base64)
if __name__ == '__main__':
    #app.debug = True
    #app.run(threaded=False, processes=3)
    #app.run(use_reloader=False)
    app.run()

CSV file not working properly

I wrote a simple script to read a CSV file in Python 3.0 using pandas:
import pandas as pd
df = pd.read_csv('https://www.goodreads.com/review_porter/export/153331182/goodreads_export.csv', on_bad_lines= 'skip')
print(df)
and instead of the CSV data I ended up with this:
<!DOCTYPE html>
0 <html>
1 <head>
2 <title>Sign Up</title>
3 <meta content='telephone=no' name='format-dete...
4 <link href='https://www.goodreads.com/user/sig...
.. ...
255 }
256 //]]>
257 </script>
258 </html>
259 <!-- This is a random-length HTML comment: xme...
[260 rows x 1 columns]
Can someone help me understand why it is not working in this particular case? I tried another .csv file and it worked just fine. The site I use is https://www.goodreads.com/ and the .csv file is from the export section.
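That's because the link requires you to be authenticated before you can access the CSV file. Since you have not passed any authentication, pandas just reads the sign-up page and displays its HTML.
You can try this:
import requests
# url, username and password are placeholders for your own values
# verify=False turns off TLS certificate checks; drop it unless you really need it
response = requests.get(url, auth=(username, password), verify=False)
Alternatively, download the CSV file manually while logged in and read the local copy; that should work too.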
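To fail fast instead of silently parsing a sign-in page, the response can be checked before it reaches the parser. A minimal sketch using the standard library `csv` module rather than pandas; `rows_from_response` is a made-up helper and the sample data is invented:

```python
import csv
import io


def rows_from_response(content_type, text):
    """Parse CSV text into dicts, refusing anything the server labeled as HTML."""
    # Content-Type may carry parameters, e.g. "text/html; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        raise ValueError("got an HTML page (probably a sign-in page), not CSV")
    return list(csv.DictReader(io.StringIO(text)))


# Invented sample standing in for a real export file.
sample = "Title,Author\nDune,Frank Herbert\n"
print(rows_from_response("text/csv; charset=utf-8", sample))
```

With requests, the two arguments would come from `response.headers.get("Content-Type", "")` and `response.text`.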

Does tidylib damage my HTML file?

I am using python 3.5 and in some cases, when I call tidylib.tidy_document
on an HTML file, the '/' character at the end of the <link ../> tag in the
header is getting removed. Tidylib does not give any errors or warnings when
it removes this character.
The HTML file I am using is part of an Epub generated with writer2epub. The
error occurs in almost all files in this Epub. The only exceptions are very
short ones (e.g. titlepage of the document). The error is the same in all
affected files.
I suspected a problem with the use of carriage returns (0x0d) instead of
linefeeds (0x0a), but changing them doesn't make a difference. I also see that the file contains various other non-ASCII characters, so maybe they're to blame. Googling for unicode problems with tidylib didn't turn up anything that seems to relate to this problem.
I have uploaded a test file that reproduces the problem with the following code:
import re
from tidylib import tidy_document
def printLink(html):
    """ Print the <link> tag from the HTML header """
    for line in html.split('\n'):
        match = re.search('<link[^>]+>', line)
        if match is not None:
            print(match.group(0))
if __name__ == '__main__':
    fname = 'test04.xhtml'
    print(fname)
    with open(fname, 'r') as fh:
        html = fh.read()
    print('checkpoint 01')
    printLink(html)
    newHtml, errors = tidy_document(html)
    print('checkpoint 02')
    printLink(newHtml)
If the problem is reproduced, the output will be:
<link rel="stylesheet" href="../styles/style001.css" type="text/css" />
at checkpoint 01 and
<link rel="stylesheet" href="../styles/style001.css" type="text/css">
at checkpoint 02.
What is causing tidylib to remove this one '/' character?

Loading mako templates from files

I'm new to python and currently trying to use mako templating.
I want to be able to take an html file and add a template to it from another html file.
Let's say I got this index.html file:
<html>
<head>
<title>Hello</title>
</head>
<body>
<p>Hello, ${name}!</p>
</body>
</html>
and this name.html file:
world
(yes, it just has the word world inside).
I want the ${name} in index.html to be replaced with the content of the name.html file.
I've been able to do this without the name.html file, by stating in the render method what name is, using the following code:
@route(':filename')
def static_file(filename):
    mylookup = TemplateLookup(directories=['html'])
    mytemplate = mylookup.get_template('hello/index.html')
    return mytemplate.render(name='world')
This is obviously not useful for larger pieces of text. Now all I want is to simply load the text from name.html, but haven't yet found a way to do this. What should I try?
return mytemplate.render(name=open(<path-to-file>).read())
Thanks for the replies.
The idea is to use the mako framework since it does things like cache and check if the file has been updated...
this code seems to eventually work:
@route(':filename')
def static_file(filename):
    mylookup = TemplateLookup(directories=['.'])
    mytemplate = mylookup.get_template('index.html')
    temp = mylookup.get_template('name.html').render()
    return mytemplate.render(name=temp)
Thanks again.
Did I understand you correctly that all you want is to read the content of a file? If you want to read the complete content, use something like this (Python >= 2.5):
from __future__ import with_statement
with open(my_file_name, 'r') as fp:
    content = fp.read()
Note: The from __future__ line has to be the first line in your .py file (or right after the content encoding specification that can be placed in the first line)
Or the old approach:
fp = open(my_file_name, 'r')
try:
    content = fp.read()
finally:
    fp.close()
If your file contains non-ascii characters, you should also take a look at the codecs page :-)
Then, based on your example, the last section could look like this:
from __future__ import with_statement
@route(':filename')
def static_file(filename):
    mylookup = TemplateLookup(directories=['html'])
    mytemplate = mylookup.get_template('hello/index.html')
    content = ''
    with open('name.html', 'r') as fp:
        content = fp.read()
    return mytemplate.render(name=content)
You can find more details about the file object in the official documentation :-)
There is also a shortcut version:
content = open('name.html').read()
But I personally prefer the long version with the explicit closing :-)
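On Python 3 the codecs module is usually unnecessary for this, because open() itself takes an encoding argument. A small sketch, assuming the file is UTF-8 (an assumption; use whatever encoding the file really has); `read_text_file` is a made-up helper name and the demo file is created only for illustration:

```python
import tempfile
from pathlib import Path


def read_text_file(path, encoding="utf-8"):
    """Return the whole file as str, decoding with an explicit encoding."""
    with open(path, "r", encoding=encoding) as fp:
        return fp.read()


# Demo with a throwaway file; the name mirrors the question's name.html.
with tempfile.TemporaryDirectory() as tmp:
    name_file = Path(tmp) / "name.html"
    name_file.write_text("wörld", encoding="utf-8")
    content = read_text_file(name_file)

print(content)
```

The returned string can then be passed to the template in the same way, for example mytemplate.render(name=content).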

Edit and create HTML file using Python

I am currently working on an assignment for creating an HTML file using python. I understand how to read an HTML file into python and then edit and save it.
table_file = open('abhi.html', 'w')
table_file.write('<!DOCTYPE html><html><body>')
table_file.close()
The problem with the above piece is that it just replaces the whole HTML file with the string passed to write(). How can I edit the file and at the same time keep its content intact? I mean, writing something like this, but inside the body tags:
<link rel="icon" type="image/png" href="img/tor.png">
I need the link to automatically go in between the opening and closing body tags.
You probably want to read up on BeautifulSoup:
import bs4
# load the file
with open("existing_file.html") as inf:
    txt = inf.read()
    # name the parser explicitly so the result does not depend on what is installed
    soup = bs4.BeautifulSoup(txt, "html.parser")
# create new link
new_link = soup.new_tag("link", rel="icon", type="image/png", href="img/tor.png")
# insert it into the document
soup.head.append(new_link)
# save the file again
with open("existing_file.html", "w") as outf:
    outf.write(str(soup))
Given a file like
<html>
<head>
<title>Test</title>
</head>
<body>
<p>What's up, Doc?</p>
</body>
</html>
this produces
<html>
<head>
<title>Test</title>
<link href="img/tor.png" rel="icon" type="image/png"/></head>
<body>
<p>What's up, Doc?</p>
</body>
</html>
(note: it has munched the whitespace, but gotten the html structure correct).
