Pre-Python 3.4 HTML entity unescaping

NOTE: This lets you do Python 3.4-style HTML5 entity conversion on pre-3.4 Python versions!
I'm writing a parser and renderer for the CommonMark spec of Markdown, and I'm trying to figure out the best way to unescape HTML entities across various Python versions.
For Python 3 I use html.parser.HTMLParser().unescape and for Python 2 I use HTMLParser.HTMLParser().unescape. They are essentially the same function, except that later Python versions have updated entity definition tables. Because of this, this string works fine in Python 3.4:
#␣Ӓ␣Ϡ␣� => #␣Ӓ␣Ϡ␣�
but gives this on Python 3.3 and Python 2:
#␣Ӓ␣Ϡ␣� => #␣Ӓ␣Ϡ␣�
This also happens with various other HTML entities, e.g.
Ď␣ℋ␣ⅆ␣∲
Does anyone know of a way of doing this that is either cross-compatible or works on pre-3.4 versions, without requiring third-party modules?
I'm trying to avoid basically copying the entity table from Python 3.4 and storing it in a file somewhere :<

You can use the Python library html5charref:
$ pip install git+https://github.com/bpabel/html5charref.git
Then use it as follows:
import html5charref

html = u'This has &copy; and &lt; and &#x000a9; symbols'
print(repr(html5charref.unescape(html)))
# u'This has \xa9 and < and \xa9 symbols'  (Python 2 output)
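If installing html5charref is not an option, numeric character references at least can be handled without any third-party module. The sketch below is an assumption-laden illustration, not part of html5charref: the names unescape_numeric and _decode_numeric are made up for this example. It applies the HTML5 rule that NUL, surrogates and code points above U+10FFFF become U+FFFD, leaves named references such as &copy; untouched, and omits the other special cases in the HTML5 spec (such as the Windows-1252 remapping of references 128 to 159). On narrow Python 2 builds, unichr only reaches U+FFFF, so astral code points would still raise ValueError there.

```python
import re

try:
    unichr  # Python 2
except NameError:
    unichr = chr  # Python 3: chr accepts any code point up to U+10FFFF

# Decimal (&#35;) or hexadecimal (&#xA9; / &#XA9;) character references.
_NUMERIC_REF = re.compile(r'&#([xX][0-9a-fA-F]+|[0-9]+);')


def _decode_numeric(match):
    digits = match.group(1)
    if digits[0] in 'xX':
        codepoint = int(digits[1:], 16)
    else:
        codepoint = int(digits)
    # HTML5 replaces NUL, surrogates and values beyond U+10FFFF with U+FFFD.
    if codepoint == 0 or codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        return u'\ufffd'
    return unichr(codepoint)


def unescape_numeric(text):
    """Replace numeric character references (partial HTML5 rules, sketch only)."""
    return _NUMERIC_REF.sub(_decode_numeric, text)
```

For named references, Python 3.3 already ships the HTML5 table as html.entities.html5 (new in 3.3), whose keys include the trailing semicolon, e.g. html5['copy;'] == '©'; Python 2 only has the HTML 4 table in htmlentitydefs.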


How to translate text in Python

I have an application with text in it, built with the tkinter library (I don't know if that matters). I need a script, or something like what games have: for example, I have the English version of Counter-Strike and can run an installer that translates the game into Polish (see the images below). I don't want to translate all the text manually to make a separate Polish version of the application.
I want to turn this:
if messagebox.askokcancel("Warning","Are you sure you want to clear workspace? \
You will lose all files selected"):
Into this:
if messagebox.askokcancel("Ostrzeżenie","Jesteś pewien, że chcesz wyczyścić przestrzeń \
roboczą? Utracisz wszystkie wybrane pliki"):
[Images: the English version; the Polish version (what the script should produce), translated manually to show the problem; part of the code]
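A common pattern for this, sketched here as an assumption rather than as the asker's code, is to route every user-visible string through a lookup function (conventionally named _), so that a "language pack" is just a table of translations. The TRANSLATIONS dictionary and the make_translator helper are hypothetical; for real projects the standard library's gettext module implements the same idea with .po/.mo catalogs.

```python
# Hypothetical language pack: English source strings mapped to translations.
# In a real application each language would live in its own file.
TRANSLATIONS = {
    "pl": {
        "Warning": "Ostrzeżenie",
        "Are you sure you want to clear workspace? You will lose all files selected":
            "Jesteś pewien, że chcesz wyczyścić przestrzeń roboczą? "
            "Utracisz wszystkie wybrane pliki",
    },
}


def make_translator(language):
    """Return a function that translates English UI strings into `language`.

    Strings missing from the table (or unknown languages) fall back to English.
    """
    table = TRANSLATIONS.get(language, {})

    def translate(text):
        return table.get(text, text)

    return translate
```

With _ = make_translator("pl"), the call from the question becomes messagebox.askokcancel(_("Warning"), _("Are you sure you want to clear workspace? You will lose all files selected")), and switching languages only means changing the argument to make_translator.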

PEP508: why either version requirement or URL but not both?

When configuring install_requires=[...] in a setup.py file, we can specify either version numbers:
package >= 1.2.3
or a source:
package @ git+https://git.example.com/some/path/to/package@master#egg=package
But I did not manage to specify both; I got an error for everything I tried.
Looking at PEP 508, it seems this is intended:
specification = wsp* ( url_req | name_req ) wsp*
where wsp* just means optional whitespace.
Am I right that it is not possible to write something like this?
package >= 1.2.3 @ git+https://...
What is the reason for this decision?
I believe this is because installing a Python package from a URL/GitHub gives you no way to get historical builds/packages, as you can for packages stored on PyPI.
A GitHub URL references a single snapshot of the code. You could roughly simulate requesting specific versions if you have tags or release branches on GitHub, by updating the URL to reference them:
git+https://git.example.com/some/path/to/package@master#egg=package
git+https://git.example.com/some/path/to/package@develop#egg=package
git+https://git.example.com/some/path/to/package@1.4.2#egg=package
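To make the grammar rule quoted in the question concrete, here is a deliberately simplified toy checker. The classify function and its regular expressions are illustrative only: they ignore extras, environment markers and parenthesised version specs, and they are not how pip or the packaging library parse requirements. Because a specification must be either a url_req or a name_req, a string carrying both a version clause and an @ URL matches neither alternative.

```python
import re

# Toy subset of PEP 508 (no extras, no environment markers):
#   specification = wsp* ( url_req | name_req ) wsp*
#   url_req       = name wsp* '@' wsp* URI
#   name_req      = name wsp* versionspec (versionspec may be empty)
NAME = r'[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?'
CLAUSE = r'(?:===|==|!=|<=|>=|~=|<|>)\s*[A-Za-z0-9.*+!_-]+'
VERSIONSPEC = r'(?:%s(?:\s*,\s*%s)*)?' % (CLAUSE, CLAUSE)

URL_REQ = re.compile(r'\s*(%s)\s*@\s*(\S+)\s*$' % NAME)
NAME_REQ = re.compile(r'\s*(%s)\s*(%s)\s*$' % (NAME, VERSIONSPEC))


def classify(requirement):
    """Return 'url_req', 'name_req', or None if neither alternative matches."""
    if URL_REQ.match(requirement):
        return 'url_req'
    if NAME_REQ.match(requirement):
        return 'name_req'
    return None
```

For example, classify('package >= 1.2.3 @ git+https://git.example.com/some/path/to/package') returns None, mirroring the errors described in the question.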

Why does elasticsearch_dsl raise elasticsearch_dsl.exceptions.ValidationException: You cannot write to a wildcard index. on PyPy but not CPython?

I have some code that saves data to Elasticsearch. It runs fine on Python 3.5.2 (CPython), but raises an exception when running on PyPy3 6.0.0 (Python 3.5.3). Any ideas why?
File "/opt/venvs/parsedmarc/site-packages/parsedmarc/elastic.py", line 366, in save_forensic_report_to_elasticsearch
forensic_doc.save()
File "/opt/venvs/parsedmarc/site-packages/elasticsearch_dsl/document.py", line 394, in save
index=self._get_index(index),
File "/opt/venvs/parsedmarc/site-packages/elasticsearch_dsl/document.py", line 138, in _get_index
raise ValidationException('You cannot write to a wildcard index.')
elasticsearch_dsl.exceptions.ValidationException: You cannot write to a wildcard index
I tried several combinations: swapping DocType for Document, and adding or removing class Index or class Meta. In every combination, the name of the index was left empty; debugging confirmed that, and after a bit of tweaking I got it to work.
This bit of code worked for me:
class Index:
    # index = 'sample_index'
    name = 'sample_index'
Note that I haven't tried to use this with name only. Also, it worked with DocType, but it should work with new Document class as well.
See Sean's question on GitHub; the solution above is confirmed in that thread.
To keep everything in one place, the previously mentioned link: Document replaces DocType in newer versions of elasticsearch-dsl.
We ran into similar issues; it seems elasticsearch_dsl renamed DocType to Document in version 6.2, which breaks backwards compatibility.
https://github.com/elastic/elasticsearch-dsl-py/blob/master/Changelog.rst
Either pin your version to 6.1 or update to the new Document class.

BibTeX to HTML with pybtex, Python 3

I want to take a file of one or more BibTeX entries and output it as an HTML-formatted string. The specific style is not so important, but let's just say APA. Basically, I want the functionality of bibtex2html but with a Python API, since I'm working in Django. A few people have asked similar questions here and here. I also found someone who provided a possible solution here.
The first issue is pretty basic: I can't even get the above solutions to run. I keep getting errors like ModuleNotFoundError: No module named 'pybtex.database'; 'pybtex' is not a package. I definitely have pybtex installed and can make basic API calls in the shell without a problem, but whenever I try to import pybtex.database.whatever or pybtex.plugin, I get ModuleNotFoundError. Is it maybe a Python 2 vs. Python 3 thing? I'm using the latter.
The second issue is that I'm having trouble understanding the pybtex Python API documentation. From what I can tell, the format_from_string and format_from_file calls are designed for exactly what I want to do, but I can't get the syntax right. Specifically, when I do
pybtex.format_from_file('foo.bib',style='html')
I get pybtex.plugin.PluginNotFound: plugin pybtex.style.formatting.html not found. I think I'm just not understanding how the call is supposed to work, and I can't find any examples of how to do it properly.
Here's a function I wrote for a similar use case: incorporating bibliographies into a website generated by Pelican.
from pybtex.plugin import find_plugin
from pybtex.database import parse_string

# Look up the APA formatting style (from pybtex-apa-style) and the HTML backend.
APA = find_plugin('pybtex.style.formatting', 'apa')()
HTML = find_plugin('pybtex.backends', 'html')()


def bib2html(bibliography, exclude_fields=None):
    """Render a pybtex BibliographyData object as an HTML string."""
    exclude_fields = exclude_fields or []
    if exclude_fields:
        # Round-trip through BibTeX to get a copy, so the caller's data is not modified.
        bibliography = parse_string(bibliography.to_string('bibtex'), 'bibtex')
        for entry in bibliography.entries.values():
            for ef in exclude_fields:
                if ef in entry.fields.__dict__['_dict']:
                    del entry.fields.__dict__['_dict'][ef]
    formattedBib = APA.format_bibliography(bibliography)
    return "<br>".join(entry.text.render(HTML) for entry in formattedBib)
Make sure you've installed the following:
pybtex==0.22.2
pybtex-apa-style==1.3
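On the first issue in the question: the "'pybtex' is not a package" part of the error usually means the name pybtex resolved to a plain module instead of the installed package, for example a script of your own called pybtex.py in the working directory. That cause is an assumption, since the question does not show the file layout. The hypothetical helper below uses importlib.util.find_spec to report where a name would be imported from and whether it is a package.

```python
import importlib.util


def describe_module(name):
    """Return (origin, is_package) for a top-level module name, or None.

    origin is the file Python would import `name` from; is_package is False
    for a single-file module such as a stray pybtex.py that shadows the
    installed pybtex package.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing importable under that name
    # Only packages carry submodule search locations.
    is_package = spec.submodule_search_locations is not None
    return spec.origin, is_package
```

Running print(describe_module('pybtex')) from the same directory as the failing script shows whether the origin is the site-packages copy or a local file that needs renaming.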

Convert Javascript array to python list?

I want to write a Google Translate script.
I am making a request to translate.google.com, and Google returns an array, but the array contains undefined items. Note that the response arrives as a string.
I could use a regex or similar to fix places where there is more than one comma in a row, but I am looking for the best solution :)
How can I convert this JavaScript array to a Python list?
["a","b",,,"e"]
My script: http://ideone.com/jhjZe
JavaScript part - encoding
In JavaScript you do:
var arr = ["a","b",,,"e"];
var json_string = JSON.stringify(arr);
Then you somehow pass json_string (now the string '["a","b",null,null,"e"]') from JavaScript to Python.
Python part - decoding
Then, on Python side do:
json_string = '["a","b",null,null,"e"]'  # passed from JavaScript
try:
    import simplejson as json
except ImportError:
    import json
result = json.loads(json_string)
As a result you get [u'a', u'b', None, None, u'e'] in Python 2 (['a', 'b', None, None, 'e'] in Python 3).
More links
See below:
a demo of the JavaScript part,
documentation on JSON.stringify() at Mozilla Developer Network,
a demo of the Python part.
Dependencies
The solution above requires:
JSON.stringify() in JavaScript, which is available in all mobile browsers, Chrome, Firefox, Opera, Safari, and IE since version 8.0 (a more detailed list of compatible browsers is here),
the json Python module, which comes with the standard library (the code above uses simplejson if it is available, but it is not required).
So, in short, there are no external dependencies.
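The answer assumes you control the JavaScript side. If, as in the question, the array arrives as raw text from translate.google.com, one option is to turn each hole into null before calling json.loads. The js_array_to_list function below is a hypothetical sketch: it skips commas inside double-quoted strings, drops one trailing comma before ] (as JavaScript does), and assumes everything else is already valid JSON, so single-quoted strings, undefined, or comments would still fail.

```python
import json


def js_array_to_list(text):
    """Parse a JavaScript array literal that may contain holes, e.g. '["a",,"b"]'.

    Each hole becomes JSON null (Python None); the rest must already be JSON.
    """
    out = []
    in_string = False
    escaped = False
    prev = ''  # last non-whitespace character seen outside string literals
    last_comma = -1  # index in `out` of the most recent comma
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch.isspace():
            out.append(ch)
            continue
        if ch == ',' and prev in ('[', ','):
            out.append('null')  # a hole: "[," or ",,"
        elif ch == ']' and prev == ',':
            del out[last_comma]  # JavaScript ignores a single trailing comma
        if ch == ',':
            last_comma = len(out)
        elif ch == '"':
            in_string = True
        out.append(ch)
        prev = ch
    return json.loads(''.join(out))
```

For example, js_array_to_list('["a","b",,,"e"]') gives ['a', 'b', None, None, 'e'], the same list the JSON.stringify route produces. Walking the characters instead of using a single regex keeps commas inside strings such as "a,,b" intact.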
