python-pptx unwanted chart title - python

When creating a new PowerPoint slide with a line chart on it, I keep getting a chart title even though I didn't ask for one. I have tried all sorts of ways to get rid of it, using combinations like chart.has_title = False and chart.has_text_frame = False, and nothing seems to work.
I looked at the diff between the XML from when the chart was working well and now, when it displays this unwanted title. Among other things, there was this <c:autoTitleDeleted val="0"/> property. In the python-pptx chart.xmlwriter source code itself I changed the value to 1 and the chart title disappeared, so I assume this is the root cause of the unwanted title; I have no idea why the autoTitleDeleted element is now being added to the XML by python-pptx.
I also saw this issue https://github.com/scanny/python-pptx/issues/460, but when I try to implement the fix I get the following error:
autoTitleDeleted = chart_element.get_or_add_autoTitleDeleted()
AttributeError: 'CT_Chart' object has no attribute 'get_or_add_autoTitleDeleted'
And I can't find a get_or_add_autoTitleDeleted method anywhere in the docs or in the source code.
I also tried changing the xml manually by simply doing this:
chart._element.xml = xml.replace('autoTitleDeleted val="0', 'autoTitleDeleted val="1')
But I get an AttributeError: can't set attribute
So I have 3 questions:
1) How can I resolve this?
2) For the future, when I find the XML causing an issue, how can I change it manually? Is there a library somewhere for XML manipulation?
3) Why is this autoTitleDeleted element being added in the first place?

I would check your python-pptx version. This attribute was added quite recently, for exactly this reason. I recommend upgrading to the latest version, 0.6.18, and seeing what happens. The fact that the traceback reports not finding that attribute is evidence of an earlier version. You can see the code that provides that attribute here:
https://github.com/scanny/python-pptx/blob/master/pptx/oxml/chart/chart.py#L40
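For reference, here is a minimal sketch of the fix from issue #460 on a version that has the attribute (the chart-creation boilerplate and the _chartSpace access are assumptions based on the linked source; verify against your installed version):

from pptx import Presentation
from pptx.util import Inches
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE

prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[6])  # blank layout

chart_data = CategoryChartData()
chart_data.categories = ['A', 'B', 'C']
chart_data.add_series('Series 1', (1.0, 2.0, 3.0))

frame = slide.shapes.add_chart(
    XL_CHART_TYPE.LINE, Inches(1), Inches(1), Inches(8), Inches(5), chart_data)
chart = frame.chart

# Reach the <c:chart> element and force <c:autoTitleDeleted val="1"/>,
# which tells PowerPoint not to auto-generate a title.
chart_element = chart._chartSpace.xpath('c:chart')[0]
autoTitleDeleted = chart_element.get_or_add_autoTitleDeleted()
autoTitleDeleted.set('val', '1')

prs.save('no_title.pptx')

As for question 2: chart._element.xml is a read-only property, which is why the assignment raises AttributeError. The element itself is an lxml object, though, so rather than replacing the serialized XML you can navigate to the element you care about and mutate it in place with lxml calls such as xpath() and set(), as above.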

bibtex to html with pybtex, python 3

I want to take a file of one or more BibTeX entries and output it as an HTML-formatted string. The specific style is not so important, but let's just say APA. Basically, I want the functionality of bibtex2html but with a Python API, since I'm working in Django. A few people have asked similar questions here and here. I also found someone who provided a possible solution here.
The first issue I'm having is pretty basic, which is that I can't even get the above solutions to run. I keep getting errors similar to ModuleNotFoundError: No module named 'pybtex.database'; 'pybtex' is not a package. I definitely have pybtex installed and can make basic API calls in the shell no problem, but whenever I try to import pybtex.database.whatever or pybtex.plugin I keep getting ModuleNotFound errors. Is it maybe a Python 2 vs. Python 3 thing? I'm using the latter.
The second issue is that I'm having trouble understanding the pybtex Python API documentation. From what I can tell, the format_from_string and format_from_file calls are designed for exactly what I want to do, but I can't seem to get the syntax right. When I do
pybtex.format_from_file('foo.bib', style='html')
I get pybtex.plugin.PluginNotFound: plugin pybtex.style.formatting.html not found. I think I'm just not understanding how the call is supposed to work, and I can't find any examples of how to do it properly.
Here's a function I wrote for a similar use case: incorporating bibliographies into a website generated by Pelican.
from pybtex.plugin import find_plugin
from pybtex.database import parse_string

APA = find_plugin('pybtex.style.formatting', 'apa')()
HTML = find_plugin('pybtex.backends', 'html')()

def bib2html(bibliography, exclude_fields=None):
    exclude_fields = exclude_fields or []
    if exclude_fields:
        bibliography = parse_string(bibliography.to_string('bibtex'), 'bibtex')
        for entry in bibliography.entries.values():
            for ef in exclude_fields:
                if ef in entry.fields.__dict__['_dict']:
                    del entry.fields.__dict__['_dict'][ef]
    formattedBib = APA.format_bibliography(bibliography)
    return "<br>".join(entry.text.render(HTML) for entry in formattedBib)
Make sure you've installed the following:
pybtex==0.22.2
pybtex-apa-style==1.3
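A hedged usage sketch (foo.bib is taken from the question; any BibTeX file works):

from pybtex.database import parse_file

bib = parse_file('foo.bib')
print(bib2html(bib, exclude_fields=['url']))

Also, regarding the PluginNotFound error above: 'html' is an output backend, not a formatting style, so if you prefer the one-call API, something like pybtex.format_from_file('foo.bib', style='plain', output_backend='html') may be closer to what was intended (hedged; check the signature in the pybtex API docs).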

Feedparser returns only first entry of ATOM feed

I updated my (already) working code from Python 2.7 to Python 3.5 and the following problem suddenly appeared.
When parsing the given ATOM feed with many entries (correct syntax), feedparser 5.2.1 returns only the first entry of the feed, plus of course the "meta" data of the feed.
My (unmodified) code:

import feedparser

feed_data = feedparser.parse("www.myfeed.com/myfeeds.atom")
for entry in feed_data.entries:
    print(entry)

Output:
{'uid': '99999', 'author': 'XY', ...more content of the first entry...}
{}
The next (second) entry is empty... and the other entries are not even listed... The length of feed_data.entries is 2 (it should be 78).
UPDATE
Now (today) I get 3 entries as output, because one new entry was appended at the beginning of the entry list, so I guess it is an "encoding" problem with the specific third entry in the current feed.
Any ideas how to fix the problem?
Okay guys,
Python 3.5 is not supported yet. But support for this Python version has been prepared in the develop branch of the GitHub project (see here).
It works with this development version of feedparser, so I'll use that and might wait (nothing has happened in a year) until the official release of this "feature".
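If you want to try that development version, installing straight from the develop branch might look like this (the repository path is an assumption based on feedparser's current GitHub home):

pip install git+https://github.com/kurtmckee/feedparser.git@develop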

Get output from SyntaxNet as python object, not text

After running one of the example SyntaxNet scripts (like parse.sh), I receive output in CoNLL text format. My goal is to take some features and feed them into the next network. One option is to parse the text output into a Python object with something like nltk.corpus.reader.ConllCorpusReader. But what interests me is this:
Is it possible, with some code modification, to get a Python object representing the parsed results from SyntaxNet, rather than text?
I've found that in parser_eval.py, on lines 133-138, SyntaxNet already fetches a text version of the results.
while True:
    tf_eval_epochs, tf_eval_metrics, tf_documents = sess.run([
        parser.evaluation['epochs'],
        parser.evaluation['eval_metrics'],
        parser.evaluation['documents'],
    ])
But I cannot locate from which object this text is generated, or how.
There are many ways to do it, and as far as I know they all involve parsing the output of SyntaxNet and loading it into NLTK objects. I wrote a simple post on my blog exemplifying this:
http://www.davidsbatista.net/blog/2017/03/25/syntaxnet/
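A minimal sketch of that approach, assuming the SyntaxNet CoNLL output has been saved to parsed.conll (the file name is made up, and the column layout should be verified against what nltk expects):

from nltk.parse import DependencyGraph

with open('parsed.conll') as f:
    conll = f.read()

# Sentences in CoNLL output are separated by blank lines.
graphs = [DependencyGraph(block) for block in conll.strip().split('\n\n')]

for g in graphs:
    # Node 0 is the artificial root; real tokens carry word/tag/head/rel.
    for node in g.nodes.values():
        if node['word'] is not None:
            print(node['word'], node['tag'], node['head'], node['rel'])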

python lxml xpath AttributeError (NoneType) with correct xpath and usually working

I am trying to migrate a forum to phpBB3 with Python/XPath. Although I am pretty new to Python and XPath, it is going well. However, I need help with an error.
(The source file has been downloaded and processed with tagsoup.)
Firefox/Firebug shows the XPath: /html/body/table[5]/tbody/tr[position()>1]/td/a[3]/b
(in my script, without tbody)
Here is an abbreviated version of my code:

from lxml import etree

forumfile = "morethread-alte-korken-fruchtweinkeller-89069-6046822-0.html"
XPOSTS = "/html/body/table[5]/tr[position()>1]"
t = etree.parse(forumfile)
allposts = t.xpath(XPOSTS)

XUSER = "td[1]/a[3]/b"
XREG = "td/span"
XTIME = "td[2]/table/tr/td[1]/span"
XTEXT = "td[2]/p"
XSIG = "td[2]/i"
XAVAT = "td/img[last()]"
XPOSTITEL = "/html/body/table[3]/tr/td/table/tr/td/div/h3"
XSUBF = "/html/body/table[3]/tr/td/table/tr/td/div/strong[position()=1]"

for p in allposts:
    unreg = 0
    username = None
    username = p.find(XUSER).text  # this is where it goes haywire
When the loop hits user "tompson" / position()=11 at the end of the file, I get
AttributeError: 'NoneType' object has no attribute 'text'
I've tried a lot of try/except/else/finally blocks, but they weren't helpful.
I am getting much more information later in the script, such as the date of the post, the date of user registration, the URL and attributes of the avatar, and the content of the post...
The script works for hundreds of other files/pages of this forum.
This is not an encode/decode problem. And it is not "limited" to the XUSER part. I tried "hardcoding" the username; then the date of registration fails. If I skip those, the text of the post (code below) fails...
# text of getpost
text = etree.tostring(p.find(XTEXT), pretty_print=True)
Now, this whole error would make sense if my XPath were wrong. However, all the other files, and the first users in this file, work; it is only this one at position()=11.
Is position() incapable of going beyond 10? I don't think so.
Am I missing something?
Question answered!
I have found the answer...
I must have been very tired when I tried to fix it and came here to ask for help. I did not see something quite obvious...
The way I posted my problem, it was not visible either.
The HTML I downloaded and processed with tagsoup had an additional tag at position 11... this was not visible on the website and broke my XPath.
(It is probably crappy HTML generated by the forum, in combination with tagsoup's attempt to make it parseable.)
Out of more than 20,000 files, fewer than 20 are afflicted; this one just happened to be the first...
Additionally, sometimes the information is in table[4] and other times in table[5]. I did account for this and wrote a function to determine the correct table. Although I tested the function a LOT and thought it was working correctly (hence I did not include it above), it was not.
So I made a better xpath:
'/html/body/table[tr/td[@width="20%"]]/tr[position()>1]'
And, although this is not related, I ran into another problem with unexpected encoding in the HTML file (not utf-8), which was fixed by adding:
parser = etree.XMLParser(encoding='ISO-8859-15')
t = etree.parse(forumfile, parser)
I am now confident that, after adjusting for the strange additional and duplicated tags, my code will work on all files...
Still, I will be looking into lxml.html. As I mentioned in the comment, I have never used it before, but if it is more robust and allows using the files without tagsoup, it might be a better fit and save me extensive try/except statements and loops to fix the few files breaking my current script...
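For reference, a hedged sketch of what the lxml.html route might look like (untested against these forum files; names reused from above):

from lxml import html

t = html.parse(forumfile)  # lxml.html tolerates broken markup, so tagsoup may be unnecessary
XPOSTS = '/html/body/table[tr/td[@width="20%"]]/tr[position()>1]'

for p in t.xpath(XPOSTS):
    user_el = p.find('td[1]/a[3]/b')
    # guard against missing nodes instead of letting .text raise AttributeError
    username = user_el.text if user_el is not None else None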

Does anyone know the attributes of a Google Analytics DataPoint object?

I'm currently trying to pull GA data using Python; I've gotten as far as retrieving a list of DataPoint objects, and I can see inside them using .list, but I can't access their values directly.
For example, I've got this
>>> print(data.list)
[[[u'Android Browser'], [80]], [[u'Chrome'], [127]], [[u'Firefox'], [78]], [[u'Internet Explorer'], [564]], [[u'Mozilla'], [2]], [[u'Mozilla Compatible Agent'], [7]], [[u'Opera'], [2]], [[u'Safari'], [175]]]
But when I try to do this
data[0]
I get this
<googleanalytics.data.DataPoint object at 0x00D06DB0>
which is just a black box to me; I can't get inside it to split up the content for actual use.
I got one lucky guess: the first of the pair of attributes is called 'title'.
data[0].title gives me this:
'ga:browser=Android Browser'
which I can use. I just need the name of the second attribute. Does anybody know it?
Thanks a lot!
There's a page in the documentation explaining each field:
http://code.google.com/apis/analytics/docs/gdata/gdataReferenceDataFeed.html#dataResponse
I figured it out: I was able to crack open the object using the inspect module, and that told me that the attributes were accessible using the same names I used in the query. Convenient language, this Python.
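A hedged sketch of that inspection trick (attribute names other than title are hypothetical; they mirror whatever metrics the query asked for):

import inspect

point = data[0]

# Dump every public attribute/value pair the object carries.
for name, value in inspect.getmembers(point):
    if not name.startswith('_'):
        print(name, '=', value)

print(point.title)  # e.g. 'ga:browser=Android Browser'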
