Pweave - putting LaTeX output within python functions - python

I'm putting together a fairly complex python script with functions that may or may not be called depending on the data that's being analysed.
In pure python, all works well. As soon as I break out of the code block to create a LaTeX section for the results, I get undefined variable errors. Stripping this back to its most simple case:
<<echo=False,complete=False>>=
def getValues(title, start, end):
#
\section{<%= title %>
... more LaTeX code...
<<echo=False,complete=False>>=
return
#
Stripping out the Pweave code block tags and the LaTeX markup, this works correctly. As soon as I add the markup, the \section line reports that title is undefined when I attempt to pweave the file.
My understanding from the documentation was that the complete=False would combine the code blocks although I get the same error with or without this.
Since I want the output documentation to be dependent on the functions called, how can I achieve this?
I'd be very grateful if anyone can point me to a missed example, but I've been unable to find a way of doing what I need.

This does not work because `complete=False` does not apply to inline blocks, so `title` is undefined when your code runs. You could generate your LaTeX output inside Python chunks using the results="tex" chunk option.
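A minimal sketch of that suggestion, assuming the code sits in a chunk declared with results="tex" so that whatever the chunk prints is passed through as LaTeX. The function builds the section text itself and prints it, so title is an ordinary Python variable and never has to be visible to an inline block. The helper name section_tex and the sample arguments are hypothetical.

```python
def section_tex(title, body_lines):
    """Build the LaTeX for one section as a plain string."""
    lines = [r"\section{%s}" % title]
    lines.extend(body_lines)
    return "\n".join(lines)


def getValues(title, start, end):
    # Emit the LaTeX from inside the function instead of leaving the chunk.
    # In a Pweave chunk with results="tex", this printed text is assumed
    # to end up in the woven document as LaTeX markup.
    print(section_tex(title, ["Values from %s to %s." % (start, end)]))


getValues("Results", 1, 10)
```

Because the section is only printed when getValues is actually called, the output document depends on which functions run, which is what the question asks for.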

Related

Renaming files with different Names depending upon different match cases in Python

I'm a newbie at Python. I'm trying to write Python code to rename multiple files with different names depending on different match cases. Here's my code:
for i, file in enumerate(os.listdir(inputpath)):
    if(match(".*"716262_2.*$"),file):
        dstgco="Test1"+"DateHere"+".xls"
        gnupgCommandOp=gnupgCommand(os.rename(os.path.join(inputpath,file),os.path.join(inputpath,dstgco)))
        returnCode = call(gnupgCommandop)
    if(match(".*"270811_2.*$"),file):
        dstgmo="Test2"+"DateHere"+".xls"
        gnupgCommandOp=gnupgCommand(os.rename(os.path.join(inputpath,file),os.path.join(inputpath,dstgmo)))
        returnCode = call(gnupgCommandop)
Currently only one file gets renamed (to Test2 DateHere), and then I get a "str object is not callable" error. My requirement is to rename the files present at the location depending on the different match cases. Am I writing the for loop or the if statements incorrectly?
Things I have tried:
used incremental count
used glob
used only os.listdir and not enumerate
It seems like it is matching the first statement and breaking on the next retrieval; maybe I wrote the if statements wrong.
I can't debug this since I'm calling this code from an internal tool using a bat file.
Can someone please help me out with this? I know only a single gnupgCommandOp should be used. Is my syntax wrong? What would be a better way to achieve this?
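No answer is recorded for this question. As a rough sketch only, one way the loop could be restructured is to keep the patterns in a table and compute the target name in a small function. This assumes the intended patterns are .*716262_2.* and .*270811_2.* (the quoting in the question is broken, so that is a guess) and leaves the gnupgCommand step out entirely. The names RENAME_RULES, target_name and rename_matching are hypothetical. Note that os.rename returns None, so passing its result on to another function hands that function None.

```python
import os
import re

# Hypothetical rule table: (compiled pattern, prefix of the new file name).
RENAME_RULES = [
    (re.compile(r".*716262_2.*"), "Test1"),
    (re.compile(r".*270811_2.*"), "Test2"),
]


def target_name(filename, date_text="DateHere"):
    # Return the new name for the first rule that matches, or None.
    for pattern, prefix in RENAME_RULES:
        if pattern.match(filename):
            return prefix + date_text + ".xls"
    return None


def rename_matching(inputpath):
    # Rename every file in inputpath that matches a rule.
    # Two files matching the same rule would collide on the same target name.
    renamed = []
    for filename in os.listdir(inputpath):
        new_name = target_name(filename)
        if new_name is None:
            continue
        os.rename(os.path.join(inputpath, filename),
                  os.path.join(inputpath, new_name))
        renamed.append((filename, new_name))
    return renamed
```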

Use list gotten by a function into other function

So, I have a function ABC in a certain .py file that returns a list. In ANOTHER .py file, I have another function where I'm supposed to write into an empty file (that function will return that new file). I want to write the list returned by ABC into that new empty file. How am I supposed to do that?
Obs: sorry for not posting any code, but I really have no idea how to do this, and I've found nothing in other similar questions.
https://www.learnpython.org/en/Functions has a very good section on how to use functions and the following sections on classes, objects and modules are worth reviewing.
Write a file:
Python 2.7 : Write to file instantly
Passing args vs list:
Advantages of using *args in python instead of passing a list as a parameter
That gives you a starting point for learning Python and solving your immediate problem.
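As a small illustration of the idea, here is a sketch that keeps both functions in one block so it runs on its own. In the layout described in the question, abc would live in its own file and be brought in with an import such as "from first_module import abc" (the module name is hypothetical), and the list contents and file name are made up.

```python
import os
import tempfile


def abc():
    # Stand-in for the function ABC that returns a list (contents hypothetical).
    return ["alpha", "beta", "gamma"]


def write_list(path, items):
    # Write one item per line and return the path of the file written.
    with open(path, "w") as handle:
        for item in items:
            handle.write("%s\n" % item)
    return path


# Usage: write the list to a temporary file and read it back.
target = os.path.join(tempfile.mkdtemp(), "output.txt")
write_list(target, abc())
with open(target) as handle:
    written = handle.read().splitlines()
```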

Can the Python read() command be structured with two input variables?

I've been given some python code (at least I was told it was in python and it doesn't match matlab code structure) to get running and one of the lines is
data = f.read(1024x1024, 'int32')
I'm getting a syntax error which doesn't surprise me as I thought read() could only take one input and that was size...
I checked the docs https://docs.python.org/2/tutorial/inputoutput.html
and had a general look around, for example here:
http://www.tutorialspoint.com/python/python_files_io.htm and here: http://pymbook.readthedocs.org/en/latest/file.html
There are no indications that read() can take two inputs, never mind one with an 'x' in it.
(I am also not clear on what the intentions of the 1024x1024 was, which is why I'm questioning if it's python, it looks like they're trying to set the size but it doesn't work like that for the read method)
Does anyone know what I'm missing? (or can work out what was originally meant by the command?)
Whole script section:
f = open(filename, 'r')
out = open(outfile, 'w')
data = f.read(1024x1024, 'int32')
result = out.write(data[0:256000])
out.closed
f.closed
It's basically notes on what they want to happen in a particular section of the script but they wrote it as if it was code and I have no idea what the intention of the data line is.
It looks more like pseudocode than anything; specifying "int32" makes me think they are reading from a binary file. You probably need something like
import numpy as np
def load_array(filename, dtype="int32", shape=(1024,1024)):
    return np.fromfile(filename, dtype).reshape(shape)
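If NumPy is not available, the same guess (a binary file of 32-bit integers) can be sketched with the standard library. This assumes 4-byte signed integers in little-endian order, which the original notes do not confirm. The function name read_int32 and the tiny sample file are hypothetical.

```python
import os
import struct
import tempfile


def read_int32(path, count):
    # Read `count` 4-byte signed integers (little-endian assumed) from a binary file.
    with open(path, "rb") as handle:
        raw = handle.read(count * 4)
    return list(struct.unpack("<%di" % count, raw))


# Round trip on a tiny temporary file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(sample_path, "wb") as handle:
    handle.write(struct.pack("<4i", 1, 2, 3, -4))
values = read_int32(sample_path, 4)
```

For the 1024 by 1024 case in the question, count would be 1024 * 1024. struct.unpack raises struct.error if the file holds fewer bytes than requested.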
Your syntax error has nothing to do with the read "command". A syntax error means that the interpreter/parser couldn't make sense of what you wrote at all. When that happens in Python, it will normally point at what's confusing the interpreter, for example:
data = f.read(1024x1024, 'int32')
^
SyntaxError: invalid syntax
Note the ^ pointing at 1024x1024 which is the fault, it simply doesn't understand what 1024x1024 is (so it won't get to the point to actually try to call the read method). If you meant to multiply the numbers you should have written 1024*1024 instead.
When you change to 1024*1024 you'll get other errors (for not reading the documentation for read - it doesn't take those arguments).
As for the language I'd suspect that there's no sane language with such a construct. The problem here is that x doesn't work well as a multiplication operator since that would be problematic with things like axe (did he mean a*e or the variable named axe?). It looks more like it's pseudo code.

Email parser works on individual data; breaks when used in loops or list comprehensions, then breaks on original data as well... then works with map

There's some weird mysterious behavior here.
EDIT This has gotten really long and tangled, and I've edited it like 10 times. The TL/DR is that in the course of processing some text, I've managed to write a function that:
works on individual strings of a list
throws a variety of errors when I try to apply it to the whole list with a list comprehension
throws similar errors when I try to apply it to the whole list with a loop
after throwing those errors, stops working on the individual strings until I re-run the function definition and feed it some sample data, then it starts working again, and finally
turns out to work when I apply it to the whole list with map().
There's an ipython notebook saved as html which displays the whole mess here: http://paul-gowder.com/wtf.html ---I've put a link at the top to jump past some irrelevant stuff. I've also made a[nother] gist that just has the problem code and some sample data, but since this problem seems to throw around a bunch of state somehow, I can't guarantee it'll be reproducible from it: https://gist.github.com/paultopia/402891d05dd8c05995d2
End TL/DR, begin mess
I'm doing some toy text-mining on that old enron dataset, and I have the following set of functions to clean up the emails preparatory to turning them into a document term matrix, after loading nltk stopwords and such. The following uses the email library in python 2.7
def parseEmail(document):
    # strip unnecessary headers, header text, etc.
    theMessage = email.message_from_string(document)
    tofield = theMessage['to']
    fromfield = theMessage['from']
    subjectfield = theMessage['subject']
    bodyfield = theMessage.get_payload()
    wholeMsgList = [tofield, fromfield, subjectfield, bodyfield]
    # get rid of any fields that don't exist in the email
    cleanMsgList = [x for x in wholeMsgList if x is not None]
    # now return a string with all that stuff run together
    return ' '.join(cleanMsgList)

def lettersOnly(document):
    return re.sub("[^a-zA-Z]", " ", document)

def wordBag(document):
    return lettersOnly(parseEmail(document)).lower().split()

def cleanDoc(document):
    dasbag = wordBag(document)
    # get rid of "enron" for obvious reasons, also the .com
    bagB = [word for word in dasbag if not word in ['enron','com']]
    unstemmed = [word for word in bagB if not word in stopwords.words("english")]
    return [stemmer.stem(word) for word in unstemmed]

print enronEmails[0][1]
print cleanDoc(enronEmails[0][1])
First (T-minus half an hour) running this on an email represented as a unicode string produced the expected result: print cleanDoc(enronEmails[0][1]) yielded a list of stemmed words. To be clear, the underlying data enronEmails is a list of [label, message] lists, where label is an integer 0 or 1, and message is a unicode string. (In python 2.7.)
Then at t-10, I added a couple lines of code (since deleted and lost, unfortunately...but see below), with some list comprehensions in them to just extract the messages from the enronEmails, run my cleanup function on them, and then join them back into strings for convenient conversion into document term matrix via sklearn. But the function started throwing errors. So I put my debugging hat on...
First I tried rerunning the original definition and test cell. But when I re-ran that cell, my email parsing function suddenly started throwing an error in the message_from_string method:
AttributeError: 'list' object has no attribute 'message_from_string'
So that was bizarre. This was exactly the same function, called on exactly the same data: cleanDoc(enronEmails[0][1]). The function was working, on the same data, and I haven't changed it.
So I checked to make extra-sure I didn't mutate the data. enronEmails[0][1] was still a string. Not a list. I have no idea why the traceback was of the opinion that I was passing a list to cleanDoc(). I wasn't.
But the plot thickens
So then I went to make a gist to create a wholly reproducible example for the purpose of posting this SO question. I started with the working part. The gist: https://gist.github.com/paultopia/c8c3e066c39336e5f3c2.
To make sure it was working, first I stuck it in a normal .py file and ran it from command line. It worked.
Then I stuck it in a cell at the bottom of my ipython notebook with all the other stuff in it. That worked too.
Then I tried the parseEmail function on enronEmails[0][1]. That worked again. Then I went all the way back up to the original cell that was throwing an error not five minutes ago and re-ran it (including the import from sklearn, and including the original definition of all functions). And it freaking worked.
BUT THEN
I then went back in and tried again with the list comprehensions and such. And this time, I kept track more carefully of what was going on. Adding the following cells:
1.
def atLeastThreeString(cleandoc):
    return ' '.join([w for w in cleandoc if len(w)>2])
print atLeastThreeString(cleanDoc(enronEmails[0][1]))
THIS works, and produces the expected output: a string with words over 2 letters. But then:
2.
justEmails = [email[1] for email in enronEmails]
bigEmailsList = [atLeastThreeString(cleanDoc(email)) for email in justEmails]
and all of a sudden it starts throwing a whole new error, same place in the traceback:
AttributeError: 'unicode' object has no attribute 'message_from_string'
which is extra funny, because I was passing it unicode strings a minute ago and it was doing just fine. And, just to thicken the plot, then going back and rerunning cleanDoc(enronEmails[0][1]) throws the same error
This is driving me insane. How is it possible that creating a new list, and then attempting to run function A on that list, not only throws an error on the new list, but ALSO causes function A to throw an error on data that it was previously working on? I know I'm not mutating the original list...
I've posted the entire notebook in html form here, if anyone wants to see full code and traceback: http://paul-gowder.com/wtf.html The relevant parts start about 2/3 of the way down, at the cells numbered 24-5, where it works, and then the cell numbered 26, where it blows up.
help??
Another edit: I've added some more debugging efforts to the bottom of the above-linked html notebook. As you can see, I've traced the problem down to the act of looping, whether done implicitly in list comprehension form or explicitly. My function works on an individual item in the list of just e-mails, but then fails on every single item when I try to loop over that list, except when I use map() to do it. ???? Has the world gone insane?
I believe the problem is these statements:
justEmails = [email[1] for email in enronEmails]
bigEmailsList = [atLeastThreeString(cleanDoc(email)) for email in justEmails]
In Python 2, the loop variable email leaks out of the list comprehension into the enclosing namespace, so you are overwriting the name of the email module and then trying to call a method of that module on a Python string. I don't have nltk in Python 2, so I can't test it, but I think this must be it.
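The rebinding described above can be reproduced in a small sketch. In Python 3, list-comprehension variables no longer leak, so the demonstration uses a plain for loop, whose variable stays bound after the loop in both Python 2 and Python 3. The sample messages and the name shadowing_demo are hypothetical. The fix shown is simply to pick a loop variable that does not collide with the module name.

```python
import email

# Hypothetical sample data in the same [label, message] shape as enronEmails.
messages = [[0, "To: a@example.com\n\nhello"], [1, "To: b@example.com\n\nbye"]]


def shadowing_demo():
    import email  # the name `email` refers to the module here
    for email in ["just a string"]:
        pass
    # After the loop, `email` is the last loop value, not the module.
    return type(email).__name__


# Fix: use a loop variable that does not reuse the module's name.
bodies = [email.message_from_string(text).get_payload() for _label, text in messages]
```

This also fits the map() observation in the question: map() introduces no loop variable in the calling namespace, so the module name is left alone.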

Referencing long names with Python Sphinx

I'm working on documentation for my Python module (using Sphinx and reST), and I'm finding that when cross-referencing other Python objects (modules, classes, functions, etc) the full object name ends up being incredibly long. Often it is longer than 80 characters, which I would like to avoid at all costs.
Here is an example:
def exampleFunction():
    '''Here is an example docstring referencing another
    :class:`module1.module2.module3.module4.module5.ReallyLongExampleClassName`
    '''
The issue is that when creating the documentation for the ReallyLongExampleClassName class, I generated it for the full path name module1.module2.module3.module4.module5.ReallyLongExampleClassName.
I'm wondering if there is any way to solve this? I have tried the following methods, with no success:
1) Adding a line break in the middle of the module name. Example:
:class:`module1.module2.module3.module4.
module5.ReallyLongExampleClassName`
2) Referencing the class name in a different (but still Python importable) way. Example:
:class:`module1.module2.ReallyLongClassName`
I believe that since the documentation for ReallyLongClassName is tied to the full path names that Sphinx cannot correlate the shortened version with the fully named version.
Edit 04/05/2012:
As per the answer/suggestion of j13r (see below) I tried the following:
:class:`module1.module2.module3.module4.module5.\
ReallyLongExampleClassName`
And this worked successfully. The only caveat to get this to work, is that the second line must not have spaces before it (which is quite frustrating when using this in a docstring). Thus to make my original example work it would look like:
def exampleFunction():
    '''Here is an example docstring referencing another
    :class:`module1.module2.module3.module4.module5.\
ReallyLongExampleClassName`
    '''
Nice, and ugly. If you were to put spaces before ReallyLongExampleClassName to indent it to the same level as the line above it, the output would include the spaces, and Sphinx would try to reference something like module1.module2.module3.module4.module5.    ReallyLongExampleClassName.
I should also note that I tried two other variations of this, which did NOT work:
# Note: Trying to put a space before the '\'
:class:`module1.module2.module3.module4.module5. \
ReallyLongExampleClassName`
# Note: Trying to leave out the '\'
:class:`module1.module2.module3.module4.module5.
ReallyLongExampleClassName`
I was looking for a solution that didn't involve destroying the formatting of the docstring, but I suppose it will do...I think I actually prefer a line that goes past 80 characters to this.
Thanks to j13r for the answer!
According to the sphinx documentation (https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#cross-referencing-python-objects) you could use a dot before your target class:
:class:`.ReallyLongExampleClassName`
or
:class:`.module5.ReallyLongExampleClassName`
and let sphinx search for the class:
... if the name is prefixed with a dot, and no exact match is found, the target is taken as a suffix and all object names with that suffix are searched. For example, :py:meth:`.TarFile.close` references the tarfile.TarFile.close() function, even if the current module is not tarfile. Since this can get ambiguous, if there is more than one possible match, you will get a warning from Sphinx.
You can use ~ as a prefix: it shortens the displayed link text to the last component of the target.
http://sphinx-doc.org/markup/inline.html#xref-syntax
Another strategy is to use reST Substitutions. This will let you save more space in the text by calling the :class: cross-reference later on:
def exampleFunction():
    '''Here is an example docstring referencing another
    |ReallyLongExampleClassName|

    .. |ReallyLongExampleClassName| replace::
        :class:`.ReallyLongExampleClassName`
    '''
If you're referring to the same class in many of your files, you could instead put the substitution in your Sphinx conf.py file, using the rst_epilog setting. From the Sphinx documentation:
rst_epilog
A string of reStructuredText that will be included at the end of every source file that is read. This is the right place to add substitutions that should be available in every file. An example:
rst_epilog = """
.. |psf| replace:: Python Software Foundation
"""
New in version 0.6.
Then your docstring would just be:
def exampleFunction():
    '''Here is an example docstring referencing another
    |ReallyLongExampleClassName|
    '''
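For that shortened docstring to resolve, conf.py would need to define the substitution. A minimal sketch of what that fragment might look like, assuming the dot-prefixed suffix lookup finds exactly one class with that name in the project:

```python
# conf.py (fragment): appended to every source file that Sphinx reads.
# The substitution name mirrors the class name used in the docstrings.
rst_epilog = """
.. |ReallyLongExampleClassName| replace::
   :class:`.ReallyLongExampleClassName`
"""
```

If more than one object ends in ReallyLongExampleClassName, Sphinx warns about the ambiguous target, as the quoted documentation notes, and a longer suffix such as .module5.ReallyLongExampleClassName can be used in the substitution instead.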
Wild stab in the dark. Perhaps this works:
:class:`module1.module2.module3.module4.\
module5.ReallyLongExampleClassName`
It would be valid Python
import scipy.\
stats
