Reading documentation, I often encounter doctests that I would like to run. Let's say you want to run the following in a Jupyter notebook:
>>> a = 2
>>> b = 3
>>> c = a + b
What is the fastest way to do it?
Just copy and paste it in a new cell. Jupyter strips such markup for you when it runs the sample:
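For example, paste the sample into a cell as-is; if you also add a final bare c (my addition, so there is something to display), the cell evaluates it and shows 5, because the prompts are stripped before the code runs:

>>> a = 2
>>> b = 3
>>> c = a + b
>>> c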
If you must strip the markup (for aesthetic reasons, perhaps), you can use a bit of Python code to do so:
def extract_console_code(sample):
    return ''.join([l[4:] for l in sample.splitlines(True) if l[:4] in ('>>> ', '... ')])
print(extract_console_code(r'''<paste code here>'''))
Note the r raw string literal! This should work for most Python code. Only if your code sample contains more ''' triple-single-quotes would you have to handle those separately (by using double quotes around the code, or by concatenating sections together with different string literal styles). Also, note that we skip any line that doesn't start with >>> or ...; those are output lines and not code.
You'll have to run this in a plain Python script, though, because the Jupyter console strips those prompt prefixes from pasted input even when they appear inside a string literal. For your exact example, depending on how you entered the lines, the function could therefore return nothing, or only some of the lines; any line starting with >>> or ... will already have been stripped by Jupyter!
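As a quick check of what the helper returns when run from a script, using the sample from the question:

sample = r'''
>>> a = 2
>>> b = 3
>>> c = a + b
'''
print(extract_console_code(sample))
# prints the code with the prompts removed:
# a = 2
# b = 3
# c = a + b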
Related
I recently encountered the common "unexpected indent" problem when trying to evaluate Python code by copying it from PyDev and Emacs into a Python interpreter.
After trying to fix tab/spaces and some searches, I found the cause in this answer:
This error can also occur when pasting something into the Python
interpreter (terminal/console).
Note that the interpreter interprets an empty line as the end of an
expression, so if you paste in something like
def my_function():
    x = 3

    y = 7
the interpreter will interpret the empty line before y = 7 as the end
of the expression ...
This is exactly the case in my situation. There is also a comment on that answer which points out a solution:
key being that blank lines within the function definition are fine,
but they still must have the initial whitespace since Python
interprets any blank line as the end of the function
But the solution is impractical as I have many empty lines that are problematic for the interpreter. My question is:
Is there a method/tool to automatically insert the right number of initial whitespaces to empty lines so that I can copy-and-paste my code from an editor to an interpreter?
Don't bother with inserting spaces. Tell the interpreter to execute a block of text instead:
>>> exec(r'''
<paste your code>
''')
The r''' ... ''' raw triple-quoted string keeps backslash escapes intact and can span multiple lines. Sometimes (though in my experience, rarely) you need to use r""" ... """ instead, when the code block itself contains triple-quoted strings using single quotes.
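For example, with the function from the question (I've added a return and a print so there is visible output), the blank line inside the body no longer matters, because the interpreter only sees a single exec() call:

>>> exec(r'''
def my_function():
    x = 3

    y = 7
    return x + y

print(my_function())
''')
10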
Another option is to switch to using IPython to do your day-to-day testing of pasted code, which handles pasted code with blank lines natively.
So I'm trying to parse a bunch of citations from a text file using the re module in python 3.4 (on, if it matters, a mac running mavericks). Here's some minimal code. Note that there are two commented lines: they represent two alternative searches. (Obviously, the little one, r'Rawls', is the one that works)
def makeRefList(reffile):
    print(reffile)
    # namepattern = r'(^[A-Z1][A-Za-z1]*-?[A-Za-z1]*),.*( \(?\d\d\d\d[a-z]?[.)])'
    # namepattern = r'Rawls'
    refsTuplesList = re.findall(namepattern, reffile, re.MULTILINE)
    print(refsTuplesList)
The string in question is ugly, and so I stuck it in a gist: https://gist.github.com/paultopia/6c48c398a42d4834f2ae
As noted, the search string r'Rawls' produces expected output ['Rawls', 'Rawls']. However, the other search string just produces an empty list.
I've confirmed this regex (partially) works using the regex101 tester. Confirmation here: https://regex101.com/r/kP4nO0/1 -- it matches what I expect it to match. Since it works in the tester, it should work in the code, right?
(n.b. I copied the text from terminal output from the first print command, then manually replaced \n characters in the string with carriage returns for regex101.)
One possible issue is that python has appended the bytecode flag (is the little b called a "flag"?) to the string. This is an artifact of my attempt to convert the text from utf-8 to ascii, and I haven't figured out how to make it go away.
Yet re clearly is able to parse strings in that form. I know this because I'm converting two text files from utf-8 to ascii, and the following code works perfectly fine on the other string, converted from the other text file, which also has a little b in front of it:
def makeCiteList(citefile):
    print(citefile)
    citepattern = r'[\s(][A-Z1][A-Za-z1]*-?[A-Za-z1]*[ ,]? \(?\d\d\d\d[a-z]?[\s.,)]'
    rawCitelist = re.findall(citepattern, citefile)
    cleanCitelist = cleanup(rawCitelist)
    finalCiteList = list(set(cleanCitelist))
    print(finalCiteList)
    return(finalCiteList)
The other chunk of text, which the code immediately above matches correctly: https://gist.github.com/paultopia/a12eba2752638389b2ee
The only hypothesis I can come up with is that the first, broken, regex expression is puking on the combination of newline characters and the string being treated as a bytes object, even though a) I know the regex is correct for newlines (confirmed by the linked regex101), and b) I know it's matching the strings (confirmed by the successful match on the other string).
If that's true, though, I don't know what to do about it.
Thus, questions:
1) Is my hypothesis right that it's the combination of newlines and b that blows up my regex? If not, what is?
2) How do I fix that?
a) replace the newlines with something in the string?
b) rewrite the regex somehow?
c) somehow get rid of that b and make it into a normal string again? (how?)
thanks!
Addition
In case this is a problem I need to fix upstream, here's the code I'm using to get the text files and convert to ascii, replacing non-ascii characters:
This function gets called on UTF-8 .txt files saved by TextWrangler in Mavericks:
def makeCorpoi(citefile, reffile):
    citebox = open(citefile, 'r')
    refbox = open(reffile, 'r')
    citecorpus = citebox.read()
    refcorpus = refbox.read()
    citebox.close()
    refbox.close()
    corpoi = [str(citecorpus), str(refcorpus)]
    return corpoi
and then this function gets called on each element of the list the above function returns.
def conv2ASCII(bigstring):
    def convHandler(error):
        return ('1FOREIGN', error.start + 1)
    codecs.register_error('foreign', convHandler)
    bigstring = bigstring.encode('ascii', 'foreign')
    stringstring = str(bigstring)
    return stringstring
Aah. I've tracked it down and answered my own question. Apparently one needs to call a decode method on the encoded bytes. The following code produces an actual string, with newlines and everything, out the other end (though now I have to fix a bunch of other bugs before I can figure out if the final output is as expected):
def conv2ASCII(bigstring):
    def convHandler(error):
        return ('1FOREIGN', error.start + 1)
    codecs.register_error('foreign', convHandler)
    bigstring = bigstring.encode('ascii', 'foreign')
    newstring = bigstring.decode('ascii', 'foreign')
    return newstring
apparently the str() function doesn't do the same job, for reasons that are mysterious to me. This is despite an answer here How to make new line commands work in a .txt file opened from the internet? which suggests that it does.
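For what it's worth, here is a small illustration of the difference (my own made-up citation text, not the original data). In Python 3, str() on a bytes object gives you its repr, so the b prefix stays in the text and the newlines become the two literal characters \n, which is also why ^ with re.MULTILINE never finds a second line to anchor on:

import re

raw = b'Rawls, J. (1971). A Theory of Justice.\nRawls, J. (1993). Political Liberalism.'

as_str = str(raw)              # "b'Rawls, ...\\nRawls, ...'" -- one long line, no real newlines
decoded = raw.decode('ascii')  # real text containing a real newline

namepattern = r'(^[A-Z1][A-Za-z1]*-?[A-Za-z1]*),.*( \(?\d\d\d\d[a-z]?[.)])'
print(re.findall(namepattern, as_str, re.MULTILINE))   # [] -- the only line start is the lowercase 'b'
print(re.findall(namepattern, decoded, re.MULTILINE))  # [('Rawls', ' (1971)'), ('Rawls', ' (1993)')]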
In PyCharm, if I open a Python Console, I can't terminate a multi-line string.
Here's what happens in IDLE for comparison:
>>> words = '''one
two
three'''
>>> print(words)
one
two
three
>>>
But if I try the same thing in an interactive Python Console from within PyCharm, the console expects more input after I type the final 3 apostrophes. Anyone know why?
>>> words = '''one
... two
... three'''
...
I'm not sure what the context is, but in many cases it would just be easier to make a tuple/list from the things you want printed on different lines and join them with "\n":
>>> words = "\n".join(["one", "two", "three"])
You may also try three double-quote symbols instead. Maybe PyCharm is confused by what's being delimited. I've always wondered about this in Python because strings can be concatenated just by pure juxtaposition. So effectively, '' 'one\ntwo\nthree' '' ought to take the three different strings, (1) '' (2) 'one\ntwo\nthree' and (3) '', and concatenate them. Since the spaces between them ought not be needed (principle of least astonishment), it's more intuitive to me that the triple-single-(or double)-quote would be interpreted that way. But since the triple quote is its own special delimiter, it doesn't work like that.
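As an aside, that implicit concatenation does give you another way to build the string without a triple quote and without blank-line problems; a small sketch of my own, not from the question:

words = ("one\n"
         "two\n"
         "three")
print(words)
# one
# two
# three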
In IPython the syntax you give works with no problem. IPython also provides a nice magic command %cpaste in which you can paste multi-line expressions or statements, and then delimit the final line with --, and upon hitting enter, it executes the pasted block. I prefer IPython (running in a buffer in Emacs) to PyCharm by a lot, but maybe you can see if there's a comparable magic function, or just look up the source for that magic function and write one yourself?
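Roughly, a %cpaste session looks like this (the exact prompt text varies between IPython versions, so treat this as an illustration only):

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:words = '''one
:two
:three'''
:--

In [2]: print(words)
one
two
three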
Given a string s containing (syntactically valid) Python source code, how can I split s into an array whose elements are the strings corresponding to the Python "statements" in s?
I put scare-quotes around "statements" because this term does not capture exactly what I'm looking for. Rather than trying to come up with a more accurate wording, here's an example. Compare the following two ipython interactions:
In [1]: if 1 > 0:
......:     pass
......:
In [2]: if 1 > 0
  File "<ipython-input-1082-0b411f095922>", line 1
    if 1 > 0
            ^
SyntaxError: invalid syntax
In the first interaction, after the first [RETURN], ipython processes the input if 1 > 0: without objection, even though it is still incomplete (i.e. it is not a full Python statement). In contrast, in the second interaction, the input is not only incomplete (in this sense), but also not acceptable to ipython.
As a second, more complete example, suppose the file foo.py contains the following Python source code:
def print_vertically(s):
    '''A pretty useless procedure.
    Prints the characters in its argument one per line.
    '''
    for c in s:
        print c

greeting = ('hello '
            'world'.
            upper())

print_vertically(greeting)
Now, if I ran the following snippet, featuring the desired split_python_source function:
src = open('foo.py').read()
for i, s in enumerate(split_python_source(src)):
    print '%d. >>>%s<<<' % (i, s)
the output would look like this:
0. >>>def print_vertically(s):<<<
1. >>>    '''A pretty useless procedure.
    Prints the characters in its argument one per line.
    '''<<<
2. >>>    for c in s:<<<
3. >>>        print c<<<
4. >>>greeting = ('hello '
            'world'.
            upper())<<<
5. >>>print_vertically(greeting)<<<
As you can see, in this splitting, for c in s: (for example) gets assigned to its own item, rather than being part of some "compound statement."
In fact, I don't have a very precise specification for how the splitting should be done, as long as it is done "at the joints" (like ipython does).
I'm not familiar with the internals of the Python lexer (though almost certainly many people on SO are :), but my guess is that you're basically looking for lines, with one important exception: paired open-close delimiters that can span multiple lines.
As a quick and dirty first pass, you might be able to start with something that splits a piece of code on newlines, and then you could merge successive lines that are found to contain paired delimiters -- parentheses (), braces {}, brackets [], and quotes '', ''' ''' are the ones that come to mind.
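A very rough sketch of that idea (my own code, not a tested solution): split on newlines and only close off a piece once the bracket count balances again. It only tracks (), [] and {}, and ignores strings and comments entirely, so quotes and triple-quoted strings would still need to be handled as described above.

def split_python_source(src):
    pieces, buffer, depth = [], [], 0
    for line in src.splitlines(True):
        buffer.append(line)
        # count opening and closing delimiters to decide whether the
        # logical line is still "open"
        depth += line.count('(') + line.count('[') + line.count('{')
        depth -= line.count(')') + line.count(']') + line.count('}')
        if depth <= 0:
            pieces.append(''.join(buffer).rstrip('\n'))
            buffer, depth = [], 0
    if buffer:
        pieces.append(''.join(buffer).rstrip('\n'))
    # drop pieces that were only blank lines
    return [p for p in pieces if p.strip()]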
I am working on a latex document that will require typesetting significant amounts of python source code. I'm using pygments (the python module, not the online demo) to encapsulate this python in latex, which works well except in the case of long individual lines - which simply continue off the page. I could manually wrap these lines except that this just doesn't seem that elegant a solution to me, and I prefer spending time puzzling about crazy automated solutions than on repetitive tasks.
What I would like is some way of processing the python source code to wrap the lines to a certain maximum character length, while preserving functionality. I've had a play around with some python and the closest I've come is inserting \\\n in the last whitespace before the maximum line length - but of course, if this ends up in strings and comments, things go wrong. Quite frankly, I'm not sure how to approach this problem.
So, is anyone aware of a module or tool that can process source code so that no lines exceed a certain length - or at least a good way to start to go about coding something like that?
You might want to extend your current approach a bit by using the tokenize module from the standard library to determine where to put your line breaks. That way you can see the actual tokens (COMMENT, STRING, etc.) of your source code rather than just the whitespace-separated words.
Here is a short example of what tokenize can do:
>>> from cStringIO import StringIO
>>> from tokenize import tokenize
>>>
>>> python_code = '''
... def foo(): # This is a comment
...     print 'foo'
... '''
>>>
>>> fp = StringIO(python_code)
>>>
>>> tokenize(fp.readline)
1,0-1,1: NL '\n'
2,0-2,3: NAME 'def'
2,4-2,7: NAME 'foo'
2,7-2,8: OP '('
2,8-2,9: OP ')'
2,9-2,10: OP ':'
2,11-2,30: COMMENT '# This is a comment'
2,30-2,31: NEWLINE '\n'
3,0-3,4: INDENT '    '
3,4-3,9: NAME 'print'
3,10-3,15: STRING "'foo'"
3,15-3,16: NEWLINE '\n'
4,0-4,0: DEDENT ''
4,0-4,0: ENDMARKER ''
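(The example above is Python 2: cStringIO, and tokenize() printing directly. On Python 3 the equivalent is roughly the following sketch; generate_tokens yields TokenInfo tuples instead of printing them.)

import io
import tokenize

python_code = '''
def foo(): # This is a comment
    print('foo')
'''

# each tok is a TokenInfo(type, string, start, end, line) named tuple
for tok in tokenize.generate_tokens(io.StringIO(python_code).readline):
    print(tok.start, tok.end, tokenize.tok_name[tok.type], repr(tok.string))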
I use the listings package in LaTeX to insert source code; it does syntax highlighting, line breaks, et al.
Put the following in your preamble:
\usepackage{listings}
%\lstloadlanguages{Python} % Load only these languages
\newcommand{\MyHookSign}{\hbox{\ensuremath\hookleftarrow}}
\lstset{
% Language
language=Python,
% Basic setup
%basicstyle=\footnotesize,
basicstyle=\scriptsize,
keywordstyle=\bfseries,
commentstyle=,
% Looks
frame=single,
% Linebreaks
breaklines,
prebreak={\space\MyHookSign},
% Line numbering
tabsize=4,
stepnumber=5,
numbers=left,
firstnumber=1,
%numberstyle=\scriptsize,
numberstyle=\tiny,
% Above and beyond ASCII!
extendedchars=true
}
The package also has hooks for inline code (\lstinline), for including entire files (\lstinputlisting), for showing listings as floating figures, and more.
I'd check out the reformat tool in an editor like NetBeans.
When you reformat Java, it properly fixes the lengths of lines both inside and outside of comments; if the same algorithm were applied to Python, it would work.
For Java it lets you set any wrapping width and a bunch of other parameters. I'd be pretty surprised if that didn't exist either natively or as a plugin.
Can't tell for sure just from the description, but it's worth a try:
http://www.netbeans.org/features/python/