Vim: change a function's call-signature throughout the code-base - python

I'm slowly turning Vim into my IDE of choice, with ctags and static-analysis/autocompletion plugins (e.g. vim-jedi, YouCompleteMe). But I haven't found anything that can do one specific task:
Say I have a function (I'll use Python here):
def my_function(outFile, foo, bar):
    outFile.write(foo[bar])
Later I change its signature so the outFile positional-argument is a named one:
def my_function(foo, bar, outFile=None):
    if outFile is None:
        outFile = getDefaultOutfile()
    outFile.write(foo[bar])
Now I want to change all of the old calls, throughout the entire codebase:
my_function(oF, f, b)
to
my_function(f, b, outFile=oF)
Is there an easy way to do this in Vim (or other Linux utils e.g. sed)? I know PyCharm etc can do this, but I'm not intending to jump ship just yet.

You can do this with the following regex substitution:
:s/\vmy_function\((\w*), (\w*), (\w*)\)/my_function(\2, \3, outFile=\1)/g
Read up on vimregex, specifically capturing groups. Basically, whatever (\w*) matches will be saved and named \1 (then \2, \3, etc.) for us to use in the replacement later.
It's worth noting that this works for your example, but will not work if there are extra or missing spaces. To make this more robust, you could change it to:
:s/\vmy_function\((\w*),\s*(\w*),\s*(\w*)\)/my_function(\2, \3, outFile=\1)/g

As an alternative to find (described in an earlier comment), one can also use grep to open the files containing the desired function:
vim `grep -l 'my_function' *py`
The files will be loaded in different buffers. Then use a general buffer replacement:
bufdo %s/\(my_function\)(.*)/\1(f, b, outFile=oF)/gc
The c flag here is optional, but I would recommend it for this particular replacement.
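If you'd rather do this outside of Vim (the question mentions sed and other Linux utilities), here is a minimal Python sketch of the same capture-group rewrite. The glob pattern and the in-place rewrite are assumptions to adapt to your tree; run it on a copy or under version control first:

import re
from pathlib import Path

# same capture groups as the Vim pattern above
CALL = re.compile(r'my_function\((\w+),\s*(\w+),\s*(\w+)\)')

for path in Path('.').rglob('*.py'):
    src = path.read_text()
    new = CALL.sub(r'my_function(\2, \3, outFile=\1)', src)
    if new != src:
        path.write_text(new)  # rewrites the file in place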

Related

Adding syntax to IPython?

I would like to add some syntax changes to (my installation of) IPython. For example, I might want to use \+ to mean operator.add. I imagine that I can insert some code that would process the input and turn it into actual (I)Python, and then IPython can do its own processing. But I don't know where to put that code.
(Disclaimer: Don't do it for production code, or code that's intended for other people to see/use.)
Here is an example of how to transform "\+ a b" to "a + b".
from IPython.core.inputtransformer import StatelessInputTransformer

@StatelessInputTransformer.wrap
def my_filter(line):
    words = line.split()
    if line.startswith(r'\+ ') and len(words) == 3:
        return '{} + {}'.format(*words[1:])
    return line

ip = get_ipython()
ip.input_transformer_manager.physical_line_transforms.insert(0, my_filter())
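With the filter installed, a session behaves like this (assuming a and b are already defined):

In [1]: a, b = 2, 3
In [2]: \+ a b
Out[2]: 5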
Note that this is all string-based; the hook executes before anything is evaluated. This means you can't transform conditionally based on the values of a or b. A magic would best suit your needs in that case.
Moreover, you have to be careful when parsing the input string. In my example, the input \+ (a * b) c breaks because of the split. In that case you will need a tokenization tool; IPython provides one with TokenInputTransformer. It works like StatelessInputTransformer, but it is called with a list of tokens instead of the whole line.
Simply run this code to add the filter. If you want it to be available as you start IPython, you can save it as a .py or .ipy file and put it in
~/.ipython/profile_*/startup
https://ipython.org/ipython-doc/dev/config/inputtransforms.html

How can I create a file with `/` in its file name? [duplicate]

I know that this is not something that should ever be done, but is there a way to use the slash character that normally separates directories within a filename in Linux?
The answer is that you can't, unless your filesystem has a bug. Here's why:
There is a system call for renaming your file defined in fs/namei.c called renameat:
SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
                int, newdfd, const char __user *, newname)
When the system call gets invoked, it does a path lookup (do_path_lookup) on the name. Keep tracing this, and we get to link_path_walk which has this:
static int link_path_walk(const char *name, struct nameidata *nd)
{
        struct path next;
        int err;
        unsigned int lookup_flags = nd->flags;

        while (*name=='/')
                name++;
        if (!*name)
                return 0;
        ...
This code applies to any filesystem. What does this mean? It means that if you try to pass a name with an actual '/' character in it through the usual means, it will not do what you want; there is no way to escape the character. If a filesystem "supports" this, it is because it either:
uses a Unicode character (or something similar) that looks like a slash but isn't, or
has a bug.
Furthermore, if you did go in and edit the bytes on disk to put a slash character into a file name, bad things would happen. That's because you could never refer to this file by name :( since any time you did, Linux would assume you were referring to a nonexistent directory. Using the 'rm *' technique would not work either, since bash simply expands that to the filename. Even rm -rf wouldn't work, as a simple strace reveals what goes on under the hood (shortened):
$ ls testdir
myfile2 out
$ strace -vf rm -rf testdir
...
unlinkat(3, "myfile2", 0) = 0
unlinkat(3, "out", 0) = 0
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(3) = 0
unlinkat(AT_FDCWD, "testdir", AT_REMOVEDIR) = 0
...
Notice that these calls to unlinkat would fail because they need to refer to the files by name.
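You can see the same behaviour from user space: any '/' in a name is parsed as a path separator before the filesystem ever sees it. A quick demonstration in Python (assuming no directory named a exists):

>>> open('a/b', 'w')
Traceback (most recent call last):
  ...
FileNotFoundError: [Errno 2] No such file or directory: 'a/b'

The kernel looked for a directory a in which to create b; at no point was "a/b" considered as a single filename.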
You could use a Unicode character that displays as / (for example the fraction slash), assuming your filesystem supports it.
It depends on what filesystem you are using. Among the more popular ones:
ext3: No
ext4: No
jfs: Yes
reiserfs: No
xfs: No
Only with an agreed-upon encoding. For example, you could agree that % will be encoded as %% and that %2F will mean a /. All the software that accessed this file would have to understand the encoding.
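A minimal sketch of such an encoding (the scheme itself is just a convention, not a standard; note that decoding needs a left-to-right scan, because chained str.replace calls can mis-decode sequences like %%2F):

def encode(name):
    # '%' is doubled first, so the '%' introduced by '%2F' stays unambiguous
    return name.replace('%', '%%').replace('/', '%2F')

def decode(stored):
    out, i = [], 0
    while i < len(stored):
        if stored.startswith('%%', i):
            out.append('%'); i += 2
        elif stored.startswith('%2F', i):
            out.append('/'); i += 3
        else:
            out.append(stored[i]); i += 1
    return ''.join(out)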
The short answer is: No, you can't. It's a necessary prohibition because of how the directory structure is defined.
And, as mentioned, you can display a unicode character that "looks like" a slash, but that's as far as you get.
In general it's a bad idea to try to use "bad" characters in a file name at all; even if you somehow manage it, it tends to make it hard to use the file later. The filesystem separator is flat-out not going to work at all, so you're going to need to pick an alternative method.
Have you considered URL-encoding the URL then using that as the filename? The result should be fine as a filename, and it's easy to reconstruct the name from the encoded version.
Another option is to create an index - create the output filename using whatever method you like - sequentially-numbered names, SHA1 hashes, whatever - then write a file with the generated filename/URL pair. You can save that into a hash and use it to do a URL-to-filename lookup or vice-versa with the reversed version of the hash, and you can write it out and reload it later if needed.
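For the URL-encoding idea, Python's standard library already does the work; a small sketch (quote with safe='' also encodes the / characters, so the result is a single valid filename component):

from urllib.parse import quote, unquote

url = 'https://example.com/a/b?q=1'
fname = quote(url, safe='')   # 'https%3A%2F%2Fexample.com%2Fa%2Fb%3Fq%3D1'
assert '/' not in fname       # usable as a filename
assert unquote(fname) == url  # and it round-trips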
The short answer is: you must not. The long answer is: you probably can, depending on where you are viewing it from and which layer you are working in.
Since the question has the Unix tag, I am going to answer for Unix.
As mentioned in other answers, you must not use forward slashes in a filename.
However, on macOS you can create a file whose name shows up with forward slashes / by creating it with a colon:
# avoid doing it at all costs
touch 'foo:bar'
Now, when you look at this filename from the terminal, you will see it as foo:bar. But if you view it in Finder, you will see that Finder shows it as foo/bar.
The same thing works the other way round: if you create a file from Finder with forward slashes in its name, like /foobar, a conversion is done in the background. As a result, you will see :foobar in the terminal, and the other way round when viewed from Finder.
So : is valid in the Unix layer, but it is translated to or from / in the Mac layers such as Finder and the rest of the GUI: the colon is used as the separator in HFS paths, and the slash / is used as the separator in POSIX paths. There is a two-way translation happening, depending on which "layer" you are working in.
See more details here: https://apple.stackexchange.com/a/283095/323181
You can have a filename with a / in Linux and Unix. This is a very old question, but surprisingly nobody has pointed it out in the almost 10 years since the question was asked.
Every Unix and Linux system has the root directory named /. A directory is just a special kind of file. Symbolic links, character devices, etc are also special kinds of files. See here for an in depth discussion.
You can't create any other files with a /, but you certainly have one -- and a very important one at that.

How to find undocumented methods in my code?

I am writing documentation for a project and I would like to make sure I did not miss any method. The code is written in Python and I am using PyCharm as an IDE.
Basically, I would need a REGEX to match something like:
def method_name(with, parameters):
    someVar = something()
    ...
but it should NOT match:
def method_name(with, parameters):
    """ The doc string """
    ...
I tried using PyCharm's search with REGEX feature with the pattern ):\s*[^"'] so it would match any line after : that doesn't start with " or ' after whitespace, but it doesn't work. Any idea why?
You mentioned you were using PyCharm: there is an inspection "Missing, empty, or incorrect docstring" that you can enable and will do that for you.
Note that you can then change the severity for it to show up more or less prominently.
There is a tool called pydocstyle which checks if all classes, functions, etc. have properly formatted docstrings.
Example from the README:
$ pydocstyle test.py
test.py:18 in private nested class `meta`:
        D101: Docstring missing
test.py:27 in public function `get_user`:
        D300: Use """triple double quotes""" (found '''-quotes)
test:75 in public function `init_database`:
        D201: No blank lines allowed before function docstring (found 1)
I don't know about PyCharm, but pydocstyle can, for example, be integrated in Vim using the Syntastic plugin.
I don't know python, but I do know my regex.
And your regex has issues. First of all, as the comments have mentioned, you may have to escape the closing parenthesis. Secondly, you don't match the newline following the function declaration. Finally, you look for single or double quotes at the START of a line, yet the start of the line contains whitespace.
I was able to match your sample file with \):\s*\n\s*["']. This is a multiline regex. Not all programs are able to match multiline regex. With grep, for example, you'd have to use this method.
A quick explanation of what this regex matches: it looks for a closing parenthesis followed by a colon. Any amount of optional whitespace may follow that. Then there should be a newline followed by any amount of whitespace (indentation, in this case). Finally, there must be a single or double quote. Note that this matches functions that do have docstrings; you'd want to invert the match to find those without.
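If the goal is finding undocumented functions rather than perfecting the regex, a short script on top of Python's ast module is more reliable than any pattern matching. A sketch (independent of PyCharm; ast.get_docstring returns None when a definition has no docstring):

import ast
import sys

def undocumented(path):
    """Yield the location and name of each function/class in *path* lacking a docstring."""
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                yield '{}:{} {}'.format(path, node.lineno, node.name)

if __name__ == '__main__':
    for path in sys.argv[1:]:
        for hit in undocumented(path):
            print(hit)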
In case PyCharm is not available, there is a little tool called ckdoc written in Python 3.5.
Given one or more files, it finds modules, classes and functions without a docstring. It doesn't search in imported built-in or external libraries – it only considers objects defined in files residing in the same folder as the given file, or subfolders of that folder.
Example usage (after removing some docstrings)
> ckdoc/ckdoc.py "ckdoc/ckdoc.py"
ckdoc/ckdoc.py
 module
  ckdoc
 function
  Check.documentable
  anykey_defaultdict.__getitem__
  group_by
  namegetter
 type
  Check
There are cases when it doesn't work. One such case is when using Anaconda with modules. A possible workaround in that case is to use ckdoc from Python shell. Import necessary modules and then call the check function.
> import ckdoc, main
> ckdoc.check(main)
/tmp/main.py
 module
  main
 function
  main
/tmp/custom_exception.py
 type
  CustomException
 function
  CustomException.__str__
False
The check function returns True if there are no missing docstrings.

Automated way to switch from epydoc's docstring formatting to sphinx docstring formatting?

I've got a project which I documented using epydoc. Now I'm trying to switch to Sphinx. I formatted all my docstrings for epydoc, using B{}, L{} etc. for bolding, linking and the like, and @param, @return, @raise etc. to explain input, output and exceptions.
Now that I'm switching to Sphinx, all these features are lost. Is there an automated way to convert docstrings formatted for epydoc to docstrings formatted for Sphinx?
To expand on Kevin Horn's answer, docstrings can be translated on the fly in an event handler triggered by the autodoc-process-docstring event.
Below is a small demonstration (try it by adding the code to conf.py). It replaces the @ character in some common Epytext fields with :, which is used in the corresponding Sphinx fields.
import re

re_field = re.compile('@(param|type|rtype|return)')

def fix_docstring(app, what, name, obj, options, lines):
    for i in range(len(lines)):
        lines[i] = re_field.sub(r':\1', lines[i])

def setup(app):
    app.connect('autodoc-process-docstring', fix_docstring)
Pyment is a tool that can convert Python docstrings and create skeletons for missing ones. It can manage Google, Epydoc (javadoc-style), Numpydoc and reStructuredText (reST, the Sphinx default) docstring formats.
It accepts a single file or a folder (exploring sub-folders as well). For each file, it will recognize each docstring format and convert it to the desired one. At the end, a patch will be generated to apply to the file.
To convert your project:
install Pyment
Type the following (you can use a virtualenv):
$ git clone https://github.com/dadadel/pyment.git
$ cd pyment
$ python setup.py install
convert from Epydoc to Sphinx
You can convert your project to Sphinx format (reST), which is the default output format, by doing:
$ pyment /my/folder/project
In theory you could write a Sphinx extension which would catch whatever event gets fired when a docstring gets read (source_read, maybe?) and translate the docstrings on the fly.
I say in theory because:
I've been meaning to write such a thing for a very long time, but haven't managed to get around to it yet.
Translating stuff like this is always harder than it seems.
You could also probably try just replacing all the docstrings in your code with a similar translator outside of Sphinx, perhaps using the ast module or something similar.
As one of the comments suggested, sphinx-epytext does provide the relevant support. Here is how it worked for me:
Installing it is very easy:
pip install -U sphinx-epytext
It contains one file, process_docstring.py, which converts the epytext markup to reStructuredText markup by replacing @ with a colon :.
Some of the fields I found missing in there were: ivar, var, cvar, vartype
Simply extend the existing list FIELDS in there:
FIELDS.extend(['ivar', 'var', 'cvar', 'vartype'])
Epytext understands @type for variables, but Sphinx understands :vartype:.
To fix that, replace the former with the latter inside the process_docstring method.
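For example, the variable fields can be handled with the same autodoc-process-docstring hook shown in the earlier answer; a sketch (the exact regex is an assumption, covering only the simple @type var: form):

import re

re_vartype = re.compile(r'@type\s+(\w+):')

def fix_vartype(app, what, name, obj, options, lines):
    # rewrite epytext '@type foo:' as Sphinx ':vartype foo:'
    for i in range(len(lines)):
        lines[i] = re_vartype.sub(r':vartype \1:', lines[i])

def setup(app):
    app.connect('autodoc-process-docstring', fix_vartype)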
Most of the syntax or docstring parts that Sphinx can't comprehend are reported as warnings. You can log these warnings by running sphinx-build with -w <WarningLogFile>. In my experience, Sphinx is very sensitive about how a field should start or end, missing formatting syntax, etc.

Pythonic and efficient way of defining multiple regexes for use over many iterations

I am presently writing a Python script to process some 10,000 or so input documents. Based on the script's progress output, I notice that the first 400+ documents get processed really fast, and then the script slows down, although the input documents are all approximately the same size.
I am assuming this may have to do with the fact that most of the document processing is done with regexes that I do not save as regex objects once they have been compiled. Instead, I recompile the regexes whenever I need them.
Since my script has about 10 different functions all of which use about 10 - 20 different regex patterns I am wondering what would be a more efficient way in Python to avoid re-compiling the regex patterns over and over again (in Perl I could simply include a modifier //o).
My assumption is that if I store the regex objects in the individual functions using
pattern = re.compile(...)
the resulting regex object will not be retained until the next invocation of the function for the next iteration (each function is called but once per document).
Creating a global list of pre-compiled regexes seems an unattractive option since I would need to store the list of regexes in a different location in my code than where they are actually used.
Any advice here on how to handle this neatly and efficiently?
The re module caches compiled regex patterns. The cache is cleared when it reaches a size of re._MAXCACHE, which by default is 100. Since you have 10 functions with 10-20 regexes each (100-200 distinct patterns in total), your observed slow-down is consistent with the cache being cleared.
If you are okay with changing private variables, a quick and dirty fix to your program might be to set re._MAXCACHE to a higher value:
import re
re._MAXCACHE = 1000
Last time I looked, re.compile maintained a rather small cache, and when it filled up, just emptied it. DIY with no limit:
import re

class MyRECache(object):
    def __init__(self):
        self.cache = {}

    def compile(self, regex_string):
        if regex_string not in self.cache:
            self.cache[regex_string] = re.compile(regex_string)
        return self.cache[regex_string]
Compiled regular expressions are automatically cached by re.compile, re.search and re.match, but the maximum cache size is 100 in Python 2.7, so you're overflowing the cache.
Creating a global list of pre-compiled regexes seems an unattractive option since I would need to store the list of regexes in a different location in my code than where they are actually used.
You can define them near the place where they are used: just before the functions that use them. If you reuse the same RE in a different place, then it would have been a good idea to define it globally anyway to avoid having to modify it in multiple places.
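In other words, something like this (with a hypothetical _DATE_RE pattern) keeps the compiled object right next to its only consumer, and compiles it exactly once at import time:

import re

_DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')  # compiled once, at import time

def extract_dates(text):
    return _DATE_RE.findall(text)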
In the spirit of "simple is better" I'd use a little helper function like this:
import re

def rc(pattern, flags=0):
    key = pattern, flags
    if key not in rc.cache:
        rc.cache[key] = re.compile(pattern, flags)
    return rc.cache[key]

rc.cache = {}
Usage:
rc('[a-z]').sub...
rc('[a-z]').findall <- no compilation here
I also recommend trying regex. Among many other advantages over the stock re, its MAXCACHE is 500 by default, and the cache won't get dropped completely on overflow.
