Being new at programming in general, and new with Python in particular, I'm having some beginner's troubles.
I'm trying out a function from NLTK called generate:
string.generate()
It returns what seems like a string. However, if I write:
stringvariable = string.generate()
or
stringvariable = str(string.generate())
… stringvariable is always empty.
So I guess I'm missing something here. Can the text that I see on the screen be something other than a returned string? And if so, is there any way for me to grab that output and put it into a variable?
Briefly put, how do I get what comes out of string.generate() into stringvariable, if not as described above?
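You can rewrite generate. The only disadvantage is that the library's version can change, and your copy will not be updated to reflect those changes:
# Import locations assumed to match the NLTK 2.x layout
from nltk.model import NgramModel
from nltk.probability import LidstoneProbDist
from nltk.util import tokenwrap

def generate_no_stdout(self, length=100):
    # Build the trigram model once and cache it on the object
    if '_trigram_model' not in self.__dict__:
        estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
        self._trigram_model = NgramModel(3, self, estimator=estimator)
    text = self._trigram_model.generate(length)
    return tokenwrap(text)  # return the text instead of printing it
Then "a.generate()" becomes "generate_no_stdout(a)".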
generate() prints its output rather than returning a string, so you need to capture it.
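If you would rather not copy library code, capturing the printed text is another option. The following is a minimal sketch, assuming Python 3.4 or later (where contextlib.redirect_stdout exists); noisy_generate is a hypothetical stand-in for any function that prints instead of returning.

```python
import contextlib
import io


def noisy_generate():
    """Hypothetical stand-in for a function that prints instead of returning."""
    print("some generated text")


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    noisy_generate()  # the printed text goes into buffer, not to the console

# Everything that was printed inside the with block is now a normal string
stringvariable = buffer.getvalue().strip()
```

Outside the with block, printing goes to the console again, so nothing has to be restored by hand.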
My programming is almost all self taught, so I apologise in advance if some of my terminology is off in this question. Also, I am going to use a simple example to help illustrate my question, but please note that the example itself is not important; it's just a way to hopefully make my question clearer.
Imagine that I have some poorly formatted text with a lot of extra whitespace that I want to clean up. So I create a function that replaces any group of whitespace characters that contains a newline character with a single newline character, and any other group of whitespace characters with a single space. The function might look like this:
import re

def white_space_cleaner(text):
    new_line_finder = re.compile(r"\s*\n\s*")
    white_space_finder = re.compile(r"\s\s+")
    text = new_line_finder.sub("\n", text)
    text = white_space_finder.sub(" ", text)
    return text
That works just fine. The problem is that every time I call the function it has to compile the regular expressions. To make it run faster I can rewrite it like this:
new_line_finder = re.compile(r"\s*\n\s*")
white_space_finder = re.compile(r"\s\s+")

def white_space_cleaner(text):
    text = new_line_finder.sub("\n", text)
    text = white_space_finder.sub(" ", text)
    return text
Now the regular expressions are only compiled once and the function runs faster. Using timeit on both functions I find that the first function takes 27.3 µs per loop and the second takes 25.5 µs per loop. A small speed-up, but one that could be significant if the function is called millions of times or has hundreds of patterns instead of two. Of course, the downside of the second function is that it pollutes the global namespace and makes the code less readable. Is there some "Pythonic" way to include an object, like a compiled regular expression, in a function without having it be recompiled every time the function is called?
Keep a list of tuples (regular expressions and the replacement text) to apply; there doesn't seem to be a compelling need to name each one individually.
finders = [
    (re.compile(r"\s*\n\s*"), "\n"),
    (re.compile(r"\s\s+"), " ")
]

def white_space_cleaner(text):
    for finder, repl in finders:
        text = finder.sub(repl, text)
    return text
You might also incorporate functools.partial:
from functools import partial

# Order matters: the newline rule has to run before the general whitespace rule.
# Plain dicts keep insertion order on Python 3.7+; on older versions use a list of pairs.
replacers = {
    r"\s*\n\s*": "\n",
    r"\s\s+": " "
}
# Ugly boiler-plate, but the only thing you might need to modify
# is the dict above as your needs change.
replacers = [partial(re.compile(regex).sub, repl) for regex, repl in replacers.items()]

def white_space_cleaner(text):
    for replacer in replacers:
        text = replacer(text)
    return text
Another way to do it is to group the common functionality in a class:
class ReUtils(object):
    new_line_finder = re.compile(r"\s*\n\s*")
    white_space_finder = re.compile(r"\s\s+")

    @classmethod
    def white_space_cleaner(cls, text):
        text = cls.new_line_finder.sub("\n", text)
        text = cls.white_space_finder.sub(" ", text)
        return text

if __name__ == '__main__':
    print(ReUtils.white_space_cleaner("the text"))
It's already grouped in a module, but depending on the rest of the code a class can also be suitable.
You could put the regular expression compilation into the function parameters, like this:
def white_space_cleaner(text, new_line_finder=re.compile(r"\s*\n\s*"),
                        white_space_finder=re.compile(r"\s\s+")):
    text = new_line_finder.sub("\n", text)
    text = white_space_finder.sub(" ", text)
    return text
Since default argument values are evaluated once, when the def statement is executed, the patterns are compiled only once and they don't end up in the module namespace. They also give you the flexibility to replace them from calling code if you really need to. The downside is that some people might consider this to be polluting the function signature.
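To see that the defaults really are built only once, here is a small sketch. The counting_compile helper and the compile_calls counter are hypothetical names added purely for illustration; they are not part of the re module.

```python
import re

compile_calls = 0  # how many times the hypothetical helper below has run


def counting_compile(pattern):
    """Hypothetical wrapper around re.compile that records each call."""
    global compile_calls
    compile_calls += 1
    return re.compile(pattern)


def white_space_cleaner(text,
                        new_line_finder=counting_compile(r"\s*\n\s*"),
                        white_space_finder=counting_compile(r"\s\s+")):
    text = new_line_finder.sub("\n", text)
    return white_space_finder.sub(" ", text)


# Call the function several times; the defaults are not rebuilt
results = [white_space_cleaner("a   b \n\n c") for _ in range(3)]
```

After the three calls compile_calls is still 2, one per default argument, because both patterns were compiled when the def statement ran.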
I wanted to try timing this but I couldn't figure out how to use timeit properly. You should see similar results to the global version.
Markus's comment on your post is correct, though; sometimes it's fine to put variables at module level. If you don't want them to be easily visible to other modules, consider prefixing the names with an underscore. This marks them as module-private, and from module import * won't import names starting with an underscore (you can still get them if you ask for them by name, though).
Always remember: the end-all to "what's the best way to do this in Python" is almost always "what makes the code most readable?" Python was created, first and foremost, to be easy to read, so do what you think is the most readable thing.
In this particular case I think it doesn't matter. Check:
Is it worth using Python's re.compile?
As you can see in the answer, and in the source code:
https://github.com/python/cpython/blob/master/Lib/re.py#L281
The re module keeps its own cache of compiled regular expressions, so the small speed-up you see is probably because you avoid the cache lookup.
That said, the technique in the question is sometimes very relevant, for example when building an internal cache that stays namespaced to the function:
def heavy_processing(arg):
    return arg + 2

def myfunc(arg1):
    # Assign attribute to function if first call
    if not hasattr(myfunc, 'cache'):
        myfunc.cache = {}
    # Perform lookup in internal cache
    if arg1 in myfunc.cache:
        return myfunc.cache[arg1]
    # Very heavy and expensive processing with arg1
    result = heavy_processing(arg1)
    myfunc.cache[arg1] = result
    return result
And this is executed like this:
>>> myfunc.cache
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'function' object has no attribute 'cache'
>>> myfunc(10)
12
>>> myfunc.cache
{10: 12}
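For this particular memoization pattern the standard library offers a ready-made alternative, functools.lru_cache (Python 3.2 or later). This is a different technique from the hand-rolled function attribute above, shown only as a sketch; call_log is a hypothetical list used to show how often the body actually runs.

```python
from functools import lru_cache

call_log = []  # records every time the expensive body actually runs


@lru_cache(maxsize=None)
def myfunc(arg1):
    call_log.append(arg1)  # stand-in for the heavy processing
    return arg1 + 2


first = myfunc(10)   # computed
second = myfunc(10)  # served from the cache, the body does not run again
```

The cache statistics are available through myfunc.cache_info(), and myfunc.cache_clear() empties the cache.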
You can use a static function attribute to hold the compiled re. This example does something similar, keeping a translation table in one function attribute.
def static_var(varname, value):
    def decorate(func):
        setattr(func, varname, value)
        return func
    return decorate

@static_var("complements", str.maketrans('acgtACGT', 'tgcaTGCA'))
def rc(seq):
    return seq.translate(rc.complements)[::-1]
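Applied to the whitespace example from the question, the same decorator could hold the compiled patterns. This is a sketch only; the attribute name finders is an arbitrary choice made for illustration.

```python
import re


def static_var(varname, value):
    """Attach value to the decorated function as attribute varname."""
    def decorate(func):
        setattr(func, varname, value)
        return func
    return decorate


# The patterns are compiled once, when the decorator line is evaluated
@static_var("finders", [(re.compile(r"\s*\n\s*"), "\n"),
                        (re.compile(r"\s\s+"), " ")])
def white_space_cleaner(text):
    for finder, repl in white_space_cleaner.finders:
        text = finder.sub(repl, text)
    return text


cleaned = white_space_cleaner("one  two \n\n  three")
```

The patterns live on the function object itself, so nothing extra is added to the module namespace.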
Forgive this rather basic Python question, but I have very little Python experience. I'm creating a basic Python script for use with Kodi:
http://kodi.wiki/view/List_of_built-in_functions
Example code:
import kodi
variable = "The value to use in PlayMedia"
kodi.executebuiltin("PlayMedia(variable)")
kodi.executebuiltin("PlayerControl(RepeatAll)")
Rather than directly providing a string value for the function PlayMedia, I want to pass a variable as the value instead. The idea is another process may modify the variable value with sed so it can't be static.
Really simple, but can someone point me in the right direction?
It's a simple case of string formatting.
template = "{}({})"
functionName = "function" # e.g. input from user
arg = "arg" # e.g. input from user
formatted = template.format(functionName, arg)
assert formatted == "function(arg)"
kodi.executebuiltin(formatted)
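Applied to the PlayMedia call from the question, the same idea might look like the sketch below. The build_builtin helper and the media path are hypothetical, and the Kodi call itself is left as a comment so that the snippet runs outside Kodi.

```python
def build_builtin(function_name, argument):
    """Return a built-in call string such as 'PlayMedia(...)'."""
    return "{}({})".format(function_name, argument)


variable = "/path/to/video.mp4"  # hypothetical media path
command = build_builtin("PlayMedia", variable)

# Inside Kodi the string would then be passed on as in the question:
# kodi.executebuiltin(command)
```

The point is that the variable's value is substituted into the string before the call, rather than the literal text "variable" being sent.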
OK, as far as I understand your problem, you need to define a variable whose value can be changed later. The first part is easy: defining a variable in Python is as simple as new_song = "tiffny_avlord_I_love_u", and similarly you can define another string as new_video = "Bohemia_on_my_feet". The thing to keep in mind is that a string value must be enclosed in quotes; double quotes "..." and single quotes '...' both work.
Now the issue is how to update its value. The easiest way is to take input from the user, which can be done using raw_input() (input() on Python 3):
new_song = raw_input("Please enter name of a valid song: ")
print("The new song is: " + new_song)
Now whatever the user enters on the console is stored in the variable new_song, and you can pass this variable to any function:
some_function(new_song)
Try executing this line and you will understand how it works.
Using python and NLTK I want to save the help result to a variable.
x = nltk.help.upenn_tagset('RB')
for example.
The variable x is assigned None. The console prints the result of the help function, but the result is not saved in x.
Looking at the source file of help.py, it uses the print statement and doesn't return anything. upenn_tagset calls _format_tagset, which passes everything to _print_entries, which uses print.
So, what we really want to do is to redirect the print statement.
Quick search, and we've got https://stackoverflow.com/a/4110906/1210278 - replace sys.stdout.
As pointed out in the question linked by @mgilson, this is a permanent solution to a temporary problem. So what do we do? That should be easy: just keep the original around somewhere.
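import io
import sys

print("Hello")
cons_out = sys.stdout        # keep the original stream around
sys.stdout = io.StringIO()   # any writable object whose contents you can read back
do_printing_function()       # placeholder for the call whose output you want
captured = sys.stdout.getvalue()
sys.stdout = cons_out        # restore the console
print("World!")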
This is actually exactly what the accepted answer at https://stackoverflow.com/a/6796752/1210278 does, except it uses a reusable class wrapper - this is a one-shot solution.
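A reusable wrapper along those lines could be sketched as a small context manager. The CaptureStdout class and the print_help function below are hypothetical names used for illustration, not the code from the linked answer, and the sketch assumes Python 3.

```python
import io
import sys


class CaptureStdout(object):
    """Hypothetical helper: temporarily swap sys.stdout for a StringIO."""

    def __enter__(self):
        self._original = sys.stdout
        self.buffer = io.StringIO()
        sys.stdout = self.buffer
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        sys.stdout = self._original  # restored even if the body raised
        return False                 # do not swallow exceptions


def print_help():
    """Stand-in for a function that only prints, such as a help routine."""
    print("RB: adverb")


with CaptureStdout() as capture:
    print_help()

x = capture.buffer.getvalue()
```

Because the restore happens in __exit__, the console is put back even when the wrapped call raises an exception, which the one-shot version does not guarantee.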
The easiest way to get the explanation of a tag is to load the whole tag set and then look up only the tags you need.
tags = nltk.data.load('help/tagsets/upenn_tagset.pickle')
tags['RB']
I'm trying to write a Python script to collect the parameters of one specific function.
The parameters can span multiple lines, like this:
str = "getParameters(['ABCD_1','ABCD_2',\
'ABCD_3','ABCD_4'])"
This already works (it catches every word between ' and '):
parameters = re.findall(r'\'[\w-]+\'', str)
for parameter in parameters:
    print(parameter)
But I want the parameters to be collected only for the getParameters function, and this does not work:
getparameters = re.findall(r'getParameters\(\[[\w-]+', str, re.X|re.DOTALL)
for line in getparameters:
    print(line)
Please suggest!
Here is an example using ast, just for fun.
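import ast

module = ast.parse(
    """getParameters(['ABCD_1','ABCD_2',
'ABCD_3','ABCD_4'])""")
for item in module.body:
    # Only expression statements carry a .value that can hold the call
    if (isinstance(item, ast.Expr) and isinstance(item.value, ast.Call)
            and getattr(item.value.func, 'id', None) == 'getParameters'):
        # literal_eval turns each string-literal node into its Python value
        parameters = [ast.literal_eval(each) for each in item.value.args[0].elts]
        print(parameters)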
If you're fixed on using RegEx and your function occurs exactly once, you can try the following, where x is the input string:
re.findall(r"'(\w+)',?", re.search(r'(getParameters\(.+?\))', x, re.X|re.S).group(1), re.X|re.S)
It's not ideal, but it works. I am sure there is a better way to do this.
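The same two-step idea can be spelled out a little more readably: first isolate the bracketed argument list of getParameters, then pull the quoted names out of it. This is only a sketch; the sample string is taken from the question and the variable names are illustrative.

```python
import re

source = "getParameters(['ABCD_1','ABCD_2',\n    'ABCD_3','ABCD_4'])"

# Step 1: find the argument list of getParameters, even across line breaks
call = re.search(r"getParameters\(\[(.*?)\]\)", source, re.DOTALL)

# Step 2: extract each quoted name from that argument list only
parameters = re.findall(r"'([\w-]+)'", call.group(1)) if call else []
```

Quoted words elsewhere in the input are ignored, because the second pattern is applied only to the text captured in step 1.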
I'm currently working on an experiment where I'm implementing an interpreter for an old in-game scripting language. It's a Forth-based language, so I figure it would be fairly easy to just have the instructions (once verified and sanitized) put into a big list.
Once I've got the code in a list, I am trying to iterate through the entire program in a for loop that processes the instructions one at a time. Certain items, like strings, could be placed onto a variable that holds the current stack, which is easy enough. But where I'm stuck is making commands happen.
I have a big list of functions that are valid, and if any instruction matches one of them, I'd like the associated function to be called.
So, for example, if I had:
"Hello, world!" notify
...the code would check for notify in a list and then execute the notify function. The bottom line is: How do I translate a string into a function name?
You could keep a dictionary of functions the code can call, and then do a look up when you need to:
def notify(s):
    print(s)
d = {"notify": notify}
d["notify"]("Hello, world!")
You can do it through locals, which returns a dictionary with the current local symbol table:
locals()["notify"]()
or through globals, which returns a dictionary with the global symbol table:
globals()["notify"]()
You can give arguments too e.g.:
locals()["notify"]("Hello, world!")
or
globals()["notify"]("Hello, world!")
If you have a dict called commands that maps names to functions, you can do it like this:
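def my_notify_function():
    print(stack.pop())  # pop must be called, not just referenced

commands = {'notify': my_notify_function}  # ... plus the other commands
for item in program:
    if item in commands:
        commands[item]()
    else:
        stack.push(item)  # use stack.append(item) if stack is a plain list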
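Put together, a tiny interpreter loop built on such a dispatch dictionary might look like the sketch below. Everything here is illustrative: the dup word, the output list, and the use of shlex.split for tokenizing are assumptions, not part of the original language.

```python
import shlex

stack = []
output = []  # collected here instead of printed, so the result is easy to inspect


def notify():
    """Pop the top of the stack and record it as a notification."""
    output.append(stack.pop())


def dup():
    """Duplicate the top of the stack (a classic Forth word)."""
    stack.append(stack[-1])


commands = {"notify": notify, "dup": dup}


def run(source):
    # shlex.split keeps a quoted string together as a single token
    for token in shlex.split(source):
        if token in commands:
            commands[token]()
        else:
            stack.append(token)


run('"Hello, world!" dup notify notify')
```

One limitation of this sketch is that shlex.split drops the quotes, so a quoted string that happens to equal a command name would be executed; a real tokenizer would need to keep strings and words apart.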
Something like:
import re

class LangLib(object):
    pattern = re.compile(r'"(.*?)" (.*)')

    def run_line(self, line):
        arg, command = re.match(LangLib.pattern, line).groups()
        return getattr(self, command)(arg)

    def notify(self, arg):
        print(arg)
Then your engine code would be:
parser = LangLib()
for line in program_lines:
    parser.run_line(line)
Create a dictionary of function names and some tags.
I have tried it several times before, and it works really well.