String formatting without index in python2.6

I've got many thousands of lines of python code that has python2.7+ style string formatting (e.g. without indices in the {}s)
"{} {}".format('foo', 'bar')
I need to run this code under python2.6 which requires the indices.
I'm wondering if anyone knows of a painless way to allow Python 2.6 to run this code. It'd be great if there were a from __future__ import blah solution to the problem, but I don't see one. Something along those lines would be my first choice.
A distant second would be some script that can automate the process of adding the indices, at least in the obvious cases:
"{0} {1}".format('foo', 'bar')

It doesn't quite preserve the whitespacing and could probably be made a bit smarter, but it will at least identify Python strings (apostrophes/quotes/multi line) correctly without resorting to a regex or external parser:
import tokenize
from itertools import count
import re
with open('your_file') as fin:
    output = []
    tokens = tokenize.generate_tokens(fin.readline)
    for num, val in (token[:2] for token in tokens):
        if num == tokenize.STRING:
            val = re.sub('{}', lambda L, c=count(): '{{{0}}}'.format(next(c)), val)
        output.append((num, val))
print tokenize.untokenize(output) # write to file instead...
Example input:
s = "{} {}".format('foo', 'bar')
if something:
    do_something('{} {} {}'.format(1, 2, 3))
Example output (note slightly iffy whitespacing):
s ="{0} {1}".format ('foo','bar')
if something :
    do_something ('{0} {1} {2}'.format (1 ,2 ,3 ))

You could define a function to re-format your format strings:
def reformat(s):
    return "".join("".join((x, str(i), "}"))
                   for i, x in list(enumerate(s.split("}")))[:-1])
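Used interactively, that gives, for example:
>>> reformat("{} {}")
'{0} {1}'
Note that it treats every closing brace as the end of an empty {} placeholder and drops anything after the last }, so strings containing literal braces or already-indexed fields would need a smarter version.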

Maybe a good old sed-regex like:
sed source.py -e 's/{}/%s/g; s/\.format(/ % (/'
your example would get changed to something like:
"%s %s" % ('foo', 'bar')
Granted, you lose the fancy new-style .format(), but imho it's almost never useful for trivial value insertions.

A conversion script could be pretty simple. You can find strings to replace with regex:
import re

fmt = "['\"][^'\"]*{}.*?['\"]\.format"
str1 = "x; '{} {}'.format(['foo', 'bar'])"
str2 = "This is a function; 'First is {}, second is {}'.format(['x1', 'x2']); some more code"
str3 = "This doesn't have anything but a format. format(x)"
str4 = "This has an old-style format; '{0} {1}'.format(['some', 'list'])"
str5 = "'{0}'.format(1); '{} {}'.format(['x', 'y'])"
def add_format_indices(instr):
    text = instr.group(0)
    i = 0
    while '{}' in text:
        text = text.replace('{}', '{%d}' % i, 1)
        i = i + 1
    return text

def reformat_text(text):
    return re.sub(fmt, add_format_indices, text)
>>> reformat_text(str1)
"x; '{0} {1}'.format(['foo', 'bar'])"
>>> reformat_text(str2)
"This is a function; 'First is {0}, second is {1}'.format(['x1', 'x2']); some more code"
>>> reformat_text(str3)
"This doesn't have anything but a format. format(x)"
>>> reformat_text(str4)
"This has an old-style format; '{0} {1}'.format(['some', 'list'])"
>>> reformat_text(str5)
"'{0}'.format(1); '{0} {1}'.format(['x', 'y'])"
I think you could throw a whole file through this. You can probably find a faster implementation of add_format_indices, and obviously it hasn't been tested a whole lot.
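To run it over a whole file, something along these lines should work (just a sketch; the filenames are placeholders, and it writes to a separate output file rather than converting in place):
with open('your_file.py') as fin:
    source = fin.read()
with open('your_file.converted.py', 'w') as fout:
    fout.write(reformat_text(source))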
Too bad there isn't an import __past__, but in general that's not something usually offered (see the 2to3 script for an example), so this is probably your next best option.

Related

How to do nothing while using replace() string method?

I am working with some strings and I am removing some characters from them by using replace(), for example:
a = 'monsterr'
new_a = a.replace("rr", "r")
new_a
However, let's say that now I receive the following string:
In:
a = 'difference'
new_a = a.replace("rr", "r")
new_a
Out:
'difference'
How can I return nothing if my string doesn't contain rr? Is there any way to just pass or return nothing? I tried:
def check(a_str):
    if 'rr' in a_str:
        a_str = a_str.replace("rr", "r")
        return a_str
    else:
        pass
However, it doesn't work. The expected output for 'monster' would be nothing.
Use return:
def check(a_str):
    if 'rr' in a_str:
        a_str = a_str.replace("rr", "r")
        return a_str
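With this version a string that has no 'rr' simply falls off the end of the function, so Python returns None implicitly; a quick interactive check looks like:
>>> check('monsterr')
'monster'
>>> check('difference') is None
True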
For list comprehension:
a = ["difference", "hinderr"]
x = [i.replace("rr", "r") for i in a]
Just as a little easter egg, I figured I'd include this little gem as an option as well, if only because of your question:
How can I return nothing if my string doesn't contain rr? Is there any way to just pass or return nothing?
Using boolean operators, you could take the if line completely out of check().
def check(text, dont_want='rr', want='r'):
    replacement = text.replace(dont_want, want)
    return replacement != text and replacement or None
    # checks if there was a change after replacing,
    # if True: returns replacement
    # if False: returns None
test = "differrence"
check(test)
#difference
test = "difference"
check(test)
#None
Consider this un-pythonic or not, it's another option, and it's along the lines of the question:
"return none if string doesn't contain rr"
For those who don't know how or why this works (and/or enjoy learning cool Python tricks but don't know this one), here's the docs page explaining boolean operators.
P.S.
Technically speaking, it is un-pythonic since it's effectively a ternary operation, which goes against the "Zen of Python" (import this), but coming from C-style languages I enjoy them.
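For reference, the same check written with Python's built-in conditional expression (available since 2.5) would be:
def check(text, dont_want='rr', want='r'):
    replacement = text.replace(dont_want, want)
    return replacement if replacement != text else None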

Counting words in a dictionary (Python)

I have this code, which I want to open a specified file, count every while loop in it, and finally output the total number of while loops in that file. I decided to convert the input file to a dictionary, then create a for loop so that every time the word while followed by a space was seen it would add 1 to WHILE_, before finally printing WHILE_ at the end.
However this did not seem to work, and I am at a loss as to why. Any help fixing this would be much appreciated.
This is the code I have at the moment:
WHILE_ = 0
INPUT_ = input("Enter file or directory: ")
OPEN_ = open(INPUT_)
READLINES_ = OPEN_.readlines()
STRING_ = (str(READLINES_))
STRIP_ = STRING_.strip()
input_str1 = STRIP_.lower()
dic = dict()
for w in input_str1.split():
    if w in dic.keys():
        dic[w] = dic[w]+1
    else:
        dic[w] = 1
DICT_ = (dic)
for LINE_ in DICT_:
    if ("while\\n',") in LINE_:
        WHILE_ += 1
    elif ('while\\n",') in LINE_:
        WHILE_ += 1
    elif ('while ') in LINE_:
        WHILE_ += 1
print ("while_loops {0:>12}".format((WHILE_)))
This is the input file I was working from:
'''A trivial test of metrics
Author: Angus McGurkinshaw
Date: May 7 2013
'''
def silly_function(blah):
    '''A silly docstring for a silly function'''
    def nested():
        pass
    print('Hello world', blah + 36 * 14)
    tot = 0 # This isn't a for statement
    for i in range(10):
        tot = tot + i
    if_im_done = false # Nor is this an if
    print(tot)

blah = 3
while blah > 0:
    silly_function(blah)
    blah -= 1
while True:
    if blah < 1000:
        break
The output should be 2, but my code at the moment prints 0
This is an incredibly bizarre design. You're calling readlines to get a list of strings, then calling str on that list, which will join the whole thing up into one big string with the quoted repr of each line joined by commas and surrounded by square brackets, then splitting the result on spaces. I have no idea why you'd ever do such a thing.
Your bizarre variable names, extra useless lines of code like DICT_ = (dic), etc. only serve to obfuscate things further.
But I can explain why it doesn't work. Try printing out DICT_ after you do all that silliness, and you'll see that the only keys that include while are while and 'while. Since neither of these match any of the patterns you're looking for, your count ends up as 0.
It's also worth noting that you only add 1 to WHILE_ even if there are multiple instances of the pattern, so your whole dict of counts is useless.
This will be a lot easier if you don't obfuscate your strings, try to recover them, and then try to match the incorrectly-recovered versions. Just do it directly.
While I'm at it, I'm also going to fix some other problems so that your code is readable, and simpler, and doesn't leak files, and so on. Here's a complete implementation of the logic you were trying to hack up by hand:
import collections
filename = input("Enter file: ")
counts = collections.Counter()
with open(filename) as f:
    for line in f:
        counts.update(line.strip().lower().split())
print('while_loops {0:>12}'.format(counts['while']))
When you run this on your sample input, you correctly get 2. And extending it to handle if and for is trivial and obvious.
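For instance, keeping the same counts dict, that would just be a couple of extra prints (illustrative only):
print('for_loops {0:>14}'.format(counts['for']))
print('if_statements {0:>10}'.format(counts['if']))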
However, note that there's a serious problem in your logic: Anything that looks like a keyword but is in the middle of a comment or string will still get picked up. Without writing some kind of code to strip out comments and strings, there's no way around that. Which means you're going to overcount if and for by 1. The obvious way of stripping—line.partition('#')[0] and similarly for quotes—won't work. First, it's perfectly valid to have a string before an if keyword, as in "foo" if x else "bar". Second, you can't handle multiline strings this way.
These problems, and others like them, are why you almost certainly want a real parser. If you're just trying to parse Python code, the ast module in the standard library is the obvious way to do this. If you want to write quick & dirty parsers for a variety of different languages, try pyparsing, which is very nice, and comes with some great examples.
Here's a simple example:
import ast
filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())
while_loops = sum(1 for node in ast.walk(tree) if isinstance(node, ast.While))
print('while_loops {0:>12}'.format(while_loops))
Or, more flexibly:
import ast
import collections
filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())
counts = collections.Counter(type(node).__name__ for node in ast.walk(tree))
print('while_loops {0:>12}'.format(counts['While']))
print('for_loops {0:>14}'.format(counts['For']))
print('if_statements {0:>10}'.format(counts['If']))
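If a full AST parse ever feels like overkill, the standard tokenize module is a middle ground: keywords only ever show up as NAME tokens, so text inside strings and comments is never counted. A rough sketch, assuming the file tokenizes cleanly:
import tokenize
import collections

filename = input("Enter file: ")
counts = collections.Counter()
with open(filename) as f:
    # keywords and identifiers arrive as NAME tokens; the contents of
    # strings and comments never do, so 'while' in a docstring is ignored
    for tok in tokenize.generate_tokens(f.readline):
        if tok[0] == tokenize.NAME:
            counts[tok[1]] += 1
print('while_loops {0:>12}'.format(counts['while']))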

Python: New line at the same position

In python 2.7 how can you achieve the following feature:
print "some text here"+?+"and then it starts there"
the output on terminal should look like:
some text here
              and then it starts here
I have searched around and I think \r should do the work, but I tried it and it does not work. I am confused now.
BTW, is the \r solution portable?
P.S.
In my odd situation, knowing the length of the previous line is quite difficult for me, so any ideas other than using the length of the line above it?
==================================================================================
Okay, the situation is like this: I am writing a tree structure and I want to print it out nicely using its __str__ function:
class node:
    def __init__(self,key,childern):
        self.key = key
        self.childern = childern
    def __str__(self):
        return "Node:"+self.key+"Children:"+str(self.childern)
where Children is a list.
Every time it prints Children, I want it indented one more level than the last line. So I think I cannot predict the length of the line before the one I want to print.
\r is probably not a portable solution; the way it is rendered will depend on whatever text editor or terminal you're using. On older Mac systems, '\r' was used as the end-of-line character (on Windows it is '\r\n', and on Linux and OS X it is '\n').
You could simply do something like this:
def print_lines_at_same_position(*lines):
    prev_len = 0
    for line in lines:
        print " "*prev_len + line
        prev_len += len(line)
Usage example:
>>> print_lines_at_same_position("hello", "world", "this is a test")
hello
     world
          this is a test
>>>
This will only work if whatever you're outputting to uses a fixed-width font, though. I can't think of anything that will work otherwise.
Edit to fit changed question
Okay, so that's an entirely different question. I don't think there's any way to do it with it starting at exactly the position where the last line left off unless self.key has a predictable length. But you can get something pretty close with this:
class node:
    def __init__(self,key,children):
        self.key = key
        self.children = children
        self.depth = 0

    def set_depth(self, depth):
        self.depth = depth
        for child in self.children:
            child.set_depth(depth+1)

    def __str__(self):
        indent = " "*4*self.depth
        children_str = "\n".join(map(str, self.children))
        if children_str:
            children_str = "\n" + children_str
        return indent + "Node: %s%s" % (self.key, children_str)
Then just set the depth of the root node to 0 and do that again every time you change the structure of the tree. There are more efficient ways if you know exactly how you're changing the tree, you can probably figure those out yourself :)
Usage example:
>>> a = node("leaf", [])
>>> b = node("another leaf", [])
>>> c = node("internal", [a,b])
>>> d = node("root", [c])
>>> d.set_depth(0)
>>> print d
Node: root
    Node: internal
        Node: leaf
        Node: another leaf
>>>
You could use os.linesep to get a more portable linebreak, instead of just \r. I would then use len() to calculate the length of the 1st string in order to calculate whitespace.
>>> import os
>>> my_str = "some text here"
>>> print my_str + os.linesep + ' ' * len(my_str) + 'and then it starts here'
some text here
              and then it starts here
The key is ' ' * len(my_str). This will repeat the space character len(my_str) times.
The \r solution is not what you are looking for, since it is part of the Windows newline; on classic Mac systems it actually was the newline.
You would need code like the following:
def pretty_print(text):
    total = 0
    for element in text:
        print "{}{}".format(' '*total, element)
        total += len(element)
pretty_print(["lol", "apples", "are", "fun"])
Which will print the lines of text the way you want them to.
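For the example call above, the output would be:
lol
   apples
         are
            fun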
Try using the len("text") * ' ' to get the amount of white space you want.
To get a portable line break, use os.linesep
>>> import os
>>> os.linesep
'\n'
EDIT
Another option that might be suitable in some cases is to override the stdout stream.
import sys, os
class StreamWrap(object):
    TAG = '<br>' # use a string that suits your use case

    def __init__(self, stream):
        self.stream = stream

    def write(self, text):
        tokens = text.split(StreamWrap.TAG)
        indent = 0
        for i, token in enumerate(tokens):
            self.stream.write(indent*' ' + token)
            if i < len(tokens)-1:
                self.stream.write(os.linesep)
            indent += len(token)

    def flush(self):
        self.stream.flush()
sys.stdout = StreamWrap(sys.stdout)
print "some text here"+ StreamWrap.TAG +"and then it starts there"
This will give you a result like this:
$ python test.py
some text here
              and then it starts there

Python; reading file and finding desired text

Need to create a function with two params, a filename to open and a pattern.
The pattern will be a search string.
Eg. the function will open sentence.txt that has something like "The quick brown fox" (can possibly be more than one line)
The pattern will be "brown fox"
So if found, as this will be, it should return a line number and index of the character the found string starts on. Else, return -1.
The catch is I've never programmed in Python before, so I don't know the syntax.
Previously coded in C, C#, Java, VB, etc..
EDIT:
.....Id
.....Name
#
my intent was for you to write HW3 code as iteration or
nested iterations that explicitly index the character
string as an array; i.e, the Python index() also known as
string.index() function is not allowed for this homework.
#
filename = raw_input('Enter filename: ')
pattern = raw_input('Enter pattern: ')
def findPattern(fname, pat):
    # Reading in one whole chunk
    filetext = open(fname).read()
    if pat in filetext:
        print("Found it -- chunk")
    else:
        print("Nothing -- chunk")

    # Reading in line by line
    for search in open(fname):
        if pat in search:
            print("Found it -- line")
        else:
            print("Nothing -- line")
findPattern(filename, pattern)
You can simulate a simple "grep" with the "in" operator:
def grep(filename, pattern):
    for n,line in enumerate(open(filename)):
        if pattern in line:
            print line, n
To get the index, you can use str.index() or str.find().
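Putting the two together, something like this (untested, and the name is just for illustration) gives the (line number, character index) pair the question asks for, or -1 if nothing matches; note that enumerate starts counting lines at 0:
def find_pattern(filename, pattern):
    for n, line in enumerate(open(filename)):
        idx = line.find(pattern)
        if idx != -1:
            return n, idx
    return -1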
Here's a very simple grep. You could hack it to use regular expressions pretty trivially, and globbing wouldn't be much more difficult with glob. Also, the code you want is in there, spread between grep and main, so that might be of more interest than a custom grep ;)
def grep(filename, needle):
    with open(filename) as f_in:
        matches = ((i, line.find(needle), line) for i, line in enumerate(f_in))
        return [match for match in matches if match[1] != -1]

def main(filename, needle):
    matches = grep(filename, needle)
    if matches:
        print "{0} found on {1} lines in {2}".format(needle, len(matches), filename)
        for line in matches:
            print "{0}:{1}:{2}".format(*line)
        return 1
    else:
        return -1

if __name__=='__main__':
    import sys
    filename = sys.argv[1]
    needle = sys.argv[2]
    sys.exit(main(filename, needle))
Note that I haven't tested this code so there might be slight bugs. If it compiles, it should run fine though.
Also, you should tell your teacher that signalling failure with return codes is a terrible way to do things. If the caller of the function that you're going to write needs to know if no matches were found, it can just check for an empty list.
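For completeness, since the assignment explicitly rules out str.index() and friends, a hand-rolled scan with nested loops that indexes the strings directly might look roughly like this (a sketch, not tested against the grader's requirements; it returns -1 on failure because the assignment asks for that, despite the caveat above):
def find_pattern_by_hand(filename, pattern):
    for line_no, line in enumerate(open(filename)):
        # slide a starting position across the line ...
        for start in range(len(line) - len(pattern) + 1):
            matched = True
            # ... and compare the pattern character by character
            for offset in range(len(pattern)):
                if line[start + offset] != pattern[offset]:
                    matched = False
                    break
            if matched:
                return line_no, start
    return -1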

How to refactor this python code block to be more efficient

This code block works - it loops through a file that has a repeating number of sets of data
and extracts out each of the 5 pieces of information for each set.
But I know that the current factoring is not as efficient as it can be, since it loops through each key for each line found.
I'm wondering if some Python gurus can offer a better way to do this more efficiently.
def parse_params(num_of_params,lines):
    for line in lines:
        for p in range(1,num_of_params + 1,1):
            nam = "model.paramName "+str(p)+" "
            par = "model.paramValue "+str(p)+" "
            opt = "model.optimizeParam "+str(p)+" "
            low = "model.paramLowerBound "+str(p)+" "
            upp = "model.paramUpperBound "+str(p)+" "
            keys = [nam,par,opt,low,upp]
            for key in keys:
                if key in line:
                    a,val = line.split(key)
                    if key == nam: names.append(val.rstrip())
                    if key == par: params.append(val.rstrip())
                    if key == opt: optimize.append(val.rstrip())
                    if key == upp: upper.append(val.rstrip())
                    if key == low: lower.append(val.rstrip())
    print "Names = ",names
    print "Params = ",params
    print "Optimize = ",optimize
    print "Upper = ",upper
    print "Lower = ",lower
Though this doesn't answer your question (other answers are getting at that), something that has helped me a lot in doing things similar to what you're doing is list comprehensions. They allow you to build lists in a concise and (I think) easy-to-read way.
For instance, the code below builds a 2-dimensional array with the values you're trying to get at. some_funct here would be a little regex, if I were doing it, that uses the index of the last space in the key as the parameter, looks ahead to collect the value you're trying to get in the line (the value which corresponds to the key currently being looked at), and appends it to the correct index in the seen_keys 2D array.
Wordy, yes, but if you get list comprehensions and you're able to construct the regex to do that, you've got a nice, concise solution.
keys = ["model.paramName ","model.paramValue ","model.optimizeParam ""model.paramLowerBound ","model.paramUpperBound "]
for line in lines:
seen_keys = [[],[],[],[],[]]
[seen_keys[keys.index(k)].some_funct(line.index(k) for k in keys if k in line]
It's not totally easy to see the expected format. From what I can see, the format is like:
lines = [
"model.paramName 1 foo",
"model.paramValue 2 bar",
"model.optimizeParam 3 bat",
"model.paramLowerBound 4 zip",
"model.paramUpperBound 5 ech",
"model.paramName 1 foo2",
"model.paramValue 2 bar2",
"model.optimizeParam 3 bat2",
"model.paramLowerBound 4 zip2",
"model.paramUpperBound 5 ech2",
]
I don't see the above code working if there is more than one value in each line. Which means the digit is not really significant unless I'm missing something. In that case this works very easily:
import re
def parse_params(num_of_params,lines):
    key_to_collection = {
        "model.paramName": names,
        "model.paramValue": params,
        "model.optimizeParam": optimize,
        "model.paramLowerBound": upper,
        "model.paramUpperBound": lower,
    }
    reg = re.compile(r'(.+?) (\d) (.+)')
    for line in lines:
        m = reg.match(line)
        key, digit, value = m.group(1, 2, 3)
        key_to_collection[key].append(value)
It's not entirely obvious from your code, but it looks like each line can have one "hit" at most; if that's indeed the case, then something like:
import re
def parse_params(num_of_params, lines):
    sn = 'Names Params Optimize Upper Lower'.split()
    ks = '''paramName paramValue optimizeParam
            paramLowerBound paramUpperBound'''.split()
    vals = dict((k, []) for k in ks)
    are = re.compile(r'model\.(%s) (\d+) (.*)' % '|'.join(ks))
    for line in lines:
        mo = are.search(line)
        if not mo: continue
        p = int(mo.group(2))
        if p < 1 or p > num_of_params: continue
        vals[mo.group(1)].append(mo.group(3).rstrip())
    for k, s in zip(ks, sn):
        print '%-8s =' % s,
        print vals[k]
might work -- I exercised it with a little code as follows:
if __name__ == '__main__':
    lines = '''model.paramUpperBound 1 ZAP
               model.paramLowerBound 1 zap
               model.paramUpperBound 5 nope'''.splitlines()
    parse_params(2, lines)
and it emits
Names = []
Params = []
Optimize = []
Upper = ['zap']
Lower = ['ZAP']
which I think is what you want (if some details must differ, please indicate exactly what they are and let's see if we can fix it).
The two key ideas are: use a dict instead of lots of ifs; use a re to match "any of the following possibilities" with parenthesized groups in the re's pattern to catch the bits of interest (the keyword after model., the integer number after that, and the "value" which is the rest of the line) instead of lots of if x in y checks and string manipulation.
There is a lot of duplication there, and if you ever add another key or param, you're going to have to add it in many places, which leaves you ripe for errors. What you want to do is pare down all of the places you have repeated things and use some sort of data model, such as a dict.
Some others have provided some excellent examples, so I'll just leave my answer here to give you something to think about.
Are you sure that parse_params is the bottle-neck? Have you profiled your app?
import re
from collections import defaultdict
names = ("paramName paramValue optimizeParam "
"paramLowerBound paramUpperBound".split())
stmt_regex = re.compile(r'model\.(%s)\s+(\d+)\s+(.*)' % '|'.join(names))
def parse_params(num_of_params, lines):
stmts = defaultdict(list)
for m in (stmt_regex.match(s) for s in lines):
if m and 1 <= int(m.group(2)) <= num_of_params:
stmts[m.group(1)].append(m.group(3).rstrip())
for k, v in stmts.iteritems():
print "%s = %s" % (k, ' '.join(v))
The code given in the OP does multiple tests per line to try to match against the expected set of values, each of which is being constructed on the fly. Rather than construct paramValue1, paramValue2, etc. for each line, we can use a regular expression to try to do the matching in a cheaper (and more robust) manner.
Here's my code snippet, drawing from some ideas that have already been posted. This lets you add a new keyword to the key_to_collection dictionary and not have to change anything else.
import re
def parse_params(num_of_params, lines):
    pattern = re.compile(r"""
        model\.
        (.+?)   # keyword
        [ ]+    # whitespace
        (\d+)   # index to keyword
        [ ]+    # whitespace
        (.+)    # value
        """, re.VERBOSE)
    key_to_collection = {
        "paramName": names,
        "paramValue": params,
        "optimizeParam": optimize,
        "paramLowerBound": upper,
        "paramUpperBound": lower,
    }
    for line in lines:
        match = pattern.match(line)
        if not match:
            print "Invalid line: " + line
        elif match.group(1) not in key_to_collection:
            print "Invalid key: " + line
        # Not sure if you really care about enforcing this
        elif int(match.group(2)) > num_of_params:
            print "Invalid param: " + line
        else:
            key_to_collection[match.group(1)].append(match.group(3))
Full disclosure: I have not compiled/tested this.
It can certainly be made more efficient. But, to be honest, unless this function is called hundreds of times a second, or works on thousands of lines, is it necessary?
I would be more concerned about making it clear what is happening... currently, I'm far from clear on that aspect.
Just eyeballing it, the input seems to look like this:
model.paramName 1 A
model.paramValue 1 B
model.optimizeParam 1 C
model.paramLowerBound 1 D
model.paramUpperBound 1 E
model.paramName 2 F
model.paramValue 2 G
model.optimizeParam 2 H
model.paramLowerBound 2 I
model.paramUpperBound 2 J
And your desired output seems to be something like:
Names = AF
Params = BG
etc...
Now, since my input certainly doesn't match yours, the output is likely off too, but I think I have the gist.
There are a few points. First, does it matter how many parameters are passed to the function? For example, if the input has two sets of parameters, do I just want to read both, or is it necessary to allow the function to only read one? For example, your code allows me to call parse_params(1,1) and have it only read parameters ending in a 1 from the same input. If that's not actually a requirement, you can skip a large chunk of the code.
Second, is it important to ONLY read the given parameters? If I, for example, have a parameter called 'paramFoo', is it bad if I read it? You can also simplify the procedure by just grabbing all parameters regardless of their name, and extracting their value.
import re

def parse_params(input):
    parameter_list = {}
    param = re.compile(r"model\.([^ ]+) [0-9]+ ([^ ]+)")
    each_parameter = param.finditer(input)
    for match in each_parameter:
        key = match.group(1)
        value = match.group(2)
        if key not in parameter_list:
            parameter_list[key] = []
        parameter_list[key].append(value)
    return parameter_list
The output, in this instance, will be something like this:
{'paramName':[A, F], 'paramValue':[B, G], 'optimizeParam':[C, H], etc...}
Notes: I don't know Python well, I'm a Ruby guy, so my syntax may be off. Apologies.
