Python: replacing multiple words in a text file from a dictionary

I am having trouble figuring out where I'm going wrong. So I need to randomly replace words and re-write them to the text file, until it no longer makes sense to anyone else. I chose some words just to test it, and have written the following code which is not currently working:
# A program to read a file and replace words until it is no longer understandable
word_replacement = {'Python':'Silly Snake', 'programming':'snake charming', 'system':'table', 'systems':'tables', 'language':'spell', 'languages':'spells', 'code':'snake', 'interpreter':'charmer'}
main = open("INF108.txt", 'r+')
words = main.read().split()
main.close()
for x in word_replacement:
    for y in words:
        if word_replacement[x][0]==y:
            y==x[1]
text = " ".join(words)
print text
new_main = open("INF108.txt", 'w')
new_main.write(text)
new_main.close()
This is the text in the file:
Python is a widely used general-purpose, high-level programming
language. It's design philosophy emphasizes code readability, and its
syntax allows programmers to express concepts in fewer lines of code
than would be possible in languages such as C++ or Java. The language
provides constructs intended to enable clear programs on both a small
and large scale. Python supports multiple programming paradigms,
including object-oriented, imperative and functional programming or
procedural styles. It features a dynamic type system and automatic
memory management and has a large and comprehensive standard
library. Python interpreters are available for installation on many
operating systems, allowing Python code execution on a wide variety
of systems. Using third-party tools, such as Py2exe or Pyinstaller,
Python code can be packaged into stand-alone executable programs for
some of the most popular operating systems, allowing for the
distribution of Python-based software for use on those environments
without requiring the installation of a Python interpreter.
I've tried a few approaches, but as someone new to Python it's been a matter of guessing. I've spent the last two days researching this online, and most of the answers I've found are either far too complicated for me to understand, or are specific to that person's code and don't help me.

OK, let's take this step by step.
main = open("INF108.txt", 'r+')
words = main.read().split()
main.close()
Better to use the with statement here. Also, r is the default mode. Thus:
with open("INF108.txt") as main:
    words = main.read().split()
Using with will make main.close() get called automatically for you when this block ends; you should do the same for the file write at the end as well.
Now for the main bit:
for x in word_replacement:
    for y in words:
        if word_replacement[x][0]==y:
            y==x[1]
This little section has several misconceptions packed into it:
Iterating over a dictionary (for x in word_replacement) gives you its keys only. Thus, when you want to compare later on, you should just be checking x == y (the key against the word). Doing word_replacement[x][0] gives you the first letter of the replacement text, not the word you want to match.
Iterating over the dictionary is defeating the purpose of having a dictionary in the first place. Just loop over the words you want to replace, and check if they're in the dictionary using y in word_replacement.
y == x[1] is wrong in two ways. First of all, you probably meant to be assigning to y there, not comparing (note the single = sign in an assignment), and x[1] is just the second letter of the key string; the value you want is word_replacement[x]. Second, assigning to a loop variable doesn't even do what you want. y will just get overwritten with a new value next time around the loop, and the words data will NOT get changed at all.
What you want to do is create a new list of possibly-replaced words, like so:
replaced = []
for y in words:
    if y in word_replacement:
        replaced.append(word_replacement[y])
    else:
        replaced.append(y)
text = ' '.join(replaced)
Now let's do some refinement. Dictionaries have a handy get method that lets you get a value if the key is present, or a default if it's not. If we just use the word itself as a default, we get a nifty reduction:
replaced = []
for y in words:
    replacement = word_replacement.get(y, y)
    replaced.append(replacement)
text = ' '.join(replaced)
Which you can just turn into a one-line list-comprehension:
text = ' '.join(word_replacement.get(y, y) for y in words)
And now we're done.
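Putting the pieces together, here is a minimal sketch of the whole corrected script (using print() so it also runs on Python 3; the file name and replacement table are taken from the question):
word_replacement = {'Python': 'Silly Snake', 'programming': 'snake charming',
                    'system': 'table', 'systems': 'tables', 'language': 'spell',
                    'languages': 'spells', 'code': 'snake', 'interpreter': 'charmer'}

# Read the original words, build the replaced text, then write it back.
with open("INF108.txt") as main:
    words = main.read().split()

text = " ".join(word_replacement.get(y, y) for y in words)
print(text)

with open("INF108.txt", 'w') as new_main:
    new_main.write(text)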

It looks like you want something like this as your if statement in the nested loops:
if x==y:
    y=word_replacement[x]
When you loop over a dictionary, you get its keys, not key-value pairs:
>>> mydict={'Python':'Silly Snake', 'programming':'snake charming', 'system':'table'}
>>> for i in mydict:
...     print i
Python
programming
system
You can then get the value with mydict[i].
This doesn't quite work, though, because assigning to y doesn't change that element of words. You can loop over its indices instead of elements to assign to the current element:
for x in word_replacement:
    for y in range(len(words)):
        if x==words[y]:
            words[y]=word_replacement[x]
I'm using range() and len() here to get the list of indices of words ([0, 1, 2, ...]).

Your issue is probably here:
if word_replacement[x][0]==y:
Here's a small example of what is actually happening, which is probably not what you intended:
w = {"Hello": "World", "Python": "Awesome"}
print w["Hello"]
print w["Hello"][0]
Which should print:
World
W
You should be able to figure out how to correct the code from here.

You used word_replacement (which is a dictionary) in the wrong way. You should change your loop to something like this:
for y in words:
    if y in word_replacement:
        words[words.index(y)] = word_replacement[y]
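A side note: words.index(y) rescans the list from the start on every replacement, which gets slow for long texts. A small alternative sketch using enumerate keeps track of each position directly:
for i, y in enumerate(words):
    if y in word_replacement:
        words[i] = word_replacement[y]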

Related

How to detect the amount of almost-repetition in a text file?

I am a programming teacher, and I would like to write a script that detects the amount of repetition in a C/C++/Python file. I guess I can treat any file as pure text.
The script's output would be the number of similar sequences that repeat. Ultimately, I am only interested in a DRY metric (how well the code satisfies the DRY principle).
Naively I tried to do a simple autocorrelation but it would be hard to find the proper threshold.
import numpy as np
import matplotlib.pyplot as plt

u = open("find.c").read()
v = [ord(x) for x in u]
y = np.correlate(v, v, mode="same")
y = y[: int(len(y) / 2)]
x = range(len(y))
z = np.polyval(np.polyfit(x, y, 3), x)
f = (y - z)[: -5]
plt.plot(f)
plt.show()
So I am looking at different strategies... I also tried to compare the similarities between each line, each group of 2 lines, each group of 3 lines ...
import difflib
import numpy as np

lines = open("b.txt").readlines()
lines = [line.strip() for line in lines]
n = 3
d = []
for i in range(len(lines)):
    a = lines[i:i+n]
    for j in range(len(lines)):
        b = lines[j:j+n]
        if i == j: continue  # skip same line
        group_size = np.sum([len(x) for x in a])
        if group_size < 5: continue  # skip short lines
        ratio = 0
        for u, v in zip(a, b):
            r = difflib.SequenceMatcher(None, u, v).ratio()
            ratio += r if r > 0.7 else 0
        d.append(ratio)
dry = sum(d) / len(lines)
In the following, we can identify some repetition at a glance:
w = int(len(d) / 100)
e = np.convolve(d, np.ones(w), "valid") / w * 10
plt.plot(range(len(d)), d, range(len(e)), e)
plt.show()
Why not use:
d = np.exp(np.array(d))
So the difflib module looks promising: SequenceMatcher does some magic (Levenshtein?), but I would need some magic constants as well (0.7)... However, this code is worse than O(n^2) and runs very slowly for long files.
What is funny is that the amount of repetition is quite easy to identify with attentive eyes (sorry to this student for having taken his code as a good bad example).
I am sure there is a more clever solution out there.
Any hint?
I would build a system based on compressibility, because that is essentially what repetition means. Modern compression algorithms already look for ways to reduce repetition, so let's piggyback on that work.
Things that are similar will compress well under any reasonable compression algorithm, e.g. LZ. Under the hood, the compressed output is essentially the text with references back to itself, which you might be able to pull out.
Write a program that feeds lines [0:n] into the compression algorithm and compares its output length to that for lines [0:n+1].
When the compressed output grows by much less than the input did, note down that you potentially have a DRY candidate at that location; and if you can figure out the format, you can see what previous text it was deemed similar to.
If you can figure out the compression format, you don't need to rely on the "size doesn't grow as much" heuristic, you can just pull out the references directly.
If needed, you can find similar structures with different names by pre-processing the input, for instance by normalizing the names. However I foresee this getting a bit messy, so it's a v2 feature. Pre-processing can also be used to normalize the formatting.
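The answer doesn't name a particular library, but here is a minimal sketch of the incremental-compression heuristic using zlib; the thresholds (growth of under 5 bytes, lines longer than 10 characters) are arbitrary guesses that would need tuning:
import zlib

def compression_gains(lines):
    # For each line, record how many bytes the compressed prefix grows
    # when that line is appended; small growth hints at repetition.
    gains = []
    prev_size = len(zlib.compress(b""))
    text = b""
    for line in lines:
        text += line.encode() + b"\n"
        size = len(zlib.compress(text, 9))
        gains.append(size - prev_size)
        prev_size = size
    return gains

lines = open("b.txt").read().splitlines()
for number, (line, gain) in enumerate(zip(lines, compression_gains(lines)), 1):
    if len(line.strip()) > 10 and gain < 5:
        print(f"possible repetition at line {number}: {line.strip()}")
Note that recompressing every prefix is itself quadratic in the file size, so this is a starting point rather than a fast tool.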
Looks like you're choosing a long path. I wouldn't go there.
I would look into trying to minify the code before analyzing it. To completely remove any influence of variable names, extra spacing, formatting and even slight logic reshuffling.
Another approach would be comparing the byte-code of the students. But it may not be a very good idea, since the result will likely have to be cleaned up further.
The dis module would be an interesting option.
I would, most likely, settle on comparing their ASTs. But an AST is likely to give false positives for short functions, because their structure may be too similar, so consider checking short functions with something else, something trivial.
On top of that, I would consider using Levenshtein distance or something similar to numerically calculate the differences between the students' byte-code/sources/ASTs/dis output. This would be what? Almost O(N^2)? Shouldn't matter.
Or, if needed, make it more complex and calculate the distance between each function of student A and each function of student B, highlighting cases where the distance is too short. It may not be needed, though.
With simplification and normalization of the input, more algorithms should start returning good results. If a student is good enough to take someone's code and reshuffle not only the variables, but the logic and maybe even improve the algo, then this student understands the code well enough to defend it and use it with no help in future. I guess, that's the kind of help a teacher would want to be exchanged between students.
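As an illustration of the AST route, here is a minimal sketch that blanks out identifiers and constants before comparing two hypothetical student files; SequenceMatcher on long dumps is slow, so treat it as a starting point rather than a finished tool:
import ast
import difflib

def normalized_ast(path):
    # Parse a Python file and dump its AST with names and constants blanked out,
    # so renamed variables and changed literals do not affect the comparison.
    tree = ast.parse(open(path).read())
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.Constant):
            node.value = None
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            node.name = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
    return ast.dump(tree)

similarity = difflib.SequenceMatcher(
    None, normalized_ast("student_a.py"), normalized_ast("student_b.py")
).ratio()
print(f"structural similarity: {similarity:.2f}")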
You can treat this as a variant of the longest common subsequence problem between the input and itself, where the trivial matching of each element with itself is disallowed. This retains the optimal substructure of the standard algorithm, since it can be phrased as a non-transitive “equality” and the algorithm never relies on transitivity.
As such, we can write this trivial implementation:
import operator

class Repeat:
    def __init__(self, l):
        self.l = list(l)
        self.memo = {}

    def __call__(self, m, n):
        l = self.l
        memo = self.memo
        k = m, n
        ret = memo.get(k)
        if not ret:
            if not m or not n:
                ret = 0, None
            elif m != n and l[m-1] == l[n-1]:  # critical change here!
                z, tail = self(m-1, n-1)
                ret = z+1, ((m-1, n-1), tail)
            else:
                ret = max(self(m-1, n), self(m, n-1), key=operator.itemgetter(0))
            memo[k] = ret
        return ret

    def go(self):
        n = len(self.l)
        v = self(n, n)[1]
        ret = []
        while v:
            x, v = v
            ret.append(x)
        ret.reverse()
        return ret

def repeat(l):
    return Repeat(l).go()
You might want to canonicalize lines of code by removing whitespace (except perhaps between letters), removing comments, and/or replacing each unique identifier with a standardized label. You might also want to omit trivial lines like } in C/C++ to reduce noise. Finally, the symmetry should allow only cases with, say, m>=n to be treated.
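For example, a hypothetical run on one canonicalized file (here called student.py) might look like this; note that the memoized recursion is O(n^2) in time and memory and can hit Python's recursion limit on long inputs:
lines = [line.strip() for line in open("student.py") if line.strip()]
pairs = repeat(lines)   # (i, j) index pairs with lines[i] == lines[j] and i != j
print(f"{len(pairs)} matched line pairs out of {len(lines)} lines")
for i, j in pairs[:10]:
    print(f"line {i + 1} also appears as line {j + 1}: {lines[i]}")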
Of course, there are also "real" answers and real research on this issue!
Frame challenge: I’m not sure you should do this
It'd be a fun programming challenge for yourself, but if you intend to use it as a teaching tool, I'm not sure I would. There's not a good definition of "repeat" from the DRY principle that would be easy to test for fully in a computer program. The human definition, which I'd say is basically "failure to properly abstract your code at an appropriate level, manifested via some type of repetition of code, whether repeating exact blocks, repeating the same idea over and over again, or somewhere in between," isn't something I think anyone will be able to get working well enough at this time to use as a tool that teaches good habits with respect to DRY without confusing the student or teaching bad habits too. For now I'd argue this is a job for humans because it's easy for us and hard for computers, at least for now.
That said if you want to give it a try, first define for yourself requirements for what errors you want to catch, what they’ll look like, and what good code looks like, and then define acceptable false positive and false negative rates and test your code on a wide variety of representative inputs, validating your code against human judgement to see if it performs well enough for your intended use. But I’m guessing you’re really looking for more than simple repetition of tokens, and if you want to have a chance at succeeding I think you need to clearly define what you’re looking for and how you’ll measure success and then validate your code. A teaching tool can do great harm if it doesn’t actually teach the correct lesson. For example if your tool simply encourages students to obfuscate their code so it doesn’t get flagged as violating DRY, or if the tool doesn’t flag bad code so the student assumes it’s ok. Or if it flags code that is actually very well written.
More specifically, what types of repetition are ok and what aren’t? Is it good or bad to use “if” or “for” or other syntax repeatedly in code? Is it ok for variables and functions/methods to have names with common substrings (e.g. average_age, average_salary, etc.?). How many times is repetition ok before abstraction should happen, and when it does what kind of abstraction is needed and at what level (e.g. a simple method, or a functor, or a whole other class, or a whole other module?). Is more abstraction always better or is perfect sometimes the enemy of on time on budget? This is a really interesting problem, but it’s also a very hard problem, and honestly I think a research problem, which is the reason for my frame challenge.
Edit:
Or if you definitely want to try this anyway, you can make it a teaching tool--not necessarily as you may have intended, but rather by showing your students your adherence to DRY in the code you write when creating your tool, and by introducing them to the nuances of DRY and the shortcomings of automated code quality assessment by being transparent with them about the limitations of your quality assessment tool. What I wouldn’t do is use it like some professors use plagiarism detection tools, as a digital oracle whose assessment of the quality of the students’ code is unquestioned. That approach is likely to cause more harm than good toward the students.
I suggest the following approach: let's say that repetitions should be at least 3 lines long. Then we hash every 3 consecutive lines. If a hash repeats, we write down the line number where it occurred. All that is left is to join adjacent duplicated line numbers together to get longer sequences.
For example, if you have duplicate blocks on lines 100-103 and 200-203, you will get {HASH1: (100, 200), HASH2: (101, 201)} (lines 100-102 and 200-202 produce the same value HASH1, and HASH2 covers lines 101-103 and 201-203). Joining the results produces the sequence (100, 101, 200, 201). Finding the monotonic subsequences in that, you get ((100, 101), (200, 201)).
As there are no nested loops, the time complexity is linear: hashing each 3-line window and each dictionary insertion are O(1), so one pass over the file is O(n).
Algorithm:
read text line by line
transform it by
removing blanks
removing empty lines, saving mapping to original for the future
for every 3 consecutive transformed lines, join them and calculate a hash of the result
keep the lines whose hashes occur more than once (these are repetitions of at least 3 lines)
find longest sequences and present repetitive text
Code:
from itertools import groupby, cycle
import re

def sequences(l):
    x2 = cycle(l)
    next(x2)
    grps = groupby(l, key=lambda j: j + 1 == next(x2))
    yield from (tuple(v) + (next((next(grps)[1])),) for k, v in grps if k)

with open('program.cpp') as fp:
    text = fp.readlines()

# remove white spaces
processed, text_map = [], {}
proc_ix = 0
for ix, line in enumerate(text):
    line = re.sub(r"\s+", "", line, flags=re.UNICODE)
    if line:
        processed.append(line)
        text_map[proc_ix] = ix
        proc_ix += 1

# calc hashes
hashes, hpos = [], {}
for ix in range(len(processed)-2):
    h = hash(''.join(processed[ix:ix+3]))  # join 3 lines
    hashes.append(h)
    hpos.setdefault(h, []).append(ix)  # this list will reflect lines that are duplicated

# filter duplicated three liners
seqs = []
for k, v in hpos.items():
    if len(v) > 1:
        seqs.extend(v)
seqs = sorted(list(set(seqs)))

# find longer sequences
result = {}
for seq in sequences(seqs):
    result.setdefault(hashes[seq[0]], []).append((text_map[seq[0]], text_map[seq[-1]+3]))

print('Duplicates found:')
for v in result.values():
    print('-'*20)
    vbeg, vend = v[0]
    print(''.join(text[vbeg:vend]))
    print(f'Found {len(v)} duplicates, lines')
    for line_numbers in v:
        print(f'{1+line_numbers[0]} : {line_numbers[1]}')

How do I write the scores of my game into the leaderboard.txt and display the top 5 scores only with each player's name

I'm currently doing a dice game for my school programming project, and it includes the rules: 'Stores the winner's score, and their name, in an external file' and 'Displays the score and player name of the top 5 winning scores from the external file.' I've done everything in my game apart from the leaderboard. I am able to write the name and score of the user into the txt file, but I am unsure of how to then sort it. I would also like it so that when people first start the program they can use the menu to go to the leaderboard, where it reads the txt file and prints the top 5 scores in order, including the names.
I've checked loads of other questions similar to mine, but none of them quite worked for my code; I kept getting errors when implementing other people's code because it wasn't compatible with my layout.
(deleted)
Thanks in advance, I've never used stack overflow to ask a question so I apologize if there's anything I've done wrong in my post.
You did well on the question. You stated the problem clearly and, most importantly, you added enough code for us to run the program and have a look at how it behaves and what's going wrong. In this case nothing is going wrong, which is good :)
Considering you mention that this is a school project I will not give you a fully copy/paste solution but will explain hopefully enough details on how to solve this on your own.
Now according to the question, you don't know how to sort your leaderboard. I ran the program a few times myself (after removing the sleeps because I am impatient 😋) and see that your leaderboard file looks like this:
90 - somename
38 - anothername
48 - yetanothername
To display this you must do two things:
Open the file and read the data
Convert the data from the file into something usable by the program
The first step seems to be something you already know as you already use open() to write into the file. Reading is very similar.
The next step is not so obvious if you are new to programming. The file is read as text data, and you need to sort it by numbers. For a computer, the text "10" is not the same as the number 10 (note the quotes). You can try this in a Python shell:
>>> 10 == 10
True
>>> 10 == "10"
False
>>> "10" == 10
False
And text sorts differently from numbers. So one piece of the solution is to convert the text into numbers.
You will also get the data as lines (either using readlines() or splitlines(), depending on how you use it). These lines need to be split into score and name. The pattern in the file is this:
<score> - <name>
It is important to notice that you have the text " - " as separator between the two (including spaces). Have a look at the Python functions str.split() and str.partition(). These functions can be applied to any text value:
>>> "hello.world".split(".")
['hello', 'world']
>>> "hello.world".partition(".")
('hello', '.', 'world')
You can use this to "cut" the line into multiple pieces.
After doing that you have to remember the previous point about converting text to numbers.
As a last step you will need to sort the values.
When reading from the file, you can load the converted data into a Python list, which can then be sorted.
A convenient solution is to create a list where each element of that list is a tuple with the fields (score, name). Like that you can directly sort the list without any arcane tricks.
And finally, after sorting it, you can print it to the screen.
In summary
Open the file
Read the data from the file as "lines"
Create a new, empty list.
Loop over each line and...
... split the line into multiple parts to get at the score and name separately
... convert the score into a number
... append the two values to the new list from point 3
Sort the list from point 3
Print out the list.
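To make the steps concrete without giving away the whole assignment, here is a minimal sketch that assumes the "score - name" format shown above and a file called leaderboard.txt (as in the question's title):
leaderboard = []
with open("leaderboard.txt") as f:
    for line in f:
        score, _, name = line.strip().partition(" - ")
        leaderboard.append((int(score), name))  # convert the score text to a number

leaderboard.sort(reverse=True)   # highest score first
for score, name in leaderboard[:5]:   # top 5 only
    print(f"{name}: {score}")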
Some general thoughts
You can improve and simplify the code by using more functions
You already show that you know how to use functions. But look at the comments #THIS IS ROUND1 to #THIS IS ROUND5. The lines of code for each round are the same. By moving those lines into a function you will save a lot of code. And that has two benefits: You will only need to make a code-change (improvement or fix) in one place. And secondly, you guarantee that all blocks behave the same.
To do this, you need to think about what variables that block needs (those will be the new function arguments) and what the result will be (that will be the function return value).
A simple example with duplication:
print("round 1")
outcomes = []
value1 = random(1, 100)
value2 = random(1, 100)
if value1 > value2:
outcomes.append("A")
else:
outcomes.append("B")
print("round 2")
outcome = ""
value1 = random(1, 100)
value2 = random(1, 100)
if value1 > value2:
outcomes.append("A")
else:
outcomes.append("B")
Rewritten with functions
import random

def run_round(round_name):
    print(round_name)
    value1 = random.randint(1, 100)
    value2 = random.randint(1, 100)
    if value1 > value2:
        return "A"
    else:
        return "B"

outcomes = []
result_1 = run_round("round 1")
outcomes.append(result_1)
result_2 = run_round("round 2")
outcomes.append(result_2)
As you can see, the second code is much shorter and has no more duplication. Your code will have more function arguments. It is generally a challenge in programming to organise your code in such a way that functions have few arguments and no complex return values. Although, as long as it works nobody will look too closely ;)
Safe way to ask for a password
You can use getpass() from the getpass module to prompt for a password in a secure manner:
from getpass import getpass
password = getpass()
Note however, if you are using PyCharm, this causes some issues which are out of scope of this post. In that case, stick with input().
Sleeps
The "sleep()" calls are nice and give you the chance to follow the program, but make it slow to test the program. Consider to use smaller values (comma-values are possible), or, even better, write your own function that you can "short-circuit" for testing. Something like this:
import time

ENABLE_SLEEP = True

def sleep(s):
    if ENABLE_SLEEP:
        time.sleep(s)

print("some code")
sleep(1)
print("more code")
sleep(4)
You will then use your own sleep() function anytime you want to wait. That way, you can simply set the variable ENABLE_SLEEP to False and your code will run fast (for testing).

Lambda Functions in python

In the NLTK toolkit, I am trying to use a lambda function to filter the results.
I have a test_file and a terms_file
What I'm doing is using likelihood_ratio in NLTK to rank the multi-word terms in the terms_file. The input here is the lemma of each multi-word term, so I created a function which extracts the lemma from each multi-word term, to be used afterwards in the lambda function.
so it looks like this
text_file = myfile
terms_file = myfile

def lem(file):
    return lemma for each term in the file
My problem is here: how can I call this function in the filter? When I do the following, it does not work:
finder = BigramCollocationFinder.from_words(text_file)
finder.apply_ngram_filter(lambda *w: w not in lem(terms_file))
finder.score_ngrams(BigramAssocMeasures.likelihood_ratio)
print(finder)
Iterating explicitly does not work either:
finder.apply_ngram_filter(lambda *w: w not in [x for x in lem(terms_file)])
This is sort of a wild guess, but I'm pretty confident that this is the cause of your problem.
Judging from your pseudo-code, the lem function operates on a file handle, reading some information from that file. You need to understand that a file handle is an iterator, and it will be exhausted when iterated once. That is, the first call to lem works as expected, but then the file is fully read and further calls will yield no results.
Thus, I suggest storing the result of lem in a list. This should also be much faster than reading the file again and again. Try something like this:
all_lemma = lem(terms_file) # temporary variable holding the result of `lem`
finder.apply_ngram_filter(lambda *w: w not in all_lemma)
Your line finder.apply_ngram_filter(lambda *w: w not in [x for x in lem(terms_file)]) does not work, because while this creates a list from the result of lem, it does so each time the lambda is executed, so you end up with the same problem.
(Not sure what apply_ngram_filter does, so there might be more problems after that.)
Update: Judging from your other question, it seems like lem itself is a generator function. In this case, you have to explicitly convert the results to a list; otherwise you will run into just the same problem when that generator is exhausted.
all_lemma = list(lem(terms_file))
If the elements yielded by lem are hashable, you can also create a set instead of a list, i.e. all_lemma = set(lem(terms_file)); this will make the lookup in the filter much faster.
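To see why a generator only works once, here is a tiny standalone example:
def numbers():
    yield from [1, 2, 3]

g = numbers()
print(2 in g)   # True: the generator is consumed up to the match
print(2 in g)   # False: the generator is already exhausted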
If I understand what you are saying, lem(terms_file) returns a list of lemmas. But what do "lemmas" look like? apply_ngram_filter() will only work if each "lemma" is a tuple of exactly two words. If that is indeed the case, then your code should work after you've fixed the file input as suggested by tobias_k.
Even if your code works, the output of lem() should be stored as a set, not a list. Otherwise your code will be abysmally slow.
all_lemmas = set(lem(terms_file))
But I'm not too sure the above assumptions are right. Why would all lemmas be exactly two words long? I'm guessing that "lemmas" are one word long, and you intended to discard any ngram containing a word that is not in your list. If that's true you need apply_word_filter(), not apply_ngram_filter(). Note that it expects one argument (a word), so it should be written like this:
finder.apply_word_filter(lambda w: w not in all_lemmas)

Pyenchant Module - Spell checker

How do I trim the output of the Python Pyenchant module's suggested-words list?
Quite often it gives me a huge list of 20 suggested words that looks awkward when displayed on the screen, and it also has a tendency to run off the screen.
Like sentinel, I'm not sure if the problem you're having is specific to pyenchant or a Python-familiarity issue. If I assume the latter, you could simply select the number of values you'd like as part of your program. In simple form, this could be as easy as:
suggestion_list = pyenchant_function(document_filled_with_typos)
number_of_suggestions = len(suggestion_list)
MAX_SUGGESTIONS = 3  # you choose what you like
if number_of_suggestions > MAX_SUGGESTIONS:
    answer = suggestion_list[:MAX_SUGGESTIONS]  # slicing keeps just the first MAX_SUGGESTIONS items
else:
    answer = suggestion_list
Note: I'm choosing to be clear rather than concise here, since I'm guessing that will be valued by the asker, if the asker is unclear on using list indices.
Hope this helps and good luck with python.
Assuming it returns a standard Python list, you use standard Python slicing syntax. E.g. suggestedwords[:10] gets just the first 10.
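For instance, with pyenchant itself (assuming an en_US dictionary is installed), slicing the suggestion list could look like:
import enchant

d = enchant.Dict("en_US")
suggestions = d.suggest("recieve")   # often a long list
print(suggestions[:5])               # keep only the first five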

A better way to assign list into a var

Was coding something in Python. Have a piece of code, wanted to know if it can be done more elegantly...
# Statistics format is - done|remaining|200's|404's|size
statf = open(STATS_FILE, 'r').read()
starf = statf.strip().split('|')
done = int(starf[0])
rema = int(starf[1])
succ = int(starf[2])
fails = int(starf[3])
size = int(starf[4])
...
This goes on. I wanted to know if after splitting the line into a list, is there any better way to assign each list into a var. I have close to 30 lines assigning index values to vars. Just trying to learn more about Python that's it...
done, rema, succ, fails, size, ... = [int(x) for x in starf]
Better:
labels = ("done", "rema", "succ", "fails", "size")
data = dict(zip(labels, [int(x) for x in starf]))
print data['done']
What I don't like about the answers so far is that they stick everything in one expression. You want to reduce the redundancy in your code, without doing too much at once.
If all of the items on the line are ints, then convert them all together, so you don't have to write int(...) each time:
starf = [int(i) for i in starf]
If only certain items are ints--maybe some are strings or floats--then you can convert just those:
for i in 0, 1, 2, 3, 4:
    starf[i] = int(starf[i])
Assigning in blocks is useful; if you have many items--you said you had 30--you can split it up:
done, rema, succ = starf[0:3]
fails, size = starf[3:5]
I might use the csv module with a separator of | (though that might be overkill if you're "sure" the format will always be super-simple, single-line, no-strings, etc, etc). Like your low-level string processing, the csv reader will give you strings, and you'll need to call int on each (with a list comprehension or a map call) to get integers. Other tips include using the with statement to open your file, to ensure it won't cause a "file descriptor leak" (not indispensable in current CPython version, but an excellent idea for portability and future-proofing).
But I question the need for 30 separate barenames to represent 30 related values. Why not, for example, make a collections.namedtuple type with appropriately-named fields, initialize an instance thereof, and then use qualified names for the fields, i.e., a nice namespace? Remember the last koan in the Zen of Python (import this at the interpreter prompt): "Namespaces are one honking great idea -- let's do more of those!"... barenames have their (limited;-) place, but representing dozens of related values is not one -- rather, this situation "cries out" for the "let's do more of those" approach (i.e., add one appropriate namespace grouping the related fields -- a much better way to organize your data).
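A minimal sketch of that idea, using the five fields from the format comment in the question (STATS_FILE as referenced there):
from collections import namedtuple

Stats = namedtuple("Stats", "done rema succ fails size")

with open(STATS_FILE) as f:
    stats = Stats(*(int(x) for x in f.read().strip().split('|')))

print(stats.done, stats.fails)   # qualified names instead of thirty bare ones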
Using a Python dict is probably the most elegant choice.
If you put your keys in a list as such:
keys = ("done", "rema", "succ" ... )
somedict = dict(zip(keys, [int(v) for v in values]))
That would work. :-) Looks better than 30 lines too :-)
EDIT: I think there are dict comprehensions now, so that may look even better too! :-)
EDIT Part 2: Also, for the keys collection, you'd want to break that into multiple lines.
EDIT Again: fixed buggy part :)
Thanks for all the answers. So here's the summary -
Glenn's answer was to handle this issue in blocks, i.e. done, rema, succ = starf[0:3], etc.
Leoluk's approach was short and sweet, taking advantage of Python's dict() and zip() built-ins.
Alex's answer was more design oriented. Loved this approach. I know it should be done the way Alex suggested, but a lot of code refactoring would need to take place for that. Not a good time to do it now.
townsean - same as 2
I have taken up Leoluk's approach. I am not sure what the speed implication of this is; I have no idea if list/dict comprehensions take a hit on execution speed. But it reduces the size of my code considerably for now. I'll optimize when the need comes :) Going by "Premature optimization is the root of all evil"...
