Counter Not Working Correctly - python

The following code looks up a text file to see if there are any matches. For example, a line may be "charlie RY content" and the next line may be "charlie content". However, the counter seems to be off and isn't counting correctly.
file = open("C:/file.txt", "rt")
data = file.readlines()
dictionary = dict()
counter = 0
count = 0
setlimit = 10 #int(input("Please enter limit for N. Then press enter:"))
parameter = ["RY", "TZ"]
for j in data:
    user = j.split()[0]
    identify = j.split()[1]
    for l in identify:
        #l = a[1:2]
        if user not in dictionary.keys() and identify not in parameter:
            count = 1
            data = dictionary.update({user:count})
            break
            #print(user, count,"<-- Qualifies")
        elif user in dictionary.keys() and identify not in parameter:
            data = dictionary.update({user: count})
            count += 1
            break
print(dictionary)
As the code shows, a line whose second field is RY or TZ should be ignored; for any other line, the counter for that user should increase by one.
Sample Data:
charlie TZ this is a sentence
zac this is a sentence
steve RY this is a sentence
bob this is a sentence
bob this is another sentence
Expected Output:
{zac:1, bob:2}

If you wish to augment the count,
count += 1
must come before
dictionary.update({user: count})
In other words,
elif user in dictionary.keys() and identify not in parameter:
    count += 1
    dictionary.update({user: count})
    break
Note that dictionary.update(...) modifies dictionary and returns None.
Since it always returns None, there is no need to save the value in data.
Alternatively, as pointed out by Martijn Pieters, you could use
for j in data:
    ...
    if identify not in parameter:
        count += 1
        dictionary[user] = count
Note that you don't need to handle the assignment in two different cases.
The assignment dictionary[user] = count will create the new key/value pair if user is not in dictionary, and it will assign the new value, count, even if it is.
Note that the single count variable is getting increased by one whenever the conditional is True for any user.
If you want the dictionary[user] to increase by one independently for each user, then use
for j in data:
    ...
    if identify not in parameter:
        dictionary[user] = dictionary.get(user, 0) + 1
dictionary.get(user, 0) returns dictionary[user] if user is in dictionary, otherwise it returns 0.
Another alternative is to use a collections.defaultdict:
import collections
dictionary = collections.defaultdict(int)
for j in data:
    ...
    if identify not in parameter:
        dictionary[user] += 1
With dictionary = collections.defaultdict(int),
dictionary[user] will be assigned the default value int() whenever user is not in dictionary. Since
In [56]: int()
Out[56]: 0
dictionary[user] is automatically assigned the default value of 0 when user is not in dictionary.
Also, user in dictionary is more idiomatic Python than user in dictionary.keys(), though they both return the same boolean value. In fact, you already use this idiom when you write identify not in parameter.
While we're on the topic of idioms, it is generally better to use a with-statement to open files:
with open("data", "rt") as f:
since this will guarantee that the file handle f gets closed automatically for you when Python leaves the with-statement (either by reaching the end of the code inside the statement, or even if an exception is raised.)
Since identify is assigned string values such as 'TZ', the loop
for l in identify:
assigns values such as T, then Z to the variable l.
l is not used inside the loop, and there is no apparent reason to be looping over the characters in identify. Therefore, you probably want to remove this loop.
Testing for membership in a set is on average an O(1) (constant-time) operation, while testing for membership in a list is O(n) (the time generally increases with the size of the list). So it is better to make parameter a set:
parameter = set(["RY", "TZ"])
Instead of calling j.split twice,
user = j.split()[0]
identify = j.split()[1]
you only need to call it once:
user, identify = j.split(maxsplit=2)[:2]
Note that both these assume that there is at least one whitespace in j.
If there isn't, the original code snippet will raise IndexError: list index out of range, while the second raises ValueError: not enough values to unpack (expected 2, got 1) on Python 3, which the maxsplit keyword argument requires.
maxsplit=2 tells split to stop splitting the string after (at most) two splits are done. This could save some time if j is a large string with many split points.
So putting it all together,
import collections
dictionary = collections.defaultdict(int)
setlimit = 10 #int(input("Please enter limit for N. Then press enter:"))
parameter = set(["RY", "TZ"])
with open("C:/file.txt", "rt") as f:
    for line in f:
        user, identify = line.split(maxsplit=2)[:2]
        if identify not in parameter:
            dictionary[user] += 1
dictionary = dict(dictionary)
print(dictionary)
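To try the combined approach without a file on disk, here is a minimal sketch that runs the same logic over the sample lines from the question. The function name count_users and the guard for lines with fewer than two fields are assumptions added for illustration; they are not part of the answer above.

```python
import collections


def count_users(lines, excluded=frozenset({"RY", "TZ"})):
    """Count lines per user, skipping lines whose second field is excluded."""
    counts = collections.defaultdict(int)
    for line in lines:
        fields = line.split(maxsplit=2)
        if len(fields) < 2:
            continue  # assumption: silently skip blank or one-word lines
        user, identify = fields[:2]
        if identify not in excluded:
            counts[user] += 1
    return dict(counts)


sample = [
    "charlie TZ this is a sentence\n",
    "zac this is a sentence\n",
    "steve RY this is a sentence\n",
    "bob this is a sentence\n",
    "bob this is another sentence\n",
]
result = count_users(sample)
print(result)  # {'zac': 1, 'bob': 2}
```

Passing any iterable of lines (a list, or an open file object) works the same way, which keeps the counting logic easy to test.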

In addition to what @unutbu said, you need to NOT reset count or continue incrementing it for other users!
if user not in dictionary.keys() and identify not in parameter:
    dictionary[user] = 1
    break
    #print(user, count,"<-- Qualifies")
elif user in dictionary.keys() and identify not in parameter:
    dictionary[user] += 1
    break
Without this change, @unutbu's first suggestion would still carry over the incorrect counting logic from the OP. For example, for this input:
charlie TZ this is a sentence
zac this is a sentence
steve RY this is a sentence
bob this is a sentence
bob this is another sentence
zac this is a sentence
zac this is a sentence
bob this is a sentence
your original logic would give the results:
{'bob': 4, 'zac': 3}
when both should be equal to 3.
for l in identify: probably is not needed and actually likely interferes with the results.
TO SUMMARIZE: Your code could look like this:
file = open("C:/file.txt", "rt")
dictionary = dict()
setlimit = 10 #int(input("Please enter limit for N. Then press enter:"))
parameter = ["RY", "TZ"]
for j in file:
    user, identify = j.split()[:2]
    if identify in parameter:
        continue
    if user in dictionary.keys():
        dictionary[user] += 1
    else:
        dictionary[user] = 1
file.close()
print(dictionary)
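As a quick check of the per-user logic, the sketch below applies the same if/else counting to the eight-line input from this answer, using an in-memory list instead of a file. The function name count_lines_per_user and the guard against short lines are assumptions made for this example.

```python
def count_lines_per_user(lines, parameter=("RY", "TZ")):
    """Per-user counter: each user's count only changes on that user's lines."""
    dictionary = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: ignore lines with fewer than two fields
        user, identify = fields[:2]
        if identify in parameter:
            continue
        if user in dictionary:
            dictionary[user] += 1
        else:
            dictionary[user] = 1
    return dictionary


lines = [
    "charlie TZ this is a sentence",
    "zac this is a sentence",
    "steve RY this is a sentence",
    "bob this is a sentence",
    "bob this is another sentence",
    "zac this is a sentence",
    "zac this is a sentence",
    "bob this is a sentence",
]
result = count_lines_per_user(lines)
print(result)  # {'zac': 3, 'bob': 3}
```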

You have been given other answers that cover your current approach. But it should be noted that Python already has a collections.Counter object that does the same thing.
Just to demonstrate using a Counter (note: this uses Py3 for * unpacking):
In []:
from collections import Counter
parameter = {"RY", "TZ"}
with open("C:/file.txt") as file:
    dictionary = Counter(u for u, i, *_ in map(str.split, file) if i not in parameter)
print(dictionary)
Out[]:
Counter({'bob': 2, 'zac': 1})
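The same Counter expression can be tried without a real file. This is a small sketch that uses io.StringIO as a stand-in for the opened file (an assumption for illustration) and feeds it the sample data from the question. Note that the unpacking u, i, *_ raises ValueError for any line with fewer than two fields.

```python
import io
from collections import Counter

parameter = {"RY", "TZ"}

# Stand-in for the opened file; iterating it yields one line at a time.
fake_file = io.StringIO(
    "charlie TZ this is a sentence\n"
    "zac this is a sentence\n"
    "steve RY this is a sentence\n"
    "bob this is a sentence\n"
    "bob this is another sentence\n"
)

counts = Counter(u for u, i, *_ in map(str.split, fake_file) if i not in parameter)
print(counts.most_common(1))  # [('bob', 2)]
```

Counter.most_common(n) returns the n most frequent keys with their counts, which is often the next thing you want after counting.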

Related

Testing if a dictionary key ends in a letter?

tools = {"Wooden_Sword1" : 10, "Bronze_Helmet1" : 20}
I have code written to add items. I'm adding an item like so:
tools[key_to_find] = int(b)
Here key_to_find is the tool and b is the durability. I need a way so that if I'm adding an item and Wooden_Sword1 already exists, it adds a Wooden_Sword2 instead. This has to work with other items as well.
As user3483203 and ShadowRanger commented, it's probably a bad idea to use numbers in your key string as part of the data. Manipulating those numbers will be awkward, and there are better alternatives. For instance, rather than storing a single value for each numbered key, use simple keys and store a list. The index into the list will take the place of the number in the key.
Here's how you could implement it:
tools = {"Wooden_Sword" : [10], "Bronze_Helmet" : [20]}
Add a new wooden sword with durability 10:
tools.setdefault("Wooden_Sword", []).append(10)
Find how many bronze helmets we have:
helmets = tools.get("Bronze_Helmet", [])
print("we have {} helmets".format(len(helmets)))
Find the first bronze helmet with a non-zero durability, and reduce it by 1:
helmets = tools.get("Bronze_Helmet", [])
for i, durability in enumerate(helmets):
    if durability > 0:
        helmets[i] -= 1
        break
else: # this runs if the break statement was never reached and the loop ran to completion
    take_extra_damage() # or whatever
You could simplify some of this code by using a collections.defaultdict instead of a regular dictionary, but if you learn how to use get and setdefault it's not too hard to get by with the regular dict.
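Putting the list-per-key idea together, here is a minimal runnable sketch. The helper damage_first_usable and the extra item names and durabilities are hypothetical, chosen only to exercise setdefault, get and enumerate as described above.

```python
tools = {"Wooden_Sword": [10], "Bronze_Helmet": [20]}

# Adding items: setdefault creates the list the first time a name is seen.
tools.setdefault("Wooden_Sword", []).append(7)   # second sword
tools.setdefault("Iron_Shield", []).append(15)   # brand-new item type


def damage_first_usable(tools, name):
    """Reduce the durability of the first item of this name that is above 0.

    Returns True if an item was damaged, False if none was available.
    """
    for i, durability in enumerate(tools.get(name, [])):
        if durability > 0:
            tools[name][i] -= 1
            return True
    return False


hit = damage_first_usable(tools, "Bronze_Helmet")
missing = damage_first_usable(tools, "Golden_Boots")
print(tools["Wooden_Sword"], tools["Bronze_Helmet"], hit, missing)
# [10, 7] [19] True False
```

The position in each list plays the role of the number that used to be part of the key, so len(tools["Wooden_Sword"]) tells you how many swords exist.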
To ensure a key name is not taken yet, and add a number if it is, create the new name and test. Then increment the number if it is already in your list. Just repeat until none is found.
In code:
def next_name(basename, lookup):
    if basename not in lookup:
        return basename
    number = 1
    while basename+str(number) in lookup:
        number += 1
    return basename+str(number)
While this code does what you ask, you may want to look at other methods. A possible drawback is that there is no association between, say, WoodenShoe1 and WoodenShoe55 – if 'all wooden shoes' need their price increased, you'd have to iterate over all possible names between 1 and 55, just in case these existed at some time.
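One detail worth noticing: the function above returns the bare base name when that exact name is unused, so with keys like Wooden_Sword1 it would first hand back Wooden_Sword. If every key should carry a number, a variant along these lines may fit better. The name next_numbered_name is hypothetical, and this is only a sketch of the same search loop starting at 1.

```python
def next_numbered_name(basename, lookup):
    """Return basename + the lowest positive number not yet used in lookup."""
    number = 1
    while f"{basename}{number}" in lookup:
        number += 1
    return f"{basename}{number}"


tools = {"Wooden_Sword1": 10, "Bronze_Helmet1": 20}

new_sword = next_numbered_name("Wooden_Sword", tools)
tools[new_sword] = 8

new_shield = next_numbered_name("Iron_Shield", tools)
tools[new_shield] = 15

print(new_sword, new_shield)  # Wooden_Sword2 Iron_Shield1
```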
From what I understand of the question, your keys have 2 parts: "Name" and "ID". The ID is just an integer that starts at 1, so you can initialize a counter for every name:
numOfWoodenSwords = 0
And to add to the dictionary:
numOfWoodenSwords += 1
tools["Wooden_Sword" + str(numOfWoodenSwords)] = int(b)
If you need to have an unknown amount of tools, I recommend looking at the re module: https://docs.python.org/3/library/re.html.
Or you could iterate over tools.keys to see if the entry exists.
You could write a function that determines if a character is a letter:
def is_letter(char):
    return 65 <= ord(char) <= 90 or 97 <= ord(char) <= 122
Then when you are looking at a key in your dictionary, simply:
if is_letter(key[-1]):
    ...
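For the re-based route mentioned above, a sketch like the following could split a key into its base name and trailing number. The helper name split_key is an assumption for illustration; the pattern simply captures any trailing run of digits.

```python
import re


def split_key(key):
    """Split 'Wooden_Sword12' into ('Wooden_Sword', 12).

    Keys without trailing digits return (key, None).
    """
    # Lazy prefix, then as many trailing digits as possible up to the end.
    match = re.fullmatch(r"(.*?)(\d*)", key, flags=re.DOTALL)
    base, digits = match.groups()
    return base, int(digits) if digits else None


print(split_key("Wooden_Sword12"))  # ('Wooden_Sword', 12)
print(split_key("Bronze_Helmet"))   # ('Bronze_Helmet', None)
```

For a simple yes/no test on the last character, the built-in str.isdigit() (for example key[-1].isdigit()) is an alternative to checking character codes by hand.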

python dictionary: value and the key not corresponding in my code to find max

My goal is to figure out who has sent the most messages. The file is mbox-short.txt.
My code is here:
fhand = open('mbox-short.txt')
counts = dict()
#this loop creates a dictionary in which the key is the mail's name
#and the value is the number of times the mail's name appeared
for line in fhand:
    if not (line.startswith('From') and not line.startswith('From:')):
        continue
    words = line.split()
    counts[words[1]] = counts.get(words[1],0) + 1
num = None
#this loop is to find max value and its key
for key, value in counts.items():
    #print key, value
    if num == None or counts[key] > num:
        num = counts[key]
print key, num
After I ran the code, the result is:
But when I uncomment the print key, value line in the second loop and comment out the lines below it, the output shows that ray@media.berkeley.edu has a count of 1 instead of 5, and cwen@iupui.edu has 5.
So why don't the key and value correspond? I think the problem is on line 19. How can I solve it?
It seems that I didn't save the key.
Thank you for all.
Thank all of you. I solved it.
In the second loop, I created a variable to save the key
I'm not sure if I understand what you want to do. The first loop should count all addresses and the second should find the maximum?
You have to initialize num with some value; I would choose zero. Then you don't have to check for num == None. And you must save the key, not only the value. As written, the printed key will always be the last key in the dict.
You need to use the Counter class; it's designed for this exact purpose:
import re
from collections import Counter
with open('mbox-short.txt') as f:
    emails = Counter([i.strip() for i in re.findall(r'^From:(.*)$', f.read(), re.M)])
for email, count in emails.most_common():
    print('{} - {}'.format(email, count))
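To see the Counter approach in isolation, here is a small sketch that runs on an in-memory string instead of mbox-short.txt. The sample text and the example.com addresses are made up for illustration; only the "From:" header lines are counted.

```python
import re
from collections import Counter

# Hypothetical mbox-style excerpt; not taken from mbox-short.txt.
sample = """From alice@example.com Sat Jan  5 09:14:16 2008
From: alice@example.com
Subject: hello
From bob@example.com Sat Jan  5 10:00:00 2008
From: bob@example.com
From alice@example.com Sun Jan  6 08:00:00 2008
From: alice@example.com
"""

# re.M makes ^ match at the start of every line, not just the whole string.
senders = Counter(re.findall(r"^From:\s*(\S+)", sample, flags=re.M))
top = senders.most_common(1)
print(top)  # [('alice@example.com', 2)]
```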
As you realized, you need to store the key which gives the max value, or at least the max value so far. Then you don't need to store num, the maximum counts[key] value so far, only the key that gives it, max_key.
Also, you iterate on the items in counts with for key, value in counts.items(): ..., but then you ignore value inside the loop. Use value; there's no need to look up counts[key] again. And as long as counts has >=1 item, you don't need the pessimism of setting num = None and then testing against that inside your loop. Anyway, num is unnecessary.
# Iterative approach
max_key = counts.keys()[0] # default assumption
for key, value in counts.items():
    if value > counts[max_key]:
        max_key = key
but you can avoid all this by directly finding the max without writing the loop yourself:
# Direct approach, using `max(..., key=...)`
max_key = max(counts, key=counts.get)
print max_key, counts[max_key]
cwen@iupui.edu 5
This works because Python's sorted(), max(), and min() accept any iterable, including a dictionary (iterating a dictionary yields its keys), and they take an optional key parameter: a function that tells the comparison which value to use for each item. Here counts.get maps each key to its count.
Look at the doc for sorted(), max(), min(), and read about Python idioms using them.
There is also the custom collections.Counter which @Burhan references. But it's important to learn how to sort things first. Then you can use it.
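For reference, here is a short Python 3 sketch of three equivalent ways to get the key with the highest count. The counts dictionary and its addresses are invented for the example.

```python
from collections import Counter

# Hypothetical counts, as the first loop in the question would produce.
counts = {"alice@example.com": 2, "bob@example.com": 5, "carol@example.com": 1}

# 1. Iterate over the keys and compare them by their count.
max_key = max(counts, key=counts.get)

# 2. Iterate over (key, value) pairs and compare by the value.
max_key_alt, max_value = max(counts.items(), key=lambda kv: kv[1])

# 3. Let Counter do the ranking.
top = Counter(counts).most_common(1)

print(max_key, counts[max_key])  # bob@example.com 5
```

Note that with option 1 the iterable yields keys only, so the key function has to look the count up; with option 2 each item is already a (key, value) pair.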

How do I create a dictionary where I add points to names from a text file?

I am trying to make a sort of scoring system. I have a text file with a huge number of players. I have created a dictionary and managed to add all the names to it, but when I try to add the points, each name only gets the last number, not all of them combined. So how do I create a dictionary entry for each name and then add the points corresponding to that name?
So I basically need to take one of the names, "Anders Johansson", and add his score to his name in the dictionary. His name will appear in score.txt multiple times with different scores.
My thought is that I need to run through the entire score file and create one instance of every name, then add the corresponding score to each name. "Anders Johansson" may appear up to 100 times in score.txt.
This code will get the names into final_score but not the total score.
final_list, file, final_score = [], open('score.txt','r'), {}
string = file.read()
lista = string.split('\n')
lista.sort()
for i in lista[1:]:
    final_score[str(i[0:-2])] = 0
    final_score[str(i[0:-2])] += int(i[-2:]) # possible error location
final_list = list(final_score.items())
print(sorted(final_list, key=lambda score: score[1],reverse=True))
Example from the file:
Anders Johansson 1
Karin Johansson 1
Eva Johansson 0
Erik Andersson 1
Gunnar Andersson 2
Eva Andersson 1
Nils Eriksson 1
Anders Eriksson 0
Maria Eriksson 1
and so on:
Use a collections.defaultdict:
from collections import defaultdict
final_score = defaultdict(int)
for i in lista[1:]:
    final_score[str(i[0:-2])] += int(i[-2:])
In your code you keep resetting the value to 0 with final_score[str(i[0:-2])] = 0 every time you find the name/key so that is why you only actually add the last score each time.
You would need to check if the key existed or not and proceed based on the outcome of that or a simpler approach would be to use a dict.setdefault in your own code which creates a key/value pairing if the key does not exist or else does nothing:
for i in lista[1:]:
    final_score.setdefault(str(i[0:-2]), 0)
    final_score[str(i[0:-2])] += int(i[-2:]) # possible error location
You are also doing more work than needed, there is no need to sort, we can iterate over the file object and use rsplit to get the name and score:
final_score = defaultdict(int)
with open('score.txt','r') as f: # with will automatically close your file
    for line in f: # iterate over file object
        # split once on whitespace from right side of string, separating name from score
        name, score = line.rsplit(None,1) # unpack
        # use our defaultdict to increment the score
        final_score[name] += int(score)
# just call sorted on .items()
print(sorted(final_score.items(), key=lambda score: score[1], reverse=True))
There is no need to sort as whenever we come across a name we update the value in the dict, it does not matter what order we come across the names, that is indeed one of advantages of using a dict.
Also, for sorting final_score.items(), using operator.itemgetter will be more efficient than using a lambda:
from operator import itemgetter
print(sorted(final_score.items(),key=itemgetter(1),reverse=True))
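The whole flow can be checked on a few in-memory lines. In this sketch the names come from the question's sample, but the repeated entries and their scores are invented so that the summing is visible; the blank-line guard is also an assumption, since rsplit would fail to unpack on an empty line.

```python
from collections import defaultdict
from operator import itemgetter

lines = [
    "Anders Johansson 1",
    "Karin Johansson 1",
    "Anders Johansson 2",
    "",                      # blank line, skipped below
    "Eva Johansson 0",
    "Karin Johansson 3",
]

final_score = defaultdict(int)
for line in lines:
    if not line.strip():
        continue  # assumption: ignore empty lines instead of failing
    # Split once from the right: everything before the last field is the name.
    name, score = line.rsplit(None, 1)
    final_score[name] += int(score)

ranking = sorted(final_score.items(), key=itemgetter(1), reverse=True)
print(ranking)
# [('Karin Johansson', 4), ('Anders Johansson', 3), ('Eva Johansson', 0)]
```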

Sorting on list values read into a list from a file

I am trying to write a routine to read values from a text file (names and scores) and then be able to sort the values A-Z by name, highest to lowest, etc. I am able to sort the data, but only by the position in the string, which is no good where names are different lengths. This is the code I have written so far:
ClassChoice = input("Please choose a class to analyse Class 1 = 1, Class 2 = 2")
if ClassChoice == "1":
    Classfile = open("Class1.txt",'r')
else:
    Classfile = open("Class2.txt",'r')
ClassList = [line.strip() for line in Classfile]
ClassList.sort(key=lambda s: s[x])
print(ClassList)
This is an example of one of the data files (Each piece of data is on a separate line):
Bob,8,7,5
Fred,10,9,9
Jane,7,8,9
Anne,6,4,8
Maddy,8,5,5
Jim, 4,6,5
Mike,3,6,5
Jess,8,8,6
Dave,4,3,8
Ed,3,3,4
I can sort on the name, but not on score 1, 2 or 3. Something obvious probably but I have not been able to find an example that works in the same way.
Thanks
How about something like this?
indexToSortOn = 0 # will sort on the first integer value of each line
classChoice = ""
while not classChoice.isdigit():
    classChoice = input("Please choose a class to analyse (Class 1 = 1, Class 2 = 2) ")
classFile = "Class%s.txt" % classChoice
with open(classFile, 'r') as fileObj:
    classList = [line.strip() for line in fileObj]
classList.sort(key=lambda s: int(s.split(",")[indexToSortOn+1]))
print(classList)
The key is to specify in the key function that you pass in what part of each string (the line) you want to be sorting on:
classList.sort(key=lambda s: int(s.split(",")[indexToSortOn+1]))
The cast to an integer is important as it ensures the sort is numeric instead of alphanumeric (e.g. 100 > 2, but "100" < "2")
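An alternative to splitting inside the key function is to parse each line once into a name and a list of integer scores, then sort the parsed records. The sketch below does that with a few lines from the sample data; the helper name parse_record is an assumption for illustration.

```python
def parse_record(line):
    """Turn 'Bob,8,7,5' into ('Bob', [8, 7, 5])."""
    name, *scores = line.strip().split(",")
    # int() tolerates surrounding spaces, so ' 4' parses as 4.
    return name, [int(s) for s in scores]


lines = ["Bob,8,7,5", "Fred,10,9,9", "Jane,7,8,9", "Jim, 4,6,5"]
records = [parse_record(line) for line in lines]

by_name = sorted(records, key=lambda r: r[0])
by_first_score = sorted(records, key=lambda r: r[1][0], reverse=True)

print([name for name, _ in by_first_score])  # ['Fred', 'Bob', 'Jane', 'Jim']
```

Because the scores are real integers after parsing, 10 sorts above 8 as expected, and sorting on the second or third score only means changing the index in the key function.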
I think I understand what you are asking. I am not a sort expert, but here goes:
Assuming you would like the ability to sort the lines by either the name, the first int, second int or third int, you have to realize that when you are creating the list, you aren't creating a two dimensional list, but a list of strings. Due to this, you may wish to consider changing your lambda to something more like this:
ClassList.sort(key=lambda s: str(s).split(',')[x])
This assumes that the x is defined as one of the fields in the line with possible values 0-3.
The one issue I see with this is that the fields are still strings, so list.sort() compares them character by character: Fred's score of "10" sorts before "2" (but after "0"). Converting the score fields with int() in the key function avoids this.

Python dictionary key error.. big mess of dictionaries in dictionaries in lists

This is kind of convoluted, so if I'm missing out on an easy construct for this, please let me know :)
I'm analysing the results of some matching experiments. At the end game, I want to be able to query things such as experiments[0]["cat"]["cat"], which yields the number of times "cat" was matched against "cat". Conversely, I could do experiments[0]["cat"]["dog"], when the first query was a cat and the match attempt was a dog.
The following is my code to populate this structure:
# initializing the first layer, a list of dictionaries.
experiments = []
for assignment in assignments:
    match_sums = {}
    experiments.append(match_sums)
for i in xrange(len(classes)):
    for experiment in xrange(len(experiments)):
        # experiments[experiment][classes[i]] should hold a dictionary,
        # where the keys are the things that were matched against classes[i],
        # and the value is the number of times this occurred.
        experiments[experiment][classes[i]] = collections.defaultdict(dict)
        # matches[experiment][i] is an integer for what the i'th match was in an experiment.
        # classes[j] for some integer j is the string name of the i'th match. could be "dog" or "cat".
        experiments[experiment][classes[i]][classes[matches[experiment][i]]] += 1
        total_class_sums[classes[i]] = total_class_sums.get(classes[i], 0) + 1
print experiments[0]["cat"]["cat"]
exit()
So clearly this is a bit convoluted. And I'm getting a value of "1" for the last match, rather than a full dictionary at experiments[0]["cat"]. Have I approached this wrong? What could the bug here be? Sorry for the craziness and thanks for any possible help!
Two points:
Dictionary keys can be tuples; and
If you're counting things, use collections.Counter. (You can use defaultdict(int), but Counter is more useful.)
So, instead of
experiments[experiment][classes[i]][classes[matches[experiment][i]]] += 1
write
experiments = Counter()
...
experiments[experiment, classes[i], classes[matches[experiment][i]]] += 1
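Here is a minimal sketch of the tuple-key idea with made-up data. The contents of classes and matches are hypothetical; matches[e][i] is assumed to be an index into classes, as described in the question's comments.

```python
from collections import Counter

# Hypothetical data: class of each query item, and for every experiment
# the index (into classes) of the item each query was matched against.
classes = ["cat", "dog", "cat"]
matches = [
    [2, 1, 1],  # experiment 0
    [0, 0, 2],  # experiment 1
]

experiments = Counter()
for experiment, row in enumerate(matches):
    for i, matched_index in enumerate(row):
        # One flat key: (experiment, query class, matched class)
        experiments[experiment, classes[i], classes[matched_index]] += 1

print(experiments[0, "cat", "cat"])  # 1
print(experiments[1, "cat", "cat"])  # 2
print(experiments[1, "dog", "dog"])  # 0 (a Counter returns 0 for missing keys)
```

The lookup experiments[0, "cat", "dog"] replaces experiments[0]["cat"]["dog"], and no nested dictionaries need to be initialized first.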
I'm just trying to guess your needs, so I changed the order of your dimensions.
experiments = {}
for classIdx, className in enumerate(classes):
    experiment = collections.defaultdict(list)
    experiments[className] = experiment
    for assignmentIdx, assignment in enumerate(assignments):
        counterpart = classes[matches[assignmentIdx][classIdx]]
        experiment[counterpart].append((assignment, assignmentIdx))
print(len(experiments["cat"]["cat"]), len(experiments["cat"]))
