How to sum values for the same key [duplicate]


This question already has answers here:
I'm getting a TypeError. How do I fix it?
(2 answers)
Why dict.get(key) instead of dict[key]?
(14 answers)
Closed 6 months ago.
I have a file
gu|8
gt|5
gr|5
gp|1
uk|2
gr|20
gp|98
uk|1
me|2
support|6
And I want to have one number per TLD like:
gr|25
gp|99
uk|3
me|2
support|6
gu|8
gt|5
and here is my code:
f = open(file,'r')
d={}
for line in f:
    line = line.strip('\n')
    TLD,count = line.split('|')
    d[TLD] = d.get(TLD)+count
print d
But I get this error:
d[TLD] = d.get(TLD)+count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Can anybody help?

Taking a look at the full traceback:
Traceback (most recent call last):
File "mee.py", line 6, in <module>
d[TLD] = d.get(TLD) + count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
The error is telling us that we tried to add something of type NoneType to something of type str, which isn't allowed in Python.
There's only one object of type NoneType, which, unsurprisingly, is None – so we know that we tried to add a string to None.
The two things we tried to add together in that line were d.get(TLD) and count, and looking at the documentation for dict.get(), we see that what it does is
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
Since we didn't supply a default, d.get(TLD) returned None when it didn't find TLD in the dictionary, and we got the error attempting to add count to it. So, let's supply a default of 0 and see what happens:
f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + count
print d
$ python mee.py
Traceback (most recent call last):
File "mee.py", line 6, in <module>
d[TLD] = d.get(TLD, 0) + count
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Well, we've still got an error, but now the problem is that we're trying to add a string to an integer, which is also not allowed, because it would be ambiguous.
That's happening because line.split('|') returns a list of strings – so we need to explicitly convert count to an integer:
f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + int(count)
print d
... and now it works:
$ python mee.py
{'me': 2, 'gu': 8, 'gt': 5, 'gr': 25, 'gp': 99, 'support': 6, 'uk': 3}
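For readers on Python 3 (where print is a function), here is a minimal sketch of the same dict.get fix. It assumes the sample lines from the question and uses io.StringIO in place of the real file so it runs on its own; the blank-line check is an addition that tolerates a trailing empty line.

```python
import io

# Stand-in for the real file: the sample lines from the question.
f = io.StringIO("gu|8\ngt|5\ngr|5\ngp|1\nuk|2\ngr|20\ngp|98\nuk|1\nme|2\nsupport|6\n")

d = {}
for line in f:
    line = line.strip()
    if not line:
        continue  # skip blank lines, e.g. a trailing empty line
    tld, count = line.split('|')
    # get() returns 0 for a TLD not seen yet; int() turns the text into a number
    d[tld] = d.get(tld, 0) + int(count)

print(d)
```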
Turning that dictionary back into the file output you want is a separate issue (and not attempted by your code), so I'll leave you to work on that.

To answer the title of your question, "how to sum values for the same key": the standard library's collections.Counter class is a perfect match for you:
import collections
d = collections.Counter()
with open(file) as f:
    for line in f:
        tld, cnt = line.strip().split('|')
        d[tld] += int(cnt)
then to write back:
with open(file, 'w') as f:
    for tld, cnt in sorted(d.items()):
        f.write("%s|%d\n" % (tld, cnt))
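A self-contained Python 3 sketch of the Counter approach, including the write-back step. Two io.StringIO objects stand in for the input and output files (an assumption so the example runs without touching disk); with real files you would pass open(...) handles instead. A Counter starts missing keys at 0, so no get() default is needed.

```python
import collections
import io

# Stand-ins for the input and output files (assumption for a runnable demo).
source = io.StringIO("gu|8\ngt|5\ngr|5\ngp|1\nuk|2\ngr|20\ngp|98\nuk|1\nme|2\nsupport|6\n")
out = io.StringIO()

totals = collections.Counter()
for line in source:
    line = line.strip()
    if not line:
        continue  # tolerate blank lines
    tld, cnt = line.split('|')
    totals[tld] += int(cnt)  # missing keys start at 0 in a Counter

# Write back one "tld|count" line per key, sorted alphabetically by TLD.
for tld, cnt in sorted(totals.items()):
    out.write("%s|%d\n" % (tld, cnt))

print(out.getvalue())
```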

Related

python sum() function returning [TypeError: 'str' object is not callable] when used [duplicate]

This question already has answers here:
TypeError: 'float' object not callable
(4 answers)
Closed 1 year ago.
I wrote code that reads a file and prints the sum of all the numbers contained in the file.
For example, a text file (text.txt) that contains
10in the 134 fill23
and 100cars 3in 42
should print 312.
First I wrote a short version that looks like this:
import re
f = open("text.txt")
numbs = re.findall('[0-9]+', f.read())
numbsls = []
for i in range(0,len(numbs)): numbsls.append(int(numbs[i]))
print(sum(numbsls))
This code worked fine. Next, I modified the code to read the file line by line (for more flexibility in later processing). It looks like this:
import re
f = open("text.txt")
t = True
numbsls = []
for line in f:
    line = line.rstrip()
    numbs = re.findall('[0-9]+', line)
    if t:
        sum = line
        t = False
    for i in range(0,len(numbs)):
        numbsls.append(int(numbs[i]))
print(sum(numbsls))
but when I ran the code it returned this traceback
line 16, in <module> print(sum(numbsls)) TypeError: 'str' object is not callable
I checked some articles and other Stack Overflow questions about this problem with no luck. To me everything looks fine and I can't spot what's wrong.
Any help or feedback would be appreciated, thank you!

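sum is a built-in function in Python, so you should not use it as a variable name. The line sum = line rebinds sum to a string, so the later call sum(numbsls) tries to call that string, which raises TypeError: 'str' object is not callable. Use a different name, such as total, instead of sum.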
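A minimal corrected version of the line-by-line loop, as a sketch: io.StringIO stands in for text.txt using the sample text from the question, and first_line is a hypothetical name for whatever the original sum = line was meant to keep.

```python
import io
import re

# Stand-in for text.txt, using the sample contents from the question.
f = io.StringIO("10in the 134 fill23\nand 100cars 3in 42\n")

first_line = None
numbsls = []
for line in f:
    line = line.rstrip()
    if first_line is None:
        first_line = line  # hypothetical name; does not shadow the built-in sum
    # collect every run of digits on this line as an int
    numbsls.extend(int(n) for n in re.findall('[0-9]+', line))

total = sum(numbsls)  # sum still refers to the built-in function
print(total)  # 312
```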

Hangman code having trouble accessing other file

So I made a hangman program that reads words from an input file, but it has trouble accessing the file once I type in its name.
This is the function that is calling the file
def getWord(filename):
    print("Loading from file...")
    inputFile = open(filename, 'r')
    wordlist = inputFile.read().splitlines()
    print(len(wordlist) + " lines loaded.")
    return wordlist
filename = input("What file will the words come from? ")
wordlist = getWord(filename)
theWordLine = random.choice(wordlist)
game(theWordLine)
And this is the file itself
person,Roger
place,Home
phrase,A Piece Of Cake
The error it is giving me is this
File "hangman.py", line 77, in <module>
wordlist = getWord(filename)
File "hangman.py", line 10, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Can anyone help me out?

The error states: TypeError: unsupported operand type(s) for +: 'int' and 'str'. That means you cannot use + with an int and a str. The len function returns an int, so you need to convert it to a str before you can concatenate it with another str.
It should be print(str(len(wordlist)) + " lines loaded.") instead of print(len(wordlist) + " lines loaded.")
You may also want to use string formatting, as a comment mentions. If you are using Python 3.6 or higher, you can try f-strings: f'{len(wordlist)} lines loaded.'

It has nothing to do with reading the file. Read the error: an integer and a string cannot be added.
Why are you getting this error? Because len() returns an integer, not a string. You can convert len()'s return value to a string, or you can just use an f-string:
f'{len(wordlist)} lines loaded.'

Use print(len(wordlist), "lines loaded.") instead of print(len(wordlist) + " lines loaded."). print() separates its arguments with a space, so the string does not need a leading space.

print(len(wordlist) + " lines loaded.") is causing your issue because it tries to apply the + operator to values of different types.
You could use print("{} lines loaded".format(len(wordlist))) to avoid this.
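Putting the fix back into the question's function gives the sketch below, assuming Python 3.6+ for the f-string. It also uses a with block so the file gets closed, and writes the sample word file from the question to a temporary path so the example runs on its own; the game() call is left out because its definition isn't shown.

```python
import os
import random
import tempfile

def getWord(filename):
    print("Loading from file...")
    with open(filename, 'r') as inputFile:
        wordlist = inputFile.read().splitlines()
    # len() returns an int, so format it into the message instead of using +
    print(f"{len(wordlist)} lines loaded.")
    return wordlist

# Write the sample word file from the question to a temporary location.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("person,Roger\nplace,Home\nphrase,A Piece Of Cake\n")
    path = tmp.name

wordlist = getWord(path)  # prints "Loading from file..." then "3 lines loaded."
os.remove(path)
theWordLine = random.choice(wordlist)
print(theWordLine)
```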

Error while working on .csv with load_csv

I am trying to work on the below code:
ds = load_csv('C:\\User.csv')
f = open(ds,'r')
lines = f.readlines()[1:]
print(lines)
f.close()
The first line of the dataset is a string header. I am getting the following error:
TypeError: expected str, bytes or os.PathLike object, not list
Though when I try to open the file with below code it works:
filename='C:\\User.csv'
f = open(filename,'r')
lines = f.readlines()[1:]
print(lines)
f.close()
I am skipping the first line because it's a string header and the rest of the dataset is float.
Update:
load_csv
from csv import reader
def load_csv(ds):
    dataset = list()
    with open(ds, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue  # skip empty rows
            dataset.append(row)
    return dataset
Even if I use it this way, I still get an error:
ds = load_csv('C:\\Users.csv')
minmax = dataset_minmax(ds)
normalize_dataset(ds, minmax)
def dataset_minmax(dataset):
    minmax = list()
    for i in range(len(dataset[0])):
        col_values = [row[i] for row in dataset]
        value_min = min(col_values)
        value_max = max(col_values)
        minmax.append([value_min, value_max])
    return minmax
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
It gives error on:
row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
Error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Since you're now getting a different error, I'll give a second answer.
This error means that the two variables in your subtraction are strings, not numbers.
In [1]: 5 - 3
Out[1]: 2
In [2]: '5' - '3'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-4ef7506473f1> in <module>
----> 1 '5' - '3'
TypeError: unsupported operand type(s) for -: 'str' and 'str'
This is because the CSV reader returns every field as a string. You need to convert the values to floats, e.g., by changing load_csv to do something like dataset.append(list(map(float, row))) instead of your existing append statement (after skipping the header row, which cannot be converted).
The min-max stuff doesn't fail, because Python's min and max work on strings, too:
In [3]: min('f', 'o', 'o', 'b', 'a', 'r')
Out[3]: 'a'
However, it might be giving you incorrect answers:
In [4]: min('2.0', '10.0')
Out[4]: '10.0'
By the way, if you're doing much along these lines, you'd probably benefit from using the Pandas package instead of rolling your own.
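Following the float-conversion suggestion above, here is a sketch of a load_csv that skips the string header row (which float() cannot convert) and turns every remaining field into a float. It is an assumed variant of the question's function: it takes an open file object rather than a path, and the sample data is made up. dataset_minmax is rewritten with zip(*dataset), which computes the same per-column min and max; normalize_dataset would still divide by zero if a column is constant.

```python
import csv
import io

def load_csv(f, skip_header=True):
    """Assumed variant of the question's load_csv: reads an open file object,
    skips the header row and converts every field to float."""
    dataset = []
    rows = csv.reader(f)
    if skip_header:
        next(rows, None)  # the first row holds column names, not numbers
    for row in rows:
        if not row:
            continue
        dataset.append([float(x) for x in row])
    return dataset

def dataset_minmax(dataset):
    # zip(*dataset) yields one tuple per column
    return [[min(col), max(col)] for col in zip(*dataset)]

def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)):
            lo, hi = minmax[i]
            row[i] = (row[i] - lo) / (hi - lo)

# Made-up sample data standing in for C:\User.csv.
sample = io.StringIO("a,b\n2.0,5\n10.0,15\n6.0,10\n")
ds = load_csv(sample)
minmax = dataset_minmax(ds)
normalize_dataset(ds, minmax)
print(ds)  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```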

I am guessing the error comes from the open call in your code. It fails because open expects a string or path-like object naming a file it can open (as the error message says). Your load_csv returns a list (the dataset), which is not something open can accept.

Look at your first two lines where it doesn't work:
ds = load_csv('C:\\User.csv')
f = open(ds,'r')
ds is an object returned (from TensorFlow, I assume?) which contains the data. Then you open it as if it were a filename. This is why the interpreter complains. ds is the dataset, not the string representing the file.
It works in the other example, because you use a filename.

TypeError: a bytes-like object is required, not 'str': even with the encode

I'm just trying to print my script's output. I have researched and read many answers, but even adding .encode('utf-8') still does not work.
import pandas
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
n_components = 30
n_top_words = 10
def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        message = "Topic #%d: " % topic_idx
        message += " ".join([feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]])
        return message
text = pandas.read_csv('fr_pretraitement.csv', encoding = 'utf-8')
text_clean = text['liste2']
text_raw = text['liste1']
text_clean_non_empty = text_clean.dropna()
not_commas = text_raw.str.replace(',', '')
text_raw_list = not_commas.values.tolist()
text_clean_list = text_clean_non_empty.values.tolist()
tf_vectorizer = CountVectorizer()
tf = tf_vectorizer.fit_transform(text_clean_list)
tf_feature_names = tf_vectorizer.get_feature_names()
lda = LatentDirichletAllocation(n_components=n_components, max_iter=5,
                                learning_method='online',
                                learning_offset=50.,
                                random_state=0)
lda.fit(tf)
print('topics...')
print(print_top_words(lda, tf_feature_names, n_top_words))
document_topics = lda.fit_transform(tf)
topics = print_top_words(lda, tf_feature_names, n_top_words)
for i in range(len(topics)):
    print("Topic {}:".format(i))
    docs = np.argsort(document_topics[:, i])[::-1]
    for j in docs[:300]:
        cleans = " ".join(text_clean_list[j].encode('utf-8').split(",")[:2])
        print(cleans.encode('utf-8') + ',' + " ".join(text_raw_list[j].encode('utf-8').split(",")[:2]))
My output:
Traceback (most recent call last):
File "script.py", line 62, in
cleans = " ".join(text_clean_list[j].encode('utf-8').split(",")[:2])
TypeError: a bytes-like object is required, not 'str'

Let's look at the line in which the error raised:
cleans = " ".join(text_clean_list[j].encode('utf-8').split(",")[:2])
Let's go step by step:
text_clean_list[j] is of str type => no error until there
text_clean_list[j].encode('utf-8') is of bytes type => no error until there
text_clean_list[j].encode('utf-8').split(",") is wrong: the argument "," passed to split() is of str type, but it must be of bytes type (because here split() is a method of a bytes object) => the error is raised, indicating that a bytes-like object is required, not 'str'.
Note: Replacing split(",") with split(b",") avoids the error (but it may not be the behavior you expect...)
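A short sketch of the bytes behaviour described above, using a made-up string: once a value is bytes, both the separator for split() and the joiner for join() have to be bytes too, otherwise a TypeError follows. This is likely what the note about unexpected behaviour hints at, since the question's " ".join(...) would fail next.

```python
text = "alpha,beta,gamma"

encoded = text.encode('utf-8')   # str -> bytes
parts = encoded.split(b",")[:2]  # bytes.split() needs a bytes separator
joined = b" ".join(parts)        # bytes.join() needs a bytes joiner

print(parts)   # [b'alpha', b'beta']
print(joined)  # b'alpha beta'

# A str joiner with bytes items fails in the same way
try:
    " ".join(parts)
except TypeError as exc:
    print("TypeError:", exc)
```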

cleans = " ".join(text_clean_list[j].encode('utf-8').split(",")[:2])
You are encoding the string in text_clean_list[j] into bytes, but what about the "," you pass to split()?
"," is still a str, so you are trying to split a bytes-like object using a string.
Example:
>>> a = "this,that"
>>> a.encode('utf-8').split(',')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'
Solution:
1- One solution is to not encode your string yet: split first and encode afterwards, as in this example:
a = "this, that"
c = a.split(",")
cleans = [x.encode('utf-8') for x in c]
2- Encode the "," separator as well, by passing a bytes literal:
cleans = a.encode("utf-8").split(b",")
Both yield the same result. It would help if you could provide example input and expected output.
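On Python 3 the values coming out of pandas are already str, so a simpler route is to drop .encode() altogether and keep everything as str. The sketch below uses hypothetical sample lists standing in for text_clean_list and text_raw_list.

```python
# Hypothetical sample rows standing in for text_clean_list / text_raw_list.
text_clean_list = ["chat,chien,oiseau", "maison,jardin"]
text_raw_list = ["le chat,le chien,un oiseau", "la maison,le jardin"]

output = []
for j in range(len(text_clean_list)):
    # These are already str, so split/join with str arguments and print directly.
    cleans = " ".join(text_clean_list[j].split(",")[:2])
    raw = " ".join(text_raw_list[j].split(",")[:2])
    output.append(cleans + ',' + raw)

for line in output:
    print(line)
```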

python using a dictionary for translate() gives me 'TypeError: expected a character buffer object'

Why does translate() give me TypeError: expected a character buffer object when I use a dictionary?
remap = {
'with': 'TEXT1',
'as': 'TEXT2',
'text_in': 'TEXT3'
}
s = "with open(loc_path + 'sample_text.txt', 'rb') as text_in:"
ss = s.translate(remap)
print ss
here is the error message:
Traceback (most recent call last):
File "C:\...\REMAP1.py", line 9, in <module>
ss = s.translate(remap)
TypeError: expected a character buffer object
[Finished in 0.1s with exit code 1]
using replace() works:
#ss = s.translate(remap)
#print ss
s = s.replace('with', 'TEXT1')
s = s.replace('as', 'TEXT2')
s = s.replace('text_in', 'TEXT3')
print s
output:
TEXT1 open(loc_path + 'sample_text.txt', 'rb') TEXT2 TEXT3:
[Finished in 0.1s]

From the Python 2 documentation:
string.translate(s, table[, deletechars])
Delete all characters from s that are in deletechars (if present), and then translate the characters using table, which must be a 256-character string giving the translation for each character value, indexed by its ordinal.
In your code, remap does not fulfil this requirement.
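translate() works one character at a time. In Python 3, str.translate does accept a dict, but its keys must be single characters or their ordinals, so it still cannot replace whole words. One option for word replacement, swapped in here rather than taken from the answer, is re.sub with a callback. This sketch assumes Python 3 and uses \b word boundaries so that 'as' is not replaced inside longer identifiers.

```python
import re

remap = {
    'with': 'TEXT1',
    'as': 'TEXT2',
    'text_in': 'TEXT3',
}

s = "with open(loc_path + 'sample_text.txt', 'rb') as text_in:"

# One alternation of all keys, longest first, bounded by \b on both sides.
pattern = re.compile(r'\b(' + '|'.join(
    re.escape(k) for k in sorted(remap, key=len, reverse=True)) + r')\b')

# Replace each matched word with its value from remap.
ss = pattern.sub(lambda m: remap[m.group(1)], s)
print(ss)

# For comparison, a per-character mapping is what translate() is built for.
print("abc".translate(str.maketrans({'a': 'X'})))  # Xbc
```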
