wordcount: reducer python program throws ValueError

I get this error whenever I try running my reducer Python program on Hadoop. The mapper program runs perfectly, though, and I have given the reducer the same permissions as the mapper. Is there a syntax error?
Traceback (most recent call last):
File "reducer.py", line 13, in
word, count = line.split('\t', 1)
ValueError: need more than 1 value to unpack
#!/usr/bin/env python
import sys
# maps words to their counts
word2count = {}
# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        continue
    try:
        word2count[word] = word2count[word]+count
    except:
        word2count[word] = count
# write the tuples to stdout
# Note: they are unsorted
for word in word2count.keys():
    print '%s\t%s'% ( word, word2count[word] )

The error ValueError: need more than 1 value to unpack is raised when you unpack a sequence into several names and the sequence has too few values. So it looks like line has no \t in it, so line.split('\t', 1) returns a single-element list, causing something like word, count = ["foo"].

I cannot answer in detail.
However, I solved the same issue by removing some extra print statements I had added to the mapper. Probably this is because everything the mapper prints to standard output is fed to the reducer's sys.stdin, so those debug lines reached the reducer without a tab.
You have probably already solved the issue by now, though.
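In Hadoop Streaming the reducer's input is built from what the mapper writes to standard output, so debug output belongs on standard error instead. A minimal sketch of a word-count mapper step, assuming whitespace-separated words; the function name map_line and its out/err parameters are hypothetical, added so the step can be tested without Hadoop.

```python
import sys


def map_line(line, out=None, err=None):
    """Emit 'word<TAB>1' for each word; diagnostics go to stderr only."""
    if out is None:
        out = sys.stdout
    if err is None:
        err = sys.stderr
    words = line.split()
    # Debug output on stderr is not part of the key/value stream
    err.write('mapper saw %d words\n' % len(words))
    for word in words:
        out.write('%s\t1\n' % word)
```

In the real mapper you would loop over sys.stdin and call map_line(line) for each line.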

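I changed line.split('\t', 1) to line.split(' ', 1) and it worked.
Since the space is hard to see, to be perfectly clear: the separator is a single space character, i.e. line.split('(one space here)', 1). This only works if your mapper separates the word and the count with a space rather than a tab.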
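If you are not sure which separator the mapper uses, a sketch that accepts either one, assuming words never contain whitespace: str.split with None as the separator splits on any run of whitespace, and maxsplit=1 still limits the result to at most two parts. The helper name parse_pair is hypothetical.

```python
def parse_pair(line):
    """Return (word, count) from 'word<TAB>count' or 'word count', else None."""
    parts = line.split(None, 1)  # None = split on any run of whitespace
    if len(parts) != 2:
        return None  # blank line or no separator at all
    word, count = parts
    return word, int(count)  # int() ignores the trailing newline
```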

Related

Getting EOF error but running my code in Thonny produces no errors

I'm learning Python and one of my labs required me to:
Write a program whose input is a string which contains a character and a phrase, and whose output indicates the number of times the character appears in the phrase. The output should include the input character and use the plural form, n's, if the number of times the character appears is not exactly 1.
My code ended up being:
char = input()
string = input()
count = 0
for i in string:
    if i == char:
        count += 1
if count > 1 or count == 0:
    print(f"{count} {char}'s")
else:
    print(f'{count} {char}')
Whenever I run the code in Thonny or in the Zybooks development tab it works, but when I select the submit option I keep getting an EOFError:
Traceback (most recent call last):
File "main.py", line 2, in <module>
string = input()
EOFError: EOF when reading a line
Does anyone know what's causing the error?
I tried using break, but it didn't help, and I think that if I put break at the end of my for loop it wouldn't count all the way. Any ideas, folks?
Thank you, Mr. Roberts: the number of inputs was the issue. I had to read a single input and pull what I needed from that one line. My code ended up being:
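string = input()
char = string[0]
phrase = string[1:]
count = 0
for i in phrase:
    if i == char:
        count += 1
# output step, as in the first version
if count > 1 or count == 0:
    print(f"{count} {char}'s")
else:
    print(f'{count} {char}')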
All good now.
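The EOFError came from the second input() call: the submission supplies only one line, so the second read hits end of input. A compact sketch of the single-line version, assuming the input looks like n Monday (the character first, then the phrase); describe_count is a hypothetical helper, and str.count replaces the manual loop.

```python
def describe_count(line):
    """Count occurrences of line[0] in the rest of the line."""
    char, phrase = line[0], line[1:]
    count = phrase.count(char)
    # Plural form "n's" for every count except exactly 1
    return f"{count} {char}" if count == 1 else f"{count} {char}'s"
```

Usage in the lab would be print(describe_count(input())), which reads exactly one line.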

Trying to transcribe using ascii values using ord and chr

Trying to make an encrypter that shifts the ASCII value of each character in a message by the value of the corresponding character in a password. The output is always either a single character or a string index out of range error:
msg = input()
pw = input()
pwN = 0
msgN = 0
for i in msg:
    newmsg = ""
    nchar = chr(ord(msg[msgN]) + ord(pw[pwN]))
    pwN += 1
    msgN += 1
    if pwN > len(pw):
        pwN = 0
    newmsg += nchar
print(newmsg)
Running it in this form results in a single character rather than a message length string in some cases, and in others gives me this error:
Traceback (most recent call last):
File "file", line 8, in <module>
nchar = str(chr(ord(msg[msgN]) + ord(pw[pwN])))
IndexError: string index out of range
I can't figure out what I'm missing.
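The issue is that you're setting newmsg to the empty string on every pass through the loop. Moving newmsg = "" before the for loop should fix the single-character output. The out of range error is harder to spot because you manually increment several indices while also iterating over msg, but it comes from the check if pwN > len(pw): valid indices into pw run from 0 to len(pw) - 1, so the check should be pwN >= len(pw). As written, pwN can reach len(pw), and pw[pwN] fails whenever the message is longer than the password.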
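A sketch of the index-based loop with both problems fixed while keeping the original variable names: newmsg is initialised once, and the password index is wrapped with the modulo operator so it always stays within 0 to len(pw) - 1. Wrapping it in a function called shift_encrypt is an addition for testing, and it assumes pw is non-empty.

```python
def shift_encrypt(msg, pw):
    """Shift each character of msg by the code point of the matching pw character."""
    newmsg = ""  # initialise once, before the loop
    for msgN, mchar in enumerate(msg):
        # Wrap around the password instead of resetting an index by hand
        pwN = msgN % len(pw)
        newmsg += chr(ord(mchar) + ord(pw[pwN]))
    return newmsg
```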
I would suggest taking a look at the iteration features Python offers. You are technically iterating over msg, but never actually use i, instead relying solely on indices. A more pythonic way to solve this might be as follows:
from itertools import cycle
msg = input()
pw = input()
newmsg = ""
for mchar, pwchar in zip(msg, cycle(pw)):  # cycle loops the password so that abc becomes abcabcabc...
    newmsg += chr(ord(mchar) + ord(pwchar))
print(newmsg)
That is, if you want to stick with a loop. I would go further and use a generator expression to make it a one-liner:
from itertools import cycle
msg = input()
pw = input()
newmsg = "".join(chr(ord(mchar) + ord(pwchar)) for mchar, pwchar in zip(msg, cycle(pw)))
print(newmsg)

Keep Getting ValueError: not enough values to unpack (expected 2, got 1) for a text file for sentiment analysis?

I am trying to turn this text file into a dictionary using the code below:
with open("/content/corpus.txt", "r") as my_corpus:
    wordpoints_dict = {}
    for line in my_corpus:
        key, value = line.split('')
        wordpoints_dict[key] = value
print(wordpoints_dict)
It keeps returning:
ValueError Traceback (most recent call last)
<ipython-input-18-8cf5e5efd882> in <module>()
2 wordpoints_dict = {}
3 for line in my_corpus:
----> 4 key, value = line.split('-')
5 wordpoints_dict[key] = value
6 print(wordpoints_dict)
ValueError: not enough values to unpack (expected 2, got 1)
The data in the text file looks like this:
[image: Text Data]
You are trying to split a text value at '-' and unpack it into two values: key (before the dash) and value (after the dash). However, some lines in your text file do not contain a dash, so there are not two values to unpack. Check for blank lines, since they are a likely cause of the issue.
Your code doesn't match the error message. I'm going to assume that the error message is the correct one...
Just add a little logic to handle the case where there isn't a - on a line. I wouldn't be surprised if, once you fixed that, you hit the other side of the problem, where a line has more than one -. If that occurs in your file, you'll have to handle that case as well, because you'll then get a "too many values to unpack" error. Here's your code with the added boilerplate for handling both cases:
with open("/content/corpus.txt", "r") as my_corpus:
    wordpoints_dict = {}
    for line in my_corpus:
        parts = line.split('-')
        if len(parts) == 1:
            parts = (line, '')  # If no '-', use an empty second value
        elif len(parts) > 2:
            parts = parts[:2]  # If too many items from split, use the first two
        key, value = [x.strip() for x in parts]  # strip leading and trailing spaces
        wordpoints_dict[key] = value
print(wordpoints_dict)
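An alternative sketch using str.partition, which always returns a 3-tuple and therefore cannot raise the unpacking error. Unlike the code above, it keeps everything after the first dash in the value rather than dropping it, and it skips blank lines entirely. The function name load_wordpoints and the key-value line format are assumptions, since the sample data is not shown.

```python
def load_wordpoints(lines):
    """Build {key: value} from 'key-value' lines, skipping blank lines."""
    wordpoints = {}
    for line in lines:
        if not line.strip():
            continue  # blank line: nothing to unpack
        # partition never fails: with no '-' it returns (line, '', '')
        key, _sep, value = line.partition('-')
        # Everything after the first '-' stays in value, like split('-', 1)
        wordpoints[key.strip()] = value.strip()
    return wordpoints
```

With a file, call it as load_wordpoints(my_corpus) inside the with block.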

"IndexError: list index out of range" but it clearly isn't

First of all, I'm new to python, so apologies if this is a silly question (I've got a background in C++).
I'm attempting to split serial data (from an Arduino) into a list and print specific elements from the list to the console. I won't go into the project details because they aren't important.
The raw serial data looks like:
11111110,11111111,11111111
11111110,11111111,11111111
11111110,11111111,11111111
The code I'm trying to use is
#!/usr/bin/python
import serial, string
output = " "
ser = serial.Serial('/dev/ttyUSB0', 31250, 8, 'N', 1, timeout=1)
while True:
    print "----"
    while output != "":
        output = ser.readline()
        outList = output.strip().split(',')
        print outList[1]
    output = " "
I am getting the error:
Traceback (most recent call last):
File "serialtest.py", line 12, in <module>
print outList[1]
IndexError: list index out of range
I have tried replacing print outList[1] with print(outList), and I get the expected result of:
['11111110', '11111111', '11111111']
['11111110', '11111111', '11111111']
['11111110', '11111111', '11111111']
I CAN get print outList[0] to work, which prints 11111110. This suggests maybe it doesn't like 11111111?
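You need to test whether output is an empty string before you try to split it. When readline() times out without receiving data it returns an empty string, and ''.split(',') is [''], a one-element list, so outList[0] exists but outList[1] does not.
You also need to strip output before testing it, not just when you split it.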
while True:
    print "----"
    while True:
        output = ser.readline().strip()
        if output == "":
            break
        outList = output.split(',')
        print outList[1]
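A Python 3 sketch of the same per-line check as a standalone function, so it can be tried without a serial port. The helper name second_field is hypothetical; it also guards against a line with fewer than two fields. With pySerial on Python 3, readline() returns bytes, so you would decode the result (for example with .decode('ascii')) before passing it in.

```python
def second_field(raw):
    """Return the second comma-separated field of a serial line, or None."""
    line = raw.strip()
    # ''.split(',') == [''], a one-element list, so index 1 would fail
    if not line:
        return None
    fields = line.split(',')
    return fields[1] if len(fields) > 1 else None
```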

Why am I getting an IndexError in Python 3 when indexing a string and not slicing?

I'm new to programming, and experimenting with Python 3. I've found a few topics which deal with IndexError but none that seem to help with this specific circumstance.
I've written a function which opens a text file, reads it one line at a time, and slices the line up into individual strings which are each appended to a particular list (one list per 'column' in the record line). Most of the slices are multiple characters [x:y] but some are single characters [x].
I'm getting an IndexError: string index out of range message, when as far as I can tell, it isn't. This is the function:
def read_recipe_file():
    recipe_id = []
    recipe_book = []
    recipe_name = []
    recipe_page = []
    ingred_1 = []
    ingred_1_qty = []
    ingred_2 = []
    ingred_2_qty = []
    ingred_3 = []
    ingred_3_qty = []
    f = open('recipe-file.txt', 'r')  # open the file
    for line in f:
        # slice out each component of the record line and store it in the appropriate list
        recipe_id.append(line[0:3])
        recipe_name.append(line[3:23])
        recipe_book.append(line[23:43])
        recipe_page.append(line[43:46])
        ingred_1.append(line[46])
        ingred_1_qty.append(line[47:50])
        ingred_2.append(line[50])
        ingred_2_qty.append(line[51:54])
        ingred_3.append(line[54])
        ingred_3_qty.append(line[55:])
    f.close()
    return recipe_id, recipe_name, recipe_book, recipe_page, ingred_1, ingred_1_qty, ingred_2, ingred_2_qty, ingred_3, \
        ingred_3_qty
This is the traceback:
Traceback (most recent call last):
File "recipe-test.py", line 84, in <module>
recipe_id, recipe_book, recipe_name, recipe_page, ingred_1, ingred_1_qty, ingred_2, ingred_2_qty, ingred_3, ingred_3_qty = read_recipe_file()
File "recipe-test.py", line 27, in read_recipe_file
ingred_1.append(line[46])
The code which calls the function in question is:
print('To show list of recipes: 1')
print('To add a recipe: 2')
user_choice = input()
recipe_id, recipe_book, recipe_name, recipe_page, ingred_1, ingred_1_qty, ingred_2, ingred_2_qty, \
    ingred_3, ingred_3_qty = read_recipe_file()
if int(user_choice) == 1:
    print_recipe_table(recipe_id, recipe_book, recipe_name, recipe_page, ingred_1, ingred_1_qty,
                       ingred_2, ingred_2_qty, ingred_3, ingred_3_qty)
elif int(user_choice) == 2:
    pass  # code to add recipe
The failing line is this:
ingred_1.append(line[46])
There are more than 46 characters in each line of the text file I am trying to read, so I don't understand why I'm getting an out of bounds error (a sample line is below). If I change the code to this:
ingred_1.append(line[46:])
to read a slice, rather than a specific character, the line executes correctly, and the program fails on this line instead:
ingred_2.append(line[50])
This leads me to think it is somehow related to appending a single character from the string, rather than a slice of multiple characters.
Here is a sample line from the text file I am reading:
001Cheese on Toast     Meals For Two       012120038005002
I should probably add that I'm well aware this isn't great code overall - there are lots of ways I could generally improve the program, but as far as I can tell the code should actually work.
This will happen if some of the lines in the file are empty, or simply shorter than you expect. A stray newline at the end of the file is a common cause, since that comes up as an extra blank line. The best way to debug a case like this is to catch the exception, and investigate the particular line that fails (which almost certainly won't be the sample line you reproduced):
try:
    ingred_1.append(line[46])
except IndexError:
    print(line)
    print(len(line))
Catching this exception is also usually the right way to deal with the error: you've detected a pathological case, and now you can consider what to do. You might for example:
Use continue, which will silently skip processing that line;
Log something and then continue;
Bail out by raising a new, more topical exception, e.g. raise ValueError("Line too short").
Printing something relevant, with or without continuing, is almost always a good idea if this represents a problem with the input file that warrants fixing. Continuing silently is a good option if it is something relatively trivial that you know can't cause flow-on errors in the rest of your processing. You may want to differentiate between the "too short" and "completely empty" cases by detecting the "completely empty" case early, for example by doing this at the top of your loop:
if not line.strip():
    # Skip blank lines (a line read from a file still ends in '\n', so test the stripped text)
    continue
And handling the error for the other case appropriately.
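Putting that advice together, a sketch of a per-line parser, assuming the fixed-width layout implied by the slices in the question (the name parse_recipe_line is hypothetical). It strips only the trailing newline, because the spaces are field padding, returns None for blank lines, and raises the more topical ValueError for lines too short to contain the single-character field at index 54.

```python
def parse_recipe_line(line):
    """Split one fixed-width recipe record into its ten fields.

    Returns None for blank lines; raises ValueError for short lines.
    """
    line = line.rstrip('\n')  # drop only the newline; spaces are padding
    if not line.strip():
        return None  # blank line, e.g. a trailing newline at end of file
    if len(line) < 55:
        raise ValueError('Line too short (%d chars): %r' % (len(line), line))
    return (line[0:3], line[3:23], line[23:43], line[43:46],
            line[46], line[47:50], line[50], line[51:54],
            line[54], line[55:])
```

In the reading loop you would call it for each line, skip None results, and decide whether to log or re-raise on ValueError.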
The reason changing it to a slice works is because string slices never fail. If both indexes in the slice are outside the string (in the same direction), you will get an empty string - eg:
>>> 'abc'[4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> 'abc'[4:]
''
>>> 'abc'[4:7]
''
Your code fails on line[46] because line contains fewer than 47 characters. The slice operation line[46:] still works because an out-of-range string slice returns an empty string.
You can verify that the line is too short by replacing
ingred_1.append(line[46])
with
try:
    ingred_1.append(line[46])
except IndexError:
    print('line = "%s", length = %d' % (line, len(line)))
