How to resolve ValueError in the following code? - python

The votes are in… and it's up to you to make sure the correct winner is announced!
You've been given a CSV file called nominees.csv, which contains the names of various movies nominated for a prize, and the people who should be announced as the recipient. The file will look like this:
title,director(s)
Schindler's List,Steven Spielberg
"O Brother, Where Art Thou?","Joel Coen, Ethan Coen"
2001: A Space Odyssey,Stanley Kubrick
"Sherlock, Jr.","Buster Keaton, Roscoe Arbuckle"
You should write a program that reads in nominees.csv, asks for the name of the winning title, and prints out specific congratulations. For example, with the above file, your program should work like this:
Winning title: O Brother, Where Art Thou?
Congratulations: Joel Coen, Ethan Coen
Here is another example, using the same file:
Winning title: Schindler's List
Congratulations: Steven Spielberg
I have already tried resubmitting with altered values, but line 10 always raises a ValueError, and so does line 15. When the grader runs my code against a new list of nominees, it raises the error and my submission fails.
def main():
    film_director=[]
    with open('nominees.csv','r') as read_file:
        lines=read_file.readlines()
        lines=lines[1:]
        for line in lines:
            if '"' in line:
                if line[0]=='"':
                    index_second_quotes=line.index('"',1)
                    index_third_quotes=line.index('"',index_second_quotes+1)
                    title = line[:index_second_quotes].strip('\"')
                    directors=line[index_third_quotes:-1].strip('\"').strip()
                else:
                    index_first_quotes = line.index('"')
                    index_second_quotes = line.index('"', index_first_quotes+1)
                    title = line[:index_first_quotes-1].strip('\"')
                    directors = line[index_first_quotes+1:-1].strip('\"').strip()
                film_director.append([title,directors])
            else:
                tokens = line.split(',')
                film_director.append([tokens[0].strip(),tokens[1].strip()])
    title = input('Winning title: ')
    for row in film_director:
        if title.strip()==row[0]:
            print('Congratulations:',row[1])
            break
main()
The error message given is:
Testing a new nominees file. Your submission raised an exception of type ValueError. This occurred on line 10 of program.py.

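The chain of condition checks, index lookups, and slicing above can be replaced with a regular expression. The code below uses a single regular expression for rows in which both fields are quoted and falls back to a plain split otherwise. Note that it prints every row instead of asking for the winning title, and that the fallback split still raises ValueError for a row in which only one field is quoted and contains a comma.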
import re
with open("nominees.csv") as cf:
    lines = cf.readlines()
for line in lines[1:]:
    reg_match = re.match(r'"([^""]*)","([^""]*)"$', line)
    if reg_match:
        win_title, director = reg_match.group(1), reg_match.group(2)
    else:
        win_title, director = line.split(",")
    print("Winning title: %s" % win_title)
    print("Congratulations: %s" % director.strip())
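Another option, not used in either snippet above, is the standard-library csv module, which already understands quoted fields that contain commas, whether one field or both are quoted. A minimal sketch, assuming every data row has two columns; the names load_nominees and SAMPLE are made up for illustration, and the sample text stands in for nominees.csv so the snippet runs on its own:

```python
import csv
import io

# Sample data copied from the question. In the real program this would be
# open('nominees.csv', newline='') instead of io.StringIO(SAMPLE).
SAMPLE = '''title,director(s)
Schindler's List,Steven Spielberg
"O Brother, Where Art Thou?","Joel Coen, Ethan Coen"
2001: A Space Odyssey,Stanley Kubrick
"Sherlock, Jr.","Buster Keaton, Roscoe Arbuckle"
'''


def load_nominees(csv_file):
    """Return a dict mapping each title to its director(s)."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    nominees = {}
    for row in reader:
        if len(row) >= 2:  # ignore blank or malformed rows
            nominees[row[0].strip()] = row[1].strip()
    return nominees


nominees = load_nominees(io.StringIO(SAMPLE))
print('Congratulations:', nominees['O Brother, Where Art Thou?'])
```

In the actual exercise the title would come from input('Winning title: ').strip(), and nominees.get(title) would avoid a KeyError for an unknown title.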

Related

Parsing already parsed results with BeautifulSoup

I have a question about using Python and BeautifulSoup.
My program fills out a form on a website and brings back the results, which I will eventually output to an lxml file. I'm taking the results from https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS and I want to get a list for every city into some Excel documents.
Here is my code, I put it on pastebin:
http://pastebin.com/bZJfMp2N
My results are almost good, except that now I'm getting "           355" (padded with whitespace) for my "correct value" instead of "355", for example. I want to parse that and show only the number; you will see this when you run the code in Python.
However, nothing I have tried works. I can't parse the values_2 variable because the results are a bs4.element.ResultSet, when I think I need a string to parse. Sorry if I am a noob; I am still learning and have worked on this program for a long time.
Would anyone have any input? Anything would be appreciated! I've read that my results are in a list or something and that I can't parse lists. How would I go about doing this?
Here is the code:
__author__ = 'kennytruong'
#THE PROBLEM HERE IS TO PARSE THE RESULTS PROPERLY!!
import urllib.parse, urllib.request
import re
from bs4 import BeautifulSoup
URL = "https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS"
#Goes through these locations, strips the whitespace in the string and creates a list that starts at every new line
LOCATIONS = '''
ALAMEDA ALAMEDA
'''.strip().split('\n') #strip() basically removes whitespaces
print('Available locations to choose from:', LOCATIONS)
INSURANCE_TYPES = '''
HOMEOWNERS,CONDOMINIUM,MOBILEHOME,RENTERS,EARTHQUAKE - Single Family,EARTHQUAKE - Condominium,EARTHQUAKE - Mobilehome,EARTHQUAKE - Renters
'''.strip().split(',') #strips the whitespaces and starts a newline of the list every comma
print('Available insurance types to choose from:', INSURANCE_TYPES)
COVERAGE_AMOUNTS = '''
15000,25000,35000,50000,75000,100000,150000,200000,250000,300000,400000,500000,750000
'''.strip().split(',')
print('All options for coverage amounts:', COVERAGE_AMOUNTS)
HOME_AGE = '''
New,1-3 Years,4-6 Years,7-15 Years,16-25 Years,26-40 Years,41-70 Years
'''.strip().split(',')
print('All Home Age Options:', HOME_AGE)
def get_premiums(location, coverage_type, coverage_amt, home_age):
    formEntries = {'location':location,
                   'coverageType':coverage_type,
                   'coverageAmount':coverage_amt,
                   'homeAge':home_age}
    inputData = urllib.parse.urlencode(formEntries)
    inputData = inputData.encode('utf-8')
    request = urllib.request.Request(URL, inputData)
    response = urllib.request.urlopen(request)
    responseData = response.read()
    soup = BeautifulSoup(responseData, "html.parser")
    parseResults = soup.find_all('tr', {'valign':'top'})
    for eachthing in parseResults:
        parse_me = eachthing.text
        name = re.findall(r'[A-z].+', parse_me) #find me all the words that start with a cap, as many and it doesn't matter what kind.
        # the . for any character and + to signify 1 or more of it.
        values = re.findall(r'\d{1,10}', parse_me) #find me any digits, however many #'s long as long as btwn 1 and 10
        values_2 = eachthing.find_all('div', {'align':'right'})
        print('raw code for this part:\n' ,eachthing, '\n')
        print('here is the name: ', name[0], values)
        print('stuff on sheet 1- company name:', name[0], '- Premium Price:', values[0], '- Deductible', values[1])
        print('but here is the correct values - ', values_2) #NEEDA STRIP THESE VALUES
        # print(type(values_2)) DOING SO GIVES ME <class 'bs4.element.ResultSet'>, NEEDA PARSE bs4.element type
        # values_3 = re.split(r'\d', values_2)
        # print(values_3) ANYTHING LIKE THIS WILL NOT WORK BECAUSE I BELIEVE RESULTS ARENT STRING
        print('\n\n')
def main():
    for location in LOCATIONS: #seems to be looping the variable location in LOCATIONS - each location is one area
        print('Here are the options that you selected: ', location, "HOMEOWNERS", "150000", "New", '\n\n')
        get_premiums(location, "HOMEOWNERS", "150000", "New") #calls function get_premiums and passes parameters
if __name__ == "__main__": #this basically prevents all the indent level 0 code from getting executed, because otherwise the indent level 0 code gets executed regardless upon opening
    main()
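One possible way to get clean strings out of values_2: a ResultSet behaves like a list of Tag objects, so it cannot be passed to re.split() directly, but each Tag can be converted to a string with get_text(). The sketch below runs on a made-up HTML fragment, since the live page is not reproduced here; the real markup may differ, and the company name and numbers are invented.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like one result row; the real page may differ.
html = '''
<tr valign="top">
  <td>Example Insurance Co</td>
  <td><div align="right">
        355</div></td>
  <td><div align="right">   1,000 </div></td>
</tr>
'''

soup = BeautifulSoup(html, "html.parser")
row = soup.find('tr', {'valign': 'top'})

# find_all() returns a ResultSet, which behaves like a list of Tag objects.
values_2 = row.find_all('div', {'align': 'right'})

# get_text(strip=True) turns each Tag into a str with the padding removed.
clean_values = [div.get_text(strip=True) for div in values_2]
print(clean_values)
```

Inside get_premiums() the same list comprehension could be applied to the existing values_2 variable, after which the usual string methods and regular expressions work on each element.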

Python list out of bounds

I have the following function in Python that takes input and parses it into a dictionary. When I pass it the following input, the line artist = block[0] breaks because the list index is out of range, and I am really confused why. It breaks after reading in the second Led Zeppelin block. Any help with this issue would be greatly appreciated.
Input
Led Zeppelin
1969 II
-Whole Lotta Love
-What Is and What Should Never Be
-The Lemon Song
-Thank You
-Heartbreaker
-Living Loving Maid (She's Just a Woman)
-Ramble On
-Moby Dick
-Bring It on Home
Led Zeppelin
1979 In Through the Outdoor
-In the Evening
-South Bound Saurez
-Fool in the Rain
-Hot Dog
-Carouselambra
-All My Love
-I'm Gonna Crawl
Hello
Hello
Hello
Hello
Bob Dylan
1966 Blonde on Blonde
-Rainy Day Women #12 & 35
-Pledging My Time
-Visions of Johanna
-One of Us Must Know (Sooner or Later)
-I Want You
-Stuck Inside of Mobile with the Memphis Blues Again
-Leopard-Skin Pill-Box Hat
-Just Like a Woman
-Most Likely You Go Your Way (And I'll Go Mine)
-Temporary Like Achilles
-Absolutely Sweet Marie
-4th Time Around
-Obviously 5 Believers
-Sad Eyed Lady of the Lowlands
Function
def add(data, block):
    artist = block[0]
    album = block[1]
    songs = block[2:]
    if artist in data:
        data[artist][album] = songs
    else:
        data[artist] = {album: songs}
    return data
def parseData():
    global data,file
    file=os.getenv('CDDB')
    data = {}
    with open(file) as f:
        block = []
        for line in f:
            line = line.strip()
            if line == '':
                data = add(data, block)
                block = []
            else:
                block.append(line)
        data = add(data, block)
    f.close()
    return data
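The IndexError most likely comes from an empty block: if the file contains two blank lines in a row, or ends with a blank line, add() is called with block == [] and block[0] fails. Just add a sanity check at the top of your add() function:
def add(data, block):
    if not block:
        return data  # nothing to add; returning data keeps "data = add(data, block)" working
    # ... rest of add() unchanged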
Also, there is no good reason to use global variables. Here's an illustration:
def parseData(path):
    data = {}
    block = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line == '':
                add(data, block)
                block = []
            else:
                block.append(line)
    add(data, block)
    return data
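Putting the two pieces together, a self-contained sketch might look like the following. It reads from any iterable of lines instead of a path so that it can be tried without the CDDB file; the name parse_blocks and the extra check for blocks shorter than two lines are assumptions added for illustration, not part of the original answer.

```python
import io


def add(data, block):
    """Store one artist/album block in data; ignore blocks that are too short."""
    if len(block) < 2:  # empty block, or an artist line with no album line
        return data
    artist, album, songs = block[0], block[1], block[2:]
    data.setdefault(artist, {})[album] = songs
    return data


def parse_blocks(lines):
    """Group blank-line-separated lines into blocks and feed them to add()."""
    data = {}
    block = []
    for line in lines:
        line = line.strip()
        if line == '':
            add(data, block)
            block = []
        else:
            block.append(line)
    add(data, block)  # the last block may have no trailing blank line
    return data


# Two blank lines in a row and a trailing blank line both produce empty blocks.
sample = io.StringIO(
    "Led Zeppelin\n1969 II\n-Whole Lotta Love\n-Ramble On\n"
    "\n\n"
    "Led Zeppelin\n1979 In Through the Outdoor\n-In the Evening\n"
    "\n"
)
print(parse_blocks(sample))
```

With a real file, parse_blocks(open(path)) inside a with block would behave the same way, because a file object is also an iterable of lines.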

Determining a pattern of lines in Python

I'm new to Python and having trouble thinking about this problem Pythonically. I have a text file of SMS messages. There are multi-line statements I'd like to capture.
import fileinput
parsed = {}
for linenum, line in enumerate(fileinput.input()):
    ### Process the input data ###
    try:
        parsed[linenum] = line
    except (KeyError, TypeError, ValueError):
        value = None
###############################################
### Now have dict with value: "data" pairing ##
### for every text message in the archive #####
###############################################
for item in parsed:
    sent_or_rcvd = parsed[item][:4]
    if sent_or_rcvd != "rcvd" and sent_or_rcvd != "sent" and sent_or_rcvd != '--\n':
        ###########################################
        ### Know we have a second or third line ###
        ###########################################
But here's where I hit a wall. I'm not sure what's the best way to contain the strings I get here. I'd love some expert input. Using Python 2.7.3 but glad to move to 3.
Goal: have a human-readable file full of three-line quotes from these SMS.
Example text:
12425234123|2011-03-19 11:03:44|words words words words
12425234123|2011-03-19 11:04:27|words words words words
12425234123|2011-03-19 11:05:04|words words words words
12482904328|2011-03-19 11:13:31|words words words words
--
12482904328|2011-03-19 15:50:48|More bolder than flow
More cumbersome than pleasure;
Goodbye rocky dump
--
(Yes, before you ask, that's a haiku about poo. I'm trying to capture them from the last 5 years of texting my best friend.)
Ideally resulting in something like:
Haipu 3
2011-03-19
More bolder than flow
More cumbersome than pleasure;
Goodbye rocky dump
import time
data = """12425234123|2011-03-19 11:03:44|words words words words
12425234123|2011-03-19 11:04:27|words words words words
12425234123|2011-03-19 11:05:04|words words words words
12482904328|2011-03-19 11:13:31|words words words words
--
12482904328|2011-03-19 15:50:48|More bolder than flow
More cumbersome than pleasure;
Goodbye rocky dump """.splitlines()
def get_haikus(lines):
    haiku = None
    for line in lines:
        try:
            ID, timestamp, txt = line.split('|')
            t = time.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
            ID = int(ID)
            if haiku and len(haiku[1]) == 3:
                yield haiku
            haiku = (timestamp, [txt])
        except ValueError: # happens on error with split(), time or int conversion
            if haiku is not None: # ignore stray lines before the first message
                haiku[1].append(line)
    else:
        yield haiku
# now get_haikus() yields tuples (timestamp, [lines])
for haiku in get_haikus(data):
    timestamp, text = haiku
    date = timestamp.split()[0]
    text = '\n'.join(text)
    print("""{d}\n{txt}""".format(d=date, txt=text))
A good start might be something like the following. I'm reading data from a file named data2 but the read_messages generator will consume lines from any iterable.
#!/usr/bin/env python
def read_messages(file_input):
    message = []
    for line in file_input:
        line = line.strip()
        if line[:4].lower() in ('rcvd', 'sent', '--'):
            if message:
                yield message
            message = []
        else:
            message.append(line)
    if message:
        yield message
with open('data2') as file_input:
    for msg in read_messages(file_input):
        print(msg)
This expects input to look something like the following:
sent
message sent away
it has multiple lines
--
rcvd
message received
rcvd
message sent away
it has multiple lines
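For the pipe-delimited format shown in the question, the same generator idea could be adapted roughly as follows. This is a sketch under a few assumptions: a message starts on a line of the form number|timestamp|text, a line containing only -- is a separator, and a haiku is any message that spans exactly three lines. The names HEADER, read_sms, and format_haikus are invented for this example.

```python
import re

# A header line looks like "12482904328|2011-03-19 15:50:48|first line of text".
HEADER = re.compile(r'^(\d+)\|(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}\|(.*)$')


def read_sms(lines):
    """Yield (date, [text lines]) for each message in the archive."""
    current = None
    for raw in lines:
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            if current:
                yield current
            current = (match.group(2), [match.group(3)])
        elif line == '--' or not line:
            continue  # separator or blank line, not part of any message
        elif current:
            current[1].append(line)  # continuation of the current message
    if current:
        yield current


def format_haikus(lines):
    """Return numbered text blocks for the messages that span three lines."""
    haikus = [msg for msg in read_sms(lines) if len(msg[1]) == 3]
    return ['Haipu {}\n{}\n{}'.format(n, date, '\n'.join(text))
            for n, (date, text) in enumerate(haikus, start=1)]


sample = """12482904328|2011-03-19 11:13:31|words words words words
--
12482904328|2011-03-19 15:50:48|More bolder than flow
More cumbersome than pleasure;
Goodbye rocky dump
--""".splitlines()

print('\n\n'.join(format_haikus(sample)))
```

The numbering here simply counts the haikus found in the input, so the number attached to a given haiku depends on how many three-line messages precede it in the archive.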

Trying to check if 2 values match in a file

This is code from a chat bot, and its purpose is to save all information about a user into a file. That works fine as long as the user is in only one room, but if I want to save information about the same user in two different rooms, I have a problem. Instead of finding the line for that user and room and updating it, the bot keeps creating new lines for that user and that room.
It's getting annoying, and I would really like to avoid changing this code much, so I'd like to know where it fails and how to fix it properly without using dicts. (You can read the comments inside the code to understand how I think it works.)
Thank you for your time.
#First of all it reads the file
leyendoestadisticas = open("listas\Estadisticas.txt", "r")
bufferestadisticas = leyendoestadisticas.read()
leyendoestadisticas.close()
if not '"'+user.name+'"' in bufferestadisticas: #If the name of the user is not there, it adds all the information.
    escribiendoestadisticas = open("listas\Estadisticas.txt", 'a')
    escribiendoestadisticas.write(json.dumps([user.name, palabrasdelafrase, letrasdelafrase,
        "1", user.nameColor, user.fontColor, user.fontFace, user.fontSize,
        message.body.replace('"', "'"), room.name, 0, "primermensajitodeesapersona", fixedrooms])+"\n")
    escribiendoestadisticas.close()
else: #If the name it's there, it will do the next:
    #First of all, get all rooms where the name is saved, to do that...
    listadesalas = []
    for line in open("listas\Estadisticas.txt", 'r'):
        retrieved3 = json.loads(line)
        if retrieved3[0] == user.name: #If the name is found
            if not retrieved3[9] == room.name: #But room is diferent
                listadesalas.append(retrieved3[9]) #Adds the room to a temporal list
    #Now that we got a list with all different lines of that user based on rooms... we do the next code
    data = []
    hablaenunanuevasala = "no"
    with open('listas\Estadisticas.txt', 'r+') as f:
        for line in f:
            data_line = json.loads(line)
            if data_line[0] == user.name: #If name is there
                if data_line[9] == room.name: #And the room matches with actual room, then update that line.
                    data_line[1] = int(data_line[1])+int(palabrasdelafrase)
                    data_line[2] = int(data_line[2])+int(letrasdelafrase)
                    data_line[3] = int(data_line[3])+1
                    data_line[4] = user.nameColor
                    data_line[5] = user.fontColor
                    data_line[6] = user.fontFace
                    data_line[7] = user.fontSize
                    data_line[11] = data_line[8]
                    data_line[8] = message.body.replace('"', "'")
                    data_line[9] = room.name
                    data_line[12] = fixedrooms
                else: #but if the user is there and room NOT matches, we want to add a new line to the file with the same user but a new room.
                    if not room.name in listadesalas: #And here is where i believe is the problem of my code.
                        hablaenunanuevasala = "si" #needed since i didn't found a way to properly add a new line inside this loop, so must be done outside the loop later.
            data.append(data_line)
        f.seek(0)
        f.writelines(["%s\n" % json.dumps(i) for i in data])
        f.truncate()
    #Outside the loop - This would work if the program noticed it's a room that is not saved yet in the file for that user.
    if hablaenunanuevasala == "si":
        escribiendoestadisticas2 = open("listas\Estadisticas.txt", 'a')
        escribiendoestadisticas2.write(json.dumps([user.name, palabrasdelafrase, letrasdelafrase,
            "1", user.nameColor, user.fontColor, user.fontFace, user.fontSize,
            message.body.replace('"', "'"), room.name, 0, "primermensajitodeesapersona", fixedrooms])+"\n")
        escribiendoestadisticas2.close()
So that's what I tried, and it works perfectly as long as there is one room: it updates the info every time. When I speak in a second room, it adds a new record for that second room (perfect). But then if I speak again in ANY of those two rooms, the bot adds two more lines to the file instead of updating the information for the room where I spoke.
Edit: Let me summarize it:
Let's say I speak in the "whenever" room; the file will save a record
["saelyth", "whenever", "more info"]
If I speak in another room, the file should save a record
["saelyth", "anotherroom", "more info"]
It works great... but then it doesn't update the info. If I now speak in either of those two rooms, instead of updating the proper line, the bot adds more new lines to the file, which is the problem.
Fix done... somehow.
I chose to save the info into a different file for each room, and that works.
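For anyone who wants to keep a single file: the likely culprit in the code above is the listadesalas check. That list is built from the user's rooms other than the current one, so room.name is never in it, and the "new room" flag is set whenever the user has a line for any other room, even if a line for the current room was just updated. A simpler pattern is to track whether a line matching both user and room was found, and append only if none was. The sketch below uses a shortened record layout and a made-up function name, record_message, purely for illustration; it uses lists only, no dicts.

```python
import json
import os
import tempfile


def record_message(path, user, room, words):
    """Update the (user, room) line in a JSON-lines file, or append a new one.

    Each line is a JSON list [user, room, message_count, word_count]; this
    layout is a simplified stand-in for the longer records in the question.
    """
    records = []
    if os.path.exists(path):
        with open(path) as f:
            records = [json.loads(line) for line in f if line.strip()]

    found = False
    for rec in records:
        # Both fields must match on the same line before we update it.
        if rec[0] == user and rec[1] == room:
            rec[2] += 1
            rec[3] += words
            found = True
            break
    if not found:
        records.append([user, room, 1, words])

    # Rewrite the whole file once, after the loop has finished.
    with open(path, 'w') as f:
        for rec in records:
            f.write(json.dumps(rec) + '\n')
    return records


path = os.path.join(tempfile.mkdtemp(), 'Estadisticas.txt')
record_message(path, 'saelyth', 'whenever', 3)
record_message(path, 'saelyth', 'anotherroom', 5)
print(record_message(path, 'saelyth', 'whenever', 2))
```

After the three calls the file holds exactly two lines, one per room, and the "whenever" line has been updated in place.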

Parsing chat messages as config

I'm trying to write a function that can parse a file of predefined messages for a set of replies, but I'm at a loss as to how to do so.
For example, the config file would look like this:
[Message 1]
1: Hey
How are you?
2: Good, today is a good day.
3: What do you have planned?
Anything special?
4: I am busy working, so nothing in particular.
My calendar is full.
Each new line without a number preceding it is considered part of the reply, just another message in the conversation without waiting for a response.
Thanks
Edit: The config file will contain multiple messages and I would like to have the ability to randomly select from them all. Maybe store each reply from a conversation as a list, then the replies with extra messages can carry the newline then just split them by the newline. I'm not really sure what would be the best operation.
Update:
I've got most of this coded up so far:
def parseMessages(filename):
    messages = {}
    begin_message = lambda x: re.match(r'^(\d)\: (.+)', x)
    with open(filename) as f:
        for line in f:
            m = re.match(r'^\[(.+)\]$', line)
            if m:
                index = m.group(1)
            elif begin_message(line):
                begin = begin_message(line).group(2)
            else:
                cont = line.strip()
        else:
            # ??
    return messages
But now I am stuck on how to store them in the dict the way I'd like.
How would I get this to store a dict like:
{'Message 1':
    {'1': 'Hey\nHow are you?',
     '2': 'Good, today is a good day.',
     '3': 'What do you have planned?\nAnything special?',
     '4': 'I am busy working, so nothing in particular.\nMy calendar is full.'
    }
}
Or if anyone has a better idea, I'm open for suggestions.
Once again, thanks.
Update Two
Here is my final code:
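import re
def parseMessages(filename):
    all_messages = {}
    index = None   # current section name, e.g. "Message 1"
    num = None     # number of the reply being collected
    message = []   # lines of the reply being collected
    begin_message = lambda x: re.match(r'^(\d+): (.+)', x)
    def flush():
        # Store the reply collected so far under its section and number.
        if index is not None and num is not None:
            all_messages.setdefault(index, {})[num] = '\n'.join(message)
    with open(filename) as f:
        for line in f:
            m = re.match(r'^\[(.+)\]$', line)
            if m:
                flush()
                index, num, message = m.group(1), None, []
            elif begin_message(line):
                flush()
                num = int(begin_message(line).group(1))
                message = [begin_message(line).group(2)]
            else:
                cont = line.strip()
                if cont:
                    message.append(cont)
    flush()  # the last reply has no following line to trigger a flush
    return all_messages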
Doesn't sound too difficult. Almost-Python pseudocode:
for line in configFile:
    strip comments from line
    if line looks like a section separator:
        section = matched section
    elif line looks like the beginning of a reply:
        append line to replies[section]
    else:
        append line to last reply in replies[section][-1]
You may want to use the re module for the "looks like" operation. :)
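The pseudocode above could be made runnable along these lines. This is only a sketch: the names SECTION, REPLY, and parse_config are invented here, replies are stored as a list per section rather than keyed by number, and the "strip comments" step is left out because the question does not say what a comment looks like. The random.choice call at the end illustrates the random selection mentioned in the question's edit.

```python
import random
import re

SECTION = re.compile(r'^\[(.+)\]$')
REPLY = re.compile(r'^(\d+):\s*(.+)$')


def parse_config(lines):
    """Return {section: [reply, ...]}; continuation lines are joined with newlines."""
    replies = {}
    section = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        section_match = SECTION.match(line)
        reply_match = REPLY.match(line)
        if section_match:
            section = section_match.group(1)
            replies[section] = []
        elif section is None:
            continue  # text before the first [section] header is ignored
        elif reply_match:
            replies[section].append(reply_match.group(2))
        elif replies[section]:
            replies[section][-1] += '\n' + line  # extend the last reply
    return replies


config = """[Message 1]
1: Hey
How are you?
2: Good, today is a good day.
3: What do you have planned?
Anything special?
4: I am busy working, so nothing in particular.
My calendar is full.
""".splitlines()

conversations = parse_config(config)
name = random.choice(list(conversations))  # pick one conversation at random
for reply in conversations[name]:
    print(repr(reply))
```

One limitation of this approach is that a continuation line that happens to start with digits and a colon would be treated as a new reply.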
If you have a relatively small number of strings, why not just supply them as string literals in a dict?
{'How are you?' : 'Good, today is a good day.'}
