I have the following Python function that reads input and parses it into a dictionary. When I pass it the input below, the line artist = block[0] fails because the list index is out of range, and I can't figure out why. It breaks after reading in the second Led Zeppelin block. Any help with this issue would be greatly appreciated.
Input
Led Zeppelin
1969 II
-Whole Lotta Love
-What Is and What Should Never Be
-The Lemon Song
-Thank You
-Heartbreaker
-Living Loving Maid (She's Just a Woman)
-Ramble On
-Moby Dick
-Bring It on Home
Led Zeppelin
1979 In Through the Outdoor
-In the Evening
-South Bound Saurez
-Fool in the Rain
-Hot Dog
-Carouselambra
-All My Love
-I'm Gonna Crawl
Hello
Hello
Hello
Hello
Bob Dylan
1966 Blonde on Blonde
-Rainy Day Women #12 & 35
-Pledging My Time
-Visions of Johanna
-One of Us Must Know (Sooner or Later)
-I Want You
-Stuck Inside of Mobile with the Memphis Blues Again
-Leopard-Skin Pill-Box Hat
-Just Like a Woman
-Most Likely You Go Your Way (And I'll Go Mine)
-Temporary Like Achilles
-Absolutely Sweet Marie
-4th Time Around
-Obviously 5 Believers
-Sad Eyed Lady of the Lowlands
Function
def add(data, block):
    artist = block[0]
    album = block[1]
    songs = block[2:]
    if artist in data:
        data[artist][album] = songs
    else:
        data[artist] = {album: songs}
    return data
def parseData():
    global data,file
    file=os.getenv('CDDB')
    data = {}
    with open(file) as f:
        block = []
        for line in f:
            line = line.strip()
            if line == '':
                data = add(data, block)
                block = []
            else:
                block.append(line)
        data = add(data, block)
        f.close()
    return data
Just add a sanity check to your add() function:
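def add(data, block):
    if not block:
        return data  # nothing to add; hand data back so "data = add(...)" still works
    # ... rest of add() unchanged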
Also, there is no good reason to use global variables. Here's an illustration:
def parseData(path):
    data = {}
    block = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line == '':
                add(data, block)
                block = []
            else:
                block.append(line)
    add(data, block)
    return data
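Putting the two suggestions together, a minimal self-contained sketch could look like the one below. It assumes that albums are separated by one or more blank lines (consecutive blank lines are what produce the empty block) and, to stay runnable without the CDDB file, it parses an in-memory list of lines. The name parse_blocks and the shortened sample are illustrative only.

```python
def add(data, block):
    """Store one artist/album block in data; silently skip empty blocks."""
    if not block:
        return data
    artist, album, songs = block[0], block[1], block[2:]
    data.setdefault(artist, {})[album] = songs
    return data


def parse_blocks(lines):
    """Group non-blank lines into blocks and add each block to a dict."""
    data = {}
    block = []
    for line in lines:
        line = line.strip()
        if line == '':
            add(data, block)   # block is empty when two blank lines follow each other
            block = []
        else:
            block.append(line)
    add(data, block)           # last block, in case the input has no trailing blank line
    return data


# Shortened sample with two consecutive blank lines between the albums.
sample = [
    "Led Zeppelin", "1969 II", "-Whole Lotta Love", "",
    "",
    "Led Zeppelin", "1979 In Through the Outdoor", "-In the Evening", "",
]
albums = parse_blocks(sample)
print(albums)
```

Note that the guard only covers completely empty blocks; a block with a single line would still fail at block[1].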
I have a text file with carpool information: the driver of each car is listed first, and the passengers are listed under the driver, indented. I would like a reliable way to find a passenger and then report who drives that car. Here is an example of what the file looks like.
Car: Steven
    Jerry
    Elaine
    George
Car: Ross
    Rachel
    Joey
Car: Steve
    Karl
    Eric
    Red
    Ryan
I would like a function that takes a name like Eric and returns Steve, or takes Joey and returns Ross. The file will be much longer in actuality, but this is a snippet of what it looks like.
Assuming you can fit everything in memory and data looks something like
data = """Car: Steven
Jerry
Elaine
George
Car: Ross
Rachel
Joey
Car: Steve
Karl
Eric
Red
Ryan
"""
Then you can define a function by splitting at "Car:":
def find_driver(data, person):
    for people in [car.split() for car in data.split("Car:")[1:]]:
        if person in people:
            return people[0]
    return None
Then
find_driver(data, "Eric")
find_driver(data, "Joey")
returns the desired results.
Note splitting at "Car:" always gives a blank 0th element, so we slice past that. Then, we split the data normally (at newlines and spaces) using car.split(). Now we just search in order for what car a person is in.
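For reference, here is the same idea as a runnable sketch with the sample data inlined. It assumes every name is a single word, since car.split() splits on any whitespace; a name such as "Mary Ann" would be treated as two people. The variable name carpool is illustrative.

```python
def find_driver(data, person):
    """Return the driver of the car that person rides in, or None if not found."""
    # Everything before the first "Car:" is an empty string, so skip it.
    for car in data.split("Car:")[1:]:
        people = car.split()      # first entry is the driver, the rest are passengers
        if person in people:
            return people[0]
    return None


carpool = """Car: Steven
    Jerry
    Elaine
    George
Car: Ross
    Rachel
    Joey
Car: Steve
    Karl
    Eric
    Red
    Ryan
"""

print(find_driver(carpool, "Eric"))
print(find_driver(carpool, "Joey"))
```

Because the driver is part of the people list, looking up a driver's own name returns that driver.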
The following function takes the text of the file as input and produces a dictionary that maps drivers to their passengers.
def parse_lines(text):
    driver_to_passengers = dict()
    car_prefix = "Car: "
    current_driver = None
    for line in text.split("\n"):
        line = line.strip()
        if line == "":
            continue
        if car_prefix in line:
            driver_name = line.split(car_prefix)[-1]
            driver_to_passengers[driver_name] = set()
            current_driver = driver_name
        else:
            passenger_name = line
            driver_to_passengers[current_driver].add(passenger_name)
    return driver_to_passengers
To look up a passenger, you can do the following:
driver_to_passengers = parse_lines(text)
passenger = "Joey" # Example passenger name
found = False
for driver in driver_to_passengers:
    if passenger in driver_to_passengers[driver]:
        found = True # driver now holds the name of the passenger's driver
        break
This is a solution that compacts the space needed (the mapping is between driver and passenger) but increases the lookup time (since we are checking each driver's passengers). The lookup complexity for a single passenger is O(D) where D is the number of drivers (O(D) for iterating over the drivers, O(1) the lookup in the set).
A faster solution for your problem would be to reverse the mapping so that it now is passenger -> driver (but with this you will have some redundancy since a driver can have many passengers). The lookup complexity is reduced to O(1), and the two pieces of code above become:
def parse_lines(text):
    passenger_to_driver = dict()
    car_prefix = "Car: "
    current_driver = None
    for line in text.split("\n"):
        line = line.strip()
        if line == "":
            continue
        if car_prefix in line:
            driver_name = line.split(car_prefix)[-1]
            current_driver = driver_name
        else:
            passenger_name = line
            passenger_to_driver[passenger_name] = current_driver
    return passenger_to_driver
And the lookup:
passenger_to_driver = parse_lines(text)
passenger = "Joey" # Example passenger name
driver = passenger_to_driver.get(passenger, None) # driver will be None if the passenger does not exist
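As a quick check of the reversed mapping, the sketch below runs it on a shortened sample. It is a variation rather than the exact code above: it uses line.startswith() instead of the "in" test so that a passenger line that merely contains "Car: " is not mistaken for a driver, and it ignores passengers that appear before any driver line. The function name parse_passengers is illustrative.

```python
def parse_passengers(text, car_prefix="Car: "):
    """Map each passenger name to the driver of the car they are listed under."""
    passenger_to_driver = {}
    current_driver = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(car_prefix):
            current_driver = line[len(car_prefix):]
        elif current_driver is not None:
            passenger_to_driver[line] = current_driver
    return passenger_to_driver


text = "Car: Steven\n  Jerry\n  Elaine\nCar: Ross\n  Rachel\n  Joey\n"
lookup = parse_passengers(text)
print(lookup.get("Joey"))    # driver of Joey's car
print(lookup.get("Kramer"))  # None, because Kramer is not in the file
```

If the same passenger name can appear in more than one car, the later entry overwrites the earlier one, so this layout assumes names are unique.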
The votes are in… and it's up to you to make sure the correct winner is announced!
You've been given a CSV file called nominees.csv, which contains the names of various movies nominated for a prize, and the people who should be announced as the recipient. The file will look like this:
title,director(s)
Schindler's List,Steven Spielberg
"O Brother, Where Art Thou?","Joel Coen, Ethan Coen"
2001: A Space Odyssey,Stanley Kubrick
"Sherlock, Jr.","Buster Keaton, Roscoe Arbuckle"
You should write a program that reads in nominees.csv, asks for the name of the winning title, and prints out specific congratulations. For example, with the above file, your program should work like this:
Winning title: O Brother, Where Art Thou?
Congratulations: Joel Coen, Ethan Coen
Here is another example, using the same file:
Winning title: Schindler's List
Congratulations: Steven Spielberg
I have already tried submitting with various changes, but line 10 always raises a ValueError, and so does line 15. When my code is tested against a new list of nominees, it raises the error and my submission fails.
def main():
    film_director=[]
    with open('nominees.csv','r') as read_file:
        lines=read_file.readlines()
        lines=lines[1:]
        for line in lines:
            if '"' in line:
                if line[0]=='"':
                    index_second_quotes=line.index('"',1)
                    index_third_quotes=line.index('"',index_second_quotes+1)
                    title = line[:index_second_quotes].strip('\"')
                    directors=line[index_third_quotes:-1].strip('\"').strip()
                else:
                    index_first_quotes = line.index('"')
                    index_second_quotes = line.index('"', index_first_quotes+1)
                    title = line[:index_first_quotes-1].strip('\"')
                    directors = line[index_first_quotes+1:-1].strip('\"').strip()
                film_director.append([title,directors])
            else:
                tokens = line.split(',')
                film_director.append([tokens[0].strip(),tokens[1].strip()])
    title = input('Winning title: ')
    for row in film_director:
        if title.strip()==row[0]:
            print('Congratulations:',row[1])
            break
main()
The error message given is:
Testing a new nominees file. Your submission raised an exception of type ValueError. This occurred on line 10 of program.py.
The many condition checks, index lookups, and slices above can be replaced with a regular expression. The code below uses a single regular expression, with a plain split as the fallback:
import re
with open("nominees.csv") as cf:
    lines = cf.readlines()
for line in lines[1:]:
    reg_match = re.match(r'"([^""]*)","([^""]*)"$', line)
    if reg_match:
        win_title, director = reg_match.group(1), reg_match.group(2)
    else:
        win_title, director = line.split(",")
    print("Winning title: %s" % win_title)
    print("Congratulations: %s" % director.strip())
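A different approach, not used in either snippet above, is to let Python's standard csv module handle the quoting. str.index raises ValueError when the substring is missing, so the error on line 10 is probably triggered by a row where the title is quoted but the director is not (there is no third quote to find); the regular expression with a split fallback has a similar gap for such mixed rows. The sketch below is a minimal illustration under that assumption. It reads from an in-memory string so it runs without the real file, and load_nominees is a made-up helper name.

```python
import csv
import io


def load_nominees(csv_file):
    """Return a dict mapping each title to its director(s)."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return {row[0]: row[1] for row in reader if len(row) >= 2}


sample = io.StringIO(
    'title,director(s)\n'
    "Schindler's List,Steven Spielberg\n"
    '"O Brother, Where Art Thou?","Joel Coen, Ethan Coen"\n'
    '2001: A Space Odyssey,Stanley Kubrick\n'
    '"Sherlock, Jr.",Buster Keaton\n'   # hypothetical mixed row: quoted title, unquoted director
)
nominees = load_nominees(sample)
title = "O Brother, Where Art Thou?"
print("Congratulations:", nominees[title])
```

With the real file, the dictionary would be built from open('nominees.csv', newline='') and the title would come from input('Winning title: ').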
I have been working on a script where I, as the user, put names into a text file, and the script compares them against the names returned by a function (which provides 100 random names) to see whether any of them match.
I have written this code:
import json, time, sys, os, timeit, random, colorama, requests, traceback, multiprocessing, re
from random import choice
import threading
def get_names():
    name_test = [line.rstrip('\n') for line in open('randomnames.txt')]
    return name_test
def filter(thread, i):
    text = thread
    positive_keywords = [i]
    has_good = False
    for ch in ['&', '#', '“', '”', '"', '*', '`', '*', '’', '-']:
        if ch in text:
            text = text.replace(ch, "")
    sentences = [text]
    def check_all(sentence, ws):
        return all(re.search(r'\b{}\b'.format(w), sentence) for w in ws)
    for sentence in sentences:
        if any(check_all(sentence, word.split('+')) for word in positive_keywords):
            has_good = True
            break
    if not has_good or i == "":
        sys.exit()
    print('Matched ' + text)
def main():
    old_list = []
    old_names_list = []
    while True:
        new_names_list = [line.rstrip('\n') for line in open('names.txt')]
        for new_thread in get_names():
            if not new_names_list == old_names_list:
                for i in new_names_list:
                    if not i in old_names_list:
                        threading.Thread(target=filter, args=(new_thread, i)).start()
                        if new_thread not in old_list:
                            old_list.append(new_thread)
            elif new_thread not in old_list:
                threading.Thread(target=filter, args=(new_thread, new_names_list)).start()
                old_list.append(new_thread)
            else:
                randomtime = random.randint(1, 3)
                print('No changes!')
                time.sleep(randomtime)
        old_names_list = new_names_list
if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print('Keyboard - Interrupted' )
        sys.exit()
randomnames.txt
Alejandro
Tisha
Eleni
Milton
Jeanice
Billye
Vicki
Shelba
Valorie
Penelope
Mellissa
Ambrose
Retta
Milissa
Charline
Brittny
Ehtel
Hilton
Hobert
Lakendra
Silva
Lawana
Sidney
Janeen
Audrea
Orpha
Peggy
Kay
Marvis
Tia
Randy
Cary
Santana
Roma
Mandi
Tyrone
Felix
Maybelle
Leonia
Micha
Idalia
Aleida
Elfrieda
Velia
Cassondra
Drucilla
Oren
Kristina
Madison
Dia
names.txt
Alejandro
Tisha
Eleni
Dia
Hobert
How the code works:
It starts in main, where old_list stores each new_thread value (so it isn't processed again) and old_names_list stores the names from names.txt.
Inside the while True loop, which runs forever, we open names.txt and then enter for new_thread in get_names():, which loops through the whole of randomnames.txt, with new_thread taking each name in turn.
After that we check whether if not new_names_list == old_names_list: is true. For each name in names.txt that is not yet in old_names_list, we create a thread that runs filter(thread, i) to see whether it matches. The idea is that one name at a time is checked against all the names from get_names() before moving on to the next row of names.txt.
And here is my main problem, so I don't think I need to explain the rest. I have, for example, 50 names in randomnames.txt, so searching for one name from names.txt across for new_thread in get_names(): creates 50 threads to look for a match. Once the first name from names.txt is done, it starts on the next one, creating another 50 threads to look for matches, and so on until names.txt is exhausted.
My question is: is there a better way, for example saving all the names in a set(), a list, or whatever is best, and then sending that to filter(), which would check every name from names.txt for each new_thread that is running?
What results do I expect?
When I run the script for the first time, it should check all of names.txt, store the names in a dict or list, and send that to filter. Once that is done, it should hit "No changes!" since nothing new has been added. But if you add a new name to names.txt, if not new_names_list == old_names_list: becomes true because the lists are no longer the same. In that case it should check only the name that was added to names.txt against all the new_thread values and see whether it matches.
If I understood correctly, you want to check whether any of the names in names.txt is in randomnames.txt. How about this?
NAME_LIST_FILE_PATH = r'C:\Temp\randomnames.txt'
NAME_INPUT_FILE_PATH = r'C:\Temp\names.txt'
with open(NAME_LIST_FILE_PATH, 'r') as name_list_file:
    name_list = [name for name in name_list_file]
with open(NAME_INPUT_FILE_PATH, 'r') as name_input_file:
    name_input_list = [name for name in name_input_file]
matched_names = []
unmatched_names = []
for name in name_input_list:
    if name in name_list:
        matched_names.append(name)
    else:
        unmatched_names.append(name)
print('Matched names:\n{matched}\nUnmatched names:\n{unmatched}'.format(
    matched=''.join(matched_names),
    unmatched=''.join(unmatched_names)
))
output:
λ python "C:\Temp\so_test.py"
Matched names:
Alejandro
Tisha
Eleni
Unmatched names:
Dia
Hobert
edit: no need for so many newlines, they get copied in from the initial for name in name_*_file
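One thing worth noting about the comparison above: lines read straight from a file keep their trailing newline, so the last line of a file that does not end with a newline compares differently from the same name elsewhere. That is probably why Dia and Hobert show up as unmatched even though both appear in randomnames.txt. The sketch below strips the names, uses sets for the comparison, and only checks names that have not been checked before, which is what the question asks for. It works on in-memory lists instead of the two files, and the helper names load_names and check_new_names are made up for illustration.

```python
def load_names(lines):
    """Strip surrounding whitespace (including newlines) and drop blank lines."""
    return {line.strip() for line in lines if line.strip()}


def check_new_names(pool, wanted, already_checked):
    """Compare only the names not seen before; remember them afterwards."""
    new_names = wanted - already_checked
    matched = sorted(new_names & pool)
    unmatched = sorted(new_names - pool)
    already_checked |= new_names   # updates the caller's set in place
    return matched, unmatched


# Stand-ins for randomnames.txt and two successive reads of names.txt.
pool = load_names(["Alejandro\n", "Tisha\n", "Hobert\n", "Dia"])
checked = set()

first = check_new_names(pool, load_names(["Alejandro\n", "Dia\n", "Hobert"]), checked)
second = check_new_names(
    pool, load_names(["Alejandro\n", "Dia\n", "Hobert\n", "Zed\n"]), checked
)
print(first)   # everything in the first read is new
print(second)  # only the newly added name is looked at
```

Set membership is a constant-time lookup on average, so no thread per name pair is needed for a plain equality check.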
I have an input file that's in the following format.
Fred,Karl,Technician,2010--Karl,Cathy,VP,2009--Cathy,NULL,CEO,2007--
--Vince,Cathy,Technician,2010
I need to parse this information to where it ends up looking something like this in an output file:
Cathy (CEO) 2007
-Karl (VP) 2009
--Fred (Technician) 2010
-Vince (Technician) 2010
With the CEO at the top, each subordinate should be under their superior. So whatever the second name is, that is the supervisor. The trick is that if an employee has 2 supervisors, they need to be indented twice "--" with their immediate supervisor above.
I've tried iterating through the list and parsing through the "--" and the commas but I'm struggling with the structure itself. This is what I have so far.
with open('org_chart_sample.in', 'r') as reader: # Open the input file
    with open('output.out', 'w') as writer: # Make output file writable
        reader.readline() # Ignore first line
        lines = reader.readlines() # Read input lines
        for line in lines: # Parse out input by the -- which separated attributes of people in the org
            employees = line.split('--')
            hierarchy = [] # Exterior list to aid in hierarchy
            for employee in employees: # Logic that adds to the hierarchy list as algorithm runs
                info = employee.split(',')
                hierarchy.append(info)
I've been stuck on this problem for longer than I'd like to admit :(
Cool question, it was fun to work on. I tried to be thorough, and it ended up getting kind of long, I hope it's still readable.
Code:
##########################
#Input data cleaned a bit#
##########################
lines = ["Fred,Karl,Technician,2010",
         "Karl,Cathy,VP,2009",
         "Cathy,NULL,CEO,2007",
         "Vince,Cathy,Technician,2010",
         "Mary,NULL,CEO,2010",
         "Steve,Mary,IT,2013"]
##################################
#Worker class to make things neat#
##################################
class Worker(object):
    #Variables assigned to this worker
    __slots__ = ["name","boss","job","year","employees","level"]
    #Initialize this worker with a string in the form of:
    #"name,boss,job,year"
    def __init__(self,line):
        self.name,self.boss,self.job,self.year = line.split(",")
        self.level = 0 if self.boss == "NULL" else -1 #If boss is NULL, they are '0' level
        self.employees = []
    #A function to add another worker as this worker's employee
    def employ(self,worker):
        worker.level = self.level+1
        self.employees.append(worker)
    #This is a recursive function which returns a string of this worker
    #and all of this worker's employees (depth first)
    def __repr__(self):
        rep_str = ""
        rep_str += "-"*self.level
        rep_str += str(self.name)+" works for "+str(self.boss)
        rep_str += " as a "+str(self.job)+" since "+str(self.year)+"\n"
        for employee in self.employees:
            rep_str += str(employee)
        return rep_str
########################################
#Prepare to assign the bosses employees#
########################################
#1. Turn all of the lines into worker objects
workers = [Worker(line) for line in lines]
#2. Start from the top level bosses (the ones that had NULL as boss)
boss_level = 0
#3. Get a list of all the workers that have a boss_level of 0
bosses = [w for w in workers if w.level == boss_level]
#While there are still bosses looking to employ then keep going
while len(bosses) > 0:
    #For each boss look through all the workers and see if they work for this boss
    #If they do, employ that worker to the boss
    for boss in bosses:
        for worker in workers:
            if worker.level == -1 and boss.name == worker.boss:
                boss.employ(worker)
    #Move down a tier of management to sub-bosses
    #If there are no sub-bosses at this level, then stop, otherwise while loop again
    boss_level += 1
    bosses = [w for w in workers if w.level == boss_level]
##########################
#Printing out the workers#
##########################
#1. Loop through the top bosses and
#   print out them and all their workers
top_bosses = [w for w in workers if w.level == 0]
for top_boss in top_bosses:
    print(top_boss) #Parenthesized so it runs on both Python 2 and Python 3
Output:
Cathy works for NULL as a CEO since 2007
-Karl works for Cathy as a VP since 2009
--Fred works for Karl as a Technician since 2010
-Vince works for Cathy as a Technician since 2010
Mary works for NULL as a CEO since 2010
-Steve works for Mary as a IT since 2013
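The answer above starts from pre-cleaned records and prints a sentence per worker. As a complement, the sketch below starts from the raw "--"-separated text shown in the question and produces the "Name (Title) Year" layout that was asked for. It is a simplified alternative, not the class-based approach above, and it assumes that names are unique, that every record has exactly four comma-separated fields, and that the reporting chain has no cycles. build_chart is an illustrative name.

```python
def build_chart(raw):
    """Turn '--'-separated 'name,boss,job,year' records into an indented chart."""
    # Treat line breaks like record separators, then drop the empty pieces.
    records = [r.strip() for r in raw.replace("\n", "--").split("--") if r.strip()]

    info = {}      # name -> (job, year)
    reports = {}   # boss name -> list of direct reports, in input order
    for record in records:
        name, boss, job, year = record.split(",")
        info[name] = (job, year)
        reports.setdefault(boss, []).append(name)

    lines = []

    def walk(name, depth):
        # One dash per level of management above this person.
        job, year = info[name]
        lines.append("{}{} ({}) {}".format("-" * depth, name, job, year))
        for subordinate in reports.get(name, []):
            walk(subordinate, depth + 1)

    for top in reports.get("NULL", []):
        walk(top, 0)
    return "\n".join(lines)


raw = ("Fred,Karl,Technician,2010--Karl,Cathy,VP,2009--Cathy,NULL,CEO,2007--\n"
       "--Vince,Cathy,Technician,2010")
print(build_chart(raw))
```

Grouping the records by boss up front means each person is visited once, instead of rescanning the whole worker list for every management level.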
I'm trying to parse tweet data.
My data looks like this:
59593936 3061025991 null null <d>2009-08-01 00:00:37</d> <s><a href="http://help.twitter.com/index.php?pg=kb.page&id=75" rel="nofollow">txt</a></s> <t>honda just recalled 440k accords...traffic around here is gonna be light...win!!</t> ajc8587 15 24 158 -18000 0 0 <n>adrienne conner</n> <ud>2009-07-23 21:27:10</ud> <t>eastern time (us & canada)</t> <l>ga</l>
22020233 3061032620 null null <d>2009-08-01 00:01:03</d> <s><a href="http://alexking.org/projects/wordpress" rel="nofollow">twitter tools</a></s> <t>new blog post: honda recalls 440k cars over airbag risk http://bit.ly/2wsma</t> madcitywi 294 290 9098 -21600 0 0 <n>madcity</n> <ud>2009-02-26 15:25:04</ud> <t>central time (us & canada)</t> <l>madison, wi</l>
I want to get the total number of tweets and the number of keyword-related tweets. I have prepared the keywords in a text file. In addition, I want to get the tweet text, and the total number of tweets that contain a mention (#), a retweet (RT), or a URL (I want to save every URL in a separate file).
So I wrote the following code.
import time
import os
total_tweet_count = 0
related_tweet_count = 0
rt_count = 0
mention_count = 0
URLs = {}
def get_keywords(filepath, mode):
    with open(filepath, mode) as f:
        for line in f:
            yield line.split().lower()
for line in open('/nas/minsu/2009_06.txt'):
    tweet = line.strip().lower()
    total_tweet_count += 1
    with open('./related_tweets.txt', 'a') as save_file_1:
        keywords = get_keywords('./related_keywords.txt', 'r')
        if keywords in line:
            text = line.split('<t>')[1].split('</t>')[0]
            if 'http://' in text:
                try:
                    url = text.split('http://')[1].split()[0]
                    url = 'http://' + url
                    if url not in URLs:
                        URLs[url] = []
                    URLs[url].append('\t' + text)
                    save_file_3 = open('./URLs_in_related_tweets.txt', 'a')
                    print >> save_file_3, URLs
                except:
                    pass
            if '#' in text:
                mention_count +=1
            if 'RT' in text:
                rt_count += 1
            related_tweet_count += 1
            print >> save_file_1, text
save_file_2 = open('./info_related_tweets.txt', 'w')
print >> save_file_2, str(total_tweet_count) + '\t' + srt(related_tweet_count) + '\t' + str(mention_count) + '\t' + str(rt_count)
save_file_1.close()
save_file_2.close()
save_file_3.close()
Following is the sample keywords
Depression
Placebo
X-rays
X-ray
HIV
Blood preasure
Flu
Fever
Oral Health
Antibiotics
Diabetes
Mellitus
Genetic disorders
I think my code has many problems, but the first error is as follows:
Traceback (most recent call last):
  File "health_related_tweets.py", line 23, in <module>
    if keywords in line:
TypeError: 'in <string>' requires string as left operand, not generator
Please help me out!
The reason is that keywords = get_keywords(...) returns a generator. Logically thinking about it, keywords should be a list of all the keywords. And for each keyword in this list, you want to check if it's in the tweet/line or not.
Sample code:
keywords = get_keywords('./related_keywords.txt', 'r')
has_keyword = False
for keyword in keywords:
    if keyword in line:
        has_keyword = True
        break
if has_keyword:
    # Your code here (for the case when the line has at least one keyword)
(The above code would be replacing if keywords in line:)
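A slightly fuller sketch of the same idea follows. It makes two further changes that go beyond the fix above and should be read as suggestions: the keywords are loaded once into a list instead of re-reading the file for every tweet, and both keywords and tweet text are lowercased before comparing (the original calls .lower() on the result of split(), which is a list and has no such method). It uses in-memory sample data, and load_keywords and has_keyword are illustrative names.

```python
def load_keywords(lines):
    """Return lowercased, stripped keywords, skipping blank lines."""
    return [line.strip().lower() for line in lines if line.strip()]


def has_keyword(text, keywords):
    """True if any keyword occurs as a substring of the lowercased text."""
    text = text.lower()
    return any(keyword in text for keyword in keywords)


# Stand-ins for related_keywords.txt and the tweet file.
keywords = load_keywords(["Depression\n", "Flu\n", "X-ray\n", "\n"])
tweets = [
    "<t>honda just recalled 440k accords...traffic around here is gonna be light...win!!</t>",
    "<t>Got my flu shot today</t>",
]

related = [tweet for tweet in tweets if has_keyword(tweet, keywords)]
print(len(tweets), len(related))
```

This is plain substring matching, so a keyword like "flu" would also match "influence"; word-boundary matching with the re module would be needed to avoid that.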