Python: dict and sorting in alphabetical - python

I need to write a program that accepts a year from the user, reads data from a CSV file, and prints the result to the screen. Each line of the CSV source file has the format year, name, count, gender, and the output should list boys only, in the format Name Count, sorted alphabetically.
Input file:
2010,Ruby,440,Female
2010,Cooper,493,Male
Output:
Please enter the year: 2010
Popular boy names in year 2010 are:
Aidan 112
I get an error when I run the program:
Please enter the year: 2014
Traceback (most recent call last):
File "E:\SIT111\A1\printBoynameTPL.py", line 26, in <module>
year, name, count, gender = row
ValueError: need more than 0 values to unpack
This is my code:
'''
This program accepts a year as input from a user and print boys' information of the year. The output should be sorted by name in alphabetical order.
Steps:
1. Receive a year from a user
2. Read CSV files:
Format: year, name, count, gender
3. Display popular boy names on the screen:
Format: Name Count
'''
import csv
inputYear = raw_input('Please enter the year: ')
inFile = open('output/babyQldAll.csv', 'rU')
cvsFile = csv.reader(inFile, delimiter=',')
dict = {}
for row in cvsFile:
    year, name, count, gender = row
    if (year == inputYear) and (gender == 'Boy'):
        dict[name] = count
print('Popular boy names in year %s are:' % inputYear)
# +++++ You code here ++++
# According to informaiton in 'dict', print (name, count) sorted by 'name' in alphabetical order
sortedName = shorted(dict.keys())
for name in sortedName:
    print(name, dict[name])
print("Print boy names... ")
inFile.close()
I edited a bit:
for row in cvsFile:
if row:
year, name, count, gender = row
if (year == inputYear) and (gender == 'Male'):
dict[name] = count
print('Popular boy names in year %s are:' % inputYear)
# +++++ You code here ++++
# According to informaiton in 'dict', print (name, count) sorted by 'name' in alphabetical order
sortedName = sorted(dict.keys())
for name in sortedName:
print(name,dict[name])
print("Print boy names... ")
Did I do something wrong? Is it the indentation or something else?
Result:
>>>
Please enter the year: 2013
Popular boy names in year 2013 are:
Print boy names...
>>>

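Your CSV file seems to contain empty lines, which come through as empty rows when you iterate over the reader; unpacking an empty row into four names raises the ValueError. You can simply check whether the row is empty before running the rest of the logic. Also note that the sample data marks boys as 'Male', not 'Boy', so the comparison has to match that value. Example:
for row in cvsFile:
    if row:
        year, name, count, gender = row
        if (year == inputYear) and (gender == 'Male'):
            dict[name] = count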
Also, you should not use dict as a variable name, because it shadows the built-in dict type.
You also have a typo in your program:
sortedName = shorted(dict.keys())
I am guessing you intended to use sorted().
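Putting these fixes together, a minimal Python 3 sketch might look like the following. The function name boy_names_for_year and the in-memory sample are made up for illustration; the sketch assumes rows of the form year,name,count,gender with 'Male' marking boys, as in the sample input.

```python
import csv
import io


def boy_names_for_year(lines, input_year):
    """Return (name, count) pairs for boys in input_year, sorted by name."""
    counts = {}  # not named "dict", which would shadow the built-in type
    for row in csv.reader(lines):
        if not row:  # csv.reader yields [] for blank lines, so skip them
            continue
        year, name, count, gender = row
        if year == input_year and gender == 'Male':
            counts[name] = int(count)
    return sorted(counts.items())


# Hypothetical sample data, including a blank line like the one behind the error.
sample = io.StringIO(
    "2010,Ruby,440,Female\n\n2010,Cooper,493,Male\n2010,Aidan,112,Male\n"
)
result = boy_names_for_year(sample, '2010')
for name, count in result:
    print(name, count)
```

With a real file, the same function can be given the open file object instead of the StringIO sample.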

Related

Looking for any matching terms from file

I have a file that contains a large list of countries, years, and life expectancies. I cannot figure out how to make sure the user is only allowed to input a year that actually exists. After figuring this out, I will need to retrieve only the rows for those years (with the corresponding country name, code, and life expectancy). How can I do this?
import pathlib
cwd = pathlib.Path(__file__).parent.resolve()
data_file = f'{cwd}/life-expectancy.csv'
with open(data_file) as f:
    while True:
        user_year = input('Enter the year of interest: ')
        for lines in f:
            cat = lines.strip().split(',')
            country = cat[0]
            code = cat[1]
            year = cat[2]
            age = cat[3]
            if any( [year in user_year for year in cat[2]] ):
                print(f'Your year is {user_year}. That is one of our known years.')
                print(year)
                print()
                continue
            else:
                print('Please enter a valid year (1751-2019)')
        print('test')
Solution 1
If all the dates from 1751 to 2019 are in your file, then you don't need to read your file to check that, you can simply do that:
# Ask the user for the year
prompt_text = "Enter the year of interest: "
user_year = int(input(prompt_text))
while not 1751 <= user_year <= 2019:
    print("Please enter a valid year (1751-2019)")
    user_year = int(input(prompt_text))
After that you can read your file and store the data only if the years are matching:
# Get the data for the asked year
# Example of final data: [("France", "FR", 45), ("Espagne", "ES", 29)]
data = []
with open(data_file, "r", encoding="utf-8") as file:
    for line in file:
        country, code, year, age = line.strip().split(",")
        if int(year) == user_year:
            data.append((country, code, int(age)))
Solution 2
If you really need to check the year in your file, e.g. because 1845 is not in it, then read the file once and store all the data in a dictionary indexed by the year and return the data of the asked year if it is present:
data = {}
with open(data_file, "r", encoding="utf-8") as file:
    for line in file:
        country, code, year, age = line.strip().split(",")
        year = int(year)
        if year in data:
            data[year].append((country, code, int(age)))
        else:
            data[year] = [(country, code, int(age))]
prompt_text = "Enter the year of interest: "
user_year = int(input(prompt_text))
while user_year not in data:
    print("The year is not present in the file")
    user_year = int(input(prompt_text))
print(data[user_year])
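As a variation on Solution 2, here is a hedged sketch that builds the same year-indexed dictionary with collections.defaultdict and the csv module. It also skips a header row or malformed lines instead of crashing. The function name group_by_year and the sample values are invented for illustration; the column order country, code, year, life expectancy is assumed from the question.

```python
import csv
import io
from collections import defaultdict


def group_by_year(lines):
    """Map each year to a list of (country, code, life_expectancy) tuples."""
    data = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        country, code, year, age = row
        try:
            # Convert before touching the dict so a bad row adds no empty key.
            year_num, age_num = int(year), float(age)
        except ValueError:
            continue  # skip the header row or non-numeric values
        data[year_num].append((country, code, age_num))
    return data


# Made-up sample rows; the real file contents are not shown in the question.
sample = io.StringIO(
    "Entity,Code,Year,Life expectancy\n"
    "France,FRA,1845,40.5\n"
    "Spain,ESP,1845,29.8\n"
    "France,FRA,1846,41.0\n"
)
data = group_by_year(sample)
print(sorted(data))   # the years that actually exist in the file
print(data[1845])
```

Checking `user_year in data` then answers the "does this year exist" question without rereading the file.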
One could use DataFrames to handle such cases. To learn more about them, take a look at pandas.DataFrame.
To select specific columns from a DataFrame, use df[[<col_1>, <col_2>]].
Applied to the data, this could look like the following.
import pandas as pd
df = pd.read_csv("Life Expectancy Data.csv")
year = int(input("Enter the year of interest: "))
df = df[["Country", "Year", "Life expectancy "]]
if year in df["Year"].values:
    print(f'Your year is {year}. That is one of our known years.')
    # display() is available in IPython/Jupyter; use print() in a plain script
    display(df.loc[df["Year"] == year])
else:
    print("Please enter a valid year (2000-2015)")
Your question includes two questions.
1. Question and answer
I cannot figure out how to make sure the user is only allowed to
input a year that actually exists.
Your range of accepted years is 1751-2019. You could create a list of these integers and check that the user input is within it. Note that range() excludes its stop value, so the stop has to be 2020 for 2019 to be included. E.g.
allowed_answers = list(range(1751, 2020))
There are multiple ways to check the user input, and the one you want depends on how the user interaction should work. Here are a few examples:
1. Stop the program immediately if the user input is invalid
# input() returns a string, so convert it before comparing against integers
user_year = int(input('Enter the year of interest: '))
allowed_answers = list(range(1751, 2020))
assert user_year in allowed_answers, "User input is invalid"
...
2. Ask the user for input until it is accepted
allowed_answers = list(range(1751, 2020))
user_year = 0
while int(user_year) not in allowed_answers:
    print('Please enter a valid year (1751-2019)')
    user_year = input('Enter the year of interest: ')
3. Combine the two solutions to limit the number of prompts
allowed_answers = list(range(1751, 2020))
user_year = 0
for i in range(0, 5):
    print('Please enter a valid year (1751-2019)')
    user_year = input('Enter the year of interest: ')
    if int(user_year) in allowed_answers:
        input_valid = True
        break
    else:
        input_valid = False
assert input_valid, "No correct input after five tries."
Note that all these solutions only handle inputs that can be converted to an integer. To work around that, you might need try... except clauses around the conversion from string to integer, or you could turn the items of allowed_answers into strings.
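The try... except idea can be sketched roughly as follows. The helper name ask_year and its read and max_tries parameters are hypothetical, introduced only so that the example can be run without a keyboard; in a real program read would simply stay as the built-in input.

```python
def ask_year(valid_years, read=input, max_tries=5):
    """Prompt until the user enters a year in valid_years, or give up."""
    for _ in range(max_tries):
        text = read('Enter the year of interest: ')
        try:
            year = int(text)
        except ValueError:
            # int() raises ValueError for input such as "abc" or "".
            print('That is not a whole number, please try again.')
            continue
        if year in valid_years:
            return year
        print('Please enter a valid year (1751-2019)')
    return None  # no valid input after max_tries attempts


# Simulate a user typing three answers instead of reading the keyboard.
answers = iter(['abc', '1700', '1900'])
chosen = ask_year(range(1751, 2020), read=lambda prompt: next(answers))
print(chosen)
```

Returning None instead of asserting leaves it to the caller to decide what happens when the user never enters a valid year.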
2. Question and answer
After figuring this out, I will need to retrieve only the rows for those years (with the corresponding country name, code, and life expectancy). How can I do this?
I would read the file only once and turn it into a dictionary. Then you only need to do the indexing once and can search from there for as long as your program is running. See https://docs.python.org/3/tutorial/datastructures.html#dictionaries .
With these suggestions, I would read the data and transform it into a dictionary outside (and before) your while loop.

How to sum specific values in a csv file in Python?

I am trying to search through a CSV file for certain criteria and print the sum of everything that matches.
Example data:
| city | state | college | cases |
|Huntsville | Alabama | Alabama A&M University | 42 |
etc., for hundreds of lines. I would like to be able to search the data for, say, the state of Alabama and sum all cases for that state.
This is what I have so far:
category = input("What would you like to look up? Please enter 'city', 'state', or 'college': ")
if category == "city":
    city = input("Enter a city: ")
    for row in reader:
        if row[0] == city:
            print("The city of", city, "has had a total of", row[3], "cases at", row[2])
    print("All cities with the name", city, "have a total of", sum(row[3]), "cases.")
The indices used (row[0], row[2], row[3]) correspond to the columns I need in the original CSV file. All of the code works except for the last line, where calling sum() on the row clearly does not work. While trying different options, I found it does not like that the value is a string (even though the cases are all numbers). Is there a better way to do this? Thank you.
sum(row[3]) will not give you a total: row[3] is a single string taken from one row, and calling sum() on a string of digits raises a TypeError. You need to convert each value to an integer and accumulate it as you go, as follows.
category = input("What would you like to look up? Please enter 'city', 'state', or 'college': ")
if category == "city":
    city = input("Enter a city: ")
    total = 0  # named total rather than sum, to avoid shadowing the built-in
    for row in reader:
        if row[0] == city:
            print("The city of", city, "has had a total of", row[3], "cases at", row[2])
            total += int(row[3])
    print("All cities with the name", city, "have a total of", total, "cases.")
You won't know the total for the city until you have read all the rows for that city, so the final print comes after the loop.
Each row you get from the csv reader is either a list (csv.reader) or a dictionary (csv.DictReader). I'll assume it's a list. The easy way is:
total = 0
for line in csvdata:
    if line[1] == 'Alabama':
        total += int(line[3])
This can be turned into a list comprehension:
total = sum([int(x[3]) for x in csvdata if x[1] == 'Alabama'])
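If the file has a header row, csv.DictReader lets you filter by column name instead of by index. The sketch below is only an illustration: the function name total_cases is made up, the file is assumed to be comma-separated with the header city,state,college,cases, and every sample row except the Huntsville one is invented.

```python
import csv
import io


def total_cases(lines, column, value):
    """Sum the 'cases' column over rows where row[column] equals value."""
    reader = csv.DictReader(lines)
    return sum(int(row['cases']) for row in reader if row[column] == value)


# Sample rows in the layout from the question (city, state, college, cases).
sample = (
    "city,state,college,cases\n"
    "Huntsville,Alabama,Alabama A&M University,42\n"
    "Auburn,Alabama,Auburn University,10\n"
    "Tempe,Arizona,Arizona State University,7\n"
)
alabama_total = total_cases(io.StringIO(sample), 'state', 'Alabama')
print("Alabama has a total of", alabama_total, "cases.")
```

Because the column is a parameter, the same function covers the 'city', 'state', and 'college' lookups from the question.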

Multiple lines of data, want to get an index or figure out how to get it in order

1. Reads and stores the data in this file.
2. Prompts the user for two integers corresponding to start and end years, then finds and lists the year of publication, title, and author, in that order, of all books published during that period.
3. Repeats the previous step until the user enters -1 when prompted for the start year.
This is what I have so far (see picture)
def main():
    file = open("resources.txt","r")
    myList = []
    year1 = int(input("Enter the first year:"))
    year2 = int(input("Enter the second year: "))
    for x in range(year1, year2):
        print(yearofpublication,title, author)
and the file is 1000 lines
I need help with #2 mainly.
Thank you
Here is a solution that doesn't use Pandas. I have added comments to break down the code according to the steps you requested. Step 1 reads the text file, strips all tabs and newline characters, and splits each line on the semicolon to create a list of lists.
Step 2 iterates through all the books and compares index 3 (year) of each book to the specified years. Step 3 creates an infinite loop and breaks it only when the user enters -1.
#step 1
data = open('resources.txt', 'r')
book_list = []
for line in data:
    new_line = line.rstrip('\n').replace('\t', '').split(';')
    book_list.append(new_line)
#step 3
while True:
    year1 = int(input("Enter the first year:"))
    if year1 == -1:
        break
    year2 = int(input("Enter the second year: "))
    #step2
    for book in book_list:
        if year1 <= int(book[3]) <= year2:
            print(f'Publication Year: {book[3]}, Title: {book[1]}, Author: {book[2]}')
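To get the matches in order of publication year, the filtering step can be separated into small functions and the result sorted. This is only a sketch: the names parse_books and books_between are made up, and the layout id;title;author;year is an assumption carried over from the indices used above, since the real contents of resources.txt are not shown.

```python
def parse_books(lines):
    """Split each 'id;title;author;year' line into a list of stripped fields."""
    books = []
    for line in lines:
        fields = [field.strip() for field in line.split(';')]
        # Keep only complete lines whose last field is a year.
        if len(fields) == 4 and fields[3].isdigit():
            books.append(fields)
    return books


def books_between(books, start, end):
    """Return (year, title, author) for books published from start to end inclusive."""
    found = [(int(b[3]), b[1], b[2]) for b in books if start <= int(b[3]) <= end]
    return sorted(found)  # tuples sort by year first


# Hypothetical lines standing in for the file contents.
sample = ["1 ; A ; X ;1220\n", "3 ; C ; Z ;1300\n", "\n", "2 ; B ; Y ;1245\n"]
books = parse_books(sample)
matches = books_between(books, 1230, 1300)
for year, title, author in matches:
    print(year, title, author)
```

The prompt loop from step 3 can then call books_between once per pair of years without rereading the file.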
Assuming you have a txt file like the one below that is ;-separated, with a consistent format and no headers:
1 ; A ; X ;1220
2 ; B ; Y ;1245
You can load the file using pandas, which lets you easily filter the data on conditions.
import pandas
df = pandas.read_csv("data.txt", sep=";", names=["id", "author", "title", "year"])
Then for your step 2, you can filter the dataframe based on year1 and year2. The filter returns a new dataframe, so assign it before printing:
filtered = df[(df['year'] >= year1) & (df['year'] <= year2)]
print(filtered)

Python: Find keywords in a text file from another text file

Take this invoice.txt for example
Invoice Number
INV-3337
Order Number
12345
Invoice Date
January 25, 2016
Due Date
January 31, 2016
And this is what dict.txt looks like:
Invoice Date
Invoice Number
Due Date
Order Number
I am trying to find keywords from 'dict.txt' in 'invoice.txt' and then add it and the text which comes after it (but before the next keyword) in a 2 column datatable.
So it would look like :
col1 ----- col2
Invoice number ------ INV-3337
order number ---- 12345
Here is what I have done till now
with open('C:\invoice.txt') as f:
    invoices = list(f)
with open('C:\dict.txt') as f:
    for line in f:
        dict = line.strip()
        for invoice in invoices:
            if dict in invoice:
                print invoice
This is working but the ordering is all wrong (it is as in dict.txt and not as in invoice.txt)
i.e.
The output is
Invoice Date
Invoice Number
Due Date
Order Number
instead of the order in the invoice.txt , which is
invoice number
order number
invoice date
due date
Can you help me with how I should proceed further ?
Thank You.
This should work. You can load your invoice data into a list, and your dict data into a set for easy lookup.
with open(r'C:\invoice.txt') as f:
    invoice_data = [line.strip() for line in f if line.strip()]
with open(r'C:\dict.txt') as f:
    dict_data = set([line.strip() for line in f if line.strip()])
Now iterate over the invoice lines two at a time and print the pairs whose first line is a keyword. This relies on the invoice strictly alternating between a keyword line and a single value line.
for i in range(0, len(invoice_data), 2):
    if invoice_data[i] in dict_data:
        print(invoice_data[i: i + 2])
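If a value can span more than one line, a slightly more general sketch collects everything between one keyword and the next. The function name pair_keywords is made up for illustration, and the data is inlined from the question rather than read from the two files.

```python
def pair_keywords(invoice_lines, keywords):
    """Return [keyword, text] rows in the order keywords appear in the invoice."""
    rows = []
    for line in invoice_lines:
        line = line.strip()
        if not line:
            continue
        if line in keywords:
            rows.append([line, ''])  # start a new row for this keyword
        elif rows:
            # Text before the next keyword belongs to the current keyword.
            rows[-1][1] = (rows[-1][1] + ' ' + line).strip()
    return rows


invoice = ["Invoice Number", "INV-3337", "Order Number", "12345",
           "Invoice Date", "January 25, 2016", "Due Date", "January 31, 2016"]
keywords = {"Invoice Date", "Invoice Number", "Due Date", "Order Number"}
table = pair_keywords(invoice, keywords)
for key, value in table:
    print(key, '------', value)
```

Because the loop walks the invoice rather than the keyword list, the rows come out in invoice order, which is the ordering the question asks for.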

Show the 5 cities with higher temperature from a text file

I have a text file with some cities and temperatures, like this:
City 1 16
City 2 4
...
City100 20
I'm showing the city with the highest temperature with the code below.
But I would like to show the 5 cities with the highest temperatures. Do you see a way to do this? I have been running some tests, but I always end up showing the same city 5 times.
#!/usr/bin/env python
import sys
current_city = None
current_max = 0
city = None
for line in sys.stdin:
    line = line.strip()
    city, temperature = line.rsplit('\t', 1)
    try:
        temperature = float(temperature)
    except ValueError:
        continue
    if temperature > current_max:
        current_max = temperature
        current_city = city
print '%s\t%s' % (current_city, current_max)
You can use heapq.nlargest:
import sys
import heapq
# Read cities temperatures pairs
pairs = [
    (c, float(t))
    for line in sys.stdin for c, t in [line.strip().rsplit('\t', 1)]
]
# Find 5 largest pairs based on second field which is temperature
for city, temperature in heapq.nlargest(5, pairs, key=lambda p: p[1]):
    print city, temperature
I like pandas. This is not a complete answer, but I like to encourage people on their way of research. Check this out...
listA = [1,2,3,4,5,6,7,8,9]
import pandas as pd
df = pd.DataFrame(listA)
# sort_values replaces the old DataFrame.sort, which newer pandas versions no longer provide
df = df.sort_values(0)
df.tail()
With Pandas, you'll want to learn about Series and DataFrames. DataFrames have a lot of functionality: you can name your columns, create them directly from input files, and sort by almost anything. There are the common unix words head and tail (beginning and end), and you can specify the count of rows returned....blah blah, blah blah, and so on. I liked the book, "Python for Data Analysis".
Store the list of temperatures and cities in a list. Sort the list. Then, take the last 5 elements: they will be your five highest temperatures.
Read the data into a list, sort the list, and show the first 5:
import sys
cities = []
for line in sys.stdin:
    line = line.strip()
    city, temp = line.rsplit('\t', 1)
    cities.append((city, int(temp)))
# The key function receives one (city, temp) tuple, so index into it.
cities.sort(key=lambda pair: -pair[1])
for city, temp in cities[:5]:
    print city, temp
This stores the city, temperature pairs in a list, which is then sorted. The key function in the sort tells the list to sort by temperature descending, so the first 5 elements of the list [:5] are the five highest temperature cities.
The following code does what you need:
fname = "haha.txt"
with open(fname) as f:
    content = f.readlines()
# Split on the last run of whitespace so that city names containing spaces stay intact.
content = [line.rsplit(None, 1) for line in content]
for line in content:
    line[1] = float(line[1])
from operator import itemgetter
content = sorted(content, key=itemgetter(1))
print content
To get the city with the highest temperature:
print content[-1]
To get the 5 cities with the highest temperatures (the slice [-6:-1] would skip the highest one):
print content[-5:]
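The answers above use Python 2 print statements. For reference, a Python 3 sketch of the heapq.nlargest approach could look like this. The function name top_cities and the sample readings are invented for illustration; the sketch assumes the temperature is the last whitespace-separated field on each line.

```python
import heapq


def top_cities(lines, n=5):
    """Return the n (city, temperature) pairs with the highest temperatures."""
    pairs = []
    for line in lines:
        parts = line.strip().rsplit(None, 1)  # split off the last field
        if len(parts) != 2:
            continue  # skip blank or one-word lines
        city, temp = parts
        try:
            pairs.append((city, float(temp)))
        except ValueError:
            continue  # skip lines whose last field is not a number
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])


# Made-up readings in the "City <tab> temperature" layout from the question.
sample = ["City 1\t16", "City 2\t4", "City 3\t21", "City 4\t-3",
          "City 5\t18", "City 6\t30", "City 7\tn/a"]
top = top_cities(sample)
for city, temperature in top:
    print(city, temperature)
```

Passing sys.stdin instead of the sample list gives the same behavior as the original script. Since no starting maximum is needed, this also works when every temperature is negative.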
