How to find specific items in a CSV file using inputs?

I'm still new to Python, so forgive me if my code seems messy or out of place. I need help with a university assignment: how can I find specific items in a CSV file? Here is what the assignment says:
Allow the user to type in a year, then, find the average life expectancy for that year. Then find the country with the minimum and the one with the maximum life expectancies for that year.
import csv

country = []
digit_code = []
year = []
life_expectancy = []
count = 0
with open("life-expectancy.csv") as lifefile:
    for line in lifefile:
        count += 1
        if count != 1:  # skip the header row
            line = line.strip()  # strip() returns a new string, so keep the result
            parts = line.split(",")
            country.append(parts[0])
            digit_code.append(parts[1])
            year.append(parts[2])
            life_expectancy.append(float(parts[3]))
highest_expectancy = max(life_expectancy)
country_with_highest = country[life_expectancy.index(highest_expectancy)]
print(f"The country that has the highest life expectancy is {country_with_highest} at {highest_expectancy}!")
lowest_expectancy = min(life_expectancy)
country_with_lowest = country[life_expectancy.index(lowest_expectancy)]
print(f"The country that has the lowest life expectancy is {country_with_lowest} at {lowest_expectancy}!")

It looks like you only want the first and fourth tokens from each row in your CSV. Therefore, let's simplify it like this:
Hong Kong,,,85.29
Japan,,,85.03
Macao,,,84.68
Switzerland,,,84.25
Singapore,,,84.07
You can then process it like this:
FILE = 'life-expectancy.csv'
data = []
with open(FILE) as csv_file:
    for line in csv_file:
        tokens = line.split(',')
        # Store the value first so that max() and min() compare by life expectancy
        data.append((float(tokens[3]), tokens[0]))
hi = max(data)
lo = min(data)
print(f'The country with the highest life expectancy {hi[0]:.2f} is {hi[1]}')
print(f'The country with the lowest life expectancy {lo[0]:.2f} is {lo[1]}')
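The snippet above ignores the year, which the assignment asks for. Here is a minimal sketch of the year filter, assuming each row has the form country,code,year,life_expectancy as in the question's parsing code. The function name stats_for_year, the header text, and the Japan rows are made up for illustration:

```python
import csv
import io


def stats_for_year(lines, year):
    """Return (average, lowest, highest) for one year, or None if no rows match.

    lowest and highest are (life_expectancy, country) tuples.
    Assumes rows look like: country,code,year,life_expectancy
    """
    matches = []
    for row in csv.reader(lines):
        # Skip blank rows and the header (its year column is not a number).
        if len(row) < 4 or not row[2].strip().isdigit():
            continue
        if int(row[2]) == year:
            matches.append((float(row[3]), row[0]))
    if not matches:
        return None
    average = sum(value for value, _ in matches) / len(matches)
    return average, min(matches), max(matches)


# Hypothetical sample data standing in for life-expectancy.csv.
sample = io.StringIO(
    "Entity,Code,Year,Life expectancy\n"
    "Afghanistan,AFG,1981,43.923\n"
    "Japan,JPN,1981,76.5\n"
    "Japan,JPN,1982,77.0\n"
)
result = stats_for_year(sample, 1981)
print(result)
```

With the real file you would pass the open file object instead of the sample, and take the year from int(input("Enter a year: ")).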

Related

Trouble printing out the max key/value pair in a dictionary

I'm working on trying to calculate the greatest increase/decrease in a change to profits/losses over time from a CSV.
The data set in csv is as follows (extract only):
Date,Profit/Losses
Jan-2010,867884
Feb-2010,984655
Mar-2010,322013
Apr-2010,-69417
So far, I've imported the CSV file and added the items to a dictionary. I've calculated the total months, the total profit/loss, and the change in profit/loss from month to month, but now I need to find the greatest and smallest monthly change and have the code return both the month and the change figure.
When I try to print the greatest increase/decrease, the output contains only the final month in the list and all of the change values (instead of just the biggest change value and its corresponding month).
Here is the code. I would appreciate any perspective:
budget = {}
total_months = 0
total_pnl = 0
date = 0
pnl = 0
monthly_change = []
previous_pnl = 0
greatest_increase = ["Date",[0]]
greatest_decrease = ["Date",[100000000000000]]
with open(csvpath, 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    header = next(csvreader)
    for row in csvreader:
        date = 0
        pnl = 1
        budget[row[date]] = int(row[pnl])
for date, pnl in budget.items():
    total_months = total_months + 1
    total_pnl = total_pnl + pnl
    pnlchange = pnl - previous_pnl
    if total_months > 1:
        monthly_change.append(pnlchange)
    previous_pnl = pnl
    if (monthly_change > greatest_increase[1]):
        greatest_increase[1] = monthly_change
        greatest_increase[0] = row[0]
    if (monthly_change < greatest_decrease[1]):
        greatest_decrease[1] = monthly_change
        greatest_decrease[0] = row[0]
print(greatest_increase)
The main problem is the final part of the code (the if statements). Printing greatest_increase currently returns the final month in the list and every change value, rather than the single highest change.
Current output:
[['Feb-2017', '671099'], [116771, -662642, -391430, 379920, 212354, 510239, -428211, -821271, 693918, 416278, -974163, 860159, -1115009, 1033048, 95318, -308093, 99052, -521393, 605450, 231727, -65187, -702716, 177975, -1065544, 1926159, -917805, 898730, -334262, -246499, -64055, -1529236, 1497596, 304914, -635801, 398319, -183161, -37864, -253689, 403655, 94168, 306877, -83000, 210462, -2196167, 1465222, -956983, 1838447, -468003, -64602, 206242, -242155, -449079, 315198, 241099, 111540, 365942, -219310, -368665, 409837, 151210, -110244, -341938, -1212159, 683246, -70825, 335594, 417334, -272194, -236462, 657432, -211262, -128237, -1750387, 925441, 932089, -311434, 267252, -1876758, 1733696, 198551, -665765, 693229, -734926, 77242, 532869]]
What I am trying to get is just the highest change value, along with the relevant month.
Apologies if this isn't clear; I'm still fairly new (third week learning!).
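
One possible approach, offered as a sketch rather than a drop-in fix: compare a single number per month (not the whole monthly_change list) and record the month alongside it. The function name greatest_changes is hypothetical, and the sample contains only the four rows quoted in the question:

```python
import csv
import io


def greatest_changes(lines):
    """Return two (month, change) tuples: the largest increase and the largest decrease."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    previous = None
    increase = None  # (month, change) with the largest change seen so far
    decrease = None  # (month, change) with the smallest change seen so far
    for month, value in reader:
        pnl = int(value)
        if previous is not None:
            change = pnl - previous
            if increase is None or change > increase[1]:
                increase = (month, change)
            if decrease is None or change < decrease[1]:
                decrease = (month, change)
        previous = pnl
    return increase, decrease


# The four rows quoted in the question.
sample = io.StringIO(
    "Date,Profit/Losses\n"
    "Jan-2010,867884\n"
    "Feb-2010,984655\n"
    "Mar-2010,322013\n"
    "Apr-2010,-69417\n"
)
increase, decrease = greatest_changes(sample)
print(increase)
print(decrease)
```

Starting from None instead of 0 and 100000000000000 avoids having to guess sentinel values that every real change is guaranteed to beat.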

Finding highest and lowest numbers in list

Here is my code:
with open('life-expectancy.csv') as file:
    for row in file:
        row = row.strip()  # trim
        parts = row.split(',')
        value = float(parts[3])
        max_value = float(-1.0)
        if value > max_value:
            max_value = value
        min_value = float(100.0)
        if value < min_value:
            min_value = value
        # print(sum(value))
print(max_value)
print(min_value)
The life expectancy file contains rows that are all like this:
Afghanistan,AFG,1981,43.923
With different countries, years, etcetera. My goal is to find the highest and lowest life expectancy with the corresponding country and year, but my code is just giving me the life expectancy of the last item in the list (I haven't attempted to add the country and year yet obviously).
What am I missing?

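Move max_value = float(-1.0) and min_value = float(100.0) above the with statement. As written, both are reset on every line, so each value is only ever compared against the initial -1.0 and 100.0, and you end up with the last row.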
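
To also report the country and year, one option is to keep the whole record whenever a new extreme is found. Here is a minimal sketch, assuming every row has the form Afghanistan,AFG,1981,43.923 with no header. The function name find_extremes and the second sample row are hypothetical:

```python
def find_extremes(lines):
    """Return (lowest, highest), each as a (country, year, value) tuple."""
    lowest = None
    highest = None
    for line in lines:
        parts = line.strip().split(',')
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        record = (parts[0], int(parts[2]), float(parts[3]))
        if lowest is None or record[2] < lowest[2]:
            lowest = record
        if highest is None or record[2] > highest[2]:
            highest = record
    return lowest, highest


rows = [
    "Afghanistan,AFG,1981,43.923",
    "Japan,JPN,2019,84.5",  # made-up row for illustration
]
lowest, highest = find_extremes(rows)
print(lowest)
print(highest)
```

Initialising both trackers to None, outside the loop, also removes the need for the -1.0 and 100.0 starting values.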

Looping Thru a Nested Dictionary in Python

I need help looping through nested dictionaries that I have created in order to answer some questions. The code that builds the two dictionaries and adds items to them is as follows:
Link to csv :
https://docs.google.com/document/d/1v68_QQX7Tn96l-b0LMO9YZ4ZAn_KWDMUJboa6LEyPr8/edit?usp=sharing
import csv
region_data = {}
country_data = {}
answers = []
data = []
cuntry = False
f = open('dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv')
reader = csv.DictReader(f)
for line in reader:
    #This gets all the values into a standard dict
    data.append(dict(line))
#This will loop thru the dict and create variables to hold specific items
for i in data:
    # collects all of the Region/Country/Area
    location = i['Region/Country/Area']
    # Gets All the Years
    years = i['Year']
    i_d = i['ID']
    info = i['Footnotes']
    series = i['Series']
    value = float(i['Value'])
    # print(series)
    stats = {i['Series']:i['Value']}
    # print(stats)
    # print(value)
    if (i['ID']== '4'):
        cuntry = True
    if cuntry == True:
        if location not in country_data:
            country_data[location] = {}
        if years not in country_data[location]:
            country_data[location][years] = {}
        if series not in country_data[location][years]:
            country_data[location][years][series] = value
    else:
        if location not in region_data:
            region_data[location] = {}
        if years not in region_data[location]:
            region_data[location][years] = {}
        if series not in region_data[location][years]:
            region_data[location][years][series] = value
When I print the dictionary region_data, each region is a key in the dict, the years are keys in that region's dict, and so on.
I want to understand how I can loop through the data and answer a question like:
Which region had the largest numeric decrease in maternal mortality ratio from 2005 to 2015?
Here "Maternal mortality ratio (deaths per 100,000 population)" is a key within the dictionary.

Build a dataframe
Use pandas to read your file.
import pandas as pd
filename = 'dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv'
df = pd.read_csv(filename)
Build a pivot table
Then you can build a pivot table on "Region/Country/Area" and "Series", using "max" as the aggregate function.
pivot = df.pivot_table(index='Region/Country/Area', columns='Series', values='Value', aggfunc='max')
Sort by your series of interest
Then sort the pivot table by the series name, using the "ascending" argument.
df_sort = pivot.sort_values(by='Maternal mortality ratio (deaths per 100,000 population)', ascending=False)
Extract the greatest value in the first row.
Finally you will have the answer to your question.
df_sort['Maternal mortality ratio (deaths per 100,000 population)'].head(1)
Region/Country/Area
Sierra Leone 1986.0
Name: Maternal mortality ratio (deaths per 100,000 population), dtype: float64
Warning: Some of your regions have records before 2005, so you should filter your data only for values between 2005 and 2015.

If you prefer to loop through the dictionaries in Python 3.x, you can call .items() on each dictionary and nest three loops.
With the main dictionary called dict_total here, this code should work:
out_region = None
out_value = None
sel_serie = 'Maternal mortality ratio (deaths per 100,000 population)'
min_year = 2005
max_year = 2015
for reg, dict_reg in dict_total.items():
    print(reg)
    for year, dict_year in dict_reg.items():
        # Years read from the CSV are strings, so convert before comparing
        if min_year <= int(year) <= max_year:
            print(year)
            for serie, value in dict_year.items():
                if serie == sel_serie and value is not None:
                    print('{} {}'.format(serie, value))
                    if out_value is None or out_value < value:
                        out_value = value
                        out_region = reg
print('Region: {}\nSerie: {} Value: {}'.format(out_region, sel_serie, out_value))
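
Note that the loop above finds the largest value, whereas the question asks for the largest decrease between two years. Here is a sketch of that calculation, assuming the region_data layout from the question (region, then year as a string, then series name mapping to a float). The function name largest_decrease and all the sample numbers are invented for illustration:

```python
def largest_decrease(data, series, start_year, end_year):
    """Return (region, decrease) for the biggest drop in series between the two years."""
    best_region = None
    best_drop = None
    for region, by_year in data.items():
        start = by_year.get(start_year, {}).get(series)
        end = by_year.get(end_year, {}).get(series)
        if start is None or end is None:
            continue  # region lacks a value for one of the two years
        drop = start - end
        if best_drop is None or drop > best_drop:
            best_region, best_drop = region, drop
    return best_region, best_drop


SERIES = 'Maternal mortality ratio (deaths per 100,000 population)'
# Invented numbers, shaped like region_data in the question.
sample = {
    'Region A': {'2005': {SERIES: 300.0}, '2015': {SERIES: 250.0}},
    'Region B': {'2005': {SERIES: 500.0}, '2015': {SERIES: 380.0}},
    'Region C': {'2005': {SERIES: 90.0}},
}
result = largest_decrease(sample, SERIES, '2005', '2015')
print(result)
```

Using .get() with an empty dict as the default means regions that are missing a year or the series are skipped instead of raising KeyError.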

How to count occurrences and calculate a rating with the csv module?

You have a CSV file of individual song ratings and you'd like to know the average rating for a particular song. The file will contain a single 1-5 rating for a song per line.
Write a function named average_rating that takes two strings as parameters where the first string represents the name of a CSV file containing song ratings in the format: "YouTubeID, artist, title, rating" and the second parameter is the YouTubeID of a song. The YouTubeID, artist, and title are all strings while the rating is an integer in the range 1-5. This function should return the average rating for the song with the inputted YouTubeID.
Note that each line of the CSV file is an individual rating from a user and that each song may be rated multiple times. As you read through the file you'll need to track a sum of all the ratings as well as how many times the song has been rated to compute the average rating. (My code below)
import csv
def average_rating(csvfile, ID):
    with open(csvfile) as f:
        file = csv.reader(f)
        total = 0
        total1 = 0
        total2 = 0
        for rows in file:
            for items in ID:
                if rows[0] == items[0]:
                    total = total + int(rows[3])
                    for ratings in total:
                        total1 = total1 + int(ratings)
                        total2 = total2 + 1
        return total1 / total2
I am getting an error on input ['ratings.csv', 'RH5Ta6iHhCQ']: division by zero. How would I go about resolving the problem?

You can do this by using pandas DataFrame.
import pandas as pd
df = pd.read_csv('filename.csv')
total_sum = df[df['YouTubeID'] == 'RH5Ta6iHhCQ'].rating.sum()
n_rating = len(df[df['YouTubeID'] == 'RH5Ta6iHhCQ'].rating)
average = total_sum/n_rating

There are a few confusing things here, and I think renaming variables and refactoring would be a smart decision. It might even make things more obvious if one function was tasked with getting all the rows for a specific YouTube ID and another with calculating the average.
import csv

def average_rating(csvfile, id):
    '''
    Calculate the average rating of a YouTube video
    params: - csvfile: the location of the source rating file
            - id: the id of the video we want the average rating of
    '''
    total_ratings = 0
    count = 0
    with open(csvfile) as f:
        file = csv.reader(f)
        for rating in file:
            if rating[0] == id:
                count += 1
                total_ratings += int(rating[3])  # ratings are read as strings
    if count == 0:
        return 0
    return total_ratings / count

import csv
def average_rating(csvfile, ID):
    with open(csvfile) as f:
        file = csv.reader(f)
        cont = 0
        total = 0
        for rows in file:
            if rows[0] == ID:
                cont = cont + 1
                total = total + int(rows[3])
        return total/cont
This works.

Show the 5 cities with higher temperature from a text file

I have a text file with some cities and temperatures, like this:
City 1 16
City 2 4
...
City100 20
I'm showing the city with the highest temperature using the code below.
But I would like to show the 5 cities with the highest temperatures. Do you see a way to do this? I've been running some tests, but I always end up showing the same city 5 times.
#!/usr/bin/env python
import sys
current_city = None
current_max = 0
city = None
for line in sys.stdin:
    line = line.strip()
    city, temperature = line.rsplit('\t', 1)
    try:
        temperature = float(temperature)
    except ValueError:
        continue
    if temperature > current_max:
        current_max = temperature
        current_city = city
print('%s\t%s' % (current_city, current_max))

You can use heapq.nlargest:
import sys
import heapq
# Read (city, temperature) pairs
pairs = [
    (c, float(t))
    for line in sys.stdin for c, t in [line.strip().rsplit('\t', 1)]
]
# Find the 5 largest pairs based on the second field, which is the temperature
for city, temperature in heapq.nlargest(5, pairs, key=lambda p: p[1]):
    print(city, temperature)

I like pandas. This is not a complete answer, but it may point you in a useful direction:
import pandas as pd
listA = [1,2,3,4,5,6,7,8,9]
df = pd.DataFrame(listA)
df = df.sort_values(0)  # sort by the first (and only) column
df.tail()  # the last 5 rows, i.e. the 5 largest values
With pandas, you'll want to learn about Series and DataFrames. DataFrames have a lot of functionality: you can name your columns, create them directly from input files, and sort by almost anything. The head and tail methods, named after the Unix commands, return the beginning and end of the data, and you can specify how many rows to return. I liked the book "Python for Data Analysis".

Store the list of temperatures and cities in a list. Sort the list. Then, take the last 5 elements: they will be your five highest temperatures.

Read the data into a list, sort the list, and show the first 5:
import sys

cities = []
for line in sys.stdin:
    line = line.strip()
    city, temp = line.rsplit('\t', 1)
    cities.append((city, float(temp)))
# Each element is a (city, temp) pair; negate the temperature to sort descending
cities.sort(key=lambda pair: -pair[1])
for city, temp in cities[:5]:
    print(city, temp)
This stores the (city, temperature) pairs in a list, which is then sorted. The key function tells sort to order by temperature descending, so the first 5 elements of the list, cities[:5], are the five highest-temperature cities.

The following code should do what you need, assuming the file is tab-separated as in the question:
from operator import itemgetter

fname = "haha.txt"
with open(fname) as f:
    content = f.readlines()
# Split each line into [city, temperature] at the last tab
content = [line.rstrip('\n').rsplit('\t', 1) for line in content]
for line in content:
    line[1] = float(line[1])
content = sorted(content, key=itemgetter(1))
print(content)
To get the city with the highest temperature:
print(content[-1])
To get the 5 cities with the highest temperatures:
print(content[-5:])
