How to sum specific values in a csv file in Python?

I am trying to search a CSV file for rows that match certain criteria and print the sum of a value across all matching rows.
Example data:
| city | state | college | cases |
|Huntsville | Alabama | Alabama A&M University | 42 |
and so on, for hundreds of lines. I would like to be able to search the data by, for example, the state of Alabama, and sum the cases for all rows with that state.
This is what I have so far:
category = input("What would you like to look up? Please enter 'city', 'state', or 'college': ")
if category == "city":
    city = input("Enter a city: ")
    for row in reader:
        if row[0] == city:
            print("The city of", city, "has had a total of", row[3], "cases at", row[2])
    print("All cities with the name", city, "have a total of", sum(row[3]), "cases.")
The column indexes used correspond to the columns I need in the original CSV file. All of the code works except for the last line, where the sum call clearly does not work. While trying different options, I found that it does not like that the value is a string (even though the cases are all numbers). Is there a better way to do this? Thank you.

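sum(row[3]) cannot produce the total: row[3] is a single string from one row, and sum() expects an iterable of numbers, so calling it on a string such as '42' raises a TypeError. Instead, convert each matching value to an int and add it to a running total, then print the total after the loop: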
category = input("What would you like to look up? Please enter 'city', 'state', or 'college': ")
if category == "city":
    city = input("Enter a city: ")
    total = 0  # running total; this name avoids shadowing the built-in sum()
    for row in reader:
        if row[0] == city:
            print("The city of", city, "has had a total of", row[3], "cases at", row[2])
            total += int(row[3])  # CSV values are strings, so convert first
    print("All cities with the name", city, "have a total of", total, "cases.")
You won't know the total for the city until you have read all the rows for city.

csv.reader gives you each row as a list, and csv.DictReader gives you each row as a dictionary. I'll assume it's a list. The easy way is:
total = 0
for line in csvdata:
    if line[1] == 'Alabama':
        total += int(line[3])
This can also be written as a list comprehension:
total = sum([int(x[3]) for x in csvdata if x[1] == 'Alabama'])
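For a version that runs without the original file, here is a minimal sketch. It assumes the file has a header row with the column names city, state, college and cases, as in the question's sample. Only the Huntsville row comes from the question; the other rows are made up for illustration, and sum_cases is a hypothetical helper name.

```python
import csv
import io

# Sample data in the question's column layout.
# Only the Huntsville row is from the question; the rest is made up.
SAMPLE = """city,state,college,cases
Huntsville,Alabama,Alabama A&M University,42
Exampleville,Alabama,Example College A,10
Sampletown,Arizona,Example College B,7
"""


def sum_cases(csv_file, column, value):
    """Return the total of the 'cases' column for rows where row[column] == value."""
    reader = csv.DictReader(csv_file)
    # Each value is read as a string, so convert to int before summing.
    return sum(int(row["cases"]) for row in reader if row[column] == value)


total = sum_cases(io.StringIO(SAMPLE), "state", "Alabama")
print("Alabama total:", total)
```

With a real file, open(path, newline='') could be passed in place of the io.StringIO object.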

Related

I want to sum the salary of each department. How can I do it?

list_name=[]
list_dep=[]
list_salary=[]
name1='abdo'
list_name.append(name1)
dep1='front_end'
list_dep.append(dep1)
salary1=3000
list_salary.append(salary1)
name2='salm'
list_name.append(name2)
dep2='front_end'
list_dep.append(dep2)
salary2=5000
list_salary.append(salary2)
sum1=0
for department1 in list_dep:
    if 'front_end' == department1:
        sum1+= list_salary[department1.index('front_end')]
print("Front_end: "+str(sum1))
This program should print 8000 for front_end, but it prints 6000. How can I fix this so that it sums the salaries of each department?
I hope my question is clear.

Use a dictionary to hold the running totals for each department. You can use defaultdict to automatically initialize dictionary entries for each department.
from collections import defaultdict
total_salaries = defaultdict(int)
for department, salary in zip(list_dep, list_salary):
    total_salaries[department] += salary
print(dict(total_salaries))

The current implementation is hard to maintain, and it stores front_end twice in list_dep, once for dep1 and once for dep2. You got 6000 because department1 is the string 'front_end', so department1.index('front_end') returns 0 on both iterations, and list_salary[0] (3000) is added twice.
I recommend changing the structure to a list of dictionaries, like so:
list_employees = []
name1='abdo'
dep1='front_end'
salary1=3000
list_employees.append({
    "name": name1,
    "dep": dep1,
    "salary": salary1
})
name2='salm'
dep2='front_end'
salary2=5000
list_employees.append({
    "name": name2,
    "dep": dep2,
    "salary": salary2
})
Now, if you want the total salary for a specific department, you can use the following:
sum1 = 0
for employee in list_employees:
    if 'front_end' == employee["dep"]:
        sum1 += employee["salary"]
print("Front_end: "+str(sum1))
Or, if you want the total salaries for all departments, you can do the following:
list_salaries = {}
for employee in list_employees:
    if employee["dep"] in list_salaries:
        list_salaries[employee["dep"]] += employee["salary"]
    else:
        list_salaries[employee["dep"]] = employee["salary"]
print(list_salaries)

How to find specific items in a CSV file using inputs?

I'm still new to Python, so forgive me if my code seems rather messy or out of place. I need help with an assignment for university: how can I find specific items in a CSV file? Here is what the assignment says:
Allow the user to type in a year, then, find the average life expectancy for that year. Then find the country with the minimum and the one with the maximum life expectancies for that year.
import csv
country = []
digit_code = []
year = []
life_expectancy = []
count = 0
with open("life-expectancy.csv") as lifefile:
    for line in lifefile:
        count += 1
        if count != 1:  # skip the header row
            line = line.strip()
            parts = line.split(",")
            country.append(parts[0])
            digit_code.append(parts[1])
            year.append(parts[2])
            life_expectancy.append(float(parts[3]))
highest_expectancy = max(life_expectancy)
country_with_highest = country[life_expectancy.index(max(life_expectancy))]
print(f"The country that has the highest life expectancy is {country_with_highest} at {highest_expectancy}!")
lowest_expectancy = min(life_expectancy)
country_with_lowest = country[life_expectancy.index(min(life_expectancy))]
print(f"The country that has the lowest life expectancy is {country_with_lowest} at {lowest_expectancy}!")

It looks like you only want the first and fourth tokens from each row in your CSV. Therefore, let's simplify it like this:
Hong Kong,,,85.29
Japan,,,85.03
Macao,,,84.68
Switzerland,,,84.25
Singapore,,,84.07
You can then process it like this:
FILE = 'life-expectancy.csv'
data = []
with open(FILE) as f:  # not named csv, which would shadow the csv module
    for line in f:
        tokens = line.split(',')
        # store (value, country) so max() and min() compare by value first
        data.append((float(tokens[3]), tokens[0]))
hi = max(data)
lo = min(data)
print(f'The country with the highest life expectancy {hi[0]:.2f} is {hi[1]}')
print(f'The country with the lowest life expectancy {lo[0]:.2f} is {lo[1]}')
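The assignment also asks for a specific year and an average, which the code above does not cover. Here is a rough sketch of one way to add that, assuming the layout the question's code implies (country, code, year, life expectancy, with one header row). The header names, the sample rows, and the stats_for_year helper are all made up for illustration.

```python
import csv
import io

# Made-up rows in the assumed layout: country, code, year, life expectancy.
SAMPLE = """Entity,Code,Year,Life expectancy
Country A,AAA,2000,70.0
Country B,BBB,2000,80.0
Country A,AAA,2001,71.5
Country B,BBB,2001,80.5
"""


def stats_for_year(csv_file, year):
    """Return (average, (min_value, country), (max_value, country)) for one year."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    # Keep (value, country) pairs only for the requested year.
    data = [(float(row[3]), row[0]) for row in reader if row and int(row[2]) == year]
    if not data:
        raise ValueError(f"no rows for year {year}")
    average = sum(value for value, _ in data) / len(data)
    return average, min(data), max(data)


average, lowest, highest = stats_for_year(io.StringIO(SAMPLE), 2000)
print(f"Average life expectancy in 2000: {average:.2f}")
print(f"Lowest: {lowest[1]} ({lowest[0]:.2f})")
print(f"Highest: {highest[1]} ({highest[0]:.2f})")
```

In the real program, the year would come from int(input(...)) and the file object from open("life-expectancy.csv", newline='').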

How to make dict from list components - python

I have a list of dates:
dates = ['2018-11-13 ', '2018-11-14 ']
and I have a list of weather data for various cities:
weather_data = [('Carbondale', 1875.341, '2018-11-13 '), ('Carbondale', 1286.16, '2018-11-14 '), ('Davenport', 708.5, '2018-11-13 '), ('Davenport', 506.1, '2018-11-14 ')]
i[1] in weather_data is a climate score, based on climatic info for each day. I have shortened the above lists for the sake of this example.
My goal is to find the city with the lowest climate score for each day. I thought a good way to do that would be to put them in a dictionary.
An example of what I want is...
conditions_dict = {'2018-11-13': [('Carbondale', 1875.341), ('Davenport', 708.5)]}
and my end output would be...
The best weather on 2018-11-13 is in Davenport with a value of 708.5
Basically, if I had a dict with a date as the key, and (city,value) as the value, I could then easily find the lowest value by city for each day.
However, I cannot figure how to make my dictionary look like this. The part I am really struggling with is how to match the date to multiple readings for various cities on one day.
Is using a dictionary even a good way to do this?

If your goal is only to find the minimum score and its city for each date, you don't need an intermediate dict holding every city and score. You can iterate through weather_data once and keep, for each date, the lowest score seen so far and its city:
min_score_of_date = {}
for city, score, date in weather_data:
    if date not in min_score_of_date or score < min_score_of_date[date][1]:
        min_score_of_date[date] = (city, score)
Given your sample input, min_score_of_date would become:
{'2018-11-13 ': ('Davenport', 708.5), '2018-11-14 ': ('Davenport', 506.1)}
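If you do want the grouped dictionary described in the question, one possible sketch uses collections.defaultdict to collect the readings per date and then min() with a key function. The names conditions and best_by_date are chosen here for illustration, and stripping the trailing space from the dates is an assumption about what is wanted.

```python
from collections import defaultdict

weather_data = [
    ('Carbondale', 1875.341, '2018-11-13 '),
    ('Carbondale', 1286.16, '2018-11-14 '),
    ('Davenport', 708.5, '2018-11-13 '),
    ('Davenport', 506.1, '2018-11-14 '),
]

# Group (city, score) pairs by date; strip() drops the trailing space in the dates.
conditions = defaultdict(list)
for city, score, date in weather_data:
    conditions[date.strip()].append((city, score))

# For each date, pick the reading with the lowest score.
best_by_date = {
    date: min(readings, key=lambda reading: reading[1])
    for date, readings in conditions.items()
}

for date, (city, score) in best_by_date.items():
    print(f"The best weather on {date} is in {city} with a value of {score}")
```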

Here is another way to go about it, if the readings are already grouped by date and the lowest one has not been picked out yet.
# each date maps to a tuple of (city, score) pairs
conditions = {
    '2018-11-13': (
        ('Carbondale', 1875.341),
        ('Davenport', 708.5)
    )
}
# loop through every date
for date, cities in conditions.items():
    # collect all scores for this date
    scores = [city[1] for city in cities]
    # find the minimum score
    min_score = min(scores)
    # loop through all cities
    for city in cities:
        # report every city whose score equals the minimum
        if city[1] == min_score:
            # city name
            name = city[0]
            # city score as text
            score = str(city[1])
            print(
                "The best weather on "
                + date
                + " is in "
                + name + " with a value of "
                + score
            )

Show the 5 cities with the highest temperature from a text file

I have a text file with some cities and temperatures, like this:
City 1 16
City 2 4
...
City100 20
I'm showing the city with the highest temperature with the code below.
But I would like to show the 5 cities with the highest temperatures. Do you see a way to do this? I have been running some tests, but I always end up showing the same city 5 times.
#!/usr/bin/env python
import sys
current_city = None
current_max = 0
city = None
for line in sys.stdin:
    line = line.strip()
    city, temperature = line.rsplit('\t', 1)
    try:
        temperature = float(temperature)
    except ValueError:
        continue
    if temperature > current_max:
        current_max = temperature
        current_city = city
print('%s\t%s' % (current_city, current_max))

You can use heapq.nlargest:
import sys
import heapq
# Read (city, temperature) pairs
pairs = [
    (c, float(t))
    for line in sys.stdin for c, t in [line.strip().rsplit('\t', 1)]
]
# Find the 5 largest pairs based on the second field, the temperature
for city, temperature in heapq.nlargest(5, pairs, key=lambda p: p[1]):
    print(city, temperature)

I like pandas. This is not a complete answer, but it may point you in a useful direction:
import pandas as pd
listA = [1,2,3,4,5,6,7,8,9]
df = pd.DataFrame(listA)
df = df.sort_values(0)  # sort by the first (unnamed) column; older pandas used df.sort(0)
print(df.tail())  # the last 5 rows, which are the 5 largest values
With pandas, you'll want to learn about Series and DataFrames. DataFrames have a lot of functionality: you can name your columns, create them directly from input files, and sort by almost anything. They also offer head and tail, named after the common Unix commands (beginning and end), and you can specify how many rows are returned. I liked the book "Python for Data Analysis".

Store the list of temperatures and cities in a list. Sort the list. Then, take the last 5 elements: they will be your five highest temperatures.
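As a small sketch of that idea, with made-up (city, temperature) pairs standing in for the parsed file:

```python
# Made-up (city, temperature) pairs standing in for the parsed file.
readings = [
    ("City 1", 16.0), ("City 2", 4.0), ("City 3", 21.5), ("City 4", 9.0),
    ("City 5", 30.0), ("City 6", 18.0), ("City 7", 25.0),
]

# Sort ascending by temperature, then take the last five entries.
by_temperature = sorted(readings, key=lambda pair: pair[1])
top_five = by_temperature[-5:]

# Reverse so the hottest city is printed first.
for city, temperature in reversed(top_five):
    print(city, temperature)
```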

Read the data into a list, sort the list, and show the first 5:
import sys
cities = []
for line in sys.stdin:
    line = line.strip()
    city, temp = line.rsplit('\t', 1)
    cities.append((city, float(temp)))
# sort by temperature, highest first
cities.sort(key=lambda pair: -pair[1])
for city, temp in cities[:5]:
    print(city, temp)
This stores the city, temperature pairs in a list, which is then sorted. The key function in the sort tells the list to sort by temperature descending, so the first 5 elements of the list [:5] are the five highest temperature cities.

The following code does what you need:
from operator import itemgetter
fname = "haha.txt"
with open(fname) as f:
    content = f.readlines()
# split each non-empty line into [city, temperature] at the last run of whitespace
content = [line.rsplit(None, 1) for line in content if line.strip()]
for line in content:
    line[1] = float(line[1])
content = sorted(content, key=itemgetter(1))
print(content)
To get the city with the highest temperature:
print(content[-1])
To get the 5 cities with the highest temperatures:
print(content[-5:])

IndexError: list index out of range - python

I have the following error:
currency = row[0]
IndexError: list index out of range
Here is the code:
crntAmnt = int(input("Please enter the amount of money to convert: "))
print(currencies)
exRtFile = open ('exchangeRate.csv','r')
exchReader = csv.reader(exRtFile)
crntCurrency = input("Please enter the current currency: ")
validateloop=0
while validateloop == 0:
    for row in exchReader:
        currency = row[0]
        if currency == crntCurrency:
            crntRt = row[1]
            validateloop=+1
Here's the CSV file:
Japanese Yen,169.948
US Dollar,1.67
Pound Sterling,1
Euro,5.5
Here's an input/Output example:
Please enter the amount of money to convert: 100
['Pound Sterling', 'Euro', 'US Dollar', 'Japanese Yen']
Please enter the current currency: Pound Sterling

You probably have a blank row in your CSV file, which csv.reader returns as an empty list, so row[0] fails.
There are a couple of solutions:
1. Check whether the row has any elements and only proceed if it does:
for row in exchReader:
    if len(row):  # can also just do: if row:
        currency = row[0]
        if currency == crntCurrency:
            crntRt = row[1]
2. Short-circuit the and operator so that currency becomes an empty list for a blank row, which won't match crntCurrency:
for row in exchReader:
    currency = row and row[0]
    if currency == crntCurrency:
        crntRt = row[1]

Try printing out each row to see what the reader returns. The convention for variable names in Python is like_this, not likeThis. You might also find the break keyword useful:
for row in exch_reader:
    currency = row[0]
    if currency == crnt_currency:
        crnt_rt = row[1]
        break
To only index the row when it actually contains something:
currency = row and row[0]
Here row[0] is only evaluated if row is truthy, which is the case when it has at least one element.
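Putting these suggestions together, here is a minimal sketch that can be run without the file. It uses the rates from the question with a blank line added, which is an assumption about what the real file contains, and find_rate is a hypothetical helper name.

```python
import csv
import io

# The question's rates, with a blank line added to reproduce the suspected problem.
SAMPLE = """Japanese Yen,169.948
US Dollar,1.67

Pound Sterling,1
Euro,5.5
"""


def find_rate(csv_file, wanted_currency):
    """Return the rate for wanted_currency as a float, or None if it is not listed."""
    for row in csv.reader(csv_file):
        if not row:  # csv.reader yields [] for a blank line
            continue
        if row[0] == wanted_currency:
            return float(row[1])  # returning also ends the loop, like break
    return None


print(find_rate(io.StringIO(SAMPLE), "Pound Sterling"))
```

Returning None for an unknown currency lets the caller ask for the input again instead of looping forever over an exhausted reader.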
