Counting entries in a CSV? - python

I'm just learning Python, and have been having a little bit of trouble with the list functionality of the language. I have a .csv file named purchases.csv and I need to do four things with it:
output the total number of "purchase orders" aka count the total number of entries in the csv
output the average amount of the purchases, showing three decimals.
output the total number of purchases made over 1,800
output the average amount of purchases made that are over 1,800 showing three decimals.
The output needs to look something like:
Total Number of Purchases: xxxx
Amount of Average Purchase: xxxx
Number of Purchase Orders over $1,800: xxxx
Amount of Average Purchases over $1,800: xxxx
So far I've written
import csv
with open('purchases.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    total_purchases = []
    for row in readCSV:
        total = row[0]
        total_purchases.append(total)
    print(total_purchases)
my_sum = 0
for x in total_purchases:
    my_sum = my_sum + int(x)
print("The total number of purchases was: ", my_sum)
To find the total number of purchases, but I've hit a wall and can't seem to figure out the rest! I'd love any help and guidance with this...I just can't figure it out!

You need a for loop with an if statement inside it, so that some of the sums are only accumulated conditionally.
Assuming row[0] is your price column:
sumAbove1800 = 0
countAbove1800 = 0
totalSum = 0
totalPurchases = 0
# Run this loop inside the `with open(...)` block, while the file is still open.
for row in readCSV:
    price = float(row[0])
    totalPurchases = totalPurchases + 1
    totalSum = totalSum + price
    if price > 1800:
        sumAbove1800 = sumAbove1800 + price
        countAbove1800 = countAbove1800 + 1
Now to print them out, with the averages shown to 3 decimal places (the counts are whole numbers, so they are printed as-is):
print("Total Average Price: {:.3f}".format(totalSum / totalPurchases))
print("Total Transactions: {}".format(totalPurchases))
# Note: this division fails if no purchase is above 1800.
print("Total Average Price above 1800: {:.3f}".format(sumAbove1800 / countAbove1800))
print("Total Transactions above 1800: {}".format(countAbove1800))
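For reference, the same loop-and-condition idea can be wrapped into one self-contained function. This is only a sketch: the function name summarize_purchases, the sample values, and the assumption that the price sits in the first column with no header row are all hypothetical and not taken from the question's actual file.

```python
import csv
import io

def summarize_purchases(lines, threshold=1800.0):
    """Return (count, mean, count_over, mean_over) for the first CSV column."""
    # Assumption: every non-empty row has a numeric price in column 0.
    prices = [float(row[0]) for row in csv.reader(lines) if row]
    over = [p for p in prices if p > threshold]
    # Guard against dividing by zero when a list is empty.
    mean = sum(prices) / len(prices) if prices else 0.0
    mean_over = sum(over) / len(over) if over else 0.0
    return len(prices), mean, len(over), mean_over

# Hypothetical sample data standing in for purchases.csv
sample = io.StringIO("1500.00\n2000.50\n1900.25\n300.00\n")
count, mean, count_over, mean_over = summarize_purchases(sample)

print("Total Number of Purchases: {}".format(count))
print("Amount of Average Purchase: {:.3f}".format(mean))
print("Number of Purchase Orders over $1,800: {}".format(count_over))
print("Amount of Average Purchases over $1,800: {:.3f}".format(mean_over))
```

To run it on the real file, pass an open file object instead of the StringIO, for example summarize_purchases(open('purchases.csv', newline='')).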

Your question is a bit too vague, but here goes anyway.
Unless you are constrained by requirements as this appears to be homework / an assignment, you should give Pandas a try. It's a Python library that helps tremendously with data wrangling and data analysis.
output the total number of "purchase orders" aka count the total number of entries in the csv
This is dead easy with Pandas:
import pandas as pd
df = pd.read_csv('purchases.csv')
num = df.shape[0]
The first two lines are self-explanatory. You build an instance of a pandas.DataFrame object with read_csv() and store it in df. For the last line, just know that a DataFrame has an attribute named shape with the format (number of rows, number of columns), so shape[0] returns the number of rows. The header line is not counted as a row.
output the average amount of the purchases, showing three decimals.
mean = df['purchase_amount'].mean()
Access the 'purchase_amount' column using brackets (substitute whatever your price column is actually called), then call mean() on it.
output the total number of purchases made over 1,800
num_over_1800 = df[df['purchase_amount'] > 1800].shape[0]
Slight twist here, just know that this is one way to set a condition in Pandas.
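output the average amount of purchases made that are over 1,800 showing three decimals.
mean_over_1800 = df.loc[df['purchase_amount'] > 1800, 'purchase_amount'].mean()
This is the same filter as above, but it also selects the 'purchase_amount' column before calling mean(). Calling mean() on the whole filtered DataFrame would try to average every column instead of returning a single number.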
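Putting the pieces together, including the three-decimal formatting the question asks for, might look like the sketch below. The column name purchase_amount comes from the answer above and is itself an assumption about the file; the inline sample data is hypothetical and stands in for purchases.csv.

```python
import io
import pandas as pd

# Hypothetical sample data; with the real file use pd.read_csv('purchases.csv')
csv_text = "purchase_amount\n1500.00\n2000.50\n1900.25\n300.00\n"
df = pd.read_csv(io.StringIO(csv_text))

amounts = df['purchase_amount']
over = amounts[amounts > 1800]  # boolean mask keeps only purchases above 1800

num = len(amounts)
mean = amounts.mean()
num_over_1800 = len(over)
mean_over_1800 = over.mean()  # NaN if no purchase is above 1800

print("Total Number of Purchases: {}".format(num))
print("Amount of Average Purchase: {:.3f}".format(mean))
print("Number of Purchase Orders over $1,800: {}".format(num_over_1800))
print("Amount of Average Purchases over $1,800: {:.3f}".format(mean_over_1800))
```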

Related

How do I calculate a specific row of CSV data in python?

I have a set of CSV data, and I need to calculate the total quantity and profit using Visual Studio Code. Some code has already been provided for me, and I need to add the calculation. Only the profit in column N (row[13]) and the quantity in column O (row[14]) should be part of the calculation.
the data in CSV
this is the code provided for me:
import csv
from pathlib import Path
fp = Path.cwd()/"superstore_transaction.csv"
with fp.open(mode="r", encoding="UTF-8", newline="") as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row
    cluster1 = []
    cluster2 = []
    cluster3 = []
    for row in reader:
        # Store [profit, quantity] under the cluster named in column 4
        if row[4] == "Cluster 1":
            cluster1.append([row[13], row[14]])
        elif row[4] == "Cluster 2":
            cluster2.append([row[13], row[14]])
        else:
            cluster3.append([row[13], row[14]])
I tried using a for loop, but it doesn't work. In general I am confused by the code that was provided to me, and I am only allowed to use a limited set of constructs to calculate the total profit.
It looks like the profit column holds percentage values, so you need to remove the % symbol before the values can be converted to int or float for the calculation.
Here is an example for cluster1; you can do the same for the others manually or by looping over them:
cluster1_profit = sum(float(sub_c1[0].replace('%', '')) for sub_c1 in cluster1)
cluster1_quantity = sum(int(sub_c1[1]) for sub_c1 in cluster1)
For this I just did the sum of profits and quantities, so it depends on how you want to calculate your total.
There are better ways to do this, using pandas or numpy, it will make it easier.
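The "looping through them" option could be sketched as follows. The helper name cluster_totals and the sample rows are hypothetical; the rows only mimic the [row[13], row[14]] shape built by the provided code, with profit as a string that may end in %.

```python
def cluster_totals(rows):
    """Sum profit (index 0) and quantity (index 1) over [profit, quantity] string pairs."""
    profit = sum(float(p.replace('%', '')) for p, _ in rows)
    quantity = sum(int(q) for _, q in rows)
    return profit, quantity

# Hypothetical rows; in the real script use cluster1, cluster2 and cluster3
clusters = {
    "Cluster 1": [["10%", "2"], ["5.5%", "3"]],
    "Cluster 2": [["-1.5%", "4"]],
    "Cluster 3": [],
}

totals = {name: cluster_totals(rows) for name, rows in clusters.items()}
for name, (profit, quantity) in totals.items():
    print("{}: total profit {:.2f}, total quantity {}".format(name, profit, quantity))
```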

How to calculate mortgage with a floating rate that compounds monthly

I am solving a problem where I'm given a floating rate of 5.1% that increases by 0.1 every month (5.1, 5.2, ... 5.9, 6%).
It also compounds monthly. I'm given an initial loan of 200,000 and monthly payments of 1000, and I'm trying to work out how much is owed every month.
I am using a pandas Series to hold the increasing rate. I'm having difficulty creating a function that will help. Any suggestions would be appreciated.
This is what I have.
df = pd.DataFrame(51*np.ones(100) + np.arange(100))
df = df.rename(columns={0:'monthly rate'})
df['monthly rate'] = df['monthly rate'] /10/100 /12
df['monthly payment'] = 1000
df['interest due'] = df['monthly rate'] * 200000
df['mortgage decreasing'] = df['interest due'] - df['monthly payment']
This is where I get confused. We start with 200,000, and it decreases each month, and then the new interest due has to be calculated from that new amount. So each value depends on the other, and I'm not sure how to put that into code.
I think where I'm going wrong is in calculating the interest due, since in that code I am multiplying the rate by the initial loan value instead of by each month's balance. I'm just unsure how to solve that.
Just in plain Python you can simulate it like this:
loan = 200000
interest = 0.051
payment = 1000
interest_change = 0.001
month = 1
while month < 37:
    # In real life banks calculate interest per day, not as 1/12 of a year
    month_interest = loan * interest/12
    new_loan = loan+month_interest-payment
    print("%s: %.2f \t +%.2f (%.2f %%) \t-%s \t -> %.2f " % (month,loan,month_interest, interest*100, payment, new_loan))
    loan = new_loan
    interest += interest_change
    month += 1
    if loan < 0:
        break
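If the month-by-month values are needed afterwards rather than just printed, the same simulation can collect them in a list. This is a sketch under the question's numbers; the function name simulate_loan and the tuple layout are hypothetical choices, and the rate is assumed to keep rising by the same step every month.

```python
def simulate_loan(principal, start_rate, rate_step, payment, months):
    """Return one (month, annual_rate, interest_due, balance_after) tuple per month."""
    rows = []
    balance = principal
    for month in range(1, months + 1):
        # Derive the rate from the month number to avoid accumulating float error
        annual_rate = start_rate + rate_step * (month - 1)
        # Interest is charged on the current balance, not on the initial loan
        interest_due = balance * annual_rate / 12
        balance = balance + interest_due - payment
        rows.append((month, annual_rate, interest_due, balance))
        if balance <= 0:
            break  # loan is paid off
    return rows

schedule = simulate_loan(200000, 0.051, 0.001, 1000, 36)
for month, annual_rate, interest_due, balance in schedule[:3]:
    print("{}: rate {:.1f}%  interest {:.2f}  balance {:.2f}".format(
        month, annual_rate * 100, interest_due, balance))
```

Because each row is a plain tuple, the list can also be handed to pd.DataFrame(schedule, columns=[...]) if a DataFrame is preferred.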

Trying to get sums in lists to print out with strings in output

I have been working on a Python project analyzing a CSV file and cannot get the output to show me sums with my strings, just lists of the numbers that should be summed.
Code I'm working with:
import pandas as pd
data = pd.read_csv('XML_projectB.csv')
#inserted column headers since the raw data doesn't have any
data.columns = ['name','email','category','amount','date']
data['date'] = pd.to_datetime(data['date'])
#Calculate the total budget by category
category_wise = data.groupby('category').agg({'amount':['sum']})
category_wise.reset_index(inplace=True)
category_wise.columns = ['category','total_budget']
#Determine which budget category people spent the most money in
max_budget = category_wise[category_wise['total_budget']==max(category_wise['total_budget'])]['category'].to_list()
#Tally the total amounts for each year-month (e.g., 2017-05)
months_wise = data.groupby([data.date.dt.year, data.date.dt.month])['amount'].sum()
months_wise = pd.DataFrame(months_wise)
months_wise.index.names = ['year','month']
months_wise.reset_index(inplace=True)
#Determine which person(s) spent the most money on a single item.
person = data[data['amount'] == max(data['amount'])]['name'].to_list()
#Tells user in Shell that text file is ready
print("Check your folder!")
#Get all this info into a text file
tfile = open('output.txt','a')
tfile.write(category_wise.to_string())
tfile.write("\n\n")
tfile.write("The type with most budget is " + str(max_budget) + " and the value for the same is " + str(max(category_wise['total_budget'])))
tfile.write("\n\n")
tfile.write(months_wise.to_string())
tfile.write("\n\n")
tfile.write("The person who spent most on a single item is " + str(person) + " and he/she spent " + str(max(data['amount'])))
tfile.close()
The CSV raw data looks like this (there are almost 1000 lines of it):
Walker Gore,wgore8i#irs.gov,Music,$77.98,2017-08-25
Catriona Driussi,cdriussi8j#github.com,Garden,$50.35,2016-12-23
Barbara-anne Cawsey,bcawsey8k#tripod.com,Health,$75.38,2016-10-16
Henryetta Hillett,hhillett8l#pagesperso-orange.fr,Electronics,$59.52,2017-03-20
Boyce Andreou,bandreou8m#walmart.com,Jewelery,$60.77,2016-10-19
My output in the txt file looks like this:
category total_budget
0 Automotive $53.04$91.99$42.66$1.32$35.07$97.91$92.40$21.28$36.41
1 Baby $93.14$46.59$31.50$34.86$30.99$70.55$86.74$56.63$84.65
2 Beauty $28.67$97.95$4.64$5.25$96.53$50.25$85.42$24.77$64.74
3 Books $4.03$17.68$14.21$43.43$98.17$23.96$6.81$58.33$30.80
4 Clothing $64.07$19.29$27.23$19.78$70.50$8.81$39.36$52.80$80.90
year month amount
0 2016 9 $97.95$67.81$80.64
1 2016 10 $93.14$6.08$77.51$58.15$28.31$2.24$12.83$52.22$48.72
2 2016 11 $55.22$95.00$34.86$40.14$70.13$24.82$63.81$56.83
3 2016 12 $13.32$10.93$5.95$12.41$45.65$86.69$31.26$81.53
I want the total_budget column to be the sum of the list for each category, not the individual values you see here. It's the same problem for months_wise, it gives me the individual values, not the sums.
I tried the {} .format in the write lines, .apply(str), .format on its own, and just about every other Python permutation of the conversion to string from a list I could think of, but I'm stumped.
What am I missing here?
As @Barmar said, the source values look like $XX, so they are read as strings rather than numbers, and "summing" strings just concatenates them. Parse the values as floats (or integers) instead of leaving them as strings with $ in them before you aggregate.
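One way to do that conversion is sketched below. The sample rows are hypothetical and only mimic the layout shown in the question, and the sketch assumes amounts have no thousands separators (a value like $1,234.56 would need the commas removed as well). It also passes header=None so that the first data row is not consumed as a header, which differs from the question's code.

```python
import io
import pandas as pd

# Hypothetical rows in the same layout as the question's CSV (no header row)
raw = io.StringIO(
    "Ann Example,ann@example.com,Music,$77.98,2017-08-25\n"
    "Bob Example,bob@example.com,Music,$22.02,2017-08-26\n"
    "Cat Example,cat@example.com,Garden,$50.35,2016-12-23\n"
)
data = pd.read_csv(raw, header=None,
                   names=['name', 'email', 'category', 'amount', 'date'])

# Strip the literal '$' and convert the column from strings to floats
data['amount'] = data['amount'].str.replace('$', '', regex=False).astype(float)

# With a numeric column, sum() adds the values instead of concatenating them
category_wise = data.groupby('category')['amount'].sum().reset_index(name='total_budget')
print(category_wise.to_string())
```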

How to make categories out of my text file and calculate average out of the numbers?

I am working on an assignment, but I am stuck and I do not know how to proceed.
I need to build categories from the column names on the first line of the txt file and calculate the average of every numerical value. The program has to keep working flawlessly when I add new lines to the txt file.
Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Automotive/Game;US;3249;5;Mon;0,01;0,01;No
Music/Automotive/Game;US;3249;5;Mon;0,01;0,01;No
This is the text file. I tried to make different categories out of it, but I do not know whether I did it correctly, or how to tell Python to do the calculation over all the numbers from one group.
with open('bijlage2.txt') as bestand:
    maak_er_lists_van = [(line.strip()).split(';') for line in bestand]
keys = maak_er_lists_van[0]
lijst = list(zip([keys]*len(maak_er_lists_van[1:]),
                 maak_er_lists_van[1:]))
x = [zip(i[0], i[1]) for i in lijst]
maak_dict = [dict(i) for i in x]
for i in maak_dict:
    categorieen =[i['Category'], i['currency'], i['sellerRating'],
                  i['Duration'], i['endDay'], i['ClosePrice'], i['OpenPrice'],
                  i['Competitive?']]
    categorieen = list(map(int, categorieen))
This is what I have so far. I am a Python beginner so the whole text file thing is new to me. Can somebody help me or explain what I have to do so that I can work further on this project? Many thanks in advance!
Here's how I would do it. I had to use locale.atof() because where I am, . is used as the decimal point, not a comma. You may have to change the locale setting as indicated in the comments.
The csv module is used to read the file, and the averages are computed in a two-step process. First the values for each category are summed, and then afterwards, the average value of each one is calculated based on the number of values read.
import csv
import locale
from pprint import pprint
#locale.setlocale(locale.LC_ALL, '') # empty string for platform's default settings
# Following used for testing to force ',' to be considered as a decimal point.
# 'French_France.1252' is a Windows locale name; on Linux or macOS the
# equivalent is usually 'fr_FR.UTF-8'.
locale.setlocale(locale.LC_ALL, 'French_France.1252')
avg_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
averages = {avg_name: 0 for avg_name in avg_names} # Initialize.
# Find total of each category of interest.
num_values = 0
with open('bijlage2.txt', newline='') as bestand:
    csvreader = csv.DictReader(bestand, delimiter=';')
    for row in csvreader:
        num_values += 1
        for avg_name in avg_names:
            averages[avg_name] += locale.atof(row[avg_name])
# Calculate average of each summed value.
for avg_name, total in averages.items():
    averages[avg_name] = total / num_values
print('raw results:')
pprint(averages)
print() # Formatted output
print('Averages:')
for avg_name in avg_names:
    rounded = locale.format_string('%.2f', round(averages[avg_name], 2),
                                   grouping=True)
    print(' {:<13} {:>10}'.format(avg_name, rounded))
Output:
raw results:
{'ClosePrice': 0.01, 'Duration': 5.0, 'OpenPrice': 0.01, 'sellerRating': 3249.0}
Averages:
sellerRating 3 249,00
Duration 5,00
ClosePrice 0,01
OpenPrice 0,01
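Your way of reading the file and building a dictionary of the column names and values is fine, imo. Your list maak_dict contains one dictionary for every line. The values are still strings with a decimal comma, though, so they have to be converted to float before they can be summed. To calculate an average for one column, you could do something like this:
def calc_average(categ):
    # Values are strings such as '0,01'; swap the decimal comma and convert.
    values = [float(i[categ].replace(',', '.')) for i in maak_dict]
    average = sum(values)/len(values)
    return average
assuming that you want to calculate the arithmetic mean. categ has to be a string naming a numeric column.
After that, you can create a new dictionary that contains all the averages:
new_dict = {}
# Only the numeric columns can be averaged.
for category in ['sellerRating', 'Duration', 'ClosePrice', 'OpenPrice']:
    avg = calc_average(category)
    new_dict[category] = avg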
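If the averages are wanted separately for each value of the Category column (one reading of "all the numbers from 1 group"), the sums and counts can be kept per category. This is a sketch under that assumption; the function name averages_by_category is hypothetical, and the second sample row has been altered from the question's data so that the averages are not trivially equal to the inputs.

```python
import csv
import io
from collections import defaultdict

NUMERIC_COLUMNS = ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice')

def averages_by_category(lines):
    """Return {category: {column: mean}} for a ';'-separated file with a header row."""
    sums = defaultdict(lambda: dict.fromkeys(NUMERIC_COLUMNS, 0.0))
    counts = defaultdict(int)
    for row in csv.DictReader(lines, delimiter=';'):
        category = row['Category']
        counts[category] += 1
        for column in NUMERIC_COLUMNS:
            # The file uses a decimal comma, e.g. '0,01'
            sums[category][column] += float(row[column].replace(',', '.'))
    return {
        category: {column: total / counts[category]
                   for column, total in totals.items()}
        for category, totals in sums.items()
    }

# Hypothetical sample in the question's format
sample = io.StringIO(
    "Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?\n"
    "Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No\n"
    "Music/Movie/Game;US;3251;7;Mon;0,03;0,01;No\n"
    "Music/Automotive/Game;US;3249;5;Mon;0,01;0,01;No\n"
)
result = averages_by_category(sample)
for category, averages in result.items():
    print(category, averages)
```

Because the function reads the file line by line, adding new lines or new category names to the text file needs no code changes.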

Creating a balance table on Python (edited)

I have this code, which displays the payment I need to make on a student loan depending on different factors. I was wondering how I could display a table of the payments. For example, if my program says that I'll be making 36 payments of $153.02 to pay off a loan of 5000 at 6.4% interest, how do I display a table telling me, for payment 1 of 153.02, the remaining balance and how much goes to interest and how much to principal, then the same for payment 2 with the new balance, and so on? In other words, a table covering those 36 payments that shows how my balance is reduced with each payment.
It doesn't have to be a table; a list would do, listing payment after payment until the balance is 0 or slightly below.
This is the code I have so far using python 2.7.3
def calcDebt (principal, interestRate, numPayments, freqPayment):
    #This code will make different predictions to pay off student loan
    #Input Meanings
    '''
    Inputs
    - interestRate - The Interest Rate of a Loan
    - numPayments - The Number of Payments Needed
    - principal - The Original Student Loan Amount
    - freqPayment - Frequency of Payments Based on Weekly, Monthly, Annually
    Returns
    - paymentAmount - The Payment Amount Will Be
    '''
    freqPayment_lookup = {'weekly': 52, 'monthly':12, 'annually':1}
    interestRate = float(interestRate) / 100
    x = interestRate/freqPayment_lookup[freqPayment]
    y = (1.0 + x) ** numPayments
    z = x/(y - 1.0)
    paymentAmount = (x + z) * principal
    return paymentAmount
def main():
    a = input('Student Loan Amount: ')
    i = input('Student Loan Interest Rate: ')
    n = input('Number of Payments: ')
    f = None
    while f not in ['weekly', 'monthly', 'annually']:
        if f:
            f = raw_input('Sorry! That is NOT an Option. Please Enter weekly, monthly, or annually: ').lower()
        else:
            f = raw_input('How Often Do You Want To Make Your Payments? ').lower()
    payment = calcDebt(a, i, n, f)
    print 'Your %s payment will be %.2f' % (f, payment)
if __name__ == '__main__':
    main()
    raw_input('Please Press Enter to Exit')
Any ideas?
I would say the best thing for this is to use numpy. Here is a general idea of the kind of code you would need:
import numpy as np
#this creates your table with numPayments rows and 4 columns
numPayments = 40
row_index = range(1,numPayments + 1)
real_row_index = []
for i in row_index:
    real_row_index.append(str(i))
columns = ["payment number", "remaining balance", "interest amount", "principal amount"]
total_payments = np.chararray((numPayments,3), itemsize=20)
total_payments[:] = "none"
total_payments = np.insert(total_payments, 0, np.array((real_row_index)),1)
total_payments = np.insert(total_payments, 0, np.array((columns)),0)
for row in total_payments[1:]:
    #the 1 skips the labels so you don't mess with them
    #row[0] will be the number payment column
    #row[1] will be the remaining balance column
    #row[2] will be the interest amount column
    #row[3] will be the principal amount column
    pass  #placeholder: fill in the cells of this row here
Now that you have the table set up and a for loop over its rows (which I will not finish for you), you can use it in your function to iterate through and, for every payment, set each cell to the correct value for its column. So if you calculate the interest amount in a variable interest, you just need to put this in your for loop to store it in the right place:
row[2] = str(interest)
interest is updated at the same pace as row, so they should match up if done correctly.
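As an alternative that needs no numpy, the payment-by-payment list can be built with a plain loop. This is a sketch written for Python 3 (the question uses 2.7.3), the function name amortization_schedule is hypothetical, and it assumes a non-zero interest rate. It reuses the payment formula from calcDebt in the question.

```python
def amortization_schedule(principal, annual_rate_percent, num_payments,
                          payments_per_year=12):
    """Return (payment, rows); each row is (number, interest, principal_paid, balance)."""
    rate = annual_rate_percent / 100.0 / payments_per_year
    # Same formula as calcDebt in the question: (x + x / ((1 + x)**n - 1)) * principal
    payment = (rate + rate / ((1.0 + rate) ** num_payments - 1.0)) * principal
    rows = []
    balance = float(principal)
    for number in range(1, num_payments + 1):
        interest = balance * rate            # interest accrued this period
        principal_paid = payment - interest  # the rest reduces the balance
        balance -= principal_paid
        rows.append((number, interest, principal_paid, balance))
    return payment, rows

payment, rows = amortization_schedule(5000, 6.4, 36)
print('Payment per period: {:.2f}'.format(payment))
print('{:>3} {:>10} {:>10} {:>12}'.format('No.', 'Interest', 'Principal', 'Balance'))
for number, interest, principal_paid, balance in rows:
    print('{:>3} {:>10.2f} {:>10.2f} {:>12.2f}'.format(
        number, interest, principal_paid, balance))
```

Early payments go mostly to interest and later ones mostly to principal, and the last row's balance comes out at zero up to floating-point rounding.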
