I need to write a Python program that reads three CSV files.

It needs to print out:
Every customer.
Output format: Customer: ,
Every product.
Output format: Product: ,
Total amount ordered per product.
Output format: amount:
Total money spent per product.
Output format: gross income:
Total money spent per customer.
Output format: money spent:
customers.csv:
id, name, address
1,"Knut","Knutveien 3"
2,"Lise","Liseveien 7"
products.csv:
id, name, price
1,"banana",5
2,"apple",10
orders.csv:
id, customerid, productid, amount
1,1,1,2
2,2,1,3
3,1,2,4
Python:
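import csv

# Print every customer (name and address).
with open('customers.csv', newline='') as file_c:
    reader_c = csv.reader(file_c, skipinitialspace=True)
    next(reader_c)  # skip the header row
    for row in reader_c:
        print(f'Customer: {row[1]}, {row[2]}')

# Print every product (name and price).
with open('products.csv', newline='') as file_p:
    reader_p = csv.reader(file_p, skipinitialspace=True)
    next(reader_p)  # skip the header row
    for row in reader_p:
        print(f'Product: {row[1]}, {row[2]}')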
I would be grateful if you could help :)

Use pandas.read_csv(). (Also read the pages @Prune linked in the comments.)
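A minimal sketch of how the three totals could be computed with pandas, assuming the column names shown in the question. The sample data is inlined through io.StringIO so the snippet runs on its own; the helper name load and the printed labels are illustrative only, because the exact output format is cut off in the question.

```python
import io
import pandas as pd

# Sample data copied from the question. With real files, pass the file
# names ('customers.csv', ...) to pd.read_csv instead of StringIO objects.
CUSTOMERS = '''id, name, address
1,"Knut","Knutveien 3"
2,"Lise","Liseveien 7"
'''
PRODUCTS = '''id, name, price
1,"banana",5
2,"apple",10
'''
ORDERS = '''id, customerid, productid, amount
1,1,1,2
2,2,1,3
3,1,2,4
'''


def load(text):
    # skipinitialspace drops the blank that follows each comma in the header row
    return pd.read_csv(io.StringIO(text), skipinitialspace=True)


customers, products, orders = load(CUSTOMERS), load(PRODUCTS), load(ORDERS)

# Join each order with its product and its customer.
# Columns are renamed first so the two 'id' and 'name' columns do not clash.
merged = (
    orders
    .merge(products.rename(columns={'id': 'productid', 'name': 'product'}), on='productid')
    .merge(customers.rename(columns={'id': 'customerid', 'name': 'customer'}), on='customerid')
)
merged['cost'] = merged['amount'] * merged['price']

amount_per_product = merged.groupby('product')['amount'].sum()
income_per_product = merged.groupby('product')['cost'].sum()
spent_per_customer = merged.groupby('customer')['cost'].sum()

for product, amount in amount_per_product.items():
    print(f'{product} amount: {amount}')
for product, income in income_per_product.items():
    print(f'{product} gross income: {income}')
for customer, spent in spent_per_customer.items():
    print(f'{customer} money spent: {spent}')
```

The merge calls play the role of a database join: every order row picks up the price of its product and the name of its customer, after which each total is a single groupby.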

Related

How do I query a certain list element in python after they meet some conditions using a loop?

I am trying to go through a list (which is set as the parameter in a function) to query the suburb taken as an input from the user. The query is then searched for in the list and if a match is found, the total sales from the suburb's stores are added and the average store sales for that suburb is calculated.
Here is the list:
Hm001,6,Frankton,42305.67
Hm002,10,Glenview,21922.22
Hm003,7,Silverdale,63277.9
Hm004,13,Glenview,83290.09
Hm005,21,Queenwood,81301.82
Hm006,14,Hillcrest,62333.3
Hm007,7,Frankton,28998.8
Hm008,19,Chartwell,51083.5
Hm009,6,Glenview,62155.72
Hm0010,8,Enderley,33075.1
Hm0011,10,Fairfield,61824.7
Hm0012,15,Rototuna,21804.8
Hm0013,11,Fairfield,62804.7
It follows the format: ID, number of employees, suburb, and sales volume.
I have written the function below but I don't know what is wrong with the function.
def query_suburb_stats(records):
    suburb = input("Enter a suburb name: ")
    print(suburb)
    sales = 0
    avg_sales = 0
    for match in records:
        if suburb.lower() == match[2].lower():
            sales += match[3]
            avg_sales = avg_sales/sales
    if suburb.lower() != records[2].lower():
        print(f"No matching name for {suburb}")
    else:
        print(f"The total sale volume for the stores in {suburb} is: {sales}")
        print(f"The average sales for {suburb} is: {avg_sales}")
Your code will work with the following changes:
def query_suburb_stats(records):
    suburb = input("Enter a suburb name: ")
    print(suburb)
    matches = 0
    sales = 0
    avg_sales = 0
    for match in records:
        if suburb.lower() == match[2].lower():
            matches += 1
            sales += float(match[3])
    if not matches:
        print(f"No matching name for {suburb}")
    else:
        avg_sales = sales/matches
        print(f"The total sale volume for the stores in {suburb} is: {sales}")
        print(f"The average sales for {suburb} is: {avg_sales}")
With Glenview as input, it will output:
The total sale volume for the stores in Glenview is: 167368.03
The average sales for Glenview is: 55789.34333333333
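For testing, it can help to separate the calculation from the input() call. The following is only a sketch of that idea: the function name suburb_stats and its return convention (a tuple, or None when nothing matches) are illustrative choices and not part of the original answer, and the records are assumed to be lists of strings as in the question.

```python
def suburb_stats(records, suburb):
    """Return (total_sales, average_sales) for a suburb, or None if no store matches."""
    # Collect the sales figures of every store in the requested suburb (case-insensitive).
    matching = [float(rec[3]) for rec in records if rec[2].lower() == suburb.lower()]
    if not matching:
        return None
    total = sum(matching)
    return total, total / len(matching)


# A few rows from the question: ID, number of employees, suburb, sales volume.
records = [
    ["Hm002", "10", "Glenview", "21922.22"],
    ["Hm003", "7", "Silverdale", "63277.9"],
    ["Hm004", "13", "Glenview", "83290.09"],
    ["Hm009", "6", "Glenview", "62155.72"],
]

result = suburb_stats(records, "glenview")
if result is None:
    print("No matching name")
else:
    total, average = result
    print(f"Total: {total:.2f}, average: {average:.2f}")
```

The interactive version can then call suburb_stats(records, input("Enter a suburb name: ")) and only deal with printing.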

Trying to get sums in lists to print out with strings in output

I have been working on a Python project analyzing a CSV file and cannot get the output to show me sums with my strings, just lists of the numbers that should be summed.
Code I'm working with:
import pandas as pd
data = pd.read_csv('XML_projectB.csv')
#inserted column headers since the raw data doesn't have any
data.columns = ['name','email','category','amount','date']
data['date'] = pd.to_datetime(data['date'])
#Calculate the total budget by category
category_wise = data.groupby('category').agg({'amount':['sum']})
category_wise.reset_index(inplace=True)
category_wise.columns = ['category','total_budget']
#Determine which budget category people spent the most money in
max_budget = category_wise[category_wise['total_budget']==max(category_wise['total_budget'])]['category'].to_list()
#Tally the total amounts for each year-month (e.g., 2017-05)
months_wise = data.groupby([data.date.dt.year, data.date.dt.month])['amount'].sum()
months_wise = pd.DataFrame(months_wise)
months_wise.index.names = ['year','month']
months_wise.reset_index(inplace=True)
#Determine which person(s) spent the most money on a single item.
person = data[data['amount'] == max(data['amount'])]['name'].to_list()
#Tells user in Shell that text file is ready
print("Check your folder!")
#Get all this info into a text file
tfile = open('output.txt','a')
tfile.write(category_wise.to_string())
tfile.write("\n\n")
tfile.write("The type with most budget is " + str(max_budget) + " and the value for the same is " + str(max(category_wise['total_budget'])))
tfile.write("\n\n")
tfile.write(months_wise.to_string())
tfile.write("\n\n")
tfile.write("The person who spent most on a single item is " + str(person) + " and he/she spent " + str(max(data['amount'])))
tfile.close()
The CSV raw data looks like this (there are almost 1000 lines of it):
Walker Gore,wgore8i@irs.gov,Music,$77.98,2017-08-25
Catriona Driussi,cdriussi8j@github.com,Garden,$50.35,2016-12-23
Barbara-anne Cawsey,bcawsey8k@tripod.com,Health,$75.38,2016-10-16
Henryetta Hillett,hhillett8l@pagesperso-orange.fr,Electronics,$59.52,2017-03-20
Boyce Andreou,bandreou8m@walmart.com,Jewelery,$60.77,2016-10-19
My output in the txt file looks like this:
category total_budget
0 Automotive $53.04$91.99$42.66$1.32$35.07$97.91$92.40$21.28$36.41
1 Baby $93.14$46.59$31.50$34.86$30.99$70.55$86.74$56.63$84.65
2 Beauty $28.67$97.95$4.64$5.25$96.53$50.25$85.42$24.77$64.74
3 Books $4.03$17.68$14.21$43.43$98.17$23.96$6.81$58.33$30.80
4 Clothing $64.07$19.29$27.23$19.78$70.50$8.81$39.36$52.80$80.90
year month amount
0 2016 9 $97.95$67.81$80.64
1 2016 10 $93.14$6.08$77.51$58.15$28.31$2.24$12.83$52.22$48.72
2 2016 11 $55.22$95.00$34.86$40.14$70.13$24.82$63.81$56.83
3 2016 12 $13.32$10.93$5.95$12.41$45.65$86.69$31.26$81.53
I want the total_budget column to be the sum of the list for each category, not the individual values you see here. It's the same problem for months_wise, it gives me the individual values, not the sums.
I tried the {} .format in the write lines, .apply(str), .format on its own, and just about every other Python permutation of the conversion to string from a list I could think of, but I'm stumped.
What am I missing here?
As @Barmar said, the amounts in the source are written as $XX, so pandas reads them as strings rather than numbers, and summing strings concatenates them. Parse the values as floats (without the $) before aggregating.
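One way this could look, as a sketch rather than the only approach: strip the "$" with str.replace and convert the column with astype(float) before any groupby. The five sample rows from the question are inlined through io.StringIO so the snippet runs on its own. Note that read_csv is called with header=None here, since the raw data has no header row; by default pandas would treat the first data row as the header.

```python
import io
import pandas as pd

# The five sample rows shown in the question (the file has no header row).
raw = """Walker Gore,wgore8i@irs.gov,Music,$77.98,2017-08-25
Catriona Driussi,cdriussi8j@github.com,Garden,$50.35,2016-12-23
Barbara-anne Cawsey,bcawsey8k@tripod.com,Health,$75.38,2016-10-16
Henryetta Hillett,hhillett8l@pagesperso-orange.fr,Electronics,$59.52,2017-03-20
Boyce Andreou,bandreou8m@walmart.com,Jewelery,$60.77,2016-10-19
"""
columns = ['name', 'email', 'category', 'amount', 'date']
data = pd.read_csv(io.StringIO(raw), header=None, names=columns)

# Remove the "$" and convert to float, so sum() adds numbers
# instead of concatenating strings.
data['amount'] = data['amount'].str.replace('$', '', regex=False).astype(float)
data['date'] = pd.to_datetime(data['date'])

# Total per category.
category_wise = (
    data.groupby('category', as_index=False)['amount'].sum()
        .rename(columns={'amount': 'total_budget'})
)

# Total per year and month.
year = data['date'].dt.year.rename('year')
month = data['date'].dt.month.rename('month')
months_wise = data.groupby([year, month])['amount'].sum().reset_index()

print(category_wise.to_string())
print(months_wise.to_string())
```

With the amounts stored as floats, max(data['amount']) in the original script also compares numbers instead of strings.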

Counting entries in a CSV?

I'm just learning Python, and have been having a little bit of trouble with the list functionality of the language. I have a .csv file named purchases.csv and I need to do four things with it:
output the total number of "purchase orders" aka count the total number of entries in the csv
output the average amount of the purchases, showing three decimals.
output the total number of purchases made over 1,800
output the average amount of purchases made that are over 1,800 showing three decimals.
The output needs to look something like:
Total Number of Purchases: xxxx
Amount of Average Purchase: xxxx
Number of Purchase Orders over $1,800: xxxx
Amount of Average Purchases over $1,800: xxxx
So far I've written
import csv
with open('purchases.csv') as csvfile:
    readCSV = csv.reader(csvfile,delimiter=',')
    total_purchases=[]
    for row in readCSV:
        total=row[0]
        total_purchases.append(total)
    print(total_purchases)
    my_sum=0
    for x in home_runs:
        my_sum=my_sum+int(x)
    print("The total number of purchases was: ", my_sum)
To find the total number of purchases, but I've hit a wall and can't seem to figure out the rest! I'd love any help and guidance with this...I just can't figure it out!
You need a for loop similar to yours, but with an if statement so that some of the totals are only updated conditionally.
Assuming row[0] is your price column:
sumAbove1800 = 0
countAbove1800 = 0
totalSum = 0
totalPurchases = 0
# Run this loop inside the with block, while the file is still open.
for row in readCSV:
    price = float(row[0])
    totalPurchases = totalPurchases + 1
    totalSum = totalSum + price
    if price > 1800:
        sumAbove1800 = sumAbove1800 + price
        countAbove1800 = countAbove1800 + 1
Now print them out, showing the averages with 3 decimal places:
print("Total Average Price: {:.3f}".format(totalSum / totalPurchases))
print("Total Transactions: {}".format(totalPurchases))
print("Total Average Price above 1800: {:.3f}".format(sumAbove1800 / countAbove1800))
print("Total Transactions above 1800: {}".format(countAbove1800))
Your question is a bit too vague, but here goes anyway.
Unless you are constrained by requirements as this appears to be homework / an assignment, you should give Pandas a try. It's a Python library that helps tremendously with data wrangling and data analysis.
output the total number of "purchase orders" aka count the total number of entries in the csv
This is dead easy with Pandas:
import pandas as pd
df = pd.read_csv('purchases.csv')
num = df.shape[0]
The first two lines are self-explanatory: read_csv() builds a pandas.DataFrame, which is stored in df. A DataFrame has an attribute named shape with the format (number of rows, number of columns), so shape[0] returns the number of rows. Note that read_csv() treats the first line of the file as a header by default, so that line is not counted.
output the average amount of the purchases, showing three decimals.
mean = df['purchase_amount'].mean()
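Access the column 'purchase_amount' using brackets (the column name is assumed here; use whatever your file calls it). To show three decimals, format the result when printing, for example with f"{mean:.3f}".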
output the total number of purchases made over 1,800
num_over_1800 = df[df['purchase_amount'] > 1800].shape[0]
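Slight twist here: df['purchase_amount'] > 1800 produces a column of True/False values, and indexing df with it keeps only the rows where the condition is True. This is the usual way to filter rows in pandas.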
output the average amount of purchases made that are over 1,800 showing three decimals.
mean_over_1800 = df[df['purchase_amount'] > 1800]['purchase_amount'].mean()
This combines the two previous steps: filter the rows first, then take the mean of the 'purchase_amount' column.
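Put together, the four values could be computed and printed roughly like this. This is a sketch under assumptions: the real purchases.csv is not shown in the question, so the sample data and the column name 'purchase_amount' below are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample data; the real purchases.csv and its column
# name are not shown in the question.
csv_text = "purchase_amount\n1200.50\n2500.00\n1900.25\n300.00\n"
df = pd.read_csv(io.StringIO(csv_text))

amounts = df['purchase_amount']
over_1800 = amounts[amounts > 1800]  # keep only purchases above 1,800

num = len(amounts)
mean = amounts.mean()
num_over_1800 = len(over_1800)
mean_over_1800 = over_1800.mean()

# Counts are printed as integers, averages with three decimals.
print(f"Total Number of Purchases: {num}")
print(f"Amount of Average Purchase: {mean:.3f}")
print(f"Number of Purchase Orders over $1,800: {num_over_1800}")
print(f"Amount of Average Purchases over $1,800: {mean_over_1800:.3f}")
```

With a real file, replace io.StringIO(csv_text) with 'purchases.csv'. If the file has no header row, pass header=None and names=[...] to read_csv.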

python for loop issue

I currently have this as an assignment. I have written the code below, but there seems to be an issue as the calculations keep on adding. Is there a way to restart the for loop?
There are 7 employees. Write a program with nested loops, to ask the yearly salary of each employee for 5 years. Your program should keep track of the highest salary, lowest salary, and calculate the average salary of each employee. After you collect each employees data, display the highest salary, lowest salary, and average salary for that employee.
totalsalary = 0
salaryhigh = 0
salarylow = 10000000
employee = 0
for employee in range(1,4):
    print("Please enter the 5 year salaries of Employee#",employee,":")
    for year in range(1,6):
        salary = int(input('Enter you salary:'+""))
        totalsalary = totalsalary + salary
        if(salary > salaryhigh):
            salaryhigh = salary
        if(salary < salarylow):
            salarylow = salary
    avesalary = totalsalary/5
    print('Total Salary entered for 5 years for Employee#',employee,':',totalsalary)
    print("Average is:",avesalary)
    print("Highest Salary entered is:",salaryhigh)
    print("Lowest Salary entered is:",salarylow)
    print("------------------------------------")
Your first three lines should be run for each employee, so they should be inside the outer for loop. The fourth line doesn't really do anything: your for-loop resets the employee number. Also, your for-loop does only three employees, but you state that there are seven employees. It's generally recommended that you set the number of employees at the beginning and then use that in the following code, so it's clear what the number represents and it's easier to keep track of and change the number. E.g.
number_of_employees = 7
for employee in range(0,number_of_employees):
    ...
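A sketch of the whole program with the running values reset for every employee. The function name collect_salaries and the ask parameter (which defaults to input, so answers can be supplied programmatically in a test) are illustrative additions and not part of the assignment.

```python
def collect_salaries(number_of_employees=7, years=5, ask=input):
    """Ask for each employee's yearly salaries; return (highest, lowest, average) per employee."""
    summaries = []
    for employee in range(1, number_of_employees + 1):
        # Reset the running values for every employee, so that
        # nothing carries over from the previous one.
        total = 0
        highest = None
        lowest = None
        for year in range(1, years + 1):
            salary = int(ask(f"Employee #{employee}, year {year} salary: "))
            total += salary
            if highest is None or salary > highest:
                highest = salary
            if lowest is None or salary < lowest:
                lowest = salary
        average = total / years
        print(f"Employee #{employee}: highest={highest}, lowest={lowest}, average={average}")
        print("------------------------------------")
        summaries.append((highest, lowest, average))
    return summaries
```

Calling collect_salaries() runs it interactively for seven employees. Starting highest and lowest at None avoids relying on a sentinel such as 10000000, which would give a wrong lowest salary if every salary entered were above it.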

How to efficiently iterate through a dictionary?

I'm new to Python and programming.
My textbook says I have to do the following problem set:
Create a second purchase summary that which accumulates total investment by ticker symbol. In the
above sample data, there are two blocks of CAT.
These can easily be combined by creating a dict where
the key is the ticker and the value is the list of blocks purchased. The program makes one pass
through the data to create the dict. A pass through the dict can then create a report showing each
ticker symbol and all blocks of stock.
I cannot think of a way, apart from hard-coding, to add the two entries of the 'CAT' stock.
## Stock Reports
stockDict = {"GM":"General Motors", "CAT":"Caterpillar", "EK":"Eastman Kodak",
             "FB":"Facebook"}
# symbol,prices,dates,shares
purchases = [("GM",100,"10-sep-2001",48), ("CAT",100,"01-apr-1999",24),
             ("FB",200,"01-jul-2013",56), ("CAT", 200,"02-may-1999",53)]
# purchase history:
print "Company", "\t\tPrice", "\tDate\n"
for stock in purchases:
    price = stock[1] * stock[3]
    name = stockDict[stock[0]]
    print name, "\t\t", price, "\t", stock[2]
print "\n"
# THIS IS THE PROBLEM SET I NEED HELP WITH:
# accumulate total investment by ticker symbol
byTicker = {}
# create dict
for stock in purchases:
    ticker = stock[0]
    block = [stock]
    if ticker in byTicker:
        byTicker[ticker] += block
    else:
        byTicker[ticker] = block
for i in byTicker.values():
    shares = i[0][3]
    price = i[0][1]
    investment = shares * price
    print investment
Right now, the output is:
4800
11200
2400
It's not good because it does not calculate the two CAT stocks. Right now it only calculates one. The code should be flexible enough that I could add more CAT stocks.
Your problem is in the last part of your code. The penultimate part, which creates a list of all stocks against each ticker, is fine:
for i in byTicker.values():
    shares = i[0][3]
    price = i[0][1]
    investment = shares * price
    print investment
Here you only use the zeroth stock for each ticker. Instead, try:
for name, purchases in byTicker.items():
    investment = sum(shares * price for _, price, _, shares in purchases)
    print name, investment
This will add up all of the stocks for each ticker, and for your example gives me:
CAT 13000
FB 11200
GM 4800
The problem with your code is that you are not iterating over the purchases, but just getting the first element from each ticker value. That is, byTicker looks something like:
byTicker = {
    "GM": [("GM",100,"10-sep-2001",48)],
    "CAT": [("CAT",100,"01-apr-1999",24), ("CAT", 200,"02-may-1999",53)],
    "FB": [("FB",200,"01-jul-2013",56)]
}
so when you iterate over the values, you get three lists. But when you process these lists, you only access the first element of each:
price = i[0][1]
For the value corresponding to "CAT", i[0] is ("CAT",100,"01-apr-1999",24). You should look at i[1] as well! Consider iterating over the different purchases:
for company, purchases in byTicker.items():
    investment = 0
    for purchase in purchases:
        investment += purchase[1] * purchase[3]
    print(company, investment)
Maybe something like this:
## Stock Reports
stockDict = {"GM":"General Motors", "CAT":"Caterpillar", "EK":"Eastman Kodak",
             "FB":"Facebook"}
# symbol,prices,dates,shares
purchases = [("GM",100,"10-sep-2001",48), ("CAT",100,"01-apr-1999",24),
             ("FB",200,"01-jul-2013",56), ("CAT", 200,"02-may-1999",53)]
# purchase history:
print "Company", "\t\tPrice", "\tDate\n"
for stock in purchases:
    price = stock[1] * stock[3]
    name = stockDict[stock[0]]
    print name, "\t\t", price, "\t", stock[2]
print "\n"
# THIS IS THE PROBLEM SET I NEED HELP WITH:
# accumulate total investment by ticker symbol
byTicker = {}
# create dict
for stock in purchases:
    ticker = stock[0]
    price = stock[1] * stock[3]
    if ticker in byTicker:
        byTicker[ticker] += price
    else:
        byTicker[ticker] = price
for ticker, price in byTicker.iteritems():
    print ticker, price
The output I get is:
Company Price Date
General Motors 4800 10-sep-2001
Caterpillar 2400 01-apr-1999
Facebook 11200 01-jul-2013
Caterpillar 10600 02-may-1999
GM 4800
FB 11200
CAT 13000
which appears to be correct.
Testing whether or not a ticker is in the byTicker dict tells you whether or not there's already been a purchase recorded for that stock. If there is, you just add to it, if not, you start fresh. This is basically what you were doing, except for some reason you were collecting all of the purchase records for a given stock in that dict, when all you really cared about was the price of the purchase.
You could build the dict the same way you were originally, and then iterate over the items stored under each key, and add them up. Something like this:
totals = []
for ticker in byTicker:
    total = 0
    for purchase in byTicker[ticker]:
        total += purchase[1] * purchase[3]
    totals.append((ticker, total))
for ticker, total in totals:
    print ticker, total
And just for kicks, you could compress it all into one line with generator expressions:
print "\n".join("%s: %d" % (ticker, sum(purchase[1]*purchase[3] for purchase in byTicker[ticker])) for ticker in byTicker)
Neither of these last two is really necessary, though: since you're already iterating through every purchase, you may as well accumulate the total price for each stock as you go, as I showed in the first example.
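The answers above use Python 2 print statements. For readers on Python 3, the same two-pass idea from the textbook exercise (first build a dict of blocks per ticker, then total each list) might be sketched with collections.defaultdict as follows. The names blocks_by_ticker and totals are illustrative.

```python
from collections import defaultdict

# symbol, price, date, shares (sample data copied from the question)
purchases = [("GM", 100, "10-sep-2001", 48), ("CAT", 100, "01-apr-1999", 24),
             ("FB", 200, "01-jul-2013", 56), ("CAT", 200, "02-may-1999", 53)]

# Pass 1: group every purchase block under its ticker symbol.
# defaultdict(list) creates the empty list the first time a ticker is seen.
blocks_by_ticker = defaultdict(list)
for purchase in purchases:
    blocks_by_ticker[purchase[0]].append(purchase)

# Pass 2: total investment per ticker = sum of price * shares over all its blocks.
totals = {
    ticker: sum(price * shares for _, price, _, shares in blocks)
    for ticker, blocks in blocks_by_ticker.items()
}

for ticker, total in totals.items():
    print(ticker, total)
```

Adding more CAT blocks to purchases needs no code changes, since every block for a ticker ends up in the same list.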
