Summing a column in a .csv file using python


I'm trying to sum a column in a CSV file using Python. Here's a sample of the CSV data:
Date,Profit/Losses
Jan-2010,867884
Feb-2010,984655
Mar-2010,322013
Apr-2010,-69417
May-2010,310503
Jun-2010,522857
Jul-2010,1033096
Aug-2010,604885
Sep-2010,-216386
I want to sum the Profit/Losses column.
I am using the following code, but it returns 0. Where could I be going wrong?
import os
import csv
# Path to collect data from the csv file in the Resources folder
pybank_csv = os.path.join("resources", "budget_data.csv")
with open(pybank_csv, 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    next(csvfile, None)
    t = sum(float(row[1]) for row in csvreader)
#print the results
print(f"Total: {t}")

The easiest way is to use the pandas library.
Run pip install pandas to install pandas on your machine,
and then:
import pandas as pd
df = pd.read_csv('your_filename.csv')
sumcol = df['Profit/Losses'].sum()
print(sumcol)
The sum is now in sumcol. For future reference, if your task is to work with data from a CSV file, pandas is a blessing: it provides a wide range of operations you can perform on your data. See the pandas website for more info.
If you want to use only the csv module, you can read each row as a dict with csv.DictReader and sum the Profit/Losses entry of each row:
import csv

total = 0
with open('your_filename.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    for row in data:
        total = total + int(row['Profit/Losses'])
print(total)
Or, if you want to use csv.reader instead of csv.DictReader, you need to skip the first (header) row, for example with next():
import csv

total = 0
with open('your_filename.csv', newline='') as csvfile:
    data = csv.reader(csvfile)
    next(data, None)  # skip the header row
    for row in data:
        total = total + int(row[1])
print(total)
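To check either approach without the real file, here is a minimal stdlib sketch that runs the DictReader version on the sample rows from the question. The helper name sum_column is hypothetical, and io.StringIO stands in for budget_data.csv.

```python
import csv
import io

# Sample rows copied from the question; io.StringIO stands in for the real file
SAMPLE = """Date,Profit/Losses
Jan-2010,867884
Feb-2010,984655
Mar-2010,322013
Apr-2010,-69417
May-2010,310503
Jun-2010,522857
Jul-2010,1033096
Aug-2010,604885
Sep-2010,-216386
"""


def sum_column(csvfile, column):
    """Hypothetical helper: sum one named column as integers."""
    reader = csv.DictReader(csvfile)  # the header row supplies the dict keys
    return sum(int(row[column]) for row in reader)


total = sum_column(io.StringIO(SAMPLE), 'Profit/Losses')
print(total)  # 4360090
```

For the real file, pass open(pybank_csv, newline='') instead of the StringIO object.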

Related

How to edit a CSV file row by row in Python without using Pandas

I have a CSV file, and when I read it with the csv module I get this output:
['exam', 'id_student', 'grade']
['maths', '573834', '7']
['biology', '573834', '8']
['biology', '578833', '4']
['english', '581775', '7']
# goes on...
I need to edit it by creating a 4th column called 'Passed' with two possible values: True or False depending on whether the grade of the row is >= 7 (True) or not (False), and then count how many times each student passed an exam.
If it's not possible to edit the CSV file that way, I would need to just read the CSV file and then create a dictionary of lists with the following output:
dict = {'id_student':[573834, 578833, 581775], 'passed_count': [2,0,1]}
# goes on...
Thanks

Try importing the CSV as a pandas DataFrame:
import pandas as pd
data = pd.read_csv('data.csv')
Then add the column (the comparison already produces booleans):
data['Passed'] = data['grade'] >= 7
Then save the DataFrame to CSV:
data.to_csv('final.csv', index=False)

It is totally possible to "edit" CSV.
Assuming you have a file students.csv with the following content:
exam,id_student,grade
maths,573834,7
biology,573834,8
biology,578833,4
english,581775,7
Iterate over input rows, augment the field list of each row with an additional item, and save it back to another CSV:
import csv
with open('students.csv', 'r', newline='') as source, open('result.csv', 'w', newline='') as result:
    csvreader = csv.reader(source)
    csvwriter = csv.writer(result)
    # Deal with the header
    header = next(csvreader)
    header.append('Passed')
    csvwriter.writerow(header)
    # Process data rows
    for row in csvreader:
        row.append(str(int(row[2]) >= 7))
        csvwriter.writerow(row)
Now result.csv has the content you need.
If you need to replace the original content, use os.remove() and os.rename() to do that:
import os
os.remove('students.csv')
os.rename('result.csv', 'students.csv')
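As an alternative to the remove-then-rename pair, os.replace() (Python 3.3+) does the same in one call and overwrites the destination if it exists; on POSIX systems the rename is atomic when both paths are on the same filesystem. A minimal sketch that runs in a temporary directory; the file contents are made up for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, 'students.csv')
    result = os.path.join(tmp, 'result.csv')
    # Made-up stand-ins for the original file and the processed copy
    with open(original, 'w', newline='') as f:
        f.write('exam,id_student,grade\n')
    with open(result, 'w', newline='') as f:
        f.write('exam,id_student,grade,Passed\n')
    # Overwrite students.csv with result.csv in a single call
    os.replace(result, original)
    with open(original, newline='') as f:
        header = f.readline().strip()
    result_exists = os.path.exists(result)

print(header)         # exam,id_student,grade,Passed
print(result_exists)  # False
```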
Counting is an independent step, so you don't need to modify the CSV for it:
import csv
from collections import defaultdict
with open('students.csv', 'r', newline='') as source:
    csvreader = csv.reader(source)
    next(csvreader)  # Skip header
    stats = defaultdict(int)
    for row in csvreader:
        if int(row[2]) >= 7:
            stats[row[1]] += 1
print(stats)
You can fold the counting into the code above to have both pieces in one place. The defaultdict (stats) has the same interface as a dict if you need to access it.
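A minimal sketch of doing both in one pass, which also builds the dictionary of lists asked for in the question. Unlike stats above, it keeps students with zero passes (578833 would never get a key there, since only rows with grade >= 7 are counted). The function name add_passed_and_count is hypothetical, and io.StringIO stands in for students.csv and result.csv.

```python
import csv
import io


def add_passed_and_count(source, result, threshold=7):
    """Hypothetical helper: copy rows to result with an extra 'Passed'
    column, and count passes per student along the way."""
    reader = csv.reader(source)
    writer = csv.writer(result)
    header = next(reader)
    writer.writerow(header + ['Passed'])
    counts = {}  # dicts keep insertion order (3.7+), so students stay in file order
    for row in reader:
        passed = int(row[2]) >= threshold
        writer.writerow(row + [str(passed)])
        student = int(row[1])
        counts[student] = counts.get(student, 0) + int(passed)
    return {'id_student': list(counts), 'passed_count': list(counts.values())}


source = io.StringIO(
    "exam,id_student,grade\n"
    "maths,573834,7\n"
    "biology,573834,8\n"
    "biology,578833,4\n"
    "english,581775,7\n"
)
result = io.StringIO()
summary = add_passed_and_count(source, result)
print(summary)  # {'id_student': [573834, 578833, 581775], 'passed_count': [2, 0, 1]}
```

With real files, open students.csv for reading and result.csv for writing (both with newline='') and pass the file objects instead.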

CSV file Reading rows in Python

I'm a beginner in Python. I'm trying to read a CSV and write some of the results to another file:
import csv
with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row[0])
I get the error IndexError: list index out of range. It happens when I select a row which doesn't exist. However, my CSV has 5 columns and I can't isolate any of them.

Use the Python pandas library to read the file.
Make sure you use the correct encoding for the CSV file (read_csv accepts an encoding argument).
import pandas as pd
data = pd.read_csv("file_name.csv")
data.head()  # shows the first 5 rows (wrap in print() outside a notebook)
# for 1 row
data.head(1)
Check this, and you'll get the answer to your question.
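If pandas isn't an option, note that csv.reader yields an empty list for a blank line, and row[0] on an empty list raises exactly this IndexError. Whether the file in the question contains blank lines is an assumption; here is a minimal sketch that skips them, using a made-up in-memory sample.

```python
import csv
import io

# Made-up sample with a blank line in the middle
sample = io.StringIO("name,age\nAlice,30\n\nBob,25\n")

first_column = []
for row in csv.reader(sample):
    if not row:  # blank line -> [], where row[0] would raise IndexError
        continue
    first_column.append(row[0])

print(first_column)  # ['name', 'Alice', 'Bob']
```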

How to get specific columns in a certain range from a csv file without using pandas

For some reason the pandas module does not work, and I have to find another way to read a (large) csv file and output specific columns within a certain range (e.g. the first 1000 lines). I have code that reads the entire csv file, but I haven't found a way to display just specific columns.
Any help is much appreciated!
import csv
fileObj = open('apartment-data-all-4-xaver.2018.csv')
csvReader = csv.reader( fileObj )
for row in csvReader:
    print(row)
fileObj.close()

I created a small csv file with the following contents:
first,second,third
11,12,13
21,22,23
31,32,33
41,42,43
You can use the following helper function, which uses namedtuple from the collections module and generates objects that let you access your columns like attributes:
import csv
from collections import namedtuple
def get_first_n_lines(file_name, n):
    with open(file_name) as file_obj:
        csv_reader = csv.reader(file_obj)
        header = next(csv_reader)
        Tuple = namedtuple('Tuple', header)
        for i, row in enumerate(csv_reader, start=1):
            yield Tuple(*row)
            if i >= n: break
If you want to print the first and third columns for n=3 lines, call the function like this (Python 3.6+):
for line in get_first_n_lines(file_name='csv_file.csv', n=3):
    print(f'{line.first}, {line.third}')
Or like this (Python 3.0 - 3.5):
for line in get_first_n_lines(file_name='csv_file.csv', n=3):
    print('{}, {}'.format(line.first, line.third))
Outputs:
11, 13
21, 23
31, 33

Use csv.DictReader and then select specific rows and columns. itertools.islice stops after the first 1000 rows without loading the whole file into memory, and it also works if the file has fewer rows:
import csv
from itertools import islice

colnames = ['col1', 'col2']
with open('names.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in islice(reader, 1000):
        print(row[colnames[0]], row[colnames[1]])

How to make a python list from a .csv list?

I want to know how to turn a .csv file into a Python list that I can use for plotting and calculations:
I used:
fpath = r'C:112017\temp\tT.csv'
with open(fpath,'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        print(list(reader))
it gives me lists like this (which I don't want):
[['2014-12-30', '18.34244791666665'], ['2014-12-31', '18.540224913494818'], ['2015-01-01', '18.15729166666666'],......
If I use
print(row)
it gives me lists like this (looks better but I still can not use it for calculating):
...
['2016-07-27', '20.434809022479584']
['2016-07-28', '21.395138886239796']
['2016-07-29', '20.81571181284057']
['2016-07-30', '20.565711801250778']
...
How can I use pandas to make a list? Or is there an easier way to achieve this? Is it possible to use something like:
date = row[0]
temp = row[1]
lis = pd.DataFrame(date,temp)
I guess there are some basic mistakes, but I can't fix it by myself.
Thank you for your time to help.

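The pandas.read_csv() function reads the CSV file and returns a DataFrame. Passing names also tells it that the file has no header row.
For example:
import pandas as pd
fpath = r'C:112017\temp\tT.csv'
df = pd.read_csv(fpath, delimiter=',', names=['date', 'temp'])
df['temp'].tolist() then gives you a plain Python list.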

I guess you are trying to get the columns as lists, i.e. a list of dates and a list of temperatures.
import csv

fpath = r'C:112017\temp\tT.csv'
with open(fpath, 'r', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    data = list(reader)
date, temp = list(map(list, zip(*data)))
# for Python 2 use map(list, zip(*data))
# convert temp to float
temp = list(map(float, temp))
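To also turn the date strings into real date objects for plotting, datetime.date.fromisoformat() (Python 3.7+) parses the 'YYYY-MM-DD' strings shown in the question. A minimal sketch on three rows copied from the question, with io.StringIO standing in for tT.csv:

```python
import csv
import io
from datetime import date

# Three rows copied from the question; StringIO replaces the real file
sample = io.StringIO(
    "2014-12-30,18.34244791666665\n"
    "2014-12-31,18.540224913494818\n"
    "2015-01-01,18.15729166666666\n"
)

dates, temps = [], []
for day, value in csv.reader(sample):
    dates.append(date.fromisoformat(day))  # '2014-12-30' -> date(2014, 12, 30)
    temps.append(float(value))

mean_temp = sum(temps) / len(temps)
print(dates[0], len(temps))  # 2014-12-30 3
```

The resulting dates and temps lists should be usable directly for calculations or with a plotting library such as matplotlib.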

In my opinion, if you want to perform calculations with data from a .csv file, you should consider using pandas and NumPy.
import pandas as pd
import numpy as np
# importing dataframe
df = pd.read_csv('filename.csv', delimiter=',')
# check the dataframe
print(df)

Reading a csv file by column

I have code to read a CSV file row by row:
import csv
with open('example.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row)
        print(row[0])
But I want only selected columns. What is the technique? Could anyone give me a script?

import csv
with open('example.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    column_one = [row[0] for row in readCSV]
This gives you a list of values from the first column. That said, you'll still have to read the entire file.
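To keep several selected columns instead of one, operator.itemgetter can pull them out of each row. A minimal sketch; the column positions 0 and 2 and the sample data are made up for illustration.

```python
import csv
import io
from operator import itemgetter

# Made-up sample standing in for example.csv
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
pick = itemgetter(0, 2)  # positions of the columns to keep

selected = [pick(row) for row in csv.reader(sample)]
print(selected)  # [('a', 'c'), ('1', '3'), ('4', '6')]
```

With several indices, itemgetter returns a tuple per row, so zip(*selected) would give one tuple per column if you need them separated.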

You can't do that, because files are written byte-by-byte to your filesystem. To know where one line ends, you have to read the whole line to find the line-break character. There's no way around this in a CSV.
So you'll have to read all the file -- but you can choose which parts of each row you want to keep.

I would definitely use pandas for that.
However, in plain Python this is one way to do it.
In this example I am extracting the content of row 3, column 4 (both counted from zero).
import csv
target_row = 3
target_col = 4
data = None
with open('yourfile.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for n, row in enumerate(reader):  # n counts rows from 0
        if n == target_row:
            data = row[target_col]  # row is already a list of fields
            break
print(data)

read_csv in the pandas module can load a subset of columns.
Assume you only want to load the columns at positions 1 and 3 (counted from zero) of your .csv file.
import pandas as pd
usecols = [1, 3]
df = pd.read_csv('example.csv', usecols=usecols, sep=',')
See the pandas documentation for read_csv.
In addition, if your file is big, you can read it piece by piece by passing chunksize to read_csv, which then returns an iterator of DataFrames.
