Calculate value difference between two different CSV files python

I have two different CSV files:
outputnovember.csv
symbol,name,amount
A,john,2
D,mary,6
E,bob,9
m,liz,-8
p,peter,-2
A total 2,Positive total 17,Negative total -10
outputdecember.csv
symbol,name,amount
A,john,2
D,mary,26
m,liz,-1
p,peter,-2
A total 2,Positive total 26,Negative total -3
How do I calculate the difference between the calculated values of the two files, so that the following is appended to outdecember: A total 0, Positive total 9, Negative total -17
Here's my code so far:
import csv
f = open('outputnovember.csv')
csv_f = csv.reader(f)
with open('input.csv', 'r') as f_input, open('outdecember.csv', 'w') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    # Copy the header row through unchanged.
    header = next(csv_input)
    csv_output.writerow(header)
    sum_positive = sum_negative = sum_a = 0
    for cols in csv_input:
        csv_output.writerow(cols)
        value = int(cols[2])
        if cols[0] == 'A':
            sum_a += value
        if value >= 0:
            sum_positive += value
        else:
            sum_negative += value
    csv_output.writerow(["A total {}".format(sum_a)])
    csv_output.writerow(["Positive total {}".format(sum_positive)])
    csv_output.writerow(["Negative total {}".format(sum_negative)])
... here is where I'm stuck: how do I retrieve the values from outputnovember.csv and find the difference from outputdecember.csv?
Thanks all
B
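
One possible approach, sketched under assumptions: the totals are recomputed from the data rows of each file instead of being parsed back out of the footer text, any row whose third column is not an integer (such as an existing totals row) is skipped, and the helper names `compute_totals` and `diff_totals` are made up for illustration. The sample data from the question is inlined so the sketch runs on its own.

```python
import csv
import io

def compute_totals(lines):
    """Return (sum_a, sum_positive, sum_negative) for an iterable of CSV lines."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    sum_a = sum_positive = sum_negative = 0
    for cols in reader:
        try:
            value = int(cols[2])
        except (IndexError, ValueError):
            # Not a data row (for example an existing totals footer): skip it.
            continue
        if cols[0] == 'A':
            sum_a += value
        if value >= 0:
            sum_positive += value
        else:
            sum_negative += value
    return sum_a, sum_positive, sum_negative

def diff_totals(current, previous):
    """Element-wise difference: current minus previous."""
    return tuple(c - p for c, p in zip(current, previous))

NOVEMBER = (
    "symbol,name,amount\nA,john,2\nD,mary,6\nE,bob,9\nm,liz,-8\np,peter,-2\n"
    "A total 2,Positive total 17,Negative total -10\n"
)
DECEMBER = (
    "symbol,name,amount\nA,john,2\nD,mary,26\nm,liz,-1\np,peter,-2\n"
    "A total 2,Positive total 26,Negative total -3\n"
)

nov_totals = compute_totals(io.StringIO(NOVEMBER))
dec_totals = compute_totals(io.StringIO(DECEMBER))
diff_a, diff_pos, diff_neg = diff_totals(dec_totals, nov_totals)
print("A total {},Positive total {},Negative total {}".format(diff_a, diff_pos, diff_neg))
```

With real files, `compute_totals` can be given an open file object (for example one opened with `open('outputnovember.csv', newline='')`), and the difference row can be appended by opening the December file in mode `'a'` and calling `csv.writer(...).writerow(...)`. Note that the December data rows in the question sum to a positive total of 28, not the 26 shown in its footer, so the recomputed differences for this sample are 0, 11 and 7.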

Related

Sum of rows from CSV

I have the following code:
with open("expenses.csv") as read_exp:
    reader = csv.reader(read_exp, delimiter=',')
    header = next(reader)
    if header is not None:
        for row in reader:
            month_str = row[0]
            month_dt = datetime.strptime(month_str, '%d/%m/%Y').month
            if month_dt == month1:
                sum1 = sum(map(int, row[2:7]))
                print(sum1)
This gives me the sum of each individual row that is from the month I am looking for.
Output:
Enter selected month number: 7
Selected Month is: July
15
26
7
23
21
19
30
Is there a way to combine the individual sums into one total sum?
My csv is as below:
Date,Budget,Groceries,Transport,Food,Bills,Others
12/7/2021,30,1,0,4,2,8
13/7/2021,30,9,3,5,7,2
14/7/2021,30,3,3,0,0,1
15/7/2021,30,1,0,10,7,5
16/7/2021,30,9,9,0,2,1
17/7/2021,30,0,6,4,1,8
18/7/2021,30,0,9,9,8,4
16/8/2021,30,7,10,7,10,1
17/8/2021,30,5,6,10,9,1
18/8/2021,30,6,1,9,10,5
19/8/2021,30,0,8,8,3,5
20/8/2021,30,4,0,6,9,4
21/8/2021,30,6,2,1,1,5
22/8/2021,30,3,3,1,1,10
13/9/2021,30,8,2,9,4,6
14/9/2021,30,10,7,10,5,7
15/9/2021,30,5,5,6,9,6
16/9/2021,30,5,7,4,6,2
17/9/2021,30,3,7,10,5,7
18/9/2021,30,8,9,6,8,1
19/9/2021,30,5,3,1,9,5

I assume you want to print the total for the whole month in your example, correct?
If so, you could keep a running variable, total_sum for example, initialise it to 0 before the loop, and add each sum1 to it (I'm assuming sum1 is a number), like this:
total_sum = 0
with open("expenses.csv") as read_exp:
    reader = csv.reader(read_exp, delimiter=',')
    header = next(reader)
    if header is not None:
        for row in reader:
            month_str = row[0]
            month_dt = datetime.strptime(month_str, '%d/%m/%Y').month
            if month_dt == month1:
                sum1 = sum(map(int, row[2:7]))
                print(sum1)
                total_sum += sum1  # accumulate the per-row sums
print(total_sum)
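
As a self-contained sketch of the same idea, the per-row sums can also be fed to a single `sum()` call through a generator expression. The function name `month_total` and the shortened sample data are assumptions made for illustration; columns 2 to 6 are the expense categories, as in the question.

```python
import csv
import io
from datetime import datetime

def month_total(lines, month):
    """Sum the expense columns (Groceries..Others) of every row in the given month."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return sum(
        sum(map(int, row[2:7]))
        for row in reader
        if row and datetime.strptime(row[0], '%d/%m/%Y').month == month
    )

SAMPLE = (
    "Date,Budget,Groceries,Transport,Food,Bills,Others\n"
    "12/7/2021,30,1,0,4,2,8\n"
    "13/7/2021,30,9,3,5,7,2\n"
    "16/8/2021,30,7,10,7,10,1\n"
)

july_total = month_total(io.StringIO(SAMPLE), 7)
print(july_total)
```

For the two July rows in this shortened sample the row sums are 15 and 26, so the combined total is 41.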

merge some rows in two conditions

I want to merge rows under a condition: if a row is less than 20 characters long, combine it with the previous row. I have two columns, and I want to apply the condition to the second column; when a row is merged, it should be removed from both columns.
I already got help here with merging rows, but that was for a single column. Now my requirements are different: I have two columns and want to apply the operation to the second column, so that any row with less than 20 characters is merged with the previous row and then removed from both columns.
This is the old code that merges and removes rows when there is a single column. Thank you for your help.
I tried this code, but it doesn't give me the result I want.
import csv
import pandas as pd
df = pd.read_csv('Test.csv')
with open('Output.csv', mode='w', newline='', encoding='utf-16') as f:
    writer = csv.writer(f, delimiter=' ')
    rows = []
    for i, data in enumerate(df['Sentence']):
        if i + 1 == len(df['Sentence']):
            writer.writerow([data])
        elif len(df['Sentence'][i + 1]) < 20:
            writer.writerow([data + df['Sentence'][i + 1]])
            df.drop(df.index[[i + 1]])
        elif len(df['Sentence'][i + 1]) >= 20:
            writer.writerow([data])

I solved this by setting the merged row to an empty string and then removing it from the CSV:
import csv
import pandas as pd
df = pd.read_csv('test.csv', encoding='utf-8')
with open('output.csv', mode='w', newline='', encoding='utf-16') as f:
    writer = csv.writer(f, delimiter=' ')
    rows = []
    for i, data in enumerate(df['Sentence']):
        if i + 1 == len(df['Sentence']):
            writer.writerow([data])
        elif len(df['Sentence'][i + 1]) < 19:
            writer.writerow([data + df['Sentence'][i + 1]])
            # Blank out the row that was just merged into the current one.
            df['Sentence'][i + 1] = ''
        elif len(df['Sentence'][i + 1]) >= 19:
            writer.writerow([data])
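
A minimal pure-Python sketch of the two-column requirement as I read it, not the poster's pandas approach: walk the rows once, and whenever the second column is shorter than the threshold, append its text to the previous kept row and drop the short row entirely, so it disappears from both columns. The function name `merge_short_rows`, the joining space, and the sample rows are assumptions.

```python
def merge_short_rows(rows, min_len=20):
    """Merge rows whose second column is shorter than min_len into the previous row."""
    merged = []
    for first, second in rows:
        if merged and len(second) < min_len:
            # Fold the short text into the previous row; the short row itself is dropped.
            prev_first, prev_second = merged[-1]
            merged[-1] = (prev_first, prev_second + ' ' + second)
        else:
            # Long enough, or there is no previous row to merge into.
            merged.append((first, second))
    return merged

rows = [
    ('1', 'This sentence is clearly long enough.'),
    ('2', 'too short'),
    ('3', 'Another sentence that passes the limit.'),
]
result = merge_short_rows(rows)
for row in result:
    print(row)
```

Building a new list avoids modifying the data while iterating over it. The merged rows can then be written out with `csv.writer(...).writerows(result)`.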

openpyxl start writing from particular column/cell

I have the following code:
ws = wb.worksheets[1]
print(ws)
with open('out.txt', 'r+') as data:
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        print(row)
        ws.append(row)
wb.save('test.xlsx')
By default the data is written to the xlsx file starting from A1.
Is there a more convenient way to start appending data from, let's say, C2?
Or is xxx.cell(row=xx, column=yy).value = zz the only option?
i = 2
j = 3
with open('out.txt', 'r+') as data:
    reader = list(csv.reader(data, delimiter='\t'))
    for row in reader:
        for element in row:
            ws.cell(row=i, column=j).value = element
            j += 1
        j = 3
        i += 1

Just pad the rows with Nones:
ws.append([])  # move to row 2
for row in reader:
    row = [None] * 2 + row  # two empty cells, so the data starts in column C
    ws.append(row)

File data binding with column names

I have files with hundreds or thousands of rows of data, but they have no column names.
I am trying to go through every file, read it row by row into a list, and then assign the values to columns. I am confused about how to do this, because each row has around 60 values, plus some extra columns with assigned values that should be added to every row.
Code so far:
import re
import glob
filenames = glob.glob("/home/ashfaque/Desktop/filetocsvsample/inputfiles/*.txt")
columns = []
with open("/home/ashfaque/Downloads/coulmn names.txt", encoding="ISO-8859-1") as f:
    file_data = f.read()
    lines = file_data.splitlines()
    for l in lines:
        columns.append(l.rstrip())
total = {}
for name in filenames:
    modified_data = []
    with open(name, encoding="ISO-8859-1") as f:
        file_data = f.read()
        lines = file_data.splitlines()
        for l in lines:
            if len(l) >= 1:
                modified_data.append(re.split(': |,', l))
    rows = []
    i = len(modified_data)
    x = 0
    while i > 60:
        r = lines[x:x+59]
        x = x + 60
        i = i - 60
        rows.append(r)
    z = len(modified_data)
    while z >= 60:
        z = z - 60
    if z > 1:
        last_columns = modified_data[-z:]
        x = []
        for l in last_columns:
            if len(l) > 1:
                del l[0]
                x.append(l)
            elif len(l) == 1:
                x.append(l)
        for row in rows:
            for vl in x:
                row.append(vl)
    for r in rows:
        for i in range(0, len(r)):
            if len(r) >= 60:
                total.setdefault(columns[i], []).append(r[i])
In another script I have separated the rows of 60 values from the last 5 to 15 columns that should be added to each row, but I am still confused about how to bind all the data together.
The data should look like this after binding:
outputdata.xlsx
Data input file:
inputdata.txt
What am I missing here? Is there any tool for this?

I believe that your issue can be resolved by taking the input file and turning it into a CSV file which you can then import into whatever program you like.
I wrote a small generator that would read a file a line at a time and return a row after a certain number of lines, in this case 60. In that generator, you can make whatever modifications to the data as you need.
Then with each generated row, I write it directly to the csv. This should keep the memory requirements for this process pretty low.
I didn't understand what you were doing with the regex split, but it would be simple enough to add it to the generator.
import csv
OUTPUT_FILE = "/home/ashfaque/Desktop/File handling/outputfile.csv"
INPUT_FILE = "/home/ashfaque/Desktop/File handling/inputfile.txt"
# This is a generator that will pull only num number of items into
# memory at a time, before it yields the row.
def get_rows(path, num):
    row = []
    with open(path, "r", encoding="ISO-8859-1") as f:
        for n, l in enumerate(f):
            # apply whatever transformations that you need to here.
            row.append(l.rstrip())
            if (n + 1) % num == 0:
                # if rows need padding then do it here.
                yield row
                row = []
# newline="" lets the csv module control the line endings.
with open(OUTPUT_FILE, "w", newline="") as output:
    csv_writer = csv.writer(output)
    for r in get_rows(INPUT_FILE, 60):
        csv_writer.writerow(r)
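
One thing to be aware of with a modulo-based generator like this: any lines left over after the last full group of `num` are never yielded. If those trailing lines matter, a variant based on `itertools.islice` can emit the final short group as well. This is only a sketch; the name `chunked_lines` is hypothetical, and whether a partial group should be written or discarded depends on the data.

```python
from itertools import islice

def chunked_lines(lines, num):
    """Yield lists of up to num stripped lines; the last list may be shorter."""
    iterator = iter(lines)
    while True:
        # islice takes at most num items from wherever the iterator currently is.
        chunk = [line.rstrip() for line in islice(iterator, num)]
        if not chunk:
            return
        yield chunk

groups = list(chunked_lines(["a\n", "b\n", "c\n", "d\n", "e\n"], 2))
print(groups)
```

Because an open file is itself an iterator over lines, `chunked_lines(f, 60)` can be used on a file object in the same way as on the list above.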

TypeError: 'float' object is not iterable 3

import csv
csvfile = open(r"C:\Users\Administrator\Downloads\canberra_2011_2012.csv")
header = csvfile.readline()
csv_f = csv.reader(csvfile)
for row in csv_f:
    first_value = float(row[5])
total = sum(first_value)
length = len(first_value)
average = total/length
print("average = ",average)
When I run this code, it says
TypeError: 'float' object is not iterable
But when I change the first_value line to
first_value = [float(row[5]) for row in csv_f]
then it works. This confuses me, can anyone help me?

The other answer is much more elegant than mine, but the following is closer to the spirit of your original code. It may make your errors more obvious. I apologize for the crappy formatting. I'm new to this site.
import csv
csvfile = open(r"C:\Users\Administrator\Downloads\canberra_2011_2012.csv")
header = csvfile.readline()
csv_f = csv.reader(csvfile)
length = 0
total = 0.0
for row in csv_f:
    first_value = float(row[5])
    total = total + first_value
    length += 1
if length > 0:
    average = total/length
    print("average = ",average)

I think you want to collect all the first_values and then do some calculations. To do that, you must step through each row of the csv file and first collect all the values, otherwise you are summing one value and that's the source of your error.
Try this version:
import csv
with open(r"C:\Users\Administrator\Downloads\canberra_2011_2012.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row, as the original code does with readline()
    values = [float(line[5]) for line in reader]
# Now you can do your calculations:
total = sum(values)
length = len(values)
# etc.

You are getting this error at this line of your code:
total = sum(first_value)
The error is raised because sum expects an iterable, and in your code first_value is a single float object, so you cannot use sum on it. But when you use a list comprehension,
first_value = [float(row[5]) for row in csv_f]
first_value is a list containing the float value of row[5] from every row, so you can apply sum to it without raising an error.
Apart from a list comprehension, you can also append the values to a list inside your for loop and calculate the sum and length after the loop:
first_values = []
for row in csv_f:
    first_value = float(row[5])
    first_values.append(first_value)
total = sum(first_values)
length = len(first_values)
average = total/length
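
The difference can be reproduced without any CSV file. In this small sketch the list `raw` stands in for the column values and is made up for illustration.

```python
raw = ["1.5", "2.5", "4.0"]

# Passing a single float to sum() fails because a float cannot be iterated.
try:
    sum(float(raw[0]))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A list of floats is iterable, so sum() and len() both work on it.
values = [float(item) for item in raw]
average = sum(values) / len(values)

print(error_message)
print("average = ", average)
```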
