I have a .csv file with some data that I would like to change.
It looks like this:
item_name,item_cost,item_priority,item_required,item_completed
item 1,11.21,2,r
item 2,411.21,3,r
item 3,40.0,1,r,c
My code does most of what I need, but I am unsure how to write back to my .csv to produce this result:
item_name,item_cost,item_priority,item_required,item_completed
item 1,11.21,2,x
item 2,411.21,3,r
item 3,40.0,1,r,c
My code:
print("Enter the item number:")
line_count = 0
marked_item = int(input())
with open("items.csv", 'r') as f:
    reader = csv.DictReader(f, delimiter=',')
    for line in reader:
        if line["item_required"] == 'r':
            line_count += 1
            if marked_item == line_count:
                new_list = line
                print(new_list)
                for key, value in new_list.items():
                    if value == "r":
                        new_list['item_required'] = "x"
                        print(new_list)
with open("items.csv", 'a') as f:
    writer = csv.writer(f)
    writer.writerow(new_list.values())
There are several problems here:
You're using a DictReader, which is good for reading data but less convenient when you need to write the data back in the same layout as the original file: on older Python versions dictionaries did not guarantee column order (and most of the time people don't want columns to be swapped). I just read the header row, find the index of the target column, and use that index in the rest of the code (no dicts = faster).
When you write, you append to the csv. You have to replace the old contents, not append. Also open the file with newline='' (Python 3) or in "wb" mode (Python 2), or you get blank lines between rows on Windows.
When you read, you need to store all the rows, not only the one you want to change, or you won't be able to write back all the data (since you're replacing the original file).
When you modify, your loop over the dictionary is more complex than it needs to be. I replaced it with a simple assignment at the given index (after all, you want to change r to x in a given row).
Here's the fixed code taking all aforementioned remarks into account
EDIT: added the feature you requested afterwards: add a c after the x if it is not already there, extending the row if needed.
import csv
line_count = 0
marked_item = int(input())
with open("items.csv", 'r') as f:
    reader = csv.reader(f, delimiter=',')
    title = next(reader)  # title
    idx = title.index("item_required")  # index of the column we target
    lines = []
    for line in reader:
        if line[idx] == 'r':
            line_count += 1
            if marked_item == line_count:
                line[idx] = 'x'
                # add 'c' after x (or replace if column exists)
                if len(line) > idx+1:  # check len
                    line[idx+1] = 'c'
                else:
                    line.append('c')
        lines.append(line)
with open("items.csv", 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(title)
    writer.writerows(lines)
Using pandas:
import pandas as pd
df = pd.read_csv("items.csv")
print("Enter the item number:")
marked_item = int(input())
# DataFrame.set_value was removed in pandas 1.0; .at sets a single cell by label
df.at[marked_item - 1, 'item_required'] = 'x'
# This is the extra feature you required:
df.at[marked_item - 1, 'item_completed'] = 'c'
df.to_csv("items.csv", index=False)
Result when marked_item = 1:
item_name,item_cost,item_priority,item_required,item_completed
item 1,11.21,2,x,c
item 2,411.21,3,r,
item 3,40.0,1,r,c
Note that RFC 4180 says each line should contain the same number of fields, so you should keep the trailing commas.
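The point about equal field counts can be sketched with the standard csv module. This is only an illustration, not part of the answer above: pad_rows is a hypothetical helper name, and it assumes the input is non-empty and that the header row defines the column count.

```python
import csv
import io

def pad_rows(csv_text):
    """Return csv_text with every row padded to the header's field count."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    width = len(rows[0])  # assumption: the first row is the header
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # short rows get empty trailing fields; full rows are unchanged
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()

sample = (
    "item_name,item_cost,item_priority,item_required,item_completed\n"
    "item 1,11.21,2,r\n"
    "item 3,40.0,1,r,c\n"
)
padded = pad_rows(sample)
print(padded)
```

Rows that are already complete pass through untouched, so running this on a well-formed file changes nothing.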
I guess this should do the trick:
Open the file in a mode that allows both reading and writing (use "r+" for that).
Instead of opening it a second time to write, write right there using csvfilewriter, which we create at the start.
file.py
import csv
fieldnames = ["item_name","item_cost","item_priority","item_required","item_completed"]
csvfile = open("items.csv", 'r+')
csvfilewriter = csv.DictWriter(csvfile, fieldnames=fieldnames,dialect='excel', delimiter=',')
csvfilewriter.writeheader()
print("Enter the item number:")
line_count = 0
marked_item = int(input())
with open("items.csv", 'r') as f:
    reader = csv.DictReader(f, delimiter=',')
    for line in reader:
        if line["item_required"] == 'r':
            line_count += 1
            if marked_item == line_count:
                new_list = line
                print(new_list)
                for key, value in new_list.items():
                    if value == "r":
                        new_list['item_required'] = "x"
                        print(new_list)
                csvfilewriter.writerow(new_list)
If you don't want to update the csv but want to write a new one, below is the code:
import csv
fieldnames = ["item_name","item_cost","item_priority","item_required","item_completed"]
print("Enter the item number:")
line_count = 0
marked_item = int(input())
# 'with' closes both files; newline='' avoids blank lines between rows on Windows
with open("items.csv", 'r', newline='') as f, open("items_new.csv", 'w', newline='') as csvfile:
    reader = csv.DictReader(f, delimiter=',')
    csvfilewriter = csv.DictWriter(csvfile, fieldnames=fieldnames, dialect='excel', delimiter=',')
    csvfilewriter.writeheader()
    for line in reader:
        if line["item_required"] == 'r':
            line_count += 1
            if marked_item == line_count:
                line['item_required'] = "x"
                print(line)
        # write every row, changed or not, so no data is lost
        csvfilewriter.writerow(line)
Related
I have the following text file, products.txt:
Product;Amount;Price
Apple;3;10.00
Banana;1;5.00
Lemon;2;3.00
Orange;4;20.00
Apple;4;8.00
I want to read this file and create a new text file, newfile.txt, which contains the value of each row (Amount x Price):
30.00
5.00
6.00
80.00
32.00
Finally, I want to find the total sum of newfile.txt (which is 30+5+6+80+32 = 153).
Note that the price of the same product can vary, and we are not interested in the total for each product.
I started by creating a class.
class DATA:
    product= ""
    amount= 0
    price= 0
def read (name):
    list = []
    file= open(name, 'r', encoding="UTF-8")
    file.readline()
    while (True):
        row= file.readline()
        if(row == ''):
            break
        columns= row[:-1].split(';')
        info= DATA()
        info.amount= int(columns[1])
        info.price= int(columns[2])
        info.total = info.amount * info.price
        list.append(info)
    file.close()
    return list
This should work:
def read(name):
    total = 0
    # 'with' closes both files even if a row fails to parse
    with open(name, 'r', encoding="UTF-8") as ori, open("newfile.txt", 'w', encoding="UTF-8") as dest:
        ori.readline()  # skip the header row
        for row in ori:
            row = row.rstrip("\n").split(';')
            res = int(row[1]) * float(row[2])
            total += res
            dest.write(f"{res:.2f}\n")  # two decimals, as in the expected output
    print(total)
read("products.txt")
A possibility would be to use csv from the standard library.
import csv
# fix files' paths
path1 = "products.txt"  # file to read
path2 = "newfile.txt"  # file to write
# read data and perform computations
rows_tot = []
with open(path1, 'r', newline='', encoding="utf-8") as fd:
    reader = csv.DictReader(fd, delimiter=";")
    for row in reader:
        rows_tot.append(float(row['Amount']) * float(row['Price']))
# total sum
print("Total sum:", int(sum(rows_tot)))
# save to file the new data
with open(path2, 'w', newline='') as fd:
    fieldnames = ("AmountXPrice",)
    writer = csv.DictWriter(fd, fieldnames=fieldnames)
    writer.writeheader()
    for value in rows_tot:
        writer.writerow({fieldnames[0]: f"{value:.2f}"})
Remark: the question does not make the types of the various columns clear; if needed, just swap int for float or the other way around.
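If the Price column holds money amounts, one option for the type question is decimal.Decimal, which keeps the two decimal places exact. The following is a minimal sketch under that assumption; row_totals is a hypothetical name, and the data is inlined as a string so the example is self-contained.

```python
import csv
import io
from decimal import Decimal

def row_totals(text):
    """Return Amount * Price for each row, using exact decimal arithmetic."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    # assumption: Amount is a whole number and Price is a decimal string
    return [int(row["Amount"]) * Decimal(row["Price"]) for row in reader]

data = (
    "Product;Amount;Price\n"
    "Apple;3;10.00\n"
    "Banana;1;5.00\n"
    "Lemon;2;3.00\n"
    "Orange;4;20.00\n"
    "Apple;4;8.00\n"
)
totals = row_totals(data)
lines = [f"{t:.2f}" for t in totals]  # the lines that would go into newfile.txt
grand_total = sum(totals)
print("\n".join(lines))
print("Total sum:", grand_total)
```

With float the results are the same for this data; Decimal only starts to matter for prices that binary floating point cannot represent exactly.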
First off, I do not have pandas framework and am unable to install it. I am hoping that I can solve this problem without pandas.
I am trying to clean my data using Python by removing rows that contain empty cells.
This is my code:
import csv
input_file = 'test.csv'
output_file = 'test1.csv'
cols_to_remove =[0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0
with open(input_file, "r") as source: #to run and delete column
    reader = csv.reader(source)
    with open(output_file, "w") as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count)) # Print rows processed
            for col_index in cols_to_remove:
                del row[col_index]
            writer.writerow(row)
            print(row)
I have tried code from other similar questions; however, it produces an empty file.
Assuming that the empty row is in fact one that looks like this
,,,,,,,,,,,,,, (and many more)
you can do the following:
import csv
input_file = 'test.csv'
output_file = 'test1.csv'
cols_to_remove =[0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0
with open(input_file, "r") as source: #to run and delete column
    reader = csv.reader(source)
    with open(output_file, "w") as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count)) # Print rows processed
            has_empty = False # new: set as soon as one empty cell is found
            for cell in row: # new
                if len(cell) == 0: # new
                    has_empty = True # new
                    break # new
            if has_empty: # new
                continue # new
            for col_index in cols_to_remove:
                del row[col_index]
            writer.writerow(row)
            print(row)
Try skipping the row if any cell is empty, like this:
if any(cel is None or cel == '' for cel in row):
    continue
Here's the code:
import csv
input_file = 'hey.txt'
output_file = 'test1.csv'
cols_to_remove = [0, 1, 9, 11, 14, 15, 23, 28, 29, 32, 33, 37, 38]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0
with open(input_file, "r+") as source: # to run and delete column
    reader = csv.reader(source)
    with open(output_file, "w+") as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count)) # Print rows processed
            if any(cel is None or cel == '' for cel in row):
                continue
            for col_index in cols_to_remove:
                try:
                    del row[col_index]
                except Exception:
                    pass
            writer.writerow(row)
            print(row)
The file names and cols_to_remove were changed for my test; please use your own.
The skipping step could also be moved after the column deletion, so that only the remaining columns are checked.
Checking the length of the row, as someone suggested, might not work, because empty rows in a csv may contain empty strings ''. In that case len(row) does not return 0, because the row is a list of empty strings.
To simply delete empty rows in a csv, try adding the following check:
skip_row = True
for item in row:
    if item is not None and item != '':
        skip_row = False
if not skip_row:
    # process row
The full code would be:
import csv
input_file = 'test.csv'
output_file = 'test1.csv'
cols_to_remove = [0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0
with open(input_file, "r") as source: #to run and delete column
    reader = csv.reader(source)
    with open(output_file, "w", newline='' ) as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count)) # Print rows processed
            skip_row = True
            for item in row:
                if item is not None and item != '':
                    skip_row = False
            if not skip_row:
                for col_index in cols_to_remove:
                    del row[col_index]
                writer.writerow(row)
                print(row)
Here we check if each element of a row is None type or an empty string and decide if we should skip that row or not.
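The answers in this thread use two different rules: skip a row when every cell is empty, or skip it when any cell is empty. A small sketch of the difference, using made-up sample text and the hypothetical helper names is_blank and has_empty_cell:

```python
import csv
import io

# made-up sample: a header, a row of empty cells, a blank line, a partly filled row
text = "a,b,c\n,,\n\n1,,3\n"
rows = list(csv.reader(io.StringIO(text)))

def is_blank(row):
    """True when the row has no cells or only empty strings."""
    return all(cell == "" for cell in row)

def has_empty_cell(row):
    """True when at least one cell is an empty string."""
    return any(cell == "" for cell in row)

# rule 1: drop only rows that carry no data at all
kept_without_blank = [r for r in rows if not is_blank(r)]
# rule 2: drop every row that is missing at least one value
kept_complete_only = [r for r in rows if r and not has_empty_cell(r)]

print(rows)
print(kept_without_blank)
print(kept_complete_only)
```

Note that csv.reader yields an empty list for a completely blank line, while a line of commas yields a list of empty strings, which is why a len(row) check alone misses the second case.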
With pandas (assuming data is a DataFrame), you can use this to delete all the rows that have a null value:
data = data.dropna()
and this one to delete a column:
data = data.drop(['column'], axis=1)
Consider the following CSV:
date,description,amount
14/02/2020,march contract,-99.00
15/02/2020,april contract,340.00
16/02/2020,march contract,150.00
17/02/2020,april contract,-100.00
What I'd like to do is:
Iterate through all of the rows
Total the amounts of lines which have the same description
Return the last line which has that newly-calculated amount
Applied to the above example, the CSV would look like this:
16/02/2020,march contract,51.00
17/02/2020,april contract,240.00
So far, I've tried nesting csv.reader()s inside of each other and I'm not getting the result I am wanting.
I'd like to achieve this without any libraries and/or modules.
Here is the code I have so far, where first_row is each row in the CSV and second_row is the iteration of looking for matching descriptions:
csv_reader = csv.reader(report_file)
for first_row in csv_reader:
    description_index = 5
    amount_index = 13
    print(first_row)
    for second_row in csv_reader:
        if second_row is not first_row:
            print(first_row[description_index] == second_row[description_index])
            if first_row[description_index] == second_row[description_index]:
                first_row[amount_index] = float(first_row[amount_index]) + float(second_row[amount_index])
This will work:
import csv
uniques = {}  # dictionary to store key/value pairs
with open(report_file, newline='') as f:
    reader = csv.reader(f, delimiter=',')
    next(reader, None)  # skip header row
    for data in reader:
        date = data[0]
        description = data[1]
        if description in uniques:
            cumulative_total = uniques[description][0]
            uniques[description] = [cumulative_total+float(data[2]), date]
        else:
            uniques[description] = [float(data[2]), date]
# print output as date, description, total (the column order of the input)
for desc, val in uniques.items():
    print(f'{val[1]},{desc},{val[0]}')
I know that you've asked for a solution without pandas, but you'll save yourself a lot of time if you use it:
import pandas as pd
df = pd.read_csv(report_file)
# sum only the amount column; the date strings should not be summed
totals = df.groupby('description')['amount'].sum()
print(totals)
I suggest using pandas; it will be more efficient.
If you still want to do it your way, this will help:
import csv
with open('mycsv.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    value_dict = {}
    line_no = 0
    for row in csv_reader:
        if line_no == 0:
            line_no += 1
            continue
        cur_date = row[0]
        cur_mon = row[1]
        cur_val = float(row[2])
        if row[1] not in value_dict.keys():
            value_dict[cur_mon] = [cur_date, cur_val]
        else:
            old_date, old_val = value_dict[cur_mon]
            value_dict[cur_mon] = [cur_date, (old_val + cur_val)]
        line_no += 1
for key, val_list in value_dict.items():
    print(f"{val_list[0]},{key},{val_list[1]}")
Output:
16/02/2020,march contract,51.0
17/02/2020,april contract,240.0
Working with a dictionary makes it easy to access the values:
import csv
from datetime import datetime
_dict = {}
with open("test.csv", "r") as f:
    reader = csv.reader(f, delimiter=",")
    for i, line in enumerate(reader):
        if i==0:
            headings = [line]
        else:
            if _dict.get(line[1],None) is None:
                _dict[line[1]] = {
                    'date':line[0],
                    'amount':float(line[2])
                }
            else:
                if datetime.strptime(_dict.get(line[1]).get('date'),'%d/%m/%Y') < datetime.strptime(line[0],'%d/%m/%Y'):
                    _dict[line[1]]['date'] = line[0]
                _dict[line[1]]['amount'] = _dict[line[1]]['amount'] + float(line[2])
Here your _dict will contain each unique description with its latest date and total amount:
>>> print(_dict)
{'march contract': {'date': '16/02/2020', 'amount': 51.0},
'april contract': {'date': '17/02/2020', 'amount': 240.0}}
Convert to a list and add the headings:
headings.extend([[value['date'],key,value['amount']] for key,value in _dict.items()])
>>>print(headings)
[['date', 'description', 'amount'],['16/02/2020', 'march contract', 51.0], ['17/02/2020', 'april contract', 240.0]]
Save the list to a csv file:
with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(headings)
You can also use itertools.groupby and sum() for this if you don't mind outputting in sorted form.
from datetime import datetime
from itertools import groupby
import csv
with open(report_file, 'r') as f:
    reader = csv.reader(f)
    lst = list(reader)[1:]
    sorted_input = sorted(lst, key=lambda x : (x[1], datetime.strptime(x[0],'%d/%m/%Y'))) #sort by description and date
    groups = groupby(sorted_input, key=lambda x : x[1])
    for k,g in groups:
        rows = list(g)
        total = sum(float(row[2]) for row in rows)
        print(f'{rows[-1][0]},{k},{total}') #print last date, description, total
Output:
17/02/2020,april contract,240.0
16/02/2020,march contract,51.0
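For comparison, here is a sketch that keeps the descriptions in first-seen order and formats the totals with two decimals, as in the output the question asks for. It is not taken from any of the answers: summarize is a hypothetical name, and the csv text is inlined so the example is self-contained.

```python
import csv
import io

def summarize(text):
    """Sum amounts per description, keeping the date of the last matching row."""
    latest = {}  # description -> [date of last row seen, running total]
    for row in csv.DictReader(io.StringIO(text)):
        entry = latest.setdefault(row["description"], [row["date"], 0.0])
        entry[0] = row["date"]  # assumption: rows are already in date order
        entry[1] += float(row["amount"])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for description, (date, total) in latest.items():
        writer.writerow([date, description, f"{total:.2f}"])
    return out.getvalue()

report = (
    "date,description,amount\n"
    "14/02/2020,march contract,-99.00\n"
    "15/02/2020,april contract,340.00\n"
    "16/02/2020,march contract,150.00\n"
    "17/02/2020,april contract,-100.00\n"
)
result = summarize(report)
print(result)
```

Because dictionaries preserve insertion order on Python 3.7 and later, no sorting is needed to get a stable output order.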
You may think of this one as another redundant question asked, but I tried to go through all similar questions asked, no luck so far. In my specific use-case, I can't use pandas or any other similar library for this operation.
This is what my input looks like
AttributeName,Value
Name,John
Gender,M
PlaceofBirth,Texas
Name,Alexa
Gender,F
SurName,Garden
This is my expected output
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,
So far, I have tried to store my input into a dictionary and then tried writing it to a csv string. But, it is failing as I am not sure how to incorporate missing column values conditions. Here is my code so far
reader = csv.reader(csvstring.split('\n'), delimiter=',')
csvdata = {}
csvfile = ''
for row in reader:
    if row[0] != '' and row[0] in csvdata and row[1] != '':
        csvdata[row[0]].append(row[1])
    elif row[0] != '' and row[0] in csvdata and row[1] == '':
        csvdata[row[0]].append(' ')
    elif row[0] != '' and row[1] != '':
        csvdata[row[0]] = [row[1]]
    elif row[0] != '' and row[1] == '':
        csvdata[row[0]] = [' ']
for key, value in csvdata.items():
    if value == ' ':
        csvdata[key] = []
csvfile += ','.join(csvdata.keys()) + '\n'
for row in zip(*csvdata.values()):
    csvfile += ','.join(row) + '\n'
For the above code as well, I took some help here. Thanks in advance for any suggestions/advice.
Edit #1 : Update code to imply that I am doing processing on a csv string instead of a csv file.
What you need is something like this:
import csv
with open("in.csv") as infile:
    buffer = []
    item = {}
    lines = csv.reader(infile)
    for line in lines:
        if line[0] == 'Name':
            # a Name row starts a new record, so store the previous one
            buffer.append(item.copy())
            item = {'Name':line[1]}
        else:
            item[line[0]] = line[1]
    buffer.append(item.copy())
# buffer[0] only holds the header row, so skip it
for item in buffer[1:]:
    print(item)
If none of the attributes is mandatory, I think @framontb's solution needs to be rearranged so that it also works when the Name field is not given.
This is an import-free solution, and it's not super elegant.
I assume you already have the lines in this form, with these columns:
lines = [
    "Name,John",
    "Gender,M",
    "PlaceofBirth,Texas",
    "Gender,F",
    "Name,Alexa",
    "Surname,Garden" # modified typo here: SurName -> Surname
]
cols = ["Name", "Gender", "Surname", "PlaceofBirth"]
We need to distinguish one record from another, and without mandatory fields the best I can do is start considering a new record when an attribute has already been seen.
To do this, I use a temporary list of attributes tempcols from which I remove elements until an error is raised, i.e. new record.
Code:
csvdata = {k:[] for k in cols}
tempcols = list(cols)
for line in lines:
    attr, value = line.split(",")
    try:
        csvdata[attr].append(value)
        tempcols.remove(attr)
    except ValueError:
        for c in tempcols: # now tempcols has only "missing" attributes
            csvdata[c].append("")
        tempcols = [c for c in cols if c != attr]
for c in tempcols:
    csvdata[c].append("")
# write csv string with the code you provided
csvfile = ""
csvfile += ",".join(csvdata.keys()) + "\n"
for row in zip(*csvdata.values()):
    csvfile += ",".join(row) + "\n"
>>> print(csvfile)
Name,PlaceofBirth,Surname,Gender
John,Texas,,M
Alexa,,Garden,F
While, if you want to sort columns according to your desired output:
csvfile = ""
csvfile += ",".join(cols) + "\n"
for row in zip(*[csvdata[k] for k in cols]):
    csvfile += ",".join(row) + "\n"
>>> print(csvfile)
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,
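If the csv module is available, the missing-column handling can be left to csv.DictWriter, whose restval argument fills in fields a record does not have. A minimal sketch, assuming a repeated attribute marks the start of a new record and that every attribute name appears in cols; pivot is a hypothetical name, and the input uses the corrected Surname spelling from above.

```python
import csv
import io

def pivot(text, cols):
    """Turn attribute/value lines into one csv row per record."""
    records, current = [], {}
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the "AttributeName,Value" header
    for attr, value in reader:
        if attr in current:  # assumption: a repeated attribute starts a new record
            records.append(current)
            current = {}
        current[attr] = value
    if current:
        records.append(current)
    out = io.StringIO()
    # restval="" writes an empty field for any column a record lacks
    writer = csv.DictWriter(out, fieldnames=cols, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

csvstring = (
    "AttributeName,Value\n"
    "Name,John\nGender,M\nPlaceofBirth,Texas\n"
    "Name,Alexa\nGender,F\nSurname,Garden\n"
)
cols = ["Name", "Gender", "Surname", "PlaceofBirth"]
result = pivot(csvstring, cols)
print(result)
```

An attribute that is not listed in cols makes DictWriter raise ValueError by default, which is a useful check against typos such as SurName.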
This works for me:
import csv
with open("in.csv") as infile, open("out.csv", "w", newline="") as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv) # Skip 1st row
    outcsv.writerows(zip(*incsv))
Update: For input and output as strings:
import csv, io
with io.StringIO(indata) as infile, io.StringIO() as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv) # Skip 1st row
    outcsv.writerows(zip(*incsv))
    print(outfile.getvalue())
I am very new to Python programming and decided on a small project to learn the language.
Basically I am trying to:
Read the first cell of a CSV file.
Ask if that cell value is "liked".
If liked, write to the column next to the cell on 1., "1".
Else, write "0".
Repeat on next row until end of list.
My code right now:
import csv
reader = csv.reader(open("mylist.csv"), delimiter=',')
data = []
for row in reader:
    data.append(row)
ask = (data[0][0])
ans = input("Do you like {}? ".format(ask))
if ans == ("y"):
    f = open('mylist.csv', 'r')
    reader = csv.reader(f)
    data = list(reader)
    f.close()
    data[0][1] = '1'
    my_new_list = open('mylist.csv', 'w', newline='')
    csv_writer = csv.writer(my_new_list)
    csv_writer.writerows(data)
    my_new_list.close()
else:
    f = open('mylist.csv', 'r')
    reader = csv.reader(f)
    data = list(reader)
    f.close()
    data[0][1] = '0'
    my_new_list = open('mylist.csv', 'w', newline='')
    csv_writer = csv.writer(my_new_list)
    csv_writer.writerows(data)
    my_new_list.close()
So basically, I am stuck trying to get the content of the next row.
FYI, I am looking to implement machine learning to this process.
First learning how to do this in a basic manner.
Any help is welcome.
Thank you!
You shouldn't read from and write to the same file/list/dict at the same time. If you do, references to the data may change. You can start with something like this for your task. However, note that as the file grows your code becomes slower.
import csv
with open("test.csv", 'r', newline='') as f:
    rows = list(csv.reader(f, delimiter=','))  # read everything before overwriting
content = []
for row in rows:
    item = row[0]
    ans = input("Do you like {}? ".format(item))  # raw_input() on Python 2
    if ans == 'y':
        content.append([item, 1])
    else:
        content.append([item, 0])
with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(content)
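One common way to avoid reading and writing the same file at once is to write to a temporary file and swap it in when done. The sketch below assumes that approach; rewrite_csv and transform are hypothetical names, and the per-row decision (here a fixed "1") stands in for the interactive question.

```python
import csv
import os
import tempfile

def rewrite_csv(path, transform):
    """Write transform(row) for every row to a temp file, then replace path."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False, suffix=".tmp"
    ) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(transform(row))
    # both files are closed here; the swap only happens after a full pass
    os.replace(dst.name, path)

# usage sketch with a throwaway file
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "mylist.csv")
with open(demo_path, "w", newline="") as f:
    f.write("apples\npears\n")
rewrite_csv(demo_path, lambda row: [row[0], "1"])
with open(demo_path, newline="") as f:
    demo_rows = list(csv.reader(f))
print(demo_rows)
```

If the loop fails partway through, the original file is left untouched, although the temporary file would then need to be cleaned up by hand.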
In my last project with csv I opened the file like this:
import csv
data = []
with open(name) as csvfile:  # name: path of your csv file
    reader = csv.DictReader(csvfile)
    for row in reader:
        data.append(row)
If you want the resulting csv file to contain all of the data from the input file, with the question results added in, you could use something like this.
It will insert your answer (0 or 1) after the first item in each record.
import csv
with open("mylist.csv", 'r', newline='') as f:
    data = list(csv.reader(f, delimiter=','))
for row in data:
    ans = input("Do you like {}? ".format(row[0]))  # raw_input() on Python 2
    if ans == 'y':
        row.insert(1, "1")
    else:
        row.insert(1, "0")
with open('myresult.csv', 'w', newline='') as f:
    csv.writer(f).writerows(data)