python csv file add to field based off another field - python

I have a CSV file that looks like this:
It has a column called “Inventory”. I pulled the data in that column from another source, which stored it in a dictionary format, as you can see.
What I need to do is iterate through the 1000+ lines and, if the keywords comforter, sheets and pillow all exist in a row, write “bedding” to the “Location” column for that row; otherwise write “home-fashions”.
I have got as far as the if statement telling me whether a row goes into bedding or “home-fashions”, but I do not know how to write the corresponding result to the “Location” field for that line.
In my script I'm only printing to see my results, but in the end I want to write to the same CSV file.
from csv import DictReader
with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for line in csv_dict_reader:
        if 'comforter' in line['Inventory'] and 'sheets' in line['Inventory'] and 'pillow' in line['Inventory']:
            print('Bedding')
            print(line['Inventory'])
        else:
            print('home-fashions')
            print(line['Inventory'])

The last column of your csv contains commas. You cannot read it using DictReader.
import re
data = []
with open('test.csv', 'r') as f:
    # Get the header row
    header = next(f).strip().split(',')
    for line in f:
        # Parse 4 columns
        row = re.findall('([^,]*),([^,]*),([^,]*),(.*)', line)[0]
        # Create a dictionary of one row
        item = {header[0]: row[0], header[1]: row[1], header[2]: row[2],
                header[3]: row[3]}
        # Add each row to the list
        data.append(item)
After preparing your data, you can check with your conditions.
for item in data:
    if all([x in item['Inventory'] for x in ['comforter', 'sheets', 'pillow']]):
        item['Location'] = 'Bedding'
    else:
        item['Location'] = 'home-fashions'
Write output to a file.
import csv
# newline='' stops the csv module from writing a blank line after each row on Windows
with open('output.csv', 'w', newline='') as f:
    dict_writer = csv.DictWriter(f, data[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(data)
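As a possible alternative to the regular expression, the same four-way split can be sketched with str.split and a maxsplit argument, so that any commas after the third one stay inside the last field. The helper name parse_line, the header names other than Location and Inventory, and the sample row are made up for illustration.

```python
def parse_line(header, line):
    """Split a line into len(header) fields; extra commas stay in the last field."""
    values = line.rstrip('\r\n').split(',', len(header) - 1)
    return dict(zip(header, values))


# Hypothetical header and row; only 'Location' and 'Inventory' come from the question
header = ['Id', 'Name', 'Location', 'Inventory']
row = parse_line(header, "1,Set A,,{'comforter': 2, 'sheets': 1, 'pillow': 4}\n")
print(row['Inventory'])
```

This only works when the column containing commas is the last one, which is the same assumption the regular expression makes.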

csv.DictReader yields each row as a dict, so just assign the new value to the column:
if 'comforter' in line['Inventory'] and ...:
    line['Location'] = 'Bedding'
else:
    line['Location'] = 'home-fashions'
print(line['Inventory'])
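To get the updated rows back into a CSV, one option is to pair the reader with a csv.DictWriter. The following is only a sketch: it assumes the Inventory field is quoted in the file so that DictReader sees it as a single column, and the function name tag_locations and the sample data are invented for the example.

```python
import csv
import io

KEYWORDS = ('comforter', 'sheets', 'pillow')


def tag_locations(infile, outfile):
    """Copy CSV rows from infile to outfile, filling in the 'Location' column."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for line in reader:
        if all(word in line['Inventory'] for word in KEYWORDS):
            line['Location'] = 'Bedding'
        else:
            line['Location'] = 'home-fashions'
        writer.writerow(line)


# Made-up sample data standing in for test.csv (Inventory is quoted here)
sample = io.StringIO(
    'Name,Location,Inventory\n'
    'Set A,,"{\'comforter\': 2, \'sheets\': 1, \'pillow\': 4}"\n'
    'Set B,,"{\'rug\': 1}"\n'
)
result = io.StringIO()
tag_locations(sample, result)
print(result.getvalue())
```

With real files, both would be opened with newline=''. Because the input is still being read while the output is written, it is probably safer to write to a second file and then swap it in (for example with os.replace) than to open the same path for reading and writing at once.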

Related

Python If Statement and Lists

I'm fairly new to Python and am looking for some help. What I would like to do is read a CSV file, then use a for loop with an if statement to locate all rows of that data that contain a value, and print them out with a header and some formatting using f-strings.
The issue I seem to have is with finding the data using the if statement: I'm unsure what I can output the data to so that it can then be printed out (the search output could contain multiple rows and columns):
with open(r'data.csv', 'r') as csv_file:
    # loop through the csv file using for loop
    for row in csv_file:
        # search each row of data for the input from the user
        if panel_number in row:
            ??
Use the csv module. Then in your if statement you can append the row to a list of matches:
import csv
matched_rows = []
with open(r'data.csv', 'r') as file:
    file.readline()  # skip over header line -- remove this if there's no header
    csv_file = csv.reader(file)
    for row in csv_file:
        # search each row of data for the input from the user
        if row[0] == panel_number:
            matched_rows.append(row)
print(matched_rows)
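Once the matches are in a list, printing them with a header and f-string formatting could look roughly like this. The column names, the sample data, the helper names find_rows and format_table, and the fixed column width are all assumptions made for the sketch, not details from the question.

```python
import csv
import io


def find_rows(file, panel_number):
    """Return (header, rows whose first column equals panel_number)."""
    reader = csv.reader(file)
    header = next(reader)
    matches = [row for row in reader if row and row[0] == panel_number]
    return header, matches


def format_table(header, rows, width=12):
    """Format the header and rows as left-aligned, fixed-width columns."""
    lines = [''.join(f'{cell:<{width}}' for cell in header)]
    lines += [''.join(f'{cell:<{width}}' for cell in row) for row in rows]
    return '\n'.join(lines)


# Made-up data standing in for data.csv
sample = io.StringIO('panel,voltage,status\nP1,230,ok\nP2,110,fault\nP1,231,ok\n')
header, matched_rows = find_rows(sample, 'P1')
print(format_table(header, matched_rows))
```

Keeping the header row instead of skipping it means the same formatting can be applied to the header and to every matched row.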

Removing the end of line character from a read csv file

I tried several times to use strip() but I can't get it to work.
I removed that piece from this snippet, but every time I tried it I either got
an error or it did nothing. The sort is fine; I just want to strip the newline before writing to the new file.
import sys, csv, operator
data = csv.reader(open('tickets.csv'),delimiter=',')
sortedlist = sorted(data, key=operator.itemgetter(6))
# 0 specifies according to first column we want to sort
#now write the sort result into new CSV file
with open("newfiles.csv", "w") as f:
    #writablefile = csv.writer(f)
    fileWriter = csv.writer(f, delimiter=',')
    for row in sortedlist:
        #print(row)
        lst = (row)
        fileWriter.writerow(lst)
You need to add newline='' to your open() when writing a CSV file. This is explained in the documentation. Without it, your file can end up having a blank line per row.
import sys, csv, operator
data = csv.reader(open('tickets.csv'),delimiter=',')
header = next(data)
# itemgetter(6) sorts on the seventh column (index 6); use 0 for the first column
sortedlist = sorted(data, key=operator.itemgetter(6))
#now write the sort result into a new CSV file
with open("newfiles.csv", "w", newline="") as f:
    fileWriter = csv.writer(f)
    fileWriter.writerow(header)  # keep the header at the top
    fileWriter.writerows(sortedlist)
You also need to read the header row before loading the remaining rows for sorting. This keeps the header out of the sort, so it can be written separately at the top of the sorted output CSV.
If your tickets.csv file contains blank lines, you would also need to remove these. For example:
for row in sortedlist:
    if row:
        fileWriter.writerow(row)
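One caveat: csv.reader yields an empty list for a blank line, and operator.itemgetter(6) raises IndexError on an empty list, so it may be safer to drop blank rows before sorting instead of while writing. A minimal sketch, using a hypothetical helper sort_rows and made-up two-column data (so it sorts on column 1, where the question uses column 6):

```python
import csv
import io
import operator


def sort_rows(file, column):
    """Return (header, data rows sorted by column index), skipping blank lines."""
    reader = csv.reader(file)
    header = next(reader)
    rows = [row for row in reader if row]  # csv.reader yields [] for a blank line
    return header, sorted(rows, key=operator.itemgetter(column))


# Made-up data with a blank line in the middle
sample = io.StringIO('id,owner\n2,zoe\n\n1,adam\n')
header, sortedlist = sort_rows(sample, 1)
print(header)
print(sortedlist)
```

Note that the values are compared as strings, so a numeric column would need a key that converts to int or float first.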

getting average of some digits from a csv file as input and Write the averages in an output csv file in python 3

I am learning Python 3 :) and I am trying to read a CSV file with different rows,
take the average of the scores for each person (one person per row),
and write it to a CSV file as output.
The input file looks like this:
David,5,2,3,1,6
Adele,3,4,1,5,2,4,2,1
...
The output file should look like this:
David,3.4
Adele,2.75
...
It seems that I am reading the file correctly, as the average
for each name is printed in the terminal. However, the CSV
output file contains only the average for the last name
in the input file, while I want it to contain all names and
their corresponding averages.
Can anybody help me with it?
import csv
from statistics import mean
these_grades = []
name_list = []
reader = csv.reader(open('input.csv', newline=''))
for row in reader:
    name = row[0]
    name_list.append(name)
    with open('result.csv', 'w', newline='\n') as f:
        writer = csv.writer(f,
                            delimiter=',',
                            quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        for grade in row[1:]:
            these_grades.append(int(grade))
        for item in name_list:
            writer.writerow([''.join(item), mean(these_grades)])
    print('%s,%f' % (name , mean(these_grades)))
There are several issues in your code:
You're not using a context manager (with) when you read the input file. There's no reason to use it when writing but not when reading - you consequently don't close the "input.csv" file
You're using a list to store data from rows. This doesn't easily distinguish between the person's name and the scores associated with the person. It would be better to use a dictionary in which the key is the person's name, and the values stored against that key are the individual scores
You repeatedly open the file within a for loop in 'w' mode. Every time you open a file in write mode, it just wipes all the previous contents. You actually do write each row to the file, but you just wipe it again when you open the file on the next iteration.
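The effect described in the last point can be reproduced in isolation. This sketch writes to a temporary file (the path and the two names are only for the demonstration) and compares reopening the file in 'w' mode on every iteration with opening it once around the loop.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'result.csv')

# Reopening with 'w' on every iteration truncates the file each time
for name in ['David', 'Adele']:
    with open(path, 'w') as f:
        f.write(name + '\n')
with open(path) as f:
    reopened = f.read()

# Opening once and writing inside the loop keeps every line
with open(path, 'w') as f:
    for name in ['David', 'Adele']:
        f.write(name + '\n')
with open(path) as f:
    opened_once = f.read()

print(repr(reopened))
print(repr(opened_once))
```

After the first loop only the last name is left in the file; after the second, both names are present.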
You can use:
import csv
import statistics
# use a context manager to read the data in too, not just for writing
with open('input.csv', newline='') as infile:
    reader = csv.reader(infile)
    data = list(reader)
# Create a dictionary to store the scores against the name
scores = {}
for row in data:
    scores[row[0]] = row[1:]  # First item in the row is the key (name) and the rest is values
with open('output.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    # Now we need to iterate the dictionary and average the score on each iteration
    # (a separate loop variable avoids rebinding the scores dictionary itself)
    for name, values in scores.items():
        ave_score = statistics.mean([int(item) for item in values])
        writer.writerow([name, ave_score])
This can be further consolidated, but it's less easy to see what's happening:
import csv
import statistics
with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        name = row[0]
        values = row[1:]
        ave_score = statistics.mean(map(int, values))
        writer.writerow([name, ave_score])

Create new CSV that excludes rows from old CSV

I need guidance on code to write a CSV file that drops rows with specific numbers in the first column [0]. My script writes a file, but it contains the rows that I am working to delete. I suspect that I may have an issue with the spreadsheet being read as one long string rather than ~150 rows.
import csv
Property_ID_To_Delete = {4472738, 4905985, 4905998, 4678278, 4919702, 4472936, 2874431, 4949190, 4949189, 4472759, 4905977, 4905995, 4472934, 4905982, 4906002, 4472933, 4905985, 4472779, 4472767, 4472927, 4472782, 4472768, 4472750, 4472769, 4472752, 4472748, 4472751, 4905989, 4472929, 4472930, 4472753, 4933246, 4472754, 4472772, 4472739, 4472761, 4472778}
with open('2015v1.csv', 'rt') as infile:
    with open('2015v1_edit.csv', 'wt') as outfile:
        writer = csv.writer(outfile)
        for row in csv.reader(infile):
            if row[0] != Property_ID_To_Delete:
                writer.writerow(row)
Here is the data:
https://docs.google.com/spreadsheets/d/19zEMRcir_Impfw3CuexDhj8PBcKPDP46URZ9OA3uV9w/edit?usp=sharing
You need to check whether the id in the first column, converted to an integer
because your set contains integers, is contained in the set of ids to delete,
and write the row only if it is not. Instead, your code compares the id in the
first column with the whole set of ids to be deleted. A string is never
equal to a set:
>>> '1' != {1}
True
Therefore, you get all rows in your output.
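The difference between comparing against the set and testing membership can be seen with a single row. The set below holds two of the ids from the question; the second value in the row is made up.

```python
ids_to_delete = {4472738, 4905985}  # two of the ids from the question

row = ['4472738', 'some value']  # csv.reader yields strings; 'some value' is made up

not_equal_to_set = row[0] != ids_to_delete      # a str compared with a whole set
str_in_set = row[0] in ids_to_delete            # the str '4472738' is not the int 4472738
int_in_set = int(row[0]) in ids_to_delete       # membership test after conversion

print(not_equal_to_set, str_in_set, int_in_set)
```

Only the last expression identifies the row as one to drop, which is why both the int() conversion and the membership test are needed.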
Change:
if row[0] != Property_ID_To_Delete:
into:
if int(row[0]) not in Property_ID_To_Delete:
EDIT
You need to write the header of your infile first, before trying to convert the first column entry into an integer:
with open('2015v1.csv', 'rt', newline='') as infile:
    with open('2015v1_edit.csv', 'wt', newline='') as outfile:
        writer = csv.writer(outfile)
        reader = csv.reader(infile)
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            if int(row[0]) not in Property_ID_To_Delete:
                writer.writerow(row)

parsing text file with JSON-like object into CSV

I have a text file containing key-value pairs, with the last two key-value pairs containing JSON-like objects that I would like to split out into columns and write with the other values, using the keys as column headings. The first three rows of the data file input.txt look like this:
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::44.6743867864386,Length3dCenterToCenter::44.6768028159989,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::57.8689351603823,Length3dCenterToCenter::57.8700464193429,Tag::<NULL>,{StartPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.43363070193163}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::68.7161350545728,Length3dCenterToCenter::68.7172034962765,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.45819643838485}
We eventually came up with something that worked, but there must be a much better way:
import csv
with open('input.txt', 'rb') as fin, open('output.csv', 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        mysplit = [item.split('::') for item in line if item.strip()]
        if not mysplit:  # blank line
            continue
        keys, vals = zip(*mysplit)
        start_vals = [item.split('[%2C]') for item in mysplit[-2]]
        end_vals = [item.split('[%2C]') for item in mysplit[-1]]
        a=list(keys[0:-2])
        a.extend(['start1','start2','start3','end1','end2','end3'])
        b=list(vals[0:-2])
        b.append(start_vals[1][0])
        b.append(start_vals[1][1])
        b.append(start_vals[1][2][:-1])
        b.append(end_vals[1][0])
        b.append(end_vals[1][1])
        b.append(end_vals[1][2][:-1])
        if i == 0:
            # if first line: write header
            writer.writerow(a)
        writer.writerow(b)
which produces the output file output.csv that looks like this
InnerDiameterOrWidth,InnerHeight,Length2dCenterToCenter,Length3dCenterToCenter,Tag,start1,start2,start3,end1,end2,end3
0.1,0.1,44.6743867864386,44.6768028159989,<NULL>,7858.35924983374,1703.69341358077,-3.075,7822.85045874375,1730.80294308742,-3.53962362760298
0.1,0.1,57.8689351603823,57.8700464193429,<NULL>,7793.52927597915,1680.91224357457,-3.075,7822.85045874375,1730.80294308742,-3.43363070193163
0.1,0.1,68.7161350545728,68.7172034962765,<NULL>,7858.35924983374,1703.69341358077,-3.075,7793.52927597915,1680.91224357457,-3.45819643838485
We don't want to write code like this in the future.
What is the best way to read data like this?
I'd use:
from itertools import chain
import csv
_header_translate = {
    'StartPoint': ('start1', 'start2', 'start3'),
    'EndPoint': ('end1', 'end2', 'end3')
}
def header(col):
    # Strip the braces and keep the key before the first '::'
    name = col.strip('{}').split('::', 1)[0]
    return _header_translate.get(name, (name,))
def cleancolumn(col):
    # Strip the braces, keep the value after the first '::', split on the embedded 'comma'
    col = col.strip('{}').split('::', 1)[1]
    return col.split('[%2C]')
def chainedmap(func, row):
    return list(chain.from_iterable(map(func, row)))
# Python 3 file modes; on Python 2 use 'rb' and 'wb' as in the question
with open('input.txt', newline='') as fin, open('output.csv', 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, row in enumerate(reader):
        if not i:  # first row, write header first
            writer.writerow(chainedmap(header, row))
        writer.writerow(chainedmap(cleancolumn, row))
The cleancolumn function takes any of your columns and returns a list (possibly with only one value) after removing the braces, removing everything up to the first :: and splitting on the embedded 'comma'. By using itertools.chain.from_iterable() we turn the series of lists generated from the columns into one flat list again for the csv writer.
When handling the first line we generate one header row from the same columns, replacing the StartPoint and EndPoint headers with the 6 expanded headers.
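To see what the helpers produce without touching any files, they can be run on a single line held in memory. The sample line below is a shortened, made-up row in the same shape as the input, and the uppercase constant name is just a renaming for this sketch.

```python
import csv
import io
from itertools import chain

HEADER_TRANSLATE = {
    'StartPoint': ('start1', 'start2', 'start3'),
    'EndPoint': ('end1', 'end2', 'end3'),
}


def header(col):
    """Return the output header name(s) for one input column."""
    name = col.strip('{}').split('::', 1)[0]
    return HEADER_TRANSLATE.get(name, (name,))


def cleancolumn(col):
    """Return the value(s) of one input column as a list of strings."""
    return col.strip('{}').split('::', 1)[1].split('[%2C]')


def chainedmap(func, row):
    """Apply func to each column and flatten the results into one list."""
    return list(chain.from_iterable(map(func, row)))


# Shortened, made-up line in the same format as input.txt
sample = io.StringIO(
    'InnerHeight::0.1,Tag::<NULL>,'
    '{StartPoint::1.0[%2C]2.0[%2C]-3.075},{EndPoint::4.0[%2C]5.0[%2C]-3.5}\n'
)
row = next(csv.reader(sample))
headers = chainedmap(header, row)
values = chainedmap(cleancolumn, row)
print(headers)
print(values)
```

Each point column expands to three headers and three values, while the plain key-value columns contribute one of each, so the header row and the data row stay aligned.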
