Let's say I have a dictionary:
dict = {'R150': 'PN000123', 'R331': 'PN000873', 'C774': 'PN000064', 'L7896': 'PN000447', 'R0640': 'PN000878', 'R454': 'PN000333'}.
I need to fill in this sample csv file: https://www.dropbox.com/s/c95mlitjrvyppef/sheet.csv
example rows
HEADER,ID,ReferenceID,Value,Location X-Coordinate,Location Y-Coordinate,ROOM,ALT_SYMBOLS,Voltage,Thermal_Rating,Tolerance,PartNumber,MPN,Description,Part_Type,PCB Footprint,SPLIT_INST,SWAP_INFO,GROUP,Comments,Wattage,Tol,Population Notes,Gender,ICA_MFR_NAME,ICA_PARTNUM,Order#,CLASS,INSTALLED,TN,RATING,OriginalSymbolOrigin,Rated_Current,Manufacturer 2,Status,Need To Mirror/Rotate Pin Display Properties,TOLERANCE,LEVEL
,,R150,1,,,,,<null>,<null>,<null>,,,to be linked,Resistor,TODO,<null>,<null>,<null>,<null>,1/16W,?,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>
,,R4737,1,,,,,<null>,<null>,<null>,,,to be linked,Resistor,TODO,<null>,<null>,<null>,<null>,1/16W,?,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>
,,R4738,1,,,,,<null>,<null>,<null>,,,to be linked,Resistor,TODO,<null>,<null>,<null>,<null>,1/16W,?,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>
Specifically, I need to fill in the PartNumber column based on the keys of the dict I created. So I need to iterate through column ReferenceID and compare that value to my keys in dict. If there is a match I need to fill in the corresponding PartNumber cell with that value (in the dict)....
I'm sorry if this is all confusing I am new to python and am having trouble with the csv module.
To get you started, here's something that uses the csv.DictReader object. It loops through your file one row at a time and, where the row's ReferenceID exists as a key in your_dict, sets PartNumber to the corresponding value, otherwise to an empty string.
If you use this in conjunction with the docs at http://docs.python.org/2/library/stdtypes.html#typesmapping and http://docs.python.org/2/library/csv.html you should be able to write out the data and better understand what's happening.
import csv

your_dict = {'R150': 'PN000123', 'R331': 'PN000873', 'C774': 'PN000064', 'L7896': 'PN000447', 'R0640': 'PN000878', 'R454': 'PN000333'}

with open('your_csv_file.csv') as fin:
    csvin = csv.DictReader(fin)
    for row in csvin:
        row['PartNumber'] = your_dict.get(row['ReferenceID'], '')
        print(row)
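To sketch the write-out half that those docs cover, here is a minimal Python 3 example using csv.DictWriter; the io.StringIO objects are stand-ins for your real input and output files, and the sample rows are made up:

```python
import csv
import io

your_dict = {'R150': 'PN000123', 'R331': 'PN000873'}

# Stand-ins for the real files, so the sketch is self-contained.
fin = io.StringIO("ReferenceID,PartNumber\nR150,\nR999,\n")
fout = io.StringIO()

csvin = csv.DictReader(fin)
csvout = csv.DictWriter(fout, fieldnames=csvin.fieldnames)
csvout.writeheader()
for row in csvin:
    # Same lookup as above: fall back to an empty string on no match.
    row['PartNumber'] = your_dict.get(row['ReferenceID'], '')
    csvout.writerow(row)

print(fout.getvalue())
```

With a real file you would open the output with open('out.csv', 'w', newline='') instead of io.StringIO.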
I'm new to the Python language and I'm facing a small challenge which I haven't been able to figure out so far.
I receive a csv file with around 30-40 columns and 5-50 rows, with various details in each cell. The 1st row of the csv has the title for each column, and from the 2nd row on I have item values.
What I want to do is create a Python script which will read the csv file and every time do the following:
Add a row after the actual 1st item row (literally after the 2nd row of the file, because the 1st row is titles), and have that new 3rd row contain the same information as the one above it, with one difference only: in the column "item_subtotal" I want to add the value from the column "discount_total".
All the rows below should remain as they are. Then save this modified csv as a new file with the word "edited" added to the file name.
I could really use some help because so far I've only managed to open the csv file with the Python script I'm developing, but I haven't been able to copy the contents of the above row into a newly created row and replace that specific value.
Looking forward to any help.
Thank you.
Here I'm attaching the CSV with some values changed for privacy reasons.
order_id,order_number,date,status,shipping_total,shipping_tax_total,fee_total,fee_tax_total,tax_total,discount_total,order_total,refunded_total,order_currency,payment_method,shipping_method,customer_id,billing_first_name,billing_last_name,billing_company,billing_email,billing_phone,billing_address_1,billing_address_2,billing_postcode,billing_city,billing_state,billing_country,shipping_first_name,shipping_last_name,shipping_address_1,shipping_address_2,shipping_postcode,shipping_city,shipping_state,shipping_country,shipping_company,customer_note,item_id,item_product_id,item_name,item_sku,item_quantity,item_subtotal,item_subtotal_tax,item_total,item_total_tax,item_refunded,item_refunded_qty,item_meta,shipping_items,fee_items,tax_items,coupon_items,order_notes,download_permissions_granted,admin_custom_order_field:customer_type_5
15001_TEST_2,,"2017-10-09 18:53:12",processing,0,0.00,0.00,0.00,5.36,7.06,33.60,0.00,EUR,PayoneCw_PayPal,"0,00",0,name,surname,,name.surname#gmail.com,0123456789,"address 1",,41541_TEST,location,,DE,name,surname,address,01245212,14521,location,,DE,,,1328,302,"product title",103,1,35.29,6.71,28.24,5.36,0.00,0,,"id:1329|method_id:free_shipping:3|method_title:0,00|total:0.00",,id:1330|rate_id:1|code:DE-MWST-1|title:MwSt|total:5.36|compound:,"id:1331|code:#getgreengent|amount:7.06|description:Launchcoupon for friends","text string",1,
You can also use pandas to manipulate the data from the csv like this:
import pandas
Read the csv file into a pandas dataframe:
df = pandas.read_csv(filename)
Make a copy of the first data row (index 0) and add the discount total to the item subtotal (the column in the file is named discount_total):
new_row = df.loc[[0]].copy()
new_row['item_subtotal'] += new_row['discount_total']
Insert the new row right after the first data row, keeping everything after it:
df = pandas.concat([df.loc[[0]], new_row, df.loc[1:]], ignore_index=True)
Change the filename and write out the new csv file (note that str.strip removes characters from the ends, not a suffix, so slice the extension off instead):
filename = filename[:-len('.csv')] + 'edited.csv'
df.to_csv(filename, index=False)
I hope this helps! Pandas is great for cleanly handling massive amounts of data, but may be overkill for what you are trying to do. Then again, maybe not. It would help to see an example data file.
The first step is to turn that .csv into something that is a little easier to work with. Fortunately, Python has the csv module, which makes it easy to turn your .csv file into a much nicer list of lists. The below will give you a way to both turn your .csv into a list of lists and turn the modified data back into a .csv file.
import csv
import copy

def csv2list(ifile):
    """
    ifile = the path of the csv to be converted into a list of lists
    """
    olist = []
    with open(ifile, 'rb') as f:  # in Python 3, use open(ifile, newline='')
        c = csv.reader(f, dialect='excel')
        for line in c:
            olist.append(line)  # and update the outer list
    return olist
#------------------------------------------------------------------------------
def list2csv(ilist, ofile):
    """
    ilist = the list of lists to be converted
    ofile = the output path for your csv file
    """
    with open(ofile, 'wb') as csvfile:  # in Python 3, use open(ofile, 'w', newline='')
        csvwriter = csv.writer(csvfile, delimiter=',',
                               quotechar='|', quoting=csv.QUOTE_MINIMAL)
        csvwriter.writerows(ilist)
Now, you can simply copy ilist[1] and change the appropriate element to reflect your summed value using:
listTemp = copy.deepcopy(ilist[1])
listTemp[n] = listTemp[n] + listTemp[n-x]
ilist.insert(2,listTemp)
As for how to change the file name, just use:
import os
newFileName = os.path.splitext(oldFileName)[0] + "edited" + os.path.splitext(oldFileName)[1]
Hopefully this will help you out!
Here's an example of the contents of my CSV file:
Fruit, colour, ripe,
apple, green,,
banana, yellow,,
pineapple, green,,
plum, purple,,
I want to loop through the contents of the CSV file and according to a test (extrinsic to the CSV data, using an input value supplied to the enclosing function), end up with something like this:
Fruit, colour, ripe,
apple, green, true,
banana, yellow,,
pineapple, green,,
plum, purple, true,
My current code looks like this:
csv_data = csv.reader(open('./data/fruit_data.csv', 'r'))
for row in csv_data:
    fruit = row[0]
    if fruit == input:
        # Here, write 'true' in the 'ripe' column.
It's easy enough to add new data in one go, using the CSV module or pandas, but here I need to add the data iteratively. It seems that I can't change the CSV file in place(?), but if I write out to a different CSV file, it's going to overwrite on each match within the loop, so it'll only reflect that value.
You have, basically, two approaches:
1- Open a second text file before your loop then loop through each row of the initial file and append rows to the second file. After all rows are done, close the initial file. Example: How do you append to a file?
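A minimal sketch of approach 1 (Python 3; the file contents and the matching fruit are made up, and io.StringIO stands in for the two files):

```python
import csv
import io

fruit_to_mark = 'apple'  # stands in for the input value supplied to your function

infile = io.StringIO("Fruit,colour,ripe\napple,green,\nbanana,yellow,\n")
outfile = io.StringIO()

reader = csv.reader(infile)
writer = csv.writer(outfile)
writer.writerow(next(reader))  # copy the header row through unchanged
for row in reader:
    if row[0] == fruit_to_mark:
        row[2] = 'true'  # fill the 'ripe' column on a match
    writer.writerow(row)  # every row is written exactly once

print(outfile.getvalue())
```

Because each row is written as it is read, nothing gets overwritten on later matches.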
2- Read in everything from the initial csv. Then make changes to the object you created (I highly recommend using Pandas for this). Then write out to a csv. Here's an example of that method:
import pandas as pd
import numpy as np
# read in the csv
csv_data = pd.read_csv('./data/fruit_data.csv')
# I'm partial to the numpy where logic when creating a new column based
# on if/then logic on an existing column
csv_data['ripe'] = np.where(csv_data['fruit']==input, True, False)
# write out the csv
csv_data.to_csv('./data/outfile.csv')
The choice between 1 and 2 should really come down to scale. If your csv is so big that you can't read it all in and manipulate it the way you want, then you should process it line by line. If you can read the whole thing in and then manipulate it with Pandas, your life will be MUCH easier.
If you want to create a new CSV file
csv_data = csv.reader(open('./Desktop/fruit_data.csv', 'r'))
csv_new = csv.writer(open('./Desktop/fruit_new_data.csv', 'w'))
for row in csv_data:
    fruit = row[0]
    if fruit == input:
        row[2] = 'true'  # fill the 'ripe' column
        csv_new.writerow(row)
    else:
        csv_new.writerow(row)
Basically the only thing missing in your previous attempt is the last statement, which does the writing; the else covers the case where the criteria does not match.
Another possibility could be to use line.startswith().
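A rough sketch of the startswith idea, treating the file as plain text lines instead of parsed rows (Python 3; the data and fruit name are made up):

```python
import io

# Stand-in for the real file contents.
text = "Fruit,colour,ripe\napple,green,\nbanana,yellow,\n"

out_lines = []
for line in io.StringIO(text):
    if line.startswith('apple,'):
        # The line ends with an empty 'ripe' field, so append the value.
        line = line.rstrip('\n') + 'true\n'
    out_lines.append(line)

result = ''.join(out_lines)
print(result)
```

This is fragile compared to the csv module (it assumes the ripe field is last and empty), so it's best kept for quick one-offs.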
If you create a temporary file, you can write your rows as you read them. If you use os.rename on Unix, "the renaming will be an atomic operation":
import csv
import os

def update_fruit_data(input):
    csv_file_name = 'data/fruit_data.csv'
    tmp_file_name = "%s.tmp" % csv_file_name
    # Update fruit data
    with open(csv_file_name, 'r') as csv_input_file:
        csv_reader = csv.reader(csv_input_file)
        with open(tmp_file_name, 'w') as csv_output_file:
            csv_writer = csv.writer(csv_output_file)
            for row in csv_reader:
                fruit = row[0]
                if fruit == input:
                    row[2] = 'true'
                csv_writer.writerow(row)
    # Rename tmp file to csv file
    os.rename(tmp_file_name, csv_file_name)

while True:
    input = get_input()
    update_fruit_data(input)
The get_input here is a stand-in for whatever you use to get the value of input.
In order to make the changes, you have to add your new data to a location such as a list. This list will contain the results of your processing.
fruit_details = list()
csv_data = csv.reader(open('./data/fruit_data.csv', 'r'))
for row in csv_data:
    fruit = row[0]
    if fruit == input:
        fruit_details.append([row[0], row[1], 'true'])
The resulting list, fruit_details, will then contain the fruits with true in the "ripe" column. If you want to append the non-fruit items as well, add an else statement which appends either 'false' or row[2] as necessary.
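For instance, with the else branch filled in (a sketch with made-up data; io.StringIO stands in for the real file and 'apple' for the input value):

```python
import csv
import io

fruit_details = []
csv_data = csv.reader(io.StringIO("Fruit,colour,ripe\napple,green,\nbanana,yellow,\n"))
next(csv_data)  # skip the header row
for row in csv_data:
    if row[0] == 'apple':
        fruit_details.append([row[0], row[1], 'true'])
    else:
        fruit_details.append(row)  # keep non-matching rows unchanged

print(fruit_details)
```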
I have a dictionary I created from a csv file and would like to use this dict to update the values in a specific column of a different csv file called sheet2.csv.
Sheet2.csv has many columns with different headers and I need to only update the column PartNumber based on my key value pairs in my dict.
My question is how would I use the keys in dict to search through sheet2.csv and update/write to only the column PartNumber with the appropriate value?
I am new to python so I hope this is not too confusing and any help is appreciated!
This is the code I used to create the dict:
import csv
a = open('sheet1.csv', 'rU')
csvReader = csv.DictReader(a)
dict = {}
for line in csvReader:
    dict[line["ReferenceID"]] = line["PartNumber"]

print(dict)
dict = {'R150': 'PN000123', 'R331': 'PN000873', 'C774': 'PN000064', 'L7896': 'PN000447', 'R0640': 'PN000878', 'R454': 'PN000333'}
To make things even more confusing, I also need to make sure that already existing rows in sheet2 remain unchanged. For example, if there is a row with ReferenceID as R1234 and PartNumber as PN000000, it should stay untouched. So I would need to skip rows which are not in my dict.
Link to sample CSVs:
http://dropbox.com/s/zkagunnm0xgroy5/Sheet1.csv
http://dropbox.com/s/amb7vr48mdc94v6/Sheet2.csv
EDIT: Let me rephrase my question and provide a better example csvfile.
Let's say I have a Dict = {'R150': 'PN000123', 'R331': 'PN000873', 'C774': 'PN000064', 'L7896': 'PN000447', 'R0640': 'PN000878', 'R454': 'PN000333'}.
I need to fill in this csv file: https://www.dropbox.com/s/c95mlitjrvyppef/sheet.csv
Specifically, I need to fill in the PartNumber column using the keys of the dict I created. So I need to iterate through column ReferenceID and compare that value to my keys in dict. If there is a match I need to fill in the corresponding PartNumber cell with that value.... I'm sorry if this is all confusing!
The code below should do the trick. It first builds a dictionary just like your code and then moves on to read Sheet2.csv row by row, possibly updating the part number. The output goes to temp.csv, which you can compare with the initial Sheet2.csv. In case you want to overwrite Sheet2.csv with the contents of temp.csv, simply uncomment the line with shutil.move.
Note that the sample files you provided do not contain any updateable data, so Sheet2.csv and temp.csv will be identical. I tested this with a slightly modified Sheet1.csv where I made sure that it actually contains a reference ID used by Sheet2.csv.
import csv
import shutil

def createReferenceIdToPartNumberMap(csvToReadPath):
    result = {}
    print 'read part numbers to update from', csvToReadPath
    with open(csvToReadPath, 'rb') as csvInFile:
        csvReader = csv.DictReader(csvInFile)
        for row in csvReader:
            result[row['ReferenceID']] = row['PartNumber']
    return result

def updatePartNumbers(csvToUpdatePath, referenceIdToPartNumberMap):
    tempCsvPath = 'temp.csv'
    print 'update part numbers in', csvToUpdatePath
    with open(csvToUpdatePath, 'rb') as csvInFile:
        csvReader = csv.reader(csvInFile)
        # Figure out which columns contain the reference ID and part number.
        titleRow = csvReader.next()
        referenceIdColumn = titleRow.index('ReferenceID')
        partNumberColumn = titleRow.index('PartNumber')
        # Write temporary CSV file with updated part numbers.
        with open(tempCsvPath, 'wb') as tempCsvFile:
            csvWriter = csv.writer(tempCsvFile)
            csvWriter.writerow(titleRow)
            for row in csvReader:
                # Check if there is an updated part number.
                referenceId = row[referenceIdColumn]
                newPartNumber = referenceIdToPartNumberMap.get(referenceId)
                # If so, update the row just read accordingly.
                if newPartNumber is not None:
                    row[partNumberColumn] = newPartNumber
                    print '  update part number for %s to %s' % (referenceId, newPartNumber)
                csvWriter.writerow(row)
    # TODO: Move the temporary CSV file over the initial CSV file.
    # shutil.move(tempCsvPath, csvToUpdatePath)

if __name__ == '__main__':
    referenceIdToPartNumberMap = createReferenceIdToPartNumberMap('Sheet1.csv')
    updatePartNumbers('Sheet2.csv', referenceIdToPartNumberMap)
The purpose of my Python script is to compare the data present in multiple CSV files, looking for discrepancies. The data are ordered, but the ordering differs between files. The files contain about 70K lines, weighing around 15MB. Nothing fancy or hardcore here. Here's part of the code:
def getCSV(fpath):
    with open(fpath, "rb") as f:
        csvfile = csv.reader(f)
        for row in csvfile:
            allRows.append(row)

allCols = map(list, zip(*allRows))
Am I properly reading from my CSV files? I'm using csv.reader, but would I benefit from using csv.DictReader?
How can I create a list containing whole rows which have a certain value in a precise column?
Are you sure you want to be keeping all rows around? This creates a list with matching values only... fname could also come from glob.glob() or os.listdir() or whatever other data source you so choose. Just to note, you mention the 20th column, but row[20] will be the 21st column...
import csv

matching20 = []
for fname in ('file1.csv', 'file2.csv', 'file3.csv'):
    with open(fname) as fin:
        csvin = csv.reader(fin)
        next(csvin)  # <--- if you want to skip header row
        for row in csvin:
            if row[20] == 'value':
                matching20.append(row)  # or do something with it here
You only want csv.DictReader if you have a header row and want to access your columns by name.
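For instance, a DictReader version of the same filter (a sketch with made-up column names and data):

```python
import csv
import io

# Stand-in for a real file with a header row.
data = "colA,colB,value_col\n1,2,value\n3,4,other\n"

matching = []
for row in csv.DictReader(io.StringIO(data)):
    if row['value_col'] == 'value':  # by name rather than row[20]
        matching.append(row)

print(matching)
```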
This should work, you don't need to make another list to have access to the columns.
import csv

def getCSV(fpath):
    with open(fpath) as ifile:
        csvfile = csv.reader(ifile)
        rows = list(csvfile)
    value_20 = [x for x in rows if x[20] == 'value']
    return value_20
If I understand the question correctly, you want to include a row if value is in the row, but you don't know which column value is, correct?
If your rows are lists, then this should work:
testlist = [row for row in allRows if 'value' in row]
post-edit:
If, as you say, you want a list of rows where value is in a specified column (specified by an integer pos), then:
pos = 20
testlist = [row for row in allRows if row[pos] == 'value']
(I haven't tested this, but let me know if that works.)
Here's a sample csv file
id, serial_no
2, 500
2, 501
2, 502
3, 600
3, 601
This is the output I'm looking for (a list of serial_nos within a list for each id):
[2, [500,501,502]]
[3, [600, 601]]
I have implemented my solution but it's too much code and I'm sure there are better solutions out there. Still learning Python and I don't know all the tricks yet.
file = 'test.csv'
data = csv.reader(open(file))
fields = data.next()
for row in data:
    each_row = []
    each_row.append(row[0])
    each_row.append(row[1])
    zipped_data.append(each_row)

for rec in zipped_data:
    if rec[0] not in ids:
        ids.append(rec[0])

for id in ids:
    for rec in zipped_data:
        if rec[0] == id:
            ser_no.append(rec[1])
    tmp.append(id)
    tmp.append(ser_no)
    print tmp
    tmp = []
    ser_no = []
**I've omitted variable initialization for simplicity of the code.
The print tmp inside the loop gives me the output I mentioned above. I know there's a better, more Pythonic way to do this. It's just too messy! Any suggestions would be great!
import csv
from collections import defaultdict

records = defaultdict(list)
file = 'test.csv'
data = csv.reader(open(file))
fields = data.next()
for row in data:
    records[row[0]].append(row[1])

# sorting by ids since dict keys don't maintain order
results = sorted(records.items(), key=lambda x: x[0])
print results
If the list of serial_nos needs to be unique, just replace defaultdict(list) with defaultdict(set) and records[row[0]].append(row[1]) with records[row[0]].add(row[1]).
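For example, the set variant deduplicates repeated serial numbers (a Python 3 sketch with made-up in-memory data; sorted() is applied at the end since sets are unordered):

```python
import csv
import io
from collections import defaultdict

records = defaultdict(set)
data = csv.reader(io.StringIO("id,serial_no\n2,500\n2,500\n2,501\n3,600\n"))
next(data)  # skip the header row
for row in data:
    records[row[0]].add(row[1])  # duplicates are absorbed by the set

results = sorted((k, sorted(v)) for k, v in records.items())
print(results)
```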
Instead of a list, I'd make it a collections.defaultdict(list), and then just call the append() method on the value.
result = collections.defaultdict(list)
for row in data:
    result[row[0]].append(row[1])
Here's a version I wrote, looks like there are plenty of answers for this one already though.
You might like using csv.DictReader, gives you easy access to each column by field name (from the header / first line).
#!/usr/bin/python
import csv

myFile = open('sample.csv', 'rb')
csvFile = csv.DictReader(myFile)
# first row will be used for field names (by default)

myData = {}
for myRow in csvFile:
    myId = myRow['id']
    if myId not in myData:
        myData[myId] = []
    myData[myId].append(myRow['serial_no'])

for myId in sorted(myData):
    print '%s %s' % (myId, myData[myId])

myFile.close()
Some observations:
0) file is a built-in (a synonym for open), so it's a poor choice of name for a variable. Further, the variable actually holds a file name, so...
1) The file can be closed as soon as we're done reading from it. The easiest way to accomplish that is with a with block.
2) The first loop appears to go over all the rows, grab the first two elements from each, and make a list with those results. However, your rows already all contain only two elements, so this has no net effect. The CSV reader is already an iterator over rows, and the simple way to create a list from an iterator is to pass it to the list constructor.
3) You proceed to make a list of unique ID values, by manually checking. A list of unique things is better known as a set, and the Python set automatically ensures uniqueness.
4) You have the name zipped_data for your data. This is telling: applying zip to the list of rows would produce a list of columns - and the IDs are simply the first column, transformed into a set.
5) We can use a list comprehension to build the list of serial numbers for a given ID. Don't tell Python how to make a list; tell it what you want in it.
6) Printing the results as we get them is kind of messy and inflexible; better to create the entire chunk of data (then we have code that creates that data, so we can do something else with it other than just printing it and forgetting it).
Applying these ideas, we get:
import csv

filename = 'test.csv'
with open(filename) as in_file:
    data = csv.reader(in_file)
    data.next()  # ignore the field labels
    rows = list(data)  # read the rest of the rows from the iterator

print [
    # We want a list of all serial numbers from rows with a matching ID...
    [serial_no for row_id, serial_no in rows if row_id == id]
    # for each of the IDs that there is to match, which come from making
    # a set from the first column of the data.
    for id in set(zip(*rows)[0])
]
We can probably do even better than this by using the groupby function from the itertools module.
Here's an example using itertools.groupby. Note that this only works if the rows are already grouped by id.
from csv import DictReader
from itertools import groupby
from operator import itemgetter

filename = 'test.csv'
# the context manager ensures that infile is closed when it goes out of scope
with open(filename) as infile:
    # group by id - this requires that the rows are already grouped by id
    groups = groupby(DictReader(infile), key=itemgetter('id'))
    # loop through the groups printing a list for each one
    for i, j in groups:
        print [i, map(itemgetter(' serial_no'), list(j))]
Note the space in front of ' serial_no'. This is because of the space after the comma in the input file.
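One way to avoid that (a Python 3 sketch with made-up data): the csv dialect option skipinitialspace=True strips whitespace that follows the delimiter, so the field can be accessed as 'serial_no' without the leading space:

```python
import csv
import io

# Stand-in for the input file, with spaces after the commas.
data = "id, serial_no\n2, 500\n2, 501\n"

reader = csv.DictReader(io.StringIO(data), skipinitialspace=True)
rows = list(reader)
print(rows[0]['serial_no'])
```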