I want to add up lines in a CSV file (it's a BOM) if they are identical and belong to the same part, but not if they are of a specific type.
Here is an example to make it clearer:
LevelName,Type,Amount
Part_1,a,1
Part_1,a,1
Part_1,b,1
Part_1,c,1
Part_1,d,1
Part_1,f,1
Part_2,a,1
Part_2,c,1
Part_2,d,1
Part_2,a,1
Part_2,a,1
Part_2,d,1
Part_2,d,1
So I need to sum up all types within a part, but not if the type is 'd'.
Result should look like this:
LevelName,Type,Amount
Part_1,a,2
Part_1,b,1
Part_1,c,1
Part_1,d,1
Part_1,f,1
Part_2,a,3
Part_2,c,1
Part_2,d,1
Part_2,d,1
Part_2,d,1
Unfortunately I cannot use any external library, so pandas is not an option here.
This is how far I got:
import csv
map = {}
with open('infile.csv', 'rt') as f:
    reader = csv.reader(f, delimiter = ',')
    with open('outfile.csv', 'w', newline='') as fout:
        writer = csv.writer(fout, delimiter=';', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(next(reader))
        for row in reader:
            (level, type, count) = row
            if not type=='d':
Well, this is where I am stuck...
Thanks for any hint!
OK, sorry about suggesting pandas. In that case, first read the file and collect the sums in a defaultdict keyed by (level, type):
from collections import defaultdict
grouped = defaultdict(int)
# inside the `for row in reader:` loop from the question
if not type=='d':
    grouped[(level, type)] += int(count)
Then you can write the contents of that dict to a file.
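Putting the pieces together, here is a minimal sketch of the whole task using only the standard library. Note that the defaultdict snippet above leaves out the 'd' rows entirely, while the expected result in the question keeps them. This sketch assumes that rows of type 'd' should be copied through unchanged, one output row per input row. The function name `aggregate_bom` and the `skip_type` parameter are made up for illustration, and the demo uses in-memory files so it runs as is.

```python
import csv
import io

def aggregate_bom(infile, outfile, skip_type='d'):
    """Sum Amount per (LevelName, Type); rows of skip_type are passed through as-is."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator='\n')
    writer.writerow(next(reader))          # copy the header row

    rows = []        # output rows, in first-seen order
    index = {}       # (level, type) -> position in rows
    for level, type_, count in reader:
        key = (level, type_)
        if type_ != skip_type and key in index:
            rows[index[key]][2] += int(count)   # same part and type: add up
        else:
            if type_ != skip_type:
                index[key] = len(rows)          # remember where this group lives
            rows.append([level, type_, int(count)])
    writer.writerows(rows)

# Small demo with in-memory files; with real files use open(..., newline='').
sample = io.StringIO(
    "LevelName,Type,Amount\n"
    "Part_2,a,1\nPart_2,c,1\nPart_2,d,1\nPart_2,a,1\nPart_2,a,1\nPart_2,d,1\n"
)
result = io.StringIO()
aggregate_bom(sample, result)
print(result.getvalue())
```

Keeping a list of rows plus an index dict preserves the input order, which a plain dict of sums would not guarantee on older Python versions.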
import csv
import os
cwd = os.getcwd()
master = {}
file = csv.DictReader(open(cwd+'\\infile.csv', 'rb'), delimiter=',')
data = [row for row in file]
for row in data:
    master.setdefault(row['LevelName'], {})
    if row['Type'] != 'd':
        master[row['LevelName']].setdefault(row['Type'], 0)
        master[row['LevelName']][row['Type']] += int(row['Amount'])
print (master)
Not as simple as the solution above, but this shows how to iterate over the data.
Or I suppose you could concatenate the 'LevelName' and the 'Type' into a single key, so that you have one less line of code. It depends on what you want.
for row in data:
    if row['Type'] != 'd':
        master.setdefault(row['LevelName'] + row['Type'], 0)
        master[row['LevelName'] + row['Type']] += int(row['Amount'])
print (master)
EDIT
To write back to the original format, something like:
out = open(cwd+'\\outfile.csv', 'wb')
out.write('LevelName,Type,Amount\n')
for k,v in master.iteritems():
    for z in v:
        out.write('%s,%s,%s\n' % (k, z, str(v[z])))
out.close()
Related
I'd like to group data in a .csv file. My data looks like the following:
code,balance
CN,999.99
CN,1.01
LS,177.77
LS,69.42
LA,200.43
WO,100
I would like to group the items by code and sum up the balances of the like codes. Desired output would be:
code,balance
CN,1001
LS,247.19
...
I was originally using pandas for this task, but I will not be able to install that library on the server.
mydata = pd.read_csv('./tmp/temp.csv')
out = mydata.groupby('code').sum()
Solutions would preferably be compatible with Python 2.6.
I apologize if this is a duplicate, the other posts seem to be grouping differently.
I would also like to avoid doing this in a -
if code = x
add balance to x_total
-kind of way
MY SOLUTION:
import csv
from collections import defaultdict

def groupit():
    groups = defaultdict(list)
    with open('tmp.csv') as fd:
        reader = csv.DictReader(fd)
        for row in reader:
            groups[row['code']].append(float(row['balance']))
    total={key:sum(groups[key]) for key in groups}
    # turn the dict's string form into "code,balance" lines
    total=str(total)
    total=total.replace(' ','')
    total=total.replace('{','')
    total=total.replace('}','')
    total=total.replace("'",'')
    total=total.replace(',','\n')
    total=total.replace(':',',')
    with open('out.csv','w+') as outfile:
        outfile.write('code,balance\n')
        outfile.write(total)
Python > 2.6:
from collections import defaultdict
import csv
groups = defaultdict(list)
with open('text.txt') as fd:
    reader = csv.DictReader(fd)
    for row in reader:
        groups[row['code']].append(float(row['balance']))
totals = {key: sum(groups[key]) for key in groups}
print(totals)
This outputs:
{'CN': 1001.0, 'LS': 247.19, 'LA': 200.43, 'WO': 100.0}
Python = 2.6:
from collections import defaultdict
import csv
groups = defaultdict(list)
with open('text.txt') as fd:
    reader = csv.DictReader(fd)
    for row in reader:
        groups[row['code']].append(float(row['balance']))
totals = dict((key, sum(groups[key])) for key in groups)
print(totals)
Here is how I would go about it:
with open("data.csv", 'r') as f:
    data = f.readlines()
result = {}
# start at 1 to skip the header; run up to len(data) so the last line is included
for val in range(1, len(data)):
    x = data[val].split(",")
    if x[0] not in result:
        result[x[0]] = float(x[1].replace('\n', ""))
    else:
        result[x[0]] = result[x[0]] + float(x[1].replace('\n', ""))
The result dictionary will hold the values of interest, which can then be saved as CSV.
import csv
with open('mycsvfile.csv', 'wb') as f:  # Just use 'w' mode in 3.x
    w = csv.DictWriter(f, result.keys())
    w.writeheader()
    w.writerow(result)
Hope this helps :)
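Note that the DictWriter call above writes the codes as a header row and the totals as a single data row. If the goal is the `code,balance` layout from the question, a sketch along these lines may be closer. It assumes Python 3 (the question asked for 2.6, so treat it as an illustration only) and uses in-memory files via `io.StringIO` purely so the example is self-contained. Rounding to two decimals is my own assumption, added to hide float noise.

```python
import csv
import io
from collections import defaultdict

source = io.StringIO(
    "code,balance\nCN,999.99\nCN,1.01\nLS,177.77\nLS,69.42\nLA,200.43\nWO,100\n"
)

# Sum the balances per code.
totals = defaultdict(float)
for row in csv.DictReader(source):
    totals[row['code']] += float(row['balance'])

# Write one "code,balance" row per code, sorted by code.
out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(['code', 'balance'])
for code in sorted(totals):
    writer.writerow([code, round(totals[code], 2)])
print(out.getvalue())
```

With real files, replace the `io.StringIO` objects with `open(path, newline='')` handles.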
I am trying to simply import a .csv into Python. I've read numerous documents but for the life of me I can't figure out how to do the following.
The CSV format is as follows
NYC,22,55
BOSTON,39,22
I'm trying to generate the following : {NYC = [22,55], BOSTON = [39,22]} so that I can call i[0] and i[1] in a loop for each variable.
I've tried
import csv
input_file = csv.DictReader(open(r"C:\Python\Sandbox\longlat.csv"))
for row in input_file:
    print(row)
This prints my rows, but I don't know how to nest the two numeric values under the city name and generate the list that I'm hoping to get.
Thanks for your help, and sorry for my rookie question.
If you are not familiar with Python comprehensions, you can use the following code, which uses a for loop:
import csv
with open(r'C:\Python\Sandbox\longlat.csv', 'r') as f:
    reader = csv.reader(f)
    result = {}
    for row in reader:
        result[row[0]] = row[1:]
The previous code works if you want the numbers as strings. If you want them to be numbers, use:
import csv
with open(r'C:\Python\Sandbox\longlat.csv', 'r') as f:
    reader = csv.reader(f)
    result = {}
    for row in reader:
        result[row[0]] = [int(e) for e in row[1:]]  # float instead of int is also valid
Use dictionary comprehension:
import csv
with open(r'C:\Python\Sandbox\longlat.csv', mode='r') as csvfile:
    csvread = csv.reader(csvfile)
    result = {k: [int(c) for c in cs] for k, *cs in csvread}
This works in python-3.x, and produces on my machine:
>>> result
{'NYC': [22, 55], 'BOSTON': [39, 22]}
It also works for an arbitrary number of columns.
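To illustrate the point about arbitrary column counts, here is a small sketch that feeds the same comprehension from an in-memory string instead of a file. The third row with three numbers is invented sample data, not from the question.

```python
import csv
import io

data = io.StringIO("NYC,22,55\nBOSTON,39,22\nCHICAGO,41,87,5\n")

# The starred target collects however many columns follow the city name.
result = {city: [int(n) for n in numbers] for city, *numbers in csv.reader(data)}

for city, numbers in result.items():
    print(city, numbers[0], numbers[1])
```

Each value is a plain list, so `numbers[0]` and `numbers[1]` can be used in a loop as the question asks.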
In case you use python-2.7, you can use indexing and slicing over sequence unpacking:
import csv
with open(r'C:\Python\Sandbox\longlat.csv', mode='r') as csvfile:
    csvread = csv.reader(csvfile)
    result = {row[0]: [int(c) for c in row[1:]] for row in csvread}
Each row will have 3 values. You want the first as the key and the rest as the value.
>>> row
['NYC','22','55']
>>> {row[0]: row[1:]}
{'NYC': ['22', '55']}
You can create the whole dict:
lookup = {row[0]: row[1:] for row in input_file}
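You can also use pandas like so:
import pandas as pd
# header=None: the file has no header row, so the first line is data
df = pd.read_csv(r'C:\Python\Sandbox\longlat.csv', header=None)
result = {}
for index, row in df.iterrows():
    result[row.iloc[0]] = row.iloc[1:].tolist()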
Here's a hint. Try familiarizing yourself with the str.split(x) function:
strVar = "NYC,22,55"
listVar = strVar.split(',') # ["NYC", "22", "55"]
cityVar = listVar[0] # "NYC"
restVar = listVar[1:] # ["22", "55"]
# If you want to convert `restVar` into integers
restVar = list(map(int, restVar)) # list() is needed on Python 3, where map returns an iterator
I have a csv that looks like this:
HA-MASTER,CategoryID
38231-S04-A00,14
39790-S10-A03,14
38231-S04-A00,15
39790-S10-A03,15
38231-S04-A00,16
39790-S10-A03,16
38231-S04-A00,17
39790-S10-A03,17
38231-S04-A00,18
39790-S10-A03,18
38231-S04-A00,19
39795-ST7-000,75
57019-SN7-000,75
38251-SV4-911,75
57119-SN7-003,75
57017-SV4-A02,75
39795-ST7-000,76
57019-SN7-000,76
38251-SV4-911,76
57119-SN7-003,76
57017-SV4-A02,76
What I would like to do is reformat this data so that there is only one line for each categoryID for example:
14,38231-S04-A00,39790-S10-A03
76,39795-ST7-000,57019-SN7-000,38251-SV4-911,57119-SN7-003,57017-SV4-A02
I have not found a way to accomplish this programmatically in Excel. I have over 100,000 lines. Is there a way to do something like this using Python's csv reader and writer?
Yes, there is a way:
import csv

def addRowToDict(row):
    global myDict
    key=row[1]
    if key in myDict:
        #append values if entry already exists
        myDict[key].append(row[0])
    else:
        #create entry
        myDict[key]=[row[1],row[0]]

myDict=dict()
inFile='C:/Users/xxx/Desktop/pythons/test.csv'
outFile='C:/Users/xxx/Desktop/pythons/testOut.csv'
with open(inFile, 'r') as f:
    reader = csv.reader(f)
    ignore=True
    for row in reader:
        if ignore:
            #ignore first row
            ignore=False
        else:
            #add entry to dict
            addRowToDict(row)
with open(outFile,'w') as f:
    writer = csv.writer(f)
    #write everything to file
    writer.writerows(myDict.values())
Just edit inFile and outFile.
This is pretty trivial using a dictionary of lists (Python 2.7 solution):
#!/usr/bin/env python
import fileinput
categories={}
for line in fileinput.input():
    # Skip the first line in the file (assuming it is a header).
    if fileinput.isfirstline():
        continue
    # Split the input line into two fields.
    ha_master, cat_id = line.strip().split(',')
    # If the given category id is NOT already in the dictionary
    # add a new empty list
    if not cat_id in categories:
        categories[cat_id]=[]
    # Append a new value to the category.
    categories[cat_id].append(ha_master)
# Iterate over all category IDs and lists. Use ','.join() to
# output a comma-separated list from a Python list.
for k,v in categories.iteritems():
    print '%s,%s' %(k,','.join(v))
I would read in the entire file and create a dictionary where the key is the ID and the value is a list of the other data.
data = {}
with open("test.csv", "r") as f:
    for line in f:
        temp = line.rstrip().split(',')
        if len(temp[0].split('-')) == 3: # => specific format that ignores the header...
            if temp[1] in data:
                data[temp[1]].append(temp[0])
            else:
                data[temp[1]] = [temp[0]]
with open("output.csv", "w+") as f:
    for id, datum in data.items():
        f.write("{},{}\n".format(id, ','.join(datum)))
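Use pandas!
import pandas
csv_data = pandas.read_csv('path/to/csv/file')
# the pandas method is groupby (not group_by); collect each group's part numbers into a list
use_this = csv_data.groupby('CategoryID')['HA-MASTER'].apply(list)
You will get a Series with one list of part numbers per CategoryID; now you just have to format it.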
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Cheers.
I see many beautiful answers have come up while I was trying it, but I'll post mine as well.
import re
csvIN = open('your csv file','r')
csvOUT = open('out.csv','w')
cat = dict()
for line in csvIN:
    line = line.rstrip()
    if not re.search('^[0-9]+',line): continue
    ham, cid = line.split(',')
    if cat.get(cid,False):
        cat[cid] = cat[cid] + ',' + ham
    else:
        cat[cid] = ham
for i in sorted(cat):
    csvOUT.write(i + ',' + cat[i] + '\n')
Pandas approach:
import pandas as pd
df = pd.read_csv('data.csv')
#new = df.groupby('CategoryID')['HA-MASTER'].apply(lambda row: '%s' % ','.join(row))
new = df.groupby('CategoryID')['HA-MASTER'].agg(','.join)
new.to_csv('out.csv')
out.csv:
14,"38231-S04-A00,39790-S10-A03"
15,"38231-S04-A00,39790-S10-A03"
16,"38231-S04-A00,39790-S10-A03"
17,"38231-S04-A00,39790-S10-A03"
18,"38231-S04-A00,39790-S10-A03"
19,38231-S04-A00
75,"39795-ST7-000,57019-SN7-000,38251-SV4-911,57119-SN7-003,57017-SV4-A02"
76,"39795-ST7-000,57019-SN7-000,38251-SV4-911,57119-SN7-003,57017-SV4-A02"
This was an interesting question. My solution was to append each new item for a given key to a single string in the value, with a comma to delimit the columns.
set_vals = {}
with open('Input01.csv') as input_file:
    file_lines = [item.strip() for item in input_file.readlines()]
for item in [i.split(',') for i in file_lines[1:]]:  # [1:] skips the header row
    if item[1] in set_vals:
        set_vals[item[1]] = set_vals[item[1]] + ',' + item[0]
    else:
        set_vals[item[1]] = item[0]
with open('Results01.csv','w') as output_file:
    for i in sorted(set_vals.keys()):
        output_file.write('{},{}\n'.format(i, set_vals[i]))
MaxU's implementation, using pandas, looks really elegant, but all the values for a category end up in one cell, because each joined string is double-quoted. For example, the line for category '18' ("38231-S04-A00,39790-S10-A03") would place both values in the second column.
import csv
from collections import defaultdict
inpath = '' # Path to input CSV
outpath = '' # Path to output CSV
output = defaultdict(list) # To hold {category: [serial_numbers]}
for row in csv.DictReader(open(inpath)):
    output[row['CategoryID']].append(row['HA-MASTER'])
with open(outpath, 'w') as f:
    f.write('CategoryID,HA-MASTER\n')
    for category, serial_numbers in output.items():
        # join the list so each serial number becomes its own column
        row = '%s,%s\n' % (category, ','.join(serial_numbers))
        f.write(row)
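If each serial number should land in its own cell, another option is to let csv.writer build the rows instead of formatting strings by hand. A minimal sketch, assuming Python 3 and using a shortened in-memory sample in place of real file paths:

```python
import csv
import io
from collections import defaultdict

source = io.StringIO(
    "HA-MASTER,CategoryID\n"
    "38231-S04-A00,14\n39790-S10-A03,14\n38231-S04-A00,19\n"
    "39795-ST7-000,75\n57019-SN7-000,75\n"
)

groups = defaultdict(list)              # {category: [serial numbers]}
for row in csv.DictReader(source):
    groups[row['CategoryID']].append(row['HA-MASTER'])

out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
for category in sorted(groups, key=int):    # numeric order, not string order
    writer.writerow([category] + groups[category])
print(out.getvalue())
```

Because the writer receives a list per row, it only adds quotes when a value itself contains a comma, so the part numbers stay in separate columns.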
For Python 3.4.0
Hey everyone,
I have a csv file that looks like this:
string1;value1
string2;value2
string3;value3
What I want to do is load this CSV file into some kind of "record" data type, so that I can, for example, look for stringX in stringbig and, if stringX is found, add 1 to valueX.
What is the easiest way to code that?
Thanks in advance
You can make rows into namedtuples. Here's a simple example:
import csv
from collections import namedtuple
Record = namedtuple('Record', ['product', 'part_number', 'category'])
with open('inventory.csv', newline='') as inf:  # text mode: csv.reader needs str rows on Python 3
    for rec in map(Record._make, csv.reader(inf)):
        print(rec.part_number)
You can just use the built-in csv module and a simple Python dictionary:
import csv
records = {}
with open('/path/to/your/file.csv', newline='') as fileobj:  # text mode for Python 3
    reader = csv.reader(fileobj, delimiter=';')
    for key, value in reader:
        records[key] = int(value)
Then you can easily update the valueX for stringX by doing:
records[stringX] = records.get(stringX, 0) + 1
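Tying this back to the question, a sketch of the lookup step could look like the following. The sample data and the `stringbig` text are invented, and it assumes that a key counts as found when it occurs anywhere in `stringbig` as a substring.

```python
import csv
import io

source = io.StringIO("string1;0\nstring2;20\nstring3;12\n")
records = {key: int(value) for key, value in csv.reader(source, delimiter=';')}

stringbig = "some text containing string2 and string3"
for key in records:
    if key in stringbig:
        records[key] += 1       # found: bump the stored value
print(records)
```

Updating values while looping over the keys is safe here because no keys are added or removed.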
You can use DictReader for that:
CSV:
Name;Value
string1;0
string2;20
string3;12
Python:
import csv
with open("data.csv", 'r') as f:
    r = csv.DictReader(f, delimiter=';')
    for row in r:
        if 'string2' in row['Name']:
            # values are read as strings, so convert before adding
            row['Value'] = int(row['Value']) + 1
            print(row)
{'Value': 21, 'Name': 'string2'}
I have a text file containing key-value pairs, with the last two key-value pairs containing JSON-like objects that I would like to split out into columns and write with the other values, using the keys as column headings. The first three rows of the data file input.txt look like this:
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::44.6743867864386,Length3dCenterToCenter::44.6768028159989,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::57.8689351603823,Length3dCenterToCenter::57.8700464193429,Tag::<NULL>,{StartPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.43363070193163}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::68.7161350545728,Length3dCenterToCenter::68.7172034962765,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.45819643838485}
We eventually came up with something that works, but there must be a much better way:
import csv
with open('input.txt', 'rb') as fin, open('output.csv', 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        mysplit = [item.split('::') for item in line if item.strip()]
        if not mysplit: # blank line
            continue
        keys, vals = zip(*mysplit)
        start_vals = [item.split('[%2C]') for item in mysplit[-2]]
        end_vals = [item.split('[%2C]') for item in mysplit[-1]]
        a=list(keys[0:-2])
        a.extend(['start1','start2','start3','end1','end2','end3'])
        b=list(vals[0:-2])
        b.append(start_vals[1][0])
        b.append(start_vals[1][1])
        b.append(start_vals[1][2][:-1])
        b.append(end_vals[1][0])
        b.append(end_vals[1][1])
        b.append(end_vals[1][2][:-1])
        if i == 0:
            # if first line: write header
            writer.writerow(a)
        writer.writerow(b)
which produces the output file output.csv that looks like this
InnerDiameterOrWidth,InnerHeight,Length2dCenterToCenter,Length3dCenterToCenter,Tag,start1,start2,start3,end1,end2,end3
0.1,0.1,44.6743867864386,44.6768028159989,<NULL>,7858.35924983374,1703.69341358077,-3.075,7822.85045874375,1730.80294308742,-3.53962362760298
0.1,0.1,57.8689351603823,57.8700464193429,<NULL>,7793.52927597915,1680.91224357457,-3.075,7822.85045874375,1730.80294308742,-3.43363070193163
0.1,0.1,68.7161350545728,68.7172034962765,<NULL>,7858.35924983374,1703.69341358077,-3.075,7793.52927597915,1680.91224357457,-3.45819643838485
We don't want to write code like this in the future.
What is the best way to read data like this?
I'd use:
from itertools import chain
import csv

_header_translate = {
    'StartPoint': ('start1', 'start2', 'start3'),
    'EndPoint': ('end1', 'end2', 'end3')
}

def header(col):
    header = col.strip('{}').split('::', 1)[0]
    return _header_translate.get(header, (header,))

def cleancolumn(col):
    col = col.strip('{}').split('::', 1)[1]
    return col.split('[%2C]')

def chainedmap(func, row):
    return list(chain.from_iterable(map(func, row)))

with open('input.txt', 'rb') as fin, open('output.csv', 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, row in enumerate(reader):
        if not i: # first row, write header first
            writer.writerow(chainedmap(header, row))
        writer.writerow(chainedmap(cleancolumn, row))
The cleancolumn function takes any of your columns and returns a list (possibly with only one value) after removing the braces, removing everything up to and including the first :: and splitting on the embedded 'comma'. By using itertools.chain.from_iterable() we turn the series of lists generated from the columns into one flat list again for the csv writer.
When handling the first line we generate one header row from the same columns, replacing the StartPoint and EndPoint headers with the 6 expanded headers.
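For reference, here is the same idea applied to a single input line, as a sketch for Python 3 (the code above uses Python 2 style 'rb'/'wb' file modes). The helper name `parse_line` is hypothetical, and the sample line is made up in the question's format rather than copied from the real data.

```python
import csv
import io
from itertools import chain

def parse_line(fields):
    """Return (headers, values) for one parsed CSV row of key::value fields."""
    expand = {'StartPoint': ('start1', 'start2', 'start3'),
              'EndPoint': ('end1', 'end2', 'end3')}
    headers, values = [], []
    for field in fields:
        # drop the surrounding braces, then split key from value once
        key, value = field.strip('{}').split('::', 1)
        headers.append(expand.get(key, (key,)))
        values.append(value.split('[%2C]'))   # embedded "comma" separator
    return list(chain.from_iterable(headers)), list(chain.from_iterable(values))

line = ("InnerHeight::0.1,Tag::<NULL>,"
        "{StartPoint::1.5[%2C]2.5[%2C]-3.075},{EndPoint::4[%2C]5[%2C]-6}")
headers, values = parse_line(next(csv.reader(io.StringIO(line))))
print(headers)
print(values)
```

Returning headers and values together keeps the two lists aligned by construction, so the header row only needs to be written for the first line.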