Creating a nested dictionary from a CSV file with Python

I have a csv file "input.csv" which has the following data.
UID,BID,R
U1,B1,4
U1,B2,3
U2,B1,2
I want the above to look like the following dictionary: group by UID as the key, with BID and R as a nested dictionary value.
{"U1":{"B1":4, "B2": 3}, "U2":{"B1":2}}
I have the below code:
import csv
from collections import defaultdict

new_data_dict = defaultdict(str)
with open("input.csv", 'r') as data_file:
    data = csv.DictReader(data_file, delimiter=",")
    headers = next(data)
    for row in data:
        new_data_dict[row["UID"]] += {row["BID"]: int(row["R"])}
The above throws an obvious error of
TypeError: cannot concatenate 'str' and 'dict' objects
Is there a way to do this?

Using the regular dict() you can use get() to initialize a new sub-dict and fill it afterwards.
import csv

new_data_dict = {}
with open("data.csv", 'r') as data_file:
    data = csv.DictReader(data_file, delimiter=",")
    for row in data:
        item = new_data_dict.get(row["UID"], dict())
        item[row["BID"]] = int(row["R"])
        new_data_dict[row["UID"]] = item

print new_data_dict
Also, your call to next(data) was superfluous as the headers were automatically detected and stripped from the result.
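A quick sketch illustrating that, assuming the input.csv from the question: DictReader exposes the detected header row via its .fieldnames attribute and yields only data rows.
import csv

with open("input.csv", 'r') as data_file:
    data = csv.DictReader(data_file, delimiter=",")
    print(data.fieldnames)  # ['UID', 'BID', 'R']
    print(next(data))       # the first *data* row, not the header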

This is a more efficient version using defaultdict:
from collections import defaultdict

new_data_dict = defaultdict(dict)
with open("input.csv", 'r') as data_file:
    next(data_file)  # skip the header line
    for row in data_file:
        row = row.strip().split(",")
        new_data_dict[row[0]][row[1]] = int(row[2])
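Printing the result as a plain dict shows it matches the structure asked for:
print(dict(new_data_dict))
# {'U1': {'B1': 4, 'B2': 3}, 'U2': {'B1': 2}}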

Related

CSV Grouping w/o Pandas

I'd like to group data in a .csv file. My data looks like the following:
code,balance
CN,999.99
CN,1.01
LS,177.77
LS,69.42
LA,200.43
WO,100
I would like to group the items by code and sum up the balances of the like codes. Desired output would be:
code,balance
CN,1001
LS,247.19
...
I was originally using Pandas for this task, but I won't be able to install that library on the server.
import pandas as pd

mydata = pd.read_csv('./tmp/temp.csv')
out = mydata.groupby('code').sum()
Solutions would preferably be compatible with Python 2.6.
I apologize if this is a duplicate, the other posts seem to be grouping differently.
I would also like to avoid doing this in an
if code == x: add balance to x_total
kind of way.
MY SOLUTION:
import csv
from collections import defaultdict

def groupit():
    groups = defaultdict(list)
    with open('tmp.csv') as fd:
        reader = csv.DictReader(fd)
        for row in reader:
            groups[row['code']].append(float(row['balance']))
    total = {key: sum(groups[key]) for key in groups}
    total = str(total)
    total = total.replace(' ', '')
    total = total.replace('{', '')
    total = total.replace('}', '')
    total = total.replace("'", '')
    total = total.replace(',', '\n')
    total = total.replace(':', ',')
    outfile = open('out.csv', 'w+')
    outfile.write('code,balance\n')
    outfile.write(total)
Python > 2.6:
from collections import defaultdict
import csv

groups = defaultdict(list)
with open('text.txt') as fd:
    reader = csv.DictReader(fd)
    for row in reader:
        groups[row['code']].append(float(row['balance']))

totals = {key: sum(groups[key]) for key in groups}
print(totals)
This outputs:
{'CN': 1001.0, 'LS': 247.19, 'LA': 200.43, 'WO': 100.0}
Python = 2.6:
from collections import defaultdict
import csv

groups = defaultdict(list)
with open('text.txt') as fd:
    reader = csv.DictReader(fd)
    for row in reader:
        groups[row['code']].append(float(row['balance']))

totals = dict((key, sum(groups[key])) for key in groups)
print(totals)
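If you also want the out.csv from the question, here is a minimal sketch without the string-replace juggling, assuming the totals dict from either version above (on Python 2, open the file in 'wb' mode):
import csv

with open('out.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['code', 'balance'])
    for code, balance in sorted(totals.items()):
        writer.writerow([code, balance])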
Here is how I would go about it:
with open("data.csv", 'r') as f:
data = f.readlines()
result = {}
for val in range(1, len(data)-1):
x = data[val].split(",")
if x[0] not in result:
result[x[0]] = float(x[1].replace('\n', ""))
else:
result[x[0]] = result[x[0]] + float(x[1].replace('\n', ""))
The result dictionary will have the values of interest, which can then be saved as csv:
import csv

with open('mycsvfile.csv', 'wb') as f:  # Just use 'w' mode in 3.x
    w = csv.DictWriter(f, result.keys())
    w.writeheader()
    w.writerow(result)
Hope this helps :)

How to realise "sumif" in Python 3.x within csv data

I want to add up lines in a csv file (it's a BOM) if they are identical and in the same part, but not if they are a specific type.
Here is the example to make it more clear:
LevelName,Type,Amount
Part_1,a,1
Part_1,a,1
Part_1,b,1
Part_1,c,1
Part_1,d,1
Part_1,f,1
Part_2,a,1
Part_2,c,1
Part_2,d,1
Part_2,a,1
Part_2,a,1
Part_2,d,1
Part_2,d,1
So I need to sum up all Types within a Part, but not if the Type is 'd'.
Result should look like this:
LevelName,Type,Amount
Part_1,a,2
Part_1,b,1
Part_1,c,1
Part_1,d,1
Part_1,f,1
Part_2,a,3
Part_2,c,1
Part_2,d,1
Part_2,d,1
Part_2,d,1
Unfortunately I cannot use any external lib, so pandas is not an option here.
This is how far I got:
import csv

map = {}
with open('infile.csv', 'rt') as f:
    reader = csv.reader(f, delimiter=',')
    with open('outfile.csv', 'w', newline='') as fout:
        writer = csv.writer(fout, delimiter=';', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(next(reader))
        for row in reader:
            (level, type, count) = row
            if not type == 'd':
Well, here I just can't get any further...
Thanks for any hint!
OK, sorry about using pandas. Then first read the file, saving the results in a defaultdict:
from collections import defaultdict
import csv

grouped = defaultdict(int)
with open('infile.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for level, type, count in reader:
        if not type == 'd':
            grouped[(level, type)] += int(count)
Then you can save the result of that dict to a file.
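A minimal sketch of that step, assuming the grouped dict from above (note that the 'd' rows would still need to be written out unsummed to match the desired output):
import csv

with open('outfile.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    writer.writerow(['LevelName', 'Type', 'Amount'])
    for (level, typ), amount in grouped.items():
        writer.writerow([level, typ, amount])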
import csv
import os

cwd = os.getcwd()
master = {}
file = csv.DictReader(open(cwd + '\\infile.csv'), delimiter=',')
data = [row for row in file]
for row in data:
    master.setdefault(row['LevelName'], {})
    if row['Type'] != 'd':
        master[row['LevelName']].setdefault(row['Type'], 0)
        master[row['LevelName']][row['Type']] += int(row['Amount'])
print(master)
Not as simple as the solution above, but this shows how to iterate over the data.
Or, I suppose, you could concatenate the 'LevelName' and the 'Type' so that you have one less line of code. It depends on what you want:
for row in data:
    if row['Type'] != 'd':
        master.setdefault(row['LevelName'] + row['Type'], 0)
        master[row['LevelName'] + row['Type']] += int(row['Amount'])
print(master)
EDIT
To write back to the original format, something like:
out = open(cwd + '\\outfile.csv', 'w')
out.write('LevelName,Type,Amount\n')
for k, v in master.items():
    for z in v:
        out.write('%s,%s,%s\n' % (k, z, str(v[z])))
out.close()

How to read a csv file into something like a "record" data type?

For Python 3.4.0
Hey everyone,
I have a csv file that looks like this:
string1;value1
string2;value2
string3;value3
What I want to do is get this csv file into some kind of "record" data type, so that I can e.g. look for stringX in stringbig and, if stringX is found, add +1 to valueX.
What is the easiest way to code that?
Thanks in advance
You can make rows into namedtuples. Here's a simple example:
import csv
from collections import namedtuple

Record = namedtuple('Record', ['product', 'part_number', 'category'])

with open('inventory.csv') as inf:
    for rec in map(Record._make, csv.reader(inf)):
        print(rec.part_number)
You can just use the builtin csv module and a simple python dictionary:
import csv

records = {}
with open('/path/to/your/file.csv') as fileobj:
    reader = csv.reader(fileobj, delimiter=';')
    for key, value in reader:
        records[key] = int(value)
Then you can easily update the valueX for stringX by doing:
records[stringX] = records.get(stringX, 0) + 1
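For the "look for stringX in stringbig" part of the question, a minimal sketch (stringbig is a hypothetical name for whatever text you are searching):
stringbig = "some text that mentions string1 and string3"  # hypothetical input

for key in records:
    if key in stringbig:
        records[key] = records.get(key, 0) + 1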
You can use DictReader for that:
CSV:
Name;Value
string1;0
string2;20
string3;12
Python:
import csv

with open("data.csv", 'r') as f:
    r = csv.DictReader(f, delimiter=';')
    for row in r:
        if 'string2' in row['Name']:
            row['Value'] = int(row['Value']) + 1
            print(row)
{'Value': 21, 'Name': 'string2'}
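If you then want to write the updated rows back out, here is a sketch building on the loop above (data_out.csv is a hypothetical output name):
import csv

rows = []
with open("data.csv", 'r') as f:
    r = csv.DictReader(f, delimiter=';')
    for row in r:
        if 'string2' in row['Name']:
            row['Value'] = int(row['Value']) + 1
        rows.append(row)  # keep every row, updated or not

with open("data_out.csv", 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=['Name', 'Value'], delimiter=';')
    w.writeheader()
    w.writerows(rows)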

How do I merge two CSV files based on field and keep same number of attributes on each record?

I am attempting to merge two CSV files based on a specific field in each file.
file1.csv
id,attr1,attr2,attr3
1,True,7,"Purple"
2,False,19.8,"Cucumber"
3,False,-0.5,"A string with a comma, because it has one"
4,True,2,"Nope"
5,True,4.0,"Tuesday"
6,False,1,"Failure"
file2.csv
id,attr4,attr5,attr6
2,"python",500000.12,False
5,"program",3,True
3,"Another string",-5,False
This is the code I am using:
import csv
from collections import OrderedDict

with open('file2.csv', 'r') as f2:
    reader = csv.reader(f2)
    fields2 = next(reader, None)  # Skip headers
    dict2 = {row[0]: row[1:] for row in reader}

with open('file1.csv', 'r') as f1:
    reader = csv.reader(f1)
    fields1 = next(reader, None)  # Skip headers
    dict1 = OrderedDict((row[0], row[1:]) for row in reader)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)

with open('merged.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)
I get output like this, which merges appropriately, but does not have the same number of attributes for all rows:
1,True,7,Purple
2,False,19.8,Cucumber,python,500000.12,False
3,False,-0.5,"A string with a comma, because it has one",Another string,-5,False
4,True,2,Nope
5,True,4.0,Tuesday,program,3,True
6,False,1,Failure
file2 will not have a record for every id in file1. I'd like the output to have empty fields from file2 in the merged file. For example, id 1 would look like this:
1,True,7,Purple,,,
How can I add the empty fields to records that don't have data in file2 so that all of my records in the merged CSV have the same number of attributes?
If we're not using pandas, I'd refactor to something like
import csv
from collections import OrderedDict

filenames = "file1.csv", "file2.csv"
data = OrderedDict()
fieldnames = []
for filename in filenames:
    with open(filename, "rb") as fp:  # python 2
        reader = csv.DictReader(fp)
        fieldnames.extend(reader.fieldnames)
        for row in reader:
            data.setdefault(row["id"], {}).update(row)

fieldnames = list(OrderedDict.fromkeys(fieldnames))
with open("merged.csv", "wb") as fp:
    writer = csv.writer(fp)
    writer.writerow(fieldnames)
    for row in data.itervalues():
        writer.writerow([row.get(field, '') for field in fieldnames])
which gives
id,attr1,attr2,attr3,attr4,attr5,attr6
1,True,7,Purple,,,
2,False,19.8,Cucumber,python,500000.12,False
3,False,-0.5,"A string with a comma, because it has one",Another string,-5,False
4,True,2,Nope,,,
5,True,4.0,Tuesday,program,3,True
6,False,1,Failure,,,
For comparison, the pandas equivalent would be something like
import pandas as pd

df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")
merged = df1.merge(df2, on="id", how="outer").fillna("")
merged.to_csv("merged.csv", index=False)
which is much simpler to my eyes, and means you can spend more time dealing with your data and less time reinventing wheels.
You can use pandas to do this:
import pandas

csv1 = pandas.read_csv('file1.csv')
csv2 = pandas.read_csv('file2.csv')
merged = csv1.merge(csv2, on='id')
merged.to_csv("output.csv", index=False)
I haven't tested this yet but it should put you on the right track until I can try it out. The code is quite self-explanatory; first you import the pandas library so that you can use it. Then using pandas.read_csv you read the 2 csv files and use the merge method to merge them. The on parameter specifies which column should be used as the "key". Finally, the merged csv is written to output.csv.
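One caveat: merge defaults to an inner join, so ids that appear only in file1.csv (1, 4 and 6 here) would be dropped from output.csv. To keep them with empty fields, as the question asks, use an outer join and fill the gaps, as in the previous answer:
merged = csv1.merge(csv2, on='id', how='outer').fillna('')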
Use a dict of dicts, then update it, like this:
import csv
from collections import OrderedDict

with open('file2.csv', 'r') as f2:
    reader = csv.reader(f2)
    lines2 = list(reader)

with open('file1.csv', 'r') as f1:
    reader = csv.reader(f1)
    lines1 = list(reader)

dict1 = {row[0]: dict(zip(lines1[0][1:], row[1:])) for row in lines1[1:]}
dict2 = {row[0]: dict(zip(lines2[0][1:], row[1:])) for row in lines2[1:]}

# merge
updatedDict = OrderedDict()
mergedAttrs = OrderedDict.fromkeys(lines1[0][1:] + lines2[0][1:], "?")
for id, attrs in dict1.iteritems():
    d = mergedAttrs.copy()
    d.update(attrs)
    updatedDict[id] = d

for id, attrs in dict2.iteritems():
    updatedDict[id].update(attrs)

# out
with open('merged.csv', 'wb') as f:
    w = csv.writer(f)
    for id, rest in sorted(updatedDict.iteritems()):
        w.writerow([id] + rest.values())

Python Joining csv files where key is first column value

I am trying to join two csv files where the key is the value of the first column.
There's no header.
Files have different number of lines and rows.
Order of file a must be preserved.
file a:
john,red,34
andrew,green,18
tonny,black,50
jack,yellow,27
phill,orange,45
kurt,blue,29
mike,pink,61
file b:
tonny,driver,new york
phill,scientist,boston
desired result:
john,red,34
andrew,green,18
tonny,black,50,driver,new york
jack,yellow,27
phill,orange,45,scientist,boston
kurt,blue,29
mike,pink,61
I examined all related threads and I am sure some of you are going to mark this question as a duplicate, but I simply have not found a solution yet.
I grabbed a dictionary-based solution, but this approach does not satisfy the "preserve line order from file 'a'" condition.
import csv
from collections import defaultdict

with open('a.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {}
    for row in r:
        dict1.update({row[0]: row[1:]})

with open('b.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict2 = {}
    for row in r:
        dict2.update({row[0]: row[1:]})

result = defaultdict(list)
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result[key].append(value)
I would also like to avoid putting these csv files into a database like sqlite, or using the pandas module.
Thanks in advance
Something like
import csv
from collections import OrderedDict

with open('b.csv', 'rb') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}

with open('a.csv', 'rb') as f:
    r = csv.reader(f)
    dict1 = OrderedDict((row[0], row[1:]) for row in r)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)

with open('ab_combined.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)
produces
john,red,34
andrew,green,18
tonny,black,50,driver,new york
jack,yellow,27
phill,orange,45,scientist,boston
kurt,blue,29
mike,pink,61
(Note that I didn't bother protecting against the case where dict2 has a key which isn't in dict1; that's easily added if you like.)
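A sketch of that guard, keeping the answer's Python 2 style: build the result from file a first, then only extend keys that already exist, so names found only in b.csv are not appended without their file-a columns.
result = OrderedDict((key, list(value)) for key, value in dict1.iteritems())
for key, value in dict2.iteritems():
    if key in result:
        result[key].extend(value)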
