csv file to dictionary [duplicate] - python

I want to read a CSV file into a dictionary, with two requirements:
The first word in each row should be the key.
All other words in the row should be separate values for that key.
My code so far:
def coordinates(text):
    import csv
    reader = csv.reader(open(text))
    d = {}
    for row in reader:
        key = row[0]
        d[key] = row[1:]
    print(d)

coordinates('luchthavens2.csv')
With this code, the entire row ends up as the key in my dictionary and the value list stays empty.
Who can help?
EDIT:
Input file looks like this:
BIN,"Bamiyan","Bamiyan","Afghanistan","AF",34.800000,67.816667,701,"Afghanistan",\N,\N,1149361
BST,"Bost","Bost","Afghanistan","AF",31.550000,64.366667,701,"Afghanistan",\N,1134720,1149361
CCN,"Chakcharan","Chakcharan","Afghanistan","AF",34.533333,65.266667,701,"Afghanistan",\N,\N,1149361
This all comes from a file called luchthavens2.csv opened in Excel; each line of text sits in a single cell (A1, A2, A3, etc.).
You should find it here: https://expirebox.com/download/bb8cb3a39f9be041743a8b86db89093b.html
Output:
{
'CCN,"Chakcharan","Chakcharan","Afghanistan","AF",34.533333,65.266667,701,"Afghanistan",\\N,\\N,1149361': [],
'BST,"Bost","Bost","Afghanistan","AF",31.550000,64.366667,701,"Afghanistan",\\N,1134720,1149361': [],
'BIN,"Bamiyan","Bamiyan","Afghanistan","AF",34.800000,67.816667,701,"Afghanistan",\\N,\\N,1149361': []
}
EDIT:
I've changed my input file to a text file, then back again to a csv file. Strangely enough this worked, I can read it without any problems.

If you run the following,
import pprint
import csv

def coordinates(text):
    ret = {}
    with open(text, 'r') as fp:
        reader = csv.reader(fp)
        for row in reader:
            key = row.pop(0)
            ret[key] = row
    return ret

data = coordinates('data.csv')
pprint.pprint(data)
On the following file,
$ cat data.csv
AAA,"Blub",25.25
BBB,"Blob",27.27
Then you will get,
$ python stackoverflow.py
{'AAA': ['Blub', '25.25'], 'BBB': ['Blob', '27.27']}

In your input file, the quotation marks are messed up.
"BIN,""Bamiyan"",""Bamiyan"",""Afghanistan"",""AF"",34.800000,67.816667,701,""Afghanistan"",\N,\N,1149361"
"BST,""Bost"",""Bost"",""Afghanistan"",""AF"",31.550000,64.366667,701,""Afghanistan"",\N,1134720,1149361"
"CCN,""Chakcharan"",""Chakcharan"",""Afghanistan"",""AF"",34.533333,65.266667,701,""Afghanistan"",\N,\N,1149361"
should be
"BIN","Bamiyan","Bamiyan","Afghanistan","AF",34.800000,67.816667,701,"Afghanistan",\N,\N,1149361
"BST","Bost","Bost","Afghanistan","AF",31.550000,64.366667,701,"Afghanistan",\N,1134720,1149361
"CCN","Chakcharan","Chakcharan","Afghanistan","AF",34.533333,65.266667,701,"Afghanistan",\N,\N,1149361

Your data is not in proper CSV format, so each row is read as a single field. Splitting that field yourself should give you the desired output:
import csv
reader = csv.reader(open('luchthavens2.csv'))
d = {}
for row in reader:
    row = row[0].split(',')
    key = row[0]
    d[key] = row[1:]
In fact, you don't need the csv module at all here, because your data is not proper CSV.
The code below solves the problem without the csv module:
d = dict()
with open('luchthavens2.csv') as fh:
    for row in fh:
        row = row.strip().split(',')  # strip() drops the trailing newline
        key = row[0]
        d[key] = row[1:]
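If the rows really are wrapped in an extra layer of quotes, as in the luchthavens2.csv sample, another option is to let the csv module parse each row twice instead of splitting on commas by hand. This is only a sketch under that assumption; `parse_double_wrapped` is a hypothetical helper name and the sample rows are shortened for illustration.

```python
import csv
import io

def parse_double_wrapped(lines):
    """Build {first_field: other_fields}; rows that arrive as one quoted field are parsed again."""
    result = {}
    for row in csv.reader(lines):
        if len(row) == 1:
            # The whole line was quoted as a single field: parse that field as CSV once more.
            row = next(csv.reader([row[0]]))
        if row:  # skip blank lines
            result[row[0]] = row[1:]
    return result

# Shortened, made-up sample in the same shape as the broken input file.
sample = io.StringIO('"BIN,""Bamiyan"",""AF"",34.8"\n"BST,""Bost"",""AF"",31.55"\n')
parsed = parse_double_wrapped(sample)
print(parsed)
```

Unlike a plain split(','), this also removes the inner quotation marks from the values.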

Related

Turning a CSV file with a header into a python dictionary

Let's say I have the following example CSV file:
a,b
100,200
400,500
How would I turn it into a dictionary like the one below?
{a:[100,400],b:[200,500]}
I am having trouble figuring out how to do it manually before I use a package, so that I understand it. Can anyone help?
Some code I tried:
with open("fake.csv") as f:
    index = 0
    dictionary = {}
    for line in f:
        words = line.strip()
        words = words.split(",")
        if index >= 1:
            for i in range(len(headers_list)):
                dictionary[headers_list[i]] = words[i]
                # only returns the last element which makes sense
        else:
            headers_list = words
        index += 1
At the very least, you should be using the built-in csv package for reading csv files without having to bother with parsing. That said, this first approach is still applicable to your .strip and .split technique:
1. Initialize a dictionary with the column names as keys and empty lists as values.
2. Read a line from the csv reader.
3. Zip the line's contents with the column names you got in step 1.
4. For each key:value pair in the zip, update the dictionary by appending.
import csv

with open("test.csv", "r") as file:
    reader = csv.reader(file)
    column_names = next(reader)  # Reads the first line, which contains the header
    data = {col: [] for col in column_names}
    for row in reader:
        for key, value in zip(column_names, row):
            data[key].append(value)
Your issue was that you were using the assignment operator = to overwrite the contents of your dictionary on every iteration. This is why you either want to pre-initialize the dictionary like above, or use a membership check first to test if the key exists in the dictionary, adding it if not:
key = headers_list[i]
if key not in dictionary:
    dictionary[key] = []
dictionary[key].append(words[i])
An even cleaner shortcut is to take advantage of dict.get:
key = headers_list[i]
dictionary[key] = dictionary.get(key, []) + [words[i]]
Another approach would be to take advantage of the csv package by reading each row of the csv file as a dictionary itself:
import csv

with open("test.csv", "r") as file:
    reader = csv.DictReader(file)
    data = {}
    for row_dict in reader:
        for key, value in row_dict.items():
            data[key] = data.get(key, []) + [value]
Another standard library package you could use to clean this up further is collections, with defaultdict(list), where you can directly append to the dictionary at a given key without worrying about initializing with an empty list if the key wasn't already there.
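A minimal sketch of the defaultdict(list) variant, assuming the same two-column sample as in the question. The function name `columns_to_lists` is made up for illustration, and an in-memory file stands in for the real CSV.

```python
import csv
import io
from collections import defaultdict

def columns_to_lists(file_obj):
    """Return {column_name: [values...]} for a CSV file with a header row."""
    data = defaultdict(list)  # missing keys start out as empty lists
    for row_dict in csv.DictReader(file_obj):
        for key, value in row_dict.items():
            data[key].append(value)  # no membership check needed
    return dict(data)

# In-memory stand-in for the example file from the question.
sample = io.StringIO("a,b\n100,200\n400,500\n")
result = columns_to_lists(sample)
print(result)
```

Note that the values stay strings; convert them with int() if numbers are needed.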
To do that, keep the column names and the data separate, then iterate over the columns and collect the value at the corresponding index from each data row. I'm not sure whether this works with empty values.
That said, I'm fairly sure that going through pandas would be easier; it is a widely used library for working with data in external files.
import csv

datas = []
with open('fake.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            cols = row  # the first row holds the column names
            line_count += 1
        else:
            datas.append(row)
            line_count += 1

result = {}  # not named `dict`, which would shadow the built-in type
for index, col in enumerate(cols):  # iterate over the column names with their indices
    result[col] = []
    for data in datas:  # append this column's value from every data row
        result[col].append(data[index])
print(result)

How to skip the first column when reading a csv file to a dictionary python

I'm new to python. I'm trying to read a csv file into a dictionary but it is returning the dictionary with the key twice, the first time as the key word, and the second time as one of the columns. Has anyone any idea on how to remove the first column that is already considered to be key?
Here is my code:
import csv

def read_dict(filename, key_column_index):
    """Read the contents of a CSV file into a compound
    dictionary and return the dictionary.

    Parameters
        filename: the name of the CSV file to read.
        key_column_index: the index of the column
            to use as the keys in the dictionary.
    Return: a compound dictionary that contains
        the contents of the CSV file.
    """
    # Create an empty dictionary that will store the data from the CSV file.
    csv_dict = {}
    with open(filename, "rt") as csv_file:
        # Use the csv module to create a reader object that will read from the opened CSV file.
        reader = csv.reader(csv_file)
        # Skip the first row of data as it contains the header of each column
        next(reader)
        # Read the rows in the CSV file one row at a time.
        # The reader object returns each row as a list.
        for row_list in reader:
            # From the current row, retrieve the data from the column that contains the key.
            key = row_list[key_column_index]
            # Store the data from the current row into the dictionary.
            csv_dict[key] = row_list
    return csv_dict
This will skip over the column at key_column_index using list slicing:
key = row_list[key_column_index]
csv_dict[key] = row_list[:key_column_index] + row_list[key_column_index + 1:]
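To see what the slicing does, here is a small worked example. The helper name `split_key` and the sample row are assumptions made up for illustration; only the slicing expression comes from the answer above.

```python
def split_key(row_list, key_column_index):
    """Return (key, remaining_columns) for one CSV row."""
    key = row_list[key_column_index]
    # Everything before the key column plus everything after it.
    rest = row_list[:key_column_index] + row_list[key_column_index + 1:]
    return key, rest

# Hypothetical row: the key is in the first column.
first_key, first_rest = split_key(["P001", "Widget", "4.50"], 0)
# The same row with the key taken from the middle column.
mid_key, mid_rest = split_key(["P001", "Widget", "4.50"], 1)
print(first_key, first_rest)
print(mid_key, mid_rest)
```

Because both slices return new lists, the original row is left unchanged.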
You can use csv.DictReader for this:
import csv

with open("samp.csv", "r") as f:
    d_reader = csv.DictReader(f)
    for l in d_reader:
        print(l)
Each row is returned as a dictionary whose keys are the column names, so the result resembles JSON records: a list of dictionaries.
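As a rough illustration of that record shape, the sketch below reads a three-column sample from memory. The sample values are assumed to match the output comment in the next snippet; samp.csv itself is not shown in the thread.

```python
import csv
import io

# In-memory stand-in for samp.csv (assumed contents).
sample = io.StringIO("a,b,c\n1,2,3\n5,6,7\n")

# Each row becomes one dictionary keyed by the header names.
records = [dict(row) for row in csv.DictReader(sample)]
print(records)
```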
Or you can do this if you want the whole column as a list under each column name which serves as the key
import csv

f = open("samp.csv", "r")
reader = csv.reader(f)
d = {}
for ri, r in enumerate(reader):
    if ri == 0:
        column_names = list(r)
    if ri > 0:
        for ci, c in enumerate(r):
            curr_cname = column_names[ci]
            if curr_cname not in d:
                d[curr_cname] = []
            d[curr_cname].append(c)
print(d)  # d = {'a': ['1', '5'], 'b': ['2', '6'], 'c': ['3', '7']}
f.close()
Here is what my code looks like:
import csv

def read_dict(filename, key_column_index):
    """Read the contents of a CSV file into a compound
    dictionary and return the dictionary.

    Parameters
        filename: the name of the CSV file to read.
        key_column_index: the index of the column
            to use as the keys in the dictionary.
    Return: a compound dictionary that contains
        the contents of the CSV file.
    """
    # Create an empty dictionary that will store the data from the CSV file.
    csv_dict = {}
    with open(filename, "rt") as csv_file:
        # Use the csv module to create a reader object that will read from the
        # opened CSV file.
        reader = csv.reader(csv_file)
        # Skip the first row of data as it contains the header of each column
        next(reader)
        # Read the rows in the CSV file one row at a time.
        # The reader object returns each row as a list.
        for row_list in reader:
            # From the current row, retrieve the data from the column that contains
            # the key.
            key = row_list[key_column_index]
            # Store the data from the current row into the dictionary.
            csv_dict[key] = row_list[:key_column_index] + row_list[key_column_index + 1:]
    return csv_dict

python csv file add to field based off another field

I have a CSV file that looks like this:
I have a column called “Inventory”. I pulled its data from another source, which put it in a dictionary format, as you can see.
What I need to do is iterate through the 1000+ lines: if the keywords comforter, sheets and pillow all exist, then write “bedding” to the “Location” column for that row; otherwise write “home-fashions”.
I have gotten as far as an if statement that tells me whether a row goes into bedding or “home-fashions”. I just do not know how to write the corresponding result to the “Location” field for that line.
In my script I'm only printing to see my results, but in the end I want to write to the same CSV file.
from csv import DictReader

with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for line in csv_dict_reader:
        if 'comforter' in line['Inventory'] and 'sheets' in line['Inventory'] and 'pillow' in line['Inventory']:
            print('Bedding')
            print(line['Inventory'])
        else:
            print('home-fashions')
            print(line['Inventory'])
The last column of your CSV contains commas. If that column is not quoted, DictReader will split it into extra fields, so you can parse each line yourself instead:
import re

data = []
with open('test.csv', 'r') as f:
    # Get the header row
    header = next(f).strip().split(',')
    for line in f:
        # Parse 4 columns
        row = re.findall('([^,]*),([^,]*),([^,]*),(.*)', line)[0]
        # Create a dictionary of one row
        item = {header[0]: row[0], header[1]: row[1], header[2]: row[2],
                header[3]: row[3]}
        # Add each row to the list
        data.append(item)
After preparing your data, you can check with your conditions.
for item in data:
    if all([x in item['Inventory'] for x in ['comforter', 'sheets', 'pillow']]):
        item['Location'] = 'Bedding'
    else:
        item['Location'] = 'home-fashions'
Write output to a file.
import csv

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('output.csv', 'w', newline='') as f:
    dict_writer = csv.DictWriter(f, data[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(data)
csv.DictReader yields each row as a dict, so just assign the new value to the column:
if 'comforter' in line['Inventory'] and ...:
    line['Location'] = 'Bedding'
else:
    line['Location'] = 'home-fashions'
print(line['Inventory'])
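Putting the assignment together with a writer, a possible end-to-end sketch looks like the following. It assumes the header already contains both an "Inventory" and a "Location" column; the "Name" column, the sample rows, and the function name `tag_locations` are made up for illustration, and in-memory files stand in for test.csv.

```python
import csv
import io

KEYWORDS = ("comforter", "sheets", "pillow")

def tag_locations(in_file, out_file):
    """Copy rows from in_file to out_file, filling in the Location column."""
    reader = csv.DictReader(in_file)
    # Reuse the input header; this assumes "Location" is already one of the columns.
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    for line in reader:
        has_all = all(word in line["Inventory"] for word in KEYWORDS)
        line["Location"] = "Bedding" if has_all else "home-fashions"
        writer.writerow(line)

# Hypothetical input; the Inventory cell is quoted because it contains commas.
source = io.StringIO(
    "Name,Location,Inventory\n"
    "set1,,\"{'comforter': 1, 'sheets': 2, 'pillow': 2}\"\n"
    "set2,,\"{'rug': 1}\"\n"
)
target = io.StringIO()
tag_locations(source, target)
rows = list(csv.DictReader(io.StringIO(target.getvalue())))
print(rows)
```

With real files, it is usually safer to write to a new file and replace the original afterwards than to read and write the same file at once.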

Python CSV writer

I have a csv that looks like this:
HA-MASTER,CategoryID
38231-S04-A00,14
39790-S10-A03,14
38231-S04-A00,15
39790-S10-A03,15
38231-S04-A00,16
39790-S10-A03,16
38231-S04-A00,17
39790-S10-A03,17
38231-S04-A00,18
39790-S10-A03,18
38231-S04-A00,19
39795-ST7-000,75
57019-SN7-000,75
38251-SV4-911,75
57119-SN7-003,75
57017-SV4-A02,75
39795-ST7-000,76
57019-SN7-000,76
38251-SV4-911,76
57119-SN7-003,76
57017-SV4-A02,76
What I would like to do is reformat this data so that there is only one line for each categoryID for example:
14,38231-S04-A00,39790-S10-A03
76,39795-ST7-000,57019-SN7-000,38251-SV4-911,57119-SN7-003,57017-SV4-A02
I have not found a way to accomplish this programmatically in Excel. I have over 100,000 lines. Is there a way to do something like this using Python's csv reader and writer?
Yes there is a way:
import csv

def addRowToDict(row):
    global myDict
    key = row[1]
    if key in myDict:
        # append values if entry already exists
        myDict[key].append(row[0])
    else:
        # create entry
        myDict[key] = [row[1], row[0]]

myDict = dict()
inFile = 'C:/Users/xxx/Desktop/pythons/test.csv'
outFile = 'C:/Users/xxx/Desktop/pythons/testOut.csv'

with open(inFile, 'r') as f:
    reader = csv.reader(f)
    ignore = True
    for row in reader:
        if ignore:
            # ignore first row
            ignore = False
        else:
            # add entry to dict
            addRowToDict(row)

with open(outFile, 'w') as f:
    writer = csv.writer(f)
    # write everything to file
    writer.writerows(myDict.values())
Just edit inFile and outFile
This is pretty trivial using a dictionary of lists (Python 2.7 solution):
#!/usr/bin/env python
import fileinput

categories = {}
for line in fileinput.input():
    # Skip the first line in the file (assuming it is a header).
    if fileinput.isfirstline():
        continue
    # Split the input line into two fields.
    ha_master, cat_id = line.strip().split(',')
    # If the given category id is NOT already in the dictionary
    # add a new empty list
    if not cat_id in categories:
        categories[cat_id] = []
    # Append a new value to the category.
    categories[cat_id].append(ha_master)

# Iterate over all category IDs and lists. Use ','.join()
# to output a comma-separated list from a Python list.
for k, v in categories.iteritems():
    print '%s,%s' % (k, ','.join(v))
I would read in the entire file, create a dictionary where the key is the ID and the value is a list of the other data.
data = {}
with open("test.csv", "r") as f:
    for line in f:
        temp = line.rstrip().split(',')
        if len(temp[0].split('-')) == 3:  # => specific format that ignores the header...
            if temp[1] in data:
                data[temp[1]].append(temp[0])
            else:
                data[temp[1]] = [temp[0]]

with open("output.csv", "w+") as f:
    for cat_id, datum in data.items():
        f.write("{},{}\n".format(cat_id, ','.join(datum)))
Use pandas!
import pandas
csv_data = pandas.read_csv('path/to/csv/file')
use_this = csv_data.groupby('CategoryID')['HA-MASTER'].apply(list)
You will get a Series that maps each CategoryID to the list of its HA-MASTER values; now you just have to format it.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Cheers.
I see many beautiful answers have come up while I was trying it, but I'll post mine as well.
import re

csvIN = open('your csv file', 'r')
csvOUT = open('out.csv', 'w')
cat = dict()
for line in csvIN:
    line = line.rstrip()
    if not re.search('^[0-9]+', line): continue
    ham, cid = line.split(',')
    if cat.get(cid, False):
        cat[cid] = cat[cid] + ',' + ham
    else:
        cat[cid] = ham
for i in sorted(cat):
    csvOUT.write(i + ',' + cat[i] + '\n')
csvIN.close()
csvOUT.close()
Pandas approach:
import pandas as pd
df = pd.read_csv('data.csv')
#new = df.groupby('CategoryID')['HA-MASTER'].apply(lambda row: '%s' % ','.join(row))
new = df.groupby('CategoryID')['HA-MASTER'].agg(','.join)
new.to_csv('out.csv')
out.csv:
14,"38231-S04-A00,39790-S10-A03"
15,"38231-S04-A00,39790-S10-A03"
16,"38231-S04-A00,39790-S10-A03"
17,"38231-S04-A00,39790-S10-A03"
18,"38231-S04-A00,39790-S10-A03"
19,38231-S04-A00
75,"39795-ST7-000,57019-SN7-000,38251-SV4-911,57119-SN7-003,57017-SV4-A02"
76,"39795-ST7-000,57019-SN7-000,38251-SV4-911,57119-SN7-003,57017-SV4-A02"
This was an interesting question. My solution was to append each new item for a given key to a single string in the value, along with a comma to delimit the columns.
set_vals = {}
with open('Input01.csv') as input_file:
    file_lines = [item.strip() for item in input_file.readlines()]

for item in iter([i.split(',') for i in file_lines]):
    if item[1] in set_vals:
        set_vals[item[1]] = set_vals[item[1]] + ',' + item[0]
    else:
        set_vals[item[1]] = item[0]

with open('Results01.csv', 'w') as output_file:
    for i in sorted(set_vals.keys()):
        output_file.write('{},{}\n'.format(i, set_vals[i]))
MaxU's implementation, using pandas, has good potential and looks really elegant, but all the values are placed into one cell, because each of the strings is double-quoted. For example, the line corresponding to the code '18'—"38231-S04-A00,39790-S10-A03"—would place both values in the second column.
import csv
from collections import defaultdict

inpath = ''   # Path to input CSV
outpath = ''  # Path to output CSV

output = defaultdict(list)  # To hold {category: [serial_numbers]}
with open(inpath) as f:
    for row in csv.DictReader(f):
        output[row['CategoryID']].append(row['HA-MASTER'])

with open(outpath, 'w') as f:
    f.write('CategoryID,HA-MASTER\n')
    for category, serial_numbers in output.items():
        # Join the list; formatting it directly would write "['a', 'b']"
        row = '%s,%s\n' % (category, ','.join(serial_numbers))
        f.write(row)
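If each part number should land in its own cell, one option is to hand csv.writer a list per category, so that it writes one cell per value and quotes only where needed. The sketch below assumes the HA-MASTER/CategoryID header from the question; `group_by_category` is a hypothetical name and in-memory files stand in for the real paths.

```python
import csv
import io
from collections import defaultdict

def group_by_category(in_file, out_file):
    """Write one row per CategoryID: the ID followed by all of its HA-MASTER values."""
    groups = defaultdict(list)
    for row in csv.DictReader(in_file):
        groups[row["CategoryID"]].append(row["HA-MASTER"])
    writer = csv.writer(out_file, lineterminator="\n")
    for category, parts in groups.items():
        writer.writerow([category] + parts)  # each part number becomes its own cell

# A few rows taken from the question's sample data.
source = io.StringIO(
    "HA-MASTER,CategoryID\n"
    "38231-S04-A00,14\n"
    "39790-S10-A03,14\n"
    "38231-S04-A00,15\n"
)
target = io.StringIO()
group_by_category(source, target)
print(target.getvalue())
```

The rows then have different lengths, which spreadsheet programs accept but some strict CSV consumers may not.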

Convert a csv to a dictionary with multiple values?

I have a csv file like this:
pos,place
6696,266835
6698,266835
938,176299
940,176299
941,176299
947,176299
948,176299
949,176299
950,176299
951,176299
770,272944
2751,190650
2752,190650
2753,190650
I want to convert it to a dictionary like the following:
{266835:[6696,6698],176299:[938,940,941,947,948,949,950,951],190650:[2751,2752,2753]}
And then, fill the missing numbers in the range in the values:
{266835:[6696,6697,6698],176299:[938,939,940,941,942,943,944,945,946,947,948,949,950,951],190650:[2751,2752,2753]}
Right now I have tried to build the dictionary using the solution suggested here, but it overwrites the old value with the new one.
Any help would be great.
Here is a function that I wrote for converting a CSV file to a dictionary:
def csv2dict(filename):
    """
    reads in a two column csv file, and then converts it into a dictionary
    """
    import csv
    with open(filename) as f:
        f.readline()  # ignore first line
        reader = csv.reader(f, delimiter=',')
        mydict = dict((rows[1], rows[0]) for rows in reader)
    return mydict
Easiest is to use collections.defaultdict() with a list:
import csv
from collections import defaultdict

data = defaultdict(list)
with open(inputfilename, newline='') as infh:  # text mode, as the csv module expects in Python 3
    reader = csv.reader(infh)
    next(reader, None)  # skip the header
    for col1, col2 in reader:
        data[col2].append(int(col1))
        if len(data[col2]) > 1:
            # list(...) so that later appends still work in Python 3
            data[col2] = list(range(min(data[col2]), max(data[col2]) + 1))
This also expands the ranges on the fly as you read the data.
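An alternative sketch collects the positions first and fills the gaps once at the end, which avoids rebuilding the list on every row. The name `place_ranges` is made up for illustration, keys are converted to int to match the desired output in the question, and an in-memory sample stands in for the file.

```python
import csv
import io
from collections import defaultdict

def place_ranges(file_obj):
    """Map each place to the full run of positions between its min and max."""
    positions = defaultdict(list)
    reader = csv.reader(file_obj)
    next(reader, None)  # skip the header
    for pos, place in reader:
        positions[int(place)].append(int(pos))
    # Fill in the missing numbers once all rows have been read.
    return {place: list(range(min(vals), max(vals) + 1))
            for place, vals in positions.items()}

# A few rows taken from the question's sample data.
sample = io.StringIO("pos,place\n6696,266835\n6698,266835\n770,272944\n")
filled = place_ranges(sample)
print(filled)
```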
Based on what you have tried:
import csv
from collections import defaultdict

# open archive reader
myFile = open("myfile.csv")
archive = csv.reader(myFile, delimiter=',')
arch_dict = defaultdict(list)
for row in archive:
    arch_dict[row[1]].append(row[0])
print(arch_dict)
