I want to import a database.csv that contains 4 values
key,email1,email2,email3
filename,email#example.com,email2#example.com,email3#example.com
filename2,email#yahoo.com,email#google.com,email#outlook.com
etc,etc,etc,etc
Next, I want to split each row so that the key column goes into a list of filenames, and email1, email2, and email3 go into another list:
key = [filename]
emails = [email#example.com,email2#example.com,email3#example.com]
Current code
import csv
with open('data.csv') as read_csv:
    reader = csv.reader(read_csv)
    for row in reader:
        key = row[0]
        emails = row[1::]
    return key
    return emails
Output is
key = [filename2]
emails = [filename2,email#yahoo.com,email#google.com,email#outlook.com]
What I need is the key to match correspondingly with the emails to pass to another function.
A dictionary sounds like the appropriate solution here.
import csv
result = {}
with open('data.csv') as read_csv:
    reader = csv.reader(read_csv)
    for row in reader:
        result[row[0]] = row[1:]
You can then always access or pass on the values like this:
result['filename2']
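As a minimal sketch of that idea, the snippet below builds the dictionary and hands each key with its emails to another function. It assumes the header row should be skipped, and `send_report` is a hypothetical stand-in for "another function" (it is not from the question). An in-memory string replaces data.csv so the snippet runs on its own.

```python
import csv
import io

# In-memory stand-in for data.csv (sample rows taken from the question).
sample = io.StringIO(
    "key,email1,email2,email3\n"
    "filename,email#example.com,email2#example.com,email3#example.com\n"
    "filename2,email#yahoo.com,email#google.com,email#outlook.com\n"
)

def send_report(key, emails):
    # Hypothetical placeholder for the function the data is passed to.
    return f"{key} -> {len(emails)} recipients"

reader = csv.reader(sample)
next(reader)  # skip the header row so "key" does not become an entry
result = {row[0]: row[1:] for row in reader}

# Each key stays paired with its own list of emails.
messages = [send_report(key, emails) for key, emails in result.items()]
```

Iterating over `result.items()` keeps every filename matched with its own emails, which is the pairing the two separate lists lost.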
Let's say I have the following example CSV file:
a,b
100,200
400,500
How would I make into a dictionary like below:
{a:[100,400],b:[200,500]}
I want to work out how to do this manually before I use a package, so that I understand it. Can anyone help?
Some code I tried:
with open("fake.csv") as f:
    index = 0
    dictionary = {}
    for line in f:
        words = line.strip()
        words = words.split(",")
        if index >= 1:
            for i in range(len(headers_list)):
                dictionary[headers_list[i]] = words[i]
                # only returns the last element which makes sense
        else:
            headers_list = words
        index += 1
At the very least, you should be using the built-in csv package for reading csv files without having to bother with parsing. That said, this first approach is still applicable to your .strip and .split technique:
1. Initialize a dictionary with the column names as keys and empty lists as values
2. Read a line from the csv reader
3. Zip the line's contents with the column names you got in step 1
4. For each key:value pair in the zip, update the dictionary by appending
import csv
with open("test.csv", "r") as file:
    reader = csv.reader(file)
    column_names = next(reader)  # Reads the first line, which contains the header
    data = {col: [] for col in column_names}
    for row in reader:
        for key, value in zip(column_names, row):
            data[key].append(value)
Your issue was that you were using the assignment operator = to overwrite the contents of your dictionary on every iteration. This is why you either want to pre-initialize the dictionary like above, or use a membership check first to test if the key exists in the dictionary, adding it if not:
key = headers_list[i]
if key not in dictionary:
    dictionary[key] = []
dictionary[key].append(words[i])
An even cleaner shortcut is to take advantage of dict.get:
key = headers_list[i]
dictionary[key] = dictionary.get(key, []) + [words[i]]
Another approach would be to take advantage of the csv package by reading each row of the csv file as a dictionary itself:
with open("test.csv", "r") as file:
    reader = csv.DictReader(file)
    data = {}
    for row_dict in reader:
        for key, value in row_dict.items():
            data[key] = data.get(key, []) + [value]
Another standard library package you could use to clean this up further is collections, with defaultdict(list), where you can directly append to the dictionary at a given key without worrying about initializing with an empty list if the key wasn't already there.
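A minimal sketch of the defaultdict(list) variant, using an in-memory string in place of the CSV file from the question so that it runs on its own (the sample contents are copied from the question; the variable names are illustrative):

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for the CSV file from the question.
sample = io.StringIO("a,b\n100,200\n400,500\n")

data = defaultdict(list)  # a missing key starts out as an empty list
for row_dict in csv.DictReader(sample):
    for key, value in row_dict.items():
        data[key].append(value)

data = dict(data)  # optional: convert back to a plain dict
```

Note that the csv module returns every value as a string, so the result here is {'a': ['100', '400'], 'b': ['200', '500']}; convert with int() if numbers are needed.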
To do that, keep the column names and the data rows separate, then iterate over the columns and, for each one, collect the value at the corresponding index from every data row. I have not checked how this behaves with empty values.
That said, I am fairly sure that going through pandas would be much easier; it is a widely used library for working with data from external files.
import csv
datas = []
with open('fake.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            cols = row  # the first row holds the column names
            line_count += 1
        else:
            datas.append(row)
            line_count += 1
result = {}  # named "result" to avoid shadowing the built-in dict
for index, col in enumerate(cols):  # iterate over the column names with their indices
    result[col] = []  # start each column with an empty list
    for data in datas:
        # append this row's value for the current column
        result[col].append(data[index])
print(result)
I'm new to python. I'm trying to read a csv file into a dictionary but it is returning the dictionary with the key twice, the first time as the key word, and the second time as one of the columns. Has anyone any idea on how to remove the first column that is already considered to be key?
Here is my code:
def read_dict(filename, key_column_index):
    """Read the contents of a CSV file into a compound
    dictionary and return the dictionary.
    Parameters
        filename: the name of the CSV file to read.
        key_column_index: the index of the column
            to use as the keys in the dictionary.
    Return: a compound dictionary that contains
        the contents of the CSV file.
    """
    # Create an empty dictionary that will store the data from the CSV file.
    csv_dict = {}
    with open(filename, "rt") as csv_file:
        # Use the csv module to create a reader object that will read from the opened CSV file.
        reader = csv.reader(csv_file)
        # Skip the first row of data as it contains the header of each column
        next(reader)
        # Read the rows in the CSV file one row at a time.
        # The reader object returns each row as a list.
        for row_list in reader:
            # From the current row, retrieve the data from the column that contains the key.
            key = row_list[key_column_index]
            # Store the data from the current row into the dictionary.
            csv_dict[key] = row_list
    return csv_dict
This will skip over the column at key_column_index using list slicing:
key = row_list[key_column_index]
csv_dict[key] = row_list[:key_column_index] + row_list[key_column_index + 1:]
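To see what the slice does in isolation, here is a small sketch with made-up rows. The helper name `split_key` and the sample values are illustrative assumptions, not part of the original function.

```python
def split_key(row_list, key_column_index):
    """Return (key, remaining columns) for one row."""
    key = row_list[key_column_index]
    # Everything before the key column plus everything after it.
    rest = row_list[:key_column_index] + row_list[key_column_index + 1:]
    return key, rest

# Key in the first column, then key in the middle column.
first = split_key(["P1", "apple", "0.50"], 0)
middle = split_key(["apple", "P1", "0.50"], 1)
```

Both calls yield the key "P1" with the remaining values ["apple", "0.50"]. The slicing assumes a non-negative index; a negative key_column_index such as -1 would not give the intended result.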
You can use csv.DictReader for this:
import csv
with open("samp.csv", "r") as f:
    d_reader = csv.DictReader(f)
    for l in d_reader:
        print(l)
Each row comes back as a dictionary keyed by column name, so the overall result resembles a list of JSON records.
Or you can do this if you want each whole column as a list, with the column name serving as the key:
import csv
d = {}
with open("samp.csv", "r") as f:
    reader = csv.reader(f)
    for ri, r in enumerate(reader):
        if ri == 0:
            column_names = list(r)
        else:
            for ci, c in enumerate(r):
                curr_cname = column_names[ci]
                if curr_cname not in d:
                    d[curr_cname] = []
                d[curr_cname].append(c)
print(d)  # e.g. d = {'a': ['1', '5'], 'b': ['2', '6'], 'c': ['3', '7']}
Here is what my code looks like:
def read_dict(filename, key_column_index):
    """Read the contents of a CSV file into a compound
    dictionary and return the dictionary.
    Parameters
        filename: the name of the CSV file to read.
        key_column_index: the index of the column
            to use as the keys in the dictionary.
    Return: a compound dictionary that contains
        the contents of the CSV file.
    """
    # Create an empty dictionary that will store the data from the CSV file.
    csv_dict = {}
    with open(filename, "rt") as csv_file:
        # Use the csv module to create a reader object that will read from the
        # opened CSV file.
        reader = csv.reader(csv_file)
        # Skip the first row of data as it contains the header of each column
        next(reader)
        # Read the rows in the CSV file one row at a time.
        # The reader object returns each row as a list.
        for row_list in reader:
            # From the current row, retrieve the data from the column that contains
            # the key.
            key = row_list[key_column_index]
            # Store the data from the current row into the dictionary.
            csv_dict[key] = row_list[:key_column_index] + row_list[key_column_index + 1:]
    return csv_dict
I have a CSV file that looks like this:
I have a column called “Inventory”. I pulled its data from another source, which put it in a dictionary format, as you can see.
What I need to do is iterate through the 1000+ lines. If the keywords comforter, sheets, and pillow all exist in a row, write “bedding” to the “Location” column for that row; otherwise write “home-fashions”.
I have gotten as far as the if statement, which tells me whether a row goes into “bedding” or “home-fashions”. I just do not know how to write the corresponding result to the “Location” field for that line.
In my script I am printing just to see my results, but in the end I want to write to the same CSV file.
from csv import DictReader
with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for line in csv_dict_reader:
        if 'comforter' in line['Inventory'] and 'sheets' in line['Inventory'] and 'pillow' in line['Inventory']:
            print('Bedding')
            print(line['Inventory'])
        else:
            print('home-fashions')
            print(line['Inventory'])
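The last column of your CSV contains commas. If those values are not quoted, DictReader will split them into separate fields, so you cannot read the file reliably with it. You can parse each line yourself instead: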
import re
data = []
with open('test.csv', 'r') as f:
    # Get the header row
    header = next(f).strip().split(',')
    for line in f:
        # Parse 4 columns
        row = re.findall('([^,]*),([^,]*),([^,]*),(.*)', line)[0]
        # Create a dictionary of one row
        item = {header[0]: row[0], header[1]: row[1], header[2]: row[2],
                header[3]: row[3]}
        # Add each row to the list
        data.append(item)
After preparing your data, you can check with your conditions.
for item in data:
    if all([x in item['Inventory'] for x in ['comforter', 'sheets', 'pillow']]):
        item['Location'] = 'Bedding'
    else:
        item['Location'] = 'home-fashions'
Write output to a file.
import csv
# newline='' stops the csv module from writing blank lines between rows on Windows
with open('output.csv', 'w', newline='') as f:
    dict_writer = csv.DictWriter(f, data[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(data)
csv.DictReader yields each row as a dict, so just assign the new value to the column:
if 'comforter' in line['Inventory'] and ...:
    line['Location'] = 'Bedding'
else:
    line['Location'] = 'home-fashions'
print(line['Inventory'])
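Assigning to the row only changes it in memory, so the rows still have to be written back out. The following is a sketch of the full round trip under some assumptions: the "Item" column and the sample values are hypothetical (the real file is not shown), the Inventory values are assumed to be quoted in the CSV, and in-memory buffers stand in for the real file.

```python
import csv
import io

# Hypothetical sample; only "Inventory" and "Location" come from the question.
source = io.StringIO(
    'Item,Location,Inventory\n'
    '1,,"{""comforter"": 1, ""sheets"": 2, ""pillow"": 2}"\n'
    '2,,"{""towel"": 4}"\n'
)

keywords = ("comforter", "sheets", "pillow")

reader = csv.DictReader(source)
fieldnames = reader.fieldnames  # keep the original column order
rows = []
for line in reader:
    if all(word in line["Inventory"] for word in keywords):
        line["Location"] = "bedding"
    else:
        line["Location"] = "home-fashions"
    rows.append(line)

# Write the updated rows. With a real file, reopen it in "w" mode
# (with newline="") only after all rows have been read.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
```

Collecting the rows in a list first is what makes it safe to overwrite the same file afterwards; for very large files, writing to a temporary file and renaming it is a common alternative.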
I'm new to Python and am trying to build a simple CSV reader to create new trades off an existing instrument. Ideally, I'd like to build a dictionary to simplify the parameters required to set up a new trade: instead of using row[1], row[2], row[3], etc., I'd like to use my headers, which read Value Date, Trade Date, Price, Quantity, etc.
I've created dictionary keys below, but am having trouble linking them to my script to create the new trade. What should I put to substitute the rows? Any advice appreciated! Thanks...
Code below:
import acm
import csv
# Opening CSV file
with open('C:\Users\Yina.Huang\Desktop\export\TradeBooking.csv', 'rb') as f:
    reader = csv.DictReader(f, delimiter=',')
    next(reader, None)
    for row in reader:
        # Match column header with column number
        d = {
            row["Trade Time"],
            row["Value Day"],
            row["Acquire Day"],
            row["Instrument"],
            row["Price"],
            row["Quantity"],
            row["Counterparty"],
            row["Acquirer"],
            row["Trader"],
            row["Currency"],
            row["Portfolio"],
            row["Status"]
        }
        NewTrade = acm.FTrade()
        NewTrade.TradeTime = "8/11/2016 12:00:00 AM"
        NewTrade.ValueDay = "8/13/2016"
        NewTrade.AcquireDay = "8/13/2016"
        NewTrade.Instrument = acm.FInstrument[row["Instrument"]]
        NewTrade.Price = row[4]
        NewTrade.Quantity = row[5]
        NewTrade.Counterparty = acm.FParty[row[6]]
        NewTrade.Acquirer = acm.FParty[row[7]]
        NewTrade.Trader = acm.FUser[row[8]]
        NewTrade.Currency = acm.FCurrency[row[9]]
        NewTrade.Portfolio = acm.FPhysicalPortfolio[row[10]]
        NewTrade.Premium = (int(row[4])*int(row[5]))
        NewTrade.Status = row[11]
        print NewTrade
        NewTrade.Commit()
The csv module already provides this functionality with the csv.DictReader object.
# Raw string so the backslashes in the Windows path are taken literally
with open(r'C:\Users\Yina.Huang\Desktop\export\TradeBooking.csv', 'rb') as f:
    reader = csv.DictReader(f)
    for row in reader:
        NewTrade = acm.FTrade()
        NewTrade.TradeTime = row['Trade Time']
        NewTrade.ValueDay = row['Value Day']
        NewTrade.AcquireDay = row['Acquire Day']
        NewTrade.Instrument = acm.FInstrument[row['Instrument']]
        NewTrade.Price = row['Price']
        NewTrade.Quantity = row['Quantity']
        # etc
From the documentation:
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
fieldnames parameter. The fieldnames parameter is a sequence whose
elements are associated with the fields of the input data in order.
These elements become the keys of the resulting dictionary. If the
fieldnames parameter is omitted, the values in the first row of the
csvfile will be used as the fieldnames. If the row read has more
fields than the fieldnames sequence, the remaining data is added as a
sequence keyed by the value of restkey. If the row read has fewer
fields than the fieldnames sequence, the remaining keys take the value
of the optional restval parameter.
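The restkey and restval behaviour described in that passage can be seen in a short sketch. The sample data and the names "extra" and "missing" are made up for illustration.

```python
import csv
import io

# Header has two fields; the first data row is too long, the second too short.
sample = io.StringIO("a,b\n1,2,3,4\n5\n")

reader = csv.DictReader(sample, restkey="extra", restval="missing")
rows = [dict(row) for row in reader]

long_row, short_row = rows
```

Here `long_row` holds the surplus values as a list under "extra", and `short_row` has "missing" filled in for column b.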
I have a CSV file whose columns hold specific values that I read into specific places in a dictionary, and whose rows are separate instances of data, each equal to one full dictionary. I read this data in and then use it to compute certain values, process some of the inputs, etc., for each row before moving on to the next row. My question is: if I have a header that specifies the names of the columns (Key1 versus Key 3A, etc.), can I use that information to avoid the somewhat drawn-out code I am currently using (below)?
with open(input_file, 'rU') as controlFile:
    reader = csv.reader(controlFile)
    next(reader, None)  # skip the headers
    for row in reader:
        # Grabbing all the necessary inputs
        inputDict = {}
        inputDict["key1"] = row[0]
        inputDict["key2"] = row[1]
        inputDict["key3"] = {}
        inputDict["key3"].update({"A" : row[2]})
        inputDict["key3"].update({"B" : row[3]})
        inputDict["key3"].update({"C" : row[4]})
        inputDict["key3"].update({"D" : row[5]})
        inputDict["key3"].update({"E" : row[6]})
        inputDict["Key4"] = {}
        inputDict["Key4"].update({"F" : row[7]})
        inputDict["Key4"].update({"G" : float(row[8])})
        inputDict["Key4"].update({"H" : row[9]})
If you use a DictReader, you can improve your code a bit:
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
fieldnames parameter. The fieldnames parameter is a sequence whose
elements are associated with the fields of the input data in order.
These elements become the keys of the resulting dictionary. If the
fieldnames parameter is omitted, the values in the first row of the
csvfile will be used as the fieldnames.
So, if we utilize that:
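import csv
import string
results = []
# Pair each sub-key letter with the column position it comes from:
# A-E map to columns 2-6, F-H to columns 7-9.
mappings = [
    [(string.ascii_uppercase[i-2], i) for i in range(2, 7)],
    [(string.ascii_uppercase[i-2], i) for i in range(7, 10)]]
with open(input_file, 'rU') as control_file:
    reader = csv.DictReader(control_file)
    names = reader.fieldnames  # header names, in column order
    for row in reader:
        row_data = {}
        row_data['key1'] = row['key1']
        row_data['key2'] = row['key2']
        # DictReader rows are keyed by header name, so translate each
        # column position into its header name before the lookup.
        row_data['key3'] = {k: row[names[v]] for k, v in mappings[0]}
        row_data['key4'] = {k: row[names[v]] for k, v in mappings[1]}
        results.append(row_data)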
Yes, you can.
import csv
with open(infile, 'rU') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        print(row)
Take a look at this piece of code, assuming csv_data is a csv.reader object:
fields = next(csv_data)  # the header row
parsed_data = []
for row in csv_data:
    parsed_data.append(dict(zip(fields, row)))
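For a version that runs on its own, here is a sketch of the same zip technique with an in-memory sample; the column names and values are invented for illustration.

```python
import csv
import io

# Made-up sample data standing in for a real file.
sample = io.StringIO("name,price\nwidget,4.50\ngadget,7.25\n")

csv_data = csv.reader(sample)
fields = next(csv_data)  # header row: ['name', 'price']

# Pair each header with the value in the same position of every row.
parsed_data = [dict(zip(fields, row)) for row in csv_data]
```

This produces one dictionary per row, much like csv.DictReader does, except that zip silently drops any values beyond the number of header fields.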