Create Multiple Python dictionaries from CSV using same column as key - python

I have a csv:
Col1, Col2, Col3, ...
10, 0.024, 0.0012, ...
20, 0.0013, 0.43, ...
I want a list of dictionaries like so
[{"Col1":"Col2"}, {"Col1": "Col3"},...]
with Col1 always as the key for each dictionary
I've tried this and it works for the first dictionary, but produces empty
dictionaries for all the others.
import os, csv
path = r"I:\ARC\WIP\KevinWIP\Risk\Data\PythonGui"
os.chdir(path)
with open('DispersalKernal10m.csv', mode = 'r') as infile:
    reader = csv.reader(infile)
    DistProb_LUT = [
        {rows[0]:rows[1] for rows in reader},
        {rows[0]:rows[2] for rows in reader},
        {rows[0]:rows[3] for rows in reader},
        {rows[0]:rows[4] for rows in reader},
        {rows[0]:rows[5] for rows in reader},
        {rows[0]:rows[6] for rows in reader},
        {rows[0]:rows[7] for rows in reader}]
    infile.close()
print(DistProb_LUT)
Searched around and everything I tried didn't work. Any suggestions appreciated.

Building the first dictionary already iterates through the entire file and reaches the end, so for every later comprehension the reader is exhausted and the resulting dictionaries are empty. Instead of one dictionary comprehension per column, loop over the rows once and fill all the dictionaries inside that loop, like below:
import os, csv
path = r'I:\ARC\WIP\KevinWIP\Risk\Data\PythonGui'
os.chdir(path)
DistProb_LUT = [{} for _ in range(7)]
with open('DispersalKernal10m.csv', mode = 'r') as infile:
    reader = csv.reader(infile)
    for rows in reader:
        for i in range(7):
            DistProb_LUT[i][rows[0]] = rows[i+1]
You also do not need to close infile yourself; the with statement closes it automatically when the block ends.
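To see why only the first dictionary gets filled, here is a minimal sketch that swaps the real file for an in-memory CSV. The `sample` text is made up to mirror the question's layout (without a header row), and the names `first` and `second` are only for illustration. A csv.reader is a one-shot iterator, so the second comprehension has nothing left to read.

```python
import csv
import io

# Hypothetical sample data standing in for DispersalKernal10m.csv
sample = "10,0.024,0.0012\n20,0.0013,0.43\n"

reader = csv.reader(io.StringIO(sample))
first = {row[0]: row[1] for row in reader}   # consumes every row
second = {row[0]: row[2] for row in reader}  # reader is already exhausted

print(first)   # {'10': '0.024', '20': '0.0013'}
print(second)  # {}
```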

Reading a file is typically not an operation that you can repeat many times in a row without reopening (or rewinding) the file: once the reader has been iterated to the end, it yields nothing more. Therefore, something like this may be of use to you:
import os, csv
path = r'I:\ARC\WIP\KevinWIP\Risk\Data\PythonGui'
os.chdir(path)
DistProb_LUT = [{} for i in range(7)]
with open('DispersalKernal10m.csv', mode = 'r') as infile:
    reader = csv.reader(infile)
    for row in reader:
        DistProb_LUT[0][row[0]] = row[1]
        DistProb_LUT[1][row[0]] = row[2]
        DistProb_LUT[2][row[0]] = row[3]
        DistProb_LUT[3][row[0]] = row[4]
        DistProb_LUT[4][row[0]] = row[5]
        DistProb_LUT[5][row[0]] = row[6]
        DistProb_LUT[6][row[0]] = row[7]
print(DistProb_LUT)
Also, you don't need the infile.close() line. The with statement takes care of that for you.
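If the number of value columns is not fixed at seven, the same single-pass idea can be written without hardcoding the count. The following is only a sketch under assumptions: the in-memory `sample` is invented, its first line is assumed to be a header row, and the values are converted to float (drop the `float()` call to keep them as strings). The name `lookups` is hypothetical.

```python
import csv
import io

# Hypothetical sample data; the first line is assumed to be a header row
sample = "Col1,Col2,Col3\n10,0.024,0.0012\n20,0.0013,0.43\n"

reader = csv.reader(io.StringIO(sample))
header = next(reader)                 # skip the header, but use it to count columns
lookups = [{} for _ in header[1:]]    # one dict per value column

for row in reader:
    key, values = row[0], row[1:]
    for lookup, value in zip(lookups, values):
        lookup[key] = float(value)    # Col1 is the key in every dict

print(lookups)
```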

Related

python csv file add to field based off another field

I have a csv file that looks like this:
I have a column called “Inventory”. I pulled the data in that column from another source, which put it in a dictionary format as you see.
What I need to do is iterate through the 1000+ lines and, if the keywords comforter, sheets and pillow all exist, write “bedding” to the “Location” column for that row; otherwise write “home-fashions”.
I have been able to get as far as the if statement telling me whether a row goes into bedding or “home-fashions”. I just do not know how to write the corresponding result to the “Location” field for that line.
In my script, I'm printing just to see my results, but in the end I just want to write to the same CSV file.
from csv import DictReader
with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for line in csv_dict_reader:
        if 'comforter' in line['Inventory'] and 'sheets' in line['Inventory'] and 'pillow' in line['Inventory']:
            print('Bedding')
            print(line['Inventory'])
        else:
            print('home-fashions')
            print(line['Inventory'])
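The last column of your csv contains commas. If that field is not quoted, DictReader will split it at every comma, so you cannot read it reliably that way. You can parse the lines yourself instead: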
import re
data = []
with open('test.csv', 'r') as f:
    # Get the header row
    header = next(f).strip().split(',')
    for line in f:
        # Parse 4 columns
        row = re.findall('([^,]*),([^,]*),([^,]*),(.*)', line)[0]
        # Create a dictionary of one row
        item = {header[0]: row[0], header[1]: row[1], header[2]: row[2],
                header[3]: row[3]}
        # Add each row to the list
        data.append(item)
After preparing your data, you can check with your conditions.
for item in data:
    if all([x in item['Inventory'] for x in ['comforter', 'sheets', 'pillow']]):
        item['Location'] = 'Bedding'
    else:
        item['Location'] = 'home-fashions'
Write output to a file.
import csv
# newline='' stops the csv module from writing blank lines between rows on Windows
with open('output.csv', 'w', newline='') as f:
    dict_writer = csv.DictWriter(f, data[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(data)
csv.DictReader yields each row as a dict, so just assign the new value to the column:
if 'comforter' in line['Inventory'] and ...:
    line['Location'] = 'Bedding'
else:
    line['Location'] = 'home-fashions'
print(line['Inventory'])
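Putting the reading, tagging and writing steps together, a minimal end-to-end sketch could look like this. It is written under assumptions, since the real file is not shown: the `sample` text and the `Item` column are invented, and the Inventory field is assumed to be quoted so that its embedded commas are not treated as delimiters. An `io.StringIO` object stands in for the output file.

```python
import csv
import io

# Hypothetical sample; the Inventory field is quoted, so its commas are safe
sample = '''Item,Inventory,Location
A1,"{'comforter': 1, 'sheets': 2, 'pillow': 2}",
B2,"{'curtains': 4}",
'''

keywords = ("comforter", "sheets", "pillow")

reader = csv.DictReader(io.StringIO(sample))
rows = []
for row in reader:
    # A row counts as bedding only if all three keywords appear
    is_bedding = all(word in row["Inventory"] for word in keywords)
    row["Location"] = "bedding" if is_bedding else "home-fashions"
    rows.append(row)

out = io.StringIO()  # stand-in for open('output.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

To end up with the changes in the original file, one common approach is to write to a new file first and replace the original only after the write succeeds, rather than reading and writing the same file at once.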

how to select a specific column of a csv file in python

I am a Python beginner and would like your opinion.
I wrote this code, which reads the only column of a file on my PC and puts it in a list.
I have difficulty understanding how I could modify the same code for a file that has multiple columns and select only the column I am interested in.
Can you help me?
list = []
with open(r'C:\Users\Desktop\mydoc.csv') as file:
    for line in file:
        item = int(line)
        list.append(item)
results = []
for i in range(0,1086):
    a = list[i-1]
    b = list[i]
    c = list[i+1]
    results.append(b)
print(results)
You can use pandas.read_csv() method very simply like this:
import pandas as pd
my_data_frame = pd.read_csv('path/to/your/data')
results = my_data_frame['name_of_your_wanted_column'].values.tolist()
A useful module for the kind of work you are doing is the imaginatively named csv module.
Many csv files have a "header" at the top, this by convention is a useful way of labeling the columns of your file. Assuming you can insert a line at the top of your csv file with comma delimited fieldnames, then you could replace your program with something like:
import csv
with open(r'C:\Users\Desktop\mydoc.csv') as myfile:
    csv_reader = csv.DictReader(myfile)
    for row in csv_reader:
        print(row['column_name_of_interest'])
The above will print to the terminal all the values that match your specific 'column_name_of_interest' after you edit it to match your particular file.
It's normal to work with lots of columns at once, so that dictionary method of packing a whole row into a single object, addressable by column-name can be very convenient later on.
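To collect the chosen column into a list of integers, as the question's code does for a single-column file, a minimal sketch might look like the following. The in-memory `sample` and the column name `count` are made up for illustration; with a real file, `open(...)` would replace `io.StringIO(sample)`.

```python
import csv
import io

# Hypothetical file contents; "count" stands in for the column of interest
sample = "name,count\na,3\nb,5\nc,8\n"

with io.StringIO(sample) as myfile:
    # DictReader uses the first line as field names, so rows are addressed by name
    values = [int(row["count"]) for row in csv.DictReader(myfile)]

print(values)  # [3, 5, 8]
```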
For a pure Python implementation, you should use the csv module.
data.csv
Project1,folder1/file1,data
Project1,folder1/file2,data
Project1,folder1/file3,data
Project1,folder1/file4,data
Project1,folder2/file11,data
Project1,folder2/file42a,data
Project1,folder2/file42b,data
Project1,folder2/file42c,data
Project1,folder2/file42d,data
Project1,folder3/filec,data
Project1,folder3/fileb,data
Project1,folder3/filea,data
Your Python program should read it line by line:
import csv
a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        print(row)
        # ['Project1', 'folder1/file1', 'data']
If you print the row element you will see it is a list like this:
['Project1', 'folder1/file1', 'data']
To collect all the elements of column 1 (the second column, since indexing starts at 0) in my list, I append that element inside the loop:
a.append(row[1])
Now list a will look like:
['folder1/file1', 'folder1/file2', 'folder1/file3', 'folder1/file4', 'folder2/file11', 'folder2/file42a', 'folder2/file42b', 'folder2/file42c', 'folder2/file42d', 'folder3/filec', 'folder3/fileb', 'folder3/filea']
Here is the complete code:
import csv
a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        a.append(row[1])

Building list of dicts from csv using Python

I have a CSV file containing three columns and many rows. All the data is Strings. I am trying to read the CSV one line at a time and convert each row to a Dict which I then want to append to a list so I have a List of Dicts. The environment is AWS Lambda and the CSV comes from an S3 bucket.
My code:
csv_object = s3.Object('MyBucket', 'My.csv')
csv_file = csv_object.get()['Body'].read().decode('utf-8')
f = StringIO(csv_file)
reader = csv.reader(f, delimiter=',')
list_of_json = []
mydevice = {}
for row in reader:
    mydevice["device"] = row[0]
    mydevice["serial"] = row[1]
    mydevice["software"] = row[2]
    list_of_json.append(mydevice)
The software runs (ie, doesn't error), but it doesn't produce the desired result. If I print(list_of_json) after the for loop completes, I want it to produce this;
[{"device":"Dev1", "serial":"Ser1", "software":"software1"},{.....}]
But what it actually produces is just an empty list... as if the append statement doesn't even exist;
[]
The CSV reading and for row in reader: parts all seem to work fine. If I do a print(mydevice) inside the for loop I can see it working its way through all the devices successfully, but for reasons I can't fathom, the append statement never seems to append anything to the list_of_json list.
Why not just use csv.DictReader?
csv_object = s3.Object('MyBucket', 'My.csv')
csv_file = csv_object.get()['Body'].read().decode('utf-8')
f = StringIO(csv_file)
reader = csv.DictReader(f, ('device', 'serial', 'software'))
list_of_json = [dict(device) for device in reader]
#also don't forget to
f.close() #or use contextlib.closing
You need to create a new dictionary inside the loop:
csv_object = s3.Object('MyBucket', 'My.csv')
csv_file = csv_object.get()['Body'].read().decode('utf-8')
f = StringIO(csv_file)
reader = csv.reader(f, delimiter=',')
list_of_json = []
for row in reader:
    mydevice = {}
    mydevice["device"] = row[0]
    mydevice["serial"] = row[1]
    mydevice["software"] = row[2]
    list_of_json.append(mydevice)
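The reason a fresh dictionary is needed per row is that `append` stores a reference, not a copy. The sketch below uses made-up rows (no S3 access) to contrast reusing one dictionary with creating a new one each time; the names `shared`, `reused` and `fresh` are only for illustration. Note that this aliasing leads to repeated identical entries; it does not by itself explain a completely empty list.

```python
# Hypothetical rows standing in for what csv.reader would yield
rows = [["Dev1", "Ser1", "software1"], ["Dev2", "Ser2", "software2"]]

shared = {}
reused = []
for row in rows:
    shared["device"] = row[0]
    reused.append(shared)             # same dict object appended every time

fresh = []
for row in rows:
    fresh.append({"device": row[0]})  # a new dict per row

print([d["device"] for d in reused])  # ['Dev2', 'Dev2']
print([d["device"] for d in fresh])   # ['Dev1', 'Dev2']
```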

Replace column in csv with modified column

I have a csv file with a couple of columns and a header spanning 4 rows. The first column contains the timestamp. Unfortunately it also includes milliseconds, but whenever those are 00 they are omitted from the file. It looks like this:
"TOA5","CR1000","CR1000","E9048"
"TIMESTAMP","RECORD","BattV_Avg","PTemp_C_Avg"
"TS","RN","Volts","Deg C"
"","","Avg","Avg"
"2015-08-28 12:40:23.51",1,12.91,32.13
"2015-08-28 12:50:43.23",2,12.9,32.34
"2015-08-28 13:12:22",3,12.91,32.54
As I don't need the milliseconds, I want to get rid of those, as this makes further calculations containing time a bit complicated. My approach so far:
Extract first 20 digits in each row to get a format such as 2015-08-28 12:40:23
timestamp = []
with open(filepath) as f:
    for _ in xrange(4): #skip 4 header rows
        next(f)
    for line in f:
        time = line[1:20] #Get values for the current line
        timestamp.append(time) #Add values to list
From here on I'm struggling with how to proceed. I want to replace the first column in the csv file with the newly created timestamp list.
I tried creating a dictionary, but I don't know how to use the header caption in row 2 as the key:
d = {}
with open(filepath, 'rb') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for col in csv_reader:
        #use header info from row 2 as key here
This would import the whole csv file into a dict and I'd then change the TIMESTAMP entry in the dict with the timestamp list above. Is this even possible?
Or is there an easier approach on how to just change the first column in the csv with my new list so that my csv file in the end contains the timestamp just without the millisecond information?
So the first column in my csv should look like this:
"TOA5"
"TIMESTAMP"
"TS"
""
2015-08-28 12:40:23
2015-08-28 12:50:43
2015-08-28 13:12:22
This should do it and preserve the quoting:
with open(filepath1, 'rb') as fin, open(filepath2, 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout, quoting=csv.QUOTE_NONNUMERIC)
    for _ in xrange(4): # copy first 4 header rows
        writer.writerow(next(reader))
    for row in reader: # process data lines
        row[0] = row[0][:19] # strip fractional seconds from first column
        writer.writerow([row[0], int(row[1])] + map(float, row[2:]))
Since a csv.reader returns the columns of each row as a list of strings, it's necessary to convert any which contain numeric values into their actual int or float numeric value before they're written out to prevent them from being quoted.
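The code in the question and answers targets Python 2 (`xrange`, binary file modes, `map` returning a list). For Python 3, a sketch of the same approach could look like this. It runs on an in-memory copy of part of the question's sample data; with real files you would open both in text mode with `newline=''` instead of using `io.StringIO`. The `lineterminator="\n"` argument is only there to keep the in-memory output easy to inspect.

```python
import csv
import io

# In-memory copy of part of the sample data from the question
sample = '''"TOA5","CR1000","CR1000","E9048"
"TIMESTAMP","RECORD","BattV_Avg","PTemp_C_Avg"
"TS","RN","Volts","Deg C"
"","","Avg","Avg"
"2015-08-28 12:40:23.51",1,12.91,32.13
"2015-08-28 13:12:22",3,12.91,32.54
'''

fin = io.StringIO(sample)
fout = io.StringIO()
reader = csv.reader(fin)
writer = csv.writer(fout, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")

for _ in range(4):                    # copy the 4 header rows unchanged
    writer.writerow(next(reader))
for row in reader:                    # process the data rows
    row[0] = row[0][:19]              # keep only YYYY-MM-DD HH:MM:SS
    # convert numeric fields so QUOTE_NONNUMERIC leaves them unquoted
    writer.writerow([row[0], int(row[1])] + [float(x) for x in row[2:]])

result = fout.getvalue()
print(result)
```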
I believe you can easily create a new csv from iterating over the original csv and replacing the timestamp as you want.
Example -
with open(filepath, 'rb') as csv_file, open('<new file>','wb') as outfile:
    csv_reader = csv.reader(csv_file, delimiter=',')
    csv_writer = csv.writer(outfile, delimiter=',')
    for i, row in enumerate(csv_reader): #Enumerating as we only need to change rows after 3rd index.
        if i <= 3:
            csv_writer.writerow(row)
        else:
            #csv.reader has already removed the quotes, so slice from index 0
            csv_writer.writerow([row[0][:19]] + row[1:])
I'm not entirely sure about how to parse your csv but I would do something of the sort:
time = time.split(".")[0]
so if it does have a millisecond it would get removed and if it doesn't nothing will happen.
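As a small illustration of the `split(".")` idea, the sketch below trims two timestamps taken from the question's sample data and then parses them with `datetime.strptime`, which is one possible way to do the later time calculations. The variable names are hypothetical.

```python
from datetime import datetime

# Two timestamps from the question's sample, with and without a fraction
stamps = ["2015-08-28 12:40:23.51", "2015-08-28 13:12:22"]

# split(".")[0] keeps everything before the first dot, or the whole string
trimmed = [stamp.split(".")[0] for stamp in stamps]
parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in trimmed]

print(trimmed)  # ['2015-08-28 12:40:23', '2015-08-28 13:12:22']
print((parsed[1] - parsed[0]).total_seconds())  # 1919.0
```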

Python: General CSV file parsing and manipulation

The purpose of my Python script is to compare the data present in multiple CSV files, looking for discrepancies. The data are ordered, but the ordering differs between files. The files contain about 70K lines, weighing around 15MB. Nothing fancy or hardcore here. Here's part of the code:
def getCSV(fpath):
    with open(fpath,"rb") as f:
        csvfile = csv.reader(f)
        for row in csvfile:
            allRows.append(row)
    allCols = map(list, zip(*allRows))
Am I properly reading from my CSV files? I'm using csv.reader, but would I benefit from using csv.DictReader?
How can I create a list containing whole rows which have a certain value in a precise column?
Are you sure you want to be keeping all rows around? This creates a list with matching values only... fname could also come from glob.glob() or os.listdir() or whatever other data source you so choose. Just to note, you mention the 20th column, but row[20] will be the 21st column...
import csv
matching20 = []
for fname in ('file1.csv', 'file2.csv', 'file3.csv'):
    with open(fname) as fin:
        csvin = csv.reader(fin)
        next(csvin) # <--- if you want to skip header row
        for row in csvin:
            if row[20] == 'value':
                matching20.append(row) # or do something with it here
You only want csv.DictReader if you have a header row and want to access your columns by name.
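For comparison, here is a minimal sketch of the same filtering done with csv.DictReader, where the column is selected by name instead of by position. The in-memory `sample` and the column names `id`, `status` and `amount` are invented for illustration; the real files would need a header row for this to apply.

```python
import csv
import io

# Hypothetical data with a header row
sample = "id,status,amount\n1,ok,10\n2,bad,20\n3,ok,30\n"

reader = csv.DictReader(io.StringIO(sample))
# Keep whole rows whose "status" column equals the wanted value
matching = [row for row in reader if row["status"] == "ok"]

print([row["id"] for row in matching])  # ['1', '3']
```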
This should work, you don't need to make another list to have access to the columns.
import csv
def getCSV(fpath):
    with open(fpath) as ifile:
        csvfile = csv.reader(ifile)
        rows = list(csvfile)
    # keep only the rows whose column at index 20 equals 'value'
    value_20 = [x for x in rows if x[20] == 'value']
    return value_20
If I understand the question correctly, you want to include a row if value is in the row, but you don't know which column value is, correct?
If your rows are lists, then this should work:
testlist = [row for row in allRows if 'value' in row]
post-edit:
If, as you say, you want a list of rows where value is in a specified column (specified by an integer pos), then:
testlist = []
pos = 20
for row in allRows:
    if row[pos] == 'value':
        testlist.append(row)
(I haven't tested this, but let me know if that works).
