Each line in my CSV file holds data on one pet, e.g. "Fish, Nemo, April 2nd, Goldfish, Orange". I would like to import that file and create a new object for each pet depending on its type (the first string in each line). For example, data about the fish would be stored in a Fish object. I then want to put each object into a list.
I've tried:
import csv
pets = []
with open('desktop/cs110/pets.csv', 'r') as file:
    csvReader = csv.reader(file, delimiter=',')
    for row_pets in csvReader:
        pets.append(row_pets)
columnNames = ['firstCol', 'secondCol', 'thirdColomn']
lstPets = []
for row_pets in pets:
    lstPets.append({key: value for key, value in zip(columnNames, row_pets)})
return lstPets  # this snippet sits inside a function whose definition is not shown
With csv.DictReader you can accomplish what your current code attempts by specifying fieldnames, assuming the "object" you want is a dictionary:
pets.csv
Fish,Nemo,April 2nd,Goldfish,Orange
Cat,Garfield,June 1st,Tabby,Orange
test.py
import csv
from pprint import pprint
with open('pets.csv', newline='') as file:
    reader = csv.DictReader(file, fieldnames='type name bday species color'.split())
    data = list(reader)
pprint(data)
Output
[{'bday': 'April 2nd',
'color': 'Orange',
'name': 'Nemo',
'species': 'Goldfish',
'type': 'Fish'},
{'bday': 'June 1st',
'color': 'Orange',
'name': 'Garfield',
'species': 'Tabby',
'type': 'Cat'}]
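If the goal is one object per pet, chosen by the first column, the rows can be dispatched through a dictionary that maps the type string to a class. This is only a minimal sketch: the `Pet`, `Fish` and `Cat` classes, their field names and the `load_pets` helper are all hypothetical (none of them come from the question), and it assumes the same five-column layout as pets.csv above.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Pet:
    # Hypothetical base class; the field names are assumptions.
    name: str
    bday: str
    species: str
    color: str


class Fish(Pet):
    pass


class Cat(Pet):
    pass


# Map the first CSV column to the class that should be built for it.
PET_CLASSES = {"Fish": Fish, "Cat": Cat}


def load_pets(lines):
    """Build one object per CSV row, chosen by the row's first field."""
    pets = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        kind, *fields = row
        cls = PET_CLASSES.get(kind, Pet)  # unknown types fall back to Pet
        pets.append(cls(*fields))
    return pets


# In-memory stand-in for pets.csv so the sketch runs on its own.
sample = io.StringIO(
    "Fish,Nemo,April 2nd,Goldfish,Orange\n"
    "Cat,Garfield,June 1st,Tabby,Orange\n"
)
pets = load_pets(sample)
print(pets)
```

To read the real file, pass an open file object instead of the StringIO, for example `with open('pets.csv', newline='') as f: pets = load_pets(f)`.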
Hi everyone, I have a question regarding an issue I am having.
Here is a sample input csv:
Alfa,Beta,Charlie,Delta,Echo,Foxtrot,Golf,Hotel,India,Juliett,Kilo
A,B1,C1,D1,E1,F1,G1,H1,I1,J1,
A,B2,C2,D2,E2,F2,G2,H2,I2,J2,1
B,B3,C3,D3,E3,F3,G3,H3,I3,J3,
B,B4,C4,D4,E4,F4,G4,H4,I4,J4,
This is a version of the code that I have:
import csv
fieldnames_dict = {
    'Alfa': 'Alfa_New',
    'Echo': 'Echo_New',
    'Foxtrot': 'Foxtrot_New_ALL',
    'Hotel': 'Hotel_New',
    'India': 'India_New',
    'Charlie': 'Charlie_New'
}
with open("book1.csv", "r", encoding="utf-8", errors='ignore') as csv_in:
    with open("xtest_file.csv", "w", encoding="utf-8", errors='ignore') as csv_out:
        reader = csv.DictReader(csv_in, delimiter=',', quotechar='"')
        writer = csv.DictWriter(csv_out, delimiter=',', quotechar='"',
                                fieldnames=list(fieldnames_dict.values()))
        writer.writeheader()
        for row_in in reader:
            row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
            writer.writerow(row_out)
What this code does is select, rename and rearrange the columns according to the dictionary.
However, I need to repeat some rows of the CSV file according to a second dictionary, and also replace the value in one column of each repeated row.
For example:
second_dictionary = {
    'A': '1ST,2ND,3RD',
    'B': '4TH',
}
This dictionary needs to be compared with the column "Alfa". When a row has the value 'A' there, the row is repeated three times, with 'A' replaced by '1ST', '2ND' and '3RD' in turn.
If the column "Alfa" has the value 'B', the row is written only once, with 'B' replaced by '4TH'.
What the output should look like:
Alfa_New,Echo_New,Foxtrot_New_ALL,Hotel_New,India_New,Charlie_New
1ST,E1,F1,H1,I1,C1
2ND,E1,F1,H1,I1,C1
3RD,E1,F1,H1,I1,C1
1ST,E2,F2,H2,I2,C2
2ND,E2,F2,H2,I2,C2
3RD,E2,F2,H2,I2,C2
4TH,E3,F3,H3,I3,C3
4TH,E4,F4,H4,I4,C4
For instance, the input file had this row:
Alfa,Beta,Charlie,Delta,Echo,Foxtrot,Golf,Hotel,India,Juliett,Kilo
A,B1,C1,D1,E1,F1,G1,H1,I1,J1,
With the second dictionary, the 'A' from Alfa needs to be replaced and the row written three times, once for each value listed under 'A':
Alfa_New,Echo_New,Foxtrot_New_ALL,Hotel_New,India_New,Charlie_New
1ST,E1,F1,H1,I1,C1
2ND,E1,F1,H1,I1,C1
3RD,E1,F1,H1,I1,C1
What will I need to do for that?
Maybe this could help:
import csv
fieldnames_dict = {
    'Alfa': 'Alfa_New',
    'Echo': 'Echo_New',
    'Foxtrot': 'Foxtrot_New_ALL',
    'Hotel': 'Hotel_New',
    'India': 'India_New',
    'Charlie': 'Charlie_New'
}
second_dictionary = {
    'A': '1ST,2ND,3RD',
    'B': '4TH',
}
with open("book1.csv", "r", encoding="utf-8", errors='ignore') as csv_in:
    with open("xtest_file.csv", "w", encoding="utf-8", errors='ignore') as csv_out:
        reader = csv.DictReader(csv_in, delimiter=',', quotechar='"')
        writer = csv.DictWriter(csv_out, delimiter=',', quotechar='"',
                                fieldnames=list(fieldnames_dict.values()))
        writer.writeheader()
        for row_in in reader:
            row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
            column_name = 'Alfa_New'
            column_key = row_out[column_name]
            if column_key in second_dictionary:
                # write the row once per replacement value
                for item in second_dictionary[column_key].split(','):
                    row_out[column_name] = item
                    writer.writerow(row_out)
            else:
                # no replacement defined: write the row unchanged
                writer.writerow(row_out)
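The same idea can be packaged as a generator, which avoids the duplicated writerow call. The following is only a sketch under a few assumptions: the replacement values are stored as lists rather than comma-joined strings, the names `replacements` and `expand_rows` are made up for illustration, and in-memory buffers stand in for book1.csv and xtest_file.csv so it runs on its own.

```python
import csv
import io

fieldnames_dict = {
    "Alfa": "Alfa_New",
    "Echo": "Echo_New",
    "Foxtrot": "Foxtrot_New_ALL",
    "Hotel": "Hotel_New",
    "India": "India_New",
    "Charlie": "Charlie_New",
}
# Assumption: replacement values are kept as lists, not comma-joined strings.
replacements = {"A": ["1ST", "2ND", "3RD"], "B": ["4TH"]}


def expand_rows(reader, key_column="Alfa_New"):
    """Yield renamed rows, repeating each row once per replacement value."""
    for row_in in reader:
        row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
        original = row_out[key_column]
        # Keys without a replacement are passed through unchanged.
        for value in replacements.get(original, [original]):
            yield {**row_out, key_column: value}


# In-memory stand-ins for the input and output files.
csv_in = io.StringIO(
    "Alfa,Beta,Charlie,Delta,Echo,Foxtrot,Golf,Hotel,India,Juliett,Kilo\n"
    "A,B1,C1,D1,E1,F1,G1,H1,I1,J1,\n"
    "B,B3,C3,D3,E3,F3,G3,H3,I3,J3,\n"
)
csv_out = io.StringIO()
writer = csv.DictWriter(
    csv_out, fieldnames=list(fieldnames_dict.values()), lineterminator="\n"
)
writer.writeheader()
writer.writerows(expand_rows(csv.DictReader(csv_in)))
print(csv_out.getvalue())
```

Each yielded row is a fresh dictionary, so the rows can also be collected into a list without later rows overwriting earlier ones.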
I'm working on cs50's pset6, DNA, and I want to read a csv file that looks like this:
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
And what I want to create is a nested dictionary, that would look like this:
data_dict = {
"Alice" : {
"AGATC" : 2,
"AATG" : 8,
"TATC" : 3
},
"Bob" : {
"AGATC" : 4,
"AATG" : 1,
"TATC" : 5
},
"Charlie" : {
"AGATC" : 3,
"AATG" : 2,
"TATC" : 5
}
}
So I want to use this:
with open(argv[1]) as data_file:
for i in data_file:
(or another variation) to loop through the CSV file and add each row to the dictionary, so that I have a database I can access later.
You should use Python's csv.DictReader class:
import csv
from sys import argv
data_dict = {}
with open(argv[1]) as data_file:
    reader = csv.DictReader(data_file)
    for record in reader:
        # `record` maps column name to value (a dict on Python 3.8+,
        # an OrderedDict on Python 3.6 and 3.7).
        # Instead of creating the data pair as below:
        #     name = record["name"]
        #     data = {
        #         "AGATC": record["AGATC"],
        #         "AATG": record["AATG"],
        #         "TATC": record["TATC"],
        #         ...
        #     }
        #     data_dict[name] = data
        # you can just delete the `name` column from `record`
        name = record["name"]
        del record["name"]
        data_dict[name] = record
print(data_dict)
Using a plain file read:
from sys import argv
with open(argv[1], 'r') as data_file:
    line = next(data_file)          # get the first line from file (i.e. header)
    hdr = line.rstrip().split(',')  # split the header string on commas into a list
    # ['name', 'AGATC', 'AATG', 'TATC']
    data_dic = {}
    for line in data_file:
        line = line.rstrip().split(',')
        # name and dictionary for current line
        data_dic[line[0]] = {k: v for k, v in zip(hdr[1:], line[1:])}
print(data_dic)
Output
{'Alice': {'AATG': '8', 'AGATC': '2', 'TATC': '3'},
'Bob': {'AATG': '1', 'AGATC': '4', 'TATC': '5'},
'Charlie': {'AATG': '2', 'AGATC': '3', 'TATC': '5'}}
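Note that both approaches above store the counts as strings ('8', not 8), whereas the dictionary in the question holds integers. A minimal sketch of the conversion, assuming every column other than name contains a whole number; the helper name `load_counts` is made up for illustration, and a StringIO stands in for the file passed as argv[1]:

```python
import csv
import io


def load_counts(lines, key="name"):
    """Map each row's key column to a dict of its remaining columns as ints."""
    result = {}
    for record in csv.DictReader(lines):
        name = record.pop(key)  # remove the key column and keep its value
        result[name] = {column: int(value) for column, value in record.items()}
    return result


# In-memory stand-in for the CSV file from the question.
sample = io.StringIO(
    "name,AGATC,AATG,TATC\n"
    "Alice,2,8,3\n"
    "Bob,4,1,5\n"
    "Charlie,3,2,5\n"
)
data_dict = load_counts(sample)
print(data_dict["Alice"])
```

With a real file, the call would look like `with open(argv[1], newline='') as f: data_dict = load_counts(f)`.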
I am trying to write code that takes JSON values from Kafka and writes them to a .csv file. The issue is that the nested grades object contains either science and math, or just english.
This is what the data looks like:
{'id': 0, 'name': 'Susan', 'lastName': 'Johnsan', 'grades': {'science': 78, 'math': 89}}
{'id': 1, 'name': 'Mary', 'lastName': 'Davids', 'grades': {'english': 85}}
However, when I run my code I keep getting the error TypeError: string indices must be integers.
from kafka import KafkaConsumer
import json
import csv
import sys
from datetime import datetime
import os
# connect to kafka topic
kaf = KafkaConsumer('students.all.events')
outputfile = 'C:\\Users\\Documents\\students_output.csv'
outfile = open(outputfile, mode='w', newline='')
master_key = ['id', 'name', 'lastName', 'science', 'math', 'english']
writer = csv.DictWriter(outfile, master_key, delimiter="|")
writer.writeheader()
'''
writer = csv.writer(outfile)
writer.writerow(['JSON_Data'])
'''
i = 1
for row in kaf:
    if i < 5000:
        json_row = json.loads(row.value)
        print('Row: ', i)
        print(json_row)
        dict = {'id': json_row['id'], 'name': json_row['name'], 'lastName': json_row['lastName']}
        for value in json_row['grades']:
            if value['science'] is not None:
                dict['science'] = value['science']
                dict['math'] = value['math']
            elif value['english'] is not None:
                dict['english'] = value['english']
        writer.writerow(dict)
        i += 1
    else:
        break
outfile.close()
Please check whether the value variable is actually of type dict. In general, this error means you are indexing a string object as if it were a dict (value[key]). Here, iterating over json_row['grades'] yields its keys, which are strings.
It looks like you have a typo - at least in the code that you pasted here. There is an extra double quote after the lastName key.
Based on the help @TenorFlyy gave me, I changed my code to fix the issue:
from kafka import KafkaConsumer
import json
import csv
import sys
from datetime import datetime
import os
# connect to kafka topic
kaf = KafkaConsumer('students.all.events')
outputfile = 'C:\\Users\\Documents\\students_output.csv'
outfile = open(outputfile, mode='w', newline='')
master_key = ['id', 'name', 'lastName', 'science', 'math', 'english']
writer = csv.DictWriter(outfile, master_key, delimiter="|")
writer.writeheader()
'''
writer = csv.writer(outfile)
writer.writerow(['JSON_Data'])
'''
i = 1
for row in kaf:
    if i < 5000:
        json_row = json.loads(row.value)
        print('Row: ', i)
        print(json_row)
        # named `record` rather than `dict` to avoid shadowing the built-in
        record = {'id': json_row['id'], 'name': json_row['name'], 'lastName': json_row['lastName']}
        # copy whichever grades are present; missing columns are left empty
        for key, value in json_row['grades'].items():
            record[key] = value
        writer.writerow(record)
        i += 1
    else:
        break
outfile.close()
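The flattening step can be tried without a Kafka connection. The sketch below rests on some assumptions: each message value is JSON text (written here with double quotes, unlike the printed dicts in the question), the helper name `flatten_student` is invented for illustration, and a StringIO stands in for the output file. It relies on the restval parameter of csv.DictWriter to fill the grade columns a student does not have.

```python
import csv
import io
import json

MASTER_KEY = ["id", "name", "lastName", "science", "math", "english"]


def flatten_student(message):
    """Merge the nested 'grades' object into the top-level record."""
    record = json.loads(message)
    grades = record.pop("grades", {})  # tolerate messages without grades
    return {**record, **grades}


# Stand-ins for the Kafka message payloads (assumption: JSON text).
messages = [
    '{"id": 0, "name": "Susan", "lastName": "Johnsan", '
    '"grades": {"science": 78, "math": 89}}',
    '{"id": 1, "name": "Mary", "lastName": "Davids", '
    '"grades": {"english": 85}}',
]

outfile = io.StringIO()
writer = csv.DictWriter(
    outfile, MASTER_KEY, delimiter="|", restval="", lineterminator="\n"
)
writer.writeheader()
for message in messages:
    writer.writerow(flatten_student(message))
print(outfile.getvalue())
```

A grade subject that is not listed in MASTER_KEY would make DictWriter raise a ValueError, because its extrasaction parameter defaults to 'raise'; passing extrasaction='ignore' drops such fields instead.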
I'm wondering if anyone has a sort of hacky / cool solution to this problem. I have a text file like so:
NAME:name
ID:id
PERSON:person
LOCATION:location
NAME:name
morenamestuff
ID:id
PERSON:person
LOCATION:location
JUNK
So I have some blocks whose lines can all be split into a dict, and some blocks with lines that cannot. How can I take lines without the : character and join them to the previous line? Here's what I'm currently doing:
# loop through chunk
# the first element of dat is a Title, so skip that
key_map = dict(x.split(':') for x in dat[1:])
But I of course get an error because the second chunk has a line without the : character. So I wanted my dict to look something like this after correctly splitting it:
# there will be a key_map for each chunk of data
key_map['NAME'] == 'name morenamestuff' # 3rd line appended to previous
key_map['ID'] == 'id'
key_map['PERSON'] == 'person'
key_map['LOCATION'] == 'location'
Solution
EDIT: Here's my final solution on github, and the full code here:
parseScript.py
import re
bad_chars = '(){}"<>[] '  # characters we want to strip from the string
key_map = []
# parse file
with open("dat.txt") as f:
    data = f.read()
data = data.strip('\n')
data = re.split(r'}|\[{', data)
# format file
with open("format.dat") as f:
    formatData = [x.strip('\n') for x in f.readlines()]
data = list(filter(len, data))
# strip and split each station
for dat in data[1:-1]:
    # delete every character in bad_chars, then split the station into fields
    dat = dat.translate(str.maketrans("", "", bad_chars)).split(',')
    key_map.append(dict(x.split(':') for x in dat if ':' in x))
    # assumed intent: a second field without ':' is a continuation of NAME
    if ':' not in dat[1]:
        key_map[-1]['NAME'] += dat[1]
for station in range(len(key_map)):
    for opt in formatData:
        print(opt, ":", key_map[station][opt])
    print("")
format.dat
NAME
STID
LONGITUDE
LATITUDE
ELEVATION
STATE
ID
When in doubt, write your own generator.
Add in itertools.groupby to chunk by groups of text delimited by whitespace breaks.
def chunker(s):
    it = iter(s)
    out = [next(it)]
    for line in it:
        if ':' in line or not line:
            yield ' '.join(out)
            out = []
        out.append(line)
    if out:
        yield ' '.join(out)
usage:
from itertools import groupby
[dict(x.split(':') for x in g) for k,g in groupby(chunker(lines), bool) if k]
Out[65]:
[{'ID': 'id', 'LOCATION': 'location', 'NAME': 'name', 'PERSON': 'person'},
{'ID': 'id',
'LOCATION': 'location',
'NAME': 'name morenamestuff',
'PERSON': 'person'}]
(if those fields are always the same, I'd go with something like creating some namedtuples instead of a bunch of dicts)
from collections import namedtuple
Thing = namedtuple('Thing', 'ID LOCATION NAME PERSON')
[Thing(**dict(x.split(':') for x in g)) for k,g in groupby(chunker(lines), bool) if k]
Out[76]:
[Thing(ID='id', LOCATION='location', NAME='name', PERSON='person'),
Thing(ID='id', LOCATION='location', NAME='name morenamestuff', PERSON='person')]
Here is something that addresses all your requirements. It handles joining of multiple lines, ignoring blank lines, and ignoring junk lines that do not appear within a block. It is implemented as a generator that yields each dictionary as it is completed.
def parser(data):
    d = {}
    for line in data:
        line = line.strip()
        if not line:
            if d:
                yield d
            d = {}
        else:
            if ':' in line:
                key, value = line.split(':')
                d[key] = value
            else:
                if d:
                    d[key] = '{} {}'.format(d[key], line)
    if d:
        yield d
When run with this data:
ignore me
NAME:name1
ID:id1
PERSON:person1
LOCATION:location1
NAME:name2
morenamestuff
ID:id2
PERSON:person2
LOCATION:location2
junk
and
other
stuff
NAME:name3
morenamestuff
and more
ID:id3
PERSON:person3
more person stuff
LOCATION:location3
JUNK
MORE JUNK
>>> for d in parser(open('data')):
...     print(d)
{'PERSON': 'person1', 'LOCATION': 'location1', 'NAME': 'name1', 'ID': 'id1'}
{'PERSON': 'person2', 'LOCATION': 'location2', 'NAME': 'name2 morenamestuff', 'ID': 'id2'}
{'PERSON': 'person3 more person stuff', 'LOCATION': 'location3', 'NAME': 'name3 morenamestuff and more', 'ID': 'id3'}
You can grab the lot as a list:
>>> results = list(parser(open('data')))
>>> results
[{'PERSON': 'person1', 'LOCATION': 'location1', 'NAME': 'name1', 'ID': 'id1'}, {'PERSON': 'person2', 'LOCATION': 'location2', 'NAME': 'name2 morenamestuff', 'ID': 'id2'}, {'PERSON': 'person3 more person stuff', 'LOCATION': 'location3', 'NAME': 'name3 morenamestuff and more', 'ID': 'id3'}]
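One caveat with the generator above: `key, value = line.split(':')` raises a ValueError as soon as a value itself contains a colon (a time such as 12:30, for instance). The variant below is a sketch of the same approach using str.partition, which splits on the first colon only. The function name `parse_blocks` and the TIME field in the sample data are made up for illustration.

```python
def parse_blocks(lines):
    """Yield one dict per blank-line-delimited block of KEY:value lines."""
    block, key = {}, None
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current block, if there is one.
            if block:
                yield block
            block, key = {}, None
        elif ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            block[key] = value
        elif key is not None:
            # Continuation line: append to the most recent key in this block.
            block[key] += " " + line
        # Lines without ':' that appear before any key are ignored as junk.
    if block:
        yield block


sample = [
    "NAME:name",
    "morenamestuff",
    "TIME:12:30",  # hypothetical field whose value contains a colon
    "",
    "JUNK",
    "",
    "ID:id",
]
blocks = list(parse_blocks(sample))
print(blocks)
```

Because the function accepts any iterable of lines, an open file can be passed in directly, as with parser above.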
I don't find itertools or regex particularly nice to work with, so here's a pure-Python solution:
separator = ':'
output = []
chunk = None
with open('/tmp/stuff.txt') as f:
    for line in (x.strip() for x in f):
        if not line:
            # we are between 'chunks'
            chunk, key = None, None
            continue
        if chunk is None:
            # we are at the beginning of a new 'chunk'
            chunk, key = {}, None
            output.append(chunk)
        if separator in line:
            key, val = line.split(separator)
            chunk[key] = val
        elif key is not None:
            # continuation line: join it to the previous value with a space
            chunk[key] += ' ' + line
Not as elegant as you requested, but this works:
dat = [['NAME:name',
        'ID:id',
        'PERSON:person',
        'LOCATION:location'],
       ['NAME:name',
        'morenamestuff',
        'ID:id',
        'PERSON:person',
        'LOCATION:location']]
k = 1
key_map = dict(x.split(':') for x in dat[k] if ':' in x)
if ':' not in dat[k][1]:
    key_map['NAME'] += dat[k][1]
# key_map is now:
{'ID': 'id',
 'LOCATION': 'location',
 'NAME': 'namemorenamestuff',
 'PERSON': 'person'}
Just append a placeholder value to lines that have no ":":
if line.find(':') == -1:
    line = line + ':None'
Then you won't get an error.