I am having trouble parsing a specific key from a JSON string stored in a table. Below is my code, which reads a CSV file and extracts "employee_id" from the JSON column in each row:
import csv
import json

with open('data.csv') as csvFile:
    csv_reader = csv.reader(csvFile, delimiter=',')
    next(csv_reader, None)  # skips the header row
    for row in csv_reader:
        event_data = row[4]
        data = json.loads(event_data)
        print(data['employee_id'])
Here is a sample event_data output:
"{\"py/object\": \"employee_information.event_types.EmployeeCreated\", \"employee_id\": \"98765\", \"employee_first_name\": \"Jonathan\", \"employee_last_name\": \"Smith\", \"application_id\": \"1234\", \"address\": \"1234 street\"}"
But I get an error that says:
Traceback (most recent call last):
File "/Users/user/Documents/python_test/main.py", line 14, in <module>
print(data['employee_id'])
TypeError: string indices must be integers
I checked the type of data and it is a str. I thought json.loads was supposed to convert the JSON string into a Python dict?
event_data is doubly-encoded for some reason, so you need to decode it twice.
data = json.loads(json.loads(event_data))
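To see why two passes are needed: the first loads() only unwraps the outer JSON string, and the value it returns is itself JSON text. A small illustrative sketch (the comments show roughly what each step yields for your sample row):

raw = row[4]                 # the escaped string from the CSV column
once = json.loads(raw)       # still a str: '{"py/object": ..., "employee_id": "98765", ...}'
twice = json.loads(once)     # now a dict
print(twice['employee_id'])  # 98765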
Related
I want to read only the first column from a CSV file. I tried the code below, but it didn't give the result I expected.
data = open('data.csv')
reader = csv.reader(data)
interestingrows = [i[1] for i in reader]
The error I got is:
Traceback (most recent call last):
File "G:/Setups/Python/pnn-3.py", line 12, in <module>
interestingrows = [i[1] for i in reader]
File "G:/Setups/Python/pnn-3.py", line 12, in <listcomp>
interestingrows = [i[1] for i in reader]
IndexError: list index out of range
You can also use DictReader to access columns by their header names. For example, if you had a file called "stackoverflow.csv" with the headers "Oopsy", "Daisy", "Rough", and "Tumble", you could access the first column with this script:
import csv

with open('stackoverflow.csv') as csvFile:
    # Works if the file is in the same folder,
    # otherwise include the full path
    reader = csv.DictReader(csvFile)
    for row in reader:
        print(row["Oopsy"])
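If you don't want to hard-code the header name, DictReader also exposes the parsed header row as reader.fieldnames, so the first column can be looked up by position (a small sketch, not part of the original answer):

import csv

with open('stackoverflow.csv') as csvFile:
    reader = csv.DictReader(csvFile)
    first_header = reader.fieldnames[0]  # name of the first column, e.g. "Oopsy"
    for row in reader:
        print(row[first_header])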
If you want the first item from an indexable sequence, you should use 0 as the index. In this case, though, you can simply use zip(*reader) to transpose the rows into columns; since zip() gives you an iterator of columns, next() returns the first one.
with open('data.csv') as data:
    reader = csv.reader(data)
    first_column = next(zip(*reader))
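Note that zip(*reader) loads every column into memory at once, and an empty row makes zip() stop immediately and produce no columns at all. If the file is large, or if it contains blank or short lines (which is likely what raises the IndexError above), a plain comprehension over the rows is a safer sketch:

import csv

with open('data.csv') as data:
    reader = csv.reader(data)
    # row[0] is the first column; the "if row" guard skips blank lines
    first_column = [row[0] for row in reader if row]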
I am 99% of the way there...
def xl_to_csv(xl_file):
    wb = xlrd.open_workbook(xl_file)
    sh = wb.sheet_by_index(0)
    output = 'output.csv'
    op = open(output, 'wb')
    wr = csv.writer(op, quoting=csv.QUOTE_ALL)
    for rownum in range(sh.nrows):
        part_number = sh.cell(rownum, 1)
        #wr.writerow(sh.row_values(rownum))  # writes the entire row
        wr.writerow(part_number)
    op.close()
Using wr.writerow(sh.row_values(rownum)) I can write the entire row from the Excel file to a CSV, but there are about 150 columns and I only want one of them. So I'm grabbing the one column I want with part_number = sh.cell(rownum, 1), but I can't seem to get the syntax right to write just this variable out to a CSV file.
Here's the traceback:
Traceback (most recent call last):
File "test.py", line 61, in <module>
xl_to_csv(latest_file)
File "test.py", line 32, in xl_to_csv
wr.writerow(part_number)
_csv.Error: sequence expected
Try this:
wr.writerow([part_number.value])
The argument to writerow() must be a sequence (a list-like object); sh.cell() returns a single Cell object, so wrap its .value in a list.
The quickest fix is to wrap your part_number in a list (and, as per Abdou, you need to add .value to get the value out of the cell):
for rownum in range(sh.nrows):
    part_number = sh.cell(rownum, 1).value  # added '.value' to get the value from the cell
    wr.writerow([part_number])              # added brackets to give writerow the list it wants
More generally, you can use a list comprehension to grab the columns you want:
cols = [1, 8, 110]
for rownum in range(sh.nrows):
    wr.writerow([sh.cell(rownum, colnum).value for colnum in cols])
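Putting that back into the original function, a minimal sketch of the corrected version (assuming the same Python 2 / xlrd setup as the question) might look like this:

import csv
import xlrd

def xl_to_csv(xl_file):
    wb = xlrd.open_workbook(xl_file)
    sh = wb.sheet_by_index(0)
    with open('output.csv', 'wb') as op:  # 'wb' matches the Python 2 style of the question
        wr = csv.writer(op, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            part_number = sh.cell(rownum, 1).value  # the cell's value, not the Cell object
            wr.writerow([part_number])              # writerow() expects a sequence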
I am trying to read a 3-column CSV into a dictionary with the code below. The first column is the unique identifier, and the following two hold related information.
d = dict()
with open('filemane.csv', 'r') as infile:
    reader = csv.reader(infile)
    mydict = dict((rows[0:3]) for rows in reader)
    print mydict
When I run this code I get this error:
Traceback (most recent call last):
File "commissionsecurity.py", line 34, in <module>
mydict = dict((rows[0:3]) for rows in reader)
ValueError: dictionary update sequence element #0 has length 3; 2 is required
Dictionaries need a key together with a value. When you have
mydict = dict((rows[0:3]) for rows in reader)
              ^^^^^^^^^^^
              a length-3 sequence
You are passing in a sequence of length 3, not the length-2 (key, value) pairs that dict() expects. The error message says exactly that: the required length is 2, not the 3 that was provided. To fix this, make rows[0] the key and rows[1:3] the associated value:
mydict = dict((rows[0], rows[1:3]) for rows in reader)
               ^^^^^^^  ^^^^^^^^^
               key      value
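A quick way to see the difference is to feed dict() both shapes directly (purely illustrative):

dict([('id1', 'a', 'b')])    # ValueError: dictionary update sequence element #0 has length 3; 2 is required
dict([('id1', ('a', 'b'))])  # {'id1': ('a', 'b')} -- one key plus one value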
You can do something along the lines of:
with open('filemane.csv', 'r') as infile:
    reader = csv.reader(infile)
    d = {row[0]: row[1:] for row in reader}
Or,
d = dict()
with open('filemane.csv', 'r') as infile:
    reader = csv.reader(infile)
    for row in reader:
        d[row[0]] = row[1:]
I need to convert a .dat file that's in a specific format into a .csv file.
The .dat file has multiple rows with a repeating structure. Each record is wrapped in braces and uses key tags. Below is a sample; it repeats throughout the data file:
{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}
Can anyone provide a starting point for the script?
This will create a CSV, assuming each line in your .DAT file is JSON. Just order the header list to your liking:
import csv, json

header = ['ID', 'name', 'type', 'area', 'HAC', 'verticalAccuracy', 'course', 'lat', 'lng']

with open('file.DAT') as datfile:
    with open('output.csv', 'wb') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=header)
        writer.writeheader()
        for line in datfile:
            writer.writerow(json.loads(line))
Your row is in JSON format, so you can use:
import json
data = json.loads('{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}')
print data.get('name')
print data.get('ID')
This is only a starting point. You have to iterate over the whole .dat file, and at the end you have to write an exporter to save the data into the CSV file, as sketched below.
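For example, a full pass over the file might look like the following sketch. The input and output file names and the field order are assumptions, not from the question, and 'wb' follows the Python 2 style used elsewhere on this page (on Python 3 use open('output.csv', 'w', newline='')):

import csv
import json

fields = ['name', 'ID', 'lat', 'lng', 'type', 'HAC', 'verticalAccuracy', 'course', 'area']

with open('input.dat') as datfile, open('output.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(fields)  # header row
    for line in datfile:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        writer.writerow([record.get(f) for f in fields])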
Use a regex to find all of the data items. Use ast.literal_eval to convert each data item into a dictionary. Collect the items in a list.
import re, ast

result = []
s = '''{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}'''

item = re.compile(r'{[^}]*?}')
for match in item.finditer(s):
    d = ast.literal_eval(match.group())
    result.append(d)
If each data item is on a separate line in the file, you don't need the regex; you can just iterate over the file:
with open('file.dat') as f:
    for line in f:
        line = line.strip()
        line = ast.literal_eval(line)
        result.append(line)
Use json.load:
import json

with open(filename) as fh:
    data = json.load(fh)
    ...
I would like to apply a similar process to a CSV file as detailed in this question, but I get an error message saying:
TypeError: list indices must be integers
The CSV file that I want to rearrange has a combination of float, text, and integer data types. I'm assuming this is the problem, but I can't figure out how to modify the code below to insert the data. It does write the header information to the new CSV file, though.
I'm using the same code as suggested by John Machin, but my writenames variable uses:
writenames = "ID,average,max,min,median,mode,stddev,skewness,kurtosis".split(",")
reader = csv.reader(open("/home/usrs/chris/Summary.csv", "rb"))
writer = csv.writer(open("/home/usrs/chris/SummaryNEW.csv", "wb"))
readnames = reader.next()
names2indicies = dict((name, index) for index, name in enumerate(readnames))
writeindices = [names2indicies[name] for name in writenames]
reorderfunct = operator.itemgetter(writeindices)
writer.writerow(writenames)
for row in reader:
    writer.writerow(reorderfunct(row))
operator.itemgetter() is all you need:
inp = csv.reader(open(...))
outp = csv.writer(open(...))
map(outp.writerow, map(operator.itemgetter(x, y, z), inp))
where x, y, z are the indices of the columns you want, in the order you want them.
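Tying this back to the traceback in the question: itemgetter needs the indices as separate arguments, not as a single list, which is exactly what produced the TypeError. A one-line sketch using the writeindices list from the question:

# itemgetter([0, 2, 1]) builds a callable that tries row[[0, 2, 1]] -> TypeError
# itemgetter(*[0, 2, 1]) builds a callable that returns (row[0], row[2], row[1])
reorderfunct = operator.itemgetter(*writeindices)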
However, since the first row in Summary.csv holds the headers, you might consider using DictReader and DictWriter:
writenames = "ID,average,max,min,median,mode,stddev,skewness,kurtosis".split(",")
reader = csv.DictReader(open("/home/usrs/chris/Summary.csv", "rb"))
writer = csv.DictWriter(open("/home/usrs/chris/SummaryNEW.csv", "wb"),
                        fieldnames=writenames)
reorderfunct = lambda r: dict([(col, r[col]) for col in writenames])
writer.writeheader()
for row in reader:
    writer.writerow(reorderfunct(row))
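Alternatively, DictWriter can drop the unwanted columns itself if you pass extrasaction='ignore', which makes the reorderfunct unnecessary (a small sketch using the same files as above):

writer = csv.DictWriter(open("/home/usrs/chris/SummaryNEW.csv", "wb"),
                        fieldnames=writenames, extrasaction='ignore')
writer.writeheader()
for row in reader:
    writer.writerow(row)  # keys not in writenames are silently ignored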