Export Elasticsearch data results into a CSV file

Export Elasticsearch data results into a CSV file - python

I would like to export my elastic search data that is being returned from my query to a CSV file. I have been looking everywhere to do this but all my attemps on this is failing on how to do it.
my query is below;
import elasticsearch
import unicodedata
import csv
es = elasticsearch.Elasticsearch(["9200"])
# this returns up to 100 rows, adjust to your needs
res = es.search(index="search", body=
{
"_source": ["CN", "NCR", "TRDT"],
"query": {
"bool": {
"should": [
{"wildcard": {
"CN": "TEST*"
}
}]
}
}},size=10)
sample = res['hits']['hits']
# then open a csv file, and loop through the results, writing to the csv
with open('outputfile.tsv', 'wb') as csvfile:
filewriter = csv.writer(csvfile, delimiter='\t', # we use TAB delimited, to handle cases where freeform text may have a comma
quotechar='|', quoting=csv.QUOTE_MINIMAL)
# create column header row
filewriter.writerow(["CN", "NCR", "TRDT"]) #change the column labels here
# fill columns 1, 2, 3 with your data
col1 = hit["some"]["deeply"]["nested"]["field"].decode('utf-8') #replace these nested key names with your own
col1 = col1.replace('\n', ' ')
# col2 = , col3 = , etc...
for hit in sample:
filewriter.writerow(col1)
could someone fix this code up for it to work please, been spending hours on this to make it work.
the error i am getting when running this is below:
Traceback (most recent call last):
File "C:/Users/.PyCharmCE2017.2/config/scratches/trtr.py", line 30, in <module>
filewriter.writerow(["CN", "NCR", "TRDT"]) # change the column labels here
TypeError: a bytes-like object is required, not 'str'
thank you in advance.

Replace wb with w in with open('outputfile.tsv', 'wb') since you are using Python 3.x

Related

Is there a way to rename empty header CSV file converting to JSON in Python without pandas?

I am trying to convert a CSV file to JSON but there is a header in my csv that is empty. Is there a way to name it when outputting it to JSON?
Example data
"" Calories Fat Sodium
Bread 100 10 23
I got this code from geeksforgeeks
import csv
import json
# Function to convert a CSV to JSON
# Takes the file paths as arguments
def make_json(csvFilePath, jsonFilePath):
# create a dictionary
data = {}
# Open a csv reader called DictReader
with open(csvFilePath, encoding='utf-8') as csvf:
csvReader = csv.DictReader(csvf)
# Convert each row into a dictionary
# and add it to data
for rows in csvReader:
# Assuming a column named 'No' to
# be the primary key
key = rows['']
data[key] = rows
# Open a json writer, and use the json.dumps()
# function to dump data
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
jsonf.write(json.dumps(data, indent=4))
# Driver Code
# Decide the two file paths according to your
# computer system
csvFilePath = r'Names.csv'
jsonFilePath = r'Names.json'
# Call the make_json function
make_json(csvFilePath, jsonFilePath)
I did this and it gets the first row, but i'm not sure how to rename it when i output it to JSON.
It appears as "":"Bread" in the JSON file.
key = rows['']
Thanks in advance if anyone can help!
Edit: Expected output
{
"Food": "Bread",
"Calories": "45",
"Fat (g)": "0",
"Carb. (g)": "11",
"Fiber (g)": "0",
"Protein": "0",
"Sodium": "10"
}

Not getting expected output in python when converting a csv to json

I have an excel file in which data is saved in csv format in such a way.This data is present in the excel file as shown below,under column A (The CSV File is generated by LabView Software code which i have written to generate data).I have also attached an image of the csv file for reference at the end of my question.
RPM,Load Current,Battery Output,Power Capacity
1200,30,12,37
1600,88,18,55
I want to create a Json file in such format
{
"power_capacity_data" :
{
"rpm" : ["1200","1600"],
"load_curr" : ["30","88"],
"batt_output" : ["12","18"],
"power_cap" : ["37","55"]
}
}
This is my code
import csv
import json
def main():
#created a dictionary so that i can append data to it afterwards
power_data = {"rpm":[],"load_curr":[],"batt_output":[],"power_cap":[]}
with open('power1.lvm') as f:
reader = csv.reader(f)
#trying to append the data of column "RPM" to dictionary
rowcount = 0
for row in reader:
if rowcount == 0:
#trying to skip the first row
rowcount = rowcount + 1
else:
power_data['rpm'].append(row[0])
print(row)
json_report = {}
json_report['pwr_capacity_data'] = power_data
with open('LVMJSON', "w") as f1:
f1.write(json.dumps(json_report, sort_keys=False, indent=4, separators=(',', ': '),encoding="utf-8",ensure_ascii=False))
f1.close()
if __name__ == "__main__":
main()
The output json file that i am getting is this:(please ignore the print(row) statement in my code)
{
"pwr_capacity_data":
{
"load_curr": [],
"rpm": [
"1200,30,12.62,37.88",
"1600,88,18.62,55.88"
],
"batt_output": [],
"power_cap": []
}
}
The whole row is getting saved in the list,but I just want the values under the column RPM to be saved .Can someone help me out with what I may be doing wrong.Thanks in advance.I have attached an image of csv file to just in case it helps

You could use Python's defaultdict to make it a bit easier. Also a dictionary to map all your header values.
from collections import defaultdict
import csv
import json
power_data = defaultdict(list)
header_mappings = {
'RPM' : 'rpm',
'Load Current' : 'load_curr',
'Battery Output' : 'batt_output',
'Power Capacity' : 'power_cap'}
with open('power1.lvm', newline='') as f_input:
csv_input = csv.DictReader(f_input)
for row in csv_input:
for key, value in row.items():
power_data[header_mappings[key]].append(value)
with open('LVMJSON.json', 'w') as f_output:
json.dump({'power_capacity_data' : power_data}, f_output, indent=2)
Giving you an output JSON file looking like:
{
"power_capacity_data": {
"batt_output": [
"12",
"18"
],
"power_cap": [
"37",
"55"
],
"load_curr": [
"30",
"88"
],
"rpm": [
"1200",
"1600"
]
}
}

JSON Parsing help in Python

I have below data in JSON format, I have started with code below which throws a KEY ERROR.
Not sure how to get all data listed in headers section.
I know I am not doing it right in json_obj['offers'][0]['pkg']['Info']: but not sure how to do it correctly.
how can I get to different nodes like info,PricingInfo,Flt_Info etc?
{
"offerInfo":{
"siteID":"1",
"language":"en_US",
"currency":"USD"
},
"offers":{
"pkg":[
{
"offerDateRange":{
"StartDate":[
2015,
11,
8
],
"EndDate":[
2015,
11,
14
]
},
"Info":{
"Id":"111"
},
"PricingInfo":{
"BaseRate":1932.6
},
"flt_Info":{
"Carrier":"AA"
}
}
]
}
}
import os
import json
import csv
f = open('api.csv','w')
writer = csv.writer(f,delimiter = '~')
headers = ['Id' , 'StartDate', 'EndDate', 'Id', 'BaseRate', 'Carrier']
default = ''
writer.writerow(headers)
string = open('data.json').read().decode('utf-8')
json_obj = json.loads(string)
for pkg in json_obj['offers'][0]['pkg']['Info']:
row = []
row.append(json_obj['id']) # just to test,but I need column values listed in header section
writer.writerow(row)

It looks like you're accessing the json incorrectly. After you have accessed json_obj['offers'], you accessed [0], but there is no array there. json_obj['offers'] gives you another dictionary.
For example, to get PricingInfo like you asked, access like this:
json_obj['offers']['pkg'][0]['PricingInfo']
or 11 from the StartDate like this:
json_obj['offers']['pkg'][0]['offerDateRange']['StartDate'][1]
And I believe you get the KEY ERROR because you access [0] in the dictionary, which since that isn't a key, you get the error.

try to substitute this piece of code:
for pkg in json_obj['offers'][0]['pkg']['Info']:
row = []
row.append(json_obj['id']) # just to test,but I need column values listed in header section
writer.writerow(row)
With this:
for pkg in json_obj['offers']['pkg']:
row.append(pkg['Info']['Id'])
year = pkg['offerDateRange']['StartDate'][0]
month = pkg['offerDateRange']['StartDate'][1]
day = pkg['offerDateRange']['StartDate'][2]
StartDate = "%d-%d-%d" % (year,month,day)
print StartDate
writer.writerow(row)

Try this
import os
import json
import csv
string = open('data.json').read().decode('utf-8')
json_obj = json.loads(string)
print json_obj["offers"]["pkg"][0]["Info"]["Id"]
print str(json_obj["offers"]["pkg"][0]["offerDateRange"]["StartDate"][0]) +'-'+ str(json_obj["offers"]["pkg"][0]["offerDateRange"]["StartDate"][1])+'-'+str(json_obj["offers"]["pkg"][0]
["offerDateRange"]["StartDate"][2])
print str(json_obj["offers"]["pkg"][0]["offerDateRange"]["EndDate"][0]) +'-'+ str(json_obj["offers"]["pkg"][0]["offerDateRange"]["EndDate"][1])+'-'+str(json_obj["offers"]["pkg"][0]
["offerDateRange"]["EndDate"][2])
print json_obj["offers"]["pkg"][0]["Info"]["Id"]
print json_obj["offers"]["pkg"][0]["PricingInfo"]["BaseRate"]
print json_obj["offers"]["pkg"][0]["flt_Info"]["Carrier"]

Python : Normalize Json response (array)

I am new to JSON and Python,I am trying to achieve below
Need to parse below JSON
{
"id": "12345abc",
"codes": [
"BSVN1FKW3JKKNNMN",
"HJYYUKJJL999OJR",
"DFTTHJJJJ0099JUU",
"FGUUKHKJHJGJJYGJ"
],
"ctr": {
"source": "xyz",
"user_id": "1234"
}
}
Expected output:Normalized on "codes" value
ID~CODES~USER_ID
12345abc~BSVN1FKW3JKKNNMN~1234
12345abc~HJYYUKJJL999OJR~1234
12345abc~DFTTHJJJJ0099JUU~1234
12345abc~FGUUKHKJHJGJJYGJ~1234
Started with below ,but need help to get to my desired output.
The "codes" block can have n number of values separated by comma.
The below code is throwing an error "TypeError: string indices must be integers"
#!/usr/bin/python
import os
import json
import csv
f = open('rspns.csv','w')
writer = csv.writer(f,delimiter = '~')
headers = [‘ID’,’CODES’,’USER_ID’]
default = ''
writer.writerow(headers)
string = open('sample.json').read().decode('utf-8')
json_obj = json.loads(string)
#print json_obj['id']
#print json_obj['codes']
#print json_obj['codes'][0]
#print json_obj['codes'][1]
#print json_obj['codes’][2]
#print json_obj['codes’][3]
#print json_obj['ctr’][‘user_id']
for keyword in json_obj:
row = []
row.append(str(keyword['id']))
row.append(str(keyword['codes']))
row.append(str(keyword['ctr’][‘user_id']))
writer.writerow(row)

If your json_obj looks exactly like that , that is it is a dictionary, then the issue is that when you do -
for keyword in json_obj:
You are iterating over keys in json_obj, then if you try to access ['id'] for that key it should error out saying string indices must be integers .
You should first get the id and user_id before looping and then loop over json_obj['codes'] and then add the previously computed id and user_id along with the current value from codes list to the writer csv as a row.
Example -
import json
import csv
string = open('sample.json').read().decode('utf-8')
json_obj = json.loads(string)
with open('rspns.csv','w') as f:
writer = csv.writer(f,delimiter = '~')
headers = ['ID','CODES','USER_ID']
writer.writerow(headers)
id = json_obj['id']
user_id = json_obj['ctr']['user_id']
for code in json_obj['codes']:
writer.writerow([id,code,user_id])

You don't want to iterate through json_obj as that is a dictionary and iterating through will get the keys. The TypeError is caused by trying to index into the keys ('id', 'code', and 'ctr') -- which are strings -- as if they were a dictionary.
Instead, you want a separate row for each code in json_obj['codes'] and to use the json_obj dictionary for your lookups:
for code in json_obj['codes']:
row = []
row.append(json_obj['id'])
row.append(code)
row.append(json_obj['ctr’][‘user_id'])
writer.writerow(row)

Scraping a table into a dict and writing each row to CSV

I have a project that was using Scraperwiki to write to their sqlite store, but I need to just write a CSV. The catch is that all the data is stored in a dict, which works fine for writing to sqlite:
scraperwiki.sqlite.save(unique_keys = ['somekey'], data = data, table_name='fancy')
I run that after I scrape each row. There's no order to a dict so I can't just write the values to CSV. I've been looking over csv.DictWriter and collections.defaultdict, and I'm still wrapping my head around how I'd refactor my code so that I can write data, which is a dictionary to CSV instead. Here's an example of my code as structured now:
def store_exception(exception, line_number, some_string):
data = {
'timestamp' : datetime.now(),
'line_number': line_number,
'message' : exception,
'string' : some_string
}
scraperwiki.sqlite.save(unique_keys = ['timestamp'], data = data, table_name='error_log')
I think I want something like this though:
def store_exception(exception, line_number, some_string):
data = {
'timestamp' : datetime.now(),
'line_number': line_number,
'message' : exception,
'string' : some_string
}
d = defaultdict(lambda: "")
d_order = d['timestamp'],d['line_number'],d['message'],d['string']
with open('some/path.csv', 'w') as csvfile:
linewriter = csv.DictWriter(csvfile, d_order, delimiter='|',
quotechar='"', quoting=csv.QUOTE_MINIMAL)
linewriter.writerow(data)
That seems inefficient, though. Do I need both collections.defaultdict and csv.DictWriter?

This should do it. You don't need the default_dict at all, d_order just needs to be a list for field names, or in this case dict keys:
def store_exception(exception, line_number, some_string):
data = {
'timestamp' : datetime.now(),
'line_number': line_number,
'message' : exception,
'string' : some_string
}
d_order = ['timestamp', 'line_number', 'message', 'string']
with open('some/path.csv', 'w') as csvfile:
linewriter = csv.DictWriter(csvfile, d_order, delimiter='|',
quotechar='"', quoting=csv.QUOTE_MINIMAL)
linewriter.writerow(data)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Export Elasticsearch data results into a CSV file - python

Replace wb with w in with open('outputfile.tsv', 'wb') since you are using Python 3.x

Related

Is there a way to rename empty header CSV file converting to JSON in Python without pandas?

Not getting expected output in python when converting a csv to json

JSON Parsing help in Python

Python : Normalize Json response (array)

Scraping a table into a dict and writing each row to CSV

Categories

Resources