ValueError: dict contains fields not in fieldnames: 'Reviews', 'Consumer_Feedback' - python

I am trying to convert JSON to CSV, but I get this error:
ValueError: dict contains fields not in fieldnames: 'Reviews', 'Consumer_Feedback'
How can I make sure that all keys are written?
This is what I've written:
file_id = ''
with open(filename_jsonl, 'r') as f:
    for line in f.read():
        file_id += line
file_id = [json.loads(item + '\n}') for item in file_id.split('}\n')[0:-1]]
with open(filename_csv, 'a') as f:
    writer = csv.DictWriter(f, file_id[0].keys(), delimiter=";")
    writer.writeheader()
    for profile in file_id:
        writer.writerow(profile)
jsonl
{
"First_and_Last_Name": "Lori Anderson",
"Primary_Specialty": "Acupuncturist",
"Practice": null,
"Education": [],
"Phone": "(405) 943-0377",
"Address": "5701 NW 23rd St, Oklahoma City, OK 73127"
}
{
"First_and_Last_Name": "Joe Wojciechowski, D.C.",
"Primary_Specialty": "Chiropractor",
"Practice": "13",
"Education": [
"Palmer College of Chiropractic"],
"Consumer_Feedback": "(1 Review)",
"Reviews": [
"\r\nDr. Joe is an amazing chiropractor. He continues to educate himself and incorporates everything he learns into his practice."],
"Phone": "(405) 878-6611",
"Address": "18877 Ferdondo Dr., Earlsboro, OK 74840"
}
Any and all help is appreciated. Thanks!
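The error appears because the fieldnames are taken from the first record only, while later records carry extra keys such as 'Reviews' and 'Consumer_Feedback'. One possible approach, sketched below, is to collect the union of keys across all records before creating the writer and to let restval fill the gaps. The helper names collect_fieldnames and write_records are hypothetical, and the two records are cut-down versions of the data in the question.

```python
import csv
import io


def collect_fieldnames(records):
    """Return every key that appears in any record, in first-seen order."""
    fieldnames = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                fieldnames.append(key)
    return fieldnames


def write_records(records, out_file, delimiter=";"):
    """Write a list of dicts to CSV; keys missing from a record become ''."""
    writer = csv.DictWriter(
        out_file,
        fieldnames=collect_fieldnames(records),
        delimiter=delimiter,
        restval="",
    )
    writer.writeheader()
    writer.writerows(records)


# Two cut-down records modelled on the data in the question.
records = [
    {"First_and_Last_Name": "Lori Anderson", "Practice": None},
    {"First_and_Last_Name": "Joe Wojciechowski, D.C.", "Practice": "13",
     "Consumer_Feedback": "(1 Review)"},
]
buffer = io.StringIO()
write_records(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

List values such as "Education" or "Reviews" would be written as their Python repr with this sketch; joining them into a single string first may be preferable, depending on how the CSV is consumed.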

Related

loop through a list of dictionaries, perform function, and append result to csv

I am trying to loop over a list containing Twitter data in JSON format. The list is made up of several dictionaries, each containing data on one politician. The code works if json_response holds data on only one politician. However, when json_response is a list of dictionaries, I get an error.
In short, I believe the issue can be isolated to three for loops in the code: for tweet in json_response['data']:, for dics in json_response['includes']['users']:, and for element in json_response['includes']['media']:.
# Inputs for the request
bearer_token = auth()
headers = create_headers(bearer_token)
keyword = search_query
start_time = "2016-03-01T00:00:00.000Z"
end_time = "2021-03-31T00:00:00.000Z"
max_results = 3000
json_response = [] # empty list that will hold tweet objects
for i in keyword: # loop through list of politicians in keyword i.e. search query and extract tweets
    url = create_url(i, start_time, end_time, max_results)
    json_response.append(connect_to_endpoint(url[0], headers, url[1]))
    pass
I have only pasted the json_response object for 2 of the 30 politicians because of the character limit. However, the structure is the same for the remaining 28 politicians.
print(json.dumps(json_response, indent=4, sort_keys=True)) # look at json_response object.
[
{
"data": [
{
"author_id": "2877379617",
"created_at": "2021-03-25T12:11:14.000Z",
"id": "1375057688355336195",
"text": "#prettynobodyco She blocked me in 2015 - for pointing out that Tim Kaine enables sexual assault in the military and the evidence was his killing of the MJIA and publicly stated that Military commanders should remain in charge of military rape cases. She's Tanden level awful. Congrats!"
},
{
"author_id": "1265018154444562440",
"created_at": "2021-03-22T19:48:59.000Z",
"id": "1374085719472361474",
"text": "#MehcatCat #AlasscanIsBack #PattyArquette #timkaine Funny, they blocked me. \ud83e\udd23\ud83e\udd23"
},
{
"author_id": "2378324935",
"created_at": "2021-03-07T21:32:13.000Z",
"id": "1368675879312887810",
"text": "#DrWinarick #KatieOGrady4 I apologize for any drama. Katie O Grady blocked me because we had a disagreement about Tim Kaine on one of your older posts. I guess I can't please everyone haha. :/"
},
{
"author_id": "821870502943817729",
"created_at": "2021-02-12T23:53:59.000Z",
"id": "1360376637385244673",
"text": "She blocked me a long ass time ago when I asked her why we shoulf care about Tim Kaine's personal view on abortion if it didn't impact legislation"
},
{
"attachments": {
"media_keys": [
"16_1341045032732770306"
]
},
"author_id": "17232340",
"created_at": "2020-12-21T15:37:07.000Z",
"id": "1341045038420275205",
"text": "#DSingh4Biden #moomintroll8 #timkaine #GovernorVA That's why I replied to you. She blocked me previously, for what silliness I can't remember. Tough being a troll AND a snowflake!"
}
],
"includes": {
"media": [
{
"media_key": "16_1341045032732770306",
"type": "animated_gif"
}
],
"users": [
{
"created_at": "2014-11-15T02:23:57.000Z",
"description": "",
"id": "2877379617",
"name": "Laura Saylor",
"username": "lauraleesaylor"
},
{
"created_at": "2020-05-25T20:33:36.000Z",
"description": "Weird Writer & Lunatic Linguist\nWicked Witch of the East\nshe/her",
"id": "1265018154444562440",
"name": "Zauberkind",
"username": "Zauberkind2"
},
{
"created_at": "2014-03-08T07:22:31.000Z",
"description": "#Resist, #BLM, #Vaxxed, liberal, autistic, kidney transplant survivor, political nerd, mental health advocate, fighter for equality, truth, justice, etc.",
"id": "2378324935",
"name": "Trevor \"Trev\" McKee Achilles",
"username": "MrTAchilles"
},
{
"created_at": "2017-01-19T00:02:52.000Z",
"description": "statist / Progressive Gun Nut/ Single and hating it\n\n / \n\nstraight????? /\n\npronouns / brain worm survivor\n\n \n",
"id": "821870502943817729",
"name": "Squirrel Dad",
"username": "nihilisticpillo"
},
{
"created_at": "2008-11-07T15:09:46.000Z",
"description": "Liberal-Veteran-Dog Lover | Taste for irony, but in moderation | Humor is reason gone mad. ~Groucho Marx | I follow & unfollow back #VeteransResist #Resist",
"id": "17232340",
"name": "anti-Fascist Jim",
"username": "JimnBL"
}
]
},
"meta": {
"newest_id": "1375057688355336195",
"next_token": "b26v89c19zqg8o3foseug43lzoqdft4ghg78o9sn9ds3h",
"oldest_id": "1341045038420275205",
"result_count": 5
}
},
{
"data": [
{
"author_id": "1248251899884814336",
"created_at": "2021-03-27T13:36:45.000Z",
"id": "1375803982409576450",
"text": "#gavinjeffries0 #steven86026859 #MSNBC #SenBooker Uh Oh our friend Steve blocked me, I guess not being able to answer your simple question and being asked to was too much for him."
},
{
"author_id": "293104735",
"created_at": "2021-02-07T21:45:47.000Z",
"id": "1358532435122683904",
"text": "#slwilliams1101 #annabella313 #CrossConnection #TiffanyDCross #Scaramucci #JoyAnnReid #CapehartJ #MSNBC #SenBooker #AliVelshi I stopped watching #TiffanyDCross as well and only watch #CapehartJ now (even though he blocked me in 2016 because I had a \"strong\" response to something mean he said about Hillary Clinton)."
},
{
"author_id": "380970864",
"created_at": "2021-02-07T20:58:01.000Z",
"id": "1358520416273326081",
"text": "#annabella313 #CrossConnection #TiffanyDCross #Scaramucci #JoyAnnReid #CapehartJ #MSNBC After I criticized #TiffanyDCross she blocked me. #JoyAnnReid called herself petty during and interview with #SenBooker. Why be petty? Be mature and thoughtful so people can learn. Hosts need to learn too. I only watch #AliVelshi #CapehartJ now."
},
{
"attachments": {
"media_keys": [
"3_1358448920632909825"
]
},
"author_id": "793175035322171397",
"created_at": "2021-02-07T16:17:44.000Z",
"id": "1358449876565164034",
"text": "#FinstaManhattan #SenSchumer #SenBooker #RonWyden Lmao he blocked me over that. His bio said he likes to 'debate & that sometimes he's wrong but he can admit that'.\n\nGuess not.\n\nI wasn't rude or mean at all. This is too funny \ud83e\udd23"
},
{
"author_id": "752266160352010241",
"created_at": "2021-02-06T20:34:06.000Z",
"id": "1358152008948195328",
"text": "#fattypinner #tkbone32221 #SenSchumer #SenBooker #RonWyden He blocked me \ud83e\udd23\ud83d\ude2d\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83d\ude2d"
}
],
"includes": {
"media": [
{
"media_key": "3_1358448920632909825",
"type": "photo",
"url": ""
}
],
"users": [
{
"created_at": "2020-04-09T14:11:04.000Z",
"description": "",
"id": "1248251899884814336",
"name": "Firstcomm",
"username": "Firstcomm1"
},
{
"created_at": "2011-05-04T19:26:22.000Z",
"description": "Cinephile, balletomane, book lover, tennis fan, K-Drama fanatic, Jang Na-ra fangirl, USC School of Cinematic Arts alumna, Hillary Clinton and Nancy Pelosi Dem.",
"id": "293104735",
"name": "Joyce Tyler",
"username": "joyce_tyler"
},
{
"created_at": "2011-09-27T14:50:37.000Z",
"description": "Spelman College, BA, George Washington University MA, University of South Florida Ph.D. in Political Science, proud Ted Kennedy, Obama, Biden/Harris Democrat!",
"id": "380970864",
"name": "Stephanie L. Williams, Ph.D.",
"username": "slwilliams1101"
},
{
"created_at": "2016-10-31T19:37:19.000Z",
"description": "Loves: life, fam, cats, cars, tattoos, reality TV; collector of t-shirts & Volkswagen\u2019s. Hates: Oxford commas. #CombatVet #Medic #BidenHarris2020 #Resist",
"id": "793175035322171397",
"name": "Que Sarah Sarah \ud83d\udda4",
"username": "sarahalli13"
},
{
"created_at": "2016-07-10T22:20:03.000Z",
"description": "3x Hollywood Video Street Fighter 2 Champion",
"id": "752266160352010241",
"name": "Sugarcoder",
"username": "TheSugarCoder"
}
]
},
"meta": {
"newest_id": "1375803982409576450",
"next_token": "b26v89c19zqg8o3fosktkdplqiw2q9kzx2ibm4r4y27wd",
"oldest_id": "1358152008948195328",
"result_count": 5
}
}
...28 other politicians
# Create file
csvFile = open("tweet_sample.csv", "a", newline="", encoding='utf-8')
csvWriter = csv.writer(csvFile)
# Create headers for the data I want to save. I only want to save these columns in my dataset
csvWriter.writerow(
['author id', 'created_at', 'id', 'tweet', 'bio', 'image_url'])
csvFile.close()
def append_to_csv(json_response, fileName):
    # A counter variable
    global created_at, tweet_id, bio, text, author_id
    counter = 0
    # Open OR create the target CSV file
    csvFile = open(fileName, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)
    # Loop through each tweet
    for tweet in json_response[0]['data']: # NOTE adding a 0 gives access to the data for the first politician while adding 1 gives access to data for the second politician and so on...
        # 1. Author ID
        author_id = tweet['author_id']
        # 2. Time created
        created_at = dateutil.parser.parse(tweet['created_at'])
        # 3. Tweet ID
        tweet_id = tweet['id']
        # 4. Tweet text
        text = tweet['text']
    for dics in json_response[0]['includes']['users']: # NOTE 0 added
        # 5. description. Contained in includes data object
        if ('description' in dics):
            bio = dics['description']
        else:
            bio = " "
    for element in json_response[0]['includes']['media']: # NOTE 0 added
        # 6. image url. Contained in includes data object
        if ('url' in element):
            image_url = element['url']
        else:
            image_url = " "
    # Assemble all data in a list
    res = [author_id, created_at, tweet_id, text, bio, image_url]
    # Append the result to the CSV file
    csvWriter.writerow(res)
    counter += 1
    # When done, close the CSV file
    csvFile.close()
    # Print the number of tweets for this iteration
    print("# of Tweets added from this response: ", counter)
append_to_csv(json_response, "tweet_sample.csv") # Save tweet data in a csv file
Error message:
TypeError: list indices must be integers or slices, not str
By adding the [0] in the loop I avoid the TypeError above. However the output from the function append_to_csv is not ideal as it only includes the last tweet for the first politician. I guess my loop overwrites data.
The desired output is a data frame with the columns author_id, created_at, id, tweet, bio and image_url. Not all users have a bio on their profile or an image URL in their tweet, hence the if-else statements in the function above and the placeholder values no_bio and no_image_url in the desired data frame.
pol_df = pd.read_csv("path_to_tweet_sample.csv" )
pol_df.head()
author_id created_at id tweet bio image_url
0 737885223858384896 2021-03-26T21:56:02.000Z 1375567243082338314 tweet_text no_bio no_image_url
1 847612931487416323 2021-03-26T21:55:24.000Z 1375567083791073283 tweet_text no_bio no_image_url
2 18634205 2021-03-08T12:29:00.000Z 1368901564363051010 tweet_text bio image_url
3 27327319 2021-03-02T11:53:16.000Z 1366718245521211393 tweet_text bio no_image_url
4 917634626247647232 2021-02-28T18:16:45.000Z 1366089974907432961 tweet_text bio image_url
I think you are confusing lists with dicts. When you try to index a list as if it were a dict (e.g. data["author_id"]), you get the TypeError you are seeing. You have to iterate over the list and access each dict in it, for example [x['author_id'] for x in data]. If you want to extract values from the dicts and write them to a CSV file, you might want to do something like this:
import pandas as pd
# resp is the list of per-politician responses (json_response in the question)
author_data = []
for data in resp:
    for author in data['data']:
        author_id = author['author_id']
        created_at = author['created_at']
        another_id = author['id']
        tweet_text = author['text']
        author_data.append([author_id, created_at, another_id, tweet_text])
author_df = pd.DataFrame(author_data, columns=['author_id', 'created_at', 'id', 'text'])
media_data = []
for data in resp:
    for media in data['includes']['media']:
        url = media.get('url', 'no_url')
        media_data.append([url])  # store the extracted url so the default is kept
media_df = pd.DataFrame(media_data, columns=['url'])
bio_data = []
for data in resp:
    for user in data['includes']['users']:
        bio = user.get('description', 'no_bio')  # tolerate users without a description
        author_id = user['id']
        bio_data.append([bio, author_id])
bio_df = pd.DataFrame(bio_data, columns=['bio', 'author_id'])
final_df = author_df.merge(bio_df, on="author_id")
print(final_df)
You have to save the different parts of the data in different dataframes and then merge them. Note that the ['includes']['media'] entries do not contain the author_id, so they cannot be merged on that key. The only link visible in the sample is media_key, which matches the attachments media_keys list on tweets that have an attachment.
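In the sample response, tweets that carry media list it under attachments.media_keys, and those values match media_key in includes.media. A minimal sketch of using that link (together with author_id for the bios) to build one row per tweet follows. flatten_response is a hypothetical helper name, the placeholder strings no_bio and no_image_url are taken from the desired output in the question, and the sketch assumes every user entry has an id key and uses only the first media key of each tweet.

```python
def flatten_response(responses):
    """Build one row per tweet from a list of response dicts."""
    rows = []
    for response in responses:
        includes = response.get("includes", {})
        # Lookup tables so each tweet finds its own user and media entry.
        bios = {u["id"]: u.get("description") or "no_bio"
                for u in includes.get("users", [])}
        media_urls = {m["media_key"]: m.get("url") or "no_image_url"
                      for m in includes.get("media", [])}
        for tweet in response.get("data", []):
            keys = tweet.get("attachments", {}).get("media_keys", [])
            # Assumption: only the first attached media item is of interest.
            image_url = media_urls.get(keys[0], "no_image_url") if keys else "no_image_url"
            rows.append({
                "author_id": tweet["author_id"],
                "created_at": tweet["created_at"],
                "id": tweet["id"],
                "tweet": tweet["text"],
                "bio": bios.get(tweet["author_id"], "no_bio"),
                "image_url": image_url,
            })
    return rows


# Cut-down sample modelled on the response in the question.
sample = [{
    "data": [
        {"author_id": "17232340", "created_at": "2020-12-21T15:37:07.000Z",
         "id": "1341045038420275205", "text": "first tweet",
         "attachments": {"media_keys": ["16_1341045032732770306"]}},
        {"author_id": "2877379617", "created_at": "2021-03-25T12:11:14.000Z",
         "id": "1375057688355336195", "text": "second tweet"},
    ],
    "includes": {
        "media": [{"media_key": "16_1341045032732770306", "type": "animated_gif"}],
        "users": [
            {"id": "17232340", "description": "Liberal-Veteran-Dog Lover"},
            {"id": "2877379617", "description": ""},
        ],
    },
}]
rows = flatten_response(sample)
```

Because the function loops over every response in the list, no hard-coded [0] index is needed. The resulting list of dicts could then be passed to pd.DataFrame(rows) or written with csv.DictWriter.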

Convert repetitive pattern into JSON file with python

Hope you are doing fine,
I have a data file (containing thousands of records in a structured pattern), like below:
PARTNER="ABC"
ADDRESS1="ABC Country INN"
DEPARTMENT="ABC Department"
CONTACT_PERSON="HR"
TELEPHONE="+91.90.XX XX X XXX"
FAX="+01.XX.XX XX XX XX"
EMAIL=""

PARTNER="DEF"
ADDRESS1="DEF Malaysia"
DEPARTMENT=""
CONTACT_PERSON=""
TELEPHONE="(YYY)YYYYY"
FAX="(001)YYYYYYYY"
EMAIL=""

PARTNER="GEH-LOP"
ADDRESS1="GEH LOP Street"
DEPARTMENT="HR"
CONTACT_PERSON="Adam"
TELEPHONE="+91.ZZ.ZZ.ZZZZ"
FAX="+91.ZZ.ZZ.ZZZ"
EMAIL=""
I tried to convert the data file (partner.txt) to JSON with the code below:
Created empty dictionaries dict1 and dict2.
Read the data file line by line.
Used if not line.isspace() to make sure that only non-blank lines are written into dict1.
When a blank line appears, appended the content of dict1 to dict2 using dict2.update(dict1).
import json
dict1 = {}
dict2 ={}
with open("partner.txt", "r") as fh:
    out_file = open("test1.json", "w")
    for line in fh:
        if not line.isspace():
            command, description = line.strip().split("=")
            dict1[command] = description.strip('"')
        else:
            dict2.update(dict1)
            print("space found")
    json.dump(dict2,out_file,indent=1)
    out_file.close()
print("json file created")
But this code creates a JSON file (test1.json) with only a single PARTNER block:
{
"PARTNER": "DEF",
"ADDRESS1": "DEF Malaysia",
"DEPARTMENT": "",
"CONTACT_PERSON": "",
"TELEPHONE": "(YYY)YYYYY",
"FAX": "(001)YYYYYYYY",
"EMAIL": ""
}
Expected Output
I searched a lot but couldn't find a way to get this:
{
"data":[
{
"PARTNER": "ABC",
"ADDRESS1": "ABC Country INN",
"DEPARTMENT": "ABC Department",
"CONTACT_PERSON": "HR",
"TELEPHONE": "+91.90.XX XX X XXX",
"FAX": "+01.XX.XX XX XX XX",
"EMAIL": ""
},
{
"PARTNER": "DEF",
"ADDRESS1": "DEF Malaysia",
"DEPARTMENT": "",
"CONTACT_PERSON": "",
"TELEPHONE": "(YYY)YYYYY",
"FAX": "(001)YYYYYYYY",
"EMAIL": ""
},
{
"PARTNER": "GEH-LOP",
"ADDRESS1": "GEH LOP Street",
"DEPARTMENT": "HR",
"CONTACT_PERSON": "Adam",
"TELEPHONE": "+91.ZZ.ZZ.ZZZZ",
"FAX": "+91.ZZ.ZZ.ZZZ",
"EMAIL": ""
}
]
}
You need to set dict1 to a new dict each time:
import json
dict1 = {}
dict2 ={}
with open("partner.txt", "r") as fh:
    out_file = open("test1.json", "w")
    for line in fh:
        if not line.isspace():
            command, description = line.strip().split("=")
            dict1[command] = description.strip('"')
        else:
            dict2.update(dict1)
            dict1 = {} # set it to new dict
            print("space found")
    json.dump(dict2,out_file,indent=1)
    out_file.close()
print("json file created")
You need to append the dict to a list of dictionaries instead of using update, because update overwrites the keys, which are the same in every block:
import json
dict1 = {}
data = []
with open("partner.txt", "r") as fh:
    out_file = open("test1.json", "w")
    for line in fh:
        if not line.isspace():
            command, description = line.strip().split("=")
            dict1[command] = description.strip('"')
        else:
            data.append(dict1)
            dict1 = {} # set it to new dict
            print("space found")
    output = {'data': data}
    json.dump(output, out_file, indent=1)
    out_file.close()
print("json file created")
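One case the loop above does not cover: if the file does not end with a blank line, the last record is still held in dict1 when the loop finishes and is never appended. A minimal sketch of one way to handle that is to flush the pending record after the loop. parse_records is a hypothetical helper name, and the sample text is a shortened version of the data in the question.

```python
import json


def parse_records(lines):
    """Group KEY="value" lines into dicts; blank lines separate records."""
    records = []
    current = {}
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: close the current record if it has any fields.
            if current:
                records.append(current)
                current = {}
            continue
        # partition splits on the first "=" only, so "=" inside a value is kept.
        key, _, value = line.partition("=")
        current[key] = value.strip('"')
    # Flush the last record when the input does not end with a blank line.
    if current:
        records.append(current)
    return records


sample = '''PARTNER="ABC"
EMAIL=""

PARTNER="DEF"
EMAIL=""'''
output = {"data": parse_records(sample.splitlines())}
print(json.dumps(output, indent=1))
```

Checking that current is non-empty before appending also means that several consecutive blank lines do not produce empty records.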
There are many ways to do this. Maybe we should make it maintainable:
def list_to_dict(lines):
    obj = {}
    for liner in lines:
        idx = liner.find("=")
        obj[liner[0:idx]] = liner[idx + 2 : len(liner) - 1]
    return obj
with open("file", "r") as f:
    results = []
    group = []
    for line in list(map(lambda x: x.strip(), f.read().split("\n"))):
        if line == "":
            results.append(list_to_dict(group))
            group = []
        else:
            group.append(line)
print(results)
Solution
Using regex + json + dict/list-comprehension
You can do this using the re (regular expression) and json modules together. The text processing is carried out with re, and finally the json module is used to format the dictionary as JSON and write it to a .json file.
Additionally we use dict and list comprehensions to gather the intended fields.
Note:
The regex pattern used here is as follows:
# longer manually written version
pat = r'PARTNER="(.*)"\n\s*ADDRESS1="(.*)"\n\s*DEPARTMENT="(.*)"\n\s*CONTACT_PERSON="(.*)"\n\s*TELEPHONE="(.*)"\n\s*FAX="(.*)"\n\s*EMAIL="(.*)"'
# shorter equivalent automated version
pat = r'="(.*)"\n\s*'.join(field_labels) + '="(.*)"'
Code
import re
import json
# Read from file or use the dummy data
with open("partner.txt", "r") as f:
    s = f.read()
field_labels = [
    'PARTNER',
    'ADDRESS1',
    'DEPARTMENT',
    'CONTACT_PERSON',
    'TELEPHONE',
    'FAX',
    'EMAIL'
]
# Define regex pattern and compile for speed
# (raw string, so \s reaches the regex engine without an invalid-escape warning)
pat = r'="(.*)"\n\s*'.join(field_labels) + '="(.*)"'
pat = re.compile(pat)
# Extract target fields
data = pat.findall(s)
# Prepare a list of dicts: each dict for a single block of data
d = [dict(zip(field_labels, field_values)) for field_values in data]
text = json.dumps({'data': d}, indent=2)
print(text)
# Write to a json file
with open('output.json', 'w') as f:
    f.write(text)
Output:
# output.json
{
"data": [
{
"PARTNER": "ABC",
"ADDRESS1": "ABC Country INN",
"DEPARTMENT": "ABC Department",
"CONTACT_PERSON": "HR",
"TELEPHONE": "+91.90.XX XX X XXX",
"FAX": "+01.XX.XX XX XX XX",
"EMAIL": ""
},
{
"PARTNER": "DEF",
"ADDRESS1": "DEF Malaysia",
"DEPARTMENT": "",
"CONTACT_PERSON": "",
"TELEPHONE": "(YYY)YYYYY",
"FAX": "(001)YYYYYYYY",
"EMAIL": ""
},
{
"PARTNER": "GEH-LOP",
"ADDRESS1": "GEH LOP Street",
"DEPARTMENT": "HR",
"CONTACT_PERSON": "Adam",
"TELEPHONE": "+91.ZZ.ZZ.ZZZZ",
"FAX": "+91.ZZ.ZZ.ZZZ",
"EMAIL": ""
}
]
}
Dummy Data
# Dummy Data
s = """
PARTNER="ABC"
ADDRESS1="ABC Country INN"
DEPARTMENT="ABC Department"
CONTACT_PERSON="HR"
TELEPHONE="+91.90.XX XX X XXX"
FAX="+01.XX.XX XX XX XX"
EMAIL=""
PARTNER="DEF"
ADDRESS1="DEF Malaysia"
DEPARTMENT=""
CONTACT_PERSON=""
TELEPHONE="(YYY)YYYYY"
FAX="(001)YYYYYYYY"
EMAIL=""
PARTNER="GEH-LOP"
ADDRESS1="GEH LOP Street"
DEPARTMENT="HR"
CONTACT_PERSON="Adam"
TELEPHONE="+91.ZZ.ZZ.ZZZZ"
FAX="+91.ZZ.ZZ.ZZZ"
EMAIL=""
"""

How to add multiple objects into a single array in a JSON file using Python?

I want to append multiple objects into a single array in a JSON file. Each object is created after executing the code and then saved into an array in a JSON file.
I have this code:
import json
users = [
    {
        "username": "",
        "phone": ""
    }
]
username = input('Username: ')
phone = input('Phone: ')
for user in users:
    user['username'] = username
    user['phone'] = phone
with open('users.json', 'a') as file:
    json.dump(users, file, indent=4)
After executing this code once, I get this:
[
{
"username": "Mark",
"phone": "333-4743"
}
]
After executing twice I get this:
[
{
"username": "Mark",
"phone": "333-4743"
}
][
{
"username": "Jane",
"phone": "555-6723"
}
]
But I want this result:
[
{
"username": "Mark",
"phone": "333-4743"
},
{
"username": "Jane",
"phone": "555-6723"
}
]
How can I achieve this result?
You can use:
import json
from os import path
users = [
    {
        "username": "",
        "phone": ""
    }
]
username = input('Username: ')
phone = input('Phone: ')
for user in users:
    user['username'] = username
    user['phone'] = phone
my_path = 'users.json'
if path.exists(my_path):
    with open(my_path , 'r') as file:
        previous_json = json.load(file)
        users = previous_json + users
with open(my_path , 'w') as file:
    json.dump(users, file, indent=4)
In your code, you are just appending every entry to your file on each run of your code. To have only one list with all your entries, you have to first read the previous entries and then add the new entry.
Instead of appending to the file each time, you should parse the JSON and use .append() on the list:
import json, os
if not os.path.exists("users.json"):
    with open("users.json", "w") as f:
        f.write("[]")
# Use a context manager so the file is closed after loading
with open("users.json") as f:
    users = json.load(f)
username = input('Username: ')
phone = input('Phone: ')
users.append({
    "username": username,
    "phone": phone
})
with open('users.json', 'w') as file:
    json.dump(users, file, indent=4)
You just need to use the list's append() method to add the new object to your list:
import json
# Run this from the directory where the users.json file is located
# Load the existing list first (read mode; json.load takes a file object)
with open('users.json', 'r') as file:
    users = json.load(file)
username = input('Username: ')
phone = input('Phone: ')
data = {'username': username, 'phone': phone}
users.append(data)
# Rewrite the whole file with the updated list
with open('users.json', 'w') as file:
    json.dump(users, file, indent=4)

convert csv file to multiple nested json format

I have written code to convert a CSV file to a nested JSON format. I have multiple columns to be nested, hence I assign each one separately. The problem is that I'm getting two fields for the same column in the JSON output.
import csv
import json
from collections import OrderedDict
csv_file = 'data.csv'
json_file = csv_file + '.json'
def main(input_file):
    csv_rows = []
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter='|')
        for row in reader:
            row['TYPE'] = 'REVIEW', # adding new key, value
            row['RAWID'] = 1,
            row['CUSTOMER'] = {
                "ID": row['CUSTOMER_ID'],
                "NAME": row['CUSTOMER_NAME']
            }
            row['CATEGORY'] = {
                "ID": row['CATEGORY_ID'],
                "NAME": row['CATEGORY']
            }
            del (row["CUSTOMER_NAME"], row["CATEGORY_ID"],
                 row["CATEGORY"], row["CUSTOMER_ID"]) # deleting since fields occurring twice
            csv_rows.append(row)
    with open(json_file, 'w') as f:
        json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)
        f.write('\n')
The output is as below:
[
{
"CATEGORY": {
"ID": "1",
"NAME": "Consumers"
},
"CATEGORY_ID": "1",
"CUSTOMER_ID": "41",
"CUSTOMER": {
"ID": "41",
"NAME": "SA Port"
},
"CUSTOMER_NAME": "SA Port",
"RAWID": [
1
]
}
]
I'm getting two entries for the fields I have assigned using row[''].
Is there a way to get rid of this? I want only one entry for a particular field in each record.
Also, how can I convert the keys to lower case after reading from csv.DictReader()? In my CSV file all the column names are in upper case, so I use the same names when assigning. But I want to convert all of them to lower case.
To convert the keys to lower case, it is simpler to generate a new dict per row. That should also be enough to get rid of the duplicate fields:
for row in reader:
    orow = OrderedDict()  # imported via "from collections import OrderedDict"
    orow['type'] = 'REVIEW'  # adding new key, value (a trailing comma here would create a tuple)
    orow['rawid'] = 1
    orow['customer'] = {
        "id": row['CUSTOMER_ID'],
        "name": row['CUSTOMER_NAME']
    }
    orow['category'] = {
        "id": row['CATEGORY_ID'],
        "name": row['CATEGORY']
    }
    csv_rows.append(orow)
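If the lower-casing should not be hard-coded for each key, one option is a small recursive helper applied to the finished row. This is only a sketch: lower_keys is a hypothetical name, and the sample row reuses the values shown in the question's output.

```python
def lower_keys(value):
    """Return a copy of value with every dict key lower-cased, recursively."""
    if isinstance(value, dict):
        return {str(k).lower(): lower_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [lower_keys(item) for item in value]
    return value


# A row as csv.DictReader might return it (values taken from the question).
row = {"CUSTOMER_ID": "41", "CUSTOMER_NAME": "SA Port",
       "CATEGORY_ID": "1", "CATEGORY": "Consumers"}
nested = {
    "TYPE": "REVIEW",
    "RAWID": 1,
    "CUSTOMER": {"ID": row["CUSTOMER_ID"], "NAME": row["CUSTOMER_NAME"]},
    "CATEGORY": {"ID": row["CATEGORY_ID"], "NAME": row["CATEGORY"]},
}
result = lower_keys(nested)
print(result)
```

Only keys are changed; values such as "REVIEW" keep their original case.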

selective flattening in python nested dictionary and finding keys

I have data in this format in a JSON file:
[
{
"FIRST NAME": "Nasim",
"EMAIL": "ac#iaculisnec.net",
"ADDLINE1": "855-8805 Nunc. Avenue",
"CITY": "Masterton",
"LOCATION":{"ADDLINE2":"855-8805",
"ADDLINE3":"Avenue",
"PIN":"100"}
},
{
"FIRST NAME": "Xanthus",
"EMAIL": "adipiscing.elit#tinciduntcongue.edu",
"ADDLINE1": "357-4583 Curae; St.",
"CITY": "Basildon",
"LOCATION":{"ADDLINE2":"357-4583",
"ADDLINE3":"Curae; St.",
"PIN":"101"}
},
{
"FIRST NAME": "Hedley",
"EMAIL": "Quisque.libero.lacus#arcu.ca",
"ADDLINE1": "315-623 Nibh. Road",
"CITY": "Abingdon",
"LOCATION":{"ADDLINE2":"315-623",
"ADDLINE3":"Nibh. Road",
"PIN":"102"}
}]
This is my code:
data=json.loads(file('grade.json').read())
for row in data:
    row['ADDRESS']= row['ADDLINE1']+','+ row['CITY']
    del row['CITY'], row['ADDLINE1']
    row['LOCATION1']=row['LOCATION']['ADDLINE2']+','+row['LOCATION']['ADDLINE3']+','+row['LOCATION']['PIN']
    del row['LOCATION']
data =json.loads(file('grade.json').read())
out = {}
for sub in data.values():
    for key, value in sub.items():
        if key in out:
            del out[key]
        else:
            out[key] = value
print(out)
file('files','w').write(json.dumps(data))
out_path= "outfile9.csv"
fieldnames = list(set(k for d in data for k in d))
with open(out_path, 'wb') as out_file:
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
    writer.writeheader()
    writer.writerows(data)
I want to remove the nested dictionary (LOCATION1 after formatting, previously LOCATION) but keep ADDLINE2, ADDLINE3 and PIN as they are. I want a flattened dictionary. How can I improve it?
I require the keys in this form:
[firstname,email,address,location{addline2,addline3,pin}]
Even if extra nested values are added, they should dynamically appear in this form.
data=json.loads(file('grade.json').read())
for row in data:
    row['ADDRESS']= row['ADDLINE1']+','+ row['CITY']
    del row['CITY'], row['ADDLINE1']
    row['LOCATION1']=row['LOCATION']['ADDLINE2']+','+row['LOCATION']['ADDLINE3']+','+row['LOCATION']['PIN']
    del row['LOCATION']
data =json.loads(file('grade.json').read())
The above is all useless because the last line resets data.
To flatten ADDLINE2, ADDLINE3 and PIN, add the following inside the loop above, before everything else:
    row['ADDLINE2'] = row['LOCATION']['ADDLINE2']
    row['ADDLINE3'] = row['LOCATION']['ADDLINE3']
    row['PIN'] = row['LOCATION']['PIN']
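Since the question asks for nested values to be picked up dynamically, a generic variant may be useful. The sketch below lifts the keys of any nested dict to the top level without naming them. flatten_row is a hypothetical helper name, the sample row is taken from the question's data, and the code is written for Python 3 (the question's file() calls are Python 2).

```python
def flatten_row(row):
    """Copy row, lifting the keys of any nested dict up to the top level."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, dict):
            # Nested dict: copy its keys directly instead of keeping the parent key.
            flat.update(flatten_row(value))
        else:
            flat[key] = value
    return flat


row = {
    "FIRST NAME": "Nasim",
    "EMAIL": "ac#iaculisnec.net",
    "ADDLINE1": "855-8805 Nunc. Avenue",
    "CITY": "Masterton",
    "LOCATION": {"ADDLINE2": "855-8805", "ADDLINE3": "Avenue", "PIN": "100"},
}
flat = flatten_row(row)
print(sorted(flat))
```

If a nested key has the same name as a top-level key, whichever is processed later overwrites the earlier one, so prefixing nested keys with the parent name may be safer for data where such collisions can occur.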
