Using Python to convert JSON to CSV - python

I have tried a few different ways using pandas to convert my JSON to a CSV file.
import pandas as pd
df = pd.read_json("CDMP_E2.json")
df.to_csv("CDMP_Output.csv")
The problem is that when I run that code, it puts the output all in one "column".
The column header shows up as Credit-NoSQL.
The data in the column is then everything from each "object":
'date':'2021-08-01','type':'CARD','amount':'100'
So it looks like this:
Credit-NoSQL
'date':'2021-08-01','type':'CARD','amount':'100'
I would instead expect to see date, type and amount as the headers:
account date type amount returneddate
ABCD 2021-08-01 CARD 100
EFGHI 2021-08-02 CARD 150 2021-08-04
My JSON file looks as such:
[
  {
    "Credit-NoSQL":{
      "account":"ABCD",
      "date":"2021-08-01",
      "type":"CARD",
      "amount":"100"
    }
  },
  {
    "Credit-NoSQL":{
      "account":"EFGHI",
      "date":"2021-08-02",
      "type":"CARD",
      "amount":"150",
      "returneddate":"2021-08-04"
    }
  }
]
So I am not sure if it is the way my JSON file is set up, with its list and such, or if I am missing something in my Python command. I am new to Python and still learning, so I am at a loss as to what to do next.

No need to use pandas for this.
import json, csv

with open("CDMP_E2.json") as json_file:
    data = [item['Credit-NoSQL'] for item in json.load(json_file)]

# Get the union of all dictionary keys
fieldnames = set()
for row in data:
    fieldnames |= row.keys()

with open("CDMP_Output.csv", "w", newline="") as csv_file:
    cwrite = csv.DictWriter(csv_file, fieldnames=fieldnames)
    cwrite.writeheader()
    cwrite.writerows(data)
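If you would rather stay with pandas, here is a minimal sketch along the same lines (assuming the file parses as the list shown in the question); json_normalize flattens each inner "Credit-NoSQL" object into its own columns:
import json
import pandas as pd

# Load the JSON and pull out the inner "Credit-NoSQL" objects
with open("CDMP_E2.json") as json_file:
    records = [item["Credit-NoSQL"] for item in json.load(json_file)]

# One column per key: account, date, type, amount, returneddate
df = pd.json_normalize(records)
df.to_csv("CDMP_Output.csv", index=False)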

Related

JSON to CSV: keys to header issue

I am trying to convert a very long JSON file to CSV. I'm currently trying to use the code below to accomplish this.
import json
import csv
with open('G:\user\jsondata.json') as json_file:
    jsondata = json.load(json_file)

data_file = open('G:\user\jsonoutput.csv', 'w', newline='')
csv_writer = csv.writer(data_file)

count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())

data_file.close()
This code accomplishes writing all the data to a CSV; however, it only takes the keys from the first JSON line to use as the headers in the CSV. That would be fine, but further into the JSON there are more keys to be used. This causes the values to be disorganized. I was wondering if anyone could help me find a way to get all the possible headers and possibly insert NA when a line doesn't contain that key or values for that key.
The JSON file is similar to this:
[
{"time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "action": "TAKEN", "Country": "Germany", "Purchased": "YES", ...},
{"time": "1984-10-04:4:00", "dateOfevent": "1984-10-04", "action": "NOTTAKEN", "Country": "Germany", "Purchased": "NO", ...},
{"type": "A4", "time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "Country": "Germany", "typeOfevent": "H7", ...},
{...},
{...},
]
I've searched for possible solutions all over, but was unable to find anyone having a similar issue.
If you want to use the csv and json modules to do this, you can do it in two passes. The first pass collects the keys for the CSV file and the second pass writes the rows to the CSV file. Also, you must use a DictWriter since the keys differ between records.
import json
import csv
with open('jsondata.json') as json_file:
    jsondata = json.load(json_file)

# stage 1 - populate column names from JSON
keys = []
for data in jsondata:
    for k in data.keys():
        if k not in keys:
            keys.append(k)

# stage 2 - write rows to CSV file
with open('jsonoutput.csv', 'w', newline='') as fout:
    csv_writer = csv.DictWriter(fout, fieldnames=keys)
    csv_writer.writeheader()
    for data in jsondata:
        csv_writer.writerow(data)
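On the NA part of the question: DictWriter fills fields that are missing from a row with restval, which defaults to an empty string, so one small tweak would be:
csv_writer = csv.DictWriter(fout, fieldnames=keys, restval='NA')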
Could you try to read it normally, and then convert it to CSV using .to_csv, like this:
df = pd.read_json('G:\user\jsondata')
#df = pd.json_normalize(df['Column Name']) #if you want to normalize it
df.to_csv('example.csv')
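If you go the pandas route and also want NA instead of blanks for missing keys, to_csv takes an na_rep argument; a minimal sketch, assuming the same file names as in the question:
import pandas as pd

# read_json takes the union of keys across the records; missing ones become NaN
df = pd.read_json('jsondata.json')
df.to_csv('jsonoutput.csv', index=False, na_rep='NA')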

Problems running 'botometer-python' script over multiple user accounts & saving to CSV

I'm new to Python, having mostly used R, but I'm attempting to use the code below to run around 90 Twitter accounts/handles (saved as a one-column CSV file called '1' in the code below) through the Botometer V4 API. The API's GitHub page says that you can run through a sequence of accounts with 'check_accounts_in' without upgrading to the paid-for BotometerLite.
However, I'm stuck on how to loop through all the accounts/handles in the spreadsheet and then save the individual results to a new csv. Any help or suggestions much appreciated.
import botometer
import csv
import pandas as pd
rapidapi_key = "xxxxx"
twitter_app_auth = {
    'consumer_key': 'xxxxx',
    'consumer_secret': 'xxxxx',
    'access_token': 'xxxxx',
    'access_token_secret': 'xxxxx',
}

bom = botometer.Botometer(wait_on_ratelimit=True,
                          rapidapi_key=rapidapi_key,
                          **twitter_app_auth)

#read in csv of account names with pandas
data = pd.read_csv("1.csv")

for screen_name, result in bom.check_accounts_in(data):
    #add output to csv
    with open('output.csv', 'w') as csvfile:
        csvwriter = csv.writer(csvfile)
        csvwriter.writerow(['Account Name','Astroturf Score', 'Fake Follower Score'])
        csvwriter.writerow([
            result['user']['user_data']['screen_name'],
            result['display_scores']['universal']['astroturf'],
            result['display_scores']['universal']['fake_follower']
        ])
I'm not sure what the API returns, but you need to loop through your CSV data and send each item to the API; with the returned results you can append to the CSV. You could loop through the CSV without pandas, but I kept pandas in place because you are already using it.
I added a dummy function to demonstrate some returned data being saved to a CSV.
CSV I used:
names
name1
name2
name3
name4
import pandas as pd
import csv
def sample(x):
    return x + " Some new Data"

df = pd.read_csv("1.csv", header=0)
output = open('NewCSV.csv', 'w+')

for name in df['names'].values:
    api_data = sample(name)
    csvfile = csv.writer(output)
    csvfile.writerow([api_data])

output.close()
To read the one-column CSV directly without pandas (you may need to adjust this based on your CSV):
with open('1.csv', 'r') as f:  # use f, not csv, so the csv module isn't shadowed
    content = f.readlines()

for name in content[1:]:  # skips the header row - remove [1:] if the file doesn't have one
    api_data = sample(name.replace('\n', ""))
Making some assumptions about your API, this may work.
This assumes the API returns a dictionary like:
{"cap":
    {
        "english": 0.8018818614025648,
        "universal": 0.5557322218336633
    }
}
import pandas as pd
import csv
df = pd.read_csv("1.csv", header=0)
output = open('NewCSV.csv', 'w+')
for name in df['names'].values:
api_data = bom.check_accounts_in(name)
csvfile = csv.writer(output)
csvfile.writerow([api_data['cap']['english'],api_data['cap']['universal']])
output.close()
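Going back to the loop in the question, the main structural fix is to open the output file once, before iterating, so each account appends a row instead of overwriting the file. A hedged sketch, continuing from the bom and data objects set up in the question and assuming the handle column in 1.csv is named names like in the dummy CSV above:
import csv

with open('output.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    # header row written once
    csvwriter.writerow(['Account Name', 'Astroturf Score', 'Fake Follower Score'])
    # check_accounts_in yields (screen_name, result) pairs, per the question's own loop
    for screen_name, result in bom.check_accounts_in(data['names']):
        csvwriter.writerow([
            result['user']['user_data']['screen_name'],
            result['display_scores']['universal']['astroturf'],
            result['display_scores']['universal']['fake_follower'],
        ])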

How to extract objects from a csv / pandas dataframe of JSON files?

I have a CSV (which I turned into a pandas dataframe) in which each row consists of a different JSON file; each JSON file has the exact same format and objects as the others, and each one represents a unique transaction (purchase). I would like to take this dataframe and convert it into a dataframe or Excel file in which each column represents an object from the JSON file and each row represents a transaction.
The JSON also contains arrays, in which case I would like to be able to retrieve each element of the array. Ideally I would like to be able to retrieve all possible objects from the JSON files and turn them into columns.
A simplified version of a row would be:
{
  "source":{
    "analyze":true,
    "billing":{
      "gender":null,
      "name":"xxxxx",
      "phones":[
        {
          "area_code":"xxxxx",
          "country_code":"xxxxx",
          "number":"xxxxx",
          "phone_type":"xxxxx"
        }
      ]
    },
    "created_at":"xxxxx",
    "customer":{
      "address":{
        "city":"xxxxx",
        "complement":"xxxxx",
        "country":"xxxxx",
        "neighborhood":"xxxxx",
        "number":"xxxxx",
        "state":"xxxxx",
        "street":"xxxxx",
        "zip_code":"xxxxx"
      },
      "date_of_birth":"xxxxx",
      "documents":[
        {
          "document_type":"xxxxx",
          "number":"xxxxx"
        }
      ],
      "email":"xxxxx",
      "gender":xxxxx,
      "name":"xxxxx",
      "number_of_previous_orders":xxxxx,
      "phones":[
        {
          "area_code":"xxxxx",
          "country_code":"xxxxx",
          "number":"xxxxx",
          "phone_type":"xxxxx"
        }
      ],
      "register_date":xxxxx,
      "register_id":"xxxxx"
    },
    "device":{
      "ip":"xxxxx",
      "lat":"xxxxx",
      "lng":"xxxxx",
      "platform":xxxxx,
      "session_id":xxxxx
    }
  }
}
And my Python code:
import csv
import json
import pandas as pd
df = pd.read_csv(r"<name of csv file in which each row is a JSON file>")
A simplified version of my expected output would be something like:
Expected Output
You mean something like this as the output, for example to get area_code:
A_col area_code
0 {"source":{"analyze":true,"billing":{"gender":... xxxxx
First, the unquoted values ("gender":xxxxx, "number_of_previous_orders":xxxxx, "register_date":xxxxx, "platform":xxxxx, "session_id":xxxxx) should be double quoted so the document is valid JSON.
get the json document:
newjson = []
with open('./example.json', 'r') as f:
    for line in f:
        line = line.strip()
        newjson.append(line)
format it to string:
jsonString = ''.join(newjson)
turn into python object:
jsonData = json.loads(jsonString)
extract the fields using dictionary operations and turn into pandas dataframe:
newDF = pd.DataFrame({"A_col": jsonString, "area_code": jsonData['source']['billing']['phones'][0]['area_code']}, index=[0])
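Since the goal is to turn every nested object into its own column, a hedged sketch using pandas' json_normalize (building on the jsonData dict parsed above) could look like this; nested keys become dotted column names such as source.customer.address.city:
import pandas as pd

# One dict per transaction; with the real data this would be one parsed dict per CSV row
records = [jsonData]

flat = pd.json_normalize(records, sep='.')
# Array fields such as source.billing.phones stay as list-valued columns; they can be
# expanded further with explode() plus another json_normalize pass if needed.
flat.to_csv('transactions.csv', index=False)  # output file name is just an example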

Python convert dictionary to CSV

I am trying to convert a dictionary to CSV so that it is readable (each value under its respective key).
import csv
import json
from urllib.request import urlopen
x = 0
id_num = [848649491, 883560475, 431495539, 883481767, 851341658, 42842466, 173114302, 900616370, 1042383097, 859872672]

for bilangan in id_num:
    with urlopen("https://shopee.com.my/api/v2/item/get?itemid="+str(bilangan)+"&shopid=1883827") as response:
        source = response.read()
        data = json.loads(source)
        #print(json.dumps(data, indent=2))

    data_list = {x:{'title':productName(),'price':price(),'description':description(),'preorder':checkPreorder(),
                    'estimate delivery':estimateDelivery(),'variation': variation(), 'category':categories(),
                    'brand':brand(),'image':image_link()}}
    #print(data_list[x])
    x += 1
I store the data in x, so it will loop from 0 to 1, 2, and so on. I have tried many things but still cannot find a way to make it look like this or close to this:
https://i.stack.imgur.com/WoOpe.jpg
Using DictWriter from the csv module
Demo:
import csv

data_list = {'x':{'title':'productName()','price':'price()','description':'description()','preorder':'checkPreorder()',
                  'estimate delivery':'estimateDelivery()','variation': 'variation()', 'category':'categories()',
                  'brand':'brand()','image':'image_link()'}}

filename = "output.csv"
with open(filename, "w", newline="") as infile:
    writer = csv.DictWriter(infile, fieldnames=data_list["x"].keys())
    writer.writeheader()
    writer.writerow(data_list["x"])
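If you collect one such dict per product while looping over id_num, a small sketch of writing them all to one file (reusing the demo's data_list) could be:
import csv

rows = [data_list["x"]]  # in the real loop, append one dict per scraped item

with open("all_items.csv", "w", newline="") as outfile:  # example file name
    writer = csv.DictWriter(outfile, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)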
I think maybe you just want to merge some cells like Excel does?
If so, I think this is not possible in CSV, because the CSV format does not contain cell style information the way Excel does.
Some possible solutions:
use openpyxl to generate an Excel file instead of a CSV; then you can merge cells with the worksheet.merge_cells() function.
do not try to merge cells; just keep the title, price and other fields on each line, so the data looks like:
first line: {'title':'test_title', 'price': 22, 'image': 'image_link_1'}
second line: {'title':'test_title', 'price': 22, 'image': 'image_link_2'}
do not try to merge cells, but set the repeated title, price and other fields to a blank string, so they will not show in your CSV file.
use line breaks to control the format, merging multiple lines with the same title into a single line.
Hope that helps.
If I were you, I would have done this a bit differently. I do not like that you are calling so many functions when this website offers a beautiful JSON response back :) Moreover, I would use the pandas library so that I have total control over my data. I am not a CSV lover. This is a silly prototype:
import requests
import pandas as pd

# Create our dictionary with our items lists
data_list = {'title':[],'price':[],'description':[],'preorder':[],
             'estimate delivery':[],'variation': [], 'categories':[],
             'brand':[],'image':[]}

# API url
url = 'https://shopee.com.my/api/v2/item/get'

id_nums = [848649491, 883560475, 431495539, 883481767, 851341658,
           42842466, 173114302, 900616370, 1042383097, 859872672]

shop_id = 1883827

# Loop through id_nums and return the goodies
for id_num in id_nums:
    params = {
        'itemid': id_num,  # take values from id_nums
        'shopid': shop_id}
    r = requests.get(url, params=params)
    # Check if we got something :)
    if r.ok:
        data_json = r.json()
        # This web site returns a beautiful JSON we can slice :)
        product = data_json['item']
        # Let's populate our data_list with the items we got. We could simplify
        # this by creating one function, but for now this will do
        data_list['title'].append(product['name'])
        data_list['price'].append(product['price'])
        data_list['description'].append(product['description'])
        data_list['preorder'].append(product['is_pre_order'])
        data_list['estimate delivery'].append(product['estimated_days'])
        data_list['variation'].append(product['tier_variations'])
        data_list['categories'].append([product['categories'][i]['display_name'] for i, _ in enumerate(product['categories'])])
        data_list['brand'].append(product['brand'])
        data_list['image'].append(product['image'])
    else:
        # Do something if we hit a connection error or similar,
        # e.g. retry or ignore
        pass

# Putting the dictionary into a DataFrame and ordering the columns :)
df = pd.DataFrame(data_list)
df = df[['title','price','description','preorder','estimate delivery',
         'variation', 'categories','brand','image']]

# df.to ...? There are dozens of different ways to store your data
# that are far better than CSV, e.g. MongoDB, HD5 or compressed pickle
df.to_csv('my_data.csv', sep=';', encoding='utf-8', index=False)

Loading data from a JSON file

I am trying to get some data from a JSON file. Here is the code for it -
import csv
import json
ifile = open('facebook.csv', "rb")
reader = csv.reader(ifile)

rownum = 0
for row in reader:
    try:
        csvfile = open('facebook.csv', 'r')
        jsonfile = open('file.json', 'r+')
        fieldnames = ("USState","NOFU2008","NOFU2009","NOFU2010", "12MI%", "24MI%")
        reader = csv.DictReader( csvfile, fieldnames)
        for row in reader:
            json.dump(row, jsonfile)
            jsonfile.write('\n')
        data = json.load(jsonfile)
        print data["USState"]
    except ValueError:
        continue
I am not getting any output on the console for the print statement. The JSON is in the following format
{"USState": "US State", "12MI%": "12 month increase %", "24MI%": "24 month increase %", "NOFU2010": "Number of Facebook UsersJuly 2010", "NOFU2008": "Number of Facebook usersJuly 2008", "NOFU2009": "Number of Facebook UsersJuly 2009"}
{"USState": "Alabama", "12MI%": "109.3%", "24MI%": "400.7%", "NOFU2010": "1,452,300", "NOFU2008": "290,060", "NOFU2009": "694,020"}
I want to access this like NOFU2008 for all the rows.
The problem is in the way you're creating the JSON file. You don't want to use json.dump() for each row and then append those to the JSON file.
To create a JSON file, you should first create a data structure in Python that represents the entire file the way you want it, and then call json.dump() one time only to dump out the entire structure to JSON format.
Making a single json.dump() call for your entire file will ensure that it is valid JSON.
I'd also recommend wrapping your list/array of rows inside a dict/object so you have a place to put other properties that pertain to the entire JSON file as opposed to a single row.
It looks like the first couple of rows of your facebook.csv are something like this (with or without the quotes):
"US State","12 month increase %","24 month increase %","Number of Facebook UsersJuly 2010","Number of Facebook usersJuly 2008","Number of Facebook UsersJuly 2009"
"Alabama","109.3%","400.7%","1,452,300","290,060","694,020"
Let's say we want to generate this JSON file from that (indented here for clarity):
{
    "rows": [
        {
            "USState": "US State",
            "12MI%": "Number of Facebook usersJuly 2008",
            "24MI%": "Number of Facebook UsersJuly 2009",
            "NOFU2010": "Number of Facebook UsersJuly 2010",
            "NOFU2008": "12 month increase %",
            "NOFU2009": "24 month increase %"
        },
        {
            "USState": "Alabama",
            "12MI%": "290,060",
            "24MI%": "694,020",
            "NOFU2010": "1,452,300",
            "NOFU2008": "109.3%",
            "NOFU2009": "400.7%"
        }
    ]
}
Note that the top level of the JSON file is an object (not an array), and this object has a rows property which is the array of rows.
We can create this JSON file and test it with this Python code:
import csv
import json
# Read the CSV file and convert it to a list of dicts
with open( 'facebook.csv', 'rb' ) as csvfile:
    fieldnames = (
        "USState", "NOFU2008", "NOFU2009", "NOFU2010",
        "12MI%", "24MI%"
    )
    reader = csv.DictReader( csvfile, fieldnames )
    rows = list( reader )

# Wrap the list inside an outer dict
wrap = {
    'rows': rows
}

# Format and write the entire JSON in one fell swoop
with open( 'file.json', 'wb' ) as jsonfile:
    json.dump( wrap, jsonfile )

# Now test the file by reading it and parsing it
with open( 'file.json', 'rb' ) as jsonfile:
    data = json.load( jsonfile )

# For fun, convert the data back to JSON again and pretty-print it
print json.dumps( data, indent=4 )
A few notes... This code does not have the nested reader loops from the original. I have no idea what those were for. One reader should be enough.
In fact, this version doesn't use a loop at all. This line generates a list of rows from the reader object:
rows = list( reader )
Also pay close attention to the use of with where the CSV and JSON files are opened. This is a great way to open a file because the file will be automatically closed at the end of the with block.
Now having said all this, I have to wonder if this exact JSON structure is what you really want? It looks like the first row of the CSV is a header row, so you may want to skip that row? You can do that easily by adding a reader.next() call before converting the rest of the CSV data to a list:
reader.next()
rows = list( reader )
Also I'm not sure I understand how you want to access the resulting data. You wouldn't be able to use data["USState"], because USState is a property of each individual row object. So say a little more about how you want to access the data and we can sort it out.
If you want to create a list of JSON objects in the file, then you should read up on what a list looks like in JSON.
In that case the list elements are separated by commas, so you would put something like this into the code:
jsonfile.write(',\n')
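Alternatively, if you keep one JSON object per line (which is what the original json.dump-per-row loop produces), you can read the file back line by line instead of as a single document; a minimal sketch, assuming the file name used above:
import json

rows = []
with open('file.json') as jsonfile:
    for line in jsonfile:
        line = line.strip()
        if line:
            rows.append(json.loads(line))

for row in rows:
    print(row["NOFU2008"])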
