Hi, I'm a beginner with Python.
I have a CSV file which I retrieve from my database. The table has several columns, one of which contains data in JSON. I am having difficulty converting that JSON data into a table to be saved in a PDF.
First I load the CSV. Then I take the column that holds the JSON data and convert it.
import json
import pandas as pd

# read_csv already returns a DataFrame, so no need to wrap it in pd.DataFrame()
df = pd.read_csv('database.csv', sep=';')
print(df.head())

for index, row in df.iterrows():
    json_str = row["json_data"]       # avoid shadowing the built-in name str
    val = json.loads(json_str)        # json.loads is enough; ast.literal_eval is not needed for valid JSON
A sample of my JSON is this:
{
    "title0": {
        "id": "1",
        "ex": "90"
    },
    "title1": {
        "name": "Alex",
        "surname": "Boris",
        "code": "RBAMRA4"
    },
    "title2": {
        "company": "LA",
        "state": "USA",
        "example": "9090"
    }
}
I'm trying to create a table like this:
--------------------
title0
--------------------
id        1
ex        90
--------------------
title1
--------------------
name      Alex
surname   Boris
code      RBAMRA4
--------------------
title2
--------------------
company   LA
state     USA
example   9090
You can use the Python JSON library to achieve this.
import json

my_str = open("outfile").read()
val1 = json.loads(my_str)

for key in val1:
    print("--------------------\n" + key + "\n--------------------")
    for k in val1[key]:
        print(k, val1[key][k])
Pass the JSON string to the json.loads function; this deserializes it and converts it into a Python object (here the whole object becomes a dict).
Then you loop through the dict however you want.
--------------------
title0
--------------------
id 1
ex 90
--------------------
title1
--------------------
name Alex
surname Boris
code RBAMRA4
--------------------
title2
--------------------
company LA
state USA
example 9090
Read about iterating over a dict and you will understand the for loop.
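Since the original question was about saving the table to a PDF rather than printing it, here is a minimal sketch of one possible follow-up, assuming the third-party reportlab package is installed (the output file name out.pdf is just an example):

import json
from reportlab.lib.pagesizes import A4
from reportlab.platypus import SimpleDocTemplate, Table

val1 = json.loads(open("outfile").read())

# Build one list of rows: a section header row per title, then its key/value rows
rows = []
for key in val1:
    rows.append([key, ""])
    for k, v in val1[key].items():
        rows.append([k, v])

doc = SimpleDocTemplate("out.pdf", pagesize=A4)   # file name is just an example
doc.build([Table(rows)])

Table accepts a list of rows, so the same nested loop that prints the dict can be reused to build the PDF table.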
Related
"description": ID|100|\nName|Sam|\nCity|New York City|\nState|New York|\nContact|1234567890|\nEmail|1234#yahoo.com|
This is what the field looks like in my JSON. I want to convert this JSON file to an Excel sheet, splitting the nested column into separate columns. I have tried pandas for it but couldn't get it to work. The output I want in my Excel sheet is:
ID    Name    City             State       Contact       Email
100   Sam     New York City    New York    1234567890    1234#yahoo.com
I want to remove those pipes and the solution should be in pandas. Please help me out with this.
The list-of-dicts column (assignees) looks like this:
"assignees": [{
"id": 1234,
"username": "xyz",
"name": "XYZ",
"state": "active",
"avatar_url": "aaaaaaaaaaaaaaa",
"web_url": "bbbbbbbbbbb"
},
{
"id": 5678,
"username": "abcd",
"name": "ABCD",
"state": "active",
"avatar_url": "hhhhhhhhhhh",
"web_url": "mmmmmmmmm"
}
],
This could be one way:
import pandas as pd

df = pd.read_json('Sample.json')

rows = []
for i in df.index:
    desc = df['description'][i]
    attributes = desc.split("\n")
    d = {}
    for attrib in attributes:
        if not attrib.startswith('-----'):   # skip divider lines; keep Name, which the desired output includes
            kv = attrib.split("|")
            d[kv[0]] = kv[1]
    rows.append(d)

# DataFrame.append has been removed in newer pandas, so build the frame from a list of dicts
df2 = pd.DataFrame(rows)
print(df2)
df2.to_csv("output.csv", index=False)
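The question also shows an assignees column holding a list of dicts. A sketch of one way to flatten that with pandas, assuming the column is literally named assignees and that openpyxl is available for the Excel export:

import pandas as pd

df = pd.read_json('Sample.json')

# One row per assignee, then spread each assignee dict into its own columns
exploded = df.explode('assignees').reset_index(drop=True)
assignee_cols = pd.json_normalize(exploded['assignees'].tolist())
result = pd.concat([exploded.drop(columns=['assignees']), assignee_cols], axis=1)

result.to_excel('output.xlsx', index=False)   # needs openpyxl; file name is just an example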
I want to write a program that saves information from an API in the form of a JSON file. The API data has an 'exchangeId' parameter. When I save the information from the API, I only want to keep the results where more than one distinct 'exchangeId' is matched. How can I do that? Please give me a hand.
My Code:
exchangeIds = {102, 311, 200, 302, 521, 433, 482, 406, 42, 400}

for pair in json_data["marketPairs"]:
    if (id := pair.get("exchangeId")):
        if id in exchangeIds:
            json_data["marketPairs"].append(pair)
            exchangeIds.remove(id)
            pairs.append({
                "exchange_name": pair["exchangeName"],
                "market_url": pair["marketUrl"],
                "price": pair["price"],
                "last_update": pair["lastUpdated"],
                "exchange_id": pair["exchangeId"]
            })

out_object["name_of_coin"] = json_data["name"]
out_object["marketPairs"] = pairs
out_object["pairs"] = json_data["numMarketPairs"]
name = json_data["name"]
Example of exchangeIds output that I don't need:
{200}  # only one id was matched in exchangeIds
Example of JSON output:
{
    "name_of_coin": "Pax Dollar",
    "marketPairs": [
        {
            "exchange_name": "Bitrue",
            "market_url": "https://www.bitrue.com/trade/usdp_usdt",
            "price": 1.0000617355334473,
            "last_update": "2021-12-24T16:39:09.000Z",
            "exchange_id": 433
        },
        {
            "exchange_name": "Hotbit",
            "market_url": "https://www.hotbit.io/exchange?symbol=USDP_USDT",
            "price": 0.964348817699553,
            "last_update": "2021-12-24T16:39:08.000Z",
            "exchange_id": 400
        }
    ],
    "pairs": 22
}  # this is an example that I need, because two ids were matched
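One possible approach (a sketch, assuming json_data, exchangeIds, pairs and out_object are used as in the code above; the output file name is only an example): collect the matching pairs first, then write the result only when more than one distinct exchangeId was matched.

import json

pairs = []
for pair in json_data["marketPairs"]:
    if (exchange_id := pair.get("exchangeId")) in exchangeIds:
        pairs.append({
            "exchange_name": pair["exchangeName"],
            "market_url": pair["marketUrl"],
            "price": pair["price"],
            "last_update": pair["lastUpdated"],
            "exchange_id": exchange_id,
        })

# Only write the result when more than one distinct exchangeId was matched
if len({p["exchange_id"] for p in pairs}) > 1:
    out_object = {
        "name_of_coin": json_data["name"],
        "marketPairs": pairs,
        "pairs": json_data["numMarketPairs"],
    }
    with open(json_data["name"] + ".json", "w") as f:   # file name is just an example
        json.dump(out_object, f, indent=4)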
Facebook allows us to download our own content, with an option to export it as a JSON file. I want to parse that file to pull specific comments I've made in a specific Facebook group. I have the comments.json file and a short snippet of test code that can get the top layers of data. The lowest layer, where the group names and the actual comments are, does not parse.
This is on Windows 10 using the IDLE Python IDE (Python version 3.5.2).
Here is a short sample of the json file -- anonymized:
{
    "comments": [
        {
            "timestamp": 1564971950,
            "data": [
                {
                    "comment": {
                        "timestamp": 1564971950,
                        "comment": "Some Text Here",
                        "author": "My Name",
                        "group": "Group 1 Name"
                    }
                }
            ],
            "title": "My Name commented on Other Person's post."
        },
        {
            "timestamp": 1564968688,
            "data": [
                {
                    "comment": {
                        "timestamp": 1564968688,
                        "comment": "Some More Text Here",
                        "author": "My Name",
                        "group": "Group 2 Name"
                    }
                }
            ],
            "title": "My Name replied to their own comment."
        }
    ]
}
I want to select on [comments][data][comment][group]. Here is the short test Python code I tried:
import json
from datetime import datetime

with open('sample.json', 'r') as json_file:
    data = json.load(json_file)
    for j1 in data["comments"]:
        for j2 in j1["data"]:
            print(datetime.utcfromtimestamp(j1['timestamp']))
##            for j3 in j2['comment']:
            print(j2)
Which results in this output
2019-08-05 02:25:50
{'comment': {'group': 'Group 1 Name', 'comment': 'Some Text Here', 'author': 'My Name', 'timestamp': 1564971950}}
2019-08-05 01:31:28
{'comment': {'group': 'Group 2 Name', 'comment': 'Some More Text Here', 'author': 'My Name', 'timestamp': 1564968688}}
You can see the data is pulled into j2. When I try to grab that last level of data, the keys are grabbed but not the values. The code for this:
import json
from datetime import datetime

with open('sample.json', 'r') as json_file:
    data = json.load(json_file)
    for j1 in data["comments"]:
        for j2 in j1["data"]:
            print(datetime.utcfromtimestamp(j1['timestamp']))
            for j3 in j2['comment']:
                print(j3)
And the output:
2019-08-05 02:25:50
group
timestamp
comment
author
2019-08-05 01:31:28
group
timestamp
comment
author
If I try to grab a specific key (like j3['group']), I get an error: TypeError: string indices must be integers.
Which suggests the json library doesn't recognize this last level as keys and values properly. I can add square brackets before and after that innermost set of curly brackets in my sample file and get what I want with this code:
import json
from datetime import datetime

with open('sample2.json', 'r') as json_file:
    data = json.load(json_file)
    for j1 in data["comments"]:
        for j2 in j1["data"]:
            for j3 in j2['comment']:
                if j3['group'] == "Group 1 Name":
                    print(datetime.utcfromtimestamp(j3['timestamp']))
                    print(j3['comment'])
Which, given I only ask for "Group 1 Name", gives me this:
2019-08-05 02:25:50
Some Text Here
Since I really don't want to manually edit a 56,000-line JSON file to add all the missing square brackets: is there a way to parse j2 to pull the key/value pairs from that "comment" object directly?
I expect to pull the comments for a specific Facebook group from the user-downloaded JSON file and output them with the timestamp and comment text.
When I try to access that lowest level key/value set I get the error: TypeError: string indices must be integers
In Python, iterating over a dictionary directly (by its variable name) iterates over its keys, the same as calling dict.keys().
It means that this statement:
for j3 in j2['comment']:
    print(j3)
is actually equivalent to this:
for key in j2['comment'].keys():
    print(key)
The reason you received a TypeError is that j3['group'] was called on a string (a dictionary key) and not on a dictionary.
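A tiny illustration of that error (the value of j3 here is just what the loop produced):

j3 = 'group'       # at this point j3 is only the key string
j3['group']        # indexing a str with a str raises:
# TypeError: string indices must be integers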
You managed to bypass this exception by changing the value of the comment key from a dictionary to a list, so iterating over j2['comment'] actually returned a list containing one dictionary:
for dictionary in j2['comment']:
    print(dictionary['your_key'])
You can iterate over the key/value pairs of j2['comment'] without changing the original JSON file by doing something like this:
import json
from datetime import datetime

with open('sample.json', 'r') as json_file:
    data = json.load(json_file)
    for j1 in data["comments"]:
        for j2 in j1["data"]:
            for k, v in j2['comment'].items():
                print('Key: {0}, Value: {1}'.format(k, v))
If all you want is to print comments from a specific group then, based on your example, you don't really need another nested loop, e.g.:
import json
from datetime import datetime

with open('sample.json', 'r') as json_file:
    data = json.load(json_file)
    for j1 in data["comments"]:
        for j2 in j1["data"]:
            if j2['comment']['group'] == 'Group 1 Name':
                print(j2['comment']['comment'])
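If the timestamp should be printed alongside the comment text, as stated in the original goal, the same access pattern works:

import json
from datetime import datetime

with open('sample.json', 'r') as json_file:
    data = json.load(json_file)
    for j1 in data["comments"]:
        for j2 in j1["data"]:
            comment = j2['comment']
            if comment['group'] == 'Group 1 Name':
                print(datetime.utcfromtimestamp(comment['timestamp']))
                print(comment['comment'])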
I am using REST with a Python script to extract Name and Start Time from a response.
I can get the information, but I can't combine the data so that related fields end up on the same line in a CSV. When I export them to CSV, they all go on separate lines.
There is probably a much better way to extract data from a JSON List.
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        for key, value in data_item.items():
            #driver = {}
            #test = {}
            #startTime = {}
            if key == "Name":
                drivers.append(value)
            if key == "StartTime":
                drivers.append(value)
print (drivers)
Code to write to CSV:
with open(logFileName, 'a') as outcsv:
    # configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar="'",
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n', skipinitialspace=True)
    for driver in drivers:
        writer.writerow(driver)
Here is a sample of the response:
"Query": {
"Results": [
{
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1400"
},
{
"Name": " John Doe"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1401"
},
{
"Name": " Jane Smith"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
My output in CSV:
John Doe
2018-06-19T07:16:10Z
Jane Smith
2018-06-19T07:16:10Z
Desired Outcome:
John Doe, 2018-06-19T07:16:10Z
Jane Smith, 2018-06-19T07:16:10Z
Just use normal dictionary access to get the values:
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            drivers.append(data_item["Name"])
        if "StartTime" in data_item:
            drivers.append(data_item["StartTime"])
print (drivers)
If you know the items will already have the required fields then you won't even need the in tests.
writer.writerow() expects a sequence. You are calling it with a single string, so it splits the string into individual characters. You probably want to keep the name and start time together, so extract them as a tuple:
for item in driverDetails['Query']['Results']:
    name, start_time = "", ""
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            name = data_item["Name"]
        if "StartTime" in data_item:
            start_time = data_item["StartTime"]
    drivers.append((name, start_time))
print (drivers)
Now, instead of being a list of strings, drivers is a list of tuples: one (name, start time) pair per result, where either field may be empty if the input item was missing it. Your code to write the CSV file should now do the expected thing.
If you want all or most of the values, try gathering them into a single dictionary first; then you can pull out the fields you want:
for item in driverDetails['Query']['Results']:
    fields = {}
    for data_item in item['XValues']:
        body.append(data_item)
        fields.update(data_item)
    drivers.append((fields["ID"], fields["Name"], fields["StartTime"]))
print (drivers)
Once you have the fields in a single dictionary you could even build the tuple with a loop:
drivers.append(tuple(fields[f] for f in ("ID", "Name", "StartTime", "ReportScopeStartTime", "ReportScopeEndTime")))
I think you should list the fields you want explicitly just to ensure that new fields don't surprise you.
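For completeness, writing out the collected tuples then produces the desired layout. A sketch reusing the writer settings from the question, assuming drivers holds the (name, start_time) tuples built above; the header row is an optional addition:

import csv

with open(logFileName, 'a') as outcsv:
    writer = csv.writer(outcsv, delimiter=',', quotechar="'",
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    writer.writerow(("Name", "StartTime"))   # optional header row
    for driver in drivers:
        writer.writerow(driver)              # driver is a (name, start_time) tuple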
I was wondering how I could import a JSON file and then save it to an ordered CSV file, with a header row and the applicable data below.
Here's what the JSON file looks like:
[
    {
        "firstName": "Nicolas Alexis Julio",
        "lastName": "N'Koulou N'Doubena",
        "nickname": "N. N'Koulou",
        "nationality": "Cameroon",
        "age": 24
    },
    {
        "firstName": "Alexandre Dimitri",
        "lastName": "Song-Billong",
        "nickname": "A. Song",
        "nationality": "Cameroon",
        "age": 26,
        etc. etc.
    }
]
Note there are multiple 'keys' (firstName, lastName, nickname, etc.). I would like to create a CSV file with those as the header, then the applicable info beneath in rows, with each row having a player's information.
Here's the script I have so far for Python:
import urllib2
import json
import csv

writefilerows = csv.writer(open('WCData_Rows.csv', "wb+"))
api_key = "xxxx"
url = "http://worldcup.kimonolabs.com/api/players?apikey=" + api_key + "&limit=1000"
json_obj = urllib2.urlopen(url)
readable_json = json.load(json_obj)
list_of_attributes = readable_json[0].keys()
print list_of_attributes
writefilerows.writerow(list_of_attributes)
for x in readable_json:
    writefilerows.writerow(x[list_of_attributes])
But when I run that, I get a "TypeError: unhashable type: 'list'" error. I am still learning Python (obviously, I suppose). I have looked around online (found this) and can't seem to figure out how to do it without explicitly stating which key I want to print... I don't want to list each one individually.
Thank you for any help/ideas! Please let me know if I can clarify or provide more information.
Your TypeError occurs because you are trying to index a dictionary, x, with a list, list_of_attributes, in x[list_of_attributes]. This is not how Python works. In this case you are iterating over readable_json, which returns a dictionary on each iteration. There is no need to pull values out of this data in order to write them out.
csv.DictWriter should give you what you're looking for.
import csv
[...]

def encode_dict(d, out_encoding="utf8"):
    '''Encode dictionary to desired encoding, assumes incoming data in unicode'''
    encoded_d = {}
    for k, v in d.iteritems():
        k = k.encode(out_encoding)
        v = unicode(v).encode(out_encoding)
        encoded_d[k] = v
    return encoded_d

list_of_attributes = readable_json[0].keys()
# sort fields in desired order
list_of_attributes.sort()

with open('WCData_Rows.csv', "wb+") as csv_out:
    writer = csv.DictWriter(csv_out, fieldnames=list_of_attributes)
    writer.writeheader()
    for data in readable_json:
        writer.writerow(encode_dict(data))
Note:
This assumes that each entry in readable_json has the same fields.
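If some entries might be missing fields or carry extra ones, DictWriter can be told how to handle that: restval fills in a default for missing keys and extrasaction="ignore" skips unexpected ones, e.g.:

writer = csv.DictWriter(csv_out, fieldnames=list_of_attributes,
                        restval="", extrasaction="ignore")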
Maybe pandas could do this, but I have never tried to read JSON with it:
import pandas as pd
df = pd.read_json( ... )
df.to_csv( ... )
pandas.DataFrame.to_csv
pandas.io.json.read_json
EDIT:
data = '''[
    {
        "firstName": "Nicolas Alexis Julio",
        "lastName": "N'Koulou N'Doubena",
        "nickname": "N. N'Koulou",
        "nationality": "Cameroon",
        "age": 24
    },
    {
        "firstName": "Alexandre Dimitri",
        "lastName": "Song-Billong",
        "nickname": "A. Song",
        "nationality": "Cameroon",
        "age": 26
    }
]'''
import pandas as pd
df = pd.read_json(data)
print df
df.to_csv('results.csv')
result:
   age             firstName            lastName nationality     nickname
0   24  Nicolas Alexis Julio  N'Koulou N'Doubena    Cameroon  N. N'Koulou
1   26     Alexandre Dimitri        Song-Billong    Cameroon      A. Song
With pandas you can save it as CSV, Excel, etc. (and maybe even directly into a database).
You can also perform operations on the table data and show it as a graph.
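For example, a small sketch of that kind of follow-up on the same DataFrame (the aggregation and plot are only illustrative; the plot needs matplotlib installed):

import pandas as pd

df = pd.read_json(data)

# a simple aggregation on the table data
print(df.groupby('nationality')['age'].mean())

# and a quick bar chart of ages per player (needs matplotlib)
df.plot(kind='bar', x='nickname', y='age')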