Exporting a list of dicts to Excel - python

Beginner here. I wrote some code where I'd like to export each and every dict inside a list to Excel. For now it exports only the last one (name: evan, age: 25), and I have no idea why. The terminal shows all the data, but the exported file only contains the last record. I'd like 'name' and 'age' to be the column headers with the corresponding data below.
import pandas as pd

people = [
    {
        'name': 'corey',
        'age': 12,
    },
    {
        'name': 'matt',
        'age': 15,
    },
    {
        'name': 'evan',
        'age': 25
    }]

for person in range(len(people)):
    print(people[person]['name'], people[person]['age'])

excel_dict = {}
for s in range(len(people)):
    excel_dict['p_name'] = (people[s]['name'])
    excel_dict['p_age'] = (people[s]['age'])

df = pd.DataFrame(data=excel_dict, index=[0])
df = (df.T)
print(df)
df.to_excel('dict1.xlsx')
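The reason only evan survives: the second loop writes to the same two keys, 'p_name' and 'p_age', on every pass, so each person overwrites the one before. A minimal demonstration of the overwrite, with trimmed-down data:

```python
# A plain dict has one slot per key: assigning to the same key
# repeatedly keeps only the last value written.
excel_dict = {}
for person in [{'name': 'corey'}, {'name': 'matt'}, {'name': 'evan'}]:
    excel_dict['p_name'] = person['name']  # overwrites the previous name

print(excel_dict)  # {'p_name': 'evan'}
```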

Try this solution; pd.DataFrame accepts a list of dicts directly, so there is no need to build an intermediate dictionary:
import pandas as pd

people = [
    {
        'name': 'corey',
        'age': 12,
    },
    {
        'name': 'matt',
        'age': 15,
    },
    {
        'name': 'evan',
        'age': 25
    }]

for person in people:
    print(person['name'], person['age'])

df = pd.DataFrame(people)
print(df)
df.to_excel('dict1.xlsx')
Output:
corey 12
matt 15
evan 25

    name  age
0  corey   12
1   matt   15
2   evan   25

Related

How can I convert a nested dictionary to a pd.DataFrame faster?

I have a JSON file which looks like this:
{
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [{
                "user": "John",
                "comment": "It is defined as",
                "value": [
                    {
                        "my_value": 5,
                        "value_2": 10
                    },
                    {
                        "my_value": 24,
                        "value_2": 30
                    }
                ]
            },
            {
                "user": "Sam",
                "comment": "as John said above it simply means",
                "value": [
                    {
                        "my_value": 9,
                        "value_2": 10
                    },
                    {
                        "my_value": 54,
                        "value_2": 19
                    }
                ]
            }
        ],
        "closed": "no"
    }]
}
desired result:

question_no  question   my_value_sum  value_2_sum  user  comment
Q.1          what is ?  29            40           John  It is defined as
Q.1          what is ?  63            29           Sam   as John said above it simply means
What I have tried is data = json_normalize(file_json, "main") and then using a for loop like

for ans, row in data.iterrows():
    ....
    ....
    df = df.append(the data)

But the issue with this approach is that it takes so long that my client would refuse the solution. There are around 1,200 items in the main list and 450 JSON files like this to convert, so this intermediate conversion step would take almost an hour to complete.
EDIT:
is it possible to get the sum of the my_value and value_2 as a column? (updated the desired result also)
Select the dictionaries under main with the record_path and meta parameters:

data = pd.json_normalize(file_json["main"],
                         record_path='answer',
                         meta=['question_no', 'question'])
print(data)

   user                             comment question_no   question
0  John                    It is defined as         Q.1  what is ?
1   Sam  as John said above it simply means         Q.1  what is ?
Then, if order is important, move the last N columns to the first positions:

N = 2
data = data[data.columns[-N:].tolist() + data.columns[:-N].tolist()]
print(data)

  question_no   question  user                             comment
0         Q.1  what is ?  John                    It is defined as
1         Q.1  what is ?   Sam  as John said above it simply means
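The sums asked for in the EDIT are not covered by the snippet above. One possible sketch (my addition, not from the original answer) descends into the value lists with a nested record_path and aggregates with groupby; the 'answer.user' / 'answer.comment' column names come from json_normalize's default '.' separator:

```python
import pandas as pd

file_json = {
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [
            {"user": "John", "comment": "It is defined as",
             "value": [{"my_value": 5, "value_2": 10},
                       {"my_value": 24, "value_2": 30}]},
            {"user": "Sam", "comment": "as John said above it simply means",
             "value": [{"my_value": 9, "value_2": 10},
                       {"my_value": 54, "value_2": 19}]},
        ],
        "closed": "no",
    }],
}

# One row per inner value dict, carrying the question and answer fields along.
data = pd.json_normalize(file_json["main"],
                         record_path=['answer', 'value'],
                         meta=['question_no', 'question',
                               ['answer', 'user'], ['answer', 'comment']])

# Collapse back to one row per user, summing the numeric columns.
out = (data.groupby(['question_no', 'question', 'answer.user', 'answer.comment'],
                    as_index=False)[['my_value', 'value_2']]
           .sum()
           .rename(columns={'my_value': 'my_value_sum',
                            'value_2': 'value_2_sum'}))
print(out)
```

Because the whole file is normalized in one call, this avoids the per-row df.append loop entirely, which is where the time was going.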

Pandas DataFrame created for each row

I am attempting to pass JSON data from an API to a Pandas DataFrame. I could not get pandas.read_json to work with the API data, so I'm sure this isn't the best solution, but I currently have a for loop running through the JSON to extract the values I want.
Here is what I have:
import json
import urllib.request
import pandas as pd

r = urllib.request.urlopen("https://graph.facebook.com/v3.1/{page-id}/insights?access_token={access-token}&pretty=0&metric=page_impressions%2cpage_engaged_users%2cpage_fans%2cpage_video_views%2cpage_posts_impressions").read()
output = json.loads(r)

for item in output['data']:
    name = item['name']
    period = item['period']
    value = item['values'][0]['value']
    df = [{'Name': name, 'Period': period, 'Value': value}]
    df = pd.DataFrame(df)
    print(df)
And here is an excerpt of the JSON from the API:
{
    "data": [
        {
            "name": "page_video_views",
            "period": "day",
            "values": [
                {
                    "value": 634,
                    "end_time": "2018-11-23T08:00:00+0000"
                },
                {
                    "value": 465,
                    "end_time": "2018-11-24T08:00:00+0000"
                }
            ],
            "title": "Daily Total Video Views",
            "description": "Daily: Total number of times videos have been viewed for more than 3 seconds. (Total Count)",
            "id": "{page-id}/insights/page_video_views/day"
        },
The issue I am now facing is (I believe) because of the for loop: each row of data is being inserted into its own DataFrame, like so:

               Name   Period  Value
0  page_video_views      day    465
               Name   Period  Value
0  page_video_views     week   3257
               Name   Period  Value
0  page_video_views  days_28   9987
               Name   Period  Value
0  page_impressions      day   1402

How can I pass all of them easily into the same DataFrame, like so?

               Name   Period  Value
0  page_video_views      day    465
1  page_video_views     week   3257
2  page_video_views  days_28   9987
3  page_impressions      day   1402

Again, I know this most likely isn't the best solution, so any suggestions on how to improve any aspect are very welcome.
You can create a list of dictionaries and pass it to the DataFrame constructor:

L = []
for item in output['data']:
    name = item['name']
    period = item['period']
    value = item['values'][0]['value']
    L.append({'Name': name, 'Period': period, 'Value': value})

df = pd.DataFrame(L)

Or use a list comprehension:

L = [{'Name': item['name'], 'Period': item['period'], 'Value': item['values'][0]['value']}
     for item in output['data']]

df = pd.DataFrame(L)
print(df)

               Name Period  Value
0  page_video_views    day    634
Sample for testing:

output = {
    "data": [
        {
            "name": "page_video_views",
            "period": "day",
            "values": [
                {
                    "value": 634,
                    "end_time": "2018-11-23T08:00:00+0000"
                },
                {
                    "value": 465,
                    "end_time": "2018-11-24T08:00:00+0000"
                }
            ],
            "title": "Daily Total Video Views",
            "description": "Daily: Total number of times videos have been viewed for more than 3 seconds. (Total Count)",
            "id": "{page-id}/insights/page_video_views/day"
        }]}
Try converting the dictionary to a dataframe after the json loading, like:

output = json.loads(r)
df = pd.DataFrame.from_dict(output, orient='index')
df.reset_index(level=0, inplace=True)
If you are taking the data from the URL, I would suggest this approach with the requests library, passing only the list stored under the 'data' attribute:

import requests

data = requests.get("url here").json()['data']

data is now a list of dictionaries, so you can call pd.DataFrame.from_dict to parse it:

df = pd.DataFrame.from_dict(data)
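One more option, as a sketch (not part of the original answers): if you eventually want every entry in each values list rather than just the first, pd.json_normalize can flatten the nested lists in a single call:

```python
import pandas as pd

# Same shape as the sample API response above, trimmed to one metric.
output = {
    "data": [
        {"name": "page_video_views",
         "period": "day",
         "values": [{"value": 634, "end_time": "2018-11-23T08:00:00+0000"},
                    {"value": 465, "end_time": "2018-11-24T08:00:00+0000"}]},
    ]
}

# One row per entry in each item's 'values' list; 'name' and 'period'
# are carried along as metadata columns.
df = pd.json_normalize(output['data'], record_path='values',
                       meta=['name', 'period'])
print(df)
```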

How can we convert JSON to a .csv file using Python? My JSON value has arrays of data and a nested dictionary, formed as:

{
    "userName" : "Jhon",
    "status" : "success",
    "id" : 1234,
    "myData" : {
        "data1": [1,2,3,4],
        "data2": [1,2,3,4],
        "data3": [1,2,3,4],
        "data4": 25,
        "data5" : 12
    },
    "currentStatus" : true
}
How is this data converted to tabular form?

userName  Status   Id    data1  data2  data3  data4  data5  currentStatus
Jhon      success  1234  1      1      1      25     12     true
Jhon      success  1234  2      2      2      25     12     true
Jhon      success  1234  3      3      3      25     12     true
Jhon      success  1234  4      4      4      25     12     true

The tabular form should follow the pattern above. How can this be done using Python? Can anyone help me out?
To simplify the loop when writing to the CSV file, replace all the dataX items that are just numbers with a list of 4 copies of the number. That way you can index all of them the same way.
import json
import csv

# renamed from `json` so the string does not shadow the json module
json_str = '''
{
    "userName" : "Jhon",
    "status" : "success",
    "id" : 1234,
    "myData" : {
        "data1": [1,2,3,4],
        "data2": [1,2,3,4],
        "data3": [1,2,3,4],
        "data4": 25,
        "data5" : 12
    },
    "currentStatus" : true
}'''

data = json.loads(json_str)

for key, val in data['myData'].items():
    if type(val) is not list:
        data['myData'][key] = [val] * 4  # convert scalar to list

with open("output.csv", "w", newline="") as f:
    csvfile = csv.writer(f)
    # write header row
    csvfile.writerow(['userName', 'Status', 'Id'] + list(data['myData'].keys()) + ['currentStatus'])
    prefix = [data['userName'], data['status'], data['id']]
    suffix = [data['currentStatus']]
    for i in range(4):
        row = prefix[:]
        for val_list in data['myData'].values():
            row.append(val_list[i])  # i-th element of each dataX list
        row += suffix
        csvfile.writerow(row)
There's probably a simpler way to transpose the dictionary of lists into a 2-dimensional list.
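As one such simpler way, here is a pandas sketch (my addition, not part of the original answer): the DataFrame constructor broadcasts scalar values against list-valued columns, which does the scalar-to-list padding and the transpose in one step:

```python
import pandas as pd

record = {
    "userName": "Jhon", "status": "success", "id": 1234,
    "myData": {"data1": [1, 2, 3, 4], "data2": [1, 2, 3, 4],
               "data3": [1, 2, 3, 4], "data4": 25, "data5": 12},
    "currentStatus": True,
}

# Scalars (userName, data4, ...) are repeated to match the length
# of the list columns by the DataFrame constructor.
scalars = {k: v for k, v in record.items() if k != 'myData'}
df = pd.DataFrame({**scalars, **record['myData']})

# reorder columns to match the desired table layout
df = df[['userName', 'status', 'id', 'data1', 'data2', 'data3',
         'data4', 'data5', 'currentStatus']]
df.to_csv('output.csv', index=False)
```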
Not a python answer, and just for information, using the jq command line parser:

jq -r '(["userName","status","id","data1",
        "data2","data3","data4","data5","currentStatus"],   # header row
        range(0; .myData.data1|length) as $i |              # $i = table index
        [.userName,.status,.id,.myData.data1[$i],
         .myData.data2[$i],.myData.data3[$i],
         .myData.data4,.myData.data5,.currentStatus]) |     # extract values
       @tsv                                                 # format as tab-separated values
' file | column -t                                          # display in columns

This assumes that the number of elements in the data1 array is the same as in all the other arrays.

Write json format using pandas Series and DataFrame

I'm working with CSV files. My goal is to write a JSON format with the CSV file information. Specifically, I want to get a format similar to miserables.json.
Example:
{"source": "Napoleon", "target": "Myriel", "value": 1},
According to the information I have, the format would be:
[
    {
        "source": "Germany",
        "target": "Mexico",
        "value": 1
    },
    {
        "source": "Germany",
        "target": "USA",
        "value": 2
    },
    {
        "source": "Brazil",
        "target": "Argentina",
        "value": 3
    }
]
However, with the code I used, the output looks as follows:
[
    {
        "source": "Germany",
        "target": "Mexico",
        "value": 1
    },
    {
        "source": null,
        "target": "USA",
        "value": 2
    }
][
    {
        "source": "Brazil",
        "target": "Argentina",
        "value": 3
    }
]
The null source should be Germany; this is the main problem, because more countries have the same issue. Apart from that, the information is correct. I just want to get rid of the multiple adjacent lists in the output and replace null with the correct country.
This is the code I used, with pandas and collections:

csvdata = pandas.read_csv('file.csv', low_memory=False, encoding='latin-1')
countries = csvdata['country'].tolist()
newcountries = list(set(countries))

for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frequency = Counter(bills)
    sourceTemp = []
    value = []
    country = element
    for k, v in frequency.items():
        sourceTemp.append(k)
        value.append(int(v))
    forceData = {'source': Series(country), 'target': Series(sourceTemp), 'value': Series(value)}
    dfForce = DataFrame(forceData)
    jsondata = dfForce.to_json(orient='records', force_ascii=False, default_handler=callable)
    parsed = json.loads(jsondata)
    newData = json.dumps(parsed, indent=4, ensure_ascii=False, sort_keys=True)
    # since to_json doesn't have an append mode, this is written to a txt file
    savetxt = open('data.txt', 'a')
    savetxt.write(newData)
    savetxt.close()
Any suggestions to solve this problem are appreciated!
Thanks
Consider removing the Series() around the scalar value, country. Wrapping a scalar in Series() and then combining the dictionary of series into a dataframe forces NaN (later converted to null in json) into the 'source' series to pad it to the lengths of the other series. You can see this by printing out the dfForce dataframe:
from pandas import Series
from pandas import DataFrame

country = 'Germany'
sourceTemp = ['Mexico', 'USA', 'Argentina']
value = [1, 2, 3]

forceData = {'source': Series(country),
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)

#     source     target  value
# 0  Germany     Mexico      1
# 1      NaN        USA      2
# 2      NaN  Argentina      3
To resolve, simply keep country as a scalar in the dictionary of series:

forceData = {'source': country,
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)

#     source     target  value
# 0  Germany     Mexico      1
# 1  Germany        USA      2
# 2  Germany  Argentina      3
By the way, you do not need a dataframe object to output json; simply use a list of dictionaries. Consider the following, using an Ordered Dictionary collection (to maintain the order of keys). This way the growing list is dumped into a text file in one write, with no appending, which would otherwise render invalid json: opposite-facing adjacent square brackets ...][... are not allowed.
from collections import OrderedDict
...

data = []
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frequency = Counter(bills)
    for k, v in frequency.items():
        inner = OrderedDict()
        inner['source'] = element
        inner['target'] = k
        inner['value'] = int(v)
        data.append(inner)

newData = json.dumps(data, indent=4)
with open('data.json', 'w') as savetxt:
    savetxt.write(newData)
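To see the shape of the final output without the CSV file, here is a compact toy run of the same idea (the rows list is illustrative stand-in data, not from the question):

```python
import json
from collections import Counter, OrderedDict

# illustrative (country, target) pairs standing in for the CSV rows
rows = [('Germany', 'Mexico'), ('Germany', 'USA'),
        ('Germany', 'USA'), ('Brazil', 'Argentina')]

data = []
for country in ['Germany', 'Brazil']:
    frequency = Counter(t for c, t in rows if c == country)
    for target, value in frequency.items():
        inner = OrderedDict([('source', country),
                             ('target', target),
                             ('value', value)])
        data.append(inner)

# one single dump produces one valid JSON list, with no ...][... seams
print(json.dumps(data, indent=4))
```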

How to take the first two letters of a variable in python?

I have a dataset like below:

data
id  message
1   ffjffjf
2   ddjbgg
3   vvnvvv
4   eevvff
5   ggvfgg

Expected output:

data
id  message  splitmessage
1   ffjffjf  ff
2   ddjbgg   dd
3   vvnvvv   vv
4   eevvff   ee
5   ggvfgg   gg

I am very new to Python, so how can I take the first two letters from each row into a splitmessage variable?
My data exactly looks like the below image, so from the image I want only the hours and minutes, which are characters 12 to 16 in each row of the vfreceiveddate variable.
dataset = [
    { "id": 1, "message": "ffjffjf" },
    { "id": 2, "message": "ddjbgg" },
    { "id": 3, "message": "vvnvvv" },
    { "id": 4, "message": "eevvff" },
    { "id": 5, "message": "ggvfgg" }
]

for d in dataset:
    d["splitmessage"] = d["message"][:2]
What you want is a substring.

mystr = str(id1)
splitmessage = mystr[:2]

The method we are using is called slicing in Python, and it works for more than strings.
In the next example 'message' and 'splitmessage' are lists:

message = ['ffjffjf', 'ddjbgg']
splitmessage = []
for myString in message:
    splitmessage.append(myString[:2])
print(splitmessage)
If you want to do it for the entire table, I would do it like this:

dataset = [id1, id2, id3 ... ]
for id in dataset:
    splitid = id[:2]
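Since the question shows a tabular dataset, the same slice is also available vectorized in pandas via the .str accessor (a sketch; 'vfreceiveddate' below is a column name guessed from the image, so treat it as an assumption):

```python
import pandas as pd

data = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                     'message': ['ffjffjf', 'ddjbgg', 'vvnvvv',
                                 'eevvff', 'ggvfgg']})

# first two characters of every row, without an explicit loop
data['splitmessage'] = data['message'].str[:2]

# for the timestamp column, characters 12 to 16 (hour and minutes)
# would be, e.g.: data['hhmm'] = data['vfreceiveddate'].str[11:16]
print(data)
```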
