How to get JSON from a CSV file row-wise using Flask - Python

I have written code like this:
import pandas as pd
import numpy as np
import json
from flask import Flask, request, jsonify
app = Flask(__name__)
@app.route('/df', methods=['POST', 'GET'])
def ff():
    df = pd.read_csv(r'dataframe_post.csv')
    row = [5, 'Sanjeev', 'AE']
    df.loc[len(df)] = row  # append the new row at the end
    # print(df)
    ls = list(df.to_dict().values())
    return jsonify(ls)
if __name__ == '__main__':
    app.run(debug=True)
and the output shows all the data column-wise. But I want to display the data row-wise, i.e. each entry individually, like:
[
    {
        "id": 1,
        "name": "Preeti",
        "2": "CSE"
    },
    {
        "id": 2,
        "name": "Chinky",
        "2": "CE"
    },
    ...
]
and so on.

To return JSON in your desired format, you can use the built-in DataFrame method to_json instead of building a list and calling jsonify on it:
df.to_json(orient="records")
This gives you a JSON-encoded string, as in the example below:
df = pd.DataFrame([[5, 'Sanjeev', 'AE'], [6, 'Sven', 'AA']], columns=["id", "name", "2"])
This creates the DataFrame:
   id     name   2
0   5  Sanjeev  AE
1   6     Sven  AA
And then as JSON:
df.to_json(orient="records")
'[{"id":5,"name":"Sanjeev","2":"AE"},{"id":6,"name":"Sven","2":"AA"}]'

In your df.to_dict call, use to_dict(orient='records'), which builds a list with one dictionary per row.
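
To make the difference concrete, here is a minimal sketch contrasting the default to_dict() layout with orient="records". The in-memory DataFrame stands in for dataframe_post.csv, whose real contents are not shown in the question, so the sample rows are assumptions.

```python
import json

import pandas as pd

# Stand-in for pd.read_csv('dataframe_post.csv'); these rows are made up.
df = pd.DataFrame(
    [[1, "Preeti", "CSE"], [2, "Chinky", "CE"]],
    columns=["id", "name", "2"],
)
df.loc[len(df)] = [5, "Sanjeev", "AE"]  # append one more row, as in the question

# Default orient: one entry per column, which is what the question's code returned.
column_wise = df.to_dict()

# orient="records": one dict per row. Round-tripping through to_json
# yields plain Python types that any JSON encoder accepts.
row_wise = json.loads(df.to_json(orient="records"))

print(list(column_wise))  # the column names
print(row_wise[0])        # the first row as a dict
```

In the Flask view, returning jsonify(row_wise) would then send the row-wise list. If you use the string from df.to_json(orient="records") directly, return it with a JSON content type rather than passing it through jsonify, which would encode it a second time.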

Related

JSON file to pandas DataFrame

I have a JSON file that looks like this:
myjson = {'data': [{'ID': 'da45e00ca',
                    'name': 'June_2016',
                    'objCode': 'ased',
                    'percentComplete': 4.17,
                    'plannedCompletionDate': '2021-04-29T10:00:00:000-0500',
                    'plannedStartDate': '2020-04-16T23:00:00:000-0500',
                    'priority': 4,
                    'asedectedCompletionDate': '2022-02-09T10:00:00:000-0600',
                    'status': 'weds'},
                   {'ID': '10041ce23c',
                    'name': '2017_Always',
                    'objCode': 'ased',
                    'percentComplete': 4.17,
                    'plannedCompletionDate': '2021-10-22T10:00:00:000-0600',
                    'plannedStartDate': '2021-08-09T23:00:00:000-0600',
                    'priority': 3,
                    'asedectedCompletionDate': '2023-12-30T11:05:00:000-0600',
                    'status': 'weds'},
                   {'ID': '10041ce23ca',
                    'name': '2017_Always',
                    'objCode': 'ased',
                    'percentComplete': 4.17,
                    'plannedCompletionDate': '2021-10-22T10:00:00:000-0600',
                    'plannedStartDate': '2021-08-09T23:00:00:000-0600',
                    'priority': 3,
                    'asedectedCompletionDate': '2023-12-30T11:05:00:000-0600',
                    'status': 'weds'}]}
I was trying to normalize it and convert it to a pandas DataFrame using the code below, but the result doesn't come out right:
from pandas.io.json import json_normalize
reff = json_normalize(myjson)
df = pd.DataFrame(data=reff)
df
Does anyone have any idea what I'm doing wrong? Thanks in advance!
Try:
import pandas as pd
reff = pd.json_normalize(myjson['data'])
df = pd.DataFrame(data=reff)
df
You forgot to pull your data out of myjson. json_normalize() only iterates over the outermost layer of your JSON, which here is a dict with the single key 'data'.
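
As a variation on the same fix, json_normalize() can also be pointed at the nested list through its record_path parameter. The sketch below uses a shortened copy of myjson (only a few of the original fields), so the data is illustrative.

```python
import pandas as pd

# Shortened copy of the question's structure: a dict whose 'data' key holds the records.
myjson = {
    "data": [
        {"ID": "da45e00ca", "name": "June_2016", "percentComplete": 4.17, "priority": 4},
        {"ID": "10041ce23c", "name": "2017_Always", "percentComplete": 4.17, "priority": 3},
    ]
}

# Normalizing the outer dict yields a single row whose only column holds the whole list.
wrong = pd.json_normalize(myjson)

# Either index into 'data' first, or let record_path do it.
by_index = pd.json_normalize(myjson["data"])
by_path = pd.json_normalize(myjson, record_path="data")

print(wrong.shape)     # one row, one column
print(by_index.shape)  # one row per record
```

Both by_index and by_path hold one row per record; record_path is mostly a convenience when the list sits deeper in the structure.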
json_normalize() first flattens the JSON data and then converts it into a pandas DataFrame. It is available from the pandas module as pd.json_normalize().
Step 1 - Load the JSON data
json.loads(json_string)
Step 2 - Pass the loaded data into the json_normalize() method
pd.json_normalize(json.loads(json_string))
Example:
import pandas as pd
import json
# Create json string
# with student details
json_string = '''
[
{ "id": "1", "name": "sravan","age":22 },
{ "id": "2", "name": "harsha","age":22 },
{ "id": "3", "name": "deepika","age":21 },
{ "id": "4", "name": "jyothika","age":23 }
]
'''
# Load json data and convert to Dataframe
df = pd.json_normalize(json.loads(json_string))
# Display the Dataframe
print(df)
Output:
  id      name  age
0  1    sravan   22
1  2    harsha   22
2  3   deepika   21
3  4  jyothika   23

Exporting from dicts list to excel

Beginner here. I wrote some code to export every dict inside a list to Excel. For now it exports only the last one (name: evan, age: 25), and I have no idea why. The terminal shows all the data, but the exported file contains only the last entry. I'd like 'name' and 'age' to be column headers, with the corresponding data below.
import pandas as pd
people = [
    {
        'name': 'corey',
        'age': 12,
    },
    {
        'name': 'matt',
        'age': 15,
    },
    {
        'name': 'evan',
        'age': 25
    }]
for person in range(len(people)):
    print(people[person]['name'], people[person]['age'])
excel_dict = {}
for s in range(len(people)):
    excel_dict['p_name'] = (people[s]['name'])
    excel_dict['p_age'] = (people[s]['age'])
df = pd.DataFrame(data=excel_dict, index=[0])
df = (df.T)
print(df)
df.to_excel('dict1.xlsx')
Try this solution:
import pandas as pd
people = [
    {
        'name': 'corey',
        'age': 12,
    },
    {
        'name': 'matt',
        'age': 15,
    },
    {
        'name': 'evan',
        'age': 25
    }]
for person in people:
    print(person['name'], person['age'])
# A list of dicts becomes one row per dict, with the keys as column headers
df = pd.DataFrame(people)
print(df)
df.to_excel('dict1.xlsx')
Output:
corey 12
matt 15
evan 25
    name  age
0  corey   12
1   matt   15
2   evan   25
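
The original loop exported only evan because excel_dict is a single dictionary whose 'p_name' and 'p_age' keys are overwritten on every iteration, so only the last values remain when the DataFrame is built. Below is a minimal sketch of that effect, and of keeping the p_name/p_age headers with rename(). The file name in the final comment is hypothetical.

```python
import pandas as pd

people = [
    {"name": "corey", "age": 12},
    {"name": "matt", "age": 15},
    {"name": "evan", "age": 25},
]

# Reusing one dict: each pass overwrites the same two keys.
excel_dict = {}
for person in people:
    excel_dict["p_name"] = person["name"]
    excel_dict["p_age"] = person["age"]
print(excel_dict)  # only the last person survives

# One row per dict, then rename the headers if p_name/p_age are wanted.
df = pd.DataFrame(people).rename(columns={"name": "p_name", "age": "p_age"})
print(df)

# df.to_excel("people.xlsx", index=False) would then write all three rows
# without the numeric index column (it needs an Excel engine such as openpyxl).
```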

How to extract columns from a dictionary that does not have keys

I have tried several resources on how to transform a dict into a DataFrame, but the problem is that this is a weird dict.
It is not like key: {}, key: {}, etc.
The data has lots of items, but the goal is to extract only the content inside the dict {}. If possible, getting the dates as well would be a plus.
data:
id,client,source,status,request,response,queued,created_at,updated_at
54252,sdf,https://asdasdadadad,,"{
"year": "2010",
"casa": "aca",
"status": "p",
"Group": "57981",
}",,1,"2020-05-02 11:06:17","2020-05-02 11:06:17"
54252,msc-lp,https://discover,,"{
"year": "27",
"casa": "Na",
"status": "p",
"Group": "57981",
}"
My attempts:
# attempt 1
with open('data.csv') as fd:
    pairs = (line.split(None) for line in fd)
    res = {int(pair[0]): pair[1] for pair in pairs if len(pair) == 2 and pair[0].isdigit()}
# attempt 2
import json
# reading the JSON data using json.load()
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)
# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)
# attempt 3
df = pd.read_csv("data.csv")
df = df.melt(id_vars=["index", "Date"], var_name="variables", value_name="values")
Nothing works because the data is weirdly shaped.
Expected output:
All the items inside the dictionary, with every key becoming one column of the DataFrame:
Date                 year  casa  status  Group
2020-05-02 11:06:17  2010  aca   p       57981
2020-05-02 11:06:17  27    Na    p       57981
Format the data into a valid CSV structure:
id,client,source,status,request,response,queued,created_at,updated_at
54252,sdf,https://asdasdadadad,,'{ "ag": "2010", "ca": "aca", "ve": "p", "Group": "57981" }',,1,"2020-05-02 11:06:17","2020-05-02 11:06:17"
54252,msc-lp,https://discover,,'{ "ag": "27", "ca": "Na", "ve": "p", "Group": "57981" }',,1,"2020-05-02 11:06:17","2020-05-02 11:06:17"
This should also work in the worst case, where a value cannot be parsed as JSON: the converter prints the error and returns None for that cell.
import json
import pandas as pd
def parse_column(data):
    try:
        return json.loads(data)
    except Exception as e:
        print(e)
        return None
df = pd.read_csv('tmp.csv', converters={"request": parse_column}, quotechar="'")
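
Once the request column holds parsed dictionaries, they can be expanded into one column per key to reach the expected output. This is a sketch under the assumption that the file has been reformatted as shown above; the CSV text is inlined so the example is self-contained. Because quotechar is the single quote, the double quotes around the dates are kept as literal characters and are stripped explicitly.

```python
import io
import json

import pandas as pd

# Inlined copy of the reformatted CSV; in practice this would be the file on disk.
CSV_TEXT = """id,client,source,status,request,response,queued,created_at,updated_at
54252,sdf,https://asdasdadadad,,'{ "ag": "2010", "ca": "aca", "ve": "p", "Group": "57981" }',,1,"2020-05-02 11:06:17","2020-05-02 11:06:17"
54252,msc-lp,https://discover,,'{ "ag": "27", "ca": "Na", "ve": "p", "Group": "57981" }',,1,"2020-05-02 11:06:17","2020-05-02 11:06:17"
"""


def parse_column(data):
    """Parse one cell as JSON; return None when it is not valid JSON."""
    try:
        return json.loads(data)
    except (TypeError, ValueError):
        return None


df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"request": parse_column}, quotechar="'")

# One column per dictionary key; cells that failed to parse become empty rows.
expanded = pd.DataFrame([d or {} for d in df["request"]], index=df.index)

# Prepend the date, dropping the literal double quotes left by quotechar="'".
expanded.insert(0, "Date", df["created_at"].str.strip('"'))
print(expanded)
```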

Using Pandas to Extract config file (Looks like K/V but not)

I have a config file in the format below:
Models{
    Model1{
        Description = "xxxx"
        Feature = "yyyy"
        EventType = [
            "Type1",
            "Type2"]
    }
    Model2{
        Description = "aaaa"
        Feature = "bbbb"
        EventType = [
            "Type3",
            "Type4"]
    }
}
Is there a way to transform this into a dataframe as below?
|Model | Description | Feature | EventType |
------------------------------------------------
|Model1 | xxxx | yyyy | Type1, Type2 |
|Model2 | aaaa | bbbb | Type3, Type4 |
First, convert it into standard JSON. You can do that with regular expressions:
import re
with open('untitled.txt') as f:
    data = f.read()
# Converting into JSON format
data = re.sub(r'(=\s*".*")\n', r'\1,\n', data)                    # comma after each string value
data = re.sub(r'(Description|Feature|EventType)', r'"\1"', data)  # quote the field names
data = re.sub(r'}(\s*Model[0-9]+)', r'},\1', data)                # comma between model blocks
data = re.sub(r'(Model[0-9]+)', r'"\1"=', data)                   # quote the model names
data = re.sub(r'(Models)', r'', data)                             # drop the outer label
data = re.sub(r'=', r':', data)                                   # use JSON's key separator
The data string will then look like this:
{
    "Model1":{
        "Description" : "xxxx",
        "Feature" : "yyyy",
        "EventType" : [
            "Type1",
            "Type2"]
    },
    "Model2":{
        "Description" : "aaaa",
        "Feature" : "bbbb",
        "EventType" : [
            "Type3",
            "Type4"]
    }
}
Then, use pd.read_json to read it:
import pandas as pd
from io import StringIO
df = pd.read_json(StringIO(data), orient='index').reset_index()
# index Description EventType Feature
#0 Model1 xxxx [Type1, Type2] yyyy
#1 Model2 aaaa [Type3, Type4] bbbb
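
To get from here to the table in the question, with a Model column and the event types joined into one string, a short follow-up sketch. It starts from the JSON text produced by the substitutions and builds the rows with plain json.loads rather than pd.read_json; that swap is a choice made for this example, not part of the answer above.

```python
import json

import pandas as pd

# JSON text as produced by the regex substitutions.
data = """
{
    "Model1": {"Description": "xxxx", "Feature": "yyyy", "EventType": ["Type1", "Type2"]},
    "Model2": {"Description": "aaaa", "Feature": "bbbb", "EventType": ["Type3", "Type4"]}
}
"""

parsed = json.loads(data)

# One row per model: the model name first, then its fields,
# with the EventType list flattened to "Type1, Type2".
rows = [
    {"Model": name, **fields, "EventType": ", ".join(fields["EventType"])}
    for name, fields in parsed.items()
]
df = pd.DataFrame(rows)
print(df)
```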

Write json format using pandas Series and DataFrame

I'm working with CSV files. My goal is to write JSON from the information in a CSV file. Specifically, I want a format similar to miserables.json.
Example:
{"source": "Napoleon", "target": "Myriel", "value": 1},
With the information I have, the output should be:
[
    {
        "source": "Germany",
        "target": "Mexico",
        "value": 1
    },
    {
        "source": "Germany",
        "target": "USA",
        "value": 2
    },
    {
        "source": "Brazil",
        "target": "Argentina",
        "value": 3
    }
]
However, with the code I used, the output looks as follows:
[
    {
        "source": "Germany",
        "target": "Mexico",
        "value": 1
    },
    {
        "source": null,
        "target": "USA",
        "value": 2
    }
][
    {
        "source": "Brazil",
        "target": "Argentina",
        "value": 3
    }
]
The null source should be Germany. This is one of the main problems, because more entries have the same issue. Apart from this, the information is correct. I just want to merge the separate lists into one and replace null with the correct country.
This is the code I used, with pandas and collections:
import json
from collections import Counter
import pandas
from pandas import Series, DataFrame
csvdata = pandas.read_csv('file.csv', low_memory=False, encoding='latin-1')
countries = csvdata['country'].tolist()
newcountries = list(set(countries))
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    sourceTemp = []
    value = []
    country = element
    for k, v in frquency.items():
        sourceTemp.append(k)
        value.append(int(v))
    forceData = {'source': Series(country), 'target': Series(sourceTemp), 'value': Series(value)}
    dfForce = DataFrame(forceData)
    jsondata = dfForce.to_json(orient='records', force_ascii=False, default_handler=callable)
    parsed = json.loads(jsondata)
    newData = json.dumps(parsed, indent=4, ensure_ascii=False, sort_keys=True)
    # since to_json doesn't have append mode this will be written in txt file
    savetxt = open('data.txt', 'a')
    savetxt.write(newData)
    savetxt.close()
Any suggestions to solve this problem are appreciated!
Thanks
Consider removing the Series() around the scalar value country. Wrapping it creates a one-element series, and when the dictionary of series is assembled into a DataFrame, pandas pads that series with NaN (later converted to null in JSON) to match the lengths of the other series. You can see this by printing the dfForce DataFrame:
from pandas import Series
from pandas import DataFrame
country = 'Germany'
sourceTemp = ['Mexico', 'USA', 'Argentina']
value = [1, 2, 3]
forceData = {'source': Series(country),
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)
#     source     target  value
# 0  Germany     Mexico      1
# 1      NaN        USA      2
# 2      NaN  Argentina      3
To resolve this, simply keep country as a scalar in the dictionary of series:
forceData = {'source': country,
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)
#     source     target  value
# 0  Germany     Mexico      1
# 1  Germany        USA      2
# 2  Germany  Argentina      3
By the way, you do not need a DataFrame to output JSON. Simply use a list of dictionaries. Consider the following, which uses an OrderedDict (to maintain the order of keys). This way the list grows inside the loop and is dumped to the file once at the end. Appending to the file on every iteration instead produces invalid JSON, because adjacent opposite-facing square brackets ...][... are not allowed.
from collections import OrderedDict
...
data = []
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    for k, v in frquency.items():
        inner = OrderedDict()
        inner['source'] = element
        inner['target'] = k
        inner['value'] = int(v)
        data.append(inner)
newData = json.dumps(data, indent=4)
with open('data.json', 'w') as savetxt:
    savetxt.write(newData)
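
As an alternative sketch, not part of the answer above, the counting can also be left to pandas: grouping by country and target and taking the group sizes gives one row per link. The column names country and target come from the question; the sample rows are made up to reproduce its expected output.

```python
import json

import pandas as pd

# Made-up stand-in for pandas.read_csv('file.csv').
csvdata = pd.DataFrame({
    "country": ["Germany", "Germany", "Germany", "Brazil", "Brazil", "Brazil"],
    "target": ["Mexico", "USA", "USA", "Argentina", "Argentina", "Argentina"],
})

# Count how often each (country, target) pair occurs; groups come out sorted by key.
links = (
    csvdata.groupby(["country", "target"])
    .size()
    .reset_index(name="value")
    .rename(columns={"country": "source"})
)

# to_json turns numpy integers into plain JSON numbers; reloading gives Python objects.
records = json.loads(links.to_json(orient="records", force_ascii=False))
newData = json.dumps(records, indent=4, ensure_ascii=False)
print(newData)
```

Because every link ends up in a single list, newData can be written to the file once in 'w' mode, which avoids the adjacent ][ brackets entirely.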
