JSON data ordering with pandas (Python)

I have json like this:
json = {
"b": 22,
"x": 12,
"a": 2,
"c": 4
}
When I generate an Excel file from this JSON like this:
import pandas as pd
df = pd.read_json(json_text)
file_name = 'test.xls'
file_path = "/tmp/" + file_name
df.to_excel(file_path, index=False)
print("path to excel " + file_path)
pandas applies its own (alphabetical) ordering in the Excel file, like this:
pandas_json = {
"a": 2,
"b": 22,
"c": 4,
"x": 12
}
I don't want this. I need the ordering that exists in the JSON. Please give me some advice on how to do this.
UPDATE:
If I have JSON like this:
json = [
{"b": 22, "x":12, "a": 2, "c": 4},
{"b": 22, "x":12, "a": 2, "c": 2},
{"b": 22, "x":12, "a": 4, "c": 4},
]
pandas will generate its own ordering like this:
pandas_json = [
{"a": 2, "b":22, "c": 4, "x": 12},
{"a": 2, "b":22, "c": 2, "x": 12},
{"a": 4, "b":22, "c": 4, "x": 12},
]
How can I make pandas preserve my own ordering?

You can read the JSON as an OrderedDict, which will retain the original order:
import json
from collections import OrderedDict
import pandas as pd
json_ = """{
"b": 22,
"x": 12,
"a": 2,
"c": 4
}"""
data = json.loads(json_, object_pairs_hook=OrderedDict)
pd.DataFrame.from_dict(data, orient='index')
0
b 22
x 12
a 2
c 4
Edit: the updated JSON also works:
j="""[{"b": 22, "x":12, "a": 2, "c": 4},
{"b": 22, "x":12, "a": 2, "c": 2},{"b": 22, "x":12, "a": 4, "c": 4}]"""
data = json.loads(j, object_pairs_hook=OrderedDict)
pd.DataFrame.from_dict(data).to_json(orient='records')
'[{"b":22,"x":12,"a":2,"c":4},{"b":22,"x":12,"a":2,"c":2},
{"b":22,"x":12,"a":4,"c":4}]'
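Putting this together for the Excel export, a sketch might look like the following. Note that on Python 3.7+ plain dicts already preserve insertion order, so `json.loads` without the hook would also work there; the `to_excel` call is commented out because it needs an engine such as openpyxl installed.

```python
import json
from collections import OrderedDict
import pandas as pd

json_text = """[
  {"b": 22, "x": 12, "a": 2, "c": 4},
  {"b": 22, "x": 12, "a": 2, "c": 2},
  {"b": 22, "x": 12, "a": 4, "c": 4}
]"""

# object_pairs_hook keeps the key order exactly as written in the JSON
data = json.loads(json_text, object_pairs_hook=OrderedDict)
df = pd.DataFrame(data)
print(list(df.columns))  # ['b', 'x', 'a', 'c'] -- original order kept

# df.to_excel("/tmp/test.xlsx", index=False)  # requires openpyxl
```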

Related

Can someone explain what I am doing wrong in converting this dataframe to a dictionary in Python

index print_type_solid print_type_floral cluster
A 10 10 2
B 20 20 2
A 10 10 3
B 20 20 3
C 25 30 3
Can someone help me convert the above dataframe into the following nested dictionary, where cluster becomes the main key, print_type_x the sub-key, and the values as shown in the expected output below?
{
"2" :{
"print_type_solid" : {
"A": 10,
"B": 20
},
"print_type_floral" : {
"A": 10,
"B": 20
}
},
"3" :{
"print_type_solid" : {
"A": 10,
"B": 20,
"C": 25,
},
"print_type_floral" : {
"A": 10,
"B": 20,
"C": 30,
}
}
}
I tried this :
from collections import defaultdict
d = defaultdict()
d2 = {}
for k1, s in dct.items():
    for k2, v in s.items():
        for k3, r in v.items():
            d.setdefault(k3, {})[k2] = r
    d2[k1] = d
But I'm getting this :
{
"2" :{
"print_type_solid" : {
"A": 10,
"B": 20,
"C": 25
},
"print_type_floral" : {
"A": 10,
"B": 20,
"C": 30
}
},
"3" :{
"print_type_solid" : {
"A": 10,
"B": 20,
"C": 25,
},
"print_type_floral" : {
"A": 10,
"B": 20,
"C": 30,
}
}
}
And this is wrong because I'm getting C also in the dictionary for cluster 2.
You can use df.iterrows() to iterate your dataframe row-wise. To create the dictionary you can use this:
import pandas as pd

df = pd.DataFrame({"index": list("ABABC"),
                   "print_type_solid": [10, 20, 10, 20, 25],
                   "print_type_floral": [10, 20, 10, 20, 30],
                   "cluster": [2, 2, 3, 3, 3]})
print(df)

d = {}
pts = "print_type_solid"
ptf = "print_type_floral"

for idx, row in df.iterrows():
    key = d.setdefault(row["cluster"], {})
    key_pts = key.setdefault(pts, {})
    key_pts[row["index"]] = row[pts]
    key_ptf = key.setdefault(ptf, {})
    key_ptf[row["index"]] = row[ptf]

from pprint import pprint
pprint(d)
Output:
# df
index print_type_solid print_type_floral cluster
0 A 10 10 2
1 B 20 20 2
2 A 10 10 3
3 B 20 20 3
4 C 25 30 3
# dict
{2: {'print_type_floral': {'A': 10, 'B': 20},
'print_type_solid': {'A': 10, 'B': 20}},
3: {'print_type_floral': {'A': 10, 'B': 20, 'C': 30},
'print_type_solid': {'A': 10, 'B': 20, 'C': 25}}}
You could also use collections.defaultdict, but for this few data points it is not needed.
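The defaultdict variant mentioned above could look roughly like this (a sketch, reusing the sample df from the answer; the nested dicts are then created implicitly):

```python
import pandas as pd
from collections import defaultdict

df = pd.DataFrame({"index": list("ABABC"),
                   "print_type_solid": [10, 20, 10, 20, 25],
                   "print_type_floral": [10, 20, 10, 20, 30],
                   "cluster": [2, 2, 3, 3, 3]})

# cluster -> column -> {index: value}
d = defaultdict(lambda: defaultdict(dict))
for _, row in df.iterrows():
    for col in ("print_type_solid", "print_type_floral"):
        d[row["cluster"]][col][row["index"]] = row[col]

# convert back to plain dicts for printing/serialisation
d = {k: dict(v) for k, v in d.items()}
```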

Python Count JSON key values

I have a dataframe df with column 'ColA'. How do I count the keys in this column using Python?
df = pd.DataFrame({
'ColA': [{
"a": 10,
"b": 5,
"c": [1, 2, 3],
"d": 20
}, {
"f": 1,
"b": 3,
"c": [0],
"x": 71
}, {
"a": 1,
"m": 99,
"w": [8, 6],
"x": 88
}, {
"a": 9,
"m": 99,
"c": [3],
"x": 55
}]
})
Here I want to calculate the count for each key, like this, and then visualise the frequencies in a chart.
Expected Answers :
a=3,
b=2,
c=3,
d=1,
f=1,
x=3,
m=2,
w=1
Try this: Series.explode transforms each list-like element into its own row, Series.value_counts gets the counts of unique values, and Series.plot creates a plot from the resulting series (Series.explode requires pandas >= 0.25).
df.ColA.apply(lambda x : list(x.keys())).explode().value_counts()
a 3
c 3
x 3
b 2
m 2
f 1
d 1
w 1
Name: ColA, dtype: int64
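If you only need the counts, the same tally can be sketched without pandas using collections.Counter (the column data below is copied from the question; a bar chart could then be drawn from `counts` with matplotlib, not shown here):

```python
from collections import Counter

# the dicts from the 'ColA' column in the question
col_a = [
    {"a": 10, "b": 5, "c": [1, 2, 3], "d": 20},
    {"f": 1, "b": 3, "c": [0], "x": 71},
    {"a": 1, "m": 99, "w": [8, 6], "x": 88},
    {"a": 9, "m": 99, "c": [3], "x": 55},
]

# count every key across all the dicts in the column
counts = Counter(key for d in col_a for key in d)
print(counts)
```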

Pandas: read_json Changes Data

I have a json file that I'm trying to read into Pandas. The file looks like this:
{"0": {"a": 0, "b": "some_text", "c": "other_text"},
"1": {"a": 1, "b": "some_text1", "c": "other_text1"},
"2": {"a": 2, "b": "some_text2", "c": "other_text2"}}
When I do:
df = pd.read_json("my_file.json")
df = df.transpose()
df.head()
I see:
a b c
0 0 some_text other_text
1 1 some_text1 other_text1
10 10 some_text2 other_text2
So the dataframe's index and column a have somehow gotten mangled in the process. What am I doing incorrectly?
Thanks!
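No answer is shown here, but a likely cause is that the string keys are sorted lexicographically ("0", "1", "10", "100", ... come before "2"), which scrambles the index and, after the transpose, the rows with it. A sketch of one workaround, casting the index back to integers and re-sorting (the sample frame below is an assumption standing in for the real file):

```python
import pandas as pd

# a small frame with a string index, as read_json would produce it
df = pd.DataFrame({"a": [0, 1, 2, 10]}, index=["0", "1", "2", "10"])
df = df.sort_index()   # lexicographic order: "0", "1", "10", "2"
print(list(df.index))  # ['0', '1', '10', '2']

# workaround: cast the index to int, then sort numerically
df.index = df.index.astype(int)
df = df.sort_index()
print(list(df.index))  # [0, 1, 2, 10]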

Mongoengine aggregation return empty cursor

If I execute an aggregation query with a matched expression:
>>> devices = [1,2,3,4,5,6] # devices ID's
>>> c = View.objects.aggregate(
{"$match": {"d": {"$in": devices},"f": {"$ne": 1}}},
{"$group":{'_id':"uniqueDocs",'count':{"$sum":1}}}
)
I get this result:
>>> list(c)
[{u'count': 2874791, u'_id': u'uniqueDocs'}]
But if I execute a query whose expression matches nothing:
>>> now = datetime.utcnow().replace(tzinfo=tz.gettz('UTC'))
>>> current_hour_start = now.replace(minute=0, second=0, microsecond=0)
>>> c = View.objects.aggregate(
{"$match": {"d": {"$in": devices}, "b": {"$gte": current_hour_start}, "f": {"$ne": 1}}},
{"$group": {'_id': "uniqueDocs", 'count': {"$sum": 1}}})
I get an empty cursor:
list(c)
[]
How can I get a zero count instead? For example:
>>> list(c)
[{u'count': 0, u'_id': u'uniqueDocs'}]
Update:
Example dataset and expected result.
>>> View.objects()
{
_id: ObjectId("578f79b877824688fc0d68ed") }, {
$set: {
"d": 1, /* device ID */
"i": 1899,
"s": 1,
"a": 1,
"m": 0,
"f": 0,
"b": ISODate("2016-07-20T08:35:56.066Z"), /* begin time */
"e": ISODate("2016-07-20T08:35:57.965Z") /* end time */
}
},
{
_id: ObjectId("578f79b877824688fc0d68ee") }, {
$set: {
"d": 2,
"i": 2456,
"s": 1,
"a": 1,
"m": 0,
"f": 0,
"b": ISODate("2016-07-20T08:37:26.066Z"),
"e": ISODate("2016-07-20T08:37:28.965Z")
}
},
{
_id: ObjectId("578f79b877824688fc0d68ef") }, {
$set: {
"d": 1000,/* !!! ignore this document (no matched device ID) */
"i": 2567,
"s": 1,
"a": 1,
"m": 0,
"f": 0,
"b": ISODate("2016-07-20T08:35:56.066Z"),
"e": ISODate("2016-07-20T08:35:57.965Z")
}
}
>>> c = View.objects.aggregate(
{"$match": {"d": {"$in": devices},"f": {"$ne": 1}}},
{"$group":{'_id':"uniqueDocs",'count':{"$sum":1}}}
).next()['count']
2
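No answer is shown here, but one simple workaround: a `$group` stage emits no document at all when nothing matched, so `next()` on the cursor raises StopIteration; passing a default to `next()` yields the zero-count document instead. A sketch (the empty iterator below stands in for the empty aggregation cursor):

```python
# stands in for View.objects.aggregate(...) when $match filters everything out
empty_cursor = iter([])

# next() with a default never raises StopIteration
result = next(empty_cursor, {"_id": "uniqueDocs", "count": 0})
print(result["count"])  # 0
```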

python map dictionary to array

I have a list of data of the form:
[line1,a]
[line2,c]
[line3,b]
I want to use a mapping of a=5, c=15, b=10:
[line1,5]
[line2,15]
[line3,10]
I am trying to use this code, which I know is incorrect; can someone guide me on how best to achieve this:
mapping = {"a": 5, "b": 10, "c": 15}
applyMap = [line[1] = 'a' for line in data]
Thanks
EDIT:
Just to clarify, this shows a single line, but I want the mapping applied to every line in the file:
Input: ["line1","a"]
Output: ["line1",5]
You could try with a list comprehension.
lines = [
["line1", "much_more_items1", "a"],
["line2", "much_more_items2", "c"],
["line3", "much_more_items3", "b"],
]
mapping = {"a": 5, "b": 10, "c": 15}
# here I assume the key you need to remove is at last position of your items
result = [line[0:-1] + [mapping[line[-1]]] for line in lines]
Try something like this:
data = [
['line1', 'a'],
['line2', 'c'],
['line3', 'b'],
]
mapping = {"a": 5, "b": 10, "c": 15}
applyMap = [[line[0], mapping[line[1]]] for line in data]
print(applyMap)
>>> data = [["line1", "a"], ["line2", "b"], ["line3", "c"]]
>>> mapping = { "a": 5, "b": 10, "c": 15}
>>> [[line[0], mapping[line[1]]] for line in data]
[['line1', 5], ['line2', 10], ['line3', 15]]
lineMap = {'line1': 'a', 'line2': 'b', 'line3': 'c'}
cha2num = {'a': 5, 'b': 10, 'c': 15}
result = [[key,cha2num[lineMap[key]]] for key in lineMap]
print(result)
What you need is a mapping that relates each key to its value, e.g. 'a' -> 5.
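If some lines might contain a key that is missing from the mapping, dict.get with a default avoids a KeyError. A small sketch (the "z" entry and the None fallback are assumptions for illustration; substitute whatever default suits):

```python
data = [["line1", "a"], ["line2", "z"], ["line3", "b"]]  # "z" has no mapping
mapping = {"a": 5, "b": 10, "c": 15}

# .get returns None (or any chosen default) for unmapped keys
result = [[name, mapping.get(key)] for name, key in data]
print(result)  # [['line1', 5], ['line2', None], ['line3', 10]]
```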
