pretty nested dictionary as a table - python

Is there any way to pretty print in a table format a nested dictionary? My data structure looks like this;
data = {'01/09/16': {'In': ['Jack'], 'Out': ['Lisa', 'Tom', 'Roger', 'Max', 'Harry', 'Same', 'Joseph', 'Luke', 'Mohammad', 'Sammy']},
'02/09/16': {'In': ['Jack', 'Lisa', 'Rache', 'Allan'], 'Out': ['Lisa', 'Tom']},
'03/09/16': {'In': ['James', 'Jack', 'Nowel', 'Harry', 'Timmy'], 'Out': ['Lisa', 'Tom
And I'm trying to print it out something like this (the names are kept in one line). Note that the names are listed below one another:
+----------------------------------+-------------+-------------+-------------+
| Status | 01/09/16 | 02/09/16 | 03/09/16 |
+----------------------------------+-------------+-------------+-------------+
| In | Jack Tom Tom
| Lisa | Jack |
+----------------------------------+-------------+-------------+-------------+
| Out | Lisa
Tom | Jack | Lisa |
+----------------------------------+-------------+-------------+-------------+
I've tried using pandas with this code;
pd.set_option('display.max_colwidth', -1)
df = pd.DataFrame(role_assignment)
df.fillna('None', inplace=True)
print df
But the problem above is that pandas prints it like this (The names are printed in a single line and it doesn't look good, especially if there's a lot of names);
01/09/16 \
In [Jack]
Out [Lisa, Tom, Roger, Max, Harry, Same, Joseph, Luke, Mohammad, Sammy]
02/09/16 03/09/16
In [Jack, Lisa, Rache, Allan] [James, Jack, Nowel, Harry, Timmy]
Out [Lisa, Tom] [Lisa, Tom]
I prefer this but names listed below one another;
01/09/16 02/09/16 03/09/16
In [Jack] [Jack] [James]
Out [Lisa] [Lisa] [Lisa]
Is there a way to print it neater using pandas or another tool?

This is nonsense hackery and only for display purposes only.
data = {
'01/09/16': {
'In': ['Jack'],
'Out': ['Lisa', 'Tom', 'Roger',
'Max', 'Harry', 'Same',
'Joseph', 'Luke', 'Mohammad', 'Sammy']
},
'02/09/16': {
'In': ['Jack', 'Lisa', 'Rache', 'Allan'],
'Out': ['Lisa', 'Tom']
},
'03/09/16': {
'In': ['James', 'Jack', 'Nowel', 'Harry', 'Timmy'],
'Out': ['Lisa', 'Tom']
}
}
df = pd.DataFrame(data)
d1 = df.stack().apply(pd.Series).stack().unstack(1).fillna('')
d1.index.set_levels([''] * len(d1.index.levels[1]), level=1, inplace=True)
print(d1)
01/09/16 02/09/16 03/09/16
In Jack Jack James
Lisa Jack
Rache Nowel
Allan Harry
Timmy
Out Lisa Lisa Lisa
Tom Tom Tom
Roger
Max
Harry
Same
Joseph
Luke
Mohammad
Sammy

Related

Filling Null Values of a Pandas Column with a List

I have data frame df that can be re-created with the code below:
df1 = pd.DataFrame({'name': ['jim', 'john', 'joe', 'jack', 'jake']})
df2 = pd.DataFrame({'name': ['jim', 'john', 'jack'],
'listings': [['orlando', 'los angeles', 'houston'],
['buffalo', 'boston', 'dallas', 'none'],
['phoenix', 'montreal', 'seattle', 'none']]})
df = pd.merge(df1, df2, on = 'name', how = 'left')
print(df)
name listings
0 jim [orlando, los angeles, houston, detroit]
1 john [buffalo, boston, dallas, none]
2 joe NaN
3 jack [phoenix, montreal, seattle, none]
4 jake NaN
I want to fill the NaN values in the listings column with a list of none repeated the length of the the lists in the listings column, ['none']*4, so that the resulting dataframe looks like below:
print(df)
name listings
0 jim [orlando, los angeles, houston, detroit]
1 john [buffalo, boston, dallas, none]
2 joe [none, none, none, none]
3 jack [phoenix, montreal, seattle, none]
4 jake [none, none, none, none]
I've tried both approaches below, and neither are working:
# Failed Approach 1
df['listings'] = np.where(df['listings'].isnull(), ['none']*4, df['listings'])
# Failed Approach 2
df['listings'].fillna(['none']*4)
you can do:
df.loc[df['listings'].isna(),'listings'] = [['none']*4]
name listings
0 jim [orlando, los angeles, houston]
1 john [buffalo, boston, dallas, none]
2 joe [none, none, none, none]
3 jack [phoenix, montreal, seattle, none]
4 jake [none, none, none, none]

How to extract row with mixed value

I have to extract rows from a pandas dataframe with values in 'Date of birth' column which occur in a list with dates.
import pandas as pd
df = pd.DataFrame({'Name': ['Jack', 'Mary', 'David', 'Bruce', 'Nick', 'Mark', 'Carl', 'Sofie'],
'Date of birth': ['1973', '1999', '1995', '1992/1991', '2000', '1969', '1994', '1989/1990']})
dates = ['1973', '1992', '1969', '1989']
new_df = df.loc[df['Date of birth'].isin(dates)]
print(new_df)
0 Jack 1973
1 Mary 1999
2 David 1995
3 Bruce 1992/1991
4 Nick 2000
5 Mark 1969
6 Carl 1994
7 Sofie 1989/1990
Eventually I get the table below. As you can see, Bruce's and Sofie's rows are absent since the value is followed by / and another value. How should I split up these two filter them out?
Name Date of birth
0 Jack 1973
5 Mark 1969
You could use str.contains:
import pandas as pd
df = pd.DataFrame({'Name': ['Jack', 'Mary', 'David', 'Bruce', 'Nick', 'Mark', 'Carl', 'Sofie'],
'Date of birth': ['1973', '1999', '1995', '1992/1991', '2000', '1969', '1994', '1989/1990']})
dates = ['1973', '1992', '1969', '1989']
new_df = df.loc[df['Date of birth'].str.contains(rf"\b{'|'.join(dates)}\b")]
print(new_df)
Output
Name Date of birth
0 Jack 1973
3 Bruce 1992/1991
5 Mark 1969
7 Sofie 1989/1990
The string rf"\b{'|'.join(dates)}\b" is a regex pattern that will match any of string that contains any of the dates.
I like #DaniMesejo way better but here is a way splitting up the values and stacking:
df[df['Date of birth'].str.split('/', expand=True).stack().isin(dates).max(level=0)]
Output:
Name Date of birth
0 Jack 1973
3 Bruce 1992/1991
5 Mark 1969
7 Sofie 1989/1990

Creating a new dataframe from applying a function on all cell of a dataframe

I have a data frame, df, like this:
data = {'A': ['Jason (121439)', 'Molly (194439)', 'Tina (114439)', 'Jake (127859)', 'Amy (122579)'],
'B': ['Bob (127439)', 'Mark (136489)', 'Tyler (121443)', 'John (126259)', 'Anna(174439)'],
'C': ['Jay (121596)', 'Ben (12589)', 'Toom (123586)', 'Josh (174859)', 'Al(121659)'],
'D': ['Paul (123839)', 'Aaron (124159)', 'Steve (161899)', 'Vince (179839)', 'Ron (128379)']}
df = pd.DataFrame(data)
And I want to create a new data frame with one column with the name and the other column with the number between parenthesis, which would look like this:
data2 = {'Name': ['Jason ', 'Molly ', 'Tina ', 'Jake ', 'Amy '],
'ID#': ['121439', '194439', '114439', '127859', '122579']}
result = pd.DataFrame(data2)
I try different things, but it all did not work:
1)
List_name=pd.DataFrame()
List_id=pd.DataFrame()
List_both=pd.DataFrame(columns=["Name","ID"])
for i in df.columns:
left=df[i].str.split("(",1).str[0]
right=df[i].str.split("(",1).str[1]
List_name=List_name.append(left)
List_id=List_id.append(right)
List_both=pd.concat([List_name,List_id], axis=1)
List_both
2) applying a function on all cell
Names = lambda x: x.str.split("(",1).str[0]
IDS = Names = lambda x: x.str.split("(",1).str[1]
But I was wondering how to do it in order to store it in a data frame that will look like result...
You can use stack followed by str.extract.
(df.stack()
.str.strip()
.str.extract(r'(?P<Name>.*?)\s*\((?P<ID>.*?)\)$')
.reset_index(drop=True))
Name ID
0 Jason 121439
1 Bob 127439
2 Jay 121596
3 Paul 123839
4 Molly 194439
5 Mark 136489
6 Ben 12589
7 Aaron 124159
8 Tina 114439
9 Tyler 121443
10 Toom 123586
11 Steve 161899
12 Jake 127859
13 John 126259
14 Josh 174859
15 Vince 179839
16 Amy 122579
17 Anna 174439
18 Al 121659
19 Ron 128379

List of dict of dict in Pandas

I have list of dict of dicts in the following form:
[{0:{'city':'newyork', 'name':'John', 'age':'30'}},
{0:{'city':'newyork', 'name':'John', 'age':'30'}},]
I want to create pandas DataFrame in the following form:
city name age
newyork John 30
newyork John 30
Tried a lot but without any success
can you help me?
Use list comprehension with concat and DataFrame.from_dict:
L = [{0:{'city':'newyork', 'name':'John', 'age':'30'}},
{0:{'city':'newyork', 'name':'John', 'age':'30'}}]
df = pd.concat([pd.DataFrame.from_dict(x, orient='index') for x in L])
print (df)
name age city
0 John 30 newyork
0 John 30 newyork
Solution with multiple keys with new column id should be:
L = [{0:{'city':'newyork', 'name':'John', 'age':'30'},
1:{'city':'newyork1', 'name':'John1', 'age':'40'}},
{0:{'city':'newyork', 'name':'John', 'age':'30'}}]
L1 = [dict(v, id=k) for x in L for k, v in x.items()]
print (L1)
[{'name': 'John', 'age': '30', 'city': 'newyork', 'id': 0},
{'name': 'John1', 'age': '40', 'city': 'newyork1', 'id': 1},
{'name': 'John', 'age': '30', 'city': 'newyork', 'id': 0}]
df = pd.DataFrame(L1)
print (df)
age city id name
0 30 newyork 0 John
1 40 newyork1 1 John1
2 30 newyork 0 John
from pandas import DataFrame
ldata = [{0: {'city': 'newyork', 'name': 'John', 'age': '30'}},
{0: {'city': 'newyork', 'name': 'John', 'age': '30'}}, ]
# 根据上面的ldata创建一个Dataframe
df = DataFrame(d[0] for d in ldata)
print(df)
"""
The answer is:
age city name
0 30 newyork John
1 30 newyork John
"""
import pandas as pd
d = [{0:{'city':'newyork', 'name':'John', 'age':'30'}},{0:{'city':'newyork', 'name':'John', 'age':'30'}},]
df = pd.DataFrame([list(i.values())[0] for i in d])
print(df)
Output:
age city name
0 30 newyork John
1 30 newyork John
You can use:
In [41]: df = pd.DataFrame(next(iter(e.values())) for e in l)
In [42]: df
Out[42]:
age city name
0 30 newyork John
1 30 newyork John
Came to new solution. Not as straightforward as posted here but works properly
L = [{0:{'city':'newyork', 'name':'John', 'age':'30'}},
{0:{'city':'newyork', 'name':'John', 'age':'30'}}]
df = [L[i][0] for i in range(len(L))]
df = pd.DataFrame.from_records(df)

Python - How to convert an array of json objects to a Dataframe?

I'm completely new to python. And I need a little help to be able to filter my JSON.
json = {
"selection":[
{
"person_id":105894,
"position_id":1,
"label":"Work",
"description":"A description",
"startDate":"2017-07-16T19:20:30+01:00",
"stopDate":"2017-07-16T20:20:30+01:00"
},
{
"person_id":945123,
"position_id":null,
"label":"Illness",
"description":"A description",
"startDate":"2017-07-17T19:20:30+01:00",
"stopDate":"2017-07-17T20:20:30+01:00"
}
]
}
Concretely what I'm trying to do is to transform my JSON (here above) into a Dataframe to be able to use the query methods on it, like:
selected_person_id = 105894
query_person_id = json[(json['person_id'] == selected_person_id)]
or
json.query('person_id <= 105894')
The columns must be:
cols = ['person_id', 'position_id', 'label', 'description', 'startDate', 'stopDate']
How can I do it ?
Use:
df = pd.DataFrame(json['selection'])
print (df)
description label person_id position_id startDate \
0 A description Work 105894 1.0 2017-07-16T19:20:30+01:00
1 A description Illness 945123 NaN 2017-07-17T19:20:30+01:00
stopDate
0 2017-07-16T20:20:30+01:00
1 2017-07-17T20:20:30+01:00
EDIT:
import json
with open('file.json') as data_file:
json = json.load(data_file)
for more complicated examples where a flattening of the structure is neeeded use json_normalize:
>>> data = [{'state': 'Florida',
... 'shortname': 'FL',
... 'info': {
... 'governor': 'Rick Scott'
... },
... 'counties': [{'name': 'Dade', 'population': 12345},
... {'name': 'Broward', 'population': 40000},
... {'name': 'Palm Beach', 'population': 60000}]},
... {'state': 'Ohio',
... 'shortname': 'OH',
... 'info': {
... 'governor': 'John Kasich'
... },
... 'counties': [{'name': 'Summit', 'population': 1234},
... {'name': 'Cuyahoga', 'population': 1337}]}]
>>> from pandas.io.json import json_normalize
>>> result = json_normalize(data, 'counties', ['state', 'shortname',
... ['info', 'governor']])
>>> result
name population info.governor state shortname
0 Dade 12345 Rick Scott Florida FL
1 Broward 40000 Rick Scott Florida FL
2 Palm Beach 60000 Rick Scott Florida FL
3 Summit 1234 John Kasich Ohio OH
4 Cuyahoga 1337 John Kasich Ohio OH

Categories

Resources