I have a list of dicts of dicts in the following form:
[{0:{'city':'newyork', 'name':'John', 'age':'30'}},
{0:{'city':'newyork', 'name':'John', 'age':'30'}},]
I want to create a pandas DataFrame in the following form:
city name age
newyork John 30
newyork John 30
I have tried a lot, but without any success.
Can you help me?
Use a list comprehension with concat and DataFrame.from_dict:
import pandas as pd
L = [{0:{'city':'newyork', 'name':'John', 'age':'30'}},
{0:{'city':'newyork', 'name':'John', 'age':'30'}}]
df = pd.concat([pd.DataFrame.from_dict(x, orient='index') for x in L])
print(df)
name age city
0 John 30 newyork
0 John 30 newyork
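Note that every row in the output above keeps the inner key 0 as its index label. If a fresh 0..n-1 index is preferred, a minimal sketch (reusing the same L, and assuming nothing else depends on the original labels) is to pass ignore_index=True to concat:

```python
import pandas as pd

L = [{0: {'city': 'newyork', 'name': 'John', 'age': '30'}},
     {0: {'city': 'newyork', 'name': 'John', 'age': '30'}}]

# Each from_dict call yields a one-row frame indexed by the inner key (0),
# so ignore_index=True is used to renumber the rows 0..n-1.
df = pd.concat(
    [pd.DataFrame.from_dict(x, orient='index') for x in L],
    ignore_index=True,
)
print(df)
```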
For dicts with multiple keys, a solution that stores each key in a new column id would be:
L = [{0:{'city':'newyork', 'name':'John', 'age':'30'},
1:{'city':'newyork1', 'name':'John1', 'age':'40'}},
{0:{'city':'newyork', 'name':'John', 'age':'30'}}]
L1 = [dict(v, id=k) for x in L for k, v in x.items()]
print (L1)
[{'name': 'John', 'age': '30', 'city': 'newyork', 'id': 0},
{'name': 'John1', 'age': '40', 'city': 'newyork1', 'id': 1},
{'name': 'John', 'age': '30', 'city': 'newyork', 'id': 0}]
df = pd.DataFrame(L1)
print (df)
age city id name
0 30 newyork 0 John
1 40 newyork1 1 John1
2 30 newyork 0 John
from pandas import DataFrame
ldata = [{0: {'city': 'newyork', 'name': 'John', 'age': '30'}},
{0: {'city': 'newyork', 'name': 'John', 'age': '30'}}, ]
# Create a DataFrame from the ldata above
df = DataFrame(d[0] for d in ldata)
print(df)
"""
The answer is:
age city name
0 30 newyork John
1 30 newyork John
"""
import pandas as pd
d = [{0:{'city':'newyork', 'name':'John', 'age':'30'}},{0:{'city':'newyork', 'name':'John', 'age':'30'}},]
df = pd.DataFrame([list(i.values())[0] for i in d])
print(df)
Output:
age city name
0 30 newyork John
1 30 newyork John
You can take the first inner dict of each element with next(iter(...)) (here l is the list from the question):
In [41]: df = pd.DataFrame(next(iter(e.values())) for e in l)
In [42]: df
Out[42]:
age city name
0 30 newyork John
1 30 newyork John
I came to a new solution. It is not as straightforward as the ones posted here, but it works properly:
L = [{0:{'city':'newyork', 'name':'John', 'age':'30'}},
{0:{'city':'newyork', 'name':'John', 'age':'30'}}]
records = [d[0] for d in L]
df = pd.DataFrame.from_records(records)
I have a dataframe as following:
df1 = pd.DataFrame({'id': ['1a', '2b', '3c'], 'name': ['Anna', 'Peter', 'John'], 'year': [1999, 2001, 1993]})
I want to create new data by randomly rearranging the values in each column; for column id I also need to add a random letter at the end of each value. The new data should then be added to the existing df1, as follows:
df1 = pd.DataFrame({'id': ['1a', '2b', '3c', '2by', '1ao', '1az', '3cc'], 'name': ['Anna', 'Peter', 'John', 'John', 'Peter', 'Anna', 'Anna'], 'year': [1999, 2001, 1993, 1999, 1999, 2001, 2001]})
Could anyone help me, please? Thank you very much.
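Use DataFrame.sample and add a random letter with numpy.random.choice:
import string
import numpy as np
import pandas as pd
N = 5
df2 = (df1.sample(n=N, replace=True)
          .assign(id=lambda x: x['id'] + np.random.choice(list(string.ascii_letters), size=N)))
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df1 = pd.concat([df1, df2], ignore_index=True)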
print (df1)
id name year
0 1a Anna 1999
1 2b Peter 2001
2 3c John 1993
3 1aY Anna 1999
4 3cp John 1993
5 3cE John 1993
6 2bz Peter 2001
7 3cu John 1993
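Because both the sampling and the letters are random, the output above will differ on every run. A reproducible variant might look like the sketch below; the seed value 0 and the names rng, letters and out are arbitrary choices for illustration, not part of the original answer.

```python
import string

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': ['1a', '2b', '3c'],
                    'name': ['Anna', 'Peter', 'John'],
                    'year': [1999, 2001, 1993]})

N = 5
rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch

# Draw N rows with replacement; random_state makes the draw repeatable.
df2 = df1.sample(n=N, replace=True, random_state=0)

# Pick one random ASCII letter per sampled row and append it to the id.
letters = rng.choice(list(string.ascii_letters), size=N)
df2 = df2.assign(id=[i + c for i, c in zip(df2['id'], letters)])

out = pd.concat([df1, df2], ignore_index=True)
print(out)
```

Like the answer above, this samples whole rows, so name and year stay paired within each new row.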
I have to extract the rows of a pandas DataFrame whose 'Date of birth' value occurs in a list of dates.
import pandas as pd
df = pd.DataFrame({'Name': ['Jack', 'Mary', 'David', 'Bruce', 'Nick', 'Mark', 'Carl', 'Sofie'],
'Date of birth': ['1973', '1999', '1995', '1992/1991', '2000', '1969', '1994', '1989/1990']})
dates = ['1973', '1992', '1969', '1989']
new_df = df.loc[df['Date of birth'].isin(dates)]
print(new_df)
0 Jack 1973
1 Mary 1999
2 David 1995
3 Bruce 1992/1991
4 Nick 2000
5 Mark 1969
6 Carl 1994
7 Sofie 1989/1990
Eventually I get the table below. As you can see, Bruce's and Sofie's rows are absent, since the value is followed by / and another value. How should I split up these values so that these two rows are selected as well?
Name Date of birth
0 Jack 1973
5 Mark 1969
You could use str.contains:
import pandas as pd
df = pd.DataFrame({'Name': ['Jack', 'Mary', 'David', 'Bruce', 'Nick', 'Mark', 'Carl', 'Sofie'],
'Date of birth': ['1973', '1999', '1995', '1992/1991', '2000', '1969', '1994', '1989/1990']})
dates = ['1973', '1992', '1969', '1989']
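new_df = df.loc[df['Date of birth'].str.contains(rf"\b(?:{'|'.join(dates)})\b")]
print(new_df)
Output
Name Date of birth
0 Jack 1973
3 Bruce 1992/1991
5 Mark 1969
7 Sofie 1989/1990
The string rf"\b(?:{'|'.join(dates)})\b" is a regex pattern that matches any string containing one of the dates as a whole word. The non-capturing group (?:...) is needed so that both \b anchors apply to every alternative rather than only to the first and the last.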
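One detail worth checking with a pattern built by joining alternatives is operator precedence: `|` binds more loosely than `\b`, so without a group the word boundaries attach only to the first and last alternatives. The sketch below shows the difference on a few made-up values (the Series s is hypothetical and not from the question); re.escape is added as a precaution in case a search term ever contains regex metacharacters.

```python
import re

import pandas as pd

dates = ['1973', '1992', '1969', '1989']
# Hypothetical values chosen to show the difference; not from the question.
s = pd.Series(['1992/1991', '21992', '1995'])

# Without a group this reads as: \b1973 | 1992 | 1969 | 1989\b
ungrouped = rf"\b{'|'.join(dates)}\b"
# With a non-capturing group both \b anchors apply to every alternative.
grouped = rf"\b(?:{'|'.join(map(re.escape, dates))})\b"

print(s.str.contains(ungrouped).tolist())  # [True, True, False]
print(s.str.contains(grouped).tolist())    # [True, False, False]
```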
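I like @DaniMesejo's way better, but here is a way that splits up the values and stacks them (Series.max(level=0) was removed in pandas 2.0, so this groups on the index level instead):
df[df['Date of birth'].str.split('/', expand=True).stack().isin(dates).groupby(level=0).any()]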
Output:
Name Date of birth
0 Jack 1973
3 Bruce 1992/1991
5 Mark 1969
7 Sofie 1989/1990
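A variation on the same split-and-check idea, sketched here as an alternative rather than as the answerer's method, uses Series.explode so that each year gets its own row while keeping the original index label:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'Mary', 'David', 'Bruce', 'Nick', 'Mark', 'Carl', 'Sofie'],
    'Date of birth': ['1973', '1999', '1995', '1992/1991', '2000', '1969',
                      '1994', '1989/1990'],
})
dates = ['1973', '1992', '1969', '1989']

# Split each cell into a list, give every year its own row (the original
# index label is repeated), then check whether any year of a row is wanted.
mask = (df['Date of birth'].str.split('/')
        .explode()
        .isin(dates)
        .groupby(level=0)
        .any())
print(df[mask])
```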
I would like to flatten a dataframe that is inside the dataframe. In this example, the column account has a dataframe as value. I would like to flatten this into a single dataframe.
Example: (Updated)
import pandas as pd
account1 = pd.DataFrame([{'nr': '123', 'balance': 56}, {'nr': '230', 'balance': 55}])
account2 = pd.DataFrame([{'nr': '456', 'balance': 575}])
account3 = pd.DataFrame([{'nr': '350', 'balance': 59}])
df = pd.DataFrame([{'id': 1, 'age': 23, 'name': 'anna', 'account': account1},
{'id': 2, 'age': 71, 'name': 'mary', 'account': account2},
{'id': 3, 'age': 42, 'name': 'bob', 'account': account3}])
print(df)
gives the dataframe:
id age name account
0 1 23 anna nr balance
0 123 56
1 230 55
1 2 71 mary nr balance
0 456 575
2 3 42 bob nr balance
0 350 59
And I would like to get:
id name age account|nr|0 account|balance|0 account|nr|1 account|balance|1
0 1 anna 23 123 56 230 55
1 2 mary 71 456 575
2 3 bob 42 350 59
How can I flatten a dataframe inside a dataframe into a single dataframe? Is this type of structure called a hierarchical DataFrame?
This is the solution that I have found.
list_accounts = []
for index_j, row_j in df.iterrows():
    account = row_j["account"]
    account = pd.DataFrame(account).stack().to_frame().T
    account.columns = ['%s%s' % (a, '|%s' % b if b else '') for a, b in account.columns]
    list_accounts.append(account)
df = pd.concat([df, pd.concat(list_accounts).reset_index(drop=True)], axis=1)
df.drop(columns="account", inplace=True)
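For comparison, here is a sketch that produces the column names in the account|nr|0 style asked for in the question. It is an alternative under stated assumptions, not the poster's method: flatten_account is a hypothetical helper name, and the input frames are rebuilt from the question so the example runs on its own.

```python
import pandas as pd

account1 = pd.DataFrame([{'nr': '123', 'balance': 56}, {'nr': '230', 'balance': 55}])
account2 = pd.DataFrame([{'nr': '456', 'balance': 575}])
account3 = pd.DataFrame([{'nr': '350', 'balance': 59}])
df = pd.DataFrame([{'id': 1, 'age': 23, 'name': 'anna', 'account': account1},
                   {'id': 2, 'age': 71, 'name': 'mary', 'account': account2},
                   {'id': 3, 'age': 42, 'name': 'bob', 'account': account3}])


def flatten_account(acc):
    # Hypothetical helper: turn one nested frame into a flat dict keyed
    # 'account|<column>|<row position>'.
    return {f'account|{col}|{i}': val
            for i, rec in enumerate(acc.to_dict('records'))
            for col, val in rec.items()}


# Rows with fewer accounts get NaN in the columns they do not have.
flat = pd.DataFrame([flatten_account(a) for a in df['account']], index=df.index)
result = pd.concat([df.drop(columns='account'), flat], axis=1)
print(result)
```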
import pandas as pd
stack = pd.DataFrame(['adam',25,28,'steve',25,28,'emily',18,21])
print(stack[0].to_list()[0::2])
print(stack[0].to_list()[1::2])
df = pd.DataFrame(
{'Name': stack[0].to_list()[0::3],
'Age': stack[0].to_list()[1::3],
'New Age': stack[0].to_list()[2::3] }
)
print(df)
How do I separate adam and steve into different rows?
I want it to line up like the table below.
You can convert the column to a list and slice it with [0::2] and [1::2]:
import pandas as pd
data = pd.DataFrame(['adam',22,'steve',25,'emily',18])
print(data)
#print(data[0].to_list()[0::2])
#print(data[0].to_list()[1::2])
df = pd.DataFrame({
'Name': data[0].to_list()[0::2],
'Age': data[0].to_list()[1::2],
})
print(df)
Before (as in the original image, which was removed from the question):
0
0 adam
1 22
2 steve
3 25
4 emily
5 18
After:
Name Age
0 adam 22
1 steve 25
2 emily 18
EDIT: BTW, the same works with a normal list:
import pandas as pd
data = ['adam',22,'steve',25,'emily',18]
print(data)
df = pd.DataFrame({
'Name': data[0::2],
'Age': data[1::2],
})
print(df)
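The slicing approach needs one slice per field. When each record has three fields, as in the question's Name / Age / New Age data, a sketch that chunks the flat list into rows may be easier to extend; the variable names width and rows are illustrative only.

```python
import pandas as pd

data = ['adam', 25, 28, 'steve', 25, 28, 'emily', 18, 21]

# Cut the flat list into consecutive chunks of three values, one per row.
width = 3
rows = [data[i:i + width] for i in range(0, len(data), width)]

df = pd.DataFrame(rows, columns=['Name', 'Age', 'New Age'])
print(df)
```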
These two lines should do it. However, without knowing what code you have, what you're trying to accomplish, or what else you intend to do with it, the following code is only valid in this situation.
d = {'Name': ['adam', 'steve', 'emily'], 'Age': [22, 25, 18]}
df = pd.DataFrame(d)
Is there any way to pretty-print a nested dictionary in table format? My data structure looks like this:
data = {'01/09/16': {'In': ['Jack'], 'Out': ['Lisa', 'Tom', 'Roger', 'Max', 'Harry', 'Same', 'Joseph', 'Luke', 'Mohammad', 'Sammy']},
'02/09/16': {'In': ['Jack', 'Lisa', 'Rache', 'Allan'], 'Out': ['Lisa', 'Tom']},
'03/09/16': {'In': ['James', 'Jack', 'Nowel', 'Harry', 'Timmy'], 'Out': ['Lisa', 'Tom
And I'm trying to print it out something like this (the names are kept in one line). Note that the names are listed below one another:
+----------------------------------+-------------+-------------+-------------+
| Status | 01/09/16 | 02/09/16 | 03/09/16 |
+----------------------------------+-------------+-------------+-------------+
| In | Jack Tom Tom
| Lisa | Jack |
+----------------------------------+-------------+-------------+-------------+
| Out | Lisa
Tom | Jack | Lisa |
+----------------------------------+-------------+-------------+-------------+
I've tried using pandas with this code:
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 is no longer accepted
df = pd.DataFrame(data)
df.fillna('None', inplace=True)
print(df)
The problem is that pandas prints it like this (the names are printed on a single line, which doesn't look good, especially if there are a lot of names):
01/09/16 \
In [Jack]
Out [Lisa, Tom, Roger, Max, Harry, Same, Joseph, Luke, Mohammad, Sammy]
02/09/16 03/09/16
In [Jack, Lisa, Rache, Allan] [James, Jack, Nowel, Harry, Timmy]
Out [Lisa, Tom] [Lisa, Tom]
I would prefer this, but with the names listed below one another:
01/09/16 02/09/16 03/09/16
In [Jack] [Jack] [James]
Out [Lisa] [Lisa] [Lisa]
Is there a way to print it neater using pandas or another tool?
This is hackery and is for display purposes only.
data = {
'01/09/16': {
'In': ['Jack'],
'Out': ['Lisa', 'Tom', 'Roger',
'Max', 'Harry', 'Same',
'Joseph', 'Luke', 'Mohammad', 'Sammy']
},
'02/09/16': {
'In': ['Jack', 'Lisa', 'Rache', 'Allan'],
'Out': ['Lisa', 'Tom']
},
'03/09/16': {
'In': ['James', 'Jack', 'Nowel', 'Harry', 'Timmy'],
'Out': ['Lisa', 'Tom']
}
}
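df = pd.DataFrame(data)
d1 = df.stack().apply(pd.Series).stack().unstack(1).fillna('')
# set_levels no longer takes inplace=; assign the result back instead.
# verify_integrity=False is needed because the blank labels are not unique.
d1.index = d1.index.set_levels([''] * len(d1.index.levels[1]), level=1, verify_integrity=False)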
print(d1)
01/09/16 02/09/16 03/09/16
In Jack Jack James
Lisa Jack
Rache Nowel
Allan Harry
Timmy
Out Lisa Lisa Lisa
Tom Tom Tom
Roger
Max
Harry
Same
Joseph
Luke
Mohammad
Sammy
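A less hacky way to get one name per row, sketched here as an alternative and not as the answerer's method, is to build one frame per status from pd.Series objects (which pad shorter lists with NaN) and concatenate them. The second index level then shows the position of each name instead of being blank. The names blocks and table are illustrative.

```python
import pandas as pd

data = {
    '01/09/16': {'In': ['Jack'],
                 'Out': ['Lisa', 'Tom', 'Roger', 'Max', 'Harry', 'Same',
                         'Joseph', 'Luke', 'Mohammad', 'Sammy']},
    '02/09/16': {'In': ['Jack', 'Lisa', 'Rache', 'Allan'],
                 'Out': ['Lisa', 'Tom']},
    '03/09/16': {'In': ['James', 'Jack', 'Nowel', 'Harry', 'Timmy'],
                 'Out': ['Lisa', 'Tom']},
}

# One column per date, one row per name position; pd.Series pads the
# shorter lists with NaN when the columns are aligned.
blocks = {
    status: pd.DataFrame({date: pd.Series(groups[status])
                          for date, groups in data.items()})
    for status in ('In', 'Out')
}

# The dict keys become the outer index level ('In' / 'Out').
table = pd.concat(blocks).fillna('')
print(table)
```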