Pandas data frame to dictionary of lists - python

How can I use Python, or preferably Pandas, to convert a Pandas DataFrame to a dictionary of lists for input into Highcharts?
The closest I got was:
df.T.to_json('bar.json', orient='index')
But this produces a dict of dicts instead of a dict of lists.
My input:
import pandas
import numpy as np
df = pandas.DataFrame({
    "date": ['2014-10-1', '2014-10-2', '2014-10-3', '2014-10-4', '2014-10-5'],
    "time": [1, 2, 3, 4, 5],
    # np.random.random_integers is deprecated; randint excludes the upper bound, hence 11
    "temp": np.random.randint(0, 11, 5),
    "foo": np.random.randint(0, 11, 5)
})
df2 = df.set_index(['date'])
df2
Output:
           time  temp  foo
date
2014-10-1     1     3    0
2014-10-2     2     8    7
2014-10-3     3     4    9
2014-10-4     4     4    8
2014-10-5     5     6    2
Desired Output: I am using this output in Highcharts, which requires it to be a dictionary of lists like so:
{'date': ['2014-10-1', '2014-10-2', '2014-10-3', '2014-10-4', '2014-10-5'],
'foo': [7, 2, 5, 5, 6],
'temp': [8, 6, 10, 10, 3],
'time': [1, 2, 3, 4, 5]}

In [199]: df2.reset_index().to_dict(orient='list')
Out[199]:
{'date': ['2014-10-1', '2014-10-2', '2014-10-3', '2014-10-4', '2014-10-5'],
'foo': [8, 1, 8, 8, 1],
'temp': [10, 10, 8, 3, 10],
'time': [1, 2, 3, 4, 5]}
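As a minimal sketch of the answer above with fixed data (the values here are made up for reproducibility, and the final json.dumps step is an assumption about how the result would be handed to Highcharts):

```python
import json

import pandas as pd

# Small fixed sample instead of random values, so the result is reproducible
df2 = pd.DataFrame({
    "date": ["2014-10-1", "2014-10-2", "2014-10-3"],
    "time": [1, 2, 3],
    "temp": [3, 8, 4],
}).set_index("date")

# reset_index() turns the index back into a regular column, so "date"
# shows up as a key next to the data columns
result = df2.reset_index().to_dict(orient="list")

# The values are plain Python lists, so the dict serializes directly to JSON
payload = json.dumps(result)
print(payload)
```

Without reset_index(), the "date" index would be left out of the dictionary entirely.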

To create a list of dictionaries, one per row:
post_data_list = []
for i in df2.index:
    data_dict = {}
    for column in df2.columns:
        data_dict[column] = df2[column][i]
    post_data_list.append(data_dict)
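If a list of per-row dictionaries is what is needed, to_dict(orient='records') should give the same structure without an explicit loop. This is a sketch with made-up sample values, not part of the original answer:

```python
import pandas as pd

# Made-up sample values; "date" is the index, as in the question
df2 = pd.DataFrame({
    "date": ["2014-10-1", "2014-10-2"],
    "time": [1, 2],
    "temp": [3, 8],
}).set_index("date")

# One dictionary per row; like the loop over df2.columns, the index is not included
post_data_list = df2.to_dict(orient="records")
print(post_data_list)
```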

Related

Pandas new column based on condition on two other dataframe

df1 and df2 are of different sizes. Set the df1(row1, 'Z') value to df2(row2, 'C') value when df1(row1, 'A') is equal to df2(row2, 'B').
What is the recommended way to implement df1['Z'] = df2['C'] if df1['A']==df2['B']?
df1 = pd.DataFrame({'A': ['foo', 'bar', 'test'], 'b': [1, 2, 3], 'c': [3, 4, 5]})
df2 = pd.DataFrame({'B': ['foo', 'baz'], 'C': [3, 1]})
df1
      A  b  c
0   foo  1  3
1   bar  2  4
2  test  3  5
df2
     B  C
0  foo  3
1  baz  1
After change
df1
      A  b  c    Z
0   foo  1  3    3
1   bar  2  4  NaN
2  test  3  5  NaN
What if multiple assignments are required under multiple conditions? Is iterating over rows recommended, as shown below?
for i, row in df1.iterrows():
    if <condition(s)>:
        do assignment(s): df1.at[i, 'hjk'] = something
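You can use numpy.where with the condition that df1.A equals df2.B: where the condition is true, take df2.C, otherwise keep df1.Z. Note that this compares the two frames row by row, so it assumes df1 and df2 have the same length and index:
np.where(df1.A.eq(df2.B), df2.C, df1.Z)
Assign the result to df1.Z.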
SAMPLE:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': np.random.randint(5,10,20), 'Z': np.random.randint(5,10,20)})
df2 = pd.DataFrame({'C': np.random.randint(5,10,20), 'B': np.random.randint(5,10,20)})
>>> df1.Z.values
Out[41]: array([7, 6, 7, 7, 6, 8, 9, 7, 6, 6, 7, 6, 8, 7, 8, 8, 9, 6, 7, 7])
>>> np.where(df1.A.eq(df2.B), df2.C, df1.Z)
Out[42]: array([7, 6, 6, 7, 6, 8, 9, 7, 6, 9, 7, 8, 8, 7, 8, 8, 9, 6, 7, 7])
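A deterministic sketch of the same np.where call may make the behavior easier to follow. The values are made up, and both frames are deliberately given the same length, since the comparison is positional:

```python
import numpy as np
import pandas as pd

# Made-up frames of equal length; rows are compared position by position
df1 = pd.DataFrame({"A": [1, 2, 3, 4], "Z": [10, 20, 30, 40]})
df2 = pd.DataFrame({"B": [1, 9, 3, 9], "C": [100, 200, 300, 400]})

# Where A equals B in the same row, take C; otherwise keep the existing Z
df1["Z"] = np.where(df1["A"].eq(df2["B"]), df2["C"], df1["Z"])
print(df1)
```

Rows 0 and 2 match, so only those two Z values are replaced.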
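Alternatively, map df1['A'] through a dictionary built from df2. This also works when the frames differ in size, and values with no match become NaN: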
df1['Z'] = df1['A'].map(dict(zip(df2['B'],df2['C'])))
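Another option, not taken from the answers above, is a left merge on the key column. The sketch below uses the frames from the question and renames df2's columns so that the result gets a 'Z' column directly:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["foo", "bar", "test"], "b": [1, 2, 3], "c": [3, 4, 5]})
df2 = pd.DataFrame({"B": ["foo", "baz"], "C": [3, 1]})

# Rename B -> A and C -> Z so the merge key lines up and the new column is called Z.
# how="left" keeps every row of df1, in order; unmatched rows get NaN in Z.
merged = df1.merge(df2.rename(columns={"B": "A", "C": "Z"}), on="A", how="left")
print(merged)
```

One caveat: if df2['B'] contained duplicate keys, the merge would repeat the matching df1 rows, whereas the map approach keeps one value per key.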

Combining multiple columns and multiple rows as a single value in a dictionary

Given a Pandas df:
Name  V1  V2
a      1   2
a      3   4
a      5   6
b      7   8
b      9  10
c     11  12
...
How to reform it into a complex dictionary of format:
{a: [(1,2), (3,4), (5,6)], b: [(7,8), (9,10)], c: [(11,12)], ...}
Please note that values with the same name also need to be combined across rows; for example, "a" has three rows that should be combined into a single list of number pairs.
Try:
df['tup'] = df[['V1','V2']].agg(tuple, axis=1)
df.groupby('Name')['tup'].agg(list).to_dict()
Output:
{'a': [(1, 2), (3, 4), (5, 6)], 'b': [(7, 8), (9, 10)], 'c': [(11, 12)]}
If you don't mind the results being lists instead of tuples, you can also use groupby in a dict comprehension:
d = {group:items[["V1","V2"]].values.tolist() for group, items in df.groupby("Name")}
print (d)
{'a': [[1, 2], [3, 4], [5, 6]], 'b': [[7, 8], [9, 10]], 'c': [[11, 12]]}
Another approach, iterating over the rows and reading specific columns:
data_frame = {
"Name": ["a", "a", "a", "b", "b", "c"],
"V1": [1, 3, 5, 7, 9, 11],
"V2": [2, 4, 6, 8, 10, 12]
}
df = pd.DataFrame(data_frame, columns=['Name', 'V1', 'V2'])
data_dict = {}
for i, row in df.iterrows():
    data_dict[row["Name"]] = [row['V1'], row['V2']]
print(data_dict)
Output (note that each name keeps only its last row, because later rows overwrite earlier ones):
{'a': [5, 6], 'b': [9, 10], 'c': [11, 12]}
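The output above keeps only one pair per name, because each assignment replaces the previous value for that key. A minimal sketch of a loop that accumulates the pairs instead, using collections.defaultdict (a swapped-in technique, not part of the original answer):

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "Name": ["a", "a", "a", "b", "b", "c"],
    "V1": [1, 3, 5, 7, 9, 11],
    "V2": [2, 4, 6, 8, 10, 12],
})

# Append each (V1, V2) pair to the list for its name instead of overwriting it
data_dict = defaultdict(list)
for name, v1, v2 in df.itertuples(index=False):
    data_dict[name].append((v1, v2))

data_dict = dict(data_dict)
print(data_dict)
```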
Assuming the DataFrame variable is data_frame:
print(data_frame)
Name  V1  V2
a      1   2
a      3   4
a      5   6
b      7   8
b      9  10
c     11  12
data_dict = {}
for data in data_frame.values:
    print(data)
    data_dict[data[0]] = [j for j in data[1:]]
print(data_dict)
The DataFrame object also has several to_dict() variants. For example, you can call to_dict('records') and reshape the result as needed.
Reference: Data Frame Dict

How to limit the duplicates to 5 in pandas data frames?

col1= ['A','B','A','C','A','B','A','C','A','C','A','A','A']
col2= [1,1,4,2,4,5,6,3,1,5,2,1,1]
df = pd.DataFrame({'col1':col1, 'col2':col2})
For A we have [1,4,4,6,1,2,1,1] (8 items), but I want to limit each list to 5 items while converting the DataFrame to a dict of lists.
Output:
Dict = {'A':[1,4,4,6,1],'B':[1,5],'C':[2,3,5]}
Use pandas.DataFrame.groupby with apply:
df.groupby('col1')['col2'].apply(lambda x:list(x.head(5))).to_dict()
Output:
{'A': [1, 4, 4, 6, 1], 'B': [1, 5], 'C': [2, 3, 5]}
Use DataFrame.groupby with a lambda function that converts each group to a list and keeps the first 5 values by slicing, then convert the result to a dictionary with Series.to_dict:
d = df.groupby('col1')['col2'].apply(lambda x: x.tolist()[:5]).to_dict()
print (d)
{'A': [1, 4, 4, 6, 1], 'B': [1, 5], 'C': [2, 3, 5]}
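A variant without a lambda, sketched here as an alternative to the answers above: GroupBy.head trims each group first, and the trimmed rows are then collected into lists. The name limit is a hypothetical parameter introduced for illustration:

```python
import pandas as pd

col1 = ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'C', 'A', 'C', 'A', 'A', 'A']
col2 = [1, 1, 4, 2, 4, 5, 6, 3, 1, 5, 2, 1, 1]
df = pd.DataFrame({'col1': col1, 'col2': col2})

limit = 5  # hypothetical name: maximum number of items to keep per key

# GroupBy.head keeps the first `limit` rows of every group, in original row order
trimmed = df.groupby('col1').head(limit)

# Collect the remaining values of each group into a list
result = trimmed.groupby('col1')['col2'].agg(list).to_dict()
print(result)
```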

picking values from columns [duplicate]

I have a pandas DataFrame with values in a number of columns, make it two for simplicity, and a column of column names I want to use to pick values from the other columns:
import pandas as pd
import numpy as np
np.random.seed(1337)
df = pd.DataFrame(
{"a": np.arange(10), "b": 10 - np.arange(10), "c": np.random.choice(["a", "b"], 10)}
)
which gives
> df['c']
0 b
1 b
2 a
3 a
4 b
5 b
6 b
7 a
8 a
9 a
Name: c, dtype: object
That is, I want the first and second elements to be picked from column b, the third from a and so on.
This works:
def pick_vals_from_cols(df, col_selector):
    condlist = np.row_stack(col_selector.map(lambda x: x == df.columns))
    values = np.select(condlist.transpose(), df.values.transpose())
    return values
> pick_vals_from_cols(df, df["c"])
array([10, 9, 2, 3, 6, 5, 4, 7, 8, 9], dtype=object)
But it just feels so fragile and clunky. Is there a better way to do this?
lookup (note: DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0)
df.lookup(df.index, df.c)
array([10, 9, 2, 3, 6, 5, 4, 7, 8, 9])
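DataFrame.lookup is not available in pandas 2.0 and later. A commonly suggested replacement combines pd.factorize with NumPy integer indexing. The sketch below writes out the 'c' column by hand (copied from the output shown in the question) so that it does not depend on the random seed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": np.arange(10),
    "b": 10 - np.arange(10),
    # Column selector written out explicitly, matching the question's output
    "c": ["b", "b", "a", "a", "b", "b", "b", "a", "a", "a"],
})

# factorize returns an integer code per row plus the unique column labels
codes, labels = pd.factorize(df["c"])

# Reorder the columns to match the codes, then pick one value per row
picked = df.reindex(labels, axis=1).to_numpy()[np.arange(len(df)), codes]
print(picked)
```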
Comprehension
But why when you have lookup?
[df.at[t] for t in df.c.items()]
[10, 9, 2, 3, 6, 5, 4, 7, 8, 9]
Bonus Hack
Not intended for actual use
[*map(df.at.__getitem__, zip(df.index, df.c))]
[10, 9, 2, 3, 6, 5, 4, 7, 8, 9]
Because df.get_value is deprecated (it was later removed in pandas 1.0)
[*map(df.get_value, df.index, df.c)]
FutureWarning: get_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
[10, 9, 2, 3, 6, 5, 4, 7, 8, 9]

Remove an element from list of dictionaries with Pandas element

Consider "b", defined below as a list of dictionaries. How can I remove element 6 from the 'index' in the second element of b (b[1]['index'][6]) and save the new list to b?
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
output:
[{'color': 'red', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}, {'color': 'blue', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}]
I tried np.delete, and .pop and del for lists, without success. What is the best way to do it?
I think this will work for you:
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
print(a)
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
d = b[1]['index']
# Index objects are immutable; delete() returns a new Index without position 6
b[1]['index'] = d.delete(6)
print(b[1]['index'])
Int64Index([0, 1, 2, 3, 4, 5, 7, 8, 9], dtype='int64')
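Because Index.delete returns a new object rather than modifying the index in place, the first dictionary and the DataFrame itself should be unaffected. A small sketch that checks this (the exact Index repr printed depends on the pandas version):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.random.randn(10))
b = [{'color': 'red', 'index': a.index}, {'color': 'blue', 'index': a.index}]

# delete(6) builds a new Index without position 6; only b[1] is rebound to it
b[1]['index'] = b[1]['index'].delete(6)

print(list(b[1]['index']))   # positions remaining in the second entry
print(len(b[0]['index']))    # the first entry still refers to the original index
```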
