Unravelling a DataFrame - python

I need to transform one DataFrame into another. The original (df1) looks like this:
value
A--A 4
A--B 2
A--C 1
B--B 2
C--C 3
D--B 2
E--E 6
Then I have this other df2, filled with 0:
A B C D E
A 0 0 0 0 0
B 0 0 0 0 0
C 0 0 0 0 0
D 0 0 0 0 0
E 0 0 0 0 0
F 0 0 0 0 0
G 0 0 0 0 0
I need to turn it into a final df3 by taking the values from df1, whose index holds pairs separated by "--", and filling them in like this:
A B C D E
A 4 2 1 0 0
B 2 2 0 2 0
C 1 0 3 0 0
D 0 2 0 0 0
E 0 0 0 0 6
F 0 0 0 0 0
G 0 0 0 0 0
There can be pairs in df2 that do not exist in df1. In that case the cell stays 0. Any suggestions?

You can create this from df1 itself (called df below). First, set df.index to a MultiIndex using str.split, then unstack and reindex.
df.index = pd.MultiIndex.from_arrays(zip(*df.index.str.split('--')))
(df['value'].unstack()
.reindex(index=df2.index, columns=df2.columns)
.fillna(0).astype(int))
A B C D E
A 4 2 1 0 0
B 0 2 0 0 0
C 0 0 3 0 0
D 0 2 0 0 0
E 0 0 0 0 6
F 0 0 0 0 0
G 0 0 0 0 0
If you already know which rows and columns you want, you don't even need df2.
(df['value'].unstack()
.reindex(index=list('ABCDEFG'), columns=list('ABCDE'))
.fillna(0).astype(int))
A B C D E
A 4 2 1 0 0
B 0 2 0 0 0
C 0 0 3 0 0
D 0 2 0 0 0
E 0 0 0 0 6
F 0 0 0 0 0
G 0 0 0 0 0
As per the OP's comment, to keep the result symmetric, unstack and reindex first so that the NaNs are preserved, then fill them from the transpose:
v = (df['value'].unstack()
.reindex(index=df2.index, columns=df2.columns))
v.fillna(v.T.reindex_like(v)).fillna(0).astype(int)
A B C D E
A 4 2 1 0 0
B 2 2 0 2 0
C 1 0 3 0 0
D 0 2 0 0 0
E 0 0 0 0 6
F 0 0 0 0 0
G 0 0 0 0 0
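For anyone who wants to run the symmetric version end to end, here is a minimal self-contained sketch. It rebuilds df1 from the question and hard-codes the row and column labels instead of reading them from df2 (an assumption made for brevity). The names s, v, rows and cols are only illustrative.

```python
import pandas as pd

# Rebuild the question's df1: pair labels in the index, one 'value' column.
df1 = pd.DataFrame(
    {"value": [4, 2, 1, 2, 3, 2, 6]},
    index=["A--A", "A--B", "A--C", "B--B", "C--C", "D--B", "E--E"],
)
rows, cols = list("ABCDEFG"), list("ABCDE")  # assumed labels, normally taken from df2

# Split "X--Y" into a two-level MultiIndex, then pivot the second level into columns.
s = df1["value"].copy()
s.index = pd.MultiIndex.from_tuples([tuple(k.split("--")) for k in s.index])
v = s.unstack().reindex(index=rows, columns=cols)

# Fill missing cells from the mirrored position, then treat what is left as 0.
df3 = v.fillna(v.T.reindex(index=rows, columns=cols)).fillna(0).astype(int)
print(df3)
```

Cells that are missing in both orientations (for example B/C) stay 0, and rows F and G stay all zero because they never appear in df1.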

Related

Is it possible to do a boolean operation row by row in pandas?

I would like to OR each row with the next one (row and row+1).
For example,
A B C D E F G
r0 0 1 1 0 0 1 0
r1 0 0 0 0 0 0 0
r2 0 0 1 0 1 0 1
and the expected output would be:
result 0 1 1 0 1 1
I only know how to sum it:
df.loc['result'] = df.sum()
but in this case I would like to do an OR.
Thank you in advance.
You can apply any over the first axis.
>>> df
A B C D E F G
r0 0 1 1 0 0 1 0
r1 0 0 0 0 0 0 0
r2 0 0 1 0 1 0 1
>>> df.loc['result'] = df.any(axis=0).astype(int)
>>> df
A B C D E F G
r0 0 1 1 0 0 1 0
r1 0 0 0 0 0 0 0
r2 0 0 1 0 1 0 1
result 0 1 1 0 1 1 1
... assuming that in your output you forgot the last column.
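A self-contained version of the same idea, in case you want to paste and run it. The frame is rebuilt from the question. Nothing else is assumed beyond the values being 0/1 integers.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 1, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 1, 0, 1]],
    index=["r0", "r1", "r2"],
    columns=list("ABCDEFG"),
)

# any(axis=0) reduces down the rows: True where at least one row is non-zero.
result = df.any(axis=0).astype(int)

# Append the reduced row under a new label.
df.loc["result"] = result
print(df)
```

Compute the reduction before appending the new row, as above, so the result row is never part of its own input.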

How to apply ffill to 1?

I have a dataframe like below,
A B C D
0 1 0 0 0
1 0 1 0 0
2 0 1 0 0
3 0 0 1 0
I want to convert it into this:
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 1 0 0
3 1 1 1 0
So far I have tried:
df = df.replace('0', np.nan)
df = df.ffill().fillna('0')
The code above works, but I think there must be a better approach to this problem.
Use cumsum with data converted to numeric and then replace by DataFrame.mask:
df = df.mask(df.astype(int).cumsum() >= 1, '1')
print (df)
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 1 0 0
3 1 1 1 0
Detail:
print (df.astype(int).cumsum())
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 2 0 0
3 1 2 1 0
Or the same principle in NumPy with numpy.where:
arr = df.values.astype(int)
df = pd.DataFrame(np.where(np.cumsum(arr, axis=0) >= 1, '1', '0'),
index=df.index,
columns= df.columns)
print (df)
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 1 0 0
3 1 1 1 0
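As a different technique from the cumsum answers above, a running maximum does the same job in one step: once a 1 has appeared in a column, the maximum so far stays 1. A minimal sketch, assuming the cells are the strings '0' and '1' as the question's replace('0', ...) call suggests:

```python
import pandas as pd

# The question's frame, with string cells (an assumption based on the question's code).
df = pd.DataFrame(
    {"A": ["1", "0", "0", "0"],
     "B": ["0", "1", "1", "0"],
     "C": ["0", "0", "0", "1"],
     "D": ["0", "0", "0", "0"]}
)

# Running maximum down each column, then back to strings to match the input type.
filled = df.astype(int).cummax().astype(str)
print(filled)
```

If the data is already numeric, the two astype calls can be dropped and df.cummax() is enough.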

Python pandas: add new columns based on the values of an existing column, and set the new columns to 1 or 0

I have a dataframe named df as follows:
ticker class_n
1 a
2 b
3 c
4 d
5 e
6 f
7 a
8 b
............................
I want to add new columns to this dataframe, one for each unique value of class_n (no repeats). The value in a new column should be 1 if the row's class_n equals the column name, and 0 otherwise.
For example, I want to get the following dataframe:
ticker class_n a b c d e f
1 a 1 0 0 0 0 0
2 b 0 1 0 0 0 0
3 c 0 0 1 0 0 0
4 d 0 0 0 1 0 0
5 e 0 0 0 0 1 0
6 f 0 0 0 0 0 1
7 a 1 0 0 0 0 0
8 b 0 1 0 0 0 0
My code is as follows:
lst_class = list(set(list(df['class_n'])))
for cla in lst_class:
    df[cla] = 0
    df.loc[df['class_n'] is cla, cla] = 1
but it raises an error:
KeyError: 'cannot use a single bool to index into setitem'
Thanks!
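Use pd.get_dummies (passing dtype=int keeps the indicators as 0/1 on pandas versions that return booleans by default):
df.join(pd.get_dummies(df.class_n, dtype=int))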
ticker class_n a b c d e f
0 1 a 1 0 0 0 0 0
1 2 b 0 1 0 0 0 0
2 3 c 0 0 1 0 0 0
3 4 d 0 0 0 1 0 0
4 5 e 0 0 0 0 1 0
5 6 f 0 0 0 0 0 1
6 7 a 1 0 0 0 0 0
7 8 b 0 1 0 0 0 0
Or the same thing, done a little more manually:
f, u = pd.factorize(df.class_n.values)
d = pd.DataFrame(np.eye(u.size, dtype=int)[f], df.index, u)
df.join(d)
ticker class_n a b c d e f
0 1 a 1 0 0 0 0 0
1 2 b 0 1 0 0 0 0
2 3 c 0 0 1 0 0 0
3 4 d 0 0 0 1 0 0
4 5 e 0 0 0 0 1 0
5 6 f 0 0 0 0 0 1
6 7 a 1 0 0 0 0 0
7 8 b 0 1 0 0 0 0
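On the error itself: `df['class_n'] is cla` is an identity test between a Series and a string, so it evaluates to a single False instead of a row-by-row comparison, and .loc cannot use a single bool as a row selector. If you would rather keep the loop, a sketch of the repaired version looks like this (the frame is rebuilt from the question and the columns are sorted only to get a stable order):

```python
import pandas as pd

df = pd.DataFrame({"ticker": range(1, 9), "class_n": list("abcdefab")})

# == compares element-wise and yields a boolean Series; `is` yields one bool.
for cla in sorted(df["class_n"].unique()):
    df[cla] = (df["class_n"] == cla).astype(int)

print(df)
```

This gives the same indicator columns as get_dummies for this data. get_dummies is still the shorter option.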

Pandas DataFrame with levels of graph nodes and edges to square matrix

My Googlefu has failed me!
I have a Pandas DataFrame of the form:
Level 1 Level 2 Level 3 Level 4
-------------------------------------
A B C NaN
A B D E
A B D F
G H NaN NaN
G I J K
It basically contains nodes of a graph with the levels depicting an outgoing edge from a level of lower order to a level of a higher order. I want to convert the DataFrame/create a new DataFrame of the form:
A B C D E F G H I J K
---------------------------------------------
A | 0 1 0 0 0 0 0 0 0 0 0
B | 0 0 1 1 0 0 0 0 0 0 0
C | 0 0 0 0 0 0 0 0 0 0 0
D | 0 0 0 0 1 1 0 0 0 0 0
E | 0 0 0 0 0 0 0 0 0 0 0
F | 0 0 0 0 0 0 0 0 0 0 0
G | 0 0 0 0 0 0 0 1 1 0 0
H | 0 0 0 0 0 0 0 0 0 0 0
I | 0 0 0 0 0 0 0 0 0 1 0
J | 0 0 0 0 0 0 0 0 0 0 1
K | 0 0 0 0 0 0 0 0 0 0 0
A cell containing 1 depicts an outgoing edge from the corresponding row to the corresponding column. Is there a Pythonic way to achieve this without loops and conditions in Pandas?
Try this code:
import numpy as np
import pandas as pd
df = pd.DataFrame({'level_1':['A', 'A', 'A', 'G', 'G'], 'level_2':['B', 'B', 'B', 'H', 'I'],
'level_3':['C', 'D', 'D', np.nan, 'J'], 'level_4':[np.nan, 'E', 'F', np.nan, 'K']})
Your input dataframe is:
level_1 level_2 level_3 level_4
0 A B C NaN
1 A B D E
2 A B D F
3 G H NaN NaN
4 G I J K
And the solution is:
# Get the unique node labels from the input dataframe, filtering out NaN (NaN != NaN)
list_nodes = []
for i_col in df.columns.tolist():
    list_nodes.extend(filter(lambda v: v == v, df[i_col].unique().tolist()))
list_nodes = sorted(set(list_nodes))
# Initialize the result dataframe with zeros
df_res = pd.DataFrame(0, index=list_nodes, columns=list_nodes)
# Get (row, column) pairs from each pair of adjacent levels (rows with NaN are excluded)
list_indexes = []
for i_col in range(df.shape[1] - 1):
    pairs = df.iloc[:, i_col:i_col + 2].dropna(axis=0).values.tolist()
    list_indexes.extend(set(tuple(i) for i in pairs))
# Use the (row, column) pairs to fill the result dataframe
for i_row, i_column in list_indexes:
    df_res.at[i_row, i_column] = 1
And the final result is:
A B C D E F G H I J K
A 0 1 0 0 0 0 0 0 0 0 0
B 0 0 1 1 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 1 1 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 1 1 0 0
H 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 1 0
J 0 0 0 0 0 0 0 0 0 0 1
K 0 0 0 0 0 0 0 0 0 0 0
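Since the question asked for something without explicit loops over cells, here is a sketch of an alternative built on pd.crosstab. It is not the method used above: it stacks every pair of adjacent level columns into one edge list and cross-tabulates it. The column names source and target, and the variables edges, nodes and adj, are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "level_1": ["A", "A", "A", "G", "G"],
    "level_2": ["B", "B", "B", "H", "I"],
    "level_3": ["C", "D", "D", np.nan, "J"],
    "level_4": [np.nan, "E", "F", np.nan, "K"],
})

# Stack each pair of adjacent level columns into one (source, target) edge list.
edges = pd.concat(
    [df.iloc[:, [i, i + 1]].set_axis(["source", "target"], axis=1)
     for i in range(df.shape[1] - 1)],
    ignore_index=True,
).dropna().drop_duplicates()

# Every non-NaN label in the frame becomes both a row and a column.
nodes = sorted(df.melt()["value"].dropna().unique())

# Count each distinct edge once, then pad to the full square of nodes.
adj = (pd.crosstab(edges["source"], edges["target"])
       .reindex(index=nodes, columns=nodes, fill_value=0))
print(adj)
```

Dropping duplicates before the crosstab keeps the cells at 0/1; without it, the repeated A to B edge would be counted three times.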

pandas - pivot table to square matrix

I have this simple dataframe in a data.csv file:
I,C,v
a,b,1
b,a,2
e,a,1
e,c,0
b,d,1
a,e,1
b,f,0
I would like to pivot it and then return a square table (as a matrix). So far I've read the dataframe and built a pivot table with:
df = pd.read_csv('data.csv')
d = pd.pivot_table(df,index='I',columns='C',values='v')
d.fillna(0,inplace=True)
correctly obtaining:
C a b c d e f
I
a 0 1 0 0 1 0
b 2 0 0 1 0 0
e 1 0 0 0 0 0
Now I would like to return a square table with the missing columns indices in the rows, so that the resulting table would be:
C a b c d e f
I
a 0 1 0 0 1 0
b 2 0 0 1 0 0
c 0 0 0 0 0 0
d 0 0 0 0 0 0
e 1 0 0 0 0 0
f 0 0 0 0 0 0
reindex can add rows and columns, and fill missing values with 0:
index = d.index.union(d.columns)
d = d.reindex(index=index, columns=index, fill_value=0)
yields
a b c d e f
a 0 1 0 0 1 0
b 2 0 0 1 0 0
c 0 0 0 0 0 0
d 0 0 0 0 0 0
e 1 0 0 0 0 0
f 0 0 0 0 0 0
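A runnable sketch of the whole thing, with the CSV inlined through io.StringIO instead of read from data.csv (an assumption so the example has no file dependency). The names labels and square are illustrative.

```python
import io
import pandas as pd

# Inline copy of the question's data.csv.
csv = io.StringIO("I,C,v\na,b,1\nb,a,2\ne,a,1\ne,c,0\nb,d,1\na,e,1\nb,f,0\n")
df = pd.read_csv(csv)

d = pd.pivot_table(df, index="I", columns="C", values="v", fill_value=0)

# The union of the row and column labels defines the square frame.
labels = d.index.union(d.columns)
square = d.reindex(index=labels, columns=labels, fill_value=0)
print(square)
```

Using the union rather than just d.columns also covers the opposite case, where a label appears only in the rows.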
