How to align keys with columns in pandas? - python

I want to align the keys in pandas to the columns they belong to. I have the code, and the output below, with an example of what I am trying to do.
Code:
df = pd.read_csv('Filename.txt')
df.columns = ['Date','b1','b2','b3']
df = df.set_index('Date')
reversed_df = df.iloc[::-1]
n=5
print('Game')
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
print(reversed_df.drop(df.index[n:-n]), "\n")
BallOne = pd.get_dummies(reversed_df.b1)
BallTwo = pd.get_dummies(reversed_df.b2)
BallThree = pd.get_dummies(reversed_df.b3)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
print(pd.concat([BallOne, BallTwo, BallThree], keys=['D3E-B1', 'D3E-B2', 'D3E-B3'], axis=1), "\n")
Output:
D3E-B1 D3E-B2 D3E-B3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
Date
1984-09-01 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
1984-09-03 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0
I would like the keys to be centered on their column like this:
D3E-B1 D3E-B2 D3E-B3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
Date
1984-09-01 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
1984-09-03 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0

from tabulate import tabulate
import pandas as pd
df = pd.DataFrame({'col_two': [0.0001, 1e-005, 1e-006, 1e-007],
                   'column_3': ['ABCD', 'ABCD', 'long string', 'ABCD']})
print(tabulate(df, headers='keys', tablefmt='psql'))
+----+-----------+-------------+
| | col_two | column_3 |
|----+-----------+-------------|
| 0 | 0.0001 | ABCD |
| 1 | 1e-05 | ABCD |
| 2 | 1e-06 | long string |
| 3 | 1e-07 | ABCD |
+----+-----------+-------------+
from: Pretty Printing a pandas dataframe
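If you want the keys literally centered over their blocks without an extra library, here is a rough sketch that builds such a header line by hand (the fixed `col_width` and `index_width` are assumptions you would tune to your own printout, and the small frames stand in for the dummies built above):

```python
import pandas as pd
from collections import Counter

def centered_keys_line(df_multi, col_width=3, index_width=10):
    # Span of each top-level key = number of sub-columns times col_width;
    # col_width and index_width are assumptions to match your printout
    counts = Counter(df_multi.columns.get_level_values(0))
    parts = [' ' * index_width]
    for key in dict.fromkeys(df_multi.columns.get_level_values(0)):
        parts.append(str(key).center(counts[key] * col_width))
    return ''.join(parts)

# Hypothetical stand-in for the dummies frame built above
b1 = pd.DataFrame([[0, 1], [1, 0]], columns=[0, 1])
b2 = pd.DataFrame([[1, 0], [0, 1]], columns=[0, 1])
combined = pd.concat([b1, b2], keys=['D3E-B1', 'D3E-B2'], axis=1)

print(centered_keys_line(combined))                # centered key row
print(combined.droplevel(0, axis=1).to_string())   # sub-columns + data
```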

Related

Trying to merge dictionaries together to create a new df, but the dictionaries' values aren't showing up in the df

For my quarters, instead of values such as 1,0,0,0 showing up, I get NaN.
How do I fix the code below so that values are returned in my dataframe?
qrt_1 = {'q1':[1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]}
qrt_2 = {'q2':[0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0]}
qrt_3 = {'q3':[0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0]}
qrt_4 = {'q4':[0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1]}
year = {'year': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9]}
value = data_1['Sales']
data = [year, qrt_1, qrt_2, qrt_3, qrt_4]
dataframes = []
for x in data:
    dataframes.append(pd.DataFrame(x))
df = pd.concat(dataframes)
I am expecting a dataframe that returns the qrt_1, qrt_2 etc with their corresponding column names
Try using axis=1 in pd.concat:
df = pd.concat(dataframes, axis=1)
print(df)
Prints:
year q1 q2 q3 q4
0 1 1 0 0 0
1 1 0 1 0 0
2 1 0 0 1 0
3 1 0 0 0 1
4 2 1 0 0 0
5 2 0 1 0 0
6 2 0 0 1 0
7 2 0 0 0 1
8 3 1 0 0 0
9 3 0 1 0 0
10 3 0 0 1 0
11 3 0 0 0 1
12 4 1 0 0 0
13 4 0 1 0 0
14 4 0 0 1 0
15 4 0 0 0 1
16 5 1 0 0 0
17 5 0 1 0 0
18 5 0 0 1 0
19 5 0 0 0 1
20 6 1 0 0 0
21 6 0 1 0 0
22 6 0 0 1 0
23 6 0 0 0 1
24 7 1 0 0 0
25 7 0 1 0 0
26 7 0 0 1 0
27 7 0 0 0 1
28 8 1 0 0 0
29 8 0 1 0 0
30 8 0 0 1 0
31 8 0 0 0 1
32 9 1 0 0 0
33 9 0 1 0 0
34 9 0 0 1 0
35 9 0 0 0 1
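For context, the NaN values come from the default axis=0: pd.concat then stacks the frames vertically, and each frame's rows get NaN in the columns it lacks. A minimal sketch of the difference:

```python
import pandas as pd

a = pd.DataFrame({'q1': [1, 0]})
b = pd.DataFrame({'q2': [0, 1]})

stacked = pd.concat([a, b])               # default axis=0: frames stacked as rows
side_by_side = pd.concat([a, b], axis=1)  # axis=1: columns aligned on the index

print(stacked)        # q2 is NaN for a's rows, q1 is NaN for b's rows
print(side_by_side)   # no NaN: each column keeps its own values
```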

Creating week flags from DOW

I have a dataframe:
DOW
0 0
1 1
2 2
3 3
4 4
5 5
6 6
This corresponds to the day of the week. Now I want to create this dataframe:
DOW MON_FLAG TUE_FLAG WED_FLAG THUR_FLAG FRI_FLAG SAT_FLAG
0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0
2 2 0 1 0 0 0 0
3 3 0 0 1 0 0 0
4 4 0 0 0 1 0 0
5 5 0 0 0 0 1 0
6 6 0 0 0 0 0 1
7 0 0 0 0 0 0 0
8 1 1 0 0 0 0 0
Depending on the DOW column: for example, if it's 1 then MON_FLAG will be 1, if it's 2 then TUE_FLAG will be 1, and so on. I have kept Sunday as 0, which is why all the flag columns are zero in that case.
Use get_dummies and rename the columns with a dictionary:
d = {0: 'SUN_FLAG', 1: 'MON_FLAG', 2: 'TUE_FLAG',
     3: 'WED_FLAG', 4: 'THUR_FLAG', 5: 'FRI_FLAG', 6: 'SAT_FLAG'}
df = df.join(pd.get_dummies(df['DOW']).rename(columns=d))
print (df)
DOW SUN_FLAG MON_FLAG TUE_FLAG WED_FLAG THUR_FLAG FRI_FLAG SAT_FLAG
0 0 1 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0
2 2 0 0 1 0 0 0 0
3 3 0 0 0 1 0 0 0
4 4 0 0 0 0 1 0 0
5 5 0 0 0 0 0 1 0
6 6 0 0 0 0 0 0 1
7 0 1 0 0 0 0 0 0
8 1 0 1 0 0 0 0 0
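One caveat: if some DOW values never occur in the data, get_dummies will silently drop those flag columns. A sketch (with hypothetical data in which days 2, 3, 4 and 6 are absent) that guarantees all seven columns by casting DOW to a categorical first:

```python
import pandas as pd

d = {0: 'SUN_FLAG', 1: 'MON_FLAG', 2: 'TUE_FLAG', 3: 'WED_FLAG',
     4: 'THUR_FLAG', 5: 'FRI_FLAG', 6: 'SAT_FLAG'}

df = pd.DataFrame({'DOW': [0, 1, 5]})   # hypothetical: days 2,3,4,6 never occur
# A categorical with explicit categories makes get_dummies emit every day
dow = pd.Categorical(df['DOW'], categories=range(7))
flags = pd.get_dummies(dow).rename(columns=d).astype(int)
df = df.join(flags)
print(df)
```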

Transpose Pandas dataframe preserving the index

I have a problem while transposing a Pandas DataFrame that has the following structure:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
foo 0 4 0 0 0 0 0 0 0 0 14 1 0 1 0 0 0
bar 0 6 0 0 4 0 5 0 0 0 0 0 0 0 1 0 0
lorem 1 3 0 0 0 1 0 0 2 0 3 0 1 2 1 1 0
ipsum 1 2 0 1 0 0 1 0 0 0 0 0 4 0 6 0 0
dolor 1 2 4 0 1 0 0 0 0 0 2 0 0 1 0 0 2
..
With index:
foo,bar,lorem,ipsum,dolor,...
And this is basically a terms-documents matrix, where rows are terms and the headers (0-16) are document indexes.
Since my purpose is clustering documents and not terms, I want to transpose the dataframe and use this to perform a cosine-distance computation between documents themselves.
But when I transpose with:
df.transpose()
I get:
foo bar ... pippo lorem
0 0 0 ... 0 0
1 4 6 ... 0 0
2 0 0 ... 0 0
3 0 0 ... 0 0
4 0 4 ... 0 0
..
16 0 2 ... 0 1
With index:
0 , 1 , 2 , 3 , ... , 15, 16
What I would like:
I'm looking for a way to perform this operation while preserving the dataframe index. Basically, the header row of my new df should be the old index.
Thank you
We can use a chain of unstack calls:
df2 = df.unstack().to_frame().unstack(1).droplevel(0, axis=1)
print(df2)
foo bar lorem ipsum dolor
0 0 0 1 1 1
1 4 6 3 2 2
2 0 0 0 0 4
3 0 0 0 1 0
4 0 4 0 0 1
5 0 0 1 0 0
6 0 5 0 1 0
7 0 0 0 0 0
8 0 0 2 0 0
9 0 0 0 0 0
10 14 0 3 0 2
11 1 0 0 0 0
12 0 0 1 4 0
13 1 0 2 0 1
14 0 1 1 6 0
15 0 0 1 0 0
16 0 0 0 0 2
Assuming the data is a square matrix (n x n), and if I understand the question correctly:
df = pd.DataFrame([[0, 4, 0], [0, 6, 0], [1, 3, 0]],
                  index=['foo', 'bar', 'lorem'],
                  columns=[0, 1, 2])
df_T = pd.DataFrame(df.values.T, index=df.index, columns=df.columns)
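For what it's worth, a plain df.T already preserves the index as the new column labels (matching the unstack result above), and it does not require a square matrix; a minimal check:

```python
import pandas as pd

df = pd.DataFrame([[0, 4, 0], [0, 6, 0], [1, 3, 0]],
                  index=['foo', 'bar', 'lorem'],
                  columns=[0, 1, 2])

df_T = df.T   # terms become the columns, document ids become the index
print(df_T)
```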

How to collapse/group columns in pandas

I have data whose column names are day numbers (up to 3000 columns) with 0/1 values, and I would like to group the columns into weeks (days 1-7 into week_1, days 8-14 into week_2, and so on):
if at least one of columns 1-7 contains a 1, then week_1 should be 1, else 0.
Convert the first column to the index, then aggregate by max over a helper array created by integer division by 7 plus 1:
import numpy as np
import pandas as pd

pd.options.display.max_columns = 30
np.random.seed(2020)
df = pd.DataFrame(np.random.choice([1, 0], size=(5, 21), p=(0.1, 0.9)))
df.columns += 1
df.insert(0, 'id', 1000 + df.index)
print (df)
id 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 \
0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1001 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1002 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
3 1003 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
4 1004 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 21
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
df = df.set_index('id')
arr = np.arange(len(df.columns)) // 7 + 1
df = df.groupby(arr, axis=1).max().add_prefix('week_').reset_index()
print (df)
id week_1 week_2 week_3
0 1000 0 0 0
1 1001 1 0 0
2 1002 1 1 0
3 1003 1 1 1
4 1004 1 0 0
import pandas as pd
import numpy as np

id = list(range(1000, 1010))
cl = list(range(1, 22))
data_ = np.random.rand(10, 21)
client_data = pd.DataFrame(data=data_, index=id, columns=cl)

def change_col(col_hd):
    week_num = (col_hd + 6) // 7
    week_header = 'week_' + str(week_num)
    return week_header

new_col_header = []
for c in cl:
    new_col_header.append(change_col(c))

client_data.columns = new_col_header
client_data.columns.name = 'id'
# Use .max() instead of .sum() if you want a 0/1 flag rather than a weekly count
client_data.groupby(axis='columns', level=0).sum()
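Note that grouping over columns with axis=1 is deprecated in recent pandas; the same weekly maximum can be computed by grouping the transposed frame instead. A sketch using the same helper-array idea (small hypothetical data, 14 day columns):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2020)
df = pd.DataFrame(rng.choice([1, 0], size=(3, 14), p=(0.1, 0.9)))
df.columns += 1                               # day numbers 1..14
arr = np.arange(len(df.columns)) // 7 + 1     # 1,1,...,1,2,2,...,2

# Transpose, group the (former) columns by week number, take max, transpose back
weekly = df.T.groupby(arr).max().T.add_prefix('week_')
print(weekly)
```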

Need to return coincidence matrix between two channels of time-stamped events

I am trying to create a coincidence matrix between energy events measured by detectors in two channels. "Coincidence" is to say that the events occur within a user-specified timing window of each other. The data are currently stored in a pandas dataframe of the following format with fake sample data:
Energy  Timestamp  Channel
     6        103        1
     7         70        2
     4        110        2
     8        205        2
     2        219        1
     3        333        1
     5        300        1
     9        350        2
I need the data in the following format such that, if a user were to select a timing window of 20, the resulting coincidence matrix would be:
Channel 1 Energy: 1 2 3 4 5 6 7 8 9 10
Channel 2 Energy:_________________________________________
1| 0 0 0 0 0 0 0 0 0 0
2| 0 0 0 0 0 0 0 0 0 0
3| 0 0 0 0 0 0 0 0 0 0
4| 0 0 0 0 0 1 0 0 0 0
5| 0 0 0 0 0 0 0 0 0 0
6| 0 0 0 0 0 0 0 0 0 0
7| 0 0 0 0 0 0 0 0 0 0
8| 0 1 0 0 0 0 0 0 0 0
9| 0 0 1 0 0 0 0 0 0 0
10| 0 0 0 0 0 0 0 0 0 0
Where now only the events that meet the condition:
Event1_Timestamp < Event2_Timestamp + TimingWindow and Event1_Timestamp > Event2_Timestamp - TimingWindow
are preserved in the coincidence matrix, and all noncoincident events are discarded.
I have tried:
df2 = df.merge(df, on="Timestamp")
df3 = pd.crosstab(df2.Energy_x, df2.Energy_y)
but there are a few problems with this output. It looks for exact matches in the timestamp rather than a timing window range, and it only lists the energies that appear, rather than a linearly spaced range of all possible energies (0-8192 energy bins). Any help is greatly appreciated.
Let's try using pd.merge_asof and pd.crosstab:
Where df is:
Energy Timestamp Channel
0 6 103 1
1 7 70 2
2 4 110 2
3 8 205 2
4 2 219 1
5 3 333 1
6 5 300 1
7 9 350 2
Then,
import numpy as np

df_out = pd.merge_asof(df.sort_values('Timestamp'),
                       df.sort_values('Timestamp'),
                       on='Timestamp',
                       allow_exact_matches=False,
                       tolerance=20)
pd.crosstab(df_out['Energy_x'],
            df_out['Energy_y']).reindex(index=np.arange(1, 11),
                                        columns=np.arange(1, 11),
                                        fill_value=0)
Output:
Energy_y 1 2 3 4 5 6 7 8 9 10
Energy_x
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 1 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 1 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0
9 0 0 1 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0
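merge_asof with a tolerance only matches backward by default and finds at most one partner per row, so pairs can be missed. For the fully symmetric window described in the question (|t1 - t2| < window, pairing events on opposite channels), a brute-force sketch using a cross merge (pandas >= 1.2) on the same sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Energy':    [6, 7, 4, 8, 2, 3, 5, 9],
                   'Timestamp': [103, 70, 110, 205, 219, 333, 300, 350],
                   'Channel':   [1, 2, 2, 2, 1, 1, 1, 2]})

window = 20
ch1 = df[df['Channel'] == 1]
ch2 = df[df['Channel'] == 2]

# Every channel-1 / channel-2 pair, then keep those inside the timing window
pairs = ch1.merge(ch2, how='cross', suffixes=('_1', '_2'))
pairs = pairs[(pairs['Timestamp_1'] - pairs['Timestamp_2']).abs() < window]

# Reindex to cover the full energy range (1..10 here; 0..8192 in the real data)
matrix = pd.crosstab(pairs['Energy_2'], pairs['Energy_1']).reindex(
    index=np.arange(1, 11), columns=np.arange(1, 11), fill_value=0)
print(matrix)
```

This reproduces the three coincidences in the desired matrix above: (6, 4), (2, 8) and (3, 9).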
