How to handle Cells containing only NaN values in pandas? - python

I am setting up a stock price prediction data set,in that while applying the following code for Ichimoku Cloud Indicator:
from datetime import timedelta
high_9 = df['High'].rolling(window= 9).max()
low_9 = df['Low'].rolling(window= 9).min()
df['tenkan_sen'] = (high_9 + low_9) /2
high_26 = df['High'].rolling(window= 26).max()
low_26 = df['Low'].rolling(window= 26).min()
df['kijun_sen'] = (high_26 + low_26) /2
# this is to extend the 'df' in future for 26 days
# the 'df' here is numerical indexed df
# the problem is here
last_index = df.iloc[-1:].index[0]
last_date = df['Date'].iloc[-1].date()
for i in range(26):
df.loc[last_index+1 +i, 'Date'] = last_date + timedelta(days=i)
df['senkou_span_a'] = ((df['tenkan_sen'] + df['kijun_sen']) / 2).shift(26)
high_52 = df['High'].rolling(window= 52).max()
low_52 = df['Low'].rolling(window= 52).min()
df['senkou_span_b'] = ((high_52 + low_52) /2).shift(26)
# most charting softwares dont plot this line
df['chikou_span'] = df['Close'].shift(-26)
The above code works great but the problem is while extending to the next 26 time steps(rows) in 'senoku span a' and 'b' columns it turns other rest columns row's values to NaN.
So i need the help to make 'Senoku span a' & 'Senoku span b' predicted rows in my data set without making other rows vlaues to NaN.
The current output is:
Date Open High Low Close Senoku span a Senoku span b
2019-03-16 50 51 52 53 56.0 55.82
2019-03-17 NaN NaN NaN NaN 55.0 56.42
2019-03-18 NaN NaN NaN NaN 54.0 57.72
2019-03-19 NaN NaN NaN NaN 53.0 58.12
2019-03-20 NaN NaN NaN NaN 52.0 59.52
The expected output is:
Date Open High Low Close Senoku span a Senoku span b
2019-03-16 50 51 52 53 56.0 55.82
2019-03-17 55.0 56.42
2019-03-18 54.0 57.72
2019-03-19 53.0 58.12
2019-03-20 52.0 59.52

Related

How to trim down a Pandas data frame rows?

I'm trying so hard to shorten this awful lot of rows from an XML sitemap but I can't find a solution to trim it down.
import advertools as adv
import pandas as pd
site = "https://www.halfords.com/sitemap_index.xml"
sitemap = adv.sitemap_to_df(site)
sitemap = sitemap.dropna(subset=["loc"]).reset_index(drop=True)
# Some sitemaps keeps urls with "/" on the end, some is with no "/"
# If there is "/" on the end, we take the second last column as slugs
# Else, the last column is the slug column
slugs = sitemap['loc'].dropna()[~sitemap['loc'].dropna().str.endswith('/')].str.split('/').str[-2].str.replace('-', ' ')
slugs2 = sitemap['loc'].dropna()[~sitemap['loc'].dropna().str.endswith('/')].str.split('/').str[-1].str.replace('-', ' ')
# Merge two series
slugs = list(slugs) + list(slugs2)
# adv.word_frequency automatically removes the stop words
word_counts_onegram = adv.word_frequency(slugs)
word_counts_twogram = adv.word_frequency(slugs, phrase_len=2)
competitor = pd.concat([word_counts_onegram, word_counts_twogram])\
.rename({'abs_freq':'Count','word':'Ngram'}, axis=1)\
.sort_values('Count', ascending=False)
competitor.to_csv('competitor.csv',index=False)
competitor
competitor.shape
(67758, 2)
(67758, 2)
I've been raveling around several blogs included resources on Stack Overflow but nothing seemed to work.
This is definitely something going on with my zero expertise in coding I suppose
Two things:
You can use adv.url_to_df to split URLs and get the slugs (there should be a column called last_dir:
urldf = adv.url_to_df(sitemap['loc'].dropna())
urldf
url
scheme
netloc
path
query
fragment
dir_1
dir_2
dir_3
dir_4
dir_5
dir_6
dir_7
dir_8
dir_9
last_dir
0
https://www.halfords.com/cycling/cycling-technology/helmet-cameras/removu-k1-4k-camera-and-stabiliser-694977.html
https
www.halfords.com
/cycling/cycling-technology/helmet-cameras/removu-k1-4k-camera-and-stabiliser-694977.html
nan
nan
cycling
cycling-technology
helmet-cameras
removu-k1-4k-camera-and-stabiliser-694977.html
nan
nan
nan
nan
nan
removu-k1-4k-camera-and-stabiliser-694977.html
1
https://www.halfords.com/technology/bluetooth-car-kits/jabra-drive-bluetooth-speakerphone---white-695094.html
https
www.halfords.com
/technology/bluetooth-car-kits/jabra-drive-bluetooth-speakerphone---white-695094.html
nan
nan
technology
bluetooth-car-kits
jabra-drive-bluetooth-speakerphone---white-695094.html
nan
nan
nan
nan
nan
nan
jabra-drive-bluetooth-speakerphone---white-695094.html
2
https://www.halfords.com/tools/power-tools-and-accessories/power-tools/stanley-fatmax-v20-18v-combi-drill-kit-695102.html
https
www.halfords.com
/tools/power-tools-and-accessories/power-tools/stanley-fatmax-v20-18v-combi-drill-kit-695102.html
nan
nan
tools
power-tools-and-accessories
power-tools
stanley-fatmax-v20-18v-combi-drill-kit-695102.html
nan
nan
nan
nan
nan
stanley-fatmax-v20-18v-combi-drill-kit-695102.html
3
https://www.halfords.com/technology/dash-cams/mio-mivue-c450-695262.html
https
www.halfords.com
/technology/dash-cams/mio-mivue-c450-695262.html
nan
nan
technology
dash-cams
mio-mivue-c450-695262.html
nan
nan
nan
nan
nan
nan
mio-mivue-c450-695262.html
4
https://www.halfords.com/technology/dash-cams/mio-mivue-818-695270.html
https
www.halfords.com
/technology/dash-cams/mio-mivue-818-695270.html
nan
nan
technology
dash-cams
mio-mivue-818-695270.html
nan
nan
nan
nan
nan
nan
mio-mivue-818-695270.html
There are options that pandas provides, which you can change. For example:
pd.options.display.max_rows
60
# change it to display more/fewer rows:
pd.options.display.max_rows = 100
As you did, you can easily create onegrams and bigrams, combine them, and display them:
text_list = urldf['last_dir'].str.replace('-', ' ').dropna()
one_grams = adv.word_frequency(text_list, phrase_len=1)
bigrams = adv.word_frequency(text_list, phrase_len=2)
print(pd.concat([one_grams, bigrams])
.sort_values('abs_freq', ascending=False)
.head(15) # <-- change this to 100 for example
.reset_index(drop=True))
word
abs_freq
0
halfords
2985
1
car
1430
2
bike
922
3
kit
829
4
black
777
5
laser
686
6
set
614
7
wheel
540
8
pack
524
9
mats
511
10
car mats
478
11
thule
453
12
paint
419
13
4
413
14
spray
382
Hope that helps?

Pandas dataframe merge row by addition

I want to create a dataframe from census data. I want to calculate the number of people that returned a tax return for each specific earnings group.
For now, I wrote this
census_df = pd.read_csv('../zip code data/19zpallagi.csv')
sub_census_df = census_df[['zipcode', 'agi_stub', 'N02650', 'A02650', 'ELDERLY', 'A07180']].copy()
num_of_returns = ['Number_of_returns_1_25000', 'Number_of_returns_25000_50000', 'Number_of_returns_50000_75000',
'Number_of_returns_75000_100000', 'Number_of_returns_100000_200000', 'Number_of_returns_200000_more']
for i, column_name in zip(range(1, 7), num_of_returns):
sub_census_df[column_name] = sub_census_df[sub_census_df['agi_stub'] == i]['N02650']
I have 6 groups attached to a specific zip code. I want to get one row, with the number of returns for a specific zip code appearing just once as a column. I already tried to change NaNs to 0 and to use groupby('zipcode').sum(), but I get 50 million rows summed for zip code 0, where it seems that only around 800k should exist.
Here is the dataframe that I currently get:
zipcode agi_stub N02650 A02650 ELDERLY A07180 Number_of_returns_1_25000 Number_of_returns_25000_50000 Number_of_returns_50000_75000 Number_of_returns_75000_100000 Number_of_returns_100000_200000 Number_of_returns_200000_more Amount_1_25000 Amount_25000_50000 Amount_50000_75000 Amount_75000_100000 Amount_100000_200000 Amount_200000_more
0 0 1 778140.0 10311099.0 144610.0 2076.0 778140.0 NaN NaN NaN NaN NaN 10311099.0 NaN NaN NaN NaN NaN
1 0 2 525940.0 19145621.0 113810.0 17784.0 NaN 525940.0 NaN NaN NaN NaN NaN 19145621.0 NaN NaN NaN NaN
2 0 3 285700.0 17690402.0 82410.0 9521.0 NaN NaN 285700.0 NaN NaN NaN NaN NaN 17690402.0 NaN NaN NaN
3 0 4 179070.0 15670456.0 57970.0 8072.0 NaN NaN NaN 179070.0 NaN NaN NaN NaN NaN 15670456.0 NaN NaN
4 0 5 257010.0 35286228.0 85030.0 14872.0 NaN NaN NaN NaN 257010.0 NaN NaN NaN NaN NaN 35286228.0 NaN
And here is what I want to get:
zipcode Number_of_returns_1_25000 Number_of_returns_25000_50000 Number_of_returns_50000_75000 Number_of_returns_75000_100000 Number_of_returns_100000_200000 Number_of_returns_200000_more
0 0 778140.0 525940.0 285700.0 179070.0 257010.0 850.0
here is one way to do it using groupby and sum the desired columns
num_of_returns = ['Number_of_returns_1_25000', 'Number_of_returns_25000_50000', 'Number_of_returns_50000_75000',
'Number_of_returns_75000_100000', 'Number_of_returns_100000_200000', 'Number_of_returns_200000_more']
df.groupby('zipcode', as_index=False)[num_of_returns].sum()
zipcode Number_of_returns_1_25000 Number_of_returns_25000_50000 Number_of_returns_50000_75000 Number_of_returns_75000_100000 Number_of_returns_100000_200000 Number_of_returns_200000_more
0 0 778140.0 525940.0 285700.0 179070.0 257010.0 0.0
This question needs more information to actually give a proper answer. For example you leave out what is meant by certain columns in your data frame:
- `N1: Number of returns`
- `agi_stub: Size of adjusted gross income`
According to IRS this has the following levels.
Size of adjusted gross income "0 = No AGI Stub
1 = ‘Under $1’
2 = '$1 under $10,000'
3 = '$10,000 under $25,000'
4 = '$25,000 under $50,000'
5 = '$50,000 under $75,000'
6 = '$75,000 under $100,000'
7 = '$100,000 under $200,000'
8 = ‘$200,000 under $500,000’
9 = ‘$500,000 under $1,000,000’
10 = ‘$1,000,000 or more’"
I got the above from https://www.irs.gov/pub/irs-soi/16incmdocguide.doc
With this information, I think what you want to find is the number of
people who filed a tax return for each of the income levels of agi_stub.
If that is what you mean then, this can be achieved by:
import pandas as pd
data = pd.read_csv("./data/19zpallagi.csv")
## select only the desired columns
data = data[['zipcode', 'agi_stub', 'N1']]
## solution to your problem?
df = data.pivot_table(
index='zipcode',
values='N1',
columns='agi_stub',
aggfunc=['sum']
)
## bit of cleaning up.
PREFIX = 'agi_stub_level_'
df.columns = [PREFIX + level for level in df.columns.get_level_values(1).astype(str)]
Here's the output.
In [77]: df
Out[77]:
agi_stub_level_1 agi_stub_level_2 ... agi_stub_level_5 agi_stub_level_6
zipcode ...
0 50061850.0 37566510.0 ... 21938920.0 8859370.0
1001 2550.0 2230.0 ... 1420.0 230.0
1002 2850.0 1830.0 ... 1840.0 990.0
1005 650.0 570.0 ... 450.0 60.0
1007 1980.0 1530.0 ... 1830.0 460.0
... ... ... ... ... ...
99827 470.0 360.0 ... 170.0 40.0
99833 550.0 380.0 ... 290.0 80.0
99835 1250.0 1130.0 ... 730.0 190.0
99901 1960.0 1520.0 ... 1030.0 290.0
99999 868450.0 644160.0 ... 319880.0 142960.0
[27595 rows x 6 columns]

Is there a possibility to use a bigger List in phython?

For school I have to make a project about wifisignals and I am trying put the data in a dataframe.
There are 208.000 rows of data.
And when it comes to the code below, the code does not complete. The code is like it is stuck in an infinite loop.
But when I use only a 1000 rows my program works. So I think that my list are to small if that is possible.
Do bigger Lists exist in phython? Or is it because I use bad coding?
Thanks in advance.
edit 1:
(data is the original dataframe and wifiinfo is a column of that)
i have this format:
df = pd.DataFrame(columns=['Sender','Time','Date','Place','X','Y','Bezetting','SSID','BSSID','Signal'])
And i am trying to fill SSID, BSSID and Signal from the Column WifiInfo for this i have to split the data.
this is how 1 WifiInfo looks like:
ODISEE#88-1d-fc-41-dc-50:-83,ODISEE#88-1d-fc-2c-c0-00:-72,ODISEE#88-1d-fc-41-d2-d0:-82,CiscoC5976#58-6d-8f-19-14-38:-78,CiscoC5959#58-6d-8f-19-13-f4:-93,SNB#c8-d7-19-6f-be-b7:-99,ODISEE#88-1d-fc-2c-c5-70:-94,HackingDemo#58-6d-8f-19-11-48:-156,ODISEE#88-1d-fc-30-d4-40:-85,ODISEE#88-1d-fc-41-ac-50:-100
My current approach looks like:
for index, row in data.iterrows():
bezettingList = list()
ssidList = list()
bssidList = list()
signalList = list()
#WifiInfo splitting
wifis = row.WifiInfo.split(',')
for wifi in wifis:
#split wifi and add to List
ssid, bssid = wifi.split('#')
bssid, signal = bssid.split(':')
ssidList.append(ssid)
bssidList.append(bssid)
signalList.append(int(signal))
#add bezettingen to List
bezettingen = row.Bezetting.split(',')
for bezetting in bezettingen:
bezettingList.append(bezetting)
#add list to dataframe
df.loc[index,'SSID'] = ssidList
df.loc[index,'BSSID'] = bssidList
df.loc[index,'Signal'] = signalList
df.loc[index,'Bezetting'] = bezettingList
df.head()
IIUC, you need to first explode the row by commas so that this:
SSID BSSID Signal WifiInfo
0 NaN NaN NaN ODISEE#88-1d-fc-41-dc-50:-83,ODISEE#88- ...
becomes this:
SSID BSSID Signal WifiInfo
0 NaN NaN NaN ODISEE#88-1d-fc-41-dc-50:-83
1 NaN NaN NaN ODISEE#88-1d-fc-2c-c0-00:-72
2 NaN NaN NaN ODISEE#88-1d-fc-41-d2-d0:-82
3 NaN NaN NaN CiscoC5976#58-6d-8f-19-14-38:-78
4 NaN NaN NaN CiscoC5959#58-6d-8f-19-13-f4:-93
5 NaN NaN NaN SNB#c8-d7-19-6f-be-b7:-99
6 NaN NaN NaN ODISEE#88-1d-fc-2c-c5-70:-94
7 NaN NaN NaN HackingDemo#58-6d-8f-19-11-48:-156
8 NaN NaN NaN ODISEE#88-1d-fc-30-d4-40:-85
9 NaN NaN NaN ODISEE#88-1d-fc-41-ac-50:-100
# use `.explode`
data = data.assign(WifiInfo=data.WifiInfo.str.split(',')).explode('WifiInfo')
Now you could use .str.extract:
data['SSID'] = data['WifiInfo'].str.extract(r'(.*)#')
data['BSSID'] = data['WifiInfo'].str.extract(r'#(.*):')
data['Signal'] = data['WifiInfo'].str.extract(r':(.*)')
SSID BSSID Signal WifiInfo
0 ODISEE 88-1d-fc-41-dc-50 -83 ODISEE#88-1d-fc-41-dc-50:-83
1 ODISEE 88-1d-fc-2c-c0-00 -72 ODISEE#88-1d-fc-2c-c0-00:-72
2 ODISEE 88-1d-fc-41-d2-d0 -82 ODISEE#88-1d-fc-41-d2-d0:-82
3 CiscoC5976 58-6d-8f-19-14-38 -78 CiscoC5976#58-6d-8f-19-14-38:-78
4 CiscoC5959 58-6d-8f-19-13-f4 -93 CiscoC5959#58-6d-8f-19-13-f4:-93
5 SNB c8-d7-19-6f-be-b7 -99 SNB#c8-d7-19-6f-be-b7:-99
6 ODISEE 88-1d-fc-2c-c5-70 -94 ODISEE#88-1d-fc-2c-c5-70:-94
7 HackingDemo 58-6d-8f-19-11-48 -156 HackingDemo#58-6d-8f-19-11-48:-156
8 ODISEE 88-1d-fc-30-d4-40 -85 ODISEE#88-1d-fc-30-d4-40:-85
9 ODISEE 88-1d-fc-41-ac-50 -100 ODISEE#88-1d-fc-41-ac-50:-100
If you want to keep data grouped after column explosion, I'd assign an ID for each group of entries first:
data['Group'] = pd.factorize(data['WifiInfo'])[0]+1
SSID BSSID Signal WifiInfo Group
0 NaN NaN NaN ODISEE#88-1d-fc-41-dc-50:-83,ODISEE#88- ... 1
1 NaN NaN NaN ASD#22-1d-fc-41-dc-50:-83,QWERTY#88- ... 2
# after you explode the column
SSID BSSID Signal WifiInfo Group
ODISEE 88-1d-fc-41-dc-50 -83 ODISEE#88-1d-fc-41-dc-50:-83 1
ODISEE 88-1d-fc-2c-c0-00 -72 ODISEE#88-1d-fc-2c-c0-00:-72 1
...
...
ASD 22-1d-fc-41-dc-50 -83 ASD#88-1d-fc-41-dc-50:-83 2
QWERTY 88-1d-fc-2c-c0-00 -72 QWERTY#88-1d-fc-2c-c0-00:-72 2

Tiling in groupby on dataframe

I have a data frame that contains returns, size and sedols for a couple of dates.
My goal is to identify the top and bottom values for a certain condition per date, i.e I want the top decile largest size entries and the bottom decile smallest size entries for each date and flag them in a new column by 'xx' and 'yy'.
I am confused how to apply the tiling while grouping as well as creating a new column, here is what I already have.
import pandas as pd
import numpy as np
import datetime as dt
from random import choice
from string import ascii_uppercase
def create_dummy_data(start_date, days, entries_pday):
date_sequence_lst = [dt.datetime.strptime(start_date,'%Y-%m-%d') +
dt.timedelta(days=x) for x in range(0,days)]
date_sequence_lst = date_sequence_lst * entries_pday
returns_lst = [round(np.random.uniform(low=-0.10,high=0.20),2) for _ in range(entries_pday*days)]
size_lst = [round(np.random.uniform(low=10.00,high=10000.00),0) for _ in range(entries_pday*days)]
rdm_sedol_lst = [(''.join(choice(ascii_uppercase) for i in range(7))) for x in range(entries_pday)]
rdm_sedol_lst = rdm_sedol_lst * days
dates_returns_df = pd.DataFrame({'Date':date_sequence_lst , 'Sedols':rdm_sedol_lst, 'Returns':returns_lst,'Size':size_lst})
dates_returns_df = dates_returns_df.sort_values('Date',ascending=True)
dates_returns_df = dates_returns_df.reset_index(drop=True)
return dates_returns_df
def order_df_by(df_in,column_name):
df_out = df_in.sort_values(['Date',column_name],ascending=[True,False])
return df_out
def get_ntile(df_in,ntile):
df_in['Tiled'] = df_in.groupby(['Date'])['Size'].transform(lambda x : pd.qcut(x,ntile))
return df_in
if __name__ == "__main__":
# create dummy returns
data_df = create_dummy_data('2001-01-01',31,10)
# sort by attribute
data_sorted_df = order_df_by(data_df,'Size')
#ntile data per date
data_ntiled = get_ntile(data_sorted_df, 10)
for key, item in data_ntiled:
print(data_ntiled.get_group(key))
so far I would be expecting deciled results based on 'Size' for each date, the next step would be to filter only for decile 1 and decile 10 and flag the entries 'xx' and 'yy' respectively.
thanks
Consider using transform on the pandas.qcut method with labels 1 through ntile+1 for a decile column, then conditionally set flag with np.where using decile values:
...
def get_ntile(df_in, ntile):
df_in['Tiled'] = df_in.groupby(['Date'])['Size'].transform(lambda x: pd.qcut(x, ntile, labels=list(range(1, ntile+1))))
return df_in
if __name__ == "__main__":
# create dummy returns
data_df = create_dummy_data('2001-01-01',31,10)
# sort by attribute
data_sorted_df = order_df_by(data_df,'Size')
#ntile data per date
data_ntiled = get_ntile(data_sorted_df, 10)
data_ntiled['flag'] = np.where(data_ntiled['Tiled']==1.0, 'YY',
np.where(data_ntiled['Tiled']==10.0, 'XX', np.nan))
print(data_ntiled.reset_index(drop=True).head(15))
# Date Returns Sedols Size Tiled flag
# 0 2001-01-01 -0.03 TEEADVJ 8942.0 10.0 XX
# 1 2001-01-01 -0.03 PDBWGBJ 7142.0 9.0 nan
# 2 2001-01-01 0.03 QNVVPIC 6995.0 8.0 nan
# 3 2001-01-01 0.04 NTKEAKB 6871.0 7.0 nan
# 4 2001-01-01 0.20 ZVVCLSJ 6541.0 6.0 nan
# 5 2001-01-01 0.12 IJKXLIF 5131.0 5.0 nan
# 6 2001-01-01 0.14 HVPDRIU 4490.0 4.0 nan
# 7 2001-01-01 -0.08 XNOGFET 3397.0 3.0 nan
# 8 2001-01-01 -0.06 JOARYWC 2582.0 2.0 nan
# 9 2001-01-01 0.12 FVKBQGU 723.0 1.0 YY
# 10 2001-01-02 0.03 ZVVCLSJ 9291.0 10.0 XX
# 11 2001-01-02 0.14 HVPDRIU 8875.0 9.0 nan
# 12 2001-01-02 0.08 PDBWGBJ 7496.0 8.0 nan
# 13 2001-01-02 0.02 FVKBQGU 7307.0 7.0 nan
# 14 2001-01-02 -0.01 QNVVPIC 7159.0 6.0 nan

Populating new DataFrame by multi-criteria selection from old one with different structure

I'm using Pandas for data analysis. I have an input file like this snippet:
VEH SEC POS ACCELL SPEED
2 8.4 36.51 -0.2929 27.39
3 8.4 23.57 -0.7381 33.09
4 8.4 6.18 0.6164 38.8
1 8.5 47.76 0 25.57
I need to reorganize the data so that the rows are the unique (ordered) values from SEC as the 1st column, and then the other columns would be VEH1_POS, VEH1_SPEED, VEH1_ACCELL, VEH2_POS, VEH2_SPEED, VEH2_ACCELL, etc.:
TIME VEH1_POS VEH1_SPEED VEH1_ACCEL VEH2_POS, VEH2_SPEED, etc.
0.1 6.2 3.7 0.0 7.5 2.1
0.2 6.8 3.2 -0.5 8.3 2.1
etc.
So, for example, the value for VEH1_POS for each row in the new dataframe would be filled in by selecting values from the POS column in the original dataframe using the row where the SEC value matches the TIME value for the row in the new dataframe and the VEH value == 1.
To set up the rows in the new data frame I'm doing this:
start = inputdf['SIMSEC'].min()
end = inputdf['SIMSEC'].max()
time_steps = frange(start, end, 0.1)
outputdf['TIME'] = time_steps
But I'm lost at how to select the right values from the input dataframe and create the rest of the new dataframe for further analysis. Note also that the input file will NOT have data for every VEH for every SEC (time stamp). So the solution needs to handle that as well. My best guess was:
outputdf['veh1_pos'] = np.where((inputdf['VEH NO'] == 1) & (inputdf['SIMSEC'] == row['Time Step']))
but that doesn't work.
import pandas as pd
# your data
# ==========================
print(df)
Out[272]:
VEH SEC POS ACCELL SPEED
0 2 8.4 36.51 -0.2929 27.39
1 3 8.4 23.57 -0.7381 33.09
2 4 8.4 6.18 0.6164 38.80
3 1 8.5 47.76 0.0000 25.57
# reshaping
# ==========================
result = df.set_index(['SEC','VEH']).unstack()
Out[278]:
POS ACCELL SPEED
VEH 1 2 3 4 1 2 3 4 1 2 3 4
SEC
8.4 NaN 36.51 23.57 6.18 NaN -0.2929 -0.7381 0.6164 NaN 27.39 33.09 38.8
8.5 47.76 NaN NaN NaN 0 NaN NaN NaN 25.57 NaN NaN NaN
So here, the column has multi-level index where 1st level is POS, ACCELL, SPEED and 2nd level is VEH=1,2,3,4.
# if you want to rename the column
temp_z = result.columns.get_level_values(0)
temp_y = result.columns.get_level_values(1)
temp_x = ['VEH'] * len(temp_y)
result.columns = ['{}{}_{}'.format(x,y,z) for x,y,z in zip(temp_x, temp_y, temp_z)]
Out[298]:
VEH1_POS VEH2_POS VEH3_POS VEH4_POS VEH1_ACCELL VEH2_ACCELL VEH3_ACCELL VEH4_ACCELL VEH1_SPEED VEH2_SPEED VEH3_SPEED VEH4_SPEED
SEC
8.4 NaN 36.51 23.57 6.18 NaN -0.2929 -0.7381 0.6164 NaN 27.39 33.09 38.8
8.5 47.76 NaN NaN NaN 0 NaN NaN NaN 25.57 NaN NaN NaN

Categories

Resources