How do you replace the index column in Bokeh DataTable? - python

I created a DataTable in Bokeh, but it doesn't show the index column. I expected to have the "Index" column on the left.
This is my code:
evolution_data = treatcriteria_daily_data_table.groupby(['startdate_dayweek','startdate_weekyear'],as_index = False).sum().pivot(index='startdate_dayweek', columns='startdate_weekyear').fillna(0)
evolution_data = evolution_data.droplevel(0,1)
evolution_data.loc['Total'] = evolution_data.sum()
evolution_data['Index'] = ['Lundi', 'Mardi', 'Mercredi', 'Jeudi', 'Vendredi', 'Samedi', 'Dimanche', 'Total']
evolution_data.set_index('Index', inplace=True, drop=True)
# remove the last week if its data is incomplete (any column containing a zero)
evolution_data = evolution_data.loc[:, ~(evolution_data == 0.).any()]
evolution_two_last_weeks = []
nb_cols = len(evolution_data.columns)
diff_cpu_2_last_weeks = evolution_data.iat[7, nb_cols - 1] - evolution_data.iat[7, nb_cols - 2]
for i in range(0,7):
    daily_evolution = (data_2_weeks_before_the_last.iat[1,i] - data_2_weeks_before_the_last.iat[0,i]) / diff_cpu_2_last_weeks
    evolution_two_last_weeks.append(daily_evolution)
variation_of_total = (data_2_weeks_before_the_last.iat[1,7] - data_2_weeks_before_the_last.iat[0,7]) / data_2_weeks_before_the_last.iat[0,7]
daily_variations_dict = {"Lundi": evolution_two_last_weeks[0],
"Mardi": evolution_two_last_weeks[1],
"Mercredi": evolution_two_last_weeks[2],
"Jeudi": evolution_two_last_weeks[3],
"Vendredi": evolution_two_last_weeks[4],
"Samedi": evolution_two_last_weeks[5],
"Dimanche": evolution_two_last_weeks[6],
"Total": variation_of_total}
# remove decimals in the table
cols = evolution_data.columns
evolution_data[cols] = evolution_data[cols].astype(np.int64)
evolution_data['% évolution des 2 dernières semaines'] = evolution_data.index.map(mapper=(lambda x: daily_variations_dict[x]))
evolution_data['% évolution des 2 dernières semaines'] = pd.Series(["{0:.2f}%".format(val*100) for val in evolution_data['% évolution des 2 dernières semaines']], index = evolution_data.index)
print(evolution_data)
Columns = [TableColumn(field=Ci, title=Ci) for Ci in evolution_data.columns] # bokeh columns
data_table = DataTable(columns=Columns, source=ColumnDataSource(evolution_data)) # bokeh table
show(data_table)
How to show the index column (with day names) and not the row number? Thank you.

As already said in the comments, Bokeh doesn't support altering the first column, which is always fixed and shows the 0-based row number. Under the hood, Bokeh uses SlickGrid, which does allow this, but that route is complicated: you would first need to find the reference to the SlickGrid object in the Bokeh model and then replace the first column in JavaScript right after the page loads.
A much simpler way, as suggested, is to hide the first column with index_position=None, i.e. data_table = DataTable(..., index_position=None), and then add the Index data from your DataFrame as the first of the table's columns. It is missing now because evolution_data.columns doesn't include the index. ColumnDataSource(evolution_data) does expose the index as a column named after it ("Index"), so you only need to add a TableColumn for it. So try:
Columns.insert(0, TableColumn(field='Index', title='Index'))  # show the DataFrame index first
data_table = DataTable(columns=Columns,
                       source=ColumnDataSource(evolution_data),
                       index_position=None)  # hide the built-in row-number column
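A minimal, self-contained sketch of the same idea, assuming a recent Bokeh release. The small DataFrame below is a hypothetical stand-in for evolution_data; what matters is that its index is named "Index", so ColumnDataSource exposes it as a column with that name.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, DataTable, TableColumn

# Hypothetical stand-in for evolution_data: the index carries the day names
df = pd.DataFrame(
    {"2021-10": [120, 95], "2021-11": [130, 90]},
    index=pd.Index(["Lundi", "Mardi"], name="Index"),
)

# ColumnDataSource(df) also stores the index, under the index's name ("Index")
source = ColumnDataSource(df)

# Index column first, then the regular DataFrame columns
columns = [TableColumn(field=c, title=c) for c in [df.index.name, *df.columns]]

# index_position=None hides Bokeh's own 0-based row-number column
data_table = DataTable(columns=columns, source=source, index_position=None)
```

Calling show(data_table) afterwards should render the day names as the leftmost column.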

Related

How to concatenate a series to a pandas dataframe in python?

I would like to iterate through a DataFrame's rows and concatenate each matching row to a different DataFrame, basically building up a new DataFrame from selected rows.
For example:
# IPCSection and IPCClass are DataFrames
allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis = 0)
finalpatentclasses = pd.DataFrame(columns=allcolumns)
for isec, secrow in IPCSection.iterrows():
    for icl, clrow in IPCClass.iterrows():
        if (secrow[0] in clrow[0]):
            pdList = [finalpatentclasses, pd.DataFrame(secrow), pd.DataFrame(clrow)]
            finalpatentclasses = pd.concat(pdList, axis=0, ignore_index=True)
display(finalpatentclasses)
The output is:
I want the NaN values to disappear and all the data to move under the correct columns. I tried axis=1, but that messes up the column names. append does not work either: all values are placed diagonally in the table, again with NaN values.
Alright, I have figured it out. The idea is to create a one-row DataFrame (newrow), put the concatenated values of both rows into it, and then concatenate it with the final DataFrame.
Here is the code:
allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis = 0)
finalpatentclasses = pd.DataFrame(columns=allcolumns)
for isec, secrow in IPCSection.iterrows():
    for icl, clrow in IPCClass.iterrows():
        if secrow.iloc[0] in clrow.iloc[0]:  # same filter as in the question
            newrow = pd.DataFrame(columns=allcolumns)
            values = np.concatenate((secrow.values, clrow.values), axis=0)
            newrow.loc[len(newrow.index)] = values
            finalpatentclasses = pd.concat([finalpatentclasses, newrow], axis=0)
finalpatentclasses.reset_index(drop=True, inplace=True)  # every newrow had index 0
display(finalpatentclasses)
Update: the code below is more efficient:
allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis = 0)
newList = []
for secrow in IPCSection.itertuples():
    for clrow in IPCClass.itertuples():
        # itertuples puts the index at position 0, so the first data column is at 1
        if (secrow[1] in clrow[1]):
            values = [secrow[1], secrow[2], clrow[1], clrow[2]]
            newList.append(values)
finalpatentclasses = pd.DataFrame(newList, columns=allcolumns)
display(finalpatentclasses)
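A loop-free alternative, swapped in here rather than taken from the answer: build every (section, class) pair with a cross merge (how="cross" needs pandas 1.2 or later) and keep the pairs that pass the same "section symbol in class symbol" test. The column names and sample rows are hypothetical; the real frames come from the question.

```python
import pandas as pd

# Hypothetical sample data with two columns per frame
IPCSection = pd.DataFrame({"Section": ["A", "B"],
                           "SectionTitle": ["Human necessities", "Performing operations"]})
IPCClass = pd.DataFrame({"Class": ["A01", "A21", "B01"],
                         "ClassTitle": ["Agriculture", "Baking", "Physical processes"]})

# Every section paired with every class: len(IPCSection) * len(IPCClass) rows
pairs = IPCSection.merge(IPCClass, how="cross")

# Keep pairs where the section symbol occurs in the class symbol
mask = [sec in cls for sec, cls in zip(pairs["Section"], pairs["Class"])]
finalpatentclasses = pairs[mask].reset_index(drop=True)
```

The cross product grows multiplicatively, so for very large tables the itertuples version above may use less memory.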

Pandas assign value based on next row(s)

Consider this simple pandas DataFrame with columns 'record', 'start', and 'param'. There can be multiple rows with the same record value, and each unique record value corresponds to the same start value. However, the 'param' value can be different for the same 'record' and 'start' combination:
df = pd.DataFrame({'record':[1,2,3,4,4,5,6,7,7,7,8], 'start':[0,5,7,13,13,19,27,38,38,38,54], 'param':['t','t','t','u','v','t','t','t','u','v','t']})
I'd like to make a column 'end' that takes the value of 'start' in the row with the next unique value of 'record'. The values of column 'end' should be:
[5,7,13,19,19,27,38,54,54,54,NaN]
I'm able to do this using a for loop, but I know this is not preferred when using pandas:
max_end = 100
for idx, row in df.iterrows():
    try:
        n = 1
        next_row = df.iloc[idx+n]
        while next_row['start'] == row['start']:
            n = n+1
            next_row = df.iloc[idx+n]
        end = next_row['start']
    except IndexError:  # ran past the last row
        end = max_end
    df.at[idx, 'end'] = end
Is there an easy way to achieve this without a for loop?
I have no doubt there is a smarter solution but here is mine.
df['end'] = df.drop_duplicates(subset=['record', 'start'])['start'].shift(-1).reindex(index=df.index, method='ffill')
EDIT: Added subset to drop_duplicates to account for the amended question.
This solution is equivalent to @Quixotic22's, although more explicit.
df = pd.DataFrame({
'record':[1,2,3,4,4,5,6,7,7,7,8],
'start':[0,5,7,13,13,19,27,38,38,38,54],
'param':['t','t','t','u','v','t','t','t','u','v','t']
})
max_end = 100
df["end"] = float("nan") # create a new, empty (NaN) column
loc = df["record"].shift(1) != df["record"] # True on the first row of each record (record differs from the previous row)
df.loc[loc, "end"] = df.loc[loc, "start"].shift(-1) # each first row gets the start of the next record
df["end"] = df["end"].ffill() # fill the remaining rows of each record
df.loc[df.index[-1], "end"] = max_end # override last value
df
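Another way to express the same lookup, offered as a sketch rather than as either answerer's method: take one start value per record, shift that short Series by one, and map it back onto every row. It assumes, as in the example, that the rows are already ordered by start, so the order of first appearance is the right "next record" order.

```python
import pandas as pd

df = pd.DataFrame({
    'record': [1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8],
    'start': [0, 5, 7, 13, 13, 19, 27, 38, 38, 38, 54],
    'param': ['t', 't', 't', 'u', 'v', 't', 't', 't', 'u', 'v', 't'],
})

# One start value per record, in order of first appearance
first_start = df.groupby('record', sort=False)['start'].first()

# A record ends where the next record starts; the last record gets NaN
next_start = first_start.shift(-1)

# Broadcast back to every row of each record
df['end'] = df['record'].map(next_start)
```

Appending .fillna(max_end) to the last line would reproduce the max_end behaviour of the loop version.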

First row to header with pandas

I have the following pandas dataframe df :
import pandas as pd
from io import StringIO
s = '''\
"Unnamed: 0","Unnamed: 1"
Objet,"Unités vendues"
Chaise,3
Table,2
Tabouret,1
'''
df = pd.read_csv(StringIO(s))
which looks like:
Unnamed: 0 Unnamed: 1
0 Objet Unités vendues
1 Chaise 3
2 Table 2
3 Tabouret 1
My goal is to make the first row the header.
I use:
headers = df.iloc[0]
df.columns = [headers]
However, a "0" appears as the name of the column index (which is expected, because 0 was the label of the first row):
0 Objet Unités vendues
1 Chaise 3
2 Table 2
I tried to delete it in many ways, but nothing works:
Neither del df.index.name (from this post)
nor df.columns.name = None (from this post or this one, which is the same situation) works.
How can I get this expected output:
Objet Unités vendues
1 Chaise 3
2 Table 2
What about defining that when you load your table in the first place?
pd.read_csv('filename', header=1)
Otherwise, you still need to remove the old header row, which is row 0:
df = df.drop(0)
What worked for me.
Replace:
headers = df.iloc[0]
df.columns = [headers]
with:
headers = df.iloc[0].values
df.columns = headers
df.drop(index=0, axis=0, inplace=True)
Using .values returns the row's values as a NumPy array, which does not carry the row Series' name (the 0).
Reassigning the column headers then works as expected, without the 0.
Row 0 still exists, so it should be removed with df.drop.
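Both suggestions can be checked against the question's own sample data. This is a sketch under the assumption that the header really sits on the second line of the file; the optional reset_index only renumbers the rows from 0.

```python
import pandas as pd
from io import StringIO

s = '''\
"Unnamed: 0","Unnamed: 1"
Objet,"Unités vendues"
Chaise,3
Table,2
Tabouret,1
'''

# Option 1: header=1 makes the second line (0-based line 1) the header
df1 = pd.read_csv(StringIO(s), header=1)

# Option 2: fix an already-loaded frame; .to_numpy() drops the row's name,
# so the new column Index gets no name
df2 = pd.read_csv(StringIO(s))
df2.columns = df2.iloc[0].to_numpy()
df2 = df2.drop(index=0).reset_index(drop=True)
```

With option 2 the "Unités vendues" values stay strings, because that column was read together with the text header row; df2["Unités vendues"].astype(int) converts them.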
Having my data in U and my column names in Un, I came up with this algorithm.
If you can shorten it, please do so.
U = pd.read_csv('U.csv', header = None) #.to_numpy()
Un = pd.read_csv('namesU.csv', header=None).T # Read your names csv, in my case they are in one column
Un = pd.concat([Un, U]) # put the data U below the names (DataFrame.append was removed in pandas 2.0)
Un.reset_index(inplace=True, drop=True) # reset the index and drop the old one, so you don't have duplicated indices
Un.columns = Un.iloc[0].values # take the names from the first row (.values avoids a stray "0" name)
Un.drop(index=0, inplace=True) # drop the first row
Un.reset_index(inplace=True, drop=True) # Return the index counter to start from 0
Another option:
names = pd.read_csv('namesY.csv', header=None)[0].tolist() # Read your names csv, in my case they are in one column
U.columns = names # relabel U's columns in order (pd.DataFrame(U, columns=...) would select columns, not rename them)
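A shorter variant, sketched under the assumption that the names file holds one name per line and the data file has no header row: read the names first and pass them straight to read_csv via names=. The in-memory files below are hypothetical stand-ins for namesU.csv and U.csv.

```python
import io
import pandas as pd

# Hypothetical stand-ins for namesU.csv (one name per line) and U.csv (no header)
names_csv = io.StringIO("x\ny\nz\n")
data_csv = io.StringIO("1,2,3\n4,5,6\n")

# Single column of names -> plain Python list
names = pd.read_csv(names_csv, header=None)[0].tolist()

# header=None: no header row in the data; names= labels the columns in order
U = pd.read_csv(data_csv, header=None, names=names)
```

This skips the concatenate, relabel and drop steps, because the data never contains the names as a row.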
Using the skiprows parameter did the job for me, i.e. skiprows=N,
where N is the number of rows to skip (1 in the example above), so:
df = pd.read_csv('filename', skiprows=1)

How to add a new column to pandas dataframe while iterate over the rows?

I want to generate a new column from some columns that already exist, but I think it is too difficult to use an apply function. Can I generate a new column (ftp_price here) while iterating through this DataFrame? Here is my code. When I call product_df['ftp_price'], I get a KeyError.
for index, row in product_df.iterrows():
    current_curve_type_df = curve_df[curve_df['curve_surrogate_key'] == row['curve_surrogate_key_x']]
    min_tmp_df = row['start_date'] - current_curve_type_df['datab_map'].apply(parse)
    min_tmp_df = min_tmp_df[min_tmp_df > timedelta(days=0)]
    curve = current_curve_type_df.loc[min_tmp_df.idxmin()]
    tmp_diff = row['end_time'] - np.array(row['start_time'])
    if np.isin(0, tmp_diff):
        idx = np.where(tmp_diff == 0)
        col_name = COL_NAMES[idx[0][0]]
        row['ftp_price'] = curve[col_name]
    else:
        idx = np.argmin(tmp_diff > 0)
        p_plus_one_rate = curve[COL_NAMES[idx]]
        p_minus_one_rate = curve[COL_NAMES[idx - 1]]
        d_plus_one_days = row['start_date'] + rate_mapping_dict[COL_NAMES[idx]]
        d_minus_one_days = row['start_date'] + rate_mapping_dict[COL_NAMES[idx - 1]]
        row['ftp_price'] = p_minus_one_rate + (p_plus_one_rate - p_minus_one_rate) * (row['start_date'] - d_minus_one_days) / (d_plus_one_days - d_minus_one_days)
An alternative to setting a new value at a particular index is to use at:
for index, row in product_df.iterrows():
    product_df.at[index, 'ftp_price'] = val
Also, you should read why using iterrows should be avoided.
A row can be a view or a copy (and is often a copy), so changing it would not change the original dataframe. The correct way is to always change the original dataframe using loc or iloc:
product_df.loc[index, 'ftp_price'] = ...
That being said, you should try to avoid explicitly iterating over the rows of a DataFrame when possible.
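A small sketch contrasting the two approaches from the answers. compute_ftp_price is a hypothetical stand-in for the curve interpolation in the question, and the two-column product_df is invented for illustration. Assigning to row only changes the per-iteration Series; writing through product_df.at, or collecting the values and assigning the whole column once, does update the DataFrame.

```python
import pandas as pd

product_df = pd.DataFrame({"start": [1.0, 2.0, 3.0], "end": [2.0, 4.0, 9.0]})

def compute_ftp_price(row):
    # Hypothetical stand-in for the interpolation logic in the question
    return (row["end"] - row["start"]) / 2

# Assigning to `row` does not write back into product_df
for _, row in product_df.iterrows():
    row["ftp_price"] = compute_ftp_price(row)
no_column_yet = "ftp_price" not in product_df.columns

# Option A: write each value into the DataFrame itself with .at
for index, row in product_df.iterrows():
    product_df.at[index, "ftp_price"] = compute_ftp_price(row)

# Option B: collect the values, then assign the whole column in one step
product_df["ftp_price_b"] = [compute_ftp_price(row) for _, row in product_df.iterrows()]
```

Option B avoids growing the DataFrame cell by cell and is usually the easier one to replace later with a fully vectorized expression.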

Python - Pandas library returns wrong column values after parsing a CSV file

SOLVED: I found the solution myself. It turns out that when you want to retrieve specific columns by name, you have to pass the names in the order they appear in the CSV (which is really stupid for a library that is intended to save a developer some parsing time, IMO). Correct me if I am wrong, but I don't see an option to get a specific column's values by its name if the columns are in a different order...
I am trying to read a comma-separated values file with Python and then parse it using the pandas library. Since the file has many values (columns) that are not needed, I make a list of the column names I do need.
Here's a look at the csv file format.
Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Attendance,Referee,HS,AS,HST,AST,HHW,AHW,HC,AC,HF,AF,HO,AO,HY,AY,HR,AR,HBP,ABP,GBH,GBD,GBA,IWH,IWD,IWA,LBH,LBD,LBA,SBH,SBD,SBA,WHH,WHD,WHA
E0,19/08/00,Charlton,Man City,4,0,H,2,0,H,20043,Rob Harris,17,8,14,4,2,1,6,6,13,12,8,6,1,2,0,0,10,20,2,3,3.2,2.2,2.9,2.7,2.2,3.25,2.75,2.2,3.25,2.88,2.1,3.2,3.1
E0,19/08/00,Chelsea,West Ham,4,2,H,1,0,H,34914,Graham Barber,17,12,10,5,1,0,7,7,19,14,2,3,1,2,0,0,10,20,1.47,3.4,5.2,1.6,3.2,4.2,1.5,3.4,6,1.5,3.6,6,1.44,3.6,6.5
E0,19/08/00,Coventry,Middlesbrough,1,3,A,1,1,D,20624,Barry Knight,6,16,3,9,0,1,8,4,15,21,1,3,5,3,1,0,75,30,2.15,3,3,2.2,2.9,2.7,2.25,3.2,2.75,2.3,3.2,2.75,2.3,3.2,2.62
E0,19/08/00,Derby,Southampton,2,2,D,1,2,A,27223,Andy D'Urso,6,13,4,6,0,0,5,8,11,13,0,2,1,1,0,0,10,10,2,3.1,3.2,1.8,3,3.5,2.2,3.25,2.75,2.05,3.2,3.2,2,3.2,3.2
E0,19/08/00,Leeds,Everton,2,0,H,2,0,H,40010,Dermot Gallagher,17,12,8,6,0,0,6,4,21,20,6,1,1,3,0,0,10,30,1.65,3.3,4.3,1.55,3.3,4.5,1.55,3.5,5,1.57,3.6,5,1.61,3.5,4.5
E0,19/08/00,Leicester,Aston Villa,0,0,D,0,0,D,21455,Mike Riley,5,5,4,3,0,0,5,4,12,12,1,4,2,3,0,0,20,30,2.15,3.1,2.9,2.3,2.9,2.5,2.35,3.2,2.6,2.25,3.25,2.75,2.4,3.25,2.5
E0,19/08/00,Liverpool,Bradford,1,0,H,0,0,D,44183,Paul Durkin,16,3,10,2,0,0,6,1,8,8,5,0,1,1,0,0,10,10,1.25,4.1,7.2,1.25,4.3,8,1.35,4,8,1.36,4,8,1.33,4,8
This list is passed to pandas.read_csv()'s names parameter.
See code.
# Returns an array of the column names needed for our raw data table
def cols_to_extract():
    cols_to_use = [None] * RawDataCols.COUNT
    cols_to_use[RawDataCols.DATE] = 'Date'
    cols_to_use[RawDataCols.HOME_TEAM] = 'HomeTeam'
    cols_to_use[RawDataCols.AWAY_TEAM] = 'AwayTeam'
    cols_to_use[RawDataCols.FTHG] = 'FTHG'
    cols_to_use[RawDataCols.HG] = 'HG'
    cols_to_use[RawDataCols.FTAG] = 'FTAG'
    cols_to_use[RawDataCols.AG] = 'AG'
    cols_to_use[RawDataCols.FTR] = 'FTR'
    cols_to_use[RawDataCols.RES] = 'Res'
    cols_to_use[RawDataCols.HTHG] = 'HTHG'
    cols_to_use[RawDataCols.HTAG] = 'HTAG'
    cols_to_use[RawDataCols.HTR] = 'HTR'
    cols_to_use[RawDataCols.ATTENDANCE] = 'Attendance'
    cols_to_use[RawDataCols.HS] = 'HS'
    cols_to_use[RawDataCols.AS] = 'AS'
    cols_to_use[RawDataCols.HST] = 'HST'
    cols_to_use[RawDataCols.AST] = 'AST'
    cols_to_use[RawDataCols.HHW] = 'HHW'
    cols_to_use[RawDataCols.AHW] = 'AHW'
    cols_to_use[RawDataCols.HC] = 'HC'
    cols_to_use[RawDataCols.AC] = 'AC'
    cols_to_use[RawDataCols.HF] = 'HF'
    cols_to_use[RawDataCols.AF] = 'AF'
    cols_to_use[RawDataCols.HFKC] = 'HFKC'
    cols_to_use[RawDataCols.AFKC] = 'AFKC'
    cols_to_use[RawDataCols.HO] = 'HO'
    cols_to_use[RawDataCols.AO] = 'AO'
    cols_to_use[RawDataCols.HY] = 'HY'
    cols_to_use[RawDataCols.AY] = 'AY'
    cols_to_use[RawDataCols.HR] = 'HR'
    cols_to_use[RawDataCols.AR] = 'AR'
    return cols_to_use
# Extracts raw data from the raw data csv and populates the raw match data table in the database
def extract_raw_data(csv):
    # Clear the database table if it has any logs
    # if MatchRawData.objects.count != 0:
    #     MatchRawData.objects.delete()
    cols_to_use = cols_to_extract()
    # Read and parse the csv file
    parsed_csv = pd.read_csv(csv, delimiter=',', names=cols_to_use, header=0)
    for col in cols_to_use:
        values = parsed_csv[col].values
        for val in values:
            print(str(col) + ' --------> ' + str(val))
Where RawDataCols is an IntEnum.
from enum import IntEnum

class RawDataCols(IntEnum):
    DATE = 0
    HOME_TEAM = 1
    AWAY_TEAM = 2
    FTHG = 3
    HG = 4
    FTAG = 5
    AG = 6
    FTR = 7
    RES = 8
    ...
The column names are obtained using it. That part of the code works OK: the correct column name is obtained, but when I try to get its values using
values = parsed_csv[col].values
pandas returns the values of the wrong column. The wrong column is about 13 positions away from the one I am trying to get. What am I missing?
You can select columns by name. Just use the following line:
values = parsed_csv[["Column Name","Column Name2"]]
Or select them by position:
cols = [1,2,3,4]
values = parsed_csv[parsed_csv.columns[cols]]
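The behaviour in the question comes from names=, which labels the file's columns by position rather than selecting them. To pick columns by header name, read_csv has usecols=. A minimal sketch using the first rows of the question's file (trimmed to 12 columns for brevity); every name passed to usecols must exist in the header, otherwise read_csv raises a ValueError. usecols keeps the file's column order, so the final indexing step reorders the result.

```python
import io
import pandas as pd

# Excerpt of the question's CSV, first 12 columns only
csv_text = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Attendance,Referee
E0,19/08/00,Charlton,Man City,4,0,H,2,0,H,20043,Rob Harris
E0,19/08/00,Chelsea,West Ham,4,2,H,1,0,H,34914,Graham Barber
"""

wanted = ["HomeTeam", "FTHG", "Date"]  # any order, names taken from the header row

# usecols= selects by header name; the result is in file order
parsed = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# Reorder to the order of the list
parsed = parsed[wanted]
```

Here parsed["HomeTeam"] gives the home teams regardless of where that column sits in the file.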
