Pandas column concatenation - python

I have a dataframe (example DF1) with 300 columns of experimental data, where some of the experiments are repeated several times. I am able to use the set default method to get the column names (index), and I was wondering if there is a way to vertically append columns with similar names to a new dataframe (example DF2). I appreciate any help :)

You can melt, then use groupby + cumcount to determine the row label within each column name, and then pivot.
Sample Data
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(1,25).reshape(8,3).T,
                  columns=['E1', 'E1', 'E2', 'E3', 'E4', 'E4', 'E4', 'E5'])
Code
df2 = df.melt()
df2['idx'] = df2.groupby('variable').cumcount()
df2 = (df2.pivot(index='idx', columns='variable', values='value')
          .rename_axis(index=None, columns=None))
E1 E2 E3 E4 E5
0 1.0 7.0 10.0 13.0 22.0
1 2.0 8.0 11.0 14.0 23.0
2 3.0 9.0 12.0 15.0 24.0
3 4.0 NaN NaN 16.0 NaN
4 5.0 NaN NaN 17.0 NaN
5 6.0 NaN NaN 18.0 NaN
6 NaN NaN NaN 19.0 NaN
7 NaN NaN NaN 20.0 NaN
8 NaN NaN NaN 21.0 NaN
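If melt gives trouble with duplicate column names in your pandas version, the same stacking can be sketched without it. This is only an illustrative alternative, not the answer's method: it selects the columns sharing a name with a boolean mask and flattens them column by column. The names stacked and same_name are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 25).reshape(8, 3).T,
                  columns=['E1', 'E1', 'E2', 'E3', 'E4', 'E4', 'E4', 'E5'])

stacked = {}
for name in df.columns.unique():
    # All columns that share this name, in their original left-to-right order
    same_name = df.loc[:, df.columns == name]
    # order="F" flattens column by column, i.e. appends the repeats vertically
    stacked[name] = pd.Series(same_name.to_numpy().ravel(order="F"))

# Series of different lengths are aligned on the index and padded with NaN
df2 = pd.DataFrame(stacked)
print(df2)
```

Each experiment ends up as one column whose length is the number of repeats times the original row count.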

Related

Can we add extra rows in pandas dataframe

import pandas as pd
data = {'id':[22.5, 24.5, 25.5],
'id_value':[100, 110, 120],
'new': [100, 110, 120]}
df = pd.DataFrame(data)
import numpy as np
Range = pd.DataFrame(data = np.arange(21, 30), columns=['id'])
df = pd.merge(df, Range, on =["id"], how ="outer")
Can I add extra entries to "id" without the last three lines of the code?
Try append:
>>> df.append(pd.DataFrame(range(21, 30), columns=['id']))
id id_value new
0 22.5 100.0 100.0
1 24.5 110.0 110.0
2 25.5 120.0 120.0
0 21.0 NaN NaN
1 22.0 NaN NaN
2 23.0 NaN NaN
3 24.0 NaN NaN
4 25.0 NaN NaN
5 26.0 NaN NaN
6 27.0 NaN NaN
7 28.0 NaN NaN
8 29.0 NaN NaN
You can use append
df.append(pd.DataFrame({"id":np.arange(21, 30)}), ignore_index=True)
id id_value new
0 22.5 100.0 100.0
1 24.5 110.0 110.0
2 25.5 120.0 120.0
3 21.0 NaN NaN
4 22.0 NaN NaN
5 23.0 NaN NaN
6 24.0 NaN NaN
7 25.0 NaN NaN
8 26.0 NaN NaN
9 27.0 NaN NaN
10 28.0 NaN NaN
11 29.0 NaN NaN
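Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so both answers above need an older pandas. A minimal sketch of the same operation with pd.concat, reusing the question's data (the names extra and result are made up for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [22.5, 24.5, 25.5],
                   'id_value': [100, 110, 120],
                   'new': [100, 110, 120]})

# Rows that only carry an 'id'; the other columns are filled with NaN
extra = pd.DataFrame({'id': np.arange(21, 30)})

# ignore_index=True renumbers the rows 0..11 instead of repeating 0..8
result = pd.concat([df, extra], ignore_index=True)
print(result)
```

Leaving out ignore_index=True keeps the original row labels of both frames, which matches the first answer's output.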

How to join two dataframe with same category?

I have two dataframes. I used groupby and count() to produce this dataframe (df1). When I used groupby to count the total for each category, it filtered out the categories whose count is 0. How can I use Python to get the outcome below?
Original dataframe:
Cat UR3 VR1 VR VR3
0 ATIDS 137.0 99.0 40.0 84.0
1 BasicCrane 2.0 8.0 3.0 1.0
2 Beam Sensor 27.0 12.0 13.0 14.0
3 CLPS 1.0 NaN NaN 1.0
However, I would like a dataframe that also includes all of the required categories.
(required categories: ATIDS, BasicCrane, LLP, Beam Sensor, CLPS, SPR)
Expected dataframe (the counts for 'LLP' and 'SPR' are 0):
Cat UR3 VR1 VR VR3
0 ATIDS 137.0 99.0 40.0 84.0
1 BasicCrane 2.0 8.0 3.0 1.0
2 LLP NaN NaN NaN NaN
3 Beam Sensor 27.0 12.0 13.0 14.0
4 CLPS 1.0 NaN NaN 1.0
5 SPR NaN NaN NaN NaN
>>> categories
['ATIDS', 'BasicCrane', 'LLP', 'Beam Sensor', 'CLPS', 'SPR']
>>> pd.merge(pd.DataFrame({'Cat': categories}), df, how='outer')
Cat UR3 VR1 VR VR3
0 ATIDS 137.0 99.0 40.0 84.0
1 BasicCrane 2.0 8.0 3.0 1.0
2 LLP NaN NaN NaN NaN
3 Beam Sensor 27.0 12.0 13.0 14.0
4 CLPS 1.0 NaN NaN 1.0
5 SPR NaN NaN NaN NaN
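A variation on the merge above, as a sketch only: because every category in df already appears in the required list, a left merge from the category list gives the same rows and keeps the list's order, whereas the row order of an outer merge may depend on the pandas version. The frame below is retyped from the question; result and counts are names made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Cat': ['ATIDS', 'BasicCrane', 'Beam Sensor', 'CLPS'],
                   'UR3': [137.0, 2.0, 27.0, 1.0],
                   'VR1': [99.0, 8.0, 12.0, np.nan],
                   'VR': [40.0, 3.0, 13.0, np.nan],
                   'VR3': [84.0, 1.0, 14.0, 1.0]})
categories = ['ATIDS', 'BasicCrane', 'LLP', 'Beam Sensor', 'CLPS', 'SPR']

# Left merge: one row per required category, in the order of the list
result = pd.DataFrame({'Cat': categories}).merge(df, on='Cat', how='left')

# Optional: show missing counts as 0 instead of NaN
counts = result.fillna(0)
print(counts)
```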
One easy way is to fill the NaN values with 0 before doing the groupby, so that the missing data (previously NaN) is treated as zero.
df.fillna(0)

How to append to individual columns in a Pandas DataFrame

I want to add/append data to a specific pandas DataFrame column without causing NaN values in the remaining columns.
I.e.
DataFrame = pd.DataFrame(columns=["column1", "column2", "column3"])
for i in range():
    DataFrame = DataFrame.append({"column1":int(i)}, ignore_index=True)
    DataFrame = DataFrame.append({"column2":float(i*2)}, ignore_index=True)
    DataFrame = DataFrame.append({"column3":int(i*5)}, ignore_index=True)
print(DataFrame)
This will return:
column1 column2 column3
0 0.0 NaN NaN
1 NaN 0.0 NaN
2 NaN NaN 0.0
3 1.0 NaN NaN
4 NaN 2.0 NaN
5 NaN NaN 5.0
6 2.0 NaN NaN
7 NaN 4.0 NaN
8 NaN NaN 10.0
What we want returned:
column1 column2 column3
0 0.0 0.0 0.0
1 1.0 2.0 5.0
2 2.0 4.0 10.0
I know I can in this case use one .append for all the different columns. But I have some cases where the data to be appended will vary based on multiple conditions. Hence I'd like to know if it's possible to append to single columns in a dataframe without producing NaN values in the remaining columns. So that I can avoid writing hundreds of if else statements.
Or if someone has a good idea for how to 'collapse' the NaN values (removing the NaN values without removing the entire row, so that if there is a NaN value at index 0 in column 3 and an integer 5 at index 1 in the same column, the integer 5 gets moved up to index 0).
Happy to hear any ideas.
IIUC for your current example you can try this:
DataFrame[['column2','column3']]=DataFrame[['column2','column3']].bfill()
Output:
column1 column2 column3
0 0.0 0.0 0.0
1 NaN 0.0 0.0
2 NaN 2.0 0.0
3 1.0 2.0 5.0
4 NaN 2.0 5.0
5 NaN 4.0 5.0
6 2.0 4.0 10.0
7 NaN 4.0 10.0
8 NaN 6.0 10.0
9 3.0 6.0 15.0
10 NaN 6.0 15.0
11 NaN 8.0 15.0
12 4.0 8.0 20.0
13 NaN 8.0 20.0
14 NaN NaN 20.0
Then remove the rows that still contain NaN:
DataFrame.dropna(inplace=True)
Output:
column1 column2 column3
0 0.0 0.0 0.0
3 1.0 2.0 5.0
6 2.0 4.0 10.0
9 3.0 6.0 15.0
12 4.0 8.0 20.0
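The 'collapse' idea from the question can also be sketched directly, as an alternative to bfill + dropna: drop the NaN values in each column separately and renumber what is left. The staggered frame below is typed out by hand to mimic the question's output; staggered and collapsed are names made up for this example.

```python
import numpy as np
import pandas as pd

nan = np.nan
staggered = pd.DataFrame({
    "column1": [0, nan, nan, 1, nan, nan, 2, nan, nan],
    "column2": [nan, 0, nan, nan, 2, nan, nan, 4, nan],
    "column3": [nan, nan, 0, nan, nan, 5, nan, nan, 10],
})

# For every column: keep only the non-NaN values and renumber them from 0.
# apply() then re-aligns the shortened columns on that new index.
collapsed = staggered.apply(lambda col: col.dropna().reset_index(drop=True))
print(collapsed)
```

If the columns hold different numbers of values, the shorter ones are padded with NaN at the bottom rather than being dropped.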

Prevent NaN to become index and column in dataframe pivot

I have a dataframe which I extend to include values for all increments in 2 columns. Therefore NaN values are introduced, as expected and desired.
However, when I use pivot on this dataframe I'll get a row and column for NaN.
Can I prevent this when doing the pivot? If not, how can I drop a column named NaN?
Trying to drop it by calling [NaN],[nan] or ['NaN'] doesn't work.
Dropping the columns and rows where all values are NaN does not work in this case, because the column headings and index are used for a seaborn heatmap plot: even though all cell values in a row or column may be NaN, it is still useful to keep it as long as its index or column label is not NaN.
Sample code:
import pandas as pd
import numpy as np
#generate dummy data
df = pd.DataFrame({'Y': np.random.randint(130,140,10),
'X': np.random.randint(5,10,10),
'Z': np.random.randint(0,25, size=10)})
df = df.round(1)
#create dataset for heatmap
#group by axis to plot
df = df.groupby(['X','Y']).sum().reset_index()
df = df.sort_values(by=['Y'])
dfY = pd.DataFrame({'Y':np.arange(min(df['Y']), max(df['Y']),1)})
dfX = pd.DataFrame({'X':np.arange(min(df['X']), max(df['X']),1)})
df = pd.merge(df,dfY, how='outer', on='Y')
df = pd.merge(df,dfX, how='outer', on='X')
df = df.round(1)
print(df)
#restructure for heatmap
data = df.pivot(index="Y", columns="X", values="Z").sort_values(by=['Y'],ascending=False)
print(data)
Sample DataFrame before pivot:
X Y Z
0 5.0 132.0 0.0
1 5.0 135.0 20.0
2 5.0 137.0 17.0
3 7.0 132.0 15.0
4 7.0 133.0 3.0
5 6.0 133.0 30.0
6 6.0 135.0 22.0
7 6.0 138.0 16.0
8 9.0 135.0 9.0
9 NaN 134.0 NaN
10 NaN 136.0 NaN
11 8.0 NaN NaN
After pivot:
X NaN 5.0 6.0 7.0 8.0 9.0
Y
138.0 NaN NaN 16.0 NaN NaN NaN
137.0 NaN 17.0 NaN NaN NaN NaN
136.0 NaN NaN NaN NaN NaN NaN
135.0 NaN 20.0 22.0 NaN NaN 9.0
134.0 NaN NaN NaN NaN NaN NaN
133.0 NaN NaN 30.0 3.0 NaN NaN
132.0 NaN 0.0 NaN 15.0 NaN NaN
NaN NaN NaN NaN NaN NaN NaN
Desired output:
X 5.0 6.0 7.0 8.0 9.0
Y
138.0 NaN 16.0 NaN NaN NaN
137.0 17.0 NaN NaN NaN NaN
136.0 NaN NaN NaN NaN NaN
135.0 20.0 22.0 NaN NaN 9.0
134.0 NaN NaN NaN NaN NaN
133.0 NaN 30.0 3.0 NaN NaN
132.0 0.0 NaN 15.0 NaN NaN
Dropping by the missing value np.nan works for me:
data = (df.pivot(index="Y", columns="X", values="Z")
          .sort_values(by=['Y'],ascending=False)
          .drop(np.nan, axis=1)
          .drop(np.nan))
Or:
data = df.pivot(index="Y", columns="X", values="Z").sort_values(by=['Y'],ascending=False)
data = data.reindex(index=data.index.difference([np.nan]),
                    columns=data.columns.difference([np.nan]))
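Another option, sketched here under the assumption that the outer merges exist only to fill in the missing increments: skip the merges, pivot the grouped data as it is, and reindex the result onto the full ranges. No NaN label is created in the first place, and the descending Y order is set explicitly. The small fixed dataset stands in for the random one, and all_y and all_x are names made up for this example.

```python
import numpy as np
import pandas as pd

# Stand-in for the grouped data (unique X/Y pairs, no NaN yet)
df = pd.DataFrame({"Y": [132, 135, 137, 132, 133, 138],
                   "X": [5, 5, 5, 7, 7, 6],
                   "Z": [0, 20, 17, 15, 3, 16]})

# Every increment between min and max; Y is reversed for the heatmap
all_y = np.arange(df["Y"].min(), df["Y"].max() + 1)[::-1]
all_x = np.arange(df["X"].min(), df["X"].max() + 1)

# reindex adds the missing rows/columns as all-NaN, with valid labels
data = (df.pivot(index="Y", columns="X", values="Z")
          .reindex(index=all_y, columns=all_x))
print(data)
```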

Cutting dataframe loop

I have a dataset which is only one column. I want to cut the column into multiple dataframes.
I use a for loop to create a list which contains the values at which positions I want to cut the dataframe.
import pandas as pd
df = pd.read_csv("column.csv", delimiter=";", header=0, index_col=(0))
number_of_pixels = int(len(df.index))
print("You have " + str(number_of_pixels) +" pixels in your file")
number_of_rows = int(input("Enter number of rows you want to create"))
list=[] #this list contains the number of pixels per row
for i in range(0, number_of_rows): #this loop fills the list with the number of pixels per row
    pixels_per_row=int(input("Enter number of pixels in row " + str(i)))
    list.append(pixels_per_row)
print(list)
After cutting the column into multiple dataframes, I want to transpose each dataframe and concatenate all dataframes back together using:
df1=df1.reset_index(drop=True)
df1=df1.T
df2=df2.reset_index(drop=True)
df2=df2.T
frames = [df1,df2]
result = pd.concat(frames, axis=0)
print(result)
So I want to create a loop that cuts my dataframe into multiple frames at the positions stored in my list.
Thank you!
This is a problem that is better solved with numpy. I'll start from the point of you receiving a list from your user input. The whole point is to use numpy.split to separate the values based on the cumulative number of pixels requested, and then create a new DataFrame
Setup
import numpy as np
import pandas as pd
np.random.seed(123)
df = pd.DataFrame({'val': np.random.randint(1,10,50)})
lst = [4,10,2,1,15,8,9,1]
Code
pd.DataFrame(np.split(df.val.values, np.cumsum(lst)[:-1]))
Output
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 3 3.0 7.0 2.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 4 7.0 2.0 1.0 2.0 1.0 1.0 4.0 5.0 1.0 NaN NaN NaN NaN NaN
2 1 5.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 8 4.0 3.0 5.0 8.0 3.0 5.0 9.0 1.0 8.0 4.0 5.0 7.0 2.0 6.0
5 7 3.0 2.0 9.0 4.0 6.0 1.0 3.0 NaN NaN NaN NaN NaN NaN NaN
6 7 3.0 5.0 5.0 7.0 4.0 1.0 7.0 5.0 NaN NaN NaN NaN NaN NaN
7 8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
If your list asks for more pixels than the total number of rows in your initial DataFrame, you'll get extra all-NaN rows in your output. If your lst sums to less than the total number of pixels, all the remaining values are added to the last row. Since you didn't specify either of these conditions in your question, I'm not sure how you'd want to handle that.
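One possible way to handle those two edge cases, shown only as a sketch: check the requested lengths against the data before splitting and refuse a mismatch. The helper split_column is hypothetical (it is not part of pandas or numpy), and raising an error is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd

def split_column(values, row_lengths):
    """Cut a 1-D sequence into rows of the given lengths (hypothetical helper)."""
    values = np.asarray(values)
    if sum(row_lengths) != len(values):
        raise ValueError(
            f"row lengths sum to {sum(row_lengths)}, but there are {len(values)} values"
        )
    # Cut positions are the running totals, without the final one
    cut_points = np.cumsum(row_lengths)[:-1]
    # Rows of unequal length are padded with NaN on the right
    return pd.DataFrame(np.split(values, cut_points))

result = split_column(range(1, 8), [3, 2, 2])
print(result)
```

With a real column, the call would look like split_column(df['val'].to_numpy(), lst).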
