This question already has answers here:
add a row at top in pandas dataframe [duplicate]
(6 answers)
Closed 4 years ago.
I would like to move an entire row (index and values) from the last row to the first row of a DataFrame. Every other example I can find either uses an ordered row index (to be specific - my row index is not a numerical sequence - so I cannot simply add at -1 and then reindex with +1) or moves the values while maintaining the original index. My DF has descriptions as the index and the values are discrete to the index description.
I'm adding a row and then would like to move that into row 1. Here is the setup:
df = pd.DataFrame({
'col1' : ['A', 'A', 'B', 'F', 'D', 'C'],
'col2' : [2, 1, 9, 8, 7, 4],
'col3': [0, 1, 9, 4, 2, 3],
}).set_index('col1')
#output
In [7]: df
Out[7]:
col2 col3
col1
A 2 0
A 1 1
B 9 9
F 8 4
D 7 2
C 4 3
I then add a new row as follows:
df.loc["Deferred Description"] = pd.Series([''])
In [9]: df
Out[9]:
col2 col3
col1
A 2.0 0.0
A 1.0 1.0
B 9.0 9.0
F 8.0 4.0
D 7.0 2.0
C 4.0 3.0
Deferred Description NaN NaN
I would like the resulting output to be:
In [9]: df
Out[9]:
col2 col3
col1
Defenses Description NaN NaN
A 2.0 0.0
A 1.0 1.0
B 9.0 9.0
F 8.0 4.0
D 7.0 2.0
C 4.0 3.0
I've tried using df.shift() but only the values shift. I've also tried df.sort_index() but that requires the index to be ordered (there are several SO examples using df.loc[-1] = ... then then reindexing with df.index = df.index + 1). In my case I need the Defenses Description to be the first row.
Your problem is not one of cyclic shifting, but a simpler oneāone of insertion (which is why I've chosen to mark this question as duplicate).
Construct an empty DataFrame and then concatenate the two using pd.concat.
pd.concat([pd.DataFrame(columns=df.columns, index=['Deferred Description']), df])
col2 col3
Deferred Description NaN NaN
A 2 0
A 1 1
B 9 9
F 8 4
D 7 2
C 4 3
If this were columns, it'd have been easier. Funnily enough, pandas has a DataFrame.insert function that works for columns, but not rows.
Generalized Cyclic Shifting
If you were curious to know how you'd cyclically shift a dataFrame, you can use np.roll.
# apply this fix to your existing DataFrame
pd.DataFrame(np.roll(df.values, 1, axis=0),
index=np.roll(df.index, 1), columns=df.columns
)
col2 col3
Deferred Description NaN NaN
A 2 0
A 1 1
B 9 9
F 8 4
D 7 2
C 4 3
This, thankfully, also works when you have duplicate index values. If the index or columns aren't important, then pd.DataFrame(np.roll(df.values, 1, axis=0)) works well enough.
You can using append
pd.DataFrame({'col2':[np.nan],'col3':[np.nan]},index=["Deferred Description"]).append(df)
Out[294]:
col2 col3
Deferred Description NaN NaN
A 2.0 0.0
A 1.0 1.0
B 9.0 9.0
F 8.0 4.0
D 7.0 2.0
C 4.0 3.0
Related
I have a data frame with numeric values, such as
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
and I append a single row with all the column sums
totals = df.sum()
totals.name = 'totals'
df_append = df.append(totals)
Simple enough.
Here are the values of df, totals, and df_append
>>> df
A B
0 1 2
1 3 4
>>> totals
A 4
B 6
Name: totals, dtype: int64
>>> df_append
A B
0 1 2
1 3 4
totals 4 6
Unfortunately, in newer versions of pandas the method DataFrame.append is deprecated, and will be removed in some future version of pandas. The advise is to replace it with pandas.concat.
Now, using pd.concat naively as follows
df_concat_bad = pd.concat([df, totals])
produces
>>> df_concat_bad
A B 0
0 1.0 2.0 NaN
1 3.0 4.0 NaN
A NaN NaN 4.0
B NaN NaN 6.0
Apparently, with df.append the Series object got interpreted as a row, but with pd.concat it got interpreted as a column.
You cannot fix this with something like calling pd.concat with axis=1, because that would add the totals as column:
>>> pd.concat([df, totals], axis=1)
A B totals
0 1.0 2.0 NaN
1 3.0 4.0 NaN
A NaN NaN 4.0
B NaN NaN 6.0
(In this case, the result looks the same as using the default axis=0, because the indexes of df and totals are disjoint, as are their column names.)
How to handle this (elegantly and efficiently)?
The solution is to convert totals (a Series object) to a DataFrame (which will then be a column) using to_frame() and next transpose it using T:
df_concat_good = pd.concat([df, totals.to_frame().T])
yields the desired
>>> df_concat_good
A B
0 1 2
1 3 4
totals 4 6
I prefer to use df.loc() to solve this problem than pd.concat()
df.loc["totals"]=df.sum()
I have two pandas DataFrames (df1, df2) with a different number of rows and columns and some matching values in a specific column in each df, with caveats (1) there are some unique values in each df, and (2) there are different numbers of matching values across the DataFrames.
Baby example:
df1 = pd.DataFrame({'id1': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 6, 6]})
df2 = pd.DataFrame({'id2': [1, 1, 2, 2, 2, 2, 3, 4, 5],
'var1': ['B', 'B', 'W', 'W', 'W', 'W', 'H', 'B', 'A']})
What I am seeking to do is create df3 where df2['id2'] is aligned/indexed to df1['id1'], such that:
NaN is added to df3[id2] when df2[id2] has fewer (or missing) matches to df1[id1]
NaN is added to df3[id2] & df3[var1] if df1[id1] exists but has no match to df2[id2]
'var1' is filled in for all cases of df3[var1] where df1[id1] and df2[id2] match
rows are dropped when df2[id2] has more matching values than df1[id1] (or no matches at all)
The resulting DataFrame (df3) should look as follows (Notice id2 = 5 and var1 = A are gone):
id1
id2
var1
1
1
B
1
1
B
1
NaN
B
2
2
W
2
2
W
3
3
H
3
NaN
H
3
NaN
H
3
NaN
H
4
4
B
6
NaN
NaN
6
NaN
NaN
I cannot find a combination of merge/join/concatenate/align that correctly solves this problem. Currently, everything I have tried stacks the rows in sequence without adding NaN in the proper cells/rows and instead adds all the NaN values at the bottom of df3 (so id1 and id2 never align). Any help is greatly appreciated!
You can first assign a helper column for id1 and id2 based on groupby.cumcount, then merge. Finally ffill values of var1 based on the group id1
def helper(data,col): return data.groupby(col).cumcount()
out = df1.assign(k = helper(df1,['id1'])).merge(df2.assign(k = helper(df2,['id2'])),
left_on=['id1','k'],right_on=['id2','k'] ,how='left').drop('k',1)
out['var1'] = out['id1'].map(dict(df2[['id2','var1']].drop_duplicates().to_numpy()))
Or similar but without assign as HenryEcker suggests :
out = df1.merge(df2, left_on=['id1', helper(df1, ['id1'])],
right_on=['id2', helper(df2, ['id2'])], how='left').drop(columns='key_1')
out['var1'] = out['id1'].map(dict(df2[['id2','var1']].drop_duplicates().to_numpy()))
print(out)
id1 id2 var1
0 1 1.0 B
1 1 1.0 B
2 1 NaN B
3 2 2.0 W
4 2 2.0 W
5 3 3.0 H
6 3 NaN H
7 3 NaN H
8 3 NaN H
9 4 4.0 B
10 6 NaN NaN
11 6 NaN NaN
I am working with python3 and pandas, and I would like to pass : as a function parameter to state all rows in a slice passed to df.loc.
For example, say I have a function that fills na values like so:
def fill_na_w_value(df, rows, columns, fill):
for col in columns:
df.loc[rows, columns].fillna(
fill,
inplace=True
)
Sometimes I may not want to apply it to some rows but to apply it to all rows, in pandas this is accessed with df.loc[:, col]
If I am calling this from a function it would like
fill_na_w_value(df, :, ['col1'], 0)
But the above will give me a syntax error because of the :; how can I pass this as a function parameter?
Use slice(None) to represent :. Note you can use pipe to pass your dataframe through a function, and loc accepts a list for row and index filtering:
df = pd.DataFrame({'col1': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan]})
def fill_na_w_value(df, row_slicer, columns, value):
df.loc[row_slicer, columns] = df.loc[row_slicer, columns].fillna(value)
return df
df1 = df.pipe(fill_na_w_value, slice(None), ['col1'], 0)
print(df1)
col1
0 1.0
1 2.0
2 0.0
3 4.0
4 5.0
5 0.0
6 7.0
7 8.0
8 0.0
Here's an example using a list instead of a slice object:
df2 = df.pipe(fill_na_w_value, [2, 5], ['col1'], 0)
print(df2)
col1
0 1.0
1 2.0
2 0.0
3 4.0
4 5.0
5 0.0
6 7.0
7 8.0
8 NaN
I have a data sets which has column with missing value 2439.
But the missing value is such that for specific index has some missing value and some fill value as shown below (Compare column 'Item_Identifier' and 'Item_Weight')
If carefully seen for specific item_identifier, there missing value in item_weight. Like this there many more Item_Identifier which missing value. Is there any way using python we fill missing value for only item_weight as same.
You can make the table into a pandas DataFrame then df['item_weight'].fillna(15.5, inplace=True)
Reproducible example:
df = pd.DataFrame({'col1': ['a', 'a', 'b','b', 'b', 'c'],
'col2': [10, np.nan, np.nan, np.nan, 20, 30]})
col1 col2
0 a 10.0
1 a NaN
2 b NaN
3 b NaN
4 b 20.0
5 c 30.0
You can groupby your col1 and agg using first
vals = df.groupby('col1').agg('first')
col2
col1
a 10.0
b 20.0
c 30.0
Then just use same indexing and fillna() to match and fill the values
df = df.set_index('col1').fillna(vals).reset_index()
col1 col2
0 a 10.0
1 a 10.0
2 b 20.0
3 b 20.0
4 b 20.0
5 c 30.0
I have a simple dataframe with 2 columns and 2rows.
I also have a list of 4 numbers.
I want to concatenate this list to the FIRST column of the dataframe, and only the first. So the dataframe will have 6rows in the first column, and 2in the second.
I wrote this code:
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
numbers = [5, 6, 7, 8]
for i in range(0, 4):
df1['A'].loc[i + 2] = numbers[i]
print(df1)
It prints the original dataframe oddly enough. But when I debug and evaluate the expression df1['A'] then it does show the new numbers. What's going on here?
It's not just that it's printing the original df, it also writes the original df to csv when I use to_csv method.
It seems you need:
for i in range(0, 4):
df1.loc[0, i] = numbers[i]
print (df1)
A B 0 1 2 3
0 1 2 5.0 6.0 7.0 8.0
1 3 4 NaN NaN NaN NaN
df1 = pd.concat([df1, pd.DataFrame([numbers], index=[0])], axis=1)
print (df1)
A B 0 1 2 3
0 1 2 5.0 6.0 7.0 8.0
1 3 4 NaN NaN NaN NaN