Let's say I have a DataFrame and don't know the names of all its columns. However, I know there is a column called "N_DOC", and I want it to be the first column of the DataFrame (while keeping all other columns, regardless of their order).
How can I do this?
You can reorder the columns of a DataFrame with reindex. Note that reindex returns a new DataFrame, so assign the result:
cols = df.columns.tolist()
cols.remove('N_DOC')
df = df.reindex(['N_DOC'] + cols, axis=1)
Use DataFrame.insert together with DataFrame.pop, which extracts the column:
df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'N_DOC':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
c = 'N_DOC'
df.insert(0, c, df.pop(c))
Or:
df.insert(0, 'N_DOC', df.pop('N_DOC'))
print (df)
N_DOC A B C E F
0 1 a 4 7 5 a
1 3 b 5 8 3 a
2 5 c 4 9 6 a
3 7 d 5 4 9 b
4 1 e 5 2 2 b
5 0 f 4 3 4 b
Here's a simple one-line solution using column selection:
import pandas as pd
# Building sample dataset.
cols = ['N_DOCa', 'N_DOCb', 'N_DOCc', 'N_DOCd', 'N_DOCe', 'N_DOC']
df = pd.DataFrame(columns=cols)
# Re-order columns.
df = df[['N_DOC'] + df.columns.drop('N_DOC').tolist()]
Before:
Index(['N_DOCa', 'N_DOCb', 'N_DOCc', 'N_DOCd', 'N_DOCe', 'N_DOC'], dtype='object')
After:
Index(['N_DOC', 'N_DOCa', 'N_DOCb', 'N_DOCc', 'N_DOCd', 'N_DOCe'], dtype='object')
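The answers above can be wrapped into a small reusable helper. This is only a minimal sketch: the function name move_to_front is hypothetical (it is not part of pandas), and it assumes the column labels are unique. It returns a new DataFrame and keeps the remaining columns in their original order:

```python
import pandas as pd


def move_to_front(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return a new DataFrame with `col` as the first column (hypothetical helper)."""
    if col not in df.columns:
        raise KeyError(f"{col!r} is not a column of the DataFrame")
    # Keep every other column in its existing order.
    others = [c for c in df.columns if c != col]
    return df[[col] + others]


df = pd.DataFrame({"A": [1, 2], "N_DOC": [10, 20], "B": [3, 4]})
result = move_to_front(df, "N_DOC")
print(result.columns.tolist())
```

Unlike the insert/pop approach, this does not modify the original DataFrame in place.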
I am new to pandas and have the following question. I have two DataFrames, df1 and df2:
df1 is an empty DataFrame with 3 columns, and df2 has 5 columns with some records.
df1 example:
A  B  C
df2 example:
A  B  D  C  E
1  2  3  4  5
1  2  3  4  5
1  2  3  4  5
Where the column names match, I want to copy all rows from df2 to df1,
like below:
df1:
A  B  C
1  2  4
1  2  4
1  2  4
The result can be df1 itself or a new DataFrame. Kindly help me with this query.
For a general solution that keeps only the columns common to both DataFrames, use Index.intersection:
Df1 = Df2.reindex(Df1.columns.intersection(Df2.columns, sort=False), axis=1)
If all columns from Df1.columns always exist in Df2.columns, use:
Df1 = Df2[Df1.columns]
No need to initialize an empty DataFrame, you can directly use:
df1 = df2[df1.columns].copy()
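To make the difference between the two selections concrete, here is a minimal sketch that rebuilds the frames from the question (the values are taken from the example tables; the variable names common and out are just for illustration):

```python
import pandas as pd

# df1 is empty and only defines the wanted columns; df2 holds the data.
df1 = pd.DataFrame(columns=["A", "B", "C"])
df2 = pd.DataFrame(
    [[1, 2, 3, 4, 5]] * 3,
    columns=["A", "B", "D", "C", "E"],
)

# Columns that exist in both frames.
common = df1.columns.intersection(df2.columns, sort=False)

# Copy so that later changes to `out` do not touch df2.
out = df2[common].copy()
print(out)
```

If df1 had a column that df2 lacks, df2[df1.columns] would raise a KeyError, whereas the intersection simply leaves that column out.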
Can I use DataFrame.set_index with the position of a column, or does it only work with the name of the column?
Example:
df4 = df.set_index(0).T instead of df4 = df.set_index('Parametres').T
thank you
If you want to create the new index from the first column, look up its name by position:
df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
})
print (df.columns[0])
A
df = df.set_index(df.columns[0])
print (df)
B C
A
a 4 7
b 5 8
c 4 9
d 5 4
e 5 2
f 4 3
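Applied to the shape of the question, the same idea could look like the sketch below. The data is made up for illustration; only the column name 'Parametres' comes from the question. It also shows what happens when 0 is passed directly while no column carries that label:

```python
import pandas as pd

# Made-up data; only the 'Parametres' column name comes from the question.
df = pd.DataFrame({"Parametres": ["p1", "p2"], "x": [1, 2], "y": [3, 4]})

# set_index looks up column labels, so fetch the label by position first.
df4 = df.set_index(df.columns[0]).T
print(df4)

# Passing 0 directly only works when a column is literally labelled 0,
# for example after reading a file with header=None.
try:
    df.set_index(0)
except KeyError as exc:
    print("KeyError:", exc)
```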
I want to select the columns that have a specific value (say 1) in a specific row (say the first row) of a pandas DataFrame.
You can use this (it filters the values of column 'a' that equal 0):
df['a'][df['a']==0]
Use iloc with boolean indexing. For performance it is better to filter the index (the column names) than to filter the DataFrame itself:
df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
s = df.iloc[0]
a = s.index[s == 1]
print (a)
Index(['D'], dtype='object')
a = s.index.values[(s == 1)]
print (a)
['D']
You can use iloc to extract a row as a series, then apply your condition:
row = df.iloc[0] # extract first row as series
res = row[row == 1].index # filter for values equal to 1 and get columns via index
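If the goal is the matching columns themselves rather than just their names, the same boolean mask can be passed to loc. A minimal sketch with made-up data (the extra column G is an assumption, added so that more than one column matches):

```python
import pandas as pd

df = pd.DataFrame({
    "A": list("abc"),
    "B": [4, 5, 4],
    "D": [1, 3, 5],
    "G": [1, 0, 2],
})

# Boolean mask over the columns: True where the first row equals 1.
mask = df.iloc[0] == 1

# Names of the matching columns.
names = mask[mask].index.tolist()

# The matching columns themselves, with all rows kept.
selected = df.loc[:, mask]
print(names)
print(selected)
```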
I need to merge one DataFrame with one CSV file.
df1 contains only one column (the ids of the products I want to update).
df2 contains two columns (the ids of all products, and the quantity).
df1=pd.read_csv(id_file, header=0, index_col=False)
df2 = pd.DataFrame(data=result_q)
df3=pd.merge(df1, df2)
What I want: a DataFrame that contains only the ids from the CSV (df1), merged with the quantities from df2 for the same ids.
If you want only the products that you have in the first DataFrame, you can use this:
df_1
Out[11]:
id
0 1
1 2
2 4
3 5
df_2
Out[12]:
id prod
0 1 a
1 2 b
2 3 c
3 4 d
4 5 e
5 6 f
6 7 g
7 8 h
df_3 = df_1.merge(df_2,on='id')
df_3
Out[14]:
id prod
0 1 a
1 2 b
2 4 d
3 5 e
You need to use the parameter on='column', so that the merge generates a new DataFrame containing only the rows that have the same id in both.
You can use new_df = pd.merge(df1, df2, on=['Product_id'])
I've found the solution. I needed to reset the index for my df2
df1=pd.read_csv(id_file)
df2 = pd.DataFrame(data=result_q).reset_index()
df1['id'] = pd.to_numeric(df1['id'], errors = 'coerce')
df2['id'] = pd.to_numeric(df2['id'], errors = 'coerce')
df3=df1.merge(df2, on='id')
Thank you everyone!
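The fix above works because both id columns end up with the same numeric dtype before the merge. A minimal sketch with made-up data (the values and the quantity column name are assumptions, not taken from the original files):

```python
import pandas as pd

# Hypothetical data: ids may arrive as strings, e.g. when the CSV column
# contains non-numeric entries.
df1 = pd.DataFrame({"id": ["1", "2", "4"]})
df2 = pd.DataFrame({"id": [1, 2, 3, 4], "quantity": [10, 20, 30, 40]})

# Bring both key columns to a numeric dtype before merging.
df1["id"] = pd.to_numeric(df1["id"], errors="coerce")
df2["id"] = pd.to_numeric(df2["id"], errors="coerce")

# Inner join (the default) keeps only the ids that appear in both frames.
df3 = df1.merge(df2, on="id", how="inner")
print(df3)
```

With errors='coerce', any id that cannot be parsed becomes NaN, so it is worth checking for missing values after the conversion.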
I have a pandas dataframe with multiple columns and I want to "flatten" it to just two columns - one with column name and the other with values. E.g.
df1 = pd.DataFrame({'A':[1,2],'B':[2,3], 'C':[3,4]})
How can I convert it to look like:
df2 = pd.DataFrame({'column name': ['A','A','B','B','C','C'], 'value': [1,2,2,3,3,4]})
You can use stack to stack all column values into a single column, then drop the first index level by calling reset_index, overwrite the column names with the ones you want, and finally sort using sort_values:
In [37]:
df2 = df1.stack().reset_index(level=0, drop=True).reset_index()
df2.columns = ['column name', 'value']
df2.sort_values(['column name', 'value'], inplace=True)
df2
Out[37]:
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
You can reshape with stack to a MultiIndex Series, then use reset_index with sort_values:
df2 = df1.stack().reset_index(level=0, drop=True).reset_index().sort_values('index')
df2.columns = ['column name','value']
print (df2)
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
One-line solution that renames the column index to column name (the chain is wrapped in parentheses so it can span several lines):
df2 = (df1.stack()
          .reset_index(level=0, drop=True)
          .reset_index(name='value')
          .sort_values(['index'])
          .rename(columns={'index':'column name'}))
print (df2)
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
If you need to sort by both columns:
df2 = df1.stack().reset_index(level=0, drop=True).reset_index().sort_values(['index',0])
df2.columns = ['column name','value']
print (df2)
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
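As an alternative to the stack-based answers above, DataFrame.melt produces the same two-column layout directly. This is a different technique from the ones shown, sketched here with the data from the question:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [2, 3], "C": [3, 4]})

# melt turns every column into (name, value) pairs, one column after another.
df2 = df1.melt(var_name="column name", value_name="value")
print(df2)
```

Because melt goes through the columns one at a time, the rows already come out grouped by column name with a fresh 0..n-1 index, so no extra sort is needed for this example.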