Is there a way to set column names for arguments as column index position, rather than column names?
Every example I see passes column names to value_vars; I need to use column index positions.
For instance, instead of:
df2 = pd.melt(df,value_vars=['asset1','asset2'])
Using something similar to:
df2 = pd.melt(df,value_vars=[0,1])
Select the column names by indexing df.columns:
df = pd.DataFrame({
    'asset1': list('acacac'),
    'asset2': [4]*6,
    'A': [7,8,9,4,2,3],
    'D': [1,3,5,7,1,0],
    'E': [5,3,6,9,2,4]
})
df2 = pd.melt(df,
              id_vars=df.columns[[0, 1]],
              value_vars=df.columns[[2, 3]],
              var_name='c_name',
              value_name='Value')
print (df2)
asset1 asset2 c_name Value
0 a 4 A 7
1 c 4 A 8
2 a 4 A 9
3 c 4 A 4
4 a 4 A 2
5 c 4 A 3
6 a 4 D 1
7 c 4 D 3
8 a 4 D 5
9 c 4 D 7
10 a 4 D 1
11 c 4 D 0
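For completeness, the positional selection also works without id_vars; a minimal sketch reusing the df built above, equivalent to the question's pd.melt(df, value_vars=['asset1','asset2']):
# select the columns to melt purely by position
df3 = pd.melt(df, value_vars=df.columns[[0, 1]])
print(df3)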
I am using pandas v0.25.3 and am inexperienced but learning.
I have a dataframe and would like to swap the contents of two columns, leaving the column labels and sequence intact.
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [5, 6, 7, 8],
                   'C': [9, 10, 11, 12]})
This yields a dataframe,
A B C
0 1 5 9
1 2 6 10
2 3 7 11
3 4 8 12
I want to swap the contents of columns B and C to get
A B C
0 1 9 5
1 2 10 6
2 3 11 7
3 4 12 8
I have tried looking at pd.DataFrame.values, which sent me to NumPy arrays and advanced slicing, and I got lost.
What's the simplest way to do this?
You can assign a NumPy array:
# pandas 0.24+
df[['B','C']] = df[['C','B']].to_numpy()
# older pandas versions
df[['B','C']] = df[['C','B']].values
Or use DataFrame.assign:
df = df.assign(B=df.C, C=df.B)
print (df)
A B C
0 1 9 5
1 2 10 6
2 3 11 7
3 4 12 8
Or just use:
df['B'], df['C'] = df['C'], df['B'].copy()
print(df)
Output:
A B C
0 1 9 5
1 2 10 6
2 3 11 7
3 4 12 8
You can also swap the labels:
df.columns = ['A','C','B']
If your DataFrame is very large, I believe relabeling would require less work than copying all the data.
If the order of the columns is important, you can then reorder them:
df = df.reindex(['A','B','C'], axis=1)
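Putting the two steps together, a minimal sketch of the relabel-and-reorder approach on the question's original df:
# swap the labels instead of the data, then restore the original column order
df.columns = ['A', 'C', 'B']
df = df.reindex(['A', 'B', 'C'], axis=1)
print(df)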
In Python, for slicing a DataFrame with pandas, .ix has been deprecated since pandas 0.20.0. The official documentation offers .loc or .iloc as alternatives for hybrid label/position selection (http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html). The .index could help extract multiple rows; by contrast, columns.get_loc seems to be able to select only one column at a time. Is there an alternative function that can be used to extract multiple columns in a hybrid manner with .iloc?
Yes, the function is called Index.get_indexer; it returns the positions of columns or index labels for a list of names.
Use it this way:
df = pd.DataFrame({
    'a': [4,5,4,5,5,4],
    'b': [7,8,9,4,2,3],
    'c': [1,3,5,7,1,0],
    'd': [5,3,6,9,2,4],
}, index=list('ABCDEF'))
print (df)
a b c d
A 4 7 1 5
B 5 8 3 3
C 4 9 5 6
D 5 4 7 9
E 5 2 1 2
F 4 3 0 4
cols = ['a','b','c']
df1 = df.iloc[1, df.columns.get_indexer(cols)]
print (df1)
a 5
b 8
c 3
Name: B, dtype: int64
df11 = df.iloc[[1], df.columns.get_indexer(cols)]
print (df11)
a b c
B 5 8 3
idx = ['A','C']
df2 = df.iloc[df.index.get_indexer(idx), 2:]
print (df2)
c d
A 1 5
C 5 6
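One caveat worth noting (my own addition, not part of the answer above): Index.get_indexer returns -1 for labels that are not present, and iloc would then silently pick the last position, so it can be worth filtering the result:
pos = df.columns.get_indexer(['a', 'b', 'z'])  # 'z' does not exist -> position -1
print(pos)
# keep only the positions that were actually found before passing them to iloc
valid = pos[pos >= 0]
print(df.iloc[[1], valid])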
The question was originally asked here as a comment but could not get a proper answer as the question was marked as a duplicate.
For a given pandas.DataFrame, let us say
df = pd.DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})
df
A B
0 5 1
1 6 2
2 3 3
3 4 5
How can we select rows based on a list of values in a column ('A', for instance)?
For instance
# from
list_of_values = [3,4,6]
# we would like, as a result
# A B
# 2 3 3
# 3 4 5
# 1 6 2
Using isin as mentioned here is not satisfactory, as it does not preserve the order of the input list of 'A' values.
How can the above-mentioned goal be achieved?
One way to overcome this is to make the 'A' column the index and use loc on the newly indexed pandas.DataFrame. Finally, the subsampled DataFrame's index can be reset.
Here is how:
ret = df.set_index('A').loc[list_of_values].reset_index(inplace=False)
# ret is
# A B
# 0 3 3
# 1 4 5
# 2 6 2
Note that the drawback of this method is that the original indexing has been lost in the process.
More on pandas indexing: What is the point of indexing in pandas?
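If keeping the original row labels matters, a small variation on the same idea (my own sketch, not from the answer above) stores the index in a column first and restores it afterwards:
ret = (df.reset_index()
         .set_index('A')
         .loc[list_of_values]
         .reset_index()
         .set_index('index')
         .rename_axis(None))
# ret keeps the original row labels
#    A  B
# 2  3  3
# 3  4  5
# 1  6  2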
Use merge with a helper DataFrame created from the list, using the name of the column to be matched:
df = pd.DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3,5]})
list_of_values = [3,6,4]
df1 = pd.DataFrame({'A':list_of_values}).merge(df)
print (df1)
A B
0 3 3
1 6 2
2 4 5
For a more general solution that also handles duplicated values:
df = pd.DataFrame({'A' : [5,6,5,3,4,4,6,5], 'B':range(8)})
print (df)
A B
0 5 0
1 6 1
2 5 2
3 3 3
4 4 4
5 4 5
6 6 6
7 5 7
list_of_values = [6,4,3,7,7,4]
# create a DataFrame from the list
list_df = pd.DataFrame({'A':list_of_values})
print (list_df)
A
0 6
1 4
2 3
3 7
4 7
5 4
# column for the original index values
df1 = df.reset_index()
# helper column to count duplicated values
df1['g'] = df1.groupby('A').cumcount()
list_df['g'] = list_df.groupby('A').cumcount()
# merge together, create the index from the column, and remove the g column
df = list_df.merge(df1).set_index('index').rename_axis(None).drop('g', axis=1)
print (df)
A B
1 6 1
4 4 4
3 3 3
5 4 5
1] A generic approach for any list_of_values:
In [936]: dff = df[df.A.isin(list_of_values)]
In [937]: dff.reindex(dff.A.map({x: i for i, x in enumerate(list_of_values)}).sort_values().index)
Out[937]:
A B
2 3 3
3 4 5
1 6 2
2] If list_of_values is sorted, you can use
In [926]: df[df.A.isin(list_of_values)].sort_values(by='A')
Out[926]:
A B
2 3 3
3 4 5
1 6 2
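For convenience, the map + reindex idea from 1] can be wrapped in a small helper; the name select_in_order is mine, and it assumes the values in the list are unique:
def select_in_order(frame, col, values):
    # keep only the matching rows, then reorder them to follow the input list
    sub = frame[frame[col].isin(values)]
    order = sub[col].map({v: i for i, v in enumerate(values)})
    return sub.reindex(order.sort_values().index)

print(select_in_order(df, 'A', [3, 4, 6]))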
I want to create a new dataframe for each unique value of station.
I tried the code below, but it only leaves me with the last station's data, in the dataframe tai_new.i:
tai['station'].unique() has 500 values.
for i in tai['station'].unique():
    tai_new.i = tai[tai_2['station'] == i]
Another approach is to create a separate list of stations,
tai_stations = tai['station'].unique()
and then use two loops; however, I do not want to type 500 if conditions.
You can create a dict of DataFrames by converting the groupby object to tuples and then to a dict:
dfs = dict(tuple(tai.groupby('station')))
Sample:
tai = pd.DataFrame({'A': list('abcdef'),
                    'B': [4,5,4,5,5,4],
                    'C': [7,8,9,4,2,3],
                    'D': [1,3,5,7,1,0],
                    'E': [5,3,6,9,2,4],
                    'station': list('aabbcc')})
print (tai)
A B C D E station
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 b
3 d 5 4 7 9 b
4 e 5 2 1 2 c
5 f 4 3 0 4 c
dfs = dict(tuple(tai.groupby('station')))
# select each DataFrame by key - the station name
print (dfs['a'])
A B C D E station
0 a 4 7 1 5 a
1 b 5 8 3 3 a
print (type(dfs['a']))
<class 'pandas.core.frame.DataFrame'>
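As a usage sketch, each station's frame can then be processed on its own, e.g. written to one file per station (the file-name pattern here is only illustrative):
for station, frame in dfs.items():
    frame.to_csv(f'station_{station}.csv', index=False)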
Please use this,
for i in tai['station'].unique():
    tai_new[i] = tai[tai['station'] == i]
assuming tai_new is a dict.
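For completeness, tai_new needs to be initialised as an empty dict before the loop; equivalently, the whole thing can be written as a dict comprehension (a minimal sketch, assuming the tai_2 in the question was meant to be tai):
tai_new = {}  # must exist before the loop above
# or, in one step:
tai_new = {i: tai[tai['station'] == i] for i in tai['station'].unique()}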
Is it possible to remove an index level by name? I am removing and adding lots of index levels in multiple places, and I can see it getting quite confusing if I have to keep track of each level's position at all times.
I am creating a MultiIndex like so:
df = cass_df.groupby(['hash', 'campaign_id'])[['accepted', 'rejected', 'positive_impressions', 'negative_impressions', 'revenue']].sum()
Currently I am using this code to remove the first index:
df = df.reset_index(level=0)
Is it possible to do something like this?
df = df.reset_index(level='hash')
Yes, that works. You can also add inplace=True. (In older pandas you could also drop the double [] in the groupby column selection, as below; newer pandas versions require the list form [['C','D','E']].)
df = pd.DataFrame({'A': [2,2,3],
                   'B': [1,1,6],
                   'C': [7,8,9],
                   'D': [1,3,5],
                   'E': [5,3,6],
                   'F': [7,4,3]})
print (df)
A B C D E F
0 2 1 7 1 5 7
1 2 1 8 3 3 4
2 3 6 9 5 6 3
df = df.groupby(['A','B'])['C','D','E'].sum()
print (df)
C D E
A B
2 1 15 4 8
3 6 9 5 6
df.reset_index(level='A', inplace=True)
print (df)
A C D E
B
1 2 15 4 8
6 3 9 5 6
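For reference, level also accepts a list of names, so several levels can be reset in one call; a minimal sketch on the same toy data (this mirrors what reset_index(level='hash') does for the MultiIndex in the question):
agg = pd.DataFrame({'A': [2,2,3], 'B': [1,1,6],
                    'C': [7,8,9], 'D': [1,3,5],
                    'E': [5,3,6]}).groupby(['A', 'B'])[['C', 'D', 'E']].sum()
# reset both named levels at once
print(agg.reset_index(level=['A', 'B']))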