I have a DataFrame:
C V S D LOC
1 2 3 4 X
5 6 7 8
1 2 3 4
5 6 7 8 Y
9 10 11 12
How can I select the rows from LOC X to LOC Y and export them to another CSV?
Use idxmax to get the first index where the condition is True:
df = df.loc[(df['LOC'] == 'X').idxmax():(df['LOC'] == 'Y').idxmax()]
print(df)
C V S D LOC
0 1 2 3 4 X
1 5 6 7 8 NaN
2 1 2 3 4 NaN
3 5 6 7 8 Y
In [133]: df.loc[df.index[df.LOC=='X'][0]:df.index[df.LOC=='Y'][0]]
Out[133]:
C V S D LOC
0 1 2 3 4 X
1 5 6 7 8 NaN
2 1 2 3 4 NaN
3 5 6 7 8 Y
P.S. This will select all rows between the first occurrence of X and the first occurrence of Y.
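Neither snippet writes the result out, which the question also asks for. A minimal end-to-end sketch (the file name out.csv is just a placeholder):
import pandas as pd

df = pd.DataFrame({'C': [1, 5, 1, 5, 9],
                   'V': [2, 6, 2, 6, 10],
                   'S': [3, 7, 3, 7, 11],
                   'D': [4, 8, 4, 8, 12],
                   'LOC': ['X', None, None, 'Y', None]})

# slice from the first 'X' row to the first 'Y' row (label slicing is inclusive)
start = (df['LOC'] == 'X').idxmax()
stop = (df['LOC'] == 'Y').idxmax()
selected = df.loc[start:stop]

# write the slice to a separate CSV; 'out.csv' is an assumed name
selected.to_csv('out.csv', index=False)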
Related
I need to go through a large DataFrame and select consecutive rows with similar values in a column, i.e. in the DataFrame below, selecting on column x. I want to keep only runs of consecutive repeated values in x, and only for the values 3 and 5.
col row x y
1 1 1 1
5 7 3 0
2 2 2 2
6 3 3 8
9 2 3 4
5 3 3 9
4 9 4 4
5 5 5 1
3 7 5 2
6 6 6 6
5 8 6 2
3 7 6 0
The resulting output would be:
col row x y consecutive-count
6 3 3 8 1
9 2 3 4 1
5 3 3 9 1
5 5 5 1 2
3 7 5 2 2
I tried:
m = df['x'].eq(df['x'].shift())
df[m|m.shift(-1, fill_value=False)]
But that includes the consecutive 6s, which I don't want.
I also tried:
df.query( 'x in [3,5]')
That returns every row where x is 3 or 5, consecutive or not.
IIUC, use masks for boolean indexing: check for 3 or 5, and use a forward cummax and a reverse cummax to enforce the ordering (nothing before the first 3, nothing after the last 5):
m1 = df['x'].eq(3)
m2 = df['x'].eq(5)
out = df[(m1|m2)&(m1.cummax()&m2[::-1].cummax())]
Output:
col row x y
2 6 3 3 8
3 9 2 3 4
4 5 3 3 9
6 5 5 5 1
7 3 7 5 2
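To see why the two cummax calls bound the range, here is a small sketch that rebuilds the masks on just the x column (values taken from the shorter table further down; the reverse cummax is flipped back for readability, which pandas' index alignment otherwise handles implicitly):
import pandas as pd

x = pd.Series([1, 2, 3, 3, 3, 4, 5, 5, 6], name='x')
m1 = x.eq(3)                              # rows equal to 3
m2 = x.eq(5)                              # rows equal to 5
after_first_3 = m1.cummax()               # True from the first 3 onwards
before_last_5 = m2[::-1].cummax()[::-1]   # True up to the last 5
print(x[(m1 | m2) & after_first_3 & before_last_5])  # indices 2, 3, 4, 6, 7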
You can create a group id for consecutive runs, then filter by the run length and the value of x:
# create unique ids for consecutive groups, then get group length:
group_num = (df.x.shift() != df.x).cumsum()
group_len = group_num.groupby(group_num).transform("count")
# filter main df (copy so the new column can be added without a SettingWithCopyWarning):
df2 = df[df.x.isin([3, 5]) & (group_len > 1)].copy()
# add the consecutive-count column: renumber the surviving runs
df2['consecutive-count'] = (df2.x != df2.x.shift()).cumsum()
Output:
col row x y consecutive-count
3 6 3 3 8 1
4 9 2 3 4 1
5 5 3 3 9 1
7 5 5 5 1 2
8 3 7 5 2 2
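To make the run-length step concrete, a sketch that prints the intermediate series for the question's x column:
import pandas as pd

x = pd.Series([1, 3, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6], name='x')
group_num = (x.shift() != x).cumsum()                         # new id at every value change
group_len = group_num.groupby(group_num).transform('count')   # run length, broadcast per row
print(pd.DataFrame({'x': x, 'group_num': group_num, 'group_len': group_len}))
# the lone 3 at index 1 has group_len 1, so (group_len > 1) drops it;
# the run of 6s survives the length test but is dropped by x.isin([3, 5])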
I need to go through a large DataFrame and select consecutive rows with similar values in a column, i.e. in the DataFrame below, selecting on column x:
col row x y
1 1 1 1
2 2 2 2
6 3 3 8
9 2 3 4
5 3 3 9
4 9 4 4
5 5 5 1
3 7 5 2
6 6 6 6
The resulting output would be:
col row x y
6 3 3 8
9 2 3 4
5 3 3 9
5 5 5 1
3 7 5 2
Not sure how to do this.
IIUC, use boolean indexing with a mask of the consecutive values:
m = df['x'].eq(df['x'].shift())
df[m|m.shift(-1, fill_value=False)]
Output:
col row x y
2 6 3 3 8
3 9 2 3 4
4 5 3 3 9
6 5 5 5 1
7 3 7 5 2
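In other words, m flags every row whose x repeats the previous row's x, and OR-ing it with its own shift(-1) adds back the first row of each run. A self-contained version, assuming the sample data above:
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 6, 9, 5, 4, 5, 3, 6],
                   'row': [1, 2, 3, 2, 3, 9, 5, 7, 6],
                   'x':   [1, 2, 3, 3, 3, 4, 5, 5, 6],
                   'y':   [1, 2, 8, 4, 9, 4, 1, 2, 6]})

m = df['x'].eq(df['x'].shift())            # True where x equals the previous x
keep = m | m.shift(-1, fill_value=False)   # also keep the row that opens each run
print(df[keep])                            # indices 2, 3, 4, 6, 7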
I have a DataFrame that looks like this:
A B....X
1 1 A
2 2 B
3 3 A
4 6 K
5 7 B
6 8 L
7 9 M
8 1 N
9 7 B
1 6 A
7 7 A
That is, some "rising edges" occur from time to time in column X (in this example the edge is X == 'B').
What I need is a new column Y that increments every time the value B occurs in X:
A B....X Y
1 1 A 0
2 2 B 1
3 3 A 1
4 6 K 1
5 7 B 2
6 8 L 2
7 9 M 2
8 1 N 2
9 7 B 3
1 6 A 3
7 7 A 3
In SQL I would use a trick like sum(case when x = 'B' then 1 else 0) over (... rows between first and previous). How can I do this in pandas?
Use cumsum:
df['Y'] = (df.X == 'B').cumsum()
Out[8]:
A B X Y
0 1 1 A 0
1 2 2 B 1
2 3 3 A 1
3 4 6 K 1
4 5 7 B 2
5 6 8 L 2
6 7 9 M 2
7 8 1 N 2
8 9 7 B 3
9 1 6 A 3
10 7 7 A 3
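For completeness, a runnable version that also spells out the mapping to the SQL idea in the question:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 7],
                   'B': [1, 2, 3, 6, 7, 8, 9, 1, 7, 6, 7],
                   'X': list('ABAKBLMNBAA')})

# (df.X == 'B') is the CASE WHEN X = 'B' THEN 1 ELSE 0 END part;
# .cumsum() is the running SUM() window over the current and all preceding rows
df['Y'] = (df.X == 'B').cumsum()
print(df)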
I need to create a new DataFrame from an existing one by selecting multiple columns and stacking those columns' values into a single column, with each value's original column index recorded as a second column.
So, lets say I have this as a dataframe:
A B C D E F
0 1 2 3 4 0
0 7 8 9 1 0
0 4 5 2 4 0
I want to transform it into this by selecting columns B through E:
A index_value
1 1
7 1
4 1
2 2
8 2
5 2
3 3
9 3
2 3
4 4
1 4
4 4
So, for the new DataFrame, column A would be all of the values from columns B through E in the old DataFrame, and column index_value would be the positional index (starting from zero) of the column each value came from.
I've been scratching my head for hours. Any help would be appreciated, thanks!
Python 3, using pandas. Another way:
A B C D E F
0 0 1 2 3 4 0
1 0 7 8 9 1 0
2 0 4 5 2 4 0
import pandas as pd

# select the columns to include
start_column = 'B'
end_column = 'E'
index_column_name = 'A'
# re-stack the dataframe: one value per row, grouped by original column
df = (df.loc[:, start_column:end_column]
        .stack()
        .sort_index(level=1)
        .reset_index(level=0, drop=True)
        .to_frame())
# create the "index_value" column from the column labels (B..E -> 1..4)
df['index_value'] = pd.Categorical(df.index).codes + 1
df.rename(columns={0: index_column_name}, inplace=True)
df.set_index(index_column_name, inplace=True)
df
index_value
A
1 1
7 1
4 1
2 2
8 2
5 2
3 3
9 3
2 3
4 4
1 4
4 4
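If A should stay an ordinary column next to index_value, as in the output the question sketches, reset the index at the end (a one-line follow-up to the code above):
df = df.reset_index()  # moves 'A' out of the index, giving columns A, index_value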
This is just melt:
df.columns = range(df.shape[1])
s = df.melt().loc[lambda x : x.value!=0]
s
variable value
3 1 1
4 1 7
5 1 4
6 2 2
7 2 8
8 2 5
9 3 3
10 3 9
11 3 2
12 4 4
13 4 1
14 4 4
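Note that the value != 0 filter only drops columns A and F here because they happen to be all zeros; it would also silently drop any genuine zero inside B through E. A sketch that picks the wanted columns positionally instead (assuming the same renamed df as above):
s = df.iloc[:, 1:5].melt()  # columns at positions 1..4, i.e. the original B..E, no value filter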
Try using:
df = pd.melt(df[['B', 'C', 'D', 'E']])
# Or: df = df[['B', 'C', 'D', 'E']].melt()
df['variable'] = df['variable'].shift().eq(df['variable'].shift(-1)).cumsum().shift(-1).ffill()
print(df)
Output:
variable value
0 1.0 1
1 1.0 7
2 1.0 4
3 2.0 2
4 2.0 8
5 2.0 5
6 3.0 3
7 3.0 9
8 3.0 2
9 4.0 4
10 4.0 1
11 4.0 4
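The shift/eq/cumsum line above just renumbers the labels B..E as 1..4; pd.factorize does the same renumbering more directly (a sketch meant to replace that line, applied to the freshly melted df):
import pandas as pd

df['variable'] = pd.factorize(df['variable'])[0] + 1  # 'B'..'E' -> 1..4 in order of appearance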
I am a beginner in pandas, so please bear with me. I know this is a very basic question.
I am working with the following DataFrame:
x y w
1 2 5
1 2 7
3 4 3
5 4 8
3 4 5
5 9 9
And I want the following output :
x y w
1 2 5,7
3 4 3,5
5 4 8
5 9 9
Can anyone tell me how to do it using pandas groupby?
You can use groupby with apply and ','.join:
# if the dtype of column w is not string, convert it first
print(type(df.at[0, 'w']))
<class 'numpy.int64'>
df['w'] = df['w'].astype(str)
print(df.groupby(['x','y'])['w'].apply(','.join).reset_index())
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9
If you have duplicates, use drop_duplicates:
print(df)
x y w
0 1 2 5
1 1 2 5
2 1 2 5
3 1 2 7
4 3 4 3
5 5 4 8
6 3 4 5
7 5 9 9
df['w'] = df['w'].astype(str)
print(df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.drop_duplicates())).reset_index())
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9
Or, modifying EdChum's solution:
print(df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str).drop_duplicates())).reset_index())
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9
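On current pandas the same result can be written a bit more compactly with agg (a sketch using the same df):
out = (df.groupby(['x', 'y'], as_index=False)['w']
         .agg(lambda s: ','.join(s.astype(str).drop_duplicates())))
print(out)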
You can groupby on columns 'x' and 'y' and apply a lambda on the 'w' column; if required, cast the dtype using astype:
In [220]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str)))
Out[220]:
x y
1 2 5,7
3 4 3,5
5 4 8
9 9
Name: w, dtype: object
In [221]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str))).reset_index()
Out[221]:
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9
EDIT: on your modified sample:
In [237]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.unique().astype(str))).reset_index()
Out[237]:
x y w
0 1 2 5,7
1 3 4 3,5
2 5 4 8
3 5 9 9