This question already has answers here:
Python dataframe replace last n rows with a list of n elements
(2 answers)
df.append() is not appending to the DataFrame
(2 answers)
Closed 1 year ago.
I'm trying to write a series of 20 values into the last 20 rows of a dataframe that has more than 20 rows.
The original values are coming from a numpy array 'Y_pred':
[[3495.47227957]
[3493.27865109]
[3491.08502262]
[3488.89139414]
[3486.69776567]
[3484.50413719]
[3482.31050871]
[3480.11688024]
[3477.92325176]
[3475.72962329]
[3473.53599481]
[3471.34236633]
[3469.14873786]
[3466.95510938]
[3464.7614809 ]
[3462.56785243]
[3460.37422395]
[3458.18059548]
[3455.986967 ]
[3453.79333852]]
I create the column Y_pred and try to assign the converted series:
df['Y_pred'] = np.nan
df.Y_pred.iloc[-len(Y_pred):].append(pd.Series({'Y_pred': Y_pred}), ignore_index=True)
The result is that all rows are NaN.
I also tried this:
series = pd.Series(Y_pred[:, 0])
df.Y_pred.iloc[-20:].append(series, ignore_index=True)
and
df['Y_pred'].append(Y_pred)
Nothing works. How do I do it properly?
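For what it's worth, a sketch of one approach that does work (not from the original thread; the toy dataframe and values below are stand-ins): assign into the slice with `.iloc` instead of calling `append`, since `append` returns a new object and never modifies `df` in place.

```python
import numpy as np
import pandas as pd

# Stand-ins for the real df (25 rows) and the (20, 1) prediction array
df = pd.DataFrame({'price': range(25)})
Y_pred = np.linspace(3495.47, 3453.79, 20).reshape(-1, 1)

# Create the column, then write the flattened predictions into the last rows
df['Y_pred'] = np.nan
df.iloc[-len(Y_pred):, df.columns.get_loc('Y_pred')] = Y_pred.ravel()
```

The first `len(df) - 20` rows stay NaN and the last 20 rows hold the predicted values, in order.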
This question already has answers here:
Pandas DataFrame: replace all values in a column, based on condition
(8 answers)
Conditional Replace Pandas
(7 answers)
Closed 1 year ago.
If the product type is 'Option', I want to replace the value in the PRICE column with the value of the STRIKE column.
How can I do this without using the for loop? (to make it faster)
Now I have the following but it's slow:
for i in range(df.shape[0]):
    if df.loc[i, 'type'] == 'Option':
        df.loc[i, 'PRICE'] = df.loc[i, 'STRIKE']
Use .loc in a vectorized fashion
df.loc[df['type'] == 'Option', 'PRICE'] = df['STRIKE']
mask = (df.type == 'Option')
df.loc[mask, 'PRICE'] = df.loc[mask, 'STRIKE']
(Note: df[mask].PRICE = df[mask].STRIKE would be chained assignment on a copy and would leave df unchanged.)
see:
https://www.geeksforgeeks.org/boolean-indexing-in-pandas/
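A minimal, self-contained demonstration of the vectorized `.loc` assignment (the column names are taken from the question, the sample values are made up):

```python
import pandas as pd

df = pd.DataFrame({'type':   ['Option', 'Future', 'Option'],
                   'PRICE':  [10.0, 20.0, 30.0],
                   'STRIKE': [100.0, 200.0, 300.0]})

# One vectorized assignment instead of a row-by-row loop:
# only the rows where the mask is True get their PRICE overwritten
df.loc[df['type'] == 'Option', 'PRICE'] = df['STRIKE']
print(df['PRICE'].tolist())  # [100.0, 20.0, 300.0]
```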
This question already has answers here:
Pandas sum across columns and divide each cell from that value
(5 answers)
Closed 3 years ago.
I want to divide each cell by the sum of its row. In reality there are many columns, not only A and B.
import pandas as pd
data = pd.DataFrame({'A':[1,2,3,1,2,3,1],
                     'B':[4,5,6,4,5,6,4]})
sum_row = data.sum(axis=1)
Here is an example of what I expect.
I think this should do the trick
import pandas as pd
data = pd.DataFrame({'A':[1,2,3,1,2,3,1],
'B':[4,5,6,4,5,6,4]})
cols = list(data.columns)  # snapshot the column names before adding the sum
data['sum_row'] = data.sum(axis=1)
for col in cols:
    data[col + ' / Sum_Row'] = data[col] / data['sum_row']
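An alternative not in the original answer: `DataFrame.div` with `axis=0` divides every cell by its row sum in a single vectorized step, with no Python loop and no extra columns.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1],
                     'B': [4, 5, 6, 4, 5, 6, 4]})

# Divide each cell by its row's sum, broadcasting the sums down the rows
result = data.div(data.sum(axis=1), axis=0)
print(result.iloc[0].tolist())  # [0.2, 0.8]
```

This scales to any number of columns without changing the code.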
This question already has answers here:
Pandas, group by count and add count to original dataframe?
(3 answers)
Closed 3 years ago.
I have a dataframe containing a column of values (X).
df = pd.DataFrame({'X' : [2,3,5,2,2,3,7,2,2,7,5,2]})
For each row, I would like to find how many times it's value of X appears (A).
My expected output is:
Create a temp column of 1s, then groupby and count to get your desired answer:
df = pd.DataFrame({'X': [2, 3, 5, 2, 2, 3, 7, 2, 2, 7, 5, 2]})
df['temp'] = 1
df['count'] = df.groupby('X')['temp'].transform('count')
del df['temp']
print(df)
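The temp column is not strictly necessary; a sketch of two shorter equivalents (column name A is taken from the question):

```python
import pandas as pd

df = pd.DataFrame({'X': [2, 3, 5, 2, 2, 3, 7, 2, 2, 7, 5, 2]})

# transform('count') broadcasts each group's size back onto its rows
df['A'] = df.groupby('X')['X'].transform('count')

# Equivalent: map each value to its overall frequency
df['A2'] = df['X'].map(df['X'].value_counts())
print(df['A'].tolist())  # [6, 2, 2, 6, 6, 2, 2, 6, 6, 2, 2, 6]
```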
This question already has answers here:
Select columns using pandas dataframe.query()
(5 answers)
Closed 4 years ago.
I'm trying to use query on a MultiIndex column. It works on a MultiIndex row, but not the column. Is there a reason for this? The documentation shows examples like the first one below, but it doesn't indicate that it won't work for a MultiIndex column.
I know there are other ways to do this, but I'm specifically trying to do it with the query function
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((4,4)))
df.index = pd.MultiIndex.from_product([[1,2],['A','B']])
df.index.names = ['RowInd1', 'RowInd2']
# This works
print(df.query('RowInd2 in ["A"]'))
df = pd.DataFrame(np.random.random((4,4)))
df.columns = pd.MultiIndex.from_product([[1,2],['A','B']])
df.columns.names = ['ColInd1', 'ColInd2']
# query on index works, but not on the multiindexed column
print(df.query('index < 2'))
print(df.query('ColInd2 in ["A"]'))
To answer my own question, it looks like query shouldn't be used at all (regardless of using MultiIndex columns) for selecting certain columns, based on the answer(s) here:
Select columns using pandas dataframe.query()
You can use query with ilevel_0 for the rows, and pd.IndexSlice for the columns:
df.query('ilevel_0>2')
Out[327]:
ColInd1 1 2
ColInd2 A B A B
3 0.652576 0.639522 0.52087 0.446931
df.loc[:,pd.IndexSlice[:,'A']]
Out[328]:
ColInd1 1 2
ColInd2 A A
0 0.092394 0.427668
1 0.326748 0.383632
2 0.717328 0.354294
3 0.652576 0.520870
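For completeness, a sketch of another option not shown above: `DataFrame.xs` can select a single label from an inner column level by name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((4, 4)))
df.columns = pd.MultiIndex.from_product([[1, 2], ['A', 'B']])
df.columns.names = ['ColInd1', 'ColInd2']

# Cross-section on the column axis: keep only the columns where ColInd2 == 'A';
# the matched level is dropped from the result by default
sub = df.xs('A', axis=1, level='ColInd2')
print(sub.columns.tolist())  # [1, 2]
```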
This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 7 years ago.
In Python I have a data frame (df) that contains columns with the following names: A_OPEN, A_CLOSE, B_OPEN, B_CLOSE, C_OPEN, C_CLOSE, D_ etc.
How can I easily select only the columns whose name contains _CLOSE? A, B, C, D, E, F etc. can have any value, so I do not want to rely on the specific column names.
In SQL this would be done with the LIKE operator: df[like'%_CLOSE%']
What's the Python way?
You could use a list comprehension, e.g.:
df[[x for x in df.columns if "_CLOSE" in x]]
Example:
df = pd.DataFrame(
columns = ['_CLOSE_A', '_CLOSE_B', 'C'],
data = [[2,3,4], [3,4,5]]
)
Then,
>>> print(df[[x for x in df.columns if "_CLOSE" in x]])
_CLOSE_A _CLOSE_B
0 2 3
1 3 4
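pandas also has a built-in for exactly this: `DataFrame.filter` with the `like` argument keeps the columns whose label contains the given substring, much like SQL's `LIKE '%_CLOSE%'`.

```python
import pandas as pd

df = pd.DataFrame(columns=['A_OPEN', 'A_CLOSE', 'B_OPEN', 'B_CLOSE'],
                  data=[[1, 2, 3, 4]])

# Keep only the columns whose name contains the substring '_CLOSE'
close_cols = df.filter(like='_CLOSE')
print(close_cols.columns.tolist())  # ['A_CLOSE', 'B_CLOSE']
```

`filter` also accepts `regex=` for more complex patterns, e.g. `df.filter(regex='_CLOSE$')` to match only at the end of the name.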