Count instances in a dataframe [duplicate] - python

This question already has answers here:
Pandas, group by count and add count to original dataframe?
(3 answers)
Closed 3 years ago.
I have a dataframe containing a column of values (X).
df = pd.DataFrame({'X' : [2,3,5,2,2,3,7,2,2,7,5,2]})
For each row, I would like to find how many times it's value of X appears (A).
My expected output is:

create temp column with 1 and groupby and count to get your desired answer
df = pd.DataFrame({'X' : [2,3,5,2,2,3,7,2,2,7,5,2]})
df['temp'] = 1
df['count'] = df.groupby(['X'],as_index=False).transform(pd.Series.count)
del df['temp']
print(df)

Related

Split commas separeted cell in pandas dataframe into different columns [duplicate]

This question already has answers here:
Pandas split column into multiple columns by comma
(7 answers)
Closed 1 year ago.
This post was edited and submitted for review 1 year ago and failed to reopen the post:
Original close reason(s) were not resolved
How do I split the comma-separated string to new columns
Expected output
Source Target Weight
0 Majed Moqed Majed Moqed 0
Try this:
df['Source'] = df['(Source, Target, Weight)'].split(',')[0]
df['Target'] = df['(Source, Target, Weight)'].split(',')[1]
df['Weight'] = df['(Source, Target, Weight)'].split(',')[2]
Try this:
col = '(Source, Target, Weight)'
df = pd.DataFrame(df[col].str.split(',').tolist(), columns=col[1:-1].split(', '))
You can also do:
col = '(Source, Target, Weight)'
df[col.strip('()').split(', ')] = df[col].str.split(',', expand=True)

change a value based on other value in dataframe [duplicate]

This question already has answers here:
Pandas DataFrame: replace all values in a column, based on condition
(8 answers)
Conditional Replace Pandas
(7 answers)
Closed 1 year ago.
If product type == option, I replace the value in the PRICE column with the value of the STRIKE column.
How can I do this without using the for loop? (to make it faster)
Now I have the following but it's slow:
for i in range(df.shape[0]):
if df.loc[i,'type'] == 'Option:
df.loc[i,'PRICE'] = df.loc[i,'STRIKE']
Use .loc in a vectorized fashion
df.loc[df['type'] == 'Option', 'PRICE'] = df['STRIKE']
mask = (df.type == 'Option')
df[mask].PRICE = df[mask].STRIKE
see:
https://www.geeksforgeeks.org/boolean-indexing-in-pandas/

How to cast pandas series into dataframe [duplicate]

This question already has answers here:
Python dataframe replace last n rows with a list of n elements
(2 answers)
df.append() is not appending to the DataFrame
(2 answers)
Closed 1 year ago.
I'm trying to cast a series of 20 values at the end of a dataframe with more than 20 rows.
The original values are coming from a numpy array 'Y_pred':
[[3495.47227957]
[3493.27865109]
[3491.08502262]
[3488.89139414]
[3486.69776567]
[3484.50413719]
[3482.31050871]
[3480.11688024]
[3477.92325176]
[3475.72962329]
[3473.53599481]
[3471.34236633]
[3469.14873786]
[3466.95510938]
[3464.7614809 ]
[3462.56785243]
[3460.37422395]
[3458.18059548]
[3455.986967 ]
[3453.79333852]]
creating column Y_pred and trying to cast the converted series:
df['Y_pred'] = np.nan
df.Y_pred.iloc[-len(Y_pred):].append(pd.Series({'Y_pred': Y_pred}), ignore_index=True)
result is that all rows are NaN
I tried as well this:
series = pd.Series(Y_pred[:, 0])
df.Y_pred.iloc[-20:].append(series, ignore_index=True)
and
df['Y_pred'].append(Y_pred)
nothing works. How to do it properly?

python split pandas numeric vector column into multiple columns [duplicate]

This question already has answers here:
Split a Pandas column of lists into multiple columns
(11 answers)
Closed 4 years ago.
I have a dataframe in pandas, with a column which is a vector:
df = pd.DataFrame({'ID':[1,2], 'Averages':[[1,2,3],[4,5,6]]})
and I wish to split and divide it into elements which would look like this:
df2 = pd.DataFrame({'ID':[1,2], 'A':[1,4], 'B':[2,5], 'C':[3,6]})
I have tried
df['Averages'].astype(str).str.split(' ') but with no luck. any help would be appreciated.
pd.concat([df['ID'], df['Averages'].apply(pd.Series)], axis = 1).rename(columns = {0: 'A', 1: 'B', 2: 'C'})
This will work:
df[['A','B','C']] = pd.DataFrame(df.averages.values.tolist(), index= df.index)

Python data frames - how to select all columns that have a specific substring in their name [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 7 years ago.
in Python I have a data frame (df) that contains columns with the following names A_OPEN, A_CLOSE, B_OPEN, B_CLOSE, C_OPEN, C_CLOSE, D_ etc.....
How can I easily select only the columns that contain _CLOSE in their name? A,B,C,D,E,F etc can have any value so I do not want to use the specific column names
In SQL this would be done with the like operator: df[like'%_CLOSE%']
What's the python way?
You could use a list comprehension, e.g.:
df[[x for x in df.columns if "_CLOSE" in x]]
Example:
df = pd.DataFrame(
columns = ['_CLOSE_A', '_CLOSE_B', 'C'],
data = [[2,3,4], [3,4,5]]
)
Then,
>>>print(df[[x for x in df.columns if "_CLOSE" in x]])
_CLOSE_A _CLOSE_B
0 2 3
1 3 4

Categories

Resources