Iterate over rows/columns and extract values to a different dataframe [duplicate] - python

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 5 months ago.
Dataframes: df, df1, dfsum
Using the 'code' column of df, I want to look up the matching rows in df1 and return the values of columns 'title' and 'cu' to dfsum.

If both dataframes have the same shape, you can iterate over them just like a regular matrix:
total_rows, total_columns = df.shape
# go through the rows
for row in range(total_rows):
    # go through the columns
    for column in range(total_columns):
        # check whether the two cells match (.iat indexes by position: row, then column)
        if df.iat[row, column] == df1.iat[row, column]:
            # on a match, assign the value from df1 to df
            df.iat[row, column] = df1.iat[row, column]
I hope this solves your issue :)
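For the lookup described in the question, a merge is usually simpler than a cell-by-cell loop. Here is a minimal sketch, assuming that df1 also has a 'code' column to match on (the question does not show it) and using made-up sample values:

```python
import pandas as pd

# Hypothetical sample data; the real frames were only shown as images.
df = pd.DataFrame({"code": ["A1", "B2", "C3"]})
df1 = pd.DataFrame({
    "code": ["A1", "B2", "D4"],
    "title": ["alpha", "beta", "delta"],
    "cu": [10, 20, 40],
})

# A left merge keeps every code from df and pulls in the matching
# 'title' and 'cu' from df1; codes without a match get NaN.
dfsum = df[["code"]].merge(df1[["code", "title", "cu"]], on="code", how="left")
print(dfsum)
```

If the key column has a different name in df1, left_on= and right_on= can be used in place of on=.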

Related

How to set a new index [duplicate]

This question already has answers here:
How to convert index of a pandas dataframe into a column
(9 answers)
Closed 1 year ago.
My df has the columns 'Country' and 'Country Code' as the current index. How can I remove this index and create a new one that just counts the rows? I'll leave a picture of how it looks. All I want to do is add a new index next to Country. Thanks a lot!
If you are using a pandas DataFrame and your DataFrame is called df:
df = df.reset_index(drop=False)
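A small sketch of what this does, using made-up data with 'Country' and 'Country Code' as the index (the 'Value' column and its numbers are only for illustration):

```python
import pandas as pd

# Hypothetical data with 'Country' and 'Country Code' as the index.
df = pd.DataFrame(
    {"Country": ["France", "Japan"], "Country Code": ["FR", "JP"], "Value": [1.5, 2.5]}
).set_index(["Country", "Country Code"])

# drop=False (the default) moves the index levels back into regular
# columns and replaces the index with 0, 1, 2, ...
df = df.reset_index(drop=False)
print(df)
```

With drop=True the old index levels would be discarded instead of being kept as columns.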

Merging Pandas DFs and overwriting NaN [duplicate]

This question already has answers here:
How to remove nan value while combining two column in Panda Data frame?
(5 answers)
Closed 1 year ago.
I have two DFs that I am trying to merge on the column 'conId'. The DFs have a different number of rows, and the only other overlapping column is 'delta'.
I am using pf.merge(greek,on='conId',how='left')
The resulting DF gives me the columns 'delta_x' and 'delta_y'.
How can I merge these two columns into one column?
Thank you!
You can use
df['delta_x'] = df['delta_x'].fillna(df['delta_y'])
then drop the other column if you want (drop returns a new dataframe, so assign the result):
df = df.drop(['delta_y'], axis=1)
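Put together, the whole flow might look like the sketch below. The frame names pf and greek follow the question; the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; only the column names come from the question.
pf = pd.DataFrame({"conId": [1, 2, 3], "delta": [0.5, np.nan, np.nan]})
greek = pd.DataFrame({"conId": [2, 3], "delta": [0.7, np.nan]})

# The overlapping 'delta' column becomes delta_x (left) and delta_y (right).
df = pf.merge(greek, on="conId", how="left")

# Prefer delta_x and fall back to delta_y where delta_x is NaN.
df["delta"] = df["delta_x"].fillna(df["delta_y"])
df = df.drop(columns=["delta_x", "delta_y"])
print(df)
```

Rows where both sides are NaN stay NaN in the combined column.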

Get the specified set of columns from pandas dataframe [duplicate]

This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I manually select the columns in a pandas dataframe using
df_final = df[['column1','column2'.......'column90']]
Instead, I build the list of column names with
dp_col = [col for col in df if col.startswith('column')]
But I am not sure how to use this list to select only those columns from the source dataframe.
You can use this as the list of columns to select, so:
df_final = df[[col for col in df if col.startswith('column')]]
The "origin" of the list of strings is of no importance: as long as you pass a list of strings to the subscript, this will normally work.
Use loc access with boolean masking:
df.loc[:, df.columns.str.startswith('column')]
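As a quick check that both approaches select the same columns, here is a sketch on a tiny made-up dataframe (the column 'other' is only there to be filtered out):

```python
import pandas as pd

# Hypothetical frame with two matching columns and one that should be excluded.
df = pd.DataFrame({"column1": [1], "column2": [2], "other": [3]})

# Approach 1: build the list of names, then pass it to the subscript.
dp_col = [col for col in df if col.startswith("column")]
by_list = df[dp_col]

# Approach 2: boolean mask over the column labels with .loc.
by_mask = df.loc[:, df.columns.str.startswith("column")]

print(by_list.equals(by_mask))
```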

merge duplicate rows by adding a column 'count' [duplicate]

This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Closed 3 years ago.
I want to merge duplicate rows by adding a new column 'count'
Final dataframe that I want
rows can be in any order
You can use:
df["count"] = 1
df = df.groupby(["user_id", "item_id", "total"])["count"].count().reset_index()
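An alternative sketch that gives the same counts without the helper column, using groupby(...).size() instead of the count() call above. The column names come from the answer; the sample values are invented.

```python
import pandas as pd

# Hypothetical data: the first two rows are duplicates of each other.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "item_id": [10, 10, 20],
    "total": [5, 5, 7],
})

# size() counts the rows in each group; reset_index(name=...) turns the
# result back into a dataframe with a 'count' column.
counts = df.groupby(["user_id", "item_id", "total"]).size().reset_index(name="count")
print(counts)
```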

Delete rows if there are null values in a specific column in Pandas dataframe [duplicate]

This question already has answers here:
Python: How to drop a row whose particular column is empty/NaN?
(2 answers)
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 4 years ago.
I'm new to python pandas. Need some help with deleting a few rows where there are null values. In the screenshot, I need to delete rows where charge_per_line == "-" using python pandas.
If the relevant entries in Charge_Per_Line are empty (NaN) when you read into pandas, you can use df.dropna:
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
If the values are genuinely -, then you can replace them with np.nan and then use df.dropna:
import numpy as np
df['Charge_Per_Line'] = df['Charge_Per_Line'].replace('-', np.nan)
df = df.dropna(axis=0, subset=['Charge_Per_Line'])
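There are multiple ways.
Use str.contains to find the rows containing '-' (this assumes the column holds strings, and it also matches any value that merely contains a hyphen, such as a negative number):
df = df[~df['Charge_Per_Line'].str.contains('-')]
Or replace '-' with NaN and use dropna() (without subset, this drops rows that have NaN in any column):
import numpy as np
df = df.replace('-', np.nan)
df = df.dropna()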
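If only cells that are exactly "-" should be removed, a plain comparison may be the safer choice. The sketch below contrasts it with substring matching on made-up string values:

```python
import pandas as pd

# Hypothetical column of strings; "-3.0" contains a hyphen but is not a placeholder.
df = pd.DataFrame({"Charge_Per_Line": ["12.5", "-", "-3.0", "7"]})

# Exact comparison removes only the rows whose value is exactly "-".
exact = df[df["Charge_Per_Line"] != "-"]

# Substring matching also removes "-3.0".
substring = df[~df["Charge_Per_Line"].str.contains("-")]

print(exact)
print(substring)
```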
