Make two different dataframes from a parent dataframe - python

I want to split one dataframe into two different dataframes based on the value of one of its columns.
E.g.: df (parent dataframe)
df has a column MODE with the values swiggy and Zomato.
df1 should contain all rows where MODE = swiggy.
df2 should contain all rows where MODE = Zomato.
I know it's simple, I am a beginner. Please help. Thanks.

df1 = df[df['MODE'] == 'swiggy']
df2 = df[df['MODE'] == 'Zomato']
This way you filter the dataframe on the value of the MODE column and assign each resulting dataframe to a new variable.
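As a minimal runnable sketch (the sample data is made up for illustration):

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({
    "MODE": ["swiggy", "Zomato", "swiggy", "Zomato"],
    "order_id": [1, 2, 3, 4],
})

# A boolean mask selects the matching rows; .copy() gives each
# sub-frame its own data, avoiding SettingWithCopyWarning if the
# sub-frames are modified later.
df1 = df[df["MODE"] == "swiggy"].copy()
df2 = df[df["MODE"] == "Zomato"].copy()

print(df1["order_id"].tolist())  # [1, 3]
print(df2["order_id"].tolist())  # [2, 4]
```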


Merge two dataframes on one common column

I have two data frames, one with three columns and another with two columns. Two of the columns are common to both data frames.
I have to update the Marks column of df1 from df2 only where the data is missing, and keep the existing values in df1 as they are.
I have tried pd.merge, but the result created a separate column, which was not intended.
The following worked for me (after merging df1 and df2 on the common key, which produced Marks_x and Marks_y columns):
df1['Marks'] = df1['Marks_x'].combine_first(df1['Marks_y'])
df1 = df1.drop(['Marks_x', 'Marks_y'], axis=1)
combine_first keeps the Marks_x value (from the original df1) and falls back to Marks_y (from df2) only where Marks_x is missing.
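A self-contained sketch of the whole flow, with made-up data (the column names Name and Class are assumptions, since the original table was only shown as an image):

```python
import pandas as pd

# Made-up data: df1's Marks column has gaps that df2 can fill.
df1 = pd.DataFrame({"Name": ["A", "B", "C"], "Class": [1, 2, 3],
                    "Marks": [90.0, None, None]})
df2 = pd.DataFrame({"Name": ["B", "C"], "Marks": [75.0, 80.0]})

# The merge produces Marks_x (from df1) and Marks_y (from df2);
# combine_first keeps the df1 value and uses the df2 value only
# where df1's value is missing.
merged = df1.merge(df2, on="Name", how="left", suffixes=("_x", "_y"))
merged["Marks"] = merged["Marks_x"].combine_first(merged["Marks_y"])
merged = merged.drop(columns=["Marks_x", "Marks_y"])
print(merged["Marks"].tolist())  # [90.0, 75.0, 80.0]
```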

Using pandas I need to create a new column that takes a value from a previous row

I have many rows of data and one of the columns is a flag. I have 3 identifiers that need to match between rows.
What I have:
partnumber, datetime1, previousdatetime1, datetime2, previousdatetime2, flag
What I need:
partnumber, datetime1, previousdatetime1, datetime2, previousdatetime2, flag, previous_flag
I need to find the flag from the row where partnumber matches, previousdatetime1 (current row) == datetime1 (other row), and previousdatetime2 (current row) == datetime2 (other row).
*Note that the rows are not necessarily in order, so the "previous" row may come later in the dataframe.
I'm not quite sure where to start. I got this logic working in Power BI using a LookUpValue, basically finding where partnumber = Value(partnumber), datetime1 = Value(datetime1), and datetime2 = Value(datetime2). Thanks for the help!
Okay, so assuming you've read this in as a pandas dataframe df1:
(1) Make a copy of the dataframe:
df2 = df1.copy()
(2) For sanity, drop some columns in df2
df2.drop(['previousdatetime1', 'previousdatetime2'], axis=1, inplace=True)
Now you have a df2 that has columns:
['partnumber','datetime1','datetime2','flag']
(3) Merge the two dataframes, matching on the part number and on both previous-datetime/datetime pairs (the question requires both datetime columns to match, so both belong in the join keys):
newdf = df1.merge(df2, how='left', left_on=['partnumber', 'previousdatetime1', 'previousdatetime2'], right_on=['partnumber', 'datetime1', 'datetime2'], suffixes=('', '_previous'))
Now newdf has all of df1's original columns plus suffixed copies of the df2 columns, including flag_previous.
(4) Keep only the columns you need:
newdf = newdf[list(df1.columns) + ['flag_previous']]
Now newdf has the columns:
['partnumber', 'datetime1', 'previousdatetime1', 'datetime2', 'previousdatetime2', 'flag', 'flag_previous']
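Put together as a runnable sketch with made-up data (two rows for part P1, where row 0's previous-datetime values point at row 1):

```python
import pandas as pd

# Made-up data: row 0's previousdatetime values equal row 1's
# datetime values, so row 1 is row 0's "previous" row.
df1 = pd.DataFrame({
    "partnumber": ["P1", "P1"],
    "datetime1": ["2023-01-02", "2023-01-01"],
    "previousdatetime1": ["2023-01-01", None],
    "datetime2": ["2023-01-02", "2023-01-01"],
    "previousdatetime2": ["2023-01-01", None],
    "flag": [1, 0],
})

# Self-merge: look up the row whose datetime values equal this
# row's previousdatetime values, for the same part number.
df2 = df1.drop(columns=["previousdatetime1", "previousdatetime2"])
newdf = df1.merge(
    df2, how="left",
    left_on=["partnumber", "previousdatetime1", "previousdatetime2"],
    right_on=["partnumber", "datetime1", "datetime2"],
    suffixes=("", "_previous"),
)
newdf = newdf[list(df1.columns) + ["flag_previous"]]
print(newdf)
```

Row 0 picks up row 1's flag as flag_previous; row 1 has no previous row, so its flag_previous is NaN.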

Using select() on all existing columns AND using a list to generate new lit columns

I have an existing df with n columns, and I also need to add thousands of lit columns for a later application. Because there are so many columns to add, I can't use a loop of withColumn() calls.
Is it possible to combine the following two select() calls?
df1 = df.select([lit(f"{i}").alias(f"{i}") for i in test_list])
df2 = df.select("*")
This didn't work, and neither did a few other variations such as:
df1 = df.select("*", [lit(f"{i}").alias(f"{i}") for i in test_list])
You can use df.columns, which returns the list of existing column names, and append the lit columns to it:
df1 = df.select(df.columns + [lit(f"{i}").alias(f"{i}") for i in test_list])
(Alternatively, df.select("*", *[lit(f"{i}").alias(f"{i}") for i in test_list]) also works: select takes either a single list or varargs, so the list has to be unpacked with * when combined with "*".)

changing row values in a dataframe by looking into another dataframe [duplicate]

This question already has answers here: Pandas Merging 101 (8 answers). Closed last year.
I have a lookup table as a dataframe (1,000 rows) consisting of codes and labels. I have another dataframe (200,000 rows) consisting of codes and geometries.
I need to get the label for each code by looking it up in the lookup dataframe.
The output should be a dataframe.
I tried it as follows.
df = pd.read_csv(filepath)
codes = df['codes'].values
labels = df['labels'].values
df2 = pd.read_csv(filepath)
print(df2.shape)
for ix in df2.index:
    code = df2.loc[ix, 'code']
    df2.loc[ix, 'label'] = labels[codes == code][0]
print(df2)
The result is correct, but it's very slow... looping over the rows is very slow.
Can you help me?
You should use the DataFrame merge method (https://pandas.pydata.org/docs/reference/api/pandas.merge.html). It lets you join two dataframes on a common column. Your code should look like this:
df2 = df2.merge(df, left_on="code", right_on="codes", how="left")
# Check the labels using df2['labels']
The column names to join on are given by the left_on and right_on parameters. how='left' means every row of df2 is preserved, even if its code has no match in the lookup table.
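A minimal runnable sketch with made-up data (the frame and column names follow the question):

```python
import pandas as pd

# Made-up lookup table (codes -> labels) and a larger data table.
df = pd.DataFrame({"codes": [1, 2, 3], "labels": ["road", "river", "rail"]})
df2 = pd.DataFrame({"code": [2, 1, 9], "geometry": ["g1", "g2", "g3"]})

# One vectorised merge replaces the row-by-row Python loop.
df2 = df2.merge(df, left_on="code", right_on="codes", how="left")
df2 = df2.drop(columns="codes")  # drop the duplicate key column

# Code 9 has no entry in the lookup table, so its label is NaN.
print(df2["labels"].tolist())  # ['river', 'road', nan]
```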

Is there a function that can remove multiple rows based on multiple specific column values in a pandas dataframe?

I have a Pandas dataframe with about 15 different string categories in a particular column, 'A'. I want to create a new dataframe containing only the rows that belong to 7 of those categories.
I know that I can individually remove/add categories using:
df1 = df[df.Category != 'a']
but I also tried using a list to do it in a single line, like this:
df1 = df[df.Category = ['x','y','z']]
but that gave me a syntax error. Is there any way to do this?
Try:
df1 = df[df.Category.isin(['x','y','z'])]
isin builds a boolean mask that is True wherever the value is in the given list; to exclude those categories instead, negate the mask with ~.
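As a runnable sketch with made-up data:

```python
import pandas as pd

# Made-up data with several categories in the "Category" column.
df = pd.DataFrame({"Category": ["a", "x", "b", "y", "z", "c"],
                   "value": [1, 2, 3, 4, 5, 6]})

# isin builds a boolean mask: True where the value is in the list.
keep = ["x", "y", "z"]
df1 = df[df.Category.isin(keep)]
print(df1["value"].tolist())  # [2, 4, 5]

# The complement (~) drops those categories instead.
df_rest = df[~df.Category.isin(keep)]
print(df_rest["value"].tolist())  # [1, 3, 6]
```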
