I have a dataset and I would like to merge the first two columns, then the next two, and so on.
You didn't show your column names, so I have used placeholder names for your columns. I assume the dataset has been loaded into a pandas DataFrame named df.
In [2]: df
Out[2]:<your dataset>
First, sum the first two columns and assign the result to a single new column:
In [3]: df['Total1'] = df['first_column'] + df['Second_column']
Then sum the third and fourth columns and assign the result to another new column:
In [4]: df['Total2'] = df['Third_column'] + df['Fourth_column']
Once both are done, you can inspect the result:
In [5]: df
Out[5]: <your dataset with Total1 and Total2 columns>
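For the "and so on" part of the question, the same idea can be applied by position instead of by name. The following is only a sketch: the data and the Total1, Total2 labels are made up for illustration, and it assumes every column is numeric. If the number of columns is odd, the last unpaired column is simply left out.

```python
import pandas as pd

# Hypothetical data: four numeric columns with placeholder names.
df = pd.DataFrame({
    "first_column": [1, 2],
    "Second_column": [10, 20],
    "Third_column": [100, 200],
    "Fourth_column": [1000, 2000],
})

# Walk over the columns two at a time and add each pair row by row.
totals = pd.DataFrame(index=df.index)
for k in range(0, df.shape[1] - 1, 2):
    totals[f"Total{k // 2 + 1}"] = df.iloc[:, k] + df.iloc[:, k + 1]

print(totals)
```

Building the totals in a separate frame keeps the loop from picking up the newly added columns while it is still iterating.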
Hope it will help you!
Related
I have a dataframe with columns that are a string of blanks (null/nan set to 0) with sporadic number values.
I am trying to compare the last two non-zero values in a dataframe column.
Something like :
df['Column_c'] = df['column_a'].last_non_zero_value > df['column_a'].second_to_last_non_zero_value
This is what the columns look like in excel
You could drop all the rows with missing data using df.dropna() and then return the remaining values as an array, in which the last two elements are easy to find.
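Since the blanks in the question are stored as 0 rather than NaN, one way to follow this suggestion is to filter on non-zero values instead of calling dropna(). This is a rough sketch with made-up data; the column name column_a comes from the question, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical column: zeros stand in for the blanks described in the question.
df = pd.DataFrame({"column_a": [0, 3, 0, 0, 7, 0, 5, 0]})

# Keep only the non-zero entries, then take the last two as an array.
non_zero = df.loc[df["column_a"] != 0, "column_a"]
last_two = non_zero.tail(2).to_numpy()

# True when the last non-zero value exceeds the one before it.
# The length check guards against columns with fewer than two non-zero values.
last_is_greater = bool(len(last_two) == 2 and last_two[1] > last_two[0])

print(last_two, last_is_greater)
```

If the blanks were real NaN values, replacing the filter with df["column_a"].dropna() should give the same kind of result.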
I have a pandas dataframe where I want the numbers in each entry of column C to be added together to create a new column D.
For example
Thanks in advance.
Use Series.str.extractall to get the digits separately, convert them to integers, and then sum per first level of the resulting MultiIndex:
df['D'] = df['C'].str.extractall(r'(\d)')[0].astype(int).groupby(level=0).sum()
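To see what each step produces, here is a small worked example. The values in column C are invented, since the original example table is not shown, and it assumes C holds strings. It uses groupby(level=0).sum(), because the level argument of sum() is not available in recent pandas versions.

```python
import pandas as pd

# Hypothetical data: column C holds strings made of digits.
df = pd.DataFrame({"C": ["12", "305", "7"]})

# extractall returns one row per matched digit, indexed by
# (original row, match number); column 0 is the capture group.
digits = df["C"].str.extractall(r"(\d)")[0].astype(int)

# Summing over the first index level adds up the digits of each original row.
df["D"] = digits.groupby(level=0).sum()

print(df)
```

Rows of C without any digit produce no matches, so they would end up as NaN in D.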
I am having a problem replacing a specific column, identified by its index, in a dataframe with a new dataframe that consists of only one column. Both have the same length.
I need to replace the column knowing only its index, because I choose a random column to replace in the dataframe df, which contains 8 columns, with the new dataframe df_temp, which has only 1 column:
N=random.randint(1,8)
df.iloc(: , [N - 1]) = df_temp.values
This gives me a syntax error. I don't know if I am using .iloc wrong or if there is an alternative way to do this.
I am not sure I understand completely, but can you try this:
df.iloc[:, [N-1]] = df_temp.values
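The syntax error comes from the parentheses: .iloc is indexed with square brackets, not called like a function. Below is a self-contained sketch of the replacement. The frames are stand-ins built only for illustration, and it selects a single column position and flattens the replacement values, which is a small variation on the line above.

```python
import random

import numpy as np
import pandas as pd

# Hypothetical stand-ins: df has 8 columns, df_temp has 1 column, same length.
df = pd.DataFrame(np.zeros((4, 8)), columns=list("abcdefgh"))
df_temp = pd.DataFrame({"new": [1.0, 2.0, 3.0, 4.0]})

# randint is inclusive on both ends, so N is a 1-based column number.
N = random.randint(1, 8)

# Overwrite the chosen column by position; ravel() flattens the (4, 1)
# array from df_temp so that it fits a single column.
df.iloc[:, N - 1] = df_temp.to_numpy().ravel()

print(N, df, sep="\n")
```

The column keeps its original name. Only its values are replaced.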
I have a dataframe with two columns, Stock and DueDate, and I need to select the first row of each run of consecutive repeated entries in the Stock column.
df:
I am expecting output like below,
Expected output:
My Approach
The approach I tried is to first mark which rows repeat in the Stock column by creating a new column repeated_yes, and then keep only the first row of each group of repeated rows.
I used the code below to create the new column "repeated_yes":
ss = df.Stock.ne(df.Stock.shift())
df['repeated_yes'] = ss.groupby(ss.cumsum()).cumcount() + 1
so the new updated dataframe looks like this,
df_new
But I am stuck on subsetting only rows 3 and 8 in order to get the result. If there is any other effective approach, it would be helpful.
Edited:
Forgot to include the actual full question,
If there are any other rows below the last row in the dataframe df it should not display any output.
Create a second mask with Series.duplicated and keep=False, which marks every row that belongs to a run of more than one row, combine it with the first mask using & (bitwise AND), and filter with boolean indexing:
ss = df.Stock.ne(df.Stock.shift())
ss1 = ss.cumsum().duplicated(keep=False)
df = df[ss & ss1]
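To make the two masks concrete, here is the same code run on a small made-up frame. The original tables are not shown, so the Stock and DueDate values below are invented.

```python
import pandas as pd

# Hypothetical data with two runs of repeats: B (rows 1-2) and D (rows 4-6).
df = pd.DataFrame({
    "Stock": ["A", "B", "B", "C", "D", "D", "D", "A"],
    "DueDate": ["d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7"],
})

# True on the first row of every run of identical consecutive Stock values.
ss = df["Stock"].ne(df["Stock"].shift())

# cumsum() numbers the runs; duplicated(keep=False) is True for every row
# whose run number appears more than once, i.e. runs longer than one row.
ss1 = ss.cumsum().duplicated(keep=False)

# First row of each run that actually repeats.
result = df[ss & ss1]

print(result)
```

Note that the final A in row 7 is not selected: it matches the A in row 0, but the two are not consecutive.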
I have two dataframes: the first has one column and 720 rows (dataframe A), the second has ten columns and 720 rows (dataframe B). The dataframes contain only numerical values.
I am trying to compare them this way: I want to go through each column of dataframe B and compare each cell (row) of that column to the corresponding row in dataframe A.
(Example: For the first column of dataframe B I compare the first row to the first row of dataframe A, then the second row of B to the second row of A etc.)
Basically I want to compare each column of dataframe B to the single column in dataframe A, row by row.
If the value in dataframe B is smaller than or equal to the value in dataframe A, I want to add +1 to another dataframe (or list, whichever is easier). In the end, I want to drop any column in dataframe B that doesn't have at least one cell satisfying the condition (that is, any column whose count in the list or new dataframe is 0).
I tried something like this (written for a single row, I was thinking of creating a for loop using this) but it doesn't seem to do what I want:
DfA_i = pd.DataFrame(DA.iloc[i])
DfB_j = pd.DataFrame(DB.iloc[j])
B = DfB_j.values
DfC['Criteria'] = DfA_i.apply(lambda x: len(np.where(x.values <= B)), axis=1)
dv = dt_dens.values
if dv[1] < 1:
DF = DA.drop(i)
I hope I made my problem clear enough and sorry for any mistakes. Thanks for any help.
Let's try:
dfB.loc[:, dfB.le(dfA.values).any()]
Explanation: dfA.values returns a numpy array with shape (720, 1). dfB.le(dfA.values) then checks each column of dfB against that single column of dfA and returns a boolean dataframe of the same shape as dfB, True wherever the value in dfB is less than or equal to the value in dfA. Finally, .any() reduces each column to a single boolean that is True if the column contains at least one True, and .loc keeps only those columns.
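Here is a small sketch of the same idea that also produces the per-column counts the question asks about, following the question's condition (value in B less than or equal to value in A). The data is made up, and the comparison is written against the single column of dfA as a Series with axis=0, which assumes both frames share the same index.

```python
import pandas as pd

# Hypothetical data: dfA has one column, dfB has three, same number of rows.
dfA = pd.DataFrame({"a": [5, 5, 5]})
dfB = pd.DataFrame({
    "b1": [6, 7, 8],  # never <= 5
    "b2": [9, 5, 9],  # one value <= 5
    "b3": [1, 2, 3],  # all values <= 5
})

# Compare every column of dfB with the single column of dfA, row by row.
mask = dfB.le(dfA.iloc[:, 0], axis=0)

# Number of cells per column that satisfy the condition.
counts = mask.sum()

# Keep only the columns with at least one hit.
kept = dfB.loc[:, mask.any()]

print(counts, kept, sep="\n")
```

Filtering on counts > 0 instead of mask.any() selects the same columns.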
How about this:
pd.DataFrame(np.where(B.to_numpy() <= A.to_numpy(), 1, np.nan), columns=B.columns, index=A.index).dropna(axis=1, how='all')
You can replace the np.nan in the np.where call with whatever value you wish, including keeping the original values of dataframe 'B'.