I have a dataframe with 78 columns, but I want to melt just 10 consecutive ones. Is there any way to select that range of columns and leave the others as they are?
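A minimal sketch of one way to do it, assuming the 10 consecutive columns can be selected by position; the placeholder frame and column names below are made up for illustration:

import pandas as pd
import numpy as np

# Placeholder frame standing in for the 78-column dataframe
df = pd.DataFrame(np.arange(40).reshape(2, 20),
                  columns=[f'col{i}' for i in range(20)])

value_cols = df.columns[5:15]                # the 10 consecutive columns to melt
id_cols = df.columns.difference(value_cols)  # every other column is kept as an identifier

melted = df.melt(id_vars=id_cols, value_vars=value_cols)
print(melted.head())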
I have a dataset that looks like this:
df = pd.DataFrame([[1,1,5,4],[1,1,6,3]], columns=['date','site','chemistry','measurement'])
df
I'm looking to transform this dataset so that the values in the chemistry and measurement columns become separate columns and the repeated values in the other columns become a single row like this:
new_df = pd.DataFrame([[1,1,4,3]], columns=['date','site','5','6'])
new_df
I've tried some basic things like df.transpose() and pd.pivot(), but they don't get me what I need.
The pivot is closer but still not the format I'm looking for.
I'm imagining there's a way to loop through the dataframe to do this, but I'm not sure how. Any suggestions?
Try this:
df.set_index(['date','site','chemistry'])['measurement'].unstack().reset_index()
Output:
chemistry date site 5 6
0 1 1 4 3
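An alternative that gives the same shape, using pivot_table (a sketch against the df from the question; note that pivot_table aggregates duplicate index/column pairs with the mean by default):

import pandas as pd

df = pd.DataFrame([[1,1,5,4],[1,1,6,3]],
                  columns=['date','site','chemistry','measurement'])

out = (df.pivot_table(index=['date', 'site'], columns='chemistry', values='measurement')
         .reset_index())
print(out)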
I am aware of the function DataFrame.dropna(subset), where the subset argument can be used to remove NaN rows only from the given set of columns.
What I want is to remove NaN rows based on all columns excluding a set of columns. Is there a way to do this in pandas?
Use Index.difference with the list of columns to exclude:
df = df.dropna(subset=df.columns.difference(exclude_columns))
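For example, a small sketch with made-up column names:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, np.nan, 3],
                   'b': [np.nan, 2, 3],
                   'c': [1, 2, np.nan]})
exclude_columns = ['c']  # NaNs in 'c' should not cause a row to be dropped

# Drop rows that have NaN in any column except the excluded ones
df = df.dropna(subset=df.columns.difference(exclude_columns))
print(df)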
I want to add the row values of three different columns in pandas, like:
dctr  mctr  tctr
100   20    10
20    90    70
30    10    80
40    05    120
50    20    60
I want to add these three columns row-wise into a total_ctr column. What command should I use in pandas?
Like this, I have seven totals ("total_ctr", "total_cpc", "total_avg", "total_cost" and so on), and I want to put these seven values into a new dataframe. Is that possible?
I know there's a similar question on sum of rows, but I've not managed to get that one to work for this problem.
This will work, assuming the above is a DataFrame named df:
df['total_ctr'] = df.sum(axis=1)
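If the frame holds more columns than these three, it is safer to sum only the intended columns explicitly; and the follow-up idea of collecting the totals into a new DataFrame could look like the sketch below (the column selection and the totals frame are assumptions, not part of the original answer):

import pandas as pd

df = pd.DataFrame({'dctr': [100, 20, 30, 40, 50],
                   'mctr': [20, 90, 10, 5, 20],
                   'tctr': [10, 70, 80, 120, 60]})

# Sum only the intended columns, in case df has others
df['total_ctr'] = df[['dctr', 'mctr', 'tctr']].sum(axis=1)

# Collect the grand totals into a one-row DataFrame
# (other totals such as total_cpc would be added the same way)
totals = pd.DataFrame([{'total_ctr': df['total_ctr'].sum()}])
print(totals)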
I have two DataFrames with the first df:
indegree interrupts Subject
1 2 Weather
2 3 Weather
4 5 Weather
The second DataFrame, join:
Subject interrupts_mean indegree_mean
weather 2 3
But the second is a lot shorter, since it holds the means for each distinct subject from the first DataFrame.
When I want to merge both DataFrames
pd.merge(df,join,left_index=True,right_index=True,how='left')
it merges, but the columns from the second DataFrame come out as NaN in the new DataFrame, and I suppose that is because the DataFrames are not the same length. How can I merge on subject so that the values from the second DataFrame are repeated in the new DataFrame?
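A minimal sketch of merging on the Subject column instead of the index (an assumption about the intended fix; note that the key values must match exactly, including case, so the 'Weather' vs 'weather' spelling in the samples above would still give NaNs):

import pandas as pd

df = pd.DataFrame({'indegree': [1, 2, 4],
                   'interrupts': [2, 3, 5],
                   'Subject': ['Weather', 'Weather', 'Weather']})
join = pd.DataFrame({'Subject': ['Weather'],  # spelled to match df exactly
                     'interrupts_mean': [2],
                     'indegree_mean': [3]})

# Merging on the shared key repeats the means for every matching row
merged = pd.merge(df, join, on='Subject', how='left')
print(merged)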
I'm creating columns with aggregated values from a Pandas DataFrame using the groupby() and reset_index() functions like this:
df=data.groupby(["subscription_id"])["count_boxes"].sum().reset_index(name="amount_boxes")
df1=data.groupby(["subscription_id"])["product"].count().reset_index(name="count_product")
I want to combine all these aggregated columns ("amount_boxes" and "count_product") into one dataframe keyed by the groupby column "subscription_id". Is there any way to do that within a single call rather than merging the dataframes?
Let's look at using .agg with a dictionary mapping columns to aggregation functions.
(data.groupby('subscription_id')
.agg({'count_boxes':'sum','product':'count'})
.reset_index()
.rename(columns={'count_boxes':'amount_boxes','product':'count_product'}))
Sample Output:
subscription_id amount_boxes count_product
0 1 16 2
1 2 39 6
2 3 47 7
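On pandas 0.25 or later, the same result can be written with named aggregation, which avoids the separate rename step (a sketch with dummy data standing in for the poster's frame):

import pandas as pd

data = pd.DataFrame({'subscription_id': [1, 1, 2],
                     'count_boxes': [3, 5, 7],
                     'product': ['a', 'b', 'c']})

result = (data.groupby('subscription_id')
              .agg(amount_boxes=('count_boxes', 'sum'),
                   count_product=('product', 'count'))
              .reset_index())
print(result)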