How can I rearrange a pandas dataframe into this specific configuration? - python

I'm trying to rearrange a pandas dataframe that looks like this:
[screenshot of the original dataframe]
into a dataframe that looks like this:
[screenshot of the desired dataframe]
Each original row becomes several new rows: the first two columns are unchanged, the third column holds the name of the original column the value came from, and the fourth column holds the corresponding float value (e.g. 20.33333).
I don't think this is a pivot table, but I'm not sure how exactly to get this cleanly. Apologies if this question has been asked before, I can't seem to find what I'm looking for. Apologies also if my explanation or formatting were less than ideal! Thanks for your help.

I think you need DataFrame.melt, followed by GroupBy.size if you need counts per group of the three columns:
df1 = df.melt(id_vars=['CentroidID_O', 'CentroidID_D'], var_name='dt_15')
df2 = (df1.groupby(['CentroidID_O', 'CentroidID_D', 'dt_15'])
          .size()
          .reset_index(name='counts'))
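A minimal sketch of the melt step on made-up data; the column names (CentroidID_O, CentroidID_D, dt_15) follow the answer, but the values and time columns are invented:

```python
import pandas as pd

df = pd.DataFrame({
    'CentroidID_O': ['A', 'A'],
    'CentroidID_D': ['B', 'C'],
    '08:00': [20.3, 11.5],
    '08:15': [18.7, 12.1],
})

# Wide -> long: each time column becomes a row, labelled in 'dt_15'.
df1 = df.melt(id_vars=['CentroidID_O', 'CentroidID_D'], var_name='dt_15')
print(df1)
# df1 has one row per (original row, time column) pair, with columns
# CentroidID_O, CentroidID_D, dt_15, value.
```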

Related

How add or merge duplicate rows and columns

I have a data frame with over 2000 rows, but the first column contains a number of duplicates. I want to add up the data in the duplicate rows. An example of the data is shown below:
[screenshot of the example data]
Considering your case, I think this is what you are expecting: it collapses the duplicate rows into a single row containing their sum.
df2 = df.groupby('Player').sum()
print(df2)
You can also explicitly specify which column to apply the sum() to. The example below sums the Gls column from the image you provided.
df2 = df.groupby('Player')['Gls'].sum()
print(df2)
Hope this helps your case; if not, please feel free to comment. Thanks.
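A small self-contained sketch of the groupby-and-sum idea; the column names (Player, Gls) follow the answer, but the players and numbers here are made up:

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['Messi', 'Messi', 'Ronaldo'],
    'Gls': [2, 3, 1],
})

# Duplicate 'Player' rows collapse into one row per player,
# with their Gls values added together.
totals = df.groupby('Player')['Gls'].sum()
print(totals)
# Messi's two rows (2 + 3) become a single value of 5.
```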

How can I transfer columns of a table to rows in python?

One ID can have multiple dates and results spread sideways across columns, and I want them stacked into a single Date column and a single Result column. How can I transfer the columns of this table into rows?
[Screenshot: the table which needs to be transposed]
[Screenshot: how I want it to look]
This seems to work, not sure if it's the best solution:
df2 = pd.concat([df.loc[:, ['ID', 'Date', 'Result']],
                 df.loc[:, ['ID', 'Date1', 'Result1']].rename(columns={'Date1': 'Date', 'Result1': 'Result'}),
                 df.loc[:, ['ID', 'Date2', 'Result2']].rename(columns={'Date2': 'Date', 'Result2': 'Result'})
                 ]).dropna().sort_values(by='ID')
It's just separating the dataframes, concatenating them together inline, removing the NAs and then sorting.
If you are looking to transpose data from pandas you could use pandas.DataFrame.pivot; the documentation there has more examples of the syntax.
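A runnable sketch of the concat approach on a toy frame with the suffixed columns the answer assumes (Date/Date1, Result/Result1); the IDs, dates, and results are invented:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'Date': ['2020-01-01', '2020-02-01'],
    'Result': ['pos', 'neg'],
    'Date1': ['2020-03-01', None],
    'Result1': ['neg', None],
})

# Split into one frame per (Date, Result) pair, rename the suffixed
# columns to a common name, then stack them and drop the empty rows.
pieces = [df.loc[:, ['ID', 'Date', 'Result']],
          df.loc[:, ['ID', 'Date1', 'Result1']]
            .rename(columns={'Date1': 'Date', 'Result1': 'Result'})]
stacked = pd.concat(pieces).dropna().sort_values(by='ID')
print(stacked)
# ID 1 contributes two rows; ID 2 only one, since its Date1/Result1
# were NaN and got dropped.
```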

Replacing one column onto another in python/pandas, but keeping the replaced columns values if the replacing column has a NaN value?

I have two columns in a data frame that I want to merge together. The attached image shows the columns:
[Image: the two columns I want to merge]
I want the "precio_uf_y" column to take precedence over the "precio_uf_x" column in a new column, but where "precio_uf_y" has a NaN value I want the new column to take the value from "precio_uf_x" instead. My ideal new merged column would look like this:
[Image: desired new column]
I have tried different merge functions, and taking min and max with numpy, but maybe there is a way to write a function with these parameters?
Thank you in advance for any help.
You can use df.apply.
import numpy as np

def get_new_val(x):
    if np.isnan(x.precio_uf_y):
        return x.precio_uf_x
    else:
        return x.precio_uf_y

df["new_precio_uf"] = df.apply(get_new_val, axis=1)
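The row-wise apply above works; pandas also offers a vectorized equivalent via Series.fillna, which keeps the y value where present and falls back to x otherwise. A sketch on made-up numbers, using the question's column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'precio_uf_x': [100.0, 200.0, 300.0],
    'precio_uf_y': [110.0, np.nan, 330.0],
})

# Take precio_uf_y where it exists, precio_uf_x where y is NaN.
df['new_precio_uf'] = df['precio_uf_y'].fillna(df['precio_uf_x'])
print(df['new_precio_uf'].tolist())
# [110.0, 200.0, 330.0]
```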

I have pandas dataframe which i would like to be sliced after every 4 columns

I have a pandas dataframe which I would like to be sliced after every 4 columns, with the slices then stacked vertically on top of each other, keeping the date as the index. Is this possible using np.vstack()? Thanks in advance!
[Image: the original dataframe]
I want something like this:
[Image: the desired result]
Until you provide a Minimal, Complete, and Verifiable example I can't test this answer, but the following should work.
Given that the data is stored in a pandas DataFrame called df, we can use pd.melt:
moltendfs = []
for i in range(4):
    moltendfs.append(df.iloc[:, i::4].reset_index().melt(id_vars='date'))

newdf = pd.concat(moltendfs, axis=1)
We use iloc to take only every fourth column, starting with the i-th column. Then we reset_index in order to be able to keep the date column as our identifier variable. We use melt in order to melt our DataFrame. Finally we simply concatenate all of these molten DataFrames together side by side.
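A self-contained sketch of the same loop on a hypothetical frame of 8 columns (two blocks of 4) with a date index, since the answer assumes a 'date' column is available as the identifier; the column names and values are invented:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8],
     [9, 10, 11, 12, 13, 14, 15, 16]],
    columns=['a1', 'b1', 'c1', 'd1', 'a2', 'b2', 'c2', 'd2'],
    index=pd.Index(['2020-01-01', '2020-01-02'], name='date'),
)

moltendfs = []
for i in range(4):
    # every fourth column, starting at column i (a's, then b's, ...)
    moltendfs.append(df.iloc[:, i::4].reset_index().melt(id_vars='date'))

newdf = pd.concat(moltendfs, axis=1)
print(newdf.shape)
# Each molten frame has 4 rows (2 dates x 2 blocks) and 3 columns
# (date, variable, value); four of them side by side give 12 columns.
```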

groupby based on conditions

I'm working with my data; here's what it looks like.
I wrote my code like this:
complete_data = complete_data.groupby(['STDR_YM_CD', 'TRDAR_CD' ]).sum().reset_index()
After executing the code I got the dataframe shown in the picture below.
But I want to aggregate the values based on the first three characters of the SVC_INDUTY_CD column, as in the picture below.
Here is my data link:
http://blogattach.naver.com/c356df6c7f2127fbd539596759bfc1bd1848b453f1/20170316_215_blogfile/khm2963_1489653338468_dtPz6k_csv/test2.csv?type=attachment
Thanks in advance.
I'm sure there's a better way but this is one way you could do this:
complete_data['first_three_temp'] = complete_data['SVC_INDUTY_CD'].str[:3]
complete_data = complete_data.groupby(['STDR_YM_CD', 'TRDAR_CD', 'first_three_temp'], as_index=False).sum()
complete_data.drop('first_three_temp', axis=1, inplace=True)
This will add a temporary column containing only the first three characters of your SVC_INDUTY_CD column. You can then group by that temporary column (along with the others) and drop it afterwards. As I said, I'm sure there's a more efficient way, so this may be slow on a large dataset.
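A minimal sketch of the temporary-column approach; the column names follow the question, but the rows here are made up, and only a hypothetical numeric SALES column is summed to keep the example self-contained:

```python
import pandas as pd

complete_data = pd.DataFrame({
    'STDR_YM_CD': ['201601', '201601'],
    'TRDAR_CD': ['D1', 'D1'],
    'SVC_INDUTY_CD': ['CS100001', 'CS100002'],
    'SALES': [10, 20],
})

# Key on the first three characters of SVC_INDUTY_CD.
complete_data['first_three_temp'] = complete_data['SVC_INDUTY_CD'].str[:3]
out = (complete_data
       .groupby(['STDR_YM_CD', 'TRDAR_CD', 'first_three_temp'],
                as_index=False)['SALES'].sum())
print(out)
# Both rows share the prefix 'CS1', so they collapse into one row
# with SALES summed to 30.
```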
