This question already has answers here:
Convert Pandas Column to DateTime
(8 answers)
Closed 2 years ago.
I have a dataset with one column that I want to change to date-time format. If I use this:
df = pd.to_datetime(df['product_first_sold_date'],unit='d',origin='1900-01-01')
df will only have this one particular column while all others are removed. Instead, I want to keep the remaining columns unchanged and just apply the to_datetime function to one column.
I tried using loc in multiple ways, including this:
df.loc[df['product_first_sold_date']] = pd.to_datetime(df['product_first_sold_date'],unit='d',origin='1900-01-01')
but it throws a key error.
How else can I achieve this?
df['product_first_sold_date'] = pd.to_datetime(df['product_first_sold_date'],unit='d',origin='1900-01-01')
should work, I think. Assigning the result back to df['product_first_sold_date'] replaces just that column and leaves the rest of the DataFrame intact.
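A runnable sketch of that assignment, with made-up serial day numbers standing in for the real data:

```python
import pandas as pd

# Toy frame standing in for the real data: day counts since 1900-01-01
df = pd.DataFrame({
    "customer_id": [1, 2],
    "product_first_sold_date": [41245.0, 41701.0],
})

# Assigning the result back converts only this column;
# every other column is left untouched
df["product_first_sold_date"] = pd.to_datetime(
    df["product_first_sold_date"], unit="d", origin="1900-01-01"
)

print(df.dtypes["product_first_sold_date"])  # datetime64[ns]
```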
This question already has answers here:
Reversing 'one-hot' encoding in Pandas
(9 answers)
Closed 1 year ago.
I've been trying to use reverse explode from here: How to implode(reverse of pandas explode) based on a column
But my df is a little different.
I have a df looking like this:
I need to 'reverse explode' it, but I couldn't find an option to group by the index. Is there a way to do that?
To be precise, I need all the columns to remain, but all the '1' values should be combined into a single row.
I merged the dummy df with the main df, but cannot figure out what to do next.
rest_cuisine_style = pd.concat([rest_cuisine_style, cuisine_dummies], axis=1)
Does this work?
rest_cuisine_style = rest_cuisine_style.idxmax(axis=1)
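A minimal sketch of what idxmax(axis=1) does, with a hypothetical dummy frame (the cuisine names are made up) holding a single 1 per row:

```python
import pandas as pd

# Hypothetical one-hot frame: exactly one 1 per row
dummies = pd.DataFrame({
    "Italian": [1, 0, 0],
    "French":  [0, 1, 0],
    "Vegan":   [0, 0, 1],
})

# idxmax(axis=1) returns, for each row, the label of the column
# holding the maximum value, i.e. the column containing the 1
labels = dummies.idxmax(axis=1)
print(labels.tolist())  # ['Italian', 'French', 'Vegan']
```

Note that assigning the result back to rest_cuisine_style replaces the whole frame with a single Series; assigning it to a new column instead, e.g. rest_cuisine_style['cuisine'] = cuisine_dummies.idxmax(axis=1), keeps the other columns.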
This question already has answers here:
Pandas fill missing values in dataframe from another dataframe
(6 answers)
Closed 1 year ago.
I am trying to replace the values of my second dataframe ('area') with those of my first dataframe ('test').
Image of my inputs:
The catch is that I only want to replace the values that are not NaN, so, for example, area.iloc[0,1] will be "6643.68" rather than "3321.84" but area.iloc[-2,-1] will be "19.66" rather than "NaN". I would have thought I could do something like:
area.loc[test.notnull()] = test
or
area.replace(area.loc[test.notnull()], test.notnull())
But this gives me the error "Cannot index with multidimensional key". Any ideas? This should be simple.
Use fillna, but mind the direction: area.fillna(test) keeps area's existing values and only fills in its NaNs. Since you want test's values to take priority and fall back to area only where test is NaN, call it the other way around:
test.fillna(area)
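A minimal sketch with hypothetical single-column frames; fillna keeps the caller's existing values, so the frame whose values should win goes first:

```python
import pandas as pd
import numpy as np

# Hypothetical stand-ins for the two frames in the question
test = pd.DataFrame({"a": [6643.68, np.nan]})
area = pd.DataFrame({"a": [3321.84, 19.66]})

# Take test's values where present, fall back to area where test is NaN
result = test.fillna(area)
print(result["a"].tolist())  # [6643.68, 19.66]
```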
This question already has answers here:
Accessing a Pandas index like a regular column
(3 answers)
Closed 1 year ago.
I am working on this dataset where I have used the value_counts() function to get two distinct columns: the index and the counts. I can even rename the index and the values column. However, after renaming the columns and assigning everything to a new dataframe, I am not able to call the index column by the name I assigned to it earlier.
pm_freq = df["payment_method"].value_counts()
pm_freq = pm_freq.head().to_frame()
pm_freq.index.name="Method"
pm_freq.rename(columns={"payment_method":"Frequency"},inplace=True)
pm_freq
Now I want to call it like this:
pm_freq["Method"]
But there's an error which states:
KeyError: 'Method'
Can someone please help me out with this?
Check out the comment here, not sure if still correct:
https://stackoverflow.com/a/18023468/15600610
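The gist of that answer: after value_counts() the labels live in the index, not in a column, which is why pm_freq["Method"] raises a KeyError; reset_index() promotes the index into a real column. A sketch with made-up payment data:

```python
import pandas as pd

# Hypothetical data standing in for the real dataset
df = pd.DataFrame(
    {"payment_method": ["card", "card", "cash", "card", "cash", "wallet"]}
)

pm_freq = df["payment_method"].value_counts().head().to_frame()
pm_freq.index.name = "Method"
pm_freq.columns = ["Frequency"]  # version-safe alternative to rename()

# The labels are in the index, not a column; reset_index makes them one
pm_freq = pm_freq.reset_index()
print(pm_freq["Method"].tolist())  # ['card', 'cash', 'wallet']
```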
This question already has an answer here:
Factorize a column of strings in pandas
(1 answer)
Closed 2 years ago.
I have a dataset consisting of 382 rows and 4 columns. I need to convert all the names into numbers. The names repeat here and there, so I can't just assign numbers at random.
So, I made a dictionary of the names and the corresponding values. But now, I am not able to change the values in the column.
This is how I tried to add the values to the column:
test_df.replace(to_replace = d_loc,value = None, regex = True, inplace = True)
print(test_df)
but test_df just gives me the same dataframe, without any modifications.
What should I use? I have over 100 unique names, so I cannot rename them manually.
df.applymap() works on each item in a dataframe:
test_df.applymap(lambda x: dict_to_replace[x])
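Note that applymap visits every cell in the frame, so any value that isn't a key in the dict (the numbers in the other columns, for instance) raises a KeyError. A minimal sketch, with a hypothetical frame and mapping dict standing in for test_df and d_loc, that maps only the name column instead:

```python
import pandas as pd

# Hypothetical stand-ins for test_df and d_loc
test_df = pd.DataFrame({"name": ["Alice", "Bob", "Alice"], "score": [1, 2, 3]})
d_loc = {"Alice": 0, "Bob": 1}

# Series.map takes a dict directly; only the one column is touched
test_df["name"] = test_df["name"].map(d_loc)
print(test_df["name"].tolist())  # [0, 1, 0]
```

As the duplicate suggests, pd.factorize(test_df["name"]) would also build the codes automatically, with no handwritten dictionary at all.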
This question already has answers here:
How to take column-slices of dataframe in pandas
(11 answers)
Closed 6 years ago.
I have a dataframe with 85 columns and roughly 10,000 rows.
The first column is Shrt_Desc and the last is Refuse_Pct.
The new dataframe has to keep Shrt_Desc, skip some of the columns that follow, and then include the consecutive run of columns from Fiber_TD_(g) through Refuse_Pct.
I use:
dfi_3 = food_info.loc[:, ['Shrt_Desc', 'Fiber_TD_(g)':'Refuse_Pct']]
but it gives a syntax error.
Any ideas how can I achieve this?
Thank you.
Borrowing the main idea from this answer (note that .ix has been removed from modern pandas, so .loc is used instead):
pd.concat([food_info['Shrt_Desc'], food_info.loc[:, 'Fiber_TD_(g)':]], axis=1)
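A runnable sketch on a small stand-in frame (column names borrowed from the question, data made up). Since Refuse_Pct is the last column, the open-ended slice 'Fiber_TD_(g)': and the explicit 'Fiber_TD_(g)':'Refuse_Pct' select the same columns:

```python
import pandas as pd

# Small stand-in frame; the real data has 85 columns
food_info = pd.DataFrame({
    "Shrt_Desc": ["butter", "cheese"],
    "Water_(g)": [15.87, 41.11],
    "Fiber_TD_(g)": [0.0, 0.0],
    "Sugar_Tot_(g)": [0.06, 0.48],
    "Refuse_Pct": [0, 0],
})

# .loc slices labels inclusively at both ends, so this keeps the whole
# run of columns from Fiber_TD_(g) through Refuse_Pct
dfi_3 = pd.concat(
    [food_info["Shrt_Desc"], food_info.loc[:, "Fiber_TD_(g)":"Refuse_Pct"]],
    axis=1,
)
print(list(dfi_3.columns))
# ['Shrt_Desc', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'Refuse_Pct']
```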