Set_index() on a Pandas DataFrame giving unexpected results

I asked this question before, but it was downvoted for being unclear, so I deleted it.
I hope that this reworked version is much clearer!
The buggy code is part of a much larger project, so it's not easy to create a minimal example, especially as I am still fairly new to Python and almost completely new to Pandas, but if required I will try.
all_holdings is part of the portfolio object. Looking at it in the variables window, it appears to be a list of dictionaries (is this correct?).
As you can see from the code, it is then converted into a pandas DataFrame called curve using:
curve = pd.DataFrame(self.all_holdings)
At this point the curve data frame includes the columns 'datetime' and 'total' both containing the correct values from the original list of dicts in self.all_holdings.
However, after performing:
curve.set_index('datetime', inplace=True)
the 'datetime' column has disappeared and the column 'total' now appears to hold the 'datetime' values.
The original values of column 'total' seem to have disappeared as well.
I would have expected the 'datetime' column to become the index (but not for its values to disappear) and everything else to stay the same.
Is this an issue of Python versions? I am using 3.6 versus his 2.7. Also, I'm using pandas 0.22.0, whereas the example uses an unspecified earlier version.

I don't see any issue there. You did set an index on datetime. The total values are indexed on datetime too.
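To illustrate the point, here is a minimal sketch of what set_index does by default. The sample records are made up; only the column names 'datetime' and 'total' come from the question. By default (drop=True) the column is moved into the index rather than deleted, so its values are still there, and a viewer that lists the index as the leftmost column may make it look as though 'total' took over the 'datetime' values.

```python
import pandas as pd

# Made-up stand-in for self.all_holdings (a list of dicts).
all_holdings = [
    {"datetime": pd.Timestamp("2018-01-01"), "total": 100000.0},
    {"datetime": pd.Timestamp("2018-01-02"), "total": 100250.0},
]

curve = pd.DataFrame(all_holdings)
curve.set_index("datetime", inplace=True)

# 'datetime' is no longer a column; it now labels the rows.
print(list(curve.columns))      # only 'total' remains as a column
print(curve.index.name)         # the index carries the name 'datetime'
print(curve["total"].tolist())  # the original totals are untouched

# To keep 'datetime' as a regular column as well, pass drop=False.
kept = pd.DataFrame(all_holdings).set_index("datetime", drop=False)

# To turn the index back into a column later, use reset_index().
restored = curve.reset_index()
```

If the goal is only to inspect the values, printing curve.index next to curve['total'] shows both sets of values side by side.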

Related

unable to sort excel values using pandas [duplicate]

New to Pandas, so maybe I'm missing a big idea?
I have a Pandas DataFrame of register transactions with shape like (500,4):
Time datetime64[ns]
Net Total float64
Tax float64
Total Due float64
I'm working through my code in a Python3 Jupyter notebook. I can't get past sorting any column. Working through the different code examples for sort, I'm not seeing the output reorder when I inspect the df. So, I've reduced the problem to trying to order just one column:
df.sort_values(by='Time')
# OR
df.sort_values(['Total Due'])
# OR
df.sort_values(['Time'], ascending=True)
No matter which column title, or which boolean argument I use, the displayed results never change order.
Thinking it could be a Jupyter thing, I've previewed the results using print(df), df.head(), and HTML(df.to_html()) (the last example is for Jupyter notebooks). I've also rerun the whole notebook from import CSV to this code. And, I'm also new to Python3 (from 2.7), so I get stuck with that sometimes, but I don't see how that's relevant in this case.
Another post has a similar problem, Python pandas dataframe sort_values does not work. In that instance, the ordering was on a column type string. But as you can see all of the columns here are unambiguously sortable.
Why does my Pandas DataFrame not display new order using sort_values?

df.sort_values(['Total Due']) returns a sorted DF, but it doesn't update DF in place.
So do it explicitly:
df = df.sort_values(['Total Due'])
or
df.sort_values(['Total Due'], inplace=True)
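A small sketch of the difference, using made-up register data with the column names from the question: calling sort_values without keeping the result leaves df in its original order, while rebinding the name keeps the sorted frame.

```python
import pandas as pd

# Made-up register data; column names follow the question.
df = pd.DataFrame({
    "Time": pd.to_datetime(["2018-03-03", "2018-03-01", "2018-03-02"]),
    "Total Due": [30.0, 10.0, 20.0],
})

# The call returns a new, sorted DataFrame...
result = df.sort_values(["Total Due"])
# ...while df itself keeps its original row order.
before = df["Total Due"].tolist()

# Rebinding the name keeps the sorted result.
df = df.sort_values(["Total Due"])
after = df["Total Due"].tolist()

print(before)
print(after)
```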

My problem, FYI, was that I wasn't returning the resulting dataframe, so PyCharm wasn't bothering to update said dataframe. Naming the dataframe after the return keyword fixed the issue.
Edit:
I had a bare return at the end of my method instead of
return df
which the debugger must have noticed, because df wasn't being updated in spite of my explicit, in-place sort.
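The method in question was not posted, so the following is only a sketch with hypothetical function names. It shows the one part that is certain: a bare return hands back None, so a caller that writes df = method(df) ends up without a DataFrame, whereas returning the frame lets the caller rebind it.

```python
import pandas as pd


def sort_bare_return(df):
    # Hypothetical method body: sorts in place, then a bare return.
    df.sort_values(["Total Due"], inplace=True)
    return  # a bare return hands back None


def sort_and_return(df):
    # Returns the sorted frame so the caller can rebind its variable.
    return df.sort_values(["Total Due"])


frame = pd.DataFrame({"Total Due": [30.0, 10.0, 20.0]})
nothing = sort_bare_return(frame.copy())
sorted_frame = sort_and_return(frame)

print(nothing)
print(sorted_frame["Total Due"].tolist())
```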

Sort dataframe by absolute value without changing value or adding column

I have a dataframe that's the result of importing a csv and then performing a few operations and adding a column that's the difference between two other columns (column 10 - column 9 let's say). I am trying to sort the dataframe by the absolute value of that difference column, without changing its value or adding another column.
I have seen this syntax over and over all over the internet, with indications that it was a success (accepted answers, comments saying "thanks, that worked", etc.). However, I get the error you see below:
df.sort_values(by='Difference', ascending=False, inplace=True, key=abs)
Error:
TypeError: sort_values() got an unexpected keyword argument 'key'
I'm not sure why the syntax that I see working for other people is not working for me. I have a lot more going on with the code and other dataframes, so I don't think it's a pandas import problem.
I have moved on and just made a new column that is the absolute value of the difference column, sorted by that, and excluded that column from my export to the worksheet, but I would really like to know how to get it to work the other way. Any help is appreciated.
I'm using Python 3.

df.loc[(df.c - df.b).sort_values(ascending = False).index]
This sorts by the difference between "c" and "b" without creating a new column.
I hope this is what you were looking for.
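Note that the line above orders by the signed difference. Since the question asks for the absolute value, the same idea can be extended with .abs(), which also works on pandas versions whose sort_values has no key argument. The data below is made up, and the sketch assumes the index labels are unique (as with the default RangeIndex), since df.loc would otherwise repeat rows.

```python
import pandas as pd

# Made-up data; 'Difference' mirrors the column described in the question.
df = pd.DataFrame({"b": [1, 5, 2], "c": [4, 1, 3]})
df["Difference"] = df["c"] - df["b"]  # 3, -4, 1

# Sort the absolute values, then reuse the resulting index order.
order = df["Difference"].abs().sort_values(ascending=False).index
sorted_df = df.loc[order]

# The stored values keep their sign; only the row order changes.
print(sorted_df["Difference"].tolist())
```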

key is an optional argument.
It accepts a Series as input, so maybe you were working with a DataFrame.
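One more thing worth checking is the installed pandas version: the key parameter of sort_values was added in pandas 1.1.0, and older releases raise exactly the TypeError quoted in the question. A quick check, using a made-up frame:

```python
import pandas as pd

print(pd.__version__)  # key= needs pandas 1.1.0 or newer

df = pd.DataFrame({"Difference": [3, -4, 1]})

try:
    # The key function is applied to the column before sorting.
    by_abs = df.sort_values(by="Difference", ascending=False, key=abs)
except TypeError:
    # Older pandas: sort_values() does not accept key.
    by_abs = None

print(by_abs)
```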

Problem when stores dict into pandas Dataframe

Recently in my project, I need to store a dictionary into a Pandas DataFrame with the code
self.device_info.loc[i,'interference']=[temp_dict]
The device_info is a Pandas DataFrame. The temp_dict is a dictionary and I want it to be stored as an element in the DataFrame for future use. The square bracket is added to ensure there's no error when assigning.
I found out today that with Pandas version 0.22.0, this code packs the dictionary into a list and stores the list in the DataFrame. With version 0.24.2, however, the same code stores the dictionary directly in the DataFrame.
For example, when i=0, after executing the code with pandas 0.22.0,
type(self.device_info.loc[0,'interference'])
returns list, while with pandas 0.24.2 it returns dict. I need consistent behavior: there should always be a dictionary stored.
I am currently working on two PCs, one at home and one at my office, and I cannot update the older version of pandas on my office PC. So I would much appreciate it if anyone could help me figure out why this happens.

Pandas has a from_dict method with many options, which takes a dict as input and returns a DataFrame.
You can choose to infer the type or force it (to str, for example).
After that, manipulating and appending DataFrames is much easier, as you will no longer run into dict object problems in that row or column.
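Separately from the from_dict suggestion, one way to get a dictionary back on both machines is to normalize the value when reading it. This is only a sketch: as_dict is a hypothetical helper name (not a pandas function), the payload is made up, and it assumes the stored cell has one of the two shapes described in the question, either the dict itself or a one-element list wrapping it.

```python
def as_dict(cell):
    """Return the dict stored in a cell, unwrapping a one-element list."""
    if isinstance(cell, list) and len(cell) == 1 and isinstance(cell[0], dict):
        return cell[0]
    return cell


temp_dict = {"channel": 3, "power": -70}  # made-up payload

from_old = as_dict([temp_dict])  # shape reported for pandas 0.22.0
from_new = as_dict(temp_dict)    # shape reported for pandas 0.24.2

print(from_old == from_new)
```

In the question's code this would be applied on read, for example as_dict(self.device_info.loc[0, 'interference']).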
