I have a dataframe and am trying to set the index to the column 'Timestamp'. Currently the index is just a row number. An example of Timestamp's format is: 2015-09-03 16:35:00
I've tried to set the index:
df.set_index('Timestamp')
I don't get an error, but when I print the dataframe, the index is still the row number. How can I use Timestamp as the index?
You need to either specify inplace=True, or assign the result to a variable. Try:
df.set_index('Timestamp', inplace=True, drop=True)
Basically, there are two things you might want when you set the index. One is new_df = old_df.set_index('Timestamp', inplace=False), i.e. you want a new DataFrame with the new index while keeping the original DataFrame unchanged. The other is df.set_index('Timestamp', inplace=True), for when you want to modify the existing object.
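A minimal sketch of the two options, assuming a small made-up DataFrame (the sample rows and the 'value' column are hypothetical; only the 'Timestamp' column name comes from the question):

```python
import pandas as pd

# Hypothetical sample data; only the 'Timestamp' column name is from the question.
df = pd.DataFrame({
    "Timestamp": ["2015-09-03 16:35:00", "2015-09-03 16:40:00"],
    "value": [1, 2],
})

# Option 1: set_index returns a NEW DataFrame; df itself is left untouched.
new_df = df.set_index("Timestamp")
print(df.index.tolist())   # df still has the default row numbers
print(new_df.index.name)   # new_df is indexed by 'Timestamp'

# Option 2: rebind the same name. The visible effect matches inplace=True.
df = df.set_index("Timestamp")
print(df.index.name)
```

Calling df.set_index('Timestamp') on its own line, as in the question, computes the result of option 1 and then discards it, which is why the printed DataFrame looked unchanged.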
To add to the accepted answer:
Remember that you might need to convert your timestamp column to datetime first!
df = pd.read_csv(dataFile)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index("timestamp", inplace=True, drop=True)
df.info()
References:
https://www.geeksforgeeks.org/python-pandas-dataframe-set_index/
how set column as date index?
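As a rough illustration of why the conversion matters, the sketch below reads a tiny in-memory CSV (the CSV text and the 'value' column are made up for the example) and shows that the index becomes a DatetimeIndex only after pd.to_datetime, which in turn allows selecting rows by time range:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for dataFile.
csv_text = (
    "timestamp,value\n"
    "2015-09-03 16:35:00,1\n"
    "2015-09-03 16:40:00,2\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Right after reading, the column holds plain strings, not datetimes.
was_datetime = pd.api.types.is_datetime64_any_dtype(df["timestamp"])

df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")

# The index is now a DatetimeIndex, so label slicing by time works.
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)
subset = df.loc["2015-09-03 16:35":"2015-09-03 16:37"]
print(subset)
```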
I have a dataframe that looks like this. What I want to do is:
Sort my DF by DateTime
After sorting my DF by date, add a new column that counts and accumulates values for EACH row name in "Cod_atc".
The problem is that every time I add this new column, no matter what I do, I cannot get my DF sorted by DateTime.
This is the code I am using. I am just adding a column called "DateTime" and sorting everything by that column. The problem appears when I add the new column called "count".
df1['DateTime'] = pd.to_datetime(df1['local_date'])
df1.sort_values(by='DateTime')
df1['count']=df1.groupby(['cod_atc']).cumcount() #sort=False
df1
This is the result I get. The problem is that if I try to sort my DF by DateTime again, the sort works, but the "count" column no longer makes any sense! The "count" column should count and accumulate values for EACH row name in "COD_Atc", but following the DATETIME order!
Did you forget to pass inplace=True when you sorted df1?
Without it, sort_values returns a sorted copy that is never stored, so df1 keeps its original order and the sort step is lost.
df1['DateTime'] = pd.to_datetime(df1['local_date'])
df1.sort_values(by='DateTime', inplace=True)
df1['count']=df1.groupby(['cod_atc']).cumcount() #sort=False
df1
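The following sketch shows the same fix on a small invented dataset (the rows are hypothetical; the column names follow the question). It assigns the sorted result back instead of using inplace=True, which has the same effect here:

```python
import pandas as pd

# Hypothetical rows, deliberately out of date order.
df1 = pd.DataFrame({
    "local_date": ["2021-01-03", "2021-01-01", "2021-01-02", "2021-01-04"],
    "cod_atc": ["A", "A", "B", "A"],
})

df1["DateTime"] = pd.to_datetime(df1["local_date"])

# Keep the sorted result; without this assignment the sort is discarded.
df1 = df1.sort_values(by="DateTime")

# cumcount numbers the rows within each cod_atc group in their current order,
# so it now follows the DateTime order.
df1["count"] = df1.groupby("cod_atc").cumcount()
print(df1[["DateTime", "cod_atc", "count"]])
```

Because cumcount follows the row order at the time it is called, the sort has to be stored before the count column is created.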
I have a pandas data frame like the following.
                     colName
date
2020-06-02 03:00:00       39
I can get the value of each entry of colName using the following. How do I get the date value?
for index, row in max_items.iterrows():
    print(str(row['colName']))
    # How to get date??
Anti-pattern Warning
First, I want to highlight that this is an anti-pattern: iterating over a DataFrame row by row is highly counterproductive.
Cases where you really need to iterate through a pandas DataFrame are extremely rare. In most situations map, apply and applymap achieve the same result more efficiently.
Coming to the issue at hand:
You need to convert your index to datetime if it is not already. Once the date is the index, the index variable that iterrows yields for each row is that row's date.
Simple example:
# Creating the dataframe
df1 = pd.DataFrame({'date':pd.date_range(start='1/1/2018', end='1/03/2018'),
'test_value_a':[5, 6, 9],
'test_value_b':[2, 5, 1]})
# Converting the date column into an index of type datetime.
df1.index = pd.to_datetime(df1.date)
# Dropping the date column we had created (drop returns a copy, so assign it back)
df1 = df1.drop(labels='date', axis="columns")
To get the date, month, month name, day or day name:
df1.index.date
df1.index.month
df1.index.month_name()
df1.index.day
df1.index.day_name()
I would suggest reading about loc and iloc in the pandas documentation, which should help (ix, which older answers mention, has been removed from current pandas versions).
I hope I didn't veer off from the crux of the question.
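To tie this back to the original loop, here is a small sketch that rebuilds the one-row frame from the question (the construction of max_items is an assumption about how the data looks) and reads the date both inside iterrows and without iterating:

```python
import pandas as pd

# Assumed reconstruction of the frame shown in the question:
# 'date' is the index, 'colName' is the only column.
max_items = pd.DataFrame(
    {"colName": [39]},
    index=pd.DatetimeIndex(["2020-06-02 03:00:00"], name="date"),
)

# Inside iterrows, the first element of each pair is the index label,
# which here is the date.
for index, row in max_items.iterrows():
    print(index, row["colName"])

# Without iterating: the dates are simply the index.
dates = max_items.index.tolist()
print(dates)
```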
I have the following code which imports a CSV file. There are 3 columns and I want to set the first two of them to variables. When I set the second column to the variable "efficiency" the index column is also tacked on. How can I get rid of the index column?
df = pd.DataFrame.from_csv('Efficiency_Data.csv', header=0, parse_dates=False)
energy = df.index
efficiency = df.Efficiency
print efficiency
I tried using
del df['index']
after I set
energy = df.index
which I found in another post but that results in "KeyError: 'index' "
When writing to and reading from a CSV file, include the arguments index=False and index_col=False, respectively. An example follows:
To write:
df.to_csv(filename, index=False)
and to read from the CSV:
df = pd.read_csv(filename, index_col=False)
This should prevent the issue so you don't need to fix it later.
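A quick round-trip sketch of that advice, using an in-memory buffer in place of a real file (the 'Energy' column and the sample values are hypothetical; 'Efficiency' is the column named in the question):

```python
import io
import pandas as pd

# Hypothetical data; an in-memory buffer stands in for the CSV file.
df = pd.DataFrame({"Energy": [1.0, 2.0], "Efficiency": [0.5, 0.7]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # do not write the index as an extra column
print(buffer.getvalue())

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=False)  # do not use any column as the index
print(restored)
```

The written text contains only the two data columns, so nothing index-like comes back when the file is read again.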
df.reset_index(drop=True, inplace=True)
DataFrames and Series always have an index. Although it displays alongside the column(s), it is not a column, which is why del df['index'] did not work.
If you want to replace the index with simple sequential numbers, use df.reset_index(drop=True). Without drop=True, reset_index keeps the old index as a new column.
To get a sense for why the index is there and how it is used, see e.g. 10 minutes to Pandas.
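A short sketch of the point above, assuming a made-up 'Efficiency' column with an arbitrary index. It shows that the index is not among the columns, and a few ways to get at the values without it:

```python
import pandas as pd

# Hypothetical data with a non-default index.
df = pd.DataFrame({"Efficiency": [0.5, 0.7]}, index=[10, 20])

# The index is displayed next to the data but is not a column,
# which is why del df['index'] raises a KeyError.
has_index_column = "index" in df.columns

# Just the values, as a NumPy array, with no index attached.
values_only = df["Efficiency"].to_numpy()

# Printing a Series without showing its index.
print(df["Efficiency"].to_string(index=False))

# Replacing the index with 0, 1, 2, ... and discarding the old one.
flat = df.reset_index(drop=True)
```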
You can set one of the columns as an index in case it is an "id" for example.
In this case the index column will be replaced by one of the columns you have chosen.
df.set_index('id', inplace=True)
If your problem is the same as mine, where you just want to reset the column headers to 0, 1, 2, ... up to the number of columns, do
df = pd.DataFrame(df.values)
EDIT:
Not a good idea if you have heterogeneous data types. Better to just use
df.columns = range(len(df.columns))
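A small sketch of why the second approach is safer with mixed types, assuming a made-up frame with one integer and one string column:

```python
import pandas as pd

# Hypothetical frame with heterogeneous column types.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Going through df.values forces a single common dtype for the whole array,
# so the integer column ends up as generic Python objects.
as_values = pd.DataFrame(df.values)
print(as_values.dtypes)

# Renaming the columns in place keeps each column's own dtype.
df.columns = range(len(df.columns))
print(df.dtypes)
```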
You can specify which column is the index in your CSV file by using the index_col parameter of the from_csv function.
If this doesn't solve your problem, please provide an example of your data.
One thing that I do is df=df.reset_index()
then df=df.drop(['index'],axis=1)
To remove the default index column, or to avoid creating it, you can set index_col to False and keep header as 0. Here is an example of how you can do it.
recording = pd.read_excel("file.xls",
sheet_name= "sheet1",
header= 0,
index_col= False)
Setting header=0 makes the first row of the sheet the column headers, so you can use those names later to refer to the columns.
It works for me this way:
Df = data.set_index("name of the column header to start as index column" )
I'd like to replace some values in the first column of a dataframe with a dummy.
df[[0]].replace(["x"], ["dummy"])
The problem here is that the values in the first column are replaced, but not as part of the dataframe.
print(df)
yields the dataframe with the original data in column 1. I've tried
df[(df[[0]].replace(["x"], ["dummy"]))]
which doesn't work either.
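replace returns a copy of the data by default, so you need to either assign the result back to df or pass inplace=True. Note that df[[0]] is itself a new object, so calling replace(..., inplace=True) on it would not change df. Call replace on df itself and restrict it to column 0:
df.replace({0: {"x": "dummy"}}, inplace=True)
or
df[0] = df[0].replace(["x"], ["dummy"])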
see the docs
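A minimal sketch of the behaviour, assuming a made-up two-column frame whose columns are labelled 0 and 1:

```python
import pandas as pd

# Hypothetical data; column labels 0 and 1 mirror the question's df[[0]].
df = pd.DataFrame({0: ["x", "y", "x"], 1: ["x", "b", "c"]})

# replace returns a new object. If the result is not stored, df is unchanged.
df[[0]].replace(["x"], ["dummy"])
still_original = df[0].tolist()

# Assigning the result back to the column makes the change stick,
# and only column 0 is affected.
df[0] = df[0].replace(["x"], ["dummy"])
print(df)
```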