I have a dataframe that downloads as:
the dataframe with the date header on a separate row
If I export it to a CSV file and import it again, all the headers are on the first row.
If I look up the first row via .iloc[0], I get:
bidopen 1.14140
bidclose 1.14143
bidhigh 1.14160
bidlow 1.14116
askopen 1.14153
askclose 1.14164
askhigh 1.14179
asklow 1.14127
tickqty 5204.00000
Name: 2022-01-14 21:00:00, dtype: float64
Resetting the index does not work.
Essentially, I am trying to select the date column, i.e. df['date'], but with no luck in its current form.
Any help would be greatly appreciated.
This code copies the date index into a column and then resets the index. You will need to import pandas.
df['Date'] = df.index
df.reset_index(drop=True, inplace=True)
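A shorter route may be to call reset_index() without drop=True and keep the result. The sketch below is only an illustration: the sample values and the index name "date" are assumptions based on the header shown on its own row in the question.

```python
import pandas as pd

# Hypothetical data standing in for the downloaded frame: a DatetimeIndex named "date".
idx = pd.to_datetime(["2022-01-14 21:00:00", "2022-01-14 22:00:00"])
df = pd.DataFrame(
    {"bidopen": [1.14140, 1.14150], "bidclose": [1.14143, 1.14155]},
    index=idx,
)
df.index.name = "date"

# reset_index() returns a new frame; the old index becomes a column named after it.
flat = df.reset_index()
print(flat.columns.tolist())
print(flat["date"].iloc[0])
```

If the index has no name, the new column is called "index" instead, so it may need renaming afterwards.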
I have the dataframe below and I am trying to display how many rides there are per day.
But only "near_penn" shows up as a column; "Date" does not.
c = df[['start day','near_penn','Date']]
c=c.loc[c['near_penn']==1]
pre_pandemic_df_new=pd.DataFrame()
pre_pandemic_df_new=c.groupby('Date').agg({'near_penn':'sum'})
print(pre_pandemic_df_new)
print(pre_pandemic_df_new.columns)
Why doesn't it consider "Date" as a column?
How can I make Date a column of "pre_pandemic_df_new"?
I think you can use the to_datetime method.
import pandas as pd
pre_pandemic_df_new["Date"]= pd.to_datetime(pre_pandemic_df_new["Date"])
Hope this works
Why doesn't it consider "Date" as a column?
Because groupby('Date') makes Date the index of the resulting DataFrame, not a column.
How can I make Date as a column of "pre_pandemic_df_new"?
You can try this (reset_index returns a new DataFrame, so assign the result):
pre_pandemic_df_new = pre_pandemic_df_new.reset_index(level=['Date'])
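To see why "Date" disappears from the columns and two ways to get it back, here is a minimal sketch. The ride data is made up; only the column names come from the question.

```python
import pandas as pd

# Hypothetical ride data; column names follow the question.
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "near_penn": [1, 1, 1],
})

# Default groupby: the key becomes the index, so "Date" is not in .columns.
by_index = df.groupby("Date").agg({"near_penn": "sum"})
print(by_index.columns.tolist())

# Option 1: move the index back into a column and keep the result.
as_column = by_index.reset_index()

# Option 2: ask groupby not to use the key as the index in the first place.
direct = df.groupby("Date", as_index=False).agg({"near_penn": "sum"})
print(direct.columns.tolist())
```

Both options should give the same two-column frame; as_index=False just avoids the extra step.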
df[['Date','near_penn']] = df[['Date_new','near_penn_new']]
Once you have created your dataframe, you can try this to add new columns to the end of the dataframe and test whether it works before you make adjustments.
OR
You can check the value in the first row that corresponds to the first "date" row.
These are the first things that came to mind; hope it helps.
I have a pandas data frame like the following.
colName
date
2020-06-02 03:00:00 39
I can get the value of each entry of colName using the following. How do I get the date value?
for index, row in max_items.iterrows():
    print(str(row['colName']))
    # How to get date??
Anti-pattern Warning
First, I want to highlight that this is an anti-pattern: iterating over a DataFrame row by row is usually slow and rarely necessary.
There are very few cases where you need to iterate through a pandas DataFrame. In most of them, map, apply and applymap achieve the same result more efficiently.
Coming to the issue at hand:
You need to convert your index to datetime if it is not already.
Simple example:
import pandas as pd

# Creating the dataframe
df1 = pd.DataFrame({'date': pd.date_range(start='1/1/2018', end='1/03/2018'),
                    'test_value_a': [5, 6, 9],
                    'test_value_b': [2, 5, 1]})
# Converting the date column into an index of type datetime.
df1.index = pd.to_datetime(df1.date)
# Dropping the date column now that it is the index; drop() returns a new frame, so assign it.
df1 = df1.drop(labels='date', axis="columns")
To print the date, month, month name, day or day name:
df1.index.date
df1.index.month
df1.index.month_name()
df1.index.day
df1.index.day_name()
I would suggest reading about loc and iloc in the pandas documentation (ix was removed in pandas 1.0); that should help.
I hope I didn't veer off from the crux of the question.
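For the loop in the question itself: the first element that iterrows() yields is the index label, which here is the date. A small sketch, assuming a frame shaped like the one in the question (the values and the second row are made up):

```python
import pandas as pd

# Hypothetical frame shaped like the one in the question: a datetime index named "date".
max_items = pd.DataFrame(
    {"colName": [39, 41]},
    index=pd.to_datetime(["2020-06-02 03:00:00", "2020-06-03 04:00:00"]),
)
max_items.index.name = "date"

# In iterrows(), the first element of each pair is the index label, i.e. the date.
seen = []
for date, row in max_items.iterrows():
    seen.append((date, row["colName"]))
    print(date, row["colName"])

# Without a loop: the same dates as an ordinary column.
dates = max_items.reset_index()["date"]
```

The loop-free version is usually preferable when the dates are only needed for further column-wise work.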
I have a dataframe and am trying to set the index to the column 'Timestamp'. Currently the index is just a row number. An example of Timestamp's format is: 2015-09-03 16:35:00
I've tried to set the index:
df.set_index('Timestamp')
I don't get an error, but when I print the dataframe, the index is still the row number. How can I use Timestamp as the index?
You need to either specify inplace=True, or assign the result to a variable. Try:
df.set_index('Timestamp', inplace=True, drop=True)
Basically, there are two things that you might want to do when you set the index. One is new_df = old_df.set_index('Timestamp', inplace=False), i.e. you want a new DataFrame that has the new index but also want to keep the original DataFrame unchanged. The other is df.set_index('Timestamp', inplace=True), which is for when you want to modify the existing object.
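The difference is easy to check on a toy frame. This is a sketch with made-up values; only the column name 'Timestamp' comes from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": ["2015-09-03 16:35:00", "2015-09-03 16:36:00"],
    "value": [1, 2],
})

# The result is discarded here, so df keeps its row-number index.
df.set_index("Timestamp")
unchanged = df.columns.tolist()

# Keeping the result gives a frame indexed by Timestamp; df itself is still untouched.
indexed = df.set_index("Timestamp")
print(indexed.index.name)
```

Assigning the result (df = df.set_index('Timestamp')) is an alternative to inplace=True with the same end state.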
To add to the accepted answer:
Remember that you might need to set your timestamp into a datetime!
df = pd.read_csv(dataFile)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index("timestamp", inplace=True, drop=True)
df.info()
References:
https://www.geeksforgeeks.org/python-pandas-dataframe-set_index/
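The three steps above can also be folded into the read_csv call using its parse_dates and index_col parameters. A minimal sketch, with an in-memory string standing in for dataFile (the contents are invented for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for dataFile.
csv_text = "timestamp,value\n2015-09-03 16:35:00,1\n2015-09-03 16:36:00,2\n"

# Parse the timestamp column as datetimes and use it as the index in one call.
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["timestamp"],
    index_col="timestamp",
)
df.info()
```

df.info() should then report a DatetimeIndex rather than a RangeIndex.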
how set column as date index?
I want to read an excel file where the second line is a date in a string format and the first line is the weekday that corresponds to each date, and then change the second line from string to datetime. If I only read the second line as index, and completely skip the first line with the days, I do the following to convert it to a datetime:
Receipts_tbl.columns = pd.to_datetime(Receipts_tbl.columns)
How do I do that if I have a multiindexed dataframe, where the first line of the indices remains as weekdays, and I want the second to be converted to datetime?
Thanx
You didn't give an example of what your data source looks like, so I'm inferring.
If you use pd.read_excel with header=None, it will treat the first two rows as data and you can manipulate them to achieve your goal. Here's a minimal example, with an example "real" data row beneath:
df = pd.DataFrame([['Mon', 'Tues'], ['10-02-1995', '11-23-1997'],
[12, 32]])
# 0 1
#0 Mon Tues
#1 10-02-1995 11-23-1997
#2 12 32
Next, convert the second row (label 1), which holds the date strings, to datetime as you said in your question.
df.loc[1] = pd.to_datetime(df.loc[1])
Create a multi-index from the first two rows, and set it as the dataframe's columns
df.columns = df.T.set_index([0,1]).index.set_names(['DOW', 'Date'])
Lastly, select from the third row (label 2) down, as the first two rows are now in the columns.
df = df.loc[2:].reset_index(drop=True)
df
#DOW Mon Tues
#Date 812592000000000000 880243200000000000
#0 12 32
Note that DOW and Date are now a multilevel index for the columns, and the 'data' rows have been reindexed to start at 0.
Please let me know if I misunderstood your question.
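If the frame already has two-level columns, one option is to rebuild the column MultiIndex and convert only the date level. This is a sketch under assumptions: the level names 'DOW' and 'Date' follow the example above, and the month-day-year format string is inferred from the sample strings.

```python
import pandas as pd

# Hypothetical two-level column header: weekday on top, date string below.
cols = pd.MultiIndex.from_arrays(
    [["Mon", "Tues"], ["10-02-1995", "11-23-1997"]],
    names=["DOW", "Date"],
)
df = pd.DataFrame([[12, 32]], columns=cols)

# Rebuild the column index, converting only the "Date" level to datetime.
df.columns = pd.MultiIndex.from_arrays(
    [
        df.columns.get_level_values("DOW"),
        pd.to_datetime(df.columns.get_level_values("Date"), format="%m-%d-%Y"),
    ],
    names=df.columns.names,
)
print(df.columns.get_level_values("Date"))
```

The weekday level is passed through untouched, so column order and data stay as they were.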
Assuming you have this data in the clipboard
Day Date Data
Mo 2018-08-06 blah
Mo 2018-08-06 blah
Mo 2018-08-06 blah
Tu 2018-08-07 blah
Try
import pandas as pd
df = pd.read_clipboard().set_index(['Day', 'Date'])
to get a multiindexed example
Then change the Date to Datetime
df2 = df.reset_index()
df2.Date = pd.to_datetime(df2.Date, yearfirst=True)
Afterwards you can set the multiindex again, if you want.
Note: check out the documentation on to_datetime if your datetime string is formatted differently. It assumes month first, unless you set dayfirst or yearfirst to True.
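Put together, the round trip might look like the sketch below. It reads a shortened copy of the sample table from a string rather than the clipboard, which is an assumption made so the snippet runs anywhere.

```python
import io
import pandas as pd

# A shortened copy of the sample table, read from a string instead of the clipboard.
text = "Day Date Data\nMo 2018-08-06 blah\nTu 2018-08-07 blah\n"
df = pd.read_csv(io.StringIO(text), sep=" ").set_index(["Day", "Date"])

# Flatten, convert the Date column, then restore the two-level index.
df2 = df.reset_index()
df2["Date"] = pd.to_datetime(df2["Date"], yearfirst=True)
df2 = df2.set_index(["Day", "Date"])
print(df2.index.get_level_values("Date"))
```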
I am reading a CSV file, cleaning it up a little, and then saving it back to a new CSV file. The problem is that the new CSV file has a new column (the first column, in fact) labelled index. This is not the row index, as I have turned that off in the to_csv() function, as you can see in the code. Besides, the row index doesn't have a column label.
df = pd.read_csv('D1.csv', na_values=0, nrows = 139) # Read csv, with 0 values converted to NaN
df = df.dropna(axis=0, how='any') # Delete any rows containing NaN
df = df.reset_index()
df.to_csv('D1Clean.csv', index=False)
Any ideas where this phantom column is coming from and how to get rid of it?
I think you need to add the parameter drop=True to reset_index:
df = df.reset_index(drop=True)
drop : boolean, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
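The effect can be reproduced on a small frame. This is a sketch with made-up data in place of D1.csv: after dropna the index has gaps, and a plain reset_index() turns the old index into a column named "index".

```python
import io
import pandas as pd

# Hypothetical stand-in for the cleaned data: one row contains NaN and is dropped.
df = pd.DataFrame({"a": [1.0, None, 5.0], "b": [2, 4, 6]}).dropna(axis=0, how="any")

# Without drop=True the old index (0, 2) is kept as a new column called "index".
with_phantom = df.reset_index()

# With drop=True the old index is discarded and replaced by 0..n-1.
clean = df.reset_index(drop=True)

# Write to an in-memory buffer and look at the CSV header line.
out = io.StringIO()
clean.to_csv(out, index=False)
header = out.getvalue().splitlines()[0]
print(with_phantom.columns.tolist(), header)
```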