Bug when indexing date column in Pandas

I'm trying to make pandas recognise the first column as a date.
import csv
import pandas as pd
import plotly.express as px
cl = open('cl.csv')
cl = pd.read_csv('CL.csv', parse_dates=['Date'], index_col=['Date'])
cl.info()
Then to visualise the price:
fig = px.line(cl, y="Adj Close", title='Crude Oil Price', labels = {'Adj Close':'Crude Oil Price(in USD)'})
But it gives back a ruined chart:
Date indexed chart
If I remove the parse_dates=['Date'], index_col=['Date'] arguments and leave just cl = pd.read_csv('CL.csv'), the chart looks fine.
Chart without date
What am I doing wrong here?

If you print cl and the dates look fine, the likely reason for the broken chart is that cl is not sorted by Date. Sort it before visualizing:
cl = cl.sort_values('Date')
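A minimal sketch of why the sort matters, assuming the CSV lists the newest rows first. The sample values below are made up and only stand in for CL.csv; sort_index() is used because Date is the index here.

```python
import io
import pandas as pd

# Made-up sample data standing in for CL.csv, newest row first.
csv_text = """Date,Adj Close
2021-01-05,49.93
2021-01-04,47.62
2020-12-31,48.52
"""

cl = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# A line chart connects points in row order, so an unsorted
# date axis makes the line double back on itself.
was_sorted = cl.index.is_monotonic_increasing

# Date is the index, so sort on the index before plotting.
cl = cl.sort_index()
print(was_sorted, cl.index.is_monotonic_increasing)
```

After this step the frame can be passed to px.line as in the question.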

I think this problem may be caused by the date format in the 'Date' column. The pandas documentation for read_csv says:
For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
So you could replace
cl = pd.read_csv('CL.csv', parse_dates=['Date'], index_col=['Date'])
with
cl = pd.read_csv('CL.csv', parse_dates=['Date'], date_parser=lambda col: pd.to_datetime(col, utc=True))
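The first option in that quote, calling pd.to_datetime after pd.read_csv, might look like the sketch below. The two rows with different UTC offsets are made-up sample data, not taken from the asker's file.

```python
import io
import pandas as pd

# Made-up rows with mixed UTC offsets, standing in for CL.csv.
csv_text = """Date,Adj Close
2021-01-04 09:00:00+00:00,47.62
2021-01-05 09:00:00-05:00,49.93
"""

# Read the column as plain strings first...
cl = pd.read_csv(io.StringIO(csv_text))

# ...then parse it explicitly, normalizing every offset to UTC.
cl["Date"] = pd.to_datetime(cl["Date"], utc=True)

# Use the parsed column as a sorted index for plotting.
cl = cl.set_index("Date").sort_index()
print(cl.index.dtype)
```

Parsing in a separate step also makes it easy to inspect the raw strings when the result looks wrong.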

Related

Unable to create a Plot using the pivoted data set : key error

I want to create a plot chart with forecasted figures for next 2 months. The below is the code I wrote.
import pandas as pd
from datetime import datetime
df= pd.read_csv(r'C:\Users\Desktop\Customers.csv')
parsed = pd.to_datetime(df["Date"], errors="coerce").fillna(pd.to_datetime(df["Date"],format="%Y-%d-%m",errors="coerce"))
ordinal = pd.to_numeric(df["Date"], errors="coerce").apply(lambda x: pd.Timestamp("1899-12-30")+pd.Timedelta(x, unit="D"))
df["Date"] = parsed.fillna(ordinal)
df['Amount currency'] = df['Amount currency'].str.replace(r'[^0-9\.]', '', regex=True)
df['Amount'] = df['Amount'].str.replace(r'[^0-9\.]', '', regex=True)
df['Amount currency'] = pd.to_numeric(df['Amount currency'])
df['Amount'] = pd.to_numeric(df['Amount'])
#df.Date = pd.to_datetime(df.Date).dt.to_period('m')
df['Date'] = df['Date'].dt.to_period('M').dt.to_timestamp() + pd.offsets.MonthEnd()
columns = ['Date', 'Type', 'Amount']
df = df[columns]
and it is required to pivot the figures
df2=pd.pivot_table(df,index='Date',values = 'Amount', columns = 'Type',aggfunc='sum')
So the final output columns are,
Date
Customer Credit Note
Payment
Sales Invoice
Based on the above code, I wanted to create a plot with 2 months of forecast
import matplotlib.pyplot as plt
import seaborn as sns
sns.lineplot(x='Date',y='Payment',data=dataset)
plt.title("Monthly_cases")
plt.xlabel("Month end date")
plt.ylabel("Payment")
plt.show()
But the above code raises KeyError: 'Date'.
What would be the reason for this? Can anyone help me? Also can anyone help me to modify the above code to get next two months forecasted values?
Thanks

Partial string filter pandas

On Pandas 1.3.4 and Python 3.9.
So I'm having issues filtering for a partial piece of the string. The "Date" column is listed in the format of MM/DD/YYYY HH:MM:SS A/PM where the most recent one is on top. If the date is single digit (example: November 3rd), it does not have the 0 such that it is 11/3 instead of 11/03. Basically I'm looking to go look at column named "Date" and have python read parts of the string to filter for only today.
This is what the original csv looks like. This is what I want to do to the file. Basically looking for a specific date but not any time of that date and implement the =RIGHT() formula. However this is what I end up with with the following code.
from datetime import date
import pandas as pd
df = pd.read_csv(r'file.csv', dtype=str)
today = date.today()
d1 = today.strftime("%m/%#d/%Y") # to find out what today is
df = pd.DataFrame(df, columns=['New Phone', 'Phone number', 'Date'])
df['New Phone'] = df['Phone number'].str[-10:]
df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)
df_today.to_csv(r'file.csv', index=False)

This line is wrong:
df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)
All you're doing there is creating a mask: a pandas Series containing True or False in each row, according to the condition you used to build it. The spreadsheet gets only FALSE, as you showed, because none of the items in the Date column contain the string held in d1...
Instead, try this:
from datetime import date
import pandas as pd
# Load the CSV file, and change around the columns
df = pd.DataFrame(pd.read_csv(r'file.csv', dtype=str), columns=['New Phone', 'Phone number', 'Date'])
# Take the last ten chars of each phone number
df['New Phone'] = df['Phone number'].str[-10:]
# Convert each date string to a pd.Timestamp, removing the time
df['Date'] = pd.to_datetime(df['Date'].str.split(r'\s+', n=1).str[0])
# Get the phone numbers that are from today
df_today = df[df['Date'] == date.today().strftime('%m/%d/%Y')]
# Write the result to the CSV file
df_today.to_csv(r'file.csv', index=False)
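To make the mask-versus-rows distinction concrete, here is a small sketch on made-up data. The phone numbers, dates, and the target variable are hypothetical; target stands in for today's date string.

```python
import pandas as pd

# Made-up rows in the question's M/D/YYYY H:MM:SS AM/PM layout.
df = pd.DataFrame({
    "Phone number": ["+1 555 010 0001", "+1 555 010 0002"],
    "Date": ["11/3/2021 9:15:00 AM", "11/2/2021 4:40:00 PM"],
})
target = "11/3/2021"  # hypothetical stand-in for today's date

# The mask is only a Series of True/False values, one per row.
mask = df["Date"].str.startswith(target + " ")

# Indexing the DataFrame with the mask returns the matching rows.
df_today = df[mask]
print(mask.tolist())
print(df_today)
```

Writing mask to CSV would give a column of booleans, while df_today holds the rows themselves.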

How to convert a column in a dataframe to an index datetime object?

I have a question about how to convert a column 'Timestamp' into an index&datetime. And then also drop the column once it's converted into an index.
df = {'Timestamp':['20/01/2021 01:00:00.12 AM','20/01/2021 01:00:00.21 AM','20/01/2021 01:00:01.34 AM],
'Value':['14','178','158','75']}
I tried the following, but it obviously didn't work.
df.Timestamp = pd.to_datetime(df.Timestamp.str[0])
df=df.set_index(['Timestamp'], drop=True)
FYI. The df is actually a lot text processing so unfortunately I cannot just do read_csv and parse datetime object. :( So yes, the df is exactly as what's prescribed above.
Thank you.

Don't enclose 'Timestamp' in square brackets.
import pandas as pd
df = pd.DataFrame({'Timestamp':['20/01/2021 01:00:00.12 AM','20/01/2021 01:00:00.21 AM','20/01/2021 01:00:01.34 AM'],
'Value':['14','178','158']})
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp')
print(df)
## Output
                         Value
Timestamp
2021-01-20 01:00:00.120     14
2021-01-20 01:00:00.210    178
2021-01-20 01:00:01.340    158
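Because these timestamps are day-first, passing an explicit format removes any guessing about day and month order. This is a sketch, assuming every row follows the DD/MM/YYYY hh:mm:ss.ff AM/PM layout shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": ["20/01/2021 01:00:00.12 AM",
                  "20/01/2021 01:00:00.21 AM",
                  "20/01/2021 01:00:01.34 AM"],
    "Value": ["14", "178", "158"],
})

# Day first, 12-hour clock, fractional seconds, AM/PM marker.
fmt = "%d/%m/%Y %I:%M:%S.%f %p"
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format=fmt)

# set_index moves the column into the index and drops it by default.
df = df.set_index("Timestamp")
print(df.index)
```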

Python Pandas filtering dataframe on date

I am trying to manipulate a CSV file on a certain date in a certain column.
I am using pandas (total noob) for that and was pretty successful until i got to dates.
The CSV looks something like this (with more columns and rows of course).
These are the columns:
Circuit
Status
Effective Date
These are the values:
XXXX001
Operational
31-DEC-2007
I tried dataframe query (which i use for everything else) without success.
I tried dataframe loc (which worked for everything else) without success.
How can i get all rows that are older or newer from a given date? If i have other conditions to filter the dataframe, how do i combine them with the date filter?
Here's my "raw" code:
import pandas as pd
# parse_dates = ['Effective Date']
# dtypes = {'Effective Date': 'str'}
df = pd.read_csv("example.csv", dtype=object)
# , parse_dates=parse_dates, infer_datetime_format=True
# tried lot of suggestions found on SO
cols = df.columns
cols = cols.map(lambda x: x.replace(' ', '_'))
df.columns = cols
status1 = 'Suppressed'
status2 = 'Order Aborted'
pool = '2'
region = 'EU'
date1 = '31-DEC-2017'
filt_df = df.query('Status != @status1 and Status != @status2 and Pool == @pool and Region_A == @region')
filt_df.reset_index(drop=True, inplace=True)
filt_df.to_csv('filtered.csv')
# this is working pretty well
supp_df = df.query('Status == @status1 and Effective_Date < @date1')
supp_df.reset_index(drop=True, inplace=True)
supp_df.to_csv('supp.csv')
# this is what is not working at all
I tried many approaches, but i was not able to put it together. This is just one of many approaches i tried.. so i know it is perhaps completely wrong, as no date parsing is used.
supp.csv will be saved, but the dates present are all over the place, so there's no match with the "logic" in this code.
Thanks for any help!

Make sure you convert the date column to datetime first, then filter on it.
df['Effective Date'] = pd.to_datetime(df['Effective Date'])
df[df['Effective Date'] < '2017-12-31']
# This returns all rows with dates before the 31st of December, 2017.
# You can also use query.
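A sketch of combining the date filter with another condition, both as a boolean mask and with query. The three rows are made up, the column uses the underscore name from the question's renaming step, and the format string assumes English month abbreviations.

```python
import pandas as pd

# Made-up rows in the question's layout.
df = pd.DataFrame({
    "Circuit": ["XXXX001", "XXXX002", "XXXX003"],
    "Status": ["Operational", "Suppressed", "Suppressed"],
    "Effective_Date": ["31-DEC-2007", "15-JAN-2018", "01-MAR-2016"],
})

# Parse the strings so comparisons are chronological, not alphabetical.
df["Effective_Date"] = pd.to_datetime(df["Effective_Date"], format="%d-%b-%Y")

status1 = "Suppressed"
date1 = pd.Timestamp("2017-12-31")

# Boolean masks are combined with & and need parentheses.
supp_df = df[(df["Status"] == status1) & (df["Effective_Date"] < date1)]

# The same filter with query; @ refers to local Python variables.
supp_q = df.query("Status == @status1 and Effective_Date < @date1")
print(supp_df)
```

Comparing unparsed strings such as '31-DEC-2017' sorts them character by character, which is why the original output looked unrelated to the date.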

Why can't I search for a row in a pandas df using a date as part of a tuple index?

I am trying to search a pandas df I made which has a tuple as an index. The first part of the tuple is a date and the second part is a forex pair. I've tried a few things but I can't seem to search using a date-formatted string as part of a tuple with .loc or .ix
My df looks like this:
Open Close
(11-01-2018, AEDAUD) 0.3470 0.3448
(11-01-2018, AEDCAD) 0.3415 0.3408
(11-01-2018, AEDCHF) 0.2663 0.2656
(11-01-2018, AEDDKK) 1.6955 1.6838
(11-01-2018, AEDEUR) 0.2277 0.2261
Here is the complete code :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
forex_11 = pd.read_csv('FOREX_20180111.csv', sep=',', parse_dates=['Date'])
forex_12 = pd.read_csv('FOREX_20180112.csv', sep=',', parse_dates=['Date'])
time_format = '%d-%m-%Y'
forex = forex_11.append(forex_12, ignore_index=False)
forex['Date'] = forex['Date'].dt.strftime(time_format)
GBP = forex[forex['Symbol'] == "GBPUSD"]
forex.index = list(forex[['Date', 'Symbol']].itertuples(index=False, name=None))
forex_open_close = pd.DataFrame(np.array(forex[['Open','Close']]), index=forex.index)
forex_open_close.columns = ['Open', 'Close']
print(forex_open_close.head())
print(forex_open_close.ix[('11-01-2018', 'GBPUSD')])
How do I get the row which has index ('11-01-2018', 'GBPUSD') ?

Can you try putting the tuple in a list using brackets? Note that .ix was removed in pandas 1.0, so use .loc instead.
Like this:
print(forex_open_close.loc[[('11-01-2018', 'GBPUSD')]])

I would recommend using a pandas MultiIndex. In your case you could do the following, where data stands for your forex DataFrame:
tuples = list(data[['Date', 'Symbol']].itertuples(index=False, name=None))
data.index = pd.MultiIndex.from_tuples(tuples, names=['Date', 'Symbol'])
# And then to index
data.loc['2018-01-11', 'AEDCAD']
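A self-contained sketch of the MultiIndex approach, using a few rows copied from the question's table. Here set_index builds the MultiIndex directly from the two columns, which is a swapped-in shortcut for MultiIndex.from_tuples, and the dates are parsed day-first.

```python
import pandas as pd

# Rows taken from the table in the question.
forex = pd.DataFrame({
    "Date": ["11-01-2018", "11-01-2018", "11-01-2018"],
    "Symbol": ["AEDAUD", "AEDCAD", "AEDCHF"],
    "Open": [0.3470, 0.3415, 0.2663],
    "Close": [0.3448, 0.3408, 0.2656],
})

# Keep real datetimes in the index instead of formatted strings.
forex["Date"] = pd.to_datetime(forex["Date"], format="%d-%m-%Y")

# Two columns become a two-level index; sorting keeps lookups fast.
forex = forex.set_index(["Date", "Symbol"]).sort_index()

key = (pd.Timestamp("2018-01-11"), "AEDCAD")
row = forex.loc[key]                 # whole row as a Series
open_price = forex.loc[key, "Open"]  # single value
print(row)
```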
