Plotly date formatting issue for pandas dataframe

I am using the Plotly python API to upload a pandas dataframe that contains a date column (which is generated via pandas date_range function).
If I look at the pandas dataframe locally, the date formatting is as I'd expect, i.e. YYYY-MM-DD. However, when I view it in Plotly I see it in the form YYYY-MM-DD HH:MM:SS. I really don't need this level of precision, and such a wide column causes formatting issues when I try to fit in all the other columns that I want.
Is there a way to prevent Plotly from re-formatting the pandas dataframe?
A basic example of my current approach looks like:
import plotly.plotly as py
from plotly.tools import FigureFactory as FF
import pandas as pd
dates = pd.date_range('2016-01-01', '2016-02-01', freq='D')
df = pd.DataFrame(dates)
table = FF.create_table(df)
py.plot(table, filename='example table')

It turns out that this problem wasn't solvable - Plotly just happened to treat datetimes in that way.
This has since been updated (fixed) - you can read more here.
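A workaround that does not depend on how Plotly renders datetimes is to convert the date column to plain strings before building the table. This is only a sketch, not the fix the answer refers to; the column name date is an assumption added for illustration, and the Plotly call is left out so the snippet runs with pandas alone.

```python
import pandas as pd

# Same date range as in the question; the column name "date" is hypothetical.
dates = pd.date_range('2016-01-01', '2016-02-01', freq='D')
df = pd.DataFrame({'date': dates})

# Render each timestamp as a YYYY-MM-DD string so that no time component
# is left for a downstream table renderer to display.
df['date'] = df['date'].dt.strftime('%Y-%m-%d')

print(df.head())
```

The resulting df can then be passed to the table-creation call from the question in place of the original dataframe.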

Related

Convert date from xlsx dataset from YYYY.TEXT (e.g. 2012.916667) to a normal date format (e.g. 01/01/2012)

I've read in an xlsx file using pandas.read_excel and the dates in the dataset have come in like 2012.916667, for example. I don't have the actual dates, so I'm not sure what the numbers mean. Does anyone know how to convert them to normal dates? Thanks!

You can convert it to the regular pandas Timestamp format like so:
import pandas as pd
pd.to_datetime(2012.916667, unit='d', origin='1970-01-01')
# if the dates are loaded in a column, say, dates
pd.to_datetime(df['dates'], unit='d', origin='1970-01-01')
This assumes that the integer part is the number of days since the epoch (origin) and the decimal part is the fraction of a day.
Note that Excel's own serial dates count from 1899-12-30 rather than 1970-01-01, so these assumptions may not hold. You should first confirm the encoding with the data owner and then use the appropriate unit and origin parameters in the pandas function.
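Another reading worth checking is that the value is a decimal year, since 2012 followed by a fraction looks like a year and the days-since-1970 reading would place 2012.916667 in 1975. The sketch below assumes that interpretation, which the question does not confirm; decimal_year_to_timestamp is a hypothetical helper, not a pandas function.

```python
import pandas as pd

def decimal_year_to_timestamp(value: float) -> pd.Timestamp:
    """Interpret value as a year plus a fraction of that year (an assumption)."""
    year = int(value)
    start = pd.Timestamp(year=year, month=1, day=1)
    end = pd.Timestamp(year=year + 1, month=1, day=1)
    # Scale the length of that particular year (365 or 366 days) by the fraction.
    return start + (end - start) * (value - year)

ts = decimal_year_to_timestamp(2012.916667)
print(ts)

# For a whole column (the column name "dates" follows the answer above):
# df['dates'] = df['dates'].apply(decimal_year_to_timestamp)
```

Under this reading the example value falls in December 2012, which would be consistent with monthly data stored as year + (month - 1) / 12.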

Pandas Data frames and sorting values

I am having a difficult time with writing this hw assignment, and am not sure where I messed up. I have tried several things, and believe my issue lies in the sort_values or maybe in the groupby command.
The issue is that I want to display graph data from the year 2007 only (using pandas and plotly in a Jupyter notebook for my class). I mostly have the graph I want but cannot get it to display the data correctly. It simply isn't filtering out the other years or taking data from the specific dates requested.
import pandas as pd
import plotly.express as px
df = pd.read_csv('Data/Country_Data.csv')
print(df.shape)
df.head(2)
df_Q1 = df.query("year == '2007'")
print(df_Q1.shape)
df_Q1.head()
This is where the issue begins, because it prints a table with only header information. As in it prints all the column names, but none of the data for them, and then later on it displays a graph of what I assume is the most recent death data rather than the year 2007 as specified.
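A likely cause, assuming the year column was read as integers (pandas.read_csv infers numeric types by default), is that the query compares it with the string '2007', so no row matches. The sketch below uses a small made-up frame in place of Country_Data.csv; the column names country and deaths and all the values are hypothetical.

```python
import pandas as pd

# Made-up stand-in for the CSV from the question.
df = pd.DataFrame({
    "country": ["A", "B", "A", "B"],
    "year": [2002, 2002, 2007, 2007],
    "deaths": [10, 20, 30, 40],
})

# Check how the column was parsed before filtering on it.
print(df["year"].dtype)

# Normalise to numbers in case the column arrived as text, then compare
# against the number 2007 rather than the string '2007'.
df["year"] = pd.to_numeric(df["year"], errors="coerce")
df_Q1 = df.query("year == 2007")

print(df_Q1.shape)
```

The filtered frame df_Q1, rather than df, is then what should be passed to the plotting call.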

Why does not Seaborn Relplot print datetime value on x-axis?

I'm trying to solve a Kaggle Competition to get deeper into data science knowledge. I'm dealing with an issue with seaborn library. I'm trying to plot a distribution of a feature along the date but the relplot function is not able to print the datetime value. On the output, I see a big black box instead of values.
Here is my plotting code:
rainfall_types = list(auser.loc[:,1:])
grid = sns.relplot(x='Date', y=rainfall_types[0], kind="line", data=auser);
grid.fig.autofmt_xdate()
Here are the seaborn.relplot output and the head of my dataset.

I found the error. When you use pandas.read_csv(dataset), any datetime columns in the dataset are loaded with dtype object, and Python sees the values as 'str' (strings). So when you plot them, matplotlib is not able to show them correctly.
To avoid this behaviour, convert the values into datetime objects by using:
df = pandas.read_csv(dataset, parse_dates=['Column_Date'])
This tells pandas that the column identified by the key 'Column_Date' holds dates and has to be converted into datetime objects.
If you want, you can also use that date column as the index of your dataframe, which makes analysis along the time axis easier. To do so, add the argument index_col='Column_Date' to your read_csv call.
I hope you will find it helpful.
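The effect of parsing dates on load can be checked on a small in-memory CSV. This is a minimal sketch: the column names Date and Rainfall and the values are made up, and io.StringIO stands in for the real file.

```python
import io
import pandas as pd

csv_text = "Date,Rainfall\n2020-01-01,0.0\n2020-01-02,1.2\n2020-01-03,0.4\n"

# Without parse_dates the Date column is loaded as text.
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted to datetime64 while loading,
# and index_col additionally makes it the index of the frame.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(as_text["Date"].dtype, as_dates["Date"].dtype)
print(type(indexed.index))
```

A frame loaded the second way can be handed to sns.relplot with x='Date', and the axis is then treated as a time axis instead of one tick per string.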

StyleFrame Plugin Not Respecting number_format

No matter what I have tried, StyleFrame seems to insist on formatting all dates in DD/MM/YYYY format. I am trying to get them to format as MM/DD/YYYY.
I am trying to control this using number_format but it appears to be entirely ignored in the resulting Excel file. I have been able to apply many other kinds of styles successfully such as font size, text alignment, and column width but it seems to ignore number_format.
Specifically, I am trying to do this:
sf = StyleFrame(exampleDataFrame)
sf.apply_column_style(cols_to_style=['Start Date', 'End Date'],
width=35, styler_obj=Styler(number_format='MM/DD/YYYY', font_size=10))
The width and the font size are applied as expected but not the date format.

StyleFrame seems to insist on formatting all dates in DD/MM/YYYY
This is correct, and is due to a bit of an oversight in the design of the styling process.
Until a formal fix, a working workaround is to override the underlying default style for date (or datetime) objects. Unfortunately this does not provide control over specific columns and must be done before the creation of the StyleFrame object.
I'll create a github issue for this so it will be looked into in the future.
import pandas as pd
from styleframe import StyleFrame, utils
df = pd.DataFrame({'a': ['11/12/2020']})
df['a'] = pd.to_datetime(df['a'])
orig_date_time_format = utils.number_formats.default_date_time_format
utils.number_formats.default_date_time_format = 'YYYYMMDD'
sf = StyleFrame(df)
sf.to_excel('test.xlsx').save()
utils.number_formats.default_date_time_format = orig_date_time_format
This will result in 20201112 in the cell.

Plotting multiple time series from pandas dataframe

I have a pandas dataframe loaded from file in the following format:
ID,Date,Time,Value1,Value2,Value3,Value4
0063,04/21/2020,11:22:55,0.0347,0.41,1440,10.5
0064,04/21/2020,11:22:56,0.0355,0.41,1440,10.4
...
9849,04/22/2020,10:46:19,0.058,1.05,1460,10.6
I have tried multiple methods of plotting a line graph of each value vs date/time or a single graph with multiple subplots with limited success. I am hoping someone with much more experience may have an elegant solution to try as opposed to my blind swinging. Note that the dataset may have large breaks in time between days.
Thanks!

Parsing dates during the import of the pandas dataframe seemed to be my biggest issue. Once I added parse_dates to pd.read_csv, I was able to define the dt column and plot with matplotlib as expected.
df = pd.read_csv(input_text, parse_dates = [["Date", "Time"]])
dt = df["Date_Time"]
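An alternative sketch combines the two columns after loading, which avoids relying on the nested-list form of parse_dates (deprecated in recent pandas releases). The sample rows come from the question; plot_values is a hypothetical helper and needs matplotlib, which is imported only when it is called.

```python
import io
import pandas as pd

raw = io.StringIO(
    "ID,Date,Time,Value1,Value2,Value3,Value4\n"
    "0063,04/21/2020,11:22:55,0.0347,0.41,1440,10.5\n"
    "0064,04/21/2020,11:22:56,0.0355,0.41,1440,10.4\n"
    "9849,04/22/2020,10:46:19,0.058,1.05,1460,10.6\n"
)

# Keep ID as text so that leading zeros survive.
df = pd.read_csv(raw, dtype={"ID": str})

# Join date and time into one timestamp column and use it as the index.
df["Date_Time"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%m/%d/%Y %H:%M:%S"
)
df = df.set_index("Date_Time")

value_cols = ["Value1", "Value2", "Value3", "Value4"]

def plot_values(frame):
    """Draw one subplot per value column against the shared time index."""
    import matplotlib.pyplot as plt
    axes = frame[value_cols].plot(subplots=True, sharex=True, marker=".")
    plt.tight_layout()
    return axes
```

Calling plot_values(df) followed by plt.show() gives four stacked line plots on a common time axis. Large gaps between days are drawn as straight connecting segments, so markers help to show where the actual samples are.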
