StyleFrame Plugin Not Respecting number_format - python

No matter what I have tried, StyleFrame seems to insist on formatting all dates in DD/MM/YYYY format. I am trying to get them to format as MM/DD/YYYY.
I am trying to control this using number_format but it appears to be entirely ignored in the resulting Excel file. I have been able to apply many other kinds of styles successfully such as font size, text alignment, and column width but it seems to ignore number_format.
Specifically, I am trying to do this:
sf = StyleFrame(exampleDataFrame)
sf.apply_column_style(cols_to_style=['Start Date', 'End Date'],
width=35, styler_obj=Styler(number_format='MM/DD/YYYY', font_size=10))
The width and the font size are applied as expected but not the date format.

StyleFrame seems to insist on formatting all dates in DD/MM/YYYY
This is correct, and is due to a bit of an oversight in the design of the styling process.
Until a formal fix, a working workaround is to override the underlying default style for date (or datetime) objects. Unfortunately this does not provide control over specific columns and must be done before the creation of the StyleFrame object.
I'll create a github issue for this so it will be looked into in the future.
import pandas as pd
from styleframe import StyleFrame, utils
df = pd.DataFrame({'a': ['11/12/2020']})
df['a'] = pd.to_datetime(df['a'])
orig_date_time_format = utils.number_formats.default_date_time_format
utils.number_formats.default_date_time_format = 'YYYYMMDD'
sf = StyleFrame(df)
sf.to_excel('test.xlsx').save()
utils.number_formats.default_date_time_format = orig_date_time_format
This will result in 20201112 in the cell.
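The save-and-restore pattern above can be wrapped in a context manager so that the original default is put back even if writing the file raises. This is only a sketch: temporary_attr is a hypothetical helper, and the SimpleNamespace object (with a placeholder default value) stands in for styleframe.utils.number_formats so the example runs without StyleFrame installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore the original."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        setattr(obj, name, original)


# Stand-in for styleframe.utils.number_formats (assumption: the real module
# exposes default_date_time_format, as used in the answer above).
# The initial value here is a placeholder, not StyleFrame's actual default.
number_formats = SimpleNamespace(default_date_time_format="PLACEHOLDER")

with temporary_attr(number_formats, "default_date_time_format", "MM/DD/YYYY"):
    inside = number_formats.default_date_time_format
after = number_formats.default_date_time_format

print(inside)
print(after)
```

With the real library, the StyleFrame creation and the to_excel call would presumably both go inside the with-block, mirroring the order of operations in the answer's code.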

Related

Why does not Seaborn Relplot print datetime value on x-axis?

I'm trying to solve a Kaggle Competition to get deeper into data science knowledge. I'm dealing with an issue with seaborn library. I'm trying to plot a distribution of a feature along the date but the relplot function is not able to print the datetime value. On the output, I see a big black box instead of values.
Here is my plotting code:
rainfall_types = list(auser.loc[:,1:])
grid = sns.relplot(x='Date', y=rainfall_types[0], kind="line", data=auser);
grid.fig.autofmt_xdate()
Here are the seaborn.relplot output and the head of my dataset (images not included).
I found the error. When you use pandas.read_csv(dataset), any datetime columns in the dataset are read with dtype object, i.e. as plain Python strings (str). When you then plot them, matplotlib treats them as text rather than as dates and cannot display the axis correctly.
To avoid this behaviour, convert the values into datetime objects while reading the file:
df = pandas.read_csv(dataset, parse_dates=['Column_Date'])
This tells pandas that the column named 'Column_Date' holds dates and must be converted to datetime objects.
If you want, you can also use the date column as the index of your dataframe, which makes time-based analysis easier. To do so, add the argument index_col='Column_Date' to your read_csv call.
I hope you will find it helpful.
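A small self-contained check of the difference, assuming a made-up two-row CSV with a Date column and a Rainfall column (neither the data nor the column names come from the Kaggle dataset):

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical sample data, inlined so the example needs no file on disk.
csv_text = "Date,Rainfall\n2020-01-01,1.2\n2020-01-02,0.0\n"

# Without parse_dates the Date column stays as text.
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted to datetime64.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# index_col additionally makes the parsed dates the index.
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(is_datetime64_any_dtype(as_text["Date"]))
print(is_datetime64_any_dtype(as_dates["Date"]))
print(type(indexed.index).__name__)
```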

Plotting multiple time series from pandas dataframe

I have a pandas dataframe loaded from file in the following format:
ID,Date,Time,Value1,Value2,Value3,Value4
0063,04/21/2020,11:22:55,0.0347,0.41,1440,10.5
0064,04/21/2020,11:22:56,0.0355,0.41,1440,10.4
...
9849,04/22/2020,10:46:19,0.058,1.05,1460,10.6
I have tried multiple methods of plotting a line graph of each value vs date/time or a single graph with multiple subplots with limited success. I am hoping someone with much more experience may have an elegant solution to try as opposed to my blind swinging. Note that the dataset may have large breaks in time between days.
Thanks!
Parsing dates during the import of the pandas dataframe seemed to be my biggest issue. Once I added parse_dates to the pd.read_csv call, I was able to define the dt column and plot with matplotlib as expected.
df = pd.read_csv(input_text, parse_dates = [["Date", "Time"]])
dt = df["Date_Time"]
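Newer pandas releases deprecate the nested-list form of parse_dates, so an alternative is to read Date and Time as text and combine them with pd.to_datetime afterwards. The sketch below uses the rows from the question; the format string is an assumption based on those sample rows, and reading ID as str is only there to keep the leading zeros.

```python
import io

import pandas as pd

# Rows copied from the question, inlined so the example is self-contained.
csv_text = """ID,Date,Time,Value1,Value2,Value3,Value4
0063,04/21/2020,11:22:55,0.0347,0.41,1440,10.5
0064,04/21/2020,11:22:56,0.0355,0.41,1440,10.4
9849,04/22/2020,10:46:19,0.058,1.05,1460,10.6
"""

df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

# Join the two text columns and parse them in one step.
df["Date_Time"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%m/%d/%Y %H:%M:%S"
)
df = df.set_index("Date_Time")

value_cols = ["Value1", "Value2", "Value3", "Value4"]
print(df[value_cols])

# With matplotlib installed, df[value_cols].plot(subplots=True) would then
# draw one panel per value column against the datetime index.
```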

Exporting Pandas DataFrame cells directly to excel/csv (python)

I have a Pandas DataFrame that has sports records in it. All of them look like this: "1-2-0", "17-12-1", etc., for wins, losses and ties. When I export this the records come up in different date formats within Excel. Some will come up as "12-May", others as "9/5/2001", and others will come up as I want them to.
The DataFrame that I want to export is named 'x' and this is the command I'm currently using. I tried it without the date_format part and it gave the same response in Excel.
x.to_csv(r'C:\Users\B\Desktop\nba.csv', date_format = '%s')
Also tried using to_excel and I kept getting errors while trying to export. Any ideas? I was thinking I am doing the date_format part wrong, but I don't know how to transfer the string of text directly instead of it getting automatically switched to a string.
Thanks!
I don't think it's a Python issue, but rather Excel auto-detecting dates in your data.
But, see below to convert your scores to strings.
Try this,
import pandas as pd
df = pd.DataFrame({"lakers" : ["10-0-1"],"celtics" : ["11-1-3"]})
print(df.head())
here is the dataframe with made up data.
lakers celtics
0 10-0-1 11-1-3
Convert the dataframe to strings:
df = df.astype(str)
and save the csv:
df.to_csv('nba.csv')
Opening it in LibreOffice gives me two columns with the (made up) scores.
You might have an Excel usage issue going on here. In line with my comment below, you can change any column in Excel to lots of different formats. In this case I believe Excel is auto-detecting date formatting, incorrectly. Select your columns of data, right click, select format and change to anything else, like 'General'.
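One way to sidestep the auto-detection is to write an .xlsx file instead of a CSV, because string cells are stored with an explicit text type there. A minimal sketch, assuming openpyxl is installed and using made-up records; it writes to an in-memory buffer and reads the cells back to show that they are still text:

```python
import io

import pandas as pd
from openpyxl import load_workbook

# Made-up win-loss-tie records.
df = pd.DataFrame({"lakers": ["10-0-1", "9-5-1"], "celtics": ["11-1-3", "12-5-0"]})

# Write the workbook to memory rather than to disk.
buffer = io.BytesIO()
df.to_excel(buffer, index=False, engine="openpyxl")
buffer.seek(0)

# Read column A back: header cell followed by the two records.
ws = load_workbook(buffer).active
values = [cell.value for cell in ws["A"]]
types = [cell.data_type for cell in ws["A"]]  # "s" marks a string cell

print(values)
print(types)
```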

How to convert a date with a specific format without losing the date type in Pandas dataframe

I have a list of columns I need to convert to date without losing the date type format. To convert to date one could use df1[col] = df1[col].astype('datetime64[ns]') which gives an actual date type but if I want it to be of type '%m/%d/%Y' it is often suggested on here that one does this: df1[col] = df1[col].dt.strftime('%m/%d/%Y') but in excel this is now recognized as a string type and not a date type.
I have gone through many posts and searched online to find a solution to this problem and there must be one.
Here is my code I have that is giving the incorrect types that I do not want:
convert_date_cols = ['CutoffDate', 'ModEffectiveDate', 'IOExpirationDate', 'FirstRateAdjustmentDate', 'GF_Service_Xfer_Date',
                     'BPODate', 'FirstPaymentDate', 'MaturityDate', 'NextRateAdjustmentDate', 'NextPaymentAdjustmentDate',
                     'ModFirstPayDate', 'ModMaturityDate', 'REO Date', 'FCDate', 'BK Date', 'Fico Score Date']
for i, col in enumerate(convert_date_cols):
    df1[col] = df1[col].astype('datetime64[ns]')
    df1[col] = df1[col].dt.strftime('%m/%d/%Y')
You're trying to solve a contradiction in terms. The code you posted explicitly changes the datetime variable to a string representation. It's not reasonable to expect Excel to treat it as a date for you. Note, however, that most progress depends on wanting something "unreasonable", and making it happen.
I suspect that your actual problem is to make the spreadsheet display the date in the desired format. To this end, do not blindly accept the automatic set-up in your excel file. Write a little extra code to specify the column display format to be what you want. See the Excel documentation for details. Then look to see how much of that control you can grab through the Python interface.
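One possible way to follow that advice is to keep the column as datetime in pandas and set the cell display format through openpyxl when writing. This is a sketch under assumptions, not the only route: write_with_date_format is a hypothetical helper, the sample frame is made up (only the CutoffDate column name comes from the question), and openpyxl must be installed.

```python
import io

import pandas as pd
from openpyxl import load_workbook


def write_with_date_format(df, date_cols, number_format="MM/DD/YYYY"):
    """Write df to an in-memory .xlsx, keeping date_cols as real date cells."""
    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False)
        ws = writer.sheets["Sheet1"]
        for col in date_cols:
            col_idx = df.columns.get_loc(col) + 1  # openpyxl columns are 1-based
            # Row 1 holds the header, so data cells start at row 2.
            for row in range(2, len(df) + 2):
                ws.cell(row=row, column=col_idx).number_format = number_format
    buffer.seek(0)
    return buffer


# Made-up data; the date column stays datetime64, it is never turned into text.
df1 = pd.DataFrame({
    "CutoffDate": pd.to_datetime(["2020-11-12", "2021-01-05"]),
    "Balance": [100.0, 250.5],
})

buffer = write_with_date_format(df1, ["CutoffDate"])
sheet = load_workbook(buffer)["Sheet1"]
print(sheet["A2"].number_format)
```

The cell still holds a date value; only the display format stored alongside it changes, which is the distinction the answer draws between the data type and its presentation.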

Plotly date formatting issue for pandas dataframe

I am using the Plotly python API to upload a pandas dataframe that contains a date column (which is generated via pandas date_range function).
If I look at the pandas dataframe locally the date formatting is as I'd expect, i.e. YYYY-MM-DD. However, when I view it in Plotly I see it in the form YYYY-MM-DD HH:MM:SS. I really don't need this level of precision, and having such a wide column also causes formatting issues when I try to fit in all the other columns that I want.
Is there a way to prevent Plotly from re-formatting the pandas dataframe?
A basic example of my current approach looks like:
import plotly.plotly as py
from plotly.tools import FigureFactory as FF
import pandas as pd
dates = pd.date_range('2016-01-01', '2016-02-01', freq='D')
df = pd.DataFrame(dates)
table = FF.create_table(df)
py.plot(table, filename='example table')
It turns out that this problem wasn't solvable - Plotly just happened to treat datetimes in that way.
This has since been updated (fixed) - you can read more here.
