Change date format in pandas dataframe - python

I want to change the date format in a pandas dataframe. I found out that it should be something like the code below:
df['date'] = pd.to_datetime(df['date']).dt.strftime("%d-%m-%Y")
The problem is that I get a "SettingWithCopyWarning" every time I do this. Does somebody know a proper way to do this?
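A hedged sketch rather than a definitive fix: this warning usually appears when df was itself produced by slicing or filtering another dataframe. That is an assumption here, because the question does not show how df was created. Taking an explicit copy before assigning typically avoids the warning. The name raw and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical data; the question does not show how df was built.
raw = pd.DataFrame({
    "date": ["2021-01-31", "2021-02-15", "2021-03-01"],
    "value": [1, 2, 3],
})

# Assumption: df comes from filtering another frame. The explicit .copy()
# makes df an independent object, so the assignment below is unambiguous.
df = raw[raw["value"] > 1].copy()

# Parse the strings as dates, then format them as day-month-year text.
df["date"] = pd.to_datetime(df["date"]).dt.strftime("%d-%m-%Y")
print(df["date"].tolist())
```

Note that after strftime the column holds plain strings, not datetime values, so date arithmetic on it no longer works.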

Related

Find the format that is visualized in a dataframe from a datetime type column

I have a problem that I cannot solve automatically because I cannot find out how to do it. I would like to extract the format in which a datetime column is displayed when a dataframe is printed.
I have a column within my dataframe that is of the type datetime.datetime. If I print the dataframe I get the following:
And if I print one value I get this:
I am not sure what the approach is to easily return the format of the values in the upper image. Just to be clear, I would like to have code that will return the format, that is shown in the dataframe, in datetime codes. In this example it should return: '%Y-%m-%d %H:%M:%S.%f'.
I am able to return this by first transforming the column to string values and then using the function _guess_datetime_format_for_array() from pandas.core.tools.datetimes, but this approach is a bit excessive in my opinion. Does anyone have a suggestion for a simpler solution?

Converting String to dates in pyspark

I have a String in the format "MMM-YY", e.g. "jun-22", "Jan-22", etc.
I want to convert it into a Date set to the 1st day of the month, in the following format.
Jan-22 --> 01-Jan-22
Feb-21 --> 01-Feb-21
I have tried a few ways but couldn't get to the solution.
Can someone please advise on the quickest and most efficient way of doing this in a Pyspark Dataframe?
Code used could be pyspark or Python.
Thanks for the help. I was able to add "01-" at the beginning of the date string and convert it into a date.
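The approach described above (prepend "01-" and parse the result) can be sketched in plain Python, which the question allows as an alternative to PySpark. This is a minimal illustration, not the poster's actual code; the helper name first_of_month is hypothetical, and it assumes an English (or C) locale so that %b matches names like "Jan".

```python
from datetime import datetime

def first_of_month(text):
    """Turn a 'Jan-22' style string into '01-Jan-22' via a real date object."""
    # Prepend the day, then parse: %d = day, %b = abbreviated month name,
    # %y = two-digit year. Assumes an English/C locale for month names.
    parsed = datetime.strptime("01-" + text, "%d-%b-%y")
    # Format back to text; the month name comes out capitalised.
    return parsed.strftime("%d-%b-%y")

print(first_of_month("Jan-22"))
print(first_of_month("jun-22"))
```

Going through a date object rather than plain string concatenation also normalises the casing and rejects strings that are not valid months.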

Changing Column Data Type Pandas

I'm working on an NBA project and I am using an API to get data from Basketball Reference. The "SEASON" column in the dataframe shown below is a datetime object and I want to change it to a string, but I'm unable to. What am I doing wrong? Code is below.
Data Frame with Seasons column
player["SEASON"]=player["SEASON"].values.astype('str')
line_graph = px.bar(data_frame=player, x='SEASON', y="PTS")
Despite doing this, my graph still looks like this, which suggests the column may still be in a datetime format. Can anyone please help?
If your SEASON column is a pandas datetime object, you can use the .dt.strftime() method:
player["SEASON"]=player["SEASON"].dt.strftime('%Y-%m')
line_graph = px.bar(data_frame=player, x='SEASON', y="PTS")
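A small self-contained check of the .dt.strftime() suggestion, leaving the plotting step out. The sample seasons and point values are invented stand-ins for the real player dataframe.

```python
import pandas as pd

# Made-up stand-in for the player dataframe from the question.
player = pd.DataFrame({
    "SEASON": pd.to_datetime(["2019-10-01", "2020-12-01"]),
    "PTS": [25.3, 27.1],
})

# Replace the datetime values with year-month strings.
player["SEASON"] = player["SEASON"].dt.strftime("%Y-%m")
print(player["SEASON"].tolist())
```

If the chart still shows a date axis afterwards, that may be Plotly inferring dates from the string values; calling line_graph.update_xaxes(type='category') is one way to request a categorical axis instead.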

Pyspark: Pass parameter to String Column in Dataframe

I'm quite new to PySpark and coming from SAS I still don't get how to handle parameters (or Macro Variables in SAS terminology).
I have a date parameter like "202105" and want to add it as a String Column to a Dataframe.
Something like this:
date = 202105
df = df.withColumn("DATE", lit('{date}'))
I think it's quite trivial but so far, I didn't find an exact answer to my problem, maybe it's just too trivial...
Hope you guys can help me out. Best regards
You can use string formatting, i.e. "{}".format(date) or an f-string such as f'{date}'.
Example:
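from pyspark.sql.functions import lit
# withColumn returns a new dataframe, so assign the result back
df = df.withColumn("DATE", lit("{0}".format(date)))
df = df.withColumn("DATE", lit("{}".format(date)))
# or
df = df.withColumn('DATE', lit(f'{date}'))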
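Why the original attempt did not work can be seen without Spark at all. The sketch below uses only plain Python strings; the variable names plain, formatted and via_format are just for illustration.

```python
date = 202105

plain = '{date}'                # no f prefix: the braces are kept literally
formatted = f'{date}'           # f-string: the variable's value is substituted
via_format = '{}'.format(date)  # str.format gives the same result

print(plain, formatted, via_format)
```

So lit('{date}') creates a column containing the literal text {date}, while the formatted variants pass the string "202105" to lit.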

Exporting Pandas DataFrame cells directly to excel/csv (python)

I have a Pandas DataFrame that has sports records in it. All of them look like this: "1-2-0", "17-12-1", etc., for wins, losses and ties. When I export this the records come up in different date formats within Excel. Some will come up as "12-May", others as "9/5/2001", and others will come up as I want them to.
The DataFrame that I want to export is named 'x' and this is the command I'm currently using. I tried it without the date_format part and it gave the same response in Excel.
x.to_csv(r'C:\Users\B\Desktop\nba.csv', date_format = '%s')
I also tried using to_excel and kept getting errors while trying to export. Any ideas? I was thinking I am doing the date_format part wrong, but I don't know how to transfer the string of text directly instead of it getting automatically switched to a date.
Thanks!
I don't think it's a Python issue, but rather Excel auto-detecting dates in your data.
But see below for how to convert your scores to strings.
Try this:
import pandas as pd
df = pd.DataFrame({"lakers" : ["10-0-1"],"celtics" : ["11-1-3"]})
print(df.head())
Here is the dataframe with made-up data:
   lakers celtics
0  10-0-1  11-1-3
Convert the dataframe to string:
df = df.astype(str)
and save the csv:
df.to_csv('nba.csv')
Opening in LibreOffice gives me two columns with scores (made up).
You might have an Excel usage issue going on here. In line with my comment below, you can change any column in Excel to lots of different formats. In this case I believe Excel is auto-detecting date formatting, incorrectly. Select your columns of data, right click, select format and change to anything else, like 'General'.
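If changing the format inside Excel is not practical, one commonly used workaround (not part of the answer above, so treat it as an assumption) is to write each record as ="..." so that Excel reads it as a formula evaluating to text. A minimal sketch with the standard csv module, using an in-memory buffer in place of a real file and made-up records:

```python
import csv
import io

# Made-up win-loss-tie records for illustration.
records = [("lakers", "10-0-1"), ("celtics", "17-12-1")]

buffer = io.StringIO()  # stands in for a real file such as nba.csv
writer = csv.writer(buffer)
writer.writerow(["team", "record"])
for team, record in records:
    # ="..." is read by Excel as a formula that evaluates to a text value,
    # which keeps it from being reinterpreted as a date.
    writer.writerow([team, '="{}"'.format(record)])

print(buffer.getvalue())
```

The trade-off is that the CSV is no longer clean for other consumers, which will see the extra =" and " characters around each record.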
