How can I grab the date from the top of this csv?

Hi! I have a CSV that I'm trying to grab the date from using pandas. The date is located above the header in the picture above. I thought I could just grab row 3, but that doesn't seem to work. Here is my code. My goal is to convert that date into a datetime so I can tell which day the data is from. Unfortunately, the name of the CSV has the wrong date.
datetime_df = pd.read_csv(holdings_file)
print(datetime_df.row(3))
AttributeError: 'DataFrame' object has no attribute 'row'

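To get the value of a single cell in a dataframe, use iat; a DataFrame has no row method. Also, the date you want sits in a different column, not a different row. Note that iat positions are zero-based, so iat[0, 3] reads the first data row and the column at position 3 (the fourth column); adjust the indices if the date lands elsewhere in your file.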
datetime_df = pd.read_csv(holdings_file)
print(datetime_df.iat[0,3])
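If the date sits on a line above the real header, another option is to read that line separately and then parse the table starting at the header row. The following is only a sketch: the file contents, the row and column positions, and the helper name read_report are all made up for illustration, since the actual layout of the holdings file is not shown.

```python
import csv
import io

import pandas as pd

# Hypothetical file contents; the real holdings CSV layout is not shown.
raw = (
    "Holdings report\n"
    "As of,,,2021-03-15\n"
    "Ticker,Name,Shares,Price\n"
    "ABC,Alpha Corp,10,12.5\n"
    "XYZ,Xylo Inc,4,80.0\n"
)


def read_report(text, date_row=1, date_col=3, header_row=2):
    """Return (report_date, table) for a CSV with metadata above the header.

    All positions are zero-based and are assumptions about the layout.
    """
    # Read the raw rows so that ragged metadata lines do not confuse pandas.
    rows = list(csv.reader(io.StringIO(text)))
    report_date = pd.to_datetime(rows[date_row][date_col])
    # Parse the tabular part, skipping the lines above the header row.
    table = pd.read_csv(io.StringIO(text), skiprows=header_row)
    return report_date, table


report_date, table = read_report(raw)
print(report_date.date())
print(table)
```

With a real file, you would pass the file's text (for example, from open(holdings_file).read()) instead of the sample string.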

Related

Changing Column Data Type Pandas

I'm working on an NBA project and using an API to get data from Basketball Reference. The "SEASON" column of my dataframe is a datetime object, and I want to change it to a string, but I'm unable to. What am I doing wrong? My code is below.
player["SEASON"]=player["SEASON"].values.astype('str')
line_graph = px.bar(data_frame=player, x='SEASON', y="PTS")
Despite doing this, my graph still looks as though the column is in a datetime format. Can anyone please help?

If your SEASON column is a pandas datetime object, you can use the .dt.strftime() method:
player["SEASON"]=player["SEASON"].dt.strftime('%Y-%m')
line_graph = px.bar(data_frame=player, x='SEASON', y="PTS")
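As a small self-contained check of what .dt.strftime() produces, here is a sketch with invented data (the real player frame comes from the API and is not shown). It only covers the pandas side, not the plotting call.

```python
import pandas as pd

# Made-up stand-in for the player dataframe from the question.
player = pd.DataFrame(
    {
        "SEASON": pd.to_datetime(["2019-10-01", "2020-12-01", "2021-10-01"]),
        "PTS": [1500, 1720, 1610],
    }
)

# Format each datetime as a "YYYY-MM" string.
player["SEASON"] = player["SEASON"].dt.strftime("%Y-%m")

seasons = player["SEASON"].tolist()
print(seasons)
```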

'float' object is not subscriptable error when using pandas to extract data from cell

I am importing a CSV file and I want to look at the school_hours column and extract the starting hours. Here are the first few lines from the column (you can assume that the data is already imported)
Essentially, I want to create a new column named starting_hours that should look something like this:
This is the following code that I wrote:
df['starting_hours']= [i[0:5] for i in df.School_Hours.str.split(' ').values]
I keep getting a 'float' object is not subscriptable error

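The error most likely comes from missing values in the column: NaN is a float, so slicing it inside the list comprehension fails. You can use str.split and then access the first element of each list with str[0]; the .str accessor passes missing values through as NaN. Note that str.strip takes a set of characters rather than a pattern, so 'AMP' is enough here.
df['starting_hours'] = df['School_Hours'].str.split('-').str[0].str.strip('AMP')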
print(df)
School_Hours starting_hours
0 08:00AM-O3:00PM 08:00
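To see how the .str accessor behaves when a cell is missing, here is a short sketch. The sample values are invented, and the regular expression for removing the AM/PM suffix is one possible choice rather than the method from the answer.

```python
import numpy as np
import pandas as pd

# Invented sample data with one missing value.
df = pd.DataFrame(
    {"School_Hours": ["08:00AM-03:00PM", np.nan, "07:45AM-02:30PM"]}
)

# Take the part before the dash; NaN stays NaN instead of raising.
start = df["School_Hours"].str.split("-").str[0]

# Drop a trailing AM or PM marker (and any spaces before it).
df["starting_hours"] = start.str.replace(r"\s*(AM|PM)$", "", regex=True)

print(df)
```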

How to change date from “yyyy-mm-dd” to “mm/dd/yyyy” in Python?

I have a column in my Excel file (before importing into IDE with read_csv) with dates that begin as string type with the format of “yyyy-mm-dd” and I need to change that entire column to date type with format of “mm/dd/yyyy” as I’m importing it as a data frame in Python with Pandas.
Also, it would be great if a single-digit month and/or day came out without a leading zero, like “1/4/2021”, while two-digit values came out as “1/12/2021”, “10/8/2021”, or “11/16/2020”.
I currently have this code:
df = df.df.strptime(“Date”, “%Y-%m-%d”).strftime(“%m/%d/%Y”)
But the IDE is saying there’s a syntax error. And I’m not sure if this is close to correct in terms of making sure the entire column is being changed.

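String slicing can change the format "yyyy-mm-dd" to "mm/dd/yyyy", but on a dataframe it has to go through the .str accessor of the column (here assumed to be named "Date", as in your code):
df["Date"] = df["Date"].str[5:7] + '/' + df["Date"].str[8:10] + '/' + df["Date"].str[0:4]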
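An alternative is to parse the column into real datetimes first and format afterwards. This sketch assumes the column is called "Date" and uses made-up values; the padded and unpadded column names are only for illustration. The unpadded version is built from the month, day, and year components because the strftime codes for dropping leading zeros differ between platforms.

```python
import pandas as pd

# Made-up sample values in "yyyy-mm-dd" form.
df = pd.DataFrame(
    {"Date": ["2021-01-04", "2021-01-12", "2021-10-08", "2020-11-16"]}
)

# Parse the strings into datetime values.
parsed = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# Zero-padded "mm/dd/yyyy" strings.
df["padded"] = parsed.dt.strftime("%m/%d/%Y")

# "m/d/yyyy" strings without leading zeros, built from the components.
df["unpadded"] = (
    parsed.dt.month.astype(str)
    + "/"
    + parsed.dt.day.astype(str)
    + "/"
    + parsed.dt.year.astype(str)
)

print(df)
```

Both result columns hold strings; keep the parsed series around if you still need true date values for sorting or arithmetic.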

Handling timestamps with timezones in Pandas and Rpy2

I'm trying to understand how to add a row that contains a timestamp to a Pandas dataframe that has a column with a data type of datetime64[ns, UTC]. Unfortunately, when I add a row, the column datatype changes to object, which ends up breaking conversion to a R data frame via Rpy2.
Here are the interesting lines of code where I'm seeing the problem, with debug printing statements around it whose output I'll share as well. The variable observation is a simple python list whose first value is a timestamp. Code:
print('A: df.dtypes[0] = {}'.format(str(df.dtypes[0])))
print('observation[0].type = {}, observation[0].tzname() = {}'.format(str(type(observation[0])), observation[0].tzname()))
df.loc[len(df)] = observation
print('B: df.dtypes[0] = {}'.format(str(df.dtypes[0])))
Here is the output of the above code snippet:
A: df.dtypes[0] = datetime64[ns, UTC]
observation[0].type = <class 'datetime.datetime'>, observation[0].tzname() = UTC
B: df.dtypes[0] = object
What I'm observing is that the datatype of the column is being changed when I append the row. As far as I can tell, Pandas is adding the timestamp as an instance of . The rpy2 pandas2ri module seems to be unable to convert values of that class.
I've so far been unable to find an approach that lets me append a row to the data frame and preserve the column type for the timestamp column. Suggestions would be welcome.
==========================
Update
I've been able to work around the problem in a hacky way. I create a one-row temporary dataframe from the list of values, then set the types on the columns for this one-row dataframe. Then I append the row from this temporary dataframe to the one I'm working on. This is the only approach I was able to identify that preserves the column type of the dataframe I'm appending to. It's almost enough to make me pine for a strongly typed language.
I'd prefer a more elegant solution, so I'm leaving this open in case anyone can suggest one.
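The workaround described in the update might look roughly like the sketch below. The column names and values are invented, and pd.concat is used for the append step, which is an assumption about how the row gets added.

```python
from datetime import datetime, timezone

import pandas as pd

# Invented frame with a timezone-aware timestamp column.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2021-01-01 00:00:00"], utc=True),
        "value": [1.0],
    }
)
original_dtypes = df.dtypes.to_dict()

# New observation as a plain Python list, as in the question.
observation = [datetime(2021, 1, 2, tzinfo=timezone.utc), 2.5]

# Build a one-row frame and cast it to the dtypes of the target frame.
row = pd.DataFrame([observation], columns=df.columns).astype(original_dtypes)

# Append by concatenation; matching dtypes are preserved.
df = pd.concat([df, row], ignore_index=True)

print(df.dtypes)
```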

Check this post for an answer, especially the answer by Wes McKinney:
Converting between datetime, Timestamp and datetime64

Get result of value_count() to excel from Pandas

I have a data frame "df" with a column called "column1". By running the below code:
df.column1.value_counts()
I get the output which contains values in column1 and its frequency. I want this result in the excel. When I try to this by running the below code:
df.column1.value_counts().to_excel("result.xlsx",index=None)
I get the below error:
AttributeError: 'Series' object has no attribute 'to_excel'
How can I accomplish the above task?

You are passing index=None, but you need the index: it holds the values that were counted.
pd.DataFrame(df.column1.value_counts()).to_excel("result.xlsx")

According to the documentation, Series has no to_excel method in your pandas version; it is only available on DataFrame.
So you can either convert the result to a frame and then write the Excel file:
a=df.column1.value_counts().to_frame()
a.to_excel("result.xlsx")
Or see Merlin's answer, which I think is the best way:
pd.DataFrame(df.column1.value_counts()).to_excel("result.xlsx")
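If you would rather have the values in an ordinary column than in the index, one option is to reshape the counts before writing. This is a sketch with invented data; the output column name "count" is a choice made here, and the to_excel call is left commented out because it needs an Excel engine such as openpyxl to be installed.

```python
import pandas as pd

# Invented sample data.
df = pd.DataFrame({"column1": ["a", "b", "a", "c", "a", "b"]})

# Turn the counts into a two-column frame: the value and its frequency.
counts = (
    df["column1"]
    .value_counts()
    .rename_axis("column1")
    .reset_index(name="count")
)

print(counts)

# Writing the file; the index is now just a row number, so it can be dropped.
# counts.to_excel("result.xlsx", index=False)
```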
