I realize this is probably a very trivial question, but I have a dataframe of 1000+ rows and I want to create a new column "Date" that holds a single date, "2018-01-31", in every row. I tried the code below but Python just returns "Length of values (1) does not match length of index".
I would really appreciate any help!
Date = ['2018-01-31']
for i in range(len(Output)):
    Output['Date'] = Date
Assuming Output is the name of your pandas dataframe with 1000+ rows you can do:
Output['Date'] = "2018-01-31"
or using the datetime library you could do:
from datetime import date
Output["Date"] = date(2018, 1, 31)
to format it as a date object rather than a string. You also do not need to iterate over each row if you want the same value in every row: assigning a scalar to a new column sets that value for every row. The original error occurs because Date is a one-element list, which pandas treats as the column's values, so its length must match the number of rows.
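To make the scalar-versus-list difference concrete, here is a minimal sketch. The dataframe contents and the "value" and "Date2" column names are made up for illustration, and pd.Timestamp is used as one possible way to store a real datetime value.

```python
import pandas as pd

# Hypothetical stand-in for the 1000+ row dataframe from the question.
Output = pd.DataFrame({"value": range(1000)})

# A scalar is broadcast to every row, so no loop is needed.
Output["Date"] = pd.Timestamp("2018-01-31")

# A one-element list is treated as the column's values and must match the row count.
try:
    Output["Date2"] = ["2018-01-31"]
    list_error = None
except ValueError as exc:
    list_error = str(exc)
```

After this runs, every row of Output["Date"] holds the same timestamp, and list_error contains the length-mismatch message from the question.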
I have a dataframe with the following details: (first df in image)
I want to be able to add new rows to the df that calculate the column next_apt + days with the new timestamp at which it was run. So I want it to look like this:
The other columns should be left as they are. Just add the next next_apt with the newer timestamp at which it was calculated and append the rows to the same df.
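Use date_add and cast the result to a timestamp.
This should work, assuming pyspark.sql.functions is imported as F. Older PySpark versions accept only an integer as the second argument of date_add; there, F.expr("date_add(next_apt, days)") is an alternative.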
df1.withColumn("newDateWithTimestamp", F.date_add(F.col("next_apt"), F.col("days")).cast("timestamp")).show()
I have the dataframe below and I am trying to display how many rides there are per day.
But I can see that only "near_penn" is treated as a column; "Date" is not.
c = df[['start day','near_penn','Date']]
c=c.loc[c['near_penn']==1]
pre_pandemic_df_new=pd.DataFrame()
pre_pandemic_df_new=c.groupby('Date').agg({'near_penn':'sum'})
print(pre_pandemic_df_new)
print(pre_pandemic_df_new.columns)
Why doesn't it consider "Date" as a column?
How can I make Date a column of "pre_pandemic_df_new"?
I think you can use the to_datetime method (this requires Date to be a regular column, not the index):
import pandas as pd
pre_pandemic_df_new["Date"]= pd.to_datetime(pre_pandemic_df_new["Date"])
Hope this works
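Why doesn't it consider "Date" as a column?
Because Date is the index of your DataFrame: groupby makes the grouping key the index of the result.
How can I make Date a column of "pre_pandemic_df_new"?
You can try this (reset_index returns a new DataFrame, so assign the result back):
pre_pandemic_df_new = pre_pandemic_df_new.reset_index(level=['Date'])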
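As a rough illustration of why Date ends up in the index and two ways to get it back as a column, consider the sketch below. The ride data is invented; only the column names come from the question.

```python
import pandas as pd

# Hypothetical ride data; column names follow the question.
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "near_penn": [1, 1, 1],
})

# groupby moves the grouping key into the index of the result.
by_index = df.groupby("Date").agg({"near_penn": "sum"})

# Option 1: turn the index back into a column (the result must be assigned).
as_column = by_index.reset_index()

# Option 2: keep the key as a regular column from the start.
as_column_direct = df.groupby("Date", as_index=False).agg({"near_penn": "sum"})
```

Printing by_index.columns shows only near_penn, while both as_column and as_column_direct list Date and near_penn.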
df[['Date','near_penn']] = df[['Date_new','near_penn_new']]
Once you have created your dataframe, you can try this to add new columns to the end of the dataframe and test whether it works before you make adjustments.
OR
You can check for a value for the first row corresponding to the first "date" row.
These are the first things that came to my mind; I hope it helps.
I have a dataframe which contains the following records:
I need to fill this dataframe with rows with dates which are not present in it.
After inserting new dates the timestamp column should be in range df.timestamp.iloc[0] and df.timestamp.iloc[0]
You can use relativedelta() from the dateutil library along with the string method split().
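An alternative that stays within pandas (not the relativedelta approach mentioned above) is to reindex on a complete date range. The sketch below makes several assumptions: the data is daily, the column is called timestamp, the "value" column is hypothetical, and the range is meant to run from the first row to the last row.

```python
import pandas as pd

# Hypothetical records with gaps; the original data is not shown in the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-06"]),
    "value": [10, 30, 60],
})

# Every calendar day between the first and last timestamp (assumed range).
full_range = pd.date_range(df["timestamp"].iloc[0], df["timestamp"].iloc[-1], freq="D")

# Reindexing on the full range inserts the missing dates with NaN values.
filled = (
    df.set_index("timestamp")
      .reindex(full_range)
      .rename_axis("timestamp")
      .reset_index()
)
```

The inserted rows carry NaN in the other columns; they could then be filled with fillna or ffill, depending on what the missing days should contain.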
Very simple query but did not find the answer on google.
df with timestamp in date column
Date
22/11/2019 22:30:10 etc., which is shown as dtype object by df.dtypes
Code:
df['Date']=pd.to_datetime(df['Date']).dt.date
Now I want to convert the date to datetime using the column number rather than the column name. The column number in this case will be 0 (I have very long column names and multiple similar files, so I want to convert the date column to datetime using its position, 0 in this case).
Can anyone help?
Use DataFrame.iloc for column (Series) by position:
df.iloc[:, 0] = pd.to_datetime(df.iloc[:, 0]).dt.date
It is also possible to look up the column name by position:
df[df.columns[0]] = pd.to_datetime(df[df.columns[0]]).dt.date
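A small end-to-end sketch of the position-based lookup follows. The column names are invented, and dayfirst=True is an assumption based on the 22/11/2019 sample value. Unlike the lines above, this version keeps the full datetime rather than calling .dt.date, which would produce plain Python date objects in an object-dtype column.

```python
import pandas as pd

# Hypothetical frame; the long column name stands in for the real one.
df = pd.DataFrame({
    "a_very_long_date_column_name": ["22/11/2019 22:30:10", "23/11/2019 08:15:00"],
    "x": [1, 2],
})

# Look the column name up by position instead of typing it out.
first_col = df.columns[0]

# dayfirst=True is an assumption: the sample value is written day/month/year.
df[first_col] = pd.to_datetime(df[first_col], dayfirst=True)
```

Because only the position is used, the same lines can be applied to several files whose date column has a different name.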
I have a dataframe named jobs (screenshot of dataframe).
I need to add a new column ‘year’ to jobs data frame. This column should contain the corresponding year for each post_date (which is already a column). For example: for post_date value 2017-08-16 ‘year’ value should be 2017.
I am unsure how to insert a new column while also pulling data from a pre-existing column.
Use dt.year:
jobs['year'] = pd.to_datetime(jobs['post_date'], errors='coerce').dt.year
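To show what errors='coerce' does in practice, here is a minimal sketch with made-up post_date values, one of which is deliberately invalid. The year_int column is a hypothetical addition for illustration.

```python
import pandas as pd

# Hypothetical data; the last value cannot be parsed as a date.
jobs = pd.DataFrame({"post_date": ["2017-08-16", "2018-01-02", "not a date"]})

# Unparseable values become NaT instead of raising, so their year is missing.
jobs["year"] = pd.to_datetime(jobs["post_date"], errors="coerce").dt.year

# Optional: a nullable integer dtype keeps whole-number years alongside missing values.
jobs["year_int"] = jobs["year"].astype("Int64")
```

Rows with invalid dates end up with a missing year, which can be inspected or dropped afterwards.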
I would begin by transforming the column post_date into date format. After doing this, you could use a simple function to extract the year.
jobs["post_date"] =pd.to_datetime(jobs["post_date"])
should be enough to change it into a datetime type. If it doesn't, you should pass a strptime-style format string (the format argument of pd.to_datetime) to tell pandas the specific format of the "post_date" column, so that it is read as a date. After that, do the following:
jobs["year"] =jobs["post_date"].dt.year
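If automatic parsing does not recognise the dates, an explicit format can be supplied. The sketch below assumes a day.month.year layout purely for illustration; the real format of post_date may differ.

```python
import pandas as pd

# Hypothetical data in a day.month.year layout.
jobs = pd.DataFrame({"post_date": ["16.08.2017", "02.01.2018"]})

# format uses strptime-style directives to describe the layout exactly.
jobs["post_date"] = pd.to_datetime(jobs["post_date"], format="%d.%m.%Y")

# Once the column has a datetime dtype, the .dt accessor exposes the year.
jobs["year"] = jobs["post_date"].dt.year
```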
If I understand your question correctly, you want to add a new column of values of years to the existing dataframe from a column in your current dataframe.
For extracting only the year values, the column needs to be converted to a datetime type first. You can use pandas' to_datetime for that and then extract only the year from your post_date column.
For storing these year values, you can simply do this:
jobs['year'] = jobs['post_date'].dt.year