How to remove last 5 characters from a string in pandas series - python

I'm trying to remove the time from the rows of a column using pandas, for example from '06/07/2020 14:00'.
How can I access the last 6 characters of each string (the space plus the time) so that I can remove them, e.g. with str.replace("x", "")?
Your advice will be much appreciated.

import pandas as pd
data = {'datetime': ['06/07/2020 14:00', '06/07/2020 16:00', '06/07/2020 18:00']}
df = pd.DataFrame(data)
# Drop the last 6 characters (the space plus 'HH:MM'), leaving only the date
df['date'] = df['datetime'].str[:-6]

.str[:-5]
is the solution I was looking for.
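The two slices above differ by one character, which is easy to miss. A minimal sketch, assuming every value has the form 'DD/MM/YYYY HH:MM'; the variable names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'datetime': ['06/07/2020 14:00', '06/07/2020 16:00']})

# Dropping 5 characters removes 'HH:MM' but keeps the separating space.
drop5 = df['datetime'].str[:-5]
# Dropping 6 characters removes the space as well.
drop6 = df['datetime'].str[:-6]
# Splitting on the first space does not depend on the width of the time part.
by_split = df['datetime'].str.split(' ', n=1).str[0]

print(drop5.tolist())
print(drop6.tolist())
print(by_split.tolist())
```

If the time part might vary in width, the split-based variant is probably the safer choice.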

time_and_date = '06/07/2020 14:00'
only_time = time_and_date.split(' ')[1]
# or
only_time = time_and_date[-5:]
If you want to replace the time in the string, you can do it like this:
time_and_date = '06/07/2020 14:00'
new_value_to_be_placed = 'Some value'
# Keep the date part and append the new value, separated by a space
new_time_and_date = time_and_date.split(' ')[0] + ' ' + new_value_to_be_placed

Use apply and lambda expressions. If your column is really a string:
import pandas as pd
from datetime import datetime, timedelta
d1 = {'my_date_str': ['06/07/2020 14:00', '08/07/2020 14:00'], 'my_date': [datetime.now(), datetime.now() - timedelta(days=10)]}
d1 = pd.DataFrame(data=d1)
d1['my_date_str_new'] = d1['my_date_str'].apply(lambda x: x[:10])
If your column holds datetime objects:
d1['my_date_new'] = d1['my_date'].apply(lambda x: x.date())
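The same results can usually be obtained without apply, through the vectorized .str and .dt accessors. A small sketch with fixed sample dates (an assumption, chosen so the output is reproducible):

```python
import pandas as pd

d1 = pd.DataFrame({
    'my_date_str': ['06/07/2020 14:00', '08/07/2020 14:00'],
    'my_date': pd.to_datetime(['2020-07-06 14:00', '2020-07-08 14:00']),
})

# String column: slice the first 10 characters of every value.
d1['my_date_str_new'] = d1['my_date_str'].str[:10]
# Datetime column: extract datetime.date objects (object dtype).
d1['my_date_new'] = d1['my_date'].dt.date
# Datetime column: keep datetime64 dtype but reset the time to midnight.
d1['my_date_midnight'] = d1['my_date'].dt.normalize()

print(d1)
```

dt.normalize() may be preferable when later steps still need datetime arithmetic, since dt.date returns plain Python objects.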

Related

How fill new column with businessDuration result in Dataframe Python

Please help me solve this: how can I create a new column in df that holds the duration result for every row? Thanks.
import pandas as pd
from datetime import time,datetime
from itertools import repeat
df = pd.read_csv("data.csv")
df['startdate_column'] = pd.to_datetime(df['startdate_column'])
df['enddate_column'] = pd.to_datetime(df['enddate_column'])
start_time=time(8,0,0)
end_time=time(17,0,0)
unit='min'
df['Duration'] = list(map(businessDuration,startdate=df['startdate_column'],enddate=df['enddate_column'],repeat(start_time),repeat(end_time),repeat(weekendlist=[6]),repeat(unit)))
Use:
f = lambda x: businessDuration(startdate=x['startdate_column'],
                               enddate=x['enddate_column'],
                               starttime=start_time,
                               endtime=end_time,
                               weekendlist=[6],
                               unit=unit)
df['Duration'] = df.apply(f, axis=1)
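To see the row-wise apply pattern in isolation, here is a sketch that runs without the business-duration package. minutes_between is a hypothetical stand-in, not the real businessDuration: it returns plain elapsed time and ignores business hours and weekends.

```python
import pandas as pd

def minutes_between(startdate, enddate, unit='min'):
    """Hypothetical stand-in: elapsed time only, no business-hours logic."""
    seconds = (enddate - startdate).total_seconds()
    return seconds / 60 if unit == 'min' else seconds

df = pd.DataFrame({
    'startdate_column': pd.to_datetime(['2021-01-04 08:00', '2021-01-04 09:30']),
    'enddate_column': pd.to_datetime(['2021-01-04 10:00', '2021-01-04 09:45']),
})

# axis=1 passes each row to the lambda as a Series indexed by column name.
df['Duration'] = df.apply(
    lambda row: minutes_between(row['startdate_column'], row['enddate_column'], unit='min'),
    axis=1,
)

print(df['Duration'].tolist())
```

Swapping the stand-in for the real function should only require changing the call inside the lambda.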

Partial string filter pandas

On Pandas 1.3.4 and Python 3.9.
I'm having trouble filtering on part of a string. The "Date" column is in the format MM/DD/YYYY HH:MM:SS AM/PM, with the most recent entry on top. Single-digit days are not zero-padded (for example, November 3rd is 11/3 rather than 11/03). I want Python to look at the "Date" column and read part of each string so that I keep only today's rows.
I am looking for a specific date regardless of the time on that date, and I also want to implement the equivalent of the =RIGHT() formula. However, the following code does not give me that result.
from datetime import date
import pandas as pd
df = pd.read_csv(r'file.csv', dtype=str)
today = date.today()
d1 = today.strftime("%m/%#d/%Y") # to find out what today is
df = pd.DataFrame(df, columns=['New Phone', 'Phone number', 'Date'])
df['New Phone'] = df['Phone number'].str[-10:]
df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)
df_today.to_csv(r'file.csv', index=False)
This line is wrong:
df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)
All you're doing there is creating a mask: a pandas Series containing True or False in each row, according to the condition you used to create it. Writing the mask to the CSV file saves those booleans rather than the matching rows. The spreadsheet gets only FALSE, as you showed, because none of the items in the Date column contain the string that the variable d1 holds.
Instead, try this:
from datetime import date
import pandas as pd
# Load the CSV file, and change around the columns
df = pd.DataFrame(pd.read_csv(r'file.csv', dtype=str), columns=['New Phone', 'Phone number', 'Date'])
# Take the last ten chars of each phone number
df['New Phone'] = df['Phone number'].str[-10:]
# Convert each date string to a pd.Timestamp, removing the time
df['Date'] = pd.to_datetime(df['Date'].str.split(r'\s+', n=1).str[0])
# Get the phone numbers that are from today
df_today = df[df['Date'] == date.today().strftime('%m/%d/%Y')]
# Write the result to the CSV file
df_today.to_csv(r'file.csv', index=False)
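The difference between a mask and the filtered rows can be seen on a few made-up rows. This is only a sketch: the phone numbers and dates are invented, and the target date is fixed instead of using date.today() so the result is reproducible.

```python
import pandas as pd

df = pd.DataFrame({
    'Phone number': ['+1 5551234567', '+1 5557654321', '+1 5550001111'],
    'Date': ['11/3/2021 9:15:00 AM', '11/2/2021 4:30:00 PM', '11/3/2021 1:05:00 PM'],
})
target = '11/3/2021'  # assumed date, hard-coded for the example

# The mask is a boolean Series with one True/False value per row.
mask = df['Date'].str.startswith(target + ' ')
# Indexing the DataFrame with the mask keeps only the rows where it is True.
df_today = df[mask]

print(mask.tolist())
print(df_today)
```

Writing df_today, rather than mask, to the CSV file is what keeps the actual rows.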

How to add seconds in a datetime

I need to add seconds in YYYY-MM-DD-HH-MM-SS. My code works perfectly for one data point but not for the whole set. The data.txt consists of 7 columns and around 200 rows.
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
df = pd.read_csv('data.txt',sep='\t',header=None)
a = np.array(list(df[0]))
b = np.array(list(df[1]))
c = np.array(list(df[2]))
d = np.array(list(df[3]))
e = np.array(list(df[4]))
f = np.array(list(df[5]))
g = np.array(list(df[6]))
t1=datetime(year=a, month=b, day=c, hour=d, minute=e, second=f)
t = t1 + timedelta(seconds=g)
print(t)
You can pass the names parameter to read_csv to set the column names in the first step, then convert the first 6 columns to datetimes with to_datetime and add the seconds, converted to timedeltas with to_timedelta:
names = ["year","month","day","hour","minute","second","new"]
df = pd.read_csv('data.txt',sep='\t',names=names)
# Only the six date/time columns go to to_datetime; extra columns raise an error
df['out'] = pd.to_datetime(df[names[:6]]) + pd.to_timedelta(df["new"], unit='s')
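A small self-contained check of this approach, using an inline DataFrame instead of data.txt (the values are invented to show the rollover across day and year boundaries):

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2020, 2020], 'month': [1, 12], 'day': [31, 31],
    'hour': [23, 23], 'minute': [59, 59], 'second': [30, 59],
    'new': [45, 1],  # seconds to add
})

# to_datetime assembles a timestamp from columns named year, month, day, ...
parts = ['year', 'month', 'day', 'hour', 'minute', 'second']
df['out'] = pd.to_datetime(df[parts]) + pd.to_timedelta(df['new'], unit='s')

print(df['out'])
```

Both rows roll over correctly, the first into February and the second into the next year.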
Use apply with axis=1 to apply a function to every row of the dataframe.
df.apply(lambda x: datetime(year=x[0],
                            month=x[1],
                            day=x[2],
                            hour=x[3],
                            minute=x[4],
                            second=x[5]) + timedelta(seconds=int(x[6])), axis=1)
Generate a sample dataset first; the addition is then simple to do on pandas Series:
import numpy as np
import pandas as pd
s = 20
df = pd.DataFrame(np.array([np.random.randint(2015,2020,s),np.random.randint(1,12,s),np.random.randint(1,28,s),
                            np.random.randint(0,23,s), np.random.randint(0,59,s), np.random.randint(0,59,s),
                            np.random.randint(0,200,s)]).T,
                  columns=["year","month","day","hour","minute","second","add"])
pd.to_datetime(df.loc[:,["year","month","day","hour","minute","second"]]) + df["add"].apply(lambda s: pd.Timedelta(seconds=s))
Without using apply():
pd.to_datetime(df.loc[:,["year","month","day","hour","minute","second"]]) + pd.to_timedelta(df["add"], unit="s")

Copy and convert all values in pandas dataframe

In a dataframe, I have a column "UnixTime" and want to convert it to a new column containing the UTC time.
import pandas as pd
from datetime import datetime
df = pd.DataFrame([1565691196, 1565691297, 1565691398], columns = ["UnixTime"])
unix_list = df["UnixTime"].tolist()
utc_list = []
for i in unix_list:
i = datetime.utcfromtimestamp(i).strftime('%Y-%m-%d %H:%M:%S')
utc_list.append(i)
df["UTC"] = utc_list
This works, but I guess there is a smarter approach?
Could you try this:
df["UTC"] = pd.to_datetime(df['UnixTime'], unit='s')
If by a smarter approach you mean the pandas way with less code, then this is your answer:
df["UTC"] = pd.to_datetime(df["UnixTime"], unit = "s")
Hope this helps.
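Note that pd.to_datetime returns a datetime64 column, whereas the loop in the question produced strings. A short sketch of both forms, plus a timezone-aware variant; the column names UTC_str and UTC_aware are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'UnixTime': [1565691196, 1565691297, 1565691398]})

# Naive datetime64 values, interpreted as seconds since the Unix epoch.
df['UTC'] = pd.to_datetime(df['UnixTime'], unit='s')
# Same values formatted as strings, matching the format used in the question.
df['UTC_str'] = df['UTC'].dt.strftime('%Y-%m-%d %H:%M:%S')
# Timezone-aware variant that carries the UTC offset explicitly.
df['UTC_aware'] = pd.to_datetime(df['UnixTime'], unit='s', utc=True)

print(df)
```

Keeping the datetime64 column is usually more convenient for later arithmetic; the string column is only needed for display or export.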

How to change datetime format?

I have timestamps given in the following format in my pandas DataFrame df: 2015-03-09 11:09:05.0.
How can I transform them into this format 2015-03-09T11:09:05.0 (i.e. separated by T)?
df["dt"] = df["dt"].apply(lambda x: ???)
You were almost there. You are looking for isoformat: https://docs.python.org/3.6/library/datetime.html#datetime.date.isoformat
import pandas as pd
df = pd.DataFrame({'dt':pd.to_datetime(['2015-03-09 11:09:05.0'])})
df["dt"] = df["dt"].apply(lambda x: x.isoformat())
df
Returns
dt
0 2015-03-09T11:09:05
You can change the T (default) by inserting a parameter to isoformat(), e.g. df["dt"] = df["dt"].apply(lambda x: x.isoformat(" "))
Use strftime with custom format:
df = pd.DataFrame({'dt':pd.to_datetime(['2015-03-09 11:09:05.0'])})
print (df)
df["dt"] = df["dt"].dt.strftime('%Y-%m-%dT%H:%M:%S.%f')
print (df)
dt
0 2015-03-09T11:09:05.000000
Or convert to string, split by whitespace and join by T:
df["dt"] = df["dt"].astype(str).str.split().str.join('T')
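The approaches above differ in how they treat fractional seconds, which a second sample value makes visible. A minimal sketch; the value with .5 seconds is an added assumption, not from the question:

```python
import pandas as pd

df = pd.DataFrame({'dt': pd.to_datetime(['2015-03-09 11:09:05.0',
                                         '2015-03-09 11:09:05.5'])})

# isoformat() omits the fractional part when it is zero.
iso = df['dt'].apply(lambda x: x.isoformat())
# strftime with %f always writes six fractional digits.
fixed = df['dt'].dt.strftime('%Y-%m-%dT%H:%M:%S.%f')

print(iso.tolist())
print(fixed.tolist())
```

If every row needs the same width, the strftime variant is likely the better fit.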
