Can I insert commas into numbers? The numbers are parts of strings - python

I want to do something like this. Is there a good way?
import re
if __name__ == '__main__':
    sample = "eventA 12:30 - 14:00 5200yen / eventB 15:30 - 17:00 10200yen enjoy!"
    i_want_to_generate = "eventA 12:30 - 14:00 5,200yen / eventB 15:30 - 17:00 10,200yen enjoy!"
    replaced = re.sub(r"(\d{1,3})(\d{3})", "$1,$2", sample)  # Wrong.
    print(replaced)  # eventA 12:30 - 14:00 $1,$2yen / eventB 15:30 - 17:00 $1,$2yen enjoy!

You're not using the correct notation for your back-references. You could also add a positive lookahead assertion for the currency, so that only numbers immediately followed by 'yen' are changed:
replaced = re.sub(r"(\d{1,3})(\d{3})(?=yen)", r"\1,\2", sample)
print(replaced)
# eventA 12:30 - 14:00 5,200yen / eventB 15:30 - 17:00 10,200yen enjoy!

Use \1 instead of $1 in the replacement string; Python's re module does not treat $1 as a back-reference.
Check: https://regex101.com/r/T2sbD2/1
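One further point: the pattern (\d{1,3})(\d{3}) inserts only a single comma, so a seven-digit amount such as 1234567yen would come out as 1234,567yen. A minimal sketch of an alternative, assuming the amounts are plain integers directly followed by the unit, is to match the whole digit run and let a replacement function format it with Python's "," format specifier. The helper name add_thousands_separators and its unit parameter are illustrative, not from the original answers.

```python
import re

def add_thousands_separators(text, unit="yen"):
    # Hypothetical helper: match a whole run of digits only when the unit follows it
    pattern = re.compile(r"\d+(?=" + re.escape(unit) + r")")
    # re.sub accepts a function; format each matched integer with "," grouping
    return pattern.sub(lambda m: f"{int(m.group()):,}", text)

sample = "eventA 12:30 - 14:00 5200yen / eventB 15:30 - 17:00 10200yen enjoy!"
print(add_thousands_separators(sample))
# eventA 12:30 - 14:00 5,200yen / eventB 15:30 - 17:00 10,200yen enjoy!
print(add_thousands_separators("total 1234567yen"))
# total 1,234,567yen
```

Because the lookahead requires the unit right after the digits, the times such as 12:30 are left untouched.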

Related

How to formulate logic that will parse month-day date pair and append year based on previous value in Pandas row

Hello and thanks for taking a moment to read my issue. I currently have a column (series) of data within a Pandas DataFrame that I am attempting to parse into a proper YYYY-MM-DD HH:MM (%Y-%m-%d %H:%M) format. The problem is that the data does not include a year.
cur_date is what I currently have to work with.
cur_date
Jan-20 14:05
Jan-4 05:07
Dec-31 12:07
Apr-12 20:54
Jan-21 06:12
Nov-3 04:10
Feb-5 11:45
Jan-7 07:09
Dec-3 12:11
req_date is what I am aiming to achieve.
req_date
2023-01-20 14:05
2023-01-04 05:07
2022-12-31 12:07
2022-04-12 20:54
2022-01-21 06:12
2021-11-03 04:10
2021-02-05 11:45
2021-01-07 07:09
2020-12-03 12:11
I know I can write something like df['cur_date'] = pd.to_datetime(df['cur_date'], format='%b-%d %H:%M'), but this does not let me assign a descending year to each row.
I tried various packages, one being dateparser, which has some options for handling incomplete dates, such as the settings={'PREFER_DATES_FROM': 'past'} setting, but it cannot look back at the previous value and interpret the date the way I need.
I hope this code works for you :)
Note: when two consecutive epoch values are equal, it's up to you whether to change the year or not.
import time
current_year = 2023
# Global state shared across calls; reset it before running the transform again
last = {"ly": current_year, "epoch": 0}
def set_year(tt):
    epoch = time.mktime(tt)
    # The first (newest) row keeps current_year (alternatively, compare with the current time);
    # after that, a date later than the previous row's means we moved back one year
    if epoch > last["epoch"] and last["epoch"] != 0:
        last["ly"] -= 1
    last["epoch"] = epoch
    return str(last["ly"])
def transform_func(x):
    # Parse with a constant year so that only month, day and time are compared
    time_tup = time.strptime(f"{current_year}-" + x, "%Y-%b-%d %H:%M")
    time_format = time.strftime("%m-%d %H:%M", time_tup)
    ly = set_year(time_tup)
    return f"{ly}-{time_format}"
df["req_date"] = df["cur_date"].transform(transform_func)
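The same rule (a row whose month/day/time is later than the row above it belongs to the previous year) can also be expressed without global state, using a shifted comparison and a cumulative sum. This is a sketch assuming the rows are ordered newest first, as in the question, and that the newest row belongs to 2023; add_descending_year is a hypothetical helper name. As in the answer above, equal consecutive timestamps keep the same year, and a Feb-29 row that lands in a non-leap year would raise an error.

```python
import pandas as pd

def add_descending_year(cur_date, newest_year):
    # Parse with a placeholder leap year (2000) so only month/day/time are compared
    parsed = pd.to_datetime("2000-" + cur_date, format="%Y-%b-%d %H:%M")
    # Rows are newest first: a row later in the calendar than the row above it
    # means we crossed back into the previous year (the first row compares to NaT -> False)
    crossed = parsed > parsed.shift()
    years = newest_year - crossed.cumsum()
    return pd.to_datetime(
        years.astype(str) + "-" + parsed.dt.strftime("%m-%d %H:%M"),
        format="%Y-%m-%d %H:%M",
    )

df = pd.DataFrame({"cur_date": [
    "Jan-20 14:05", "Jan-4 05:07", "Dec-31 12:07", "Apr-12 20:54", "Jan-21 06:12",
    "Nov-3 04:10", "Feb-5 11:45", "Jan-7 07:09", "Dec-3 12:11",
]})
df["req_date"] = add_descending_year(df["cur_date"], 2023)
print(df)
```

Because the year comes from comparing neighbouring rows, the result depends on row order; this sketch only handles newest-first data.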

Pandas datetime inconsistent format

I hope someone can help me with the following:
I'm trying to convert my data to daily averages using:
df['timestamp'] = pd.to_datetime(df['Datum WSM-09'])
df_daily_avg = df.groupby(pd.Grouper(freq='D', key='timestamp')).mean()
df['Datum WSM-09'] looks like this:
0 6-3-2020 12:30
1 6-3-2020 12:40
2 6-3-2020 12:50
3 6-3-2020 13:00
4 6-3-2020 13:10
...
106785 18-3-2022 02:00
106786 18-3-2022 02:10
106787 18-3-2022 02:20
106788 18-3-2022 02:30
106789 18-3-2022 02:40
Name: Datum WSM-09, Length: 106790, dtype: object
However, after running the first line, the values in "timestamp" are inconsistent. The last rows are correct, but the first ones are not: 6-3-2020 12:30 should become 2020-03-06 12:30, yet the month and the day are switched around.
Many thanks
Try using the "dayfirst" option:
df['timestamp'] = pd.to_datetime(df['Datum WSM-09'], dayfirst=True)
In https://xkcd.com/1179 Randall Munroe explains
that "you're doing it Wrong."
Your source column is apparently object / text.
The March 18th timestamps are unambiguous,
as there are fewer than 18 months in a year.
The ambiguous March 6th timestamps make the hair
on the back of the black cat stand on end.
You neglected to specify a timestamp format,
even though the source column is ambiguously formatted.
Please read the pandas.to_datetime documentation:
format : str, default None
The strftime to parse time, e.g. "%d/%m/%Y". Note that "%f" will parse all the way up to nanoseconds. See strftime documentation for more information on choices.
By omitting it, you accepted the default of None,
which is not a good match for your business needs.
I don't know what all of your input data looks like,
but perhaps %d-%m-%Y %H:%M would better
match your needs.
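Putting the explicit format together with the daily averaging from the question might look like the sketch below. The sample rows and the value column are made up for illustration. Selecting only the numeric column avoids trying to average the original text column.

```python
import pandas as pd

# Hypothetical sample mirroring the question's day-month-year strings;
# "value" is an assumed measurement column, not from the original data
df = pd.DataFrame({
    "Datum WSM-09": ["6-3-2020 12:30", "6-3-2020 12:40", "18-3-2022 02:00", "18-3-2022 02:10"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# An explicit day-first format removes the day/month ambiguity
df["timestamp"] = pd.to_datetime(df["Datum WSM-09"], format="%d-%m-%Y %H:%M")

# Daily averages of the numeric column; days without data come out as NaN
daily_avg = df.groupby(pd.Grouper(freq="D", key="timestamp"))["value"].mean()
print(daily_avg.dropna())
```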

Filtering Pandas time series by specific EST time of day

I am trying to select the rows of a pandas DataFrame whose DatetimeIndex (in the US/Eastern time zone) is exactly 15:30:00 US/Eastern time each day, by doing the following:
check_time = pd.to_datetime("15:30:00").time()
last_30m_mask = df.index.time == check_time
up_df = df[last_30m_mask]
However the rows I get back are as follows:
w1 w2
timestamp
2021-08-04 15:30:00-04:00 382.37 388.27
2021-08-05 15:30:00-04:00 395.65 400.78
2021-08-09 15:30:00-04:00 434.79 437.04
...
Am I correct in thinking that this is instead giving me 15:30 UTC, which is 11:30 EDT (or 10:30 EST in winter)?
If so, how would I rewrite the check_time variable to give me 15:30 US/Eastern local time all year round?
As FObersteiner correctly pointed out, the timestamps are already in local time, and the offset (e.g. -04:00) gives the difference from UTC.
My mistake was on the other end: when converting my data from the source, I wasn't giving it the proper context. The source times were UTC, but I was labeling them as US/Eastern.
I had:
time_col = pd.to_datetime(source_data["time"]).tz_localize("US/Eastern")
Whereas I needed to have:
time_col = pd.to_datetime(source_data["time"]).tz_localize("UTC").tz_convert("US/Eastern")
This way I can now correctly compare my local times with pd.to_datetime("XX:XX:XX").time() as desired.
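A small sketch of the difference between the two conversions, assuming (as in the answer) that the raw source times are naive UTC values. The timestamps and the w1 values below are made up for illustration. After converting from UTC, df.index.time returns Eastern wall-clock times, so the 15:30 comparison matches in both summer (EDT, -04:00) and winter (EST, -05:00).

```python
import pandas as pd

# Hypothetical source timestamps: naive strings that actually represent UTC
source_times = ["2021-08-04 19:30:00", "2021-08-04 20:00:00", "2021-12-06 20:30:00"]

# Wrong: stamps the UTC clock readings as if they were already Eastern
wrong = pd.to_datetime(source_times).tz_localize("US/Eastern")
# Right: declare them UTC first, then convert to Eastern local time
right = pd.to_datetime(source_times).tz_localize("UTC").tz_convert("US/Eastern")

df = pd.DataFrame({"w1": [382.37, 383.00, 395.65]}, index=right)

# .time on a tz-aware index gives local wall-clock times, so this matches 15:30 Eastern
check_time = pd.to_datetime("15:30:00").time()
up_df = df[df.index.time == check_time]
print(up_df)
```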

How to add and subtract times in python [duplicate]

This question already has answers here:
How to compare times of the day?
(8 answers)
python time subtraction
(1 answer)
Closed 6 years ago.
I want to write a simple timesheet script. My input file looks like
9:00 17:00
10:45 12:35
11:00 15:00
I would like to read it in, compute the number of hours worked per day, and then sum those hours. When the day started before 12:00 and ended after 13:00, I would like to also subtract half an hour for the lunch break.
My attempt so far is:
import sys
from datetime import datetime
gap = datetime.strptime('00:30','%H:%M')
hours = []
for line in sys.stdin:
    (start_time,end_time) = line.split()
    start_time = datetime.strptime(start_time, '%H:%M')
    end_time = datetime.strptime(end_time, '%H:%M')
    #if start_time before 12:00 and end_time after 13:00 then subtract gap
    hours.append(end_time-start_time)
print(sum(hours))
I don't know how to get the if statement line to work and summing the hours doesn't seem to work either as you can't sum datetime.timedelta types.
Thanks to the link in the comments, replacing sum(hours) with reduce(operator.add, hours) works.
The remaining part is how to test if start_time is before 12:00 and end_time is after 13:00 and if so to reduce the timedelta by half an hour.
You are using incorrect code (and syntax) in your if statement.
if start_time < datetime.strptime('12:00', '%H:%M') and end_time > datetime.strptime('13:00', '%H:%M'):
    delta_hours = end_time.hour - start_time.hour
    delta_minutes = end_time.minute - start_time.minute
    # Do whatever you want with it now.
    # Subtracting the break is not implemented in this example; it depends on how you want to store it.
timedelta might be worth looking into as well; it supports basic arithmetic such as:
a = timedelta(...)
b = timedelta(...)
c = b - a - gap
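A complete version of the timesheet in Python 3 might look like the sketch below. It assumes the input format from the question (one "start end" pair per line, 24-hour H:MM) and treats the lunch rule as "start strictly before 12:00 and end strictly after 13:00". The helper names daily_hours and total_hours are hypothetical. Note that sum() does work on timedelta objects once it is given timedelta() as the start value, since its default start is the integer 0.

```python
from datetime import datetime, time, timedelta

LUNCH = timedelta(minutes=30)

def daily_hours(start_str, end_str):
    # Hypothetical helper: hours worked for one "start end" pair
    start = datetime.strptime(start_str, "%H:%M")
    end = datetime.strptime(end_str, "%H:%M")
    worked = end - start
    # Compare only the time-of-day parts against the lunch window
    if start.time() < time(12, 0) and end.time() > time(13, 0):
        worked -= LUNCH
    return worked

def total_hours(lines):
    # Hypothetical helper: skip blank lines, then add up the daily timedeltas
    per_day = [daily_hours(*line.split()) for line in lines if line.strip()]
    # sum() needs a timedelta start value; its default start is the int 0
    return sum(per_day, timedelta())

sample = ["9:00 17:00", "10:45 12:35", "11:00 15:00"]
total = total_hours(sample)
print(total)                           # 12:50:00
print(total.total_seconds() / 3600)    # total as decimal hours
```

To read the real file, pass sys.stdin (or an open file object) to total_hours instead of the sample list.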

Add a space in string between time and am/pm in pandas [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
My data in a pandas DataFrame looks like this:
time
10:00am
11:00am
12:30pm
1:45pm
10:00pm
My desired output is:
time
10:00 am
11:00 am
12:30 pm
1:45 pm
10:00 pm
You could use str.replace (pass regex=True; since pandas 2.0 the default is regex=False):
df['time'].str.replace(r'(am|pm)', r' \1', regex=True)
Or, as @Kartik suggests, Series.replace can do the same thing:
df['time'].replace(to_replace=r'(am|pm)', value=r' \1', regex=True)
Or slicing and concatenation:
df['time'].str[:-2] + ' ' + df['time'].str[-2:]
All three produce the new column:
0 10:00 am
1 11:00 am
2 12:30 pm
3 1:45 pm
4 10:00 pm
Name: time, dtype: object
As an aside, if you're working with times and/or dates pandas has very good support for datetime and timedelta types. These are much easier to work with than string types if you're doing analysis.
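For instance, a minimal sketch of parsing the question's strings into real datetimes, using %I (12-hour clock hour) and %p (am/pm marker) in the format. pandas fills in a placeholder date (1900-01-01) because the strings contain none; once parsed, arithmetic such as diff() works directly.

```python
import pandas as pd

df = pd.DataFrame({"time": ["10:00am", "11:00am", "12:30pm", "1:45pm", "10:00pm"]})

# %I = 12-hour clock hour, %p = am/pm marker; the date part defaults to 1900-01-01
parsed = pd.to_datetime(df["time"], format="%I:%M%p")

print(parsed.dt.time)   # time-of-day values 10:00, 11:00, 12:30, 13:45, 22:00
print(parsed.diff())    # gaps between consecutive rows as Timedelta values
```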
