Extract 'date-related' strings from a sentence in Python

I am new to Python and want to find all 'date-related' words in a sentence, such as dates, Monday, Tuesday, last week, next week, tomorrow, yesterday, today, etc.
For example:
input: 'Yesterday I went shopping'
return: 'Yesterday'
input: 'I will start working on Tuesday'
return: 'Tuesday'
input: 'My birthday is 1998-12-12'
return: '1998-12-12'
I found that the Python package 'datefinder' can find these words, but it automatically converts them to standard datetime objects. I only want to extract the words themselves. Is there another method or package that can do this?
Thanks for your help!
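If a fixed vocabulary is enough, the standard re module can return the words exactly as written, with no conversion to datetime. A minimal sketch, assuming the date-related words can be listed up front and that numeric dates use the YYYY-MM-DD form from the example; DATE_WORDS and extract_date_words are illustrative names, not part of any library:

```python
import re

# Hypothetical keyword list: extend it with whatever counts as "date-related" for you.
DATE_WORDS = [
    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday",
    "today", "tomorrow", "yesterday", "last week", "next week",
]

# Longest first, so that at the same position a longer phrase is preferred
# over a shorter entry that is its prefix.
_words = sorted(DATE_WORDS, key=len, reverse=True)
DATE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in _words) + r")\b"  # keyword phrases
    r"|\b\d{4}-\d{2}-\d{2}\b",                                   # ISO dates like 1998-12-12
    re.IGNORECASE,
)

def extract_date_words(sentence):
    """Return the date-related substrings exactly as they appear in sentence."""
    return DATE_RE.findall(sentence)

print(extract_date_words('Yesterday I went shopping'))        # ['Yesterday']
print(extract_date_words('I will start working on Tuesday'))  # ['Tuesday']
print(extract_date_words('My birthday is 1998-12-12'))        # ['1998-12-12']
```

Anything not in the list (for example 'next month' or '12/12/1998') is missed until you add a word or pattern for it.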

This is how I would handle the logic. As for getting numbers out of a string that also contains other text, I'm not sure; I would create an input that specifically asks for digits, and then, just as I did firstSentence.lower(), I would do firstSentence = int(firstSentence) to ensure only ints are passed.
firstSentence = input('Tell me something: ')
firstSentence = firstSentence.lower()
if 'yesterday' in firstSentence:
    # now pass a function that returns date/time
    pass
elif 'tuesday' in firstSentence:
    # now pass a function that returns date/time
    pass
else:
    print('No day found')
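The "now pass a function that returns date/time" placeholders could be filled with something like the sketch below. resolve_day is a hypothetical helper, and it assumes that a weekday name means the next occurrence of that day (never today itself); adjust that rule if your text means otherwise.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
RELATIVE = {"yesterday": -1, "today": 0, "tomorrow": 1}

def resolve_day(word, today=None):
    """Turn a day word into a datetime.date, or return None if it is not recognised."""
    if today is None:
        today = date.today()
    word = word.lower()
    if word in RELATIVE:
        return today + timedelta(days=RELATIVE[word])
    if word in WEEKDAYS:
        # date.weekday() is 0 for Monday, matching the order of WEEKDAYS;
        # "% 7 or 7" pushes a same-day match to next week.
        days_ahead = (WEEKDAYS.index(word) - today.weekday()) % 7 or 7
        return today + timedelta(days=days_ahead)
    return None

print(resolve_day("Tuesday", date(2024, 1, 1)))  # 2024-01-02 (1 Jan 2024 was a Monday)
```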

Related

How and where to place .lower() in code with multiple conditions?

I am working on a (simple) function. Based on user input (name and month), the function searches the df and sums up the amount of money spent in that shop in the specified month.
Names in the df are sometimes written with capitals and sometimes not, so I want all names extracted from the df to be lowercase, as well as all user input.
Making the name input lowercase is no problem. But how and where do I put .lower in the code with multiple conditions?
So my question is: how do I place .lower around the .str.contains(name) part?
(The code below works when the part of the name that is typed has capital letters in the right places.)
def euro_month():
    name = input('What shop are you looking for: ')
    name = name.lower()
    month = input('Give the month number, 1 - 12: ')
    df = df_2019.loc[(df_2019['Name'].str.contains(name)) & (df_2019['Month'] == int(month))]
    bedrag = round(df['Bedrag'].sum(), 2)
    print('We spent in shop', name, 'in month', month, '2019:', bedrag, 'euros.')
pandas' str.contains() has a case argument that makes the search case-insensitive: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
In your code:
df = df_2019.loc[((df_2019['Name'].str.contains(name, case=False)))&(df_2019['Month'] == int(month))]
Alternatively, you can use str.lower(): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.lower.html
df = df_2019.loc[((df_2019['Name'].str.lower().str.contains(name, case=False)))&(df_2019['Month'] == int(month))]
This should work.
df = df_2019.loc[((df_2019['Name'].str.lower().str.contains(name))) & (df_2019['Month'] == int(month))]
You can simply call .str.lower() and then .str.contains().
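Both answers can be combined into one small function. A sketch, assuming a DataFrame with Name, Month and Bedrag columns like df_2019; the sample rows and the name shop_month_total are made up for illustration. It also passes regex=False, because str.contains treats its pattern as a regular expression by default, so a shop name containing characters such as ( or + could otherwise misbehave.

```python
import pandas as pd

# Hypothetical sample data standing in for df_2019
df_2019 = pd.DataFrame({
    "Name": ["Albert Heijn", "albert heijn", "HEMA", "Jumbo"],
    "Month": [3, 3, 3, 4],
    "Bedrag": [12.50, 7.25, 3.10, 20.00],
})

def shop_month_total(df, name, month):
    """Sum 'Bedrag' for rows whose Name contains name (any case) in the given month."""
    # case=False ignores capitalisation; regex=False does a plain substring search
    mask = df["Name"].str.contains(name, case=False, regex=False) & (df["Month"] == month)
    return round(df.loc[mask, "Bedrag"].sum(), 2)

print(shop_month_total(df_2019, "Albert", 3))  # 19.75
```

With this version the user input no longer needs .lower() at all, and int(month) can be done once before the call.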

Convert certain numbers in a sentence such as date, time, phone number from numbers to words in Python

I am fairly new to Python, so I apologize for any gaps in my knowledge. With other users' help (thank you), I have Python code that converts a date from numbers into words using dictionaries for days, months, and years, e.g. 3.6.2015 => march.third.two thousand fifteen, reading the date with:
date = raw_input("Give date: ")
Now I want to input a sentence such as "today is 3.6.2015, it is 10:00 o'clock and it's rainy", but I do not know how to search the sentence for the date, time, or phone number and apply the conversion to them.
If someone could help, thank you.
You could use regular expressions:
import re
s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
mat = re.search(r'(\d{1,2}\.\d{1,2}\.\d{4})', s)
date = mat.group(1)
print(date) # 3.6.2015
Note: if nothing in the input matches this regular expression, re.search returns None and mat.group(1) raises an AttributeError, which you'll have to either prevent (e.g. with if mat:) or handle.
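A sketch of that `if mat:` guard, plus re.findall for the time. The HH:MM time pattern is an assumption based on the 10:00 in the example; phone numbers are left out because their format isn't given. first_date is an illustrative name.

```python
import re

DATE_RE = re.compile(r'\d{1,2}\.\d{1,2}\.\d{4}')  # e.g. 3.6.2015
TIME_RE = re.compile(r'\b\d{1,2}:\d{2}\b')        # e.g. 10:00

def first_date(text):
    """Return the first d.m.yyyy date in text, or None if there is none."""
    mat = DATE_RE.search(text)
    # search() returns None when nothing matches, so test before calling group()
    return mat.group(0) if mat else None

s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
print(first_date(s))       # 3.6.2015
print(TIME_RE.findall(s))  # ['10:00']
```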
EDIT
Assuming you can turn your conversion code into a function, you could use re.sub:
import re

def your_function(match):
    # re.sub calls this with a match object; the matched date text is match.group(1)
    num_string = match.group(1)
    # Whatever your function does with num_string
    words_string = "march.third.two thousand fifteen"
    return words_string

s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
date = re.sub(r'(\d{1,2}\.\d{1,2}\.\d{4})', your_function, s)
print(date)
# today is march.third.two thousand fifteen, it is 10:00 o'clock and it's rainy
Just modify your_function to change the 3.6.2015 into march.third.two thousand fifteen.
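To convert dates and times in a single pass, the same re.sub technique works with named groups: the replacement function checks match.lastgroup to see which alternative matched. In this sketch date_to_words and time_to_words are hypothetical placeholders for your own conversion code, and the HH:MM time format is an assumption.

```python
import re

PATTERN = re.compile(r'(?P<date>\d{1,2}\.\d{1,2}\.\d{4})|(?P<time>\d{1,2}:\d{2})')

def date_to_words(text):
    # Hypothetical placeholder: call your dictionary-based date conversion here
    return '<date ' + text + '>'

def time_to_words(text):
    # Hypothetical placeholder: call your time conversion here
    return '<time ' + text + '>'

def convert(match):
    # lastgroup is the name of the group that matched: 'date' or 'time'
    if match.lastgroup == 'date':
        return date_to_words(match.group('date'))
    return time_to_words(match.group('time'))

s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
print(PATTERN.sub(convert, s))
# today is <date 3.6.2015>, it is <time 10:00> o'clock and it's rainy
```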

Get date from string by splitting

I have a batch of raw text files. Each file begins with Date>>month.day year News garbage.
garbage is a whole lot of text I don't need, and varies in length. The words Date>> and News always appear in the same place and do not change.
I want to copy month day year and insert this data into a CSV file, with a new line for every file in the format day month year.
How do I copy month day year into separate variables?
I tried to split the string after one known word and before another. I'm familiar with string[x:y], but I basically want to replace the numbers x and y with actual words (i.e. string[Date>>:News]).
import re, os, sys, fnmatch, csv
folder = input('Drag and drop the folder > ')
for filename in os.listdir(folder):
    # First, avoid system files
    if filename.startswith("."):
        pass
    else:
        # Tell the script the file is in this directory and can be written
        file = open(folder + '/' + filename, "r+")
        filecontents = file.read()
        thestring = str(filecontents)
        print(thestring[9:20])
An example text file:
Date>>January 2. 2012 News 122
5 different news agencies have reported the story of a man washing his dog.
Here's a solution using the re module:
import re
s = "Date>>January 2. 2012 News 122"
m = re.match(r"^Date>>(\S+)\s+(\d+)\.\s+(\d+)", s)
if m:
    month, day, year = m.groups()
    print("{} {} {}".format(month, day, year))
Outputs:
January 2 2012
Edit:
Actually, there's another nicer (imo) solution using re.split described in the link Robin posted. Using that approach you can just do:
month, day, year = re.split(r">>| |\. ", s)[1:4]
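The string[Date>>:News] idea from the question maps fairly directly onto a regular expression that captures whatever lies between the two known words. A sketch, assuming the header looks like the example line and that month names are in English (strptime's %B depends on the current locale); it also parses the text into a datetime so it can be written back out in day month year order.

```python
import re
from datetime import datetime

line = "Date>>January 2. 2012 News 122"
# Capture everything between the known markers "Date>>" and "News"
m = re.search(r'Date>>(.+?)\s+News', line)
if m:
    # m.group(1) is 'January 2. 2012'
    parsed = datetime.strptime(m.group(1), '%B %d. %Y')
    print(parsed.strftime('%d %B %Y'))  # 02 January 2012
```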
You can use the string method .split(" ") to split the text into a list of strings at each space character. Because month.day and year always appear in the same positions, you can access them by their index in the resulting list. To separate month and day, use .split again, this time on ".".
Example:
list = theString.split(" ")
year = list[1]
month= list[0].split(".")[0]
day = list[0].split(".")[1]
You could use str.split:
x = "A b c"
x.split(" ")
Or you could use regular expressions (which I see you import but don't use) with groups. I don't remember the exact syntax offhand, but the regex is something like r'(.*)(Date>>)(.*)'. This regex matches the string "Date>>" with any text before and after it, and the parentheses capture the three parts into numbered groups.
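Putting these pieces together for the original goal (one CSV row per file, in day month year order), a minimal sketch might look like this. It reuses the Date>> regular expression from the first answer and assumes the header is on the first line of each file and that the output CSV is written outside the input folder; dates_to_csv is a hypothetical name.

```python
import csv
import os
import re

# Assumes headers like "Date>>January 2. 2012 News 122" on each file's first line
HEADER_RE = re.compile(r'^Date>>(\S+)\s+(\d+)\.\s+(\d+)')

def dates_to_csv(folder, csv_path):
    """Write one 'day, month, year' row per matching file in folder."""
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out)
        for filename in sorted(os.listdir(folder)):
            path = os.path.join(folder, filename)
            if filename.startswith('.') or not os.path.isfile(path):
                continue  # skip hidden/system files and subfolders
            with open(path) as f:
                first_line = f.readline()
            m = HEADER_RE.match(first_line)
            if m:
                month, day, year = m.groups()
                writer.writerow([day, month, year])
```

Files whose first line does not match are silently skipped; printing their names instead would make odd inputs easier to spot.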

Extracting sub-string after the first space in Python

I need help with regex or Python to extract a substring from a set of strings. The strings are alphanumeric. I want the substring that starts after the first space and ends before the last space, as in the example below.
Example 1:
A:01 What is the date of the election ?
BK:02 How long is the river Nile ?
Results:
What is the date of the election
How long is the river Nile
While I am at it, is there an easy way to extract strings before or after a certain character? For example, I want to extract the date or day from strings like the ones given in Example 2.
Example 2:
Date:30/4/2013
Day:Tuesday
Results:
30/4/2013
Tuesday
I have actually read about regex but it's very alien to me. Thanks.
I recommend using split:
>>> s="A:01 What is the date of the election ?"
>>> " ".join(s.split()[1:-1])
'What is the date of the election'
>>> s="BK:02 How long is the river Nile ?"
>>> " ".join(s.split()[1:-1])
'How long is the river Nile'
>>> s="Date:30/4/2013"
>>> s.split(":")[1]
'30/4/2013'
>>> s="Day:Tuesday"
>>> s.split(":")[1]
'Tuesday'
>>> s="A:01 What is the date of the election ?"
>>> s.split(" ", 1)[1].rsplit(" ", 1)[0]
'What is the date of the election'
There's no need to dig into regex if this is all you need; you can use str.partition:
s = "A:01 What is the date of the election ?"
before,sep,after = s.partition(' ') # could be, eg, a ':' instead
If all you want is the last part, you can use _ as a placeholder for 'don't care':
_,_,theReallyAwesomeDay = s.partition(':')
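Since the question also asked about regex, here is a sketch of the same two extractions with the re module. The patterns assume whitespace-separated tokens and a single ':' separator as in the examples; middle_words and after_colon are illustrative names. In the first pattern, \S+ is the first token, the greedy (.*) captures the middle, and \s+\S+$ is the last token.

```python
import re

def middle_words(line):
    """Text after the first whitespace-separated token and before the last one."""
    m = re.match(r'\S+\s+(.*)\s+\S+$', line)
    return m.group(1) if m else None

def after_colon(text):
    """Everything after the first ':' (e.g. 'Date:30/4/2013' -> '30/4/2013')."""
    m = re.match(r'[^:]*:(.*)', text)
    return m.group(1) if m else None

print(middle_words('A:01 What is the date of the election ?'))  # What is the date of the election
print(after_colon('Day:Tuesday'))                                # Tuesday
```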

Find and replace logic in Python

In Python I need logic for the scenario below; I am using the split function for this.
I have strings like the ones shown below:
"ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988."
"ID909900000 25-01-1986 hello 10 minutes."
The output should replace each date with "date" and each time with "time", as shown below:
"ID674021384 date hello hi thanks time date."
"ID909900000 date hello time."
I also need a count of dates and times for each ID, as shown below:
ID674021384 DATE:2 TIME:1
ID909900000 DATE:1 TIME:1
import re
from collections import defaultdict

lines = ["ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.", "ID909900000 25-01-1986 hello 10 minutes."]
# Named groups let the replacement function know which kind of token matched
pattern = r'(?P<date>\d{1,2}[/-]\d{1,2}[/-]\d{4})|(?P<time>\d+ minutes)'
num_occurences = {line: defaultdict(int) for line in lines}

def repl(matchobj):
    # Count the match under its group name ('date' or 'time') and use that name as the replacement text
    num_occurences[matchobj.string][matchobj.lastgroup] += 1
    return matchobj.lastgroup

for line in lines:
    text_id = line.split(' ')[0]
    new_text = re.sub(pattern, repl, line)
    print(new_text)
    print('{0} DATE:{1[date]} TIME:{1[time]}'.format(text_id, num_occurences[line]))
    print('')

Output:
ID674021384 date heloo hi thanks time and date.
ID674021384 DATE:2 TIME:1

ID909900000 date hello time.
ID909900000 DATE:1 TIME:1
For parsing similar lines of text, like log files, I often use regular expressions with the re module. Although split() would also work well for separating fields that don't contain spaces, and for the parts of the date, regular expressions also let you check that the format matches what you expect and, if need be, warn you about an odd-looking input line.
Using regular expressions, you could get the individual fields of the date and time and construct date or datetime objects from them (both from the datetime module). Once you have those objects, you can compare them to other similar objects and write new entries, formatting the dates as you like. I would recommend parsing the whole input file (assuming you're reading a file) and writing a whole new output file instead of trying to alter it in place.
As for keeping track of the date and time counts, when your input isn't too large, a dictionary is normally the easiest way to do it. When you encounter a line with a certain ID, find the entry for that ID in your dictionary, or add a new one if there isn't one yet. This entry could itself be a dictionary with dates and times as keys and the number of times each was encountered as values.
I hope this answer will guide you on the way to a solution even though it contains no code.
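A sketch of the approach this answer describes, under these assumptions: dates are day-first (25/01/1986), times are always written as "N minutes", and the ID is the first space-separated token. summarize is an illustrative name. A Counter per ID keeps the counts, and the parsed datetime.date objects are kept alongside so they can be compared or reformatted later.

```python
import re
from collections import Counter, defaultdict
from datetime import date

DATE_RE = re.compile(r'(\d{1,2})[/-](\d{1,2})[/-](\d{4})')
TIME_RE = re.compile(r'(\d+) minutes')

def summarize(lines):
    counts = defaultdict(Counter)     # ID -> Counter({'date': n, 'time': m})
    parsed_dates = defaultdict(list)  # ID -> list of datetime.date objects
    for line in lines:
        text_id = line.split(' ', 1)[0]
        for day, month, year in DATE_RE.findall(line):
            # Assumes day-first dates; date() raises ValueError for impossible ones
            parsed_dates[text_id].append(date(int(year), int(month), int(day)))
            counts[text_id]['date'] += 1
        counts[text_id]['time'] += len(TIME_RE.findall(line))
    return counts, parsed_dates

lines = ["ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.",
         "ID909900000 25-01-1986 hello 10 minutes."]
counts, parsed_dates = summarize(lines)
for text_id, c in counts.items():
    print('{} DATE:{} TIME:{}'.format(text_id, c['date'], c['time']))
# ID674021384 DATE:2 TIME:1
# ID909900000 DATE:1 TIME:1
```

Because date() raises ValueError for something like 31/02/2000, catching that exception is one way to get the "warn you about an odd-looking input line" behaviour mentioned above.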
You could use a couple of regular expressions:
import re

txt = 'ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.'
retime = re.compile(r'([0-9]+) *minutes')
redate = re.compile(r'([0-9]+[/-][0-9]+[/-][0-9]{4})')
print('Original string:')
print(txt)
# find all dates in 'txt'
dates = redate.findall(txt)
print('Found dates:', dates)
# find all times in 'txt'
times = retime.findall(txt)
print('Found times:', times)
# replace dates and times in the original string; sub() replaces each whole match,
# so "5 minutes" becomes "time" and stray digits elsewhere (e.g. in the ID) are untouched
newtxt = redate.sub('date', txt)
newtxt = retime.sub('time', newtxt)
print('New string:')
print(newtxt)
print('Dates and times found:')
text_id = txt.split(' ')[0]
print('{} DATE:{} TIME:{}'.format(text_id, len(dates), len(times)))
The output looks like this:
Original string:
ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.
Found dates: ['25/01/1986', '25-01-1988']
Found times: ['5']
New string:
ID674021384 date heloo hi thanks time and date.
Dates and times found:
ID674021384 DATE:2 TIME:1
