Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have multiple strings of the form:
AM-2019-04-22 06-47-57865BCBFB-9414907A-4450BB24
And I need the month from the date part replaced with something else, for example:
AM-2019-07-22 06-47-57865BCBFB-9414907A-4450BB24
How can I achieve this using python and regex?
Also, I have multiple text files that contain a line similar to this:
LocalTime: 21/4/2019 21:48:41
And I need to do the same thing as above (replace the month with something else).
For your first example:
import re
string = 'AM-2019-07-22 06-47-57865BCBFB-9414907A-4450BB24'
# Add 2 to the month and zero-pad it to two digits (no wrap-around past 12)
replace = '{:02d}'.format(int(re.match(r'(AM-\d{4}-)(\d{2})', string).group(2)) + 2)
re.sub(r'(AM-\d{4}-)(\d{2})', r'\g<1>' + replace, string)  # replaces 07 with 09
Output:
Out[10]: 'AM-2019-09-22 06-47-57865BCBFB-9414907A-4450BB24'
For the second one:
string2 = 'LocalTime: 21/4/2019 21:48:41'
replace = str(int(re.match(r'(LocalTime: \d{1,2}/)(\d{1,2})', string2).group(2)) + 2)
re.sub(r'(Time: \d{1,2}/)(\d{1,2})', r'\g<1>' + replace, string2)  # replaces 4 with 6
Output:
Out[14]: 'LocalTime: 21/6/2019 21:48:41'
If you want to limit the months in which this operation is done, you can use an if statement:
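if re.match(r'(AM-\d{4}-)(\d{2})', string).group(2) in ['04', '05', '06']:
    ...  # do the replacement for the first kind of string here
if re.match(r'(LocalTime: \d{1,2}/)(\d{1,2})', string2).group(2) in ['4', '5', '6']:
    ...  # do the replacement for the LocalTime line here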
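The match-then-substitute steps and the month check can also be folded into a single re.sub call by passing a replacement function. This is only a sketch: shift_month, offset and allowed are hypothetical names that do not come from the question, and months are not wrapped past 12.

```python
import re

# Hypothetical helper: shift the month only when it is one of `allowed`.
def shift_month(text, offset=2, allowed=("04", "05", "06")):
    def repl(match):
        prefix, month = match.group(1), match.group(2)
        if month not in allowed:
            return match.group(0)  # leave other months untouched
        # zero-pad so that e.g. 4 + 2 becomes "06"
        return prefix + "{:02d}".format(int(month) + offset)
    # count=1 restricts the substitution to the first date in the string
    return re.sub(r"(AM-\d{4}-)(\d{2})", repl, text, count=1)

print(shift_month("AM-2019-04-22 06-47-57865BCBFB-9414907A-4450BB24"))
print(shift_month("AM-2019-07-22 06-47-57865BCBFB-9414907A-4450BB24"))
```

The first call changes 04 to 06; the second string is returned unchanged because 07 is not in allowed.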
Similar answer but with more code and a lookbehind.
First question:
import re
#This could be any number of strings, or some other iterable
string_list = []
string_list.append("AM-2019-04-22 06-47-57865BCBFB-9414907A-4450BB24")
string_list.append("AM-2019-07-22 06-47-57865BCBFB-9414907A-4450BB24")
#This checks for four digits and a hyphen, then any number of digits to
#replace (which is the month)
pattern = r"(?<=\d{4}-)\d+"
#This should be a string
month = "08"
for string in string_list:
    print("BEFORE: " + string)
    string = re.sub(pattern, month, string)
    print("AFTER: " + string)
Second question:
import re
#Looks behind for a colon, a space, 2 digits and a forward slash, then matches the digits that follow (which is the month).
#A lookbehind must be fixed-width in Python, so this assumes a two-digit day.
pattern = r"(?<=: \d{2}/)\d+"
#IMO it's better just to write to another file. You can edit the current file you're on, but it's cleaner this way and you won't accidentally screw it up if my regex is wrong.
#This should be a string
month = "9"
#The with statement closes both files even if an error occurs
with open("mytextfile.txt", 'r') as in_file, open("myoutputfile.txt", 'w') as out_file:
    for line in in_file:
        changed_line = re.sub(pattern, month, line)
        out_file.write(changed_line)
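A lookbehind in Python's re module has to be fixed-width, so it cannot cover both one-digit and two-digit days. One possible alternative, sketched here with a hypothetical set_month helper, is to capture the prefix in a group and put it back in the replacement:

```python
import re

# Group 1 keeps "LocalTime: " plus a 1- or 2-digit day and the slash;
# the digits after it are the month that gets replaced.
pattern = re.compile(r"(LocalTime: \d{1,2}/)\d{1,2}")

def set_month(line, month):
    # `month` is a string such as "9"; only the first match is replaced.
    # A function is used as the replacement so the kept prefix and the new
    # month are simply concatenated.
    return pattern.sub(lambda m: m.group(1) + month, line, count=1)

print(set_month("LocalTime: 21/4/2019 21:48:41", "9"))
print(set_month("LocalTime: 1/4/2019 21:48:41", "9"))
```

Lines that do not contain a LocalTime entry are returned unchanged, so the function can be applied to every line of a file.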
Hope this helps.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I would like to delete all the rows that do not follow this pattern:
01-12-2002 12:00:00
My column has dtype 'O' (object) and I would like to convert it to datetime, but unfortunately some rows contain text.
My idea was to exclude all the rows that do not follow that pattern (using a regex, I would say \w+-\w+-\w+\s\w+-\w+-\w+), that is, the rows that are not made of digits.
However, the pattern above does not seem to work when applied to the column.
I would appreciate it if you could tell me how to fix the pattern so that I can exclude the rows that do not match that format (or just replace them with null values).
Try .str.match:
import pandas as pd
# sample data
df = pd.DataFrame({'your_column': ['01-12-2002 12:00:00', 'This 01-12-2002 12:00:00',
                                   'Another row', '01-12-2002 12:00:01']})
# different pattern than yours, notice the two `:`
df.loc[df['your_column'].str.match(r'^\w+-\w+-\w+\s\w+:\w+:\w+$')]
Output:
your_column
0 01-12-2002 12:00:00
3 01-12-2002 12:00:01
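Because \w also accepts letters, a value such as ab-cd-efgh ij:kl:mn would pass the pattern above. A stricter check can be sketched with the standard library alone: require fixed digit counts, then let datetime.strptime reject impossible dates. The name parse_or_none is hypothetical, and the month-day-year format string is an assumption, since the question does not say which field is the month.

```python
import re
from datetime import datetime

# Two digits, two digits, four digits, then HH:MM:SS, and nothing else.
PATTERN = re.compile(r"\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}")

def parse_or_none(value, fmt="%m-%d-%Y %H:%M:%S"):
    # fmt is an assumption (month-day-year); swap %m and %d if needed.
    if not PATTERN.fullmatch(value):
        return None
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        # Right shape but not a real date, e.g. month 13
        return None

for value in ["01-12-2002 12:00:00", "Another row", "13-40-2012 11:12:34"]:
    print(repr(value), "->", parse_or_none(value))
```

Rows for which the function returns None are the ones to drop or replace with null values.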
Your mistake was that you used - after the space when you should've used :. Also, you should use \d instead of \w, since \w allows for letters.
import re
teststr = """
01-12-2002 12:00:00
02-27-2012 11:12:34
this is text
08-03-2004 01:13:37
""".strip()
# re.M is multiline flag that lets ^ match start of line and $ match end of line
pattern = re.compile(r"^\d+-\d+-\d+\s\d+:\d+:\d+$", re.M)
# find all the lines that match and join on newline
filtered = "\n".join(pattern.findall(teststr))
print(filtered)
"""
prints:
01-12-2002 12:00:00
02-27-2012 11:12:34
08-03-2004 01:13:37
"""
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I'm a newbie here. I'm working on a program that counts the words that start with a capital letter, per line, in a CSV file using Python. I used a regex, but I don't think it works. Here is the sample code I wrote; unfortunately it doesn't give the output I want. I hope you can help me.
import re
line_details = []
result = []
count = 0
total_lines = 0
class CapitalW(): #F8 Word that starts with capital letter count
fh = open(r'20items.csv', "r", encoding = "ISO-8859-1").read()
#next(fh)
for line in fh.split("n"):
total_lines += 1
for line in re.findall('[A-Z]+[a-z]+$', fh):
count+=1
line_details.append("Line %d has %d Words that start with capital letter" %
(total_lines, count))
for line in line_details:
result7 = line
print (result7)
The result should be as follows:
Line 1 has 2 Words that start with capital letter
Line 2 has 5 Words that start with capital letter
Line 3 has 1 Words that start with capital letter
Line 4 has 10 Words that start with capital letter
In the regex you don't need the $ character, because [A-Z]+[a-z]+$ only matches a word at the very end of the text it is applied to. Use [A-Z]+[a-z]+ instead.
Also, judging from the encoding, your data may contain characters outside a-z, for example é. If so, you have to add them to the pattern as well: [A-ZÉÖ]+[a-zéö]+, plus any other special characters you need.
Assuming a fixed indentation and in addition to matebende's answer, these are the required further corrections:
for line in fh.split("n"): is supposed to be for line in fh.split("\n"):.
The initialization count = 0 has to be inside this for loop.
The fh in for line in re.findall('[A-Z]+[a-z]+$', fh): is wrong and has to be line.
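Putting the corrections from both answers together gives roughly the following sketch. It works on a string instead of 20items.csv so that it runs on its own; count_capitalized and sample are hypothetical names. Note that [A-Z]+[a-z]+ also matches a capital in the middle of a word, such as Phone in iPhone.

```python
import re

CAPITALIZED = re.compile(r"[A-Z]+[a-z]+")  # no trailing $

def count_capitalized(text):
    results = []
    # split on the newline character "\n", not on the letter "n"
    for line_number, line in enumerate(text.split("\n"), start=1):
        # the count is computed per line, and the regex runs on `line`
        count = len(CAPITALIZED.findall(line))
        results.append(
            "Line %d has %d Words that start with capital letter"
            % (line_number, count)
        )
    return results

sample = "Hello World, this is fine\nno capitals here\nOne"
for row in count_capitalized(sample):
    print(row)
```

For a real file, the text could come from open(...).read() with the encoding used in the question.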
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I need Python code that reads lines from a text file and copies the text between specific characters, for example the text between _ and _.
Input
./2425/1/115_Lube_45484.jpg 45484
./2425/1/114_Spencerian_73323.jpg 73323
Output
./2425/1/115_Lube_45484.jpg 45484
Lube
./2425/1/114_Spencerian_73323.jpg 73323
Spencerian
Any suggestions?
Instead of a regex I would use the built-in split():
line = './2425/1/114_Spencerian_73323.jpg 73323'
output = line.split('_')[1]
print(output)
Of course, this only works if every line contains two underscores.
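Try this:
import re
for line in your_text.splitlines():
    # re.search scans the whole line; re.match would only try its start
    result = re.search(r"_(.*?)_", line)
    if result:
        print(result.group(0))  # the match including the underscores
        print(result.group(1))  # only the text between them
Where your_text is a string containing your example as above.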
test = './2425/1/114_Spencerian_73323.jpg_abc_ 73323'
result = test.split("_",1)[1].split("_")[0]
print(result)
.split('_', 1) splits the string into 2 parts, i.e. index 0 is the substring to the left of the first '_' and index 1 is the substring to its right. We then split the right part on '_' again so that the text between the underscores is extracted.
Note: this extracts only the first piece of text between underscores. If a string contains several such pieces, as test does, the later ones are not extracted.
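For comparison, a regex version that also copes with lines that contain no underscores could look like the sketch below. The function name between_underscores is hypothetical; [^_]* keeps the match from running past the second underscore.

```python
import re

# An underscore, any run of non-underscore characters (captured), an underscore
BETWEEN = re.compile(r"_([^_]*)_")

def between_underscores(line):
    match = BETWEEN.search(line)
    # None signals that the line has no text enclosed by two underscores
    return match.group(1) if match else None

for line in ["./2425/1/115_Lube_45484.jpg 45484",
             "./2425/1/114_Spencerian_73323.jpg 73323",
             "a line without underscores"]:
    print(line, between_underscores(line))
```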
Solved.
file_path = "text_file.txt"
with open(file_path) as f:
    line = f.readline()
    count = 1
    while line:
        print(line, line.split('_')[1])
        line = f.readline()
        count += 1
Thank you all
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm trying to understand matching patterns before or after a specific character.
I have a string:
myString = "A string with a - and another -."
QUESTION: What regular expression should be used in the following substitution function that allows me to match anything after the first '-' character such that the following function would print everything before it?
print re.sub(r'TBD', '', myString) # would yield "A string with a "
QUESTION How would it change if I wanted to match everything before the first '-' character?
print re.sub(r'TBD', '', myString) # would yield " and another -."
Thanks, in advance, for any help you can offer.
You may use the following solution for re.sub:
import re
myString = "A string with a - and another -."
print(re.sub(r'-.*',r'',myString))
#A string with a
print(re.sub(r'^[^-]+-',r'',myString))
# and another -.
re.search will get you an answer; you can then trim the delimiter from the match by converting it to a list, slicing it, and joining it again.
import re
m = re.compile(r'(.*?)[-]')
p = m.search('A string with a - and another -.')
print(''.join(list(p.group())[:-1]))
n = re.compile(r'-(.*?)-')
q = n.search('A string with a - and another -.')
print(''.join(list(q.group())[1:]))
Use re.search with lookahead and lookbehind to match the first occurrence:
import re
myString = "A string with a - and another -."
print(re.search(r'.*?(?=-)', myString).group()) # A string with a
print(re.search(r'(?<=-).*', myString).group()) # and another -.
There is a better way if you are sure a regex is not mandatory:
myString = "A string with a - and another -."
splitted = myString.split('-')
print(splitted[0]) # A string with a
print('-'.join(splitted[1:])) # and another -.
str.partition() splits on the first occurrence of the separator. It returns a 3-tuple holding the part before the separator, the separator itself, and the part after it, so everything before and after ends up in separate indexes.
my_string = "A string with a - and another -."
s = my_string.partition('-')
print(s[0]) # A string with a
print(s[-1]) # and another -.
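If the separator might be missing, the middle element of the tuple tells you so: it is an empty string when nothing was found. A small sketch, with split_first as a hypothetical helper name:

```python
def split_first(text, sep="-"):
    before, found, after = text.partition(sep)
    if not found:
        # separator absent: partition returns (text, "", "")
        return text, None
    return before, after

print(split_first("A string with a - and another -."))
print(split_first("no dash here"))
```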
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have a string like Delete File/Folder. I need to split the sentence at the /, which is equivalent to "or".
In the end I need to generate two strings from it: Delete File as one string and Delete Folder as the other.
I've tried a very naive approach where I find the index of / and then build the strings with a bunch of conditions.
It sometimes fails when we have a string like File/Folder Deleted.
Edit:
If you split on /, then for case 1 we have Delete File and Folder. Then I check the number of spaces in the first string and in the second string.
The one with fewer spaces will be replaced with the last element of the first string. This is getting complicated.
In the case of Delete File/Folder, thinking through why the word Delete gets distributed to both of File and Folder might help with the inherent assumptions we all intuitively make when lexical parsing.
For instance, it could be parsed between the i and l to return ["Delete File", "Delete FiFolder"].
It sounds like you want to split the string into words wherever there are spaces, and then split each word on / to generate the new full strings.
>>> import itertools
>>> my_str = "Delete File/Folder"
>>> my_str = ' '.join(my_str.split()).replace('/ ', '/').replace(' /', '/') # First clean string to ensure there aren't spaces around `/`
>>> word_groups = [word.split('/') for word in my_str.split(' ')]
>>> print([' '.join(words) for words in itertools.product(*word_groups)])
['Delete File', 'Delete Folder']
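The same idea can be wrapped in a function and tried on the File/Folder Deleted case from the question. This is a sketch only: expand_alternatives is a hypothetical name, and it assumes there are no spaces around the /.

```python
import itertools

def expand_alternatives(text):
    # Each word becomes a list of its "/"-separated alternatives
    word_groups = [word.split("/") for word in text.split()]
    # product() picks one alternative per word, in every combination
    return [" ".join(words) for words in itertools.product(*word_groups)]

print(expand_alternatives("Delete File/Folder"))
print(expand_alternatives("File/Folder Deleted"))
print(expand_alternatives("Copy/Move File/Folder"))
```

Because every word is expanded independently, a sentence with several slashes yields all combinations.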
Do you want that? Comment if you want a more generalized solution.
lst = your_string.split()[1].split("/")
finalList = []
for i in lst:
    # format() builds the string; append() takes a single argument
    finalList.append("Delete {0}".format(i))
print(finalList)
For string:
Delete File/Folder
Output:
['Delete File', 'Delete Folder']
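sentence = "Do you want to Delete File/Folder?"
words = sentence.split(" ")
count = sentence.count("/")
first = True  # which side of the "/" to print on this pass
for _ in range(2 * count):
    for word in words:
        if "/" in word:
            parts = word.split("/")
            # left alternative on one pass, right alternative on the next
            print(parts[0] if first else parts[1], end=" ")
        else:
            print(word, end=" ")  # end=" " keeps the words on one line
    first = not first
    print()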
output
Do you want to Delete File
Do you want to Delete Folder?
st1 = "Do you want to Delete File/Folder"
st2 = "File/Folder Updated"
def spl(st):
    import re
    li = []
    match = re.search(r'\w+/\w+', st)
    if match:  # guard: re.search returns None when there is no "a/b" word
        ff = match.group()
        t = ff.split('/')
        l = re.split(ff, st)
        for el in t:
            if not l[0]:
                # the "a/b" word starts the sentence: put the alternative first
                li.append(el + ''.join(l))
            else:
                li.append(''.join(l) + el)
    return li
for item in st1, st2:
    print(spl(item))
['Do you want to Delete File', 'Do you want to Delete Folder']
['File Updated', 'Folder Updated']