Regular expression help to find space after a long string - python

My code is as follows:
list = re.findall(r"PROGRAM S\d\d", contents)
If I print the list I only get the match up to S51, but I want everything.
I want to find everything like "PROGRAM S51_Mix_Station". I know how to match the digits, but I don't know how to match everything up to the next space, because there is usually a space after the last character.
Thanks in advance.

You can also use \w+:
import re
s = "PROGRAM S51_Mix_Station"
new_data = re.findall(r'^PROGRAM\s\w+_\w+_\w+', s)
final_data = new_data[0] if new_data else new_data
Output:
'PROGRAM S51_Mix_Station'

OK, thanks. I found another solution:
lista = re.findall(r"PROGRAM S\d\d\S+", contents)
Here \S+ matches one or more non-whitespace characters after the digits.

You could use this:
list = re.findall(r"PROGRAM S\d\d[^ ]*", contents)
This matches PROGRAM S followed by two digits, then any number of non-space characters. If you want the match to stop at any whitespace character (tab, newline and so on), not only at a space, then @Wiktor's suggestion from the comments is better, i.e. use PROGRAM S\d\d\S*.
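To make the difference between the two patterns concrete, here is a small sketch. The contents string is made up for illustration (it is not from the question) and deliberately contains a tab and a newline:

```python
import re

# Hypothetical sample text: a tab follows the first name, a space the second.
contents = "PROGRAM S51_Mix_Station\tEND\nPROGRAM S52_Pump next"

# [^ ]* stops only at a literal space, so it runs through the tab and newline.
space_only = re.findall(r"PROGRAM S\d\d[^ ]*", contents)

# \S* stops at any whitespace character.
any_whitespace = re.findall(r"PROGRAM S\d\d\S*", contents)

print(space_only)
print(any_whitespace)
```

If the file only ever has plain spaces after the program name, both patterns should give the same result.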

Related

How to grab a specified number of characters after a part of a specified string?

Let's say I have a string defined like this:
string1 = '23h4b245hjrandomstring345jk3n45jkotherrandomstring'
The goal is to grab the 11 characters (these for example '345jk3n45jk') after a part of the string (this part for example 'randomstring') using a specified search term and the specified number of characters to grab after that search term.
I tried doing something like this:
string2 = substring(string1,'randomstring', 11)
I appreciate any help you guys have to offer!
string2 = string1[string1.find("randomstring")+len("randomstring"):string1.find("randomstring")+len("randomstring")+11]
In one line, using split. This takes the text that follows the first occurrence of randomstring (in your example it appears a second time inside otherrandomstring, which does not affect the result):
string1 = '23h4b245hjrandomstring345jk3n45jkotherrandomstring'
randomstring = 'randomstring'
nb_char_to_take = 11
# Split on randomstring, take the part after the first occurrence (index 1 of the resulting list), then keep its first 11 characters
result = string1.split(randomstring)[1][:nb_char_to_take]
You can use a simple regular expression like this
import re
s = "23h4b245hjrandomstring345jk3n45jkotherrandomstring"
result = re.findall("randomstring(.{11})", s)[0]
string1 = '23h4b245hjrandomstring345jk3n45jkotherrandomstring'
string2 = string1[10:22]
print(string2)
randomstring
You could use that. It's called string slicing: you count the positions of the characters, and the first number (before the colon) is the starting point while the second is the end point, which is not included. With those two positions you get whatever lies between them. A third number, if given, serves a different purpose. I suggest looking up tutorials on string slicing and on the string find method; together they should give you the idea behind this.
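Putting find and slicing together, something along these lines should do what the question asks. This is only a sketch; substring_after is a hypothetical helper name taken from the spirit of the question's substring call, not a built-in:

```python
def substring_after(text, term, count):
    """Return up to `count` characters following the first occurrence of `term`."""
    start = text.find(term)
    if start == -1:
        # Search term not present: return an empty string (an assumed convention).
        return ""
    begin = start + len(term)
    return text[begin:begin + count]


string1 = '23h4b245hjrandomstring345jk3n45jkotherrandomstring'
string2 = substring_after(string1, 'randomstring', 11)
print(string2)
```

Checking for -1 matters because find returns -1 when the term is missing, and slicing from there would silently return the wrong characters.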

How to find part of string?

I am working with a string. I could find the part of string I need but not all of it. Which part of my code needs to change?
s = "3D(filters:!!(),refreshInterval:(pause:!!t,value:0),time:(from:!%272019-10-01T20:28:50.088Z!%27,to:now))%26_a%3D(description:!%27!%27,filters:!!(),fullScreenMode:!!"
report_time = s[s.find("time:(") + 1:s.find("))")]
Output I need:
>>> report_time
'time:(from:!%272019-10-01T20:28:50.088Z!%27,to:now))'
Output I have:
>>> report_time
'ime:(from:!%272019-10-01T20:28:50.088Z!%27,to:now)'
You put the "+1" on the wrong index. Start the slice at the first find location, and extend the end two characters past the second find location so that both closing parentheses are included (thanks to @smac89 for catching that).
report_time = s[s.find("time:("):s.find("))") + 2]
Output:
'time:(from:!%272019-10-01T20:28:50.088Z!%27,to:now))'
Alternatively use a regular expression, e.g:
import re
re.search(r'(time:\(.*\)\))', s).group(1)
Explanation: group(1) returns the content matched by the first set of parentheses, and .* matches any characters in between. The literal parentheses in your search term need to be escaped.
Output:
'time:(from:!%272019-10-01T20:28:50.088Z!%27,to:now))'
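For completeness, here is a slightly more defensive sketch of both approaches. It assumes you want None when the markers are missing, and it uses a non-greedy .*? so the regex stops at the first "))" after time:( rather than the last one in the string:

```python
import re

s = ("3D(filters:!!(),refreshInterval:(pause:!!t,value:0),"
     "time:(from:!%272019-10-01T20:28:50.088Z!%27,to:now))"
     "%26_a%3D(description:!%27!%27,filters:!!(),fullScreenMode:!!")

# Slicing: look for the closing "))" only after the start marker.
start = s.find("time:(")
end = s.find("))", start) if start != -1 else -1
report_time = s[start:end + 2] if end != -1 else None

# Regex: non-greedy match up to the first "))".
match = re.search(r"time:\(.*?\)\)", s)
report_time_re = match.group(0) if match else None

print(report_time)
print(report_time_re)
```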

RegEx : Match all lines except for a specific sub-string

Below is the list :
cf-ab1
cf-bc2
cf-ab1-hotfix
cf-bc2-hotfix
cf-ab1-canary
cf-cd1-staging
cf-cd1-staging2
cf-cd1
cf-cd1-sic-staging
cf-cd1-sagdf-staging
I would like to match everything except for cf-cd1-staging, cf-cd1-staging2 and cf-ab1-canary
I am running the below regex :
^((?!canary|staging).)*$
But this matches only the lines that contain neither staging nor canary, which is not my desired output.
Could you please help? My desired matches are:
cf-ab1
cf-bc2
cf-ab1-hotfix
cf-bc2-hotfix
cf-cd1
cf-cd1-sic-staging
cf-cd1-sagdf-staging
Regards,
Rohith
Try this:
import re
lines = ["cf-ab1", "cf-bc2", "cf-ab1-hotfix", "cf-bc2-hotfix", "cf-ab1-canary",
         "cf-cd1-staging", "cf-cd1-staging2", "cf-cd1", "cf-cd1-sic-staging",
         "cf-cd1-sagdf-staging"]
# Reject any line that contains one of the unwanted substrings
line_compile = re.compile('^(?!.*(ab1-canary|cd1-staging|cf-ab1-canary)).*$')
matched = []
for line in lines:
    if line_compile.match(line):
        matched.append(line)
As always with RegEx, there are many possible solutions. I came up with one on the fly, but you could argue that it is overfitted to this dataset and not very general.
^cf-\w\w\d(-[hs][oia][tcg].+?)?$
I simply wrote all the "allowed" letters in square brackets until the undesired matches weren't possible anymore. Also, I put the second half in ()? so that the two short entries are also matched.
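Another option, if the three unwanted names are known exactly, is to exclude whole lines rather than substrings. This is just a sketch under that assumption; the anchored negative lookahead only rejects a line when the entire line equals one of the listed names:

```python
import re

lines = ["cf-ab1", "cf-bc2", "cf-ab1-hotfix", "cf-bc2-hotfix", "cf-ab1-canary",
         "cf-cd1-staging", "cf-cd1-staging2", "cf-cd1", "cf-cd1-sic-staging",
         "cf-cd1-sagdf-staging"]

# The $ inside the lookahead makes the exclusion apply to the full line only,
# so cf-cd1-sic-staging is still allowed.
exact_exclude = re.compile(r"^(?!(?:cf-cd1-staging2?|cf-ab1-canary)$).*$")

matched = [line for line in lines if exact_exclude.match(line)]
print(matched)
```

In plain Python a set of excluded names and a membership test would be simpler still; the regex form is mainly useful when the tool only accepts a pattern.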

How do I extract some string from a long string in Python?

I have a lot of long strings - not all of them have the same length and content, so that's why I can't use indices - and I want to extract a string from all of them. This is what I want to extract:
http://www.someDomainName.com/anyNumber
SomeDomainName doesn't contain any numbers, and anyNumber is different in each long string. The code should extract the desired string from any input and should cope with spaces and anything else unexpected that might appear in the long string. That should be possible with regex, right? Could anybody help me with this? Thank you.
Update: I should have said that www. and .com are always the same. Also someDomainName! But there's another http://www. in the string
import re
results = re.findall(r'\bhttp://www\.someDomainName\.com/\d+\b', long_string)
>>> import re
>>> pattern = re.compile("(http://www\\.)(\\w*)(\\.com/)(\\d+)")
>>> matches = pattern.search("http://www.someDomainName.com/2134")
>>> if matches:
...     print(matches.group(0))
...     print(matches.group(1))
...     print(matches.group(2))
...     print(matches.group(3))
...     print(matches.group(4))
...
http://www.someDomainName.com/2134
http://www.
someDomainName
.com/
2134
In the above pattern, we have captured 5 groups:
Group 0 is the complete string that is matched.
The rest follow the order of the brackets in the pattern, so the one you are looking for is the second, (\\w*).
If you want, you can capture only the part of the string you are interested in: remove the brackets from the rest of the pattern and keep just (\\w*).
>>> pattern = re.compile("http://www\\.(\\w*)\\.com/\\d+")
>>> matches = pattern.search("http://www.someDomainName.com/2134")
>>> if matches:
...     print(matches.group(1))
...
someDomainName
In this second example there are no groups 2, 3 and 4, because only one group is captured. Group 0 is always available: it is the complete string that matches.
Yeah, your simplest bet is regex. Here's something that will probably get the job done:
import re
# Dots are escaped so they match a literal "." rather than any character
matcher = re.compile(r'www\.(.+)\.com/(.+)')
matches = matcher.search(yourstring)
if matches:
    str1, str2 = matches.groups()
If you are sure that there are no dots in SomeDomainName, you can just find the first occurrence of the string ".com/" and take everything from that index on.
This avoids regular expressions, which are harder to maintain.
exp = 'http://www.aejlidjaelidjl.com/alieilael'
print(exp[exp.find('.com/')+5:])
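Given the update in the question (the domain is fixed and another http://www. link may appear in the same string), a sketch along these lines might help. find_links and the sample long_string are made up for illustration; re.escape is used so that any special characters in the domain are treated literally:

```python
import re


def find_links(long_string, domain="someDomainName"):
    """Return every http://www.<domain>.com/<digits> link found in the text."""
    pattern = re.compile(r"http://www\." + re.escape(domain) + r"\.com/\d+")
    return pattern.findall(long_string)


# Hypothetical input containing an unrelated http://www. link as well.
long_string = (
    "noise http://www.otherSite.org/page more noise "
    "http://www.someDomainName.com/2134 trailing text"
)
links = find_links(long_string)
print(links)
```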

Parsing and reformatting CSV/text data using Python

Sorry if this is a bit of a beginner's question, but I haven't had much experience with Python and could really use some help figuring this out. If there is a better programming language for tackling this, I'd be more than open to hearing it.
I'm working on a small project, and I have two blocks of data, formatted differently from each other. They're all spreadsheets saved as CSV files, and I'd really like to make one group match the other without having to manually edit all the data.
What I need to do is go through a CSV, and format any data saved like this:
10W
20E
15-16N
17-18S
To a format like this (respective line to respective format):
10,W
20,E
,,15,16,N
,,17,18,S
So that they can just be copied over when opened as spreadsheets
I'm able to get the files into a string in python, but I'm unsure of how to properly write something to search for a number-hyphen-number-letter format.
I'd be immensely grateful for any help I can get. Thanks
This sounds like a good use-case for regular expressions. Once you've split the lines up into individual strings and stripped the whitespace (using s.strip()) these should work (I'm assuming those are cardinal directions; you'll need to change [NESW] to something else if that assumption is incorrect.):
>>> import re
>>> re.findall(r'\A(\d+)([NESW])', '16N')
[('16', 'N')]
>>> re.findall(r'\A(\d+)([NESW])', '15-16N')
[]
>>> re.findall(r'\A(\d+)-(\d+)([NESW])', '15-16N')
[('15', '16', 'N')]
>>> re.findall(r'\A(\d+)-(\d+)([NESW])', '16N')
[]
The first regex '\A(\d+)([NESW])' matches only a string that begins with a sequence of digits followed by a capital letter N, E, S, or W. The second matches only a string that begins with a sequence of digits followed by a hyphen, followed by another sequence of digits, followed by a capital letter N, E, S, or W. Forcing it to match at the beginning ensures that these regexes don't match a suffix of a longer string.
Then you can do something like this:
>>> vals = re.findall(r'\A(\d+)([NESW])', '16N')[0]
>>> ','.join(vals)
'16,N'
>>> vals = re.findall(r'(\d+)-(\d+)([NESW])', '15-16N')[0]
>>> ',,' + ','.join(vals)
',,15,16,N'
This is a complete solution that uses regexes. @senderle has beaten me to the answer, so feel free to accept his response. I'm adding this because I know how difficult it was to wrap my head around re in my own code at first.
import re
# Two digits, a hyphen, two digits, then a direction letter
dash = re.compile(r'(\d{2})-(\d{2})([WENS])')
# Two digits followed directly by a direction letter
no_dash = re.compile(r'(\d{2})([WENS])')
raw = '''10W
20E
15-16N
17-18S'''
lines = raw.split('\n')
data = []
for l in lines:
    if '-' in l:
        match = dash.search(l).groups()
        data.append(',,%s,%s,%s' % (match[0], match[1], match[2]))
    else:
        match = no_dash.search(l).groups()
        data.append('%s,%s' % (match[0], match[1]))
print('\n'.join(data))
In your case, I think the quick solution would involve regexps
You can either use the match method to extract your different tokens when they match a given regular expression, or the split method to split your string into tokens given a separator.
However, in your case, the separator would be a single character, so you can use the split method from the str class.
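The two patterns from the answers above can also be folded into one with an optional group. This is a sketch rather than a drop-in solution: reformat_value is a hypothetical name, it assumes the letters are always N, E, S or W, and it leaves any value it does not recognise unchanged:

```python
import re

# Digits, optionally "-digits", then a direction letter; nothing else allowed.
PATTERN = re.compile(r"\A(\d+)(?:-(\d+))?([NESW])\Z")


def reformat_value(value):
    """Turn '10W' into '10,W' and '15-16N' into ',,15,16,N'."""
    m = PATTERN.match(value.strip())
    if m is None:
        return value  # not in either format: pass through untouched
    first, second, direction = m.groups()
    if second is None:
        return f"{first},{direction}"
    return f",,{first},{second},{direction}"


lines = ["10W", "20E", "15-16N", "17-18S"]
converted = [reformat_value(x) for x in lines]
print("\n".join(converted))
```

For real files it is probably worth reading and writing through the csv module rather than splitting the text by hand, so that quoted fields are handled properly.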
