How to split string in Python to take only middle characters? - python

I have string
['tick_calculated_2_2020-05-27T11-59-06.json.gz']
I want to get only 59-06
>>> f.split('_')
['tick', 'calculated', '2', '2020-05-27T11-59-06.json.gz']
>>> f.split('_')[3]
'2020-05-27T11-59-06.json.gz'
>>> f.split('_')[3].split('.')[0]
'2020-05-27T11-59-06'
What should be the next step?

You are going in the right direction.
Contrary to other answers, I feel regex is a bit of an overkill, apart from being slower and harder to understand and maintain.
Once you have the string x = '2020-05-27T11-59-06', you can do x.split('-') to get a list lst = ['2020', '05', '27T11', '59', '06'].
You can then access the last 2 elements of this list to get what you want easily: lst[-1], lst[-2].

You could try using re (regex).
import re
f = "tick_calculated_2_2020-05-27T11-59-06.json.gz"
res = re.search(r"T\d+\-([\d\-]+)\.json\.gz", f)
print(res.groups()[0])
output:
59-06

Assuming you don't know about using regular expressions, try to Google python string slicing. You had the right idea to split by '_', continue it to split by '.' then slice the string thus acquired for last 5 chars
f = 'tick_calculated_2_2020-05-27T11-59-06.json.gz'
splitted = f.split('_')
print(splitted)
date = splitted[3].split('.')[0]
specialNum = date[-5:]
print(specialNum)

You can use str.rfind this way:
index = s.rfind('-')
s[index - 2:index + 3]
Or use a regexp this way:
import re
re.search(r'.{5}(?=\.json)', s).group()

This uses Positive lookahead and Positive lookbehind to assert that match occurs accurately.
import re
string = 'tick_calculated_2_2020-05-27T11-59-06.json.gz'
re.search(r'(?<=T\d{2}-)\d{2}-\d{2}(?=\.json)', string).group()
Output:
59-06

Related

Cutting string in python

I did some search but couldn't find any useful information.
s = ['33PM']
My aim is to cut 'PM' from s[0] and append it as s[1].
You can use re.findall to extract continuous range of numbers and characters. \d+ would extract all numbers and \w+ would extract all character ranges
>>> import re
>>> s = re.findall(r'\d+|\w+', s[0])
>>> s
['33', 'PM']
Here is a method that uses simple Python code, avoiding the complications of regular expressions. This is designed for when you know that 'PM' is in the string, and if there is any text in the string after that it will be moved to the second list item together with the 'PM. This code also assumes that you care only about the first item in the list--any later items will be dropped.
s = ['33PM']
string0 = s[0]
loc = string0.find('PM')
s = [string0[:loc], string0[loc:]]
If you now print s the result is
['33', 'PM']

Best way to convert string to integer in Python

I have a spreadsheet with text values like A067,A002,A104. What is most efficient way to do this? Right now I am doing the following:
str = 'A067'
str = str.replace('A','')
n = int(str)
print n
Depending on your data, the following might be suitable:
import string
print int('A067'.strip(string.ascii_letters))
Python's strip() command takes a list of characters to be removed from the start and end of a string. By passing string.ascii_letters, it removes any preceding and trailing letters from the string.
If the only non-number part of the input will be the first letter, the fastest way will probably be to slice the string:
s = 'A067'
n = int(s[1:])
print n
If you believe that you will find more than one number per string though, the above regex answers will most likely be easier to work with.
You could use regular expressions to find numbers.
import re
s = 'A067'
s = re.findall(r'\d+', s) # This will find all numbers in the string
n = int(s[0]) # This will get the first number. Note: If no numbers will throw exception. A simple check can avoid this
print n
Here's some example output of findall with different strings
>>> a = re.findall(r'\d+', 'A067')
>>> a
['067']
>>> a = re.findall(r'\d+', 'A067 B67')
>>> a
['067', '67']
You can use the replace method of regex from re module.
import re
regex = re.compile("(?P<numbers>.*?\d+")
matcher = regex.search(line)
if matcher:
numbers = int(matcher.groupdict()["numbers"] #this will give you the numbers from the captured group
import string
str = 'A067'
print (int(str.strip(string.ascii_letters)))

Using parentheses as delimiter in re or str.split() python

I am trying to split a string such as: add(ten)sub(one) into add(ten) sub(one).
I can't figure out how to match the close parentheses. I have used re.sub(r'\\)', '\\) ') and every variation of escaping the parentheses,I can think of. It is hard to tell in this font but I am trying to add a space between these commands so I can split it into a list later.
There's no need to escape ) in the replacement string, ) has a special a special meaning only in the regex pattern so it needs to be escaped there in order to match it in the string, but in normal string it can be used as is.
>>> strs = "add(ten)sub(one)"
>>> re.sub(r'\)(?=\S)',r') ', strs)
'add(ten) sub(one)'
As #StevenRumbalski pointed out in comments the above operation can be simply done using str.replace and str.rstrip:
>>> strs.replace(')',') ').strip()
'add(ten) sub(one)'
d = ')'
my_str = 'add(ten)sub(one)'
result = [t+d for t in my_str.split(d) if len(t) > 0]
result = ['add(ten)','sub(one)']
Create a list of all substrings
import re
a = 'add(ten)sub(one)'
print [ b for b in re.findall('(.+?\(.+?\))', a) ]
Output:
['add(ten)', 'sub(one)']

Wilcard matching substring in Python

I am completely new to Python and don't know how to get a sub-string which matches some wildcard condition from a string.
I am trying to get a timestamp from the following string:
sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data
I want to get only "1360922654.97671" part out of the string.
Please help.
Because you mentioned wildcards you can use re
In [77]: import re
In [78]: s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
In [79]: re.findall("\d+\.\d+", s)
Out[79]: ['1360922654.97671']
If the dots and dashes have their specific function within your string, you can use this:
>>> s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
>>> s.rsplit('.', 1)[0].split('-')[-1]
'1360922654.97671'
Step by step:
>>> s.rsplit('.', 1)
['sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671', 'data']
>>> s.rsplit('.', 1)[0]
'sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671'
>>> s.rsplit('.', 1)[0].split('-')
['sdc4', '251504', '7f5', 'f59c349f0e516894fc89d2686a0d57f5', '1360922654.97671']
>>> s.rsplit('.', 1)[0].split('-')[-1]
'1360922654.97671'
This will work for any strings in the form:
anything-WHATYOUWANT.stringwithoutdots
>>> s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
>>> s.split('-')[-1][:-5]
'1360922654.97671'
slightly fewer characters, only works where the last part of the string is .data or another 5 character string.

Python regular words cut

I have string: './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
I need string: '27-10-2011 17:07:02'
How can i do this in python?
There are many ways to do this, one way is to use str.partition:
text='./money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
before,_,after = text.partition('[')
print(after[:-1])
# 27-10-2011 17:07:02
Another is to use str.split:
before,after = text.split('[',1)
print(after[:-1])
# 27-10-2011 17:07:02
or str.find and str.rfind:
ind1 = text.find('[')+1
ind2 = text.rfind(']')
print(text[ind1:ind2])
All these methods rely on the desired substring immediately following the first left-bracket [.
The first two methods also rely on the desired substring ending at the next-to-last character in text. The last method (using rfind) searches from the right for the index of the right-bracket, so it is a little more general, and does not depend on quite so many (potential off-by-one) constants.
If your string has always the same structure this is probably the simplest solution:
s = r'./money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
s[s.find("[")+1:s.find("]")]
Update:
After seeing some of the other answers this is a slight improvement:
s[s.find("[")+1:-1]
Exploiting the fact that the closing square bracket is the last character in your string.
If the format is "fixed", you can also use this
>>> s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
>>> s[-20:-1:]
'27-10-2011 17:07:02'
>>>
You can also use regular expression:
import re
s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
print re.search(r'\[(.*?)\]', s).group(1)
Try with a regex :
import re
re.findall(".*\[(.*)\]", './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]')
>>> ['27-10-2011 17:07:02']
Probably the easiest way(if you know the string will always be in this format
>>> s = './money.log_rotated.27.10.2011_17:15:01:[27-10-2011 17:07:02]'
>>> s[s.index('[') + 1:-1]
'27-10-2011 17:07:02'

Categories

Resources