python: how can I do this with a regular expression?

If I input "1995y 05m 05d", I want the program to print "950505". Another example: "1949y 05m 23d" --> "490523".
import re
Birthday = input("Enter your birthday (e.g. 1995y 05m 05d): ")
p = re.sub('[ymd ]', '', Birthday)  # remove the y, m, d markers and the spaces
print(p)  # for "1995y 05m 05d" this prints "19950505"
Here is my code. How do I fix it?

Since you're basically working with date strings, you can use datetime.strptime() to parse them:
>>> from datetime import datetime
>>> birthday = '1995y 05m 05d'
>>> datetime.strptime(birthday, '%Yy %mm %dd').strftime('%y%m%d')
'950505'
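A minimal sketch that wraps the strptime approach in a function, assuming the input always has the "YYYYy MMm DDd" layout from the question. The name to_yymmdd is hypothetical. A side benefit of strptime is that it raises ValueError for malformed or impossible dates such as month 13, and returning a string keeps the leading zero of years like 2005.

```python
from datetime import datetime


def to_yymmdd(text):
    """Hypothetical helper: '1995y 05m 05d' -> '950505'."""
    # strptime raises ValueError if the text does not match the format
    # or names an impossible date, such as month 13.
    parsed = datetime.strptime(text.strip(), '%Yy %mm %dd')
    # %y keeps only the last two digits of the year, zero-padded.
    return parsed.strftime('%y%m%d')


print(to_yymmdd('1995y 05m 05d'))  # 950505
print(to_yymmdd('1949y 05m 23d'))  # 490523
```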

Your existing code prints the full four-digit year, whereas you want only the last two digits. Just skip the first two characters when printing:
print(p[2:])
That prints p from index 2 (the third character, since string indices start at 0) to the end, i.e. the entire string except the first two characters (19 in your example).

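Using a regular expression:
>>> import re
>>> a = re.findall(r"\d+", "1995y 05m 05d")
>>> a[0] = a[0][2:]  # keep only the last two digits of the year
>>> output = ""
>>> for item in a:
...     output += item
...
>>> output
'950505'
Keep the result as a string: int(output) would drop the leading zero for a year such as 2005.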

Requested regex solution:
import re
s = '1995y 05m 05d'
print(''.join(re.findall(r'\d{2}(?=[ymd])', s)))
# 950505
This uses findall to find every pair of digits immediately followed by y, m or d, and join to concatenate them into the required format.
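The lookahead pattern does not check the overall format: for '1995y 5m 05d' it silently returns '9505'. If you also want to reject malformed input, a sketch with re.fullmatch and named groups could look like this. The exact pattern (single spaces, two-digit month and day) and the name birthday_digits are my own assumptions about the expected input.

```python
import re

# Assumed format: four-digit year, two-digit month and day, each followed
# by its letter and separated by single spaces. Only the groups are kept.
BIRTHDAY = re.compile(r'\d{2}(?P<yy>\d{2})y (?P<mm>\d{2})m (?P<dd>\d{2})d')


def birthday_digits(text):
    """Hypothetical helper: '1995y 05m 05d' -> '950505'."""
    match = BIRTHDAY.fullmatch(text.strip())  # the whole string must match
    if match is None:
        raise ValueError(f'unexpected format: {text!r}')
    return match.group('yy') + match.group('mm') + match.group('dd')


print(birthday_digits('1949y 05m 23d'))  # 490523
```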

birthday = input("birthday: ").split()
a = ''.join([''.join([i for i in c if i.isdigit()][-2:]) for c in birthday])
print(a)
This does it without any libraries: it splits the input on whitespace and keeps the last two digits of each part.

Related

How to determine if first two elements of a value are integers or characters?

My task is to save the day number in a variable.
For example:
today_str = '10-2-2018'
date_numb = 10
or:
today_str = '3/5/2018'
date_numb = 3
In short, I want to check whether the day has one digit or two so I can save the day number in a variable.
Pure Python
You could use next() on a generator expression to get the index of the first non-digit character, then slice up to that index:
>>> date_str = '10-2-2018'
>>> date_str[:next(i for i, c in enumerate(date_str) if not c.isdigit())]
'10'
>>> date_str = '3/5/2018'
>>> date_str[:next(i for i, c in enumerate(date_str) if not c.isdigit())]
'3'
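One caveat: if the string consists only of digits, the generator is exhausted and next() raises StopIteration. Passing a default as the second argument avoids that. A small sketch; the name leading_digits is hypothetical:

```python
def leading_digits(text):
    # Index of the first non-digit character, or len(text) if every
    # character is a digit (next() returns the default instead of
    # raising StopIteration).
    end = next((i for i, c in enumerate(text) if not c.isdigit()), len(text))
    return text[:end]


print(leading_digits('10-2-2018'))  # 10
print(leading_digits('3/5/2018'))   # 3
print(leading_digits('25'))         # 25
```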
Regex
You can also use a regular expression:
>>> import re
>>> date_str = '10-2-2018'
>>> re.search('^[0-9]+', date_str).group(0)
'10'
>>> date_str = '3/5/2018'
>>> re.search('^[0-9]+', date_str).group(0)
'3'
I've also put this on regex101 where you can see an explanation of how this works.
today_str = '10-2-2018'
date_num = ''
if '-' in today_str:
    date_num = today_str.split('-')[0]  # the day is the first field
elif '/' in today_str:
    date_num = today_str.split('/')[0]
else:
    raise ValueError('Wrong date format')
length_date = len(date_num)  # 1 or 2 digits
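The two branches of the split approach can be collapsed into one call with re.split, which accepts a character class of separators. A sketch, assuming - and / are the only separators you expect; the name day_part is hypothetical:

```python
import re


def day_part(today_str):
    # Split on either '-' or '/'; a string with neither separator
    # comes back as a single-element list.
    parts = re.split(r'[-/]', today_str)
    if len(parts) < 2:
        raise ValueError('Wrong date format')
    return parts[0]


date_num = day_part('10-2-2018')
length_date = len(date_num)
print(date_num, length_date)  # 10 2
```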
You could use a regex:
"^[0-3]?[0-9]"
^ matches the beginning of the string
[0-3]? matches a digit between 0 and 3, zero or one time (because of ?)
[0-9] matches a digit between 0 and 9
This regex finds your day, but it only loosely checks it: it still accepts values such as 00 or 39.
Using the regex with Python:
import re
today_str = "10-2-2018"
date_numb = re.match(r"^[0-3]?[0-9]", today_str).group(0)
today_str = "3/5/2018"
date_numb = re.match(r"^[0-3]?[0-9]", today_str).group(0)
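For a stricter check, here is a sketch that only accepts days 1 to 31 (optionally zero-padded) and requires a - or / right after the day. With the loose pattern, '45-2-2018' still yields '4'; this one returns None. It does not know month lengths, so 31-2-2018 still passes. The name day_number is hypothetical.

```python
import re

# Day 1-31 (optionally zero-padded), followed by a '-' or '/' separator.
DAY = re.compile(r'(0?[1-9]|[12][0-9]|3[01])(?=[-/])')


def day_number(today_str):
    match = DAY.match(today_str)  # re.match anchors at the start
    return int(match.group(1)) if match else None


print(day_number('10-2-2018'))  # 10
print(day_number('3/5/2018'))   # 3
print(day_number('45-2-2018'))  # None
```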
Treating the date as a plain string is a bad idea. You can use re, isdigit, split or other generic functions, but for your use case there are ready-made tools that give your data structure instead of treating the date as an alphanumeric string. Here is an example with the pandas library:
import pandas as pd
today_str = '10-2-2018'
pd.to_datetime(today_str, dayfirst=True).day
# 10
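If you would rather not depend on pandas, the standard library's datetime.strptime can do the same for known layouts. A sketch, assuming the two day-first formats from the question are the only ones you need to accept; FORMATS and day_of are hypothetical names:

```python
from datetime import datetime

# Day-first layouts assumed from the question's examples.
FORMATS = ('%d-%m-%Y', '%d/%m/%Y')


def day_of(today_str):
    for fmt in FORMATS:
        try:
            return datetime.strptime(today_str, fmt).day
        except ValueError:
            continue  # try the next layout
    raise ValueError(f'unrecognised date: {today_str!r}')


print(day_of('10-2-2018'))  # 10
print(day_of('3/5/2018'))   # 3
```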

Best way to convert string to integer in Python

I have a spreadsheet with text values like A067, A002, A104, and I want to convert them to integers. What is the most efficient way to do this? Right now I am doing the following:
s = 'A067'
s = s.replace('A', '')
n = int(s)
print(n)
Depending on your data, the following might be suitable:
import string
print(int('A067'.strip(string.ascii_letters)))
str.strip() takes a string of characters to remove from the start and end of a string. By passing string.ascii_letters, it removes any leading and trailing letters.
If the only non-number part of the input will be the first letter, the fastest way will probably be to slice the string:
s = 'A067'
n = int(s[1:])
print(n)
If you expect more than one number per string, though, the regex answers will most likely be easier to work with.
You could use regular expressions to find numbers.
import re
s = 'A067'
matches = re.findall(r'\d+', s)  # find all runs of digits in the string
n = int(matches[0])  # the first number; raises IndexError if there are none, so check `if matches:` first
print(n)
Here's some example output of findall with different strings:
>>> a = re.findall(r'\d+', 'A067')
>>> a
['067']
>>> a = re.findall(r'\d+', 'A067 B67')
>>> a
['067', '67']
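For a whole column of codes, the slicing and regex answers can be applied in a list comprehension. A sketch, assuming the spreadsheet values have already been read into a list of strings (the list here just holds the three examples from the question):

```python
import re

codes = ['A067', 'A002', 'A104']  # example values from the question

# Slicing: assumes exactly one leading letter.
by_slice = [int(code[1:]) for code in codes]

# Regex: takes the first run of digits wherever it appears.
by_regex = [int(re.search(r'\d+', code).group()) for code in codes]

print(by_slice)  # [67, 2, 104]
print(by_regex)  # [67, 2, 104]
```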
You can use a named capturing group with the search method of a compiled pattern from the re module.
import re
regex = re.compile(r"(?P<numbers>\d+)")
line = 'A067'
matcher = regex.search(line)
if matcher:
    numbers = int(matcher.groupdict()["numbers"])  # the digits from the captured group
import string
s = 'A067'
print(int(s.strip(string.ascii_letters)))

Wildcard matching substring in Python

I am completely new to Python and don't know how to get a sub-string which matches some wildcard condition from a string.
I am trying to get a timestamp from the following string:
sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data
I want to get only "1360922654.97671" part out of the string.
Please help.
Because you mentioned wildcards, you can use re:
In [77]: import re
In [78]: s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
In [79]: re.findall(r"\d+\.\d+", s)
Out[79]: ['1360922654.97671']
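\d+\.\d+ would also pick up any other dotted number that happened to appear earlier in the name. If the timestamp is always the last dash-separated field before a single extension, anchoring the pattern to the end of the string is a little safer. A sketch under that assumption:

```python
import re

s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"

# '-' + digits '.' digits, then a final extension without dots, up to the end.
match = re.search(r'-(\d+\.\d+)\.[^.]+$', s)
if match:
    print(match.group(1))  # 1360922654.97671
```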
If the dots and dashes have a fixed role in your string (dashes separate the fields and the last dot starts the extension), you can use this:
>>> s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
>>> s.rsplit('.', 1)[0].split('-')[-1]
'1360922654.97671'
Step by step:
>>> s.rsplit('.', 1)
['sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671', 'data']
>>> s.rsplit('.', 1)[0]
'sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671'
>>> s.rsplit('.', 1)[0].split('-')
['sdc4', '251504', '7f5', 'f59c349f0e516894fc89d2686a0d57f5', '1360922654.97671']
>>> s.rsplit('.', 1)[0].split('-')[-1]
'1360922654.97671'
This will work for any strings in the form:
anything-WHATYOUWANT.stringwithoutdots
>>> s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
>>> s.split('-')[-1][:-5]
'1360922654.97671'
This is slightly shorter, but it only works when the string ends in .data or another five-character suffix.

Convert string into integer

How can I convert a string into an integer, removing every non-digit character along the way?
Example:
S = "--r10-", and I want to get S = 10.
This does not work:
S = "--10-"
int(S)
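You can use filter(str.isdigit, s) to keep only those characters of s that are digits. In Python 3, filter() returns an iterator, so join the characters back into a string before calling int():
>>> s = "--10-"
>>> int(''.join(filter(str.isdigit, s)))
10
Note that this might lead to unexpected results for strings that contain multiple numbers:
>>> int(''.join(filter(str.isdigit, "12 abc 34")))
1234
or negative numbers:
>>> int(''.join(filter(str.isdigit, "-10")))
10
Edit: In Python 2, to make this work for unicode objects instead of str objects, use
int(filter(unicode.isdigit, u"--10-"))
In Python 3 every str is Unicode, so str.isdigit already covers this case.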
Remove all non-digits first, like this:
int(''.join(c for c in "abc123def456" if c.isdigit()))
You could just strip off - and r:
int("--r10-".strip('-r'))
Use a regex substitution with \D to replace every non-digit character with an empty string, then cast the result to int.
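A sketch of the substitution approach with re.sub. It uses \D (any non-digit) rather than \W: the letter r is a word character, so a \W replacement leaves 'r10', which int() rejects.

```python
import re

s = "--r10-"
digits = re.sub(r'\D', '', s)  # remove every non-digit character
print(int(digits))  # 10

# For comparison: \W keeps letters, so int() would fail on this result.
print(re.sub(r'\W', '', s))  # r10
```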
I prefer Sven Marnach's answer using filter and isdigit, but if you want you can use regular expressions:
>>> import re
>>> pat = re.compile(r'\d+') # '\d' means digit, '+' means one or more
>>> int(pat.search('--r10-').group(0))
10
If there are multiple integers in the string, it pulls the first one:
>>> int(pat.search('12 abc 34').group(0))
12
If you need to deal with negative numbers use this regex:
>>> pat = re.compile(r'\-{0,1}\d+') # '\-{0,1}' means zero or one dashes
>>> int(pat.search('negative: -8').group(0))
-8
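This is simple and does not require you to import any packages. It builds the number digit by digit and skips every other character:
def atoi(string):
    i = 0
    for c in string:
        if '0' <= c <= '9':
            i = i * 10 + (ord(c) - ord('0'))  # shift one decimal place, then add the digit
    return i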

in python, how to match Strings based on Regular Expression and get the non-matching parts as a list?

For example: I have a string "abcde2011-09-30.log", and I want to check whether it matches "(\d){4}-(\d){2}-(\d){2}" (I don't think the syntax is correct, but you get the idea). I also need to split the string into 3 parts: (abcde), (2011-09-30), (.log). How can I do this in Python? Thanks.
There's a split method in the re module that should work for you.
>>> import re
>>> s = 'abcde2011-09-30.log'
>>> re.split(r'(\d{4}-\d{2}-\d{2})', s)
['abcde', '2011-09-30', '.log']
If you don't actually want the date as part of the returned list, just omit the parentheses around the regular expression so that it doesn't have a capturing group:
>>> re.split(r'\d{4}-\d{2}-\d{2}', s)
['abcde', '.log']
Be advised that if the pattern matches more than once, i.e. if there is more than one date in the filename, this will split on all of them. For example,
>>> s2 = 'abcde2011-09-30fghij2012-09-31.log'
>>> re.split(r'(\d{4}-\d{2}-\d{2})', s2)
['abcde', '2011-09-30', 'fghij', '2012-09-31', '.log']
If this is a problem, pass maxsplit=1 so that it splits only once, on the first occurrence of the date:
>>> re.split(r'(\d{4}-\d{2}-\d{2})', s2, maxsplit=1)
['abcde', '2011-09-30', 'fghij2012-09-31.log']
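If you only need the parts around a single date, a match object's start() and end() give you the slice boundaries directly, without split. A sketch using the same date pattern:

```python
import re

s = 'abcde2011-09-30.log'
match = re.search(r'\d{4}-\d{2}-\d{2}', s)
if match:
    # Text before the match, the match itself, and text after it.
    before, date, after = s[:match.start()], match.group(), s[match.end():]
    print([before, date, after])  # ['abcde', '2011-09-30', '.log']
```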
How's this:
>>> import re
>>> a = "abcde2011-09-30.log"
>>> myregexp = re.compile(r'^(.*)(\d{4}-\d{2}-\d{2})(\.\w+)$')
>>> m = myregexp.match(a)
>>> m
<_sre.SRE_Match object at 0xb7f69480>
>>> m.groups()
('abcde', '2011-09-30', '.log')
I don't know the exact Python regex syntax, but something like this should do the job:
/^(\D+?)([\d-]+)(\.log)$/
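In Python the pattern is written as a string rather than between / delimiters, and a raw string keeps the backslashes intact. A sketch of the same pattern with re.compile and match():

```python
import re

# Lazy non-digits, then digits and dashes, then the .log extension.
pattern = re.compile(r'^(\D+?)([\d-]+)(\.log)$')
m = pattern.match('abcde2011-09-30.log')
if m:
    print(m.groups())  # ('abcde', '2011-09-30', '.log')
```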
(Without using regex, and treating your string as a filename:)
Let's start by splitting off the extension '.log':
import os
filename, ext = os.path.splitext('abcde2011-09-30.log')  # ('abcde2011-09-30', '.log')
Most likely the date is always 10 characters long, which allows:
year, month, day = [int(i) for i in filename[-10:].split('-')]
description = filename[:-10]
However, if you are not sure, you can find where the date part of the filename starts:
for i in range(len(filename)):
    if filename[i].isdigit():
        break  # i is now the index of the first digit
description, date = filename[:i], filename[i:]
year, month, day = [int(c) for c in date.split('-')]
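Neither the regex nor the split checks that the date actually exists: 2012-09-31 passes the \d{4}-\d{2}-\d{2} pattern even though September has 30 days. A sketch that validates the date part with datetime.strptime, assuming the fixed 10-character YYYY-MM-DD layout right before the extension; split_log_name is a hypothetical name.

```python
import os
from datetime import datetime


def split_log_name(path):
    filename, ext = os.path.splitext(path)
    description, date_part = filename[:-10], filename[-10:]
    # strptime raises ValueError for impossible dates such as 2012-09-31.
    parsed_date = datetime.strptime(date_part, '%Y-%m-%d').date()
    return description, parsed_date, ext


print(split_log_name('abcde2011-09-30.log'))
```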
