How to use a dictionary to replace the first two characters of a string? - python

I have a web scraper that inputs values into a data extractor. The extractor only accepts dates with the month shifted back by one.
For example, January is equal to 00.
'''
today="02-10-2020"
preDay="02-09-2020"
months ={"01":"00","02":"01","03":"02","04":"03","05":"04","06":"05",
"07":"06","08":"07","09":"08","10":"09","11":"10","12":"11"}
for cur, pre in months.items():
    today = today[0:2].replace(cur, pre)
'''
Maybe I do not completely understand how dictionaries are iterated, but when I try this it replaces every value that matches a key. I only want to change the first two characters of the string and leave the rest of the data alone.
I have already done this successfully with an "if" statement, but I would like to try the same thing using a dictionary.

If I'm understanding your question correctly there is no need for a loop at all.
To move the month part of the date back one month using your dictionary, you could simply do this:
today="02-10-2020"
months ={"01":"00","02":"01","03":"02","04":"03","05":"04","06":"05",
"07":"06","08":"07","09":"08","10":"09","11":"10","12":"11"}
today = today.replace(today[0:2], months[today[0:2]], 1)
print(today)
#output:
#01-10-2020
According to the Documentation:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
So the code I wrote takes the first two characters as the old and replaces it with the value from the dictionary at the key that matches the first two characters. The "1" at the end makes sure this replacement only happens once.
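As an alternative to str.replace, the same result can be sketched with slicing, which touches only the first two characters by construction. The helper name shift_month and the generated MONTHS_BACK table are illustrative assumptions, not part of the original answer; the table is meant to hold the same pairs as the months dictionary above.

```python
# Hypothetical table equivalent to the `months` dictionary above:
# "01" -> "00", "02" -> "01", ..., "12" -> "11".
MONTHS_BACK = {f"{m:02d}": f"{m - 1:02d}" for m in range(1, 13)}


def shift_month(date_text, mapping=MONTHS_BACK):
    """Return date_text with its two-character month prefix mapped back one month."""
    prefix = date_text[:2]
    # Fall back to the unchanged prefix if it is not a known month.
    return mapping.get(prefix, prefix) + date_text[2:]


print(shift_month("02-10-2020"))
```

Because the rest of the string is appended untouched, a day or year that happens to equal the month (as in "10-10-2020") cannot be replaced by accident.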

Related

Very Beginner Python: Replacing Part of a String

I want to know how to replace one occurrence of a substring without replacing the other identical occurrences. For example, take the variable:
action = "play sports"
I could substitute "playing" for "play" by doing print(action.replace("play", "playing"))
But what if the same substring appears twice?
For example, what if you want to turn "honeyhoney" into "honeysweet" (replacing the last half of the string with "sweet")?
Sorry for the bad wording, I am new to coding and really unfamiliar with this. Thanks!
def replaceLast(text, old, new):
    # Reverse everything, replace the first match, then reverse the result back.
    return text[::-1].replace(old[::-1], new[::-1], 1)[::-1]
print(replaceLast("honeyhoney", "honey", "sweet"))
output
honeysweet
The idea is to reverse the string as well as the old and new substrings, so that the last occurrence becomes the first. We then do the replacement and reverse the returned string once more. The count of 1 makes sure only one match is replaced, not both.
Another solution
def replaceLast(text, old, new):
    ind = text.rfind(old)  # index of the last occurrence, or -1 if there is none
    if ind == -1:
        return text
    return text[:ind] + new + text[ind + len(old):]
print(replaceLast("honeyhoney", "honey", "sweet"))
output
honeysweet
Here we take the string from the beginning up to the index of the last occurrence, add the new substring, then add the rest of the string from where the old substring ends, and return the result as the new string. str.rfind returns -1 when no match is found, so we check against that to make sure the output is correct even if there is nothing to replace.
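A third variant, sketched here as an assumption rather than something from the answers above, splits once from the right with str.rsplit and joins the pieces with the new substring. The function name replace_last is illustrative.

```python
def replace_last(text, old, new):
    """Replace only the last occurrence of old in text with new."""
    # rsplit with maxsplit=1 cuts the string at the rightmost match only;
    # if there is no match it returns [text], and join gives text back unchanged.
    return new.join(text.rsplit(old, 1))


print(replace_last("honeyhoney", "honey", "sweet"))
```

This needs no explicit check for the no-match case, at the cost of being a little less obvious to read than the rfind version.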

Getting wrong data with regex

I'm facing an issue here. Python version 3.7.
https://regex101.com/r/WVxEKM/3
As you can see on the regex site, my regex works there. However, when I try to read the strings with Python, I only get the first part, with nothing after the comma.
Here's my code:
part_number = str(row)
partn = re.search(r"([a-zA-Z0-9 ,-]+)", part_number)
print(partn.group(0))
This is what partn.group(0) is printing:
FMC2H-OHC-100018-00
I need to get the string as it appears on the regex site, with the comma and the value:
FMC2H-OHC-100018-00, 2
Is my regex wrong? What is happening with the comma and the value?
Row values
Here are the row values converted to strings; the data retrieved from my database also includes parentheses and quotes:
('FMC2H-OHC-100018-00', 2)
('FMC2H-OHC-100027-00', 0)
I don't think you need to convert the row values to strings and then parse the result with a regex. The clue was in your update, where you said "Here are the row values converted to string", implying that they are in some other format initially. The result looks like they are actually tuples of two values, a string and an integer.
If that's correct, you can skip the string conversion and the regex entirely, because Python's built-in string formatting can produce the string you want directly.
Here's what I mean:
# Raw row data retrieved from database.
rows = [('FMC2H-OHC-100018-00', 2),
        ('FMC2H-OHC-100027-00', 0),
        ('FMC2H-OHC-100033-00', 0),
        ('FMC2H-OHC-100032-00', 20),
        ('FMC2H-OHC-100017-00', 16)]
for row in rows:
    result = '{}, {}'.format(*row)  # Convert data in row to a formatted string.
    print(result)
Output:
FMC2H-OHC-100018-00, 2
FMC2H-OHC-100027-00, 0
FMC2H-OHC-100033-00, 0
FMC2H-OHC-100032-00, 20
FMC2H-OHC-100017-00, 16
Your problem is that you didn't include the ' in your character class. As a result, the regex can match FMC2H-OHC-100018-00 or , 2 separately, but not both together, because the quote between them ends the match. Also, re.search stops after it finds the first match. So if you only want the first match, go with:
re.search(r"([\w ',-]+)", part_number)
Here I changed A-Za-z0-9 to \w because it's shorter and more readable (note that \w also matches the underscore). The match now includes the quote characters, so strip them afterwards if you don't want them. If you want a list of all matches, go with:
re.findall(r"([\w ',-]+)", part_number)
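To make the difference concrete, here is a small sketch comparing the two character classes on one of the row strings from the question. The variable names original, widened, and cleaned are illustrative, and removing the quotes with str.replace is an assumption about the desired final format.

```python
import re

# str() of a (string, int) tuple gives "('FMC2H-OHC-100018-00', 2)".
part_number = str(('FMC2H-OHC-100018-00', 2))

# Original class: the match stops at the closing quote.
original = re.search(r"([a-zA-Z0-9 ,-]+)", part_number).group(0)

# Widened class: the quotes are now part of the match.
widened = re.search(r"([\w ',-]+)", part_number).group(0)

# Drop the quotes to get the format asked for in the question.
cleaned = widened.replace("'", "")

print(original)
print(widened)
print(cleaned)
```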

Replace string from a huge python dictionary

I have a dictionary like this:
id_dict = {'C1001': 'John','D205': 'Ben','501': 'Rose'}
This dictionary has more than 10000 keys and values. I have to search each report, which has nearly 500 words, for the keys and replace them with the corresponding values.
I have to process thousands of reports within a few minutes, so speed and memory are really important to me.
This is the code I am using now:
str = "strings in the reports"
for key, value in id_dict.iteritems():  # Python 2; use .items() on Python 3
    str = str.replace(key, value)
Is there any better solution than this?
Using str.replace in a loop is very inefficient, for a few reasons:
- Each time a word is replaced, a new string is allocated and the old one is discarded. With a lot of words, this can take a long time.
- str.replace also replaces inside words, which is probably not what you want. For example, replacing "nut" with "eel" changes "donut" to "doeel".
- If the replacement dictionary has a lot of words, you loop through all of them (in a relatively slow Python loop), even if the text contains none of them.
I would use re.sub with a replacement function (a lambda here), matching each run of word characters (letters, digits, or underscore) between word boundaries.
The lambda looks the word up in the dictionary and returns the replacement if found, or the original word otherwise. The text is scanned only once, with one dictionary lookup per word, so this runs much faster than the loop.
import re
id_dict = {'C1001': 'John','D205': 'Ben','501': 'Rose'}
s = "Hello C1001, My name is D205, not X501"
result = re.sub(r"\b(\w+)\b",lambda m : id_dict.get(m.group(1),m.group(1)),s)
print(result)
prints:
Hello John, My name is Ben, not X501
(Note that the last word, X501, was left unreplaced: it contains the key 501 but is not a whole-word match.)
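Since the question mentions thousands of reports, one way to package the approach above is to compile the pattern once and reuse it for every report. This is a minimal sketch; the names WORD_RE and replace_ids are assumptions made for illustration.

```python
import re

# Compiled once and reused for every report.
WORD_RE = re.compile(r"\b\w+\b")


def replace_ids(report, mapping):
    """Replace every whole word of report that is a key of mapping."""
    # dict.get returns the word itself when it is not a key, so
    # unknown words pass through unchanged.
    return WORD_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), report)


id_dict = {'C1001': 'John', 'D205': 'Ben', '501': 'Rose'}
print(replace_ids("Hello C1001, My name is D205, not X501", id_dict))
```

The cost per report then depends on the length of the report rather than on the size of the dictionary, which matters when the dictionary has more than 10000 entries.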

Find Certain String Indices

I have this string and I need to get a specific number out of it.
E.g. encrypted = "10134585588147, 3847183463814, 18517461398"
How would I pull out only the second integer out of the string?
You are looking for the "split" method. Turn a string into a list by specifying a smaller part of the string on which to split.
>>> encrypted = '10134585588147, 3847183463814, 18517461398'
>>> encrypted_list = encrypted.split(', ')
>>> encrypted_list
['10134585588147', '3847183463814', '18517461398']
>>> encrypted_list[1]
'3847183463814'
>>> encrypted_list[-1]
'18517461398'
Then you can just access the indices as normal. Note that lists can be indexed forwards or backwards. A negative index counts from the right rather than the left, so -1 selects the last element without needing to know how big the list is. Indexing raises IndexError if the list is empty, though. With Jon's method (below), the list has at least one element unless the original string is empty or contains only commas and whitespace.
Edited to add:
What Jon is pointing out in the comment is that if you are not sure the string will be well formatted (e.g., always separated by exactly one comma followed by exactly one space), you can replace all the commas with spaces (encrypted.replace(',', ' ')) and then call split without arguments, which splits on any run of whitespace characters. As usual, you can chain these together:
encrypted.replace(',', ' ').split()
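Putting the pieces together, here is a short sketch that pulls out the second number even when the separators are irregular. The messy input string and the choice to fall back to None when there are fewer than two numbers are assumptions made for illustration.

```python
# Irregular separators: no space, space before the comma, several spaces.
encrypted = "10134585588147,3847183463814 ,   18517461398"

# Turn commas into spaces, then split on any run of whitespace.
parts = encrypted.replace(",", " ").split()

# Guard against short input instead of risking an IndexError.
second = int(parts[1]) if len(parts) > 1 else None

print(parts)
print(second)
```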

Speed up python regex matching

With Python's re module, I need to make a large number of calls (about 1 million) to re.findall() and re.sub(). I want to find all occurrences of a pattern in a string and then replace them with a fixed string. For example, all dates in a string are returned as a list, and in the original string each one is replaced by 'DATE'. How can I combine both into one call?
The repl argument of re.sub can be a callable:
import re

dates = []
def store_dates(match):
    # Record each matched date, then return the replacement text.
    dates.append(match.group())
    return 'DATE'
data = re.sub('some-date-string', store_dates, data)
# data is now your data with all the date strings replaced with 'DATE'
# dates now has all of the date strings that matched your regex
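The snippet above leaves the pattern and the input as placeholders. A self-contained sketch might look like the following, where the YYYY-MM-DD pattern, the sample text, and the names DATE_RE and extract_and_mask are all assumptions made for illustration. Compiling the pattern once avoids looking it up again on each of the many calls.

```python
import re

# Hypothetical date format (YYYY-MM-DD); substitute your own pattern.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_and_mask(text):
    """Return (masked_text, dates) using a single pass over text."""
    found = []

    def store(match):
        # Called once per match: remember it, then mask it.
        found.append(match.group())
        return "DATE"

    return DATE_RE.sub(store, text), found


masked, dates = extract_and_mask("Paid on 2020-02-10, due 2020-03-01.")
print(masked)
print(dates)
```

Using a local list inside the function, rather than a module-level one, keeps the matches of different strings from mixing when the function is called repeatedly.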
