Replace some string in python - python

I have two addresses like:
first_address = 'Красноярский край, г Красноярск, пр-кт им газеты Красноярский Рабочий, 152г, квартира (офис) /1'
second_address = 'Красноярский край, г Красноярск, пр-кт им.газеты "Красноярский рабочий", 152г'
I want to replace all of the text before квартира (офис) /1 in first_address with second_address.
My code looks like:
c = first_address.split(',')
v = second_address.split(',')
b = c[:len(v)]
b = v
n = c[len(v)::]
f = ''.join(str(b)) + ''.join(str(n))
I get output:
['Красноярский край', ' г Красноярск', ' пр-кт им.газеты "Красноярский рабочий"', ' 152г'][' квартира (офис) /1']
How can I do this easily?

Looks like you want to take substrings from second_address until they run out, then use substrings from first_address. Here's a straightforward way to do it.
first_subs = first_address.split(',')
second_subs = second_address.split(',')
[(f if s is None else s)
 for (f, s) in zip(first_subs,
                   second_subs + [None] * (len(first_subs) - len(second_subs)))]
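If the goal is a single string rather than a list of pieces, the same idea can be written with a slice and a join. This is a minimal sketch, assuming both addresses split into comma-separated parts in the same order; `merge_addresses` is a hypothetical helper name, not something from the answer above.

```python
def merge_addresses(first_address, second_address):
    """Use second_address's parts, then any extra trailing parts of first_address."""
    first_parts = first_address.split(',')
    second_parts = second_address.split(',')
    # Keep only the parts of first_address that have no counterpart in second_address.
    tail = first_parts[len(second_parts):]
    return ','.join(second_parts + tail)


first_address = 'Красноярский край, г Красноярск, пр-кт им газеты Красноярский Рабочий, 152г, квартира (офис) /1'
second_address = 'Красноярский край, г Красноярск, пр-кт им.газеты "Красноярский рабочий", 152г'

merged = merge_addresses(first_address, second_address)
print(merged)
```

Because the pieces keep their leading spaces after `split(',')`, joining with a bare comma restores the original spacing.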

Related

difficulties with files in python

For a homework assignment I have a file path called P and a string called S, which is equal to 'parrot'. I need to search the file at P for S and output the number of times S appears. I cannot use regexes.
This is my code:
matches = []
matches2 = []
def file_reading(P, S):
    file1 = open(P, 'r')
    matches.append(S)
    file1.close()
    for S in P:
        matches2.append(S)
    print (len(matches2))
The output should be 3, but this only outputs 1. Can someone point me in the right direction? If more details are needed, let me know and I will edit them in.
To count how many times S appears in the string P, you can split P on S and count the pieces:
P = "/home/shan/shan/shan/editshanfile/exe"
S = "shan"
parts = P.split(S)
print (len(parts)-1)
1. Open the file using the given path P.
2. Read the file into a variable.
3. Search that variable for the target string S.
4. Close the file.
5. Print the output.
I suspect string.count(string2) is what you're looking for:
>>> big_string = 'a' * 100 + 'parrot' + 'b' * 20 + 'parrot' + 'c' * 50 + 'parrot'
>>> len(big_string)
188
>>> big_string.count('parrot')
3
>>>
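The steps listed earlier and `str.count` can be combined into one small function. This is only a sketch: `count_in_file` is a hypothetical name, and the demo file contents are invented so the example can run on its own.

```python
import os
import tempfile


def count_in_file(path, target):
    """Return the number of non-overlapping occurrences of target in the file at path."""
    # The with-block closes the file automatically, even if reading fails.
    with open(path, 'r') as handle:
        text = handle.read()
    return text.count(target)


# Demo with a throwaway file; the contents are made up for illustration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a parrot, another parrot\nand a third parrot\n')
    demo_path = tmp.name

result = count_in_file(demo_path, 'parrot')
os.remove(demo_path)
print(result)
```

Note that `str.count` counts non-overlapping matches, which is what you want for a word like 'parrot'.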

Parsing text files with "magic" values

Background
I have some large text files used in an automation script for audio tuning. Each line in the text file looks roughly like:
A[BANANA] + B[PINEAPPLE] - C[CHERRY] [[BANANA]] BANANA # BANANA
The text gets fed to an old command-line program which searches for keywords, and swaps them out. Sample output would be:
A[0] + B[100] - C[0x1000] [[0]] 0 # 0
A[2] + B[200] - C[0x100A] [[2]] 0 # 0
Problem
Sometimes, text files have keywords that are meant to be left untouched (i.e. cases where we don't want "BANANA" substituted). I'd like to modify the text files to use some kind of keyword/delimiter that is unlikely to pop up in normal circumstances, i.e:
A[#1] + B[#2] - C[#3] [[#1]] #1 # #1
Question
Does Python's text file parser have any special indexing or escape sequences I could use instead of simple keywords?
Use a regular expression replacement function with a dictionary.
Match everything between brackets (using a negated character class so the brackets themselves are excluded) and replace it with the value from the dict; if the key is not found, keep the original text:
import re
d = {"PINEAPPLE": "20", "CHERRY": "100", "BANANA": "400"}
s = "A[BANANA] + B[PINEAPPLE] - C[CHERRY] [[BANANA]]"
# Raw string so the backslashes reach the regex engine unchanged.
print(re.sub(r"\[([^\[\]]*)\]", lambda m: "[{}]".format(d.get(m.group(1), m.group(1))), s))
prints:
A[400] + B[20] - C[100] [[400]]
You can use re.sub to perform the substitution. This answer creates a list of randomized values to demonstrate; the list can be replaced with the data you are using:
import re
import random
s = "A[BANANA] + B[PINEAPPLE] - C[CHERRY] [[BANANA]]"
# Turn every bracketed keyword into a positional '{}' placeholder.
new_s = re.sub(r'(?<=\[)[a-zA-Z0-9]+(?=\])', '{}', s)
random_data = [[random.randint(1, 2000) for i in range(4)] for _ in range(10)]
final_results = [new_s.format(*i) for i in random_data]
for command in final_results:
    print(command)
Output:
A[51] + B[134] - C[864] [[1344]]
A[468] + B[1761] - C[1132] [[1927]]
A[1236] + B[34] - C[494] [[1009]]
A[1330] + B[1002] - C[1751] [[1813]]
A[936] + B[567] - C[393] [[560]]
A[1926] + B[936] - C[906] [[1596]]
A[1532] + B[1881] - C[871] [[1766]]
A[506] + B[1505] - C[1096] [[491]]
A[290] + B[1841] - C[664] [[38]]
A[1552] + B[501] - C[500] [[373]]
Just use
\[([^][]+)\]
And replace this with the desired result, e.g. 123.
Broken down, this says
\[ # opening bracket
([^][]+) # capture anything not brackets, 1+ times
\] # closing bracket
See a demo on regex101.com.
For your changed requirements, you could use an OrderedDict:
import re
from collections import OrderedDict
rx = re.compile(r'\[([^][]+)\]')
d = OrderedDict()
def replacer(match):
    item = match.group(1)
    d[item] = 1
    return '[#{}]'.format(list(d.keys()).index(item) + 1)
string = "A[BANANA] + B[PINEAPPLE] - C[CHERRY] [[BANANA]]"
string = rx.sub(replacer, string)
print(string)
Which yields
A[#1] + B[#2] - C[#3] [[#1]]
The idea here is to put every (potentially) new item in the dict, then look up its index. An OrderedDict remembers the order in which keys were inserted.
For the sake of academic completeness, you could do it all on your own as well:
import re
class Replacer:
    rx = re.compile(r'\[([^][]+)\]')
    keywords = []  # class attribute: shared by all Replacer instances
    def do_replace(self, match):
        idx = self.lookup(match.group(1))
        return '[#{}]'.format(idx + 1)
    def replace(self, string):
        return self.rx.sub(self.do_replace, string)
    def lookup(self, item):
        for idx, key in enumerate(self.keywords):
            if key == item:
                return idx
        self.keywords.append(item)
        return len(self.keywords)-1
string = "A[BANANA] + B[PINEAPPLE] - C[CHERRY] [[BANANA]]"
rpl = Replacer()
string = rpl.replace(string)
print(string)
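The same numbering scheme can be kept in a plain dict that maps each keyword to its number, which avoids scanning a list on every match. This is a sketch rather than part of the answers above: `number_keywords` is a hypothetical name, and it assumes keywords should be numbered in order of first appearance.

```python
import re


def number_keywords(text):
    """Replace each bracketed keyword with [#n], numbering keywords by first appearance."""
    seen = {}

    def replacer(match):
        # setdefault stores the next free number for a new keyword
        # and returns the stored number for one seen before.
        index = seen.setdefault(match.group(1), len(seen) + 1)
        return '[#{}]'.format(index)

    return re.sub(r'\[([^\[\]]+)\]', replacer, text), seen


new_text, mapping = number_keywords("A[BANANA] + B[PINEAPPLE] - C[CHERRY] [[BANANA]]")
print(new_text)
print(mapping)
```

Returning the mapping alongside the rewritten text makes it possible to translate the `#n` markers back to keywords later.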
This can also be done using pyparsing.
The parser below defines noun as an uppercase word inside square brackets, then defines one line of input (complete) as a sequence of such items.
To replace the items it identifies, define a class derived from dict whose __missing__ method returns the key itself, so that anything not in the mapping is left unchanged.
>>> import pyparsing as pp
>>> noun = pp.Word(pp.alphas.upper())
>>> between = pp.CharsNotIn('[]')
>>> leftbrackets = pp.OneOrMore('[')
>>> rightbrackets = pp.OneOrMore(']')
>>> stmt = 'A[BANANA] + B[PINEAPPLE] - C[CHERRY] [[BANANA]]'
>>> one = between + leftbrackets + noun + rightbrackets
>>> complete = pp.OneOrMore(one)
>>> complete.parseString(stmt)
(['A', '[', 'BANANA', ']', ' + B', '[', 'PINEAPPLE', ']', ' - C', '[', 'CHERRY', ']', ' ', '[', '[', 'BANANA', ']', ']'], {})
>>> class Replace(dict):
...     def __missing__(self, key):
...         return key
...
>>> replace = Replace({'BANANA': '1', 'PINEAPPLE': '2'})
>>> new = []
>>> for item in complete.parseString(stmt).asList():
...     new.append(replace[item])
...
>>> ''.join(new)
'A[1] + B[2] - C[CHERRY] [[1]]'
I think it's easier, and clearer, using plex. The snag is that it appears to be available only for Py2. It took me an hour or two to do enough conversion work to Py3 to get this running.
Just three types of tokens to watch for, then a similar number of branches within a while statement.
from plex import *
from io import StringIO
stmt = StringIO('A[BANANA] + B[PINEAPPLE] - C[CHERRY] [[BANANA]]')
lexicon = Lexicon([
    (Rep1(AnyBut('[]')), 'not_brackets'),
    (Str('['), 'left_bracket'),
    (Str(']'), 'right_bracket'),
])
class Replace(dict):
    def __missing__(self, key):
        return key
replace = Replace({'BANANA': '1', 'PINEAPPLE': '2'})
scanner = Scanner(lexicon, stmt)
new_statement = []
while True:
    token = scanner.read()
    if token[0] is None:
        break
    elif token[0] == 'not_brackets':  # must match the token name defined in the lexicon
        new_statement.append(replace[token[1]])
    else:
        new_statement.append(token[1])
print(''.join(new_statement))
Result:
A[1] + B[2] - C[CHERRY] [[1]]
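On the original question of a delimiter that is unlikely to collide with ordinary text, the standard library's `string.Template` is one option: placeholders are written as `$NAME`, and a literal dollar sign is written as `$$`. The sketch below assumes the text files can be edited to use that syntax; the values are borrowed from the sample output in the question.

```python
from string import Template

# Only $-prefixed names are substituted; a bare BANANA is left alone,
# and $$ produces a literal dollar sign.
line = Template("A[$BANANA] + B[$PINEAPPLE] - C[$CHERRY] [[$BANANA]] BANANA # $$BANANA")

values = {"BANANA": "0", "PINEAPPLE": "100", "CHERRY": "0x1000"}

# safe_substitute leaves unknown placeholders in place instead of raising KeyError.
rendered = line.safe_substitute(values)
print(rendered)
```

Using `safe_substitute` rather than `substitute` means a placeholder with no value survives unchanged, which may suit files that are processed in several passes.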

splicing between sequence of two defined boundaries

I have created a function that takes in four strings. The first two strings will be long strings that can be anything. The last two strings will be referred to as boundaries. I want to take everything in string1 between the defined boundaries and replace everything in string2 between the defined boundaries. The part of the string taken away from string 1 will be removed and the part replaced in string 2 will be removed. An example of this function is below:
bound('DOGYOMAMA', 'ROOGMEMAD', 'OG', 'MA') --> returns ('DMA', 'ROOGYOMAD', 'OG', 'MA')
This is the function I have created to do what I wrote above
def bound(st,sz,a,b):
    s1=''.join(st)
    s2=''.join(sz)
    if a in s1 and b in s1 and a in s2 and b in s2:
        f1=s1.find(a)
        l1=s1.find(b)
        f2=s2.find(a)
        l2=s2.find(b)
        blen1 = len(b)
        blen2 = len(b)
        s1_n = s1[:f1] +s1[l1+blen1:]
        s2_n = s2[:f2] + s1[f1:l1 + blen1] +s2[l2+blen2]
        return s1_n, s2_n, a, b
print(bound('DOGYOMAMA','ROOGMEMAD', 'OG','MA'))
My problem is that I also need this to work in reverse: given ('DOGYOMAMA', 'ROOGMEMAD', 'OG', 'MA'), it should also look for ('AMAMOYGOD', 'DAMEMGOOR', 'GO', 'AM'). Also, if the strings can be spliced both ways, it should use only the sequence that is spliced at the lowest index.
Try this. If you need to return several results, don't return each one immediately; collect them in a list and return that list at the end, as done here:
def bound(st,sz,a,b):
    result=[]
    string_s = [''.join(st), ''.join(sz), ''.join(st)[::-1], ''.join(sz)[::-1]]
    boundaries = [a, b, a[::-1], b[::-1]]
    for chunk in range(0, len(string_s), 2):
        word = string_s[chunk:chunk + 2]
        bound = boundaries[chunk:chunk + 2]
        if bound[0] in word[0] and bound[1] in word[0] and bound[0] in word[1] and bound[1] in word[1]:
            f1 = word[0].find(bound[0])
            l1 = word[0].find(bound[1])
            f2 = word[1].find(bound[0])
            l2 = word[1].find(bound[1])
            blen1 = len(bound[1])
            blen2 = len(bound[1])
            s1_n = word[0][:f1] + word[0][l1 + blen1:]
            s2_n = word[1][:f2] + word[0][f1:l1 + blen1] + word[1][l2 + blen2]
            result.append([s1_n, s2_n, bound[0], bound[1]])
    return result
print(bound('DOGYOMAMA','ROOGMEMAD', 'OG','MA'))
output:
[['DMA', 'ROOGYOMAD', 'OG', 'MA'], ['AMAMOYAMOYGOD', 'DAMEME', 'GO', 'AM']]
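One detail in the code above is worth a closer look: `[l2 + blen2]` picks a single character, so anything after that character in the second string is dropped, and the index can run past the end when the closing boundary is the last thing in the string. The sketch below is one possible variant, not the original poster's method: `splice` is a hypothetical name, it takes the tail with a slice, and it assumes the closing boundary must be found after the opening one.

```python
def splice(s1, s2, start, end):
    """Move the start..end segment of s1 into s2, replacing s2's own start..end segment.

    Returns (new_s1, new_s2, start, end), or None if either string lacks the boundaries.
    """
    i1 = s1.find(start)
    j1 = s1.find(end, i1 + len(start)) if i1 != -1 else -1
    i2 = s2.find(start)
    j2 = s2.find(end, i2 + len(start)) if i2 != -1 else -1
    if -1 in (i1, j1, i2, j2):
        return None
    seg_end1 = j1 + len(end)  # index just past the closing boundary in s1
    seg_end2 = j2 + len(end)  # index just past the closing boundary in s2
    new_s1 = s1[:i1] + s1[seg_end1:]
    new_s2 = s2[:i2] + s1[i1:seg_end1] + s2[seg_end2:]  # slice keeps the whole tail
    return new_s1, new_s2, start, end


forward = splice('DOGYOMAMA', 'ROOGMEMAD', 'OG', 'MA')
print(forward)
```

For the reversed search, one possible reading is to call `splice(s1[::-1], s2[::-1], b[::-1], a[::-1])`, since reversing the strings also swaps which boundary comes first; whether that matches the intended behaviour is an assumption.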

In python 3.6, how to filter output of help built-in function using wildcards from interactive prompt?

I realize this isn't correct syntax, but here's what I want to do:
>>> import MyLib.MyMod
>>> help("MyLib.MyMod.DoXyz*")
Is there a way to filter the output of the help command so that I only get functions whose names start with "DoXyz"?
Also, is there a way to put the output of the help command in alphabetical order at the same time?
I was able to create something similar after reading some ideas from COLDSPEED.
import re

def help2(mpath, filter=None, mode=0):
    """Search help with filter. Example: help2(MyMod, 'DoXyz*')"""
    if not filter:
        sg = True
    elif not mode:
        # Wildcard mode: anchor at the start and turn '*' into '.*'.
        # str.replace returns a new string, so the result must be assigned.
        r1 = "^" + filter.replace('*', '.*')
    else:
        # Any other mode: treat the filter as a regular expression.
        r1 = filter
    for x in sorted(dir(mpath)):
        if not x.startswith('_'):
            if filter:
                sg = re.search(r1, x, re.IGNORECASE)
            if sg:
                e2 = getattr(mpath, x)
                f = x + "():"
                d = e2.__doc__
                if d:
                    L = d.splitlines()[0]  # first line of the docstring
                else:
                    L = "(none)"
                print("%-30s %s" % (f, L))
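For shell-style wildcards specifically, the standard library's `fnmatch` module can do the matching without building a regular expression by hand. The sketch below only lists the matching names in sorted order; `matching_names` and the `Demo` class are hypothetical, standing in for a real module such as `MyLib.MyMod`.

```python
import fnmatch


def matching_names(obj, pattern):
    """Return the public attribute names of obj that match a shell-style pattern, sorted."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith('_') and fnmatch.fnmatchcase(name, pattern)
    )


class Demo:
    """Stand-in for a module; the attribute names are made up for illustration."""
    def DoXyzB(self): pass
    def DoXyzA(self): pass
    def Other(self): pass


names = matching_names(Demo, 'DoXyz*')
print(names)
```

Each name in the result can then be passed to `help(getattr(obj, name))` to show its documentation. `fnmatchcase` is used here so that matching is case-sensitive on every platform.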

Create a Python list filled with the same string over and over and a number that increases based on a variable.

I'm trying to create a list that is populated by a recurring string followed by a number that marks its position in the sequence. The number of strings comes from an int variable.
So something like this:
b = 5
a = range(2, b + 1)
c = []
c.append('Adi_' + str(a))
I was hoping this would create a list like this:
c = ['Adi_2', 'Adi_3', 'Adi_4', 'Adi_5']
Instead I get a list like this
c = ['Adi_[2, 3, 4, 5]']
So when I try to print it in new rows
for x in c:
    print("Welcome {0}".format(x))
The result of this is:
Welcome Adi_[2, 3, 4, 5]
The result I want is:
Welcome Adi_2
Welcome Adi_3
Welcome Adi_4
Welcome Adi_5
If anybody has ideas, I would appreciate it.
You almost got it:
for i in a:
    c.append('Adi_' + str(i))
Your initial line was converting the whole list a to a single string.
Note that you could get rid of the loop with a list comprehension and some string formatting:
c = ['Adi_%s' % s for s in a]
or
c = ['Adi_{0}'.format(s) for s in a] #Python >= 2.6
Or as a list comprehension:
b = 5
a = range(2, b + 1)
c = ["Adi_" + str(i) for i in a]
Using list comprehensions:
b = 5
a = range(2, b + 1)
c = ['Adi_'+str(i) for i in a]
for x in c:
    print("Welcome {0}".format(x))
Or all on one line:
>>> for s in ['Welcome Adi_%d' % i for i in range(2,6)]:
...     print(s)
...
Welcome Adi_2
Welcome Adi_3
Welcome Adi_4
Welcome Adi_5
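The answers above were written with Python 2 in mind. On Python 3.6 or later, the same list can be built with an f-string inside the comprehension; this is a small sketch that reuses the variable names from the question.

```python
b = 5

# range(2, b + 1) yields 2, 3, 4, 5; each number is formatted into its own string.
c = [f'Adi_{i}' for i in range(2, b + 1)]

for name in c:
    print(f'Welcome {name}')
```

In Python 3, `range` is a lazy sequence rather than a list, which is one more reason to format each element individually instead of calling `str` on the whole range.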
