For a homework assignment I have a file path called P and a string called S equal to 'parrot'. I need to search P for S and output the number of times S appears. I cannot use regexes.
this is my code:
matches = []
matches2 = []
def file_reading(P, S):
    file1 = open(P, 'r')
    matches.append(S)
    file1.close()
    for S in P:
        matches2.append(S)
    print(len(matches2))
The output should be 3, but this only outputs 1. Can someone point me in the right direction? If more details are needed, let me know and I will edit them in.
To count how many times S appears in P, you can simply do the following:
P = "/home/shan/shan/shan/editshanfile/exe"
S = "shan"
parts = P.split(S)
print (len(parts)-1)
1. Open the file using the given path P
2. Read the file into a variable
3. Search that variable for the target string S
4. Close the file
5. Print the output
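The steps above can be sketched as follows; this is a minimal sketch using `str.count` rather than the author's exact code:

```python
def count_in_file(P, S):
    # steps 1-2: open the file at path P and read its contents
    # (the with-block also closes the file, covering step 4)
    with open(P, 'r') as file1:
        contents = file1.read()
    # step 3: count non-overlapping occurrences of S
    return contents.count(S)
```

Step 5 is then just `print(count_in_file(P, 'parrot'))`.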
I suspect string.count(string2) is what you're looking for:
>>> big_string = 'a' * 100 + 'parrot' + 'b' * 20 + 'parrot' + 'c' * 50 + 'parrot'
>>> len(big_string)
188
>>> big_string.count('parrot')
3
>>>
I have two addresses like:
first_address = 'Красноярский край, г Красноярск, пр-кт им газеты Красноярский Рабочий, 152г, квартира (офис) /1'
second_address = 'Красноярский край, г Красноярск, пр-кт им.газеты "Красноярский рабочий", 152г'
And I want to replace all text before квартира (офис) /1
My code looks like:
c = first_address.split(',')
v = second_address.split(',')
b = c[:len(v)]
b = v
n = c[len(v)::]
f = ''.join(str(b)) + ''.join(str(n))
I get output:
['Красноярский край', ' г Красноярск', ' пр-кт им.газеты "Красноярский рабочий"', ' 152г'][' квартира (офис) /1']
How can I do this more easily?
Looks like you want to take substrings from second_address until they run out, then use substrings from first_address. Here's a straightforward way to do it.
first_subs = first_address.split(',')
second_subs = second_address.split(',')
[(f if s is None else s)
for (f, s) in zip(first_subs,
second_subs + [None] * (len(first_subs) - len(second_subs)))]
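For example, with the two addresses from the question, joining the result back with ',' reconstructs the desired string:

```python
first_address = 'Красноярский край, г Красноярск, пр-кт им газеты Красноярский Рабочий, 152г, квартира (офис) /1'
second_address = 'Красноярский край, г Красноярск, пр-кт им.газеты "Красноярский рабочий", 152г'

first_subs = first_address.split(',')
second_subs = second_address.split(',')
# pad the shorter list with None so zip does not truncate the longer one
padded = second_subs + [None] * (len(first_subs) - len(second_subs))
merged = [f if s is None else s for f, s in zip(first_subs, padded)]
print(','.join(merged))
```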
In ArcGIS I have intersected a large number of zonal polygons with another set and recorded the original zone IDs and the data they are connected with. However, the strings that are created are one long run of numbers, with lengths ranging from 11 to 77 characters (each ID is 11 characters long). I am looking to add a "," between each ID, making it easier to read and to export later as a .csv file. To do this I wrote this code:
def StringSplit(StrO, X):
    StrN = StrO  # Recording original string
    StrLen = len(StrN)
    BStr = StrLen / X  # How many segments are inside of one string
    StrC = BStr - 1    # How many times it should loop
    if StrC > 0:
        while StrC > 1:
            StrN = StrN[:((X * StrC) + 1)] + "," + StrN[(X * StrC):]
            StrC = StrC - 1
        while StrC == 1:
            StrN = StrN[:X + 1] + "," + StrN[(X * StrC):]
            StrC = 0
        while StrC == 0:
            return StrN
    else:
        return StrN
The main issue is that it has to step through multiple rows (76) with various lengths (11 to 77). I got the last parts to work, just not the internal loop, as it returns an error or incorrect outputs for strings longer than 22 characters.
Thus right now:
1. 01234567890 returns 01234567890
2. 0123456789001234567890 returns 01234567890,01234567890
3. 012345678900123456789001234567890 returns either: Error or ,, or even ,,01234567890
I know it is probably something pretty simple I am missing, but I can't seem to remember what it is...
It can easily be done with a regex.
The ........... pattern below is 11 dots, which matches every 11-character chunk, effectively splitting at every 11th character.
You can then use pandas to create a CSV from the resulting list.
Code:
import re
x = re.findall('...........', '01234567890012345678900123456789001234567890')
print(x)
myString = ",".join(x)
print(myString)
output:
['01234567890', '01234567890', '01234567890', '01234567890']
01234567890,01234567890,01234567890,01234567890
For the sake of simplicity, you can do this:
code:
x = ",".join(re.findall('...........', '01234567890012345678900123456789001234567890'))
print(x)
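If counting out eleven literal dots feels error-prone, the equivalent quantified pattern `.{11}` behaves the same way (assuming the IDs really are always exactly 11 characters):

```python
import re

ids = '01234567890012345678900123456789001234567890'
# {11} repeats the "any character" dot exactly 11 times
x = ",".join(re.findall('.{11}', ids))
print(x)
```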
Don't write the loops yourself; use Python libraries or builtins, it will be easier. For example:
def StringSplit(StrO, X):
    substring_starts = range(0, len(StrO), X)
    substrings = (StrO[start:start + X] for start in substring_starts)
    return ','.join(substrings)
string = '1234567890ABCDE'
print(StringSplit(string, 5))
# '12345,67890,ABCDE'
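The standard library's `textwrap.wrap` can also do this chunking. Note it is designed for prose and treats whitespace specially, so it only matches this behavior on strings without spaces, which is the case for these numeric IDs:

```python
import textwrap

string = '1234567890ABCDE'
# wrap breaks the string into pieces of at most width 5;
# with no whitespace present, every piece is exactly 5 characters
print(','.join(textwrap.wrap(string, 5)))
```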
I have to change the value of the res variable in the following code on every iteration of the loop.
txt = open(os.path.expanduser('~FOLDER\\numbers.txt'), 'r')
res = txt.read().splitlines()
u = [something]
for item in u:
    var['Number : ' + res[0]]
The txt variable contains a text file. In this text file there are some lines of numbers in this format:
123
1234
125342
562546
I have to take a different value on each iteration of the loop and assign it to res. At the moment, with res[0], it only uses the same number (e.g. 123) on every loop. How can I solve the problem?
The index should be 0 on the first iteration, 1 on the second, and so on...
I think this should do the job:
with open(os.path.expanduser('~FOLDER\\numbers.txt'), 'r') as res:
    for line in res:
        var['Number : ' + line]
txt = open(os.path.expanduser('~FOLDER\\numbers.txt'), 'r')
res = txt.read().splitlines()
u = [something]
for index, item in enumerate(u):
    var['Number : ' + res[index]]
More info about enumerate here: https://docs.python.org/2/library/functions.html#enumerate
I assume you want to iterate through u and res simultaneously. In this case, you can either use a plain index with a for loop over a range, or you can use enumerate as follows:
txt = open(os.path.expanduser('~FOLDER\\numbers.txt'), 'r')
res = txt.read().splitlines()
u = [something]
for index, item in enumerate(u):
    var['Number : ' + res[index]]
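If u and res always have the same length, zip is another option. This is a sketch with placeholder values, since the question only shows u = [something]:

```python
u = ['a', 'b', 'c']              # placeholder items
res = ['123', '1234', '125342']  # lines as read from numbers.txt
# zip pairs each item of u with the line of res at the same position
for item, number in zip(u, res):
    print('Number : ' + number)
```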
I realize this isn't correct syntax, but here's what I want to do:
>>> import MyLib.MyMod
>>> help("MyLib.MyMod.DoXyz*")
Is there a way to filter the output of the help command so that I only get functions starting with the string "DoXyz"?
Also, is there a way to put the output of the help command in alphabetical order at the same time?
I was able to create something similar after reading some ideas from COLDSPEED.
import re

def help2(mpath, filter=None, mode=0):
    """Search help with filter. Example: help2(MyMod, 'DoXyz*')"""
    sg = True
    if filter:
        if not mode:
            # translate the glob-style '*' into the regex '.*'; note that
            # str.replace returns a new string, so the result must be assigned
            r1 = "^" + filter.replace('*', '.*')
        else:
            r1 = filter
    for x in sorted(dir(mpath)):
        if not x.startswith('_'):
            if filter:
                sg = re.search(r1, x, re.IGNORECASE)
            if sg:
                e2 = getattr(mpath, x)
                f = x + "():"
                d = e2.__doc__
                if d:
                    L = d.splitlines()[0]
                else:
                    L = "(none)"
                print("%-30s %s" % (f, L))
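An alternative is to let the standard library handle the glob matching with `fnmatch` instead of building a regex by hand. `list_matching` is a hypothetical helper, shown here filtering the math module:

```python
import math
from fnmatch import fnmatch

def list_matching(mod, pattern):
    # case-insensitive glob match over the module's public names, sorted
    return sorted(name for name in dir(mod)
                  if not name.startswith('_')
                  and fnmatch(name.lower(), pattern.lower()))

print(list_matching(math, 'cos*'))
```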
Following is my code; I am adding it as requested in the comments.
filenames2 = ['BROWN1_L1.txt', 'BROWN1_M1.txt', 'BROWN1_N1.txt', 'BROWN1_P1.txt', 'BROWN1_R1.txt']
with open("C:/Python27/L1_R1_TRAINING.txt", 'w') as outfile:
    for fname in filenames2:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
b = open("C:/Python27/L1_R1_TRAINING.txt", 'rU')

filenames3 = []
for path, dirs, files in os.walk("C:/Python27/Reutertest"):
    for file in files:
        file = os.path.join(path, file)
        filenames3.append(file)
with open("C:/Python27/REUTER.txt", 'w') as outfile:
    for fname in filenames3:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
c = open("C:/Python27/REUTER.txt", 'rU')

def Cross_Entropy(x, y):
    filecontents1 = x.read()
    filecontents2 = y.read()
    sentence1 = filecontents1.upper()
    sentence2 = filecontents2.upper()
    count_A1 = sentence1.count('A')
    count_B1 = sentence1.count('B')
    count_C1 = sentence1.count('C')
    count_all1 = len(sentence1)
    prob_A1 = count_A1 / count_all1
    prob_B1 = count_B1 / count_all1
    prob_C1 = count_C1 / count_all1
    count_A2 = sentence2.count('A')
    count_B2 = sentence2.count('B')
    count_C2 = sentence2.count('C')
    count_all2 = len(sentence2)
    prob_A2 = count_A2 / count_all2
    prob_B2 = count_B2 / count_all2
    prob_C2 = count_C2 / count_all2
    Cross_Entropy = -(prob_A1 * math.log(prob_A2, 2) + prob_B1 * math.log(prob_B2, 2) + prob_C1 * math.log(prob_C2, 2))

Cross_Entropy(b, c)
Yes, and now I've got the error "prob_A1 = count_A1 / count_all1 ZeroDivisionError: division by zero". What's wrong with my code? Is my spelling wrong?
I'm not quite sure what is behind your failure to read your strings from the files, but your cross-entropy can be computed much more succinctly:
import math

def crossEntropy(s1, s2):
    s1 = s1.upper()
    s2 = s2.upper()
    probsOne = (s1.count(c) / float(len(s1)) for c in 'ABC')
    probsTwo = (s2.count(c) / float(len(s2)) for c in 'ABC')
    return -sum(p * math.log(q, 2) for p, q in zip(probsOne, probsTwo))
For example,
>>> crossEntropy('abbcabcba','abbabaccc')
1.584962500721156
If this is what you want to compute, you can now concentrate on assembling the strings to pass to crossEntropy. I would recommend getting rid of the read-write-read logic (unless you actually need the two intermediate files you are creating). Instead, read the files in the two directories directly, join their contents into two big strings, strip all whitespace, and pass those to crossEntropy.
Another possible approach: if all you want are the counts of 'A', 'B', 'C' in the two directories, just create two dictionaries, one per directory, each keyed by 'A', 'B', and 'C'. Iterate through the files in each directory, reading each file in turn and recording only the counts of those three characters, without saving the file contents. Then write a version of crossEntropy that expects two dictionaries.
Something like:
def crossEntropy(d1, d2):
    countOne = sum(d1[c] for c in 'ABC')
    countTwo = sum(d2[c] for c in 'ABC')
    probsOne = (d1[c] / float(countOne) for c in 'ABC')
    probsTwo = (d2[c] / float(countTwo) for c in 'ABC')
    return -sum(p * math.log(q, 2) for p, q in zip(probsOne, probsTwo))
For example,
>>> d1 = {'A':3,'B':5,'C':2}
>>> d2 = {'A':2,'B':5,'C':3}
>>> crossEntropy(d1,d2)
1.54397154729945
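The count dictionaries themselves could be built by walking each directory, along the lines of the os.walk loop already in the question. This is a sketch; the directory path is whatever you pass in:

```python
import os
from collections import Counter

def count_abc(directory):
    # tally occurrences of 'A', 'B', 'C' across every file in the tree,
    # discarding each file's contents once the counts are taken
    counts = Counter({'A': 0, 'B': 0, 'C': 0})
    for path, dirs, files in os.walk(directory):
        for name in files:
            with open(os.path.join(path, name)) as infile:
                text = infile.read().upper()
            for c in 'ABC':
                counts[c] += text.count(c)
    return counts
```

The two resulting dictionaries can then be passed straight to the dictionary version of crossEntropy.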