Why do identical strings return false with ==?


This is the simplified version of my problem.
QA = open('Qestions and answers.txt')
Q = []
A = []
for line in QA:
    (first,second) = line.split(';')
    Q.append(first)
    A.append(second)
QA.close()
print(A[0], A[1])
print(A[0] == '1981')
print(A[1] == 'Feb')
print(str(A[0]) == '1981') # I even tried str
print(str(A[1]) == "Feb")
Output:
1981
Feb
False
False
False
False

You've got extra whitespace in there. My guess is this:
print(repr(A[0]))
Output:
'1981\n'
This is because when you read lines from a file, you will get the line breaks at the end of each line as well. If you don't want that, strip them out.
for line in QA:
    line = line.rstrip('\n')
    ...
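Putting it together, here is a minimal sketch of the question's loop with the newline stripped before splitting. It assumes each line holds exactly one ';', and an io.StringIO with made-up questions stands in for 'Qestions and answers.txt':

```python
import io

# Hypothetical stand-in for open('Qestions and answers.txt'); the questions are made up.
QA = io.StringIO("When was it founded?;1981\nWhich month?;Feb\n")

Q = []
A = []
for line in QA:
    # Drop the trailing newline first, so the answer part carries no '\n'.
    first, second = line.rstrip('\n').split(';')
    Q.append(first)
    A.append(second)

print(repr(A[0]))      # '1981'
print(A[0] == '1981')  # True
print(A[1] == 'Feb')   # True
```

With the real file, writing the loop inside "with open('Qestions and answers.txt') as QA:" also closes the file automatically.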

strip() works for this problem
print(A[0].strip() == '1981')
print(A[1].strip() == 'Feb')
True
True

Related

Error after elif statement is added to df.iterrows() loop

I wasn't getting any syntax errors with just the first if statement. When I add the elif, I get a syntax error for the next line:
NPI = df["NPI2"]
^
SyntaxError: invalid syntax
I'm not sure why this is happening, since the first if statement is essentially the same as the elif.
Here's my code:
for i, row in df.iterrows():
    NPI2 = row["NPI2"]
    if row["CapDesignation"] == "R" and row["BR"] >= row["CapThreshold"]:
        NPI2 = row["CapThreshold"]*row['KBETR']-.005
        df.at[i,"NPI2"] = NPI2
        df.at[i,"BR"] = NPI2/row["KBETR"]
    elif row["CapDesignation"] == "N" and row["BN"] >= row["CapThreshold"]:
        NPI2 = row["CapThreshold"]*(row["KBETR"]-row["P"])-.005
        df.at[i,"NPI2"] = NPI2
        df.at[i,"BN"] = NPI2/((row["KBETR"]-row["P"])
NPI = df["NPI2"]
df["KBETR_N"] = round(R + Delta_P + NPI,2)

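Without the traceback I can't be sure, but the problem is most likely this line:
df.at[i,"BN"] = NPI2/((row["KBETR"]-row["P"])
It opens two parentheses before row["KBETR"] but closes only one, so Python treats the following line as a continuation of the same expression and reports the SyntaxError there. Use a single opening parenthesis if you want row["KBETR"]-row["P"] grouped:
df.at[i,"BN"] = NPI2/(row["KBETR"]-row["P"])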
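Python's built-in compile() only checks syntax, so it can confirm that the unbalanced parenthesis, not the elif, is what breaks the code; the undefined names (df, row, NPI2, i) don't matter at that stage. A small sketch using two-line excerpts of the question's code (compiles is an illustrative helper name, not part of the question):

```python
def compiles(src):
    """Return True if src is syntactically valid Python, else False."""
    try:
        compile(src, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# The question's line: two '(' before row["KBETR"], only one ')' after row["P"].
buggy = 'df.at[i,"BN"] = NPI2/((row["KBETR"]-row["P"])\nNPI = df["NPI2"]\n'
# The corrected line: parentheses balanced.
fixed = 'df.at[i,"BN"] = NPI2/(row["KBETR"]-row["P"])\nNPI = df["NPI2"]\n'

print(compiles(buggy))  # False
print(compiles(fixed))  # True
```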

How to find byte sequence in file?

I have a binary file in which I need to change a certain bit.
The address of the byte containing that bit is relative to a certain byte sequence (an ASCII string):
import os
from array import array
content = array('B')
with open(filename, mode="r+b") as file:
    content.fromfile(file, os.fstat(file.fileno()).st_size)
    abc = [ord(letter) for letter in "ABC"]
    i = content.index(abc)  # ValueError: array.index(x): x not in list
    content[i + 0x16] |= 1
    content.tofile(file)
However, I must confess to my shame that after Googling far and wide, I couldn't find a method to get the index of that "ABC" string...
Sure, I can write a function that does it with loops, but I can't believe there is no one-liner (OK, even two...) that accomplishes it.
How can it be done?

Not sure if this is the most Pythonic way, but this works. Given this file:
$ cat so.bin
���ABC̻�X��w
$ hexdump so.bin
0000000 eeff 41dd 4342 bbcc 58aa 8899 0a77
000000e
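The bytes in that listing can be rebuilt and searched directly. hexdump's default format prints 16-bit words, so on a little-endian machine each byte pair appears swapped: eeff 41dd ... stands for the bytes ff ee dd 41 ... (hexdump -C shows them in file order). A sketch, assuming that byte order, which locates the X with bytes.find instead of scanning by hand:

```python
# Bytes of so.bin reconstructed from the hexdump above (assumes little-endian word display).
data = bytes.fromhex("ffeedd414243ccbbaa589988770a")

marker = b"ABC"
offset_after_c = 3  # the same known offset the answer uses

start = data.find(marker)                     # index of 'A', or -1 if absent
pos_x = start + len(marker) + offset_after_c  # index of the interesting byte
print(len(data))         # 14 (0x0e, matching hexdump's final offset)
print(start)             # 3
print(hex(data[pos_x]))  # 0x58, i.e. "X"
```

For the real file, "with open("so.bin", "rb") as f: data = f.read()" gives the same bytes object.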
Edit: New solution starts here.
import string
char_ints = [ord(c) for c in string.ascii_letters]
with open("so.out.bin", "wb") as fo:
    with open("so.bin", "rb") as fi:
        # Read bytes but only keep letters.
        chars = []
        for b in fi.read():
            if b in char_ints:
                chars.append(chr(b))
            else:
                chars.append(" ")
        # Search for 'ABC' in the read letters.
        pos = "".join(chars).index("ABC")
        # We now know the position of the interesting byte.
        pos_x = pos + len("ABC") + 3  # known offset
        # Now copy all bytes from the input to the output, ...
        fi.seek(0)
        i = 0
        for b in fi.read():
            # ... but replace the interesting byte.
            if i == pos_x:
                fo.write(b"Y")
            else:
                fo.write(bytes([b]))
            i = i + 1
Edit: New solution ends here.
I want to get the X four positions after ABC. A little state keeping locates the position of ABC, skips the offset, prints the interesting bytes.
foundA = False
foundB = False
foundC = False
found = False
offsetAfterC = 3
lengthAfterC = 1
with open("so.bin", "rb") as f:
    pos = 0
    for b in f.read():
        pos = pos + 1
        if not found:
            if b == 0x41:
                foundA = True
            elif foundA and b == 0x42:
                foundB = True
            elif foundA and foundB and b == 0x43:
                foundC = True
            else:
                foundA, foundB, foundC = False, False, False
        if foundA and foundB and foundC:
            found = True
            break
    f.seek(0)
    i = 0
    while i < pos + offsetAfterC:
        b = f.read(1)
        i = i + 1
    while i < pos + offsetAfterC + lengthAfterC:
        b = f.read(1)
        print(hex(int.from_bytes(b, byteorder="big")))
        i = i + 1
Output:
0x58
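For the original goal (patch a bit at a fixed offset from the marker), bytearray.find does the search in one call, and the patched bytes can be written back through the same "r+b" handle. A sketch under these assumptions: the file fits in memory, the offset 0x16 is counted from the first byte of the marker as in the question, and set_bit_after_marker is a hypothetical helper name. If you'd rather keep the array('B'), content.tobytes().find(b"ABC") returns the same index.

```python
import os
import tempfile


def set_bit_after_marker(path, marker=b"ABC", offset=0x16, mask=0x01):
    """OR mask into the byte offset bytes after the first marker; return the marker index."""
    with open(path, "r+b") as f:
        data = bytearray(f.read())
        i = data.find(marker)      # -1 if the marker is absent
        if i == -1:
            raise ValueError(f"{marker!r} not found in {path}")
        data[i + offset] |= mask   # raises IndexError if the file is too short
        f.seek(0)                  # rewind before writing the patched bytes back
        f.write(data)
    return i


# Demo on a throwaway file: 3 leading bytes, the marker, then 32 zero bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\xff\xee\xdd" + b"ABC" + bytes(32))
pos = set_bit_after_marker(path)
with open(path, "rb") as f:
    patched = f.read()
os.remove(path)
print(pos)                  # 3
print(patched[pos + 0x16])  # 1
```

Because the patched data has the same length as the original, writing it back from offset 0 overwrites the file in place without needing to truncate it.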

Errors When Formatting Code In Python

I know this is probably quite a simple problem but I am having an issue with the formatting of my function. I am getting a lot of 'unexpected indent' and also 'unexpected token'. I keep trying to format the function correctly but I have no idea why these errors keep on appearing. Here is my function:
def stringCheck(stringForCheck, letterOrNumber):
    valid = True
    x = 0
    a = int(ord(stringForCheck)
    length = len(stringForCheck)
        if LetterOrNumber == 'Letter':
            lowerBoundary = 65
            upperBoundary = 90
        elif LetterOrNumber == 'Number':
            lowerBoundary = 48
            upperBoundary = 57
    while valid == True and x < length:
        if a < lowerBoundary or a > upperBoundary:
            valid = False
        else:
            valid = True
        x = x + 1
    stringCheck = valid
stringCheck('2','Number')

Remove the unnecessary blank lines
You are missing a closing bracket here: a = int(ord(stringForCheck)
From the line if LetterOrNumber == 'Letter': to your while loop the lines have one indentation level too much.
After fixing the code it should look something like this:
def stringCheck(stringForCheck, letterOrNumber):
    valid = True
    x = 0
    a = int(ord(stringForCheck))
    length = len(stringForCheck)
    # Use the parameter's actual name (lowercase l), otherwise this raises NameError.
    if letterOrNumber == 'Letter':
        lowerBoundary = 65
        upperBoundary = 90
    elif letterOrNumber == 'Number':
        lowerBoundary = 48
        upperBoundary = 57
    while valid is True and x < length:
        if a < lowerBoundary or a > upperBoundary:
            valid = False
        else:
            valid = True
        x = x + 1
    # Return the result instead of rebinding the function's own name.
    return valid
stringCheck('2', 'Number')

Try adding a closing parenthesis at the end of the line, so it reads:
a = int(ord(stringForCheck))
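Note that ord() accepts a single character, so ord(stringForCheck) only works for one-character strings. A sketch that checks every character against the same ASCII boundaries used in the question ('A'-'Z' is 65-90, '0'-'9' is 48-57); string_check is an illustrative name, and it assumes only 'Letter' and 'Number' are passed:

```python
def string_check(string_for_check, letter_or_number):
    """Return True if every character falls in the ASCII range for the given kind."""
    # Same boundaries as the question: 'A'-'Z' (65-90) or '0'-'9' (48-57).
    bounds = {'Letter': (65, 90), 'Number': (48, 57)}
    lower, upper = bounds[letter_or_number]  # KeyError for any other kind
    # ord() takes one character, so test each character separately.
    return all(lower <= ord(ch) <= upper for ch in string_for_check)


print(string_check('2', 'Number'))    # True
print(string_check('123', 'Number'))  # True
print(string_check('12a', 'Number'))  # False
print(string_check('ABC', 'Letter'))  # True
```

all() returns True for an empty string; add a length check if an empty input should count as invalid.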

Python: Search in file, replace preceding entry

I am trying to alter an existing ASCII data file in a specific way.
The way I would like to go is to find any one of several strings from an array that I define beforehand.
If this string is found in the file I would like to change the preceding entry; the string to put in here depends on which of the strings is found in the first place.
I have a file where the entries are separated by spaces, with trailing spaces at the end of each line to fill it up to 30 columns. The search strings never appear in the first line, and there is never more than one per line. An example could look like this:
test01out.txt:
a0997 b0998 c0999
a1000 b1001 c1002
a1003 b1004 c1005
a1006 a1000 c1007
a1008 b1009 c1010
b1001 b1011 c1012
a1013 b1014 b1001
a1015 b1016 c1017
The file does not necessarily have three columns per row. A row may have only two, but it can also have four or five.
My current attempt was the following:
from numpy import *
findlines = open("test01.txt").read().split("\n")
searcharray = array(["a1000","b1001"])
alterarray = array(["this1","this2"])
tempstring_current = ""
fileout = open("test01out.txt", "w")
for i, line in enumerate(findlines):
    tempstring_last = tempstring_current
    tempstring_current = line.rstrip().split(" ")
    if any(x in tempstring_current for x in searcharray): # check if one of the elements is in the current line -> unfortunately this seems to be true for any line checked...
        print(i)
        print(tempstring_current)
        for j, element in enumerate(tempstring_current):
            if any(searcharray == tempstring_current):
                currentsearchindex = argmax(searcharray == tempstring_current)
                currentalterstring = alterarray[currentsearchindex]
                if currentsearchindex == 0:
                    tempstring_last.split(" ")[-1] = currentalterstring
                else:
                    tempstring_current.split(" ")[currentsearchindex - 1] = currentalterstring
                    tempstring_current.split(" ")[currentsearchindex-1] = "XPRZeugs_towrite" + repr(currentdesignatedspeed)
    tempstring_last = tempstring_last.ljust(30)
    try:
        fileout.write(str(tempstring_last))
        fileout.write("\r")
try:
    fileout.close()
searcharray and alterarray would have more than two elements.
I have tested the script up to the any condition; unfortunately, the any condition seems to be met every time, for a reason I do not quite understand:
from numpy import *
findlines = open("test01.txt").read().split("\n")
searcharray = array(["a1000","b1001"])
alterarray = array(["this1","this2"])
tempstring_current = ""
fileout = open("test01out.txt", "w")
for i, line in enumerate(findlines):
    tempstring_last = tempstring_current
    tempstring_current = line.rstrip().split(" ")
    if any(x in tempstring_current for x in searcharray): # check if one of the elements is in the current line -> unfortunately this seems to be true for any line checked...
        print(i)
        print(tempstring_current)
I get the lines printed for every line in the file, which I did not expect.
Edit/Solution:
I realized I made a mistake in the input testfile:
It should look like this:
a0997 b0998 c0999
a1000 b1001 c1001
a1003 b1004 c1005
a1006 a1000 c1007
a1008 b1009 c1010
c1002 b1011 c1012
a1013 b1014 c1002
a1015 b1016 c1017
The full code doing the job is the following:
from numpy import *
findlines = open("test01.txt").read().split("\n")
searcharray = array(["a1000","c1002"])
alterarray = array(["this1","this2"])
tempstring_current = ""
fileout = open("test01out.txt", "w")
for i, line in enumerate(findlines):
    tempstring_last = tempstring_current
    tempstring_current = line.rstrip().split(" ")
    if any([x in tempstring_current for x in searcharray]): # check if one of the elements is in the current line
        # print(i)
        # print(tempstring_current)
        # print(searcharray)
        # print([x in tempstring_current for x in searcharray])
        # print(argmax([x in tempstring_current for x in searcharray]))
        currentsearchposindex = argmax([x in tempstring_current for x in searcharray]) # which index does the matching element have in searcharray?
        currentalterstring = alterarray[currentsearchposindex] # what is the corresponding entry in alterarray?
        for j, currentXPRelement in enumerate(tempstring_current):
            if currentXPRelement == searcharray[currentsearchposindex]:
                currentsearchindex_intemparray = j
        # print(len(tempstring_current))
        # print(searcharray[currentsearchposindex])
        # print(tempstring_current == searcharray[currentsearchposindex])
        # print(searcharray[currentsearchposindex] == tempstring_current)
        # print(argmax(tempstring_current == searcharray[currentsearchposindex]))
        # currentsearchindex_intemparray = argmax(tempstring_current == searcharray[currentsearchposindex])
        if currentsearchindex_intemparray == 0:
            tempstring_last[-1] = currentalterstring
        else:
            tempstring_current[currentsearchindex_intemparray - 1] = currentalterstring
            # tempstring_current[currentsearchindex_intemparray-1] = "XPRZeugs_towrite" + repr(currentalterstring)
    tempstring_last = str(" ".join(tempstring_last)).ljust(30)
    if not i == 0:
        try:
            fileout.write(str(tempstring_last))
            fileout.write("\r")
        finally:
            None
try:
    fileout.write(" ".join(tempstring_current))
    fileout.write("\r")
    fileout.close()
finally:
    None
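The same job can be sketched without numpy or index bookkeeping, assuming entries are whitespace-separated and the file fits in memory. replace_preceding is a hypothetical helper name; the search/replacement pairs and the test data are the ones from the solution above, combined into a dict so that finding a match and looking up its replacement happen in one step:

```python
def replace_preceding(lines, replacements, width=30):
    """Replace the entry just before each search hit and pad lines to width.

    replacements maps a search string to the text that should replace the
    entry in front of it; when the hit is the first entry of a line, the
    last entry of the previous line is replaced instead.
    """
    rows = [line.split() for line in lines]
    for r, row in enumerate(rows):
        for c, entry in enumerate(row):
            if entry not in replacements:
                continue
            if c > 0:
                row[c - 1] = replacements[entry]
            elif r > 0 and rows[r - 1]:
                rows[r - 1][-1] = replacements[entry]
    return [" ".join(row).ljust(width) for row in rows]


text = """a0997 b0998 c0999
a1000 b1001 c1001
a1003 b1004 c1005
a1006 a1000 c1007
a1008 b1009 c1010
c1002 b1011 c1012
a1013 b1014 c1002
a1015 b1016 c1017"""

# Pair each search string with its replacement, like searcharray/alterarray.
replacements = dict(zip(["a1000", "c1002"], ["this1", "this2"]))
result = replace_preceding(text.splitlines(), replacements)
for line in result:
    print(line.rstrip())
```

To write the result, something like "with open("test01out.txt", "w") as f: f.write("\n".join(result) + "\n")" would work; the code above used "\r" as the line separator, so adjust that if the consumer of the file expects it.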

To fix your code so that it no longer matches every line, change
if any(x in tempstring_current for x in searcharray):
to
if any([x in tempstring_current for x in searcharray]):
The reason is most likely the from numpy import * at the top: it replaces Python's built-in any() with numpy's any(). The built-in any() iterates over a generator expression and would behave as you expect, but numpy's any() does not consume the generator; it treats the generator object itself as a single value, and a generator object is always truthy, so the result is always True. Passing a list gives numpy the actual booleans to check, so it returns True only if at least one element is True. Using import numpy as np instead of the star import also avoids the problem.
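The difference can be seen without numpy. This sketch calls builtins.any explicitly (the function a star import from numpy hides) and contrasts it with testing the truthiness of the generator object itself; the sample line and search strings come from the question's test file:

```python
import builtins

searcharray = ["a1000", "c1002"]
tempstring_current = "a0997 b0998 c0999".split()

# The built-in any() iterates over the generator and evaluates each membership test.
matches = builtins.any(x in tempstring_current for x in searcharray)
print(matches)  # False

# A generator object on its own is always truthy, whatever it would yield,
# so code that tests the object instead of iterating it always "matches".
gen_is_truthy = bool(x in tempstring_current for x in searcharray)
print(gen_is_truthy)  # True
```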

Word count with pattern in Python

So this is the question:
Write a program to read in multiple lines of text and count the number
of words in which the rule i before e, except after c is broken, and
number of words which contain either ei or ie and which don't break
the rule.
For this question, we only care about the c if it is the character
immediately before the ie or the ei. So science counts as breaking the
rule, but mischievous doesn't. If a word breaks the rule twice (like
obeisancies), then it should still only be counted once.
Example given:
Line: The science heist succeeded
Line: challenge accepted
Line:
Number of times the rule helped: 0
Number of times the rule was broken: 2
and my code:
rule = []
broken = []
line = None
while line != '':
    line = input('Line: ')
    line.replace('cie', 'broken')
    line.replace('cei', 'rule')
    line.replace('ie', 'rule')
    line.replace('ei', 'broken')
    a = line.count('rule')
    b = line.count('broken')
    rule.append(a)
    broken.append(b)
print(sum(a)); print(sum(b))
How do I fix my code, to work like the question wants it to?

I'm not going to write the code to your exact specification as it sounds like homework but this should help:
import pprint
words = ['science', 'believe', 'die', 'friend', 'ceiling',
         'receipt', 'seize', 'weird', 'vein', 'foreign']
rule = {}
rule['ie'] = []
rule['ei'] = []
rule['cei'] = []
rule['cie'] = []
for word in words:
    if 'ie' in word:
        if 'cie' in word:
            rule['cie'].append(word)
        else:
            rule['ie'].append(word)
    if 'ei' in word:
        if 'cei' in word:
            rule['cei'].append(word)
        else:
            rule['ei'].append(word)
pprint.pprint(rule)
Save it to a file like i_before_e.py and run python i_before_e.py:
{'cei': ['ceiling', 'receipt'],
'cie': ['science'],
'ei': ['seize', 'weird', 'vein', 'foreign'],
'ie': ['believe', 'die', 'friend']}
You can easily count the occurrences with:
for key in rule.keys():
    print("%s occurred %d times." % (key, len(rule[key])))
Output:
ie occurred 3 times.
ei occurred 4 times.
cei occurred 2 times.
cie occurred 1 times.

Firstly, replace does not change the string in place (Python strings are immutable). What you need is the return value:
line = 'hello there' # line = 'hello there'
line.replace('there','bob') # line = 'hello there'
line = line.replace('there','bob') # line = 'hello bob'
Also I would assume you want actual totals so:
print('Number of times the rule helped: {0}'.format(sum(rule)))
print('Number of times the rule was broken: {0}'.format(sum(broken)))
You are calling sum() on a and b, but these are only the numbers of times the rule worked and was broken in the last line processed (and sum() of a single int raises a TypeError). You want totals.
As a sidenote: Regular expressions are good for things like this. re.findall would make this a lot more sturdy and pretty:
import re
line = 'foo moo goo loo foobar cheese is great '
foo_matches = len(re.findall('foo', line)) # = 2
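Following the regular-expression suggestion, here is a sketch that classifies each word once, assuming words are whitespace-separated without attached punctuation; the function and pattern names are illustrative. The negative lookbehind (?<!c) expresses "ei not directly after c", and a word that breaks the rule anywhere is counted as broken only once:

```python
import re

# A word breaks "i before e, except after c" if it has "cie",
# or an "ei" that is not immediately preceded by "c".
BREAKS_RULE = re.compile(r"cie|(?<!c)ei")
# The rule is relevant at all only if the word contains "ie" or "ei".
RULE_APPLIES = re.compile(r"ie|ei")


def count_rule(lines):
    """Return (helped, broken) word counts; each word is counted at most once."""
    helped = broken = 0
    for line in lines:
        for word in line.lower().split():
            if BREAKS_RULE.search(word):
                broken += 1
            elif RULE_APPLIES.search(word):
                helped += 1
    return helped, broken


helped, broken = count_rule(["The science heist succeeded", "challenge accepted"])
print("Number of times the rule helped:", helped)      # 0
print("Number of times the rule was broken:", broken)  # 2
```

For interactive input, the lines can come from iter(lambda: input("Line: "), ''), which stops at the first empty line.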

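Let's split the logic up into functions; that should help us reason about the code and get it right. To loop over the input lines until an empty line is entered, we can use the two-argument form of the iter function, which keeps calling input() until it returns the sentinel '':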
def rule_applies(word):
    return 'ei' in word or 'ie' in word
def complies_with_rule(word):
    if 'cie' in word:
        return False
    if word.count('ei') > word.count('cei'):
        return False
    return True
helped_count = 0
broken_count = 0
lines = iter(lambda: input("Line: "), '')
for line in lines:
    for word in line.split():
        if rule_applies(word):
            if complies_with_rule(word):
                helped_count += 1
            else:
                broken_count += 1
print("Number of times the rule helped:", helped_count)
print("Number of times the rule was broken:", broken_count)
We can make the code more concise by shortening the complies_with_rule function and by using generator expressions and Counter:
from collections import Counter
def rule_applies(word):
    return 'ei' in word or 'ie' in word
def complies_with_rule(word):
    return 'cie' not in word and word.count('ei') == word.count('cei')
lines = iter(lambda: input("Line: "), '')
words = (word for line in lines for word in line.split())
words_considered = (word for word in words if rule_applies(word))
did_rule_help_count = Counter(complies_with_rule(word) for word in words_considered)
print("Number of times the rule helped:", did_rule_help_count[True])
print("Number of times the rule was broken:", did_rule_help_count[False])

If I understand correctly, your main problem is getting one result per word. Is this what you are trying to achieve:
rule_count = 0
break_count = 0
line = None
while line != '':
    line = input('Line: ')
    for word in line.split():
        # Reset the flags for every word, so each word is counted at most once.
        rule_found = False
        break_found = False
        if 'cie' in word:
            word = word.replace('cie', '')
            break_found = True
        if 'cei' in word:
            word = word.replace('cei', '')
            rule_found = True
        if 'ie' in word:
            rule_found = True
        if 'ei' in word:
            break_found = True
        # A word that breaks the rule anywhere counts as broken, not as helped.
        if break_found:
            break_count += 1
        elif rule_found:
            rule_count += 1
print(rule_count); print(break_count)

rule = []
broken = []
tb = 0
tr = 0
line = ' '
while line:
    lines = input('Line: ')
    line = lines.split()
    for word in line:
        if 'ie' in word:
            if 'cie' in word:
                tb += 1
            elif word.count('cie') > 1:
                tb += 1
            elif word.count('ie') > 1:
                tr += 1
            elif 'ie' in word:
                tr += 1
        if 'ei' in word:
            if 'cei' in word:
                tr += 1
            elif word.count('cei') > 1:
                tr += 1
            elif word.count('ei') > 1:
                tb += 1
            elif 'ei' in word:
                tb += 1
print('Number of times the rule helped: {0}'.format(tr))
print('Number of times the rule was broken: {0}'.format(tb))
Done.
