Why is a string read from a file not equal to an ordinary string? [duplicate] - python

This question already has answers here:
Is there a difference between "==" and "is"?
(13 answers)
Closed 6 years ago.
I am on Python 3.5 and want to find matching words in a file. The word I am supplying is awesome, and the very first word in the .txt file is also awesome. Then why is addedWord not equal to word? Can someone give me the reason?
myWords.txt
awesome
shiny
awesome
clumsy
Code for matching
addedWord = "awesome"
with open("myWords.txt", 'r') as openfile:
    for word in openfile:
        if addedWord is word:
            print("Match")
I also tried:
d = word.replace("\n", "").rstrip()
a = addedWord.replace("\n", "").rstrip()
if a is d:
    print("Matched :" + word)
I also checked the class of both variables with type(addedWord) and type(word). Both are of class 'str', yet they are not equal. Is anything wrong here?

There are two problems with your code.
1) Strings returned from iterating files include the trailing newline. As you suspected, you'll need to .strip(), .rstrip() or .replace() the newline away.
2) String comparison should be performed with ==, not is.
So, try this:
if addedWord == word.strip():
    print("Match")

Those two strings will never be the same object, so you should not use is to compare them. Use ==.
Your intuition to strip off the newlines was spot-on, but you only need a single call to strip() (it removes all leading and trailing whitespace, including tabs and newlines).
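A minimal sketch of both points, using io.StringIO as a stand-in for myWords.txt (an assumption made only so the snippet runs without the file on disk; the names added_word and matches are illustrative):

```python
import io

# Stand-in for open("myWords.txt"); the contents mirror the question's file.
fake_file = io.StringIO("awesome\nshiny\nawesome\nclumsy\n")

added_word = "awesome"
matches = []
for line_number, line in enumerate(fake_file, start=1):
    # Each line still carries its trailing "\n", so strip it before comparing.
    word = line.strip()
    # == compares the characters; `is` would compare object identity instead.
    if added_word == word:
        matches.append(line_number)

print(matches)
```

With the sample contents, the first and third lines match.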

Related

Looking for a way to correctly strip a string [duplicate]

This question already has answers here:
python split() vs rsplit() performance?
(5 answers)
Closed 2 years ago.
I'm using the Spotify API to get song data for a lot of songs. To this end, I need to pass the song URI into an API call. To obtain the song URIs, I'm using another API endpoint. It returns the URI in this form: 'spotify:track:5CQ30WqJwcep0pYcV4AMNc'. I only need the URI part,
so I used 'spotify:track:5CQ30WqJwcep0pYcV4AMNc'.strip("spotify:track") to strip away the first part. However, this did not work as expected, because the call also removes the trailing "c".
I tried to build a regex to strip away the first part, but the instructions were too complicated. Any help would be greatly appreciated.
strip() removes all leading and trailing characters that appear in the argument string; it doesn't match the string exactly.
You can use replace() to remove an exact string:
'spotify:track:5CQ30WqJwcep0pYcV4AMNc'.replace("spotify:track:", "")
or split it at : characters:
'spotify:track:5CQ30WqJwcep0pYcV4AMNc'.split(":")[-1]
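The difference can be sketched as follows. The variable names are illustrative, and the startswith check is an added safeguard rather than part of the answer above:

```python
uri = 'spotify:track:5CQ30WqJwcep0pYcV4AMNc'
prefix = 'spotify:track:'

# strip() treats its argument as a set of characters, so the trailing "c" is removed too.
stripped = uri.strip('spotify:track')

# Remove the prefix only when it really is at the start of the string.
track_id = uri[len(prefix):] if uri.startswith(prefix) else uri

# Split once from the right and keep the last field.
last_field = uri.rsplit(':', 1)[-1]

print(stripped)
print(track_id)
print(last_field)
```

On Python 3.9 and later, uri.removeprefix(prefix) does the same job as the slicing line.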
Use a simple regex replace:
import re
txt = 'spotify:track:5CQ30WqJwcep0pYcV4AMNc'
# raw strings keep the patterns readable; ":" needs no escaping
pat_to_strip = [r'^spotify:track', r'MNc$']
pat = f'({")|(".join(pat_to_strip)})'
txt = re.sub(pat, '', txt)
# outputs:
>>> txt
':5CQ30WqJwcep0pYcV4A'
Essentially, the patterns starting with ^ are stripped from the beginning, and the ones ending with $ are stripped from the end.
I stripped the last 3 letters just as an example.

Is there a string function similar to "split()" that works for strings without a repeated character? [duplicate]

This question already has answers here:
How do I split a string into a list of characters?
(15 answers)
Closed 2 years ago.
I want to split ascii_letters* (in the string module) into a list, and it doesn't have any repeated characters. I tried to use '' as the separator, but that didn't work; I got a ValueError: empty separator message. Is there a string function other than split() that I can use? I might be able to put spaces in, but that may become tedious and might take up a lot of code space.
import string
letters = string.ascii_letters
print(letters.split(''))
*ascii_letters is a string that contains 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
list(letters)
might be what you are looking for.
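You can also use a regex from the re module. Note that re.split(r'.', s) treats every character as a separator and returns only empty strings, so use re.findall to match each character instead:
re.findall(r'.', s)
This returns every character except newlines.
Or simply use list(s) to get the list of characters, as suggested by @Klaus D.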
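A short sketch of the list() approach applied to the string from the question (the name chars is illustrative):

```python
import string

letters = string.ascii_letters

# Iterating a string yields one character at a time, so list() splits it fully.
chars = list(letters)

print(len(chars))
print(chars[:3], chars[-3:])
```

No separator is needed, which avoids the empty separator error entirely.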

Python regex: having trouble understanding results [duplicate]

This question already has answers here:
Removing a list of characters in string
(20 answers)
Closed 3 years ago.
I have a dataframe that I need to write to disk, but PySpark doesn't allow any of the characters ,;{}()\\n\\t= in the headers when writing a parquet file.
So I wrote a simple script to detect if this is happening
import re
for each_header in all_headers:
    print(re.match(",;{}()\\n\\t= ", each_header))
But for each header, None was printed. This is wrong because I know my file has spaces in its headers.
So, I decided to check it out by executing the following couple of lines
a = re.match(",;{}()\\n\\t= ", 'a s')
print(a)
a = re.search(",;{}()\\n\\t= ", 'a s')
print(a)
This too resulted in None getting printed.
I am not sure what I am doing wrong here.
PS: I am using python3.7
The problem is that {} and also () are regex metacharacters, and have a special meaning. Perhaps the easiest way to write your logic would be to use the pattern:
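[ ,;{}()\n\t=]
This character class matches any single one of the characters that PySpark does not allow in headers, including the space. Use re.search rather than re.match, since re.match only tests at the start of the string:
a = re.search("[ ,;{}()\n\t=]", 'a s')
print(a)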
If you wanted to remove these characters, you could try using re.sub:
header = '...'
header = re.sub(r'[,;{}()\n\t=]+', '', header)
If you want to check whether a text contains any of the "forbidden"
characters, you have to put them between [ and ].
Another flaw in your regex is that in "normal" strings (not r-strings)
any backslash should be doubled.
So change your regex to:
"[,;{}()\\n\\t= ]"
Or use r-string:
r"[,;{}()\n\t= ]"
Note that I also included a space, which you missed.
One more remark: {} and () have special meaning, but only outside [...].
Between [ and ] they represent themselves, so they do not need
escaping with a backslash.
As already explained, you can use a regex to look for forbidden characters. I want to add that you can also do it without a regex, in the following way:
forbidden = ",;{}()\n\t="
def has_forbidden(txt):
    for i in forbidden:
        if i in txt:
            return True
    return False
print(has_forbidden("ok name")) # False
print(has_forbidden("wrong=name")) # True
print(has_forbidden("with\nnewline")) # True
Note that with this approach you do not have to care about escaping special regex characters, such as *.
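For comparison, here is a sketch of the regex check applied to a whole list of headers. The header names are hypothetical, and the space is included in the character class on the assumption that spaces should be flagged as well, as the question implies:

```python
import re

# Character class: matches any single one of the listed characters.
# Inside [...] the braces and parentheses are literal and need no escaping.
FORBIDDEN = re.compile(r"[ ,;{}()\n\t=]")

# Hypothetical header names, for illustration only.
all_headers = ["user_id", "first name", "total(usd)", "ok"]

# search() scans the whole string; match() would only test at the start.
bad_headers = [h for h in all_headers if FORBIDDEN.search(h)]

print(bad_headers)
```

Compiling the pattern once keeps the loop simple when there are many headers to check.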

What's the quickest way to find the number of spaces that begin a string in Python? [duplicate]

This question already has answers here:
What is the pythonic way to count the leading spaces in a string?
(7 answers)
Closed 5 years ago.
What's the quickest way to find the number of spaces that begin a string? I'm wanting this to calculate things like how nested my space-indents are (when text parsing).
E.g.
s="     There are five spaces"
num=num_start_spaces(s) #it equals five
I probably wouldn't have asked this, but I noticed I didn't find a quick reference for it anywhere (so, I thought I'd do my own Q/A; if you have another way, feel free to contribute!).
Here's an alternative answer:
def countspaces(x):
    for i, j in enumerate(x):
        if j != ' ':
            return i
    return len(x) # the string is empty or consists only of spaces
s="     There are five spaces"
countspaces(s) # 5
One can use str.lstrip() method and take the difference of the lengths of both strings, which will be the number of spaces that begin the string.
def num_start_spaces(text):
    return len(text)-len(text.lstrip(" "))
print(num_start_spaces("        spaces"))
The above prints 8.
Edit: I enhanced this answer above using info from the duplicate question.
However, for the task at hand, I think doing this alone in the context stated is going to be a little tedious and have a lot of overhead. Before you do any text parsing, you'll probably want to use it to make a list of the indents per line (and then when you iterate through the lines, you'll have that for quick reference):
lines=myString.split("\n") #lines is the lines of the text we're parsing
indents=[] #Values are the number of indents on the line.
for line in lines:
    spaces=num_start_spaces(line)
    if spaces%4!=0:
        raise ValueError("The spaces on line "+str(len(indents))+" are not zero or a multiple of four: "+str(spaces))
    indents.append(spaces//4) #integer division keeps the indent level an int
i=0
while i<len(lines):
    #Do text parsing here (and use indents for reference). We're in a while loop so we can reference lines before and after more easily than in a for loop.
    i+=1
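A self-contained sketch of the indent bookkeeping, run on a hypothetical sample string and assuming four spaces per indent level:

```python
def num_start_spaces(text):
    # Leading spaces = total length minus the length after removing them.
    return len(text) - len(text.lstrip(" "))

# Hypothetical sample text; four spaces per indent level is an assumption.
sample = "root\n    child\n        grandchild\n    sibling"

indents = []
for line_number, line in enumerate(sample.split("\n")):
    spaces = num_start_spaces(line)
    if spaces % 4 != 0:
        raise ValueError(
            f"line {line_number}: {spaces} leading spaces is not a multiple of four"
        )
    # Integer division so the indent level stays an int.
    indents.append(spaces // 4)

print(indents)
```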

Iterating over full text lines instead of characters [duplicate]

This question already has answers here:
Iterate over the lines of a string
(6 answers)
Closed 7 years ago.
I noticed when I try to iterate over a file with lines such as
"python"
"please"
"work"
I only get individual characters back, such as,
"p"
"y"
"t"...
How could I get it to give me the full word? I've been trying for a couple of hours and can't find a method. I'm using the newest version of Python.
Edit: All the quotation marks are new lines.
You can iterate over a file object:
with open('file') as f:
    for line in f:
        for word in line.split():
            do_stuff(word)
See the docs for the details:
http://docs.python.org/2/library/stdtypes.html#bltin-file-objects
If you are storing the words as a single string, you can split it on spaces using the split function.
>>> "python please work".split(' ')
['python', 'please', 'work']
If you have your data in a single string which spans several lines (e.g. it contains '\n' characters), you will need to split it before iterating. This is because iterating over a string (rather than a list of strings) will always iterate over characters, rather than words or lines.
Here's some example code:
text = "Spam, spam, spam.\nLovely spam!\nWonderful spam!"
lines = text.splitlines() # or use .split("\n") to do it manually
for line in lines:
    do_whatever(line)
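The contrast between the two kinds of iteration can be seen in a small sketch that uses the words from the question as sample data:

```python
text = "python\nplease\nwork"

# Iterating the string itself yields single characters.
chars = list(text)[:3]

# Splitting into lines first yields whole words, one per line.
lines = text.splitlines()

print(chars)
print(lines)
```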
