Iterating over full text lines instead of characters [duplicate]

This question already has answers here:
Iterate over the lines of a string
(6 answers)
Closed 7 years ago.
I noticed when I try to iterate over a file with lines such as
"python"
"please"
"work"
I only get individual characters back, such as,
"p"
"y"
"t"...
How can I get it to give me the full word? I've been trying for a couple of hours and can't find a method. I'm using the newest version of Python.
Edit: All the quotation marks are new lines.

You can iterate over a file object:
for line in open('file'):
    for word in line.split():
        do_stuff(word)
See the docs for the details:
http://docs.python.org/2/library/stdtypes.html#bltin-file-objects
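
A slightly more defensive variant of the same loop, as a minimal sketch: it opens the file in a with block so it is closed once the words have been read. The file name words.txt and the helper name iter_words are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def iter_words(path):
    """Yield every whitespace-separated word in the file, line by line."""
    with open(path) as f:          # the file is closed when the block exits
        for line in f:             # iterating a file object yields whole lines
            for word in line.split():
                yield word


# Build a small throwaway file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "words.txt")
with open(path, "w") as f:
    f.write("python\nplease\nwork\n")

words = list(iter_words(path))
print(words)
```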

If the words are stored in a single string, you can split them on spaces using the split method.
>>> "python please work".split(' ')
['python', 'please', 'work']

If you have your data in a single string which spans several lines (e.g. it contains '\n' characters), you will need to split it before iterating. This is because iterating over a string (rather than a list of strings) will always iterate over characters, rather than words or lines.
Here's some example code:
text = "Spam, spam, spam.\nLovely spam!\nWonderful spam!"
lines = text.splitlines() # or use .split("\n") to do it manually
for line in lines:
    do_whatever(line)
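
To see the difference side by side, here is a small sketch using the three words from the question as sample data (the variable names are made up for the example):

```python
text = "python\nplease\nwork"

# Iterating over the string itself yields one character at a time.
chars = [c for c in text][:3]

# Splitting first yields one item per line.
lines = text.splitlines()

print(chars)
print(lines)
```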

Related

Joining results in Python [duplicate]

This question already has answers here:
How do I append one string to another in Python?
(12 answers)
How to concatenate (join) items in a list to a single string
(11 answers)
Which is the preferred way to concatenate a string in Python? [duplicate]
(12 answers)
Closed last month.
I'm new to Python, about an hour and a half into it. I've crawled my website using cewl to get a bespoke wordlist for password audits, and I also want to randomly combine 3 of these words.
For example, the cewl wordlist:
word1
word2
word3
word4
Using a Python script, I want to create another wordlist by randomly joining 3 words together, for example:
word4word2word1
word1word3word4
word3word4word2
So far all I've come up with is:
import random
print(random.choice(open("test.txt").read().split()))
print(random.choice(open("test.txt").read().split()))
print(random.choice(open("test.txt").read().split()))
Whilst this is clearly wrong, it does give me 3 random words from my list. I just want to join them without a delimiter. Any help for a complete novice would be massively appreciated.

The first thing to do is read the words only once, using a context manager so the file gets closed properly.
with open("test.txt") as f:
    lines = f.readlines()
Then use random.sample to pick three words.
words = random.sample(lines, 3)
Of course, you probably want to strip newlines and other extraneous whitespace for each word.
words = random.sample([x.strip() for x in lines], 3)
Now you just need to join those together.
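
Putting those steps together might look like the sketch below. The list of lines is hypothetical sample data standing in for what f.readlines() would return from test.txt.

```python
import random

# Stand-in for the lines read from test.txt (hypothetical sample data).
lines = ["word1\n", "word2\n", "word3\n", "word4\n"]

# Strip whitespace, then pick three distinct entries and join them
# with an empty string so there is no delimiter between them.
words = random.sample([x.strip() for x in lines], 3)
combined = "".join(words)
print(combined)
```

Note that random.sample picks without replacement, so the same list entry is never used twice in one result, whereas three separate random.choice calls can repeat a word. Which behaviour is wanted depends on the wordlist being built.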

Using your code/style:
import random
wordlist = open("test.txt").read().split()
randomword = ''.join([random.choice(wordlist), random.choice(wordlist), random.choice(wordlist)])
print(randomword)
join is a method of the string type and it will join the elements of a list using the string as a delimiter. In this case we use an empty string '' and join a list made up of random choices from your test.txt file.

Looking for a way to correctly strip a string [duplicate]

This question already has answers here:
python split() vs rsplit() performance?
(5 answers)
Closed 2 years ago.
I'm using the Spotify API to get song data for a lot of songs. To this end, I need to pass the song URI into an API call. To obtain the song URIs, I'm using another API endpoint. It returns the URI in this form: 'spotify:track:5CQ30WqJwcep0pYcV4AMNc'. I only need the URI part,
so I used 'spotify:track:5CQ30WqJwcep0pYcV4AMNc'.strip("spotify:track") to strip away the first part. This did not work as expected, as the call also removes the trailing "c".
I tried to build a regex to strip away the first part, but instructions were too complicated and D**K is now stuck in ceiling fan :'(. Any help would be greatly appreciated.

strip() removes all the leading and trailing characters that are in the argument string; it treats the argument as a set of characters and doesn't match the string exactly.
You can use replace() to remove an exact string:
'spotify:track:5CQ30WqJwcep0pYcV4AMNc'.replace("spotify:track:", "")
or split it at : characters:
'spotify:track:5CQ30WqJwcep0pYcV4AMNc'.split(":")[-1]
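
The sketch below shows why strip() also eats the trailing "c", along with one more way to drop the prefix by slicing. The variable names are made up for the example.

```python
uri = "spotify:track:5CQ30WqJwcep0pYcV4AMNc"
prefix = "spotify:track:"

# strip() treats its argument as a set of characters, so the trailing "c"
# is removed too because "c" appears in "track".
stripped = uri.strip("spotify:track")

# Slice off the prefix only when it is actually present.
track_id = uri[len(prefix):] if uri.startswith(prefix) else uri

print(stripped)
print(track_id)
```

On Python 3.9 or newer, uri.removeprefix("spotify:track:") does the same as the slicing line.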

Use simple regex replace:
import re
txt = 'spotify:track:5CQ30WqJwcep0pYcV4AMNc'
# raw strings avoid invalid escape warnings; ':' needs no escaping in a regex
pat_to_strip = [r'^spotify:track', r'MNc$']
pat = f'({")|(".join(pat_to_strip)})'
txt = re.sub(pat, '', txt)
# outputs:
>>> txt
':5CQ30WqJwcep0pYcV4A'
Essentially the patterns starting with ^ will be stripped from the beginning, and the ones ending with $ will be stripped from the end.
I stripped last 3 letters just as an example.

Is there a string function similar to "split()" that works for strings without a repeated character? [duplicate]

This question already has answers here:
How do I split a string into a list of characters?
(15 answers)
Closed 2 years ago.
I want to split the ascii_letters* string (in the string module) into a list, and it doesn't have any repeated characters. I tried to use '' as the split marker but that didn't work; I got a ValueError: empty separator message. Is there a string manipulator other than split() which I can use? I might be able to put spaces in, but that may become tedious and might take up a lot of code space.
import string
letters = string.ascii_letters
print(letters.split(''))
*ascii_letters is a string that contains 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

list(letters)
might be what you are looking for.

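You can use a regex from the re module, but not re.split(r'.', s): that treats every character as a separator and returns only empty strings. To get each character, use findall instead:
re.findall(r'.', s)
This returns every character except newlines.
Or simply use list(s) to get the list of characters, as suggested by @Klaus D.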
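
A quick check of both approaches on ascii_letters, as a minimal sketch (the variable names are made up for the example):

```python
import re
import string

letters = string.ascii_letters

# list() turns a string into a list of its characters.
as_list = list(letters)

# re.findall with "." returns each matched character (newlines excluded).
via_regex = re.findall(r".", letters)

print(as_list[:3], len(as_list))
```

For this input both give the same list, so list() is the simpler choice.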

Why string getting from file is not equal to common string? [duplicate]

This question already has answers here:
Is there a difference between "==" and "is"?
(13 answers)
Closed 6 years ago.
I am on Python 3.5 and want to find the matching words in a file. The word I am giving is awesome, and the very first word in the .txt file is also awesome. Why, then, is addedWord not equal to word? Can someone give me the reason?
myWords.txt
awesome
shiny
awesome
clumsy
Code for matching
addedWord = "awesome"
with open("myWords.txt" , 'r') as openfile:
    for word in openfile:
        if addedWord is word:
            print ("Match")
I also tried:
d = word.replace("\n", "").rstrip()
a = addedWord.replace("\n", "").rstrip()
if a is d:
    print ("Matched :" +word)
I also checked the class of the variables with type(addedWord) and type(word). Both are of class 'str' but they are not equal. Is anything wrong here?

There are two problems with your code.
1) Strings returned from iterating files include the trailing newline. As you suspected, you'll need to .strip(), .rstrip() or .replace() the newline away.
2) String comparison should be performed with ==, not is.
So, try this:
if addedWord == word.strip():
    print ("Match")

Those two strings will never be the same object, so you should not use is to compare them. Use ==.
Your intuition to strip off the newlines was spot-on, but you just need a single call to strip() (it will strip all whitespace including tabs and newlines).
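
Here is a self-contained sketch of the corrected loop. It uses io.StringIO as a stand-in for the opened myWords.txt, which is an assumption made only so the example runs without a file on disk.

```python
import io

added_word = "awesome"

# io.StringIO stands in for the opened myWords.txt file.
fake_file = io.StringIO("awesome\nshiny\nawesome\nclumsy\n")

matches = 0
for word in fake_file:
    # Each line still ends with "\n", so strip before comparing with ==.
    if added_word == word.strip():
        matches += 1

print(matches)
```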

Can two conditions be given for python's string.split function? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Python: Split string with multiple delimiters
I have a program where I am parsing a file line by line and splitting each line into two halves. After that, for each half I am parsing each word from the line and appending it to a list.
Here mfcList1 is a list of lines from a text file. I am parsing the words in each line, which are separated either by a comma or by a space. But it isn't exactly working.
for lines in mfcList1:
    lines = lines.lstrip()
    if lines!='':
        p.append(string.split(lines,','or " "))
mfcList2 = reduce(lambda x,y:x+y,p)
print mfcList2
When I use string.split, it only works for elements that end with a comma; it ignores the or operator I am using with the split method. I want to slice off each and every word from the line. They end either with a comma or with a space.
For example, 'enableEmergencySpare=1 useGlobalSparesForEmergency=1 useUnconfGoodForEmergency=1',
this line is being stored as a single list element, whereas I am trying to split it using the split method.
Can anyone please suggest what I can do instead of using the or operator? Thanks.

You can use split() from the re module:
import re
...
p.extend(re.split('[ ,]', lines))
The [ ,] is a regular expression which means "a space or a comma". Also, assuming p is a list and you want to add all the words to it, you should use extend() rather than append(), the latter adds a single element.
Note also that if a line in the file contains a comma followed by a space (or any other sequence of commas and spaces), your list p will contain a corresponding number of empty strings.
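
One way to avoid those empty strings, sketched here under the assumption that runs of commas and spaces should count as a single separator, is to add + to the character class and filter out any leftovers. The sample line is adapted from the question. The first statement also shows why the original call only split on commas.

```python
import re

# A non-empty string is truthy, so this expression is just ",".
separator = "," or " "

line = "enableEmergencySpare=1, useGlobalSparesForEmergency=1 useUnconfGoodForEmergency=1,"

# "[ ,]+" matches a run of spaces and/or commas as a single separator.
parts = re.split(r"[ ,]+", line)

# A separator at the start or end still leaves an empty string; drop those.
words = [p for p in parts if p]

print(words)
```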
