Python, breaking up Strings


I need to make a program in which the user inputs a word, and I need to do something to each individual letter in that word. They cannot enter it one letter at a time, just one word.
For example, if someone enters "test", how can I make my program know that it is a four-letter word, and how do I break it up, e.g. make my program create four variables, each set to a different letter? It should also work with bigger and smaller words.
Could I use a for statement? Something like: for each letter, set that letter to a variable. But what if it were a 20-character word; how would the program get all the variable names and such?

Do you mean something like this?
>>> s = 'four'
>>> l = list(s)
>>> l
['f', 'o', 'u', 'r']
>>>
Addendum:
Even though that's (apparently) what you think you wanted, it's probably not necessary, because a string can hold a word of virtually any size. A single string variable like s above should be good enough for your program, versus trying to create a bunch of separately named variables for each character. For one thing, it would be difficult to write the rest of the program, because you wouldn't know what valid variable names to use.
The reason it's OK not to have a separate variable for each character is that a single string can have any number of characters in it, or be empty. Python's built-in len() function returns the number of characters in a string, so the result of len(s) in the above would be 4.
Any character in a string can be accessed directly by indexing it with an integer between 0 and len(s)-1 inside square brackets, so to reference the third character you would use s[2]. It's useful to think of the index as the offset of the character from the beginning of the string.
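A short sketch of len() and indexing on the word from the example above; the variable names length, first, third and last are just illustrative.

```python
s = 'four'

# len() counts the characters in the string
length = len(s)            # 4

# Valid indices run from 0 to len(s) - 1
first = s[0]               # 'f'
third = s[2]               # 'u'
last = s[len(s) - 1]       # 'r'

# Indexing one past the end raises IndexError
try:
    s[len(s)]
except IndexError:
    print('index', len(s), 'is out of range')
```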
Even so, in Python indexing is often not needed, because you can also process each character of a string in turn in a for loop without using indices, as shown in this simple example:
s = 'four'  # the string from the example above
num_vowels = 0
for ch in s:
    if ch in 'aeiou':
        num_vowels += 1
print('there are', num_vowels, 'vowel(s) in the string', s)
Python also has many other facilities and built-ins that further help when processing strings (and in fact could simplify the above example), which you'll eventually learn as you become more familiar with the language and its many libraries.
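As one possible simplification of the vowel loop (not necessarily the one the answer had in mind), sum() over a generator expression does the counting, because True counts as 1 when summed. The helper name count_vowels is hypothetical.

```python
def count_vowels(text):
    """Count lowercase vowels in text, using the same rule as the loop above."""
    # Each comparison yields True or False; sum() adds the Trues as 1s
    return sum(ch in 'aeiou' for ch in text)

s = 'four'
print('there are', count_vowels(s), 'vowel(s) in the string', s)
```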

When you iterate over a string, it yields the individual characters, like:
for c in thestring:
    print(c)
You can use this to put the letters into a list if you really need to, and the list will keep them in order, but list(string) is a better choice for that. (Be aware that unordered types like set do not guarantee any order; dict only guarantees insertion order from Python 3.7 onward.)
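A minimal sketch comparing the two ways of getting the characters into a list, plus a set for contrast; the names letters and unique are illustrative.

```python
s = 'test'

# Building the list by iterating keeps the characters in their original order...
letters = []
for c in s:
    letters.append(c)

# ...but list() produces the same list in one call
print(letters == list(s))   # True

# A set keeps only the distinct characters, and its iteration order is not guaranteed
unique = set(s)
```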

You don't have to do any of that; in Python, you can access characters of a string using square brackets:
>>> word = "word"
>>> print(word[0])
w
>>> print(word[3])
d
>>> print(len(word))
4

You don't want to assign each letter to a separate variable. Then you'd be writing the rest of your program without even being able to know how many variables you have defined! That's an even worse problem than dealing with the whole string at once.
What you instead want to do is have just one variable holding the string, but you can refer to individual characters in it with indexing. Say the string is in s, then s[0] is the first character, s[1] is the second character, etc. And you can find out how far up the numbers go by checking len(s) - 1 (because the indexes start at 0, a length 1 string has maximum index 0, a length 2 string has maximum index 1, etc).
That's much more manageable than figuring out how to generate len(s) variable names, assign them all to a piece of the string, and then know which variables you need to reference.
Strings are immutable though, so you can't assign to s[1] to change the 2nd character. If you need to do that you can instead create a list with e.g. l = list(s). Then l[1] is the second character, and you can assign l[1] = something to change the element in the list. Then when you're done you can get a new string out with s_new = ''.join(l) (join builds a string by joining together a sequence of strings passed as its argument, using the string it was invoked on to the left as a separator between each of the elements in the sequence; in this case we're joining a list of single-character strings using the empty string as a separator, so we just get all the single-character strings joined into a single string).
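The list round trip described above can be sketched as follows; the replacement letter 'o' is only for illustration.

```python
s = 'test'

# Strings are immutable: item assignment raises TypeError
try:
    s[1] = 'o'
except TypeError as exc:
    print('cannot assign:', exc)

# Copy into a list, change one element, then join back into a new string
l = list(s)
l[1] = 'o'
s_new = ''.join(l)
print(s_new)   # tost
print(s)       # test (the original string is unchanged)
```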

x = 'test'
counter = 0
while counter < len(x):
    print(x[counter])  # you can change this to do whatever you want with x[counter]
    counter += 1
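For comparison, a sketch of the same traversal without a manual counter. Using enumerate() here is an addition to this answer, shown only for when the position is also needed.

```python
x = 'test'

# Iterating directly gives each character without managing a counter
for ch in x:
    print(ch)

# If the position is needed too, enumerate() yields (index, character) pairs
positions = [(i, ch) for i, ch in enumerate(x)]
```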

Related

Why does it recognize the second capital T as 0?

I'm trying to make a short program that will find all the capital letters in a single string. I got it to work for the first two capital letters but it won't return the correct position of the last capital letter. What did I do wrong?
def capital_indexes(n):
    listOfUpperPlaces = []
    for x in n:
        print(x)
        if x.isupper():
            characterPlace = n.index(x)
            print(characterPlace)
            listOfUpperPlaces.append(characterPlace)
    return listOfUpperPlaces
print(capital_indexes("TEsTo"))

That is because n.index(x) returns the first occurrence of x in the string n. Because "T" occurs multiple times, n.index(x) returns the position of the first "T".
You want to iterate through range(len(n)), like:
def capital_indexes(n):
    listOfUpperPlaces = []
    for x in range(len(n)):
        print(n[x])
        if n[x].isupper():
            print(x)
            listOfUpperPlaces.append(x)
    return listOfUpperPlaces
print(capital_indexes("TEsTo"))
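To see the first-occurrence behaviour directly, here is a small sketch using the optional start argument of str.index, which begins the search at a given position.

```python
n = 'TEsTo'

# index() always reports the first match, no matter which 'T' you were looking at
print(n.index('T'))      # 0

# Starting the search just after the first match finds the next 'T'
print(n.index('T', 1))   # 3
```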

The issue is the call to n.index(x).
This searches the string for x, and it's able to find a capital T right at the beginning of the string.
A better way to do this would be to use enumerate, which gives you both the index and the item at the same time.
Can't code very well from a phone, but something like:
def capital_indexes(n):
    list_of_upper_places = []
    for index, character in enumerate(n):
        if character.isupper():  # the method is isupper(), not isUpper()
            list_of_upper_places.append(index)
    return list_of_upper_places
This will handle duplicates correctly, and will also be faster, since you don't need to search through the string just to find the position of the character you are currently checking. It will be easier to read for most Python programmers too.
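The same enumerate idea also fits in a single list comprehension. This is a sketch of an alternative formulation, not part of the original answer.

```python
def capital_indexes(n):
    # Keep each position whose character is uppercase
    return [index for index, character in enumerate(n) if character.isupper()]

print(capital_indexes('TEsTo'))   # [0, 1, 3]
```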

string matching with interchangeable characters

I am trying to do a simple string matching between two strings, a small string to a bigger string. The only catch is that I want to equate two characters in the small string to be the same. In particular if there is a character 'I' or a character 'L' in the smaller string, then I want it to be considered interchangeably.
For example let's say my small string is
s = 'AKIIMP'
and then the bigger string is:
b = 'MPKGEXAKILMP'
I want to write a function that takes the two strings and checks whether the smaller one is in the bigger one. In this particular example, the smaller string s is not a substring of b because there is no exact match; however, in my case it should match, because, as I mentioned, the characters 'I' and 'L' should be treated as interchangeable.
Any idea of how I could proceed with this?

s.replace('I', 'L') in b.replace('I', 'L')
will evaluate to True in your example.
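Wrapped in a function, the one-liner might look like this. The name contains_il_insensitive is hypothetical, and only uppercase 'I' and 'L' are treated as equivalent.

```python
def contains_il_insensitive(small, big):
    """Return True if small occurs in big, treating 'I' and 'L' as the same letter."""
    # Map every 'I' to 'L' on both sides so either letter matches the other
    return small.replace('I', 'L') in big.replace('I', 'L')

print(contains_il_insensitive('AKIIMP', 'MPKGEXAKILMP'))   # True
print(contains_il_insensitive('AKIIMP', 'MPKGEXAKIXMP'))   # False
```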

You could do it with regular expressions:
import re
s = 'AKIIMP'
b = 'MPKGEXAKILMP'
p = re.sub('[IL]', '[IL]', s)
if re.search(p, b):
    print(f'{s!r} is in {b!r}')
else:
    print('Not found')
This is not as elegant as @Deepstop's answer, but it provides a bit more flexibility in terms of which characters you equate.
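To illustrate that flexibility, here is a sketch that builds the pattern from a list of interchangeable-character groups. The helper build_pattern and the groups format are assumptions for illustration. Each character is passed through re.escape so that regex metacharacters in the small string are matched literally.

```python
import re

def build_pattern(small, groups):
    """Build a regex from small where each character in a group matches any member of it.

    groups is a list of strings such as ['IL'], each listing interchangeable characters.
    """
    parts = []
    for ch in small:
        group = next((g for g in groups if ch in g), None)
        if group is None:
            parts.append(re.escape(ch))                 # literal character
        else:
            parts.append('[' + re.escape(group) + ']')  # any interchangeable character
    return ''.join(parts)

s = 'AKIIMP'
b = 'MPKGEXAKILMP'
p = build_pattern(s, ['IL'])
print(p)                        # AK[IL][IL]MP
print(bool(re.search(p, b)))    # True
```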

Python re.sub() is not replacing every match

I'm using Python 3 and I have two strings: abbcabb and abca. I want to remove every double occurrence of a single character. For example:
abbcabb should give c and abca should give bc.
I've tried the following regex:
(.)(.*?)\1
But it gives the wrong output for the first string. I also tried another one:
(.)(.*?)*?\1
But this one again gives the wrong output. What's going wrong here?
The Python code is a print statement:
print(re.sub(r'(.)(.*?)\1', '\g<2>', s)) # s is the string

It can be solved without regular expressions, like below:
>>> s, s1 = 'abbcabb', 'abca'
>>> ''.join([i for i in s1 if s1.count(i) == 1])
'bc'
>>> ''.join([i for i in s if s.count(i) == 1])
'c'

re.sub() doesn't perform overlapping replacements. After it replaces the first match, it starts looking after the end of the match. So when you perform the replacement on
abbcabb
it first replaces abbca with bbc. Then it replaces bb with an empty string. It doesn't go back and look for another match in bbc.
If you want that, you need to write your own loop.
while True:
    newS = re.sub(r'(.)(.*?)\1', r'\g<2>', s)
    if newS == s:
        break
    s = newS
print(newS)
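The same loop packaged as a function, so that both example strings can be checked. The name remove_pairs is hypothetical.

```python
import re

def remove_pairs(s):
    """Repeat the substitution until the string stops changing."""
    while True:
        new_s = re.sub(r'(.)(.*?)\1', r'\g<2>', s)
        if new_s == s:      # no more matches to replace
            return new_s
        s = new_s

print(remove_pairs('abbcabb'))   # c
print(remove_pairs('abca'))      # bc
```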

Regular expressions don't seem to be the ideal solution here:
they don't handle overlapping matches, so you need a loop (like in this answer), and that creates new strings over and over (performance suffers);
they're overkill, since we just need to count the characters.
I like this answer, but calling count repeatedly in a list comprehension loops over the whole string for every character.
It can be solved without regular expressions and without O(n**2) complexity, in only O(n), using collections.Counter:
first, count the characters of the string, which is easy and fast;
then filter the string, testing whether each character's count matches, using the counter we just created.
Like this:
import collections
s = "abbcabb"
cnt = collections.Counter(s)
s = "".join([c for c in s if cnt[c]==1])
(as a bonus, you can change the count to keep characters which have 2, 3, whatever occurrences)
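A sketch of that bonus, with the wanted count as a parameter. The function name keep_chars_with_count and the parameter n are hypothetical.

```python
import collections

def keep_chars_with_count(s, n=1):
    """Keep the characters of s, in their original order, that occur exactly n times."""
    cnt = collections.Counter(s)   # a single pass counts every character
    return ''.join(c for c in s if cnt[c] == n)

print(keep_chars_with_count('abbcabb'))     # c
print(keep_chars_with_count('abca'))        # bc
print(keep_chars_with_count('abca', n=2))   # aa
```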

EDIT: based on the comment exchange, if you're only concerned with the parity of the letter counts, then you don't want regex and instead want an approach like @jon's recommendation. (If you don't care about order, then a more performant approach for very long strings might use something like collections.Counter instead.)
My best guess as to what you're trying to match is: "one or more characters - call this subpattern A - followed by a different set of one or more characters - call this subpattern B - followed by subpattern A again".
You can use + as a shortcut for "one or more" (instead of specifying it once and then using * for the rest of the matches), but either way you need to get the subpatterns right. Let's try:
>>> import re
>>> pattern = re.compile(r'(.+?)(.+?)\1')
>>> pattern.sub(r'\g<2>', 'abbcabbabca')
'bbcbaca'
Hmm. That didn't work. Why? Because the first group is lazy (non-greedy), so our "subpattern A" can match just the first a in the string; it does appear later, after all. So if we use a greedy match instead, Python will backtrack until it finds the longest subpattern A that still allows the A-B-A pattern to appear:
>>> pattern = re.compile(r'(.+)(.+?)\1')
>>> pattern.sub(r'\g<2>', 'abbcabbabca')
'cbc'
Looks good to me.

The site linked in the question explains it well; hover over the pattern and use the explanation section.
(.)(.*?)\1 does not remove or match every double occurrence. It matches one character, followed by anything in the middle, up to the next occurrence of that same character.
So, for abbcabb, the "sandwiched" portion is bbc, between the two a characters.
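Printing the groups of the first match makes the sandwich visible. This is a small illustrative sketch.

```python
import re

m = re.search(r'(.)(.*?)\1', 'abbcabb')
print(m.group(0))   # abbca (the whole match)
print(m.group(1))   # a (the repeated character)
print(m.group(2))   # bbc (the sandwiched part)
```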
EDIT:
You can try something like this instead, without regexes:
string = "abbcabb"
result = []
for i in string:
    if i not in result:
        result.append(i)
    else:
        result.remove(i)
print(''.join(result))
Note that this keeps the "last" odd occurrence of each character, not the first.
To keep the "first" occurrence instead, you should use a counter as suggested in this answer. Just change the condition to check for odd counts; in pseudocode: count[letter] % 2 == 1.
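One reading of that suggestion, sketched below: count with collections.Counter, then keep only the first occurrence of each character whose count is odd. The function name first_odd_occurrences is hypothetical. On 'abcaa' it gives 'abc', whereas the toggle loop above gives 'bca'.

```python
import collections

def first_odd_occurrences(s):
    """Keep the first occurrence of each character that appears an odd number of times."""
    count = collections.Counter(s)
    seen = set()
    result = []
    for letter in s:
        if count[letter] % 2 == 1 and letter not in seen:
            result.append(letter)
            seen.add(letter)
    return ''.join(result)

print(first_odd_occurrences('abbcabb'))   # c
print(first_odd_occurrences('abcaa'))     # abc
```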

How can I delete comma at the end of the output in Python?

I am trying to sort a word's letters alphabetically in Python, but there is a comma at the end of the output. (I tried the ''.sort() command; it worked well, but there are square brackets at the beginning and at the end of the output.) The input and the output must look like this:
word
'd','o','r','w'
This is my code:
alphabet='AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
word=str(input())
for i in alphabet:
    for j in word:
        if i==j:
            print("'{}',".format(i),end='')
And this is my output:
word
'd','o','r','w',

Python strings have a join() method:
ls = ['a','b','c']
print(",".join(ls)) # prints "a,b,c"
Python also has what is called a 'list comprehension', that you can use like so:
alphabet='AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
word=str(input())
matches = [l for l in word if l in alphabet]
print(",".join(sorted(matches)))
All the list comprehension does is put l in the list if it is in alphabet. All the candidate ls are taken from the word variable.
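sorted() is a built-in function that returns a new sorted list. By default it compares characters by their code points, so uppercase letters sort before lowercase ones; more complex sorts are possible through its key argument.
Finally, here are a few other fun options that all result in "a,b,c,d":
"a,b,c,d,"[:-1]  # string slice
"a,b,c,d,".strip(",")  # string strip (also removes leading commas)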
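To produce exactly the quoted format from the question ('d','o','r','w'), one sketch is to quote each letter before joining. The helper quoted_sorted_letters is hypothetical. Its sort key orders letters case-insensitively and puts an uppercase letter before its lowercase counterpart, which is an assumption meant to mirror the order of the question's 'AaBb...' alphabet string.

```python
import string

def quoted_sorted_letters(word):
    """Sort the ASCII letters of word and format them as 'd','o','r','w'."""
    letters = [c for c in word if c in string.ascii_letters]
    # Case-insensitive order, uppercase before lowercase for the same letter
    letters.sort(key=lambda c: (c.lower(), c.islower()))
    # Quote each letter, then put commas only between the items
    return ','.join("'{}'".format(c) for c in letters)

print(quoted_sorted_letters('word'))   # 'd','o','r','w'
```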

You can store the matches in a list and then print them at the end:
alphabet='AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
word=str(input())
matches = []
for i in alphabet:
    for j in word:
        if i==j:
            matches.append("'{}'".format(i))
# now that matches has all our matches, join them with commas
print(",".join(matches))
or, as others have mentioned:
print(",".join(sorted(word)))

You want to use the str.join() method.
alphabet='AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
','.join(alphabet)
There's really no need to do anything to make the string into a list; join will iterate over it quite happily. Tried on Python 2.7 and 3.6.

Doing it yourself
The trick is in the algorithm you use.
You want to add a comma and a space after each field, except the last. But it is hard to know which field is the last until it is too late.
It would be much easier if you could make the first field the special case, as the first field is easy to detect.
Therefore, transform the algorithm to: add a comma and a space before each field, except the first. This produces the same output, but is a much simpler algorithm.
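A minimal sketch of the "separator before every field except the first" algorithm. The function name join_fields and its sep parameter are hypothetical; in real code str.join does this for you.

```python
def join_fields(fields, sep=', '):
    """Build a string by adding sep *before* every field except the first."""
    out = ''
    for i, field in enumerate(fields):
        if i > 0:            # the first field is the easy-to-detect special case
            out += sep
        out += field
    return out

print(join_fields(['d', 'o', 'r', 'w']))   # d, o, r, w
```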
Use a library
Using a library is always preferable (unless doing it just for the practice).
Python has the str.join method; see the other answers.

Find Certain String Indices

I have this string and I need to get a specific number out of it.
For example: encrypted = "10134585588147, 3847183463814, 18517461398"
How would I pull out only the second integer out of the string?

You are looking for the "split" method. Turn a string into a list by specifying a smaller part of the string on which to split.
>>> encrypted = '10134585588147, 3847183463814, 18517461398'
>>> encrypted_list = encrypted.split(', ')
>>> encrypted_list
['10134585588147', '3847183463814', '18517461398']
>>> encrypted_list[1]
'3847183463814'
>>> encrypted_list[-1]
'18517461398'
Then you can just access the elements by index as normal. Note that lists can be indexed forwards or backwards: a negative index counts from the right rather than the left, so encrypted_list[-1] selects the last element without your needing to know how long the list is. Indexing raises an IndexError if the index is out of range, though. split(', ') always returns at least one element (''.split(', ') gives ['']), whereas split() with no arguments, as in Jon's method (below), returns an empty list when the string is empty or contains only whitespace.
Edited to add:
What Jon is pointing out in the comment is that if you are not sure the string will be well formatted (e.g., always separated by exactly one comma followed by exactly one space), you can replace all the commas with spaces (encrypted.replace(',', ' ')) and then call split without arguments, which splits on any run of whitespace characters. As usual, you can chain these together:
encrypted.replace(',', ' ').split()
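Putting it together, here is a sketch that pulls out one number and converts it to an int. The helper name nth_number is hypothetical, and it assumes every field is a plain integer.

```python
def nth_number(text, index):
    """Return the index-th comma-separated number in text as an int."""
    # Turn commas into spaces, then split on any run of whitespace
    parts = text.replace(',', ' ').split()
    return int(parts[index])

encrypted = "10134585588147, 3847183463814, 18517461398"
print(nth_number(encrypted, 1))    # 3847183463814
print(nth_number(encrypted, -1))   # 18517461398
```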
