Why is replace not getting rid of my white space? - python

I'm trying to use replace to get rid of white space, but it is not working. What am I doing wrong?
import re
list = ('255 +1', '282 +5', '255 + 3', '5 - 2',)
for i in list:
    # separating the numbers into a list
    nums = re.split(r'[+,-]\s*', i)
    # getting rid of white space in list
    for num in nums:
        num.replace(' ', '')
    print(nums)
This is the output. The first element of each list still has its trailing space.
['255 ', '1']
['282 ', '5']
['255 ', '3']
['5 ', '2']

Strings are immutable in Python, meaning their values cannot be changed in place. The replace method does not modify num; it returns a new string, which your loop discards, so nums is never updated.
To use replace, assign its return value to a variable, for example cleaned = num.replace(' ', ''). Alternatively, call strip() with no arguments to remove leading and trailing whitespace.
For example, to get a new string without the surrounding whitespace:
myString = 'thisString ' #Notice the trailing whitespace
newString = myString.strip() #Removes whitespace and stores in newString
print(newString) #This would output the string with no white space
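Applied to the code in the question, one possible fix is to build a new list from the stripped pieces instead of calling replace and discarding the result. This is only a sketch: the names expressions and split_operands are made up for illustration, and the pattern is simplified to split on + or - only.

```python
import re

# Same data as in the question, renamed so it does not shadow the built-in list
expressions = ('255 +1', '282 +5', '255 + 3', '5 - 2')

def split_operands(expression):
    """Split on + or - and return the operands with surrounding whitespace removed."""
    parts = re.split(r'[+-]', expression)
    # strip() returns a new string, so the results are collected in a new list
    return [part.strip() for part in parts]

results = [split_operands(expression) for expression in expressions]
print(results)
```

The key point is that the stripped strings are stored somewhere, here in the list returned by the comprehension.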

(1) We can use a regex in the first step to extract all the numbers. This results in a list of 8 elements: ['255', '1', '282', '5', '255', '3', '5', '2']
(2) We create an index by using range() with a step of 2.
(3) We use the index from the previous step to slice out pairs (chunks of length 2) and append them to a list.
import re
regex = r"(\d+)"
test_str = ("'255 +1', '282 +5', '255 + 3', '5 - 2'\n")
matches = re.findall(regex, test_str) # (1)
res = []
for ind in range(0, len(matches), 2): # (2)
    res.append(matches[ind:ind + 2]) # (3)
print(res)
The output is [['255', '1'], ['282', '5'], ['255', '3'], ['5', '2']]

Related

Python ValueError: could not convert string to float

line = 'f 1// 2// 3// 4//'
vertices = []
line = line.split(" ")
toks = line[1:]
for vertex in toks:
    l = vertex.split("/")
    #print(l)
    l = np.array([float(x) for x in l]).astype(_DT)
    position = (l[0])
    vertices.append(position)
print(vertices)
The output is ['1', '2', '3', '3'], which is correct! But I am getting "ValueError: could not convert string to float: " at this line:
l = np.array([float(x) for x in l]).astype(_DT)
But there is no leading or trailing whitespace, so I'm not sure how to go about this error.
When doing print(l) I get ['1', ''] ['2', ''] ['3', ''] ['4', '']
I tried using .strip() on line, but that didn't do anything. I also tried replace(" ", ""), which didn't do anything either. I can't find where the actual problem is. How can I identify where the white space is? How do I remove the ''?
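The error comes from float(''): the '' entries are empty strings produced by split, not whitespace, so strip() and replace() cannot help. You can skip the empty strings in your list comprehension with an if clause. This relies on the fact that an empty string evaluates as false and any nonempty string evaluates as true: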
[float(x) for x in l if x]
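Put back into the loop from the question, the filter might look like the sketch below. It uses plain Python floats rather than NumPy so that it runs on its own; the names fields and numbers are illustrative, and the NumPy conversion with _DT from the question is left out.

```python
line = 'f 1// 2// 3// 4//'
vertices = []

# Skip the leading 'f' token and process each 'n//' group
for vertex in line.split()[1:]:
    fields = vertex.split('/')
    # "if x" drops the empty strings that split('/') leaves behind
    numbers = [float(x) for x in fields if x]
    vertices.append(numbers[0])

print(vertices)
```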

Separate each item of a list in a specific way

I have an input, which is a tuple of strings encoded in the a1z26 cipher: numbers from 1 to 26 represent letters of the alphabet, hyphens separate letters of the same word, and spaces separate words.
For example:
8-9 20-8-5-18-5 should translate to 'hi there'
Let's say that the last example is a tuple in a var called string
string = ('8-9','20-8-5-18-5')
The first thing I find logical is convert the tuple into a list using
string = list(string)
so now
string = ['8-9','20-8-5-18-5']
The problem now is that when I iterate over the list to compare it with a dictionary that holds the translated values, each digit of a double-digit number is treated separately. So instead of translating '20', for example, it translates '2' and then '0', and the string ends up saying 'hi bheahe' (2 = b, 1 = a and 8 = h).
So I need a way to convert the list above into the following list:
['8','-','9',' ','20','-','8','-','5','-','18','-','5',]
I've already tried various codes using
list(),
join() and
split()
But it ends up giving me the same problem.
To sum up, I need to make any given list (converted from the input tuple) into a list of characters that takes into account double digit numbers, spaces and hyphens altogether
This is what I've got so far (my latest attempt). The input (string) is defined further up in the code.
a1z26 = {'1':'A', '2':'B', '3':'C', '4':'D', '5':'E', '6':'F', '7':'G', '8':'H', '9':'I', '10':'J', '11':'K', '12':'L', '13':'M', '14':'N', '15':'O', '16':'P', '17':'Q', '18':'R', '19':'S', '20':'T', '21':'U', '22':'V', '23':'W', '24':'X', '25':'Y', '26':'Z', '-':'', ' ' : ' ', ', ' : ' '}
translation = ""
code = list(string)
numbersarray1 = code
numbersarray2 = ', '.join(numbersarray1)
for char in numbersarray2:
    if char in a1z26:
        translation += a1z26[char]
There's no need to convert the tuple to a list. Tuples are iterable too.
I don't think the list you name is what you actually want. You probably want a 2d iterable (not necessarily a list, as you'll see below we can do this in one pass without generating an intermediary list), where each item corresponds to a word and is a list of the character numbers:
[[8, 9], [20, 8, 5, 18, 5]]
From this, you can convert each number to a letter, join the letters together to form the words, then join the words with spaces.
To do this, you need to pass a parameter to split, to tell it how to split your input string. You can achieve all of this with a one liner:
plaintext = ' '.join(''.join(num_to_letter[int(num)] for num in word.split('-'))
                     for word in ciphertext.split(' '))
This does exactly the splitting procedure as described above, and then for each number looks into the dict num_to_letter to do the conversion.
Note that you don't even need this dict. Because A-Z are contiguous in Unicode, you can convert 1-26 to A-Z with chr(ord('A') + num - 1).
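A self-contained sketch of this approach, using the chr/ord conversion instead of a dict, could look as follows. The function name decode_a1z26 is made up for illustration, and the code assumes the input is one space-separated string (for a tuple like the one in the question, ' '.join(string) would produce it) in which every number is between 1 and 26.

```python
def decode_a1z26(ciphertext):
    """Decode an a1z26 string such as '8-9 20-8-5-18-5' into uppercase text."""
    return ' '.join(
        # Within a word, hyphens separate the letter numbers
        ''.join(chr(ord('A') + int(num) - 1) for num in word.split('-'))
        # Spaces separate the words
        for word in ciphertext.split(' ')
    )

print(decode_a1z26('8-9 20-8-5-18-5'))
print(decode_a1z26(' '.join(('8-9', '20-8-5-18-5'))))
```

Both calls decode the same message; the second shows how the tuple from the question could be fed in.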
You don't really need the hyphens, am I right?
I suggest the following approach:
a = '- -'.join(string).split('-')
Now a is ['8', '9', ' ', '20', '8', '5', '18', '5']
You can then convert each number to the proper character using your dictionary
b = ''.join([a1z26[i] for i in a])
Now b is equal to HI THERE
I think it's better to use regular expressions here.
Example:
import re
...
src = ('8-9', '20-8-5-18-5')
res = [match for tmp in src for match in re.findall(r"([0-9]+|[^0-9]+)", tmp + " ")][:-1]
print(res)
Result:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
Here is a solution using regex:
import re
string = '8-9 20-8-5-18-5'
exp=re.compile(r'[0-9]+|[^0-9]+')
data= exp.findall(string)
print(data)
output
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
If you want to get 'hi there' from the input string, here is a method (I am assuming all characters are uppercase):
import re
string = '8-9 20-8-5-18-5'
exp=re.compile(r'[0-9]+|[^0-9]+')
data= exp.findall(string)
new_str =''
for i in range(len(data)):
    if data[i].isdigit():
        new_str+=chr(int(data[i])+64)
    else:
        new_str+=data[i]
result = new_str.replace('-','')
output:
HI THERE
You could also try this itertools solution:
from itertools import chain
from itertools import zip_longest
def separate_list(lst, delim, sep=" "):
    result = []
    for x in lst:
        chars = x.split(delim) # 1
        pairs = zip_longest(chars, [delim] * (len(chars) - 1), fillvalue=sep) # 2, 3
        result.extend(list(chain.from_iterable(pairs))) # 4
    return result[:-1] # 5
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
Output:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
Explanation of above code:
1. Split each string by the delimiter '-'.
2. Create the interspersing delimiters.
3. Create pairs of characters and separators with itertools.zip_longest.
4. Extend the result list with the flattened pairs using itertools.chain.from_iterable.
5. Remove the trailing ' ' that was added to the result list.
You could also create your own intersperse generator function and apply it twice:
from itertools import chain
def intersperse(iterable, delim):
    it = iter(iterable)
    yield next(it)
    for x in it:
        yield delim
        yield x
def separate_list(lst, delim, sep=" "):
    return list(
        chain.from_iterable(
            intersperse(
                (intersperse(x.split(delim), delim=delim) for x in lst), delim=[sep]
            )
        )
    )
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
# ['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']

Looping through a list of strings in Python sometimes only sees last element of list

(ep is a list of strings with a len of 58.)
Calling list on a string is supposed to give a list of the characters. But why in this case did it only see the last string? (Note, unlike this question, each element is unique).
for s in ep:
    ll = list(s)
ll
['3', '5', '0', ' ', 'U', '.', 'S', '.', ' ', '1', '0', '1', '0']
While in this loop with print() it sees all the elements, but doesn't modify any:
for s in ep:
    ll = list(s)
    print(s)
350 U.S. 1
350 U.S. 3
Here's another for loop with print that works:
for e in ep:
    print([e])
['350 U.S. 1']
['350 U.S. 3']
And another without it that only sees the last element:
for e in ep:
    ep5 = [e.strip("'").replace(' ', ",")]
ep5
['350,U.S.,1010']
And finally:
for e in ep:
    ep11 = e.split()
    print(ep11)
['350', 'U.S.', '1']
['350', 'U.S.', '3']
...<snip>...
['350', 'U.S.', '1010']
BUT....
ep11
['350', 'U.S.', '1010']
Both a list comprehension and enumerate (omitted) see all the elements, but not the simple for loop, at least not without print(). I know print() adds a newline, but I can't see how that alone accounts for this behavior. Can anyone explain this to me? Python 3.5.2 on Ubuntu 16.04. Thanks.
You're overwriting the same ll variable over and over again. Only the last value written will remain.
If you expect the result to be a list of lists of characters, then create a new list and append to it on every iteration:
characterLists = []
for s in strings:
    characterLists.append(list(s))
print(characterLists)
Better yet, just use a list comprehension:
characterLists = [list(s) for s in strings]
print(characterLists)

Python regular expression split string into numbers and text/symbols

I would like to split a string into sections of numbers and sections of text/symbols.
My current code doesn't handle negative numbers or decimals, and behaves weirdly, adding an empty list element at the end of the output.
import re
mystring = 'AD%5(6ag 0.33--9.5'
newlist = re.split('([0-9]+)', mystring)
print (newlist)
current output:
['AD%', '5', '(', '6', 'ag ', '0', '.', '33', '--', '9', '.', '5', '']
desired output:
['AD%', '5', '(', '6', 'ag ', '0.33', '-', '-9.5']
Your regex captures one or more digits, so re.split uses each run of digits as a delimiter and, because of the capturing group, also adds it to the resulting list along with the parts before and after it. If the string ends with digits, the part after the last delimiter is an empty string, which is added to the end of the list.
You may split with a regex that matches float or integer numbers with an optional minus sign and then remove empty values:
result = re.split(r'(-?\d*\.?\d+)', s)
result = list(filter(None, result))  # list() is needed on Python 3, where filter returns an iterator
To match negative/positive numbers with exponents, use
r'([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)'
The -?\d*\.?\d+ regex matches:
-? - an optional minus
\d* - 0+ digits
\.? - an optional literal dot
\d+ - one or more digits.
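As a runnable sketch of the split-then-filter approach on the string from the question (the function name split_numbers is made up for illustration):

```python
import re

def split_numbers(text):
    """Split text into number and non-number sections, dropping empty strings."""
    # The capturing group keeps the matched numbers in the result of re.split
    parts = re.split(r'(-?\d*\.?\d+)', text)
    # Empty strings appear when the text starts or ends with a number
    return [part for part in parts if part]

print(split_numbers('AD%5(6ag 0.33--9.5'))
```

Note that with this pattern a minus sign directly in front of a digit is always treated as part of the number, so a string such as '5-3' comes back as '5' and '-3'.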
Unfortunately, re.split() does not offer an "ignore empty strings" option. However, to retrieve your numbers, you could easily use re.findall() with a different pattern:
import re
string = "AD%5(6ag0.33-9.5"
rx = re.compile(r'-?\d+(?:\.\d+)?')
numbers = rx.findall(string)
print(numbers)
# ['5', '6', '0.33', '-9.5']
As mentioned here before, there is no option to ignore the empty strings in re.split() but you can easily construct a new list the following way:
import re
mystring = "AD%5(6ag0.33--9.5"
newlist = [x for x in re.split(r'(-?\d+\.?\d*)', mystring) if x != '']
print(newlist)
output:
['AD%', '5', '(', '6', 'ag', '0.33', '-', '-9.5']

How do I remove hyphens from a nested list?

In the nested list:
x = [['0', '-', '3', '2'], ['-', '0', '-', '1', '3']]
how do I remove the hyphens?
x = x.replace("-", "")
gives me AttributeError: 'list' object has no attribute 'replace', and
print x.remove("-")
gives me ValueError: list.remove(x): x not in list.
x is a list of lists. replace() will substitute a pattern string for another within a string. What you want is to remove an item from a list. remove() will remove the first occurrence of an item. A simple approach:
for l in x:
    while ("-" in l):
        l.remove("-")
For more advanced solutions, see the following: Remove all occurrences of a value from a Python list
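As an alternative sketch, a nested list comprehension builds a new nested list without the hyphens instead of modifying the inner lists in place. The name cleaned is illustrative.

```python
x = [['0', '-', '3', '2'], ['-', '0', '-', '1', '3']]

# Keep every item that is not a hyphen, one inner list at a time
cleaned = [[item for item in inner if item != '-'] for inner in x]

print(cleaned)
```

Unlike the while loop, this leaves the original x unchanged, which may or may not be what you want.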
