Remove blank string value from a list of strings - python

I am reading string information as input from a text file and placing them into lists, and one of the lines is like this:
30121,long,Mehtab,10,20,,30
I want to remove the empty value in between the ,, portion from this list, but have had zero results. I've tried .remove() and filter(). Python reads it as a 'str' value.

>>> import re
>>> re.sub(',,+', ',', '30121,long,Mehtab,10,20,,30')
'30121,long,Mehtab,10,20,30'

Use split() and remove()
In [11]: s = '30121,long,Mehtab,10,20,,30'
In [14]: l = s.split(',')
In [15]: l.remove('')
In [16]: l
Out[16]: ['30121', 'long', 'Mehtab', '10', '20', '30']
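Note that list.remove() deletes only the first matching element, so a line with more than one empty field would still contain a blank afterwards. A minimal sketch, using a made-up input line with two empty fields (not from the question), that drops all of them:

```python
# Hypothetical input with two separate empty fields (illustration only).
s = '30121,long,,Mehtab,10,20,,30'

parts = s.split(',')

# remove('') would delete only the first ''; a comprehension drops every one.
cleaned = [p for p in parts if p != '']
print(cleaned)
```

For the single blank in the original line, remove() and the comprehension give the same result.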

Filter should work. First I put the data in a list, then use filter() to keep only the items that are not empty.
data = ["30121", "long", "Mehtab", 10, 20, "", 30]
filtered_data = list(filter(lambda item: item != '', data))
print(filtered_data)

You can split the string on your separator ("," here) and then use a list comprehension to keep only the non-blank elements before joining them again (string is the input line).
",".join([element for element in string.split(",") if element])
Use element.strip() as the if condition instead if you also want to filter out elements that contain only spaces.
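As a rough sketch of the strip() variant, wrapped in a helper (the name drop_blank_fields and the second sample line are made up for illustration):

```python
def drop_blank_fields(line, sep=","):
    """Return `line` with empty or whitespace-only fields removed."""
    # field.strip() is '' (falsy) for both '' and fields made only of spaces.
    return sep.join(field for field in line.split(sep) if field.strip())

print(drop_blank_fields("30121,long,Mehtab,10,20,,30"))
print(drop_blank_fields("a, ,b,,c"))
```

The fields that survive are kept exactly as they were; only the test uses the stripped value.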

Related

How to split a string and assign it to an array, adding some parts, in Python

I have some string which contains parts separated by commas and need to add some part to each and assign all to array of variables.
the string looks like
chp_algos = 'AES256_SSE','AES128_CBC','AES64_CBC','AES33_CBC'
I want to put in array which looks like:
arr = [
[AES128_CBC],
[AES128_CBC_fer],
[AES128_SSE],
[AES128_SSE_fer],
[AES64_CBC],
[AES64_CBC_fer],
[AES33_CBC],
[AES33_CBC_fer]
]
and I want to map the following final result to db
f = 'AES128_CBC_fer AES128_SSE_fer AES64_CBC_fer AES33_CBC_fer'
As written in the question, chp_algos is a tuple, not a string. So, it is already "split"
I'd recommend not using a list of lists. Just create a list of strings.
from itertools import chain
arr = list(chain.from_iterable([x, x + '_fer'] for x in chp_algos))
Output
['AES256_SSE',
'AES256_SSE_fer',
'AES128_CBC',
'AES128_CBC_fer',
'AES64_CBC',
'AES64_CBC_fer',
'AES33_CBC',
'AES33_CBC_fer']
With that, you can filter:
f = ' '.join(x for x in arr if x.endswith('_fer'))
But you could also skip arr entirely and build the result by appending the suffix to each of the original string values.
You can do this by sorting chp_algos then using f strings in a generator expression
>>> ' '.join(f'{i}_fer' for i in sorted(chp_algos))
'AES128_CBC_fer AES256_SSE_fer AES33_CBC_fer AES64_CBC_fer'
Could you clarify your string chp_algos? The way you wrote it now it is not compatible with python.
Anyway, what you can do in your case, assuming that chp_algos is a string of the form chp_algos= "'AES256_SSE','AES128_CBC','AES64_CBC','AES33_CBC'", then you can split the string into a list of strings via chp_algos.split(",").
The argument of split() is the delimiter which should be used to split the string.
Now you have something like array = ["'AES256_SSE'", "'AES128_CBC'", "'AES64_CBC'", "'AES33_CBC'"].
To get the array that you want you can just do a simple loop through your array:
arr = []
for element in array:
    arr.append([element])
    arr.append([element + "_fer"])
Now you might have some issues with the quotes (depending on what your data looks like). You can remove them by slicing: replace element in the code above with element[1:-1], which drops the first and the last character of the string.
To get the f string in the very end, you can just loop through arr[1::2] which returns every 2nd element of arr starting at the second one (index 1).
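Putting these steps together, a minimal sketch that assumes chp_algos really is a single string in the quoted form described above (it uses strip("'") rather than slicing to drop the quotes):

```python
# Assumed input: one string holding quoted, comma-separated names.
chp_algos = "'AES256_SSE','AES128_CBC','AES64_CBC','AES33_CBC'"

arr = []
for element in chp_algos.split(","):
    name = element.strip("'")      # drop the surrounding single quotes
    arr.append([name])
    arr.append([name + "_fer"])

# Every second entry, starting at index 1, is a "_fer" variant.
f = " ".join(item[0] for item in arr[1::2])
print(f)
```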
Say we have a string:
s = 'AES256_SSE,AES128_CBC,AES64_CBC,AES33_CBC'
In order to replace commas with a space and append a suffix to each part:
' '.join(f'{p}_fer' for p in s.split(','))
As for an array:
def g(s):
    for part in s.split(','):
        yield part
        yield f'{part}_fer'
arr = [*g(s)]

How to remove characters from a string after a certain point within a list?

I have a list of strings within a list and I want to remove everything in each string after the tenth character.
EX:
['0.04112243,0.04112243,right,4.11%', '0.12733313,0.05733313,right,12.73%', '0.09203131,0.02203131,right,9.2%']
I want just the first ten characters of each string in the list; everything else should be stripped.
Output
['0.04112243', '0.12733313', '0.09203131']
You can use a list comprehension:
original = ['0.04112243,0.04112243,right,4.11%', '0.12733313,0.05733313,right,12.73%', '0.09203131,0.02203131,right,9.2%']
new = [s[:10] for s in original]
Output:
['0.04112243', '0.12733313', '0.09203131']
You can also be a bit more flexible if you want to keep everything before the first comma:
new = [s.partition(',')[0] for s in original]
You can index and slice a string much like an array.
Code:
example = ['0.04112243,0.04112243,right,4.11%', '0.12733313,0.05733313,right,12.73%', '0.09203131,0.02203131,right,9.2%']
for s in example:
    print(s[:10])
Output:
0.04112243
0.12733313
0.09203131
list comprehension and string slicing:
dirty = ['0.04112243,0.04112243,right,4.11%', '0.12733313,0.05733313,right,12.73%', '0.09203131,0.02203131,right,9.2%']
clean = [num[:10] for num in dirty]
Split() creates a list of strings delimited by the specified character. With this in mind, I would split each string on the comma (,) char and then append the first element to a list.
lst = ['0.04112243,0.04112243,right,4.11%', '0.12733313,0.05733313,right,12.73%', '0.09203131,0.02203131,right,9.2%']
result = []
for i in lst:
    result.append(i.split(",")[0])
#Test output
print(result)
This should return the values you need, in the format you want!
Hope this helps.

Python 3 split()

When I'm splitting a string "abac" I'm getting undesired results.
Example
print("abac".split("a"))
Why does it print:
['', 'b', 'c']
instead of
['b', 'c']
Can anyone explain this behavior and guide me on how to get my desired output?
Thanks in advance.
As @DeepSpace pointed out (referring to the docs):
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']).
Therefore I'd suggest using a better delimiter, such as a comma. If this is the formatting you're stuck with, you could use the built-in filter() function as suggested in this answer; passing None as the function removes any "empty" strings. In Python 3, filter() returns an iterator, so wrap it in list() to see the result.
sample = 'abac'
filtered_sample = list(filter(None, sample.split('a')))
print(filtered_sample)
# ['b', 'c']
When you split a string in python you keep everything between your delimiters (even when it's an empty string!)
For example, if you had a list of letters separated by commas:
>>> "a,b,c,d".split(',')
['a', 'b', 'c', 'd']
If your list had some missing values you might leave the space in between the commas blank:
>>> "a,b,,d".split(',')
['a', 'b', '', 'd']
The start and end of the string act as delimiters themselves, so if you have a leading or trailing delimiter you will also get this "empty string" sliced out of your main string:
>>> "a,b,c,d,,".split(',')
['a', 'b', 'c', 'd', '', '']
>>> ",a,b,c,d".split(',')
['', 'a', 'b', 'c', 'd']
If you want to get rid of any empty strings in your output, you can use the filter function.
If instead you just want to get rid of this behavior near the edges of your main string, you can strip the delimiters off first:
>>> ",,a,b,c,d".strip(',')
'a,b,c,d'
>>> ",,a,b,c,d".strip(',').split(',')
['a', 'b', 'c', 'd']
In your example, "a" is what's called a delimiter. It acts as a boundary between the characters before it and after it. So, when you call split, it gets the characters before "a" and after "a" and inserts it into the list. Since there's nothing in front of the first "a" in the string "abac", it returns an empty string and inserts it into the list.
split will return the characters between the delimiters you specify (or between an end of the string and a delimiter), even if there aren't any, in which case it will return an empty string. (See the documentation for more information.)
In this case, if you don't want any empty strings in the output, you can use filter to remove them:
list(filter(lambda s: len(s) > 0, "abac".split("a")))

How to replace multiple substrings in a list?

I need to turn the input_string into the comment below using a for loop. First I sliced it using the split() function, but now I need to somehow turn the input string into ['result1', 'result2', 'result3', 'result5']. I tried replacing the .xls and the dash for nothing (''), but the string output is unchanged. Please don't import anything, I'm trying to do this with functions and loops only.
input_string = "01-result.xls,2-result.xls,03-result.xls,05-result.xls"
# Must be turned into ['result1','result2', 'result3', 'result5']
splitted = input_string.split(',')
for c in ['.xls', '-', '0']:
    if c in splitted:
        splitted = splitted.replace(splitted, 'c', '')
When I type splitted, the output is ['01-result.xls', '2-result.xls', '03-result.xls', '05-result.xls'] therefore nothing is happening.
Use the re module's sub function and split.
>>> input_string = "01-result.xls,2-result.xls,03-result.xls,05-result.xls"
>>> import re
>>> re.sub(r'(\d+)-(\w+)\.xls',r'\2\1',input_string)
'result01,result2,result03,result05'
>>> re.sub(r'(\d+)-(\w+)\.xls',r'\2\1',input_string).split(',')
['result01', 'result2', 'result03', 'result05']
Using no imports, you can use a list comprehension
>>> [''.join(x.split('.')[0].split('-')[::-1]) for x in input_string.split(',')]
['result01', 'result2', 'result03', 'result05']
The algo here is, we loop through the string after splitting it on ,. Now we split the individual words on . and the first element of these on -. We now have the number and the words, which we can easily join.
Complete explanation of the list comp answer -
To understand what a list comprehension is, Read What does "list comprehension" mean? How does it work and how can I use it?
Coming to the answer,
Splitting the input string on , gives us the list of individual file names
>>> input_string.split(',')
['01-result.xls', '2-result.xls', '03-result.xls', '05-result.xls']
Now using the list comprehension construct, we can iterate through this,
>>> [i for i in input_string.split(',')]
['01-result.xls', '2-result.xls', '03-result.xls', '05-result.xls']
As we need only the file name and not the extension, we split by using . and take the first value.
>>> [i.split('.')[0] for i in input_string.split(',')]
['01-result', '2-result', '03-result', '05-result']
Now again, what we need is the number and the name as two parts. So we again split by -
>>> [i.split('.')[0].split('-') for i in input_string.split(',')]
[['01', 'result'], ['2', 'result'], ['03', 'result'], ['05', 'result']]
Now we have [number, name] in a list. However, the format that we need is "namenumber". Hence we have two options:
Concat them like i.split('.')[0].split('-')[1]+i.split('.')[0].split('-')[0]. This is an unnecessarily long way
Reverse them and join. We can use slices to reverse a list (See How can I reverse a list in python?) and str.join to join like ''.join(x.split('.')[0].split('-')[::-1]).
So we get our final list comprehension
>>> [''.join(x.split('.')[0].split('-')[::-1]) for x in input_string.split(',')]
['result01', 'result2', 'result03', 'result05']
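The outputs above keep the leading zeros ('result01'), while the question asks for 'result1'. A minimal sketch, using only a loop and built-in string methods as the question requests, that also strips the leading zeros (it assumes every item has the form number-name.xls):

```python
input_string = "01-result.xls,2-result.xls,03-result.xls,05-result.xls"

result = []
for item in input_string.split(','):
    stem = item.split('.')[0]           # e.g. '01-result'
    number, name = stem.split('-')      # e.g. '01', 'result'
    # lstrip('0') drops leading zeros; fall back to '0' if nothing is left.
    result.append(name + (number.lstrip('0') or '0'))

print(result)
```

Using lstrip('0') rather than replace('0', '') matters for numbers such as '10', where only the leading zeros should go.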
Here's a solution using list comprehension and string manipulation if you don't want to use re.
input_string = "01-result.xls,2-result.xls,03-result.xls,05-result.xls"
# Must be turned into ['result1','result2', 'result3', 'result5']
splitted = input_string.split(',')
#Remove extension, then split by hyphen, switch the two values,
#and combine them into the result string
print(["".join(i.split(".")[0].split("-")[::-1]) for i in splitted])
#Output
#['result01', 'result2', 'result03', 'result05']
The way this list comprehension works is:
Take the list of results and remove the ".xls". i.split(".")[0]
Split on the - and switch positions of the number and "result". .split("-")[::-1]
For every item in the list, join the list into a string. "".join()

Protect commas on consecutive string.join() and string.split()

Suppose the following code (notice the commas inside the strings):
>>> a = ['1',",2","3,"]
I need to concatenate the values into a single string. Naive example:
>>> b = ",".join(a)
>>> b
'1,,2,3,'
And later I need to split the resulting object again:
>>> b.split(',')
['1', '', '2', '3', '']
However, the result I am looking for is the original list:
['1', ',2', '3,']
What's the simplest way to protect the commas in this process? The best solution I came up with looks rather ugly.
Note: the comma is just an example. The strings can contain any character. And I can choose other characters as separators.
The strings can contain any character.
If no matter what you use as a delimiter, there is a chance that the item itself contains the delimiter character, then use the csv module:
import csv
class PseudoFile(object):
    # http://stackoverflow.com/a/8712426/190597
    def write(self, string):
        return string
writer = csv.writer(PseudoFile())
This concatenates the items in a using commas:
a = ['1',",2","3,"]
line = writer.writerow(a)
print(line)
# 1,",2","3,"
This recovers a from line:
print(next(csv.reader([line])))
# ['1', ',2', '3,']
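The same round trip can also be sketched with an in-memory buffer instead of a pseudo-file. This is only one possible arrangement; the helper names join_fields and split_fields are made up for illustration:

```python
import csv
import io

def join_fields(fields):
    """Serialize a list of strings to one CSV line (no trailing newline)."""
    buffer = io.StringIO()
    # lineterminator="" keeps the writer from appending "\r\n".
    csv.writer(buffer, lineterminator="").writerow(fields)
    return buffer.getvalue()

def split_fields(line):
    """Parse one CSV line back into a list of strings."""
    return next(csv.reader([line]))

a = ['1', ',2', '3,']
line = join_fields(a)
print(line)
print(split_fields(line))
```

The csv writer quotes only the fields that contain the delimiter, which is why '1' stays bare while ',2' and '3,' are wrapped in double quotes.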
Do you have to use commas to separate the items? If not, you could use another symbol that is not used in the items of the list.
In [1]: '|'.join(['1', ',2', '3,']).split('|')
Out[1]: ['1', ',2', '3,']
Edit: The string may apparently contain any character. Is it an option to use the json module? You could just dump and load the list.
In [2]: import json
In [3]: json.dumps(['1', ',2', '3,'])
Out[3]: '["1", ",2", "3,"]'
In [4]: json.loads('["1", ",2", "3,"]')
Out[4]: [u'1', u',2', u'3,']
Edit #2: If you may not use it, you could use str.encode('string-escape') (Python 2 only) to escape the characters in your string, then enclose the encoded version in single quotes and separate those with commas:
In [10]: print "'example'".encode('string-escape')
\'example\'
In [11]: print r"\'example\'".decode('string-escape')
'example'
Edit #3: Running example of str.encode('string-escape') (Python 2):
import re
def list_to_str(items):
    # Escape each item, wrap it in single quotes, and join with commas.
    return ','.join("'{}'".format(s.encode('string-escape')) for s in items)
def str_to_list(joined):
    # Pull out the text between each pair of single quotes.
    return re.findall(r"'([^']*)'", joined)
if __name__ == '__main__':
    a = ['1', ',2', '3,']
    b = list_to_str(a)
    print 'It is {} that this works.'.format(str_to_list(b) == a)
When you serialize a list to a string, you need to choose a separator character that doesn't appear in the list items. Can't you just replace the comma with another character?
b = ";".join(a)
b.split(';')
Does the delimiter need to be only a single character? If not, you can use a delimiter made up of a sequence of characters that definitely won't appear in your strings, like |#| or something similar.
You need to escape the comma and probably also escape the escape sequence. Here's one way:
>>> a = ['1',",2","3,"]
>>> b = ','.join(s.replace('%', '%%').replace(',', '%2c') for s in a)
>>> [s.replace('%2c', ',').replace('%%', '%') for s in b.split(',')]
['1', ',2', '3,']
>>> b
'1,%2c2,3%2c'
>>>
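One caveat with the hand-rolled escaping above: because decoding replaces '%2c' before '%%', an item that already contains the literal text %2c does not survive the round trip. As a hedged alternative (a swapped-in technique, not the answer's method), percent-encoding from the standard library's urllib.parse handles both cases; the helper names below are made up for illustration:

```python
from urllib.parse import quote, unquote

def join_escaped(items, sep=","):
    # safe="" makes quote() escape every reserved character, including ","
    # and "%". sep must be a character that quote() escapes (not - _ . ~).
    return sep.join(quote(item, safe="") for item in items)

def split_escaped(joined, sep=","):
    # After splitting, unquote() turns each %XX sequence back into its character.
    return [unquote(part) for part in joined.split(sep)]

a = ['1', ',2', '3,', '100%', '%2c']
b = join_escaped(a)
print(b)
print(split_escaped(b) == a)
```

An empty list is a special case here, as with any join/split scheme: joining it gives '', which splits back into [''].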
I would join and split using another character than ",", e.g. ";":
>>> b = ";".join(a)
>>> b.split(';')
['1', ',2', '3,']
