Ignoring or removing line breaks python [duplicate] - python

This question already has answers here:
Remove all newlines from inside a string
(8 answers)
Closed 1 year ago.
I'm sorry for the noobish question, but none of the answers I've looked at seem to fix this. I'd like to take a multi-line string like this:
myString = """a
b
c
d
e"""
And get a result that looks like or that is at least interpreted as this:
myString = "abcde"
myString.rstrip(), myString.rstrip("\n"), and myString.rstrip("\r") don't seem to change anything when I print this little "abcde" test string. Some of the other solutions I've read involve entering the string like this:
myString = ("a"
"b"
"c")
But this solution is impractical because I'm working with very large sets of data. I need to be able to copy a dataset and paste it into my program, and have python remove or ignore the line breaks.
Am I entering something in wrong? Is there an elegant solution to this? Thanks in advance for your patience.

Use the replace method:
myString = myString.replace("\n", "")
For example:
>>> s = """
test
test
test
"""
>>> s.replace("\n", "")
'testtesttest'
>>> s
'\ntest\ntest\ntest\n' # warning! replace does not alter the original
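The rstrip attempts in the question fail because rstrip only removes characters from the right-hand end of the string, so the newlines in the middle are never touched. A small comparison using the question's own string:

```python
my_string = """a
b
c
d
e"""

# rstrip() only trims trailing characters; this string has no trailing
# newline, so it comes back unchanged.
stripped = my_string.rstrip("\n")

# replace() removes every occurrence, including the interior newlines.
joined = my_string.replace("\n", "")

print(repr(stripped))  # 'a\nb\nc\nd\ne'
print(repr(joined))    # 'abcde'
```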

>>> myString = """a
... b
... c
... d
... e"""
>>> ''.join(myString.splitlines())
'abcde'
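One practical difference between the two answers: data pasted from a Windows source may use \r\n line endings. replace("\n", "") then leaves the \r characters behind, while splitlines() treats \r\n, \r and \n all as line boundaries. A short check with a made-up sample string:

```python
# Hypothetical sample with Windows-style (CRLF) line endings.
pasted = "a\r\nb\r\nc"

only_lf = pasted.replace("\n", "")             # the '\r' characters survive
via_splitlines = "".join(pasted.splitlines())  # all line endings removed

print(repr(only_lf))         # 'a\rb\rc'
print(repr(via_splitlines))  # 'abc'
```

If you prefer replace, chaining pasted.replace("\r", "").replace("\n", "") gives the same result as splitlines() here.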


Using regex as search string for python's "in" keyword [duplicate]

This question already has answers here:
Regular expression to filter list of strings matching a pattern
(5 answers)
Closed 2 years ago.
Say I have a dictionary of sets of paths:
my_dict['some_key'] = {'abc/hi/you','xyz/hi/you','jkl/hi/you'}
I want to see if a path appears in this set. If I have the whole path, I would simply do the following:
str = 'abc/hi/you'
if str in my_dict['some_key']:
    print(str)
But what if I don't know that b is what comes between a and c? What if it could be literally anything? If I were running ls in a shell, I'd just put * and call it a day.
What I want is for str to be a regex:
regx = '^a.*c/hi/you$' #just assume this is the ideal regex. Doesn't really matter.
if regx in my_dict['some_key']:
    print('abc/hi/you') #print the actual path, not the regx
What is a clean and fast way to implement something like this?
You need to loop through the set rather than a simple in call.
To avoid setting up the whole dictionary of sets for the example, I have abstracted it as simply my_set.
import re
my_set = {'abc/hi/you','xyz/hi/you','jkl/hi/you'}
regx = re.compile('^a.*c/hi/you$')
for path in my_set:
    if regx.match(path):
        print(path)
I chose to compile the pattern rather than call re.match() directly because the set could have over a million elements in the actual implementation.
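Since the question mentions shell-style * wildcards, the standard-library fnmatch module is another option: it matches names against glob patterns instead of regexes. A minimal sketch reusing the example set; note that in fnmatch, * also matches /, unlike a shell glob, and fnmatchcase() is used so matching is case-sensitive on every platform:

```python
import fnmatch

my_set = {'abc/hi/you', 'xyz/hi/you', 'jkl/hi/you'}

# 'a*c/hi/you' is a glob: '*' matches any run of characters (including '/').
matches = sorted(p for p in my_set if fnmatch.fnmatchcase(p, 'a*c/hi/you'))
print(matches)  # ['abc/hi/you']
```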
You can subclass set and override __contains__, which is the method behind the a in b operator:
import re
from collections import defaultdict

class MySet(set):
    def __contains__(self, regexStr):
        regex = re.compile(regexStr)
        for e in self:
            if regex.match(e):
                return True
        return False

my_dict = defaultdict(MySet)
my_dict['some_key'].add('abc/hi/you')
regx = '^a.*c/hi/you$'
if regx in my_dict['some_key']:
    print('abc/hi/you')
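The subclass above can only answer yes or no, while the question wants to print the matching path itself. A small helper that returns a match instead, using next() with a default so a miss returns None; the name first_match is hypothetical and not part of either answer. Because sets are unordered, "first" means an arbitrary match when several paths fit:

```python
import re

def first_match(pattern, paths):
    """Return a path matching `pattern`, or None if nothing matches (hypothetical helper)."""
    regex = re.compile(pattern)
    return next((p for p in paths if regex.match(p)), None)

my_dict = {'some_key': {'abc/hi/you', 'xyz/hi/you', 'jkl/hi/you'}}

hit = first_match(r'^a.*c/hi/you$', my_dict['some_key'])
if hit is not None:
    print(hit)  # abc/hi/you
```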

retrieve subset of string with regex - python [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
p = r"\home\gef\Documents\abc_this_word_dfg.gz.tar"
I'm looking for a way to retrieve this_word.
import os
base = os.path.basename(p)
base1 = base.replace("abc_", "")
word = base1.replace("_dfg.gz.tar", "")
This works, but it's not ideal because I would need to know in advance which strings to remove. Maybe a regex would be appropriate here?
You don't give much information, but from what is shown, can't you just use string slicing?
Maybe like this:
>>> p = os.path.join('home', 'gef', 'Documents', 'abc_this_word_dfg.gz.tar')
>>> p
'home/gef/Documents/abc_this_word_dfg.gz.tar'
>>> os.path.dirname(p)
'home/gef/Documents'
>>> os.path.basename(p)
'abc_this_word_dfg.gz.tar'
>>> os.path.basename(p)[4:-11]
'this_word'
You don't give much information, but from what is shown, can't you just split on _ chars?
Maybe like this:
>>> p = os.path.join('home', 'gef', 'Documents', 'abc_this_word_dfg.gz.tar')
>>> p
'home/gef/Documents/abc_this_word_dfg.gz.tar'
>>> os.path.dirname(p)
'home/gef/Documents'
>>> os.path.basename(p)
'abc_this_word_dfg.gz.tar'
>>> '_'.join(
... os.path.basename(p).split('_')[1:-1])
'this_word'
It splits on underscores, discards the first and last parts, and joins the remaining parts back together with underscores (if this_word contained no underscores, only one part would remain and nothing would need joining).
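The question also asks whether a regex would help. One option, assuming (as the split answer does) that the wanted part is everything between the first and the last underscore of the file name, is a capturing group with re.fullmatch (Python 3.4+):

```python
import os
import re

p = os.path.join('home', 'gef', 'Documents', 'abc_this_word_dfg.gz.tar')
base = os.path.basename(p)

# [^_]*   leading chunk with no underscores ("abc")
# (.*)    the wanted middle, may itself contain underscores
# _[^_]*  the last underscore and everything after it ("_dfg.gz.tar")
m = re.fullmatch(r'[^_]*_(.*)_[^_]*', base)
word = m.group(1) if m else None
print(word)  # this_word
```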

How to check if a string contains only characters from a given set in python [duplicate]

This question already has answers here:
In Python, how to check if a string only contains certain characters?
(9 answers)
Closed 7 months ago.
I have a user-inputted polynomial and I only want to use it if it contains only characters from the string 1234567890^-+x.
How can I check if it does or not without using external packages? I only want to use built-in Python 2.5 functions.
I am writing a program that runs on any Mac without needing external packages.
Here are some odd ;-) ways to do it:
good = set('1234567890^-+x')
if set(input_string) <= good:
    pass  # it's good
else:
    pass  # it's bad
or
if input_string.strip('1234567890^-+x'):
    pass  # it's bad!
else:
    pass  # it's good
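The strip trick works because strip removes allowed characters from both ends and stops at the first character that is not in the set, so the result is empty exactly when every character was allowed. A quick check, wrapped in a hypothetical helper is_valid (note that an empty string counts as valid):

```python
ALLOWED = '1234567890^-+x'

def is_valid(s):
    # strip() eats allowed characters from both ends; any disallowed
    # character halts it, so a non-empty remainder means bad input.
    return not s.strip(ALLOWED)

print(is_valid('x^2+2x+1'))  # True
print(is_valid('x**2 + 1'))  # False
```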
Use a regular expression:
import re
if re.match('^[-0-9^+x]*$', text):
    pass  # Valid input
The re module comes with Python 2.5, and is your fastest option.
Demo:
>>> re.match('^[-0-9^+x]*$', '1x2^4-2')
<_sre.SRE_Match object at 0x10f0b6780>
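One caveat with this pattern: $ also matches just before a trailing newline, so a string such as '1x2\n' passes. In Python 3.4+, re.fullmatch (or ending the pattern with \Z instead of $) avoids that. A sketch, with is_valid as a hypothetical helper name:

```python
import re

VALID = re.compile(r'[-0-9^+x]*')

def is_valid(text):
    # fullmatch() requires the whole string to match, so a trailing '\n'
    # is rejected, unlike re.match('^...$').
    return VALID.fullmatch(text) is not None

print(bool(re.match(r'^[-0-9^+x]*$', '1x2\n')))  # True (newline slips through)
print(is_valid('1x2\n'))                          # False
print(is_valid('1x2^4-2'))                        # True
```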
You can convert the valid characters to a set, since sets offer faster membership tests.
Then you can use the all function like this:
valid_chars = set("1234567890^-+x") # Converting to a set
if all(char in valid_chars for char in input_string):
    pass  # Do stuff if input is valid
We can also convert the input string to a set and check whether all of its characters are in the valid set:
valid_chars = set("1234567890^-+x") # Converting to a set
if set(input_string).issubset(valid_chars):
    pass  # Do stuff if input is valid
Why not just convert both strings into sets and check whether input_set is a subset of good_set, as below:
>>> good_set = set('1234567890^-+x')
>>> input_set1 = set('xajfb123')
>>> input_set2 = set('122-32+x')
>>> input_set1.issubset(good_set)
False
>>> input_set2.issubset(good_set)
True
>>>
Yet another way to do it, now using string.translate():
>>> import string
>>> all_chars = string.maketrans('', '')
>>> has_only = lambda s, valid_chars: not s.translate(all_chars, valid_chars)
>>> has_only("abc", "1234567890^-+x.")
False
>>> has_only("x^2", "1234567890^-+x.")
True
It is not the most readable approach, but it should be one of the fastest if you need speed.
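string.maketrans and the two-argument form of str.translate exist only in Python 2. In Python 3 the same idea can be expressed with str.maketrans, whose third argument lists characters to delete. A sketch keeping the answer's has_only name:

```python
def has_only(s, valid_chars):
    # The third argument maps every listed character to None, i.e. deletes it.
    table = str.maketrans('', '', valid_chars)
    # If nothing is left after deleting the valid characters, s was valid.
    return not s.translate(table)

print(has_only("abc", "1234567890^-+x."))  # False
print(has_only("x^2", "1234567890^-+x."))  # True
```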
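>>> whitelist = '1234567890^-+x'
>>> expr = 'x^2+2x+1'
>>> min([ch in whitelist for ch in expr])
True
>>> expr = 'x**2 + 1'
>>> min([ch in whitelist for ch in expr])
False
Note that min() raises ValueError on an empty string, whereas all() simply returns True.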

String concatenation produces incorrect output in Python?

I have this code:
filenames=["file1","FILE2","file3","fiLe4"]
def alignfilenames():
#build a string that can be used to add labels to the R variables.
#format goal: suffixes=c(".fileA",".fileB")
filestring='suffixes=c(".'
for filename in filenames:
filestring=filestring+str(filename)+'",".'
print filestring[:-3]
#now delete the extra characters
filestring=filestring[-1:-4]
filestring=filestring+')'
print "New String"
print str(filestring)
alignfilenames()
I'm trying to get the string variable to look like this format: suffixes=c(".fileA",".fileB".....) but adding on the final parenthesis is not working. When I run this code as is, I get:
suffixes=c(".file1",".FILE2",".file3",".fiLe4"
New String
)
Any idea what's going on or how to fix it?
Does this do what you want?
>>> filenames=["file1","FILE2","file3","fiLe4"]
>>> c = "suffixes=c(%s)" % (",".join('".%s"' %f for f in filenames))
>>> c
'suffixes=c(".file1",".FILE2",".file3",".fiLe4")'
Using str.join is a much better way to put a common delimiter between a list of items. It removes the need to check whether you are on the last item before adding the delimiter or, in your case, to strip off the last one afterwards.
Also, you may want to look into list comprehensions (the argument passed to join above is a closely related generator expression).
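For reference, the original function rebuilt around join might look like this in Python 3. Returning the string instead of printing it, and taking the list as a parameter, are design choices of this sketch:

```python
filenames = ["file1", "FILE2", "file3", "fiLe4"]

def align_filenames(names):
    # Wrap each name as ".name" in double quotes, then let join() place the
    # commas, so there is no trailing delimiter to trim afterwards.
    quoted = ",".join(f'".{name}"' for name in names)
    return f"suffixes=c({quoted})"

print(align_filenames(filenames))
# suffixes=c(".file1",".FILE2",".file3",".fiLe4")
```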
It looks like you might be trying to use python to write an R script, which can be a quick solution if you don't know how to do it in R. But in this case the R-only solution is actually rather simple:
R> filenames= c("file1","FILE2","file3","fiLe4")
R> suffixes <- paste(".", tolower(filenames), sep="")
R> suffixes
[1] ".file1" ".file2" ".file3" ".file4"
R>
What's going on is that this slice returns an empty string:
filestring=filestring[-1:-4]
because the end index comes before the start index. Try the following at the interactive prompt:
>>> a = "hello world"
>>> a[-1:-4]
''
If the intent was to cut out the characters between those two positions, you would instead do
filestring=filestring[:-4]+filestring[-1:]
But I think what you actually wanted was simply to drop the last three characters:
filestring=filestring[:-3]
The better solution is to use the join method of strings, as sberry2A suggested.
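To see why the slice in the question comes back empty, compare a few slices of the same test string: slices step forward (step +1) by default, so when the start index lies to the right of the stop index nothing is selected, while a negative step walks backwards.

```python
a = "hello world"

print(repr(a[-1:-4]))     # ''          start is right of stop, default step +1
print(repr(a[-4:-1]))     # 'orl'       characters at -4, -3, -2
print(repr(a[-1:-4:-1]))  # 'dlr'       walks backwards from -1 down to -3
print(repr(a[:-3]))       # 'hello wo'  everything except the last three
```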

str.strip() strange behavior [duplicate]

This question already has answers here:
How do the .strip/.rstrip/.lstrip string methods work in Python?
(4 answers)
Closed 28 days ago.
>>> t1 = "abcd.org.gz"
>>> t1
'abcd.org.gz'
>>> t1.strip("g")
'abcd.org.gz'
>>> t1.strip("gz")
'abcd.org.'
>>> t1.strip(".gz")
'abcd.or'
Why is the 'g' of '.org' gone?
strip(".gz") removes any of the characters ., g and z from the beginning and end of the string.
x.strip(y) will remove all characters that appear in y from the beginning and end of x.
That means
'foo42'.strip('1234567890') == 'foo'
because '4' and '2' both appear in '1234567890'.
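Because the argument is treated as a set of characters, its order is irrelevant, and a character from the set that sits in the middle of the string survives once stripping has stopped at a character outside the set. A short illustration:

```python
t1 = "abcd.org.gz"

# ".gz" and "zg." describe the same set {'.', 'g', 'z'}.
print(t1.strip(".gz"))  # abcd.or
print(t1.strip("zg."))  # abcd.or

# Only the ends are trimmed: the interior 'g' is kept.
print("gabgcg".strip("g"))  # abgc
```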
Use os.path.splitext if you want to remove the file extension.
>>> import os.path
>>> t1 = "abcd.org.gz"
>>> os.path.splitext(t1)
('abcd.org', '.gz')
Python 3.9 added two string methods, .removeprefix() and .removesuffix(), which remove a given prefix or suffix from a string, respectively. Thankfully, this time the method names make it clear what they do.
>>> import sys
>>> print(sys.version)
3.9.0
>>> t1 = "abcd.org.gz"
>>> t1.removesuffix('gz')
'abcd.org.'
>>> t1
'abcd.org.gz'
>>> t1.removesuffix('gz').removesuffix('.gz')
'abcd.org.' # No unexpected effect from last removesuffix call
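On Python versions before 3.9, where removesuffix does not exist, an equivalent can be written with endswith and slicing. A sketch; the helper name remove_suffix is hypothetical:

```python
def remove_suffix(s, suffix):
    # Guard against an empty suffix: s[:-0] is s[:0], which would be ''.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

t1 = "abcd.org.gz"
print(remove_suffix(t1, ".gz"))   # abcd.org
print(remove_suffix(t1, ".zip"))  # abcd.org.gz (unchanged)
```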
The argument given to strip is a set of characters to be removed, not a substring. From the docs:
The chars argument is a string specifying the set of characters to be removed.
As far as I know, strip removes characters from the beginning and end of a string only. If you want to remove them from the whole string, use replace.
