Python: replace triple and quadruple double quotes with a single double quote

I have a string like this:
"""{\""College\""
I want to collapse each run of three or two double quotes into a single one, like this:
"{\"College\"
How can I do that? I tried the replace() function, but it removes all the quotes. Can someone help, please? I am new to Python.
I tried replace("\"\"","\""), but it is not working.

1. The following code repeatedly replaces two " with one ".
It takes about log(n) passes for a run of n quotes and is not as efficient as a single pass would be, but it is readable and the algorithm is obvious.
mystring = '"""{\""College\""'
# keep collapsing doubled quotes until none are left
while '""' in mystring:
    mystring = mystring.replace('""', '"')
print(mystring)
(timeit shows 430 ns per loop)
2. An alternative one-liner that is more complicated but more efficient for larger strings:
print('"'.join((e for e in ('|'+mystring+'|').split('"') if e))[1:-1])
(timeit shows 945 ns per loop)
The '|' pads the string to account for the edge case where there are " at the ends.
3. Finally, a regex-based solution (sometimes pretty slow in Python):
import re
print(re.sub('"+', '"', mystring))
(timeit shows 1.28 µs per loop)
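As a rough sketch, the three approaches above can be wrapped in functions and checked against each other. The function names are made up for illustration, and the sample is a raw string on the assumption that the backslashes in the question are literal characters.

```python
import re

def collapse_loop(text):
    # Approach 1: halve every run of quotes until no doubled quote is left.
    while '""' in text:
        text = text.replace('""', '"')
    return text

def collapse_split(text):
    # Approach 2: split on quotes, drop the empty pieces, rejoin with one quote.
    # The '|' padding protects quotes at either end and is sliced off again.
    return '"'.join(p for p in ('|' + text + '|').split('"') if p)[1:-1]

def collapse_regex(text):
    # Approach 3: replace any run of one or more quotes with a single quote.
    return re.sub('"+', '"', text)

sample = r'"""{\""College\""'  # raw string, so each backslash is kept
results = [f(sample) for f in (collapse_loop, collapse_split, collapse_regex)]
print(results)
```

All three functions should agree on this sample; which one is fastest depends on the input, so it is worth timing them on realistic data.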

You could solve this with a simple regular expression (see the docs for the re module).
For instance, the expression \"{2,3} matches any run of two or three double quotes.
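A minimal sketch of that suggestion using re.sub, assuming the backslashes in the question are literal characters (hence the raw string). The variable names are illustrative only.

```python
import re

sample = r'"""{\""College\""'  # raw string: backslashes are kept as-is
collapsed = re.sub(r'"{2,3}', '"', sample)
print(collapsed)

# A run of four quotes is matched as three plus one leftover quote,
# so an open-ended quantifier is safer when longer runs can occur.
four = '""""'
bounded = re.sub(r'"{2,3}', '"', four)
unbounded = re.sub(r'"{2,}', '"', four)
print(bounded, unbounded)
```

With the bounded pattern, a run of four quotes leaves two quotes behind, while "{2,} reduces any run to one.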

Related

Add spaces at the beginning of the print output in python

I'm wondering how I am supposed to add 4 spaces at the beginning of the print output with an f-string or format in Python.
This is what I use to print now:
print('{:<10}'.format('Constant'),'{:{width}.{prec}f}'.format(122, width=10, prec=3))
and my output looks like this:
Constant      122.000
but what I want is to have 4 spaces before the constant in the output, like:
( 4 spaces here )Constant      122.000
Any ideas? Thanks a lot!
You can use '    ' + string (as suggested), but a more robust approach could be:
string = "Test String leading space to be added"
spaces_to_add = 4
string_length = len(string) + spaces_to_add  # total width, including the 4 extra spaces
string_revised = string.rjust(string_length)
Result:
'    Test String leading space to be added'
There are a couple of ways you could do it:
If Constant is really an unchanging constant, why not just put
print(f"    {Constant}", ...)
before your other string?
With your current implementation, you are left-aligning to a width of 10 characters. If you swap that to right-align, like '{:>12}'.format('Constant') ("Constant" is 8 characters, 12 - 8 = 4 spaces), it will put 4 spaces in front of the string.
Here's a Python f-string syntax cheat sheet I've used before:
https://myshell.co.uk/blog/2018/11/python-f-string-formatting-cheatsheet/
And the official docs: PEP 3101
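Putting the suggestions together, one possible f-string version looks like this. The names label, value and indent are hypothetical; the field widths are the ones from the question.

```python
label = "Constant"
value = 122
indent = " " * 4  # the four leading spaces asked for

# Same layout as the format() call in the question, with the indent in front.
line = f"{indent}{label:<10} {value:10.3f}"
print(line)

# Right-aligning the label in a 12-character field also yields 4 leading spaces,
# but only because "Constant" happens to be 8 characters long.
alt = "{:>12}".format(label)
print(alt)
```

Keeping the indent as a separate piece means it does not depend on the length of the label.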

Removing parts of a string after certain chars in Python

New to Python.
I'd like to remove the substrings between the word AND and the comma character in the following string:
MyString = ' x.ABC AND XYZ, \ny.DEF AND Type, \nSome Long String AND Qwerty, \nz.GHI AND Tree \n'
The result should be:
MyString = ' x.ABC,\ny.DEF,\nSome Long String,\nz.GHI\n'
I'd like to do it without using regex.
I have tried various methods with splits and joins and indexes to no avail.
Any direction appreciated.
Thanks.
While Moses's answer is really good, I have a feeling this is a homework question and that you are not meant to use any imports. Anyway, here's an answer with no imports. It's not as efficient as other answers, like Moses's or a regex, but it works.
MyString = 'x.ABC AND XYZ, \ny.DEF AND Type, \nSome Long String AND Qwerty, \nz.GHI AND Tree \n'
new_string = ''
# keep the part of each line before ' AND ', then re-add the newline
for each in [[y for y in x.split(' AND ')][0] for x in MyString.split('\n')]:
    new_string += each
    new_string += '\n'
print(new_string)
You can split the string into lines, and further split the lines into words and use itertools.takewhile to drop all words after AND (itself included):
from itertools import takewhile
''.join(' '.join(takewhile(lambda x: x != 'AND', line.split())) + ',\n'
        for line in MyString.splitlines())
Notice that the newline character and a comma are manually added after each line is reconstructed with str.join.
All the lines are then finally joined using str.join.
Now it is working, and avoiding repeated appends in an explicit loop probably makes it really fast:
In [19]: ',\n'.join([x.split('AND')[0].strip() for x in MyString.split('\n')])
Out[19]: 'x.ABC,\ny.DEF,\nSome Long String,\nz.GHI,\n'
You can check this answer to understand why:
Comparing list comprehensions and explicit loops (3 array generators faster than 1 for loop)
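For reference, here is a minimal no-import sketch that reproduces the exact result requested in the question, including the leading space and no comma after the last line. The function name strip_after_and is made up for illustration, and the sketch assumes that everything from the first ' AND ' on each line should be dropped.

```python
def strip_after_and(text):
    # Keep the part of each non-empty line that precedes ' AND '.
    heads = [line.split(' AND ')[0].rstrip()
             for line in text.split('\n') if line.strip()]
    # Commas go between lines only; a single newline closes the string.
    return ',\n'.join(heads) + '\n'

my_string = (' x.ABC AND XYZ, \ny.DEF AND Type, \n'
             'Some Long String AND Qwerty, \nz.GHI AND Tree \n')
result = strip_after_and(my_string)
print(repr(result))
```

Filtering out empty lines before joining is what avoids the stray trailing comma seen in the other one-liner.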

Fastest way to split a concatenated string into a tuple and ignore empty strings

I have a concatenated string like this:
my_str = 'str1;str2;str3;'
and I would like to split it, convert the resulting list to a tuple, and get rid of any empty strings produced by the split (notice the trailing ';').
So far, I am doing this:
tuple(filter(None, my_str.split(';')))
Is there any more efficient (in terms of speed and space) way to do it?
How about this?
tuple(my_str.split(';')[:-1])
('str1', 'str2', 'str3')
You split the string at the ; character, and pass all of the substrings (except the last one, the empty string) to tuple to create the result tuple.
That is a very reasonable way to do it. Some alternatives:
foo.strip(";").split(";") (if there won't be any empty slices inside the string)
[ x.strip() for x in foo.split(";") if x.strip() ] (to strip whitespace from each slice)
The "fastest" way to do this will depend on a lot of things… but you can easily experiment with ipython's %timeit:
In [1]: foo = "1;2;3;4;"
In [2]: %timeit foo.strip(";").split(";")
1000000 loops, best of 3: 1.03 us per loop
In [3]: %timeit filter(None, foo.split(';'))
1000000 loops, best of 3: 1.55 us per loop
If you only expect an empty string at the end, you can do:
a = 'str1;str2;str3;'
tuple(a.split(';')[:-1])
or
tuple(a[:-1].split(';'))
Try tuple(my_str.split(';')[:-1])
Yes, that is quite a Pythonic way to do it. If you have a love for generator expressions, you could also replace the filter() with:
tuple(part for part in my_str.split(';') if part)
This has the benefit of allowing further processing on each part in-line.
It's interesting to note that the documentation for str.split() says:
... If sep is not specified or is None, any whitespace string is a
separator and empty strings are removed from the result.
I wonder why this special case was done, without allowing it for other separators...
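The difference described in that documentation excerpt is easy to see side by side. This is only an illustrative sketch with made-up sample strings.

```python
# No separator given: runs of whitespace count as one separator
# and empty strings never appear in the result.
by_whitespace = "a  b \n c ".split()
print(by_whitespace)

# Explicit separator: every occurrence splits, so adjacent or trailing
# separators produce empty strings.
by_semicolon = "a;;b;".split(";")
print(by_semicolon)
```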
Use split and then slicing:
my_str.split(';')[:-1]
or:
lis = [x for x in my_str.split(';') if x]
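If the number of items in your string is fixed, you could also destructure inline. The trailing ';' has to be stripped first, otherwise split returns a fourth, empty item and the unpacking fails:
(str1, str2, str3) = my_str.rstrip(";").split(";")
More on that here:
https://blog.teclado.com/destructuring-in-python/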
I know this is an old question, but I just came upon this and saw that the top answer (David) doesn't return a tuple like OP requested. Although the solution works for the one example OP gave, the highest voted answer (Levon) strips the trailing semicolon with a substring, which would error on an empty string.
The most robust and pythonic solution is voithos' answer:
tuple(part for part in my_str.split(';') if part)
Here's my solution:
tuple(my_str.strip(';').split(';'))
It returns this when run against an empty string though:
('',)
So I'll be replacing mine with voithos' answer. Thanks voithos!
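To make the edge cases discussed above concrete, here is a small sketch comparing the suggestions on an empty string and on a string with an empty field in the middle. The helper name split_nonempty is hypothetical.

```python
def split_nonempty(text, sep=";"):
    """Split text on sep and drop empty pieces (hypothetical helper)."""
    return tuple(part for part in text.split(sep) if part)

print(split_nonempty("str1;str2;str3;"))
print(split_nonempty(""))
print(split_nonempty("a;;b;"))

# For comparison, the slicing and strip-based suggestions on the same inputs:
sliced_empty = tuple("".split(";")[:-1])
stripped_empty = tuple("".strip(";").split(";"))
sliced_gap = tuple("a;;b;".split(";")[:-1])
print(sliced_empty, stripped_empty, sliced_gap)
```

Only the filtering version drops every empty piece regardless of where it occurs; the others depend on the empty string being exactly at the end.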

Python String Concatenation - concatenating '\n'

I am new to Python and need help understanding two problems I am running into with string concatenation. I am aware that strings can be concatenated using the + symbol, like so.
>>> 'a' + 'b'
'ab'
However, I recently found out (by accident, while fiddling around) that you do not even need the + symbol to concatenate strings, which leads to my first problem: how and why is this possible?
>>> print 'a' 'b'
ab
Furthermore, I understand that the '\n' string produces a newline. But when it is used in conjunction with my first problem, I get the following.
>>> print '\n' 'a'*7
a
a
a
a
a
a
a
So my second problem arises: why do I get 7 new lines of the letter 'a'? In other words, shouldn't the repetition operator, *, repeat the letter 'a' 7 times, as follows?
>>> print 'a'*7
aaaaaaa
Please help me clarify what is going on.
When "a" "b" is turned into "ab", this isn't the same as concatenating the strings with +. When the Python source code is being read, adjacent string literals are automatically joined for convenience.
This isn't a normal operation, which is why it isn't following the order of operations you expect for + and *.
print '\n' 'a'*7
is actually interpreted the same as
print '\na'*7
and not as
print '\n' + 'a'*7
Python concatenates strings together when you do not separate them with a comma:
>>> print 'a' 'b'
ab
>>> print 'a', 'b'
a b
So you are actually printing '\na' 7 times.
I'm not sure what you mean by "how is it possible". You write a rule: two strings next to each other get concatenated. Then you implement it in the parser. Why? Because it allows you to conveniently do things like this:
re.findall('(<?=(foo))'  # The first part of a complicated regexp
           '>asdas s'    # The next part
           '[^asd]'      # The last part
           )
That way, you can describe just what you're doing.
When you do A * B + C, the computer always does A times B first, then adds C, because multiplication comes before addition.
When you do string concatenation by putting the string literals next to each other, and multiplication, the special string concatenation comes first. This means '\n' 'a' * 7 is the same as ('\n' 'a') * 7, so the string you're repeating is '\na'.
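The precedence difference can be checked directly. The sketch below uses Python 3 syntax (print as a function), unlike the Python 2 snippets in the thread, and the variable names are illustrative.

```python
# Adjacent literals are merged before any operator is applied: ('\na') * 7
implicit = '\n' 'a' * 7
# With an explicit +, * binds tighter: '\n' + ('a' * 7)
explicit = '\n' + 'a' * 7

print(repr(implicit))
print(repr(explicit))
```

The first value contains seven newlines, one before each 'a'; the second contains a single newline followed by seven letters.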
You've probably already realised that relying on the implicit concatenation of adjacent strings is sometimes problematic. Also, concatenating with the + operator is not efficient. It's not noticeable if joining only a few small strings, but it is very noticeable at scale.
Be explicit about it; use ''.join()
print '\n'.join(['a'*7])

Split a string by a delimiter in python

How to split this string where __ is the delimiter
MATCHES__STRING
To get an output of ['MATCHES', 'STRING']?
For splitting specifically on whitespace, see How do I split a string into a list of words?.
To extract everything before the first delimiter, see Splitting on first occurrence.
To extract everything before the last delimiter, see partition string in python and get value of last segment after colon.
You can use the str.split method: string.split('__')
>>> "MATCHES__STRING".split("__")
['MATCHES', 'STRING']
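You may be interested in the csv module, which is designed for comma-separated files but can easily be configured with a custom delimiter. Note that csv only accepts a single-character delimiter, so it cannot split on "__" directly; the example below uses "|" instead.
import csv
csv.register_dialect("myDialect", delimiter="|")  # plus any other dialect options
lines = ["MATCHES|STRING"]
for row in csv.reader(lines, dialect="myDialect"):
    ...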
When the string has two or more elements (three in the example below), you can unpack them into comma-separated variables:
date, time, event_name = ev.get_text(separator='#').split("#")
After this line of code, the three variables hold the three parts of ev.
So, if ev contains this string and we use # as the separator:
Sa., 23. März#19:00#Klavier + Orchester: SPEZIAL
then after the split operation:
date will have the value Sa., 23. März
time will have the value 19:00
event_name will have the value Klavier + Orchester: SPEZIAL
Besides split and rsplit, there is partition/rpartition. It splits the string only once, but given the way the question was asked, it may apply as well.
Example:
>>> "MATCHES__STRING".partition("__")
('MATCHES', '__', 'STRING')
>>> "MATCHES__STRING".partition("__")[::2]
('MATCHES', 'STRING')
And it is a bit faster than split("_", 1):
$ python -m timeit "'validate_field_name'.split('_', 1)[-1]"
2000000 loops, best of 5: 136 nsec per loop
$ python -m timeit "'validate_field_name'.partition('_')[-1]"
2000000 loops, best of 5: 108 nsec per loop
Timeit lines are based on this answer
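A short sketch of how split and partition differ, in particular when the delimiter is missing or occurs more than once. The sample strings other than MATCHES__STRING are made up for illustration.

```python
text = "MATCHES__STRING"
print(text.split("__"))
head, sep, tail = text.partition("__")
print(head, tail)

# Delimiter missing: split returns a one-item list,
# partition always returns a 3-tuple with empty strings.
no_delim = "NODELIM"
split_missing = no_delim.split("__")
part_missing = no_delim.partition("__")

# Delimiter repeated: limit split with maxsplit, or cut at the last one.
many = "A__B__C"
first_only = many.split("__", 1)
last_only = many.rpartition("__")
print(split_missing, part_missing, first_only, last_only)
```

Because partition always returns three items, unpacking its result never fails, which can be convenient when the delimiter is optional.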
For Python 3.8, you actually don't need the get_text method; you can just use ev.split("#"). As a matter of fact, calling get_text on a plain string throws an AttributeError.
So if you have a string variable, for example:
filename = 'file/foo/bar/fox'
you can split it into separate variables with commas, as suggested in the comment above, but with a correction:
W, X, Y, Z = filename.split('/')
This gives:
W = 'file'
X = 'foo'
Y = 'bar'
Z = 'fox'
