How to print double quotes around a variable? - python

For instance, we have:
word = 'Some Random Word'
print '"' + word + '"'
Is there a better way to print double quotes around a variable?

Update:
From Python 3.6 onward, you can use f-strings:
>>> print(f'"{word}"')
"Some Random Word"
Original answer:
You can try %-formatting:
>>> print('"%s"' % word)
"Some Random Word"
OR str.format
>>> print('"{}"'.format(word))
"Some Random Word"
OR escape the quote character with \
>>> print("\"%s\"" % word)
"Some Random Word"
And if double quotes are not a requirement (i.e. single quotes would do):
>>> from pprint import pprint, pformat
>>> print(pformat(word))
'Some Random Word'
>>> pprint(word)
'Some Random Word'
OR, as others have already said, include the quotes in the declaration itself:
>>> word = '"Some Random Word"'
>>> print(word)
"Some Random Word"
Use whichever you feel to be better or less confusing.
And if you need to do it for multiple words, you might as well create a function:
def double_quote(word):
    return '"%s"' % word
print(double_quote(word), double_quote(word2))
And if you're concerned about the performance of these approaches (and you know what you're doing), see this comparison.
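If you would rather measure this on your own machine, here is a minimal sketch using the standard timeit module. The helper functions are hypothetical names introduced only for this comparison, and timings vary by machine and Python version, so no results are shown:

```python
import timeit

word = 'Some Random Word'

# Hypothetical helpers, one per quoting approach discussed above.
def with_concat(w):
    return '"' + w + '"'

def with_percent(w):
    return '"%s"' % w

def with_format(w):
    return '"{}"'.format(w)

def with_fstring(w):
    return f'"{w}"'

approaches = [with_concat, with_percent, with_format, with_fstring]

# Sanity check: every approach must produce exactly the same text.
assert len({f(word) for f in approaches}) == 1

for func in approaches:
    # Lowest total time over 3 repeats of 100,000 calls each.
    best = min(timeit.repeat(lambda: func(word), repeat=3, number=100_000))
    print(f'{func.__name__:>13}: {best:.4f} s')
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.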

How about json.dumps:
>>> import json
>>> print(json.dumps("hello world"))
"hello world"
The advantage over other approaches mentioned here is that it escapes quotes inside the string as well (take that str.format!), always uses double quotes and is actually intended for reliable serialization (take that repr()!):
>>> print(json.dumps('hello "world"!'))
"hello \"world\"!"

You can try repr:
Code:
word = "This is a random text"
print(repr(word))
Output:
'This is a random text'

It seems silly, but it works fine for me, and it's easy to read:
word = "Some Random Word"
quotes = '"'
print(quotes + word + quotes)

word = '"Some Random Word"' # <-- did you try this?

Using the format method or an f-string with the !r (repr) conversion, you can write it more elegantly:
a = "foo"
print("{!r}".format(a))
b = "bar"
print(f"{b!r}")
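Keep in mind that !r (like repr()) only uses double quotes when the string contains a single quote and no double quote; otherwise you get single quotes. A small sketch contrasting it with json.dumps from the standard library, using made-up sample strings:

```python
import json

samples = ['foo', "it's", 'say "hi"']

for s in samples:
    # !r delegates to repr(): the quote style depends on the content.
    via_repr = f'{s!r}'
    # json.dumps always emits double quotes and escapes embedded ones.
    via_json = json.dumps(s)
    print(via_repr, via_json)
# 'foo' "foo"
# "it's" "it's"
# 'say "hi"' "say \"hi\""
```

json.dumps also escapes non-ASCII characters by default; pass ensure_ascii=False if you want them kept as-is.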

Use escape sequence
Example:
int x = 10;
System.out.println("\"" + x + "\"");
Output:
"10"

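The example above is Java, not Python. A rough Python equivalent of the same escape-sequence idea is sketched below; note that Python will not implicitly convert an int when concatenating with +, so str() (or a formatting operator) is needed:

```python
x = 10

# \" is an escaped double quote inside a double-quoted literal.
# str() is required: Python does not concatenate str and int implicitly.
print("\"" + str(x) + "\"")

# Equivalent without concatenation, using %-formatting.
print("\"%d\"" % x)
```

Both lines print "10" wrapped in double quotes.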

Find first word in string Python

I have to write a single function that should return the first word in the following strings:
("Hello world") -> return "Hello"
(" a word ") -> return "a"
("don't touch it") -> return "don't"
("greetings, friends") -> return "greetings"
("... and so on ...") -> return "and"
("hi") -> return "hi"
All of them have to return the first word, and as you can see, some start with whitespace, contain apostrophes, or end with commas.
I've used the following options:
return text.split()[0]
return re.split(r'\w*', text)[0]
Both fail on some of the strings. Can anyone help?
Try the code below. I tested it with all your inputs and it works fine.
import re
text = ["Hello world", " a word ", "don't touch it", "greetings, friends", "... and so on ...", "hi"]
# compile once, outside the loop; raw string so \w reaches the regex engine intact
rgx = re.compile(r"(\w[\w']*\w|\w)")
for i in text:
    out = rgx.findall(i)
    print(out[0])
Output:
Hello
a
don't
greetings
and
hi
It is tricky to distinguish apostrophes that are part of a word from single quotes used as punctuation. But since your input examples contain no single quotes, I can go with this:
re.match(r'\W*(\w[^,. !?"]*)', text).groups()[0]
This works for all your examples. It won't work for atypical input like "'tis all in vain!", though. It assumes that words end at commas, periods, spaces, exclamation marks, question marks, and double quotes. This list can be extended as needed (inside the brackets).
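Wrapped in a function and run over the inputs from the question, the pattern above might be used like this (the function name first_word is only illustrative). re.match returns None when the string has no word characters at all, so this sketch returns None in that case instead of raising an AttributeError:

```python
import re

# Skip leading non-word characters, then capture a word character followed by
# anything that is not one of the listed terminators.
FIRST_WORD = re.compile(r'\W*(\w[^,. !?"]*)')

def first_word(text):
    m = FIRST_WORD.match(text)
    return m.group(1) if m else None

tests = ["Hello world", " a word ", "don't touch it",
         "greetings, friends", "... and so on ...", "hi"]
for t in tests:
    print(repr(t), '->', first_word(t))
```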
Try this one:
>>> import re
>>> def pm(s):
...     p = r"[a-zA-Z][\w']*"
...     m = re.search(p, s)
...     print(m.group(0))
...
Test results:
>>> pm("don't touch it")
don't
>>> pm("Hello w")
Hello
>>> pm("greetings, friends")
greetings
>>> pm("... and so on...")
and
>>> pm("hi")
hi
A non-regex solution: strip leading punctuation/whitespace characters, split the string to get the first word, then remove trailing punctuation/whitespace:
from string import punctuation, whitespace
def first_word(s):
    to_strip = punctuation + whitespace
    return s.lstrip(to_strip).split(' ', 1)[0].rstrip(to_strip)
tests = [
    "Hello world",
    "a word",
    "don't touch it",
    "greetings, friends",
    "... and so on ...",
    "hi"]
for test in tests:
    print('#{}#'.format(first_word(test)))
Outputs:
#Hello#
#a#
#don't#
#greetings#
#and#
#hi#
You can try something like this:
import re
pattern = r"[a-zA-Z']+"
def first_word(words_tuple):
    match = re.findall(pattern, words_tuple)
    for i in match:
        if i[0].isalnum():
            return i
print(first_word(("don't touch it")))
Output:
don't
I've done this by using the first occurrence of whitespace to stop collecting the first word. Something like this:
stringVariable = "whatever sentence"
firstWord = ""
stringVariableLength = len(stringVariable)
for i in range(0, stringVariableLength):
    if stringVariable[i] != " ":
        firstWord = firstWord + stringVariable[i]
    else:
        break
This code walks through the string you want the first word of and appends each character to a new variable called firstWord until it reaches the first space. I'm not exactly sure how you would put that into a function as I'm pretty new to this, but I'm sure it could be done!
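The loop above can be turned into a function fairly directly. This sketch (the name first_word is just illustrative) also skips leading whitespace, which the original loop does not, because otherwise " a word " would stop immediately and return an empty string. It uses str.isspace() instead of comparing with " ", and it still keeps trailing punctuation such as the comma in "greetings,", so it only solves the whitespace part of the question:

```python
def first_word(sentence):
    # Drop leading whitespace so " a word " does not yield an empty result.
    sentence = sentence.lstrip()
    first = ""
    for ch in sentence:
        # Stop at the first whitespace character after the word starts.
        if ch.isspace():
            break
        first += ch
    return first

print(first_word("Hello world"))   # Hello
print(first_word(" a word "))      # a
```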

Get string between 2 other strings - Python 2.7.8

So I have a huge string, where some strings occur a lot. I need the text in between.
"I don't need this""This is what I need""I also don't need this."
This happens many times, and I'd like all the strings I need in a list.
There are also a lot of special characters, but no single quotes ('), so I can use single quotes to delimit the strings.
I have tried with the re library, but I can't get it to work.
I tried splitting too:
listy = hugestring.split('delim1')
for element in listy:
    element = element.split('delim2')
But the second splitting doesn't work.
You could use a regex like this:
>>> import re
>>> your_str = "foo This is what I need bar foo This is what I need too bar"
>>> left_delim = "foo "
>>> right_delim = " bar"
>>> pattern = r"(?<={})[ \w]*?(?={})".format(left_delim, right_delim)
>>> re.findall(pattern,your_str)
['This is what I need', 'This is what I need too']
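Since the question mentions lots of special characters, a character class like [ \w] may not cover the text you need, and delimiters containing regex metacharacters would break the pattern. A more general sketch escapes the delimiters with re.escape and captures everything in between non-greedily; the function name between and the sample delimiters are made up for illustration:

```python
import re

def between(text, left, right):
    # re.escape makes characters like [ ( $ . * match literally.
    # (.*?) is non-greedy, so each match stops at the nearest right delimiter;
    # re.DOTALL lets the captured text span newlines.
    pattern = re.escape(left) + r'(.*?)' + re.escape(right)
    return re.findall(pattern, text, flags=re.DOTALL)

s = "[[start]] a$b (c) [[end]] junk [[start]]x.y[[end]]"
print(between(s, "[[start]]", "[[end]]"))
# [' a$b (c) ', 'x.y']
```

Because the pattern has exactly one group, re.findall returns just the captured text for each match.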
This will give you a list of all strings within quotes contained in a string:
import re
in_str = "I don't need this\"This is what I need\"I also don't need this."
out_str = re.findall(r'\"(.+?)\"', in_str)
print(out_str)
So in the above example, print(out_str[0]) will give you what you need, as there's only one quoted section in there.
This is the result of what you describe in your comment, so what is the problem now?
>>> s = "I don't need thisThis is what I needI also don't need this."
>>> n = s.split("I don't need this")
>>> n
['', "This is what I needI also don't need this."]
>>> [i.split("I also don't need this") for i in n]
[[''], ['This is what I need', '.']]

How do I replace punctuation in a string in Python?

I would like to replace (and not remove) all punctuation characters with " " in a string in Python.
Is there something efficient of the following flavour?
text = text.translate(string.maketrans("",""), string.punctuation)
This answer is for Python 2 and will only work for ASCII strings:
The string module contains two things that will help you: a list of punctuation characters and the "maketrans" function. Here is how you can use them:
import string
replace_punctuation = string.maketrans(string.punctuation, ' '*len(string.punctuation))
text = text.translate(replace_punctuation)
Modified solution from Best way to strip punctuation from a string in Python
import string
import re
regex = re.compile('[%s]' % re.escape(string.punctuation))
out = regex.sub(' ', "This is, fortunately. A Test! string")
# out = 'This is  fortunately  A Test  string'
This workaround works in Python 3:
import string
ex_str = 'SFDF-OIU .df !hello.dfasf sad - - d-f - sd'
# map each punctuation character (len(string.punctuation) == 32) to a space
table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
res = ex_str.translate(table)
# res = 'SFDF OIU  df  hello dfasf sad     d f   sd'
There is a more robust solution that relies on a regex exclusion rather than inclusion through an extensive list of punctuation characters:
import re
print(re.sub(r'[^\w\s]', '', 'This is, fortunately. A Test! string'))
#Output - 'This is fortunately A Test string'
The regex matches anything that is not a word character (letter, digit, or underscore) or a whitespace character.
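The question asks to replace punctuation with a space rather than delete it; the same exclusion idea works with ' ' as the replacement. Because \w also matches the underscore, this sketch adds _ explicitly on the assumption that you want it treated as punctuation (drop |_ otherwise), and then optionally collapses the resulting runs of spaces:

```python
import re

text = 'This is, fortunately. A Test! string_with_underscores'

# [^\w\s] matches anything that is neither a word character nor whitespace;
# |_ additionally matches the underscore, which \w would otherwise keep.
spaced = re.sub(r'[^\w\s]|_', ' ', text)
print(spaced)

# Optional: collapse the runs of whitespace introduced above.
print(re.sub(r'\s+', ' ', spaced).strip())
```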
Replace by ''?
What's the difference between translating all ; into '' and removing all ;?
Here is how to remove all ; (Python 2):
s = 'dsda;;dsd;sad'
table = string.maketrans('','')
string.translate(s, table, ';')
And you can do your replacement with translate.
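The snippet above is Python 2 (string.maketrans and string.translate). In Python 3 the same two operations use str.maketrans: its optional third argument lists characters to delete, while equal-length first and second arguments map characters to replacements. A short sketch:

```python
s = 'dsda;;dsd;sad'

# Remove every ';' (the third argument lists characters to delete).
removed = s.translate(str.maketrans('', '', ';'))

# Replace every ';' with a space (equal-length "from" and "to" strings).
replaced = s.translate(str.maketrans(';', ' '))

print(removed)   # dsdadsdsad
print(replaced)  # dsda  dsd sad
```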
In my case, I removed "+" and "&" from the punctuation list:
import re
import string
all_punctuations = string.punctuation
selected_punctuations = re.sub(r'(\&|\+)', "", all_punctuations)
print(selected_punctuations)
text = "he+llo* ithis& place% if you * here ##"
punctuation_regex = re.compile('[%s]' % re.escape(selected_punctuations))
punc_free = punctuation_regex.sub("", text)
print(punc_free)
Result: he+llo ithis& place if you here

Remove all special characters, punctuation and spaces from string

I need to remove all special characters, punctuation and spaces from a string so that I only have letters and numbers.
This can be done without regex:
>>> string = "Special $#! characters spaces 888323"
>>> ''.join(e for e in string if e.isalnum())
'Specialcharactersspaces888323'
You can use str.isalnum:
S.isalnum() -> bool
Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.
If you insist on using regex, other solutions will do fine. However note that if it can be done without using a regular expression, that's the best way to go about it.
Here is a regex to match a string of characters that are not letters or numbers:
[^A-Za-z0-9]+
Here is the Python command to do a regex substitution:
re.sub('[^A-Za-z0-9]+', '', mystring)
Shorter way:
import re
cleanString = re.sub(r'\W+', '', string)
If you want spaces between words and numbers, substitute '' with ' '.
TL;DR
I timed the provided answers.
import re
re.sub(r'\W+', '', string)
is typically about 3x faster than the ''.join(... e.isalnum()) approach from the top answer.
Caution should be taken when using this option. Some special characters (e.g. ø) may not be stripped using this method.
After seeing this, I was interested in expanding on the provided answers by finding out which executes in the least amount of time, so I went through and checked some of the proposed answers with timeit against two of the example strings:
string1 = 'Special $#! characters spaces 888323'
string2 = 'how much for the maple syrup? $20.99? That s ridiculous!!!'
Example 1
''.join(e for e in string if e.isalnum())
string1 - Result: 10.7061979771
string2 - Result: 7.78372597694
Example 2
import re
re.sub('[^A-Za-z0-9]+', '', string)
string1 - Result: 7.10785102844
string2 - Result: 4.12814903259
Example 3
import re
re.sub(r'\W+', '', string)
string1 - Result: 3.11899876595
string2 - Result: 2.78014397621
The figures above are the lowest of the totals returned by timeit's repeat(3, 2000000), i.e. three repeats of 2,000,000 runs each.
Example 3 can be 3x faster than Example 1.
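A minimal Python 3 sketch of how such a comparison can be reproduced with timeit. The repeat and number values here are reduced to keep the run short, and absolute timings will differ by machine and Python version, so no numbers are shown:

```python
import re
import timeit

string1 = 'Special $#! characters spaces 888323'

# One zero-argument callable per approach being compared.
candidates = {
    'isalnum join': lambda: ''.join(e for e in string1 if e.isalnum()),
    '[^A-Za-z0-9]+': lambda: re.sub(r'[^A-Za-z0-9]+', '', string1),
    r'\W+': lambda: re.sub(r'\W+', '', string1),
}

for name, func in candidates.items():
    # Lowest total time over 3 repeats of 20,000 calls each.
    best = min(timeit.repeat(func, repeat=3, number=20_000))
    print(f'{name:>15}: {best:.4f} s')
```

For this particular input all three produce the same result, which is worth checking before comparing speed; on other inputs they can differ (for example, \W+ keeps underscores).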
Python 2.*
I think just filter(str.isalnum, string) works
In [20]: filter(str.isalnum, 'string with special chars like !,#$% etcs.')
Out[20]: 'stringwithspecialcharslikeetcs'
Python 3.*
In Python 3, filter() returns an iterator (not a string, unlike above), so you have to join the result back into a string:
''.join(filter(str.isalnum, string))
or, to pass a list to join (possibly a bit faster, but not verified):
''.join([*filter(str.isalnum, string)])
Note: unpacking with [*args] is valid from Python 3.5 onward.
#!/usr/bin/python
import re
strs = "how much for the maple syrup? $20.99? That's ridiculous!!!"
print(strs)
nstr = re.sub(r'[?|$|.|!]', r'', strs)
print(nstr)
nestr = re.sub(r'[^a-zA-Z0-9 ]', r'', nstr)
print(nestr)
You can add more special characters to the brackets; they will be replaced by '' (nothing), i.e. they will be removed.
Unlike the other regex answers, I would exclude every character that is not what I want, instead of explicitly enumerating what I don't want.
For example, if I want only characters from 'a to z' (upper and lower case) and numbers, I would exclude everything else:
import re
s = re.sub(r"[^a-zA-Z0-9]","",s)
This means "substitute every character that is not a number, or a character in the range 'a to z' or 'A to Z' with an empty string".
In fact, placing the special character ^ first inside the brackets negates the character class.
Extra tip: if you also need the result in lowercase, you can lowercase first and simplify the regex, since no uppercase letters will remain:
import re
s = re.sub(r"[^a-z0-9]","",s.lower())
string.punctuation contains the following characters:
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
You can use str.translate and str.maketrans to map punctuation characters to None, i.e. delete them:
import string
'This, is. A test!'.translate(str.maketrans('', '', string.punctuation))
Output:
'This is A test'
s = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", s)
Assuming you want to use a regex and you want/need Unicode-cognisant 2.x code that is 2to3-ready:
>>> import re
>>> rx = re.compile(u'[\W_]+', re.UNICODE)
>>> data = u''.join(unichr(i) for i in range(256))
>>> rx.sub(u'', data)
u'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb2 [snip] \xfe\xff'
>>>
The most generic approach is using the 'categories' of the unicodedata table, which classifies every single character. E.g. the following code keeps only printable characters based on their category:
import unicodedata
# strip off unwanted characters (based on the Unicode database
# categorization:
# http://www.sql-und-xml.de/unicode-database/#kategorien)
PRINTABLE = set(('Lu', 'Ll', 'Nd', 'Zs'))
def filter_non_printable(s):
    result = []
    for c in s:
        # keep characters in an allowed category, replace everything else with a space
        c = c if unicodedata.category(c) in PRINTABLE else u' '
        result.append(c)
    return u''.join(result)
See the URL above for all related categories. You can of course also filter by the punctuation categories.
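For example, filtering by the punctuation categories could look like this minimal sketch (strip_punctuation and its replacement parameter are illustrative names). All Unicode punctuation categories start with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po), so checking the first letter of unicodedata.category() covers them all. Symbols such as $ or + are in the S* categories and are kept:

```python
import unicodedata

def strip_punctuation(s, replacement=''):
    # unicodedata.category() returns a two-letter code; every punctuation
    # category (Pc, Pd, Ps, Pe, Pi, Pf, Po) starts with 'P'.
    return ''.join(
        replacement if unicodedata.category(ch).startswith('P') else ch
        for ch in s
    )

print(strip_punctuation('¿Qué tal? «Très bien!» costs $5'))
# Qué tal Très bien costs $5
```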
For other languages like German, Spanish, Danish, French, etc. that contain special letters (like the German umlauts ü, ä, ö), simply add these to the regex character class, in both upper and lower case:
Example for German:
re.sub('[^A-ZÜÖÄa-züöä0-9]+', '', mystring)
This will remove all special characters, punctuation, and spaces from a string and only have numbers and letters.
import re
sample_str = "Hel&&lo %% Wo$#rl#d"
# using isalnum()
print("".join(k for k in sample_str if k.isalnum()))
# using regex
op2 = re.sub("[^A-Za-z0-9]", "", sample_str)
print("op2 =", op2)
special_char_list = ["$", "#", "#", "&", "%"]
# using list comprehension
op1 = "".join([k for k in sample_str if k not in special_char_list])
print("op1 =", op1)
# using lambda function
op3 = "".join(filter(lambda x: x not in special_char_list, sample_str))
print("op3 =", op3)
Use translate:
import string
def clean(instr):
return instr.translate(None, string.punctuation + ' ')
Caveat: this is Python 2 code and only works on ASCII (byte) strings.
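In Python 3, str.translate no longer takes a deletechars argument; a rough equivalent builds the deletion table once with str.maketrans. This works on any str, though string.punctuation itself only lists ASCII punctuation:

```python
import string

# Third argument of str.maketrans: characters mapped to None (deleted).
_DELETE = str.maketrans('', '', string.punctuation + ' ')

def clean(instr):
    return instr.translate(_DELETE)

print(clean('Special $#! characters spaces 888323'))
# Specialcharactersspaces888323
```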
This will remove all non-alphanumeric characters except spaces.
string = "Special $#! characters spaces 888323"
print(''.join(e for e in string if (e.isalnum() or e.isspace())))
Special  characters spaces 888323
import re
my_string = """Strings are amongst the most popular data types in Python. We can create the strings by enclosing characters in quotes. Python treats single quotes the
same as double quotes."""
# count the word "python", whether or not it ends with ',' or '.'
text = my_string.lower().split()
for count, i in enumerate(text):
    if i.endswith((".", ",")):
        text[count] = re.sub(r"^([a-z]+)(.)?$", r"\1", i)
print("The count of Python : ", text.count("python"))
Ten years later, here is what I think is the best solution.
With the clean-text package you can remove special characters, punctuation, currency symbols and spaces from the string:
from cleantext import clean
string = 'Special $#! characters spaces 888323'
new = clean(string,lower=False,no_currency_symbols=True, no_punct = True,replace_with_currency_symbol='')
print(new)
Output ==> 'Special characters spaces 888323'
You can also remove the spaces if you want:
update = new.replace(' ','')
print(update)
Output ==> 'Specialcharactersspaces888323'
function regexFuntion(st) {
  const regx = /[^\w\s]/gi; // keep only [a-zA-Z0-9_] and whitespace
  st = st.replace(regx, ''); // remove everything else
  st = st.replace(/\s\s+/g, ' '); // collapse runs of whitespace into a single space
  return st;
}
console.log(regexFuntion('$Hello; # -world--78asdf+-===asdflkj******lkjasdfj67;'));
// Output: Hello world78asdfasdflkjlkjasdfj67
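For comparison, a rough Python translation of the JavaScript function above using re.sub (regex_function is just a Pythonic rename). As with the JS version, \w keeps the underscore; unlike JS, Python's \w on str patterns also matches non-ASCII letters:

```python
import re

def regex_function(st):
    st = re.sub(r'[^\w\s]', '', st)  # drop anything not a word char or whitespace
    st = re.sub(r'\s\s+', ' ', st)   # collapse runs of 2+ whitespace chars
    return st

print(regex_function('$Hello; # -world--78asdf+-===asdflkj******lkjasdfj67;'))
# Hello world78asdfasdflkjlkjasdfj67
```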
abc = "askhnl#$%askdjalsdk"
ddd = abc.replace("#$%", "")
print(ddd)
Output:
askhnlaskdjalsdk
Note that str.replace only removes the exact sequence "#$%", not each of those characters individually.

splitting merged words in python

I am working with a text where all "\n"s have been deleted (which merges two words into one, like "I like bananasAnd this is a new line.And another one."). What I would like to do now is tell Python to look for a lowercase letter followed by an uppercase letter, or punctuation followed by an uppercase letter, and insert a space between them.
I thought this would be easy with regular expressions, but it is not: I couldn't find an "insert" function or anything, and the string methods don't seem helpful either. How do I do this?
Any help would be greatly appreciated, I am despairing over here...
Thanks, patrick
Try the following:
re.sub(r"([a-z\.!?])([A-Z])", r"\1 \2", your_string)
For example:
import re
lines = "I like bananasAnd this is a new line.And another one."
print(re.sub(r"([a-z\.!?])([A-Z])", r"\1 \2", lines))
# I like bananas And this is a new line. And another one.
If you want to insert a newline instead of a space, change the replacement to r"\1\n\2".
Using re.sub you should be able to make a pattern that grabs a lowercase letter (plus optional punctuation) and an uppercase letter, and replaces them with the same characters but with a space in between:
import re
re.sub(r'([a-z][.?]?)([A-Z])', r'\1 \2', mystring)
You're looking for the sub function. See http://docs.python.org/library/re.html for documentation.
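Regarding the missing "insert" function: re.sub can also match an empty position between two characters using lookbehind and lookahead assertions, so nothing is consumed and the replacement text is simply inserted at that spot. A sketch using the same character sets as the first answer:

```python
import re

lines = "I like bananasAnd this is a new line.And another one."

# (?<=[a-z.!?]) : the position is preceded by a lowercase letter or . ! ?
# (?=[A-Z])     : the position is followed by an uppercase letter
# The match is zero-width, so ' ' is inserted there and nothing is removed.
print(re.sub(r'(?<=[a-z.!?])(?=[A-Z])', ' ', lines))
# I like bananas And this is a new line. And another one.
```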
Hmm, interesting. You can use regular expressions to replace text with the sub() function:
>>> import re
>>> string = 'fooBar'
>>> re.sub(r'([a-z][.!?]*)([A-Z])', r'\1 \2', string)
'foo Bar'
If you really don't have any caps except at the beginning of a sentence, it will probably be easiest to just loop through the string.
>>> import string
>>> s = "a word endsA new sentence"
>>> lastend = 0
>>> sentences = list()
>>> for i in range(0, len(s)):
...     if s[i] in string.ascii_uppercase:
...         sentences.append(s[lastend:i])
...         lastend = i
...
>>> sentences.append(s[lastend:])
>>> print(sentences)
['a word ends', 'A new sentence']
Here's another approach, which avoids regular expressions and does not use any imported libraries, just built-ins...
s = "I like bananasAnd this is a new line.And another one."
with_whitespace = ''
last_was_upper = True
for c in s:
    if c.isupper():
        if not last_was_upper:
            with_whitespace += ' '
        last_was_upper = True
    else:
        last_was_upper = False
    with_whitespace += c
print(with_whitespace)
Yields:
I like bananas And this is a new line. And another one.
