Replace string with quotes, brackets, braces, and slashes in python - python

I have a string where I am trying to replace ["{\" with [{" and all \" with ".
I am struggling to find the right syntax for this. Does anyone have a solid understanding of how to do it?
I am working with JSON, and I am inserting a string into the JSON properties. This put single quotes around the data inserted from my variable, and I need those quotes gone. I tried calling json.dumps() on the data and then doing a string replace, but it does not work.
Any help is appreciated. Thank you.

You can use the string replace() method.
See the Python documentation for str.replace() for details and examples.

I would recommend maybe posting more of your code below so we can suggest a better answer. Just based on the information you have provided, I would say that what you are looking for are escape characters. I may be able to help more once you provide us with more info!

Use the target/replacement strings as arguments to replace().
The general format is mystring = mystring.replace("old_text", "new_text")
Since your target strings have backslashes, you also probably want to use raw strings to prevent them from being interpreted as special characters.
mystring = "something"
mystring = mystring.replace(r'["{\"', '[{"')
mystring = mystring.replace(r'\"', '"')
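As a rough illustration of the two calls above, here is a minimal sketch on a made-up input. The sample value of mystring and the third replace() (for the closing quote, which the question does not mention) are assumptions, not something taken from the original post.

```python
import json

# Hypothetical sample: a JSON array whose only element is a string of escaped JSON.
mystring = r'{"items": ["{\"name\": \"a\"}"]}'

# Raw strings keep the backslash literal, so r'\"' is two characters: \ and ".
mystring = mystring.replace(r'["{\"', '[{"')
mystring = mystring.replace(r'\"', '"')

# Assumption: the matching closing quote before ] has to go as well.
mystring = mystring.replace('}"]', '}]')

data = json.loads(mystring)
print(data)
```

If the embedded value is itself valid JSON, decoding it with json.loads() instead of editing the text may turn out to be more robust than chained replacements.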

If you want to replace a sequence of two or more characters by hand, you have to look for the first character and then check that the second one comes immediately after it (and so on for the rest of the sequence). Whenever the whole sequence matches, shorten the string accordingly: in the first case drop the extra characters so that ["{\" becomes [{", and in the second case just delete the \ in front of the quote.
You can also locate the substring with a built-in function and use replace() to insert the string you want in its place.

Related

Format a string to a proper JSON object

I have a string (from an API call) that looks something like this:
val=
{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},
{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}
I need to do temp=json.loads(val); but the problem is that the string is not a valid JSON. The keys and values do not have the quotes around them. I tried explicitly putting the quotes and that worked.
How can I programmatically add the quotes to such a string before reading it as JSON?
Also, how can I replace the numbers in scientific notation with decimals? E.g. 0d-2 becomes "0" and 8.0d-1 becomes "0.8".
You could catch anything that's a string with a regex and replace it accordingly.
Assuming your strings that need quotes:
start with a letter
can have numbers at the end
never start with numbers
never have numbers or special characters in between them
This would be a regex code to catch them:
([a-z]*\d*):
You can try it out here. Or learn more about regex here.
Let's do it in python:
import re
# catch a string in json
json_string = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
               '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')  # note the single quotes!
# search the strings according to our rule (raw string so \d reaches the regex engine)
string_search = re.search(r'([a-z]*\d*):', json_string)
# extract the first capture group; so everything we matched in brackets
# this is to exclude the colon at the end from the found string as
# we don't want to enquote the colons as well
extracted_strings = string_search.group(1)
This is a solution in case you will build a loop later.
However if you just want to catch all possible strings in python as a list you can do simply the following instead:
import re
# catch ALL strings in json
json_string = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
               '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')  # note the single quotes!
extract_all_strings = re.findall(r'([a-z]*\d*):', json_string)
# note that this by default catches only our capture group in brackets
# so no extra step required
That covers the regex basics and how to find every match.
With these basics you could either use re.sub to replace each match with itself wrapped in quotes, or first generate a list of replacements and verify that everything went right (probably what you would want to do, since this approach is a little fragile).
Note that this is why I wrote a fairly comprehensive answer instead of just pointing you to a "re.sub" one-liner.
You can approach your scientific number notation problem in the same way.
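The re.sub route mentioned above could look roughly like this. It is only a sketch under assumptions: the helper name d_to_decimal and both patterns are made up for this example, bare words are assumed to contain only letters, digits and underscores, and the converted numbers are emitted as JSON numbers rather than strings.

```python
import json
import re

# The sample from the question, split over two literals for readability.
val = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
       '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')

def d_to_decimal(match):
    # Hypothetical helper: "7.0" and "-1" become float("7.0e-1"), printed as a decimal.
    mantissa, exponent = match.groups()
    return repr(float(f"{mantissa}e{exponent}"))

# Step 1: rewrite d-style exponents (0d-2, 8.0d-1) as plain decimals.
# The lookbehind keeps digits inside names such as num1 from starting a match.
step1 = re.sub(r'(?<![\w.])(\d+(?:\.\d+)?)d([+-]?\d+)', d_to_decimal, val)

# Step 2: wrap every bare word that is not already in double quotes.
step2 = re.sub(r'(?<!")\b([A-Za-z_]\w*)\b(?!")', r'"\1"', step1)

data = json.loads(step2)
print(data["matches"][1]["output"])
```

Note that this sketch would also quote true, false and null, and it does not handle quoted strings that contain spaces or colons, so checking the intermediate strings before calling json.loads() is probably worthwhile.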

Get the string within brackets and remove useless string in Python

I have a string like this '0x69313430303239377678(i1400297vx)' I only want the value i1400297vx and nothing else.
Is there a simple way, for example using the strip method, or am I forced to use a regex, which I'm not good at?
Could someone kindly help me?
This works, using split and strip:
'0x69313430303239377678(i1400297vx)'.split('(')[1].strip(')')
but a regex would be more readable!
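For comparison, a regex version might look like the sketch below. The pattern is one possible choice, assuming the value never contains a closing parenthesis itself.

```python
import re

s = '0x69313430303239377678(i1400297vx)'

# Capture everything between the first "(" and the next ")".
match = re.search(r'\(([^)]*)\)', s)
value = match.group(1) if match else None
print(value)
```

Unlike the split/strip version, this returns None instead of raising an IndexError when the string has no parentheses.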

regex search&replace a variable string including a regex statement

I want to use re.sub to replace a part of a string, and I know exactly what that part looks like. Relevant part of the code:
print "Regex statement: ", foundStatements[iterator]
print "string to replace with : \n", latexPreparedString
print "string to search&replace in: \n", fileAsString
processedString = re.sub(foundStatements[iterator], latexPreparedString, fileAsString)
print "processed string: \n", processedString
In my testing case, foundStatements[iterator] is "%#import script_example.py ( *out =(.|\n)*?return out)" But even though processedString contains foundStatements[iterator], processedString looks exactly like fileAsString, so it hasn't accomplished the re.sub task. What am I doing wrong?
EDIT: Ok, it definitely has something to do with the string I'm searching to replace containing regex code. Is there a way to make it just interpret it foundStatements[iterator] as a raw string to search for? The only solution I can think of is to create a function that replaces any regex symbols in a string with \regexsymbol (e.g. * -> \*), but it'd make sense for there to be a way to solve this with inbuilt functions. It'd also be a bit overkill since I'd have to make sure it works with every single regex symbol, of which there are quite a few :/
EDIT2: Well, just changing it to re.sub(re.escape(foundStatements[iterator]), latexPreparedString, fileAsString) seems to work, except when the regex statement doesn't hit anything in the original file. To explain, latexPreparedString is generated by using the regex part of foundStatements[iterator]. While it's logical that it shouldn't be able to set latexPreparedString to anything when the regex statement doesn't hit anything, I set latexPreparedString = "" by default, so in that case re.sub should replace it with a blank string. Here's how the code looks at the moment: pastebin.com/wUedK3LN
First, for replacing an exact match in a string, you should use str.replace():
processedString = fileAsString.replace(foundStatements[iterator], latexPreparedString)
However, this will still fail in your case, because foundStatements[iterator] contains \n, which Python turns into a real newline character in an ordinary string literal. To keep the backslash and the n as literal characters, use the r prefix (a raw string) when declaring foundStatements[iterator].
If you still want to use re.sub, you have to both prefix the string with r and use re.escape(foundStatements[iterator]) instead of foundStatements[iterator]. The documentation of the re module describes re.escape in more detail.
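A small sketch comparing the two approaches. The snake_case names and the sample values are hypothetical stand-ins for the variables in the question. One extra detail that is an addition here, not part of the answer above: re.sub also interprets backslashes in the replacement string, which matters for LaTeX output, so the sketch passes the replacement through a function.

```python
import re

# Hypothetical stand-ins for the variables in the question.
found_statement = r"%#import script_example.py ( *out =(.|\n)*?return out)"
file_as_string = "intro\n" + found_statement + "\noutro"
latex_prepared_string = r"\textbf{result}"

# Plain substring replacement: no regex involved at all.
via_replace = file_as_string.replace(found_statement, latex_prepared_string)

# Regex replacement of the same literal text. The lambda returns the
# replacement untouched, so its backslashes are not treated as escapes.
via_sub = re.sub(re.escape(found_statement),
                 lambda m: latex_prepared_string,
                 file_as_string)

print(via_replace == via_sub)
```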

A simple regexp in python

My program is a simple calculator, so I need to parse the expression the user types in order to make the input more user-friendly. I know I can do it with regular expressions, but I'm not familiar enough with them.
So I need to transform an input like this:
import re
input_user = "23.40*1200*(12.00-0.01)*MM(H2O)/(8.314 *func(2*x+273.15,x))"
re.some_stuff( ,input_user) # ????
in this:
"23.40*1200*(12.00-0.01)*MM('H2O')/(8.314 *func('2*x+273.15',x))"
just adding these single quotes inside the parentheses. How can I do that?
UPDATE:
To be more clear, I want to add a single quote after every sequence of characters "MM(" and before the ")" which comes after it, and after every sequence of characters "func(" and before the "," which comes after it.
This is the sort of thing where regexes can work, but they can potentially result in major problems unless you consider exactly what your input will be like. For example, can whatever is inside MM(...) contain parentheses of its own? Can the first expression in func( contain a comma? If the answer to both questions is no, then the following could work:
input_user2 = re.sub(r'MM\(([^\)]*)\)', r"MM('\1')", input_user)
output = re.sub(r'func\(([^,]*),', r"func('\1',", input_user2)
However, this will not work if the answer to either question is yes, and even without that could cause problems depending upon what sort of inputs you expect to receive. Essentially, the first re.sub here looks for MM( ('MM('), followed by any number (including 0) of characters that aren't a close-parenthesis ('([^)]*)') that are then stored as a group (caused by the extra parentheses), and then a close-parenthesis. It replaces that section with the string in the second argument, where \1 is replaced by the first and only group from the pattern. The second re.sub works similarly, looking for any number of characters that aren't a comma.
If the answer to either question is yes, then regexps aren't appropriate for the parsing, as your language would not be regular. The answer to this question, while discussing a different application, may give more insight into that matter.
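For the simple case, where the answer to both questions is no, the two substitutions can be chained on the input from the question. This is just a sketch of that case; the variable names follow the answer above.

```python
import re

input_user = "23.40*1200*(12.00-0.01)*MM(H2O)/(8.314 *func(2*x+273.15,x))"

# Quote the argument of MM(...), assuming it contains no parentheses.
input_user2 = re.sub(r'MM\(([^\)]*)\)', r"MM('\1')", input_user)

# Quote the first argument of func(...), assuming it contains no comma.
# Note that the second call works on the result of the first one.
output = re.sub(r'func\(([^,]*),', r"func('\1',", input_user2)

print(output)
```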

Python: what kind of literal delimiter is "better" to use?

What is the best literal delimiter in Python and why? Single ' or double "? And most important, why?
I'm a beginner in Python and I'm trying to stick with just one. I know that in PHP, for example " is preferred, because PHP does not try to search for the 'string' variable. Is it the same in Python?
' because it's one keystroke less than ". Save your wrists!
They're otherwise identical (except you have to escape whichever you choose to use, if they appear inside the string).
Consider these strings:
"Don't do that."
'I said, "okay".'
"""She said, "That won't work"."""
Which quote is "best"?
Semantically there is no difference in Python; use either. Python also provides the handy triple string delimiter """ or ''' which can simplify multi-line quotes. There is also the raw string literal (r"..." or r'...') to inhibit \ escapes. The Language Reference has all the details.
For string constants containing a single quote use the double quote as delimiter.
The other way around, if you need a double quote inside.
Quick, shiftless typing leads to single quote delimiters.
>>> "it's very simple"
>>> 'reference to the "book"'
Single and double quotes act identically in Python. Escapes (\n) always work, and there is no variable interpolation. (If you don't want escapes, you can use the r flag, as in r"\n".)
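A quick way to convince yourself of this is to compare the literals directly. The variable names below are only for illustration.

```python
# The same text written with three different delimiters.
single = 'I said, "okay".'
double = "I said, \"okay\"."
triple = """I said, "okay"."""

# All three literals produce the same string object value.
same = single == double == triple

# Escapes work in both quote styles; the r prefix switches them off.
newline = '\n'   # one character: a line feed
raw = r"\n"      # two characters: a backslash and the letter n

print(same, len(newline), len(raw))
```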
Since I'm coming from a Perl background, I have a habit of using single quotes for plain strings and double-quotes for formats used with the % operator. But there is really no difference.
Other answers are about nested quoting. Another point of view I've come across, but I'm not sure I subscribe to, is to use single quotes (') for characters (which are strings, but ord/chr are quite picky) and to use double quotes for strings. This disambiguates between a string that is supposed to be one character and one that just happens to be one character.
Personally I find most touch typists aren't affected noticeably by the "load" of using the shift key. YMMV on that part. Going down the "it's faster to not use the shift" route is a slippery slope. It's also faster to use hyper-condensed variable/function/class/module names. Everyone just so loves the fast and short 8.3 DOS file names too. :) Pick what makes semantic sense to you, then optimize.
This is a rule I have heard about:
") If the string is for human consumption, that is interface text or output, use ""
') If the string is a specifier, like a dictionary key or an option, use ''
I think a well-enforced rule like that can make sense for a project, but it's nothing that I would personally care much about. I have liked the above rule since I read it, but I always use "" (since I learned C first, way back?).
I don't think there is a single best string delimiter. I like to use different delimiters to indicate different kinds of string. Specifically, I like to use "..." to delimit strings that are used for interpolation or that are natural language messages, and '...' to delimit small symbol-like strings. This gives me a subtle extra clue to the expected use for the string literal.
I try to always use raw strings (r"...") for regular expressions because (1) I don't have to escape backslash characters and (2) my editor recognises this convention and does syntax highlighting inside the regex.
The stylistic issues of single- vs. double-quotes are covered in question 56011.
