Converting string to tuple and adding to tuple - python

I have a config file like this.
[rects]
rect1=(2,2,10,10)
rect2=(12,8,2,10)
I need to loop through the values and convert them to tuples.
I then need to make a tuple of the tuples like
((2,2,10,10), (12,8,2,10))

Instead of using a regex or int/string functions, you could also use the ast module's literal_eval function, which only evaluates strings that are valid Python literals. This function is safe (according to the docs).
http://docs.python.org/library/ast.html#ast.literal_eval
import ast
ast.literal_eval("(1,2,3,4)") # (1,2,3,4)
And, like others have said, ConfigParser works for parsing the INI file.

To turn the strings into tuples of ints (which is, I assume, what you want), you can use a regex like this:
import re
x = "(1,2,3)"
t = tuple(int(v) for v in re.findall(r"-?[0-9]+", x))
And you can use, say, configparser to parse the config file.

Assuming cp is a ConfigParser object that has already read the config file below:
[rects]
rect1=(2,2,10,10)
rect2=(12,8,2,10)
>>> import ast
>>> tuple(ast.literal_eval(v[1]) for v in cp.items('rects'))
((2,2,10,10), (12,8,2,10))
Edit: changed eval() to the safer literal_eval().
From the Python docs, literal_eval() does the following:
Safely evaluate an expression node or a string containing a Python
expression. The string or node provided may only consist of the following
Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
and None
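For reference, a self-contained sketch of the approach described above, assuming Python 3's configparser and the [rects] section from the question. The names config_text, cp, and rects are illustrative only, and the config is inlined so the snippet runs without a file on disk.

```python
import ast
import configparser

# Inline copy of the config from the question, so the sketch runs as-is.
config_text = """
[rects]
rect1=(2,2,10,10)
rect2=(12,8,2,10)
"""

cp = configparser.ConfigParser()
cp.read_string(config_text)  # for a real file, cp.read(...) would be used instead

# Each value is a string such as "(2,2,10,10)"; literal_eval turns it into a tuple.
rects = tuple(ast.literal_eval(value) for _, value in cp.items("rects"))
print(rects)
```

Because literal_eval only accepts literals, a malformed or malicious value in the file raises an exception instead of being executed.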

You can simply make a tuple of tuples like
new_tuple = (rect1,rect2) # ((2,2,10,10), (12,8,2,10))
If you want to loop through the values:
for i in rect1 + rect2:
    print(i)
If you want to regroup the numbers, you could do:
tuple_regrouped = tuple(zip(rect1, rect2)) # ((2,12), (2,8), (10,2), (10,10))
EDIT:
Didn't notice the string part. If you have the lines as strings, as when reading a config file, you can do something like
# line = "rect1 = (1,2,3,4)"
config_dict = {}
var_name, tuple_as_str = line.replace(" ","").split("=")
config_dict[var_name] = tuple([int(i) for i in tuple_as_str[1:-1].split(',')])
# and now you'd have config_dict['rect1'] = (1,2,3,4)

The easiest way to do this would be to use Michael Foord's ConfigObj library. It has an unrepr mode, which will convert the string directly into a tuple for you.

Related

How to convert a regular string to a raw string? [duplicate]

I have a string s, its contents are variable. How can I make it a raw string? I'm looking for something similar to the r'' method.
I believe what you're looking for is str.encode() with the unicode_escape codec. For example, if you have a variable that you want to 'raw string':
a = '\x89'
a.encode('unicode_escape')
b'\\x89'
Note: in Python 3 the result is a bytes object; call .decode() on it to get a str. Use string-escape instead in Python 2.x and older versions.
I was searching for a similar solution and found the solution via:
casting raw strings python
Raw strings are not a different kind of string. They are a different way of describing a string in your source code. Once the string is created, it is what it is.
Since strings in Python are immutable, you cannot "make it" anything different. You can, however, create a new string from s with a raw format literal, like this:
raw_s = r'{}'.format(s)
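A small check of the point that raw strings are only a source-code notation may help here. This is a sketch with made-up variable names, not part of the original answer; it compares a normal and a raw literal, and then shows that passing an already-created string through a raw format literal leaves its contents unchanged.

```python
# The r prefix only changes how a literal in source code is read.
plain = "\n"   # one character: a newline
raw = r"\n"    # two characters: a backslash and the letter n
print(len(plain), len(raw))

# Passing an existing string through a raw format literal does not alter it.
s = "hel\nlo"
same = r"{}".format(s)
print(same == s)
```

So if the goal is to see the escape sequences spelled out, the string has to be re-encoded (for example with repr() or the unicode_escape codec) rather than "made raw".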
As of Python 3.6, you can use the following (similar to @slashCoder):
def to_raw(string):
    return fr"{string}"
my_dir = "C:\data\projects"
to_raw(my_dir)
yields 'C:\\data\\projects'. I'm using it on a Windows 10 machine to pass directories to functions.
Raw strings apply only to string literals. They exist so that you can more conveniently express strings that would otherwise be modified by escape sequence processing. This is especially useful when writing regular expressions or other forms of code in string literals. If you want a unicode string without escape processing in Python 2, just prefix it with ur, like ur'somestring'.
For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:
a = 'hello\nbobby\nsally\n'
escaped = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(escaped)
Which gives a value that can be written as CSV:
hello\nbobby\nsally\n
There doesn't seem to be a solution for other special characters, however, that may get a single \ before them. It's a bummer. Solving that would be complex.
For example, to serialize a pandas.Series containing lists of strings with special characters to a text file in the format BERT expects, with a line break between sentences and a blank line between documents:
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a newline to separate documents
        if idx != current_idx:
            f.write('\n')
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')
This outputs (for the Github CodeSearchNet docstrings for all languages tokenized into sentences):
Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates
Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb
...
Just format like that:
s = "your string"; raw_s = r'{0}'.format(s)
With a small correction to @Jolly1234's answer, here is the code:
raw_string=path.encode('unicode_escape').decode()
s = "hel\nlo"
raws = '%r' % s  # conversion via repr(): escapes plus surrounding quotes
# print(raws) will print 'hel\nlo' with single quotes.
print(raws[1:-1])  # will print hel\nlo without single quotes.
# raws[1:-1] slices off the surrounding quotes
The solution that worked for me was:
fr"{original_string}"
Suggested in the comments by @ChemEnger.
I suppose repr function can help you:
s = 't\n'
repr(s)
"'t\\n'"
repr(s)[1:-1]
't\\n'
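To compare the repr() slicing trick with the unicode_escape codec mentioned in other answers, here is a short sketch (variable names are illustrative). The two agree on simple input but can differ when the string contains both kinds of quote characters, because repr() then backslash-escapes the single quotes.

```python
s = "t\n"

# repr() adds surrounding quotes and escapes; slicing drops the quotes.
via_repr = repr(s)[1:-1]

# unicode_escape produces the escaped form without any quotes to strip.
via_codec = s.encode("unicode_escape").decode("ascii")

print(via_repr)
print(via_codec)

# With both quote characters present, repr() escapes the single quote,
# so the sliced result gains a backslash that the codec does not add.
tricky = 'a\'b"c'
print(repr(tricky)[1:-1])
print(tricky.encode("unicode_escape").decode("ascii"))
```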
Just simply use the encode function.
my_var = 'hello'
my_var_bytes = my_var.encode()
print(my_var_bytes)
And then to convert it back to a regular string, do this:
my_var_bytes = b'hello'
my_var = my_var_bytes.decode()
print(my_var)
--EDIT--
Note that this does not make the string raw; it only encodes it to bytes and decodes it back.

How can I convert string to dict or list?

I have strings such as:
'[1, 2, 3]'
and
"{'a': 1, 'b': 2}"
How do I convert them to list/dict?
Someone mentions that ast.literal_eval or eval can parse a string that converts to list/dict.
What's the difference between ast.literal_eval and eval?
ast.literal_eval parses 'abstract syntax trees.' You nearly have json there, for which you could use json.loads, but you need double quotes, not single quotes, for dictionary keys to be valid.
import ast
result = ast.literal_eval("{'a': 1, 'b': 2}")
assert type(result) is dict
result = ast.literal_eval("[1, 2, 3]")
assert type(result) is list
As a plus, this has none of the risk of eval, because it doesn't get into the business of evaluating functions. eval("subprocess.call(['sudo', 'rm', '-rf', '/'])") could remove your root directory, but ast.literal_eval("subprocess.call(['sudo', 'rm', '-rf', '/'])") fails predictably, with your file system intact.
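The difference can be checked without risking anything. The sketch below uses a harmless stand-in for untrusted input (a call to os.getcwd(), chosen only for illustration) and shows that ast.literal_eval refuses it while still accepting plain literals.

```python
import ast

# A harmless stand-in for untrusted input that tries to call a function.
untrusted = "__import__('os').getcwd()"

try:
    ast.literal_eval(untrusted)
    rejected = False
except (ValueError, SyntaxError) as exc:
    # literal_eval refuses anything that is not a plain literal.
    rejected = True
    print("rejected:", type(exc).__name__)

# Plain literals still parse as expected.
parsed = ast.literal_eval("{'a': 1, 'b': [1, 2, 3]}")
print(parsed)
```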
Use the eval function:
l = eval('[1, 2, 3]')
d = eval("{'a':1, 'b': 2}")
Just make sure you know where these strings came from and that you aren't allowing user input to be evaluated and do something malicious.
A Python script to convert this string to a dict:
import json
inp_string = '{"1":"one", "2":"two"}'
out = json.loads(inp_string)
print(out["1"])
The output is:
one
You can use eval(), but only with safe data. If you need to parse unsafe data, look at the safer ast.literal_eval() instead.
A JSON parser is also a possibility, since most Python dicts and lists share the same syntax as JSON.
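One place where the two syntaxes differ is quoting. As a minimal sketch (variable names are illustrative), the single-quoted string from the question is rejected by json.loads but accepted by ast.literal_eval, while the double-quoted form works with JSON:

```python
import ast
import json

text = "{'a': 1, 'b': 2}"  # Python-style single quotes

try:
    json.loads(text)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False  # JSON requires double-quoted strings

from_ast = ast.literal_eval(text)
from_json = json.loads('{"a": 1, "b": 2}')
print(json_ok, from_ast == from_json)
```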
You can convert string to list/dict by ast.literal_eval() or eval() function. ast.literal_eval() only considers a small subset of Python's syntax to be valid:
The string or node provided may only consist of the following Python
literal structures: strings, numbers, tuples, lists, dicts, booleans,
and None.
Passing __import__('os').system('rm -rf /') into ast.literal_eval() will raise an error, but eval() will happily wipe your drive.
Since it looks like you're only letting the user input a plain dictionary, use ast.literal_eval(). It safely does what you want and nothing more.

How to convert a list into a string and then convert it back in python?

I want to send a list over UDP/TCP, but since they only support strings, I need to convert the list into a string and convert it back.
My list is like
['S1','S2','H1','C1','D8']
I know I can use
string_ = ''.join(list_)
to convert it into string.
But how to convert it back?
Or is there another way I can send a list over UDP/TCP?
A custom format would depend on assumptions about the format of the list items, so json looks like the safest way to go:
>>> import json
>>> data = json.dumps(['S1','S2','H1','C1','D8'])
>>> data
'["S1", "S2", "H1", "C1", "D8"]'
>>> json.loads(data)
[u'S1', u'S2', u'H1', u'C1', u'D8']
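In Python 3, sockets send and receive bytes rather than str, so the JSON text needs one more encoding step. A minimal sketch of both directions, with the actual socket calls left out and the names cards, payload, and received chosen for illustration:

```python
import json

cards = ['S1', 'S2', 'H1', 'C1', 'D8']

# Sender side: list -> JSON text -> bytes suitable for a socket send call.
payload = json.dumps(cards).encode("utf-8")

# Receiver side: bytes -> JSON text -> list.
received = json.loads(payload.decode("utf-8"))
print(received)
```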
Use a separator:
string_ = ';'.join(list_)
list_ = string_.split(';')
You need to make sure the separator character can't be within your string. If it is, you might need encoding.
If you have Python on both ends of the network communication, you can use the dumps and loads functions from the pickle module, designed especially for serializing Python objects:
import pickle
a = ['S1','S2','H1','C1','D8']
string_ = pickle.dumps(a)
...
a = pickle.loads(string_)
Otherwise, the solution proposed by @bereal is the better one, because there are JSON libraries for most programming languages. But it will demand some processing for data types not supported by JSON.
EDIT
As @bereal noticed, there can be a security problem with pickle, because the pickle format is effectively an executable language.
Maybe you could append a separator between the list elements and use it calling split to get the list back.
EDIT:
As @Eric Fortin mentioned in his answer, the separator should be something that cannot be in your string. One possibility is, as he suggested, to use encoding. Another possibility is to send elements one by one, but that would obviously increase the communication overhead.
Note that your separator may be a sequence, it does not need to be one single character.
str_list = separator.join(list_)
new_list = str_list.split(separator)
If you know the format of your list elements, you may even go without using separators!
str_list = "".join(list_)
re.split(reg_exp, str_list)
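As one possible illustration of the separator-free idea, the sketch below assumes every element is a single uppercase letter followed by one or more digits (true for the list in the question, but an assumption in general). It uses re.findall to pull the elements back out, rather than re.split, since matching the elements themselves is simpler than describing the boundaries between them.

```python
import re

list_ = ['S1', 'S2', 'H1', 'C1', 'D8']
str_list = "".join(list_)  # 'S1S2H1C1D8'

# Assumed element format: one uppercase letter followed by digits.
element_pattern = r"[A-Z][0-9]+"
new_list = re.findall(element_pattern, str_list)
print(new_list)
```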

trying to parse a string and convert it to nested lists

I'm new to Python and blocking on this problem:
trying to go from a string like this:
mystring = '[ [10, 20], [20,50], [ [0,400], [50, 328], [22, 32] ], 30, 12 ]'
to the nested list that is represented by the string. Basically, the reverse of
str(mylist)
If I try the obvious option
list(mystring)
it separates each character into a different element and I lose the nesting.
Is there an attribute to the list or str types that does this that I missed in the doc (I use Python 3.3)? Or do I need to code a function that does this?
Additionally, how would you go about implementing that function? I have no clue what would be required to create nested lists of arbitrary depth...
Thanks,
--Louis H.
Call the ast.literal_eval function on the string.
To implement it by oneself, one could use a recursive function which would convert the string into a list of strings which represent lists. Then those strings would be passed to the function and so on.
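One way such a recursive function could look is sketched below. It is not the only design: instead of splitting the text into substrings first, it walks the string with a position counter and recurses whenever it meets an opening bracket. The function name parse_nested is made up for this example, and it assumes the list contains only integers and nested lists.

```python
def parse_nested(text):
    """Parse a string like '[1, [2, 3]]' into nested lists of ints."""
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_value():
        skip_spaces()
        if pos < len(text) and text[pos] == "[":
            return parse_list()
        return parse_int()

    def parse_int():
        nonlocal pos
        start = pos
        if pos < len(text) and text[pos] == "-":
            pos += 1
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if text[start:pos] in ("", "-"):
            raise ValueError(f"expected a number at position {start}")
        return int(text[start:pos])

    def parse_list():
        nonlocal pos
        pos += 1  # consume '['
        items = []
        skip_spaces()
        if pos < len(text) and text[pos] == "]":
            pos += 1  # empty list
            return items
        while True:
            items.append(parse_value())  # recursion handles any depth
            skip_spaces()
            if pos < len(text) and text[pos] == ",":
                pos += 1
            elif pos < len(text) and text[pos] == "]":
                pos += 1
                return items
            else:
                raise ValueError(f"expected ',' or ']' at position {pos}")

    result = parse_value()
    skip_spaces()
    if pos != len(text):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return result


mystring = '[ [10, 20], [20,50], [ [0,400], [50, 328], [22, 32] ], 30, 12 ]'
nested = parse_nested(mystring)
print(nested)
```

In practice ast.literal_eval or json.loads already does this and handles far more cases; the sketch is only meant to show where the recursion comes in.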
If I try the obvious solution list(mystring) it separates each character into a different element and I lose the nesting.
This is because list() builds a list from an iterable, and iterating over a string (via its __iter__() method) yields one character at a time.
Alternatively, if you're looking for a more general conversion from strings to objects, I would suggest using the json module. It works with dictionaries too, and uses a tried and true specification that is readily usable throughout the developer and web space.
import json
nested_list = json.loads(mystring)
# You can even go the other way
json.loads(json.dumps(nested_list)) == nested_list
>>> True
Additionally, there are convenient methods for dealing directly with files that contain this kind of string representation:
# Instead of
data_structure = json.loads(open(filename).read())
# Just
with open(filename) as f:
    data_structure = json.load(f)
The same works in reverse with dump instead of load.
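A round trip through a file might look like the following sketch. Note that json.dump and json.load take an open file object. The file name nested.json and the temporary directory are placeholders chosen so the example cleans up after itself.

```python
import json
import os
import tempfile

nested_list = [[10, 20], [20, 50], [[0, 400], [50, 328], [22, 32]], 30, 12]

# Hypothetical file name inside a throwaway directory, for illustration only.
with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "nested.json")

    with open(filename, "w") as f:
        json.dump(nested_list, f)  # write the structure to the file

    with open(filename) as f:
        restored = json.load(f)  # read it back

print(restored == nested_list)
```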
If you want to know why you should use json instead of ast.literal_eval(), it's an extremely established point and you should read this question.

Parse XML File with Python and get the letter 'u' in every list element

I have a XML file with some elements like this:
<RMS>[14.470156174, 14.470156174, 14.485567944, 14.496014765]</RMS>
I want to get a list with all the elements.
So I tried a regex with the following code:
string = dom.getElementsByTagName('RMS')[0].toxml()
string2 = re.findall(r"[\-]*[0-9]*\.[0-9]*", string)
Now, when I want to print the list, it looks like this:
[u'14.470156174', u'14.470156174', u'14.485567944', u'14.496014765']
What's going on with the 'u'?
Are there any ideas how to solve the problem?
Thanks for helping.
Strings that start with a u are unicode string literals. Since XML contains unicode data, the XML parser returns your data in the correct type, which is the python unicode() type.
You do not need to remove them, you do not have a problem. You may want to read up on Unicode and Python in the Python Unicode HOWTO but there is no problem here.
Since these are numbers, you can convert the unicode values straight to float instances.
There is no need to use regex here. In fact, your regex may not work for some floats such as 1.4e1.
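Both points can be shown in a few lines. The sketch below converts the strings from the question directly with float(), then applies the question's regex to the exponent-notation value 1.4e1 to show that it only captures the part before the exponent. The variable names are illustrative.

```python
import re

strings = ['14.470156174', '14.470156174', '14.485567944', '14.496014765']
values = [float(s) for s in strings]  # works the same for unicode strings
print(values)

# The regex from the question truncates exponent notation.
pattern = r"[\-]*[0-9]*\.[0-9]*"
print(re.findall(pattern, "1.4e1"))  # the exponent part is lost
print(float("1.4e1"))
```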
Since you are using minidom you could do this:
import xml.dom.minidom as minidom
import ast
content = "<RMS>[14.470156174, 14.470156174, 14.485567944, 14.496014765]</RMS> "
dom = minidom.parseString(content)
text = dom.getElementsByTagName('RMS')[0].childNodes[0].wholeText
If you
print(text)
you get
[14.470156174, 14.470156174, 14.485567944, 14.496014765]
but if you
print(repr(text))
you get
u'[14.470156174, 14.470156174, 14.485567944, 14.496014765]'
The u indicates that text is a unicode object, not a str object. Similarly, your code produces a list of unicode objects. When you print a list, Python prints the repr of each of the elements inside the list. This is why you see
[u'14.470156174', u'14.470156174', u'14.485567944', u'14.496014765']
Now upon rereading your question, I see you want a list of the elements in text. Since they are numbers, I assume you want a list of floats. In that case, you could use ast.literal_eval:
values = ast.literal_eval(text)
print(values)
yields
[14.470156174, 14.470156174, 14.485567944, 14.496014765]
where values is a list of floats.
