I'm new to Python and stuck on this problem:
I'm trying to go from a string like this:
mystring = '[ [10, 20], [20,50], [ [0,400], [50, 328], [22, 32] ], 30, 12 ]'
to the nested list that is represented by the string. Basically, the reverse of
str(mylist)
If I try the obvious option
list(mystring)
it separates each character into a different element and I lose the nesting.
Is there an attribute to the list or str types that does this that I missed in the doc (I use Python 3.3)? Or do I need to code a function that does this?
Additionally, how would you go about implementing that function? I have no clue what would be required to create nested lists of arbitrary depth...
Thanks,
--Louis H.
Call the ast.literal_eval function on the string.
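For instance, a minimal sketch using the string from the question (the variable name nested is just for illustration):

```python
import ast

mystring = '[ [10, 20], [20,50], [ [0,400], [50, 328], [22, 32] ], 30, 12 ]'

# literal_eval only parses Python literals (lists, numbers, strings, ...);
# unlike eval(), it does not run arbitrary code.
nested = ast.literal_eval(mystring)

print(nested[2][1])  # [50, 328]
```

The result is an ordinary nested list, so indexing and iteration work as usual.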
To implement it by oneself, one could use a recursive function which would convert the string into a list of strings which represent lists. Then those strings would be passed to the function and so on.
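A rough sketch of that recursive idea, assuming the string only contains integers and square brackets. The helper names split_top_level and parse_nested are made up for this example, and it does no real error handling:

```python
def split_top_level(body):
    """Split on commas that are not inside any nested brackets."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        if ch == ',' and depth == 0:
            # A top-level comma ends the current item.
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_nested(text):
    """Turn '[1, [2, 3]]' into [1, [2, 3]]; only integers are supported."""
    text = text.strip()
    if text.startswith('[') and text.endswith(']'):
        # Each top-level item is itself either a list string or a number.
        return [parse_nested(part) for part in split_top_level(text[1:-1])]
    return int(text)


mystring = '[ [10, 20], [20,50], [ [0,400], [50, 328], [22, 32] ], 30, 12 ]'
parsed = parse_nested(mystring)
```

In practice ast.literal_eval is the safer choice; this only shows how arbitrary depth falls out of the recursion.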
If I try the obvious solution list(mystring) it separates each character into a different element and I lose the nesting.
This is because list() builds a list out of an iterable. It obtains an iterator by calling the string's __iter__() method, and iterating over a string yields one character at a time.
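A quick illustration of that behaviour, using a short made-up string:

```python
text = '[1, 2]'

# list() walks the string's iterator, so every character becomes an element.
chars = list(text)
print(chars)  # ['[', '1', ',', ' ', '2', ']']

# The same iterator, driven by hand.
it = iter(text)
first = next(it)
print(first)  # [
```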
Alternatively, if you're looking for a more general conversion from strings to objects, I would suggest using the json module. It also works with dictionaries, and JSON is a tried and true format that can be readily used throughout the developer and web space.
import json
nested_list = json.loads(mystring)
# You can even go the other way (dumps() normalizes whitespace, so compare the parsed data)
json.loads(json.dumps(nested_list)) == nested_list
>>> True
Additionally, there are convenient methods for dealing directly with files that contain this kind of string representation:
# Instead of
data_structure = json.loads(open(filename).read())
# Just
with open(filename) as f:
    data_structure = json.load(f)
The same works in reverse with dump instead of load; both take an open file object rather than a filename.
If you want to know why you should use json instead of ast.literal_eval(), it's an extremely established point and you should read this question.
I'm new to python and this is just to automate something on my PC. I want to concatenate all the items in a list. The problem is that
''.join(list)
won't work as it isn't a list of strings.
This site http://www.skymind.com/~ocrow/python_string/ says the most efficient way to do it is
''.join([`num` for num in xrange(loop_count)])
but that isn't valid python...
Can someone explain the correct syntax for including this sort of loop in a string.join()?
You need to turn everything in the list into strings, using the str() constructor:
''.join(str(elem) for elem in lst)
Note that it's generally not a good idea to use list for a variable name, it'll shadow the built-in list constructor.
I've used a generator expression there to apply the str() constructor on each and every element in the list. An alternative method is to use the map() function:
''.join(map(str, lst))
The backticks in your example are another spelling of calling repr() on a value, which is subtly different from str(); you probably want the latter. Because it violates the Python principle of "There should be one-- and preferably only one --obvious way to do it.", the backticks syntax has been removed from Python 3.
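To make the repr() versus str() difference concrete, here is a small sketch with a made-up mixed list:

```python
items = ['a', 1, 2.5]

# str() gives the plain text of each element.
with_str = ''.join(str(x) for x in items)
print(with_str)   # a12.5

# repr() keeps the quotes around string elements.
with_repr = ''.join(repr(x) for x in items)
print(with_repr)  # 'a'12.5
```

For numbers the two agree; the difference only shows up for elements such as strings.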
Here is another way (discussion is about Python 2.x):
''.join(map(str, my_list))
This solution will have the fastest performance and it looks nice and simple imo. Using a generator won't be more efficient. In fact this will be more efficient, as ''.join has to allocate the exact amount of memory for the string based on the length of the elements so it will need to consume the whole generator before creating the string anyway.
Note that `` has been removed in Python 3 and it's not good practice to use it anymore; be more explicit by using str() if you have to, e.g. str(num).
Just use this; there's no need for the [], and you should use str(num):
''.join(str(num) for num in xrange(loop_count))
For a list, just replace xrange(loop_count) with the list name.
example:
>>> ''.join(str(num) for num in xrange(10)) #use range() in python 3.x
'0123456789'
If your Python is too old for "list comprehensions" (the odd [x for x in ...] syntax), use map():
''.join(map(str, list))
What is the best way to do the following in Python:
for item in [x.attr for x in some_list]:
    do_something_with(item)
This may be a noob question, but isn't the list comprehension generating a new list that we don't need and just taking up memory? Wouldn't it be better if we could make an iterator-like list comprehension?
Yes (to both of your questions).
By using parentheses instead of brackets you can make what's called a "generator expression" for that sequence, which does exactly what you've proposed. It lets you iterate over the sequence without allocating a list to hold all the elements simultaneously.
for item in (x.attr for x in some_list):
    do_something_with(item)
The details of generator expressions are documented in PEP 289.
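A small sketch of the laziness, assuming objects with an attr attribute (SimpleNamespace is only a stand-in for whatever some_list really holds):

```python
from types import SimpleNamespace

some_list = [SimpleNamespace(attr=n) for n in range(3)]

# No x.attr lookup happens here; the generator just remembers what to do.
lazy = (x.attr for x in some_list)

seen = []
for item in lazy:
    seen.append(item)
print(seen)  # [0, 1, 2]

# A generator can only be consumed once.
leftover = list(lazy)
print(leftover)  # []
```

The one-shot behaviour is the trade-off: if you need to loop over the values twice, build a list instead.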
Why not just:
for x in some_list:
    do_something_with(x.attr)
This question is tagged functional-programming without an appropriate answer, so here's a functional solution:
from operator import attrgetter
map(do_something_with, map(attrgetter('attr'), some_list))
Python 3's map() returns a lazy iterator, so nothing is called until the result is consumed (for example with list()), whereas Python 2's map() creates a list. For Python 2 use itertools.imap() instead.
If you're returning some_list, you can simplify it further using a generator expression and lazy evaluation:
def foo(some_list):
    return (do_something_with(item.attr) for item in some_list)
This has always confused me. It seems like this would be nicer:
["Hello", "world"].join("-")
Than this:
"-".join(["Hello", "world"])
Is there a specific reason it is like this?
It's because any iterable can be joined (e.g., list, tuple, dict, set), but its contents and the "joiner" must be strings.
For example:
'_'.join(['welcome', 'to', 'stack', 'overflow'])
'_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'
Using something other than strings will raise the following error:
TypeError: sequence item 0: expected str instance, int found
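A minimal way to reproduce that error and work around it, using a made-up list that mixes strings and an integer:

```python
values = ['a', 1, 'b']

try:
    '-'.join(values)
except TypeError as exc:
    # join() refuses non-string items instead of converting them silently.
    message = str(exc)
    print(message)

# Converting each element explicitly makes the join work.
joined = '-'.join(map(str, values))
print(joined)  # a-1-b
```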
This was discussed in the String methods... finally thread in the Python-Dev archive, and was accepted by Guido. This thread began in Jun 1999, and str.join was included in Python 1.6, which was released in Sep 2000 (and supported Unicode). Python 2.0 (which supported str methods including join) was released in Oct 2000.
There were four options proposed in this thread:
str.join(seq)
seq.join(str)
seq.reduce(str)
join as a built-in function
Guido wanted to support not only lists and tuples, but all sequences/iterables.
seq.reduce(str) is difficult for newcomers.
seq.join(str) introduces unexpected dependency from sequences to str/unicode.
join() as a free-standing built-in function would support only specific data types. So using a built-in namespace is not good. If join() were to support many data types, creating an optimized implementation would be difficult: if implemented using the __add__ method then it would be O(n²).
The separator string (sep) should not be omitted. Explicit is better than implicit.
Here are some additional thoughts (my own, and my friend's):
Unicode support was coming, but it was not final. At that time, UTF-8 was the most likely candidate to replace UCS-2/-4. To calculate the total buffer length for UTF-8 strings, the method needs to know the character encoding.
At that time, Python had already decided on a common sequence interface rule where a user could create a sequence-like (iterable) class. But Python didn't support extending built-in types until 2.2. At that time it was difficult to provide a basic iterable class (which is mentioned in another comment).
Guido's decision is recorded in a historical mail, deciding on str.join(seq):
Funny, but it does seem right! Barry, go for it...
Guido van Rossum
Because the join() method is in the string class, instead of the list class.
See http://www.faqs.org/docs/diveintopython/odbchelper_join.html:
Historical note. When I first learned Python, I expected join to be a method of a list, which would take the delimiter as an argument. Lots of people feel the same way, and there’s a story behind the join method. Prior to Python 1.6, strings didn’t have all these useful methods. There was a separate string module which contained all the string functions; each function took a string as its first argument. The functions were deemed important enough to put onto the strings themselves, which made sense for functions like lower, upper, and split. But many hard-core Python programmers objected to the new join method, arguing that it should be a method of the list instead, or that it shouldn’t move at all but simply stay a part of the old string module (which still has lots of useful stuff in it). I use the new join method exclusively, but you will see code written either way, and if it really bothers you, you can use the old string.join function instead.
--- Mark Pilgrim, Dive into Python
I agree that it's counterintuitive at first, but there's a good reason. Join can't be a method of a list because:
it must work for different iterables too (tuples, generators, etc.)
it must have different behavior between different types of strings.
There are actually two join methods (Python 3.0):
>>> b"".join
<built-in method join of bytes object at 0x00A46800>
>>> "".join
<built-in method join of str object at 0x00A28D40>
If join were a method of a list, then it would have to inspect its arguments to decide which one of them to call. And you can't join bytes and str together, so the way they have it now makes sense.
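A short sketch of the two methods side by side (the sample data is made up):

```python
words = ['a', 'b']
chunks = [b'a', b'b']

text = ','.join(words)     # str separator, str items
raw = b','.join(chunks)    # bytes separator, bytes items
print(text)  # a,b
print(raw)   # b'a,b'

# Mixing the two types is rejected.
try:
    ','.join(chunks)
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```

The separator's type picks the implementation, which is why no argument inspection is needed.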
Why is it string.join(list) instead of list.join(string)?
This is because join is a "string" method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren't lists?
What if you have a tuple of strings? If this were a list method, you would have to cast every such iterator of strings as a list before you could join the elements into a single string! For example:
some_strings = ('foo', 'bar', 'baz')
Let's roll our own list join method:
class OurList(list):
    def join(self, s):
        return s.join(self)
And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:
>>> l = OurList(some_strings) # step 1, create our list
>>> l.join(', ') # step 2, use our list join method!
'foo, bar, baz'
So we see we have to add an extra step to use our list method, instead of just using the builtin string method:
>>> ' | '.join(some_strings) # a single step!
'foo | bar | baz'
Performance Caveat for Generators
The algorithm Python uses to create the final string with str.join actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.
Thus, while passing around generators is usually better than list comprehensions, str.join is an exception:
>>> import timeit
>>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i)))
3.839168446022086
>>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i])))
3.339879313018173
Nevertheless, the str.join operation is still semantically a "string" operation, so it still makes sense to have it on the str object rather than on miscellaneous iterables.
Think of it as the natural orthogonal operation to split.
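As a small sketch of that symmetry (the sample sentence is made up), splitting on a separator and joining with the same separator gets you back where you started:

```python
line = 'welcome to stack overflow'
sep = ' '

# split() is called on a string and returns a list ...
parts = line.split(sep)
print(parts)  # ['welcome', 'to', 'stack', 'overflow']

# ... and join() is called on a string and consumes that list.
rebuilt = sep.join(parts)
print(rebuilt == line)  # True
```

Both operations hang off the separator or source string, so the pair reads consistently.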
I understand why it is applicable to anything iterable and so can't easily be implemented just on list.
For readability, I'd like to see it in the language but I don't think that is actually feasible - if iterability were an interface then it could be added to the interface but it is just a convention and so there's no central way to add it to the set of things which are iterable.
The "-" in "-".join(my_list) declares that you are producing a string by joining the elements of a list. It's result-oriented. (This is just a mnemonic to make it easier to remember and understand.)
I made an exhaustive cheatsheet of methods_of_string for your reference.
string_methods_44 = {
'convert': ['join','split', 'rsplit','splitlines', 'partition', 'rpartition'],
'edit': ['replace', 'lstrip', 'rstrip', 'strip'],
'search': ['endswith', 'startswith', 'count', 'index', 'find','rindex', 'rfind',],
'condition': ['isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isnumeric','isidentifier',
'islower','istitle', 'isupper','isprintable', 'isspace', ],
'text': ['lower', 'upper', 'capitalize', 'title', 'swapcase',
'center', 'ljust', 'rjust', 'zfill', 'expandtabs','casefold'],
'encode': ['translate', 'maketrans', 'encode'],
'format': ['format', 'format_map']}
Primarily because the result of a someString.join() is a string.
The sequence (list or tuple or whatever) doesn't appear in the result, just a string. Because the result is a string, it makes sense as a method of a string.
The variables my_list and "-" are both objects. Specifically, they're instances of the classes list and str, respectively. The join function belongs to the class str. Therefore, the syntax "-".join(my_list) is used because the object "-" is taking my_list as an input.
You can join more than just lists and tuples. You can join almost any iterable.
And iterables include generators, maps, filters, etc.
>>> '-'.join(chr(x) for x in range(48, 55))
'0-1-2-3-4-5-6'
>>> '-'.join(map(str, (1, 10, 100)))
'1-10-100'
And the beauty of using generators, maps, filters etc is that they cost little memory, and are created almost instantaneously.
Just another reason why it's conceptually:
str.join(<iterable>)
It's more economical to give only str this ability, instead of adding join to every iterable type: list, tuple, set, dict, generator, map, filter, all of which have only object as a common parent.
Of course range() and zip() are also iterables, but they never yield str, so they cannot be passed directly to str.join():
>>> '-'.join(range(48, 55))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
I 100% agree with your issue. If we boil down all the answers and comments here, the explanation comes down to "historical reasons".
str.join isn't just confusing or not-nice looking, it's impractical in real-world code. It defeats readable function or method chaining because the separator is rarely (ever?) the result of some previous computation. In my experience, it's always a constant, hard-coded value like ", ".
I clean up my code, so that it reads in one direction, using toolz.functoolz:
>>> from toolz.functoolz import curry, pipe
>>> join = curry(str.join)
>>>
>>> a = ["one", "two", "three"]
>>> pipe(
... a,
... join("; ")
... )
'one; two; three'
I'll have several other functions in the pipe as well. The result is that it reads easily in just one direction, from beginning to end as a chain of functions. Currying map helps a lot.
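If you don't want the toolz dependency, roughly the same shape can be sketched with the standard library. The pipe and join helpers below are hypothetical stand-ins written for this example, not toolz's own implementations:

```python
from functools import partial, reduce


def pipe(value, *funcs):
    """Feed value through each function in turn, left to right."""
    return reduce(lambda acc, func: func(acc), funcs, value)


def join(sep):
    """Return a function that joins an iterable of strings with sep."""
    return partial(str.join, sep)


a = ["one", "two", "three"]
result = pipe(
    a,
    partial(map, str.upper),  # stands in for a curried map
    join("; "),
)
print(result)  # ONE; TWO; THREE
```

Each step takes the previous step's output, so the chain reads top to bottom.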