joining string from set reverses order - python

I have the string 'ABBA'. If I turn it into a set, I get this:
In: set('ABBA')
Out: {'A', 'B'}
However, if I join them as a string, it reverses the order:
In: ''.join(set('ABBA'))
Out: 'BA'
The same happens if I try to turn the string into a list:
In: list(set('ABBA'))
Out: ['B', 'A']
Why is this happening and how do I address it?
EDIT
The reason applying sorted doesn't work is that if I make a set out of 'CDA', it will return 'ACD', thus losing the order of the original string. My question pertains to preserving the original order of the set itself.

Sets are unordered collections: the iteration order is arbitrary and is not tied to the order in which the elements were added. Sets also hold only unique elements, so there is no repetition in the elements of the set.
If you run set('ABBA') in several fresh interpreter sessions, you will sometimes get {'A', 'B'} and sometimes {'B', 'A'}, because string hashes are randomized per process in Python 3. The same happens with join: the output is sometimes 'BA' and sometimes 'AB'. The order is not reversed, only arbitrary.
There is an ordered set recipe for this which is referred to from the Python 2 Documentation. This runs on Py2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.
OrderedSet([1, 2, 3])
This is a MutableSet, so the signature for .union doesn't match that of set, but since it includes __or__, something similar can easily be added.
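As a lighter alternative to the recipe, a plain dict can stand in for an ordered set of hashable items on Python 3.7 or later, since dict keys are unique and keep insertion order. A minimal sketch (the helper name unique_in_order is made up for illustration):

```python
def unique_in_order(items):
    """Return the distinct items in order of first appearance."""
    # dict keys are unique and, from Python 3.7 on, keep insertion order
    return list(dict.fromkeys(items))


print(''.join(unique_in_order('ABBA')))  # AB
print(''.join(unique_in_order('CDA')))   # CDA
```

Unlike sorted(), this keeps 'CDA' as 'CDA', which is the behaviour the question asks for.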

b = "AAEEBBCCDD"
a = set(b)  # unordered
print(a)  # e.g. {'B', 'D', 'C', 'A', 'E'} or {'A', 'E', 'B', 'D', 'C'}, ...
# The order is not reversed, only arbitrary
print(''.join(a))
print(list(a))
print(sorted(a, key=b.index))  # restores the original sequence of b

Related

Create new column with matched value between a unique set of item and a long list

I am new to data wrangling in python.
I have a column in a dataframe that has text like:
I really like Product A!
I think Product B is for me!
I will go with Product C.
My objective is to create a new column with Product Name (Including the word 'Product'). I do not want to use Regex. Product name is unique in a row. So there will be no row with string such as
I really like Product A and Product B
Problem in generic form: I have a list of unique items; let's call it list A. I have another list of strings where each string includes at most one of the items from list A. How do I create a new list with the matched items?
I have written the following code. It works fine, but even I (new to programming) can tell it is highly inefficient.
Is there a better, more elegant solution?
product_type = ['Product A', 'Product B', 'Product C', 'Product D']
product_list = [None] * len(fed_df['product_line'])
for i in range(len(product_list)):
    for product in product_type:
        if product in fed_df['product_line'][i]:
            product_list[i] = product
fed_df['product_line'] = product_list
Short Background
Fundamentally, at some point, each element of each list will need to be compared much as you've written it (although you can skip to the next row once a match has been found). But the trick to writing good Python code is to use functionality implemented at a lower level for efficiency, rather than trying to write it yourself. For example, you should try to avoid using
for i in range(len(myList)): #code which accesses myList[i]
when you can use
for myListElement in myList: #code which uses myListElement
since in the latter, the iteration over myList is handled internally, which is more efficient than having Python compute i and then look up the ith element of myList. The same holds in some other high-level programming languages.
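A small sketch of iterating over elements directly, including the early exit mentioned above. The sample lines and the helper name first_match are made up for illustration:

```python
product_type = ['Product A', 'Product B', 'Product C', 'Product D']
lines = [
    'I really like Product A!',
    'I think Product B is for me!',
    'I will go with Product C.',
]


def first_match(text, candidates):
    """Return the first candidate contained in text, or None."""
    for candidate in candidates:  # iterate over elements, no index needed
        if candidate in text:
            return candidate      # stop as soon as a match is found
    return None


matches = [first_match(line, product_type) for line in lines]
print(matches)  # ['Product A', 'Product B', 'Product C']
```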
Actual Answer
Anyway, to answer your question, I came up with the following and I believe it would be more efficient:
answer = map(lambda product_line_element: next(filter(lambda product: product in product_line_element,product_type),None), fed_df['product_line'])
This maps over each line of fed_df['product_line'] (map) and replaces it with the first element (next) of the product types found in that line (filter), or with None if no product type matches.
How I tested
To test this I made a list of lists to use as fed_df['product_line']
[['h', 'a', 'g'], ['k', 'b', 'l'], ['u', 't', 'a'], ['r', 'e', 'p'], ['g', 'e', 'b']]
and searched for "a" and "b" "product_types", which gave
['a', 'b', 'a', None, 'b']
as a result, which I think is what you are after...
Mapping functions like these are often preferred over for loops, since they encourage code without mutation and can be made multi-threaded/multi-process more easily.
Another bonus of this solution is that, in Python 3, the result isn't calculated until later code iterates over answer, which defers the work until it is needed. You can force it to be calculated by converting answer into a list (list(answer)), which you will want whenever you need an actual list rather than a one-shot iterator.
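To illustrate the lazy evaluation, here is a sketch that uses a plain list of strings in place of fed_df['product_line'] (the sample data is made up). In Python 3, map returns an iterator that can be consumed only once:

```python
product_type = ['Product A', 'Product B']
lines = ['I really like Product A!', 'I think Product B is for me!', 'Neither of them']

answer = map(
    lambda line: next(filter(lambda product: product in line, product_type), None),
    lines,
)

first_pass = list(answer)   # the matching work happens here
second_pass = list(answer)  # the iterator is already exhausted

print(first_pass)   # ['Product A', 'Product B', None]
print(second_pass)  # []
```

Keeping the list from the first pass avoids surprises if the values are needed more than once.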
I hope I understood your problem correctly. Let me know if you have any questions :)

How to return a value of a key from a dictionary in python

Suppose I have a dictionary:
P={'S':['dB','N'],'B':['C','CcB','bA']}
How can I get the second value of the second key from dictionary P?
Also, if the value is a string with more than one character, like 'bA' (the third value of key 'B'), can I somehow return the first character of this value?
As @jonrsharpe has stated before, dictionaries in Python 2 (and in Python 3 before 3.7) aren't ordered by design.
This means that every time you attempt to access a dictionary "by order", you may encounter a different result.
Observe the following (python interactive interpreter):
>>>P={'S':['dB','N'],'B':['C','CcB','bA'], 'L':["qqq"]}
>>>P.keys()
['S', 'B', 'L']
It's easy to see that in this case the "order" as we defined it matches the order that we receive from the keys() method.
However, you may also observe this result:
>>> P={'S':['dB','N'],'B':['C','CcB','bA'], 'L':["qqq"], 'A':[]}
>>> P.keys()
['A', 'S', 'B', 'L']
In this example, the key 'A' should be fourth in our list, but it is actually the first.
This is just a small example of why you should not treat dictionaries as ordered lists.
Maybe you could tell us what your intentions are, so that an alternative can be suggested.
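If the key is known, the values can be reached without relying on key order at all. A minimal sketch using the dictionary from the question (the variable names are only illustrative):

```python
P = {'S': ['dB', 'N'], 'B': ['C', 'CcB', 'bA']}

second_value = P['B'][1]   # second item in the list stored under 'B'
first_char = P['B'][2][0]  # first character of the third item

# On Python 3.7+, dicts keep insertion order, so positional access
# to the keys is also possible:
second_key = list(P)[1]

print(second_value, first_char, second_key)  # CcB b B
```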

Changing the predicate used in "Set" operation?

A set removes duplicate items. A duplicate is identified by a being equal to b.
Can I change "equal to" to a different predicate? I am trying to do this in Python, but any language is fine.
I would like to know if there are any built-in set mechanisms to reduce ['aaaa', 'aa', 'b', 'bb', 'c'] to ['aaaa', 'bb', 'c']. Here an item counts as a duplicate if it is a substring of something else. I feel there is a similarity in how duplicates are found, but if sets use hashing then I could be wrong.
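Python's built-in set relies on hashing and equality, and "is a substring of" is not symmetric, so it does not fit that model directly. One way to express the reduction is a plain filter. This is a rough quadratic sketch, and the helper name drop_substrings is hypothetical:

```python
def drop_substrings(items):
    """Keep only the strings that are not contained in a different string."""
    unique = list(dict.fromkeys(items))  # remove exact duplicates, keep order
    return [
        s for s in unique
        if not any(s != other and s in other for other in unique)
    ]


print(drop_substrings(['aaaa', 'aa', 'b', 'bb', 'c']))  # ['aaaa', 'bb', 'c']
```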

Please explain "set difference" in python

Trying to learn Python I encountered the following:
>>> set('spam') - set('ham')
set(['p', 's'])
Why is it set(['p', 's'])? I mean: why is 'h' missing?
The - operator on Python sets maps to the difference method, which is defined as the members of set A that are not members of set B. So in this case, the members of "spam" that are not in "ham" are "s" and "p". Notice that this operation is not commutative (that is, a - b == b - a is not always true).
You may be looking for the symmetric_difference method or the ^ operator:
>>> set("spam") ^ set("ham")
{'h', 'p', 's'}
This operator is commutative.
Because that is the definition of a set difference. In plain English, it is equivalent to "what elements are in A that are not also in B?".
Looking at the reverse operation makes this more obvious:
>>> set('spam') - set('ham')
{'s', 'p'}
>>> set('ham') - set('spam')
{'h'}
To get all unique elements, disregarding the order in which you ask, you can use symmetric_difference
>>> set('spam').symmetric_difference(set('ham'))
{'s', 'h', 'p'}
There are two different operators:
Set difference. This is defined as the elements of A not present in B, and is written as A - B or A.difference(B).
Symmetric set difference. This is defined as the elements of either set not present in the other set, and is written as A ^ B or A.symmetric_difference(B).
Your code is using the former, whereas you seem to be expecting the latter.
The set difference is the set of all characters in the first set that are not in the second set. 'p' and 's' appear in the first set but not in the second, so they are in the set difference. 'h' does not appear in the first set, so it is not in the set difference (regardless of whether or not it is in the second set).
You can also obtain the desired result as:
>>> (set('spam') | set('ham')) - (set('spam') & set('ham'))
set(['p', 's', 'h'])
Create union using | and intersection using & and then do the set difference, i.e. differences between all elements and common elements.
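The relationships described in the answers above can be checked side by side. A short sketch, comparing sets for equality so that the result does not depend on display order:

```python
spam, ham = set('spam'), set('ham')

print(spam - ham == {'s', 'p'})                   # True
print(ham - spam == {'h'})                        # True
print(spam ^ ham == {'s', 'p', 'h'})              # True
print(spam ^ ham == (spam | ham) - (spam & ham))  # True
print(spam ^ ham == (spam - ham) | (ham - spam))  # True
```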

Can I reliably use the indexes of list generated by python dictionary key method?

Let's say I have this dictionary:
mydict = {'1': ['a', 'b', 'c'],
'2': ['d', 'e', 'f'],
'3': ['g', 'h', 'i'],
'4': ['j', 'k', 'l'],
'5': ['m', 'n', 'o']}
According to the Python documentation,
The keys() method of a dictionary object returns a list of all the
keys used in the dictionary, in arbitrary order
When I call mydict.keys() method, it will give me a list of keys in the mydict without any particular order like this:
['1', '3', '2', '5', '4']
My question is: does the key list (above) generated by the .keys() method have the same order every time I call it? I have tried it using a for loop like this:
for i in range(100):
    print mydict.keys()
It seems to me that the resulting list always has the same order. But I just want to confirm that there is no hidden case that will change the order of the output list.
In other words, if I use mydict.keys()[0], will I get the same item every time?
You should never rely on ordering when using a dict. Even if you test it a billion times and the order looks consistent, there's nothing in the specification that states that it is, and thus there can be some hidden code that says "every full moon that begins on an even day, change the order".
If you want to rely on ordering, then use OrderedDict from collections.
My Question is does the key list(above) generated by .keys() method
has the same order everytime I call it?
If there are no intervening modifications to the dictionary, then the order would be the same.
Quoting from the documentation:
If items(), keys(), values(), iteritems(), iterkeys(), and
itervalues() are called with no intervening modifications to the
dictionary, the lists will directly correspond. This allows the
creation of (value, key) pairs using zip(): pairs = zip(d.values(), d.keys()). The same relationship holds for the iterkeys() and
itervalues() methods: pairs = zip(d.itervalues(), d.iterkeys())
provides the same value for pairs. Another way to create the same list
is pairs = [(v, k) for (k, v) in d.iteritems()].
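The quoted guarantee can be checked with a small sketch. It assumes Python 3, where items() takes the place of iteritems(), and uses a made-up dictionary:

```python
d = {'1': ['a'], '3': ['g'], '2': ['d']}

# keys() and values() correspond as long as d is not modified in between
pairs = list(zip(d.values(), d.keys()))
same_pairs = [(v, k) for (k, v) in d.items()]

print(pairs == same_pairs)  # True
```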
The short answer is no, you cannot rely on the order.
The reason is that the keys are hashed and placed in a table accordingly. The size of the table is adjusted up whenever the dictionary is 2/3 full (I think this is the number) in order to avoid too many collisions and maintain an O(1) access time, and adjusted down (I think when it is 1/3rd full), to manage memory utilization.
Because it is a hash table, the sequence in which the dictionary is constructed will affect the ordering of the keys.
The hash function may change in future versions...
If you need to use the keys in a reliable order, you could look into collections.OrderedDict, which might be what you are looking for. [edit: I just noticed you are using Python 2.7; OrderedDict is available there too, as it was added in Python 2.7 and 3.1.]
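A minimal sketch of both options for getting a dependable first key, written for Python 3 (where keys() returns a view that has to be converted to a list before indexing). The sample data is shortened from the question:

```python
from collections import OrderedDict

# OrderedDict remembers the order in which keys were inserted
ordered = OrderedDict([('1', ['a', 'b', 'c']), ('2', ['d', 'e', 'f']), ('3', ['g', 'h', 'i'])])
first_inserted = list(ordered.keys())[0]

# If only a deterministic order is needed, sorting the keys also works
plain = {'3': 'g', '1': 'a', '2': 'd'}
first_sorted = sorted(plain)[0]

print(first_inserted, first_sorted)  # 1 1
```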
