Remove duplicates from a list of tuples containing floats

I have a list of 2-tuples containing floats. Some of the floats are nearly equal, close enough to be considered equal, and numpy's isclose() works well for comparing them. I need to remove the duplicates from the list while always retaining the first value.
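import numpy as np
data = list(zip(C1, C2))  # C1, C2: the two columns of floats (not shown); list() so data can be indexed
comparray = []
eval1 = np.isclose(data[0], data[1])
comparray.append(eval1[0])
i = 0
while i < (len(data) - 1):
    close = np.isclose(data[i], data[i + 1])  # renamed from eval, which shadows a built-in
    print(close)
    comparray.append(close[0])
    i += 1
l1 = [a for a, b in zip(data, comparray) if not b]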
This code does what I need, but it seems really clumsy. Is there a more pythonic way of doing this?
Thanks for the help.

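If I understood correctly, you can compare each tuple with the one before it and keep it only when the two are not close:
out = data[:1] + [b for a, b in zip(data, data[1:]) if not np.allclose(a, b)]
data[:1] keeps the first value, and np.allclose reduces the element-wise comparison to a single bool (applying not to the array returned by np.isclose(a, b) raises a ValueError). I can't really test this against your data, though, as you didn't provide any input/output examples.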
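Comparing consecutive tuples only catches near-duplicates that sit next to each other. If they can appear anywhere in the list, a minimal sketch (assuming the default np.allclose tolerances are acceptable; dedupe_close is a hypothetical helper name) compares each tuple with every tuple kept so far and keeps the first occurrence:

```python
import numpy as np


def dedupe_close(pairs, rtol=1e-05, atol=1e-08):
    """Return pairs with near-duplicates removed, keeping the first occurrence."""
    kept = []
    for p in pairs:
        # np.allclose collapses the element-wise isclose test into a single bool
        if not any(np.allclose(p, k, rtol=rtol, atol=atol) for k in kept):
            kept.append(p)
    return kept


data = [(1.0, 2.0), (3.0, 4.0), (1.0 + 1e-9, 2.0), (3.0, 4.0 - 1e-9)]
print(dedupe_close(data))  # [(1.0, 2.0), (3.0, 4.0)]
```

The cost is quadratic in the worst case, which is fine for moderate list sizes. Because "close" is not transitive, the result can depend on the order of the input.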

Are you familiar with the data structure called a "set"?
Sets are unordered collections of unique elements. I believe this structure would save you a lot of overhead and be a much better fit based on your description.
https://docs.python.org/2/library/sets.html
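Note that a set only removes exact duplicates, so nearly equal floats would all survive. One way to still use a set, sketched here under the assumption that rounding to a fixed number of decimals is an acceptable notion of "close" (dedupe_rounded and ndigits are illustrative names), is to store rounded keys in the set while keeping the original tuples in order:

```python
def dedupe_rounded(pairs, ndigits=2):
    """Keep the first tuple for each rounded key, preserving input order."""
    seen = set()
    out = []
    for p in pairs:
        key = tuple(round(x, ndigits) for x in p)  # hashable, canonical key
        if key not in seen:
            seen.add(key)
            out.append(p)  # keep the original tuple, not the rounded key
    return out


data = [(1.001, 2.0), (3.0, 4.0), (1.004, 2.0), (1.0, 2.0)]
print(dedupe_rounded(data))  # [(1.001, 2.0), (3.0, 4.0)]
```

This runs in linear time, but values on either side of a rounding boundary (for example 1.004 and 1.006 at two decimals) get different keys even though they are close.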

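You can use a function like this (note that sig_fig actually counts decimal places, and because int() truncates, values on either side of a boundary, such as 3.459 and 3.461, compare as unequal):
def nearly_equal(a, b, sig_fig=2):
    return (a == b or
            int(a * 10**sig_fig) == int(b * 10**sig_fig))

>>> print(nearly_equal(3.456, 3.457))
True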
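For comparison, Python 3.5+ ships math.isclose, which uses a relative tolerance and so has no truncation boundaries. A minimal sketch applying it element-wise to tuples (tuples_close is a hypothetical name, and rel_tol=1e-3 below is an arbitrary choice for the demo):

```python
import math


def tuples_close(a, b, rel_tol=1e-9, abs_tol=0.0):
    """True if a and b have the same length and every pair of elements is close."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


print(tuples_close((3.459, 1.0), (3.461, 1.0), rel_tol=1e-3))  # True
print(tuples_close((3.456, 1.0), (3.5, 1.0), rel_tol=1e-3))    # False
```

With abs_tol=0.0, a value is never considered close to 0.0 unless it is exactly zero, so set abs_tol when your data can contain values near zero.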

Related

Convert for loop into list comprehension with assignment?

I am trying to convert a for loop with an assignment into a list comprehension.
More precisely, I am trying to replace only the first element of each sub-list, where each sub-list has three elements.
Can it be done?
for i in range(len(data)):
    data[i][0] = data[i][0].replace('+00:00','Z').replace(' ','T')
Best

If you really, really want to convert it to a list comprehension, you could try something like this, assuming the sub-lists have three elements, as you stated in the question:
new_data = [[a.replace('+00:00','Z').replace(' ','T'), b, c] for (a, b, c) in data]
Note, though, that this does not modify the existing list; it creates a new one. In this case, I'd just stick with a regular for loop, which conveys much better what you are actually doing. Instead of iterating over the indices, though, you could iterate over the elements directly:
for x in data:
    x[0] = x[0].replace('+00:00','Z').replace(' ','T')
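If you want the comprehension but still need to update the existing list object (for example because other code holds a reference to it), assigning to the slice data[:] replaces the list's contents in place. A sketch, with fix_timestamp as a hypothetical helper and made-up sample rows:

```python
def fix_timestamp(s):
    """Apply the question's replacements: '+00:00' -> 'Z', then ' ' -> 'T'."""
    return s.replace('+00:00', 'Z').replace(' ', 'T')


data = [['2020-01-01 12:00:00+00:00', 'a', 'b'],
        ['2020-01-02 08:30:00+00:00', 'c', 'd']]
alias = data  # a second reference to the same list object

# Slice assignment swaps in the new rows without rebinding the name data
data[:] = [[fix_timestamp(a), b, c] for a, b, c in data]

print(alias is data)  # True
print(alias[0])       # ['2020-01-01T12:00:00Z', 'a', 'b']
```

Only the outer list is updated in place; the rows themselves are new list objects, so references to individual old rows will not see the change.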

I believe it could be done, but that's not the best way to do it.
First, you would create a high Jones complexity (a measure of how complex a single line is) for anyone else reading your code.
Second, you would exceed the preferred line length (79 characters in PEP 8), which again makes the code harder to read.
Third, list comprehensions are meant to build new lists from existing iterables, whereas here you are changing your original list. That is not the best practice either.

A list comprehension is useful for building lists, so it is not recommended here. Still, you can try this simple solution, which prints a new list containing only the converted first elements:
print([ele[0].replace('+00:00','Z').replace(' ','T') for ele in data])

Although I don't recommend using a list comprehension in this case, if you really want to use one, here is an example.
It can handle sub-lists of different lengths, if you need that.
code:
data = [["1 +00:00",""],["2 +00:00","",""],["3 +00:00"]]
print([[i[0].replace('+00:00','Z').replace(' ','T'),*i[1:]] for i in data])
result:
[['1TZ', ''], ['2TZ', '', ''], ['3TZ']]

Check if list of numpy arrays are equal

I have a list of numpy arrays, and want to check if all the arrays are equal. What is the quickest way of doing this?
I am aware of the numpy.array_equal function (https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.array_equal.html), however as far as I am aware this only applies to two arrays and I want to check N arrays against each other.
I also found this answer to test all elements in a list: check if all elements in a list are identical.
However, when I try each method in the accepted answer I get an exception (ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all())
Thanks,

You could simply adapt a general-purpose "all elements are equal" function for iterators to compare arrays:
import numpy as np

def all_equal(iterator):
    try:
        iterator = iter(iterator)
        first = next(iterator)
        return all(np.array_equal(first, rest) for rest in iterator)
    except StopIteration:
        # An empty iterable is trivially "all equal"
        return True
If this returns False, your arrays are not all equal.
Demo:
>>> i = [np.array([1,2,3]),np.array([1,2,3]),np.array([1,2,3])]
>>> print(all_equal(i))
True
>>> j = [np.array([1,2,4]),np.array([1,2,3]),np.array([1,2,3])]
>>> print(all_equal(j))
False

You can use np.array_equal() in a list comprehension to compare each array to the first one:
all([np.array_equal(list_of_arrays[0], arr) for arr in list_of_arrays])

If your arrays are of equal size, this solution using numpy_indexed (disclaimer: I am its author) should work and be very efficient:
import numpy_indexed as npi
npi.all_unique(list_of_arrays)

@jtr's answer is great, but I would like to suggest a slightly different alternative.
First of all, I think using array_equal is not a great idea: with arrays of floats, you may end up with very small differences that you are willing to tolerate, but array_equal returns True if and only if the two arrays have the same shape and exactly the same elements. So let's use allclose instead, which lets you choose the absolute and relative tolerances that meet your needs.
Then, I would use the built-in zip function, which makes the code more elegant.
Here is the code:
all(np.allclose(array, array_expected) for array, array_expected in zip(array_list, array_list_expected))
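The zip version compares two lists pairwise. To apply the same tolerance-based idea to the original question (N arrays that should all match), here is a sketch that checks every array against the first; all_close is a hypothetical name, and the explicit shape check is there because np.allclose broadcasts, so np.allclose([1, 1, 1], [1]) would otherwise return True:

```python
import numpy as np


def all_close(arrays, rtol=1e-05, atol=1e-08):
    """True if every array has the same shape as the first and is close to it."""
    arrays = [np.asarray(a) for a in arrays]
    if not arrays:
        return True  # vacuously true for an empty list
    first = arrays[0]
    return all(
        a.shape == first.shape and np.allclose(first, a, rtol=rtol, atol=atol)
        for a in arrays[1:]
    )


arrays = [np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 3.0 + 1e-12])]
print(all_close(arrays))                                # True
print(all_close([np.array([1, 1, 1]), np.array([1])]))  # False: shapes differ
```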

I guess you can use the function numpy.unique:
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.unique.html#numpy.unique
If all the sub-arrays in the array are the same, it should return only one item (treating whole rows as items requires the axis argument, available since NumPy 1.13).
This question describes in more detail how to use it:
Find unique rows in numpy.array
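A sketch of that idea, assuming all arrays have the same shape and NumPy 1.13 or later (for the axis argument); all_equal_unique is a hypothetical name. np.stack adds a leading axis, and np.unique(..., axis=0) then treats each original array as one element:

```python
import numpy as np


def all_equal_unique(arrays):
    """True if all arrays (of identical shape) are exactly equal."""
    if len(arrays) == 0:
        return True  # np.stack needs at least one array
    stacked = np.stack(arrays)  # shape (N, *arrays[0].shape)
    return len(np.unique(stacked, axis=0)) == 1


print(all_equal_unique([np.array([1, 2, 3])] * 3))                   # True
print(all_equal_unique([np.array([1, 2, 4]), np.array([1, 2, 3])]))  # False
```

np.unique sorts its input, so this does more work than comparing each array with the first one, which can also stop at the first mismatch.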

Flatten, remove duplicates, and sort a list of lists in python

From this answer I have a flattened list.
Now I want to remove duplicates and sort the list. Currently I have the following:
import itertools
x = itertools.chain.from_iterable(results[env].values())  # From the linked answer
y = sorted(list(set(x)), key=lambda s: s.lower())
Is there a better way of accomplishing this? In my case, x has ~32,000 elements and y ends up with ~1,100. What I have works, but I'd like to see if there's anything better (faster, more readable, etc.).

Actually, if you just remove the list() call, which isn't needed, you've got a nice, neat solution to your original problem. I think your code is perfectly readable and efficient.
y = sorted(set(x), key=lambda s:s.lower())

Since results[env] is a dictionary, you can build the union of its values as a set instead of flattening the values first, and then sort the result:
>>> sorted(set().union(*results[env].values()), key=str.lower)
Also note that you don't need a lambda function as your key; you can simply use the str.lower method.
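To check that both versions agree, here is a small run on made-up sample data (the results dictionary and the env key are hypothetical stand-ins for the question's data):

```python
import itertools

# Hypothetical stand-in for the question's results[env]: a dict of lists of strings
results = {'prod': {'a': ['pear', 'Apple', 'banana'], 'b': ['banana', 'pear', 'Cherry']}}
env = 'prod'

# Original approach without the redundant list(): flatten, dedupe, sort case-insensitively
via_chain = sorted(set(itertools.chain.from_iterable(results[env].values())), key=str.lower)

# Union approach: build the set directly from all value lists
via_union = sorted(set().union(*results[env].values()), key=str.lower)

print(via_chain)               # ['Apple', 'banana', 'Cherry', 'pear']
print(via_chain == via_union)  # True
```

The set itself is case-sensitive, so 'Pear' and 'pear' would both be kept, and their relative order after sorting by str.lower depends on the set's iteration order.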

Python elementwise addition from a dict

I have a dictionary of the form:
dictName = {'Hepp': [-1,0,1], 'Fork': [-1,-1,-1], 'Dings': [0,0,1]}
and I basically want to pull out the values (the lists)
and add them together elementwise to get a vector as the result, like
[-2,-1,1]
I am having a hard time figuring out how to code this, and all the examples I have found for adding lists assume that I can turn them into tuples, but I might have to add around 100 lists together.
Can any of you guys help out?

You can use a list comprehension, and zip:
[sum(t) for t in zip(*dictName.values())]
(On Python 2, dictName.itervalues() avoids building an intermediate list.)
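For many or long lists, a NumPy version may be convenient. The sketch below (assuming NumPy is available and all lists have the same length) compares it with the pure-Python zip approach:

```python
import numpy as np

dictName = {'Hepp': [-1, 0, 1], 'Fork': [-1, -1, -1], 'Dings': [0, 0, 1]}

# Pure Python: zip(*...) groups the i-th elements of every list together
pure = [sum(t) for t in zip(*dictName.values())]

# NumPy: one row per list, then sum down the columns
with_numpy = np.array(list(dictName.values())).sum(axis=0).tolist()

print(pure)        # [-2, -1, 1]
print(with_numpy)  # [-2, -1, 1]
```

One difference to keep in mind: zip silently stops at the shortest list, whereas NumPy cannot build a regular 2-D array from lists of different lengths.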

Check if all values of iterable are zero

Is there a good, succinct/built-in way to see if all the values in an iterable are zeros? Right now I am using all() with a little list comprehension, but (to me) it seems like there should be a more expressive method. I'd view this as somewhat equivalent to a memcmp() in C.
values = (0, 0, 0, 0, 0)
# Test if all items in values tuple are zero
if all([v == 0 for v in values]):
    print('indeed they are')
I would expect a built-in function that does something like:
def allcmp(iterable, value):
    for item in iterable:
        if item != value:
            return False
    return True
Does that function exist in python and I'm just blind, or should I just stick with my original version?
Update
I'm not suggesting that allcmp() is the solution. It is an example of what I think might be more meaningful. This isn't the place where I would suggest new built-ins for Python.
In my opinion, all() isn't that meaningful. It doesn't express what "all" is checking for. You could assume that all() takes an iterable, but it doesn't express what the function is looking for (an iterable of bools that tests all of them for True). What I'm asking for is some function like my allcmp() that takes two parameters: an iterable and a comparison value. I'm asking if there is a built-in function that does something similar to my made up allcmp().
I called mine allcmp() because of my C background and memcmp(), the name of my made up function is irrelevant here.

Use generators rather than lists in cases like that:
all(v == 0 for v in values)
Edit:
all is a standard Python built-in. If you want to be an efficient Python programmer, you should probably know more than half of them (http://docs.python.org/library/functions.html). Arguing that alltrue is a better name than all is like arguing that C's while should be called whiletrue. It is subjective, but I think most people prefer shorter names for built-ins, because you should know what they do anyway and you have to type them a lot.
Using a generator expression is better than using numpy here because the syntax is more elegant. numpy may be faster, but you will benefit only in rare cases: generator expressions like the one shown are fast, so you will only notice a difference if this code is a bottleneck in your program.
You probably can't expect anything more descriptive from Python.
PS. Here is the code if you want to do this in memcmp style (I prefer the all version, but maybe you will like this one):
list(values) == [0] * len(values)
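As for a two-argument form like the question's allcmp(), it can be composed from existing tools: map() with operator.eq and itertools.repeat compares every item with the same value lazily, and all() stops at the first mismatch. This is a sketch, not a built-in; allcmp is the question's own hypothetical name:

```python
import itertools
import operator


def allcmp(iterable, value):
    """True if every item in iterable equals value (True for an empty iterable)."""
    # map stops when iterable is exhausted; itertools.repeat(value) is infinite
    return all(map(operator.eq, iterable, itertools.repeat(value)))


print(allcmp((0, 0, 0, 0, 0), 0))  # True
print(allcmp((0, 1, 0), 0))        # False
```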

If you know that the iterable will contain only integers, you can just do this:
if not any(values):
    ...  # all values are zero

If values is a numpy array you can write
import numpy as np
values = np.array((0, 0, 0, 0, 0))
all(values == 0)
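For NumPy arrays, the array methods do the whole check in compiled code instead of looping in Python. A short sketch using ndarray.any(), np.count_nonzero() and ndarray.all():

```python
import numpy as np

values = np.array((0, 0, 0, 0, 0))

# Each expression is True when every element is zero
print(not values.any())               # True
print(np.count_nonzero(values) == 0)  # True
print(bool((values == 0).all()))      # True
```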

The any() function may be the simplest way to achieve this. any() returns False when no element is truthy, for example when all elements are zero (it also returns False for an empty iterable), so not any(values) is True exactly when every element is falsy.
values = (0, 0, 0, 0, 0)
print(any(values))  # prints False

The built-in set takes an iterable and returns a collection (set) of its unique values.
So it can be used here as:
set(it) == {0}
where it is the iterable and {0} is a set containing only zero.
More information on set types (set and frozenset) is in the Python docs.

I prefer using negation:
all(not v for v in values)
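The approaches above differ on edge cases. A small comparison (the helper names are illustrative) shows that the set-based test rejects an empty iterable, while the truthiness-based tests also accept falsy non-zero values such as '' or None:

```python
def via_all(vs):
    return all(v == 0 for v in vs)


def via_not_any(vs):
    return not any(vs)


def via_set(vs):
    return set(vs) == {0}


def via_negation(vs):
    return all(not v for v in vs)


for vs in [(0, 0, 0), (0, 1, 0), (), (0, None, '')]:
    print(vs, via_all(vs), via_not_any(vs), via_set(vs), via_negation(vs))
# (0, 0, 0) True True True True
# (0, 1, 0) False False False False
# () True True False True
# (0, None, '') False True False True
```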
