Remove duplicate data from an array in python

I have this array of data
data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]
I remove the duplicate data with list(set(data)), which gives me
data = [20001202.05, 20001202.50, 20001215.75, 20021215.75]
But I would like to remove duplicates based on the digits before the decimal point; for instance, if the array contains both 20001202.05 and 20001202.50, I want to keep only one of them.

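Since you don't care which of the matching items is kept, you could key a dictionary on the integer part. Later values overwrite earlier ones, so the last item seen for each integer part survives (the list() call is needed on Python 3, where values() returns a view):
>>> list({int(d): d for d in data}.values())
[20001202.5, 20001215.75, 20021215.75]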
If you would like to keep the lowest item, I can't think of a one-liner.
Here is a basic example for anybody who would like to add a condition on the key or value to keep.
seen = set()
result = []
for item in sorted(data):
    key = int(item)  # or whatever condition
    if key not in seen:
        result.append(item)
        seen.add(key)
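If the first value seen for each integer part should win and the input order should be kept, the loop above can also be written with dict.setdefault. This is a minimal sketch, assuming Python 3.7+ (where dicts keep insertion order); the name first_by_int is only illustrative.

```python
data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

# Map each integer part to the first value seen with that integer part.
# setdefault only stores the value when the key is not present yet.
first_by_int = {}
for item in data:
    first_by_int.setdefault(int(item), item)

# Dicts preserve insertion order in Python 3.7+, so input order is kept.
result = list(first_by_int.values())
print(result)  # [20001202.05, 20001215.75, 20021215.75]
```

Iterating over sorted(data) instead of data would keep the lowest value for each integer part.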

More generally, with Python 3.7+, dictionaries maintain insertion order, so you can do this even when order matters:
data = list({d: None for d in data}.keys())
However, the OP wants to de-duplicate on the integer part of each value, not on the raw number, so see the top-voted answer for that. This approach only removes exact duplicates.
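The same order-preserving idea is often spelled with dict.fromkeys, which builds the dictionary directly from the items. A short sketch, assuming Python 3.7+; the sample list is made up for illustration.

```python
# Illustrative sample data (not from the question).
data = [3, 1, 3, 2, 1]

# dict.fromkeys keeps the first occurrence of each item, in input order.
unique = list(dict.fromkeys(data))
print(unique)  # [3, 1, 2]
```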

data1 = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]
ls = []  # collects each value the first time it appears
for i in data1:
    if i not in ls:
        ls.append(i)
print(ls)

Related

Python3 match, reverse match and dedupe

The intention of the code below is to process the two dictionaries and add matching symbol values from each dictionary to the pairs list if the value contains the item in cur but not if the value contains either item in the curpair list.
I'm successful with the value matching cur but I can't figure out how to do the reverse match against the items in curpair. Also, a secondary issue is that it seems to create duplicates, likely because of the additional for loop to compare against the items in curpair. Either way, I'm not sure if there's a way to dedupe in-line or if that needs to be another routine.
I'm sure there may be a way to do all of this, and simplify the code at the same time, with list comprehension, but maybe not. My trying to understand list comprehension only results in reassuring me that my Python experience is far too brief to be able to make sense of that yet :)
Grateful for any insights.
cur='EUR'
curpair=['BUSD', 'USDT']
def get_pairs(tickers):
    pairs = []
    for entry in tickers:
        if cur in entry['symbol']:
            for cp in curpair:
                if cp not in entry['symbol']:
                    pairs.append(entry['symbol'])
    return pairs
# d1 and d2 # https://pastebin.com/NfNAeqD4
spot_pairs_list = get_pairs(d1)
margin_pairs_list = get_pairs(d2)
print(f"from d1: {spot_pairs_list}")
print(f"from d2: {margin_pairs_list}")
Output:
from d1: ['BTCEUR', 'BTCEUR', 'ETHEUR', 'ETHEUR', 'BNBEUR', 'BNBEUR', 'XRPEUR', 'XRPEUR', 'EURBUSD', 'EURUSDT', 'SXPEUR', 'SXPEUR', 'LINKEUR', 'LINKEUR', 'DOTEUR', 'DOTEUR', 'LTCEUR', 'LTCEUR', 'ADAEUR', 'ADAEUR', 'BCHEUR', 'BCHEUR', 'YFIEUR', 'YFIEUR', 'XLMEUR', 'XLMEUR', 'GRTEUR', 'GRTEUR', 'EOSEUR', 'EOSEUR', 'DOGEEUR', 'DOGEEUR', 'EGLDEUR', 'EGLDEUR', 'AVAXEUR', 'AVAXEUR', 'UNIEUR', 'UNIEUR', 'CHZEUR', 'CHZEUR', 'ENJEUR', 'ENJEUR', 'MATICEUR', 'MATICEUR', 'LUNAEUR', 'LUNAEUR', 'THETAEUR', 'THETAEUR', 'BTTEUR', 'BTTEUR', 'HOTEUR', 'HOTEUR', 'WINEUR', 'WINEUR', 'VETEUR', 'VETEUR', 'WRXEUR', 'WRXEUR', 'TRXEUR', 'TRXEUR', 'SHIBEUR', 'SHIBEUR', 'ETCEUR', 'ETCEUR', 'SOLEUR', 'SOLEUR', 'ICPEUR', 'ICPEUR']
from d2: ['ADAEUR', 'ADAEUR', 'BCHEUR', 'BCHEUR', 'BNBEUR', 'BNBEUR', 'BTCEUR', 'BTCEUR', 'DOTEUR', 'DOTEUR', 'ETHEUR', 'ETHEUR', 'EURBUSD', 'EURUSDT', 'LINKEUR', 'LINKEUR', 'LTCEUR', 'LTCEUR', 'SXPEUR', 'SXPEUR', 'XLMEUR', 'XLMEUR', 'XRPEUR', 'XRPEUR', 'YFIEUR', 'YFIEUR']

The duplicate values are easily solved by using a set instead of a list.
As for the other problem, this loop isn't doing the right thing:
for cp in curpair:
    if cp not in entry['symbol']:
        pairs.append(entry['symbol'])
This appends the symbol once for every element of curpair that is missing from it, which is also where the duplicates come from. For example, if the first cp is not in the symbol, the symbol is accepted even if the second element is in it. But it seems that you want to include only symbols that contain none of the elements in curpair.
In other words, you only want to append if cp in symbol is False for every cp.
This can indeed be done with a list comprehension:
def get_pairs(tickers):
    pairs = set()  # set instead of list
    for entry in tickers:
        symbol = entry['symbol']
        if cur in symbol and not any([cp in symbol for cp in curpair]):
            pairs.add(symbol)  # note it's 'add' for sets, not append
    return pairs
[cp in symbol for cp in curpair] is the same as this (deliberately verbose) loop:
cp_check = []
for cp in curpair:
    if cp in symbol:
        cp_check.append(True)
    else:
        cp_check.append(False)
So you get a list of True and False values. any() returns True if any of the list elements is True, i.e., it does the opposite of what you want. Hence we reverse its truth value with not, which gives True only if all of the list elements are False, exactly what we need.
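A set does not keep the symbols in their original order. If order matters, one possible variant is sketched below: it uses a generator expression for the filter and dict.fromkeys to drop repeats. It assumes Python 3.7+, and the sample list is hypothetical, since the real d1 and d2 live in the linked paste.

```python
cur = 'EUR'
curpair = ['BUSD', 'USDT']

def get_pairs(tickers):
    # Keep symbols that contain cur but none of the entries in curpair.
    matches = (
        entry['symbol']
        for entry in tickers
        if cur in entry['symbol']
        and not any(cp in entry['symbol'] for cp in curpair)
    )
    # dict.fromkeys drops repeats while keeping first-seen order.
    return list(dict.fromkeys(matches))

# Hypothetical sample input standing in for d1.
sample = [
    {'symbol': 'BTCEUR'},
    {'symbol': 'EURBUSD'},
    {'symbol': 'ETHEUR'},
    {'symbol': 'EURUSDT'},
    {'symbol': 'BTCEUR'},
]
result = get_pairs(sample)
print(result)  # ['BTCEUR', 'ETHEUR']
```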

Python Remove Duplicate Dict

I am trying to find a way to remove duplicates from a list of dicts. I don't have to test the entire object contents because the "name" value in a given object is enough to identify duplication (i.e., duplicate name = duplicate object). My current attempt is this:
newResultArray = []
for i in range(0, len(resultArray)):
    for j in range(0, len(resultArray)):
        if(i != j):
            keyI = resultArray[i]['name']
            keyJ = resultArray[j]['name']
            if(keyI != keyJ):
                newResultArray.append(resultArray[i])
which is wildly incorrect. Grateful for any suggestions. Thank you.

If name is unique, you should just use a dictionary to store your inner dictionaries, with name being the key. Then you won't even have the issue of duplicates, and you can look up or remove an entry by name in O(1) time.
Since I don't have access to the code that populates resultArray, I'll simply show how you can convert it into a dictionary in linear time, although the best option would be to use a dictionary instead of resultArray in the first place, if possible.
new_dictionary = {}
for item in resultArray:
    new_dictionary[item['name']] = item
If you must have a list in the end, then you can convert back into a list as such:
new_list = [v for k,v in new_dictionary.items()]

Since "name" provides uniqueness... and assuming "name" is a hashable object, you can build an intermediate dictionary keyed by "name". Any like-named dicts will simply overwrite their predecessor in the dict, giving you a list of unique dictionaries.
tmpDict = {result["name"]:result for result in resultArray}
newArray = list(tmpDict.values())
del tmpDict
You could shrink that down to
newArray = list({result["name"]:result for result in resultArray}.values())
which may be a bit obscure.
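Note that the dictionary approach keeps the last dict seen for each name. If the first one should win instead, a seen-set loop is one option. This is a minimal sketch with made-up sample data; only the "name" key comes from the question.

```python
# Hypothetical sample data; "value" is an illustrative extra field.
resultArray = [
    {"name": "a", "value": 1},
    {"name": "b", "value": 2},
    {"name": "a", "value": 3},
]

seen = set()
newArray = []
for result in resultArray:
    # Only keep a dict the first time its name appears.
    if result["name"] not in seen:
        seen.add(result["name"])
        newArray.append(result)

print(newArray)  # [{'name': 'a', 'value': 1}, {'name': 'b', 'value': 2}]
```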

Iterate through a dict, store a key and its value, then iterate again to find a similar key and delete both from the dict, e.g. (Light1on, Light1off), in Python

I have a problem with how to iterate through a dict to find a pair of similar keys, output the pair, and then delete it from the dict.
My intention is to generate random output labels and store them in a dictionary. I then want to iterate through the dictionary, take the first key, and search the dictionary for a similar key; e.g. Light1on and Light1off both contain Light1. The values for both keys should then be stored in a table in their respective columns.
For example:
Dict = {Light1on,Light2on,Light1off...}
Take the value for Light1on, iterate through the dictionary to find e.g. Light1off, then store Light1on:value1 and Light1off:value2 in a table or DataFrame with the columns On: value1 and Off: value2.
I don't know how to format code here, so I could only provide an image. Sorry for the trouble; this is my first time asking a question here. Thanks.
from collections import defaultdict
import difflib, random
olist = []
input = 10
olist1 = ['Light1on','Light2on','Fan1on','Kettle1on','Heater1on']
olist2 = ['Light2off','Kettle1off','Light1off','Fan1off','Heater1off']
events = list(range(input + 1))
for i in range(len(olist1)):
    output1 = random.choice(olist1)
    print(output1,'1')
    olist1.remove(output1)
    output2 = random.choice(olist2)
    print(output2,'2')
    olist2.remove(output2)
    olist.append(output1)
    olist.append(output2)
    print(olist,'3')
outputList = {olist[i]:events[i] for i in range(10)}
print (str(outputList),'4')
# Iterating through the keys finding a pair match
for s in range(5):
    for i in outputList:
        if i == list(outputList)[0]:
            skeys = difflib.get_close_matches(i, outputList, n=2, cutoff=0.75)
            print(skeys,'5')
            del outputList[skeys]
# Modified Dictionary
>>> difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animaltion'])
['animal']
Update: I was unable to delete the pair of similar keys from the dictionary after finding the pair.

You're probably getting an error about a dictionary changing size during iteration. That's because you're deleting keys from a dictionary you're iterating over, and Python doesn't like that:
d = {1:2, 3:4}
for i in d:
    del d[i]
That will throw:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
To work around that, one solution is to store a list of the keys you want to delete, then delete all those keys after you've finished iterating:
keys_to_delete = []
d = {1:2, 3:4}
for i in d:
    if i%2 == 1:
        keys_to_delete.append(i)
for i in keys_to_delete:
    del d[i]
Ta-da! Same effect, but this way avoids the error.
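Two other common ways to get the same effect are sketched here: iterating over a snapshot of the keys made with list(d), or building a new dictionary with only the entries to keep. The sample dictionaries are illustrative.

```python
# Option 1: iterate over a copy of the keys, so deleting from d is safe.
d = {1: 2, 3: 4, 6: 7}
for key in list(d):
    if key % 2 == 1:
        del d[key]
print(d)  # {6: 7}

# Option 2: build a new dict that keeps only the wanted entries.
d2 = {1: 2, 3: 4, 6: 7}
kept = {k: v for k, v in d2.items() if k % 2 == 0}
print(kept)  # {6: 7}
```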
Also, your code above doesn't call the difflib.get_close_matches function properly. You can run help(difflib.get_close_matches) to see how you are meant to call that function. You need to provide a second argument that indicates the items to which you wish to compare your first argument for possible matches.
All of that said, I have a feeling that you can accomplish your fundamental goals much more simply. If you spend a few minutes describing what you're really trying to do (this shouldn't involve any references to data types, it should just involve a description of your data and your goals), then I bet someone on this site can help you solve that problem much more simply!
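As one guess at the underlying goal, the on/off keys could be paired by stripping the suffix rather than by fuzzy matching. This sketch swaps difflib for exact suffix matching and assumes every key ends in "on" or "off"; the events dictionary and the names pairs and rows are hypothetical.

```python
# Hypothetical event data: each key is a device name plus an on/off suffix.
events = {
    'Light1on': 0, 'Light2off': 1, 'Fan1on': 2,
    'Light1off': 3, 'Fan1off': 4, 'Light2on': 5,
}

pairs = {}
for label, value in events.items():
    for state in ('on', 'off'):
        if label.endswith(state):
            device = label[:-len(state)]  # e.g. 'Light1on' -> 'Light1'
            pairs.setdefault(device, {})[state] = value
            break

# One row per device: (device, value for on, value for off).
rows = [(dev, s.get('on'), s.get('off')) for dev, s in pairs.items()]
print(rows)
```

Because nothing is deleted while iterating, the "dictionary changed size" error does not come up, and rows could be handed to a table or DataFrame constructor.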

How to add to python dictionary without replacing

The current code I have is category1[name]=(number). However, if the same name comes up again, the value in the dictionary is replaced by the new number. How would I make it so that the original value is kept and the new value is also added, giving the key two values? Thanks.

You would have to make the dictionary point to lists instead of numbers. For example, if you had two numbers for category cat1:
categories["cat1"] = [21, 78]
To make sure you add new numbers to the list rather than replacing the old ones, check whether the key is already present before adding:
cat_val = 12  # some value
if cat_key in categories:
    categories[cat_key].append(cat_val)
else:
    # Initialise it to a list containing one item
    categories[cat_key] = [cat_val]
To access the values, you simply use categories[cat_key], which would return [12] if there was one value, 12, for that key, and [12, 95] if there were two values for that key.
Note that if you don't want to store duplicate values you can use a set rather than a list:
cat_val = 12  # some value
if cat_key in categories:
    categories[cat_key].add(cat_val)
else:
    # Initialise it to a set containing one item; set(cat_val) would fail for a number
    categories[cat_key] = {cat_val}
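The check-then-append pattern above can also be left to the standard library. A minimal sketch using collections.defaultdict and dict.setdefault, with illustrative keys and numbers:

```python
from collections import defaultdict

# defaultdict(list) creates an empty list the first time a key is used.
categories = defaultdict(list)
categories["cat1"].append(21)
categories["cat1"].append(78)
categories["cat2"].append(5)
print(dict(categories))  # {'cat1': [21, 78], 'cat2': [5]}

# The same idea with a plain dict and setdefault.
plain = {}
plain.setdefault("cat1", []).append(21)
plain.setdefault("cat1", []).append(78)
print(plain)  # {'cat1': [21, 78]}
```

Using defaultdict(set) with .add() gives the duplicate-free variant.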

A key only has one value; you would need to make the value a tuple, list, etc.
If you know you are going to have multiple values for a key, then I suggest you make the values capable of handling this when they are created.

It's a little hard to understand your question.
I think you want this:
>>> d[key] = [4]
>>> d[key].append(5)
>>> d[key]
[4, 5]

Depending on what you expect, you could check if name - a key in your dictionary - already exists. If so, you might be able to change its current value to a list, containing both the previous and the new value.
I didn't test this, but maybe you want something like this:
mydict = {'key_1' : 'value_1', 'key_2' : 'value_2'}
another_key = 'key_2'
another_value = 'value_3'
if another_key in mydict.keys():
    # another_key does already exist in mydict
    mydict[another_key] = [mydict[another_key], another_value]
else:
    # another_key doesn't exist in mydict
    mydict[another_key] = another_value
Be careful when doing this more than one time! If it could happen that you want to store more than two values, you might want to add another check - to see if mydict[another_key] already is a list. If so, use .append() to add the third, fourth, ... value to it.
Otherwise you would get a collection of nested lists.
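The extra check described above might look like the following sketch. The helper name add_value is hypothetical, and it assumes the stored values are never lists themselves; otherwise a single list value would be mistaken for a collection.

```python
def add_value(mapping, key, value):
    """Store value under key, switching to a list once a second value arrives."""
    if key not in mapping:
        mapping[key] = value
    elif isinstance(mapping[key], list):
        # Already a list of values: just extend it.
        mapping[key].append(value)
    else:
        # Second value for this key: wrap old and new in a list.
        mapping[key] = [mapping[key], value]

mydict = {'key_1': 'value_1', 'key_2': 'value_2'}
add_value(mydict, 'key_2', 'value_3')
add_value(mydict, 'key_2', 'value_4')
print(mydict['key_2'])  # ['value_2', 'value_3', 'value_4']
```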

You can create a dictionary that maps each key to a list of values, and add each new value to the list stored at that key.
d = {}
d["name"] = [1]      # store the first value inside a list
x = d["name"]
d["name"] = x + [2]  # build a new list with the second value added

I guess this is the easiest way:
category1 = {}
category1['firstKey'] = [7]
category1['firstKey'] += [9]
category1['firstKey']
should give you:
[7, 9]
So, just use lists of numbers instead of numbers.

how to print dict values based on key containing delimiters

I have a dict:
x1={'b;0':'A1;B2;C3','b;1':'aa1;aa2;aa3','a;1': 'a1;a2;a3', 'a;0': 'A;B;C'}
My convention is that 'a;0' and 'b;0' contain the tags, and 'a;1' and 'b;1' contain the corresponding values. I have to group and print based on this.
The output I want from this dict is
<a> # this is the group name
<A>a1</A> # these are the tags and values
<B>a2</B>
<C>a3</C>
</a>
<b>
<A1>aa1</A1>
<B2>aa2</B2>
<C3>aa3</C3>
</b>
This is just a sample dict; many more groups may come, like c;0:.... d;0.....
I am using code like
a=[]
b=[]
c=[]
d=[]
e=[]
for k,v in x1.iteritems():
    if k.split(";").count('0')==1: # i am using this bcoz a;0,b;0 contains tag so i am checking if they contain zero split it.
        a=k.split(";") # this contains a=['a','0','b','0']
        b=v.split(";") # this contains 'a;0','b;0' values
    else:
        c=v.split(";") # this contains 'a;1','b;1' values
for i in range(0,len(b)):
    d=b[i]
    e=c[i]
    print "<%s>%s<%s>"%(c,e,c)
This code only half works. Whether there is a single group in the dict ('a;1': 'a1;a2;a3', 'a;0': 'A;B;C') or multiple groups ('b;0':'A1;B2;C3','b;1':'aa1;aa2;aa3','a;1': 'a1;a2;a3', 'a;0': 'A;B;C'), in both cases it prints
aa1
aa2
aa3
It prints only the most recent values, not all of them.

Be aware: in Python 2 (and in Python 3 before 3.7) dictionaries have no guaranteed order. So the iteritems() loop does not necessarily start with 'b;0'. Try for example
for k,v in x1.iteritems():
    print k
to see. On my computer it gives
a;1
a;0
b;0
b;1
This is a problem because your code assumes that the keys come in the order in which they appear in the definition of x1, or at least in sorted order. You can, for example, iterate over the sorted keys instead:
for k in sorted(x1.keys()):
    v = x1[k]
    print k, v
Then the problem with the order is solved. But I think you have more problems in your code.
Edit: Data structures:
It might be better to store your data in a form like
x1 = {'a': [('A','a1'),('B','a2'),('C','a3')], 'b': ... }
If you cannot change the format, this is how you could convert your data:
x1f = {}
for k in x1.iterkeys():
    tag, id = k.split(';')
    if int(id) == 0:
        x1f[tag] = zip(x1[k].split(';'), x1[tag+';'+'1'].split(';'))
print x1f
From there it should be easier to convert to the desired output.
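Putting the grouping and the printing together, a Python 3 sketch of the whole conversion could look like this. It assumes every group has both a ';0' key (tags) and a ';1' key (values) with the same number of entries; the names groups and lines are illustrative.

```python
x1 = {'b;0': 'A1;B2;C3', 'b;1': 'aa1;aa2;aa3', 'a;1': 'a1;a2;a3', 'a;0': 'A;B;C'}

# Collect the group names from the keys that end in ';0' (the tag rows).
groups = sorted(k.split(';')[0] for k in x1 if k.endswith(';0'))

lines = []
for group in groups:
    tags = x1[group + ';0'].split(';')
    values = x1[group + ';1'].split(';')
    lines.append('<%s>' % group)
    # Pair each tag with the value at the same position.
    for tag, value in zip(tags, values):
        lines.append('<%s>%s</%s>' % (tag, value, tag))
    lines.append('</%s>' % group)

print('\n'.join(lines))
```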
And depending on whether you want to extend the complexity of the output in future, you might want to consider using the standard library's xml.dom.minidom:
from xml.dom import minidom
doc = minidom.Document()
Then you can use the createElement and appendChild methods.
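For illustration, one group could be built with minidom roughly as follows. This is a sketch in Python 3 using only standard xml.dom.minidom calls (createElement, createTextNode, appendChild, toxml); the tag/value pairs are taken from the 'a' group of the question.

```python
from xml.dom import minidom

doc = minidom.Document()

# The group element becomes the document root.
root = doc.createElement('a')
doc.appendChild(root)

# One child element per (tag, value) pair, with the value as its text.
for tag, value in [('A', 'a1'), ('B', 'a2'), ('C', 'a3')]:
    child = doc.createElement(tag)
    child.appendChild(doc.createTextNode(value))
    root.appendChild(child)

xml_text = root.toxml()
print(xml_text)  # <a><A>a1</A><B>a2</B><C>a3</C></a>
```

root.toprettyxml() would give an indented version with one element per line.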
