how to remove both duplicate entries in python pandas


I am simply trying to remove both entries whenever a value is duplicated. For example, if the array is (9,1,2,2,3,4), I need to output (9,1,3,4).
Most pandas methods, such as drop_duplicates(), keep either the first or the last entry. My data always contains duplicates in pairs and always has an even number of elements.
For example, (1,4,6,7,3,3,0,0) should output 1,4,6,7.

import collections
a = (1,4,6,7,3,3,0,0)
a = [x for x,y in collections.Counter(a).items() if y == 1]
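Since the question mentions pandas: drop_duplicates() also accepts keep=False, which drops every copy of a duplicated value instead of keeping the first or last one. A minimal sketch, assuming pandas is installed and the values live in a Series (or a DataFrame column):

```python
import pandas as pd

# keep=False removes every occurrence of any value that appears more than once
s = pd.Series([9, 1, 2, 2, 3, 4])
result = s.drop_duplicates(keep=False).tolist()
print(result)  # [9, 1, 3, 4]

# The same parameter works on a DataFrame, optionally restricted to some columns
df = pd.DataFrame({"val": [1, 4, 6, 7, 3, 3, 0, 0]})
print(df.drop_duplicates(subset="val", keep=False)["val"].tolist())  # [1, 4, 6, 7]
```

The original row order is preserved, because drop_duplicates only filters rows and never reorders them.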

You have tuples there, not arrays (lists in Python use []). If you want to keep the result as a tuple, you could do:
# if you need a new tuple
a = (1,4,6,7,3,3,0,0)
b = ()
for i in a:
    if a.count(i) == 1:  # keep only values that occur exactly once
        b = b + (i,)
print(b)  # (1, 4, 6, 7)
You could even do it in one line:
# if you want to replace the original tuple
a = tuple(i for i in a if a.count(i) == 1)
print(a)  # (1, 4, 6, 7)
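Calling a.count(i) inside the loop rescans the whole tuple for every element, so the count-based versions are quadratic in the length of the input. For large inputs, a sketch that counts once with collections.Counter and then filters in the original order (the helper name drop_all_duplicates is hypothetical):

```python
from collections import Counter

def drop_all_duplicates(items):
    """Return a tuple of the elements that occur exactly once, in original order."""
    counts = Counter(items)  # one pass to count every value
    return tuple(x for x in items if counts[x] == 1)

a = (1, 4, 6, 7, 3, 3, 0, 0)
b = drop_all_duplicates(a)
print(b)  # (1, 4, 6, 7)
```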

Related

Parallel iteration through multiple lists with indexing on the iterated item

I'm trying to apply a formula, in parallel, to items taken from several different lists.
First, my lists are the following:
l1= [1,2,3]
l2= [4,5,6]
l3= [7,8,9]
Next, I need to create a new set of lists from my main lists so that I can compare each pair of adjacent elements inside a list:
newl1 = list(zip(l1, l1[1:])) #outputs [(1,2), (2,3)]
newl2 = list(zip(l2, l2[1:])) #outputs [(4,5), (5,6)]
newl3 = list(zip(l3, l3[1:])) #outputs [(7,8), (8,9)]
Now I want to iterate through all the new lists in parallel and be able to compare the elements of the tuples:
for pair_of_newl1, pair_of_newl2, pair_of_newl3 in newl1, newl2, newl3:
    if pair_of_newl1[0] > pair_of_newl1[1]:
        x = pair_of_newl1[0] + pair_of_newl2[1] + pair_of_newl3[1]
        print(x)
    elif pair_of_newl1[0] < pair_of_newl1[1]:
        x = pair_of_newl1[0] - pair_of_newl2[1] - pair_of_newl3[1]
        print(x)
I expect that, in the first iteration:
pair_of_newl1 = (1,2)
pair_of_newl2 = (4,5)
pair_of_newl3 = (7,8)
so that I can compare the items inside them by indexing.
I'm getting the following error:
ValueError: not enough values to unpack (expected 3, got 2)
Confused, I deleted the last list, leaving only two lists to work with:
l1= [1,2,3]
l2= [4,5,6]
newl1 = list(zip(l1, l1[1:])) #outputs [(1,2), (2,3)]
newl2 = list(zip(l2, l2[1:])) #outputs [(4,5), (5,6)]
for pair_of_newl1, pair_of_newl2 in newl1, newl2:
    if pair_of_newl1[0] > pair_of_newl1[1]:
        x = pair_of_newl1[0] + pair_of_newl2[1]
        print(x)
    elif pair_of_newl1[0] < pair_of_newl1[1]:
        x = pair_of_newl1[0] - pair_of_newl2[1]
        print(x)
    print(pair_of_newl1, pair_of_newl2)  # just to see how the loop works
And I get:
-1
(1, 2) (2, 3)
-1
(4, 5) (5, 6)
So, as I understand it, pair_of_newl2 is taken from the second item of newl1. Why isn't it taken from newl2?
Help, please.

You forgot to use zip to iterate over them in parallel:
for pair_of_newl1, pair_of_newl2 in zip(newl1, newl2):
And the result:
-4
(1, 2) (4, 5)
-4
(2, 3) (5, 6)
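The same fix extends to the original three-list version: zip the three pair lists together so that each iteration receives one pair from each list. A sketch that collects the results in a list instead of printing them:

```python
l1 = [1, 2, 3]
l2 = [4, 5, 6]
l3 = [7, 8, 9]

# Adjacent pairs from each list, e.g. [(1, 2), (2, 3)]
newl1 = list(zip(l1, l1[1:]))
newl2 = list(zip(l2, l2[1:]))
newl3 = list(zip(l3, l3[1:]))

results = []
# zip yields one (pair1, pair2, pair3) triple per position
for p1, p2, p3 in zip(newl1, newl2, newl3):
    if p1[0] > p1[1]:
        results.append(p1[0] + p2[1] + p3[1])
    elif p1[0] < p1[1]:
        results.append(p1[0] - p2[1] - p3[1])

print(results)  # [-12, -13]
```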

The error you get is because the line for pair_of_newl1, pair_of_newl2, pair_of_newl3 in newl1, newl2, newl3 iterates over the three lists one at a time and tries to unpack each list into three names. But each of the newl lists only has two elements, so that won't work.
If you want to loop through parallel lists, you can either zip them (which would make the indexing a little more complicated in your case) or use an index:
for i in range(len(newl1)):  # indices 0 and 1, which every list has
    if newl1[i][0] > newl1[i][1]:
        x = newl1[i][0] + newl2[i][1] + newl3[i][1]
        print(x)
    elif newl1[i][0] < newl1[i][1]:
        x = newl1[i][0] - newl2[i][1] - newl3[i][1]
        print(x)
This loops over the two tuples in each of the three lists and, depending on how the two elements of the newl1 tuple compare, either adds the second elements of the newl2 and newl3 tuples to the first element of the newl1 tuple or subtracts them from it.
Is this what you wanted to do?
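To see why the loop without zip misbehaves: writing newl1, newl2 after in builds a 2-tuple of lists, so the loop runs once per list and unpacks that list's two pairs into the two loop variables. With three lists, each list still holds only two pairs, hence "expected 3, got 2". A small sketch that records what each iteration binds:

```python
newl1 = [(1, 2), (2, 3)]
newl2 = [(4, 5), (5, 6)]

# Without zip: iterates over the tuple (newl1, newl2), one whole list per iteration
without_zip = [(a, b) for a, b in (newl1, newl2)]
print(without_zip)  # [((1, 2), (2, 3)), ((4, 5), (5, 6))]

# With zip: iterates position by position across both lists
with_zip = [(a, b) for a, b in zip(newl1, newl2)]
print(with_zip)  # [((1, 2), (4, 5)), ((2, 3), (5, 6))]
```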

Is there an efficient way to find the shared value of two list elements, provided each has to contain another specified value?

This problem is a bit difficult to explain concisely in a single question line, so I'll start by giving my code; the printed value is the correct result:
valin = 4
valout = 3
gdict = {
(0,3): 0,
(1,3): 1,
(2,3): 2,
(0,4): 3,
(4,3): 4,
(0,5): 5,
(5,4): 6,
(4,6): 7,
(6,3): 8,
}
keys = list(gdict)
nin = [x for x in keys if x[0]==valin]
nout = [x for x in keys if x[1]==valout]
shared_val_from_vals = [x[1] for x in nin for y in nout if x[1]==y[0]][0]
print(shared_val_from_vals)
Output:
6
I have two values, valin and valout. I'm looking for one key whose 0th element equals valin and another key whose 1st element equals valout, where the 1st element of the first key equals the 0th element of the second key.
I will then use this shared value in another part of my code.
Although my code gets the job done, it will be used with large dictionaries, so I would like to optimize these operations if I can. Is there a more Pythonic, or generally more concise, way to get this result?

Loop over the dictionary keys once, collecting into two sets the tuple elements that match each criterion. Then intersect the two sets.
set_in = set()   # 0th elements of keys whose 1st element is valout
set_out = set()  # 1st elements of keys whose 0th element is valin
for in_var, out_var in gdict:
    if in_var == valin:
        set_out.add(out_var)
    if out_var == valout:
        set_in.add(in_var)
shared_val = set_in.intersection(set_out).pop()
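If many (valin, valout) queries will be run against the same large dictionary (an assumption about the workload, not something the question states), the keys can be indexed once so that each lookup becomes a single set intersection instead of a scan. A sketch; the names succ, pred and shared_values are hypothetical:

```python
from collections import defaultdict

gdict = {
    (0, 3): 0, (1, 3): 1, (2, 3): 2, (0, 4): 3, (4, 3): 4,
    (0, 5): 5, (5, 4): 6, (4, 6): 7, (6, 3): 8,
}

# One pass over the keys builds both indexes
succ = defaultdict(set)  # 0th element -> set of 1st elements
pred = defaultdict(set)  # 1st element -> set of 0th elements
for first, second in gdict:
    succ[first].add(second)
    pred[second].add(first)

def shared_values(valin, valout):
    """Values v such that (valin, v) and (v, valout) are both keys."""
    # .get avoids inserting empty sets into the defaultdicts for unknown values
    return succ.get(valin, set()) & pred.get(valout, set())

print(shared_values(4, 3))  # {6}
```

It returns a set rather than a single value, so ambiguous cases with several shared values remain visible to the caller.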

You can compact it a little by filtering the keys in a single comprehension:
nin_out = [x for x in keys if x[0]==valin or x[1]==valout]
shared_val_from_vals = [x[1] for x in nin_out for y in nin_out if x[1]==y[0]][0]

How can I figure out which arbitrary number occurs twice in a list of integers from input? (Python)

Say I'm receiving a list of arbitrary numbers from input, like
[1,2,3,4,5,6,7,8,8,9,10]
My code doesn't know what these numbers will be before it receives the list, and I want it to automatically return the number that appears twice. How do I go about doing that?
Thank you.

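You could do this (using the name numbers so the built-in input() isn't shadowed, and working on a copy, because removing items from a list while iterating over it skips elements):
numbers = [1,2,3,4,5,6,7,8,8,9,10]
remaining = list(numbers)  # copy that we can safely remove from
seen = []
for i in numbers:
    if i not in seen:
        seen.append(i)
        remaining.remove(i)  # remove one occurrence by value (pop() takes an index)
print(remaining)  # [8]
Now remaining holds only the extra occurrences of the numbers that appeared in the list more than once.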

You can use collections.Counter, which is available in both Python 2 (2.7+) and Python 3:
from collections import Counter
lst=[1,2,3,4,5,6,7,8,8,9,10]
items=[k for k,v in Counter(lst).items() if v==2]
print(items)

Hope this helps.
input = [1,2,3,4,5,6,7,8,8,9,10]
unique = set(input)
twice = []
for item in unique:
    if input.count(item) == 2:
        twice.append(item)

I've created something monstrous that does it in one line, I guess because my brain likes to think when it's time for bed.
This will return a list of all duplicate values given a list of integers.
dupes = list(set(map(lambda x: x if inputList.count(x) >= 2 else None, inputList))-set([None]))
How does it work? The map() function applies a function to every value of a list; in your case, the input list with possible duplicates is called "inputList". The function here is a lambda that returns the integer being iterated over IF inputList.count() reports that the value occurs two or more times; otherwise, since the value isn't a duplicate, it returns None. Mapping this lambda over the list gives back an iterable containing a bunch of None values plus the integers detected as duplicates (once per occurrence). We then use set to de-duplicate it. Next, we subtract a set containing only None, which strips the None value from our set. Finally, we convert the result to a list called "dupes" for nice and easy use.
Example usage...
inputList = [1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 1001, 1002, 1002, 99999, 100000, 1000001, 1000001]
dupes = list(set(map(lambda x: x if inputList.count(x) >= 2 else None, inputList))-set([None]))
print(dupes)
[1000001, 1002, 4, 6]
I'll let someone else elaborate on potential scope concerns..... or other concerns......
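The same result can be written without the None placeholder by filtering inside a set comprehension. It still calls .count() once per element, so it remains quadratic; the Counter approach above is linear. A sketch:

```python
inputList = [1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 1001, 1002, 1002, 99999, 100000, 1000001, 1000001]

# Keep each value whose count is at least 2; the set removes repeated values
dupes = list({x for x in inputList if inputList.count(x) >= 2})
print(sorted(dupes))  # [4, 6, 1002, 1000001]
```

Set iteration order is not guaranteed to match the input order, so sort the result if a stable order matters.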

This will create a list of the numbers that are duplicated.
x = [1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10]
s = {}
duplicates = []
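for n in x:
    try:
        if s[n]:  # seen exactly once so far: record it as a duplicate
            duplicates.append(n)
            s[n] = False  # already recorded; don't add it again
    except KeyError:
        s[n] = True  # first time we see n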
print(duplicates)
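This works for any hashable values, including 0, because the dictionary stores True/False flags rather than testing the numbers themselves; each duplicated number is reported once.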

Can I use python slicing to access one "column" of a nested tuple?

I have a nested tuple that is basically a 2D table (returned from a MySQL query). Can I use slicing to get a list or tuple of one "column" of the table?
For example:
t = ((1,2,3),(3,4,5),(1,4,5),(9,8,7))
x = 6
How do I efficiently check whether x appears in the 3rd position of any of the tuples?
All the examples of slicing I can find only operate within a single tuple. I don't want to slice a "row" out of t. I want to slice it the other way -- vertically.

Your best bet here is to use a generator expression with the any() function:
if any(row[2] == x for row in t):
    pass  # x appears in the third column of at least one tuple; do something here
As far as getting just a column, here are a couple of options:
Using zip() (in Python 3, wrap it in list() first, because zip returns an iterator that can't be indexed):
>>> list(zip(*t))[2]
(3, 5, 5, 7)
Using a list comprehension:
>>> [row[2] for row in t]
[3, 5, 5, 7]
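Why zip(*t) gives columns: the * unpacks the rows as separate arguments, so zip pairs up the i-th element of every row, which transposes the table. A sketch for Python 3 that combines it with the membership check from the question:

```python
t = ((1, 2, 3), (3, 4, 5), (1, 4, 5), (9, 8, 7))
x = 6

# zip(*t) is zip((1, 2, 3), (3, 4, 5), ...), yielding one tuple per column
columns = list(zip(*t))
third_column = columns[2]
print(third_column)  # (3, 5, 5, 7)

# Membership test on the extracted column
print(x in third_column)  # False
```

This builds every column, so for a single membership test the any() generator expression is cheaper; zip(*t) pays off when several columns are needed.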

I'll chime in with a NumPy solution:
import numpy
t = ((1,2,3),(3,4,5),(1,4,5),(9,8,7))
x = 6
col_id = 2
a = numpy.array(t)
print(a[a[:, col_id] == x])  # rows whose column col_id equals x (empty here)

Getting one value from a tuple

Is there a way to get one value from a tuple in Python using expressions?
def tup():
    return (3, "hello")
i = 5 + tup()  # I want to add just the three
I know I can do this:
(j, _) = tup()
i = 5 + j
But that would add a few dozen lines to my function, doubling its length.

You can write
i = 5 + tup()[0]
Tuples can be indexed just like lists.
The main difference between tuples and lists is that tuples are immutable - you can't set the elements of a tuple to different values, or add or remove elements like you can from a list. But other than that, in most situations, they work pretty much the same.

For anyone looking for an answer in the future, here is a clearer answer to the question.
# for making a tuple
my_tuple = (89, 32)
my_tuple_with_more_values = (1, 2, 3, 4, 5, 6)
# to concatenate tuples
another_tuple = my_tuple + my_tuple_with_more_values
print(another_tuple)
# (89, 32, 1, 2, 3, 4, 5, 6)
# getting a value from a tuple is similar to a list
first_val = my_tuple[0]
second_val = my_tuple[1]
# if you have a function called my_tuple_fun that returns a tuple,
# you might want to do this
my_tuple_fun()[0]
my_tuple_fun()[1]
# or this
v1, v2 = my_tuple_fun()
Hope this clears things up further for those that need it.

General
Single elements of a tuple a can be accessed by index, array-style,
via a[0], a[1], ..., depending on the number of elements in the tuple.
Example
If your tuple is a=(3,"a")
a[0] yields 3,
a[1] yields "a"
Concrete answer to question
def tup():
    return (3, "hello")
tup() returns a 2-tuple.
In order to "solve"
i = 5 + tup() # I want to add just the three
you select the 3 by:
tup()[0] # first element
so all together:
i = 5 + tup()[0]
Alternatives
Use namedtuple, which allows you to access tuple elements by name (as well as by index). Details are at https://docs.python.org/3/library/collections.html#collections.namedtuple
>>> import collections
>>> MyTuple=collections.namedtuple("MyTuple", "mynumber, mystring")
>>> m = MyTuple(3, "hello")
>>> m[0]
3
>>> m.mynumber
3
>>> m[1]
'hello'
>>> m.mystring
'hello'
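On Python 3.6+, typing.NamedTuple offers a class-based way to declare the same kind of tuple with named fields and type annotations. A sketch mirroring the MyTuple example and the tup() function from the question:

```python
from typing import NamedTuple

class MyTuple(NamedTuple):
    mynumber: int
    mystring: str

def tup() -> MyTuple:
    return MyTuple(3, "hello")

i = 5 + tup().mynumber  # access by name...
j = 5 + tup()[0]        # ...or by index, exactly like a plain tuple
print(i, j)  # 8 8
```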
