Remove duplicates from user input - python

I want to ignore any duplicate entries in the user's input. I have the code below:
def pITEMName():
    global ITEMList,fITEMList
    pITEMList = []
    fITEMList = []
    ITEMList = str(raw_input('Enter pipe separated list of ITEMS : ')).upper().strip()
    items = ITEMList.split("|")
    count = len(items)
    print 'Total Distint ITEM Count : ', count
    pipelst = [i.replace('-mc','').replace('-MC','').replace('$','').replace('^','') for i in ITEMList.split('|')]
    filepath = '/location/data.txt'
    f = open(filepath, 'r')
    for lns in f:
        split_pipe = lns.split(':', 1)
        if split_pipe[0] in pipelst:
            index = pipelst.index(split_pipe[0])
            pITEMList=split_pipe[0]+"|"
            fITEMList.append(pITEMList)
            del pipelst[index]
    for lns in pipelst:
        print bcolors.red + lns,' is wrong ITEM Name' + bcolors.ENDC
    f.close()
When I execute the above code, it prompts me for input:
Enter pipe separated list of items :
If I provide the input as:
Enter pipe separated list of items : AAA|IFA|AAA
then after pressing Enter I get this result:
Enter pipe separated list of Items : AAA|IFA|AAA
Total Distint Item Count : 3
AAA is wrong Item Name
Items Belonging to other Centers :
Other Centers :
Item Count From Other Center = 0
Items Belonging to Current Centers :
Active Items in US1:
^IFA$
Active Items in US2 :
^AAA$|^AAA$
Ignored Item Count From Current Center = 0
You Have Entered ItemList belonging to this Center as:
^IFA$|^AAA$|^AAA$
Active Item Count : 3
Do You Want To Continue [YES|Y|NO|N] :
In the result above, you can see that because I entered AAA twice, the second entry is counted as a wrong item. I want duplicate entries to be ignored instead. The comparison should also be case-insensitive: if I give AAA|aaa|ifa, one 'aaa' should be ignored.
Please help me implement this.

First, you're doing ITEMList.split("|") several times. You should just use your already calculated items.
Second, you probably want:
items = set(ITEMList.lower().split("|"))
This way you get a set with unique, all lowercase elements.
I assume this doesn't matter since you can discard either uppercase or lowercase.

If item order is not important, then a set will do this very well.
items = set(ITEMList.split("|"))

Lots of great answers here; throwing my hat into the ring as well. One straightforward way to do this:
items = list(set(ITEMList.split("|")))
items.sort()
This preserves your items object as a list and orders it (which is something you may or may not prefer in this case).
If you decide later that you want to return an element of your items list in your code, you will be able to do it by referring to the list index (this functionality doesn't exist with sets).
If you want to preserve the value of the variable count, you could implement the code as:
items = ITEMList.split("|")
count = len(items)
items = list(set(ITEMList.split("|")))
items.sort()
You will also want to adjust this line:
pipelst = [i.replace('-mc','').replace('-MC','').replace('$','').replace('^','') for i in ITEMList.split('|')]
to this:
pipelst = [i.replace('-mc','').replace('-MC','').replace('$','').replace('^','') for i in items]

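If order is important, this is one way to do it (a Counter keeps insertion order on Python 3.7+; on older versions the order is arbitrary):
import collections
my_list = "^IFA$|^AAA$|^AAA$"
"|".join(collections.Counter(my_list.upper().split("|")).keys())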
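The answers above can be combined into a minimal sketch of a case-insensitive, order-preserving de-duplication. It assumes Python 3.7+ (where plain dicts keep insertion order); the helper name `unique_items` is hypothetical and not part of the question's code.

```python
def unique_items(raw, sep="|"):
    """Return the distinct entries of a separated string, ignoring case.

    Entries are normalised to upper case and empty entries are dropped.
    """
    cleaned = (part.strip().upper() for part in raw.split(sep))
    # dict.fromkeys keeps only the first occurrence of each key, in input order
    # (guaranteed for plain dicts from Python 3.7 on).
    return list(dict.fromkeys(part for part in cleaned if part))


items = unique_items("AAA|aaa|ifa")
print(items)       # ['AAA', 'IFA']
print(len(items))  # 2
```

Computing `count` from the returned list, rather than from the raw split, makes the reported total match the number of distinct items.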

Related

How can I add two or more elements to a list from a single input in Python?

What my script is doing now is adding elements to a list. For example, if the user types "JO", I will add "John" to the list. What I want to do now is that, if the user types "2 JO", I add two elements to the list: "John" and "John".
This is what the database looks like now:
[Image: Sample database copy.png]
This is the code now:
import pandas
data = pandas.read_excel("Sample database copy.xlsx")
name = dict(zip(data["Abbreviation"],data["Name"]))
list1 = []
incoming_msg = input('Please type what you want to add: ')
list1.append(name[incoming_msg])
I need to do it all from the same input, I cannot ask separately for quantity and abbreviation. I wanted to know if there is any library that can do this somehow easily because I am a beginner coder. If there is no library but you have any idea how I could solve it, it would be awesome as well.
Thank you so much in advance!
You can use str.split() to split the input on whitespace into a list, then use the first element to repeat a one-element list holding the dictionary value, and add that to the result list. See the code:
name = dict(zip(data["Abbreviation"],data["Name"]))
list1 = []
incoming_msg = input('Please type what you want to add: ')
incoming_msg = incoming_msg.split() # split the string by space
if len(incoming_msg) == 2: # if there are two elements in the list (number and name)
    list1 += [name[incoming_msg[1]]] * int(incoming_msg[0])
else:
    list1.append(name[incoming_msg[0]])
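The same idea can be wrapped in a small function that also rejects malformed input. This is only a sketch: `expand_entry` and the sample `names` dictionary are hypothetical stand-ins for the spreadsheet lookup, not part of the original code.

```python
def expand_entry(message, names):
    """Turn 'JO' or '2 JO' into a list of full names.

    `names` maps abbreviations to full names.
    """
    parts = message.split()
    if len(parts) == 2 and parts[0].isdigit():
        quantity, abbreviation = int(parts[0]), parts[1]
    elif len(parts) == 1:
        quantity, abbreviation = 1, parts[0]
    else:
        raise ValueError("expected 'ABBR' or 'N ABBR', got %r" % message)
    # Repeating a one-element list yields `quantity` copies of the name.
    return [names[abbreviation]] * quantity


names = {"JO": "John", "MA": "Mary"}  # hypothetical sample data
list1 = []
list1 += expand_entry("2 JO", names)
list1 += expand_entry("MA", names)
print(list1)  # ['John', 'John', 'Mary']
```

An unknown abbreviation still raises KeyError here, which may or may not be what you want for real user input.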

Finding average value of list elements, if a condition is met using Python?

I have a list that has the following format:
mylist = ["Joe, 100%", "Joe, 80%", "Joe, 90%", "Sally, 95%", "Sally, 80%", "Jimmy, 90%", ...]
What I am trying to do is, first count the number of times each name appears. If a name appears 2 or more times, append that name along with the average percent. So, I'm trying to get to the following output:
newlist = ["Joe, 90%", "Sally, 87.5%"]
To try this, I did mylist.split(", ") to get the names only, and used Counter() to find how many times the name appears. Then, I used a simple if >= 2 statement to append the name to newlist if the name appears 2 or more times.
However, despite trying many different things, I wasn't able to figure out how to get the percentages back with the name in the final list. I'm also unsure how to word my question for a search engine, so I wasn't able to find any help. Does anyone know how to do this? If this question is a duplicate, please let me know (and provide a link so I can learn), and I can delete this question. Thanks!
You can try this:
from collections import defaultdict
counts = defaultdict(int)
percents = defaultdict(int)
for item in mylist:
    name, percent = item.split(',')
    percent = int(percent.lstrip().rstrip('%'))
    percents[name]+=percent
    counts[name]+=1
result = []
for k,v in counts.items():
    if v > 1:
        result.append(f"{k}, {percents[k]/v}%")
print(result)
Output
['Joe, 90.0%', 'Sally, 87.5%']
I would recommend you create a dictionary of the scores, where the key would be the name and the value would be a list of their scores. This snippet shows how you can achieve that:
mydict = {}
for item in mylist:
    name, score = item.split(", ")  # splits each item into a name and a score
    score = float(score.replace("%", ""))  # converts the string score to a float
    if name in mydict:  # checks if the name already exists in the dictionary
        mydict[name].append(score)
    else:
        mydict[name] = [score]
This leaves you with a dictionary of scores organized by name. Now all you have to do is average the scores in the dictionary:
newlist = []
for name in mydict:
    if len(mydict[name]) >= 2:
        average = str(sum(mydict[name]) / len(mydict[name])) + "%"
        straverage = name + ", " + average
        newlist.append(straverage)
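For reference, here is a self-contained sketch of the dictionary-of-lists approach that also formats whole-number averages without a trailing ".0", matching the output asked for in the question. The function name `average_repeated` and the `min_count` parameter are hypothetical; the sketch assumes Python 3.7+ so that names come out in first-seen order.

```python
def average_repeated(entries, min_count=2):
    """Average the percentages of names that appear at least `min_count` times."""
    scores = {}
    for entry in entries:
        name, percent = entry.split(", ")
        scores.setdefault(name, []).append(float(percent.rstrip("%")))

    result = []
    for name, values in scores.items():
        if len(values) >= min_count:
            average = sum(values) / len(values)
            # ':g' drops a trailing '.0' (90.0 -> '90') and keeps at most
            # six significant digits.
            result.append("{}, {:g}%".format(name, average))
    return result


mylist = ["Joe, 100%", "Joe, 80%", "Joe, 90%",
          "Sally, 95%", "Sally, 80%", "Jimmy, 90%"]
print(average_repeated(mylist))  # ['Joe, 90%', 'Sally, 87.5%']
```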

Comparing the elements of a list with themselves

I have lists of items:
['MRS_103_005_010_BG_001_v001',
'MRS_103_005_010_BG_001_v002',
'MRS_103_005_010_FG_001_v001',
'MRS_103_005_010_FG_001_v002',
'MRS_103_005_010_FG_001_v003',
'MRS_103_005_020_BG_001_v001',
'MRS_103_005_020_BG_001_v002',
'MRS_103_005_020_BG_001_v003']
I need to identify the latest version of each item and store it to a new list. Having trouble with my logic.
Based on how this has been built I believe I need to first compare the indices to each other. If I find a match I then check to see which number is greater.
I figured I first needed to do a check to see if the folder names matched between the current index and the next index. I did this by making two variables, 0 and 1, to represent the index so I could do a staggered incremental comparison of the list on itself. If the two indices matched I then needed to check the vXXX number on the end. whichever one was the highest would be appended to the new list.
I suspect that the problem lies in one copy of the list getting to an empty index before the other one does but I'm unsure of how to compensate for that.
Again, I am not a programmer by trade. Any help would be appreciated! Thank you.
# Preparing variables for filtering the folders
versions = foundVerList
verAmountTotal = len(foundVerList)
verIndex = 0
verNextIndex = 1
highestVerCount = 1
filteredVersions = []
# Filtering, this will find the latest version of each folder and store to a list
while verIndex < verAmountTotal:
    try:
        nextVer = (versions[verIndex])
        nextVerCompare = (versions[verNextIndex])
    except IndexError:
        verNextIndex -= 1
    if nextVer[0:24] == nextVerCompare[0:24]:
        if nextVer[-3:] < nextVerCompare [-3:]:
            filteredVersions.append(nextVerCompare)
        else:
            filteredVersions.append(nextVer)
    verIndex += 1
    verNextIndex += 1
My expected output is:
print filteredVersions
['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v003']
['MRS_103_005_020_BG_001_v003']
The actual output is:
print filteredVersions
['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v002',
'MRS_103_005_010_FG_001_v003']
['MRS_103_005_020_BG_001_v002', 'MRS_103_005_020_BG_001_v003']
During the with loop I am using os.list on each folder referenced via verIndex. I believe the problem is that a list is being generated for every folder that is searched but I want all the searches to be combined in a single list which will THEN go through the groupby and sorted actions.
Seems like a case for itertools.groupby:
from itertools import groupby
grouped = groupby(data, key=lambda version: version.rsplit('_', 1)[0])
result = [sorted(group, reverse=True)[0] for key, group in grouped]
print(result)
Output:
['MRS_103_005_010_BG_001_v002',
'MRS_103_005_010_FG_001_v003',
'MRS_103_005_020_BG_001_v003']
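This groups the entries by everything before the last underscore, which I understand to be the "item code". Note that groupby only merges adjacent entries, so the list must already be sorted by item code, as it is here.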
Then, it sorts each group in reverse order. The elements of each group differ only by the version, so the entry with the highest version number will be first.
Lastly, it extracts the first entry from each group, and puts it back into a result list.
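If the input is not guaranteed to be sorted, or versions may not be zero-padded to the same width, a slightly more defensive sketch sorts by item code first and compares versions as integers. The helper names `latest_versions`, `item_code` and `version_number` are hypothetical, and the sample data is shuffled on purpose.

```python
from itertools import groupby


def item_code(name):
    """Everything before the last underscore, e.g. 'MRS_103_005_010_BG_001'."""
    return name.rsplit("_", 1)[0]


def version_number(name):
    """The numeric part after the final '_v', e.g. 'v010' -> 10."""
    return int(name.rsplit("_v", 1)[1])


def latest_versions(names):
    # groupby only groups adjacent items, so sort by the grouping key first.
    ordered = sorted(names, key=item_code)
    return [max(group, key=version_number)
            for _, group in groupby(ordered, key=item_code)]


data = [
    "MRS_103_005_010_FG_001_v003",
    "MRS_103_005_010_BG_001_v001",
    "MRS_103_005_020_BG_001_v010",
    "MRS_103_005_010_BG_001_v002",
    "MRS_103_005_020_BG_001_v009",
    "MRS_103_005_010_FG_001_v001",
]
print(latest_versions(data))
```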
Try this:
text = """MRS_103_005_010_BG_001_v001
MRS_103_005_010_BG_001_v002
MRS_103_005_010_FG_001_v001
MRS_103_005_010_FG_001_v002
MRS_103_005_010_FG_001_v003
MRS_103_005_020_BG_001_v001
MRS_103_005_020_BG_001_v002
MRS_103_005_020_BG_001_v003
"""
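result = {}
versions = text.splitlines()
for item in versions:
    v = item.split('_')
    num = int(v.pop()[1:])  # numeric version, e.g. 'v002' -> 2
    name = item[:-3]  # everything up to and including the 'v'
    if result.get(name, 0) < num:
        result[name] = num
# zero-pad the version back to three digits so the original names are restored
filteredVersions = ['{}{:03d}'.format(k, v) for k, v in result.items()]
print(filteredVersions)
Output:
['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v003', 'MRS_103_005_020_BG_001_v003']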

Python Aggregation without PANDAS

I have a sorted, nested list. Each element has 3 sub-elements: drug name, doctor id, and amount. A given drug name repeats, with different doctor ids and amounts; see the sample list below.
For each drug name, I need the number of UNIQUE doctor ids and the sum of the dollar amounts for that drug. For example, for the list snippet below:
[
['CIPROFLOXACIN HCL', 1801093968, 61.49],
['CIPROFLOXACIN HCL', 1588763981, 445.23],
['HYDROCODONE-ACETAMINOPHEN', 1801093968, 251.52],
['HYDROCODONE-ACETAMINOPHEN', 1588763981, 263.16],
['HYDROXYZINE HCL', 1952310666, 945.5],
['IBUPROFEN', 1801093968, 67.06],
['INVEGA SUSTENNA', 1952310666, 75345.68]
]
The desired output is as below.
[
['CIPROFLOXACIN HCL', 2, 506.72],
['HYDROCODONE-ACETAMINOPHEN', 2, 514.68],
['HYDROXYZINE HCL', 1, 945.5],
['IBUPROFEN', 1, 67.06],
['INVEGA SUSTENNA', 1, 75345.68]
]
In a database world this is the easiest thing with a simple GROUP BY on drugname. In Python, I am not allowed to use PANDAS, NumPy etc. Just the basic building blocks of Python. I tried the below code but I am unable to reset the count variable to count doctor ids and amounts. This commented code is one of several attempts. Not sure if I need to use a nested for loop or a for loop-while loop combo.
All help is appreciated!
aggr_list = []
temp_drug_name = ''
doc_count = 0
amount = 0
for list_element in sorted_new_list:
    temp_drug_name = list_element[0]
    if temp_drug_name == list_element[0]:
        amount += float(amount)
        doc_count += 1
aggr_list.append([temp_drug_name, doc_count, amount])
print(aggr_list)
Since the list is already sorted, you can simply iterate through it (it is named l in the example below) and keep track of the name from the previous iteration. If the name of the current iteration differs from the last, append a new entry to the output. Use a set to keep track of the doctor IDs already seen for the current drug, and increment the second item of the last output entry by 1 only if the doctor ID has not been seen yet. Finally, increment the third item of the last output entry by the amount of the current iteration:
output = []
last = None
for name, id, amount in l:
    if name != last:
        output.append([name, 0, 0])
        last = name
        ids = set()
    if id not in ids:
        output[-1][1] += 1
        ids.add(id)
    output[-1][2] += amount
output becomes:
[['CIPROFLOXACIN HCL', 2, 506.72],
['HYDROCODONE-ACETAMINOPHEN', 2, 514.6800000000001],
['HYDROXYZINE HCL', 1, 945.5],
['IBUPROFEN', 1, 67.06],
['INVEGA SUSTENNA', 1, 75345.68]]
Note that decimal floating points are approximated in the binary system that the computer uses (please read Is floating point math broken?), so some minor errors are inevitable as seen in the sum of the second entry above.
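If exact cents matter, two common workarounds are rounding the float total for display or summing with the standard library's decimal module. The snippet below is a small sketch using the two HYDROCODONE-ACETAMINOPHEN amounts from the question; keeping the amounts as strings is an assumption made here so that Decimal never sees a binary float.

```python
from decimal import Decimal

# Amounts kept as strings so Decimal parses the exact decimal digits.
amounts = ["251.52", "263.16"]

float_total = sum(float(a) for a in amounts)
decimal_total = sum(Decimal(a) for a in amounts)

print(float_total)            # may show a tiny binary rounding error
print(round(float_total, 2))  # 514.68
print(decimal_total)          # 514.68
```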
Here is a solution with a focus on readability. It doesn't rely on the entries in your original list being sorted by drug name.
It does one pass over all the entries of your data, then a pass over the unique drugs.
To do only a single pass over all the entries of your sorted data, see @blhsing's solution.
from collections import defaultdict, namedtuple
Entry = namedtuple('Entry',['doctors', 'prices'])
processed_data = defaultdict(lambda: Entry(doctors=set(), prices=[]))
for entry in data:
    drug_name, doctor_id, price = entry
    processed_data[drug_name].doctors.add(doctor_id)
    processed_data[drug_name].prices.append(price)
stat_list = [[drug_name, len(entry.doctors), sum(entry.prices)] for drug_name, entry in processed_data.items()]
Without Pandas or defaultdict:
d = {}
for row in l:
    if row[0] in d:
        d[row[0]].append(row[1])
        d[row[0]].append(row[2])
    else:
        d[row[0]] = [row[1]]
        d[row[0]].append(row[2])
# even positions hold doctor ids, odd positions hold amounts
result = [[key, len(set(val[0::2])), sum(val[1::2])] for key, val in d.items()]
Reusable solution, meant for those who arrive here through Google:
def group_by(rows, key):
    m = {}
    for row in rows:
        k = key(row)
        try:
            m[k].append(row)
        except KeyError:
            m[k] = [row]
    return m.values()
grouped_by_drug = group_by(data, key=lambda row: row[0])
result = [
    (
        drug_rows[0][0],
        len({row[1] for row in drug_rows}),  # unique doctor ids
        sum(row[2] for row in drug_rows)
    )
    for drug_rows in grouped_by_drug
]
You can also use defaultdict in this implementation, which for my use case is slightly faster.
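A sketch of what the defaultdict variant might look like is shown here. It uses three rows from the question as sample data, counts distinct doctor ids with a set as the question asks, and rounds the sum to two decimals for display; no speed claim is made for it.

```python
from collections import defaultdict


def group_by(rows, key):
    """Group rows into lists that share the same key(row)."""
    groups = defaultdict(list)  # missing keys start as an empty list
    for row in rows:
        groups[key(row)].append(row)
    return list(groups.values())


data = [
    ["CIPROFLOXACIN HCL", 1801093968, 61.49],
    ["CIPROFLOXACIN HCL", 1588763981, 445.23],
    ["IBUPROFEN", 1801093968, 67.06],
]

result = [
    [
        rows[0][0],                             # drug name
        len({row[1] for row in rows}),          # number of distinct doctor ids
        round(sum(row[2] for row in rows), 2),  # total amount, rounded to cents
    ]
    for rows in group_by(data, key=lambda row: row[0])
]
print(result)  # [['CIPROFLOXACIN HCL', 2, 506.72], ['IBUPROFEN', 1, 67.06]]
```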

Python Code line by line meaning

I have some Python code and need a line-by-line explanation of it.
marksheet = []
for i in range(0,int(input())):
    marksheet.append([raw_input(), float(input())])
second_highest = sorted(list(set([marks for name, marks in marksheet])))[1]
print('\n'.join([a for a,b in sorted(marksheet) if b == second_highest]))
I highly recommend going through the Python tutorial.
To help you understand this code, I've added comments. Note that I sort with reverse=True so that index 1 is the second-highest mark; the original ascending sort picks the second-lowest.
# initialising an empty list
marksheet = []
# looping from zero up to a number read from user input (converted to int)
for i in range(0,int(input())):
    # appending [name, mark] to marksheet: a string from raw_input() and a float from input()
    marksheet.append([raw_input(), float(input())])
# [marks for name, marks in marksheet] - get all marks from the list
# set([marks for name, marks in marksheet]) - keep only the unique marks
# list(set([marks for name, marks in marksheet])) - convert it back to a list
# sort the result in descending order with reverse=True and take index 1, which is the second-largest mark
second_highest = sorted(list(set([marks for name, marks in marksheet])),reverse=True)[1]
# sorted(marksheet) orders the entries by name
# the list comprehension keeps the names whose mark equals second_highest
# '\n'.join(...) puts each of those names on its own line for printing
print('\n'.join([a for a,b in sorted(marksheet) if b == second_highest]))
Hope it helps!
Would do this in a comment but I don't have 50 reputation yet:
Keep the sorted call. A set has no guaranteed order, so even if the set of marks happens to come out sorted you should not rely on it, and sorting an already sorted list is cheap anyway.
second_highest = sorted(list(set([marks for name, marks in marksheet])))[1]
Also, if the list contains something like [1,3,2,5,3,2,1], this gives 2 as the result and not 1, since a set removes all duplicates.
If you want to keep duplicates use:
second_highest = sorted([marks for name, marks in marksheet])[1]
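To make the difference between the ascending and descending index concrete, here is a small Python 3 sketch with hard-coded data in place of user input. The names and marks are made up for illustration.

```python
# Hypothetical marksheet; the original reads these pairs from user input.
marksheet = [["Ann", 50.0], ["Bob", 70.0], ["Cid", 70.0],
             ["Dee", 90.0], ["Eve", 60.0]]

# Unique marks in ascending order: [50.0, 60.0, 70.0, 90.0]
distinct = sorted({marks for name, marks in marksheet})

second_lowest = distinct[1]    # what the ascending [1] index selects
second_highest = distinct[-2]  # same as sorted(..., reverse=True)[1]

# Names of everyone holding the second-highest mark, in alphabetical order.
names = [name for name, marks in sorted(marksheet) if marks == second_highest]
print(second_lowest, second_highest)  # 60.0 70.0
print("\n".join(names))
```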
