Forming the sublist using same indices - python

Hi, I am trying to solve a problem where I have to return, as sublists, the indices of the records that belong to the same person. By "same person" I mean records that share the same username, phone or email (any one of them).
I understand that these identities are usually unique, but for the sake of the question let's assume they can repeat.
For example:
data = [("username1","phone_number1", "email1"),
("usernameX","phone_number1", "emailX"),
("usernameZ","phone_numberZ", "email1Z"),
("usernameY","phone_numberY", "emailX"),
("username2","phone_number2", "emailX")]
Expected output:
[[0,1,3,4],[2]]
Explanation: 0 and 1 have the same phone, and 1, 3 and 4 have the same email, so they all fall under one category, while index 2 falls in the other category.
My approach so far:
data = [("username1","phone_number1", "email1"),
("usernameX","phone_number1", "emailX"),
("usernameZ","phone_numberZ", "email1Z"),
("usernameY","phone_numberY", "emailX"),
]
def match(t1,t2):
    if(t1[0] == t2[0] or t1[1] == t2[1] or t1[2] == t2[2]):
        return True
    else:
        return False
# print(match(data[1],data[3]))
together = []
for i in range(len(data)):
    temp = {i}
    for j in range(len(data)):
        if(match(data[i],data[j])):
            temp.add(j)
    together.append(temp)
for i in range(len(data)):
    ans = together[i]
    for j in range(i+1,len(data)):
        if(bool(ans.intersection(together[j]))):
            ans = ans.union(together[j])
    print(ans)
I am not able to reach the desired result.
Any help is appreciated. Thank you.

A first solution is similar to yours with some enhancements:
Using any() for the match, so it doesn't need to know the number of items in the tuples.
Checking whether a user is already identified as part of "together", to skip useless comparisons.
Here it is:
together = set()
for user_idx, user in enumerate(data):
    if user_idx in together:
        continue  # That user is already matched
    # No need to check with previous users
    for other_idx, other in enumerate(data[user_idx + 1 :], user_idx + 1):
        # Match
        if any(val_ref == val_other for val_ref, val_other in zip(user, other)):
            together.update((user_idx, other_idx))
isolated = set(range(len(data))) ^ together
Another solution uses a trick: go through a numpy array to identify isolated users. With numpy it is easy to compare one user to every other user (i.e. to the original array). An isolated user matches only itself on each of its fields, so summing the boolean matches along the fields gives, for an isolated user, exactly the number of fields in the tuple.
import numpy as np

data = np.array(data)
# For each user, match it against the whole matrix; summing the boolean arrays
# counts, per user and field, how many users share that value
matches = sum(user == data for user in data)
# Isolated users only match themselves, hence only have 1s on their row
isolated = set(np.where(np.sum(matches, axis=1) == data.shape[1])[0])
# Together are the other users
together = set(range(len(data))) ^ set(isolated)
Here is the matches array for the example data, for better understanding:
[[1 2 1]
[1 2 3]
[1 1 1]
[1 1 3]
[1 1 3]]
However, it does not use either of the optimisations mentioned before.
Still, numpy is fast, so it should be fine.
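Both answers split users into "together" and "isolated", which matches the expected output only when there is a single group: two unrelated pairs (say 0 and 1, 2 and 3) would still be reported as one "together" set. The skip in the first answer can also miss a link: if user 1 is already in together and is the only match of a later user 4, the loop skips user 1 and user 4 only looks forward, so 4 ends up isolated. To get the actual groups, one option (a swapped-in technique, not what either answer does) is union-find: link each record to the first earlier record sharing a value in the same field, then collect records by their root. `group_same_person` is a hypothetical name; keying on (field position, value) compares phone with phone and email with email, like the `zip` in the first answer.

```python
def group_same_person(records):
    """Group record indices that share a value in the same field (transitively)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    first_seen = {}  # (field position, value) -> first record index having it
    for idx, record in enumerate(records):
        for field_pos, value in enumerate(record):
            key = (field_pos, value)
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


data = [("username1", "phone_number1", "email1"),
        ("usernameX", "phone_number1", "emailX"),
        ("usernameZ", "phone_numberZ", "email1Z"),
        ("usernameY", "phone_numberY", "emailX"),
        ("username2", "phone_number2", "emailX")]
print(group_same_person(data))  # → [[0, 1, 3, 4], [2]]
```

Each record is visited once per field, so the work grows roughly linearly with the data instead of comparing every pair.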

Related

How to create coordinates for nodes of graph

I have a list of strings. Each string holds the navigation steps/actions (separated by ___) of the same procedure, done by a different user. I want to create coordinates for these steps/actions and store them for creating a graph. Each unique step/action will have one coordinate. My idea is to consider the string with the most steps first and assign its steps coordinates ranging from (1,0) to (n,0). For this first string 'y' is 0, meaning all its actions are in one layer. When I check the steps/actions in the second string, I will assign any missing ones (1,1) to (n,1), and so on. Care has to be taken that if the first step/action of one string falls in the middle of another, longer string, its coordinates should come after that point.
This sounds confusing, but in simple terms, I want to create coordinates for the user flow of a website.
Assume this list:
A = ['A___O___B___C___D___E___F___G___H___I___J___K___L___M___N',
'A___O___B___C___E___D___F___G___H___I___J___K___L___M___N',
'A___B___C___D___E___F___G___H___I___J___K___L___M___N',
'A___B___C___E___D___F___G___H___I___J___K___L___M___N',
'A___Q___C___D___E___F___G___H___I___J___K___L___M___N',
'E___P___F___G___H___I___J___K___L___M___N']
I started with the code below, but it is getting complicated. Any help is appreciated.
A1 = [i.split('___') for i in A]
# A1.sort(key=len, reverse=True)
A1 = sorted(A1, reverse=True)
if len(A1)>1:
    Actions = {}
    horizontalVal = {}
    verticalVal = {}
    restActions = []
    for i in A1:
        for j in i[1:]:
            restActions.append(j)
    for i in range (len(A1)):
        if A1[i][0] not in restActions and A1[i][0] not in Actions.keys():
            Actions[A1[i][0]] = [i,0]
            horizontalVal[A1[i][0]] = i
            verticalVal[A1[i][0]] = 0
    unmarkedActions = []
    for i in range(len(sortedLen)):
        currLen = sortedLen[i]
        for j in range(len(A1)):
            if len(A1[j]) == currLen:
                if j == 0:
                    for k in range(len(A1[j])):
                        currK = A1[j][k]
                        if currK not in Actions.keys():
                            Actions[currK] = [k,0]
                            horizontalVal[currK] = k
                            verticalVal[currK] = 0
                else:
                    currHori = []
                    print(A1[j])
                    for k in range(len(A1[j])):
                        currK = A1[j][k]
                        .
                        . to be continued
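A minimal sketch of the layering idea described in the question, not the asker's code, under assumptions the question leaves open: x starts at 1, a new step is placed one position after the step that precedes it in its own flow (so a flow starting in the middle of a longer one lands after that point), and y goes up by one for each flow that introduces at least one new step. `assign_coordinates` is a hypothetical name, and the sketch does not try to avoid two steps in different layers sharing an x value.

```python
def assign_coordinates(paths, sep="___"):
    # Split each flow string into its list of steps
    flows = [p.split(sep) for p in paths]
    # Longest flows first; the sort is stable, so ties keep their input order
    flows.sort(key=len, reverse=True)

    coords = {}  # step -> (x, y)
    layer = 0
    for flow in flows:
        added_new = False
        prev_x = 0  # x of the step just before the current one in this flow
        for step in flow:
            if step in coords:
                prev_x = coords[step][0]
                continue
            # New step: place it right after its predecessor, in this flow's layer
            x = prev_x + 1
            coords[step] = (x, layer)
            prev_x = x
            added_new = True
        if added_new:
            layer += 1  # the next flow that adds steps goes to the next layer
    return coords


coords = assign_coordinates(["A___O___B___C", "A___Q___C", "B___P___C"])
print(coords["Q"], coords["P"])  # → (2, 1) (4, 2)
```

With the list A from the question, the first string fills (1,0) to (15,0), Q lands at (2,1) right after A, and P lands at (7,2) right after E.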

Timeout Issue in Looping Through Hackerrank Example

Can anyone explain why my code for a HackerRank example is timing out? I'm new to the whole idea of code efficiency in terms of processing time. The code seems to work on small inputs, but once I start testing cases with large datasets it times out. I've provided a brief explanation of the function and its purpose for context. If you notice functions I'm using that might consume a large amount of runtime, any tips would be great.
Complete the migratoryBirds function below.
Params: arr: an array of tallies of species of birds sighted by index.
For example, arr = [Type1 = 1, Type2 = 4, Type3 = 4, Type4 = 4, Type5 = 5, Type6 = 3].
Return the lowest type of the mode of sightings. In this case 4 sightings is the
mode, and Type2 is the lowest type that has the mode, so return the integer 2.
def migratoryBirds(arr):
    # list of counts of occurrences of bird types with the same
    # number of sightings
    bird_count_mode = []
    for i in range(1, len(arr) + 1):
        occurr_count = arr.count(i)
        bird_count_mode.append(occurr_count)
    most_common_count = max(bird_count_mode)
    common_count_index = bird_count_mode.index(most_common_count) + 1
    # Find the first occurrence of that common_count_index in arr
    # lowest_type_bird = arr.index(common_count_index) + 1
    # Expect Input: [1,4,4,4,5,3]
    # Expect Output: [1 0 1 3 1 0], 3, 4
    return bird_count_mode, most_common_count, common_count_index
P.S. Thank you for the edit, Chris Charley. I had just tried to edit it at the same time.
Use collections.Counter() to create a dictionary that maps species to their counts. Get the maximum count from it, then get all the species with that count. Then search the list for the first element belonging to one of those species.
import collections
def migratoryBirds(arr):
    species_counts = collections.Counter(arr)
    most_common_count = max(species_counts.values())
    most_common_species = {species for species, count in species_counts.items() if count == most_common_count}
    # Return the (0-based) position of the first sighting of a most common species
    for i, species in enumerate(arr):
        if species in most_common_species:
            return i
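The question's code times out mainly because `arr.count(i)` walks the whole list and is called `len(arr)` times, so the loop does roughly n × n work. If the task is the usual HackerRank statement, where `arr` lists the type ID of each sighted bird and the answer is the smallest ID among the most frequently sighted types (an assumption, since the question describes `arr` differently), a single `Counter` pass is enough. A minimal sketch; `migratory_birds_lowest_type` is an illustrative name:

```python
from collections import Counter


def migratory_birds_lowest_type(arr):
    # One O(n) pass to count how often each type ID was sighted
    counts = Counter(arr)
    top = max(counts.values())
    # Smallest type ID among those sighted the maximum number of times
    return min(type_id for type_id, count in counts.items() if count == top)


print(migratory_birds_lowest_type([1, 4, 4, 4, 5, 3]))  # → 4
```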

Python iteration problem with an exercise

The code:
import pandas as pd
import numpy as np
import csv
data = pd.read_csv("/content/NYC_temperature.csv", header=None, names=['temperatures'])
np.cumsum(data['temperatures'])
printcounter = 0
list_30 = [15.22]  # first temperature. I could have also added it by doing list_30.append(i)[0], since it's every 30 values, but that doesn't append the first one :)
list_2 = []  # this is for the values of the subtraction (for the second iteration)
for i in data['temperatures']:
    if (printcounter == 30):
        list_30.append(i)
        printcounter = 0
    printcounter += 1
for x in list_30:
    substract = list_30[x] - list_30[x+1]
    list_2.append(substraction)
print(max(list_2))
Hey guys! I'm really having trouble with this part:
for x in list_30:
    substract = list_30[x] - list_30[x+1]
I'm trying to iterate over the elements and subtract the next element (x+1) from element x, but the following error pops up: TypeError: 'float' object is not iterable. I have also tried to iterate using x instead of list_30[x], but then when I use next(x) I get another error.
for x in list_30: iterates over list_30 and assigns to x the value of each item in the list, not its index.
In your case you would rather loop over your list with indexes:
index = 0
while index < len(list_30):
    substract = list_30[index] - list_30[index + 1]
    index += 1  # advance, otherwise the loop never ends
edit: you will still have a problem when you reach the last element of list_30, as there is no list_30[last_index + 1],
so you should probably stop before the end with while index < len(list_30) - 1:
In case you want both the index and the value, you can do:
for i, v in enumerate(list_30[:-1]):  # stop one early so i + 1 stays in range
    substract = v - list_30[i + 1]
but the first one looks cleaner in my opinion.
If you're trying to find the difference between two adjacent elements of an array (like differentiating it), you should probably use the zip function:
inp = [1, 2, 3, 4, 5]
delta = []
for x0, x1 in zip(inp, inp[1:]):
    delta.append(x1 - x0)
print(delta)
Note that the list of deltas will be one element shorter than the input.
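Putting the answers together, here is a minimal sketch that takes every 30th reading and returns the largest value of x[i] - x[i + 1] between consecutive picks, the same direction as the question's subtraction. It assumes "every 30 values" means indices 0, 30, 60, and so on, and uses a plain list instead of the CSV (in the question that list could come from `data['temperatures'].tolist()`). Note that the question's counter resets to 0 and then immediately increments, so after index 30 it picks every 29th value; slicing with a step sidesteps that. `max_drop_every_n` is a hypothetical name.

```python
def max_drop_every_n(temperatures, step=30):
    # Every step-th reading, starting with the first (indices 0, step, 2*step, ...)
    sampled = temperatures[::step]
    if len(sampled) < 2:
        raise ValueError("need at least two sampled readings to compare")
    # Pair each sampled value with the next one and compute x[i] - x[i + 1]
    diffs = [a - b for a, b in zip(sampled, sampled[1:])]
    return max(diffs)


# Made-up readings; with step=2 the sampled values are 20, 15 and 11
print(max_drop_every_n([20, 18, 15, 16, 11], step=2))  # → 5
```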

More on dynamic programming

Two weeks ago I posted THIS question here about dynamic programming. User Andrea Corbellini answered precisely what I wanted, but I wanted to take the problem one step further.
This is my function:
def Opt(n):
    if len(n) == 1:
        return 0
    else:
        return sum(n) + min(Opt(n[:i]) + Opt(n[i:])
                            for i in range(1, len(n)))
Let's say you call:
Opt( [ 1,2,3,4,5 ] )
The previous question solved the problem of computing the optimal value. Now, instead of computing the optimal value 33 for the above example, I want to print the way we got to the optimal solution (the path to the optimal solution). That is, I want to print, as a list, the indices where the list got cut/divided to reach the optimal solution. So the answer to the above example would be:
[ 3,2,1,4 ] (cut the pole/list at the third marker/index, then after the second index, then after the first index, and lastly at the fourth index).
The answer should be in the form of a list: its first element is the index where the first cut/division of the list happens on the optimal path, the second element is the second cut/division, and so on.
There can also be a different solution:
[ 3,4,2,1 ]
Both would still lead to the correct output, so it doesn't matter which one is printed. But I have no idea how to trace and print the optimal path taken by the dynamic programming solution.
By the way, I figured out a non-recursive solution to the problem from my previous question, but I still can't figure out how to print the path for the optimal solution. Here is the non-recursive code for the previous question; it might help with the current problem.
def Opt(numbers):
    prefix = [0]
    for i in range(1,len(numbers)+1):
        prefix.append(prefix[i-1]+numbers[i-1])
    results = [[]]
    for i in range(0,len(numbers)):
        results[0].append(0)
    for i in range(1,len(numbers)):
        results.append([])
        for j in range(0,len(numbers)):
            results[i].append([])
    for i in range(2,len(numbers)+1): # for all lengths (off by 1)
        for j in range(0,len(numbers)-i+1): # for all beginnings
            results[i-1][j] = results[0][j]+results[i-2][j+1]+prefix[j+i]-prefix[j]
            for k in range(1,i-1): # for all splits
                if results[k][j]+results[i-2-k][j+k+1]+prefix[j+i]-prefix[j] < results[i-1][j]:
                    results[i-1][j] = results[k][j]+results[i-2-k][j+k+1]+prefix[j+i]-prefix[j]
    return results[len(numbers)-1][0]
Here is one way of printing the selected splits.
I used the recursive solution with memoization provided by @Andrea Corbellini in your previous question, shown below:
cache = {}
def Opt(n):
    # tuple objects are hashable and can be put in the cache.
    n = tuple(n)
    if n in cache:
        return cache[n]
    if len(n) == 1:
        result = 0
    else:
        result = sum(n) + min(Opt(n[:i]) + Opt(n[i:])
                              for i in range(1, len(n)))
    cache[n] = result
    return result
Now we have cached values for all the tuples, including the selected ones.
Using them, we can print the selected tuples as shown below:
import math

selectedList = []
def printSelected(n, low):
    if len(n) == 1:
        # No need to print because it's
        # already printed at previous recursion level.
        return
    minVal = math.inf
    minTupleLeft = ()
    minTupleRight = ()
    splitI = 0
    for i in range(1, len(n)):
        tuple1ToI = tuple(n[:i])
        tupleiToN = tuple(n[i:])
        if (cache[tuple1ToI] + cache[tupleiToN]) < minVal:
            minVal = cache[tuple1ToI] + cache[tupleiToN]
            minTupleLeft = tuple1ToI
            minTupleRight = tupleiToN
            splitI = low + i
    print(minTupleLeft, minTupleRight, minVal)
    print(splitI)  # OP just wants the split index 'i'.
    selectedList.append(splitI)  # or add to the list as requested by OP
    printSelected(list(minTupleLeft), low)
    printSelected(list(minTupleRight), splitI)
You call the above method as shown below, after Opt(n) has filled the cache:
Opt(n)
printSelected(n, 0)
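An alternative design, sketched here as an assumption rather than the answer's method, is to let the memoized function return the chosen cuts together with the cost, so no second pass over the cache is needed. It works on (lo, hi) index ranges with prefix sums instead of tuple slices, and uses `functools.lru_cache` for the memoization; `opt_with_cuts` and `best` are illustrative names. Cuts come out in the same order as the answer above: the current cut, then the cuts inside the left part, then those inside the right part.

```python
from functools import lru_cache


def opt_with_cuts(numbers):
    """Return (optimal cost, list of cut positions) for the question's Opt recurrence."""
    nums = tuple(numbers)
    if not nums:
        raise ValueError("need at least one number")
    prefix = [0]
    for x in nums:
        prefix.append(prefix[-1] + x)  # prefix[k] == sum(nums[:k])

    @lru_cache(maxsize=None)
    def best(lo, hi):
        # Optimal cost and cuts for the segment nums[lo:hi]
        if hi - lo == 1:
            return 0, ()
        best_cost, best_cuts = None, ()
        for cut in range(lo + 1, hi):
            left_cost, left_cuts = best(lo, cut)
            right_cost, right_cuts = best(cut, hi)
            cost = left_cost + right_cost
            if best_cost is None or cost < best_cost:
                best_cost = cost
                # This cut first, then the cuts inside each half
                best_cuts = (cut,) + left_cuts + right_cuts
        return best_cost + prefix[hi] - prefix[lo], best_cuts

    cost, cuts = best(0, len(nums))
    return cost, list(cuts)


print(opt_with_cuts([1, 2, 3, 4, 5]))  # → (33, [3, 2, 1, 4])
```

Like the answer, it keeps the first minimum it finds, so on ties it reports one of the equally good paths.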

2d list not working

I am trying to create a 2D list, and I keep getting the same error: "TypeError: list indices must be integers, not tuple". I do not understand why, or how to use a 2D list correctly.
Total = 0
server = xmlrpclib.Server(url);
mainview = server.download_list("", "main")
info = [[]]
info[0,0] = hostname
info[0,1] = time
info[0,2] = complete
info[0,3] = Errors
for t in mainview:
    Total += 1
    print server.d.get_hash(t)
    info[Total, 0] = server.d.get_hash(t)
    info[Total, 1] = server.d.get_name(t)
    info[Total, 2] = server.d.complete(t)
    info[Total, 3] = server.d.message(t)
    if server.d.complete(t) == 1:
        Complete += 1
    else:
        Incomplete += 1
    if (str(server.d.message(t)).__len__() >= 3):
        Error += 1
info[0,2] = Complete
info[0,3] = Error
Everything works except the parts dealing with info.
Your mistake is in how you access the 2D list. Change:
info[0,0] = hostname
info[0,1] = time
info[0,2] = complete
info[0,3] = Errors
to:
info[0].append(hostname)
info[0].append(time)
info[0].append(complete)
info[0].append(Errors)
The same goes for info[Total, 0] etc.; each new row also has to be added first, e.g. with info.append([]).
The way you created info, it is a list containing only one element, namely an empty list. When working with lists, you have to address the nested items like:
info[0][0] = hostname
For initialization, you have to create a list of lists, e.g.:
# create list of lists of 0, size is 10x10
info = [[0]*10 for i in range(10)]
When using numpy arrays, you can address the elements as you did.
One advantage of "lists of lists" is that the entries of the "2D list" do not all need to have the same data type!
info = [[] for i in range(4)]  # create 4 empty lists inside a list
info[0].append(hostname)
info[0].append(time)
info[0].append(complete)
info[0].append(Errors)
You need to create the 2D list first.
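Pulling the answers together, a minimal sketch of the usual pattern: start with a header row, append one complete row per item, then read or update cells as `info[row][col]`. The values are made up and stand in for the `xmlrpclib` calls in the question; `build_table` is a hypothetical helper.

```python
def build_table(header, rows):
    # Row 0 is the header; every further row is its own list
    info = [list(header)]
    for row in rows:
        info.append(list(row))
    return info


# Hypothetical values standing in for the xmlrpclib calls in the question
info = build_table(
    ["myhost", "12:00", 0, 0],
    [["hash1", "name1", 1, ""], ["hash2", "name2", 0, "timeout"]],
)
# Cells are addressed as info[row][col], not info[row, col]
info[0][2] = sum(1 for row in info[1:] if row[2] == 1)  # number of complete items
print(info[1][1])  # → name1
print(info[0][2])  # → 1
```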
