I am working on a project and need to write a function that generates IDs for every client in our company. There is an existing list of clients, and some of them already have a numerical 5- to 6-digit ID ranging from 40,000 to 200,000. Other existing clients do not have an ID, and I would like to stay consistent with the existing ID numbers (e.g. 43606 or 125490).
To keep a similar format, I created an Exclusion_List that contains all of the existing ID numbers. I was then going to write a function using np.random.uniform(low=40000, high=200000) so that it generates a number within that range that looks similar to the other ID numbers.
The problem is that I don't know how to set up a loop that checks whether the randomly generated ID is already in the exclusion list and, if so, generates a new one.
This is what I have so far:
exclusions = [43606,125490,...]
def ID_Generator(new_clients): # This is a list of new client
    new_client_IDs = []
    for client in new_clients:
        ID = int(np.random.uniform(low=40000, high=200000))
        while ID not in exclusions:
            new_client_IDs.append(ID)
I am not sure how to handle the scenario when the randomly generated number is in the exclusion list. I would love the function to output a dataframe containing the client names in one column and the ID number in a second column.
Appreciate any help on this!
Similar answer to Niranjan, but no list comprehension needed,
import numpy as np
import pandas as pd

exclusion_list = [43606, 125490]
# Candidate IDs 40000..199999, minus the ones that are already taken
free_ids = np.arange(40000, 200000)
free_ids = free_ids[~np.isin(free_ids, exclusion_list)]

def get_ids(client_names):
    # replace=False guarantees the new IDs are distinct from each other
    new_client_ids = np.random.choice(free_ids, len(client_names), replace=False)
    return pd.DataFrame(data=new_client_ids, index=client_names, columns=["id"])

print(get_ids(["Bob", "Fred", "Max"]))
which gives
          id
Bob   125205
Fred  185058
Max    86158
The simplest approach I can think of right now is:
Generate a list from 40000-200000.
Remove all the exclusions from the above list.
Randomly pick any id from the remaining list (In case order matters, use ids sequentially).
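import random

exclusions = {43606, 125490}  # a set keeps the membership test below fast
all_ids = range(40000, 200000)  # renamed from "all" to avoid shadowing the built-in
remaining = [x for x in all_ids if x not in exclusions]
random.choice(remaining)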
from random import randint

exclusions = [43606,125490,...]

def ID_Generator(new_clients): # This is a list of new clients
    new_client_IDs = []
    while len(new_client_IDs) < len(new_clients):
        ID = randint(40000, 200000)
        # Skip IDs that already exist or were already handed out in this call
        if ID not in exclusions and ID not in new_client_IDs:
            new_client_IDs.append(ID)
    return new_client_IDs
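For completeness, here is a hedged sketch of the "redraw on collision" loop the question asks about, written with the standard library only. The function name `generate_client_ids`, its parameters, and the sample data are assumptions made for illustration. Note that `random.randint` is inclusive at both ends, so 200000 itself can be drawn, and that the loop would never finish if the range had fewer free IDs than clients.

```python
import random


def generate_client_ids(new_clients, exclusions, low=40000, high=200000):
    """Pair each client name with a random ID that is not already in use."""
    taken = set(exclusions)  # set lookup keeps the collision check cheap
    pairs = []
    for name in new_clients:
        candidate = random.randint(low, high)
        # Keep drawing until the candidate is not already taken
        while candidate in taken:
            candidate = random.randint(low, high)
        taken.add(candidate)  # so later clients cannot receive the same ID
        pairs.append((name, candidate))
    return pairs


pairs = generate_client_ids(["Bob", "Fred", "Max"], [43606, 125490])
for name, client_id in pairs:
    print(name, client_id)
```

If a DataFrame is wanted, the list of pairs can be passed to `pd.DataFrame(pairs, columns=["client", "id"])`.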
I've got a list named users, and I want a second list named account_no to take only the first part of each entry in users.
users = [
    'GB1520*Adam Johnson*07293105657',
    'ZA5584*Dean Davids*07945671883'
]
account_no = []
def find_accountno():
    for index in range(len(users)):
        # I want to take the first 6 characters of users[index]
        acc = users[index: 6]
        account_no.append(acc)
print(users)
find_accountno()
print(account_no)
And this is the desired output:
['GB1520', 'ZA5584']
But, instead, I'm getting this:
[['GB1520*Adam Johnson*07293105657', 'ZA5584*Dean Davids*07945671883'], ['ZA5584*Dean Davids*07945671883']]
You should read a bit more about slicing; you'll see that it doesn't work the way you think it does.
You wrote:
acc = users[index: 6]
This is saying "take every element in users from index index to index 6 (including index, not including 6), form a new list from them, and put them in acc".
For example:
l = [0,1,2]
b = l[0:2]
Would have the list [0,1] inside b.
If what you want is to grab the first six characters of users[index], then you simply want users[index][0:6] (so users[index] is the string you wish to slice; then [0:6] employs slicing as described above to only grab the first 6 elements: 0 to 5). You can also drop the 0 (so [:6]).
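To make the difference concrete, here is a small sketch that reuses the users list from the question and contrasts slicing the list with slicing one string inside it. The variable names `list_slice` and `string_slice` are only illustrative.

```python
users = [
    'GB1520*Adam Johnson*07293105657',
    'ZA5584*Dean Davids*07945671883',
]

index = 1
# Slicing the list: elements from position 1 up to (not including) 6
list_slice = users[index:6]
# Indexing first, then slicing the string: first six characters of users[1]
string_slice = users[index][:6]

print(list_slice)    # ['ZA5584*Dean Davids*07945671883']
print(string_slice)  # ZA5584
```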
Some extras:
Another two solutions, just to show you some fun alternatives (these use what's known as list comprehension):
def find_accountno_alt1():
    numbers = [user[:6] for user in users]
    account_no.extend(numbers)

def find_accountno_alt2():
    numbers = [user.split('*')[0] for user in users]
    account_no.extend(numbers)
Another point: I'd personally recommend simply passing the list (account_no) as a parameter to make the method neater and more self-contained.
In your code, you need to use acc=users[index][:6].
users = ['GB1520*Adam Johnson*07293105657', 'ZA5584*Dean Davids*07945671883']
account_no = []
def find_accountno():
    for index in range(len(users)):
        acc = users[index][:6] #I want to take the first 6 characters of users[index]
        account_no.append(acc)
#print(users)
find_accountno()
print(account_no)
As for the multiple output, you are getting that because you are also printing the users list.
I suggest you split the strings on the "*" character and take only the first part (your account ID):
account_no = [user.split("*")[0] for user in users]
EDIT:
full code for your task
users = ['GB1520*Adam Johnson*07293105657', 'ZA5584*Dean Davids*07945671883']
account_no = [user.split("*")[0] for user in users]
print(users)
print(account_no)
Here is an alternative and more Pythonic way to write your code and get the results you want:
def find_account_numbers(users):
    return [user[:6] for user in users]

users = [
    'GB1520*Adam Johnson*07293105657',
    'ZA5584*Dean Davids*07945671883'
]
account_numbers = find_account_numbers(users)
print(account_numbers)
The code snippet above will result in the following output:
['GB1520', 'ZA5584']
I have a CSV file that contains data like this:
1,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
2,668d39,aeqok,furniture,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,vehicle,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f
4,6444t43,rrdwk,vehicle,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f
and I'm trying to use the function def popvote(list) to return the most popular item in the auction, which in this example is vehicle.
So I want my function to return the most common value in the 4th column, which is vehicle in this case.
This is what I have so far
def popvote(list):
    for x in list:
        g = list(x)
    return list.sort(g.sort)
However, this doesn't really work.. what should I change to make sure this works??
Note: The answer should be returned as a set
Edit: so I'm trying to return the value that is repeated most in the list based on what's indicated in (** xxxx **) below
1,8dac2b,ewmzr,**jewelry**,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
2,668d39,aeqok,**furniture**,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,**vehicle**,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f
4,6444t43,rrdwk,**vehicle**,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f
So in this case, vehicle should be the output.
import pandas as pd
# The sample file has no header row, so don't let pandas treat the first line as one
df = pd.read_csv("filename.csv", header=None)
most_common = df[df.columns[3]].value_counts().idxmax()
Any questions? Down in the comments.
An alternative solution could be (assuming you have your records as list of lists):
from statistics import mode
mode(list(zip(*your_csv))[3]) # the item type is the 4th column (index 3)
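Since the question asks for the result as a set, here is a hedged standard-library sketch. The function name `most_popular` and the in-memory sample (standing in for the real file) are assumptions for illustration. If several values tie for first place, all of them end up in the returned set.

```python
import csv
import io
from collections import Counter

SAMPLE = """\
1,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
2,668d39,aeqok,furniture,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,vehicle,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f
4,6444t43,rrdwk,vehicle,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f
"""


def most_popular(rows, column=3):
    """Return the set of values that occur most often in the given column."""
    counts = Counter(row[column] for row in rows if len(row) > column)
    if not counts:
        return set()
    top = max(counts.values())
    return {value for value, n in counts.items() if n == top}


# io.StringIO stands in for open("filename.csv", newline="")
rows = list(csv.reader(io.StringIO(SAMPLE)))
print(most_popular(rows))  # {'vehicle'}
```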
I have produced a set of matching IDs from a database collection that looks like this:
{ObjectId('5feafffbb4cf9e627842b1d9'), ObjectId('5feaffcfb4cf9e627842b1d8'), ObjectId('5feb247f1bb7a1297060342e')}
Each ObjectId represents an ID on a collection in the DB.
I got that list by doing this: (which incidentally I also think I am doing wrong, but I don't yet know another way)
# Find all question IDs
question_list = list(mongo.db.questions.find())
all_questions = []
for x in question_list:
    all_questions.append(x["_id"])
# Find all con IDs that match the question IDs
con_id = list(mongo.db.cons.find())
con_id_match = []
for y in con_id:
    con_id_match.append(y["question_id"])
matches = set(con_id_match).intersection(all_questions)
print("matches", matches)
print("all_questions", all_questions)
print("con_id_match", con_id_match)
And that brings up all the IDs that are associated with a match such as the three at the top of this post. I will show what each print prints at the bottom of this post.
Now I want to get each ObjectId separately as a variable so I can search for these in the collection.
mongo.db.cons.find_one({"con": matches})
Where matches (which will probably need to be a new variable) is each of the matching ObjectIds in turn.
So, how do I separate the ObjectIds in matches so that I get one at a time while iterating? I tried a for loop, but it threw an error, and I guess I am writing it wrong for a set. Thanks for the help.
Print Statements:
**matches** {ObjectId('5feafffbb4cf9e627842b1d9'), ObjectId('5feaffcfb4cf9e627842b1d8'), ObjectId('5feb247f1bb7a1297060342e')}
**all_questions** [ObjectId('5feafb52ae1b389f59423a91'), ObjectId('5feafb64ae1b389f59423a92'), ObjectId('5feaffcfb4cf9e627842b1d8'), ObjectId('5feafffbb4cf9e627842b1d9'), ObjectId('5feb247f1bb7a1297060342e'), ObjectId('6009b6e42b74a187c02ba9d7'), ObjectId('6010822e08050e32c64f2975'), ObjectId('601d125b3c4d9705f3a9720d')]
**con_id_match** [ObjectId('5feb247f1bb7a1297060342e'), ObjectId('5feafffbb4cf9e627842b1d9'), ObjectId('5feaffcfb4cf9e627842b1d8')]
Usually you can just use the find method, which yields documents one by one, and filter the documents in Python while iterating, like this:
# fetch only ids
question_ids = {question['_id'] for question in mongo.db.questions.find({}, {'_id': 1})}
matches = []
for con in mongo.db.cons.find():
    con_id = con['question_id']
    if con_id in question_ids:
        matches.append(con_id)
        # you can process matched and loaded con here
print(matches)
If you have a huge amount of data, take a look at the aggregation framework.
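On the specific question of taking one ObjectId at a time: a Python set can be walked with an ordinary for loop, and MongoDB's `$in` operator accepts a list of values. The sketch below is only illustrative. It uses plain strings as stand-ins for the ObjectId values and merely builds the filter documents; the field name `question_id` is taken from the question's code, and the actual calls are left as comments because they need a live database.

```python
# Strings stand in for the ObjectId values shown in the question
matches = {
    "5feafffbb4cf9e627842b1d9",
    "5feaffcfb4cf9e627842b1d8",
    "5feb247f1bb7a1297060342e",
}

# One filter per ID: a set is iterable, so a plain for loop works
per_id_filters = []
for match in matches:
    per_id_filters.append({"question_id": match})
    # e.g. mongo.db.cons.find_one({"question_id": match})

# One filter for all IDs: $in expects a list, so convert the set first
all_ids_filter = {"question_id": {"$in": list(matches)}}
# e.g. mongo.db.cons.find(all_ids_filter)

print(len(per_id_filters))
```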
I'm stuck on the following problem:
I have a list with a ton of duplicate data, including entry numbers and names.
The following gives me a list of unique (non-duplicate) names of people from the Data2014 table:
tablequery = c.execute("SELECT * FROM Data2014")
tablequery_results = list(tablequery)
people2014_count = len(tablequery_results)
people2014_list = []
for i in tablequery_results:
    if i[1] not in people2014_list:
        people2014_list.append(i[1])
people2014_count = len(people2014_list)
# for i in people2014_list:
#     print(i)
Now that I have a list of people, I need to iterate through tablequery_results again; this time, however, I need to find the number of unique entry numbers each person has. There are tons of duplicates in the tablequery_results list. Without creating a block of code for each individual person's name, is there a way to iterate through tablequery_results using the names from people2014_list as the unique identifier? I can replicate the code from above to give me a list of unique entry numbers, but I can't seem to match the names with the unique entry numbers.
Please let me know if that does not make sense.
Thanks in advance!
I discovered my answer after delving into SQL a bit more. This gives me a list with two columns: the person's name in the first column, and the number of entries that person has in the second column.
def people_data():
    data_fetch = c.execute("SELECT person, COUNT(*) AS `NUM` FROM Data2014 WHERE ACTION='UPDATED' GROUP BY Person ORDER BY NUM DESC")
    people_field_results = list(data_fetch)
    people_field_results_count = len(people_field_results)
    for i in people_field_results:
        print(i)
    print(people_field_results_count)
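For reference, a self-contained sketch of the same GROUP BY idea against an in-memory SQLite database. The column names `entry_no` and `person`, the sample rows, and the use of the `sqlite3` module are assumptions, since the original schema and driver are not shown. `COUNT(DISTINCT ...)` is used here so that duplicate rows for the same entry number are counted only once per person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
# Hypothetical two-column schema; the real Data2014 table has more fields
c.execute("CREATE TABLE Data2014 (entry_no INTEGER, person TEXT)")
c.executemany(
    "INSERT INTO Data2014 VALUES (?, ?)",
    [(1, "Ann"), (1, "Ann"), (2, "Ann"), (3, "Bob"), (3, "Bob")],
)

# One row per person with the number of distinct entry numbers they have
rows = c.execute(
    "SELECT person, COUNT(DISTINCT entry_no) AS num "
    "FROM Data2014 GROUP BY person ORDER BY num DESC, person"
).fetchall()

for person, num in rows:
    print(person, num)
conn.close()
```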
I have a CouchDB database with the follower and friend IDs of a single Twitter user. Friends are stored under the group “friend_edges” and followers under “follower_edges”.
I am trying to find the IDs of those who are both followers and friends (at the same time) of that user.
To do that, I was asked to convert the lists of followers and friends into sets and then use the intersection operation between sets, like set1.intersection(set2).
Below is my code. It reports only 2 friends who are also followers. Since the dataset has almost 2,000 IDs, I’m positive this value is wrong.
Can someone tell me what is wrong with my code?… I appreciate your guidance but, although there are many ways to program these tasks, I do need to use sets and .intersection, so please try to help me using those only... =)
from twitter_login import oauth_login
from twitter_DB import load_from_DB
from sets import Set

def friends_and_followers(users):
    #open a lists for friends and another for followers
    friends_list, followers_list = [], []
    #find the users id under the label "friend_edges"
    if id in users["friend_edges"] :
        #loop in the "friend edges" group and find id's values
        for value in id:
            #add value to the list of friends
            friends_list += value
    #put the rest of the ids under the followers' list
    else:
        followers_list += value
    return friends_list, followers_list
    print friends_list, followers_list
    #convert list of friends into a set
    flist= set(friends_list)
    #convert list of friends into a set
    follwlist= set(followers_list)

if __name__ == '__main__':
    twitter_api = oauth_login()
    # check couchdb to look at this database
    DBname = 'users-thatguy-+-only'
    # load all the tweets
    ff_results = load_from_DB(DBname)
    #show number loaded
    print 'number loaded', len(ff_results)
    #iterate over values in the file
    for user_id in ff_results:
        #run the function over the values
        both_friends_followers = friends_and_followers(user_id)
    print "Friends and Followers of that guy: ", len(both_friends_followers)
The reason you get a length of two is because you return this:
return friends_list, followers_list
That is a tuple of two lists; you then take the length of that tuple, which is two.
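A hedged sketch of what returning the intersection could look like instead, written for Python 3. It assumes each document is a dict with "friend_edges" and "follower_edges" lists of IDs, as the question describes; the sample IDs are made up, and the built-in `set` type is used rather than the old `sets` module.

```python
def friends_and_followers(doc):
    """Return the IDs that appear in both edge lists of one user document."""
    friends = set(doc.get("friend_edges", []))
    followers = set(doc.get("follower_edges", []))
    return friends.intersection(followers)


# Made-up document standing in for one record loaded from the database
doc = {"friend_edges": [1, 2, 3, 4], "follower_edges": [2, 3, 4, 5]}

both = friends_and_followers(doc)
print(sorted(both))  # [2, 3, 4]
print(len(both))     # 3, the size of the intersection

# For comparison: the length of a tuple of two lists is always 2
print(len((doc["friend_edges"], doc["follower_edges"])))  # 2
```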
I managed to convert from dictionary to set by extracting the values and adding those to a list using list.append(), as follows:
if 'friend_edges' in doc.keys():
    flist = []
    for x in doc['friend_edges']:
        flist.append(x)