Loop through entries in a list and create new list - python

I have a list of strings that looks like this:
name=['Jack','Sam','Terry','Sam','Henry',.......]
I want to create a newlist with the logic shown below: go through every entry in name and assign it a number if the entry is seen for the first time. If it is a repeat (as with 'Sam'), I want to assign it the number it already has, include it in newlist, and continue.
newlist = []
name[1] = 'Jack'
Jack = 1
newlist = ['Jack']
name[2] = 'Sam'
Sam = 2
newlist = ['Jack','Sam']
name[3] = 'Terry'
Terry = 3
newlist = ['Jack','Sam','Terry']
name[4] = 'Sam'
Sam = 2
newlist = ['Jack','Sam','Terry','Sam']
name[5] = 'Henry'
Henry = 5
newlist = ['Jack','Sam','Terry','Sam','Henry']
I know this can be done with something like
u,index = np.unique(name,return_inverse=True)
but for me it is important to loop through the individual entries of the list name and keep the logic above. Can someone help me with this?

Try using a dict and checking if keys are already paired to a value:
name = ['Jack','Sam','Terry','Sam','Henry']
vals = {}
i = 0
for entry in name:
    # Only the first occurrence of a name gets a number
    if entry not in vals:
        vals[entry] = i + 1
    i += 1
print(vals)
Result:
{'Henry': 5, 'Jack': 1, 'Sam': 2, 'Terry': 3}
Elements can be accessed by "index" (read: key) just like you would do for a list, except the "index" is whatever the key is; in this case, the keys are names.
>>> vals['Henry']
5
EDIT: If order is important, you can enter the items into the dict using the number as the key; that way, you know which name is which based on its number:
name = ['Jack','Sam','Terry','Sam','Henry']
vals = {}
i = 0
for entry in name:
    # Check if entry is a repeat
    if entry not in name[0:i]:
        vals[i + 1] = entry
    i += 1
print(vals)
print(vals[5])
This code uses the position at which each name first appears as the key. To make sure we don't overwrite entries or create duplicates, it checks whether the current name has already appeared earlier in the list (anywhere from index 0 up to, but not including, i, the current index in the name list).
This keeps the names in the order you want. Instead of accessing items by name, you index by number, which gives the order from your example.
Result:
>>> vals
{1: 'Jack', 2: 'Sam', 3: 'Terry', 5: 'Henry'}
>>> vals[5]
'Henry'
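If what you are after is the number for every entry, repeats included, rather than just the lookup table, a minimal sketch along the same lines could keep a dict of first-seen positions and append to a second list on each pass. The names first_seen and numbers are made up for illustration. Testing membership in the dict also avoids re-scanning name[0:i] on every iteration.

```python
name = ['Jack', 'Sam', 'Terry', 'Sam', 'Henry']

first_seen = {}  # name -> 1-based position of its first occurrence
numbers = []     # number assigned to each entry, parallel to `name`

for position, entry in enumerate(name, start=1):
    if entry not in first_seen:
        first_seen[entry] = position
    # Repeats reuse the number stored at their first occurrence
    numbers.append(first_seen[entry])

print(first_seen)  # {'Jack': 1, 'Sam': 2, 'Terry': 3, 'Henry': 5}
print(numbers)     # [1, 2, 3, 2, 5]
```

Here numbers lines up with name entry by entry, so numbers[3] is the number for the second 'Sam'.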

If you really want to create variables, you can do it through globals(), which creates global variables. At module level locals() returns the same dictionary; inside a function, writes to locals() are not guaranteed to create local variables.
globals() returns the dictionary that serves as the lookup table for global names and their values, so adding a key and a value to it creates a variable.
nl = ['Jack','Sam','Terry','Sam','Henry']
var = globals()
for i, n in enumerate(nl, 1):
    if n not in var:
        var[n] = i
# Show only the names we added, not every global
print({n: var[n] for n in nl})
{'Jack': 1, 'Sam': 2, 'Terry': 3, 'Henry': 5}
print(Jack)
1

If the order of the original list is key, may I suggest two data structures, a dictionary and a newlist:
nl = ['Jack','Sam','Terry','Sam','Henry']
d = {}
newlist = []
for i, n in enumerate(nl):
    if n not in d:
        d[n] = [i+1]
    newlist.append({n: d[n]})
newlist will return
[{'Jack': [1]}, {'Sam': [2]}, {'Terry': [3]}, {'Sam': [2]}, {'Henry': [5]}]
to walk it:
for names in newlist:
    for k, v in names.items():
        print('{} is number {}'.format(k, v))
NOTE: This does not make it easy to look up the number based on the name, as others suggested above. That would require more data structure logic. It does, however, let you keep the original list order while keeping track of where each name was first found.

Edit: Since order is important to you, use OrderedDict from the collections module.
Use a dictionary. Iterate over your list with a for loop and check with an if statement whether the name is already in the dictionary. enumerate gives you the index of each name, but keep in mind that indices start from 0, so to match your question we add 1 to the index, which makes the numbering start from 1.
import collections
nl = ['Jack','Sam','Terry','Sam','Henry']
d = collections.OrderedDict()
for i, n in enumerate(nl):
    if n not in d:
        d[n] = [i+1]
print(d)
Output:
OrderedDict([('Jack', [1]), ('Sam', [2]), ('Terry', [3]), ('Henry', [5])])
EDIT:
The ordered dict is still a dictionary, so you can use .items() to get the key-value pairs as tuples. Each number is stored in a one-element list, so you can do this:
for i in d.items():
    # i[1] is the list stored for that name; [0] takes its first element
    print('{} = {}'.format(i[0], i[1][0]))
Output:
Jack = 1
Sam = 2
Terry = 3
Henry = 5
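On Python 3.7 and later a plain dict also keeps insertion order, so the same idea can be sketched without OrderedDict. This version stores the number directly instead of wrapping it in a list, and uses setdefault so that only the first occurrence is recorded; both are choices made for this example, not something the answer above prescribes.

```python
nl = ['Jack', 'Sam', 'Terry', 'Sam', 'Henry']

d = {}
for i, n in enumerate(nl, start=1):
    # setdefault only stores i if n is not already a key
    d.setdefault(n, i)

for name, number in d.items():
    print('{} = {}'.format(name, number))
```

If the code has to run on older interpreters where dict order is not guaranteed, OrderedDict remains the safer choice.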

Related

Counting the most common element in a 2D List in Python

I am looking for a way to count occurrences in a 2D list. For example, I have a list like this:
[[John, 3],[Chris, 3],[Bryan,5],[John,3],[John,7]]
As output, I want the number that occurs most often for John, like this:
Most common number for John is: 3
I did it for all the names easily with
Counter(my_list[1]).most_common(5)
Does anyone have a suggestion to do it?
This should work.
from collections import Counter
main_list = [['John', 3],['Chris', 3],['Bryan',5],['John',3],['John',7]] #your_list
new_list = [i[1] for i in main_list if i[0]=='John']
print(Counter(new_list).most_common(1)[0][0])
I would probably re-shape the input data before doing queries on it. Maybe name vs values:
from collections import Counter, defaultdict
name_lookup = defaultdict(list)
for name, value in my_list:
    name_lookup[name].append(value)
name = 'John'
most_common, _ = Counter(name_lookup[name]).most_common(1)[0]
print(f"Most common number for {name} is: {most_common}")
You could also do filtering and mapping:
my_list = [['John', 3],['Chris', 3],['Bryan',5],['John',3],['John',7]]
print(Counter(map(lambda y: y[1], filter(lambda x: x[0] == "John", my_list))).most_common(1))
If you are 100% sure that there will always be exactly one most-frequent element for each name, you might sort by name, group by name, and then use statistics.mode as follows:
import itertools
import statistics
some_data = [['John', 3],['Chris', 3],['Bryan',5],['John',3],['John',7]]
sorted_data = sorted(some_data,key=lambda x:x[0]) # sort by name
most_frequent = {name:statistics.mode(j[-1] for j in list(data)) for name,data in itertools.groupby(sorted_data,key=lambda x:x[0])}
print(most_frequent) # {'Bryan': 5, 'Chris': 3, 'John': 3}
itertools.groupby returns (name, group) pairs, but because each item in a group still contains both the key (the name, in our case) and the value (the number), we need a comprehension to extract the raw values.
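For completeness, here is a sketch that gets the most common number for every name in one pass, without sorting first. It keeps one Counter per name inside a defaultdict; the variable names are made up for the example. When two numbers tie, most_common on Python 3.7+ returns the one encountered first.

```python
from collections import Counter, defaultdict

my_list = [['John', 3], ['Chris', 3], ['Bryan', 5], ['John', 3], ['John', 7]]

# One Counter per name, created on first access
counts = defaultdict(Counter)
for name, value in my_list:
    counts[name][value] += 1

# most_common(1) returns a one-element list of (value, count) pairs
most_common_by_name = {
    name: counter.most_common(1)[0][0] for name, counter in counts.items()
}

print(most_common_by_name)  # {'John': 3, 'Chris': 3, 'Bryan': 5}
```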

Dictionary initialization syntax

def __init__(self, devices, queue):
    '''
    '''
    self.devices = devices
    self.queue = queue
    values = {k:0 for k in devices.keys()}
    values[0xbeef] = len(values) # the number of devices
    super(CallbackDataBlock, self).__init__(values)
Can someone help me explain the following two lines:
values = {k:0 for k in devices.keys()}
What does k:0 do?
values[0xbeef] = len(values) # the number of devices
Does this mean that new item {0xbeef: length} is appended in the dict?
The k is a key in the dictionary. The set of all keys comes from devices.keys(); we loop over it, take each key, and initialize its value to zero.
Yes, you are right. The next statement adds a new key and sets its value to the number of entries in the dictionary at that point.
{k:0 for k in devices.keys()} creates a dictionary with all keys and 0 for all values. And your assessment is correct, it creates a new key with {value of 0xbeef : number of keys in the dictionary}
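A small sketch of those two lines in isolation may help. The devices dict here is hypothetical (the real one in the question is not shown); only its keys matter.

```python
# Hypothetical stand-in for the `devices` argument in the question
devices = {0x01: 'sensor', 0x02: 'pump'}

# One entry per device key, each starting at 0
values = {k: 0 for k in devices}

# len(values) is evaluated before the new key is added,
# so it counts the devices only
values[0xbeef] = len(values)

print(values)  # {1: 0, 2: 0, 48879: 2}
```

Iterating over a dict yields its keys, so `for k in devices` and `for k in devices.keys()` behave the same here.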
In the Python documentation you can read about list comprehensions.
This pattern is important:
[expression for item in iterable if condition]
or, for simple usage:
[expression for item in iterable]
With a list we can write:
nums = [0,1,2,3,4,5]
a = [x for x in nums]
print(a)
printed:
[0, 1, 2, 3, 4, 5]
and we have:
a = [x*2 for x in nums]
print(a)
printed:
[0, 2, 4, 6, 8, 10]
and for a dictionary
a dictionary has this syntax:
{key1:value1, key2:value2, . . .}
and an example:
nums = [0,1,2,3,4,5]
d = {k:0 for k in nums}
print(d)
in this example k:0 means: use 0 as the value for each key k
printed:
{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
one more thing:
a Python dictionary has two helpful methods: dict.keys() and dict.values()
dict.keys() returns the dictionary's keys (a list in Python 2, a view object in Python 3)
d = {"name":"sam", "job":"programer", "salary":"25000"}
print(list(d.keys()))
print(list(d.values()))
printed:
['name', 'job', 'salary']
['sam', 'programer', '25000']
to add a new entry to a dictionary we use:
d[newkey] = newValue
for example:
d[10] = 'mary'
print(d[10])
printed:
mary
now your answer:
in your code
1) k:0 means: use 0 as the value for each key k
2) 0xbeef is a hexadecimal literal, equal to 48879 in decimal, so the line is the same as
values[48879] = len(values)
which stores the number of entries in values under that key.

Check if string in list of strings equals one value in list in dictionary, then assign key value to variable

Set-up
I am working on a scraping project with Scrapy.
I have a list of strings,
l = ['john', 'jim', 'jacob', … ,'jonas']
and a dictionary containing lists,
d = {
'names1': ['abel', 'jacob', 'poppy'],
'names2': ['john', 'tom', 'donald']
}
Problem
I'd like to check, for each name in l, whether it is contained in one of the lists in d. If so, I want the key of that list in d to be assigned to a new variable. E.g. jacob would get newvar = names1.
For now I have,
found = False
for key,value in d.items():
    for name in l:
        if name == value:
            newvar = key
            found = True
            break
        else:
            newvar = 'unknown'
    if found:
        break
However, this results in newvar = 'unknown' for every name in l.
I think I'm not breaking out of the outer for loop correctly. Other than that, is there perhaps a more elegant way to solve my problem?
I think the issue is that you want to see whether any string in l appears in one of the values in d. Your code assumes that each value is a string, not a list, so the comparison name == value is always false and "unknown" is assigned every time. Try the code below:
l = ['john', 'jim', 'jacob', 'jonas']
d = {'names1': ['abel', 'jacob', 'poppy'], 'names2': ['john', 'tom','donald']}
flag = False
for a, b in d.items():
    for i in l:
        # b is a list, so test membership rather than equality
        if i in b:
            d[a] = "unknown"
            flag = True
            break
    if flag:
        break
print(d)
On Python 3.7+, where dicts iterate in insertion order, this will give you:
{'names1': 'unknown', 'names2': ['john', 'tom', 'donald']}
Since you break out of the loop, only one key gets the "unknown" value. If you want to replace every value that contains a string from l with "unknown", this will work as well:
final_dict = {a: "unknown" if any(i in b for i in l) else b for a, b in d.items()}
print(final_dict)
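If the goal is the one stated in the question (for each name in l, get the key of the list that contains it, or 'unknown'), one possible sketch is to invert d once into a name-to-key lookup and then query it per name. The names name_to_key and result are made up here, and the sketch assumes a name appears in at most one list; if it appears in several, the last key wins.

```python
l = ['john', 'jim', 'jacob', 'jonas']
d = {
    'names1': ['abel', 'jacob', 'poppy'],
    'names2': ['john', 'tom', 'donald'],
}

# Invert d: each name points back to the key of the list it came from
name_to_key = {name: key for key, names in d.items() for name in names}

# Look up every name in l, falling back to 'unknown'
result = {name: name_to_key.get(name, 'unknown') for name in l}

print(result)
# {'john': 'names2', 'jim': 'unknown', 'jacob': 'names1', 'jonas': 'unknown'}
```

This also avoids the nested loop and the break bookkeeping, since each lookup is a single dict access.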

Python get keys from ordered dict

python noob here. So I'm making a program that will take a JSON file from a url and parse the information and put it into a database. I have the JSON working, thankfully, but now I am stuck, I'll explain it through my code.
playerDict = {
"zenyatta" : 0,
"mei" : 0,
"tracer" : 0,
"soldier76" : 0,
"ana" : 0,
...}
So this is my original dictionary, which I then fill with the player's data for each hero.
topHeroes = sorted(playerDict.items(),key = operator.itemgetter(1),reverse = True)
I then sort it so that the heroes with the most hours played come first.
topHeroesDict = topHeroes[0:3]
playerDict['tophero'] = topHeroesDict[0]
I then get the top three heroes. The second line here prints out a list like so:
'secondhero': ('mercy', 6.0)
Whereas I want the output to be:
'secondhero': 'mercy'
Would appreciate any help. I have tried the code below, with and without list.
list(topHeroes.keys())[0]
So thanks in advance and apologies for the amount of code!
You could take an approach with enumerate if, instead of "firsthero", you are OK with "Top 1" and so on. With enumerate you can iterate over the list and keep track of the current position (counting from 1 here), which is used to name the key in this dictionary comprehension. j[0] is the name of the hero, which is the first element of the tuple.
topHeroes = sorted(playerDict.items(),key = operator.itemgetter(1),reverse = True)
topHeroesDict = {"Top "+str(i): j[0] for i, j in enumerate(topHeroes[0:3], 1)}
Alternatively, you could use a dictionary which maps the index to first like this:
topHeroes = sorted(playerDict.items(),key = operator.itemgetter(1),reverse = True)
top = {0: "first", 1: "second", 2: "third"}
topHeroesDict = {top[i]+"hero": j[0] for i, j in enumerate(topHeroes[0:3])}
You do not need any imports to achieve this. Without itemgetter, you can do it in one line like this:
top = {0: "first", 1: "second", 2: "third"}
topHeroesDict = {top[i]+"hero": j[0] for i, j in enumerate(sorted([(i, playerDict[i]) for i in playerDict.keys()], key = lambda x: x[1], reverse = True)[0:3])}
You're sorting an iterable of tuples returned by the items method of the dict, so each item in the sorted list is a tuple containing the hero and their score.
You can avoid using sorted and dict.items altogether and get the leading heroes (without their score) by simply using collections.Counter and then getting the most_common 3 heroes.
from collections import Counter
player_dict = Counter(playerDict)
leading_heroes = [hero for hero, _ in player_dict.most_common(3)]
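Another option, sketched here with made-up hours, is heapq.nlargest with the dict's own get method as the key function. It returns just the hero names, so there is no tuple to unpack. The label 'thirdhero' is an assumption that follows the 'tophero' / 'secondhero' pattern from the question.

```python
import heapq

# Made-up hours played, for illustration only
playerDict = {
    "zenyatta": 2.0,
    "mei": 0.5,
    "tracer": 4.0,
    "soldier76": 1.0,
    "ana": 6.0,
}

# Iterating a dict yields its keys; key=playerDict.get ranks them by value
top_three = heapq.nlargest(3, playerDict, key=playerDict.get)

labels = ["tophero", "secondhero", "thirdhero"]
topHeroesDict = dict(zip(labels, top_three))

print(topHeroesDict)
# {'tophero': 'ana', 'secondhero': 'tracer', 'thirdhero': 'zenyatta'}
```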

How to append the number of item frequencies in a list in Python 3.2?

I'm new to programming and python is the first language I've learned.
The question I want to ask is how do you count the frequency of items in a list
so they add up in order with "PARTY_INDICES"? in my case that is.
This is a docstring for what I need to do:
''' (list of str) -> tuple of (str, list of int)
votes is a list of single-candidate ballots for a single riding.
Based on votes, return a tuple where the first element is the name of the party
winning the seat and the second is a list with the total votes for each party in
the order specified in PARTY_INDICES.
>>> voting_plurality(['GREEN', 'GREEN', 'NDP', 'GREEN', 'CPC'])
('GREEN', [1, 3, 0, 1])
'''
Since PARTY_INDICES = [NDP_INDEX, GREEN_INDEX, LIBERAL_INDEX, CPC_INDEX]
This produces a tuple of the winning party (in this case 'GREEN') and the list of
frequencies, here [1, 3, 0, 1].
These are global variables, lists and dictionaries:
# The indices where each party's data appears in a 4-element list.
NDP_INDEX = 0
GREEN_INDEX = 1
LIBERAL_INDEX = 2
CPC_INDEX = 3
# A list of the indices where each party's data appears in a 4-element list.
PARTY_INDICES = [NDP_INDEX, GREEN_INDEX, LIBERAL_INDEX, CPC_INDEX]
# A dict where each key is a party name and each value is that party's index.
NAME_TO_INDEX = {
'NDP': NDP_INDEX,
'GREEN': GREEN_INDEX,
'LIBERAL': LIBERAL_INDEX,
'CPC': CPC_INDEX
}
# A dict where each key is a party's index and each value is that party's name.
INDEX_TO_NAME = {
NDP_INDEX: 'NDP',
GREEN_INDEX: 'GREEN',
LIBERAL_INDEX: 'LIBERAL',
CPC_INDEX: 'CPC'
}
This is my work:
def voting_plurality(votes):
    my_list = []
    my_dct = {}
    counter = 0
    for ballot in votes:
        if (ballot in my_dct):
            my_dct[ballot] += 1
        else:
            my_dct[ballot] = 1
    if (my_dct):
        my_dct = my_dct.values()
        new_list = list(my_dct)
    return (max(set(votes), key = votes.count), new_list)
it returns:
>>> voting_plurality(['GREEN', 'GREEN', 'NDP', 'GREEN', 'CPC'])
('GREEN', [1, 1, 3])
But I want it to also include the party with no votes and to be in the order given by PARTY_INDICES: [1, 3, 0, 1]
My code may look like nonsense, but I'm really stuck and confused.
Also I cannot IMPORT anything.
There are two main problems you have. The first is that you have to capture the zero, but since there are no votes for the "liberal party", the zero will not be reflected.
Tip: Maybe you want to initialize your dictionary?
The second problem is that you are calling dict.values() which will not be in any sort of order. You need to use the dictionary, and the PARTY_INDICES to create the correctly ordered list of numbers.
Tip: Maybe you can reference the keys in the dictionary and their respective positions in the PARTY_INDICES list.
See if you can come up with something given these tips, and update your question. If you can't, I'm sure someone will post a full answer eventually.
Seeing as it has been 4 hours - here is a solution:
def voting_plurality(votes):
    # Start every party at zero so parties with no votes still appear
    sums = dict(zip(INDEX_TO_NAME.values(), [0] * len(INDEX_TO_NAME)))
    for vote in votes:
        if vote in sums:
            sums[vote] += 1
        else:
            print("Bad vote: %s" % vote)
    # (index, count) pairs ordered by party index
    votes_by_index = sorted([(NAME_TO_INDEX[k], v) for k, v in sums.items()])
    # The same pairs ordered by count, highest first
    votes_by_rank = sorted(votes_by_index, key=lambda x: x[1], reverse=True)
    votes_by_parts = [item[1] for item in votes_by_index]
    highest_votes = INDEX_TO_NAME[votes_by_rank[0][0]]
    return (highest_votes, votes_by_parts)
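A shorter sketch, still without imports, is to count straight into a list indexed by party. It assumes every ballot is a key in NAME_TO_INDEX (an unknown party name would raise KeyError), and on a tie it picks the party with the lowest index, because list.index returns the first match.

```python
NDP_INDEX = 0
GREEN_INDEX = 1
LIBERAL_INDEX = 2
CPC_INDEX = 3

NAME_TO_INDEX = {
    'NDP': NDP_INDEX,
    'GREEN': GREEN_INDEX,
    'LIBERAL': LIBERAL_INDEX,
    'CPC': CPC_INDEX,
}
INDEX_TO_NAME = {
    NDP_INDEX: 'NDP',
    GREEN_INDEX: 'GREEN',
    LIBERAL_INDEX: 'LIBERAL',
    CPC_INDEX: 'CPC',
}


def voting_plurality(votes):
    # One slot per party, already in index order, all starting at zero
    totals = [0] * len(NAME_TO_INDEX)
    for ballot in votes:
        totals[NAME_TO_INDEX[ballot]] += 1
    # Position of the largest total is the winning party's index
    winner = INDEX_TO_NAME[totals.index(max(totals))]
    return (winner, totals)


print(voting_plurality(['GREEN', 'GREEN', 'NDP', 'GREEN', 'CPC']))
# ('GREEN', [1, 3, 0, 1])
```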
