Suppose I have a list where each element is either a name, or a list of rooms reserved by the preceding name.
[["Bob"], ["125A", "154B", "643A"], ["142C", "192B"], ["653G"],
["Carol"], ["95H", "123C"], ["David"], ["120G"]]
So in this case, Bob has the rooms 125A, 154B, 643A, 142C, 192B, and 653G reserved, etc.
How do I construct a function which would make the above into the following format:
[["Bob", "125A", "154B", "643A", "142C", "192B", "653G"], ["Carol"...
Essentially concatenating [name] with all the [list of room reservations], until the next instance of [name]. I have a function which takes a list, and returns True if a list is a name, and False if it is a list of room reservations, so effectively I have:
[True, False, False, False, True, False, True, False] for the above list, but I'm not sure how that would help me, if at all. Assume that if a list contains a name, it contains only one name.
Given the following function
def is_name(x):
    return # if x is a name or not
a simple and short solution is to use a defaultdict.
Example:
from collections import defaultdict
def do_it(source):
    dd = defaultdict(list)
    for item in sum(source, []): # just use your favourite flattening method here
        if is_name(item):
            name = item
        else:
            dd[name].append(item)
    return [[k]+v for k,v in dd.items()]
for s in do_it(l):
    print(s)
Output:
['Bob', '125A', '154B', '643A', '142C', '192B', '653G']
['Carol', '95H', '123C']
['David', '120G']
Bonus:
This one uses a generator for laziness
import itertools
def do_it(source):
    name, items = None, []
    for item in itertools.chain.from_iterable(source):
        if is_name(item):
            if name:
                yield [name] + items
                name, items = None, []
            name = item
        else:
            items.append(item)
    yield [name] + items
I'll preface this by saying that I strongly agree with @uʍopǝpısdn's suggestion. However, if your setup precludes changing it for some reason, this seems to work (although it isn't pretty):
# Original list
l = [["Bob"],["125A", "154B", "643A"],["142C", "192B"], ["653G"], ["Carol"], ["95H", "123C"], ["David"], ["120G"]]
# This is the result of your checking function
mapper = [True, False, False, False, True, False, True, False]
# Final list
combined = []
# Generic counters
# Position in arrays
i = 0
# Position in combined list
k = 0
# Loop through the main list until the end.
# We don't use a for loop here because we want to be able to control the
# position of i.
while i < len(l):
    # If the corresponding value is True, start building the list
    if mapper[i]:
        # This is an example of how the code gets messy quickly
        combined.append([l[i][0]])
        i += 1
        # Now that we've hit a name, loop until we hit another, adding the
        # non-name information to the original list
        while i < len(mapper) and not mapper[i]:
            # extend, so that every room in the sub-list is added, not just the first
            combined[k].extend(l[i])
            i += 1
        # increment the position in our combined list
        k += 1
    else:
        # Skip room lists that appear before any name (avoids an endless loop)
        i += 1
print(combined)
Assume that the function which takes a list and returns True or False based on whether list contains name or rooms is called containsName() ...
def process(items):
    results = []
    name_and_rooms = []
    for item in items:
        if containsName(item):
            if name_and_rooms:
                results.append(name_and_rooms[:])
                name_and_rooms = []
            name_and_rooms.append(item[0])
        else:
            name_and_rooms.extend(item)
    if name_and_rooms:
        results.append(name_and_rooms[:])
    return results
This will output a name even if no list of rooms follows it, e.g. [['bob'],['susan']].
Also, this will not merge repeated names, e.g. [['bob'],['123'],['bob'],['456']]. If that is desired, you'll need to collect the names in a temporary dict instead, with each name's room list as its value, and then emit the key-value pairs of the dict at the end. On its own, that will not necessarily preserve the order of the names. If you care about the order, keep a second list that records the names in the order they first appear and use it when emitting the values from the dict.
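The dict-plus-order-list idea described above might look like the following sketch. The names process_merged and looks_like_name are made up for illustration, and looks_like_name is only a stand-in for the asker's own checking function, which is not shown in the question.

```python
def process_merged(items, contains_name):
    """Merge room lists under their name, combining repeated names."""
    rooms_by_name = {}   # name -> accumulated list of rooms
    name_order = []      # names in order of first appearance
    current = None
    for item in items:
        if contains_name(item):
            current = item[0]
            if current not in rooms_by_name:
                rooms_by_name[current] = []
                name_order.append(current)
        elif current is not None:
            # Room lists seen before any name are ignored.
            rooms_by_name[current].extend(item)
    return [[name] + rooms_by_name[name] for name in name_order]


# Hypothetical stand-in for the asker's checker:
# a one-element list holding an alphabetic string counts as a name.
def looks_like_name(item):
    return len(item) == 1 and item[0].isalpha()


print(process_merged([['bob'], ['123'], ['susan'], ['bob'], ['456']], looks_like_name))
# [['bob', '123', '456'], ['susan']]
```

The separate name_order list keeps the result independent of how the dict orders its keys.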
Really, you should be using a dict for this. This assumes that the order of lists doesn't change (the name is always first).
As others suggested, you should re-evaluate your data structure.
>>> from itertools import chain
>>> li_combo = list(chain.from_iterable(lst))
>>> d = {}
>>> for i in li_combo:
...     if is_name(i):
...         k = i
...         if k not in d:
...             d[k] = []
...     else:
...         d[k].append(i)
...
>>> final_list = [[k]+d[k] for k in d]
>>> final_list
[['Bob', '125A', '154B', '643A', '142C', '192B', '653G'], ['Carol', '95H', '123C'], ['David', '120G']]
reduce is your answer. Your data is this:
l=[['Bob'], ['125A', '154B', '643A'], ['142C', '192B'], ['653G'], ['Carol'], ['95H', '123C'], ['David'], ['120G']]
You say you've already got a function that determines if an element is a name. Here is my one:
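import re
from functools import reduce  # needed on Python 3, where reduce is not a builtin
def is_name(s):
    # [A-Za-z] rather than [A-z], which would also match characters such as [ \ ] ^ _
    return bool(re.match("[A-Za-z]+$", s))
Then, using reduce, it is a one liner:
reduce(lambda c, n: is_name(n[0]) and c+[n] or c[:-1]+[c[-1]+n], l, [])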
Result is:
[['Bob', '125A', '154B', '643A', '142C', '192B', '653G'], ['Carol', '95H', '123C'], ['David', '120G']]
Can you help me with my algorithm in Python to parse a list, please?
List = ['PPPP_YYYY_ZZZZ_XXXX', 'PPPP_TOTO_TATA_TITI_TUTU', 'PPPP_TOTO_MMMM_TITI_TUTU', 'PPPP_TOTO_EHEH_TITI_TUTU', 'PPPP_TOTO_EHEH_OOOO_AAAAA', 'PPPP_TOTO_EHEH_IIII_SSSS_RRRR']
In this list, I have to get the last two words (PARENT_CHILD) of each item. For example, for PPPP_TOTO_TATA_TITI_TUTU, I only get TITI_TUTU.
If there are duplicates, that is, if my list contains both PPPP_TOTO_TATA_TITI_TUTU and PPPP_TOTO_EHEH_TITI_TUTU, I would get TITI_TUTU twice. In that case I want to recover the GRANDPARENT for each of them, giving TATA_TITI_TUTU and EHEH_TITI_TUTU.
As long as the names are duplicated, we take the level above.
But in this case, once I have added the GRANDPARENT for EHEH_TITI_TUTU, I also want it added for every item that has EHEH in its name, so instead of OOOO_AAAAA I would like to have EHEH_OOOO_AAAAA and EHEH_IIII_SSSS_RRRR.
My final list =
['ZZZZ_XXXX', 'TATA_TITI_TUTU', 'MMMM_TITI_TUTU', 'EHEH_TITI_TUTU', 'EHEH_OOOO_AAAAA', 'EHEH_IIII_SSSS_RRRR']
Thank you in advance.
Here is the code I started to write:
json_paths = ['PPPP_YYYY_ZZZZ_XXXX', 'PPPP_TOTO_TATA_TITI_TUTU',
              'PPPP_TOTO_EHEH_TITI_TUTU', 'PPPP_TOTO_MMMM_TITI_TUTU', 'PPPP_TOTO_EHEH_OOOO_AAAAA']
cols_name = []
for path in json_paths:
    acc=2
    col_name = '_'.join(path.split('_')[-acc:])
    tmp = cols_name
    while col_name in tmp:
        acc += 1
        idx = tmp.index(col_name)
        cols_name[idx] = '_'.join(json_paths[idx].split('_')[-acc:])
        col_name = '_'.join(path.split('_')[-acc:])
        tmp = ['_'.join(item.split('_')[-acc:]) for item in json_paths].pop()
    cols_name.append(col_name)
    print(cols_name.index(col_name), col_name)
cols_name
help ... with ... algorithm
use a dictionary for the initial container while iterating
keys will be PARENT_CHILD's and values will be lists containing grandparents.
>>> import collections
>>> s = 'PPPP_TOTO_TATA_TITI_TUTU'
>>> d = collections.defaultdict(list)
>>> *_,grandparent,parent,child = s.rsplit('_',maxsplit=3)
>>> d['_'.join([parent,child])].append(grandparent)
>>> d
defaultdict(<class 'list'>, {'TITI_TUTU': ['TATA']})
>>> s = 'PPPP_TOTO_EHEH_TITI_TUTU'
>>> *_,grandparent,parent,child = s.rsplit('_',maxsplit=3)
>>> d['_'.join([parent,child])].append(grandparent)
>>> d
defaultdict(<class 'list'>, {'TITI_TUTU': ['TATA', 'EHEH']})
after iteration determine if there are multiple grandparents in a value
if there are, join/append the parent_child to each grandparent
additionally, find all the parent_child's that have one of these grandparents and prepend their grandparents too. To facilitate this, build a second dictionary during iteration - {grandparent:[list_of_children],...}.
if the parent_child only has one grandparent use as-is
Instead of splitting each string the info could be extracted with a regular expression.
pattern = r'^.*?_([^_]*)_([^_]*_[^_]*)$'
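As a rough sketch of the first step only (grouping grandparents under each PARENT_CHILD key), the regular expression above could be used like this. The function name group_grandparents is invented for illustration, and the sketch deliberately stops before the "prepend the grandparent" steps.

```python
import re
from collections import defaultdict

# Group 1 is the grandparent, group 2 is parent_child.
PATTERN = re.compile(r'^.*?_([^_]*)_([^_]*_[^_]*)$')


def group_grandparents(paths):
    """Map each parent_child suffix to the list of grandparents seen with it."""
    groups = defaultdict(list)
    for path in paths:
        match = PATTERN.match(path)
        if match is None:
            # Fewer than four underscore-separated words: nothing to extract.
            continue
        grandparent, parent_child = match.groups()
        groups[parent_child].append(grandparent)
    return dict(groups)


paths = ['PPPP_YYYY_ZZZZ_XXXX', 'PPPP_TOTO_TATA_TITI_TUTU', 'PPPP_TOTO_EHEH_TITI_TUTU']
print(group_grandparents(paths))
# {'ZZZZ_XXXX': ['YYYY'], 'TITI_TUTU': ['TATA', 'EHEH']}
```

A key whose list holds more than one grandparent marks a duplicated PARENT_CHILD.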
I have a list of file paths which I need to order in a specific way prior to reading and processing the files. The specific way is defined by a smaller list which contains only some file names, but not all of them. All other file paths which are not listed in presorted_list need to stay in the order they had previously.
Examples:
some_list = ['path/to/bar_foo.csv',
'path/to/foo_baz.csv',
'path/to/foo_bar(ignore_this).csv',
'path/to/foo(ignore_this).csv',
'other/path/to/foo_baz.csv']
presorted_list = ['foo_baz', 'foo']
expected_list = ['path/to/foo_baz.csv',
'other/path/to/foo_baz.csv',
'path/to/foo(ignore_this).csv',
'path/to/bar_foo.csv',
'path/to/foo_bar(ignore_this).csv']
I've found some related posts:
Sorting list based on values from another list?
How to sort a list according to another list?
But as far as I can tell the questions and answers always rely on two lists of the same length which I don't have (which results in errors like ValueError: 'bar_foo' is not in list) or a presorted list which needs to contain all possible values which I can't provide.
My Idea:
I've come up with a solution which seems to work but I'm unsure if this is a good way to approach the problem:
import os
import re
EXPECTED_LIST = ['path/to/foo_baz.csv',
                 'other/path/to/foo_baz.csv',
                 'path/to/foo(ignore_this).csv',
                 'path/to/bar_foo.csv',
                 'path/to/foo_bar(ignore_this).csv']
PRESORTED_LIST = ["foo_baz", "foo"]
def sort_function(item, len_list):
    # strip path and unwanted parts
    filename = re.sub(r"[\(\[].*?[\)\]]", "", os.path.basename(item)).split('.')[0]
    if filename in PRESORTED_LIST:
        return PRESORTED_LIST.index(filename)
    return len_list
def main():
    some_list = ['path/to/bar_foo.csv',
                 'path/to/foo_baz.csv',
                 'path/to/foo_bar(ignore_this).csv',
                 'path/to/foo(ignore_this).csv',
                 'other/path/to/foo_baz.csv',]
    list_length = len(some_list)
    sorted_list = sorted(some_list, key=lambda x: sort_function(x, list_length))
    assert sorted_list == EXPECTED_LIST
if __name__ == "__main__":
    main()
Are there other (shorter, more pythonic) ways of solving this problem?
Here is how I think I would do it:
import re
from collections import OrderedDict
from itertools import chain
some_list = ['path/to/bar_foo.csv',
'path/to/foo_baz.csv',
'path/to/foo_bar(ignore_this).csv',
'path/to/foo(ignore_this).csv',
'other/path/to/foo_baz.csv']
presorted_list = ['foo_baz', 'foo']
expected_list = ['path/to/foo_baz.csv',
'other/path/to/foo_baz.csv',
'path/to/foo(ignore_this).csv',
'path/to/bar_foo.csv',
'path/to/foo_bar(ignore_this).csv']
def my_sort(lst, presorted_list):
    rgx = re.compile(r"^(.*/)?([^/(.]*)(\(.*\))?(\.[^.]*)?$")
    d = OrderedDict((n, []) for n in presorted_list)
    d[None] = []
    # iterate over the argument, not the global some_list
    for p in lst:
        m = rgx.match(p)
        n = m.group(2) if m else None
        if n not in d:
            n = None
        d[n].append(p)
    return list(chain.from_iterable(d.values()))
print(my_sort(some_list, presorted_list) == expected_list)
# True
An easy implementation is to add sentinel prefixes to the entries before sorting, so no custom sort key is needed. The regex can also be avoided if all filenames follow the pattern you gave. Note that unmatched entries get no prefix, so they are ordered by plain string comparison; in this example that places them after the prefixed entries and coincides with their original order.
for n,file1 in enumerate(presorted_list):
    for m,file2 in enumerate(some_list):
        if '/'+file1+'.' in file2 or '/'+file1+'(' in file2:
            some_list[m] = "%03d%03d:%s" % (n, m, file2)
some_list.sort()
some_list = [file.split(':',1)[-1] for file in some_list]
print(some_list)
Result:
['path/to/foo_baz.csv',
'other/path/to/foo_baz.csv',
'path/to/foo(ignore_this).csv',
'path/to/bar_foo.csv',
'path/to/foo_bar(ignore_this).csv']
Let me think. It is a unique problem; I'll try to suggest a solution.
only_sorted_elements = list(filter(lambda x:x.rpartition("/")[-1].partition(".")[0] in presorted_list , some_list))
only_sorted_elements.sort(key = lambda x:presorted_list.index(x.rpartition("/")[-1].partition(".")[0]))
expected_list = []
count = 0
for each_element in some_list:
    if each_element not in only_sorted_elements:
        expected_list.append(each_element)
    else:
        expected_list.append(only_sorted_elements[count])
        count += 1
Hope this solves your problem.
I first filter for only those elements which are there in presorted_list,
then I sort those elements according to its order in presorted_list
Then I iterate over the list and append accordingly.
Edit:
Changed the lookup key from the full path to the bare file name.
This retains the original positions of the files that are not in the presorted list.
Edit 2:
The revised code below puts the files named in the presorted list first, in presorted order, followed by the remaining files in their original order.
some_list = ['path/to/bar_foo.csv',
'path/to/foo_baz.csv',
'path/to/foo_bar(ignore_this).csv',
'path/to/foo(ignore_this).csv',
'other/path/to/foo_baz.csv']
presorted_list = ['foo_baz', 'foo']
# list(...) is needed on Python 3, where filter returns an iterator without .sort()
only_sorted_elements = list(filter(lambda x:x.rpartition("/")[-1].partition("(")[0].partition(".")[0] in presorted_list , some_list))
unsorted_all = list(filter(lambda x:x.rpartition("/")[-1].partition("(")[0].partition(".")[0] not in presorted_list , some_list))
only_sorted_elements.sort(key = lambda x:presorted_list.index(x.rpartition("/")[-1].partition("(")[0].partition(".")[0]))
expected_list = only_sorted_elements + unsorted_all
print(expected_list)
Result :
['path/to/foo_baz.csv',
'other/path/to/foo_baz.csv',
'path/to/foo(ignore_this).csv',
'path/to/bar_foo.csv',
'path/to/foo_bar(ignore_this).csv']
Since python's sort is already stable, you only need to provide it with a coarse grouping for the sort key.
Given the specifics of your sorting requirements this is better done using a function. For example:
def presort(presorted):
    def sortkey(name):
        filename = name.split("/")[-1].split(".")[0].split("(")[0]
        if filename in presorted:
            return presorted.index(filename)
        return len(presorted)
    return sortkey
sorted_list = sorted(some_list,key=presort(['foo_baz', 'foo']))
In order to keep the process generic and simple to use, the presorted_list should be provided as a parameter and the sort key function should use it to produce the grouping keys. This is achieved by returning a function (sortkey) that captures the presorted list parameter.
This sortkey() function returns the index of the file name in the presorted_list or a number beyond that for unmatched file names. So, if you have 2 names in the presorted_list, they will group the corresponding files under sort key values 0 and 1. All other files will be in group 2.
The conditions that you use to determine which part of the file name should be found in presorted_list are somewhat unclear so I only covered the specific case of the opening parenthesis. Within the sortkey() function, you can add more sophisticated parsing to meet your needs.
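To make the grouping visible, here is a small variant of the same idea, offered as a sketch rather than a replacement. It swaps the list.index lookup for a dict (an assumption on my part that the presorted names are unique) and prints the coarse keys before sorting.

```python
def presort(presorted):
    # Map each presorted name to its rank once; assumes the names are unique.
    rank = {name: position for position, name in enumerate(presorted)}

    def sortkey(path):
        # Same parsing as above: drop the directory, the extension and any "(...)" part.
        filename = path.split("/")[-1].split(".")[0].split("(")[0]
        # Unlisted names all share one key beyond the last rank.
        return rank.get(filename, len(presorted))

    return sortkey


some_list = ['path/to/bar_foo.csv',
             'path/to/foo_baz.csv',
             'path/to/foo_bar(ignore_this).csv',
             'path/to/foo(ignore_this).csv',
             'other/path/to/foo_baz.csv']

key = presort(['foo_baz', 'foo'])
print([key(p) for p in some_list])
# [2, 0, 2, 1, 0]
print(sorted(some_list, key=key))
```

Because the sort is stable, entries that share a key keep their original relative order, which is what leaves the unlisted files untouched.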
Python noob here. So I'm making a program that will take a JSON file from a URL, parse the information, and put it into a database. I have the JSON working, thankfully, but now I am stuck. I'll explain it through my code.
playerDict = {
"zenyatta" : 0,
"mei" : 0,
"tracer" : 0,
"soldier76" : 0,
"ana" : 0,
...}
So this is my original dictionary, which I then fill with the player's data for each hero.
topHeroes = sorted(playerDict.items(),key = operator.itemgetter(1),reverse = True)
I then sort it so that the heroes with the most hours played come first.
topHeroesDict = topHeroes[0:3]
playerDict['tophero'] = topHeroesDict[0]
I then get the top three heroes. The second line here prints out a list like so:
'secondhero': ('mercy', 6.0)
Whereas I want the output to be:
'secondhero': 'mercy'
I would appreciate any help. I have tried the code below, with and without list.
list(topHeroes.keys())[0]
So thanks in advance and apologies for the amount of code!
You could take an approach with enumerate, if instead of "firsthero" you are ok with "Top 1" and so on. With enumerate you can iterate over the list and keep track of the current index, which is used to name the key in this dictionary comprehension. j[0] is the name of the hero, which is the first element of the tuple.
topHeroes = sorted(playerDict.items(),key = operator.itemgetter(1),reverse = True)
topHeroesDict = {"Top "+str(i): j[0] for i, j in enumerate(topHeroes[0:3])}
Alternatively, you could use a dictionary which maps the index to first like this:
topHeroes = sorted(playerDict.items(),key = operator.itemgetter(1),reverse = True)
top = {0: "first", 1: "second", 2: "third"}
topHeroesDict = {top[i]+"hero": j[0] for i, j in enumerate(topHeroes[0:3])}
You do not need any imports to achieve this. Without itemgetter, you can do it in one line like this:
top = {0: "first", 1: "second", 2: "third"}
topHeroesDict = {top[i]+"hero": j[0] for i, j in enumerate(sorted([(i, playerDict[i]) for i in playerDict.keys()], key = lambda x: x[1], reverse = True)[0:3])}
You're sorting an iterable of tuples returned by the items method of the dict, so each item in the sorted list is a tuple containing the hero and their score.
You can avoid using sorted and dict.items altogether and get the leading heroes (without their score) by simply using collections.Counter and then getting the most_common 3 heroes.
from collections import Counter
player_dict = Counter(playerDict)
leading_heroes = [hero for hero, _ in player_dict.most_common(3)]
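Putting the Counter approach together with the key names from the question could look like this sketch. The hours below are invented sample values, and the ranks list is just one way to produce the firsthero/secondhero/thirdhero keys.

```python
from collections import Counter

# Hypothetical hours played per hero; the real values come from the parsed JSON.
player_dict = {"zenyatta": 2.5, "mei": 0.0, "tracer": 4.0, "soldier76": 1.0, "mercy": 6.0}

ranks = ["first", "second", "third"]
# most_common(3) returns (hero, hours) pairs ordered by descending hours.
top_three = Counter(player_dict).most_common(3)
# Keep only the hero name for each rank.
top_heroes = {rank + "hero": hero for rank, (hero, _) in zip(ranks, top_three)}
print(top_heroes)
# {'firsthero': 'mercy', 'secondhero': 'tracer', 'thirdhero': 'zenyatta'}
```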
I have a list of strings that looks like this:
name=['Jack','Sam','Terry','Sam','Henry',.......]
I want to create a newlist with the logic shown below. I want to go to every entry in name and assign it a number if the entry is seen for the first time. If it is being repeated(as in the case with 'Sam') I want to assign it the corresponding number, include it in my newlist and continue.
newlist = []
name[1] = 'Jack'
Jack = 1
newlist = ['Jack']
name[2] = 'Sam'
Sam = 2
newlist = ['Jack','Sam']
name[3] = 'Terry'
Terry = 3
newlist = ['Jack','Sam','Terry']
name[4] = 'Sam'
Sam = 2
newlist = ['Jack','Sam','Terry','Sam']
name[5] = 'Henry'
Henry = 5
newlist = ['Jack','Sam','Terry','Sam','Henry']
I know this can be done with something like
u,index = np.unique(name,return_inverse=True)
but for me it is important to loop through the individual entries of the list name and keep the logic above. Can someone help me with this?
Try using a dict and checking if keys are already paired to a value:
name = ['Jack','Sam','Terry','Sam','Henry']
vals = {}
i = 0
for entry in name:
    if entry not in vals:
        vals[entry] = i + 1
    i += 1
print(vals)
Result:
{'Henry': 5, 'Jack': 1, 'Sam': 2, 'Terry': 3}
Elements can be accessed by "index" (read: key) just like you would do for a list, except the "index" is whatever the key is; in this case, the keys are names.
>>> vals['Henry']
5
EDIT: If order is important, you can enter the items into the dict using the number as the key: in this way, you will know which owner is which based on their number:
name = ['Jack','Sam','Terry','Sam','Henry']
vals = {}
i = 0
for entry in name:
    #Check if entry is a repeat
    if entry not in name[0:i]:
        vals[i + 1] = entry
    i += 1
print (vals)
print (vals[5])
This code uses the order in which they appear as the key. To make sure we don't overwrite or create duplicates, it checks if the current name has appeared before in the list (anywhere from 0 up to i, the current index in the name list).
In this way, it is still in the "sorted order" which you want. Instead of accessing items by the name of the owner you simply index by their number. This will give you the order you desire from your example.
Result:
>>> vals
{1: 'Jack', 2: 'Sam', 3: 'Terry', 5: 'Henry'}
>>> vals[5]
'Henry'
If you really want to create variables, you can use globals(), which creates global variables. locals() behaves the same way only at module level; inside a function, changes to the dictionary it returns are not guaranteed to affect local variables.
globals() returns the dictionary that serves as the lookup table mapping variable names to their values, so adding a key and value to it creates a variable.
nl = ['Jack','Sam','Terry','Sam','Henry']
var = globals()
for i,n in enumerate(nl,1):
    if n not in var:
        var[n] = i
print({n: var[n] for n in nl})
{'Jack': 1, 'Sam': 2, 'Terry': 3, 'Henry': 5}
print(Jack)
1
If the order of the original list is key, may I suggest two data structures, a dictionary and a new list:
d = {}
newlist = []
for i,n in enumerate(nl):
    if n not in d:
        d[n] = [i+1]
    newlist.append({n: d[n]})
newlist will contain
[{'Jack': [1]}, {'Sam': [2]}, {'Terry': [3]}, {'Sam': [2]}, {'Henry': [5]}]
To walk it:
for names in newlist:
    for k, v in names.items():
        print('{} is number {}'.format(k, v))
NOTE: This does not make it easy to look up the number based on the name, as others suggested above; that would require more data structure logic. It does, however, let you keep the original list order while tracking where each name was first found.
Edit: Since order is important to you, use OrderedDict from the collections module.
Use a dictionary. Iterate over your list with a for loop and check with an if statement whether the name is already in the dictionary. enumerate gives you the index of each name, but keep in mind that indexes start from 0, so to match your question we add 1 to the index, giving the illusion that indexing begins at 1.
import collections
nl = ['Jack','Sam','Terry','Sam','Henry']
d = collections.OrderedDict()
for i,n in enumerate(nl):
    if n not in d:
        d[n] = [i+1]
print(d)
Output:
OrderedDict([('Jack', [1]), ('Sam', [2]), ('Terry', [3]), ('Henry', [5])])
EDIT:
The ordered dict is still a dictionary, so you can use .items() to get the key-value pairs as tuples. The number is stored in a one-element list, so you can do this:
for i in d.items():
    print('{} = {}'.format(i[0],i[1][0])) #where `i[1]` gives you the number at that position in the tuple, then the `[0]` gives you the first element of the list.
Output:
Jack = 1
Sam = 2
Terry = 3
Henry = 5
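If the goal is one number per entry of the original list (with repeats reusing the number from the first sighting), a plain loop along these lines may be enough. The function name first_seen_numbers is made up for this sketch.

```python
def first_seen_numbers(names):
    """Return (first_seen, numbers) for a list of names, using 1-based positions."""
    first_seen = {}   # name -> position of its first occurrence
    numbers = []      # one number per entry, in the original order
    for position, entry in enumerate(names, start=1):
        if entry not in first_seen:
            first_seen[entry] = position
        # A repeated name reuses the number it was given the first time.
        numbers.append(first_seen[entry])
    return first_seen, numbers


first_seen, numbers = first_seen_numbers(['Jack', 'Sam', 'Terry', 'Sam', 'Henry'])
print(numbers)
# [1, 2, 3, 2, 5]
print(first_seen['Sam'])
# 2
```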
Problem:
Trying to evaluate first 4 characters of each item in list.
If the first 4 chars match another first 4 chars in the list, then append the last three digits to the first four. See example below.
Notes:
The list values are not hard coded.
The list always has this structure "####.###".
Only need to match first 4 chars in each item of list.
Order is not essential.
Code:
Grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102", "094G.032"]
Desired Output:
Grid = ["094G.016\019\032", "194P.005\015", "093T.021\102"]
Research:
I know that sets can find duplicates. Could I use a set to evaluate only the first 4 chars, or would I run into a problem since sets cannot be indexed?
Would it be better to split the list items into 2 parts, the four characters before the period ("094G") and a separate list of the three digits after the period ("016"), compare them, then join them in a new list?
Is there a better way of doing this altogether that I'm not realizing?
Here is one straightforward way to do it.
from collections import defaultdict
grid = ['094G.016', '094G.019', '194P.005', '194P.015', '093T.021', '093T.102', '094G.032']
d = defaultdict(list)
for item in grid:
    k,v = item.split('.')
    d[k].append(v)
result = ['%s.%s' % (k, '/'.join(v)) for k, v in d.items()]
Gives unordered result:
['093T.021/102', '194P.005/015', '094G.016/019/032']
What you'll most likely want is a dictionary mapping the first part of each code to a list of second parts. You can build the dictionary like so:
mappings = {} #Empty dictionary
for code in Grid: #Loop over each code
    first, second = code.split('.') #Separate the code into first.second
    if first in mappings: #if the first was already found
        mappings[first].append(second) #add the second to those already computed
    else:
        mappings[first] = [second] #otherwise, put it in a new list
Once you have the dictionary, it will be quite simple to loop over it and combine the second parts together (ideally, using '\\'.join)
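The final joining step might look like the sketch below. merge_codes and its sep parameter are illustrative names, not something from the question; the default separator is a single backslash to match the desired output.

```python
def merge_codes(codes, sep="\\"):
    """Group 'first.second' codes by their first part and join the second parts."""
    mappings = {}
    for code in codes:
        first, second = code.split(".")
        # setdefault creates the list the first time a prefix is seen.
        mappings.setdefault(first, []).append(second)
    return [first + "." + sep.join(seconds) for first, seconds in mappings.items()]


grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102", "094G.032"]
for item in merge_codes(grid):
    print(item)
# 094G.016\019\032
# 194P.005\015
# 093T.021\102
```

The printed order relies on dicts preserving insertion order (Python 3.7+); the question says order is not essential, so older versions are fine too.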
Sounds like a job for defaultdict.
from collections import defaultdict
grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102"]
d = defaultdict(set)
for item in grid:
    prefix, suffix = item.split(".")
    d[prefix].add(suffix)
# sets are unordered, so sort the suffixes to get a repeatable result
output = [ "%s.%s" % (prefix, "/".join(sorted(d[prefix])), ) for prefix in d ]
>>> from itertools import groupby
>>> Grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102", "094G.032"]
>>> Grid = sorted(Grid, key=lambda x:x.split(".")[0])
>>> gen = ((k, g) for k, g in groupby(Grid, key=lambda x:x.split(".")[0]))
>>> gen = ((k,[x.split(".") for x in g]) for k, g in gen)
>>> gen = list((k + '.' + '/'.join(x[1] for x in g) for k, g in gen))
>>> for x in gen:
...     print(x)
...
093T.021/102
094G.016/019/032
194P.005/015