Split list based on first character - Python


I am new to Python and can't quite figure out a solution to my problem. I would like to split a list into two lists, based on what each list item starts with. My list looks like this, where each line represents an item (this is not correct list notation, but I'll leave it like this for a better overview):
***
**
.param
+foo = bar
+foofoo = barbar
+foofoofoo = barbarbar
.model
+spam = eggs
+spamspam = eggseggs
+spamspamspam = eggseggseggs
So I want a list that contains all lines starting with a '+' between .param and .model and another list that contains all lines starting with a '+' after model until the end.
I have looked at enumerate() and split(), but since I have a list and not a string and am not trying to match whole items in the list, I'm not sure how to implement them.
What I have is this:
paramList = []
for line in newContent:
    while line.startswith('+'):
        paramList.append(line)
        if line.startswith('.'):
            break
This is just my attempt at creating the first list. The problem is that the code reads the second block of '+'s as well, because break just exits the while loop, not the for loop.
I hope you can understand my question and thanks in advance for any pointers!

What you want is a simple task that can be accomplished using list slices and a list comprehension:
data = ['**','***','.param','+foo = bar','+foofoo = barbar','+foofoofoo = barbarbar',
        '.model','+spam = eggs','+spamspam = eggseggs','+spamspamspam = eggseggseggs']
# First get the interesting positions.
param_tag_pos = data.index('.param')
model_tag_pos = data.index('.model')
# Get all elements between tags.
params = [param for param in data[param_tag_pos + 1: model_tag_pos] if param.startswith('+')]
# Slice to the end of the list; a stop index of -1 would drop the last item.
models = [model for model in data[model_tag_pos + 1:] if model.startswith('+')]
print(params)
print(models)
Output
>>> ['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar']
>>> ['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']
Answer to comment:
Suppose you have a list containing numbers from 0 up to 5.
l = [0, 1, 2, 3, 4, 5]
Then using list slices you can select a subset of l:
another = l[2:5] # another is [2, 3, 4]
That is what we are doing here:
data[param_tag_pos + 1: model_tag_pos]
And for your last question: ...how does python know param are the lines in data it should iterate over and what exactly does the first param in "param for param" do?
Python doesn't know; you have to tell it.
The first param is just a variable name I'm using here; it could be x, list_items, whatever you want.
Here is the line of code translated to plain English:
# Pythonian
params = [param for param in data[param_tag_pos + 1: model_tag_pos] if param.startswith('+')]
# English
params is a list of "things", one for each "thing" we can see in the list `data`
from position `param_tag_pos + 1` up to (but not including) position `model_tag_pos`, but only if that "thing" starts with the character '+'.
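If the comprehension is still hard to read, it may help to see it unrolled into an ordinary loop. This is only a sketch with a shortened copy of the data (the shortened list is my own illustration); the loop builds the same list that the comprehension would build.

```python
data = ['**', '***', '.param', '+foo = bar', '+foofoo = barbar',
        '.model', '+spam = eggs']

param_tag_pos = data.index('.param')
model_tag_pos = data.index('.model')

# Same logic as the list comprehension, spelled out step by step.
params = []
for param in data[param_tag_pos + 1:model_tag_pos]:
    if param.startswith('+'):
        params.append(param)

print(params)
```

The name after `for` is the loop variable, the name before `for` is what gets stored, and the `if` at the end is the filter.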

data = {}
for line in newContent:
    if line.startswith('.'):
        cur_dict = {}
        data[line[1:]] = cur_dict
    elif line.startswith('+'):
        key, value = line[1:].split(' = ', 1)
        cur_dict[key] = value
This creates a dict of dicts:
{'model': {'spam': 'eggs',
'spamspam': 'eggseggs',
'spamspamspam': 'eggseggseggs'},
'param': {'foo': 'bar',
'foofoo': 'barbar',
'foofoofoo': 'barbarbar'}}

I am new to Python
Whoops. Don't bother with my answer then.
I want a list that contains all lines starting with a '+' between
.param and .model and another list that contains all lines starting
with a '+' after model until the end.
import itertools as it
import pprint
data = [
'***',
'**',
'.param',
'+foo = bar',
'+foofoo = barbar',
'+foofoofoo = barbarbar',
'.model',
'+spam = eggs',
'+spamspam = eggseggs',
'+spamspamspam = eggseggseggs',
]
results = [
    list(group) for key, group in it.groupby(data, lambda s: s.startswith('+'))
    if key
]
pprint.pprint(results)
print('-' * 20)
print(results[0])
print('-' * 20)
pprint.pprint(results[1])
--output:--
[['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar'],
['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']]
--------------------
['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar']
--------------------
['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']
This thing here:
it.groupby(data, lambda x: x.startswith('+'))
...tells python to create groups from the strings according to their first character. If the first character is a '+', then the string gets put into a True group. If the first character is not a '+', then the string gets put into a False group. However, there are more than two groups because consecutive False strings will form a group, and consecutive True strings will form a group.
Based on your data, the first three strings:
***
**
.param
will create one False group. Then, the next strings:
+foo = bar
+foofoo = barbar
+foofoofoo = barbarbar
will create one True group. Then the next string:
'.model'
will create another False group. Then the next strings:
'+spam = eggs'
'+spamspam = eggseggs'
'+spamspamspam = eggseggseggs'
will create another True group. The result will be something like:
{
False: [strs here],
True: [strs here],
False: [strs here],
True: [strs here]
}
Then it's just a matter of picking out each True group: if key, and then converting the corresponding group to a list: list(group).
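To see the alternating groups concretely, here is a small sketch that materializes every group instead of only the True ones. The shortened data list is my own illustration, not the asker's full input.

```python
import itertools

data = ['***', '**', '.param', '+foo = bar', '+foofoo = barbar',
        '.model', '+spam = eggs']

# Materialize each group so the (key, members) pairs can be inspected.
groups = [(key, list(group))
          for key, group in itertools.groupby(data, lambda s: s.startswith('+'))]

for key, members in groups:
    print(key, members)
```

Each run of consecutive items with the same key becomes its own group, which is why False and True each show up more than once.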
Response to comment:
where exactly does python go through data, like how does it know s is
the data it's iterating over?
groupby() works like do_stuff() below:
def do_stuff(items, func):
    for item in items:
        print(func(item))
# Create the arguments for do_stuff():
data = [1, 2, 3]
def my_func(x):
    return x + 100
# Call do_stuff() with the proper argument types:
do_stuff(data, my_func)  # Just like when calling groupby(), you provide some data
                         # and a function that you want applied to each item in data
--output:--
101
102
103
Which can also be written like this:
do_stuff(data, lambda x: x + 100)
lambda creates an anonymous function, which is convenient for simple functions which you don't need to refer to by name.
This list comprehension:
[
list(group)
for key, group in it.groupby(data, lambda s: s.startswith('+'))
if key
]
is equivalent to this:
results = []
for key, group in it.groupby(data, lambda s: s.startswith('+')):
    if key:
        results.append(list(group))
It's clearer to write the for loop explicitly; however, list comprehensions usually run somewhat faster. Here is some detail:
[
list(group) #The item you want to be in the results list for the current iteration of the loop here:
for key, group in it.groupby(data, lambda s: s.startswith('+')) #A for loop
if key #Only include the item for the current loop iteration in the results list if key is True
]

I would suggest doing things step by step.
1) Grab every word from the array separately.
2) Grab the first letter of the word.
3) Look if that is a '+' or '.'
Example code:
class Dark():
    def __init__(self):
        # Array
        x = ['+Hello', '.World', '+Hobbits', '+Dwarves', '.Orcs']
        xPlus = []
        xDot = []
        # Values
        i = 0
        # Look through every word in the array one by one.
        while (i != len(x)):
            # Grab every word (s), and convert to string (y).
            s = x[i:i+1]
            y = '\n'.join(s)
            # Print word
            print(y)
            # Grab the first letter.
            letter = y[:1]
            if (letter == '+'):
                xPlus.append(y)
            elif (letter == '.'):
                xDot.append(y)
            else:
                pass
            # Add +1
            i = i + 1
        # Print lists
        print(xPlus)
        print(xDot)
#Run class
Dark()

Related

working with Permutationed list (making pass_word list)

I want to create a password list. Let's assume I have created a permuted list, for example:
##
##
##
##
Then I want to add other characters to it (for example a, b). In this code a, b are called special chars and ## are the added chars,
so I finally want to get this list:
ab## , ab##,ab##,ab## , ba##, .... a##b,...,b##a , ... , ba##
Note: I don't want any special characters to be duplicated; for example, I
don't want aa## or bb## (a, b can't be duplicated because they are
special chars; # or # can be duplicated because they are added chars).
Code:
master_list=[]
l=[]
l= list(itertools.combinations_with_replacement('##',2)) # gives me this list: [(#,#),(#,#),(#,#),(#,#)]
for i in l:
    i = i+tuple(s) # add the special char (1 in this example) to the created list
    master_list.append(i)
print (master_list) # now I have this list: [(#,#,1),(#,#,1),....(#,#,1)
Now, if I could get all permutations of master_list, my problem would be solved, but I can't do that.

I solved my problem. My idea: first I generate all possible permutations of the added chars (#,#) and save them to a list, then I create another list and save the special chars (a, b) to it. Now we have two lists; we just need to merge them and finally use the permute_unique function.
def permute_unique(nums):
    perms = [[]]
    for n in nums:
        new_perm = []
        for perm in perms:
            for i in range(len(perm) + 1):
                new_perm.append(perm[:i] + [n] + perm[i:])
                # handle duplication
                if i < len(perm) and perm[i] == n: break
        perms = new_perm
    return perms
l= list(itertools.combinations_with_replacement(algorithm,3))
for i in l:
    i = i+tuple(s) # merge
    master_list.append(i)
# apply permute_unique to each merged tuple
for merged in master_list:
    print(permute_unique(list(merged)))

You can just combine the combinations_with_replacement of the "added" chars with all the permutations of those combinations and the "special" characters:
>>> special = "ab"
>>> added = "##"
>>> [''.join(p)
for a in itertools.combinations_with_replacement(added, 2)
for p in itertools.permutations(a + tuple(special))]
['##ab',
'##ba',
'#a#b',
...
'a#b#',
'ab##',
...
'##ab',
'##ba',
...
'ba##',
'ba##']
If you want to prevent duplicates, pass the inner permutations through a set:
>>> [''.join(p)
for a in itertools.combinations_with_replacement(added, 2)
for p in set(itertools.permutations(a + tuple(special)))]
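Here is a minimal, self-contained sketch of the same idea as a script rather than a REPL session. The two added characters in the post appear to have been mangled into `##`, so the sketch assumes `#` and `@` purely for illustration, and `build_candidates` is a hypothetical helper name that does not come from the original code. Collecting everything into one set removes the duplicates that arise when an added character repeats.

```python
import itertools

def build_candidates(special, added, n_added):
    """Return every unique string that uses each special char exactly once
    plus n_added chars drawn (with repetition) from added."""
    results = set()
    for combo in itertools.combinations_with_replacement(added, n_added):
        # combo is a tuple of added chars; append the special chars and permute.
        for perm in itertools.permutations(combo + tuple(special)):
            results.add(''.join(perm))
    return sorted(results)

# '#' and '@' are assumed stand-ins for the two added characters.
candidates = build_candidates("ab", "#@", 2)
print(len(candidates))
print(candidates[:5])
```

Because each special character is appended exactly once before permuting, strings such as `aa#@` can never be produced.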

Search and replace multiple specific sequences of elements in Python list/array

I currently have 6 separate for loops which iterate over a list of numbers looking to match specific sequences of numbers within larger sequences, and replace them like this:
[...0,1,0...] => [...0,0,0...]
[...0,1,1,0...] => [...0,0,0,0...]
[...0,1,1,1,0...] => [...0,0,0,0,0...]
And their inverse:
[...1,0,1...] => [...1,1,1...]
[...1,0,0,1...] => [...1,1,1,1...]
[...1,0,0,0,1...] => [...1,1,1,1,1...]
My existing code is like this:
for i in range(len(output_array)-2):
    if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 0:
        output_array[i+1] = 0
for i in range(len(output_array)-3):
    if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 1 and output_array[i+3] == 0:
        output_array[i+1], output_array[i+2] = 0, 0
In total I'm iterating over the same output_array 6 times, using brute force checking. Is there a faster method?

# I would create a map between the string searched and the new one.
patterns = {}
patterns['010'] = '000'
patterns['0110'] = '0000'
patterns['01110'] = '00000'
# I would loop over the lists
lists = [[0,1,0,0,1,1,0,0,1,1,1,0]]
for lista in lists:
    # I would join the list elements as a string
    string_list = ''.join(map(str,lista))
    # we loop over the patterns
    for pattern,value in patterns.items():
        # if a pattern is detected, we replace it
        string_list = string_list.replace(pattern, value)
    # convert the characters back to integers
    lista = [int(c) for c in string_list]
    print(lista)
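The string-joining idea can also be expressed with regular expressions, which covers all three run lengths in one pattern per direction. This is a different technique from the `str.replace` loop above, offered only as a sketch: it assumes the list contains nothing but the integers 0 and 1, and `fill_short_runs` and `max_run` are names I made up for illustration.

```python
import re

def fill_short_runs(bits, max_run=3):
    """Flip runs of up to max_run identical bits that are enclosed
    on both sides by the opposite bit."""
    s = ''.join(map(str, bits))
    # Runs of 1s surrounded by 0s become 0s. The lookarounds check the
    # neighbours without consuming them, so adjacent runs are all found.
    s = re.sub(r'(?<=0)1{1,%d}(?=0)' % max_run,
               lambda m: '0' * len(m.group()), s)
    # Then runs of 0s surrounded by 1s become 1s.
    s = re.sub(r'(?<=1)0{1,%d}(?=1)' % max_run,
               lambda m: '1' * len(m.group()), s)
    return [int(c) for c in s]

print(fill_short_runs([0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]))
print(fill_short_runs([1, 0, 1, 1]))
```

Note that the two passes run one after the other, so the second pass sees the result of the first; whether that ordering matches the original six loops depends on the order in which those loops were run.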

While this question is related to other questions on the site, the OP asks specifically about quickly searching for multiple sequences at once. While the accepted answer works well, we may not want to loop through all the search sequences for every sub-iteration of the base sequence.
Below is an algorithm which checks for a sequence of i ints only if the sequence of (i-1) ints is present in the base sequence:
# This is the driver function which takes in a) the search sequences and
# replacements as a dictionary and b) the full sequence list in which to search
def findSeqswithinSeq(searchSequences,baseSequence):
    seqkeys = [[int(i) for i in elem.split(",")] for elem in searchSequences]
    maxlen = max([len(elem) for elem in seqkeys])
    decisiontree = getdecisiontree(seqkeys)
    i = 0
    while i < len(baseSequence):
        (increment,replacement) = get_increment_replacement(decisiontree,baseSequence[i:i+maxlen])
        if replacement != -1:
            baseSequence[i:i+len(replacement)] = searchSequences[",".join(map(str,replacement))]
        i +=increment
    return baseSequence
# the following function gives the dictionary of intermediate sequences allowed
def getdecisiontree(searchsequences):
    dtree = {}
    for elem in searchsequences:
        for i in range(len(elem)):
            if i+1 == len(elem):
                dtree[",".join(map(str,elem[:i+1]))] = True
            else:
                dtree[",".join(map(str,elem[:i+1]))] = False
    return dtree
# the following function does most of the work, giving us a) how many
# positions we can skip in the search and b) whether the search seq was found
def get_increment_replacement(decisiontree,sequence):
    if str(sequence[0]) not in decisiontree:
        return (1,-1)
    for i in range(1,len(sequence)):
        key = ",".join(map(str,sequence[:i+1]))
        if key not in decisiontree:
            return (1,-1)
        elif decisiontree[key] == True:
            key = [int(i) for i in key.split(",")]
            return (len(key),key)
    return 1, -1
You can test the above code with this snippet:
if __name__ == "__main__":
    inputlist = [5,4,0,1,1,1,0,2,0,1,0,99,15,1,0,1]
    patternsandrepls = {'0,1,0':[0,0,0],
                        '0,1,1,0':[0,0,0,0],
                        '0,1,1,1,0':[0,0,0,0,0],
                        '1,0,1':[1,1,1],
                        '1,0,0,1':[1,1,1,1],
                        '1,0,0,0,1':[1,1,1,1,1]}
    print(findSeqswithinSeq(patternsandrepls,inputlist))
The proposed solution represents the sequences to be searched as a decision tree.
Because it skips many of the search points, this method should be able to do better than O(m*n), where m is the number of search sequences and n is the length of the base sequence.
EDIT: Changed answer based on more clarity in edited question.

Python: Replace elements of one list with those of another if condition is met

I have two lists. One is called src, with each element in this format:
['SOURCE: filename.dc : 1 : a/path/: description','...]
And one is called base, with each element in this format:
['BASE: 1: another/path','...]
I am trying to compare the base element's number (in this case it's 1) with the source element's number (in this case it's also 1).
If they match, then I want to replace the source element's number with the base element's path.
Right now I can extract the source element's number with a for loop like this:
for w in source_list:
    src_no = w.split(':')[2].strip()
And I can extract the base element's path and number with a for loop like this:
for r in basepaths:
    base_no = r.split(':')[1].strip()
    base_path = r.split(':')[2].strip()
I expect the new list to look like this (based on the example of the two elements above):
['SOURCE: filename.dc : another/path : a/path/: description','...]
The src list is a large list with many elements; the base list is usually three or four elements long and is only used to translate into the new list.

I hacked something together for you, which I think should do what you want:
base_list = ['BASE: 1: another/path']
base_dict = dict()
# First map the base numbers to the paths
for entry in base_list:
    n, p = map(lambda s: s.strip(), entry.split(':')[1:])
    base_dict[n] = p
source_list = ['SOURCE: filename.dc : 1 : a/path/: description']
# Loop over all source entries and replace the number with the base path if the numbers match
for i, entry in enumerate(source_list):
    n = entry.split(':')[2].strip()
    if n in base_dict:
        new_entry = entry.split(':')
        # pad with spaces so the ' : ' separators of the original entry are kept
        new_entry[2] = ' ' + base_dict[n] + ' '
        source_list[i] = ':'.join(new_entry)
Be aware that this is a hacky solution; I think you should use regular expressions (look into the re module) to extract the number and paths and to replace the number.
This code also replaces list items while iterating over the list. That is safe here because the list's length never changes, but building a new list may be the more Pythonic thing to do.
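For what it's worth, here is a rough sketch of what the regular-expression variant could look like. The two patterns are my own guesses based only on the two example entries in the question, and `translate`, `BASE_RE` and `SOURCE_RE` are hypothetical names; real data with a different layout would need adjusted patterns. The sketch builds a new list instead of modifying the original.

```python
import re

# Assumed layouts, taken from the question's examples:
#   'BASE: <number>: <path>'
#   'SOURCE: <filename> : <number> : <rest>'
BASE_RE = re.compile(r'^BASE:\s*(\d+)\s*:\s*(.+?)\s*$')
SOURCE_RE = re.compile(r'^(SOURCE:[^:]*:\s*)(\d+)(\s*:.*)$')

def translate(source_list, base_list):
    # Map each base number (as a string) to its path.
    mapping = {}
    for entry in base_list:
        m = BASE_RE.match(entry)
        if m:
            mapping[m.group(1)] = m.group(2)
    # Rebuild every source entry, swapping the number for the path on a match.
    result = []
    for entry in source_list:
        m = SOURCE_RE.match(entry)
        if m and m.group(2) in mapping:
            entry = m.group(1) + mapping[m.group(2)] + m.group(3)
        result.append(entry)
    return result

print(translate(['SOURCE: filename.dc : 1 : a/path/: description'],
                ['BASE: 1: another/path']))
```

Entries that do not match either pattern, or whose number has no base entry, are passed through unchanged.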

Something like this:
for i in range(len(source_list)):
    for b in basepaths:
        if source_list[i].split(":")[2].strip() == b.split(":")[1].strip():
            # keep fields 0-1, swap field 2 (the number) for the base path, keep the rest
            source_list[i] = ":".join(source_list[i].split(":")[:2] + [b.split(":")[2]] + source_list[i].split(":")[3:])

Just get rid of the [] part of the splits:
src = [s.strip() for s in w.split(':')]
base = [s.strip() for s in r.split(':')]
>> src
>> ['SOURCE', 'filename.dc', '1', 'a/path/', 'description']
The base will similarly be a simple list.
Now just replace the proper element:
src[2] = base[2]
Then put the elements back together if necessary:
src = ' : '.join(src)

def separate(x, separator = ':'):
    return tuple(x.split(separator))
sourceList = map(separate, source)
baseDict = {}
for b in map(separate, base):
    baseDict[int(b[1])] = b[2]
def tryFind(number):
    # fall back to the original number when there is no matching base entry
    try:
        return baseDict[number]
    except KeyError:
        return number
result = [(s[0], s[1], tryFind(int(s[2])), s[3]) for s in sourceList]
This worked for me; it's not the best, but it's a start.

So there is one large list that will be browsed sequentially and a shorter one. I would turn the short one into a mapping, so that for each item of the first list you can find immediately whether there is a match:
base = {}
for r in basepaths:
    base_no = r.split(':')[1].strip()
    base_path = r.split(':')[2].strip()
    base[base_no] = base_path
for i, w in enumerate(source_list):
    src_no = w.split(':')[2].strip()
    if src_no in base:
        path = base[src_no]
        # stuff...

Python: Concatenate similiar objects in List

I have a list containing strings as ['Country-Points'].
For example:
lst = ['Albania-10', 'Albania-5', 'Andorra-0', 'Andorra-4', 'Andorra-8', ...other countries...]
I want to calculate the average for each country without creating a new list. So the output would be (in the case above):
lst = ['Albania-7.5', 'Andorra-4.0', ...other countries...]
I would really appreciate it if anyone could help me with this.
EDIT:
This is what I've got so far. "data" is actually a dictionary, where the keys are countries and the values are lists of the points other countries gave to this country (the one used as key). Again, I'm new to Python, so I don't really know all the built-in functions.
for key in self.data:
    lst = []
    index = 0
    score = 0
    cnt = 0
    s = str(self.data[key][0]).split("-")[0]
    for i in range(len(self.data[key])):
        if s in self.data[key][i]:
            a = str(self.data[key][i]).split("-")
            score += int(float(a[1]))
            cnt+=1
            index+=1
        if i+1 != len(self.data[key]) and not s in self.data[key][i+1]:
            lst.append(s + "-" + str(float(score/cnt)))
            s = str(self.data[key][index]).split("-")[0]
            score = 0
    self.data[key] = lst

itertools.groupby with a suitable key function can help:
import itertools
def get_country_name(item):
    return item.split('-', 1)[0]
def get_country_value(item):
    return float(item.split('-', 1)[1])
def country_avg_grouper(lst):
    for ctry, group in itertools.groupby(lst, key=get_country_name):
        values = list(get_country_value(c) for c in group)
        avg = sum(values)/len(values)
        yield '{country}-{avg}'.format(country=ctry, avg=avg)
lst[:] = country_avg_grouper(lst)
The key here is that I wrote a function to do the change out of place and then I can easily make the substitution happen in place by using slice assignment.
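In case the slice-assignment part is unfamiliar, here is a tiny sketch of the difference between `lst[:] = ...` and `lst = ...`. The `scores` and `alias` names and the upper/lower-casing are made up for illustration; only the assignment behaviour matters.

```python
scores = ['Albania-10', 'Albania-5', 'Andorra-0']
alias = scores  # a second name bound to the same list object

# Slice assignment replaces the contents but keeps the same list object,
# so every other reference to the list sees the change.
scores[:] = (s.upper() for s in scores)
print(alias is scores, alias)

# Plain assignment rebinds the name to a brand-new list instead;
# the object that alias refers to is left untouched.
scores = [s.lower() for s in scores]
print(alias is scores)
```

That is why the function itself can stay a plain generator: the in-place update happens only at the call site.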

I would probably do this with an intermediate dictionary.
def country(s):
    return s.split('-')[0]
def value(s):
    return float(s.split('-')[1])
def country_average(lst):
    # map each country to a (running total, count) pair
    country_map = {}
    for point in lst:
        c = country(point)
        v = value(point)
        old = country_map.get(c, (0, 0))
        country_map[c] = (old[0]+v, old[1]+1)
    return ['%s-%f' % (name, total/count)
            for (name, (total, count)) in country_map.items()]
It traverses the original list only once, at the expense of quite a few tuple allocations.

Filtering a list of images by using a filter and association lists

I've got an assignment and part of it asks to define a process_filter_description. Basically I have a list of images I want to filter:
images = ["1111.jpg", "2222.jpg", "circle.JPG", "square.jpg", "triangle.JPG"]
Now I have an association list that I can use to filter the images:
assc_list = [ ["numbers", ["1111.jpg", "2222.jpg"]] , ["shapes", ["circle.JPG", "square.jpg", "triangle.JPG"]] ]
I can use a filter description to select which association list I want to apply as the filter (the keyword is enclosed by colons):
f = ':numbers:'
I'm not exactly sure how to start it. In words I can at least think:
Filter is ':numbers:'
Compare each term of images to each term associated with numbers in the association list.
If term matches, then append term to empty list.
Right now I am just trying to get my code to print only the terms from the numbers association list, but it prints out all of them.
def process_filter_description(f, images, ia):
    return_list = []
    f = f[1:-1]
    counter = 0
    if f == ia[counter][0]:
        #print f + ' is equal to ' + ia[counter][0]
        for key in ia:
            for item in key[1]:
                print(item)
                #return_list.append(item)
    return return_list

Instead of an "associative list", how about using a dictionary?
filter_assoc = {'numbers': ['1111.jpg', '2222.jpg'] ,
'shapes': ['circle.JPG', 'square.jpg', 'triangle.JPG']}
Now, just see which images are in each group:
>>> filter_assoc['numbers']
['1111.jpg', '2222.jpg']
>>>
>>> filter_assoc['shapes']
['circle.JPG', 'square.jpg', 'triangle.JPG']
Your processing function would become immensely simpler:
def process_filter_description(filter, association):
    return association[filter[1:-1]]
Thinking aloud: if you keep the association list, this is what I'd use as a function to perform the task of the dictionary:
def process_filter_description(index, images, association):
    return_list = []
    index = index[1:-1]
    for top_level in association:
        if top_level[0] == index:
            for item in top_level[1]:
                return_list.append(item)
            break
    return return_list
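Neither function above actually looks at `images`. If the assignment expects the result to be restricted to files that are present in `images`, something like the following sketch might be closer. That reading of the assignment is my assumption, and so are the parameter names; the data is copied from the question.

```python
def process_filter_description(description, images, association):
    # ':numbers:' -> 'numbers'
    keyword = description.strip(':')
    # An unknown keyword yields an empty set, so nothing is selected.
    allowed = set(association.get(keyword, []))
    # Keep the order of images, dropping anything not in the chosen group.
    return [image for image in images if image in allowed]

images = ["1111.jpg", "2222.jpg", "circle.JPG", "square.jpg", "triangle.JPG"]
filter_assoc = {'numbers': ['1111.jpg', '2222.jpg'],
                'shapes': ['circle.JPG', 'square.jpg', 'triangle.JPG']}

selected = process_filter_description(':numbers:', images, filter_assoc)
print(selected)
```

Using a set for the lookup keeps the membership test cheap even when a group contains many file names.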
