I have built an array from data pulled out of several files. Each file contributes between 1 and 7 values, in varying order.
For example, one file can have 3 rows:
label1
label4
label3
the next file may only have
label3
and another yet may have
label7
label1
label3
label2
I have created a Dictionary
Dict = {1:'label1',
2:'label2',
3:'label3',
4:'label4',
5:'label5',
6:'label6',
7:'label7'}
I want to
loop through the array
set each label to the dictionary value (i.e. if label4 then it =4)
order it in the order 1-7
for the missing values, put a 0 in that spot
for the spots with values, put a 1 in that spot
i.e. for [label1,label4,label3]
replace with dictionary value and sort -- [1,3,4]
loop through array, and if that number is missing, put a 0 in that spot, everything else turn to 1 in same location it was in -- [1,0,1,1,0,0,0]
Essentially, I am one-hot-encoding it.
This is what I am trying, but I am messing up the loop logic somewhere:
y_temp = []
for j in y:
for k in y[j]:
if y[j,k]== Dict[1]:
y_temp[k] = y_temp[k].append('1')
else:
y[k] = y_temp[k].append('0')
elif y[j,k] == Dict[2]:
y_temp[k] = y_temp[k].append('2')
else:
y[k] = y_temp[k].append('0')
elif y[j,k] == Dict[3]:
y_temp[k] = y_temp[k].append('3')
else:
y[k] = y_temp[k].append('0')
elif y[j,k] == Dict[4]:
y_temp[k] = y_temp[k].append('4')
else:
y[k] = y_temp[k].append('0')
elif y[j,k] == Dict[5]:
y_temp[k] = y_temp[k].append('5')
else:
y[k] = y_temp[k].append('0')
elif y[j,k] == Dict[6]:
y_temp[k] = y_temp[k].append('6')
else:
y[k] = y_temp[k].append('0')
elif y[j,k] == Dict[7]:
y_temp[k] = y_temp[k].append('7')
else:
y[k] = y_temp[k].append('0')
You should build your dictionary the other way around (i.e. keys should be the labels). This would allow you to convert the labels into indexes.
To obtain your final list of 1s and 0s, you don't need an intermediate list of indexes; you can build that list directly from the source data:
Dict = {'label1':1,
'label2':2,
'label3':3,
'label4':4,
'label5':5,
'label6':6,
'label7':7}
lines1 = """label1
label4
label3""".split("\n")
lines2 = """label3
label1""".split("\n")
lbl = [lines1,lines2] # <-- this is a list of lists (of strings) like yours
result = [0]+[0]*max(Dict.values())
for lineList in lbl:
    for line in lineList:
        result[Dict.get(line,0)] = 1 # <-- notice how this is using line, not lbl
result = result[1:]
print(result)
# [1, 0, 1, 1, 0, 0, 0]
I agree with @Alain T. that it's better to reverse the Dict. However, in case you want to keep it as it is:
Dict = {1:'label1',2:'label2',3:'label3',4:'label4',5:'label5',6:'label6',7:'label7'}
labels_arr=['label1','label4','label3']
nums_arr=[]
for x,y in Dict.items():
    for z in labels_arr:
        if z==y:
            nums_arr.append(x)
nums_arr.sort()
final=[]
for i in range(1, len(Dict)+1): # the keys run 1-7, so count from 1, not 0
    if i not in nums_arr:
        final.append(0)
    else:
        final.append(1)
print(final)
Output:
[1, 0, 1, 1, 0, 0, 0]
Each version of the solution had something not quite right. I ended up creating a solution that combined components of both. Thank you both, @Alain T and @Phineas, for your wonderful solutions and answers to my questions. I couldn't have done it without either of you. Thanks!!
Dict = {'label1': 0,
'label2': 1,
'label3': 2,
'label4': 3,
'label5': 4,
'label6': 5,
'label7': 6}
labels_arr = [['label1', 'label5', 'label4'], ['label1', 'label4', 'label3'],
['label1', 'label3'], ['label1'], ['label1', 'label4', 'label3'],
['label1', 'label3', 'label4'],
['label1', 'label2', 'label3', 'label4', 'label5', 'label6', 'label7']]
nums_arr =[] # this array saves the list after each loop
for i in range(len(labels_arr)): # needed first to loop through the list of lists
    nums_arr_i=[] # this array needed to append the 1's and 0's to it
    for key in Dict.keys(): # after we loop through the Dict keys first
        if key in labels_arr[i]: # compares the keys to original labels array at [i]
            nums_arr_i.append(1) # append 1 or 0 if it matches or not
        else:
            nums_arr_i.append(0)
    nums_arr.append(nums_arr_i) # end result list of 7 1's or 0's is appended to
print('nums_arr= ', nums_arr) # nums_arr and we loop to the next i in labels_arr
# End Result
nums_arr= [[1, 0, 0, 1, 1, 0, 0], [1, 0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1]]
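For comparison, the same encoding can be written as a small helper that looks each label up directly instead of scanning every key. This is only a sketch: the names LABEL_INDEX and multi_hot are made up for illustration, and it assumes the seven labels are exactly label1 to label7 and that unknown labels should be ignored.

```python
# Hypothetical helper names; maps 'label1'..'label7' to positions 0..6.
LABEL_INDEX = {"label%d" % n: n - 1 for n in range(1, 8)}

def multi_hot(labels, index=LABEL_INDEX):
    """Return a list of 0s and 1s with a 1 at the position of each label present."""
    row = [0] * len(index)
    for label in labels:
        pos = index.get(label)  # None for labels that are not in the mapping
        if pos is not None:
            row[pos] = 1
    return row

# The three example files from the question.
files = [
    ["label1", "label4", "label3"],
    ["label3"],
    ["label7", "label1", "label3", "label2"],
]
encoded = [multi_hot(f) for f in files]
print(encoded)
# [[1, 0, 1, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1]]
```

Because the position comes straight from the dictionary, the order of the labels inside each file does not matter and no sorting step is needed.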
I have a series of functions that end up returning a list whose first item is a number derived from the dictionaries, and whose second and third items are the dictionaries themselves.
These dictionaries have previously been randomly generated.
The function I am using generates a given number of these dictionaries, trying to get the highest possible number as the first item. (It's designed to optimise dice rolls.)
This all works fine, and I can print the value of the highest first item from all iterations. However, when I try to print the two dictionaries associated with this number (bearing in mind they're all in a list together), I seemingly get two other, randomly generated dictionaries.
def repeat(type, times):
    best = 0
    for i in range(0, times):
        x = rollForCharacter(type)
        if x[0] > best:
            print("BEST:", x)
            best = x[0]
    print("The highest average success is", best)
    return best
This works great. The last thing shown is:
BEST: (3.58, [{'strength': 4, 'intelligence': 1, 'charisma': 1, 'stamina': 4, 'willpower': 2, 'dexterity': 2, 'wits': 5, 'luck': 2}, {'agility': 1, 'brawl': 2, 'investigation': 3, 'larceny': 0, 'melee': 1, 'survival': 0, 'alchemy': 3, 'archery': 0, 'crafting': 0, 'drive': 1, 'magic': 0, 'medicine': 0, 'commercial': 0, 'esteem': 5, 'instruction': 2, 'intimidation': 2, 'persuasion': 0, 'seduction': 0}])
The highest average success is 3.58
But if I try something to store the list which gave this number:
def repeat(type, times):
    best = 0
    bestChar = []
    for i in range(0, times):
        x = rollForCharacter(type)
        if x[0] > best:
            print("BEST:", x)
            best = x[0]
            bestChar = x
    print("The highest average success is", best)
    print("Therefore the best character is", bestChar)
    return best, bestChar
I get this as the last result, which is fine:
BEST: (4.15, [{'strength': 2, 'intelligence': 3, 'charisma': 4, 'stamina': 4, 'willpower': 1, 'dexterity': 2, 'wits': 4, 'luck': 1}, {'agility': 1, 'brawl': 0, 'investigation': 5, 'larceny': 0, 'melee': 0, 'survival': 0, 'alchemy': 7, 'archery': 0, 'crafting': 0, 'drive': 0, 'magic': 0, 'medicine': 0, 'commercial': 1, 'esteem': 0, 'instruction': 3, 'intimidation': 0, 'persuasion': 0, 'seduction': 0}])
The highest average success is 4.15
but the last line is
Therefore the best character is (4.15, [{'strength': 1, 'intelligence': 3, 'charisma': 4, 'stamina': 4, 'willpower': 1, 'dexterity': 2, 'wits': 2, 'luck': 3}, {'agility': 1, 'brawl': 0, 'investigation': 1, 'larceny': 4, 'melee': 2, 'survival': 0, 'alchemy': 2, 'archery': 4, 'crafting': 0, 'drive': 0, 'magic': 0, 'medicine': 0, 'commercial': 1, 'esteem': 0, 'instruction': 0, 'intimidation': 2, 'persuasion': 1, 'seduction': 0}])
As you can see this doesn't match with what I want, and what is printed literally right above it.
Through a little bit of checking, I realised what it gives out as the "Best Character" is just the last one generated, which is not the best, just the most recent. However, it isn't that simple, because the first element IS the highest result that was recorded, just not from the character in the rest of the list. This is really confusing because it means the list is somehow being edited but at no point can I see where that would happen.
Am I doing something stupid whereby the character is randomly generated every time? I wouldn't think so since x[0] gives the correct result and is stored fine, so what changes when it's the whole list?
The function rollForCharacter() returns rollResult, character, which is just the number followed by the two dictionaries.
I would greatly appreciate it if anyone could figure out and explain where I'm going wrong and why it can print the correct answer to the console yet not store it correctly a line below!
EDIT:
Dictionary 1 Code:
attributes = {}
def assignRow(row, p): # p is the number of points you have to assign to each row
    rowValues = {}
    for i in range(0, len(row)-1):
        val = randint(0, p)
        rowValues[row[i]] = val + 1
        p -= val
    rowValues[row[-1]] = p + 1
    return attributes.update(rowValues)
def getPoints():
    points = [7, 5, 3]
    shuffle(points)
    row1 = ['strength', 'intelligence', 'charisma']
    row2 = ['stamina', 'willpower']
    row3 = ['dexterity', 'wits', 'luck']
    for i in range(0, len(points)):
        row = eval("row" + str(i+1))
        assignRow(row, points[i])
Dictionary 2 Code:
skills = {}
def assignRow(row, p): # p is the number of points you have to assign to each row
    rowValues = {}
    for i in range(0, len(row) - 1):
        val = randint(0, p)
        rowValues[row[i]] = val
        p -= val
    rowValues[row[-1]] = p
    return skills.update(rowValues)
def getPoints():
    points = [11, 7, 4]
    shuffle(points)
    row1 = ['agility', 'brawl', 'investigation', 'larceny', 'melee', 'survival']
    row2 = ['alchemy', 'archery', 'crafting', 'drive', 'magic', 'medicine']
    row3 = ['commercial', 'esteem', 'instruction', 'intimidation', 'persuasion', 'seduction']
    for i in range(0, len(points)):
        row = eval("row" + str(i + 1))
        assignRow(row, points[i])
It does look like the dictionary is being re-generated. This could easily happen if the function rollForCharacter returns a generator, or if it writes into a global variable that a subsequent cycle of the loop then overwrites.
A simple-but-hacky way to solve the problem would be to take a deep copy of the dictionary at the time of storing, so that you're sure you're keeping the values at that point:
import copy

def repeat(type, times):
    best = 0
    bestChar = []
    for i in range(0, times):
        x = rollForCharacter(type)
        if x[0] > best:
            print("BEST:", x)
            best = x[0]
            # Create a brand new tuple, containing a deep copy of the current dicts
            # (x[1] is a list of dicts, so a shallow .copy() would still share them)
            bestChar = (x[0], copy.deepcopy(x[1]))
    return best, bestChar
The correct answer, however, would be to pass around a unique dictionary object that is not affected by later code.
See this SO answer with a bit more context about how passing a reference to a dictionary can be risky as it's still a mutable object.
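To make the difference concrete, here is a minimal sketch of the two situations. The functions roll_shared, roll_fresh and best_of are hypothetical stand-ins, not the asker's rollForCharacter: one mutates a single module-level dictionary (much like the global attributes and skills dictionaries in the question), the other builds a new dictionary on every call.

```python
import random

shared_stats = {}  # one module-level dict, reused by every call

def roll_shared():
    """Hypothetical roller that mutates and returns the same global dict."""
    shared_stats["strength"] = random.randint(1, 5)
    return shared_stats["strength"], [shared_stats]

def roll_fresh():
    """Hypothetical roller that builds a new dict on every call."""
    stats = {"strength": random.randint(1, 5)}
    return stats["strength"], [stats]

def best_of(roll, times):
    """Keep the highest score seen and the character object that came with it."""
    best, best_char = 0, None
    for _ in range(times):
        score, char = roll()
        if score > best:
            best, best_char = score, char
    return best, best_char

best, char = best_of(roll_fresh, 20)
print(best == char[0]["strength"])  # True: the stored dict still matches its score

best, char = best_of(roll_shared, 20)
print(char[0] is shared_stats)      # True: the "best" dict is the one every roll overwrites
```

With the shared version the stored score is correct, but the stored dictionary is the same object that each later roll keeps rewriting, which matches the symptom described in the question.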
I hope you are all well.
This is how my data looks:
dictionary1 = {2876: 1, 9212: 1, 953997: 1, 9205: 1, 9206: 1, 9207: 1, 9208: 1, 9209: 1, 9210: 1, 9211: 1, 6908: 1, 1532: 1, 945237: 1, 6532: 2, 6432: 4}
data1 = [[2876, 5423],[2312, 4532],[953997, 5643]...]
I am trying to run a statement that looks like this:
for y in data1:
    if y[0] in dictionary1 and dictionary1[y[0]] == 1:
        dictionary1[y[1]] = 2
Presumably this would create a new dataset looking like this:
dictionary1 = {5423: 2, 953997: 2, 2876: 1, 9212: 1, 953997: 1, 9205: 1, 9206: 1, 9207: 1, 9208: 1, 9209: 1, 9210: 1, 9211: 1, 6908: 1, 1532: 1, 945237: 1, 6532: 2, 6432: 4}
What am I doing wrong? Is dictionary1[y[0]] == 1 the correct way to check a key's value?
Thank you everyone.
A dictionary comprehension converts the list of lists to a dictionary:
dict1 = {t[0]: t[1] for t in data1}
Then it should be easy to do what you want:
for x, y in dict1.items():
    if x in dictionary1 and dictionary1[x] == 1:
        dictionary1[y] = 2
You can use dict.get(key, default) to avoid an exception for missing values, and provide a safe default. This reduces your loop to a single condition:
#!python3
dictionary1 = {2876: 1, 9212: 1, 953997: 1, 9205: 1, 9206: 1, 9207: 1, 9208: 1, 9209: 1, 9210: 1, 9211: 1, 6908: 1, 1532: 1, 945237: 1, 6532: 2, 6432: 4}
data1 = [[2876, 5423],[2312, 4532],[953997, 5643]]
for x,y in data1:
    if dictionary1.get(x, 0) == 1:
        dictionary1[y] = 2
print(dictionary1)
You could use dict.update(other) to bulk-overwrite the values in dictionary1 with a one-liner dict comprehension:
dictcompr = {b:2 for a,b in data1 if dictionary1.get(a,0) == 1}
dictionary1.update(dictcompr)
And then you can combine them into one single, unholy, unmaintainable, barely-readable mess:
dictionary1.update({b:2 for a,b in data1 if dictionary1.get(a,0) == 1})
Update:
To delete all keys having a value of 1, you have some choices:
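# list() takes a snapshot; deleting while iterating the live view raises RuntimeError in Python 3
for k,v in list(dictionary1.items()):
    if v == 1:
        del dictionary1[k]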
# Versus:
d2 = dict(filter(lambda item: item[1] != 1, dictionary1.items()))
dictionary1 = d2
# or
dictionary1.clear()
dictionary1.update(d2)
Frankly, for your purposes the for loop is probably better. The filter approach can take the lambda as a parameter, to configure what gets filtered. Using clear()/update() is a win if you expect multiple references to the dictionary. That is, A = B = dictionary1. In this case, clear/update would keep the same underlying object, so the linkage still holds. (This is also true of the for loop - the benefit is solely for the filter which requires a temporary.)
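The point about multiple references can be checked with a short sketch. It uses a dict comprehension in place of filter with a lambda, and a shortened version of dictionary1; the name alias is made up for illustration.

```python
dictionary1 = {2876: 1, 9212: 1, 6532: 2, 6432: 4}
alias = dictionary1  # a second reference to the same dict object

# Build the filtered copy, then refill the original object in place.
kept = {k: v for k, v in dictionary1.items() if v != 1}
dictionary1.clear()
dictionary1.update(kept)
print(alias)                 # {6532: 2, 6432: 4}
print(alias is dictionary1)  # True

# Rebinding the name instead leaves the alias pointing at the old object.
dictionary1 = {k: v for k, v in dictionary1.items() if v != 2}
print(dictionary1)           # {6432: 4}
print(alias)                 # {6532: 2, 6432: 4}
```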
Please try this:
for y in data1:
    if y[0] in dictionary1.keys() and dictionary1[y[0]] == 1:
        dictionary1[y[1]] = 2
You can simply use
for y in data1:
    if y[0] in dictionary1: # dict.has_key() only exists in Python 2
        dictionary1[y[1]] = 2
Hope this is what you are looking for.
Given a list of data as follows:
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
I would like to write an algorithm that offsets the list by a given number of steps. For example, with offset = -1:
def offsetFunc(inputList, offsetList):
    #make something
    return output
where:
output = [0,0,0,0,1,1,5,5,5,5,5,5,3,3,3,2,2]
Important note: the elements of the list are floats and do not follow any progression, so I really do need to shift them; I cannot compute the result with some workaround.
So basically, the algorithm should replace the first set of values (the four 1s) with 0 and then:
Detect the length of the next run of values
Create a parallel output list with the values delayed by one set
The steps above are roughly how I would do it. However, I'm new to Python (and a beginner at programming in general), and I keep discovering built-in functions that could make the algorithm lighter and less loop-heavy. Does anyone have a suggestion for a better way to write this? Here is the code I have so far (assuming a fixed offset of -1):
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
output = []
PrevVal = 0
NextVal = input[0]
i = 0
while input[i] == NextVal:
    output.append(PrevVal)
    i += 1
while i < len(input):
    PrevVal = NextVal
    NextVal = input[i]
    while input[i] == NextVal:
        output.append(PrevVal)
        i += 1
        if i >= len(input):
            break
print(output)
Thanks in advance for any help!
BETTER DESCRIPTION
My list will always be composed of "sets" of values. They are usually float numbers, and they take values such as this short example below:
Sample = [1.236,1.236,1.236,1.236,1.863,1.863,1.863,1.863,1.863,1.863]
In this example, the first set (the one with value 1.236) has length 4, while the second has length 6. The output I would like, when offset = -1, is:
The value "0.000" in the first 4 elements;
The value "1.236" in the next 6 elements.
So basically, this "offset" function creates a list with the same structure (the same run lengths) but with the values delayed by "offset" sets.
I hope it's clear now, unfortunately the problem itself is still a bit silly to me (plus I don't even speak good English :) )
Please don't hesitate to ask any additional info to complete the question and make it clearer.
How about this:
def generateOutput(input, value=0, offset=-1):
    values = []
    for i in range(len(input)):
        if i < 1 or input[i] == input[i-1]:
            yield value
        else: # value change in input detected
            values.append(input[i-1])
            if len(values) >= -offset:
                value = values.pop(0)
            yield value
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
print(list(generateOutput(input)))
It will print this:
[0, 0, 0, 0, 1, 1, 5, 5, 5, 5, 5, 5, 3, 3, 3, 2, 2]
And in case you just want to iterate, you do not even need to build the list. Just use for i in generateOutput(input): … then.
For other offsets, use this:
print(list(generateOutput(input, 0, -2)))
prints:
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 5, 5, 5, 3, 3]
This uses a deque as the queue, with maxlen defining the shift length. The queue only holds the distinct run values: once the shift length has been reached, pushing a new value in at the end pushes the oldest value out at the start.
from collections import deque
def shift(it, shift=1):
    q = deque(maxlen=shift+1)
    q.append(0)
    for i in it:
        if q[-1] != i:
            q.append(i)
        yield q[0]
Sample = [1.236,1.236,1.236,1.236,1.863,1.863,1.863,1.863,1.863,1.863]
print(list(shift(Sample)))
#[0, 0, 0, 0, 1.236, 1.236, 1.236, 1.236, 1.236, 1.236]
My try:
#Input
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
shift = -1
#Build service structures: for each 'set of data' store its length and its value
set_lengths = []
set_values = []
prev_value = None
set_length = 0
for value in input:
    if prev_value is not None and value != prev_value:
        set_lengths.append(set_length)
        set_values.append(prev_value)
        set_length = 0
    set_length += 1
    prev_value = value
else:
    set_lengths.append(set_length)
    set_values.append(prev_value)
#Output the result, shifting the values
output = []
for i, l in enumerate(set_lengths):
    j = i + shift
    if j < 0:
        output += [0] * l
    else:
        output += [set_values[j]] * l
print(input)
print(output)
gives:
[1, 1, 1, 1, 5, 5, 3, 3, 3, 3, 3, 3, 2, 2, 2, 5, 5]
[0, 0, 0, 0, 1, 1, 5, 5, 5, 5, 5, 5, 3, 3, 3, 2, 2]
def x(list, offset):
    return [el + offset for el in list]
A completely different approach than my first answer is this:
import itertools
First analyze the input:
values, amounts = zip(*((n, len(list(g))) for n, g in itertools.groupby(input)))
We now have (1, 5, 3, 2, 5) and (4, 2, 6, 3, 2). Now apply the offset:
values = (0,) * (-offset) + values # never mind that it is longer now.
And synthesize it again:
output = sum([ [v] * a for v, a in zip(values, amounts) ], [])
This is way more elegant, way less understandable and probably way more expensive than my other answer, but I didn't want to hide it from you.
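Put together, the three steps might look like the sketch below. The function name offset_runs and the fill parameter are made up for illustration, and it assumes the offset is zero or negative, as in the question.

```python
from itertools import groupby

def offset_runs(data, offset=-1, fill=0):
    """Keep the run lengths of data, but delay the run values by -offset runs."""
    shift = -offset  # assumed to be >= 0
    # Step 1: analyse the input as (value, run length) pairs.
    runs = [(value, len(list(group))) for value, group in groupby(data)]
    # Step 2: prepend the filler; the list is now longer than runs,
    # but zip() below stops at the shorter of the two.
    values = [fill] * shift + [value for value, _ in runs]
    # Step 3: rebuild the list, pairing shifted values with the original lengths.
    output = []
    for value, (_, length) in zip(values, runs):
        output.extend([value] * length)
    return output

data = [1, 1, 1, 1, 5, 5, 3, 3, 3, 3, 3, 3, 2, 2, 2, 5, 5]
print(offset_runs(data))
# [0, 0, 0, 0, 1, 1, 5, 5, 5, 5, 5, 5, 3, 3, 3, 2, 2]
print(offset_runs(data, offset=-2))
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 5, 5, 5, 3, 3]
```

Using extend in a loop rather than sum(..., []) avoids rebuilding the output list for every run, which matters only for long inputs.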
I have a dictionary with 3 values associated with each key.
I want to know how to increment the values of noTaken and totalBranch as I pass through the data file. My current method doesn't change the values: the output of lookup gives me (1,1,0) or (0,1,0), but I need the first two values to increase.
for line in datafile.readlines():
    items = line.split(' ')
    instruction = items[1]
    if lookup.has_key(instruction):
        if (taken == 1):
            lookup[instruction] = (noTaken + 1, totalBranch + 1, prediction)
        else:
            lookup[instruction] = (noTaken, totalBranch + 1, prediction)
    else:
        if (taken == 1):
            lookup[instruction] = (1, 1, prediction)
        else:
            lookup[instruction] = (0, 1, prediction)
(noTaken, prediction & totalBranch are all initialised as 0 above this)
Thanks in advance!
A cleaner way to initialize is to use defaultdict; you can then update the elements of the dict values directly, e.g.:
from collections import defaultdict
lookup = defaultdict(lambda: [0,0,0])
lookup['a'][0] += 1
lookup['b'][1] += 1
lookup['a'][0] += 1
print(dict(lookup))
output:
{'a': [2, 0, 0], 'b': [0, 1, 0]}
Also note that I am defaulting the value to a list instead of a tuple so that we can modify the values in place; a tuple is immutable and can't be modified.
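Applied to the loop in the question, the idea could be sketched as follows. The question does not show where taken comes from or what the lines look like, so the "address instruction taken" line format, the sample data and the name count_branches are all assumptions made for illustration.

```python
from collections import defaultdict

def count_branches(lines):
    """Count per instruction: [noTaken, totalBranch, prediction]."""
    lookup = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        items = line.split()
        # Assumed line format: "<address> <instruction> <taken>"
        instruction, taken = items[1], int(items[2])
        counts = lookup[instruction]  # creates [0, 0, 0] on first sight
        if taken == 1:
            counts[0] += 1            # noTaken, incremented as in the question
        counts[1] += 1                # totalBranch
    return dict(lookup)

sample = ["0x10 beq 1", "0x14 bne 0", "0x10 beq 1", "0x10 beq 0"]
result = count_branches(sample)
print(result)
# {'beq': [2, 3, 0], 'bne': [0, 1, 0]}
```

The counters now live inside the dictionary entry itself, so each increment builds on the previous value for that instruction instead of on separate variables that stay at 0.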