So I have this bit of code:
Points = [ [400,100],[600,100],[800,100] , [300,300],[400,300],[500,300],[600,300] , [200,500],[400,500],[600,500],[800,500],[1000,500] , [300,700],[500,700][700,700][900,700] , [200,900],[400,900],[600,900] ]
And it produces this error:
line 43, in <module>
Points = [ [400,100],[600,100],[800,100] , [300,300],[400,300],[500,300],[600,300] , [200,500],[400,500],[600,500],[800,500],[1000,500] , [300,700],[500,700][700,700][900,700] , [200,900],[400,900],[600,900] ]
TypeError: list indices must be integers, not tuple
What can I do to fix it?
You forgot two commas:
[500,700][700,700][900,700]
Now Python sees an attempt to index the list on the left-hand side with a (700, 700) tuple:
>>> [500,700][700,700]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
The second [900, 700] 'list' would give you the same problem but doesn't yet come into play.
Fix it by adding commas between them:
[500, 700], [700, 700], [900, 700]
or, as a complete list:
Points = [[400, 100], [600, 100], [800, 100], [300, 300], [400, 300], [500, 300], [600, 300], [200, 500], [400, 500], [600, 500], [800, 500], [1000, 500], [300, 700], [500, 700], [700, 700], [900, 700], [200, 900], [400, 900], [600, 900]]
You forgot to separate a few of the lists with commas. Here is the fix:
>>> Points = [[400,100], [600,100], [800,100], [300,300], [400,300], [500,300], [600,300] ,[200,500], [400,500], [600,500], [800,500], [1000,500], [300,700], [500,700], [700,700],[900,700], [200,900], [400,900], [600,900]]
Without the commas, Python treats the second list as an index into the first one, which raises the error.
You need to separate each of the inner lists with a comma:
Points = [ [400,100],[600,100],[800,100] , [300,300],[400,300],[500,300],[600,300] ,[200,500],[400,500],[600,500],[800,500],[1000,500] , [300,700],[500,700],[700,700],[900,700] , [200,900],[400,900],[600,900] ]
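A layout that makes this kind of slip easier to spot (a suggestion, not taken from the answers above) is one row of points per line with a comma after every point, followed by a quick length check. The snippet also reproduces the original mistake through eval so the error can be seen; on Python 3 the message reads "list indices must be integers or slices, not tuple" rather than the Python 2 wording in the question.

```python
import warnings

# One row of points per line, with a comma after every point (trailing one included).
Points = [
    [400, 100], [600, 100], [800, 100],
    [300, 300], [400, 300], [500, 300], [600, 300],
    [200, 500], [400, 500], [600, 500], [800, 500], [1000, 500],
    [300, 700], [500, 700], [700, 700], [900, 700],
    [200, 900], [400, 900], [600, 900],
]

# Quick sanity checks: 19 points, each an [x, y] pair.
assert len(Points) == 19
assert all(len(point) == 2 for point in Points)

# Reproduce the original slip: without a comma, [700, 700] becomes an index.
error_message = None
with warnings.catch_warnings():
    # Python 3.8+ also emits a SyntaxWarning when compiling this expression.
    warnings.simplefilter("ignore", SyntaxWarning)
    try:
        eval("[500, 700][700, 700]")
    except TypeError as exc:
        error_message = str(exc)

print(error_message)  # list indices must be integers or slices, not tuple
```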
I tried to construct a pipeline that has some optional steps. However, I would like to optimize hyperparameters for those steps as I want to get the best option between not using them and using them with different configurations (in my case SelectFromModel - sfm).
clf = RandomForestRegressor(random_state = 1)
stdscl = StandardScaler()
sfm = SelectFromModel(RandomForestRegressor(random_state=1))
p_grid_lr = {"clf__max_depth": [10, 50, 100, None],
"clf__n_estimators": [10, 50, 100, 200, 500, 800],
"clf__max_features":[0.1, 0.5, 1.0,'sqrt','log2'],
"sfm": ['passthrough', sfm],
"sfm__max_depth": [10, 50, 100, None],
"sfm__n_estimators": [10, 50, 100, 200, 500, 800],
"sfm__max_features":[0.1, 0.5, 1.0,'sqrt','log2'],
}
pipeline=Pipeline([
('scl',stdscl),
('sfm',sfm),
('clf',clf)
])
gs_clf = GridSearchCV(estimator=pipeline, param_grid=p_grid_lr, cv=KFold(shuffle=True, n_splits=5, random_state=1), scoring='r2', n_jobs=-1)
gs_clf.fit(X_train, y_train)
clf = gs_clf.best_estimator_
The error that I get is 'str' object has no attribute 'set_params', which is understandable. Is there a way to specify which combinations should be tried together, in my case only 'passthrough' by itself and sfm with different hyperparameters?
As @Robin pointed out, you can define p_grid_lr as a list of dictionaries. Indeed, this is what the GridSearchCV docs say about param_grid:
param_grid: dict or list of dictionaries
Dictionary with parameters names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
p_grid_lr = [
{
"clf__max_depth": [10, 50, 100, None],
"clf__n_estimators": [10, 50, 100, 200, 500, 800],
"clf__max_features": [0.1, 0.5, 1.0,'sqrt','log2'],
"sfm__estimator__max_depth": [10, 50, 100, None],
"sfm__estimator__n_estimators": [10, 50, 100, 200, 500, 800],
"sfm__estimator__max_features": [0.1, 0.5, 1.0,'sqrt','log2'],
},
{
"clf__max_depth": [10, 50, 100, None],
"clf__n_estimators": [10, 50, 100, 200, 500, 800],
"clf__max_features": [0.1, 0.5, 1.0,'sqrt','log2'],
"sfm": ['passthrough'],
}
]
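To see what the list-of-dictionaries form asks GridSearchCV to try, here is a pure-Python sketch that expands each dictionary into its Cartesian product, as the quoted docs describe. The helper expand_grid is hypothetical, a stand-in for sklearn.model_selection.ParameterGrid rather than its implementation. It confirms that 'passthrough' is never paired with any sfm__ setting and shows how large the grid is.

```python
from itertools import product


def expand_grid(param_grid):
    """Hypothetical stand-in for ParameterGrid: yield one dict per combination."""
    if isinstance(param_grid, dict):
        param_grid = [param_grid]
    for sub_grid in param_grid:  # each dictionary spans its own, separate grid
        keys = sorted(sub_grid)
        for values in product(*(sub_grid[k] for k in keys)):
            yield dict(zip(keys, values))


clf_grid = {
    "clf__max_depth": [10, 50, 100, None],
    "clf__n_estimators": [10, 50, 100, 200, 500, 800],
    "clf__max_features": [0.1, 0.5, 1.0, "sqrt", "log2"],
}
sfm_grid = {
    "sfm__estimator__max_depth": [10, 50, 100, None],
    "sfm__estimator__n_estimators": [10, 50, 100, 200, 500, 800],
    "sfm__estimator__max_features": [0.1, 0.5, 1.0, "sqrt", "log2"],
}
# Same two dictionaries as p_grid_lr above.
p_grid_lr = [
    {**clf_grid, **sfm_grid},
    {**clf_grid, "sfm": ["passthrough"]},
]

candidates = list(expand_grid(p_grid_lr))
print(len(candidates))  # 120 * 120 + 120 = 14520

# No candidate combines "passthrough" with a parameter routed into the sfm step.
mixed = [c for c in candidates
         if c.get("sfm") == "passthrough" and any(k.startswith("sfm__") for k in c)]
print(len(mixed))  # 0
```

Each candidate is fitted once per fold, so with cv=5 this grid means 72,600 fits before the final refit. If that is too slow, RandomizedSearchCV (which in recent scikit-learn versions also accepts a list of dictionaries) may be worth a look.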
A less scalable alternative (for your case) might be the following
p_grid_lr_ = {
"clf__max_depth": [10, 50, 100, None],
"clf__n_estimators": [10, 50, 100, 200, 500, 800],
"clf__max_features": [0.1, 0.5, 1.0,'sqrt','log2'],
"sfm": ['passthrough',
SelectFromModel(RandomForestRegressor(random_state=1, max_depth=10, n_estimators=10, max_features=0.1)),
SelectFromModel(RandomForestRegressor(random_state=1, max_depth=10, n_estimators=50, max_features=0.1)),
...]
}
This means specifying every possible combination of your parameters by hand.
Moreover, be aware that to access parameters max_depth, n_estimators and max_features from the RandomForestRegressor estimator within SelectFromModel you should type parameters as
"sfm__estimator__max_depth": [10, 50, 100, None],
"sfm__estimator__n_estimators": [10, 50, 100, 200, 500, 800],
"sfm__estimator__max_features": [0.1, 0.5, 1.0,'sqrt','log2']
rather than as
"sfm__max_depth": [10, 50, 100, None],
"sfm__n_estimators": [10, 50, 100, 200, 500, 800],
"sfm__max_features": [0.1, 0.5, 1.0,'sqrt','log2']
because these parameters belong to the wrapped estimator, not to SelectFromModel itself (SelectFromModel does have its own max_features parameter, but according to the docs it only takes integer values there).
In general you can access all the parameters to be possibly optimized via pipeline.get_params().keys() (estimator.get_params().keys() in general).
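The double-underscore convention can be illustrated with a toy model. The classes below are hypothetical and only mimic, in simplified form, how scikit-learn's set_params splits a name at the first __ and forwards the remainder to the named sub-object; they are not scikit-learn's code. They reproduce both pitfalls discussed here: sfm__max_depth is looked up on SelectFromModel itself, and routing a parameter into a step that was replaced by the string 'passthrough' fails with the error from the question.

```python
class ToyEstimator:
    """Hypothetical stand-in for an sklearn estimator; NOT sklearn's code."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **kwargs):
        nested = {}
        for key, value in kwargs.items():
            # Split on the first "__": "sfm__estimator__max_depth" -> "sfm", "estimator__max_depth"
            name, delim, sub_key = key.partition("__")
            if delim:
                nested.setdefault(name, {})[sub_key] = value
            elif name in self.params:
                self.params[name] = value  # e.g. replace a whole step with "passthrough"
            else:
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}")
        for name, sub_params in nested.items():
            # Forward the rest of the name to the sub-object; a plain string has no set_params.
            self.params[name].set_params(**sub_params)
        return self


class ToyForest(ToyEstimator):
    def __init__(self):
        super().__init__(max_depth=None, n_estimators=100, max_features=1.0)


class ToySelectFromModel(ToyEstimator):
    def __init__(self, estimator):
        super().__init__(estimator=estimator, max_features=None)


pipe = ToyEstimator(sfm=ToySelectFromModel(ToyForest()), clf=ToyForest())

# 1. Correct name: routed pipeline -> sfm -> estimator.
pipe.set_params(sfm__estimator__max_depth=10)
routed_depth = pipe.params["sfm"].params["estimator"].params["max_depth"]
print(routed_depth)  # 10

# 2. Missing "estimator__": max_depth is looked up on SelectFromModel itself.
try:
    pipe.set_params(sfm__max_depth=10)
except ValueError as exc:
    invalid_msg = str(exc)
print(invalid_msg)  # Invalid parameter 'max_depth' for ToySelectFromModel

# 3. Step replaced by a string, then routed into: the error from the question.
try:
    pipe.set_params(sfm="passthrough", sfm__estimator__max_depth=10)
except AttributeError as exc:
    passthrough_msg = str(exc)
print(passthrough_msg)  # 'str' object has no attribute 'set_params'
```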
Finally, the Pipelines section of the scikit-learn user guide is a good read.
Referring to this example, you could just make a list of dictionaries: one containing sfm and its related parameters, and the other using "passthrough".
I have the ohlc list as below:
ohlc = [["open", "high", "low", "close"],
[100, 110, 70, 100],
[200, 210, 180, 190],
[300, 310, 300, 310]]
I want to slice it as:
[["open"],[100],[200],[300]]
We can easily slice that list using numpy, but I don't know how to do it without numpy's help.
I tried the methods listed below, but they didn't give the values I wanted:
ohlc[:][0]
ohlc[:][:1]
ohlc[0][:]
The zip function gets you tuples containing elements from the i-th index of every sublist:
In [217]: ohlc = [["open", "high", "low", "close"],
...: [100, 110, 70, 100],
...: [200, 210, 180, 190],
...: [300, 310, 300, 310]]
...:
In [218]: for t in zip(*ohlc): print(t)
('open', 100, 200, 300)
('high', 110, 210, 310)
('low', 70, 180, 300)
('close', 100, 190, 310)
You're looking for the first of these, so call on your friend next():
In [219]: next(zip(*ohlc))
Out[219]: ('open', 100, 200, 300)
But that's just a single tuple with all the elements and not a list of lists like you wanted, so use a list comprehension:
In [220]: [[t] for t in next(zip(*ohlc))]
Out[220]: [['open'], [100], [200], [300]]
You can iterate over the list and take the element at the given index in every sublist:
ohlc = [["open", "high", "low", "close"],
[100, 110, 70, 100],
[200, 210, 180, 190],
[300, 310, 300, 310]]
index = 0
result = [[o[index]] for o in ohlc] # [['open'], [100], [200], [300]]
Could this be the right solution?
list_ = []
for i in ohlc:
    list_.append([i[0]])  # wrap each value in its own list: [['open'], [100], ...]
def slice_list(rows):
    ans = []
    for x in rows:
        ans.append([x[0]])  # one-element list per row, as in the desired output
    return ans
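None of the attempts in the question can work, because each of them selects rows: slicing with [:] only makes a shallow copy of the outer list, so the next index still picks a row. A short check of that, plus the per-row indexing used in the answers wrapped in a hypothetical column helper:

```python
ohlc = [["open", "high", "low", "close"],
        [100, 110, 70, 100],
        [200, 210, 180, 190],
        [300, 310, 300, 310]]

# ohlc[:] is a shallow copy of the outer list, so the next index still picks rows.
print(ohlc[:][0])   # ['open', 'high', 'low', 'close']  (the first row)
print(ohlc[:][:1])  # [['open', 'high', 'low', 'close']]  (a list holding the first row)
print(ohlc[0][:])   # ['open', 'high', 'low', 'close']  (a copy of the first row)


def column(rows, index):
    """Hypothetical helper: return column `index` as a list of one-element lists."""
    return [[row[index]] for row in rows]


print(column(ohlc, 0))  # [['open'], [100], [200], [300]]
print(column(ohlc, 3))  # [['close'], [100], [190], [310]]
```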
I'm stuck again on trying to make this merge sort work.
I have a 2D array of Unix timestamps and prices (fig 1) that I am merge sorting with the code in fig 2. I am trying to check the first value in each row, i.e. array[x][0], and then move the whole row depending on that value. However, the merge sort creates duplicates of some rows and deletes others (fig 3). What am I doing wrong? I know the problem is in the merge sort but can't see the fix.
fig 1
[[1422403200 100]
[1462834800 150]
[1458000000 25]
[1540681200 150]
[1498863600 300]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]]
fig 2
import numpy as np
def sort(data):
    if len(data) > 1:
        Mid = len(data) // 2
        l = data[:Mid]
        r = data[Mid:]
        sort(l)
        sort(r)
        z = 0
        x = 0
        c = 0
        while z < len(l) and x < len(r):
            if l[z][0] < r[x][0]:
                data[c] = l[z]
                z += 1
            else:
                data[c] = r[x]
                x += 1
            c += 1
        while z < len(l):
            data[c] = l[z]
            z += 1
            c += 1
        while x < len(r):
            data[c] = r[x]
            x += 1
            c += 1
    print(data, 'done')
unixdate = [1422403200, 1462834800, 1458000000, 1540681200, 1498863600, 1540771200, 1540771200, 1540771200, 1540771200, 1540771200]
price = [100, 150, 25, 150, 300, 100, 100, 100, 100, 100]
array = np.column_stack((unixdate, price))
sort(array)
print(array, 'sorted')
fig 3
[[1422403200 100]
[1458000000 25]
[1458000000 25]
[1498863600 300]
[1498863600 300]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]]
I couldn't spot any mistake in your code.
I have tried your code, and the problem does not happen, at least with regular Python lists: the function doesn't change the number of occurrences of any element in the list.
data = [
[1422403200, 100],
[1462834800, 150],
[1458000000, 25],
[1540681200, 150],
[1498863600, 300],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100],
]
sort(data)
from pprint import pprint
pprint(data)
Output:
[[1422403200, 100],
[1458000000, 25],
[1462834800, 150],
[1498863600, 300],
[1540681200, 150],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100]]
Edit, taking into account the numpy context and the use of np.column_stack (this first guess turned out to be wrong; see Edit 2):
I expected that np.column_stack creates a view over the two arrays, so that to get a real array rather than a link to your existing arrays, you would have to copy it:
array = np.column_stack((unixdate, price)).copy()
Edit 2, taking into account the numpy context
This behavior actually has nothing to do with np.column_stack, which already performs a copy.
Your code doesn't work because slicing behaves differently in NumPy than with Python lists: slicing a list creates a new list, while slicing a NumPy array creates a view that shares memory with the original array.
The erroneous lines are:
l = data[:Mid]
r = data[Mid:]
Since l and r are just views onto two parts of the memory held by data, they change whenever data is written to. That is why the lines data[c] = l[z] and data[c] = r[x] overwrite values that have not been merged yet, so some rows get duplicated and others are lost.
If data is a numpy array, we want l and r to be copies of data, not just views. This can be achieved using the copy method.
l = data[:Mid]
r = data[Mid:]
if isinstance(data, np.ndarray):
    l = l.copy()
    r = r.copy()
I tested this: with the copies, the sort works.
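An alternative to copying the halves (a different design, not the fix described above) is to merge into a new list instead of writing back into data, so nothing that is still being read can be overwritten. A plain-Python sketch of that; for a NumPy array, one option is to call it on array.tolist() and rebuild the array with np.array(...) afterwards. It also uses <= instead of <, so rows with equal timestamps keep their original order (a stable sort).

```python
def merge_sort(rows, key=lambda row: row[0]):
    """Return a new sorted list; the input is never modified."""
    if len(rows) <= 1:
        return list(rows)
    mid = len(rows) // 2
    left = merge_sort(rows[:mid], key)
    right = merge_sort(rows[mid:], key)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the halves still has elements left.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


unixdate = [1422403200, 1462834800, 1458000000, 1540681200, 1498863600,
            1540771200, 1540771200, 1540771200, 1540771200, 1540771200]
price = [100, 150, 25, 150, 300, 100, 100, 100, 100, 100]
data = [list(pair) for pair in zip(unixdate, price)]

result = merge_sort(data)
print(result[:5])
# [[1422403200, 100], [1458000000, 25], [1462834800, 150], [1498863600, 300], [1540681200, 150]]
```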
Note
If you wanted to sort the data using Python lists rather than NumPy arrays, the vanilla-Python equivalent of np.column_stack is zip:
z = zip([10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000])
z
# <zip at 0x7f6ef80ce8c8>
# `zip` creates an iterator, which is ready to give us our entries.
# Iterators can only be walked once, unlike lists.
list(z)
# [(10, 100, 1000), (20, 200, 2000), (30, 300, 3000), (40, 400, 4000)]
The entries are (immutable) tuples. If you need the entries to be editable, map list over them:
z = zip([10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000])
li = list(map(list, z))
# [[10, 100, 1000], [20, 200, 2000], [30, 300, 3000], [40, 400, 4000]]
To transpose a matrix, use zip(*matrix):
def transpose(matrix):
    return list(map(list, zip(*matrix)))
transpose(li)
# [[10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
You can also sort a Python list li in place using li.sort(), or sort any iterable (lists are iterables) into a new list using sorted(li).
Here, I would use (tested):
sorted(zip(unixdate, price))
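Note that sorted(zip(...)) compares whole tuples, so rows with equal timestamps are additionally ordered by price. If only the timestamp should matter, a key does that. The sketch below uses operator.itemgetter from the standard library and also shows how to get back to lists; the variable names are illustrative.

```python
from operator import itemgetter

unixdate = [1422403200, 1462834800, 1458000000, 1540681200, 1498863600,
            1540771200, 1540771200, 1540771200, 1540771200, 1540771200]
price = [100, 150, 25, 150, 300, 100, 100, 100, 100, 100]

# Sort (timestamp, price) pairs by timestamp only; equal timestamps keep their input order.
pairs = sorted(zip(unixdate, price), key=itemgetter(0))

# Back to editable [timestamp, price] rows ...
rows = [list(pair) for pair in pairs]
# ... or back to two parallel lists.
sorted_dates, sorted_prices = map(list, zip(*pairs))

print(rows[:3])       # [[1422403200, 100], [1458000000, 25], [1462834800, 150]]
print(sorted_prices)  # [100, 25, 150, 300, 150, 100, 100, 100, 100, 100]
```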
What is the best way to do this? I'm looking to take the difference, but not in this horrible way. Each of A, B and C is subtracted from subtract_from, element by element.
A = [500, 500, 500, 500, 5000]
B = [100, 100, 540, 550, 1200]
C = [540, 300, 300, 100, 10]
triples= [tuple(A),tuple(B), tuple(C)]
subtract_from = tuple([1234,4321,1234,4321,5555])
diff = []
for main in subtract_from:
    for i in range(len(triples)):
        for t in triples[i]:
            diff[i].append(main-t)
Try something like this:
all_lists = [A, B, C]
[[i-j for i,j in zip(subtract_from,l)] for l in all_lists]
[
[734, 3821, 734, 3821, 555],
[1134, 4221, 694, 3771, 4355],
[694, 4021, 934, 4221, 5545]
]
This is the idiomatic way to do it: no need to import any library, just use builtins.
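One caveat with the zip-based versions: zip stops at the shortest input, so if one list is accidentally shorter, values are silently dropped. A sketch of a hypothetical helper, subtract_each, that checks lengths first (on Python 3.10+, zip(..., strict=True) is a built-in alternative):

```python
def subtract_each(base, vectors):
    """Hypothetical helper: subtract each vector from base, position by position."""
    result = []
    for vec in vectors:
        if len(vec) != len(base):
            # zip would silently stop at the shorter input, so fail loudly instead.
            raise ValueError(f"length mismatch: {len(vec)} != {len(base)}")
        result.append([b - v for b, v in zip(base, vec)])
    return result


A = [500, 500, 500, 500, 5000]
B = [100, 100, 540, 550, 1200]
C = [540, 300, 300, 100, 10]
subtract_from = [1234, 4321, 1234, 4321, 5555]

print(subtract_each(subtract_from, [A, B, C]))
# [[734, 3821, 734, 3821, 555], [1134, 4221, 694, 3771, 4355], [694, 4021, 934, 4221, 5545]]
```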
You could try using map and operator:
import operator
A = [500, 500, 500, 500, 5000]
B = [100, 100, 540, 550, 1200]
C = [540, 300, 300, 100, 10]
l = [A, B, C]
subtract_from = [1234,4321,1234,4321,5555]
diff = list((list(map(operator.sub, subtract_from , i)) for i in l))
print(diff)
# [[734, 3821, 734, 3821, 555], [1134, 4221, 694, 3771, 4355], [694, 4021, 934, 4221, 5545]]
First of all, if you want tuples, use tuples explicitly without converting lists. That being said, you should write something like this:
a = 500, 500, 500, 500, 5000
b = 100, 100, 540, 550, 1200
c = 540, 300, 300, 100, 10
vectors = a, b, c
data = 1234, 4321, 1234, 4321, 5555
diff = [
[de - ve for de, ve in zip(data, vec)]
for vec in vectors
]
If you want list of tuples, use tuple(de - ve for de, ve in zip(data, vec)) instead of [de - ve for de, ve in zip(data, vec)].
Everyone else has already nailed it with list comprehensions, so here's an odd alternative: if you are using mutable lists and reusing them in an imperative style is acceptable, you can update them in place:
A = [500, 500, 500, 500, 5000]
B = [100, 100, 540, 550, 1200]
C = [540, 300, 300, 100, 10]
subtract_from = (1234,4321,1234,4321,5555)
for i, x in enumerate(subtract_from):
    A[i], B[i], C[i] = x-A[i], x-B[i], x-C[i]
# also with map
#for i, x in enumerate(zip(subtract_from, A, B, C)):
#    A[i], B[i], C[i] = map(x[0].__sub__, x[1:])
diff = [A,B,C]
It's less elegant, but it may be more efficient because it updates the existing lists instead of building new ones (I have not benchmarked this claim).
This is my code:
attackUp = [10, 15,10, 15,10, 15]
defenceUp = [10, 15,10, 15,10, 15]
magicUp = [10, 15,10, 15,10, 15]
attType = [1,1,1,1,1,1]
weightDown = [10, 15,10, 15,10, 15]
# Accessory data
accAttackSword = [100, 100,100, 100,100, 100]
accAttackSaber = [100, 100,100, 100,100, 100]
accAttackAx = [100, 100,100, 100,100, 100]
accAttackHammer = [100, 100,100, 100,100, 100]
accAttackSpear = [100, 100,100, 100,100, 100]
accAttackFight = [100, 100,100, 100,100, 100]
accAttackBow = [100, 100,100, 100,100, 100]
accAttackMagicGun = [100, 100,100, 100,100, 100]
accAttackMagic = [100, 100,100, 100,100, 100]
mStrInstrument = [100, 100,100, 100,100, 100]
mStrCharms = [100, 100,100, 100,100, 100]
accDefencePhy = [100, 100,100, 100,100, 100]
accDefenceMag = [100, 100,100, 100,100, 100]
accWeight = [100, 90, 0, 0, 100, 90]
# Tactics book data
bookTurn = [1,1]
bookAttackPhy = [100, 100]
bookAttackMag = [100, 100]
bookStrInstrument = [100, 100]
bookStrCharms = [100, 100]
bookDefencePhy = [100, 100]
bookDefenceMag = [100, 100]
bookWeight = [100, 100]
As you can see, many variables have the same value, but I can't define them like this:
bookAttackPhy = bookAttackMag = bookStrInstrument = bookStrCharms = bookDefencePhy = [100, 100]
because they would all refer to the same list, so changing one of them changes all of them.
What is the best and easiest way to define these variables?
Well, a step in the right direction would be to create a base list and then copy it using slice notation:
base = [100, 100, 100, 100]
value_a = base[:]
value_b = base[:]
and so on. This doesn't gain you much for the shorter lists, but it should be useful for the longer ones at least.
But more generally, I think a richer data structure would be better for something like this. Why not create a class? You could then use setattr to fill in instance attributes in a fairly straightforward way.
class Weapons(object):
    def __init__(self, base):
        for weapon in ["saber", "sword", "axe"]:
            setattr(self, weapon, base[:])  # each attribute gets its own copy
w = Weapons([100, 100, 100])
print(w.__dict__)
#output: {'saber': [100, 100, 100],
#         'sword': [100, 100, 100],
#         'axe': [100, 100, 100]}
w.axe[0] = 10
print(w.axe)  # output: [10, 100, 100]
print(w.sword)  # output: [100, 100, 100]
Define them all as empty lists, then group the ones that need the same values into a list and iterate through that list, extending each one with the common values.
You could do something like:
defaultAttack = [100, 100,100, 100,100, 100]
accAttackSword = list(defaultAttack)
accAttackSaber = list(defaultAttack)
The list() constructor makes a copy of the list, so they will be able to change independently.
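The difference between chained assignment and copying comes down to object identity. A small check (the names are illustrative): chained assignment binds several names to one list, while slicing, list() and .copy() each build a new list. These are shallow copies, which is fine here because the elements are integers.

```python
defaultAttack = [100, 100, 100, 100, 100, 100]

# Chained assignment binds all three names to the SAME list object.
alias_a = alias_b = defaultAttack
alias_a[0] = 1
print(alias_b[0])          # 1: the change shows up through the other name
print(alias_a is alias_b)  # True

defaultAttack[0] = 100     # undo the change

# Each of these builds a new list, so the copies change independently.
copy_slice = defaultAttack[:]
copy_list = list(defaultAttack)
copy_method = defaultAttack.copy()
copy_slice[0] = 5
print(copy_list[0], copy_method[0], defaultAttack[0])  # 100 100 100
```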
You can use list multiplication:
accAttackSword = [100]*6
....
bookWeight = [100]*2
....
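[100]*6 is safe because the repeated elements are integers, which can't be changed in place. The same trick does not carry over one level up: multiplying a list that contains a list repeats references to one inner list. A quick demonstration:

```python
# Safe: the repeated element is an int, which is immutable.
accAttackSword = [100] * 6
accAttackSword[0] = 1
print(accAttackSword)  # [1, 100, 100, 100, 100, 100]

# Pitfall: multiplying a list of lists repeats references to ONE inner list.
shared = [[100] * 2] * 3
shared[0][0] = 1
print(shared)  # [[1, 100], [1, 100], [1, 100]]

# A comprehension builds a new inner list on every iteration.
independent = [[100] * 2 for _ in range(3)]
independent[0][0] = 1
print(independent)  # [[1, 100], [100, 100], [100, 100]]
```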
You might consider grouping all of the variables with similar prefixes either into dictionaries or nested lists (EDIT: or classes/objects). This could help with organization later, and would allow you to iterate through them and set them all to the same initial values.
bookVars = ['AttackPhy', 'AttackMag', 'StrInstrument', 'StrCharms']
bookValues = dict()
for i in bookVars:
    bookValues[i] = [100]*2
And to access...
bookValues
{'AttackPhy': [100, 100], 'AttackMag': [100, 100], 'StrInstrument': [100, 100], 'StrCharms': [100, 100]}
bookValues['AttackMag']
[100, 100]
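The loop above can also be written as a dict comprehension. One tempting shortcut, dict.fromkeys, brings back the original problem, because every key gets the same list object:

```python
bookVars = ['AttackPhy', 'AttackMag', 'StrInstrument', 'StrCharms']

# A dict comprehension evaluates [100] * 2 once per key, so each value is its own list.
bookValues = {name: [100] * 2 for name in bookVars}
bookValues['AttackMag'][0] = 50
print(bookValues['AttackPhy'])  # [100, 100]

# dict.fromkeys stores the SAME value object under every key.
shared_book = dict.fromkeys(bookVars, [100, 100])
shared_book['AttackMag'][0] = 50
print(shared_book['AttackPhy'])  # [50, 100]
```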
EDIT: check out senderle's answer too. At a glance his seems a little better, but I'd definitely consider using one of our ideas; the point is to add a bit more structure. Whenever you have groups of variables with similarly prefixed names, consider grouping them together in a more meaningful way. You are already doing so in your mind, so make the code follow!