I'm trying to convert a numpy array into a new array by using each value in the existing array and finding its corresponding key from a dictionary. The new array should consist of the corresponding dictionary keys.
Here is what I have:
# dictionary where values are lists
available_weights = {0.009174311926605505: [7, 14, 21, 25, 31, 32, 35, 45, 52, 82, 83, 96, 112, 119, 142], 0.009523809523809525: [33, 37, 43, 44, 69, 73, 75, 78, 79, 80, 102, 104, 110, 115, 150], 0.1111111111111111: [91], 0.019230769230769232: [36, 50, 127, 139], 0.010869565217391304: [10, 48, 55, 62, 77, 88, 103, 124, 131, 137, 147], 0.014084507042253521: [2, 3, 4, 22, 27, 30, 41, 53, 87, 122, 123, 132, 143], 0.011494252873563218: [20, 34, 99, 125, 135, 138, 141], 0.045454545454545456: [0, 109], 0.01818181818181818: [49, 64, 72, 90, 146, 148], 0.07142857142857142: [106], 0.01282051282051282: [16, 63, 68, 98, 114, 130, 145], 0.010638297872340425: [8, 28, 40, 57, 61, 66, 71, 74, 76, 84, 85, 86, 128, 144], 0.02040816326530612: [6, 65], 0.021739130434782608: [29, 67, 92, 93], 0.02127659574468085: [47, 118, 120], 0.011111111111111112: [1, 13, 19, 24, 42, 54, 70, 89, 94, 107, 117, 126, 129, 140], 0.015625: [38, 60, 101, 133, 134, 136], 0.03333333333333333: [56, 58, 97, 121], 0.016666666666666666: [5, 26, 105, 113], 0.014705882352941176: [17, 46, 95]}
# existing numpy array
train_idx = [134, 45, 137, 140, 79, 98, 128, 80, 99, 71, 145, 35, 94, 122, 77, 23, 113, 44, 68, 21, 20, 125, 74, 139, 29, 109, 25, 34, 6, 81, 22, 114, 12, 95, 150, 106, 84, 19, 58, 59, 88, 143, 136, 43, 72, 132, 117, 13, 65, 111, 39, 14, 56, 11, 26, 90, 119, 112, 27, 57, 46, 147, 123, 16, 36, 100, 141, 38, 62, 32, 75, 146, 89, 37, 31, 40, 64, 87, 3, 103, 102, 104, 78, 53, 1, 142, 47, 130, 105, 4, 93, 52, 42, 10, 9, 115, 76, 54, 49, 116, 69, 5, 86, 66, 101, 107, 96, 110, 8, 73, 121, 138, 67, 124, 108, 97, 120, 2, 148, 127, 135, 18, 149, 82, 41, 144, 129, 118, 51, 126, 33, 85, 24, 0, 61, 92, 70, 15, 17, 50, 83, 30, 28, 91, 60, 48, 133, 55, 63, 7, 131]
So I want to use each value in train_idx to find the corresponding dictionary key in available_weights. The expected output should look like this, with one entry for each value in train_idx:
new_array = [0.015625, 0.009174311926605505, 0.010869565217391304, ... ,0.01282051282051282, 0.009174311926605505, 0.010869565217391304]
Any help would be appreciated!
result = []
flipped = dict()
for value in train_idx:
    flipped[value] = []
    # scan every key's list to find which weight this index belongs to
    for key in available_weights:
        if value in available_weights[key]:
            flipped[value].append(key)
            result.append(key)
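Scanning the whole dictionary for every element is O(len(train_idx) * len(available_weights)). A sketch of a faster alternative (not from the original post; the small dictionary and the idx_to_weight name here are made up for illustration) is to invert the mapping once so each index points directly at its weight:

```python
import numpy as np

# Hypothetical, much smaller stand-in for the available_weights dictionary
available_weights = {0.5: [0, 2], 0.25: [1, 3]}
train_idx = np.array([3, 0, 1, 2])

# Invert once: index -> weight (assumes each index appears under exactly one key)
idx_to_weight = {idx: w for w, idxs in available_weights.items() for idx in idxs}

# Each lookup is now O(1)
new_array = np.array([idx_to_weight[i] for i in train_idx])
print(new_array)  # [0.25 0.5  0.25 0.5 ]
```

Note that this assumes every value in train_idx actually appears in one of the dictionary's lists; any missing index would raise a KeyError rather than being silently skipped as in the loop above.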
I need to plot a QQ graph with the following information:
spcs2k = np.array([[ 49, 524, 16, 87, 157, 58, 4, 41, 110, 90, 2, 41, 136,
495, 249, 40, 48, 3, 72, 294, 49, 28, 163, 61, 89, 2,
168, 286, 23, 67, 19, 11, 63, 4, 246, 130, 2, 378, 176,
251, 78, 138, 97, 34, 33, 183, 12, 209, 82, 87, 9, 33,
19, 77, 54, 28, 59, 88, 202, 12, 53, 86, 146, 26, 112,
176, 35, 94, 180, 93, 8, 32, 26, 5, 145, 13, 5, 138,
205, 42, 17, 134, 19, 54, 133, 134, 10, 173, 3, 59, 223,
109, 175, 266, 314, 68, 283, 71, 77, 147, 32, 70, 131, 112,
32, 29, 19, 28, 85, 25, 57, 16, 130, 157, 13, 167, 29,
2, 442, 10, 150, 185, 95, 57, 63, 150, 41, 22, 72, 59,
2, 8, 5, 156, 51, 161, 243, 152, 289, 93, 34, 140, 74,
34, 37, 9, 121, 138, 94, 67, 65, 202, 67, 13, 240, 209,
2, 296, 6, 61, 2, 134, 196]])
import statsmodels.api as sm
import scipy.stats as stats
from matplotlib import pyplot as plt
fig = sm.qqplot(spcs2k, stats.expon, line="45")
plt.show()
but I get this (see my first attached figure), while the idea is to get a similar graph like this (second attached figure).
Thanks for supporting me!
Anyone who has this problem should pass fit=True to the sm.qqplot method. This auto-estimates parameters such as loc, scale, and distargs. See the docs here: https://www.statsmodels.org/dev/generated/statsmodels.graphics.gofplots.qqplot.html
The code works fine; it does what it should. A QQ plot shows whether the data you pass to it follows the reference distribution (by default, the normal distribution). In your case this means the values in spcs2k are not even vaguely normally distributed.
If you run the code below, you can see what the plot looks like on a dataset drawn from a normal distribution.
import numpy as np
import statsmodels.api as sm
from matplotlib import pyplot as plt

# 1000 samples drawn from a standard normal distribution
data = np.random.normal(0, 1, 1000)
fig = sm.qqplot(data, line='45')
plt.show()
I have the following box plot, which plots some values with a different mean and median for each box. Is there any way to label the mean and median so that they appear in the graph legend? (The current box plot draws an orange line for the median and a blue dot for the mean, and it is not clear which is which.) Also, is there a way to make one legend for all of these subplots instead of a legend for each one, since they are essentially the same objects, just with different data?
Here's a code example for one of the subplots; the other subplots are the same but use different data:
fig = plt.figure()
xlim = (4, 24)
ylim = (0, 3700)
plt.subplot(1,5,5)
x_5_diff = {5: [200, 200, 291, 200, 291, 200, 291, 200, 291, 200, 291, 200, 291, 200, 291],
7: [161, 161, 179, 161, 179, 161, 179, 161, 179, 161, 179, 161, 179, 161, 179],
9: [205, 205, 109, 205, 109, 205, 109, 205, 109, 205, 109, 205, 109, 205, 109],
11: [169, 169, 95, 169, 95, 169, 95, 169, 95, 169, 95, 169, 95, 169, 95],
13: [43, 43, 70, 43, 70, 43, 70, 43, 70, 43, 70, 43, 70, 43, 70],
15: [33, 33, 39, 33, 39, 33, 39, 33, 39, 33, 39, 33, 39, 33, 39],
17: [23, 23, 126, 23, 126, 23, 126, 23, 126, 23, 126, 23, 126, 23, 126],
19: [17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17],
21: [15, 15, 120, 15, 120, 15, 120, 15, 120, 15, 120, 15, 120, 15, 120],
23: [63, 63, 25, 63, 25, 63, 25, 63, 25, 63, 25, 63, 25, 63, 25]}
keys = sorted(x_5_diff)
plt.boxplot([x_5_diff[k] for k in keys], positions=keys) # box-and-whisker plot
plt.hlines(y = 1600, colors= 'r', xmin = 5, xmax = 23, label = "Level 1 Completed")
plt.title("x = 5 enemies")
plt.ylim(0,3700)
plt.plot(keys, [sum(x_5_diff[k]) / len(x_5_diff[k]) for k in keys], '-o')
plt.legend()
plt.show()
Any help would be appreciated.
It's a bit late, but try this:
bp = plt.boxplot([x_5_diff[k] for k in keys], positions=keys, showmeans=True)
# Box-plot items can be accessed through the returned dictionary;
# showmeans=True is required, otherwise bp['means'] is an empty list
plt.legend([bp['medians'][0], bp['means'][0]], ['median', 'mean'])
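For the second part of the question (one legend instead of one per subplot), a figure-level legend can be built once from any subplot's artists. A minimal sketch with made-up data:

```python
import matplotlib.pyplot as plt

# Two made-up datasets standing in for the real subplot data
data_sets = [{1: [1, 2, 3], 2: [2, 3, 4]}, {1: [3, 4, 5], 2: [1, 5, 6]}]

fig, axes = plt.subplots(1, len(data_sets), sharey=True)
for ax, d in zip(axes, data_sets):
    keys = sorted(d)
    # showmeans=True makes bp['means'] non-empty so it can be labelled
    bp = ax.boxplot([d[k] for k in keys], positions=keys, showmeans=True)

# One figure-level legend built from the last subplot's artists
fig.legend([bp['medians'][0], bp['means'][0]], ['median', 'mean'],
           loc='upper right')
plt.show()
```

Since every subplot draws the same kinds of artists, reusing the handles from one of them is enough; fig.legend places a single legend on the figure rather than on each axes.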
Store the mean values in a separate vector and loop over the vectors to plot.
(I will try to give an implementation as soon as I have my laptop.)
I have a CSV data file, 100 columns * 100,000 rows, plus one header.
First, I want to make a list containing the 1st, 3rd, and 5th to 100,000th columns of the original CSV data file.
In that case, I think I can use a script like the one below.
#Load data
xy = np.loadtxt('CSV data.csv', delimiter=',', skiprows=1)
x = xy[:,[1,3,5,6,7,8,9,10,11 .......,100000]]
But, as you know, that is not a good method: it is tedious to type and it does not generalize.
I then thought the script below could be used, but it fails (a slice like 5:100000 is not valid inside a plain index list):
x = xy[:,[1,3,5:100000]]
How can I select specific columns, mixing individual indices with continuous ranges?
np.r_ is a convenience object (it is indexed with [] rather than called) that generates an array of indices:
In [76]: np.r_[1,3,5:100]
Out[76]:
array([ 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
This should be usable for both xy[:,np.r_[...]] and the usecols parameter.
In [78]: np.arange(300).reshape(3,100)[:,np.r_[1,3,5:100:10]]
Out[78]:
array([[ 1, 3, 5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
[101, 103, 105, 115, 125, 135, 145, 155, 165, 175, 185, 195],
[201, 203, 205, 215, 225, 235, 245, 255, 265, 275, 285, 295]])
Alternatively, just use the usecols parameter in np.loadtxt():
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.loadtxt.html
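For example, usecols accepts any sequence of column indices, including one built with np.r_ as shown above. A sketch with a small made-up CSV (the real file path and sizes are assumptions replaced by an in-memory string here):

```python
import io
import numpy as np

# Hypothetical 8-column CSV with one header row, standing in for the real file
csv_text = "h0,h1,h2,h3,h4,h5,h6,h7\n" + "\n".join(
    ",".join(str(r * 8 + c) for c in range(8)) for r in range(3)
)

# Select columns 1, 3, and 5..7 directly at load time
x = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1,
               usecols=np.r_[1, 3, 5:8])
print(x.shape)  # (3, 5)
```

This avoids loading the unwanted columns in the first place, which matters for a file as wide as the one in the question.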
Another option is to define x by removing columns from xy:
x = np.delete(xy, [0,2,4], axis=1)
Using two Pandas series, series1 and series2, I want to make series3.
Each value of series1 is a list, and each value of series2 is a starting index into the corresponding list in series1.
>>> print(series1)
0 [481, 12, 11, 220, 24, 24, 645, 153, 15, 13, 6...
1 [64, 80, 79, 147, 14, 20, 56, 288, 12, 208, 26...
4 [5, 6, 152, 31, 295, 127, 711, 5, 271, 291, 11...
5 [363, 121, 727, 249, 483, 122, 241, 494, 555]
7 [112, 20, 41, 9, 104, 131, 26, 298, 65, 214, 1...
9 [129, 797, 19, 151, 448, 47, 19, 106, 299, 144...
11 [72, 35, 25, 200, 122, 5, 75, 30, 208, 24, 14,...
18 [137, 339, 71, 14, 19, 54, 61, 15, 73, 104, 43...
>>> print(series2)
0 0
1 3
4 1
5 6
7 4
9 5
11 7
18 2
What I expect:
>>> print(series3)
0 [481, 12, 11, 220, 24, 24, 645, 153, 15, 13, 6...
1 [147, 14, 20, 56, 288, 12, 208, 26...
4 [6, 152, 31, 295, 127, 711, 5, 271, 291, 11...
5 [241, 494, 555]
7 [104, 131, 26, 298, 65, 214, 1...
9 [47, 19, 106, 299, 144...
11 [30, 208, 24, 14,...
18 [71, 14, 19, 54, 61, 15, 73, 104, 43...
My solution 1:
Since series1 and series2 have equal length, I could loop over them, compute something like series1.iloc[i][series2.iloc[i]:], and save the results into a new series (series3).
My solution 2:
Generate a DataFrame df using df = pd.concat([series1, series2], axis=1), and make a new column with a row-wise apply, e.g. df['series3'] = df.apply(lambda x: subList(x), axis=1).
However, I think the two solutions above are not the sharpest ways to achieve what I want. I would appreciate it if you could suggest neater solutions!
If you are hoping to avoid creating an intermediate pd.DataFrame, and simply want a new pd.Series, you can use the pd.Series constructor on a map object. So given:
In [6]: S1
Out[6]:
0 [481, 12, 11, 220, 24, 24, 645, 153, 15, 13, 6]
1 [64, 80, 79, 147, 14, 20, 56, 288, 12, 208, 26]
2 [5, 6, 152, 31, 295, 127, 711, 5, 271, 291, 11]
3 [363, 121, 727, 249, 483, 122, 241, 494, 555]
4 [112, 20, 41, 9, 104, 131, 26, 298, 65, 214, 1]
5 [129, 797, 19, 151, 448, 47, 19, 106, 299, 144]
6 [72, 35, 25, 200, 122, 5, 75, 30, 208, 24, 14]
7 [137, 339, 71, 14, 19, 54, 61, 15, 73, 104, 43]
dtype: object
In [7]: S2
Out[7]:
0 0
1 3
2 1
3 6
4 4
5 5
6 7
7 2
dtype: int64
You can do:
In [8]: pd.Series(map(lambda x,y : x[y:], S1, S2), index=S1.index)
Out[8]:
0 [481, 12, 11, 220, 24, 24, 645, 153, 15, 13, 6]
1 [147, 14, 20, 56, 288, 12, 208, 26]
2 [6, 152, 31, 295, 127, 711, 5, 271, 291, 11]
3 [241, 494, 555]
4 [104, 131, 26, 298, 65, 214, 1]
5 [47, 19, 106, 299, 144]
6 [30, 208, 24, 14]
7 [71, 14, 19, 54, 61, 15, 73, 104, 43]
dtype: object
If you want to modify S1 without creating an intermediate container, you can use a for-loop:
In [10]: for i, x in enumerate(map(lambda x,y : x[y:], S1, S2)):
...: S1.iloc[i] = x
...:
In [11]: S1
Out[11]:
0 [481, 12, 11, 220, 24, 24, 645, 153, 15, 13, 6]
1 [147, 14, 20, 56, 288, 12, 208, 26]
2 [6, 152, 31, 295, 127, 711, 5, 271, 291, 11]
3 [241, 494, 555]
4 [104, 131, 26, 298, 65, 214, 1]
5 [47, 19, 106, 299, 144]
6 [30, 208, 24, 14]
7 [71, 14, 19, 54, 61, 15, 73, 104, 43]
dtype: object
You can concatenate the series, specifying which axis (0 = rows, 1 = columns); they should have the same length:
series3=pd.concat([series2, series1], axis=1).reset_index()
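Completing this idea (a sketch with tiny stand-ins for series1 and series2; the column names 'start' and 'lst' are made up here), the slicing can then be done row-wise, which is essentially the asker's solution 2:

```python
import pandas as pd

# Tiny stand-ins for series1 (lists) and series2 (starting indices)
series1 = pd.Series([[481, 12, 11], [64, 80, 79, 147]])
series2 = pd.Series([1, 3])

df = pd.concat([series2, series1], axis=1, keys=['start', 'lst']).reset_index()
# Row-wise slice: keep each list from its 'start' index onward
df['series3'] = df.apply(lambda row: row['lst'][row['start']:], axis=1)
print(df['series3'].tolist())  # [[12, 11], [147]]
```

Note that this builds the intermediate DataFrame the accepted map-based answer avoids, so it is mainly useful when you want the other columns around anyway.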