Subtracting a List from a Dictionary - python

I am trying to subtract a list of values from the array stored under each key of a dictionary. Each key in the dictionary holds 20 y-values for a predicted line, and I want the difference between those y-values and a separate set of given values.
ydata contains 20 points. ycalc has 100 keys, named L0 to L99, and each key holds 20 points as well. I want to subtract each key's values from ydata. This is what I have tried; the main issue is that my method returns a single list of 20 values, when I expect 100 results, each of which is a list of 20 points.
ydata = [1.2, 1.8, 1.7, 3.0, 3.5, 3.2, 4.5, 4.8, 5.3, 6.2, 5.7, 6.8, 7.0, 7.8, 8.5, 8.6, 9.1, 11.5, 10.3, 10.8]
ycalc = 'L0': array([-0.8, -0.6, -0.4, -0.2, 0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2,
1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. ]), 'L1': array([-0.57777778, -0.37777778, -0.17777778, 0.02222222, 0.22222222,
0.42222222, 0.62222222, 0.82222222, 1.02222222, 1.22222222,
1.42222222, 1.62222222, 1.82222222, 2.02222222, 2.22222222,
2.42222222, 2.62222222, 2.82222222, 3.02222222, 3.22222222]), 'L2': array([-0.35555556, -0.15555556, 0.04444444, 0.24444444, 0.44444444,
0.64444444, 0.84444444, 1.04444444, 1.24444444, 1.44444444,
1.64444444, 1.84444444, 2.04444444, 2.24444444, 2.44444444,
2.64444444, 2.84444444, 3.04444444, 3.24444444, 3.44444444]), 'L3': array([-0.13333333, 0.06666667, 0.26666667, 0.46666667, 0.66666667,
0.86666667, 1.06666667, 1.26666667, 1.46666667, 1.66666667,
1.86666667, 2.06666667, 2.26666667, 2.46666667, 2.66666667,
2.86666667, 3.06666667, 3.26666667, 3.46666667, 3.66666667]), 'L4': array([0.08888889, 0.28888889, 0.48888889, 0.68888889, 0.88888889,
1.08888889, 1.28888889, 1.48888889, 1.68888889, 1.88888889,
2.08888889, 2.28888889, 2.48888889, 2.68888889, 2.88888889,
3.08888889, 3.28888889, 3.48888889, 3.68888889, 3.88888889]), etc.
for i in ycalc:
    ydiff = - i + array(ydata)
print(ydiff)
returns [-0.2 0. -0.5 0.4 0.5 -0.2 0.7 0.6 0.7 1.2 0.3 1. 0.8 1.2
1.5 1.2 1.3 3.3 1.7 1.8]
but I want something like this:
([-0.2 0. -0.5 0.4 0.5 -0.2 0.7 0.6 0.7 1.2 0.3 1. 0.8 1.2 1.5 1.2 1.3 3.3 1.7 1.8]), ([-0.3 0.1 -0.6 0.4 0.5 -0.2 0.2 0.6 0.8 1.2 0.5 1. 0.8 1.2 1.5 1.2 1.3 3.3 1.7 1.8]), etc.
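One way to get a separate difference for every key, sketched under the assumption that ycalc is a dict mapping names to NumPy arrays of length 20, is to build a new dictionary instead of overwriting ydiff on every pass. The ycalc below is a made-up stand-in that only imitates the values shown above.

```python
import numpy as np

ydata = np.array([1.2, 1.8, 1.7, 3.0, 3.5, 3.2, 4.5, 4.8, 5.3, 6.2,
                  5.7, 6.8, 7.0, 7.8, 8.5, 8.6, 9.1, 11.5, 10.3, 10.8])

# Hypothetical stand-in for the real ycalc: 100 lines with a step of 0.2
# between points and intercepts starting at -0.8, keyed 'L0' .. 'L99'.
ycalc = {f"L{k}": -0.8 + k * (2 / 9) + 0.2 * np.arange(20) for k in range(100)}

# Iterate over (key, array) pairs. Iterating over the dict alone yields only
# the key strings, which cannot be subtracted from an array.
ydiff = {key: ydata - values for key, values in ycalc.items()}

print(len(ydiff))        # number of keys in the result
print(ydiff["L0"][:3])   # first three differences for 'L0'
```

If a list of 100 arrays is preferred over a dictionary, list(ydiff.values()) should give that.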

Related

Python: multiplication with RollingGroupby

I have a dataframe as follows
id        return1  return2
weekday1  0.1      0.2
weekday1  0.2      0.4
weekday1  0.3      0.5
weekday2  0.4      0.7
weekday2  0.5      0.6
weekday2  0.6      0.1
I know how to do the rolling-groupby-sum, which is
(df.groupby(df.index.dayofweek)  # originally the index is a time series
   .rolling(52).sum()
   .droplevel(level=0).sort_index())
Now I need to add 1 to all the elements first and then multiply those in the same group as follows.
Step 1 - add 1:
id        return1  return2
weekday1  1.1      1.2
weekday1  1.2      1.4
weekday1  1.3      1.5
weekday2  1.4      1.7
weekday2  1.5      1.6
weekday2  1.6      1.1
Step 2 - multiply by group:
id        return1      return2
weekday1  1.1×1.2×1.3  1.2×1.4×1.5
weekday2  1.4×1.5×1.6  1.7×1.6×1.1
I use the following code
(df.transform(lambda x: x + 1).groupby(df.index.dayofweek)
   .rolling(52).mul()
   .droplevel(level=0).sort_index())
but it raises AttributeError: 'RollingGroupby' object has no attribute 'mul'.
cumprod() doesn't work either. Perhaps this is because the rolling object has no such method as rolling.cumprod() or rolling.mul().
Is there a better way to do a rolling multiplication within each group?
Use numpy.prod in Rolling.apply:
df.add(1).groupby(df.index.dayofweek).rolling(52).apply(np.prod)
By the way, judging by the expected output, you seem to need GroupBy.prod:
df.add(1).groupby(df.index).prod()
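A small sketch of both suggestions on made-up data may help compare them. The dates, the window of 3 (instead of 52), and grouping the whole-group product by weekday are assumptions chosen so the numbers match the tables in the question.

```python
import numpy as np
import pandas as pd

# Made-up dates: three Mondays and three Tuesdays, in time order.
index = pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-11",
                        "2021-01-12", "2021-01-18", "2021-01-19"])
df = pd.DataFrame({"return1": [0.1, 0.4, 0.2, 0.5, 0.3, 0.6],
                   "return2": [0.2, 0.7, 0.4, 0.6, 0.5, 0.1]}, index=index)

# Rolling product of (1 + return) over the last 3 rows of each weekday group.
rolling_prod = (df.add(1)
                  .groupby(df.index.dayofweek)
                  .rolling(3).apply(np.prod, raw=True)
                  .droplevel(level=0).sort_index())

# Product over each whole weekday group, without a window.
group_prod = df.add(1).groupby(df.index.dayofweek).prod()

print(rolling_prod)
print(group_prod)
```

The rolling version leaves NaN in rows where a group has fewer than 3 observations so far; the grouped version returns one row per weekday.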

how to generate a list within a list delimited by a space

How do I replicate the structure of the result of itertools.product?
As you know, itertools.product returns an iterator, so its items need to be put in a list before they can be printed, something like this:
import itertools
import numpy as np
CN = np.asarray(list(itertools.product([0, 1], repeat=5)))
print(CN)
I want to build something like that, but with the data coming from a CSV file, so I want something like this:
#PSEUDOCODE
import pandas as pd
df = pd.read_csv('csv here')
#a b c d are the columns that i want to get
x = list(df['a'] df['b'] df['c'] df['d'])
print(x)
so the result will be something like this
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]]
How can I do that?
EDIT:
I am trying to learn how to do recursive feature elimination, and some code I found through Google uses the iris data set:
from sklearn import datasets
dataset = datasets.load_iris()
x = dataset.data
print(x)
and when printed it looked something like this
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]]
How could I shape my dataset like that so I can use this RFE template?
# Recursive Feature Elimination
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
# load the iris datasets
dataset = datasets.load_iris()
# create a base classifier used to evaluate a subset of attributes
model = LogisticRegression()
# create the RFE model and select 3 attributes
rfe = RFE(model, n_features_to_select=3)
print(rfe)
rfe = rfe.fit(dataset.data, dataset.target)
print("features:",dataset.data)
print("target:",dataset.target)
print(rfe)
# summarize the selection of the attributes
print(rfe.support_)
print(rfe.ranking_)
You don't have to. If you want to use the rfe.fit function, you need to feed it the features and the target separately.
So if your df looks like:
     a    b    c    d  target
0  5.1  3.5  1.4  0.2       1
1  4.9  3.0  1.4  0.2       1
2  4.7  3.2  1.3  0.2       0
3  4.6  3.1  1.5  0.2       0
4  5.0  3.6  1.4  0.2       1
5  5.4  3.9  1.7  0.4       1
6  4.6  3.4  1.4  0.3       0
7  5.0  3.4  1.5  0.2       0
8  4.4  2.9  1.4  0.2       1
9  4.9  3.1  1.5  0.1       1
you can use:
...
rfe = rfe.fit(df[['a', 'b', 'c', 'd']], df['target'])
...
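If a plain 2-D array like dataset.data is still wanted, a minimal sketch is to select the columns and call to_numpy(). The inline CSV text below is a hypothetical stand-in for the real file, and the column names a, b, c, d, target are taken from the example above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file on disk.
csv_text = """a,b,c,d,target
5.1,3.5,1.4,0.2,1
4.9,3.0,1.4,0.2,1
4.7,3.2,1.3,0.2,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Feature matrix: one row per sample, one column per selected feature.
X = df[["a", "b", "c", "d"]].to_numpy()
# Target vector: one label per sample.
y = df["target"].to_numpy()

print(X)
print(X.shape, y.shape)
```

X and y can then be passed in place of dataset.data and dataset.target.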

Python terminal output width

My Python 3.5.2 output in the terminal (on a Mac) is limited to a width of about 80 characters, even if I increase the size of the terminal window.
This narrow width causes a lot of line breaks when printing long arrays, which is a real hassle. How do I tell Python to use the full width of the terminal window?
For the record, I am not seeing this problem in any other program; for instance, my C++ output looks just fine.
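For numpy, it turns out you can enable the full output by setting
np.set_printoptions(suppress=True,linewidth=np.nan,threshold=np.nan).
Newer NumPy versions raise a ValueError for threshold=np.nan; sys.maxsize can be used there instead.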
In Python 3.7 and above, you can use
from shutil import get_terminal_size
pd.set_option('display.width', get_terminal_size()[0])
I have the same problem while using pandas. So if this is what you are trying to solve, I fixed mine by doing
pd.set_option('display.width', pd.util.terminal.get_terminal_size()[0])
Default output of a 2x15 matrix is broken:
a.T
array([[ 0.2, -1.4, -0.8, 1.3, -1.5, -1.4, 0.6, -1.5, 0.4, -0.9, 0.3,
1.1, 0.5, -0.3, 1.1],
[ 1.3, -1.2, 1.6, -1.4, 0.9, -1.2, -1.9, 0.9, 1.8, -1.8, 1.7,
-1.3, 1.4, -1.7, -1.3]])
Output is fixed using numpy set_printoptions() command
import sys
np.set_printoptions(suppress=True,linewidth=sys.maxsize,threshold=sys.maxsize)
a.T
[[ 0.2 -1.4 -0.8 1.3 -1.5 -1.4 0.6 -1.5 0.4 -0.9 0.3 1.1 0.5 -0.3 1.1]
[ 1.3 -1.2 1.6 -1.4 0.9 -1.2 -1.9 0.9 1.8 -1.8 1.7 -1.3 1.4 -1.7 -1.3]]
System and numpy versions:
sys.version = 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
numpy.__version__ = 1.18.5
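As a middle ground between the default width and sys.maxsize, the two ideas above can be combined so that NumPy wraps at the current terminal width. This is only a sketch; the fallback of 80 columns is an assumption used when no terminal is attached.

```python
import shutil

import numpy as np

# Ask the OS for the terminal size; fall back to 80x24 when it is unknown
# (for example when output is piped to a file).
columns = shutil.get_terminal_size(fallback=(80, 24)).columns

# Wrap array output at the terminal width instead of NumPy's default.
np.set_printoptions(suppress=True, linewidth=columns)

a = np.linspace(-2, 2, 30).reshape(2, 15)
print(a)
```

The width is read once, so the call has to be repeated after resizing the window.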

Matplotlib: need to find source data from class attributes

I have a lines object which was created with the following:
junk = plt.plot([xxxx], [yyyy])
for x in junk:
    print(type(x))
<class 'matplotlib.lines.Line2D'>
I need to find the names of the two lists 'xxxx' and 'yyyy'. How can I get them from the class attributes?
You can use dir to see the contents of an object in Python, or check the docs for the class. I guess the attributes you are looking for are xdata and ydata (although I'm a bit confused: in your post you ask for the names of the lists?)
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 5, 0.1)
y = np.sin(x)
junk = plt.plot(x, y)
for line in junk:
    # print(dir(line))
    print(line.get_xdata())
    print(line.get_ydata())
[ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4
1.5 1.6 1.7 1.8 1.9 2. 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9
3. 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4. 4.1 4.2 4.3 4.4
4.5 4.6 4.7 4.8 4.9]
[ 0. 0.09983342 0.19866933 0.29552021 0.38941834 0.47942554
0.56464247 0.64421769 0.71735609 0.78332691 0.84147098 0.89120736
0.93203909 0.96355819 0.98544973 0.99749499 0.9995736 0.99166481
0.97384763 0.94630009 0.90929743 0.86320937 0.8084964 0.74570521
0.67546318 0.59847214 0.51550137 0.42737988 0.33498815 0.23924933
0.14112001 0.04158066 -0.05837414 -0.15774569 -0.2555411 -0.35078323
-0.44252044 -0.52983614 -0.61185789 -0.68776616 -0.7568025 -0.81827711
-0.87157577 -0.91616594 -0.95160207 -0.97753012 -0.993691 -0.99992326
-0.99616461 -0.98245261]
Hope it helps.

Python: Removing a range of numbers from array list

I'm having issues removing the elements in a range a through b from an array. The solutions I've found online seem to work only for individual elements, adjacent elements, or whole numbers. I'm dealing with floats.
self.genx = np.arange(0, 5, 0.1)
temp_select = self.genx[1:3] #I want to remove numbers from 1 - 3 from genx
print(temp_select)
self.genx = list(set(self.genx)-set(temp_select))
print(self.genx)
plt.plot(self.genx,self.geny)
However, I get the following in the console. I think this is because I'm subtracting floats rather than whole numbers, so it literally subtracts instead of removing, which is what it would do with whole numbers:
genx: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9]
temp_select: [ 0.1 0.2]
genx(after subtracted): [0.0, 0.5, 2.0, 3.0, 4.0, 1.5, 1.0, 1.1000000000000001, 0.70000000000000007, 0.90000000000000002, 2.7000000000000002, 0.30000000000000004, 2.9000000000000004, 1.9000000000000001, 3.3000000000000003, 0.40000000000000002, 4.7000000000000002, 3.4000000000000004, 2.2000000000000002, 2.8000000000000003, 1.4000000000000001, 0.60000000000000009, 3.6000000000000001, 1.3, 1.2000000000000002, 4.2999999999999998, 4.2000000000000002, 4.9000000000000004, 3.9000000000000004, 3.8000000000000003, 2.3000000000000003, 4.8000000000000007, 3.2000000000000002, 1.7000000000000002, 2.5, 3.5, 1.8, 4.1000000000000005, 2.4000000000000004, 4.4000000000000004, 1.6000000000000001, 0.80000000000000004, 2.6000000000000001, 4.6000000000000005, 2.1000000000000001, 3.1000000000000001, 3.7000000000000002, 4.5]
I didn't test this, but you should be able to do something like the following to drop everything strictly between the two bounds:
self.genx = [ item for item in self.genx if not range_min < item < range_max ]
or, to drop the bounds themselves as well:
self.genx = [ item for item in self.genx if not range_min <= item <= range_max ]
Is this what you want?
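Since genx is a NumPy array, the same filter can also be sketched with a boolean mask, which keeps the result as an array and in its original order. The bounds 1.0 and 3.0 are assumed from the comment in the question, and the values exactly at the bounds are subject to floating-point rounding.

```python
import numpy as np

genx = np.arange(0, 5, 0.1)
range_min, range_max = 1.0, 3.0  # assumed bounds of the range to remove

# True for every element that falls inside the closed range.
in_range = (genx >= range_min) & (genx <= range_max)

# Keep only the elements outside the range; order is preserved.
kept = genx[~in_range]
print(kept)
```

If geny is plotted against genx, the same mask would need to be applied to it (geny[~in_range]) so both arrays keep the same length.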
