I am trying to subtract a list of values from each value in a dictionary. Each key in the dictionary maps to 20 y-values for a predicted line, and I want to find the difference between these y-values and a separate set of given values.
ydata contains 20 points. ycalc is a dictionary with 100 keys, 'L0' to 'L99', each holding an array of 20 points. I want to subtract each of these arrays from ydata. This is what I have tried; the main issue is that my method returns a single list of 20 values, when I expect 100 results, each a list of 20 points.
ydata = [1.2, 1.8, 1.7, 3.0, 3.5, 3.2, 4.5, 4.8, 5.3, 6.2, 5.7, 6.8, 7.0, 7.8, 8.5, 8.6, 9.1, 11.5, 10.3, 10.8]
ycalc = {'L0': array([-0.8, -0.6, -0.4, -0.2, 0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2,
1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. ]), 'L1': array([-0.57777778, -0.37777778, -0.17777778, 0.02222222, 0.22222222,
0.42222222, 0.62222222, 0.82222222, 1.02222222, 1.22222222,
1.42222222, 1.62222222, 1.82222222, 2.02222222, 2.22222222,
2.42222222, 2.62222222, 2.82222222, 3.02222222, 3.22222222]), 'L2': array([-0.35555556, -0.15555556, 0.04444444, 0.24444444, 0.44444444,
0.64444444, 0.84444444, 1.04444444, 1.24444444, 1.44444444,
1.64444444, 1.84444444, 2.04444444, 2.24444444, 2.44444444,
2.64444444, 2.84444444, 3.04444444, 3.24444444, 3.44444444]), 'L3': array([-0.13333333, 0.06666667, 0.26666667, 0.46666667, 0.66666667,
0.86666667, 1.06666667, 1.26666667, 1.46666667, 1.66666667,
1.86666667, 2.06666667, 2.26666667, 2.46666667, 2.66666667,
2.86666667, 3.06666667, 3.26666667, 3.46666667, 3.66666667]), 'L4': array([0.08888889, 0.28888889, 0.48888889, 0.68888889, 0.88888889,
1.08888889, 1.28888889, 1.48888889, 1.68888889, 1.88888889,
2.08888889, 2.28888889, 2.48888889, 2.68888889, 2.88888889,
3.08888889, 3.28888889, 3.48888889, 3.68888889, 3.88888889]), etc.
for i in ycalc:
    ydiff = - i + array(ydata)
print(ydiff)
returns [-0.2 0. -0.5 0.4 0.5 -0.2 0.7 0.6 0.7 1.2 0.3 1. 0.8 1.2
1.5 1.2 1.3 3.3 1.7 1.8]
but I want something like this:
([-0.2 0. -0.5 0.4 0.5 -0.2 0.7 0.6 0.7 1.2 0.3 1. 0.8 1.2 1.5 1.2 1.3 3.3 1.7 1.8]), ([-0.3 0.1 -0.6 0.4 0.5 -0.2 0.2 0.6 0.8 1.2 0.5 1. 0.8 1.2 1.5 1.2 1.3 3.3 1.7 1.8]), etc.
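One possible way to get a result per key, sketched under the assumption that ycalc is a plain dict mapping labels to NumPy arrays of the same length as ydata. Iterating over a dictionary yields its keys rather than its arrays, and reassigning ydiff inside the loop keeps only the last result, so the sketch collects every difference instead. The short arrays below are made-up stand-ins for the real data.

```python
import numpy as np

# Made-up stand-in data; the real ydata has 20 points and ycalc has 100 keys.
ydata = np.array([1.2, 1.8, 1.7])
ycalc = {
    "L0": np.array([-0.8, -0.6, -0.4]),
    "L1": np.array([-0.5, -0.3, -0.1]),
}

# One difference array per key: ydata minus the predicted values.
ydiff = {key: ydata - values for key, values in ycalc.items()}

# Alternative: stack all arrays into a 2-D array of shape (n_keys, n_points)
# and let broadcasting subtract every row from ydata at once.
ydiff_matrix = ydata - np.vstack(list(ycalc.values()))

for key, diff in ydiff.items():
    print(key, diff)
```

The dictionary form keeps the 'L0', 'L1', ... labels attached to each result; the stacked form is handier if the differences are going to be summed or squared afterwards.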
I'm currently trying to get into regular expressions. At the moment I want to write one that behaves as follows:
import re
a = '[[0.1 0.1 0.1 0.1]\n [1.2 1.2 1.2 1.2]\n [2.3 2.3 2.3 2.3]\n [3.4 3.4 3.4 3.4]]'
a_transformed = re.sub(regex_expression, '', a)
# a_transformed = '0.1 0.1 0.1 0.1 1.2 1.2 1.2 1.2 2.3 2.3 2.3 2.3 3.4 3.4 3.4 3.4'
Basically I only need to remove all occurrences of \n, [ and ], but currently I'm struggling to get the expression right.
Thanks for the help in advance!
You can try the following:
>>> re.sub(r'[^\d. ]', '', a)
'0.1 0.1 0.1 0.1 1.2 1.2 1.2 1.2 2.3 2.3 2.3 2.3 3.4 3.4 3.4 3.4'
Here '[^\d. ]' matches any character that is not a digit, a '.' or a literal space, so the brackets and newlines are removed. A ^ at the start of [] negates the character class.
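As a hedged comparison, the negated class above can be set next to a pattern that lists only the characters to remove. The second pattern and the extra string b are illustrative assumptions, not part of the question; they show that the two approaches agree on the sample input but differ once minus signs or exponents appear.

```python
import re

a = '[[0.1 0.1 0.1 0.1]\n [1.2 1.2 1.2 1.2]\n [2.3 2.3 2.3 2.3]\n [3.4 3.4 3.4 3.4]]'

# Keep only digits, dots and spaces (the pattern from the answer).
negated = re.sub(r'[^\d. ]', '', a)

# Remove only '[', ']' and newlines, leaving everything else untouched.
explicit = re.sub(r'[\[\]\n]', '', a)

# Hypothetical input with a sign and an exponent.
b = '[-1.5 2e-3]'
negated_b = re.sub(r'[^\d. ]', '', b)    # also strips '-' and 'e'
explicit_b = re.sub(r'[\[\]\n]', '', b)  # keeps '-' and 'e'

print(negated == explicit)
print(negated_b, '|', explicit_b)
```

For data that may contain negative numbers or scientific notation, the explicit pattern is the safer of the two.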
How do I replicate the structure of the result of itertools.product?
As you know, itertools.product returns an iterator, so we need to put its items in a list before we can print them,
something like this, right?
import itertools
import numpy as np
CN = np.asarray(list(itertools.product([0, 1], repeat=5)))
print(CN)
I want to build something similar, but with the data coming from a CSV file, so I want something like this:
#PSEUDOCODE
import pandas as pd
df = pd.read_csv('csv here')
# a, b, c, d are the columns that I want to get
x = list(df['a'], df['b'], df['c'], df['d'])
print(x)
so that the result will be something like this:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]]
How can I do that?
EDIT:
I am trying to learn how to do recursive feature elimination, and I saw that some code examples online use the iris data set:
from sklearn import datasets
dataset = datasets.load_iris()
x = dataset.data
print(x)
and when printed it looked something like this:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]]
How could I turn my dataset into something like that so I can use this RFE template?
# Recursive Feature Elimination
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
# load the iris datasets
dataset = datasets.load_iris()
# create a base classifier used to evaluate a subset of attributes
model = LogisticRegression()
# create the RFE model and select 3 attributes
rfe = RFE(model, n_features_to_select=3)
print(rfe)
rfe = rfe.fit(dataset.data, dataset.target)
print("features:",dataset.data)
print("target:",dataset.target)
print(rfe)
# summarize the selection of the attributes
print(rfe.support_)
print(rfe.ranking_)
You don't have to. To use the rfe.fit function, you only need to pass the features and the target separately.
So if your df is like:
     a    b    c    d  target
0  5.1  3.5  1.4  0.2       1
1  4.9  3.0  1.4  0.2       1
2  4.7  3.2  1.3  0.2       0
3  4.6  3.1  1.5  0.2       0
4  5.0  3.6  1.4  0.2       1
5  5.4  3.9  1.7  0.4       1
6  4.6  3.4  1.4  0.3       0
7  5.0  3.4  1.5  0.2       0
8  4.4  2.9  1.4  0.2       1
9  4.9  3.1  1.5  0.1       1
you can use:
...
rfe = rfe.fit(df[['a', 'b', 'c', 'd']], df['target'])
...
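If a plain NumPy array shaped like dataset.data is still wanted, a minimal sketch follows. It assumes the CSV has columns named a, b, c, d and target, as in the frame above; the inline CSV text is made up and stands in for the real file passed to pd.read_csv.

```python
import io
import pandas as pd

# Stand-in for pd.read_csv('your_file.csv'); this CSV text is invented for illustration.
csv_text = """a,b,c,d,target
5.1,3.5,1.4,0.2,1
4.9,3.0,1.4,0.2,1
4.7,3.2,1.3,0.2,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Select the feature columns and convert them to a 2-D array (rows = samples).
X = df[["a", "b", "c", "d"]].to_numpy()
# The target column becomes a 1-D array with one label per sample.
y = df["target"].to_numpy()

print(X)
print(y)
```

Here X plays the role of dataset.data and y the role of dataset.target, so either the arrays or the DataFrame selections can be handed to rfe.fit.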
I have an issue with numpy linspace
import numpy as np
temp = np.linspace(1,2,11)
for t in temp:
    print(t)
This returns:
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7000000000000002
1.8
1.9
2.0
The 1.7 value looks definitely wrong.
It seems related to this issue https://github.com/numpy/numpy/issues/8909
Has anybody else had this problem with numpy.linspace? Is it a known issue?
François
This has nothing to do with numpy. Consider:
>>> temp = np.linspace(1,2,11)
>>> temp
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. ])
>>> # ^ look, numpy displays it fine
>>> for t in temp:
...     print(t)
...
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7000000000000002
1.8
1.9
2.0
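The "issue" is with how computers represent floats in general: the array display rounds each element to 8 digits of precision by default, while printing a single element shows its full value. See: https://docs.python.org/3/tutorial/floatingpoint.html.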
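A small sketch of how one might work with such values in practice, assuming the goal is either tidy display or a comparison against 1.7. The stored value is left as it is; only the formatting or the comparison changes.

```python
import numpy as np

temp = np.linspace(1, 2, 11)
t = temp[7]

# Full-precision value as stored (the 1.7000000000000002 seen above).
print(repr(float(t)))

# Round for display only; the underlying float is unchanged.
print(f"{t:.1f}")

# Compare with a tolerance instead of ==.
print(np.isclose(t, 1.7))
```

Formatting covers the printing case, and np.isclose (or math.isclose for plain floats) covers equality checks.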
I am importing data from a PDF which has not been optimised for analysis.
The data has been imported into the following dataframe
NaN   NaN  Plant_A  NaN  Plant_B  NaN
Pre   1,2      1.1  1.2      6.1  6.2
Pre   3,4      1.3  1.4      6.3  6.4
Post  1,2      2.1  2.2      7.1  7.2
Post  3,4      2.3  2.4      7.3  7.4
and I would like to reorganise it into the following form:
         Pre_1  Pre_2  Pre_3  Pre_4  Post_1  Post_2  Post_3  Post_4
Plant_A    1.1    1.2    1.3    1.4     2.1     2.2     2.3     2.4
Plant_B    6.1    6.2    6.3    6.4     7.1     7.2     7.3     7.4
I started by splitting the 2nd column by commas, and then combining that with the first column to give me Pre_1 and Pre_2 for instance. However I have struggled to match that with the data in the rest of the columns. For instance, Pre_1 with 1.1 and Pre_2 with 1.2
Any help would be greatly appreciated.
I had to make some assumptions regarding the consistency of your data:
from itertools import cycle
import pandas as pd

tracker = {}
for temporal, spec, *data in df.itertuples(index=False):
    # Reverse the values so that pop() returns them in their original order
    data = data[::-1]
    cycle_plant = cycle(['Plant_A', 'Plant_B'])
    spec_i = spec.split(',')
    while data:
        plant = next(cycle_plant)
        # Each plant takes one value per entry in the comma-separated spec
        for i in spec_i:
            tracker[(plant, f"{temporal}_{i}")] = data.pop()

pd.Series(tracker).unstack()
         Post_1  Post_2  Post_3  Post_4  Pre_1  Pre_2  Pre_3  Pre_4
Plant_A     2.1     2.2     2.3     2.4    1.1    1.2    1.3    1.4
Plant_B     7.1     7.2     7.3     7.4    6.1    6.2    6.3    6.4
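To try the loop end to end, here is a self-contained sketch that rebuilds the sample frame by hand. The column labels (period, spec, a1, a2, b1, b2) are placeholders, since the imported frame has mostly NaN headers, and the final reordering step is an addition that restores the Pre-before-Post column order asked for in the question.

```python
from itertools import cycle
import pandas as pd

# Hypothetical reconstruction of the imported frame; column labels are placeholders.
df = pd.DataFrame(
    [
        ["Pre", "1,2", 1.1, 1.2, 6.1, 6.2],
        ["Pre", "3,4", 1.3, 1.4, 6.3, 6.4],
        ["Post", "1,2", 2.1, 2.2, 7.1, 7.2],
        ["Post", "3,4", 2.3, 2.4, 7.3, 7.4],
    ],
    columns=["period", "spec", "a1", "a2", "b1", "b2"],
)

tracker = {}
for temporal, spec, *data in df.itertuples(index=False):
    data = data[::-1]  # reversed so pop() yields values left to right
    cycle_plant = cycle(["Plant_A", "Plant_B"])
    spec_i = spec.split(",")
    while data:
        plant = next(cycle_plant)
        for i in spec_i:
            tracker[(plant, f"{temporal}_{i}")] = data.pop()

wide = pd.Series(tracker).unstack()

# unstack() sorts the column labels; restore first-seen order (Pre_1 ... Post_4).
order = list(dict.fromkeys(col for _, col in tracker))
wide = wide[order]
print(wide)
```

The reordering relies on dictionaries preserving insertion order, so the columns come out in the order the rows were read.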