I am trying to take data from a pandas dataframe and transform it to a desired dictionary. Here's an example of the data:
data =[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1],[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2]]
Utable = pd.DataFrame(data, columns =['Type1', 'Type2', 'Type3', 'Type4', 'Type5', 'Type6', 'Type7', 'Type8', 'ID'])
The dictionary I need uses the ID of each record as the dict key, and the values need to be a list of the unacceptable Type numbers taken from the column names. A Type is unacceptable if its value is 0.0. So for this example the output would be:
{1: [1, 2, 3, 4, 5, 6, 7, 8], 2: [1, 2, 4, 5, 6, 7, 8]}
I could figure out how to get the Type values stored as a list with the ID as the dict key using:
U = Utable.set_index('ID').T.to_dict('list')
which gives:
{1: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
but I can't figure out how to get the contents from the column name stored in the list as the dict values.
Thanks very much for any help.
You could use orient='index' when converting to a dictionary, then use a list comprehension to build the desired list of Type numbers as the values:
out = {k: [int(i[-1]) for i, v in d.items() if v == 0]
       for k, d in Utable.set_index('ID').to_dict('index').items()}
Output:
{1: [1, 2, 3, 4, 5, 6, 7, 8], 2: [1, 2, 4, 5, 6, 7, 8]}
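As a quick cross-check, the same dictionary can also be built from a boolean mask; a minimal sketch assuming the Utable above (the zero_mask name is just illustrative):
import pandas as pd

data = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1],
        [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2]]
Utable = pd.DataFrame(data, columns=['Type1', 'Type2', 'Type3', 'Type4',
                                     'Type5', 'Type6', 'Type7', 'Type8', 'ID'])

# True wherever a Type column is 0.0 (i.e. unacceptable)
zero_mask = Utable.set_index('ID').eq(0.0)
out = {idx: [int(col[-1]) for col, is_zero in row.items() if is_zero]
       for idx, row in zero_mask.iterrows()}
print(out)  # {1: [1, 2, 3, 4, 5, 6, 7, 8], 2: [1, 2, 4, 5, 6, 7, 8]}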
I have a dictionary with:
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
d = {'inds': inds, 'vals': vals}
print(d) will get me:
{'inds': [0, 3, 7, 3, 3, 5, 1], 'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
As you can see, inds (the keys) are not ordered, there are dupes, and there are missing ones: the range is 0 to 7 but only the distinct integers 0, 1, 3, 5, 7 appear. I want to write a function that takes the dictionary (d) and decompresses it into a full vector as shown below. For any repeated indices (3 in this case), I'd like to sum the corresponding values, and for the missing indices I want 0.0.
# ind: 0 1 2 3* 4 5 6 7
x == [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
Trying to write a function that returns me a final list... something like this:
def decompressor(d, n=None):
    final_list = []
    for i in final_list:
        final_list.append()
    return final_list
# final_list.index: 0 1 2 3* 4 5 6 7
# final_list = [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
Try it:
xyz = [0.0 for x in range(max(inds) + 1)]
# walk over every (index, value) pair
for i in range(len(inds)):
    if xyz[inds[i]] != 0.0:
        xyz[inds[i]] += vals[i]
    else:
        xyz[inds[i]] = vals[i]
Some things are still not clear to me, but supposing you want a list whose length is determined by the maximum index in your inds list, you can do something like this:
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
# initialize a list of zeroes with length max(inds)+1
res = [0.0] * (max(inds) + 1)
# [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# loop over indexes and values in pairs
for i, v in zip(inds, vals):
    # add the value to the corresponding index
    res[i] += v
print(res)
#[1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
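If numpy is an option, the whole accumulation can be done in one call; a minimal sketch, assuming integer indices and a target length of max(inds)+1:
import numpy as np

inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# bincount sums the weights at repeated indices and leaves missing ones at 0.0
x = np.bincount(inds, weights=vals, minlength=max(inds) + 1)
print(x.tolist())  # [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]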
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
First you have to initialise the dictionary, with keys ranging from the minimum to the maximum value in the inds list:
max_id = max(inds)
min_id = min(inds)
my_dict = {}
i = min_id
while i <= max_id:
    my_dict[i] = 0.0
    i = i + 1
for i in range(len(inds)):
    my_dict[inds[i]] += vals[i]
# my_dict is now {0: 1.0, 1: 7.0, 2: 0.0, 3: 11.0, 4: 0.0, 5: 6.0, 6: 0.0, 7: 3.0}
Given a list:
x = [0.0, 0.87, 0.0, 0.0, 0.0, 0.32, 0.46, 0.0, 0.0, 0.10, 0.0, 0.0]
I want to get the indexes of all the values that are not 0 and store them in d['inds'].
Then, using the indexes in d['inds'], go through the list x and get the corresponding values.
So I would get something like:
d['inds'] = [1, 5, 6, 9]
d['vals'] = [0.87, 0.32, 0.46, 0.10]
I already got the indexes using:
d['inds'] = [i for i,m in enumerate(x) if m != 0]
but I'm not sure how to get d['vals']
d['vals'] = [x[i] for i in d['inds']]
Better yet, do both at once:
vals = []
inds = []
for i, v in enumerate(x):
    if v != 0:
        vals.append(v)
        inds.append(i)
d['vals'] = vals
d['inds'] = inds
or
import numpy as np
d['inds'],d['vals'] = np.array([(i,v) for i,v in enumerate(x) if v!=0]).T
You can use numpy; its indexing features are designed for tasks like this one:
import numpy as np
x = np.array([0.0, 0.87, 0.0, 0.0, 0.0, 0.32, 0.46, 0.0, 0.0, 0.10, 0.0, 0.0])
x[x!=0]
Out: array([ 0.87, 0.32, 0.46, 0.1 ])
and if you're still interested in the indices:
np.argwhere(x!=0)
Out:
array([[1],
[5],
[6],
[9]], dtype=int64)
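To pack both pieces into the dictionary from the question, a compact variant (a sketch using the same x array as above):
import numpy as np

x = np.array([0.0, 0.87, 0.0, 0.0, 0.0, 0.32, 0.46, 0.0, 0.0, 0.10, 0.0, 0.0])
inds = np.flatnonzero(x)  # positions of the non-zero entries
d = {'inds': inds.tolist(), 'vals': x[inds].tolist()}
print(d)  # {'inds': [1, 5, 6, 9], 'vals': [0.87, 0.32, 0.46, 0.1]}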
You can use a dict comprehension:
m = {i:j for i,j in enumerate(x) if j!=0}
list(m.keys())
Out[183]: [1, 5, 6, 9]
list(m.values())
Out[184]: [0.87, 0.32, 0.46, 0.1]
if you want to save this in a dictionary d then you can do:
d = {}
d['vals']=list(m.values())
d['ind']=list(m.keys())
d
{'vals': [0.87, 0.32, 0.46, 0.1], 'ind': [1, 5, 6, 9]}
Using Pandas:
x = [0.0, 0.87, 0.0, 0.0, 0.0, 0.32, 0.46, 0.0, 0.0, 0.10, 0.0, 0.0]
import pandas as pd
data = pd.DataFrame(x)
inds = data[data[0]!=0.0].index
print(inds)
Output: Int64Index([1, 5, 6, 9], dtype='int64')
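The matching values can be pulled with the same mask; a small sketch building on the DataFrame above:
vals = data[data[0] != 0.0][0].tolist()
d = {'inds': list(inds), 'vals': vals}
print(d)  # {'inds': [1, 5, 6, 9], 'vals': [0.87, 0.32, 0.46, 0.1]}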
Much easier:
import pandas as pd
df = pd.DataFrame()
df['vals'] = list(filter(None, x))
df['idx'] = df['vals'].apply(x.index)
Explanation:
Use filter(None, x) to keep the non-zero values (passing None as the function keeps only the truthy elements).
Then use pandas apply to get the indexes: go through the 'vals' column and look up each value's index in the list x.
I have a fortran array of the type
DATA ELEV /1.2,3.2,2*0.0,3.9,3*0.0/
which in python would be
ELEV = [1.2, 3.2, 0.0, 0.0, 3.9, 0.0, 0.0, 0.0]
Notice how 2*0.0 was not 0.0 but instead 2 elements with value 0.0.
Is there some way to use numpy or other Python methods (or libraries) to write it similarly in Python 3?
I essentially have arrays in the Fortran format which I want to use in my Python code, rather than merely representing them.
Use the new * unpacking generalizations and list multiplication.
>>> [1.2, 3.2, *2*[0.0], 3.9, *3*[0.0]]
[1.2, 3.2, 0.0, 0.0, 3.9, 0.0, 0.0, 0.0]
You can also multiply strings and tuples in Python.
>>> 'abc'*3
'abcabcabc'
>>> (1, 2, 3)*2
(1, 2, 3, 1, 2, 3)
You can unpack any iterable, and this also works in tuple displays, etc.
>>> (1.2, 3.2, *'xy'*2, 3.9, *3*(0.0,), *'foo')
(1.2, 3.2, 'x', 'y', 'x', 'y', 3.9, 0.0, 0.0, 0.0, 'f', 'o', 'o')
Python's built-in lists already have a very similar functionality:
[1.2, 3.2] + [0.0] * 2 + [3.9] + [0.0] * 3
results in
[1.2, 3.2, 0.0, 0.0, 3.9, 0.0, 0.0, 0.0]
Perhaps the most natural way of doing this in numpy is with the repeat function/method:
In [252]: a = np.array([1.2,3.2,0,3.9,0])
In [253]: b = a.repeat([1,1,2,1,3])
In [254]: b
Out[254]: array([1.2, 3.2, 0. , 0. , 3.9, 0. , 0. , 0. ])
or if there are a lot of 0s, copy the nonzero values to a zeros array
In [255]: c = np.zeros(8, float)
In [256]: c[[0,1,4]] = [1.2,3.2,3.9]
In [257]: c
Out[257]: array([1.2, 3.2, 0. , 0. , 3.9, 0. , 0. , 0. ])
@fireball, I think it's better if you have a reusable code block (a function) for this.
Here it is: get_nums(), which takes 2 parameters.
The first parameter n is the number we want to repeat and the second parameter freq is the number of repetitions of n.
The function returns a list containing n repeated freq times, which we unpack in the calling statement using *.
This way you don't need to use the + operator again and again and split the list into multiple sub-lists like below.
[65, 54] + [0.0] * 5 + [6, 9]
Please have a look at the below python code and its output.
Try it online at http://rextester.com/AJDEUE76668
def get_nums(n, freq):
    l = [n] * freq
    return l
# TEST CASE 1
ELEV = [1.2, 3.2, *get_nums(0.0, 2), 3.9, *get_nums(0.0, 3)]
print(ELEV)
print() # newline
# TEST CASE 2
arr = [45, *get_nums(1, 4), *get_nums(9, 3), 34, 99, *get_nums(7, 1), 12, 21, *get_nums(-1, 5)]
print(arr)
Output »
[1.2, 3.2, 0.0, 0.0, 3.9, 0.0, 0.0, 0.0]
[45, 1, 1, 1, 1, 9, 9, 9, 34, 99, 7, 12, 21, -1, -1, -1, -1, -1]
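If the values arrive as the raw text of a Fortran DATA statement rather than being retyped by hand, the n*value shorthand can also be expanded programmatically; a minimal sketch (the helper name expand_fortran_data is just illustrative, and it assumes plain comma-separated floats):
def expand_fortran_data(text):
    # expand Fortran-style "n*value" repetitions into a flat list of floats
    out = []
    for token in text.split(','):
        token = token.strip()
        if '*' in token:
            count, value = token.split('*')
            out.extend([float(value)] * int(count))
        else:
            out.append(float(token))
    return out

print(expand_fortran_data("1.2,3.2,2*0.0,3.9,3*0.0"))
# [1.2, 3.2, 0.0, 0.0, 3.9, 0.0, 0.0, 0.0]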
I'm looking to collate 2 entries into one for many columns in a data array, by checking to see if several values in the two entries are the same.
0 A [[0.0, 0.5, 2.5, 2.5]
1 B [0.5, 1.0, 2.0, 2.0]
2 M [2.5, 2.5, 0.5, 0.0]
3 N [2.0, 2.0, 1.0, 0.5]
4 R [14.3, 13.8, 13.9, 14.2]]
Above shows the format the array takes, with the numbering and annotation of the rows on the left. Each column in the array is one distinct measurement.
Rows 0-3 are the x-locations along a straight line of 2 pairs of electrodes used to make a measurement (pair 1 = A & B, pair 2 = M & N); R is the measured resistivity when the four electrodes above it are used. As can be seen, in the 1st and 4th measurement, pair AB of measurement 1 = pair MN of measurement 4, and vice versa. The same is true of the 2nd and 3rd reading.
What I'm trying to do is to search through the array to find each pair of measurements, then collate that into one entry. This entry would take the first measurement's electrode locations (A,B,M &N), together with the first measurement's R value, but would also contain an extra row with the second measurement's R value. The result from the example above can be seen below.
0 A [[0.0, 0.5]
1 B [0.5, 1.0]
2 M [2.5, 2.5]
3 N [2.0, 2.0]
4 R1 [14.3, 13.8]
5 R2 [13.9, 14.2]]
Some information that may be useful:
The numbers are floats
The first set of measurements (i.e. before there are any pairs) are in the first half of the dataset. What I mean by that is, if there was an array with 100 columns (equalling 100 measurements), columns 51-100 would be the pairs of columns 1-50. Columns 51-100 do not follow the same order as columns 1-50, though (i.e. column 1 wouldn't always pair with column 51 in that example).
The electrodes do always follow the same pattern in a pair of measurements: "A" in measurement 1 will always equal "M" in measurement 2 of the pair; equally B = N, M = A & N = B.
I've been thinking about how to do it, and I've thought that some kind of if statement such as the one below may be a start, but really I'm a complete novice, and this is quite a complex problem to search for an answer to.
if all([A1 == M2, B1 == N2, M1 == A2, N1 == B2]):
Any help would be really appreciated, even if it's just a pointer to wherever would be a good starting point to search for more information.
Thanks in advance!
Edit
Just to clarify, the order of R2 is liable to change for each dataset and isn't the same as the order of R1. What I want to do is query the A, B, M & N values to find the pairs of readings, then add the paired R2 reading under its corresponding R1 reading.
Here is an example dataset that is a little larger:
#Input array
Arr1 = [[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5, 5, 4.5, 4.5, 3.5, 2.5, 2, 1],
        [0, 0, 0.5, 0.5, 1, 1, 0.5, 5.5, 5, 5.5, 4, 3, 2.5, 1.5],
        [1, 3.5, 2.5, 5, 2, 4.5, 4.5, 1, 1.5, 1.5, 0.5, 1, 1.5, 0.5],
        [1.5, 4, 3, 5.5, 2.5, 5, 5.5, 0.5, 1, 0.5, 0, 0.5, 1, 0],
        [14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1, 17.1, 15.3, 16.1, 13.4, 25.1, 19.8, 14.4]]
#Output array - extra R row and half the columns
Arr2 = [[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5],
        [0, 0, 0.5, 0.5, 1, 1, 0.5],
        [1, 3.5, 2.5, 5, 2, 4.5, 4.5],
        [1.5, 4, 3, 5.5, 2.5, 5, 5.5],
        [14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1],
        [14.4, 13.4, 25.1, 17.1, 19.8, 15.3, 16.1]]
Here's a way to find the index of each R2 value that you're after and create the final transformation to your specifications, edited based on our earlier dialogue in the comments below:
#Input array
Arr1 = [[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5, 5, 4.5, 4.5, 3.5, 2.5, 2, 1],
[0, 0, 0.5, 0.5, 1, 1, 0.5, 5.5, 5, 5.5, 4, 3, 2.5, 1.5],
[1, 3.5, 2.5, 5, 2, 4.5, 4.5, 1, 1.5, 1.5, 0.5, 1, 1.5, 0.5],
[1.5, 4, 3, 5.5, 2.5, 5, 5.5, 0.5, 1, 0.5, 0, 0.5, 1, 0],
[14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1,
17.1, 15.3, 16.1, 13.4, 25.1, 19.8, 14.4]]
#Output array - extra R row and half the columns
Arr2 = [[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5],
[0, 0, 0.5, 0.5, 1, 1, 0.5],
[1, 3.5, 2.5, 5, 2, 4.5, 4.5],
[1.5, 4, 3, 5.5, 2.5, 5, 5.5],
[14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1],
[14.4, 13.4, 25.1, 17.1, 19.8, 15.3, 16.1]]
# get the first half of each list in Arr1
half_1 = [i[:len(i)//2] for i in Arr1[:-1]]
# 'flip' the arrays so that there's a list for each element 0, 1, ...
half_1_flip = [[i[j] for i in half_1] for j in range(len(half_1[0]))]
# get the second half of each list in Arr1
half_2 = [i[len(i)//2:] for i in Arr1[:-1]]
# 'rotate' the arrays so that A / B and M / N switch places
half_2_rotate = half_2[len(half_2)//2:] + half_2[:len(half_2)//2]
# 'flip' the arrays so that there's a list for each element 0, 1, ...
half_2_flip = [[i[j] for i in half_2_rotate]
for j in range(len(half_2_rotate[0]))]
# find each matching index of the first flipped list in the second list
seek_indices = [half_2_flip.index(a) for a in half_1_flip]
# pull out original R1 and R2
r1 = Arr1[-1][:len(Arr1[-1])//2]
r2 = Arr1[-1][len(Arr1[-1])//2:]
# reorder R2 based on indices
ordered_r2 = [r2[i] for i in seek_indices]
# get final transform
transform = half_1 + [r1] + [ordered_r2]
assert transform == Arr2
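A numpy variant of the same matching idea could look like the sketch below (an alternative under stated assumptions, not the answer's own method: it relies on the Arr1 layout above, with the paired measurements in the second half and A/B swapped with M/N, and on the electrode coordinates matching exactly):
import numpy as np

a = np.array(Arr1)
n = a.shape[1] // 2
first, second = a[:, :n], a[:, n:]

# reorder the second half's electrode rows to M, N, A, B so a paired
# measurement shows exactly the same four numbers as its partner's A, B, M, N
swapped = np.vstack([second[2:4], second[0:2]])

# for every first-half column, locate the second-half column with matching electrodes
match = [int(np.where((swapped.T == col).all(axis=1))[0][0]) for col in first[:4].T]

# stack the first half on top of the re-ordered second-half R values
result = np.vstack([first, second[4, match]])
assert result.tolist() == Arr2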
Another approach to the problem might be slicing the data using the following function:
import numpy as np
def transform(arr):
    arr1 = arr[:, 0:2]
    arr1 = np.append(arr1, [arr[-1, 2:]], axis=0)
    return arr1
Using the given data:
arr = np.array([[0.0, 0.5, 2.5, 2.5],
[0.5, 1.0, 2.0, 2.0],
[2.5, 2.5, 0.5, 0.0],
[2.0, 2.0, 1.0, 0.5],
[14.3, 13.8, 13.9, 14.2]])
transform(arr) returns:
array([[ 0. , 0.5],
[ 0.5, 1. ],
[ 2.5, 2.5],
[ 2. , 2. ],
[ 14.3, 13.8],
[ 13.9, 14.2]])