I have multiple long lists in my program. Each list has approximately 3000 float values, and there are around 100 such lists.
I want to reduce the size of each list to, say, 500 while preserving the information in the original list. I know it is not possible to preserve the information completely, but I would like every element of the original list to contribute to the values of the smaller list.
Let's say we have the following list of lists and want to shorten each inner list to size 3 or 4.
myList = [[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
          [7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1],
          [4.7, 2.6, 5.6, 7.4, 3.7, 7.7, 3.5, 6.5, 7.2, 4.1],
          [7.3, 7.3, 4.1, 6.6, 2.2, 3.9, 1.6, 3.0, 2.3, 4.6],
          [4.7, 2.3, 5.7, 6.4, 3.4, 6.8, 7.2, 6.9, 8.4, 7.1]]
Is there some way to do this, maybe by averaging of some sort?
You can do something like this:
from statistics import mean, stdev
myList = [[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1], [2.3, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1]]
shorten_list = [[max(i)-min(i), mean(i), round(stdev(i), 5)] for i in myList]
You can also include information such as the sum of the list or the mode. If you just want to take the mean of each list within your list, you can just do this:
from statistics import mean
mean_list = list(map(mean, myList))
Batching may work. Take a look at this question:
How do I split a list into equally-sized chunks?
It shows how to convert the list into equal-sized batches; averaging each batch then yields a shorter list, as in the sketch below.
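For instance, here is a minimal sketch (the shrink helper is my own name, not from the answer) that splits a list into a target number of chunks and replaces each chunk with its mean, so a 3000-element list becomes a 500-element one with every original value contributing:

from statistics import mean

def shrink(values, target_size):
    # Split values into target_size roughly equal chunks and
    # replace each chunk with its mean, so every original element
    # contributes to the smaller list.
    chunk = len(values) / target_size
    return [mean(values[int(i * chunk):int((i + 1) * chunk)])
            for i in range(target_size)]

my_list = [4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1]
print(shrink(my_list, 5))  # [3.3, 5.75, 5.45, 4.0, 5.75]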
Or you can reduce the dimension of the list using a max-pooling layer:
import numpy as np
from keras.models import Sequential
from keras.layers import MaxPooling2D
image = np.array([[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
                  [7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1],
                  [4.7, 2.6, 5.6, 7.4, 3.7, 7.7, 3.5, 6.5, 7.2, 4.1],
                  [7.3, 7.3, 4.1, 6.6, 2.2, 3.9, 1.6, 3.0, 2.3, 4.6],
                  [4.7, 2.3, 5.7, 6.4, 3.4, 6.8, 7.2, 6.9, 8.4, 7.1]])
# Keras expects input of shape (batch, height, width, channels)
image = image.reshape(1, 5, 10, 1)
model = Sequential([MaxPooling2D(pool_size=(1, 10), strides=(1, 1))])
output = model.predict(image)
print(output)
This gives the output:
[[[[7.7]]
[[7.4]]
[[7.7]]
[[7.3]]
[[8.4]]]]
If you want to change the output size, change the pool size.
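For example (a variation of my own, not from the original answer), a pool size of (1, 2) with matching strides halves each row instead of collapsing it to a single value. Continuing with the image tensor defined above:

model = Sequential([MaxPooling2D(pool_size=(1, 2), strides=(1, 2))])
output = model.predict(image)
# The 5x10 input becomes 5x5; each value is the max of a pair of
# adjacent elements, e.g. the first row becomes [4.3, 6.4, 7.7, 6.5, 7.4]
print(output.reshape(5, 5))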
Related
I'm given this list
[1.3, 2.2, 2.3, 4.2, 5.1, 3.2, 5.3, 3.3, 2.1, 1.1, 5.2, 3.1]
and I'm supposed to remove the elements 1.3, 4.2 and 1.1 so that it becomes
[2.2, 2.3, 5.1, 3.2, 5.3, 3.3, 2.1, 5.2, 3.1]
I have written this code, but it gives the wrong result. What am I doing wrong?
def removeIncomplete(id):
    numbers_buf = id
    idComplete = id[:]
    for ind, item in enumerate(id):
        if item == 1.3 and item == 4.2 and item == 1.1:
            numbers_buf.remove(item)
    return numbers_buf
    #return idComplete

import numpy as np
print(removeIncomplete(np.array([1.3, 2.2, 2.3, 4.2, 5.1,
                                 3.2, 5.3, 3.3, 2.1, 1.1, 5.2, 3.1])))
#Correct output [ 2.2  2.3  5.1  3.2  5.3  3.3  2.1  5.2  3.1]
The condition item == 1.3 and item == 4.2 and item == 1.1 can never be true, because a single item cannot equal three different values at once; you need or. Also, a NumPy array has no .remove() method, so use np.delete instead:

import numpy as np

def removeIncomplete(id):
    numbers_buf = id
    idComplete = id[:]
    for ind, item in enumerate(id):
        if item == 1.3 or item == 4.2 or item == 1.1:
            numbers_buf = np.delete(numbers_buf, np.where(numbers_buf == item))
    return numbers_buf
    #return idComplete

print(removeIncomplete(np.array([1.3, 2.2, 2.3, 4.2, 5.1,
                                 3.2, 5.3, 3.3, 2.1, 1.1, 5.2, 3.1])))
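As a side note (my own addition, not part of the original answer), a vectorized boolean mask avoids the Python loop entirely:

import numpy as np

arr = np.array([1.3, 2.2, 2.3, 4.2, 5.1, 3.2, 5.3, 3.3, 2.1, 1.1, 5.2, 3.1])
# np.isin marks the elements to drop; ~ inverts the mask to keep the rest
out = arr[~np.isin(arr, [1.3, 4.2, 1.1])]
print(out)  # [2.2 2.3 5.1 3.2 5.3 3.3 2.1 5.2 3.1]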
What about using a list comprehension?
data = [1.3, 2.2, 2.3, 4.2, 5.1, 3.2, 5.3, 3.3, 2.1, 1.1, 5.2, 3.1]
exclude = [1.3, 4.2, 1.1]
out = [val for val in data if val not in exclude]
print(out)
>>>
[2.2, 2.3, 5.1, 3.2, 5.3, 3.3, 2.1, 5.2, 3.1]
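One caveat (my note, not from the answer): exact == comparison on floats is fragile when the values come from arithmetic rather than literals. If that is a concern, a tolerance-based check such as math.isclose is safer:

import math

data = [1.3, 2.2, 2.3, 4.2, 5.1, 3.2, 5.3, 3.3, 2.1, 1.1, 5.2, 3.1]
exclude = [1.3, 4.2, 1.1]
# Keep a value only if it is not close to any excluded value
out = [v for v in data if not any(math.isclose(v, e) for e in exclude)]
print(out)  # [2.2, 2.3, 5.1, 3.2, 5.3, 3.3, 2.1, 5.2, 3.1]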
How would I combine these two arrays:
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
                [4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
Into something like this:
xy = [[0.1, [1.0, 1.1, 1.2, 1.3]], [0.2, [2.0, 2.1, 2.2, 2.3]...
Thank you for the assistance!
Someone suggested I post the code I have tried, and I realized I had forgotten to:
xy = np.array(list(zip(x, y)))
This is my current solution, however it is extremely inefficient.
You can use zip to combine them:
[[a,b] for a,b in zip(y,x)]
Out:
[[array([0.1]), array([1. , 1.1, 1.2, 1.3])],
[array([0.2]), array([2. , 2.1, 2.2, 2.3])],
[array([0.3]), array([3. , 3.1, 3.2, 3.3])],
[array([0.4]), array([4. , 4.1, 4.2, 4.3])],
[array([0.5]), array([5. , 5.1, 5.2, 5.3])]]
A pure NumPy solution will be much faster than a list comprehension for large arrays.
I do have to say your use case makes little sense: there is no clear reason to put these arrays into a single data structure, and I believe you should re-check your design.
As @user2357112 supports Monica was subtly implying, this is very likely an XY problem. Check whether this is really what you are trying to solve; if you actually want something else, try asking about that instead.
I strongly suggest reconsidering what you want to do before moving on, as otherwise you risk locking yourself into a bad design.
That aside, here's a solution:
import numpy as np
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
                [4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
xy = np.hstack([y, x])
print(xy)
prints
[[0.1 1. 1.1 1.2 1.3]
[0.2 2. 2.1 2.2 2.3]
[0.3 3. 3.1 3.2 3.3]
[0.4 4. 4.1 4.2 4.3]
[0.5 5. 5.1 5.2 5.3]]
I have a large list of data, between 1000 and 10000 elements. Now I want to filter out some peak values with the help of the median function.
import statistics

# example list with just 10 elements
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
# list of medians calculated from 3 elements
my_median_list = []
for i in range(len(my_list)):
    if i == 0:
        my_median_list.append(statistics.median([my_list[0], my_list[1], my_list[2]]))
    elif i == (len(my_list) - 1):
        my_median_list.append(statistics.median([my_list[-1], my_list[-2], my_list[-3]]))
    else:
        my_median_list.append(statistics.median([my_list[i-1], my_list[i], my_list[i+1]]))
print(my_median_list)
# [4.7, 4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6, 4.6]
This works so far, but I think it looks ugly and is maybe inefficient. Is there a way with statistics or NumPy to do it faster? Or another solution? Also, I'm looking for a solution where I can pass an argument for how many elements the median is calculated from. In my example, I always used the median of 3 elements, but with my real data I want to play with the median setting and then maybe use the median of 10 elements.
You are calculating too many values, since
my_median_list.append(statistics.median([my_list[i-1], my_list[i], my_list[i+1]]))
and
my_median_list.append(statistics.median([my_list[0], my_list[1], my_list[2]]))
are the same when i == 1. The same error happens at the end, so you get one value too many at each end.
It's easier and less error-prone to do this with zip(), which will make the three-element tuples for you:
from statistics import median
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
[median(l) for l in zip(my_list, my_list[1:], my_list[2:])]
# [4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6]
For groups of arbitrary size, collections.deque is super handy because you can set a max size: you just keep pushing items onto one end and it removes items from the other to maintain the size. Here's a generator example that takes your group size as n:
from statistics import median
from collections import deque
def rolling_median(l, n):
    # maxlen=n makes the deque drop its oldest element automatically
    # as each new one is appended
    d = deque(l[0:n], n)
    yield median(d)
    for num in l[n:]:
        d.append(num)
        yield median(d)
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
list(rolling_median(my_list, 3))
# [4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6]
list(rolling_median(my_list, 5))
# [4.7, 5.1, 5.1, 4.3, 5.0, 4.6]
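Since the question mentions lists of 1000 to 10000 elements, a vectorized NumPy version should be considerably faster. Here is a sketch, assuming NumPy 1.20+ for sliding_window_view:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
# Build a (len - n + 1, n) view of all length-n windows, then take
# the median across each window in one vectorized call
windows = sliding_window_view(np.asarray(my_list), 3)
print(np.median(windows, axis=1))
# [4.7 4.7 5.1 5.6 5.6 4.3 4.3 4.6]

scipy.signal.medfilt is another option; it keeps the output the same length as the input by zero-padding the edges.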
I want to understand how tf.data.Dataset.batch works with my dataset. The dataset is as follows:
dataset = tf.convert_to_tensor([[5.1, 3.3, 1.7, 0.5],
                                [5.9, 3.0, 4.2, 1.5],
                                [6.9, 3.1, 5.4, 2.1],
                                [2.3, 1.3, 6.4, 9.3]])
Then I use the batch method:
dataset = dataset.batch(2)
and iterate the dataset once.
x = tfe.Iterator(dataset).next()
As I understand it, the result should be a 2x4 array, but it returns the whole 4x4 dataset.
Could anyone give me some details about how to apply the batch method?
You need to convert your dataset Tensor into a TensorSliceDataset, i.e. tell TensorFlow to slice the tensor and make a dataset out of it:
import tensorflow as tf
data = tf.convert_to_tensor([[5.1, 3.3, 1.7, 0.5],
                             [5.9, 3.0, 4.2, 1.5],
                             [6.9, 3.1, 5.4, 2.1],
                             [2.3, 1.3, 6.4, 9.3]])
dataset = tf.data.Dataset.from_tensor_slices(data).batch(2)
batch_iterator = dataset.make_one_shot_iterator().get_next()
sess = tf.InteractiveSession()
batch = sess.run(batch_iterator)
print(batch)
# [[ 5.1 3.3 1.7 0.5 ]
# [ 5.9 3. 4.2 1.5 ]]
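Note that make_one_shot_iterator and sessions belong to the TensorFlow 1.x API. In TensorFlow 2.x (a sketch of my own, assuming eager execution), a Dataset can be iterated directly:

import tensorflow as tf

data = tf.convert_to_tensor([[5.1, 3.3, 1.7, 0.5],
                             [5.9, 3.0, 4.2, 1.5],
                             [6.9, 3.1, 5.4, 2.1],
                             [2.3, 1.3, 6.4, 9.3]])
dataset = tf.data.Dataset.from_tensor_slices(data).batch(2)
for batch in dataset:  # each batch is a 2x4 tensor
    print(batch.numpy())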
Similar to the question here, I have three arbitrary 1D arrays, for example:
x_p = np.array((0.0, 1.1, 2.2, 3.3, 4.4))
y_p = np.array((5.5, 6.6, 7.7))
z_p = np.array((8.8, 9.9))
I need
points = np.array([[0.0, 5.5, 8.8],
[1.1, 5.5, 8.8],
[2.2, 5.5, 8.8],
...
[4.4, 7.7, 9.9]])
1) with the first index changing fastest;
2) the points are float coordinates, not integer indices;
3) I noticed that from version 1.7.0, numpy.meshgrid changed behavior with the default indexing='xy', and I need to use
np.vstack(np.meshgrid(x_p, y_p, z_p, indexing='ij')).reshape(3,-1).T
to get result points, but with the last index changing fastest, which is not what I want. (It was mentioned that only from 1.7.0 does meshgrid support more than 2 dimensions; I didn't check.)
I found this with some trial and error.
I think the ij vs. xy indexing has been in meshgrid forever (it's the sparse parameter that's newer); it just affects the order of the three returned elements.
To get x_p varying fastest, I put it last in the argument list and then used ::-1 to reverse the column order at the end.
I used stack to join the arrays on a new axis at the end, so I don't need to transpose. But the reshapes and transposes are all cheap (time-wise), so they can be used in any combination that works and is understandable.
In [100]: np.stack(np.meshgrid(z_p, y_p, x_p, indexing='ij'),3).reshape(-1,3)[:,::-1]
Out[100]:
array([[ 0. , 5.5, 8.8],
[ 1.1, 5.5, 8.8],
[ 2.2, 5.5, 8.8],
[ 3.3, 5.5, 8.8],
[ 4.4, 5.5, 8.8],
[ 0. , 6.6, 8.8],
...
[ 2.2, 7.7, 9.9],
[ 3.3, 7.7, 9.9],
[ 4.4, 7.7, 9.9]])
You might permute axes with np.transpose to achieve the output in the desired format:
np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3,1,2,0).reshape(-1,3)
Sample output:
In [104]: np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3,1,2,0).reshape(-1,3)
Out[104]:
array([[ 0. , 5.5, 8.8],
[ 1.1, 5.5, 8.8],
[ 2.2, 5.5, 8.8],
[ 3.3, 5.5, 8.8],
[ 4.4, 5.5, 8.8],
[ 0. , 6.6, 8.8],
[ 1.1, 6.6, 8.8],
[ 2.2, 6.6, 8.8],
[ 3.3, 6.6, 8.8],
[ 4.4, 6.6, 8.8],
[ 0. , 7.7, 8.8],
[ 1.1, 7.7, 8.8],
....
[ 3.3, 7.7, 9.9],
[ 4.4, 7.7, 9.9]])
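As a quick sanity check (my addition, not from either answer), itertools.product reproduces the same ordering, since product varies its last argument fastest:

import itertools
import numpy as np

x_p = np.array((0.0, 1.1, 2.2, 3.3, 4.4))
y_p = np.array((5.5, 6.6, 7.7))
z_p = np.array((8.8, 9.9))

# product varies the last argument fastest, so pass z, y, x and
# reverse each tuple to put x first
points = np.array([t[::-1] for t in itertools.product(z_p, y_p, x_p)])
print(points[:3])
# [[0.  5.5 8.8]
#  [1.1 5.5 8.8]
#  [2.2 5.5 8.8]]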