How would I combine these two arrays:
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
[4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
Into something like this:
xy = [[0.1, [1.0, 1.1, 1.2, 1.3]], [0.2, [2.0, 2.1, 2.2, 2.3]...
Thank you for the assistance!
Someone suggested I post the code I have tried, and I realized I had forgotten to:
xy = np.array(list(zip(x, y)))
This is my current solution; however, it is extremely inefficient.
You can use zip to combine them:
[[a,b] for a,b in zip(y,x)]
Out:
[[array([0.1]), array([1. , 1.1, 1.2, 1.3])],
[array([0.2]), array([2. , 2.1, 2.2, 2.3])],
[array([0.3]), array([3. , 3.1, 3.2, 3.3])],
[array([0.4]), array([4. , 4.1, 4.2, 4.3])],
[array([0.5]), array([5. , 5.1, 5.2, 5.3])]]
A pure numpy solution will be much faster than list comprehension for large arrays.
I do have to say your use case makes little sense to me, as I see no logic in putting these arrays into a single data structure, and I believe you should recheck your design.
As @user2357112 supports Monica was subtly implying, this is very likely an XY problem. Check whether this is really what you are trying to solve, and not something else. If you want something else, try asking about that.
I strongly suggest checking what you want to do before moving on, or you may end up stuck with a bad design.
That aside, here's a solution:
import numpy as np
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
[4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
xy = np.hstack([y, x])
print(xy)
prints
[[0.1 1. 1.1 1.2 1.3]
[0.2 2. 2.1 2.2 2.3]
[0.3 3. 3.1 3.2 3.3]
[0.4 4. 4.1 4.2 4.3]
[0.5 5. 5.1 5.2 5.3]]
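If the reason for pairing the arrays was to keep each y value next to its row of x, note that the stacked array can still be split back with plain slicing. A minimal sketch, assuming the same x and y as above (y_back and x_back are illustrative names, not from the question):

```python
import numpy as np

x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
                [4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])

# Stack y as the first column, followed by the columns of x.
xy = np.hstack([y, x])

# Slicing recovers the two parts without any Python-level loop.
y_back = xy[:, :1]   # first column, kept 2-D
x_back = xy[:, 1:]   # remaining columns

print(xy.shape)      # (5, 5)
print(y_back.shape)  # (5, 1)
print(x_back.shape)  # (5, 4)
```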
I have an array:
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2]])
I want to get the sum of these expressions:
(5.1 - 1)
(4.9 - 1)
(4.7 - 1)
(4.6 - 1)
How do I get the first element of every row?
Assuming this is a Numpy array, you can just subtract from the first column and let broadcasting do the work. If you want the sum of that result, just use sum():
import numpy as np
arr = np.array([
[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2]
])
a = arr[:, 0] - 1
#array([4.1, 3.9, 3.7, 3.6])
a.sum()
15.299999999999999
If you are bothered by the inexact sum, make sure you read Is floating point math broken?
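If the inexact value only matters for a comparison or for display, one option is to compare with a tolerance or round the result. A small sketch, assuming the same arr as above (total is an illustrative name):

```python
import numpy as np

arr = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
])

total = (arr[:, 0] - 1).sum()

# Compare with a tolerance instead of testing for exact equality.
print(np.isclose(total, 15.3))  # True

# Round only when presenting the value.
print(round(total, 2))
```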
There's an axis argument in array.sum that you can set to sum the array vertically.
(arr-1).sum(axis=0)
array([15.3, 8.8, 1.6, -3.2])
I have multiple long lists in my program. Each list has approximately 3000 float values.
And there are around 100 such lists.
I want to reduce the size of each list to say, 500, while preserving the information in the original list. I know that it is not possible to completely preserve the information, but I would like to have the elements in the original list to have contribution to the values of the smaller list.
Let's say we have the following list, and want to shorten each inner list to size 3 or 4.
myList = [[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
[7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1],
[4.7, 2.6, 5.6, 7.4, 3.7, 7.7, 3.5, 6.5, 7.2, 4.1],
[7.3, 7.3, 4.1, 6.6, 2.2, 3.9, 1.6, 3.0, 2.3, 4.6],
[4.7, 2.3, 5.7, 6.4, 3.4, 6.8, 7.2, 6.9, 8.4, 7.1]]
Is there some way to do this? Maybe by averaging of some sort?
You can do something like this:
from statistics import mean, stdev
myList = [[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1], [2.3, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1]]
shorten_list = [[max(i)-min(i), mean(i), round(stdev(i), 5)] for i in myList]
You can also include information such as the sum of the list or the mode. If you just want to take the mean of each list within your list, you can just do this:
from statistics import mean
mean_list = list(map(mean, myList))
Batching may work.
Please look at this question:
How do I split a list into equally-sized chunks?
It shows how to convert the list into equal batches.
Alternatively, you can reduce the dimension of the list using a max pooling layer:
import numpy as np
from keras.models import Sequential
from keras.layers import MaxPooling2D
image = np.array([[4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1],
[7.3, 3.5, 6.2, 7.4, 2.6, 3.7, 2.6, 7.1, 3.4, 7.1],
[4.7, 2.6, 5.6, 7.4, 3.7, 7.7, 3.5, 6.5, 7.2, 4.1],
[7.3, 7.3, 4.1, 6.6, 2.2, 3.9, 1.6, 3.0, 2.3, 4.6],
[4.7, 2.3, 5.7, 6.4, 3.4, 6.8, 7.2, 6.9, 8.4, 7.1]]
)
image = image.reshape(1, 5, 10, 1)
model = Sequential([MaxPooling2D(pool_size =(1,10), strides = (1))])
output = model.predict(image)
print(output)
This gives the output:
[[[[7.7]]
[[7.4]]
[[7.7]]
[[7.3]]
[[8.4]]]]
If you want to change the output size, you can change the pool size.
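The same pooling idea can be tried without Keras, and with averaging so that every original element contributes to the shortened list. The following is only a sketch: shrink is a hypothetical helper, and np.array_split is used so the target length does not have to divide the original length evenly.

```python
import numpy as np

def shrink(values, target_len):
    """Average consecutive chunks so the result has target_len entries."""
    chunks = np.array_split(np.asarray(values, dtype=float), target_len)
    return np.array([chunk.mean() for chunk in chunks])

row = [4.3, 2.3, 5.1, 6.4, 3.2, 7.7, 1.5, 6.5, 7.4, 4.1]
short = shrink(row, 4)      # chunks of sizes 3, 3, 2, 2
print(short.shape)          # (4,)

# The sizes from the question: 3000 values reduced to 500.
long_list = np.arange(3000, dtype=float)
print(shrink(long_list, 500).shape)  # (500,)
```

Replacing chunk.mean() with chunk.max() gives the max-pooling behaviour of the Keras example.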
I want to understand how tf.data.Dataset.batch works with my dataset. The dataset is as follows:
dataset = tf.convert_to_tensor([[5.1, 3.3, 1.7, 0.5, ],
[5.9, 3.0, 4.2, 1.5],
[6.9, 3.1, 5.4, 2.1],
[2.3, 1.3, 6.4, 9.3]])
Then I use the batch method:
dataset = dataset.batch(2)
and iterate the dataset once.
x = tfe.Iterator(dataset).next()
I expected the result to be a 2*4 array, but it returns the whole 4*4 dataset.
Could anyone give me some details about how to apply the batch method?
You need to convert your dataset Tensor into a TensorSliceDataset, i.e. telling Tensorflow to slice the tensor and make a dataset of it.
import tensorflow as tf
data = tf.convert_to_tensor([[5.1, 3.3, 1.7, 0.5],
[5.9, 3.0, 4.2, 1.5],
[6.9, 3.1, 5.4, 2.1],
[2.3, 1.3, 6.4, 9.3]])
dataset = tf.data.Dataset.from_tensor_slices(data).batch(2)
batch_iterator = dataset.make_one_shot_iterator().get_next()
sess = tf.InteractiveSession()
batch = sess.run(batch_iterator)
print(batch)
# [[ 5.1 3.3 1.7 0.5 ]
# [ 5.9 3. 4.2 1.5 ]]
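To see why the slicing step matters, here is a plain-NumPy analogy. It is not the TensorFlow API, only a sketch of the behaviour: from_tensor_slices treats each row as one dataset element, and batch(2) then groups consecutive elements. The batches generator is a hypothetical helper.

```python
import numpy as np

data = np.array([[5.1, 3.3, 1.7, 0.5],
                 [5.9, 3.0, 4.2, 1.5],
                 [6.9, 3.1, 5.4, 2.1],
                 [2.3, 1.3, 6.4, 9.3]])

def batches(rows, batch_size):
    """Yield consecutive groups of batch_size rows (illustrative only)."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

first = next(batches(data, 2))
print(first.shape)  # (2, 4)
```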
If X is an array, what is the meaning of X[:,0]? It is not the first time I have seen this notation, and it confuses me because I can't work out what it means. Could anyone show me an example? A full, clear answer on the meaning of the comma would be appreciated.
Please see the file https://github.com/lazyprogrammer/machine_learning_examples/blob/master/ann_class/forwardprop.py
The comma inside the brackets separates the rows from the columns you want to slice from your array.
x[row,column]
You can place ":" before or after the row and column values. Before the value it means "until" and after the value it means "from".
For example you have:
x: array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2]])
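x[:,:] means you want every row and every column.
x[3,3] means you want the single value at row index 3 and column index 3 (indices start at 0, so this is the fourth row and fourth column).
x[:3,:3] means you want the rows and columns up to, but not including, index 3.
x[:, 3] means you want the column at index 3 from every row.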
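These cases can be checked directly. A short sketch using the first four rows of the array above:

```python
import numpy as np

x = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [4.7, 3.2, 1.3, 0.2],
              [4.6, 3.1, 1.5, 0.2]])

print(x[:, :].shape)    # (4, 4)  every row, every column
print(x[3, 3])          # 0.2     row index 3, column index 3
print(x[:3, :3].shape)  # (3, 3)  rows 0..2 and columns 0..2
print(x[:, 3])          # [0.2 0.2 0.2 0.2]  column index 3 from every row
print(x[:, 0])          # [5.1 4.9 4.7 4.6]  first column from every row
```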
>>> x = [1, 2, 3]
>>> x[:, 0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
So if you see X[:, 0] used in working code, the variable is not a list but something else. A NumPy array, perhaps.
I am creating an example matrix:
import numpy as np
np.random.seed(0)
F = np.random.randint(2,5, size=(3, 4), dtype = 'int32' )
F
Slicing rows of the matrix:
F[0:2]
Slicing a column of the matrix:
F[:,2]
To get straight to the point: it is X[rows, columns], as someone already mentioned. You may ask what a bare colon means in X[:,0]: it means "take all".
So X[:,0] selects the elements at column index 0 from all rows, that is, the first column of the entire matrix. With NumPy the result is a 1-D array of length no_of_rows.
Similarly, X[:,1] selects the second column from all rows.
Hope this clarifies it.
Pretty clear. Check this out!
Load some data
from sklearn import datasets
iris = datasets.load_iris()
samples = iris.data
Explore first 10 elements of 2D array
samples[:10]
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2],
[4.9, 3.1, 1.5, 0.1]])
Test our notation
x = samples[:,0]
x[:10]
array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9])
y = samples[:,1]
y[:10]
array([3.5, 3. , 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1])
P.S. The length of samples is 150, I've cut it to 10 for clarity.
My current text file that I intend to use for LSTM training in Tensorflow looks like this:
> 0.2, 4.3, 1.2
> 1.1, 2.2, 3.1
> 3.5, 4.1, 1.1, 4300
>
> 1.2, 3.3, 1.2
> 1.5, 2.4, 3.1
> 3.5, 2.1, 1.1, 4400
>
> ...
Each sample consists of 3 time steps, each a vector of 3 features, with only 1 label per sample. I formatted the text file this way to be consistent with LSTM training, which requires a 3D tensor (batch, number of time steps, number of features).
My question: How should I use Numpy or TensorFlow.TextReader to reformat the 3x3 sequence vectors and the singleton labels so they become compatible with Tensorflow?
Edit: I saw many tutorials on reformatting text or CSV files that have vectors and labels, but unfortunately they were for 1-to-1 relationships, e.g.
0.2, 4.3, 1.2, Class1
1.1, 2.2, 3.1, Class2
3.5, 4.1, 1.1, Class3
becomes:
[0.2, 4.3, 1.2, Class1], [1.1, 2.2, 3.1, Class2], [3.5, 4.1, 1.1, Class3]
which NumPy can clearly read and easily turn into vectors for simple feed-forward NN tasks. But this procedure doesn't actually build an LSTM-friendly CSV.
EDIT: The TensorFlow tutorial on CSV formats covers only 2D arrays as an example. The features = col1, col2, col3 doesn't assume that there might be time steps for each sequence array, hence my question.
I'm a little confused as to whether you are more interested in the numpy array(s) structure, or the csv format.
The np.savetxt csv file writer can't readily produce text like:
0.2, 4.3, 1.2
1.1, 2.2, 3.1
3.5, 4.1, 1.1, 4300
1.2, 3.3, 1.2
1.5, 2.4, 3.1
3.5, 2.1, 1.1, 4400
savetxt is not tricky. It opens a file for writing, and then iterates on the input array, writing it, one row at a time to the file. Effectively:
for row in arr:
    f.write(fmt % tuple(row))
where fmt has a % field for each element of the row. In the simple case it constructs fmt = delimiter.join([fmt]*(arr.shape[1])), in other words repeating the single-field fmt once per column. Or you can give it a multi-field fmt.
So you could use normal line/file writing methods to write a custom display. The simplest is to construct it using the usual print commands, and then redirect those to a file.
But having done that, there's the question of how to read that back into a numpy session. np.genfromtxt can handle missing data, but you still have to include the delimiters. It's also trickier to have it read blocks (3 lines separated by a blank line). It's not impossible, but you have to do some preprocessing.
Of course genfromtxt isn't that tricky either. It reads the file line by line, converts each line into a list of numbers or strings, and collects those lists in a master list. Only at the end is that list converted into an array.
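The preprocessing mentioned above could look roughly like the following. This is a sketch under stated assumptions, not what genfromtxt does internally: read_blocks is a hypothetical helper, every row is assumed to carry 3 features, and a 4th value on a row is taken as the label that closes the current block.

```python
import io
import numpy as np

def read_blocks(lines, n_features=3):
    """Collect rows into blocks; a row with an extra value ends its block."""
    blocks, labels, current = [], [], []
    for line in lines:
        fields = [f for f in line.strip().split(",") if f.strip()]
        if not fields:
            continue  # blank separator line between blocks
        values = [float(f) for f in fields]
        current.append(values[:n_features])
        if len(values) > n_features:
            # The trailing value is the label; the block is complete.
            labels.append(values[n_features])
            blocks.append(current)
            current = []
    return np.array(blocks), np.array(labels)

text = """0.2, 4.3, 1.2
1.1, 2.2, 3.1
3.5, 4.1, 1.1, 4300

1.2, 3.3, 1.2
1.5, 2.4, 3.1
3.5, 2.1, 1.1, 4400
"""

X, y = read_blocks(io.StringIO(text))
print(X.shape)  # (2, 3, 3) -> (batch, time steps, features)
print(y)        # [4300. 4400.]
```

With a real file, open(filename) can be passed in place of the io.StringIO object.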
I can construct an array like your text with:
In [121]: dt = np.dtype([('lbl',int), ('block', float, (3,3))])
In [122]: A = np.zeros((2,),dtype=dt)
In [123]: A
Out[123]:
array([(0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
(0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])],
dtype=[('lbl', '<i4'), ('block', '<f8', (3, 3))])
In [124]: A['lbl']=[4300,4400]
In [125]: A[0]['block']=np.array([[.2,4.3,1.2],[1.1,2.2,3.1],[3.5,4.1,1.1]])
In [126]: A
Out[126]:
array([(4300, [[0.2, 4.3, 1.2], [1.1, 2.2, 3.1], [3.5, 4.1, 1.1]]),
(4400, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])],
dtype=[('lbl', '<i4'), ('block', '<f8', (3, 3))])
In [127]: A['block']
Out[127]:
array([[[ 0.2, 4.3, 1.2],
[ 1.1, 2.2, 3.1],
[ 3.5, 4.1, 1.1]],
[[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]]])
I can load it from a txt that has all the block values flattened:
In [130]: txt=b"""4300, 0.2, 4.3, 1.2, 1.1, 2.2, 3.1, 3.5, 4.1, 1.1"""
In [131]: txt
Out[131]: b'4300, 0.2, 4.3, 1.2, 1.1, 2.2, 3.1, 3.5, 4.1, 1.1'
genfromtxt can handle a complex dtype, allocating values in order from the flat line list:
In [133]: data=np.genfromtxt([txt],delimiter=',',dtype=dt)
In [134]: data['lbl']
Out[134]: array(4300)
In [135]: data['block']
Out[135]:
array([[ 0.2, 4.3, 1.2],
[ 1.1, 2.2, 3.1],
[ 3.5, 4.1, 1.1]])
I'm not sure about writing it. I would have to reshape it into a 10-column or 10-field array if I want to use savetxt.
UPDATE: addition to the previous answer:
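One possible way to do that reshaping, as a rough sketch: flatten each 3x3 block to 9 values, put the label in front, and write the resulting 10-column array with savetxt. The names flat and buf are illustrative, and an in-memory buffer stands in for a real file.

```python
import io
import numpy as np

dt = np.dtype([('lbl', int), ('block', float, (3, 3))])
A = np.zeros((2,), dtype=dt)
A['lbl'] = [4300, 4400]
A['block'][0] = [[.2, 4.3, 1.2], [1.1, 2.2, 3.1], [3.5, 4.1, 1.1]]
A['block'][1] = [[1.2, 3.3, 1.2], [1.5, 2.4, 3.1], [3.5, 2.1, 1.1]]

# One row per sample: label followed by the 9 flattened block values.
flat = np.column_stack([A['lbl'], A['block'].reshape(len(A), -1)])

buf = io.StringIO()
np.savetxt(buf, flat, delimiter=', ', fmt='%g')
print(buf.getvalue())

# Reading it back: split off the label column and restore the 3x3 blocks.
loaded = np.loadtxt(io.StringIO(buf.getvalue()), delimiter=',')
labels = loaded[:, 0]
blocks = loaded[:, 1:].reshape(-1, 3, 3)
print(blocks.shape)  # (2, 3, 3)
```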
df.stack().to_csv('d:/temp/1D.csv', index=False)
1D.csv:
0.2
4.3
1.2
4300.0
1.1
2.2
3.1
4300.0
3.5
4.1
1.1
4300.0
1.2
3.3
1.2
4400.0
1.5
2.4
3.1
4400.0
3.5
2.1
1.1
4400.0
OLD answer:
Here is a Pandas solution.
Assume we have the following text file:
0.2, 4.3, 1.2
1.1, 2.2, 3.1
3.5, 4.1, 1.1, 4300
1.2, 3.3, 1.2
1.5, 2.4, 3.1
3.5, 2.1, 1.1, 4400
Code:
import pandas as pd
In [95]: fn = r'D:\temp\.data\data.txt'
In [96]: df = pd.read_csv(fn, sep=',', skipinitialspace=True, header=None, names=list('abcd'))
In [97]: df
Out[97]:
a b c d
0 0.2 4.3 1.2 NaN
1 1.1 2.2 3.1 NaN
2 3.5 4.1 1.1 4300.0
3 1.2 3.3 1.2 NaN
4 1.5 2.4 3.1 NaN
5 3.5 2.1 1.1 4400.0
In [98]: df.d = df.d.bfill()
In [99]: df
Out[99]:
a b c d
0 0.2 4.3 1.2 4300.0
1 1.1 2.2 3.1 4300.0
2 3.5 4.1 1.1 4300.0
3 1.2 3.3 1.2 4400.0
4 1.5 2.4 3.1 4400.0
5 3.5 2.1 1.1 4400.0
Now you can save it back to CSV:
df.to_csv('d:/temp/out.csv', index=False, header=None)
d:/temp/out.csv:
0.2,4.3,1.2,4300.0
1.1,2.2,3.1,4300.0
3.5,4.1,1.1,4300.0
1.2,3.3,1.2,4400.0
1.5,2.4,3.1,4400.0
3.5,2.1,1.1,4400.0
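From the filled DataFrame, the 3D tensor the question asks for can be obtained with a reshape. A minimal sketch, assuming every sample has exactly 3 rows (steps is an assumed constant) and reading the same text from an in-memory string instead of a file:

```python
import io
import pandas as pd

text = """0.2, 4.3, 1.2
1.1, 2.2, 3.1
3.5, 4.1, 1.1, 4300
1.2, 3.3, 1.2
1.5, 2.4, 3.1
3.5, 2.1, 1.1, 4400
"""

df = pd.read_csv(io.StringIO(text), sep=',', skipinitialspace=True,
                 header=None, names=list('abcd'))
df['d'] = df['d'].bfill()

steps = 3  # assumed number of rows (time steps) per sample

# Features become (batch, time steps, features); one label is kept per sample.
X = df[['a', 'b', 'c']].to_numpy().reshape(-1, steps, 3)
y = df['d'].to_numpy()[::steps]

print(X.shape)  # (2, 3, 3)
print(y)        # [4300. 4400.]
```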