I am looking for a way to convert a list like this
[[1.1, 1.2, 1.3, 1.4, 1.5],
[2.1, 2.2, 2.3, 2.4, 2.5],
[3.1, 3.2, 3.3, 3.4, 3.5],
[4.1, 4.2, 4.3, 4.4, 4.5],
[5.1, 5.2, 5.3, 5.4, 5.5]]
to something like this
[[(1.1,1.2),(1.2,1.3),(1.3,1.4),(1.4,1.5)],
[(2.1,2.2),(2.2,2.3),(2.3,2.4),(2.4,2.5)]
.........................................
The following line should do it:
[list(zip(row, row[1:])) for row in m]
where m is your initial 2-dimensional list
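As a side note (my addition, not part of the original answer): on Python 3.10 and later, the standard library offers itertools.pairwise, which produces exactly these consecutive pairs:

from itertools import pairwise
[list(pairwise(row)) for row in m]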
UPDATE for second question in comment
You have to transpose (= exchange columns with rows) your 2-dimensional list. The Pythonic way to transpose m is zip(*m):
[list(zip(column, column[1:])) for column in zip(*m)]
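If the star-unpacking idiom is new to you, here is a minimal sketch (with toy values of my own) of how zip(*m) performs the transposition:

m = [[1, 2, 3],
     [4, 5, 6]]

# zip(*m) is equivalent to zip([1, 2, 3], [4, 5, 6]): it pairs up
# the i-th element of every row, which is exactly a transposition.
print(list(zip(*m)))  # [(1, 4), (2, 5), (3, 6)]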
In response to a further comment from the questioner, two answers:
# Original grid
grid = [[1.1, 1.2, 1.3, 1.4, 1.5],
        [2.1, 2.2, 2.3, 2.4, 2.5],
        [3.1, 3.2, 3.3, 3.4, 3.5],
        [4.1, 4.2, 4.3, 4.4, 4.5],
        [5.1, 5.2, 5.3, 5.4, 5.5]]

# Window function to return a sequence of pairs.
def window(row):
    return [(row[i], row[i + 1]) for i in range(len(row) - 1)]
ORIGINAL QUESTION:
# Print sequences of pairs for the grid.
print([window(y) for y in grid])
UPDATED QUESTION:
# Take the nth item from every row to get that column.
def column(grid, columnNumber):
    return [row[columnNumber] for row in grid]

# Transpose grid to turn it into columns.
def transpose(grid):
    # Assume all rows are the same length.
    numColumns = len(grid[0])
    return [column(grid, columnI) for columnI in range(numColumns)]

# Return windowed pairs for the transposed matrix.
print([window(y) for y in transpose(grid)])
Another version uses map and a lambda:

list(map(lambda x: list(zip(x, x[1:])), m))

where m is your matrix of choice. (The outer list calls are needed on Python 3, where map and zip return lazy iterators.)
List comprehensions provide a concise way to create lists:
http://docs.python.org/tutorial/datastructures.html#list-comprehensions
[[(a[i], a[i+1]) for i in range(len(a)-1)] for a in A]
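For readers new to comprehensions, here is a sketch of the equivalent explicit loop (the names pairs and row_pairs are mine, purely illustrative):

pairs = []
for a in A:
    row_pairs = []
    for i in range(len(a) - 1):
        row_pairs.append((a[i], a[i + 1]))
    pairs.append(row_pairs)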
I have a 2d numpy array that contains some numbers like:
data =
[[ 1.1,  1.2,  1.3,  1.4],
 [ 2.1,  2.2,  2.3, -1.0],
 [-1.0,  3.2,  3.3, -1.0],
 [-1.0, -1.0, -1.0, -1.0]]
I want to remove every row that contains the value -1.0 two or more times, so that I'm left with
data =
[[ 1.1,  1.2,  1.3,  1.4],
 [ 2.1,  2.2,  2.3, -1.0]]
I found this question, which looks very close to what I'm trying to do, but I can't quite figure out how to rewrite it to fit my use case.
You can easily do it with this piece of code:
new_data = data[(data == -1).sum(axis=1) < 2]
Result:
>>> new_data
array([[ 1.1,  1.2,  1.3,  1.4],
       [ 2.1,  2.2,  2.3, -1. ]])
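For clarity, here is a sketch of the same one-liner broken into steps; the intermediate name counts is mine, purely illustrative:

import numpy as np

data = np.array([[ 1.1,  1.2,  1.3,  1.4],
                 [ 2.1,  2.2,  2.3, -1.0],
                 [-1.0,  3.2,  3.3, -1.0],
                 [-1.0, -1.0, -1.0, -1.0]])

# Count how many entries in each row equal -1 ...
counts = (data == -1).sum(axis=1)  # array([0, 1, 2, 4])
# ... and keep only the rows where that count is below 2.
new_data = data[counts < 2]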
def remove_rows(data, threshold):
    mask = np.array([np.sum(row == -1) < threshold for row in data])
    return data[mask]

This function returns a new array with no rows containing -1 a number of times greater than or equal to the threshold.
You need to pass in a NumPy array for it to work.
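A minimal usage sketch, reusing the example array from the question:

new_data = remove_rows(np.asarray(data), threshold=2)
# array([[ 1.1,  1.2,  1.3,  1.4],
#        [ 2.1,  2.2,  2.3, -1. ]])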
How would I combine these two arrays:
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
                [4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
Into something like this:
xy = [[0.1, [1.0, 1.1, 1.2, 1.3]], [0.2, [2.0, 2.1, 2.2, 2.3]...
Thank you for the assistance!
Someone suggested I post code that I have tried, and I realized I had forgotten to:
xy = np.array(list(zip(x, y)))
This is my current solution; however, it is extremely inefficient.
You can use zip to combine them:
[[a,b] for a,b in zip(y,x)]
Out:
[[array([0.1]), array([1. , 1.1, 1.2, 1.3])],
 [array([0.2]), array([2. , 2.1, 2.2, 2.3])],
 [array([0.3]), array([3. , 3.1, 3.2, 3.3])],
 [array([0.4]), array([4. , 4.1, 4.2, 4.3])],
 [array([0.5]), array([5. , 5.1, 5.2, 5.3])]]
A pure NumPy solution will be much faster than a list comprehension for large arrays.
I do have to say that your use case makes little sense: there is no obvious logic in putting these arrays into a single ragged data structure, and I believe you should recheck your design.
As @user2357112 supports Monica was subtly implying, this is very likely an XY problem. Check whether this is really what you are trying to solve; if you want something else, try asking about that.
I strongly suggest reconsidering what you want to do before moving on, as you may otherwise lock yourself into a bad design.
That aside, here's a solution:
import numpy as np
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
                [4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
xy = np.hstack([y, x])
print(xy)
prints
[[0.1 1.  1.1 1.2 1.3]
 [0.2 2.  2.1 2.2 2.3]
 [0.3 3.  3.1 3.2 3.3]
 [0.4 4.  4.1 4.2 4.3]
 [0.5 5.  5.1 5.2 5.3]]
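As a usage note: once stacked, the pieces can be recovered by plain slicing, which is usually all the "pairing" structure you need:

y_back = xy[:, :1]  # shape (5, 1): the original y
x_back = xy[:, 1:]  # shape (5, 4): the original x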
I have a large list of data, between 1,000 and 10,000 elements. Now I want to filter out some peak values with the help of the median function.
import statistics

# example list with just 10 elements
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]

# list of medians calculated from 3 elements
my_median_list = []
for i in range(len(my_list)):
    if i == 0:
        my_median_list.append(statistics.median([my_list[0], my_list[1], my_list[2]]))
    elif i == (len(my_list) - 1):
        my_median_list.append(statistics.median([my_list[-1], my_list[-2], my_list[-3]]))
    else:
        my_median_list.append(statistics.median([my_list[i-1], my_list[i], my_list[i+1]]))
print(my_median_list)
# [4.7, 4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6, 4.6]
This works so far, but I think it looks ugly and is maybe inefficient. Is there a way with statistics or NumPy to do it faster, or another solution? Also, I'm looking for a way to pass an argument that controls how many elements the median is calculated from. In my example I always used the median of 3 elements, but with my real data I want to experiment with the window size and maybe use the median of 10 elements.
You are calculating too many values, since:

my_median_list.append(statistics.median([my_list[i-1], my_list[i], my_list[i+1]]))

and

my_median_list.append(statistics.median([my_list[0], my_list[1], my_list[2]]))

are the same when i == 1. The same duplication happens at the end, so you get one extra value at each end.
It's easier and less error-prone to do this with zip() which will make the three element tuples for you:
from statistics import median
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
[median(l) for l in zip(my_list, my_list[1:], my_list[2:])]
# [4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6]
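Before moving to deques: one way to generalize the zip() trick above to an arbitrary window size n is the slicing sketch below (my own generalization, not part of the original answer):

n = 3
windows = zip(*(my_list[i:] for i in range(n)))
print([median(w) for w in windows])
# [4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6]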
For groups of arbitrary size, collections.deque is super handy because you can set a maximum size: you keep pushing items onto one end and it drops items off the other to maintain that size. Here's a generator example that takes your group size as n:
from statistics import median
from collections import deque
def rolling_median(l, n):
    d = deque(l[0:n], n)
    yield median(d)
    for num in l[n:]:
        d.append(num)
        yield median(d)
my_list = [4.5, 4.7, 5.1, 3.9, 9.9, 5.6, 4.3, 0.2, 5.0, 4.6]
list(rolling_median(my_list, 3))
# [4.7, 4.7, 5.1, 5.6, 5.6, 4.3, 4.3, 4.6]
list(rolling_median(my_list, 5))
# [4.7, 5.1, 5.1, 4.3, 5.0, 4.6]
If X is an array, what is the meaning of X[:,0]? This is not the first time I have seen such a thing, and it confuses me; I can't work out what it means. Could anyone show me an example? A full, clear answer about this comma notation would be appreciated.
Please see the file https://github.com/lazyprogrammer/machine_learning_examples/blob/master/ann_class/forwardprop.py
The comma inside the brackets separates the rows from the columns you want to slice from your array:

x[row, column]

You can place ":" before or after the row and column values. Before a value it means "until"; after a value it means "from".
For example you have:
x: array([[5.1, 3.5, 1.4, 0.2],
          [4.9, 3. , 1.4, 0.2],
          [4.7, 3.2, 1.3, 0.2],
          [4.6, 3.1, 1.5, 0.2],
          [5. , 3.6, 1.4, 0.2],
          [5.4, 3.9, 1.7, 0.4],
          [4.6, 3.4, 1.4, 0.3],
          [5. , 3.4, 1.5, 0.2],
          [4.4, 2.9, 1.4, 0.2]])
x[:, :] means you want every row and every column.
x[3, 3] means you want the value at row index 3, column index 3.
x[:3, :3] means you want the rows and columns up to index 3 (exclusive).
x[:, 3] means you want column index 3 from every row.
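A minimal runnable sketch of these slices, using the first three rows of the array above:

import numpy as np

x = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [4.7, 3.2, 1.3, 0.2]])

print(x[:, 0])     # first column of every row: [5.1 4.9 4.7]
print(x[0, :])     # every column of the first row: [5.1 3.5 1.4 0.2]
print(x[:2, 1:3])  # rows before index 2, columns 1 and 2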
>>> x = [1, 2, 3]
>>> x[:, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
If you see that, then the variable is not a list, but something else. A numpy array, perhaps.
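By contrast, a quick sketch of the same expression succeeding on a NumPy array:

>>> import numpy as np
>>> x = np.array([[1, 2], [3, 4]])
>>> x[:, 0]
array([1, 3])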
I am creating an example matrix:
import numpy as np
np.random.seed(0)
F = np.random.randint(2, 5, size=(3, 4), dtype='int32')
F
Slicing matrix rows:

F[0:2]

Slicing matrix columns:

F[:, 2]
To be straight to the point: it is X[rows, columns], as someone mentioned. But you may ask what a bare colon : means in X[:,0]: it simply means "all".
So X[:,0] lists the element in the first column of every row; the entire first column of the matrix is returned, one value per row.
Similarly, X[:,1] lists the second column from all rows.
Hope this clarifies things.
Pretty clear. Check this out!
Load some data
from sklearn import datasets
iris = datasets.load_iris()
samples = iris.data
Explore the first 10 rows of the 2D array
samples[:10]
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])
Test the notation
x = samples[:,0]
x[:10]
array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9])
y = samples[:,1]
y[:10]
array([3.5, 3. , 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1])
P.S. The length of samples is 150, I've cut it to 10 for clarity.
I have time series data sets comprising 10 Hz data over several years. For one year my data has around 3.1*10^8 rows (each row has a time stamp and 8 float values). My data has gaps which I need to identify and fill with NaN. My Python code below is capable of doing so, but the performance is far too poor for my kind of problem: I cannot get through my data set in anything even close to a reasonable time.
Below is a minimal working example.
I have, for example, series (time series data) and data as lists of the same length:
series = [1.1, 2.1, 3.1, 7.1, 8.1, 9.1, 10.1, 14.1, 15.1, 16.1, 20.1]
data_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
data_b = [1.2, 1.2, 1.2, 2.2, 2.2, 2.2, 2.2, 3.2, 3.2, 3.2, 4.2]
I would like series to advance in intervals of 1; hence the gaps in series are 4.1, 5.1, 6.1, 11.1, 12.1, 13.1, 17.1, 18.1 and 19.1. The data_a and data_b lists shall be filled with float('nan') at those positions.
So data_b, for example, should become:
[1.2, 1.2, 1.2, nan, nan, nan, 2.2, 2.2, 2.2, 2.2, nan, nan, nan, 3.2, 3.2, 3.2, nan, nan, nan, 4.2]
I achieved this using:
d_max = 1.0  # Normal increment in series where no gaps shall be filled
shift = 0
for i in range(len(series) - 1):
    diff = series[i+1] - series[i]
    if diff > d_max:
        num_fills = round(diff / d_max) - 1  # Number of fills within one gap
        for it in range(num_fills):
            data_a.insert(i+1+it+shift, float('nan'))
            data_b.insert(i+1+it+shift, float('nan'))
        shift = int(shift + num_fills)  # Shift the index by the number of inserts from the previous gap filling
I searched for other solutions to this problem but only came across the find() function, which yields the indices of the gaps. Is find() faster than my solution? And how would I insert the NaNs into data_a and data_b more efficiently?
First, realize that your innermost loop is not necessary:

for it in range(num_fills):
    data_a.insert(i+1+it+shift, float('nan'))

is the same as

data_a[i+1+shift:i+1+shift] = [float('nan')] * int(num_fills)

That might make it slightly faster because there is less allocation and less shifting of items going on.
Then, for large numerical problems, always use NumPy. It may take some effort to learn, but the performance is likely to go up orders of magnitude. Start with something like:
import numpy as np

series = np.array([1.1, 2.1, 3.1, 7.1, 8.1, 9.1, 10.1, 14.1, 15.1, 16.1, 20.1])
data_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
data_b = [1.2, 1.2, 1.2, 2.2, 2.2, 2.2, 2.2, 3.2, 3.2, 3.2, 4.2]

d_max = 1.0  # Normal increment in series where no gaps shall be filled
shift = 0

# The following two statements use NumPy's broadcasting
# to implicitly run the loops at the C level.
diff = series[1:] - series[:-1]
num_fills = np.round(diff / d_max) - 1

for i in np.where(diff > d_max)[0]:
    nf = int(num_fills[i])  # np.round returns floats, so cast to int
    nans = [np.nan] * nf
    data_a[i+1+shift:i+1+shift] = nans
    data_b[i+1+shift:i+1+shift] = nans
    shift += nf
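If the Python-level loop itself becomes the bottleneck, a fully vectorized variant is possible using np.insert, which accepts repeated insertion indices. This is a sketch of my own, not part of the answer above, and it fills only one list for illustration:

import numpy as np

series = np.array([1.1, 2.1, 3.1, 7.1, 8.1, 9.1, 10.1, 14.1, 15.1, 16.1, 20.1])
data_b = np.array([1.2, 1.2, 1.2, 2.2, 2.2, 2.2, 2.2, 3.2, 3.2, 3.2, 4.2])
d_max = 1.0

diff = np.diff(series)
gap_idx = np.where(diff > d_max)[0]                       # where each gap starts
reps = (np.round(diff[gap_idx] / d_max) - 1).astype(int)  # missing samples per gap

# Repeat each insertion point once per missing sample; np.insert
# then inserts all NaNs before those (original) indices in one call.
insert_at = np.repeat(gap_idx + 1, reps)
filled = np.insert(data_b, insert_at, np.nan)
# filled -> [1.2 1.2 1.2 nan nan nan 2.2 2.2 2.2 2.2 nan nan nan 3.2 3.2 3.2 nan nan nan 4.2]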
IIRC, inserts into Python lists are expensive, scaling linearly with the size of the list.
I'd recommend not loading your huge data sets into memory, but iterating through them with a generator function, something like this (in Python 3 the built-in zip is already lazy, so itertools.izip is not needed):

series = [1.1, 2.1, 3.1, 7.1, 8.1, 9.1, 10.1, 14.1, 15.1, 16.1, 20.1]
data_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
data_b = [1.2, 1.2, 1.2, 2.2, 2.2, 2.2, 2.2, 3.2, 3.2, 3.2, 4.2]

def fillGaps(series, data_a, data_b, d_max=1.0):
    prev = None
    for s, a, b in zip(series, data_a, data_b):
        if prev is not None:
            diff = s - prev
            if diff > d_max:
                # Yield one NaN pair per missing sample in the gap.
                for x in range(int(round(diff / d_max)) - 1):
                    yield (float('nan'), float('nan'))
        prev = s
        yield (a, b)

newA = []
newB = []
for a, b in fillGaps(series, data_a, data_b):
    newA.append(a)
    newB.append(b)
E.g., stream the data through the generator and write the results out as you go instead of appending to lists.