Skipping INF values in 2d array Python

I am currently working with a data set that contains a lot of 'INF' values that are messing up my calculations. I've tried to remove these values, but I have been unable to find a way to do this with a 2D array. Some of my code is below:
def date2str(date_str):
    date = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
    return date.toordinal()
def readfiles(file_list):
    data = []
    for fname in file_list:
        data.append(
            np.loadtxt(fname,
                       usecols=(0,18),
                       comments='#', # skip comment lines
                       delimiter='\t',
                       converters = { 0 : date2str },
                       dtype=None))
    return data
data = readfiles(['soundTransit1_remote_rawMeasurements_15m.txt'])
print data
data = data[np.isfinite(data)]
np.set_printoptions(threshold='nan')
column_0 = np.array(data)[0][:,0]
column_1 = np.array(data)[0][:,1]
thermTemp1_degC = 1/(1.401E-3 + 2.377E-4*np.log(column_1) + 9.730E-8*np.log(column_1)**3)-273.15
I have read in two of the columns (one containing data and the other containing the associated date and time) and separated them so I could manipulate the data (on the last line of my code). I need to be able to skip the lines of my data that contain 'INF'. My current attempt, 'data = data[np.isfinite(data)]', is not working. I receive the error 'TypeError: only integer arrays with one element can be converted to an index'. Can anyone give me some guidance?

Your function readfiles returns a Python list, not a numpy array. Python lists cannot be indexed with one of numpy's boolean arrays, which is why you get the error with data[np.isfinite(data)]: data is a Python list, but np.isfinite(data) is a numpy array of boolean values.
Either return a numpy array from readfiles with something like
return np.array(data)
or convert the result to an array before you try to use numpy's boolean indexing:
data = np.array(data)
data = data[np.isfinite(data)]
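You might have to massage the list a bit to get an array with the desired shape. Note also that indexing with a boolean mask of the same shape as the array returns a flattened 1D result, so to drop whole rows you would first reduce the mask to one value per row.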
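A minimal sketch of the row-filtering step, assuming one loaded file yields a 2D float array with the date ordinal in column 0 and the measurement in column 1. The sample values and the names arr, row_ok and clean are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for one loaded file:
# column 0 = date ordinal, column 1 = raw reading.
arr = np.array([
    [735000.0, 10500.0],
    [735001.0, np.inf],
    [735002.0, 9800.0],
    [735003.0, np.nan],
])

# One boolean per row: True only if every value in that row is finite.
row_ok = np.isfinite(arr).all(axis=1)

# Indexing with a 1D row mask keeps the array 2D and drops the bad rows.
clean = arr[row_ok]
column_0 = clean[:, 0]
column_1 = clean[:, 1]
```

With the readfiles from the question, which returns one array per file, the same mask would be applied to each list element (for example data[0]) rather than to the list itself.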

Related

Print Pandas without dtype

I've read a few other posts about this but the other solutions haven't worked for me. I'm trying to look at 2 different CSV files and compare data from 1 column from each file. Here's what I have so far:
import pandas as pd
import numpy as np
dataBI = pd.read_csv("U:/eu_inventory/EO BI Orders.csv")
dataOrderTrimmed = dataBI.iloc[:,1:2].values
dataVA05 = pd.read_csv("U:\eu_inventory\VA05_Export.csv")
dataVAOrder = dataVA05.iloc[:,1:2].values
dataVAList = []
ordersInBoth = []
ordersInBI = []
ordersInVA = []
for order in np.nditer(dataOrderTrimmed):
    if order in dataVAOrder:
        ordersInBoth.append(order)
    else:
        ordersInBI.append(order)
So if the order number from dataOrderTrimmed is also in dataVAOrder, I want to add it to ordersInBoth; otherwise I want to add it to ordersInBI. I think it splits the information correctly, but if I try to print ordersInBoth, each item prints as array(5555555, dtype=int64). I want a plain list of the order numbers, not arrays, and without the dtype information. Let me know if you need more information or if the way I've typed it out is confusing. Thanks!

The way you're using .iloc gives you a DataFrame, which becomes a 2D array when you access .values. If you just want the values in the column at index 1, then you should just say:
dataOrderTrimmed = dataBI.iloc[:, 1].values
Then you can iterate over dataOrderTrimmed directly (i.e. you don't need nditer), and you will get regular scalar values.
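As a rough illustration of that advice, the sketch below uses two small made-up DataFrames in place of the CSV files (the column names "id" and "order" are assumptions). It selects the column with an integer index so that .values is 1D, then builds plain Python lists.

```python
import pandas as pd

# Made-up order numbers standing in for the two CSV files.
dataBI = pd.DataFrame({"id": [1, 2, 3],
                       "order": [5555555, 6666666, 7777777]})
dataVA05 = pd.DataFrame({"id": [1, 2],
                         "order": [5555555, 7777777]})

# An integer column index (not the slice 1:2) gives a 1D array.
bi_orders = dataBI.iloc[:, 1].values
va_orders = set(dataVA05.iloc[:, 1].values)  # set for fast membership tests

# int() turns each numpy scalar into a plain Python int, so printing
# the lists shows bare numbers with no dtype information.
ordersInBoth = [int(o) for o in bi_orders if o in va_orders]
ordersInBI = [int(o) for o in bi_orders if o not in va_orders]
```

Using a set for the lookup is a design choice for this sketch; testing membership against the array directly also works but scans the whole array each time.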

Numpy/Pandas: Error converting ndarray to series

I have the ndarray "diffTemp":
diffTemp = np.diff([df.Temp])
Where Temp are temperature values whose differences I compute using the difference operator. In this case using print() I get:
print(diffTemp) = [[-0.16 -0.05]]
To convert it into a column vector I use:
diffTemp = diffTemp.transpose()
And then convert it from an ndarray into a Series using:
diffTemp = pd.Series([diffTemp])
(This allows me later to concatenate diffTime with its corresponding Series dates (diffDates).)
Unfortunately this outputs that diffTemp is:
print(diffTemp) = 0 [[-0.16000000000000014], [-0.05000000000000071]]
If I instead use (i.e. without hard brackets [ ]), such that instead:
diffTemp = pd.Series(diffTemp)
I instead get the error message:
Exception: Data must be 1-dimensional
I'm totally new to Python and have been googling for the last few days without any success. Any help is much appreciated.

The issue here is that you are trying to convert a two-dimensional array into a 1-dimensional series. Notice that there are two brackets around [[-0.16 -0.05]]. You can write the following to get back a series by just grabbing the 1-d array that you want:
diffTemp = pd.Series(diffTemp[0])
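To make the shapes concrete, here is a small sketch with made-up temperatures. It shows that the extra list brackets in np.diff([df.Temp]) are what produce the 2D result, and that dropping them gives a 1D array directly; the names nested and flat are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Temp": [20.00, 19.84, 19.79]})  # made-up temperatures

nested = np.diff([df.Temp])  # extra brackets: shape (1, 2)
flat = np.diff(df.Temp)      # no extra brackets: shape (2,)

# Either take the single row of the 2D result, or use the 1D one as is.
diffTemp = pd.Series(nested[0])
diffTemp_alt = pd.Series(flat)
```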

Find big differences in numpy array

I have a csv file that contains data from two LED measurements. There are some mistakes in the file that produce huge sparks in the graph. I want to locate the places where this happens.
I have this code that makes two arrays that I plot.
x625 = np.array(df['LED Group 625'].dropna(axis=0, how='all'))
x940 = np.array(df['LED Group 940'].dropna(axis=0, how='all'))

I will provide an answer with some artificial data since you have not posted any data yet.
So after you convert the pandas columns into a numpy array, you can do something like this:
import numpy as np
# some random data. 100 lines and 1 column
x625 = np.random.rand(100,1)
# Assume that the maximum value in `x625` is a spark.
spark = x625.max()
# Find where these spark are in the `x625`
np.where(x625==spark)
#(array([64]), array([0]))
The above means that a value equal to spark is located on the 64th line of the 0th column.
Similarly, you can use np.where(x625 > any_number_here)
If instead of the location you need to create a boolean mask use this:
boolean_mask = (x625==spark)
# verify
np.where(boolean_mask)
# (array([64]), array([0]))
EDIT 1
You can use numpy.diff() to get the differences between consecutive elements of the array.
diffs = np.diff(x625.ravel())
Index 0 of diffs holds the result of element1-element0, index 1 holds element2-element1, and so on.
If the value in diffs is large at a specific index, then a spark occurred at that position.
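The diff-based idea can be sketched as follows. The signal and the threshold value are made up; a real cutoff would have to be chosen from the actual measurements.

```python
import numpy as np

# Made-up signal with one artificial spark at index 5.
x625 = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 9.0, 1.1, 1.0])

diffs = np.diff(x625)   # diffs[i] == x625[i + 1] - x625[i]
threshold = 3.0         # assumed cutoff; tune for the real data

# Indices i where the step from sample i to sample i + 1 is large.
jumps = np.where(np.abs(diffs) > threshold)[0]
```

A single outlier shows up as two consecutive large steps (one into the spark and one out of it), so here jumps contains indices 4 and 5 and the spark itself sits at index 5.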

numpy.take range of array elements Python

I have an array of integers.
data = [10,20,30,40,50,60,70,80,90,100]
I want to extract a range of integers from the array and get a smaller array.
data_extracted = [20,30,40]
I tried numpy.take.
data = [10,20,30,40,50,60,70,80,90,100]
start = 1 # index of starting data entry (20)
end = 3 # index of ending data entry (40)
data_extracted = np.take(data,[start:end])
I get a syntax error pointing to the : in numpy.take.
Is there a better way to use numpy.take to store part of an array in a separate array?

You can directly slice the list.
import numpy as np
data = [10,20,30,40,50,60,70,80,90,100]
data_extracted = np.array(data[1:4])
Also, you do not need to use numpy.array, you could just store the data in another list:
data_extracted = data[1:4]
If you want to use numpy.take, you have to pass it a list of the desired indices as second argument:
import numpy as np
data = [10,20,30,40,50,60,70,80,90,100]
data_extracted = np.take(data, [1, 2, 3])
I do not think numpy.take is needed for this application though.

You ought to just use a slice to get a range of indices, there is no need for numpy.take, which is intended as a shortcut for fancy indexing.
data_extracted = data[1:4]

As others have mentioned, you can use a plain slice in this case. However, if you need to use np.take because e.g. the axis you're slicing over is variable, you might try:
axis=0
np.take(data, range(1,4), axis=axis)
Note: this might be slower than:
data_extracted = data[1:4]
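For comparison, a short sketch of both approaches using the start and end indices from the question. The 2D array grid is a made-up example of the case where the axis is passed as a variable.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
start, end = 1, 3  # inclusive indices of 20 and 40

# A slice excludes its end index, hence end + 1.
by_slice = data[start:end + 1]

# np.take needs explicit indices; slice syntax is not valid inside a call.
by_take = np.take(data, np.arange(start, end + 1))

# np.take is mainly useful when the axis is chosen at run time.
grid = np.arange(12).reshape(3, 4)
axis = 1
cols = np.take(grid, np.arange(1, 3), axis=axis)  # same as grid[:, 1:3]
```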

numpy array dimension mismatch error

I am quite new to numpy and python in general. I am getting a dimension mismatch error when I try to append values even though I have made sure that both arrays have the same dimension. Also another question I have is why does numpy create a single dimensional array when reading in data from a tab delimited text file.
import numpy as np
names = ["Angle", "RX_Power", "Frequency"]
data = np.array([0,0,0],float) #experimental
data = np.genfromtxt("rx_power_mode 0.txt", dtype=float, delimiter='\t', names = names, usecols=[0,1,2], skip_header=1)
freq_177 = np.zeros(shape=(data.shape))
print(freq_177.shape) #outputs(315,)
for i in range(len(data)):
    if data[i][2] == 177:
        #np.concatenate(freq_177,data[i]) has same issue
        np.append(freq_177,data[i],0)
The output I am getting is
all the input arrays must have same number of dimensions

Annotated code:
import numpy as np
names = ["Angle", "RX_Power", "Frequency"]
You don't need to 'initialize' an array - unless you are going to assign values to individual elements.
data = np.array([0,0,0],float) #experimental
This data assignment completely overwrites the previous one.
data = np.genfromtxt("rx_power_mode 0.txt", dtype=float, delimiter='\t', names = names, usecols=[0,1,2], skip_header=1)
Look at data at this point. What is data.shape? What is data.dtype? Print it, or at least some elements. With names I'm guessing that this is a 1d array with a 3 field dtype. It's not a 2d array, though with all floats it could be transformed or viewed as one.
Why are you making a 1d array of zeros?
freq_177 = np.zeros(shape=(data.shape))
print(freq_177.shape) #outputs(315,)
With a structured array like data, the preferred way to index a given element is by field name and row number, e.g. data['Frequency'][i] (the field name must match the entry in names, including case). Play with that.
np.append is not the same as the list append. It returns a value; it does not change freq_177 in place. Same for concatenate. I recommend staying away from np.append. It's too easy to use it in the wrong way and place.
for i in range(len(data)):
    if data[i][2] == 177:
        #np.concatenate(freq_177,data[i]) has same issue
        np.append(freq_177,data[i],0)
It looks like you want to collect in freq_177 all the terms of the data array for which the 'Frequency' field is 177.
I = data['Frequency'].astype(int)==177
freq_177 = data[I]
I have used astype(int) because the == test with floats is uncertain. It is best used with integers.
I is a boolean mask, true where the values match; data[I] then is the corresponding elements of data. The dtype will match that of data, that is, it will have 3 fields. You can't append or concatenate it to an array of float zeros (your original freq_177).
If you must iterate and collect values, I suggest using list append, e.g.
alist = []
for row in data:
    if int(row['Frequency'])==177:
        alist.append(row)
freq_177 = np.array(alist)
I don't think np.append is discussed much except in its own doc page and text. It comes up periodically in SO questions.
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.append.html
Returns: append : ndarray
A copy of arr with values appended to axis. Note that append does not occur in-place: a new array is allocated and filled.
See also help(np.append) in an interpreter shell.
For genfromtxt - it too has docs, and lots of SO discussion. But to understand what it returned in this case, you need to also read about structured arrays and compound dtype.
Try loading the data with:
data = np.genfromtxt("rx_power_mode 0.txt", dtype=float, delimiter='\t', usecols=[0,1,2], skip_header=1)
Since you are skipping the header line, and just using columns with floats, data should be a 2d array with 3 columns, (N, 3). In that case you could access the 'frequency' values with data[:,2]
I = data[:,2].astype(int)==177
freq_177 = data[I,:]
freq_177 is now a 3 column array - with a subset of the data rows.
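To see the difference between the two ways of loading, here is a sketch that reads the same tab-delimited text from an in-memory buffer instead of the real file. The three data rows are made up; only the column names come from the question.

```python
import io
import numpy as np

# Made-up contents standing in for "rx_power_mode 0.txt".
text = ("Angle\tRX_Power\tFrequency\n"
        "0\t-40.5\t177\n"
        "1\t-41.0\t180\n"
        "2\t-39.8\t177\n")
names = ["Angle", "RX_Power", "Frequency"]

# With names: a 1D structured array with three float fields.
structured = np.genfromtxt(io.StringIO(text), dtype=float,
                           delimiter="\t", names=names, skip_header=1)

# Without names: an ordinary 2D float array of shape (N, 3).
plain = np.genfromtxt(io.StringIO(text), dtype=float,
                      delimiter="\t", skip_header=1)

# The same boolean-mask selection works for both layouts.
mask = structured["Frequency"].astype(int) == 177
freq_177_structured = structured[mask]
freq_177 = plain[plain[:, 2].astype(int) == 177]
```

Printing structured.shape and plain.shape shows why the original freq_177.shape came out one-dimensional: each row of the structured array is a single record, not three separate columns.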
