I have a list of values and would like to convert it to the log of that list or pass the log of a list to a function. I'm more familiar with R and you can usually throw some () around anything. When I attempt this in Python I get the error:
TypeError: must be real number, not list
List looks like this:
pressures[:5]
Out[11]: [1009.58, 1009.58, 1009.55, 1009.58, 1009.65]
It doesn't really matter where I try to take the log, I get the same error...in a function:
plt.plot(timestamps, log(pressures))
plt.xlabel('Timestamps')
plt.ylabel('Air Pressure')
plt.show()
Whilst parsing data:
pressures = log([record['air_pressure'] for record in data])
There are a couple of ways to handle this. Python has some basic, built in functions in the math module. One is log. It takes a float or an int as a parameter and outputs a float:
> from math import log
> log(20)
2.995732273553991
To process a list with this function, you'd need to call it on every item in the list:
> data = [1, 2, 3]
> [log(x) for x in data]
[0.0, 0.6931471805599453, 1.0986122886681098]
On the other hand, and I mention this because it looks like you're already using some related libraries, numpy can process an entire list at once.
> import numpy as np
> np.log([1, 2, 3])
array([ 0. , 0.69314718, 1.09861229]) # Notice this is a numpy array
If you want to use numpy and get a plain list of Python floats back, you could do this instead:
> np.log([1, 2, 3]).tolist()
[0.0, 0.6931471805599453, 1.0986122886681098]
You can only use log() with a single number. So you'll need to write a loop to iterate over your list and apply log() to each number.
Fortunately, you have already written a loop that, with some modification, will do the trick. Instead of:
pressures = log([record['air_pressure'] for record in data])
Write:
pressures = [log(record['air_pressure']) for record in data]
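For the plotting case, here is a minimal sketch of both options applied to the five sample readings shown in the question. It assumes numpy is installed and leaves the actual plt.plot call out; the names log_list and log_array are only illustrative:

```python
import math

import numpy as np

# First five readings, copied from the question
pressures = [1009.58, 1009.58, 1009.55, 1009.58, 1009.65]

# Option 1: math.log applied element by element, result is a plain list
log_list = [math.log(p) for p in pressures]

# Option 2: np.log applied to the whole list at once, result is a numpy array
log_array = np.log(pressures)

# Both options should agree to within floating-point tolerance
agree = all(math.isclose(a, b) for a, b in zip(log_list, log_array))
print(agree)
```

Either result can then be used as the y-values, e.g. plt.plot(timestamps, log_array).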
If you wanted to do logs and you have a list of integers you can use the math lib for that.
import math
my_data = [1,2,3,4,5,6,7,8,9]
log_my_data = [math.log(x) for x in my_data]
print(log_my_data)
I am trying to bin the values in my data and put them in a dictionary in Python.
However, after creating the dictionary, its key range contains weird artifacts, like 0.6900000000000001 instead of 0.69. They only appear after creating the dictionary, though; the initial array "key_range" seems to have only normal values. Therefore, the last two lines of my code produce KeyErrors, since the key 0.69 does not exist.
Does anyone know what is going on? Is it wrong to use the zip-function? Can I not create a functioning dictionary like this? I suppose I can iterate through the key values and round them manually, but I imagine there are more elegant solutions.
Cheers, and thanks
import numpy as np
key_range = np.arange(0, 1, 0.01) # these numbers are perfectly OK.
values = [0] * len(key_range)
value_dict = dict(zip(key_range, values)) # and here, I get weird artifacts.
print(value_dict)
for i in range(0, len(data)):
    value_dict[data[i]] = value_dict[data[i]] + 1
"I suppose I can iterate through the key values and round them manually, but I imagine there are more elegant solutions." For what it is worth, you can round them within the expression that creates value_dict, which still looks pretty elegant to me:
value_dict = dict(zip(map(lambda x: round(x,2), key_range), values))
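A minimal sketch of the same rounding idea, written in plain Python so it runs without numpy. The raw_keys list is only a stand-in for np.arange(0, 1, 0.01), and the data list is made up for illustration. The point it tries to show is that rounding has to be applied both when building the keys and when looking them up:

```python
# Stand-in for np.arange(0, 1, 0.01): multiples of the step, which can
# carry small floating-point errors
raw_keys = [i * 0.01 for i in range(100)]

# Round every key to two decimals while building the dictionary
value_dict = {round(k, 2): 0 for k in raw_keys}

# Hypothetical measurements to be counted per bin
data = [0.69, 0.01, 0.69]

for x in data:
    # Round the lookup value the same way, so it matches a key exactly
    value_dict[round(x, 2)] += 1

print(value_dict[0.69], value_dict[0.01])
```

If the bins are really just hundredths, using integer keys (for example int(round(x * 100))) would sidestep float keys altogether, at the cost of converting back when printing.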
I have a list of lists of floats and I want to test whether a value pair (e.g. [2.0, 1.1]) is already in this list. Therefore I wrote some simple code to check this. As far as I understand my code, it should always write to the result array. I think the if statement is not formulated correctly, or is at least not doing what I intended. The result array should look like [0, 1, 2]. It is like a 'self-check'.
import numpy as np
array_a = np.asarray([[2.0, 1.1], [3.3, 4.4], [2.5, 3.0]])
array_a_list = array_a.tolist()
result = np.zeros(np.size(array_a_list, axis=0))
for i in range(np.size(array_a_list, axis=0)):
    print(i)
    if array_a_list[i] in array_a_list: # Shouldn't this be always true? At least that's what I expect it to be.
        result[i] = array_a_list.index(i) # here I'm expecting to get the index back, where the entry is stored
    test = array_a_list[i]
So the overall idea is to check whether an entry is already in a list. If that's the case, I want to get back the index at which the entry is stored. In my case an entry is a list of floats that looks like this: [2.0, 1.1]. The idea came from this question.
Got it solved. I edited the result line now:
result[i] = array_a_list.index(array_a_list[i])
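The fix works because list.index() expects the entry itself, not its position. A small sketch of the lookup in plain Python, using a hypothetical helper find_pair that also covers the case where the pair is not in the list (list.index() raises ValueError then):

```python
def find_pair(pairs, target):
    """Return the index of the first occurrence of target, or -1 if absent."""
    try:
        return pairs.index(target)
    except ValueError:
        return -1


array_a_list = [[2.0, 1.1], [3.3, 4.4], [2.5, 3.0]]

# The 'self-check': every entry is looked up in the list it came from
result = [find_pair(array_a_list, entry) for entry in array_a_list]
print(result)  # [0, 1, 2]

# A pair that is not stored yields -1 instead of an exception
missing = find_pair(array_a_list, [9.9, 9.9])
print(missing)  # -1
```

Note that this compares floats exactly, so a pair that differs only by rounding error would not be found.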
I have a list of values and I would like to convert it to an array so that I can easily extract columns, but I am stuck on the embedded " characters, which keep x = np.array(a, dtype=float) from working:
['"442116.503118","442116.251106"',
'"442141.502863","442141.247462"',
...
The message obtained is :
"could not convert string to float: "442116.503118","442116.251106""
Answering based on the VERY limited information given, but if that is your list it looks like a list of nested strings, not floats. Try
x = np.array([float(i.replace("\"","")) for i in a], dtype=float)
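The answer above is wrong: stripping the quotes still leaves both numbers in one comma-separated string, which float() cannot parse. This does the trick for me though: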
import numpy as np
wtf = ['"442116.503118","442116.251106"',
       '"442141.502863","442141.247462"']
to_list = []
for nest1 in wtf:
    nest2 = nest1.split(',')
    for each in nest2:
        to_list.append(float(each.strip('"')))
to_array = np.asarray(to_list)
Not exactly elegant. You need to deal with each level of nesting in your input data. I'd recommend you reconsider the way you're formatting the data you're inputting.
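Since each string is really one line of quoted, comma-separated values, the standard csv module is one alternative worth sketching. It keeps the two values of each line together as a row, which assumes that is the column layout you are after. The list a below holds just the two sample entries from the question:

```python
import csv

# The two sample entries shown in the question
a = ['"442116.503118","442116.251106"',
     '"442141.502863","442141.247462"']

# csv.reader accepts any iterable of strings and strips the quoting itself
rows = [[float(value) for value in fields] for fields in csv.reader(a)]

# Columns can then be pulled out of the rows
first_column = [row[0] for row in rows]
second_column = [row[1] for row in rows]

print(rows)
```

Passing rows to np.array(rows, dtype=float) should then give a two-column array that can be sliced with x[:, 0] and x[:, 1].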
I am trying to get an output such as this:
169.764569892, 572870.0, 19.6976
However, I have a problem: the files I am inputting have a format similar to the output I just showed, but some lines in the data contain 'nan' values, which I need to remove.
I am trying to use this to do so:
TData_Pre_Out = map(itemgetter(0, 7, 8), HDU_DATA)
TData_Pre_Filter = [Data for Data in TData_Pre_Out if Data != 'nan']
Here I am trying to use list comprehension to get the 'nan' to go away, but the output still displays it, any help on properly filtering this would be much appreciated.
EDIT: The improper output looks like this:
169.519361471, nan, nan
instead of what I showed above. Also, some more info: 1) This is coming from a special data file, not a text file, so splitting lines won't work. 2) The input is exactly the same as the output, just mapped using the map() line that I show above and split into the indices I actually need (i.e. instead of using all of a data list like L = [(1,2,3),(3,4,5)] I only pull 1 and 3 from that list, to give you the gist of the data structure).
The Data is read in as so:
with pyfits.open(allfiles) as HDU:
    HDU_DATA = HDU[1].data
The syntax is from a specialized program but you get the idea
TData_Pre_Out = map(itemgetter(0, 7, 8), HDU_DATA)
This statement gives you a list of tuples. You then compare each tuple with a string, so every != comparison succeeds and nothing is filtered out.
Without showing how you read in your data, the solution can only be guessed.
However, if HDU_DATA stores real NaN values, try the following.
Comparing a variable to NaN does not work with the equality operator ==:
foo == nan
always evaluates to False, even when foo and nan are both NaN.
Use math.isnan() instead:
import math
...if math.isnan(Data)…
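A minimal sketch of that filter, assuming each record is a tuple of plain floats like the ones itemgetter(0, 7, 8) would return. The two rows are taken from the sample output in the question, and the rule "drop a row if any of its values is NaN" is an assumption about what is wanted:

```python
import math

nan = float("nan")

# Hypothetical records: one complete row and one row containing NaNs
rows = [
    (169.764569892, 572870.0, 19.6976),
    (169.519361471, nan, nan),
]

# NaN never compares equal, not even to itself
nan_equals_itself = (nan == nan)
print(nan_equals_itself)  # False

# Keep only the rows in which no value is NaN
clean_rows = [row for row in rows if not any(math.isnan(v) for v in row)]
print(clean_rows)
```

If only the NaN values should go while the rest of the row is kept, the inner test can be moved into a per-row comprehension instead.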
Based on my understanding of your description, this could work
with open('path/to/file') as infile:
    for line in infile:
        # strip whitespace so that ' nan' after a comma is also caught
        vals = [v.strip() for v in line.split(',')]
        print([v for v in vals if v != 'nan'])
I've just joined two arrays of unequal length together with the command:
allorders = map(None,todayorders, lastyearorders)
where None is filled in wherever todayorders has no value (the todayorders array is shorter).
However, when I try to pass the allorders array into a matplotlib bar chart:
p10= plt.bar(ind, allorders[9], width, color='#0000DD', bottom=allorders[8])
..I get the following error:
TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'
So, is there a way for matplotlib to accept None values? If not, how do I replace the Nones with zeroes in my allorders array?
If you can, as I am a Python newbie (coming over from the R community), please provide detailed code from start to finish that I can use/test.
Use a list comprehension:
allorders = [i if i[0] is not None else (0, i[1]) for i in allorders]
With numpy:
import numpy as np
allorders = np.array(allorders)
This creates an array of objects due to the Nones. We can replace them with zeros:
allorders[allorders == None] = 0
Then convert the array to the proper type:
allorders = allorders.astype(int)
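If a None could show up in either position of a pair, a slightly more general list comprehension is one way to go. This is a sketch in plain Python; the sample allorders list is a guess at what map(None, todayorders, lastyearorders) would produce for a two-item and a three-item list:

```python
# Hypothetical result of joining [1, 2] with [1, 2, 3]; the shorter list
# is padded with None
allorders = [(1, 1), (2, 2), (None, 3)]

# Replace None with 0 wherever it appears inside a pair
cleaned = [tuple(0 if v is None else v for v in pair) for pair in allorders]

print(cleaned)  # [(1, 1), (2, 2), (0, 3)]
```

The cleaned list contains only integers, so np.array(cleaned) should come out with an integer dtype rather than object.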
Since it sounds like you want this all to be in numpy, the direct answer to your question is really just an aside, and the right answer doesn't begin until the "Of course…" paragraph.
If you think about it, you're using map with None as the first parameter as a stand-in for zip_longest, on the assumption that Python doesn't have one. But it does, in itertools, and it lets you specify a custom fillvalue. So you can do this all in one step with izip_longest:
>>> import itertools
>>> todayorders = [1, 2]
>>> lastyearorders = [1, 2, 3]
>>> allorders = itertools.izip_longest(todayorders, lastyearorders, fillvalue=0)
>>> list(allorders)
[(1, 1), (2, 2), (0, 3)]
This only fills in 0 for the Nones that show up as extra values for the shorter list; if you want to replace every None with a 0, you have to do it Martijn Pieters's way. But I think this is what you want.
Also, note that list(allorders) at the end: izip_longest, like most things in itertools, returns an iterator, not a list. Or, in terms you might be more familiar with, it returns a "lazy" sequence rather than a "strict" one. If you're just going to iterate over the result, that's actually better, but if you need to use it with some function that requires a list (like printing it out in human-readable form—or accessing allorders[9], as in your example), you need to explicitly convert it first.
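The session above is Python 2. In Python 3 the function is named itertools.zip_longest (and map(None, ...) no longer works at all), so a sketch of the same step there, with the same two sample lists, would be:

```python
from itertools import zip_longest

todayorders = [1, 2]
lastyearorders = [1, 2, 3]

# zip_longest returns a lazy iterator; list() materializes it so it can be
# indexed or printed
allorders = list(zip_longest(todayorders, lastyearorders, fillvalue=0))

print(allorders)  # [(1, 1), (2, 2), (0, 3)]
```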
If you actually want a numpy.array rather than a list, you can get there without going through a list first. (If all you're ever going to do with it is matplotlib it, you probably do want an array.) One way is np.fromiter(allorders, dtype) instead of list(allorders). Note that fromiter requires an explicit dtype and builds a one-dimensional array, so for an iterator of pairs you would need a structured dtype, or you could flatten the pairs first and reshape afterwards. And if you know the size (which you do: it's max(len(todayorders), len(lastyearorders))), in some cases it's faster or simpler to pass an explicit count as well.
Of course if any of the numpy stuff sounds appealing, you probably should stay within numpy in the first place, instead of using map or izip_longest:
>>> todayorders.resize(lastyearorders.shape)
>>> allorders = np.vstack((todayorders, lastyearorders)).transpose()
Unfortunately, that mutates todayorders, and as far as I know, the equivalent non-mutating function numpy.resize doesn't give you any way to "zero-extend", but instead repeats the values. Hopefully I'm wrong and someone will suggest the easy way, but otherwise, you have to do it explicitly:
>>> extrazeros = np.zeros(len(lastyearorders) - len(todayorders), dtype=int)
>>> allorders = np.vstack((np.concatenate((todayorders, extrazeros)), lastyearorders))
>>> allorders = allorders.transpose()
>>> allorders
array([[1, 1],
       [2, 2],
       [0, 3]])
Of course if you do a lot of that, I'd write a zeroextend function that takes a pair of arrays and extends one to match the other (or, if you're not just dealing with 1D, extends the shorter one on each axis to make the other).
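One possible shape for such a helper, sketched for the 1D case only and built on np.pad rather than on the concatenate approach above. The name zeroextend comes from the paragraph above; the signature and the choice to return both arrays are assumptions:

```python
import numpy as np


def zeroextend(a, b):
    """Pad the shorter of two 1-D arrays with trailing zeros to equal length."""
    a = np.asarray(a)
    b = np.asarray(b)
    n = max(len(a), len(b))
    # mode='constant' pads with zeros by default; (0, k) appends k values
    a = np.pad(a, (0, n - len(a)), mode="constant")
    b = np.pad(b, (0, n - len(b)), mode="constant")
    return a, b


todayorders = np.array([1, 2])
lastyearorders = np.array([1, 2, 3])

today_padded, lastyear_padded = zeroextend(todayorders, lastyearorders)

# Stack the two equal-length arrays as columns
allorders = np.column_stack((today_padded, lastyear_padded))
print(allorders)
```

Unlike the in-place resize, this leaves the original todayorders untouched, and the result keeps an integer dtype.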
At any rate, aside from being faster and using less temporary memory than using map, izip_longest, etc., this also means that you end up with a final array with the right dtype (int rather than object)—which means your result also uses less long-term memory, and everything you do from then on will also be faster and use less temporary memory.
For completeness: It is possible to have pyplot handle None values, but I don't think it's what you want. For example, you can pass it a Transform object whose transform method converts None to 0. But this will be effectively the same as Martijn Pieters's answer but much more verbose, and there's no advantage at all unless you need to plot tons of such arrays.