Missing first entry when writing data to csv using numpy.savetxt() - python

I'm trying to write a numpy array to a .csv file with numpy.savetxt using a comma delimiter, but the very first entry (row 1, column 1) is missing and I have no idea why.
I'm fairly new to programming in Python, so this might simply be a problem with the way I'm calling numpy.savetxt, or maybe with the way I'm defining my array. Anyway, here's my code:
import numpy as np
import csv

# preparing csv file
csvfile = open("np_csv_test.csv", "w")
columns = "ymin, ymax, xmin, xmax\n"
csvfile.write(columns)

measurements = np.array([[0.9, 0.3, 0.2, 0.4],
                         [0.8, 0.5, 0.2, 0.3],
                         [0.6, 0.7, 0.1, 0.5]])
np.savetxt("np_csv_test.csv", measurements, delimiter=",")
I expected four columns and three rows of data under the headers ymin, ymax, xmin, and xmax, and that's almost what I got, but the 0.9 is missing. Row 2, column 1 of the .csv is empty; in Notepad the file looks like this:
ymin, ymax, xmin, xmax
,2.999999999999999889e-01,2.000000000000000111e-01,4.000000000000000222e-01
8.000000000000000444e-01,5.000000000000000000e-01,2.000000000000000111e-01,2.999999999999999889e-01
5.999999999999999778e-01,6.999999999999999556e-01,1.000000000000000056e-01,5.000000000000000000e-01
What am I doing wrong?

When you call np.savetxt with a path instead of an open file object, it opens that path itself in write mode and truncates whatever is already there. Meanwhile your header is still sitting unflushed in csvfile's buffer; when that buffer is finally flushed (on close or at interpreter exit), it is written at the start of the file and overwrites the beginning of the data savetxt just wrote, which is why exactly the first value goes missing. The cleanest fix is to let np.savetxt write the header itself:
import numpy as np

# preparing csv file
columns = "ymin, ymax, xmin, xmax"
measurements = np.array([[0.9, 0.3, 0.2, 0.4],
                         [0.8, 0.5, 0.2, 0.3],
                         [0.6, 0.7, 0.1, 0.5]])
# comments="" stops savetxt from prefixing the header with "# "
np.savetxt("np_csv_test.csv", measurements, delimiter=",", header=columns, comments="")
As pointed out by Andy in the comments, you can get np.savetxt to append to an existing file by passing in a file handle instead of a file name. So another valid way to get the file you want would be:
import numpy as np

# preparing csv file
csvfile = open("np_csv_test.csv", "w")
columns = "ymin, ymax, xmin, xmax\n"
csvfile.write(columns)

measurements = np.array([[0.9, 0.3, 0.2, 0.4],
                         [0.8, 0.5, 0.2, 0.3],
                         [0.6, 0.7, 0.1, 0.5]])
np.savetxt(csvfile, measurements, delimiter=",")
# have to close the file yourself in this case
csvfile.close()
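To sanity-check either version, you can read the file back; pandas here is just one convenient choice, any CSV reader works:

import pandas as pd

print(pd.read_csv("np_csv_test.csv"))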

Related

Is there a python package that allows you to model a time series of compositional data?

I have 2 time series that look like this:
import pandas as pd
series_1 = pd.DataFrame({'time': [0,1,2,3,4], 'value_1': [0.3, 0.5, 0.4, 0.8, 0.7]})
series_2 = pd.DataFrame({'time': [0,1,2,3,4], 'value_2': [0.7, 0.5, 0.6, 0.2, 0.3]})
As you can see, at each point in time the two value_ columns sum to 1.
From what I have read, this type of time series is called "compositional".
My question is: is there a python package that can help me model this type of time series?
I have tried using prophet to model each series_ separately and then scaling the forecast values so that they sum to 1, but I am not sure whether this approach is appropriate for this kind of data. Any thoughts on that?
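For reference, a minimal sketch of the "forecast separately, then rescale" idea described above might look like this (the forecast_share helper and the unit='D' time conversion are my own illustration; the package imports as prophet in recent versions, fbprophet in older ones):

import pandas as pd
from prophet import Prophet

def forecast_share(series, value_col, periods=3):
    # prophet requires columns named ds (datetime) and y; converting the
    # integer time index with unit='D' is purely illustrative
    df = pd.DataFrame({'ds': pd.to_datetime(series['time'], unit='D'),
                       'y': series[value_col]})
    m = Prophet()
    m.fit(df)
    future = m.make_future_dataframe(periods=periods)
    return m.predict(future)['yhat']

f1 = forecast_share(series_1, 'value_1')
f2 = forecast_share(series_2, 'value_2')
total = f1 + f2
f1, f2 = f1 / total, f2 / total  # rescale so the shares sum to 1 everywhere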

Python: how to compare data from 2 lists in a loop to pick the correct range to plot

I am trying to write code that gets rid of speed data recorded above the water level. So far I have 9 bins (each 25 cm tall) with a speed measured in each of them, but I need to compare the measured water level with the bin heights to make sure I am not using data from above the water level.
So far I have made a list of the bins:
import numpy as np
import pandas as pd

# The sample dataframe looks like this:
df = pd.DataFrame([[1.5, 0.2, 0.3, 0.33],
                   [1.3, 0.25, 0.31, 0.35],
                   [1.4, 0.21, 0.32, 0.36]],
                  columns=['pressure', 'bin1', 'bin2', 'bin3'])
df2 = pd.DataFrame([1.25, 1.35, 1.55], columns=['bin heights'])

# to make things easier I defined separate series
y1 = df['pressure']  # water level
s1 = df['bin1']      # speed for bin 1
s2 = df['bin2']      # speed for bin 2
s3 = df['bin3']      # speed for bin 3

# cleaning up data above water level; gives me the right index
diff1 = np.subtract(y1, df2['bin heights'][0])
p1 = diff1[(diff1 <= 0.05) & (0 < diff1)].index
diff2 = np.subtract(y1, df2['bin heights'][1])
p2 = diff2[(diff2 <= 0.05) & (0 < diff2)].index
diff3 = np.subtract(y1, df2['bin heights'][2])
p3 = diff3[(diff3 <= 0.05) & (0 < diff3)].index
I created the data frame below and it seems to work:
index = p1.append([p2, p3])
values = [df['bin1'][p1], df['bin2'][p2], df['bin3'][p3]]
df0 = pd.DataFrame(values)
df0 = df0.sort_index()
df02 = df0.T
Now there is only one value in each row and the rest are NaN.
How do I plot row by row and get that value without having to specify the column?
Found it (I defined a new column with all the non-NaN values):
cols = [df02.columns.str.startswith('speed')]  # leftover; not used below
df02['speed'] = df02.filter(like='speed').max(1)  # assumes the real columns start with 'speed'
print(df02)
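Since each row of the transposed frame ends up with exactly one non-NaN entry, a column-name-agnostic alternative (my sketch, not from the original post) is a plain row-wise max, which skips NaN by default:

import matplotlib.pyplot as plt

speed = df0.T.max(axis=1)  # exactly one non-NaN per row, so max just picks it
plt.plot(speed.index, speed.values, 'o')
plt.show()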

How to distribute values in Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 2 years ago.
So I am trying to calculate the likelihood of float values on CSGO skins.
A float is a value between 0 and 1, and the range is divided into five sections:
Factory New (0 to 0.07) 3%, Minimal Wear (0.07 to 0.14) 24%, Field-Tested (0.14 to 0.38) 33%, Well-Worn (0.38 to 0.45) 24%, and Battle-Scarred (0.45 to 1.0) 16%.
As you can see, the distribution among the float values is not even but weighted. Within each section, however, the values are spread evenly, for example:
https://blog.csgofloat.com/content/images/2020/07/image-6.png
It gets tricky when you introduce float caps, meaning the float is no longer between 0 and 1 but, for example, between 0.14 and 0.65.
The value is calculated as follows:
A section is selected according to the weights.
A float in the range of that section is randomly generated.
The final float is calculated according to this formula:
final_float = float * (max_float - min_float) + min_float
where float is the randomly generated value, and min_float and max_float are the lower and upper caps (in this case 0.14 and 0.65).
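For example, with the 0.14 to 0.65 cap, a raw float of 0.5 maps to final_float = 0.5 * (0.65 - 0.14) + 0.14 = 0.395, which lands in the Well-Worn section.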
I now want to calculate the distribution of skins with a cap among the five sections.
How would I do this?
Thank you in advance.
It's simple using the numpy library:
import numpy as np

# input data
n_types = 5
types_weights = np.array([0.03, 0.24, 0.33, 0.24, 0.16])
types_intervals = np.array([0.0, 0.07, 0.14, 0.38, 0.45, 1.0])

# simulate the distribution by generating `n_samples` random floats
n_samples = 1000000
type_samples = np.random.choice(range(n_types), p=types_weights, size=n_samples, replace=True)
float_ranges_begin = types_intervals[type_samples]
float_ranges_end = types_intervals[type_samples + 1]
float_samples = float_ranges_begin + np.random.rand(n_samples) * (float_ranges_end - float_ranges_begin)

# plot results
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 8))
plt.hist(float_samples, bins=100, density=True, rwidth=0.8)
# to see type names instead:
# plt.xticks(types_intervals, types + ['1.0'], rotation='vertical', fontsize=16)
plt.xlabel('Float', fontsize=16)
plt.ylabel('Probability density', fontsize=16)
EDIT
If you want the exact distribution, that is easy as well, though your "scalable" requirement is not fully clear to me:
n_types = 5
types = ['Factory New', 'Minimal Wear', 'Field-Tested', 'Well-Worn', 'Battle-Scarred']
types_weights = np.array([0.03, 0.24, 0.33, 0.24, 0.16])
# the tiny negative lower bound keeps searchsorted from mapping x = 0.0 to index -1
types_intervals = np.array([-0.0001, 0.07, 0.14, 0.38, 0.45, 1.0])

# these correspond to the top values on my plot, approximately [0.4, 3.4, 1.37, 3.4, 0.3]
types_probability_density = types_weights / (types_intervals[1:] - types_intervals[:-1])

def float_probability_density(x):
    types = np.searchsorted(types_intervals, x) - 1
    return types_probability_density[types]

sample_floats = np.linspace(0.0, 1.0, 100)
plt.figure(figsize=(16, 8))
plt.bar(sample_floats, float_probability_density(sample_floats), width=0.005)
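To get the exact distribution among the five sections under a cap, you can map each raw section through the cap formula and measure how much of its image lands in each final section. A minimal sketch under the question's assumptions (0.14 to 0.65 cap; the variable names are mine):

import numpy as np

types = ['Factory New', 'Minimal Wear', 'Field-Tested', 'Well-Worn', 'Battle-Scarred']
weights = np.array([0.03, 0.24, 0.33, 0.24, 0.16])
bounds = np.array([0.0, 0.07, 0.14, 0.38, 0.45, 1.0])

def capped_section_probs(min_float, max_float):
    # the raw float is uniform on [a, b) within raw section i; the final float
    # is raw * (max_float - min_float) + min_float, so invert that mapping and
    # measure the overlap of each final section's preimage with [a, b)
    probs = np.zeros(len(weights))
    span = max_float - min_float
    for i, w in enumerate(weights):
        a, b = bounds[i], bounds[i + 1]
        for j in range(len(weights)):
            lo = (bounds[j] - min_float) / span
            hi = (bounds[j + 1] - min_float) / span
            overlap = max(0.0, min(b, hi) - max(a, lo))
            probs[j] += w * overlap / (b - a)
    return probs

for name, p in zip(types, capped_section_probs(0.14, 0.65)):
    print(f'{name}: {p:.4f}')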

plotting data from a list in python

I need to plot the velocities of some objects (cars).
Each velocity is calculated in a routine and written to a file, roughly like this (I have deleted some lines to simplify):
thefile_v = open('vels.txt', 'w')
for car in cars:
    car.velocities.append(new_velocity)
    if len(car.velocities) > 4:
        try:
            thefile_v.write("%s\n" % car.velocities)  # write vels once we get 5 values
        except:
            print("Unexpected error:", sys.exc_info()[0])
            raise
thefile_v.close()  # note: close() needs parentheses, and only after the loop
The result of this is a text file with a list of velocities for each car, something like this:
[0.0, 3.8, 4.5, 4.3, 2.1, 2.2, 0.0]
[0.0, 2.8, 4.0, 4.2, 2.2, 2.1, 0.0]
[0.0, 1.8, 4.2, 4.1, 2.3, 2.2, 0.0]
[0.0, 3.8, 4.4, 4.2, 2.4, 2.4, 0.0]
Then I wanted to plot each velocity:
import matplotlib.pyplot as plt

with open('vels.txt') as f:
    lst = [line.rstrip() for line in f]
plt.plot(lst[1])  # let's plot the second line
plt.show()
That didn't work: each line was read as one long string, and the values ended up being used as y-axis labels instead of data.
I got it working through this:
import numpy as np

y = np.fromstring(str(lst[1])[1:-1], dtype=float, sep=',')
plt.plot(y)
plt.show()
What I learned is that the velocity lists I wrote earlier were read back as lines of text. I had to convert them to arrays to be able to plot them, but the brackets [] were getting in the way, so I stripped them by slicing the string (i.e. [1:-1]).
It works now, but I'm sure there is a better way of doing this.
Any comments?
Say you had the list [0.0, 3.8, 4.5, 4.3, 2.1, 2.2, 0.0]; to graph it, the code would look something like:
import matplotlib.pyplot as plt

ys = [0.0, 3.8, 4.5, 4.3, 2.1, 2.2, 0.0]
xs = list(range(len(ys)))
plt.plot(xs, ys)
plt.show()
# close the figure once you are done with it
plt.close()
If you wanted different intervals for the x axis:
interval_size = 2.4  # example interval size
xs = [x * interval_size for x in range(len(ys))]
Also, when reading your values from the text file, make sure you convert them from strings back to floats. That is probably why your code was treating the input as y labels.
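As a side note, since each line in vels.txt is already valid Python list syntax, ast.literal_eval can parse it directly, with no manual bracket stripping (a sketch, not from the original answers):

import ast
import matplotlib.pyplot as plt

with open('vels.txt') as f:
    rows = [ast.literal_eval(line) for line in f if line.strip()]

plt.plot(rows[1])  # the second list of velocities
plt.show()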
The example is not complete, so some assumptions must be made here. In general, use numpy or pandas to store your data.
Supposing car is an object with a velocity attribute, you can collect all velocities in a list, save that list as a text file with numpy, read it back with numpy, and plot it.
import numpy as np
import matplotlib.pyplot as plt

class Car():
    def __init__(self):
        self.velocity = np.random.rand(5)

cars = [Car() for _ in range(5)]
velocities = [car.velocity for car in cars]
np.savetxt("vels.txt", np.array(velocities))

####

vels = np.loadtxt("vels.txt")
plt.plot(vels.T)
## or plot only the first velocity
# plt.plot(vels[0])
plt.show()
One more easy option is the map function. Say the data in your file is stored like this, without the non-convertible [ and ] characters:
#file_name: test_example.txt
0.0, 3.8, 4.5, 4.3, 2.1, 2.2, 0.0
0.0, 2.8, 4.0, 4.2, 2.2, 2.1, 0.0
0.0, 1.8, 4.2, 4.1, 2.3, 2.2, 0.0
0.0, 3.8, 4.4, 4.2, 2.4, 2.4, 0.0
Then the next step is:
import matplotlib.pyplot as plt

path = r'VAR_DIRECTORY/test_example.txt'  # the full path of the file
with open(path, 'rt') as f:
    ltmp = [list(map(float, line.split(','))) for line in f]
plt.plot(ltmp[1], 'r-')
plt.show()
Above, I assumed you wanted to plot the second line, 0.0, 2.8, 4.0, 4.2, 2.2, 2.1, 0.0.

Aligning two data sets in Python

I want to develop some python code to align datasets obtained by different instruments recording the same event.
As an example, say I have two sets of measurements:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Define some data
data1 = pd.DataFrame({'TIME': [1.1, 2.4, 3.2, 4.1, 5.3],
                      'VALUE': [10.3, 10.5, 11.0, 10.9, 10.7],
                      'ERROR': [0.2, 0.1, 0.4, 0.3, 0.2]})
data2 = pd.DataFrame({'TIME': [0.9, 2.1, 2.9, 4.2],
                      'VALUE': [18.4, 18.7, 18.9, 18.8],
                      'ERROR': [0.3, 0.2, 0.5, 0.4]})

# Plot the data
plt.errorbar(data1.TIME, data1.VALUE, yerr=data1.ERROR, fmt='ro')
plt.errorbar(data2.TIME, data2.VALUE, yerr=data2.ERROR, fmt='bo')
plt.show()
The result (plot omitted here) shows the two series at clearly different levels. What I would like to do now is align the second dataset (data2) with the first one (data1).
The second dataset must be shifted to match the first one by subtracting a constant (to be determined) from all its values. All I know is that the datasets are correlated since the two instruments are measuring the same event but with different sampling rates.
At this stage I do not want to make any assumptions about what function best describes the data (fitting will be done after alignment).
I am cautious about using means to perform the shift, since that may produce bad results depending on how the data is sampled. I was considering taking each data2 point, working out the shortest distance from its value to the data1 values at the other times, and then minimizing the sum of those distances, but I am not sure that would work well either.
Does anyone have suggestions for a good method to use? I looked at mlpy, but it seems to only work on 1D arrays.
Thanks.
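For what it's worth, the nearest-distance idea sketched in the question could be prototyped like this (my illustration, not one of the answers below; scipy assumed):

import numpy as np
from scipy.optimize import minimize_scalar

def cost(c):
    # distance from each shifted data2 value to the nearest data1 value
    shifted = data2.VALUE.values - c
    d = np.abs(shifted[:, None] - data1.VALUE.values[None, :]).min(axis=1)
    return (d ** 2).sum()

res = minimize_scalar(cost, bounds=(0, 20), method='bounded')
print(res.x)  # estimated constant offset between the two instruments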
You can subtract the mean of the difference: data2.VALUE - (data2.VALUE - data1.VALUE).mean()
import pandas as pd
import matplotlib.pyplot as plt

# Define some data
data1 = pd.DataFrame({
    'TIME': [1.1, 2.4, 3.2, 4.1, 5.3],
    'VALUE': [10.3, 10.5, 11.0, 10.9, 10.7],
    'ERROR': [0.2, 0.1, 0.4, 0.3, 0.2],
})
data2 = pd.DataFrame({
    'TIME': [0.9, 2.1, 2.9, 4.2],
    'VALUE': [18.4, 18.7, 18.9, 18.8],
    'ERROR': [0.3, 0.2, 0.5, 0.4],
})

# Plot the data
plt.errorbar(data1.TIME, data1.VALUE, yerr=data1.ERROR, fmt='ro')
plt.errorbar(data2.TIME, data2.VALUE - (data2.VALUE - data1.VALUE).mean(),
             yerr=data2.ERROR, fmt='bo')
plt.show()
Another possibility is to subtract the mean of each series: compute each dataset's own average and subtract it from every value, which centers both around zero. If you do this for every value they should align relatively well. This assumes both datasets look relatively similar overall, so it might not always work best.
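A minimal sketch of that per-series variant, reusing data1 and data2 from above:

# center each series on its own mean
plt.errorbar(data1.TIME, data1.VALUE - data1.VALUE.mean(), yerr=data1.ERROR, fmt='ro')
plt.errorbar(data2.TIME, data2.VALUE - data2.VALUE.mean(), yerr=data2.ERROR, fmt='bo')
plt.show()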
Although this question is not Matlab related, you might still be interested in this:
Remove unknown DC Offset from a non-periodic discrete time signal
