I have pred_data.txt as
19.08541,17.41787,16.59118,16.03507,15.68560
20.01880,18.21,19.48975,19.32,19.29945
17.32453,17.434,15.4253,12.422,11.4311
f=open('pred_data.txt','r')
for value in f:
    exam=np.array(value)
    pred=clf.predict(exam)
    print(pred)
When I run this, I get
ValueError: could not convert string to float:'19.08541,17.41787,16.59118,16.03507,15.68560\n'
But when I try it like this:
example=np.array([19.08541,17.41787,16.59118,16.03507,15.68560])
pred=clf.predict(example)
I get the predicted output. How can I read the data from the file to get the same output?
You should use the function "fromstring" from numpy.
I think in your case it should be something like:
f = open("pred_data.txt", 'r').read()
preds = np.fromstring(f, sep=",")
print(preds)
It might not be the best way, but it works.
See:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.fromstring.html
I did not test this, but wouldn't it help to split the line/value into an array beforehand?
I mean
for value in f:
    exam=np.array(value.split(','))
    ...
This way it would be easier to convert a list of strings into a list of floats instead of converting a full line of floats held in a single string.
When you read a line from the file, it comes out as a str. So in your example this:
for value in f:
    np.array(value)
Is the same as this:
np.array('19.08541,17.41787,16.59118,16.03507,15.68560\n')
You need to get rid of the \n with strip and break this into actual units using split:
values_strs = value.strip().split(',')
But that will leave you with a list of strs. It's better to cast those as well using float:
# This is a comprehension. It's a bit clearer and more obvious than
# calling `map(float, value.strip().split(','))`, but they boil down
# to a similar idea.
values_flt = [float(v) for v in value.strip().split(',')]
Altogether, you could simplify this to (note the square brackets: np.array needs a list, not a bare generator expression):
exam = np.array([float(v) for v in value.strip().split(',')])
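As a minimal sketch of the two spellings mentioned in the comment above, the following parses one line from the question's file with a list comprehension and with map. The function names are hypothetical and only there for illustration; either result can then be handed to np.array.

```python
def parse_with_comprehension(line):
    """Strip the trailing newline, split on commas, convert each piece to float."""
    return [float(v) for v in line.strip().split(',')]


def parse_with_map(line):
    # map() is lazy in Python 3, so wrap it in list() to materialize the floats.
    return list(map(float, line.strip().split(',')))


# One line exactly as it would come out of iterating over the file.
line = '19.08541,17.41787,16.59118,16.03507,15.68560\n'

values = parse_with_comprehension(line)
print(values)
print(values == parse_with_map(line))
```

Both functions return a plain list of floats, so the choice between them is mostly a matter of readability.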
Use Numpy's loadtxt function.
import numpy as np
np_array = np.loadtxt('pred_data.txt', delimiter=',')
I have some parameters and functions that I need to write to a file, but the functions and x-values have different lengths, i.e. domains and codomains, when compared to each other.
My current formatting, assuming e.g. two parameters A and B and two functions f1 and f2 is [A,B,x,f1,f2] where then x, f1, f2 are itself again lists or numpy arrays.
So my imagined data.txt could look like this:
[0, 0, [0,1,2,3], [1,2,3,4], [4,5,6,7]]
[0, 1, [0,1,2,3,4,5,6], [1,2,3,4,5,6,7], [4,5,6,7,8,9,10]]
[1, 10, [2,3,4,5,6], [1,2,3,4,5], [4,5,6,7,8]]
Then I could read in line by line, getting the parameters A and B and plot the functions f1 and f2, given the right x-values.
To write to a file I use the following code, which almost gives me what I described above.
OutSummary=[A,B,x,f1,f2]
Out=str(OutSummary).replace('\n', '')
f=open(filename,'a')
np.savetxt(f, [Out],fmt='%s')
f.close()
Currently, this produces entries like [0, 1, array([ 8. , 8.29229229, 8.58458458, ....
The issue is now that reading in does not work, due to the different lengths of the x-values and function arrays, i.e.
PrevEntris = np.genfromtxt(filename,dtype=str)
(with dtype=str, or dtype=None, or no dtype given) gives me e.g.
ValueError: Some errors were detected ! Line #7 (got 555 columns instead of 1105),
i.e. the x-values contained in the 7th line only had 555 entries, while the previous 6 had 1105.
I see that this is not good code, and that I am saving arrays as strings, but I did not manage to find a better way. I'd be really interested to know if there is some advanced numpy way of handling this, or whether something like a SQL database would be better than one .txt file. I spent the last few hours trying to make it work using json but have not been successful yet (first-time user).
Thanks
You can use the built-in json module, since Python lists and JSON arrays share the same syntax. Here is some example code that reads each line of the file and stores the resulting lists in one bigger list:
import json
f=open("data.txt")
lines=f.read().splitlines()
data=[json.loads(line) for line in lines]
f.close() #remember to close your files
Edit: I realized I should have used list comprehensions instead so I changed my response. Still works the same way, but is neater
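The snippet above only covers reading. A minimal sketch of the matching write side, assuming one JSON array per line in the [A,B,x,f1,f2] layout from the question, could look like this. The helper names append_entry and read_entries are hypothetical, and the temporary file only stands in for the real data.txt.

```python
import json
import os
import tempfile


def append_entry(path, a, b, x, f1, f2):
    """Append one record as a single JSON line."""
    # Assumption: x, f1 and f2 are plain lists. NumPy arrays are not
    # JSON-serializable, so convert them with .tolist() before calling this.
    with open(path, "a") as out:
        out.write(json.dumps([a, b, x, f1, f2]) + "\n")


def read_entries(path):
    """Read every non-empty line back into a Python list."""
    with open(path) as src:
        return [json.loads(line) for line in src if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "data.txt")
append_entry(path, 0, 0, [0, 1, 2, 3], [1, 2, 3, 4], [4, 5, 6, 7])
append_entry(path, 1, 10, [2, 3, 4, 5, 6], [1, 2, 3, 4, 5], [4, 5, 6, 7, 8])

entries = read_entries(path)
for a, b, x, f1, f2 in entries:
    print(a, b, len(x))
```

Because each line is parsed on its own, rows with different numbers of x-values do not conflict with each other.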
The answer from Icestrike411 works very well for me, especially for my requested formatting style. In the meantime, I also solved it another way, again using json, after slightly altering the format.
One data block could look like
new_data = {
    "A": 1,
    "B": 0.05,
    "X": [0,1,2,3,4]
}
and then I append it to the outfile with the following function, running append('some.txt', new_data):
def append(filename, new_entry):
    try:
        with open(filename, "r") as filea:
            content = json.load(filea)
    except (FileNotFoundError, json.JSONDecodeError):
        # Likely a missing or empty file: start with an empty list.
        content = []
    content.append(new_entry)
    out = json.dumps(content).replace("},", "},\n")  # new line for each entry in file
    with open(filename, "w") as fileb:
        fileb.write(out)
and then reading it with
with open(filename, "r") as file:
    contentr = json.load(file)
I'm new to Python and am trying to convert a JSON file to CSV. I wrote the code below but keep getting a "TypeError: string indices must be integers" error. Please suggest a fix.
import json
import csv
#x= '''open("Test_JIRA.json","r")'''
#x = json.load(x)
with open('Test_JIRA.json') as jsonfile:
    x = json.load(jsonfile)
f = csv.writer(open("test.csv", "w"))
# Write CSV Header, If you dont need that, remove this line
f.writerow(["id", "self", "key", "customfield_12608", "customfield_12607"])
for x in x:
    f.writerow([x["id"],
                x["self"],
                x["key"],
                x["fields"]["customfield_12608"],
                x["fields"]["customfield_12607"]
                ])
Here is one sample row of the input JSON file:
{"expand":"schema,names","startAt":0,"maxResults":50,"total":100,"issues":[{"expand":"operations,versionedRepresentations,editmeta,changelog,renderedFields","id":"883568","self":"https://jira.xyz.com/rest/api/2/issue/223568","key":"AI-243","fields":{"customfield_22608":null,"customfield_12637":"2017-10-12T21:46:00.000-0700"}}]}
As far as I can see, the problem is here:
for x in x:
Note that x in your code is a dict, not a list. Iterating over a dict yields its keys, which are strings, so x["id"] ends up indexing a string and raises the TypeError. Based on the provided JSON example, I think you need something like
for x in x['issues']:
Also, @Reti43 notes in a comment that the keys of the dicts in x['issues'] vary between elements. To make your code safer, you could use get:
for x in x['issues']:
    f.writerow([x.get("id"),
                x.get("self"),
                x.get("key"),
                x.get("fields", {}).get("customfield_12608"),
                x.get("fields", {}).get("customfield_12607")
                ])
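To see the get-based approach run end to end, here is a minimal sketch using a trimmed copy of the sample row from the question. The in-memory StringIO buffer is an assumption standing in for the real test.csv, and only three columns are written to keep it short.

```python
import csv
import io
import json

# Trimmed version of the sample input: a dict whose "issues" key holds the rows.
raw = ('{"startAt": 0, "issues": [{"id": "883568", "key": "AI-243", '
       '"fields": {"customfield_12637": "2017-10-12T21:46:00.000-0700"}}]}')
data = json.loads(raw)

out = io.StringIO()  # stand-in for open("test.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(["id", "key", "customfield_12608"])

for issue in data["issues"]:
    fields = issue.get("fields") or {}
    # .get() returns None for a missing key; csv writes None as an empty cell.
    writer.writerow([issue.get("id"), issue.get("key"), fields.get("customfield_12608")])

csv_text = out.getvalue()
print(csv_text)
```

The sample issue has no customfield_12608 key, so that column comes out empty instead of raising a KeyError.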
I have a .txt file like this:
8.3713312149,0.806817531586,0.979428482338,0.20179159543
5.00263547897,2.33208847046,0.55745770379,0.830205341157
0.0087910592556,4.98708152771,0.56425779093,0.825598658777
and I want data to be saved in a 2d array such as
array = [[8.3713312149,0.806817531586,0.979428482338,0.20179159543],[5.00263547897,2.33208847046,0.55745770379,0.830205341157],[0.0087910592556,4.98708152771,0.56425779093,0.825598658777]]
I tried with this code
#!/usr/bin/env python
checkpoints_from_file[][]
def read_checkpoints():
    global checkpoints_from_file
    with open("checkpoints.txt", "r") as f:
        lines = f.readlines()
        for line in lines:
            checkpoints_from_file.append(line.split(","))
    print checkpoints_from_file
if __name__ == '__main__':
    read_checkpoints()
but it does not work.
Can you tell me how to fix this? Thank you.
You have two errors in your code. The first is that checkpoints_from_file[][] is not a valid way to initialize a multidimensional array in Python. Instead, you should write
checkpoints_from_file = []
This initializes an empty list. You then append one list to it on each pass through your loop, which builds the 2D structure holding your data.
You are also storing the entries in your array as strings, but you likely want them as floats. You can use the float function as well as a list comprehension to accomplish this.
checkpoints_from_file.append([float(x) for x in line.split(",")])
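Putting both fixes together, a minimal sketch might look like the following. It returns the list instead of using a global, which is a design choice of this sketch rather than something the question requires, and the StringIO object is an assumption standing in for open("checkpoints.txt").

```python
import io


def read_checkpoints(source):
    """Build a 2D list of floats from an iterable of comma-separated lines."""
    checkpoints = []
    for line in source:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        checkpoints.append([float(x) for x in line.split(",")])
    return checkpoints


# Same three rows as in the question.
sample = io.StringIO(
    "8.3713312149,0.806817531586,0.979428482338,0.20179159543\n"
    "5.00263547897,2.33208847046,0.55745770379,0.830205341157\n"
    "0.0087910592556,4.98708152771,0.56425779093,0.825598658777\n"
)

checkpoints = read_checkpoints(sample)
print(checkpoints)
```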
Reading from your file,
def read_checkpoints():
    checkpoints_from_file = []
    with open("checkpoints.txt", "r") as f:
        lines = f.readlines()
        for line in lines:
            checkpoints_from_file.append(line.split(","))
    print(checkpoints_from_file)
if __name__ == '__main__':
    read_checkpoints()
Or assuming you can read this data successfully, using a string literal,
lines = """8.3713312149,0.806817531586,0.979428482338,0.20179159543
5.00263547897,2.33208847046,0.55745770379,0.830205341157
0.0087910592556,4.98708152771,0.56425779093,0.825598658777"""
and a list comprehension,
list_ = [[decimal for decimal in line.split(",")] for line in lines.split("\n")]
Expanded,
checkpoints_from_file = []
for line in lines.split("\n"):
    list_of_decimals = []
    for decimal in line.split(","):
        list_of_decimals.append(decimal)
    checkpoints_from_file.append(list_of_decimals)
print(checkpoints_from_file)
Your errors:
Unlike in some languages, in Python you don't initialize a list with checkpoints_from_file[][]. Instead, initialize a one-dimensional list with checkpoints_from_file = [] and then add more lists inside it with Python's list.append().
I'm new to Python, but I'm trying to learn. I'm trying to recreate a Matlab for loop in Python. The Matlab for loop looks like this:
for i = 2:(L-1)
    Acceleration_RMT5_x(i-1) = (RMT5(i+1,1)-2*RMT5(i,1)+RMT5(i-1,1))/(1/FrameRate)^2;
end
The datatype is float64, and is a 288x1 vector. My Python so far is:
for i in RMT5x:
    Acceleration_RMT5x = RMT5x[i+1] -2*RMT5x[i] +RMT5x[i-1]/(1/250)^2)
This gives me "invalid syntax".
What do I need to fix to resolve this error?
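To raise something to a power in Python you need **, not ^ (in Python, ^ is bitwise XOR). The "invalid syntax" itself comes from the unmatched closing parenthesis at the end of the line; the opening parenthesis before the numerator is missing.
Also, you are looping through the values of RMT5x but then using each value (i) as an index. Instead, you want to loop over the indices: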
Acceleration_RMT5x = list()
for i in range(1, len(RMT5x)-1):
    Acceleration_RMT5x.append((RMT5x[i+1] - 2*RMT5x[i] + RMT5x[i-1]) / (1./250)**2)
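As a quick sanity check of the central-difference formula, here is a minimal sketch run on made-up data. The signal x(t) = t**2 is a hypothetical test input, chosen because its second derivative is exactly 2; the 250 Hz frame rate is the one from the question.

```python
frame_rate = 250.0       # frames per second, as in the question
dt = 1.0 / frame_rate    # time between two samples

# Hypothetical test signal: x(t) = t**2 sampled at the frame rate.
positions = [(i * dt) ** 2 for i in range(10)]

acceleration = []
for i in range(1, len(positions) - 1):
    # The whole numerator is parenthesized before dividing by dt squared.
    second_diff = positions[i + 1] - 2 * positions[i] + positions[i - 1]
    acceleration.append(second_diff / dt ** 2)

print(acceleration)
```

The result has two fewer entries than the input, because the first and last samples have no neighbour on one side.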
I would use a list comprehension:
import numpy as np
Acceleration_RMT5x = [(RMT5x[i+1] - 2*RMT5x[i] + RMT5x[i-1]) / np.power(1.0/FrameRate, 2) for i in range(1, len(RMT5x)-1)]
I have a list of negative floats. I want to make a histogram with them. As far as I know, Python can't do operations with negative numbers. Is this correct? The list is like [-0.2923998, -1.2394875, -0.23086493, etc.]. I'm trying to find the maximum and minimum number so I can find out what the range is. My code is giving an error:
setrange = float(maxv) - float(minv)
TypeError: float() argument must be a string or a number
And this is the code:
f = open('clusters_scores.out','r')
#first, extract all of the sim values
val = []
for line in f:
    lineval = line.split()
    print lineval
    val.append(lineval)
print val
#val = map(float,val)
maxv = max(val)
minv = min(val)
setrange = float(maxv) - float(minv)
All the values that are being put into the 'val' list are negative decimals. What is the error referring to, and how do I fix it?
The input file looks like:
-0.0783532095182 -0.99415440702 -0.692972552716 -0.639273674023 -0.733029194040.765257900121 -0.755438339963
-0.144140594077 -1.06533353638 -0.366278118372 -0.746931508538 -1.02549039392 -0.296715961215
-0.0915937502791 -1.68680560936 -0.955147543358
-0.0488457137771 -0.0943080192383 -0.747534412969 -1.00491121699
-1.43973471463
-0.0642611118901 -0.0910684525497
-1.19327387414 -0.0794696449245
-1.00791366035 -0.0509749096549
-1.08046507281 -0.957339914505 -0.861495748259
The results of split() are a list of split values, which is probably why you are getting that error.
For example, if you do '-0.2'.split(), you get back a list with a single value ['-0.2'].
EDIT: Aha! With your input file provided, it looks like this is the problem: -0.733029194040.765257900121. I think you mean to make that two separate floats?
Assuming a corrected file like this:
-0.0783532095182 -0.99415440702 -0.692972552716 -0.639273674023 -0.733029194040 -0.765257900121 -0.755438339963
-0.144140594077 -1.06533353638 -0.366278118372 -0.746931508538 -1.02549039392 -0.296715961215
-0.0915937502791 -1.68680560936 -0.955147543358
-0.0488457137771 -0.0943080192383 -0.747534412969 -1.00491121699
-1.43973471463
-0.0642611118901 -0.0910684525497
-1.19327387414 -0.0794696449245
-1.00791366035 -0.0509749096549
-1.08046507281 -0.957339914505 -0.861495748259
The following code will no longer throw that exception:
f = open('clusters_scores.out','r')
#first, extract all of the sim values
val = []
for line in f:
    linevals = line.split()
    print(linevals)
    val += linevals
print(val)
# list() keeps this working on Python 3, where map returns a one-shot iterator
val = list(map(float, val))
maxv = max(val)
minv = min(val)
setrange = float(maxv) - float(minv)
I have changed it to take the list result from split() and concatenate it to the list, rather than append it, which will work provided there are valid inputs in your file.
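The difference between appending and concatenating can be seen in a minimal sketch on two made-up lines (the sample strings are hypothetical, not taken from the file):

```python
lines = ["-0.1 -0.2", "-0.3"]

appended = []
extended = []
for line in lines:
    appended.append(line.split())  # adds each list of strings as ONE nested element
    extended += line.split()       # adds the individual strings themselves

print(appended)
print(extended)

# Only the flat list can be converted element by element with float().
values = [float(v) for v in extended]
print(max(values) - min(values))
```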
All the values that are being put into the 'val' list are negative decimals.
No, they aren't; they're lists of strings that represent negative decimals, since the .split() call produces a list. maxv and minv are lists of strings, which can't be fed to float().
What is the error referring to, and how do I fix it?
It's referring to the fact that the contents of val aren't what you think they are. The first step in debugging is to verify your assumptions. If you try this code out at the REPL, then you could inspect the contents of maxv and minv and notice that you have lists of strings rather than the expected strings.
I assume you want to put all the lists of strings (from each line of the file) together into a single list of strings. Use val.extend(lineval) rather than val.append(lineval).
That said, you'll still want to map the strings into floats before calling max or min, because otherwise you will be comparing the strings as strings rather than as floats. Strings are compared character by character, which can give the wrong answer for numbers: '-1.5' > '-0.5' is True for strings. Explicit is better than implicit.
Simpler yet, just read the entire file at once and split it; .split() without arguments splits on whitespace, and a newline is whitespace. You can also do the mapping at the same point as the reading, with careful application of a list comprehension. I would write:
with open('clusters_scores.out') as f:
    val = [float(x) for x in f.read().split()]
result = max(val) - min(val)
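To illustrate why the conversion to float matters, here is a minimal sketch on a few values copied from the input file. The text variable is an assumption standing in for f.read().

```python
# Two lines of the input file, as f.read() would return them.
text = "-0.0915937502791 -1.68680560936 -0.955147543358\n-1.43973471463\n"

tokens = text.split()  # no argument: splits on spaces and newlines alike

# Comparing the raw strings picks the lexicographically largest one.
max_as_str = max(tokens)

# Comparing floats gives the numerically largest value.
values = [float(t) for t in tokens]
value_range = max(values) - min(values)

print(max_as_str, max(values), value_range)
```

Here the string maximum is actually the numerically smallest value, so skipping the conversion would silently give a wrong range.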