I am trying to read an N-dimensional complex array from a text file into numpy. The text file is formatted as shown below (including the square brackets, all on a single line):
[[[-0.26905+0.956854i -0.96105+0.319635i -0.306649+0.310259i] [0.27701-0.943866i -0.946656-0.292134i -0.334658+0.988528i] [-0.263606-0.340042i -0.958169+0.867559i 0.349991+0.262645i] [0.32736+0.301014i 0.941918-0.953028i -0.306649+0.310259i]] [[-0.9462-0.932573i 0.968764+0.975044i 0.32826-0.925997i] [-0.306461-0.9455i -0.953932+0.892267i -0.929727-0.331934i] [-0.958728+0.31701i -0.972654+0.309404i -0.985806-0.936901i] [-0.312184-0.977438i -0.974281-0.350167i -0.305869+0.926815i]]]
I would like this to be read into a 2x4x3 complex ndarray.
The file can be quite large (say 2x4x10e6), so any efficiency in the reading would really help.
Here you go:
import numpy as np
import re

# collect raw lines
raw_data = []
with open('data.txt', 'r') as data_file:
    for item in data_file.readlines():
        raw_data.append(item.strip('\n'))

data_array = np.array([])
for item in raw_data:
    # split on closing brackets
    split_data = re.split(r'\]', item)
    for string in split_data:
        # clean data: drop opening brackets, use Python's j for the imaginary unit
        clean_data = re.sub(r'\[+', '', string)
        clean_data = re.sub('i', 'j', clean_data)
        # split into individual number strings, dropping empty pieces
        numbers = list(filter(None, re.split(' ', clean_data)))
        if numbers:
            # parse and collect the complex values
            data_array = np.hstack((data_array, np.asarray(numbers).astype(complex)))

# reshape the flat array; 12 = 4 * 3 values per outer block
final_array = np.reshape(data_array, (data_array.shape[0] // 12, 4, 3))
Output:
[[[-0.26905 +0.956854j -0.96105 +0.319635j -0.306649+0.310259j]
[ 0.27701 -0.943866j -0.946656-0.292134j -0.334658+0.988528j]
[-0.263606-0.340042j -0.958169+0.867559j 0.349991+0.262645j]
[ 0.32736 +0.301014j 0.941918-0.953028j -0.306649+0.310259j]]
[[-0.9462 -0.932573j 0.968764+0.975044j 0.32826 -0.925997j]
[-0.306461-0.9455j -0.953932+0.892267j -0.929727-0.331934j]
[-0.958728+0.31701j -0.972654+0.309404j -0.985806-0.936901j]
[-0.312184-0.977438j -0.974281-0.350167j -0.305869+0.926815j]]
[[-0.26905 +0.956854j -0.96105 +0.319635j -0.306649+0.310259j]
[ 0.27701 -0.943866j -0.946656-0.292134j -0.334658+0.988528j]
[-0.263606-0.340042j -0.958169+0.867559j 0.349991+0.262645j]
[ 0.32736 +0.301014j 0.941918-0.953028j -0.306649+0.310259j]]
[[-0.9462 -0.932573j 0.968764+0.975044j 0.32826 -0.925997j]
[-0.306461-0.9455j -0.953932+0.892267j -0.929727-0.331934j]
[-0.958728+0.31701j -0.972654+0.309404j -0.985806-0.936901j]
[-0.312184-0.977438j -0.974281-0.350167j -0.305869+0.926815j]]]
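Since the question stresses efficiency for large files, here is a more vectorized sketch: it pulls every complex token out with one regex pass and lets numpy parse the whole batch at once (assuming the file fits in memory and the trailing dimensions really are 4x3):
import re
import numpy as np

with open('data.txt', 'r') as f:
    text = f.read()

# grab every "a+bi" / "a-bi" token in one pass, then swap i -> j for Python
tokens = re.findall(r'[+-]?[\d.]+[+-][\d.]+i', text)
values = np.array([t.replace('i', 'j') for t in tokens], dtype=complex)

# leading dimension is inferred from the total count
final_array = values.reshape(-1, 4, 3)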
It seems your file is not a "pythonic" list (no commas between objects).
I assume the following:
you cannot change your input (you get it from a 3rd-party source)
the file is not a CSV (no delimiter between the rows)
As a result, you could:
convert the string to Python syntax: after each "[...]" add "," --> [[1+2j, 3+4j], [1+2j, 3+4j]]
add "," between the numbers and change "i" to "j", since Python writes complex numbers with the letter j (1+2j) --> [-0.26905+0.956854j, -0.96105+0.319635j, -0.306649+0.310259j]
then save it as a CSV
open it with pandas read_csv (see: python pandas complex number)
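A minimal sketch of that idea, assuming data.txt holds the single-line bracketed format from the question (the regexes here are illustrative, not battle-tested):
import ast
import re
import numpy as np

with open('data.txt', 'r') as f:
    text = f.read().strip()

# turn the string into a valid Python literal:
# i -> j, then commas between the numbers and between the sublists
text = text.replace('i', 'j')
text = re.sub(r'(?<=[\dj])\s+(?=[-+\d])', ', ', text)
text = re.sub(r'\]\s*\[', '], [', text)

arr = np.array(ast.literal_eval(text))
print(arr.shape)  # (2, 4, 3) for the sample line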
I have a large text file containing many thousands of lines, but a short example that covers the same idea is:
vapor dust -2C pb
x14 71 hello! 42.42
100,000 lover baby: -2
There is a mixture of integers, alphanumerics, and floats.
ATTEMPT AT SOLUTION
I've done this to create a single list of strings, but I am unable to isolate each cell based on whether it's numeric or alphanumeric:
with open('file.txt', 'r') as f:
    data = f.read().split()
    #dirty = [x for x in data if x.isnumeric()]
print(data)
The dirty line fails when uncommented.
I have had luck constructing a list of lists containing almost all the required values using the following code:
with open('benzene_SDS.txt', 'r') as f:
    for word in f:
        data = word.split()
        clean = [x for x in data if x.isnumeric()]
        res = list(set(data).difference(clean))
        print(clean)
But it doesn't return a single list; it returns a list of lists, most of which are blank [].
There was a hint that using the "try" control statement is useful in solving the problem, but I don't see how to utilize it.
Any help would be greatly appreciated! Thanks.
If you're mainly asking how one would use try to check for validity, this is what you're after:
values = []
with open('benzene_SDS.txt', 'r') as f:
    for word in f.read().split():
        try:
            values.append(float(word))
        except ValueError:
            pass
print(values)
Output:
[71.0, 42.42, -2.0]
However, note that this does not parse '100,000' as either 100 or 100000.
This code would do that:
import locale

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

values = []
with open('benzene_SDS.txt', 'r') as f:
    for word in f.read().split():
        try:
            values.append(locale.atof(word))
        except ValueError:
            pass
print(values)
Result:
[71.0, 42.42, 100000.0, -2.0]
Note that running the same code with this:
locale.setlocale(locale.LC_ALL, 'nl_NL.UTF-8')
Yields a different result:
[71.0, 4242.0, 100.0, -2.0]
since the Netherlands uses , as a decimal separator and . as a thousands separator (which basically just gets ignored in 42.42).
numbers = []
with open('file.txt', 'r') as f:
    for line in f:  # iterate over lines, not characters
        words = line.split()
        numbers.extend([word for word in words if word.isnumeric()])

# Print all numbers
print(numbers)
# Print all unique numbers
print(set(numbers))
# Print all unique numbers, converted to floats
print([float(n) for n in set(numbers)])
If you specifically need a list then you can wrap the set with list().
I have a small chunk of code that I'm using to find the confidence interval of a data set.
from scipy import stats
import numpy as np
a = np.loadtxt("test1.txt")
mean, sigma = np.mean(a), np.std(a)
conf_int = stats.norm.interval(0.95, loc=mean, scale=sigma)
print(conf_int)
However, my text file (test1.txt) is a list of numbers that a) has a square bracket at the start and the finish and b) is not in equal columns.
"[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]
np.loadtxt really doesn't seem to like this, so is there any way I can use a function to either read and use the data as is or reformat it?
Any help would be greatly appreciated!
Update: I managed to remove my brackets with the code below.
with open('test1.txt', 'r') as my_file:
    text = my_file.read()
    text = text.replace("[", "")
    text = text.replace("]", "")
with open('clean.txt', 'w') as my_file:
    my_file.write(text)

a = np.loadtxt("clean.txt")
mean, sigma = np.mean(a), np.std(a)
conf_int = stats.norm.interval(0.95, loc=mean, scale=sigma)
print(conf_int)
I just need to reformat clean.txt so it's in a single column now so numpy will recognise it.
Final update
I managed to get it working, using @David Hoffman's suggested code and my long workaround from above; see below:
from scipy import stats
import numpy as np
with open('test1.txt', 'r') as my_file:
    text = my_file.read()
    text = text.replace("[", "")
    text = text.replace("]", "")
with open('clean.txt', 'w') as my_file:
    my_file.write(text)

a = np.array(list(map(float, text.strip("[]").split())))
mean, sigma = np.mean(a), np.std(a)
conf_int = stats.norm.interval(0.95, loc=mean, scale=sigma)
print(conf_int)
Thank you to everyone for taking the time to help, it was very much appreciated, especially to a new coder like me.
This is what I would do:
import numpy as np
from scipy import stats
import requests
link = "https://pastebin.pl/view/raw/929f5228"
response = requests.get(link)
text = response.text
# with open("test1.txt", "r") as my_file:
# text = my_file.read()
a = np.array(list(map(float, text.strip("[]").split())))
mean, sigma = np.mean(a), np.std(a)
conf_int = stats.norm.interval(0.95, loc=mean, scale=sigma)
print(conf_int)
The commented lines are for when you have a local file instead.
There's a lot packed into the string handling line:
The text string is cleaned (removing brackets)
The clean text is split on whitespace (any run of consecutive whitespace characters is treated as a single delimiter)
Each split token is converted to a float (this is the map part)
The map generator is converted to a list and passed to the numpy array function
As @Dishin said, there's some weirdness with how your input file is formatted. If you have any control over how the file is written (say from a LabVIEW program or another Python script), it might be worth writing the data in a more widely accepted format like .csv so that functions like np.loadtxt (or programs like Excel) can read it more easily.
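For instance, if the producing script is also Python, a sketch of writing the numbers in a loadtxt-friendly way (the file names here are just examples):
import numpy as np

a = np.random.rand(10)       # stand-in for the real data
np.savetxt("clean.txt", a)   # one number per line, no brackets
b = np.loadtxt("clean.txt")  # round-trips cleanly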
If you're stuck with the files as is you can just make a little utility function like:
def loader(filename):
    with open(filename, "r") as my_file:
        text = my_file.read()
    return np.array(list(map(float, text.strip("[]").split())))
to reuse in your scripts.
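Usage would then just be:
a = loader("test1.txt")
mean, sigma = np.mean(a), np.std(a)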
You can read it as a string, then replace the spaces with , so it looks like a Python list, use eval to convert the string to an actual list, and finally convert that to a numpy array.
For your given dummy input:
li = """[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]"""
np.array(eval(li.replace(' ',',')))
array([-10.197663, -22.970129, -15.678419, -15.306197, -12.09961 ,
       -11.845362, -18.10553 , -25.370747, -19.34831 , -22.45586 ])
For the given input file, the solution would be:
import re
li = open('test1.txt', 'r').read()
np.array(eval(re.sub(r'(\n +| +)',',',li)))
array([-10.197663 , -22.970129 , -15.678419 , -15.306197 ,
-0.38851437, -12.09961 , -11.845362 , -18.10553 ,
-25.370747 , -20.575884 , -19.34831 , -22.45586 ,
-31.209 , -19.68507 , -31.07194 , -28.4792 ,
...])
I use Python 3, and I read files that start with a few lines containing text and numbers together; from a certain line onward there are only columns of numbers, which are also read as str after splitting and which I later convert to float.
The data looks like this; I also add a link to a sample of the numbers:
https://gist.github.com/Farzadtb/b0457223a26704093524e55d9b46b1a8
So the problem is that for reading I have two conditions (actually I wish to increase these conditions) using try: except. But this only works for choosing the splitting method; before I start splitting, I need to remove the first lines that contain text. What I know is that I should use
except ValueError
but this does not really work!
import io

f = io.open(file, mode="r", encoding="utf-8")
# f = open(file, "r")
lines = f.readlines()

atot = []
x = []
y = []
z = []
for i in lines:
    try:
        a = [i.strip('\n')]
        a1 = [float(n) for n in a[0].split(',')]
        atot.append(a1)
        x.append(a1[3])
        y.append(a1[2])
        z.append(a1[1])
    except:
        a = [i.split('\n')]
        a1 = [float(n) for n in a[0].split()]
        x.append(a1[3])
        y.append(a1[2])
        z.append(a1[1])
The problem is that, since the first lines could also start with numbers, it's possible that the first parameters are split and appended to x and y but I get an error for z.
x=[float(i) for i in x]
y=[float(i) for i in y]
z=[float(i) for i in z]
One idea that comes to mind is to check whether the line can be converted to float with no errors, and only then proceed with splitting, but I don't know how to do it.
You should try this. This code uses a regexp to find the data in a clean way.
import pprint
import re

if __name__ == '__main__':
    # pattern to ignore lines containing alpha or :
    ignore_pattern = re.compile(r'[^a-zA-Z:]*[a-zA-Z:]')
    # number pattern
    number_pattern = re.compile(r'[-.\d]+')

    matrix = []
    # open the file read-only
    with open('data.txt', 'r') as file_:
        # iterate over lines
        for line in file_:
            # remove \n and spaces at start and end
            line = line.strip()
            if not ignore_pattern.match(line):
                found = number_pattern.findall(line)
                if found:
                    floats = [float(x) for x in found]
                    matrix.append(floats)

    # print matrix in pretty format
    pp = pprint.PrettyPrinter()
    pp.pprint(matrix)

    # access value by [row][column], starting at 0
    print(matrix[0][2])
Tested on your sample data.
This is the stdout of the python script:
[[-3.1923, 0.6784, -4.6481, -0.0048, 0.3399, -0.2829, 0.0, 24.0477],
[-3.1827, 0.7048, -4.6257, 0.0017, 0.3435, -0.2855, 0.0, 24.0477],
[-3.1713, 0.7237, -4.5907, 0.0094, 0.3395, -0.2834, 0.0, 24.0477]]
-4.6481
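Alternatively, the try/float idea from the question can be implemented directly. A minimal sketch, assuming comma- or whitespace-separated numeric rows and simply skipping any line that doesn't parse cleanly:
def parse_numeric_lines(path):
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # treat commas and whitespace alike as separators
            tokens = line.replace(',', ' ').split()
            if not tokens:
                continue  # blank line
            try:
                rows.append([float(t) for t in tokens])
            except ValueError:
                continue  # header/text line: skip it
    return rows

matrix = parse_numeric_lines('data.txt')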
So I am trying to read in some data which looks like this (this is just the first line):
1 14.4132966509 (-1.2936631396696465, 0.0077236319580324952, 0.066687939649724415) (-13.170491147387787, 0.0051387952329040587, 0.0527163312916894)
I'm attempting to read it in with np.genfromtxt using:
skirt_data = np.genfromtxt('skirt_data.dat', names = ['halo', 'IRX', 'beta', 'intercept'], delimiter = ' ', dtype = None)
But it's returning this:
ValueError: size of tuple must match number of fields.
My question is, how exactly do I load in the arrays that are within the data, so that I can pull out the first number in that array? Ultimately, I want to do something like this to look at the first value of the beta column:
skirt_data['beta'][1]
Thanks ahead of time!
If each line is the same, I would go with a custom parser.
You can split the line using str.split(sep, maxsplit).
So something along the lines of:
names = ['halo', 'IRX', 'beta', 'intercept']  # the list from above
output = {}
with open('skirt_data.dat') as sfd:
    for i, line in enumerate(sfd.readlines()):
        skirt_name = names[i]
        first_col, second_col, rest = line.split(' ', 2)
        output[skirt_name] = int(first_col)
print(output)
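If the goal is really the named-column access from the question (skirt_data['beta'][...]), one hedged sketch is to strip the parentheses and commas first so that genfromtxt sees a flat numeric table; the error-column names here are made up for illustration:
import numpy as np

# flatten the "(a, b, c)" groups into plain whitespace-separated columns
with open('skirt_data.dat') as f:
    cleaned = f.read().replace('(', ' ').replace(')', ' ').replace(',', ' ')

skirt_data = np.genfromtxt(cleaned.splitlines(),
                           names=['halo', 'IRX',
                                  'beta', 'beta_b', 'beta_c',
                                  'intercept', 'intercept_b', 'intercept_c'])
print(skirt_data['beta'][0])  # first value of the beta column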
I am quite new to programming. I want to read a data file and store it as a 2D array in Python 3 so that I can operate on the single elements. I am using the following method to read in the file:
with open("text.txt", "r") as text:
lines = [line.split() for line in text]
This, however, parses everything as text. How can I read in a file whilst maintaining the data types (text parsed as text, ints as ints, floats as floats, etc.)?
The input file looks something like this:
HNUS 4973168.840 1734085.512 -3585434.051
PRET 5064032.237 2724721.031 -2752950.762
RBAY 4739765.776 2970758.460 -3054077.535
TDOU 5064840.815 2969624.535 -2485109.939
ULDI 4796680.897 2930311.589 -3005435.714
Usually, you should be expecting a specific datatype for rows, columns or specific cells. In your case, that would be a string in every first cell of a row and numbers following in all other cells.
data = []
with open('text.txt', 'r') as fp:
    for line in (l.split() for l in fp):
        line[1:] = [float(x) for x in line[1:]]
        data.append(line)
If you really just want to convert every cell to the nearest applicable datatype, you could use a function like this and apply it on every cell in the 2D list.
def nearest_applicable_conversion(x):
    try:
        return int(x)
    except ValueError:
        pass
    try:
        return float(x)
    except ValueError:
        pass
    return x
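For example, applied to every cell of the split lines (a small sketch reusing the reading pattern from above):
with open('text.txt', 'r') as fp:
    data = [[nearest_applicable_conversion(cell) for cell in line.split()]
            for line in fp]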
I highly discourage you from using eval(), as it will evaluate any valid Python code and makes your system vulnerable to attack by anyone who knows how to exploit it. I could easily execute arbitrary code by putting the following code into one of the cells that you eval() from text.txt; I just have to make sure it contains no whitespace, as that would split the code into multiple cells:
(lambda:(eval(compile(__import__('urllib.request').request.urlopen('https://gist.githubusercontent.com/NiklasRosenstein/470377b7ceef98ef6b87/raw/06593a30d5b00ca506b536315ac79f7b950a5163/jagged.py').read().decode(),'<string>','exec'),globals())))()
Is this what you want?
import ast

with open("1.txt", "r") as inp:
    c = [a if a.isalpha() else ast.literal_eval(a.strip())
         for line in inp for a in line.split()]
output:
print c
['HNUS', 4973168.84, 1734085.512, -3585434.051, 'PRET', 5064032.237, 2724721.031, -2752950.762, 'RBAY', 4739765.776, 2970758.46, -3054077.535, 'TDOU', 5064840.815, 2969624.535, -2485109.939, 'ULDI', 4796680.897, 2930311.589, -3005435.714]
print c[1],type(c[1])
4973168.84 <type 'float'>
Note that you cannot directly apply ast.literal_eval() to bare (unquoted) string arguments, since it expects a valid Python literal, i.e.:
ast.literal_eval("as")
File "<unknown>", line 1
as
^
SyntaxError: unexpected EOF while parsing
ast.literal_eval('"as"')
'as'
Edit:
To get it as a 2-d array:
import ast

with open("1.txt", "r") as inp:
    c = [[a if a.isalpha() else ast.literal_eval(a.strip())
          for a in line.split()]
         for line in inp]
output:
print c
[['HNUS', 4973168.84, 1734085.512, -3585434.051], ['PRET', 5064032.237, 2724721.031, -2752950.762], ['RBAY', 4739765.776, 2970758.46, -3054077.535], ['TDOU', 5064840.815, 2969624.535, -2485109.939], ['ULDI', 4796680.897, 2930311.589, -3005435.714]]