Problem when using a numpy array of string (format) elements in Python

I want to use an array of string format elements as a shortcut to automate the formatting steps ahead. Short story: it resulted in an error (unmatched '{' in format spec). I tried to check what happened by writing and running the corresponding part of the code, step by step:
1:
import numpy as np
from sty import fg, rs
r=np.array([["{:>{w}.{p}f}"]*6]*3)
print(r[0,0])
output: {:>{w}.{p}f}
2:
sc=fg.red + r[0,0] + fg.rs
sc
output: '\x1b[31m{:>{w}.{p}f}\x1b[39m'
3:
r[0,0]=sc
r[0,0]
output: '\x1b[31m{:>{w}.'
As you can see, the part after the dot ({p}f}\x1b[39m) is missing. I would like to know if there is any solution to this.
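The likely cause: np.array infers a fixed-width Unicode dtype from the nested string lists (here '<U12', the length of "{:>{w}.{p}f}"), so assigning a longer string silently truncates it to 12 characters, which is exactly the '\x1b[31m{:>{w}.' shown above. A minimal sketch of two possible workarounds (pick one):
import numpy as np

# Workaround 1: reserve enough characters up front (40 is an arbitrary guess).
r = np.array([["{:>{w}.{p}f}"] * 6] * 3, dtype='U40')

# Workaround 2: store ordinary Python strings, which are never truncated.
r = np.array([["{:>{w}.{p}f}"] * 6] * 3, dtype=object)

r[0, 0] = '\x1b[31m' + r[0, 0] + '\x1b[39m'
print(r[0, 0].format(3.14159, w=10, p=2))  # the full spec survives and formats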

Related

List of arrays into .txt file without brackets and well spaced

I'm trying to save a .txt file of a list of arrays as follows:
list_array =
[array([-20.10400009, -9.94099998, -27.10300064]),
array([-20.42099953, -9.91499996, -27.07099915]),
...
This is the line I invoked.
np.savetxt('path/file.txt', list_array, fmt='%s')
This is what I get
[-20.10400009 -9.94099998 -27.10300064]
[-20.42099953 -9.91499996 -27.07099915]
...
This is what I want
-20.10400009 -9.94099998 -27.10300064
-20.42099953 -9.91499996 -27.07099915
...
EDIT :
It is translated from MATLAB as follows, where I use .append to transform it:
Cell([array([[[-20.10400009, -9.94099998, -27.10300064]]]),
array([[[-20.42099953, -9.91499996, -27.07099915]]]),
array([[[-20.11199951, -9.88199997, -27.16399956]]]),
array([[[-19.99500084, -10.0539999 , -27.13899994]]]),
array([[[-20.4109993 , -9.87100029, -27.12800026]]])],
dtype=object)
I cannot really see what is wrong with your code, except for the missing imports. With array, do you mean numpy.array, or are you importing it like from numpy import array (which you should refrain from doing)?
Running this example gives exactly what you want.
import numpy as np
list_array = [np.array([-20.10400009, -9.94099998, -27.10300064]),
np.array([-20.42099953, -9.91499996, -27.07099915])]
np.savetxt('test.txt', list_array, fmt='%s')
> cat test.txt
-20.10400009 -9.94099998 -27.10300064
-20.42099953 -9.91499996 -27.07099915
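If the arrays really carry the extra singleton dimensions shown in the EDIT (shape (1, 1, 3)), that nesting would explain the brackets in the output. Flattening each array before saving should give the desired result; a small sketch under that assumption:
import numpy as np

list_array = [np.array([[[-20.10400009, -9.94099998, -27.10300064]]]),
              np.array([[[-20.42099953, -9.91499996, -27.07099915]]])]

# ravel() drops the (1, 1, 3) nesting, leaving plain 3-element rows.
np.savetxt('test.txt', [a.ravel() for a in list_array], fmt='%s')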

Data transfer problems to an array and slowness when accessing data compared to MATLAB

I'm trying to port code from MATLAB to Python; my major problem is reading the file and transposing the data into arrays.
In matlab:
[filename,pathname,~] = uigetfile('*.out');
data{1} = importdata(fullfile(pathname,filename), '\t', 8);
unit = data{1}.colheaders;
title = strsplit(char(data{1}.textdata(7,1)));
In python:
import tkinter.filedialog
import numpy as np

def openfile():
    file_path = tkinter.filedialog.askopenfile(mode='r', filetypes=[('', '.out')])
    data = np.loadtxt(file_path, delimiter='\t', skiprows=8)
    nrows, ncols = np.shape(data)
    return data, nrows, ncols

data, nrows, ncols = openfile()
print(data[0:5][0])
But when I try to access the first column (the time vector) and then print that vector, I get a row printed instead. Even if I invert the indices from [0:5][0] to [0][0:5], I get a similar result.
Another problem is that accessing files takes much longer than in MATLAB.
Below is a sample of the data I'm trying to access in Python.
#
Predictions were generated on 07-Jun-2021 at 07:36:56 using OpenFAST, compiled as a 64-bit application using double precision at commit v2.5.0
linked with NWTC Subroutine Library; ElastoDyn; InflowWind; AeroDyn; ServoDyn; HydroDyn; MoorDyn (v1.01.02F, 8-Apr-2016)
Description from the FAST input file: IEA 15 MW offshore reference model on UMaine VolturnUS-S semi-submersible floating platform
Time NcIMUTVxs NcIMUTVys NcIMUTVzs NcIMUTAxs NcIMUTAys NcIMUTAzs NcIMURVxs NcIMURVys NcIMURVzs NcIMURAxs NcIMURAys NcIMURAzs
(s) (m/s) (m/s) (m/s) (m/s^2) (m/s^2) (m/s^2) (deg/s) (deg/s) (deg/s) (deg/s^2) (deg/s^2) (deg/s^2)
0.0000 0.000E+00 0.000E+00 0.000E+00 -7.319E-01 -3.911E-01 -1.344E+00 0.000E+00 0.000E+00 0.000E+00 4.008E+00 -1.493E+01 4.163E-01
0.0250 -1.818E-02 -9.621E-03 -3.261E-02 -6.358E-01 -3.754E-01 -1.210E+00 9.613E-02 -3.609E-01 9.976E-03 3.542E+00 -1.345E+01 3.672E-01
0.0500 -3.140E-02 -1.845E-02 -5.898E-02 -5.513E-01 -3.181E-01 -9.064E-01 1.709E-01 -6.537E-01 1.772E-02 2.361E+00 -9.933E+00 2.434E-01
0.0750 -4.459E-02 -2.540E-02 -7.653E-02 -3.923E-01 -2.385E-01 -4.594E-01 2.103E-01 -8.428E-01 2.174E-02 7.456E-01 -4.845E+00 7.446E-02
0.1000 -5.177E-02 -3.032E-02 -8.156E-02 -2.350E-01 -1.594E-01 5.288E-02 2.078E-01 -8.920E-01 2.140E-02 -9.449E-01 9.618E-01 -1.022E-01
numpy.loadtxt is, in general, not very efficient (numpy's save/load work best with binary formats). Plus, your code as-is doesn't work for me (because the delimiter is not really a tab, but rather multiple spaces, and I don't think that's supported by numpy).
In your position I would either use raw python (and then convert to numpy array) or pandas (probably slower but more robust).
Ignoring the tkinter part and just supposing the file name to be data.txt, the first solution would look like:
import numpy as np

data = []
with open('data.txt') as fp:
    for i, line in enumerate(fp):
        if i >= 8:  # skip the 8 header lines
            data.append([float(x) for x in line.split()])
data = np.asarray(data)
The second solution with pandas would be:
import pandas as pd

df = pd.read_csv('data.txt', skiprows=7, delimiter=' ', skipinitialspace=True)
data = df.values
The results are equivalent, but slightly different: Python's split function automatically trims whitespace at the beginning and end, and it treats any run of whitespace as one separator (one space, multiple spaces, a tab, etc.). The conversion to float works for the example you provided, and all of the first 8 rows are skipped. Pandas' version also collapses multiple spaces, but I think it wouldn't work with tabs, and we need to explicitly tell it to ignore the whitespace at the beginning of each line. We also skip only 7 lines there, not 8, because by default CSV files must have the column names in the first row. So in this particular case we would get a dataframe with column names
['(s)', '(m/s)', '(m/s).1', '(m/s).2', '(m/s^2)', '(m/s^2).1',
'(m/s^2).2', '(deg/s)', '(deg/s).1', '(deg/s).2', '(deg/s^2)',
'(deg/s^2).1', '(deg/s^2).2']
But that doesn't matter anyway, because when we take .values at the end, only the numeric values are kept.
Perhaps the more important difference is what happens if there is an invalid value somewhere (say, a string): the plain Python code would raise an exception when trying to convert it to float, while the pandas solution will happily accept it, create a column of "object" type (i.e. an "anything" type), and not convert even the valid entries of that column to float.
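A separate note on the indexing part of the question: data[0:5][0] first slices the first five rows and then takes the first of those rows, which is why a whole line gets printed. NumPy expects row and column indices inside a single bracket; a quick sketch:
import numpy as np

data = np.arange(12).reshape(4, 3)  # stand-in for the loaded array

print(data[0:5][0])   # slices rows, then takes the first row: still a row
print(data[0:5, 0])   # first column of the first five rows (the time vector)
print(data[:, 0])     # the entire first column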

python sparse matrix creation: parallelize to speed up

I am creating a sparse matrix file by extracting features from an input file. Each row of the input file contains one film ID, followed by some feature IDs and each feature's score.
6729792 4:0.15568 8:0.198796 9:0.279261 13:0.17829 24:0.379707
The first number is the ID of the film; the value to the left of each colon is a feature ID and the value to the right is the score of that feature. Each line represents one film, and the number of feature:score pairs varies from one film to another.
Here is how I construct my sparse matrix:
import os
import os.path
import json                  # needed for the json.dump call below
import numpy as np
import tables as tb          # needed for open_file / Filters / create_carray
from Film import Film
from scipy.sparse import rand

def sparseCreate(self, Debug):
    a = rand(self.total_rows, self.total_columns, format='csr')
    l, m = a.shape[0], a.shape[1]
    f = tb.open_file("sparseFile.h5", 'w')
    filters = tb.Filters(complevel=5, complib='blosc')
    data_matrix = f.create_carray(f.root, 'data', tb.Float32Atom(), shape=(l, m), filters=filters)
    index_film = 0
    input_data = open('input_file.txt', 'r')
    for line in input_data:
        my_line = np.array(line.split())
        id_film = my_line[0]
        my_line = np.core.defchararray.split(my_line[1:], ":")
        self.data_matrix_search_normal[str(id_film)] = index_film
        self.data_matrix_search_reverse[index_film] = str(id_film)
        for element in my_line:
            if int(element[0]) in self.selected_features:
                column = self.index_selected_feature[str(element[0])]
                data_matrix[index_film, column] = float(element[1])
        index_film += 1
    self.selected_matrix = data_matrix
    json.dump(self.data_matrix_search_reverse,
              open(os.path.join(self.output_path, "data_matrix_search_reverse.json"), 'w'),
              sort_keys=True, indent=4)
    my_films = Film(
        self.selected_matrix, self.data_matrix_search_reverse, self.path_doc, self.output_path)
    x_matrix_unique = self.selected_matrix[:, :]
    r_matrix_unique = np.asarray(x_matrix_unique)
    f.close()
    return my_films
Question:
I feel that this function is too slow on big datasets, and it takes too long to run.
How can I improve and accelerate it? Maybe using MapReduce? What is wrong in this function that makes it so slow?
IO + conversions (from str, to str, even converting the same variable to str twice, etc.) + splits + explicit loops. By the way, there is a csv module in Python which may be used to parse your input file; you can experiment with it (I suppose you use a space as the delimiter). I also see you convert element[0] to int/str, which is bad: you create many temporary objects. If you call this function several times, you may try to reuse some internal objects (arrays?). You can also try to implement it in another style, with map or a list comprehension, but experiments are needed.
The general idea of Python code optimization is to avoid explicit Python byte-code execution and to prefer native/C Python functions (for anything). And do try to cut down on all those conversions. Also, if the input file is yours, you can format it with fixed-length fields; this lets you avoid splitting/parsing entirely (only string indexing).
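The per-cell writes into the PyTables carray are likely the main bottleneck. A common alternative, shown here as a sketch rather than the original code, is to collect (row, column, value) triples in plain lists and build a scipy.sparse.coo_matrix once at the end; selected_features and column_of below are hypothetical stand-ins for the question's self.selected_features and self.index_selected_feature:
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical stand-ins for the question's self.* attributes.
selected_features = {4, 8, 9}
column_of = {4: 0, 8: 1, 9: 2}

rows, cols, vals, film_ids = [], [], [], []
with open('input_file.txt') as fp:
    for row, line in enumerate(fp):
        tokens = line.split()
        film_ids.append(tokens[0])          # film ID comes first on each line
        for pair in tokens[1:]:             # remaining tokens are feature:score
            feat, score = pair.split(':')
            feat = int(feat)
            if feat in selected_features:
                rows.append(row)
                cols.append(column_of[feat])
                vals.append(float(score))

# Build the sparse matrix once, from plain Python lists: no per-cell writes.
matrix = coo_matrix((vals, (rows, cols)),
                    shape=(len(film_ids), len(column_of))).tocsr()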

I want to extract the MAC address with the following python-wifi 0.6.1 code

The following code tries to extract the MAC addresses of all nearby APs and then save them in a matrix; I don't know what is wrong. It uses the library python-wifi 0.6.1. Here are the code and the error:
import errno
import sys
import types
import pythonwifi.flags
from pythonwifi.iwlibs import Wireless, WirelessInfo, Iwrange, getNICnames, getWNICnames

i = 0
ArregloMAC = [20][30]
wifi = Wireless('wlan0')
results = wifi.scan()
(num_channels, frequencies) = wifi.getChannelInfo()
print "%-8.16s Scan completed :" % (wifi.ifname, )
for ap in results:
    index = 1
    ArregloMAC[i][index-1] = str("%d-%s" % (_, ap.bssid))
    index = index + 1
print ArregloMAC
IndexError: list index out of range
This line
ArregloMAC=[20][30]
will give you an index out of range straight off. What it says is, create a list of one element, [20], then take the 31st element of that list, and assign it to ArregloMAC. Since the list has only one element you will inevitably get an error.
It looks like you are trying to declare a 2-dimensional array. That is not how Python lists work.
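For completeness, a minimal sketch of how a 20x30 matrix is usually pre-allocated with Python lists, assuming that was the intent:
# The comprehension creates 20 distinct rows; note that
# [[None] * 30] * 20 would instead alias one row 20 times.
ArregloMAC = [[None] * 30 for _ in range(20)]
ArregloMAC[0][0] = "00:11:22:33:44:55"  # hypothetical MAC, just to show indexing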

Converting an imperative algorithm into functional style

I wrote a simple procedure to calculate the average of the test coverage of some specific packages in a Java project. The raw data in a huge html file is like this:
<body>
package pkg1 <line_coverage>11/111,<branch_coverage>44/444<end>
package pkg2 <line_coverage>22/222,<branch_coverage>55/555<end>
package pkg3 <line_coverage>33/333,<branch_coverage>66/666<end>
...
</body>
Given the specified packages "pkg1" and "pkg3", for example, the average line coverage is:
(11+33)/(111+333)
and average branch coverage is:
(44+66)/(444+666)
I wrote the following procedure to get the result, and it works well. But how can I implement this calculation in a functional style? Something like "(x,y) for x in ... for b in ... if ...". I know a little Erlang, Haskell and Clojure, so solutions in these languages are also appreciated. Thanks a lot!
from __future__ import division
import re

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')

covered_lines, total_lines, covered_branches, total_branches = 0, 0, 0, 0
for line in datafile:
    for pkg in core_pkgs:
        ptn = re.compile('.*' + pkg + '.*' + '>(\d+)/(\d+).*>(\d+)/(\d+).*')
        match = ptn.match(line)
        if match is not None:
            cvln, tlln, cvbh, tlbh = match.groups()
            covered_lines += int(cvln)
            total_lines += int(tlln)
            covered_branches += int(cvbh)
            total_branches += int(tlbh)

print 'Line coverage:', '{:.2%}'.format(covered_lines / total_lines)
print 'Branch coverage:', '{:.2%}'.format(covered_branches / total_branches)
Down below you can find my Haskell solution. I will try to explain the important points I went through as I wrote it.
First you will find that I created a data structure for coverage data. It's generally a good idea to create data structures to represent whatever data you want to handle. This is in part because it makes it easier to design your code when you can think in terms of whatever you are designing – closely related to functional programming philosophies, and in part because it can eliminate a few bugs where you think you are doing something but are in actuality doing something else.
Related to the point before: The first thing I do is to convert the string-represented data into my own data structure. When you are doing functional programming, you are often doing things in "sweeps." You don't have a single function that converts data to your format, filters out the unwanted data and summarises the result. You have three different functions for each of those tasks, and you do them one at a time!
This is because functions are very composable, i.e. if you have three different ones, you can stick them together to form a single one if you want to. If you start with a single one, it is very difficult to take it apart to form three different ones.
The actual workings of the conversion function are quite uninteresting unless you are specifically doing Haskell. All it does is try to match each string with a regex, and if it succeeds, it adds the coverage data to the resulting list.
Again, mad composition is about to happen. I don't create a function to loop over a list of coverages and sum them up. I create a single function to sum two coverages, because I know I can use it together with the specialised fold loop (which is sort of like a for loop on steroids) to summarise all coverages in a list. There's no need for me to reinvent the wheel and create a loop myself.
Besides, my sumCoverages function works with a lot of specialised loops, so I don't have to write a ton of functions, I just stick my single function into a ton of pre-made library functions!
In the main function you will see what I mean by programming in "sweeps" or "passes" over the data. First I convert it to the internal format, then I filter out the unwanted data, then I summarise the remaining data. These are completely independent computations. That's functional programming.
You will also notice that I use two specialised loops there, filter and fold. This means that I don't have to write any loops myself, I just stick in a function to those standard library loops and let those take it from there.
import Data.Maybe (catMaybes)
import Data.List (foldl')
import Text.Printf (printf)
import Text.Regex (matchRegex, mkRegex)

corePkgs = ["d", "f"]

stats = [
    "d>11/23d>34/89d",
    "e>25/65e>13/25e",
    "f>36/92f>19/76"
  ]

format = mkRegex ".*(\\w+).*>([0-9]+)/([0-9]+).*>([0-9]+)/([0-9]+).*"

-- It might be a good idea to define a datatype for coverage data.
-- A bit of coverage data is defined as the name of the package it
-- came from, the lines covered, the total amount of lines, the
-- branches covered and the total amount of branches.
data Coverage = Coverage String Int Int Int Int

-- Then we need a way to convert the string data into a list of
-- coverage data. We do this by regex. We try to match on each
-- string in the list, and then we choose to keep only the successful
-- matches. Returned is a list of coverage data that was represented
-- by the strings.
convert :: [String] -> [Coverage]
convert = catMaybes . map match
  where match line = do
          [name, cl, tl, cb, tb] <- matchRegex format line
          return $ Coverage name (read cl) (read tl) (read cb) (read tb)

-- We need a way to summarise two coverage data bits. This can of course also
-- be used to summarise entire lists of coverage data, by folding over it.
sumCoverage (Coverage nameA clA tlA cbA tbA) (Coverage nameB clB tlB cbB tbB) =
  Coverage (nameA ++ nameB ++ ",") (clA + clB) (tlA + tlB) (cbA + cbB) (tbA + tbB)

main = do
  -- First we need to convert the strings to coverage data
  let coverageData = convert stats
  -- Then we want to filter out only the relevant data
      relevantData = filter (\(Coverage name _ _ _ _) -> name `elem` corePkgs) coverageData
  -- Then we need to summarise it, but we are only interested in the numbers
      Coverage _ cl tl cb tb = foldl' sumCoverage (Coverage "" 0 0 0 0) relevantData
  -- So we can finally print them!
  printf "Line coverage: %.2f\n" (fromIntegral cl / fromIntegral tl :: Double)
  printf "Branch coverage: %.2f\n" (fromIntegral cb / fromIntegral tb :: Double)
Here are some quickly-hacked, untested ideas applied to your code:
import numpy as np
import re

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')

covered_lines, total_lines, covered_branches, total_branches = 0, 0, 0, 0
for pkg in core_pkgs:
    ptn = re.compile('.*' + pkg + '.*' + '>(\d+)/(\d+).*>(\d+)/(\d+).*')
    matches = map(ptn.match, datafile)
    statsList = [map(int, match.groups()) for match in matches if match]
    # statsList is a list of [cvln, tlln, cvbh, tlbh]
    stats = np.array(statsList)
    covered_lines, total_lines, covered_branches, total_branches = stats.sum(axis=0)
Well, as you can see I haven't bothered to finish off the remaining loop, but I think the point is made by now. There's certainly a lot more than one way to do this; I elected to show off map() (which some will say makes this less efficient, and it probably does), as well as NumPy to get the (admittedly light) math done.
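For what it's worth, a sketch of one way to finish off that loop, accumulating the per-package sums (offered in the same quickly-hacked, untested spirit as above):
import numpy as np
import re

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')

totals = np.zeros(4, dtype=int)  # cvln, tlln, cvbh, tlbh
for pkg in core_pkgs:
    ptn = re.compile('.*' + pkg + '.*' + '>(\d+)/(\d+).*>(\d+)/(\d+).*')
    stats = np.array([[int(g) for g in m.groups()]
                      for m in map(ptn.match, datafile) if m])
    totals += stats.sum(axis=0)  # add this package's sums to the running totals

covered_lines, total_lines, covered_branches, total_branches = totals
print 'Line coverage:', '{:.2%}'.format(float(covered_lines) / total_lines)
print 'Branch coverage:', '{:.2%}'.format(float(covered_branches) / total_branches)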
This is the corresponding Clojure solution:
(defn extract-data
  "extract 4 integers from a string line according to a package name"
  [pkg line]
  (map read-string
       (rest (first
              (re-seq
               (re-pattern
                (str pkg ".*>(\\d+)/(\\d+).*>(\\d+)/(\\d+)"))
               line)))))

(defn scan-lines-by-pkg
  "scan all string lines and extract all data as integer sequences
  according to package names"
  [pkgs lines]
  (filter seq (for [pkg pkgs
                    line lines]
                (extract-data pkg line))))

(defn sum-data
  "add all data in valid lines together"
  [pkgs lines]
  (apply map + (scan-lines-by-pkg pkgs lines)))

(defn get-percent
  [covered all]
  (str (format "%.2f" (float (/ (* covered 100) all))) "%"))

(defn get-cov
  [pkgs lines]
  {:line-cov   (apply get-percent (take 2 (sum-data pkgs lines)))
   :branch-cov (apply get-percent (drop 2 (sum-data pkgs lines)))})

(get-cov ["d" "f"] ["abc" "d>11/23d>34/89d" "e>25/65e>13/25e" "f>36/92f>19/76"])
