Python - string to matrix representation

I have a string a="1 2 3; 4 5 6". How do I express this as a matrix [1 2 3; 4 5 6] in Python?
I then want to take another such string b, convert it to a matrix too, and compute a x b.

You can use the numpy module to create a matrix directly from a string in MATLAB-style format:
>>> import numpy as np
>>> a="1 2 3; 4 5 6"
>>> np.matrix(a)
matrix([[1, 2, 3],
        [4, 5, 6]])
You can use the same library to do matrix multiplication:
>>> A = np.matrix("1 2 3; 4 5 6")
>>> B = np.matrix("2 3; 4 5; 6 7")
>>> A * B
matrix([[28, 34],
        [64, 79]])
Read up on the numpy library; it is a powerful module for exactly the kind of work you are describing.
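A minimal sketch of the same thing with plain NumPy arrays, which the NumPy documentation now recommends over np.matrix. The helper name parse_matrix is hypothetical, and it assumes integer entries, ';' between rows and whitespace between values. With arrays, * is elementwise, so the matrix product is written with @:

```python
import numpy as np


def parse_matrix(s):
    """Parse a MATLAB-style string such as "1 2 3; 4 5 6" into a 2-D int array.

    Hypothetical helper: assumes integer entries, ';' between rows and
    whitespace between the values in a row.
    """
    rows = [[int(x) for x in row.split()] for row in s.split(';')]
    return np.array(rows)


A = parse_matrix("1 2 3; 4 5 6")   # shape (2, 3)
B = parse_matrix("2 3; 4 5; 6 7")  # shape (3, 2)
C = A @ B                          # matrix product; A * B would be elementwise
print(C)
```

If one row has a different number of values than the others, np.array cannot build a regular 2-D array, so checking A.shape before multiplying is a cheap sanity test.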

This is one way to do it: split the string at ';', then split each piece on whitespace, convert each number to an int and append it to a sublist, then append that sublist to the outer list:
a = "1 2 3; 4 5 6"
aSplit = a.split(';')
l = []
for item in aSplit:
    subl = []
    for num in item.split():
        subl.append(int(num))
    l.append(subl)
print(l)  # [[1, 2, 3], [4, 5, 6]]
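To finish the original task (computing a x b) without NumPy, the nested lists can be multiplied directly. This is a minimal sketch; parse_rows and matmul are hypothetical helper names, and integer entries are assumed:

```python
def parse_rows(s):
    """Hypothetical helper: "1 2 3; 4 5 6" -> [[1, 2, 3], [4, 5, 6]]."""
    return [[int(x) for x in row.split()] for row in s.split(';')]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows (no NumPy)."""
    if len(a[0]) != len(b):
        raise ValueError("number of columns of a must equal number of rows of b")
    cols = list(zip(*b))  # the columns of b, as tuples
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


a = parse_rows("1 2 3; 4 5 6")
b = parse_rows("2 3; 4 5; 6 7")
print(matmul(a, b))  # [[28, 34], [64, 79]]
```

Each result entry is the sum of a row of a times a column of b, which is the same product np.matrix computes above.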

Related

Python splitting an int based on the char length

I’m new to Python and would like to write a simple function. I’d like to read the input array and, if a value has four or more digits, split it and print the first value and then the second value.
I’m having issues splitting the number and getting rid of the 0s in between; for example, 1006 would become 1, 6.
Input array:
a = [1002, 2, 3, 7, 9, 15, 5992]
Desired output in console:
1, 2
2
3
7
9
15
59,92

You can abstract the splitting into a function and then use a list comprehension to map that function over the list. The following matches what you had before one of your edits, and can of course be tweaked:
def split_num(n):
    s = str(n)
    if len(s) < 4:
        return 0, n
    else:
        a, b = s[:2], s[2:]
        if a[1] == '0':
            a = a[0]
        return int(a), int(b)
nums = [1002, 2, 3, 7, 9, 15, 5992]
result = [split_num(n) for n in nums]
for a, b in result:
    print(a, b)
Output:
1 2
0 2
0 3
0 7
0 9
0 15
59 92
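If you want exactly the output shown in the question (short numbers printed on their own, without a leading 0), a variant can build the text directly. This is a sketch that assumes non-negative integers and, like split_num, splits after the first two digits; format_num is a hypothetical name:

```python
def format_num(n):
    """Format n as in the question: 4+ digit numbers become "first, second".

    Assumes n is a non-negative int; splits after the first two digits.
    """
    s = str(n)
    if len(s) < 4:
        return s
    first, second = s[:2], s[2:]
    # rstrip turns "10" into "1"; int() drops leading zeros, so "06" -> 6.
    return "{}, {}".format(int(first.rstrip('0')), int(second))


for n in [1002, 2, 3, 7, 9, 15, 5992]:
    print(format_num(n))
```

Because the first digit of a number with four or more digits is never 0, first.rstrip('0') can never become empty.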

If you just want a list of the non-zero digits in the original list, you can use this:
a = [1002, 2, 3, 7, 9, 15, 5992]
strings = [str(el) for el in a]
str_digits = [char for el in strings for char in el if char != '0']
And if you want the digits as ints, you can do:
int_digits = [int(el) for el in str_digits]
or go straight to:
int_digits = [int(char) for el in strings for char in el if char != '0']
I'm not sure what the logic behind your desired output is, though, so I'm sorry if this isn't helpful.

Convert literal string to list inside python

helpful
'[2, 4]'
'[0, 0]'
'[0, 1]'
'[7, 13]'
'[4, 6]'
The column named helpful holds a list inside a string. I want to split 2 and 4 into separate columns.
[int(each) for each in df['helpful'][0].strip('[]').split(',')]
This works for the first row, but if I do
[int(each) for each in df['helpful'].strip('[]').split(',')]
I get an AttributeError:
AttributeError: 'Series' object has no attribute 'strip'
How can I get this result in my dataframe?
helpful not_helpful
2 4
0 0
0 1
7 13
4 6

As suggested by @abarnert, the first port of call is to find out why your data is coming across as strings and to try to rectify that problem.
However, if this is beyond your control, you can use ast.literal_eval as below.
import pandas as pd
from ast import literal_eval
df = pd.DataFrame({'helpful': ['[2, 4]', '[0, 0]', '[0, 1]', '[7, 13]', '[4, 6]']})
res = pd.DataFrame(df['helpful'].map(literal_eval).tolist(),
                   columns=['helpful', 'not_helpful'])
#    helpful  not_helpful
# 0        2            4
# 1        0            0
# 2        0            1
# 3        7           13
# 4        4            6
Explanation
From the documentation, ast.literal_eval performs the following function:
Safely evaluate an expression node or a string containing a Python
literal or container display. The string or node provided may only
consist of the following Python literal structures: strings, bytes,
numbers, tuples, lists, dicts, sets, booleans, and None.
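As a small illustration of the "safely" part: literal_eval accepts plain literals like the strings in this column, but refuses anything that would have to run code, such as a function call. The wrapper name safe_parse is hypothetical:

```python
from ast import literal_eval


def safe_parse(text):
    """Return the Python literal in text, or None if it is not a plain literal."""
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        # ValueError: the string parses but is not a literal (e.g. a call);
        # SyntaxError: the string is not valid Python at all.
        return None


print(safe_parse('[7, 13]'))                    # [7, 13]
print(safe_parse('__import__("os").getcwd()'))  # None: calls are rejected
```

This is the reason to prefer literal_eval over eval for data coming from a file or a dataframe column.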

Assuming what you've described here accurately mimics your real-world case, how about a regex with .str.extract()?
>>> import numpy as np
>>> regex = r'\[(?P<helpful>\d+),\s*(?P<not_helpful>\d+)\]'
>>> df
   helpful
0   [2, 4]
1   [0, 0]
2   [0, 1]
>>> df['helpful'].str.extract(regex, expand=True).astype(np.int64)
   helpful  not_helpful
0        2            4
1        0            0
2        0            1
Each pattern (?P<name>...) is a named capturing group. Here, there are two: helpful and not_helpful. This assumes the pattern can be described as: opening bracket, one or more digits, comma, zero or more spaces, one or more digits, and closing bracket. The pandas method .extract(), as its name implies, "extracts" the result of match.group(i) for each i:
>>> import re
>>> regex = r'\[(?P<helpful>\d+),\s*(?P<not_helpful>\d+)\]'
>>> re.search(regex, '[2, 4]').group('helpful')
'2'
>>> re.search(regex, '[2, 4]').group('not_helpful')
'4'
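One practical consequence of the regex approach: rows that don't match the pattern are not errors, they come back as NaN, and .astype(np.int64) would then fail. A short sketch (the 'n/a' row is made up for illustration) showing how to spot such rows before converting:

```python
import pandas as pd

regex = r'\[(?P<helpful>\d+),\s*(?P<not_helpful>\d+)\]'
# The 'n/a' row is invented here to show what happens on a non-match.
df = pd.DataFrame({'helpful': ['[2, 4]', '[0, 0]', 'n/a']})

out = df['helpful'].str.extract(regex, expand=True)
# Named groups become column names; non-matching rows are NaN in every column.
bad_rows = out.isna().any(axis=1)
print(out[~bad_rows].astype(int))
```

Whether to drop, log or fix the bad rows depends on your data; the point is to check before casting.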

Just for fun, without any module:
s = """
helpful
'[2, 4]'
'[0, 0]'
'[0, 1]'
'[7, 13]'
'[4, 6]'
"""
lst = s.strip().splitlines()
d = {'helpful':[], 'not_helpful':[]}
el = [tuple(int(x) for x in e.strip("'[]").split(', ')) for e in lst[1:]]
d['helpful'].extend(x[0] for x in el)
d['not_helpful'].extend(x[1] for x in el)
NUM_WIDTH = 4
COLUMN_WIDTH = max(len(k) for k in d)
print('{:^{num_width}}{:^{column_width}}{:^{column_width}}'.format(
    ' ', *sorted(d),
    num_width=NUM_WIDTH,
    column_width=COLUMN_WIDTH
    )
)
for (i, v) in enumerate(zip(d['helpful'], d['not_helpful']), 1):
    print('{:^{num_width}}{:^{column_width}}{:^{column_width}}'.format(
        i, *v,
        num_width=NUM_WIDTH,
        column_width=COLUMN_WIDTH
        )
    )

Why does vstack change the type of the elements? And how do I solve this?

I have some lists such as
list1 = ['hi',2,3,4]
list2 = ['hello', 7,1,8]
list3 = ['morning',7,2,1]
Where 'hi', 'hello' and 'morning' are strings, while the rest are numbers.
However, when I try to stack them up as:
matrix = np.vstack((list1,list2,list3))
the numbers become strings. In particular, their type becomes numpy.str_.
How do I solve this? I tried replacing the items and changing their type, but nothing works.
Edit:
I made a mistake above! In my original problem, the first list is actually a list of headings, for example:
list1 = ['hi', 'number of hours', 'number of days', 'ideas']
So the first column (in the vertically stacked array) is a column of strings. The other columns have a string as their first element and then numbers.

You could use Pandas DataFrames, they allow for heterogeneous data:
>>> pandas.DataFrame([list1, list2, list3])
0 1 2 3
0 hi 2 3 4
1 hello 7 1 8
2 morning 7 2 1
If you want to name the columns, you can do that too (here list0 is the list of headings, ['hi', 'nb_hours', 'nb_days', 'ideas']):
>>> pandas.DataFrame([list1, list2, list3], columns=list0)
hi nb_hours nb_days ideas
0 hi 2 3 4
1 hello 7 1 8
2 morning 7 2 1

A NumPy array has a single dtype for all of its elements. Since numbers can be written as strings, but strings cannot be written as numbers, your matrix will have all its elements of type string.
If you want a matrix of integers, you can:
1- Extract the submatrix corresponding to your numbers and convert it to integers, or
2- Extract only the numbers from your lists and stack them directly.
import numpy as np
list1 = ['hi',2,3,4]
list2 = ['hello', 7,1,8]
list3 = ['morning',7,2,1]
matrix = np.vstack((list1,list2,list3))
# First
m = matrix[:, 1:].astype(np.int32)
# [[2 3 4] [7 1 8] [7 2 1]]
# Second
m = np.vstack((list1[1:],list2[1:],list3[1:]))
# [[2 3 4] [7 1 8] [7 2 1]]
Edit (answer to a comment):
I'll call the title list list0:
list0 = ['hi', 'nb_hours', 'nb_days', 'ideas']
The same two ideas apply:
1- Stack everything, then extract the submatrix (here we take neither the first row nor the first column: [1:,1:]):
matrix = np.vstack((list0,list1,list2,list3))
matrix_nb = matrix[1:, 1:].astype(np.int32)
2- Don't stack list0 at all; stack only the other lists, without their first element ([1:]):
m = np.vstack((list1[1:],list2[1:],list3[1:]))
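A sketch that shows the single-dtype behaviour directly, plus one more option not mentioned above: dtype=object keeps the original Python objects in the array, at the cost of losing fast vectorized math on that array:

```python
import numpy as np

list1 = ['hi', 2, 3, 4]
list2 = ['hello', 7, 1, 8]
list3 = ['morning', 7, 2, 1]

stacked = np.vstack((list1, list2, list3))
print(stacked.dtype.kind)        # 'U': every element is now a unicode string

# dtype=object keeps each element as the original Python object
mixed = np.array([list1, list2, list3], dtype=object)
print(type(mixed[0, 1]))         # <class 'int'>

# The numeric block can then be converted to a real integer array
numbers = mixed[:, 1:].astype(int)
print(numbers.sum())             # 35
```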

Do numpy 1D arrays follow row/column rules?

I have just started using numpy and I am getting confused about how to use arrays. I have seen several Stack Overflow answers on numpy arrays but they all deal with how to get the desired result (I know how to do this, I just don't know why I need to do it this way). The consensus that I've seen is that arrays are better than matrices because they are a more basic class and less restrictive. I understand you can transpose an array which to me means there is a distinction between a row and a column, but the multiplication rules all produce the wrong outputs (compared to what I am expecting).
Here is the test code I have written along with the outputs:
import numpy
a = numpy.array([1,2,3,4])
print(a)
>>> [1 2 3 4]
print(a.T) # Transpose
>>> [1 2 3 4] # No apparent effect
b = numpy.array( [ [1], [2], [3], [4] ] )
print(b)
>>> [[1]
     [2]
     [3]
     [4]] # Column (Expected)
print(b.T)
>>> [[1 2 3 4]] # Row (Expected, transpose seems to work here)
print((b.T).T)
>>> [[1]
     [2]
     [3]
     [4]] # Column (All of these are as expected,
          # unlike for declaring the array as a row vector)
# The following are element wise multiplications of a
print(a*a)
>>> [ 1  4  9 16]
print(a * a.T) # Row*Column
>>> [ 1  4  9 16] # Inner product scalar result expected
print(a.T * a) # Column*Row
>>> [ 1  4  9 16] # Outer product matrix result expected
print(b*b)
>>> [[ 1]
     [ 4]
     [ 9]
     [16]] # Expected result, element wise multiplication in a column
print(b * b.T) # Column * Row (Outer product)
>>> [[ 1  2  3  4]
     [ 2  4  6  8]
     [ 3  6  9 12]
     [ 4  8 12 16]] # Expected matrix result
print(b.T * (b.T)) # Column * Column (Doesn't make much sense so I expected elementwise multiplication)
>>> [[ 1  4  9 16]]
print(b.T * (b.T).T) # Row * Column, inner product expected
>>> [[ 1  2  3  4]
     [ 2  4  6  8]
     [ 3  6  9 12]
     [ 4  8 12 16]] # Outer product result
I know that I can use numpy.inner() and numpy.outer() to achieve the effect (that is not a problem); I just want to know if I need to keep track of whether my vectors are rows or columns.
I also know that I can create a 1D matrix to represent my vectors and the multiplication works as expected. I'm trying to work out the best way to store my data so that when I look at my code it is clear what is going to happen - right now the maths just looks confusing and wrong.
I only need to use 1D and 2D tensors for my application.

I'll try annotating your code:
a = numpy.array([1,2,3,4])
print(a)
>>> [1 2 3 4]
print(a.T) # Transpose
>>> [1 2 3 4] # No apparent effect
a.shape will show (4,). a.T.shape is the same. It kept the same number of dimensions and performed the only meaningful transpose: no change. Making it (4,1) would have added a dimension and destroyed the a.T.T round trip.
b = numpy.array( [ [1], [2], [3], [4] ] )
print(b)
>>> [[1]
     [2]
     [3]
     [4]] # Column (Expected)
print(b.T)
>>> [[1 2 3 4]] # Row (Expected, transpose seems to work here)
b.shape is (4,1), b.T.shape is (1,4). Note the extra set of []. If you'd created a as a = numpy.array([[1,2,3,4]]) its shape too would have been (1,4).
The easy way to make b would be b=np.array([[1,2,3,4]]).T (or b=np.array([1,2,3,4])[:,None] or b=np.array([1,2,3,4]).reshape(-1,1))
Compare this to MATLAB
octave:3> a=[1,2,3,4]
a =
1 2 3 4
octave:4> size(a)
ans =
1 4
octave:5> size(a.')
ans =
4 1
Even without the extra [] it has initialized the matrix as 2d.
numpy has a matrix class that imitates MATLAB from back when MATLAB allowed only 2d matrices.
In [75]: m=np.matrix('1 2 3 4')
In [76]: m
Out[76]: matrix([[1, 2, 3, 4]])
In [77]: m.shape
Out[77]: (1, 4)
In [78]: m=np.matrix('1 2; 3 4')
In [79]: m
Out[79]:
matrix([[1, 2],
        [3, 4]])
I don't recommend using np.matrix unless it really adds something useful to your code.
Note that MATLAB talks of vectors, but they are really just matrices with only one non-singleton dimension.
# The following are element wise multiplications of a
print(a*a)
>>> [ 1  4  9 16]
print(a * a.T) # Row*Column
>>> [ 1  4  9 16] # Inner product scalar result expected
This behavior follows from a.T being the same as a. As you noted, * produces element-by-element multiplication, equivalent to MATLAB's .*. np.dot(a,a) gives the dot (or matrix) product of 2 arrays.
print(a.T * a) # Column*Row
>>> [ 1  4  9 16] # Outer product matrix result expected
No, it is still doing elementwise multiplication.
I'd use broadcasting, a[:,None]*a[None,:] to get the outer product. Octave added this in imitation of numpy; I don't know if MATLAB has it yet.
In the following * is always element by element multiplication. It's broadcasting that produces matrix/outer product results.
print(b*b)
>>> [[ 1]
     [ 4]
     [ 9]
     [16]] # Expected result, element wise multiplication in a column
A (4,1) * (4,1)=>(4,1). Same shapes all around.
print(b * b.T) # Column * Row (Outer product)
>>> [[ 1  2  3  4]
     [ 2  4  6  8]
     [ 3  6  9 12]
     [ 4  8 12 16]] # Expected matrix result
Here (4,1)*(1,4)=>(4,4) product. The two size-1 dimensions have been replicated, so it effectively becomes a (4,4)*(4,4). How would you replicate this in MATLAB: with .*?
print(b.T * (b.T)) # Column * Column (Doesn't make much sense so I expected elementwise multiplication)
>>> [[ 1  4  9 16]]
* is elementwise regardless of expectations. Think b' .* b' in MATLAB.
print(b.T * (b.T).T) # Row * Column, inner product expected
>>> [[ 1  2  3  4]
     [ 2  4  6  8]
     [ 3  6  9 12]
     [ 4  8 12 16]] # Outer product result
Again * is elementwise; inner requires a summation in addition to multiplication. Here broadcasting again applies (1,4)*(4,1)=>(4,4).
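np.dot(a,a), np.trace(b.T*b) and np.sum(b*b) all give 30. np.dot(b.T,b) gives the same value as a (1,1) array, [[30]]; np.dot(b,b) fails, because shapes (4,1) and (4,1) are not aligned.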
When I worked in MATLAB I frequently checked the size, and created test matrices that would catch dimension mismatches (e.g. a 2x3 instead of a 2x2 matrix). I continue to do that in numpy.
The key things are:
numpy arrays may be 1d (or even 0d)
A (4,) array is not exactly the same as a (4,1) or (1,4) array.
* is elementwise - always.
broadcasting usually accounts for outer like behavior
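A short sketch tying these points together, using the same a as in the question: with 1-D arrays you ask for inner and outer products explicitly (the @ operator is np.matmul), and you only need (4,1) or (1,4) shapes when you want explicit columns and rows:

```python
import numpy as np

a = np.array([1, 2, 3, 4])        # 1-D, shape (4,); a.T is the same array

elementwise = a * a               # * is always elementwise: [1, 4, 9, 16]
inner = a @ a                     # 1-D @ 1-D is the inner product: 30
outer = a[:, None] * a[None, :]   # broadcasting (4,1)*(1,4) -> (4,4)

col = a.reshape(-1, 1)            # explicit column vector, shape (4, 1)
row = col.T                       # explicit row vector, shape (1, 4)
inner_2d = row @ col              # (1,4) @ (4,1) -> (1,1) array [[30]]

print(inner, outer.shape, inner_2d)
```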

"Transposing" is, from a numpy perspective, really only a meaningful concept for two-dimensional structures:
>>> import numpy
>>> arr = numpy.array([1,2,3,4])
>>> arr.shape
(4,)
>>> arr.transpose().shape
(4,)
So, if you want to transpose something, you'll have to make it two-dimensional:
>>> arr_2d = arr.reshape((4,1)) ## four rows, one column -> two-dimensional
>>> arr_2d.shape
(4, 1)
>>> arr_2d.transpose().shape
(1, 4)
Also, numpy.array(iterable, **kwargs) has a keyword argument ndmin which, when set to ndmin=2, prepends as many 1s to the shape as necessary:
>>> arr_ndmin = numpy.array([1,2,3,4],ndmin=2)
>>> arr_ndmin.shape
(1, 4)

Yes, they do.
Your question is already answered, but I assume you are a MATLAB user? If so, you may find this guide useful: Moving from MATLAB matrices to NumPy arrays.

Slicing a list from the end

Say, I have a list of values:
>>> a = [1, 2, 3, 4]
How can I make it include the end value through slicing? I expected:
>>> a[4:]
[4]
instead of:
>>> a[4:]
[]

Slicing indices start from zero.
So if you have:
>>> xs = [1, 2, 3, 4]
          |  |  |  |
          V  V  V  V
          0  1  2  3   <-- index in xs
And you slice from 4 onwards you get:
>>> xs[4:]
[]
Four is the length of xs, not the last index!
However if you slice from 3 onwards (the last index of the list):
>>> xs[3:]
[4]
See: Data Structures
Many common programming languages and software systems are zero-based, so please have a read of Zero-based Numbering.

Python indexes are zero based. The last element is at index 3, not 4:
>>> a = [1,2,3,4]
>>> a[3:]
[4]

a = [1,2,3,4]
a[-1:]
In Python you can index values from the beginning to the end, or from the end to the beginning:
 1,  2,  3,  4
 |   |   |   |
 0   1   2   3   (or)
-4  -3  -2  -1
So if you want the last element of the list, you can use either a[len(a)-1:] or a[-1:].
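A quick check of the points made in these answers, which also shows the difference between a slice (which returns a list) and an index (which returns the element), and that slicing past the end returns [] instead of raising:

```python
a = [1, 2, 3, 4]

last_slice = a[-1:]          # a slice: a list holding the last element
last_item = a[-1]            # an index: the element itself
same_slice = a[len(a) - 1:]  # the positive-index spelling of a[-1:]
past_end = a[len(a):]        # starting a slice at len(a) gives [], no error

print(last_slice, last_item, same_slice, past_end)  # [4] 4 [4] []

try:
    a[len(a)]                # indexing at len(a) is out of range
except IndexError as exc:
    print('IndexError:', exc)
```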
