Rectifying the numpy string - python

I am automating a manual task where I read data from an Outlook mail and store the required values in a NumPy string array. However, the data contains lots of empty/whitespace entries, and I need to clean it up.
import numpy as np
arr=np.array([])
#outlook code and store in array.
arr = [{'5'}, {'9'}, {'7'}, {'9'}, {''}, {''}, {''}, {''}, {''}, {''}]
# required output look like this
arr=[5979]
Can anyone help me get the required output?

A solution for this given format, though not especially scalable: it iterates over each set in the list and unpacks it into a list of strings, then joins those strings into a single string and finally converts that to an integer.
arr = [{'5'}, {'9'}, {'7'}, {'9'}, {''}, {''}]
value = int("".join([str(*x) for x in arr if str(*x).isdigit()]))
print(value)
5979

You can .strip() each string to remove spaces and append it to the previous strings. I'm not sure why you use sets inside a list rather than strings directly; plain strings would save you the next(iter(...)) call. Also note that you won't get much benefit from a NumPy array of strings, whereas for numeric arrays the benefits can be huge.
arr = [{'5'}, {'9'}, {'7'}, {'9'}, {' '}, {' '}, {' '}]
value = ''
for s in arr:
    value += next(iter(s)).strip()
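If the data can be stored as plain strings instead of single-element sets, as suggested above, the same cleanup collapses to a single join; a minimal sketch:

```python
# Hypothetical data: plain strings instead of single-element sets
arr = ['5', '9', '7', '9', ' ', ' ', ' ']

# Strip whitespace from each element and join the remaining digits
value = int(''.join(s.strip() for s in arr))
print(value)  # 5979
```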


Strip all string and make Numpy array from list

I have a list containing dictionaries that hold string and float data, e.g. [{a:'DOB', b:'weight', c:height}, {a:12.2, b:12.5, c:11.5}, {a:'DOB', b:33.5, c:33.2}].
I want to convert this to NumPy, stripping out all keys and string values so that only the float values pass into the array, and then use it to work out some stats, e.g. [[12.2, 12.5, 11.5], ['', 33.5, 33.2]].
Where a whole row is strings it should be omitted, but where a single item in a row is a string it should be kept as a null value.
I'm not sure how to achieve this.
This answer combines all the suggestions in the comments. The procedure loops through the initial list of dictionaries and does the following:
Creates a new list using a list comprehension, saving each dictionary value as a float, or None for non-floats.
Counts the number of float values in the new list and appends the list only if there is at least one float.
Creates an array from the final list using np.array().
I added the missing quotes to the starting code from the original post.
Also, in future posts you should at least make an effort to code something, then ask for help where you get stuck.
import numpy as np

test_d = [{'a':'DOB', 'b':'weight', 'c':'height'},
          {'a':12.2, 'b':12.5, 'c':11.5},
          {'a':'DOB', 'b':33.5, 'c':33.2}]

arr_list = []
for d in test_d:
    d_list = [x if isinstance(x, float) else None for x in d.values()]
    check = sum(isinstance(x, float) for x in d_list)
    if check > 0:
        arr_list.append(d_list)
print(arr_list)

arr = np.array(arr_list)
print(arr)
For reference, here is the list comprehension converted to a standard loop with if/else logic:
for d in test_d:
    # List comprehension converted to a loop with if/else below:
    d_list = []
    for x in d.values():
        if isinstance(x, float):
            d_list.append(x)
        else:
            d_list.append(None)
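The question also mentions working out stats from the resulting array. A hedged sketch (the row data here is just illustrative): converting the list with dtype=float turns each None into nan, which NumPy's nan-aware reductions then skip:

```python
import numpy as np

arr_list = [[12.2, 12.5, 11.5], [None, 33.5, 33.2]]

# dtype=float converts each None to nan, so the nan-aware
# reductions can ignore the missing values
arr = np.array(arr_list, dtype=float)
print(np.nanmean(arr, axis=1))  # per-row means, skipping nan
```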

How to Convert a List to numpy.datetime64 format

I know that we can convert a single string to np.datetime64 format, such as:
a = np.datetime64('2020-01-01')
But what if we have a list with multiple date strings in it?
How can we apply np.datetime64 to convert all the elements into a datetime format, apart from using a for-loop?
When you have your string list, use it as the source for a NumPy array,
passing datetime64 as the dtype. E.g.:
lst = ['2020-01-01', '2020-02-05', '2020-03-07' ]
a = np.array(lst, dtype='datetime64')
When you evaluate a (i.e. print this array in a notebook),
you will get:
array(['2020-01-01', '2020-02-05', '2020-03-07'], dtype='datetime64[D]')
As you can see, in this case the default precision is Day.
But you can pass the precision explicitly, e.g. b = np.array(lst, dtype='datetime64[s]').
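For example, comparing the inferred precision with an explicitly requested one:

```python
import numpy as np

lst = ['2020-01-01', '2020-02-05', '2020-03-07']

# Day precision is inferred from the strings themselves
a = np.array(lst, dtype='datetime64')
print(a.dtype)  # datetime64[D]

# Second precision, requested explicitly
b = np.array(lst, dtype='datetime64[s]')
print(b.dtype)  # datetime64[s]
```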
Don't be misled by the apostrophes surrounding each element in the
printout above; the elements are not strings. To check, evaluate a[0] and
you will get:
numpy.datetime64('2020-01-01')
Is there a specific reason for you to want to avoid a loop? Is a list comprehension okay?
Using a list comprehension:
strings_list = [...]
npdate_list = [np.datetime64(x) for x in strings_list]

Assigning string to a list

import random
import numpy as np
LOC = np.zeros(96)
LOC[0] = 'H'
for t in range(0, 96):
    if 32 < t < 40:
        LOC[t] = random.choice(['H', 'W', 'I'])
Here, I want to initialize LOC with the character 'H' and then check a few conditions. But when I try to assign it, I get the error could not convert string to float: 'H'. How can I assign a character/string to the list LOC?
NumPy is not really made for mixing content types. If you want an array of strings, the empty values shouldn't be zeros but empty strings: ''.
You can use random.choices() to get and assign the random values, but the trick is to set the dtype to something appropriate for strings:
import random
import numpy as np
LOC = np.zeros(96, dtype='<U1')
LOC[0] = 'H'
LOC[32:40] = random.choices(['H','W','I'], k = 40 - 32)
This will be an array of empty strings except where you've assigned random values. Regular Python lists, of course, work with mixed types; if you don't need NumPy, you can initialize the list with:
LOC = [0] * 96
and then proceed with setting values with whatever you want.
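Putting the plain-list alternative together, a sketch using empty strings as the placeholders (a variant of the [0] * 96 initialization above, so the placeholder type matches the string values):

```python
import random

# A regular Python list happily holds mixed or string content
LOC = [''] * 96          # empty strings rather than zeros
LOC[0] = 'H'
for t in range(96):
    if 32 < t < 40:
        LOC[t] = random.choice(['H', 'W', 'I'])
```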
In Python you can use the ord function to get the Unicode code point of a character. So using
LOC[t] = ord(random.choice(['H','W','I']))
you should be able to achieve your goal, although I would call it 'assigning a character to a NumPy array' rather than 'assigning a string to a list'.
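A runnable sketch of that ord() approach, keeping the float array from the question; chr() recovers the character afterwards:

```python
import random
import numpy as np

LOC = np.zeros(96)       # float array, as in the question
LOC[0] = ord('H')        # store the code point, not the character
for t in range(96):
    if 32 < t < 40:
        LOC[t] = ord(random.choice(['H', 'W', 'I']))

# chr() turns the stored code point back into the character
print(chr(int(LOC[0])))  # H
```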

For loop through a numpy array of strings

I imported a csv file containing zip codes as a string using the following line:
my_data = genfromtxt('path\to\file.csv', delimiter=',', dtype=str, autostrip=True)
I am importing as a string in order to keep the leading zeroes some zip codes may contain. Now I need to also loop through the entire numpy array and I wanted to do so like this:
for i in np.nditer(my_data):
    do something with my_data[i]
But unfortunately it is returning the following error:
Arrays used as indices must be of integer (or boolean) type
Any idea how I can loop through each element of this numpy array?
While looping over NumPy arrays is often not a good solution, you can do it like this:
for i in range(len(my_data)):
    do something with my_data[i]
You might be better off reading your data into a list, processing the strings there, and converting to a NumPy array afterwards.
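A minimal sketch of that list-first approach, with a few hypothetical rows standing in for the CSV contents:

```python
import numpy as np

# Hypothetical raw values, standing in for lines read from the CSV
raw = ['00123 ', ' 04567', '89012']

# Process the strings as plain Python objects first,
# then build the array once at the end
zips = [s.strip() for s in raw]
my_data = np.array(zips)
print(my_data)  # leading zeroes survive: ['00123' '04567' '89012']
```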
You should do something with i, not with my_data[i]; i is already your element (a part of my_data).
That's why my_data[i] is not working: i is not an index, it is a NumPy array.
If you want both the index and the element, use enumerate().
Example:
lista = [20, 50, 70]
for idx, element in enumerate(lista):
    print(idx, element)
For more info, see the NumPy iteration tutorial.

Concatenate dicts of numpy arrays retaining numpy dtype

I'm concatenating Python dicts within a loop (not shown). I declare a new empty dict (dsst_mean_all) on the first iteration of the loop:
if station_index == 0:
    dsst_mean_all = {}
    for key in dsst_mean:
        dsst_mean_all[key] = []

source = [dsst_mean_all, dsst_mean]
for key in source[0]:
    dsst_mean_all[key] = np.concatenate([d[key] for d in source])
and then, as you can see in the second part of the code above, I concatenate the dict that has been obtained within the loop (dsst_mean) with the large dict that's going to hold all the data (dsst_mean_all).
Now dsst_mean is a dict whose elements are numpy arrays of different types. Mostly they are float32. My question is, how can I retain the datatype during concatenation? My dsst_mean_all dict ends up being float64 numpy arrays for all elements. I need these to match dsst_mean to save memory and reduce file size. Note that dsst_mean for all iterations of the loop has the same structure and elements of the same dtype.
Thanks.
You can set the dtype of your arrays in the list comprehension.
Either hardcoded:
dsst_mean_all[key] = np.concatenate([d[key].astype('float32') for d in source])
Or dynamically, casting everything to the dtype of the incoming dict:
dsst_mean_all[key] = np.concatenate([d[key].astype(dsst_mean[key].dtype) for d in source])
Docs: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html
OK, one way to solve this is to avoid declaring dsst_mean_all as a new empty dict; that, I think, is why everything is being cast to float64 by default (the empty float64 lists promote the concatenation result). With an if/else statement, on the first iteration simply set dsst_mean_all to dsst_mean, and on all subsequent iterations do the concatenation as shown in the original question.
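A minimal sketch of that if/else approach, with hypothetical data standing in for dsst_mean on successive loop iterations:

```python
import numpy as np

# Hypothetical per-station dicts; both arrays are float32,
# matching the setup described in the question
batches = [
    {'sst': np.array([1.0, 2.0], dtype='float32')},
    {'sst': np.array([3.0], dtype='float32')},
]

for station_index, dsst_mean in enumerate(batches):
    if station_index == 0:
        # Seed with the first dict instead of empty lists, so no
        # float64 empty array ever enters the concatenation
        dsst_mean_all = {k: v.copy() for k, v in dsst_mean.items()}
    else:
        for key in dsst_mean_all:
            dsst_mean_all[key] = np.concatenate(
                [dsst_mean_all[key], dsst_mean[key]])

print(dsst_mean_all['sst'].dtype)  # float32
```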
