Integer manipulation of arrays in Python

I have 2 arrays, and I need to replace the last digit of each integer in one array with the corresponding integer from the other. It's better if I show you the output to give a better understanding of what I'm trying to do. I'm not even sure this is possible.
Output of arrays:
first_array=['3', '4', '5', '2', '0', '0', '1', '7']
second_array=['527', '61', '397', '100', '97', '18', '45', '1']
What it should then look like:
first_array=['3', '4', '5', '2', '0', '0', '1', '7']
second_array =['523', '64', '395', '102', '90', '10', '41', '7']

Since the elements are strings, you can slice off the last character of each element of second_array and append the corresponding element of first_array:
>>> [s[:-1]+f for (f,s) in zip(first_array, second_array)]
['523', '64', '395', '102', '90', '10', '41', '7']

If they are actual integers, you could "round down" each element of the second list to the nearest multiple of 10, then add the corresponding element from the first list. For example:
>>> first = [3,4,5,6]
>>> second = [235,123,789,9021]
>>> second = [x - (x%10) for x in second]
>>> second
[230, 120, 780, 9020]
>>> [x + y for (x,y) in zip(first, second)]
[233, 124, 785, 9026]
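The rounding and adding can also be combined into a single comprehension; a minimal sketch starting again from the original lists:
>>> first = [3,4,5,6]
>>> second = [235,123,789,9021]
>>> [y - (y % 10) + x for (x, y) in zip(first, second)]
[233, 124, 785, 9026]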

Related

Python code that adds a new value to the list when it finds the elements of one list in another list

L1 =['0-0-3-0-0-80-0', '0-0-3-0-0-82-0']
L2 = [['0', '4', '0', '0', '42', '71','42','0-0-0-0-0-4-0'],['0', '4', '2', '0', '42', '72','42', '0-0-0-1-0-4-2'],['0', '80', '0', '0', '42', '81','43', '0-0-3-0-0-80-0'],['0', '80', '0', '1', '21', '81','43', '0-0-3-0-0-80-0'],['0', '81', '0', '0', '43', '82', '21', '0-0-3-1-0-81-0',],['0', '82', '0', '0', '21', '83', '43', '0-0-3-0-0-82-0']]
So I want to search for L1's values in the lists inside L2. The code should find that
'0-0-3-0-0-80-0' is in ['0', '80', '0', '0', '42', '81','43', '0-0-3-0-0-80-0'] and ['0', '80', '0', '1', '21', '81','43', '0-0-3-0-0-80-0']
and
'0-0-3-0-0-82-0' is in ['0', '82', '0', '0', '21', '83', '43', '0-0-3-0-0-82-0']
The final result should look like this:
L2 = [['0', '4', '0', '0', '42', '71','42','0-0-0-0-0-4-0',""],['0', '4', '2', '0', '42', '72','42', '0-0-0-1-0-4-2',""],['0', '80', '0', '0', '42', '81','43', '0-0-3-0-0-80-0',"found"],['0', '80', '0', '1', '21', '81','43', '0-0-3-0-0-80-0',"found"],['0', '81', '0', '0', '43', '82', '21', '0-0-3-1-0-81-0',],['0', '82', '0', '0', '21', '83', '43', '0-0-3-0-0-82-0',"found"]]
for i in range(0,len(L2)):
    for x in L1:
        if x in L2[i]:
            result.append(L2[i]+["found"])
        else:
            result.append(L2[i]+[""])
I tried this, but it duplicates the results: every row appears twice.
It sounds like you want to append "found" or "" to each list in L2.
The code below appends "" by default and overwrites it with "found" whenever a match is found; no separate result variable is needed:
for i in range(0, len(L2)):
    if len(L2[i]) == 8:
        L2[i].append('')
    for x in L1:
        if x in L2[i]:
            L2[i][8] = 'found'
This produces the output as requested.
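For comparison, a minimal sketch of the same logic as a single list comprehension, assuming every row still has its original 8 elements:
# Append 'found' if any value from L1 appears in the row, else ''
L2 = [row + ['found' if any(x in row for x in L1) else ''] for row in L2]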

How can I extract a column and create a vector out of it?

mat = [['1', '2', '3', '4', '5'],
['6', '7', '8', '9', '10'],
['11', '12', '13', '14', '15']]
Suppose I have this vector of vectors.
Say I need to extract the 2nd column of each row, convert the values to binary, and then create a vector of them.
Is it possible to do it without using NumPy?
Use zip to transpose the list, then loop with enumerate to pick out the column by index and convert each value with bin():
mat = [['1', '2', '3', '4', '5'],
['6', '7', '8', '9', '10'],
['11', '12', '13', '14', '15']]
vec = [[bin(int(r)) for r in row] for idx, row in enumerate(zip(*mat)) if idx == 1][0]
print(vec) # ['0b10', '0b111', '0b1100']
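If the column index is known up front, the enumerate filter can be dropped; a shorter sketch of the same idea:
# zip(*mat) transposes the rows; index 1 is the 2nd column
vec = [bin(int(r)) for r in list(zip(*mat))[1]]
print(vec)  # ['0b10', '0b111', '0b1100']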
Yes. This is achievable with the following code:
mat = [['1', '2', '3', '4', '5'],
['6', '7', '8', '9', '10'],
['11', '12', '13', '14', '15']]
def decimalToBinary(n):
    return bin(n).replace("0b", "")

new_vect = []
for m in mat:
    m = int(m[1])
    new_vect.append(decimalToBinary(m))
print(new_vect)
Hopefully this is the expected output:
['10', '111', '1100']
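As a side note, format(n, 'b') also gives the binary string without the '0b' prefix, so the whole loop can be a one-line sketch:
new_vect = [format(int(row[1]), 'b') for row in mat]
print(new_vect)  # ['10', '111', '1100']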

Python: I want to convert a specific column's type after importing a CSV file

import numpy as np
data_arr = np.loadtxt("asset.csv", delimiter = ",", dtype = 'str')
data_arr
Result:
array([['G1', '1', '100', '5', '0'],
['G1', '1', '21', '538', '0'],
['G1', '1', '22', '6000', '0'],
...,
['G2', '8', '61', '241908', '8800'],
['G2', '8', '70', '57341', '16800'],
['G2', '9', '51', '1340', '0']], dtype='<U7')
But I want to convert columns 2, 3, 4, and 5 (the '1', '100', '5', and '0' fields in the first row) to int type,
because I want to try
family_number = np.array([1,2,3,4,100])
capital = data_arr[data_arr[:,0]=="G1"]
for i, number in enumerate(family_number):
    family_numbers = capital[capital[:,1]>i] & capital[capital[:,1]<=number]
    print("\t" + len(family_numbers))
How can I convert the type of these columns? Please help!
Give np.loadtxt the proper type for each column:
np.loadtxt('asset.csv', delimiter=",", dtype='S20,int64,int64,int64,int64')
EDIT: the string column needs a maximum length listed alongside; the above should work as long as your first column doesn't exceed 20 characters.
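Note that loadtxt then returns a structured array, so columns are accessed by field name rather than by index; a minimal sketch, assuming the default field names f0 through f4:
import numpy as np

data_arr = np.loadtxt("asset.csv", delimiter=",",
                      dtype='S20,int64,int64,int64,int64')
mask = data_arr['f0'] == b"G1"   # 'S20' loads the first column as bytes
print(data_arr['f1'][mask])      # second column, already int64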
You could slice the result and use astype:
arr = np.array([['1', '2', '3'], ['3', '4', '5'], ['5', '6', '7']])
arr
# array([['1', '2', '3'],
# ['3', '4', '5'],
# ['5', '6', '7']], dtype='<U1')
arr[:,1:].astype(int)
# array([[2, 3],
# [4, 5],
# [6, 7]])
The other answers and comments (@Amadan's, for example) are right that you could import the columns with the proper data types in the first place. But if you're stuck after the fact, code like the above should work.
Try something like:
df[col_name] = df[col_name].astype(int)
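That snippet assumes the file was loaded with pandas rather than np.loadtxt; a minimal sketch, with an illustrative column position:
import pandas as pd

# With header=None, columns are addressed by integer position.
df = pd.read_csv("asset.csv", header=None)
df[1] = df[1].astype(int)  # convert the second column in place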

Deleting columns in tflearn produces strange output

I am using tflearn, and I load my CSV file with the following code...
data, labels = load_csv('/home/eric/Documents/Speed Dating Data.csv',
target_column=0, categorical_labels=False)
Here is a snippet of my csv file (there are a lot more columns)...
I want to remove a specific column. For example, let's say I remove column 1 and then print out the data for columns 1 to 5...
def preprocess(cols_del):
    data, labels = load_csv('/home/eric/Documents/Speed Dating Data.csv',
                            target_column=0, categorical_labels=False)
    for col_del in sorted(cols_del):
        [data.pop(col_del) for position in data]
    for i in range(20):
        print(data[i][0:5])

def main(_):
    delete = [0]
    preprocess(delete)
This is the result...
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
The data is clearly different. What is going on? Are rows being deleted instead of columns? How can I delete the column completely without altering any other columns?
Also, I know it is kind of a separate question, but if I were to use n_classes in my load_csv call, how would I do that? Is that the number of columns in my CSV?
What's happening is that the line [data.pop(col_del) for position in data] is deleting half your rows, and then you're displaying the first 20 rows of what's left. (It would delete all the rows, but the call to pop is advancing the loop iterator.)
If you don't want certain columns you should pass your delete list to the columns_to_ignore parameter when you call load_csv. See the function description at load_csv. If you need to remove columns from a dataset in memory I think it would be worth your time to learn the basics of the Pandas library; it will make your life much simpler.
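A minimal sketch of that load_csv call, assuming column 1 is the one to drop:
from tflearn.data_utils import load_csv

# Drop unwanted columns at load time instead of mutating rows afterwards.
data, labels = load_csv('/home/eric/Documents/Speed Dating Data.csv',
                        target_column=0, columns_to_ignore=[1],
                        categorical_labels=False)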
You would need n_classes if your target labels were categorical, in order to tell load_csv how many categories there are. Since you have categorical_labels=False, you shouldn't need it.

Identify or count continuously repeated values (missing values: nan) in a list

Basically, I would like to identify whether the missing values in a data set are continuously repeated or not. If there are continuously repeated missing values in the data set, I would like to know whether the lengths of the continuous runs of missing values are above a certain number or not.
For example:
data =['1', '0', '9', '31', '11', '12', 'nan', '10', '44', '53', '12', '66', '99', '3', '2', '6.75833',....., 'nan', 'nan', 'nan', '3', '7', 'nan', 'nan']
In the data above, the total number of 'nan' values is 6, which can be calculated with data.count('nan'). However, what I want to know is the length of the longest run of consecutive missing values. For this data, the answer would be 3.
I apologize for not showing example code, but I am a novice in this area and had no idea how to approach the coding.
Any idea, help or tips would be really appreciated.
This looks like a job for itertools.groupby():
>>> from itertools import groupby
>>> data =['1', '0', '9', '31', '11', '12', 'nan', '10', '44', '53',
'12', '66', '99', '3', '2', '6.75833', 'nan', 'nan', 'nan',
'3', '7', 'nan', 'nan']
>>> [len(list(group)) for key, group in groupby(data) if key == 'nan']
[1, 3, 2]
Note that if your data actually has real NaNs instead of strings, plain equality grouping no longer works: nan != nan, so consecutive NaNs never land in the same group. Instead, group on math.isnan as the key and keep the groups where it is True.
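A minimal sketch of that variant, using a small float list for illustration:
>>> import math
>>> from itertools import groupby
>>> floats = [1.0, float('nan'), float('nan'), 3.0, float('nan')]
>>> [sum(1 for _ in group) for is_nan, group in groupby(floats, key=math.isnan) if is_nan]
[2, 1]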
Or you can try this one, which avoids materializing each group as a list (add if k == 'nan' to keep just the missing-value runs):
grouped_data = [sum(1 for i in group) for k, group in groupby(data)]
Using pyrle for speed. In this solution I replace nan with a number not in the data (-42) before building the run-length encoding. This is because nan is a difficult value for rles: np.nan != np.nan, so no two nans would be treated as consecutive.
import numpy as np
from pyrle import Rle

data = ['1', '0', '9', '31', '11', '12', 'nan', '10', '44', '53', '12', '66', '99', '3', '2', '6.75833', 'nan', 'nan', 'nan', '3', '7', 'nan', 'nan']
arr = np.array([float(f) for f in data])
assert -42 not in arr
arr[np.isnan(arr)] = -42  # replace the nans before building the Rle
r = Rle(arr)
is_nan = r.values == -42
np.max(r.runs[is_nan])
# 3
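If pulling in pyrle is overkill, the same run-length idea fits in a few lines of plain numpy; a minimal sketch:
import numpy as np

data = ['1', 'nan', 'nan', 'nan', '3', 'nan', 'nan']
is_nan = np.isnan(np.array(data, dtype=float))
# Pad with False at both ends so np.diff marks every run start and end.
padded = np.concatenate(([False], is_nan, [False])).astype(int)
edges = np.flatnonzero(np.diff(padded))
starts, ends = edges[::2], edges[1::2]
print((ends - starts).max())  # 3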
