I have an array with 3 numbers per row, 4 rows deep.
I am struggling to figure out how to write code that prints all the numbers from a specified column rather than from a row.
I have searched for tutorials that explain this easily and just cannot find any that have helped.
Can anyone point me in the right direction?
If you're thinking of Python lists as rows and columns, it's probably better to use numpy arrays (if you're not already). Then you can print the various rows and columns easily, e.g.:
import numpy as np
a = np.array([[1,2,6],[4,5,8],[8,3,5],[6,5,4]])
#Print first column
print(a[:,0])
#Print second row
print(a[1,:])
Note that otherwise you have a list of lists, and you'd need to use something like:
b = [[1,2,6],[4,5,8],[8,3,5],[6,5,4]]
print([i[0] for i in b])
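If you'd rather stay with plain lists, another common idiom is to transpose the list of lists with zip, which gives you every column at once (a small sketch reusing b from above):
# zip(*b) transposes rows into columns
columns = list(zip(*b))
print(columns[0]) # first column: (1, 4, 8, 6)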
You can do this:
>>> a = [[1,2,3],[1,1,1],[2,1,1],[4,1,2]]
>>> print([row[0] for row in a])
[1, 1, 2, 4]
I've read a few other posts about this but the other solutions haven't worked for me. I'm trying to look at 2 different CSV files and compare data from 1 column from each file. Here's what I have so far:
import pandas as pd
import numpy as np
dataBI = pd.read_csv("U:/eu_inventory/EO BI Orders.csv")
dataOrderTrimmed = dataBI.iloc[:,1:2].values
dataVA05 = pd.read_csv("U:\eu_inventory\VA05_Export.csv")
dataVAOrder = dataVA05.iloc[:,1:2].values
dataVAList = []
ordersInBoth = []
ordersInBI = []
ordersInVA = []
for order in np.nditer(dataOrderTrimmed):
    if order in dataVAOrder:
        ordersInBoth.append(order)
    else:
        ordersInBI.append(order)
So if the order number from dataOrderTrimmed is also in dataVAOrder, I want to add it to ordersInBoth; otherwise I want to add it to ordersInBI. I think it splits the information correctly, but if I try to print ordersInBoth, each item prints as array(5555555, dtype=int64). I want a list of the order numbers, not arrays, and without the dtype information. Let me know if you need more information or if the way I've typed it out is confusing. Thanks!
The way you're using .iloc gives you a DataFrame, which becomes a 2D array when you access .values. If you just want the values in the column at index 1, then you should just say:
dataOrderTrimmed = dataBI.iloc[:, 1].values
Then you can iterate over dataOrderTrimmed directly (i.e. you don't need nditer), and you will get regular scalar values.
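For example, a minimal sketch of the corrected loop, reusing the variable names from the question (dataVAOrder gets the same fix):
dataOrderTrimmed = dataBI.iloc[:, 1].values
dataVAOrder = dataVA05.iloc[:, 1].values
# Plain iteration yields scalar order numbers, not 0-d arrays
for order in dataOrderTrimmed:
    if order in dataVAOrder:
        ordersInBoth.append(order)
    else:
        ordersInBI.append(order)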
I hope you are well. I am new to Python. I am trying to add certain columns, but not all of them, and I need your help.
W=[[77432664,6,2,4,3,4,3],
   [6233234,7,3,2,5,3,1],
   [3412455221,8,3,2,4,5,5]]
rows=len(W)
columns=len(W[0])
for i in range(rows):
    T=sum(W[i])
    W[i].append(T)
I assume by "add" you mean "sum" and not "insert". If so, then you can use what is called a slice:
for row in W:
    t = sum(row[1:])
    row.append(t)
row[1:] takes all but the first element of the list row. For more information on this syntax, you should google "python slice".
Also notice how I am iterating over rows directly, rather than using an index. This is the most common way to do a loop in Python.
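For instance, a quick illustration of what a slice returns:
nums = [10, 20, 30, 40]
print(nums[1:]) # [20, 30, 40] - everything but the first element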
You can create a subarray in Python by specifying the column range and then summing it. The code below demonstrates adding columns 2, 3, 4, 5, and 6:
W=[[77432664,6,2,4,3,4,3],
   [6233234,7,3,2,5,3,1],
   [3412455221,8,3,2,4,5,5]]
rows=len(W)
columns=len(W[0])
for i in range(rows):
    T=sum(W[i][2:7]) # For i=0 this retrieves the subarray [2,4,3,4,3], which sums to T=16
    W[i].append(T)
I'd suggest using the pandas sum method over axis=1:
# positions of the columns to sum
my_cols_n = [2,3,4,5,6]
# get the column labels at those positions
my_cols = [c for i, c in enumerate(df.columns) if i in my_cols_n]
# sum across those columns, row-wise
df["my_sum"] = df[my_cols].sum(axis=1)
To add to @Code-Apprentice's answer - consider using numpy for similar assignments:
import numpy as np
W=[[77432664,6,2,4,3,4,3],
   [6233234,7,3,2,5,3,1],
   [3412455221,8,3,2,4,5,5]]
W=np.array(W)
print(W[:, 3:].mean(axis=1)) # prints [3.5  2.75 4.  ]
Especially as the complexity of your matrix operations grows, you will quickly see the big advantages of numpy.
I have an np.array from which I would like to remove specific elements based on the element "name" rather than the index. Is this sort of thing possible with np.delete()?
Namely my original ndarray is
textcodes= data['CODES'].unique()
which captures unique text codes given the quarter.
Specifically, I want to remove certain codes (which I need to run through a separate process) and put them into a separate ndarray:
sep_list = np.array(['SPCFC_CODE_1','SPCFC_CODE_2','SPCFC_CODE_3','SPCFC_CODE_4'])
I have trouble removing the codes in sep_list from textcodes, because I don't know where those codes will be indexed; it is different each quarter. I would like to automate the removal based on the specific names instead, because those will always be the same.
Any help is greatly appreciated. Thank you.
You should be able to do something like:
import numpy as np
data = [3,2,1,0,10,5]
bad_list = [1, 2]
data = np.asarray(data)
new_list = np.asarray([x for x in data if x not in bad_list])
print("BAD")
print(data)
print("GOOD")
print(new_list)
Yields:
BAD
[ 3 2 1 0 10 5]
GOOD
[ 3 0 10 5]
It is impossible to tell for sure since you did not provide sample data, but the following implementation using your variables should work:
import numpy as np
textcodes= data['CODES'].unique()
sep_list = np.array(['SPCFC_CODE_1','SPCFC_CODE_2','SPCFC_CODE_3','SPCFC_CODE_4'])
final_list = [x for x in textcodes if x not in sep_list]
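Since textcodes is a numpy array, a vectorized alternative (a sketch using the same variables) is np.isin, which builds a boolean mask in one call:
# keep only the codes that are not in sep_list
final_list = textcodes[~np.isin(textcodes, sep_list)]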
I have a six-column matrix. I want to find the row(s) where BOTH columns of interest match the query.
I've been trying to use numpy.where, but I can't specify that it match just two columns.
#Example of the array
x = np.array([[860259, 860328, 861277, 861393, 865534, 865716],
              [860259, 860328, 861301, 861393, 865534, 865716],
              [860259, 860328, 861301, 861393, 871151, 871173]])
print(x)
#Match first column of interest
A = np.where(x[:,2] == 861301)
#Match second column of interest
B = np.where(x[:,3] == 861393)
#rows in both A and B
np.intersect1d(A, B)
#This approach works, but the intersect is not column-specific, leaving me with extra rows I don't want.
#This is the only way I can get Numpy to match the two columns, but
#when I query I will not actually know the values of columns 0,1,4,5.
#So this approach will not work.
#Specify what row should exactly look like
np.where(all([860259, 860328, 861277, 861393, 865534, 865716]))
#I want something like this:
#Where * could be any number. But I think that this approach may be
#inefficient. It would be best to just match column 2 and 3.
np.where(all([*, *, 861277, 861393, *, *]))
I'm looking for an efficient answer, because I am looking through a 150GB HDF5 file.
Thanks for your help!
If I understand you correctly, you can use a little more advanced slicing, like this:
np.where(np.all(x[:,2:4] == [861277, 861393], axis=1))
This will give you the indices of only those rows where these two columns equal [861277, 861393].
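If you want the matching rows themselves rather than their indices, the boolean mask alone is enough (a sketch reusing the same x):
# mask marking rows where columns 2 and 3 equal the query values
mask = np.all(x[:, 2:4] == [861277, 861393], axis=1)
print(x[mask]) # the full matching row(s)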
I am trying to convert all my code, written in MATLAB, to Python. I have a problem and couldn't find a way to solve it. Maybe someone has an idea.
I have a file which has m rows and two columns. I want to read the file and then sort it based on the second column. Then I must take the first column of the sorted data (from the first row to the 1000th row), find the values larger than a threshold (here, for example, 0.2), and sum them.
Hope someone has an idea.
Thanks
If the file has, for example, fields separated by tabs and rows separated by newlines, the problem is quite simple:
f = open("filename.csv")
data = [map(float, x.split("\t")) for x in f.readlines()]
data.sort(key = lambda x:x[1])
result = sum(x[0] for x in data[:1000] if x[0] > 0.2)
Consider using Numpy arrays and its accompanying functions. They are (usually) quite similar to those in MATLAB, which might make your conversion from the latter easier.
import numpy as np
data = np.genfromtxt("filename.csv", delimiter="\t", dtype=float)
idx = np.argsort(data[:, 1])
data1000 = data[idx[:1000]] # First 1000 of sorted data
result = np.sum(data1000[data1000[:, 0] > 0.2, 0])