I'm trying to randomize all the rows of my DataFrame but with no success.
What I want to do is from this matrix
A= [ 1 2 3
4 5 6
7 8 9 ]
to this
A_random=[ 4 5 6
7 8 9
1 2 3 ]
I've tried np.random.shuffle, but it doesn't work.
I'm working in Google Colaboratory environment.
If you want to make this work with np.random.shuffle, then one way would be to extract the rows into an ArrayLike structure, shuffle them in place and then recreate the DataFrame:
import numpy as np
import pandas
A = pandas.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Each row becomes one list element, so rows stay intact and only their order changes
extracted_rows = A.values.tolist()
np.random.shuffle(extracted_rows)
A_random = pandas.DataFrame(extracted_rows)
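If you don't need np.random.shuffle specifically, a shorter route is to let pandas do the shuffling with DataFrame.sample. This is only a sketch: random_state=0 is my own addition to make the run repeatable, and reset_index(drop=True) is optional if you want to keep the original row labels.

```python
import pandas as pd

A = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# frac=1 draws every row exactly once, i.e. a random permutation of the rows.
# random_state is fixed only so the example is repeatable (an assumption of this sketch).
A_random = A.sample(frac=1, random_state=0).reset_index(drop=True)
print(A_random)
```

Drop random_state to get a different order on every run.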
Related
I'm creating a new column in my pandas DataFrame by subtracting a fixed value from the values in another column. This is my current code, which almost gives me the output I want (example shortened for reproduction):
subtraction_value = 3
data = pd.DataFrame({"test":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9]})
data['new_column'] = data['test'][::-1] - subtraction_value
When run, this gives me the current output:
print(data['new_column'])
[9,1,2,1,-2,0,-1,2,7,6]
However, I'd like to subtract a different value at position [0], use the original subtraction value at positions 1 through 3, use the second value again at position [4], and keep repeating this pattern. How would I do this? I realize I could use a for loop, but for performance reasons I'd like to do it another way. My new output would ideally look like this:
subtraction_value_2 = 6
print(data['new_column'])
[6,1,2,1,-5,0,-1,2,4,6]
You can use positional indexing:
subtraction_value_2 = 6
col = data.columns.get_loc('new_column')
data.iloc[0::4, col] = data['test'].iloc[0::4].sub(subtraction_value_2)
or with numpy.where:
data['new_column'] = np.where(data.index%4,
                              data['test']-subtraction_value,
                              data['test']-subtraction_value_2)
output:
test new_column
0 12 6
1 4 1
2 5 2
3 4 1
4 1 -5
5 3 0
6 2 -1
7 5 2
8 10 4
9 9 6
subtraction_value = 3
subtraction_value_2 = 6
data = pd.DataFrame({"test":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9]})
data['new_column'] = data.test - subtraction_value
data.loc[::4, 'new_column'] = data.test[::4] - subtraction_value_2  # .loc avoids chained assignment
print(list(data.new_column))
Output:
[6, 1, 2, 1, -5, 0, -1, 2, 4, 6]
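Both approaches above rely on the index being the default 0, 1, 2, ... If your real DataFrame has a different index, one possible variant is to build the pattern from row positions instead. A rough sketch, where positions and offsets are names I made up for illustration:

```python
import numpy as np
import pandas as pd

subtraction_value = 3
subtraction_value_2 = 6
data = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9]})

# Row positions 0..n-1, independent of whatever the index labels are.
positions = np.arange(len(data))
# Every 4th position (0, 4, 8, ...) gets the second value, all others the first.
offsets = np.where(positions % 4 == 0, subtraction_value_2, subtraction_value)
data["new_column"] = data["test"] - offsets
print(data["new_column"].tolist())
```

The subtraction is still fully vectorized, so there is no Python-level loop over the rows.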
I have a list vector_list of length 800,000, where the elements are lists of size 768. I'm trying to add 768 columns to a pandas DataFrame, where each column is 800,000 long and holds one element from each list. Here's my code:
active = pd.DataFrame()
for i in range(len(vector_list[0])):
    element_list = []
    for j in range(len(vector_list)):
        element_list.append(vector_list[j][i])
    active['Element {}'.format(i)] = element_list
Just to reiterate,
len(vector_list) = 800,000
len(vector_list[0]) = 768
Is there a more clever, faster way to do this?
Pass the list directly to the DataFrame constructor.
import pandas as pd
_list = [[1, 2], [3, 4], [5, 6], [7, 8]]
df = pd.DataFrame(_list)
print(df.head())
Output
0 1
0 1 2
1 3 4
2 5 6
3 7 8
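To also get the 'Element {i}' column names from the question, the names can be passed through the constructor's columns argument. A small sketch with a made-up 4 x 3 vector_list standing in for the real 800,000 x 768 data:

```python
import pandas as pd

# Small stand-in for the real data: 4 vectors of length 3 (illustrative values).
vector_list = [[0.1, 0.2, 0.3], [1.1, 1.2, 1.3], [2.1, 2.2, 2.3], [3.1, 3.2, 3.3]]

# One column name per vector element, matching the naming used in the question.
columns = ['Element {}'.format(i) for i in range(len(vector_list[0]))]
active = pd.DataFrame(vector_list, columns=columns)
print(active.shape)
```

Each inner list becomes one row, so column i holds element i of every vector, which is what the nested loop was building by hand.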
I'm new to numpy and am trying to do some slicing and indexing with arrays. My goal is to take an array, and use slicing and indexing to square the last column, and then subtract the first column from that result. I then want to put the new column back into the old array.
I've been able to figure out how to slice and index the column to get the result I want for the last column. My problem however is that when I try to put it back into my original array, I get the wrong output (as seen below).
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
sliceColumnOne = theNumbers[:,0]
sliceColumnThree = theNumbers[:,3]**2
editColumnThree = sliceColumnThree - sliceColumnOne
newArray = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[editColumnThree]])
print("nums:\n{}".format(newArray))
I want the output to be
[[ 1 2 3 15]
[ 5 6 7 59]
[ 9 10 11 135]
[ 13 14 15 243]]
However mine becomes:
[list([1, 2, 3, 4]) list([5, 6, 7, 8]) list([9, 10, 11, 12])
list([array([ 15, 59, 135, 243])])]
Any suggestions on how to fix this?
Just assign editColumnThree to the last row of the array with theNumbers[3] = editColumnThree. Note that this overwrites the last row rather than the last column.
Code:
import numpy as np
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
sliceColumnOne = theNumbers[:,0]
sliceColumnThree = theNumbers[:,3]**2
editColumnThree = sliceColumnThree - sliceColumnOne
theNumbers[3] = editColumnThree
print("nums:\n{}".format(theNumbers))
Output:
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[ 15 59 135 243]]
newArray = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[editColumnThree]])
print("nums:\n{}".format(newArray))
This way, editColumnThree becomes the last row, not the last column. You can use
newArray = theNumbers.copy() # if a copy is needed
newArray[:,-1] = editColumnThree # replace last (-1) column
If you just want to stack the vectors on top of each other, use vstack:
import numpy as np
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
newNumbers = np.vstack(theNumbers)
print(newNumbers)
>>>[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
But the issue here isn't just that you need to stack these numbers, you are mixing up columns and rows. You are changing a row instead of a column. To change the column, update the last element in each row:
import numpy as np
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
LastColumn = theNumbers[:,3]**2
FirstColumn = theNumbers[:,0]
editColumnThree = LastColumn - FirstColumn
for i in range(4):
    theNumbers[i,3] = editColumnThree[i]
print(theNumbers)
>>>[[ 1 2 3 15]
[ 5 6 7 59]
[ 9 10 11 135]
[ 13 14 15 243]]
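The loop isn't strictly needed: a slice on the left-hand side can assign the whole column at once. A minimal sketch of the same computation, assuming you are fine with modifying theNumbers in place:

```python
import numpy as np

theNumbers = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])

# Square the last column, subtract the first column, and write the result
# back into the last column in a single vectorized assignment.
theNumbers[:, -1] = theNumbers[:, -1] ** 2 - theNumbers[:, 0]
print(theNumbers)
```

The right-hand side is evaluated into a new array before the assignment happens, so reading and writing the same column is safe here.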
I have a 2D array:
import numpy as np
output = np.array([1,1,6])*np.arange(6)[:,None]+1
output
Out[32]:
array([[ 1, 1, 1],
[ 2, 2, 7],
[ 3, 3, 13],
[ 4, 4, 19],
[ 5, 5, 25],
[ 6, 6, 31]])
I tried to use np.savetxt('file1.txt', output, fmt='%10d'),
but I got the result on one line only.
How can I save it in a txt file similar to:
x y z
1 1 1
2 2 7
3 3 13
4 4 19
5 5 25
6 6 31
3 separate columns, each with a name (x, y, z).
Please note: the original array is very large (40000000 rows and 3 columns). I am using Python 3.6.
I have tried the solutions here and here, but they do not work for me.
Noor, let me guess: are you using Windows Notepad to view the file?
I use Notepad++, which is smart enough to understand the Unix-style line endings that np.savetxt() writes by default, even when run under Windows.
You might want to explicitly specify newline="\r\n" when calling savetxt.
np.savetxt('file1.txt', output, fmt='%10d' ,header= " x y z", newline="\r\n")
Docs: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html
I am not sure about your data, but this:
import numpy as np
output = np.array([1,1,6])*np.arange(60)[:,None]+1
print(output)
np.savetxt('file1.txt', output, fmt='%10d' ,header= " x y z")
Produces this output:
# x y z
1 1 1
2 2 7
3 3 13
=== snipped a few lines ===
58 58 343
59 59 349
60 60 355
for me.
For np.arange(1000000) the file is about 32 MB and similarly formatted...
For np.arange(10000000) the file is about 322 MB and similarly formatted...
@willem-van-onsem: 1+ GB was far closer.
I did not account for the fixed width of 10 characters per number, my bad.
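One more detail: as the output above shows, savetxt puts "# " in front of the header by default. If you want a plain "x y z" line as in the question, passing comments='' should remove that prefix. A sketch under that assumption; the temporary directory is only there so the example does not overwrite a real file1.txt:

```python
import os
import tempfile
import numpy as np

output = np.array([1, 1, 6]) * np.arange(6)[:, None] + 1

# Write into a throwaway directory (illustrative choice, not part of the question).
path = os.path.join(tempfile.mkdtemp(), 'file1.txt')

# comments='' suppresses the default '# ' prefix in front of the header line.
# The header uses the same 10-character width as fmt so the names line up.
header = '%10s %10s %10s' % ('x', 'y', 'z')
np.savetxt(path, output, fmt='%10d', header=header, comments='')

with open(path) as f:
    lines = f.read().splitlines()
print('\n'.join(lines))
```

Add newline="\r\n" to the same call if the file has to display correctly in Windows Notepad.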
So I just started programming in Python a few days ago, and now I'm trying to make a program that generates a random list and then picks out the duplicate elements. The problem is, I don't have duplicate numbers in my list.
This is my code:
import random
def generar_listas (numeros, rango):
    lista = [random.sample(range(numeros), rango)]
    print("\n", lista, sep="")
    return
def texto_1 ():
    # English: "You must set some parameters to generate two random lists"
    texto = "Debes de establecer unos parámetros para generar dos listas aleatorias"
    print(texto)
    return
texto_1()
generar_listas(int(input("\nNumero maximo: ")), int(input("Longitud: ")))
For example, if I choose 20 and 20 for random.sample, it gives me the numbers 0 to 19 in random order. I want a list of random numbers that includes duplicates.
What you want is fairly simple. You want to generate a random list of numbers that contain some duplicates. The way to do that is easy if you use something like numpy.
Generate a list (range) of 0 to 9.
Sample randomly (with replacement) from that list.
Like this:
import numpy as np
print(np.random.choice(10, 10, replace=True))
Result:
[5 4 8 7 0 8 7 3 0 0]
If you want the list to be ordered, just use the builtin function sorted(list):
sorted([5, 4, 8, 7, 0, 8, 7, 3, 0, 0])
[0, 0, 0, 3, 4, 5, 7, 7, 8, 8]
If you don't want to use numpy you can use the following:
print([random.choice(range(10)) for i in range(10)])
[7, 3, 7, 4, 8, 0, 4, 0, 3, 7]
random.randrange is what you want.
>>> [random.randrange(10) for i in range(5)]
[3, 2, 2, 5, 7]
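On Python 3.6 or newer, random.choices does the same sampling with replacement in one call. Here is a rough sketch of how the function from the question might look with it; generar_lista, longitud and duplicados are names I chose for illustration, not from the original code:

```python
import random

def generar_lista(numeros, longitud):
    # random.choices samples WITH replacement, so values can repeat
    # (random.sample, used in the question, never repeats a value).
    return random.choices(range(numeros), k=longitud)

lista = generar_lista(20, 20)
# Values that occur more than once, each listed a single time.
duplicados = sorted({x for x in lista if lista.count(x) > 1})
print(lista)
print(duplicados)
```

The output differs on every run, and with a short list there may be no duplicates at all, in which case duplicados is simply empty.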