I'm supposed to write code that simulates rolling a 20-sided die (d20) 25 times using np.random.choice.
I tried this:
np.random.choice(20,25)
but this still includes 0s, which can't appear on a die.
How do I account for the 0s?
Use np.arange:
import numpy as np
np.random.seed(42) # for reproducibility
result = np.random.choice(np.arange(1, 21), 50)
print(result)
Output
[ 7 20 15 11 8 7 19 11 11 4 8 3 2 12 6 2 1 12 12 17 10 16 15 15
19 12 20 3 5 19 7 9 7 18 4 14 18 9 2 20 15 7 12 8 15 3 14 17
4 18]
The above code draws numbers from 1 to 20, both inclusive. To understand why your version produced 0s, check the documentation of np.random.choice, in particular the description of the first argument:
a : 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an
int, the random sample is generated as if it were np.arange(a)
np.random.choice() takes as its first argument an array of possible choices (if an int is given, it behaves as if given np.arange of that int), so you can use list(range(1, 21)) to get the output you want.
Alternatively, keep your call and shift every result up by 1:
np.random.choice(20,25) + 1
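With NumPy's newer Generator API (NumPy 1.17 or later, an assumption about your environment) the same roll can still be done with choice, now as a method on a seeded generator. A minimal sketch:

```python
import numpy as np

# Seeded Generator so the rolls are reproducible
rng = np.random.default_rng(42)
# 25 rolls of a d20: pick uniformly from the faces 1..20
rolls = rng.choice(np.arange(1, 21), size=25)
print(rolls)
```

Passing the faces explicitly as np.arange(1, 21) avoids the 0-based behaviour of passing a bare int.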
Iterating over np.arange gives incorrect results starting from i = 10
import numpy as np
from math import pi, sqrt, exp, factorial
def pyas(x, sred):
    return sred**x*exp(-sred)/factorial(x)
for i, j in zip(np.arange(20), range(20)):
    print(i, j, pyas(i,10), pyas(j,10))
The question is, why do the values of the pyas() function start to differ?
The output:
0 0 4.5399929762484854e-05 4.5399929762484854e-05
1 1 0.00045399929762484856 0.00045399929762484856
2 2 0.0022699964881242427 0.0022699964881242427
3 3 0.007566654960414142 0.007566654960414142
4 4 0.018916637401035354 0.018916637401035354
5 5 0.03783327480207071 0.03783327480207071
6 6 0.06305545800345118 0.06305545800345118
7 7 0.09007922571921599 0.09007922571921599
8 8 0.11259903214901998 0.11259903214901998
9 9 0.1251100357211333 0.1251100357211333
10 10 0.01764133335640144 0.1251100357211333
11 11 0.001382752728810601 0.11373639611012118
12 12 -6.89413134691794e-05 0.09478033009176766
13 13 9.595669338819968e-06 0.07290794622443666
14 14 1.4396571374678832e-07 0.05207710444602619
15 15 -5.313583114618023e-08 0.03471806963068413
16 16 4.0683489446471356e-09 0.021698793519177577
17 17 2.0030859032126935e-10 0.012763996187751515
18 18 -1.0541774694098002e-11 0.007091108993195286
19 19 -7.394475413970681e-13 0.0037321626279975192
The difference is that np.arange iterates over fixed-size NumPy integers (np.int64 on most platforms), while range iterates over Python's unbounded ints. We can see this by choosing the highest value in your range: 19.
>>> foo=10**np.int64(19)
>>> bar=10**19
>>> type(foo), type(bar)
(<class 'numpy.int64'>, <class 'int'>)
>>> foo, bar
(-8446744073709551616, 10000000000000000000)
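The NumPy version overflowed and wrapped around to a negative number with a smaller absolute value. The Python int is not limited to 64 bits and can grow indefinitely. Note that the output in the question already diverges at i = 10, which suggests that np.arange produced 32-bit integers there (the default integer type on Windows before NumPy 2.0): 10**10 already exceeds the int32 maximum of 2**31 - 1 = 2147483647.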
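A minimal fix, assuming you want to keep iterating over np.arange, is to convert each value to a Python int before the integer power is computed (iterating over range directly works just as well):

```python
import numpy as np
from math import exp, factorial

def pyas(x, sred):
    # Poisson probability mass function: sred**x * e**(-sred) / x!
    return sred**x * exp(-sred) / factorial(x)

# int(i) turns each NumPy integer into a Python int, so sred**x cannot overflow
values = [pyas(int(i), 10) for i in np.arange(20)]
for i, v in enumerate(values):
    print(i, v)
```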
student_id 0 1 2 3 4 5 6 7 8 9 10 11 12
0 131X1319 1 14 6 16 1 10 8 15 15 17 15 18 16
1 13212YX3 1 1 4 8 11 9 14 7 0 3 0 17 13
2 13216131 1 1 13 9 15 17 0 9 3 15 11 8 10
3 132921W6 1 14 10 4 18 7 8 15 15 17 15 18 16
I have a dataframe like this, and I want to draw a graph of it using NetworkX. I want an edge to get thicker each time a transition from one node to another occurs. Suppose
15->15->17->15->18->16
appears twice in the dataframe. Then I want to increase the thickness of those edges to two. I made the normal graph, but I haven't been able to increase the edge thickness.
This is my code to create the normal graph:
columns=list(pattern_df.columns.values)
pattern_g = nx.empty_graph(0, nx.DiGraph())
for i in range(len(columns)-1):
    pattern_g.add_edges_from(zip(pattern_df[columns[i]],
                                 pattern_df[columns[i+1]]))
sum_val=pattern_df.sum(numeric_only=True, axis=0)
values = [sum_val.get(node, 0.25) for node in pattern_g.nodes()]
nx.draw(pattern_g, with_labels=True, font_color='black')
plt.show()
This is the graph I generated for the sample data:
You've done a poor job of explaining what you're trying to do. Also, it would have been nice if you had provided code that could work with a simple copy and paste.
I suspect that what you have in mind is something like this.
And I want to make the edge thicker each time an edge goes from one node to another node. Suppose that the sequence
15 15 17 15 18 16
appears in two different rows in the dataframe. So, I want to increase the thickness of each edge corresponding to a contiguous pair within that sequence, i.e. 15->15, 15->17, 17->15 and so forth.
Your explanation doesn't say what should happen if the same pair appears multiple times within the same row; I assume that such repetitions should separately count towards the thickness of that edge.
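Under that interpretation, the counting step boils down to counting every contiguous pair in every row. A small pure-Python sketch, using two hypothetical rows that both contain the sequence from the question:

```python
from collections import Counter

# Two hypothetical rows that both contain the sequence 15 15 17 15 18 16
rows = [
    [15, 15, 17, 15, 18, 16],
    [15, 15, 17, 15, 18, 16],
]

# Count every contiguous pair (a, b); repeats within one row are counted separately
edge_counts = Counter(pair for row in rows for pair in zip(row, row[1:]))
print(edge_counts)
```

Each of the five pairs (15->15, 15->17, 17->15, 15->18, 18->16) ends up with a count of 2, which is the thickness the question asks for.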
Here is some code that works if you simply copy and paste it, and that implements my best guess at what you're trying to do (i.e. it assumes my interpretation is correct).
from collections import Counter
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
# Reconstruct the dataframe from its inconvenient format
df_str = ''' student_id 0 1 2 3 4 5 6 7 8 9 10 11 12
0 131X1319 1 14 6 16 1 10 8 15 15 17 15 18 16
1 13212YX3 1 1 4 8 11 9 14 7 0 3 0 17 13
2 13216131 1 1 13 9 15 17 0 9 3 15 11 8 10
3 132921W6 1 14 10 4 18 7 8 15 15 17 15 18 16
'''
lines = df_str.splitlines()
cols = lines[0].split()
data = [line.split()[1:] for line in lines[1:]]
pattern_df = pd.DataFrame(data,columns = cols)
# Count appearance of each edge
columns=list(pattern_df.columns.values)
ct = Counter(p for i in range(len(columns)-1)
             for p in zip(pattern_df[columns[i]],pattern_df[columns[i+1]]))
# Build associated graph
pattern_g = nx.DiGraph()
pattern_g.add_edges_from(ct)
# Draw graph, using frequency of each pair as edge-width
width = [ct[p] for p in pattern_g.edges]
nx.draw(pattern_g, node_color = 'orange', with_labels=True, width = width)
plt.show()
Here's the result.
Regarding your comment: in order to add the width of an edge as an attribute within the graph pattern_g, you can make the following change to the graph-building section of the script I suggested.
# Build associated graph
pattern_g = nx.DiGraph()
for e,v in ct.items():
    pattern_g.add_edge(*e, weight=v)
I must write a program that accepts a number, n, where -6 < n < 2. The program must print out the numbers n to n+41 as 6 rows of 7 numbers. The first row must contain the values n to n+6, the second the values n+7 to n+13, and so on.
The numbers must be printed using a field width of 2 and right-justified. Fields are separated by a single space, and there are no spaces after the final field.
Output:
Enter the start number: -2
-2 -1 0 1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31 32
33 34 35 36 37 38 39
The numbers need to line up directly under each other.
I have absolutely no idea how to do this.
This is my code so far:
start = int(input('Enter the start number: '))
for n in range(n,n+41):
If you could help me, I would really appreciate it.
I assume you are not allowed to use a library to tabulate the numbers for you and are expected to do the logic yourself.
You need to print 6 rows of numbers. Start by determining the first number of each row. That is given by range(n,n+42,7) (note n+42, not n+41: the numbers run from n to n+41 inclusive, and the end of a range is exclusive). For starting value -2, those are the numbers -2, 5, 12, 19, 26, 33. The rest of the row is just the next 6 integers. If leftmost is the first number in the row, the entire row is given by range(leftmost, leftmost + 7). So for the first row, those are the numbers -2, -1, 0, 1, 2, 3, 4.
To print 6 rows of 7 numbers you need a loop with 6 iterations, one for each value of leftmost. Inside that loop you print the numbers of that row. The only complication is that every number in the row must be followed by a space except the last, so the last one needs special treatment.
You need to specify format {0:2d} to ensure that "numbers are printed using a field width of 2".
n = -2
for leftmost in range(n,n+42,7):
    for value in range(leftmost,leftmost + 6):
        print("{0:2d}".format(value), end=" ")
    print("{0:2d}".format(leftmost+6))
-2 -1 0 1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31 32
33 34 35 36 37 38 39
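The special treatment of the last number can also be avoided with str.join, which only places the separator between items, so no trailing space is ever produced. A sketch of that variation; format_grid is a hypothetical helper name:

```python
def format_grid(n):
    rows = []
    for leftmost in range(n, n + 42, 7):
        # Right-justify each value in a field of width 2, joined by single spaces
        rows.append(" ".join("{0:2d}".format(v) for v in range(leftmost, leftmost + 7)))
    return "\n".join(rows)

print(format_grid(-2))
```

In the real program, n would come from int(input('Enter the start number: ')) instead of being passed directly.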
Check the tabulate library; you can use it to format the output. The tablefmt="plain" parameter produces a very similar table.
If you store the numbers in a list, you can use list slicing to get the rows of 7 numbers each and put those in another list, which is the format tabulate expects.
from tabulate import tabulate
n = 2
while not -6 < n < 2:
    n = int(input('Please submit a number greater than -6 and smaller than 2:\n'))
number_list, output_list = [], []
for i in range(42):
    number_list.append(n + i)
for i in range(6):
    output_list.append(number_list[i*7:i*7+7])
print()
print(
    tabulate(
        output_list,
        tablefmt='plain'
    )
)
Please submit a number greater than -6 and smaller than 2:
-3
-3 -2 -1 0 1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31
32 33 34 35 36 37 38
I'm wondering if there is a Pythonic way to fill nulls in categorical data by randomly choosing from the distribution of unique values. Basically, I want to fill categorical nulls randomly, in proportion to the existing distribution of values in the column.
Below is an example of what I'm already doing.
(I'm using numbers as categories to save time; I'm not sure how to generate random letters.)
import numpy as np
import pandas as pd
np.random.seed([1])
df = pd.DataFrame(np.random.normal(10, 2, 20).round().astype(object))
df.rename(columns = {0 : 'category'}, inplace = True)
df.loc[::5] = np.nan
print(df)
category
0 NaN
1 12
2 4
3 9
4 12
5 NaN
6 10
7 12
8 13
9 9
10 NaN
11 9
12 10
13 11
14 9
15 NaN
16 10
17 4
18 9
19 9
This is how I'm currently filling in the values:
df.category.value_counts()
9 6
12 3
10 3
4 2
13 1
11 1
df.category.value_counts()/16
9 0.3750
12 0.1875
10 0.1875
4 0.1250
13 0.0625
11 0.0625
# to fill categorical info based on percentage
category_fill = np.random.choice((9, 12, 10, 4, 13, 11), size = 4, p = (.375, .1875, .1875, .1250, .0625, .0625))
df.loc[df.category.isnull(), "category"] = category_fill
The final output works; it just takes a while to write:
df.category.value_counts()
9 9
12 4
10 3
4 2
13 1
11 1
Is there a faster way to do this or a function that would serve this purpose?
Thanks for any and all help!
You could use stats.rv_discrete:
from scipy import stats
counts = df.category.value_counts()
dist = stats.rv_discrete(values=(counts.index, counts/counts.sum()))
fill_values = dist.rvs(size=df.shape[0] - df.category.count())
df.loc[df.category.isnull(), "category"] = fill_values
EDIT: For general data (not restricted to integers) you can do:
dist = stats.rv_discrete(values=(np.arange(counts.shape[0]),
                                 counts/counts.sum()))
fill_idxs = dist.rvs(size=df.shape[0] - df.category.count())
df.loc[df.category.isnull(), "category"] = counts.iloc[fill_idxs].index.values
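If SciPy isn't otherwise needed, a similar result can be obtained with NumPy and pandas alone by sampling from value_counts(normalize=True), which also works for non-numeric categories. A sketch; fill_from_distribution is a hypothetical helper name, and it assumes NumPy 1.17+ for the Generator API:

```python
import numpy as np
import pandas as pd

def fill_from_distribution(s, rng=None):
    """Return a copy of s with nulls replaced by draws from its observed value distribution."""
    rng = np.random.default_rng(rng)
    probs = s.value_counts(normalize=True)  # observed frequencies, nulls excluded
    mask = s.isna()
    filled = s.copy()
    filled.loc[mask] = rng.choice(probs.index.to_numpy(), size=mask.sum(), p=probs.to_numpy())
    return filled

s = pd.Series(["a", "b", None, "a", np.nan, "c", "a"], dtype=object)
print(fill_from_distribution(s, rng=0))
```

For the question's dataframe this would be df["category"] = fill_from_distribution(df["category"]).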
I need some very basic help with Python 3.3. I'm trying to get a better understanding of formatting using a for loop and I want to simply print out the odd numbers from 1-20 in two columns.
Here is what I've tried:
for col1 in range(1,10,2):
    for col2 in range(11,20,2):
        print(col1,'\t',col2)
For some reason my output is very strange. The left column has the odd numbers from 1-10, but each number is listed five times before it moves on to the next number:
1 11
1 13
1 15
1 17
1 19
3 11
3 13
3 15
3 17
3 19
etc.
What I want is:
1 11
3 13
5 15
7 17
9 19
You should do it using zip:
for i,j in zip(range(1,10,2), range(11,20,2)):
    print('{}\t{}'.format(i,j))
[OUTPUT]
1 11
3 13
5 15
7 17
9 19
When you use nested loops, the problem is that you are printing the second column for each number in the first column, which is not what you want. Instead, you want to iterate through them simultaneously. That is where zip comes in handy.
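The difference is easy to see by counting the pairs each approach produces. A short sketch:

```python
col1 = range(1, 10, 2)
col2 = range(11, 20, 2)

# Nested loops pair every col1 value with every col2 value: 5 x 5 = 25 pairs
nested = [(a, b) for a in col1 for b in col2]
# zip pairs the values position by position: 5 pairs
zipped = list(zip(col1, col2))
print(len(nested), len(zipped))
print(zipped)
```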
You do not need a second for-loop or zip here. Instead, all you need is this:
>>> for n in range(1, 10, 2):
...     print(n, '\t', n + 10)
...
1 11
3 13
5 15
7 17
9 19
>>>
It works because the numbers in the second column are simply those in the first plus 10.