Trouble preparing data for contourf plot [duplicate] - python
This question already has an answer here:
Plotting Isolines/contours in matplotlib from (x, y, z) data set
(1 answer)
Closed 4 years ago.
I would like to plot some data as a contourf plot in matplotlib. I would like to do something like the following:
x = np.arange(0, 10, 0.5)
y = np.arange(0, 10, 0.5)
z = x**2 - y
fig, ax = plt.subplots()
cs = ax.contourf(x, y, z)
But this raises the following error:
TypeError: Input z must be a 2D array.
Can someone recommend a way for me to massage my data to make contourf happy? Ideally, if someone could also explain why the format of my data doesn't work, that would be greatly helpful as well.
Note: The actual data I'm using is read from a data file. It has the same form as above, except that x, y, z are replaced with c_a, a, Energy respectively.
c_a
0 1.60
1 1.61
2 1.62
3 1.63
4 1.64
5 1.65
6 1.66
7 1.67
8 1.68
9 1.69
10 1.70
11 1.60
12 1.61
13 1.62
14 1.63
15 1.64
16 1.65
17 1.66
18 1.67
19 1.68
20 1.69
21 1.70
22 1.60
23 1.61
24 1.62
25 1.63
26 1.64
27 1.65
28 1.66
29 1.67
...
91 1.63
92 1.64
93 1.65
94 1.66
95 1.67
96 1.68
97 1.69
98 1.70
99 1.60
100 1.61
101 1.62
102 1.63
103 1.64
104 1.65
105 1.66
106 1.67
107 1.68
108 1.69
109 1.70
110 1.60
111 1.61
112 1.62
113 1.63
114 1.64
115 1.65
116 1.66
117 1.67
118 1.68
119 1.69
120 1.70
Name: c_a, Length: 121, dtype: float64
a
0 6.00
1 6.00
2 6.00
3 6.00
4 6.00
5 6.00
6 6.00
7 6.00
8 6.00
9 6.00
10 6.00
11 6.01
12 6.01
13 6.01
14 6.01
15 6.01
16 6.01
17 6.01
18 6.01
19 6.01
20 6.01
21 6.01
22 6.02
23 6.02
24 6.02
25 6.02
26 6.02
27 6.02
28 6.02
29 6.02
...
91 6.08
92 6.08
93 6.08
94 6.08
95 6.08
96 6.08
97 6.08
98 6.08
99 6.09
100 6.09
101 6.09
102 6.09
103 6.09
104 6.09
105 6.09
106 6.09
107 6.09
108 6.09
109 6.09
110 6.10
111 6.10
112 6.10
113 6.10
114 6.10
115 6.10
116 6.10
117 6.10
118 6.10
119 6.10
120 6.10
Name: a, Length: 121, dtype: float64
Energy
0 -250.647503
1 -250.647661
2 -250.647758
3 -250.647791
4 -250.647762
5 -250.647668
6 -250.647511
7 -250.647297
8 -250.647031
9 -250.646721
10 -250.646378
11 -250.647624
12 -250.647758
13 -250.647831
14 -250.647841
15 -250.647788
16 -250.647671
17 -250.647493
18 -250.647258
19 -250.646972
20 -250.646644
21 -250.646282
22 -250.647726
23 -250.647835
24 -250.647884
25 -250.647871
26 -250.647794
27 -250.647655
28 -250.647456
29 -250.647200
...
91 -250.647657
92 -250.647449
93 -250.647182
94 -250.646860
95 -250.646488
96 -250.646071
97 -250.645620
98 -250.645140
99 -250.647896
100 -250.647841
101 -250.647729
102 -250.647559
103 -250.647330
104 -250.647043
105 -250.646702
106 -250.646310
107 -250.645876
108 -250.645407
109 -250.644912
110 -250.647847
111 -250.647769
112 -250.647635
113 -250.647444
114 -250.647193
115 -250.646887
116 -250.646526
117 -250.646116
118 -250.645665
119 -250.645180
120 -250.644669
Name: Energy, Length: 121, dtype: float64
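z must be a 2D array holding one value for every (x, y) combination on the grid. With two 1D arrays, x**2 - y only pairs x[i] with y[i], so the result is 1D. Build 2D coordinate arrays with np.meshgrid and compute Z from them: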
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10, 0.5)
y = np.arange(0, 10, 0.5)
X, Y = np.meshgrid(x, y)
Z = X**2 - Y
fig, ax = plt.subplots()
cs = ax.contourf(X, Y, Z)
plt.show()
Output: [figure: filled contour plot of Z = X**2 - Y]
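To see why the original attempt failed, it can help to compare array shapes. The following is a minimal sketch using the same x and y as above; the names z_1d and Z_broadcast are introduced here only for illustration.

```python
import numpy as np

x = np.arange(0, 10, 0.5)   # 20 sample positions along x
y = np.arange(0, 10, 0.5)   # 20 sample positions along y

# Element-wise arithmetic pairs x[i] with y[i] only, so this gives 20 values
# (one per "diagonal" point) instead of a value for every grid node.
z_1d = x**2 - y

# meshgrid expands both axes so that every (x, y) combination gets a value.
X, Y = np.meshgrid(x, y)
Z = X**2 - Y                # shape (len(y), len(x))

# Broadcasting builds the same grid without materialising X and Y.
Z_broadcast = x[np.newaxis, :]**2 - y[:, np.newaxis]

print(z_1d.shape, Z.shape, np.array_equal(Z, Z_broadcast))
# (20,) (20, 20) True
```

contourf also accepts the 1D x and y directly, as in ax.contourf(x, y, Z), as long as Z has shape (len(y), len(x)).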
Edit:
If your input is a one-dimensional array, you need to know the grid shape in order to reshape it. In your case it is probably sqrt(length) = 11 points per side (assuming your domain is square):
import numpy as np
import matplotlib.pyplot as plt
# Stand-in data laid out like the question's columns:
# c_a cycles through 1.60..1.70 within each block of 11 rows
c_a = np.tile(np.linspace(1.6, 1.7, num=11), 11)
# a stays constant within a block and steps through 6.00..6.10 across blocks
a = np.repeat(np.linspace(6.0, 6.1, num=11), 11)
# Random placeholder energies; use the Energy column from the file instead
energy = -250.644 - np.random.random(121) * 0.003
x = c_a.reshape((11, 11))
y = a.reshape((11, 11))
z = energy.reshape((11, 11))
fig, ax = plt.subplots()
cs = ax.contourf(x, y, z)
plt.show()
Output: [figure: filled contour plot of the reshaped data]
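Since the question's data arrives as three pandas columns, another option is to let pandas build the grid rather than hard-coding 11. This is only a sketch. The column names c_a, a and Energy come from the question, but the DataFrame below is a hypothetical stand-in for the data file and its energy values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data file: 121 rows in "long" format.
c_a_axis = np.round(np.linspace(1.60, 1.70, 11), 2)
a_axis = np.round(np.linspace(6.00, 6.10, 11), 2)
df = pd.DataFrame({
    "c_a": np.tile(c_a_axis, 11),    # cycles 1.60..1.70 within each block
    "a": np.repeat(a_axis, 11),      # constant within each block
})
# Made-up smooth surface, used only so that there is something to pivot.
df["Energy"] = (df["c_a"] - 1.63) ** 2 + (df["a"] - 6.05) ** 2

# One row per a value, one column per c_a value.
grid = df.pivot(index="a", columns="c_a", values="Energy")

x = grid.columns.to_numpy()   # the 11 distinct c_a values
y = grid.index.to_numpy()     # the 11 distinct a values
z = grid.to_numpy()           # shape (len(y), len(x))

print(z.shape)
# The arrays can then be passed on as ax.contourf(x, y, z).
```

If the points do not form a complete rectangular grid, the pivot will contain NaN cells. In that case matplotlib's ax.tricontourf, which takes the three 1D arrays directly, may be the better fit.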
Related
Table is not displayed with python requests
There's a website https://www.hockey-reference.com//leagues/NHL_2022.html
I need to get the table in the div with id=div_stats:
import requests
from bs4 import BeautifulSoup
url = 'https://www.hockey-reference.com/leagues/NHL_2022.html'
r = requests.get(url=url)
soup = BeautifulSoup(r.text, 'html.parser')
table = soup.find('div', id='div_stats')
print(table)
#None
The response is 200, but there's no such div in the BeautifulSoup object. If I open the page using selenium or manually, it gets loaded properly.
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
url = 'https://www.hockey-reference.com/leagues/NHL_2022.html'
with webdriver.Chrome() as browser:
    browser.get(url)
    #sleep(1)
    html = browser.page_source
#r = requests.get(url=url, stream=True)
soup = BeautifulSoup(html, 'html.parser')
table = soup.find_all('div', id='div_stats')
However, with webdriver the page may take quite a long time to load (even when I can see the whole page, browser.get(url) is still loading and the code can't continue). Is there any solution that avoids selenium, or that stops the loading once the table is in the HTML?
I tried:
stream and timeout in requests.get(),
for season in seasons:
    browser.get(url)
    wait = WebDriverWait(browser, 5)
    wait.until(EC.visibility_of_element_located((By.ID, 'div_stats')))
    html = browser.execute_script('return document.documentElement.outerHTML')
None of that worked.
This is one way to get that table as a dataframe. The table is wrapped in an HTML comment in the raw response, so stripping the comment markers exposes it to the parser:
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
}
url= 'https://www.hockey-reference.com//leagues/NHL_2022.html'
# pass the headers defined above, then remove the comment markers around the table
response = requests.get(url, headers=headers).text.replace('<!--', '').replace('-->', '')
soup = bs(response, 'html.parser')
table_w_data = soup.select_one('table#stats')
df = pd.read_html(str(table_w_data), header=1)[0]
print(df)
Result in terminal:
0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Unnamed: 7_level_0 Unnamed: 8_level_0 Unnamed: 9_level_0 ... Special Teams Shot Data Unnamed: 31_level_0
Rk Unnamed: 1_level_1 AvAge GP W L OL PTS PTS% GF ... PK% SH SHA PIM/G oPIM/G S S% SA SV% SO
0 1.0 Florida Panthers* 27.8 82 58 18 6 122 0.744 337 ... 79.54 12 8 10.1 10.8 3062 11.0 2515 0.904 5
1 2.0 Colorado Avalanche* 28.2 82 56 19 7 119 0.726 308 ... 79.66 6 5 9.0 10.4 2874 10.7 2625 0.912 7
2 3.0 Carolina Hurricanes* 28.3 82 54 20 8 116 0.707 277 ... 88.04 4 3 9.2 7.7 2798 9.9 2310 0.913 6
3 4.0 Toronto Maple Leafs* 28.4 82 54 21 7 115 0.701 312 ... 82.05 13 4 8.6 8.5 2835 11.0 2511 0.900 7
4 5.0 Minnesota Wild* 29.4 82 53 22 7 113 0.689 305 ... 76.14 2 5 10.8 10.8 2666 11.4 2577 0.903 3
5 6.0 Calgary Flames* 28.8 82 50 21 11 111 0.677 291 ... 83.20 7 3 9.1 8.6 2908 10.0 2374 0.913 11
6 7.0 Tampa Bay Lightning* 29.6 82 51 23 8 110 0.671 285 ... 80.56 7 5 11.0 11.4 2535 11.2 2441 0.907 3
7 8.0 New York Rangers* 26.7 82 52 24 6 110 0.671 250 ... 82.30 8 2 8.2 8.2 2392 10.5 2528 0.919 9
8 9.0 St. Louis Blues* 28.8 82 49 22 11 109 0.665 309 ... 84.09 9 5 7.5 7.9 2492 12.4 2591 0.908 4
9 10.0 Boston Bruins* 28.5 82 51 26 5 107 0.652 253 ... 81.30 5 6 9.9 9.4 2962 8.5 2354 0.907 4
10 11.0 Edmonton Oilers* 29.1 82 49 27 6 104 0.634 285 ... 79.37 11 6 8.1 7.1 2790 10.2 2647 0.905 4
11 12.0 Pittsburgh Penguins* 29.7 82 46 25 11 103 0.628 269 ... 84.43 3 8 6.9 8.4 2849 9.4 2576 0.914 7
12 13.0 Washington Capitals* 29.5 82 44 26 12 100 0.610 270 ... 80.44 8 9 7.7 8.8 2577 10.5 2378 0.898 8
13 14.0 Los Angeles Kings* 28.0 82 44 27 11 99 0.604 235 ... 76.65 11 9 7.7 8.3 2865 8.2 2341 0.901 5
14 15.0 Dallas Stars* 29.4 82 46 30 6 98 0.598 233 ... 79.00 7 5 6.7 7.5 2486 9.4 2545 0.904 2
15 16.0 Nashville Predators* 27.7 82 45 30 7 97 0.591 262 ... 79.23 2 5 12.6 11.9 2439 10.7 2646 0.906 4
16 17.0 Vegas Golden Knights 28.5 82 43 31 8 94 0.573 262 ... 77.40 10 7 7.6 7.7 2830 9.3 2458 0.901 3
17 18.0 Vancouver Canucks 27.7 82 40 30 12 92 0.561 246 ... 74.89 5 6 8.0 8.6 2622 9.4 2612 0.912 1
18 19.0 Winnipeg Jets 28.2 82 39 32 11 89 0.543 250 ... 75.00 9 8 8.8 9.5 2645 9.5 2721 0.907 5
19 20.0 New York Islanders 30.1 82 37 35 10 84 0.512 229 ... 84.19 5 7 8.9 8.4 2367 9.7 2669 0.913 9
20 21.0 Columbus Blue Jackets 26.6 82 37 38 7 81 0.494 258 ... 78.57 7 6 7.7 7.2 2463 10.5 2887 0.897 2
21 22.0 San Jose Sharks 29.0 82 32 37 13 77 0.470 211 ... 85.20 4 11 8.8 8.6 2400 8.8 2622 0.900 3
22 23.0 Anaheim Ducks 27.9 82 31 37 14 76 0.463 228 ... 80.80 6 4 9.3 9.8 2393 9.5 2725 0.902 4
23 24.0 Buffalo Sabres 27.5 82 32 39 11 75 0.457 229 ... 76.42 6 6 8.1 7.9 2451 9.3 2702 0.894 1
24 25.0 Detroit Red Wings 26.9 82 32 40 10 74 0.451 227 ... 73.78 4 10 8.9 8.5 2414 9.4 2761 0.888 4
25 26.0 Ottawa Senators 26.6 82 33 42 7 73 0.445 224 ... 80.32 9 4 10.0 10.2 2463 9.1 2740 0.904 2
26 27.0 Chicago Blackhawks 28.0 82 28 42 12 68 0.415 213 ... 76.23 2 6 7.9 8.7 2362 9.0 2703 0.893 4
27 28.0 New Jersey Devils 25.8 82 27 46 9 63 0.384 245 ... 80.19 6 14 8.1 8.4 2562 9.6 2540 0.881 2
28 29.0 Philadelphia Flyers 28.3 82 25 46 11 61 0.372 210 ... 75.74 6 11 9.0 9.0 2539 8.3 2785 0.894 1
29 30.0 Seattle Kraken 28.7 82 27 49 6 60 0.366 213 ... 74.89 8 7 8.5 8.0 2380 8.9 2367 0.880 3
30 31.0 Arizona Coyotes 28.0 82 25 50 7 57 0.348 206 ... 75.00 3 4 10.2 8.2 2121 9.7 2910 0.894 1
31 32.0 Montreal Canadiens 27.8 82 22 49 11 55 0.335 218 ... 75.55 6 12 10.2 9.0 2442 8.9 2823 0.888 3
32 NaN League Average 28.2 82 41 32 9 91 0.555 255 ... 79.39 7 7 8.9 8.9 2593 9.8 2593 0.902 4
33 rows × 32 columns
Expect to do a little cleanup of that data once you get it.
Relevant documentation for pandas: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html
And for requests: https://requests.readthedocs.io/en/latest/
And for BeautifulSoup: https://beautiful-soup-4.readthedocs.io/en/latest/index.html
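The comment-stripping idea can also be expressed with BeautifulSoup's Comment type, so that only comment nodes are re-parsed. The sketch below runs offline on a small hypothetical HTML snippet. The wrapper id all_stats and the two-column table are assumptions made for illustration, not the real page structure.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, heavily shortened stand-in for the page source: the div the
# question looks for exists only inside an HTML comment.
html = """
<div id="all_stats">
<!--
<div id="div_stats">
<table id="stats">
<tr><th>Team</th><th>W</th></tr>
<tr><td>Florida Panthers*</td><td>58</td></tr>
<tr><td>Colorado Avalanche*</td><td>56</td></tr>
</table>
</div>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Commented-out markup is not parsed into tags, so this lookup finds nothing.
found_directly = soup.find("div", id="div_stats")

rows = []
# Collect every comment node and parse its text as HTML in its own right.
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    inner = BeautifulSoup(comment, "html.parser")
    table = inner.find("table", id="stats")
    if table is None:
        continue
    for tr in table.find_all("tr"):
        rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])

print(found_directly)
print(rows)
```

Compared with replacing every `<!--` and `-->` in the response text, this leaves the rest of the page untouched, which may matter if other comments contain markup you do not want parsed.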
How can I create group column
I want to split the data into groups of two rows using pandas, for example:
df_A (raw data):
data1 data2 data3
23 13.3 983
13 33.4 124
24 62.3 574
25 78.5 554
63 93.3 982
29 43.3 123
53 62.6 364
83 74.3 453
21 83.0 165
93 23.4 433
df_B (result data):
group data1 data2 data3
0 23 13.3 983
0 13 33.4 124
1 24 62.3 574
1 25 78.5 554
2 63 93.3 982
2 29 43.3 123
3 53 62.6 364
3 83 74.3 453
4 21 83.0 165
4 93 23.4 433
Thank you.
Try:
df["group"] = df.index // 2
Or:
df["group"] = np.arange(len(df)) // 2
This creates the "group" column:
  data1 data2 data3 group
0 23 13.3 983 0
1 13 33.4 124 0
2 24 62.3 574 1
3 25 78.5 554 1
4 63 93.3 982 2
5 29 43.3 123 2
6 53 62.6 364 3
7 83 74.3 453 3
8 21 83.0 165 4
9 93 23.4 433 4
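The two variants differ when the index is not the default 0..n-1: df.index // 2 divides the index labels, while np.arange(len(df)) // 2 divides row positions. Here is a small sketch with the question's numbers. The shifted index is an assumption, added only to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "data1": [23, 13, 24, 25, 63, 29, 53, 83, 21, 93],
    "data2": [13.3, 33.4, 62.3, 78.5, 93.3, 43.3, 62.6, 74.3, 83.0, 23.4],
    "data3": [983, 124, 574, 554, 982, 123, 364, 453, 165, 433],
})
# Simulate a non-default index, e.g. what is left after filtering rows.
df.index = range(100, 110)

# Position-based labels do not depend on the index values.
df["group"] = np.arange(len(df)) // 2

# The new column can be used directly for per-group aggregation.
sums = df.groupby("group")["data1"].sum()

print(df["group"].tolist())   # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(sums.tolist())          # [36, 49, 92, 136, 114]
```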
Issue with connecting points for 3d line plot in plotly
I have a dataframe of the following format:
   x y z aa_num frame_num cluster
1 1.86 3.11 8.62 1 1 1
2 1.77 3.32 8.31 2 1 1
3 1.59 3.17 8.00 3 1 1
4 1.67 3.49 7.81 4 1 1
5 2.04 3.59 7.81 5 1 1
6 2.20 3.34 7.57 6 1 1
7 2.09 3.19 7.25 7 1 1
8 2.13 3.30 6.89 8 1 1
9 2.17 3.63 6.70 9 1 1
10 2.22 3.63 6.33 10 1 1
11 2.06 3.83 6.04 11 1 1
12 2.31 3.75 5.76 12 1 1
13 2.15 3.45 5.59 13 1 1
14 2.21 3.28 5.26 14 1 1
15 2.00 3.13 4.98 15 1 1
16 2.13 2.86 4.74 16 1 1
17 1.97 2.78 4.41 17 1 1
18 2.20 2.76 4.10 18 1 1
19 2.43 2.46 4.14 19 1 1
20 2.34 2.23 3.85 20 1 1
21 2.61 2.16 3.59 21 1 1
22 2.42 1.92 3.36 22 1 1
23 2.44 1.95 2.98 23 1 1
24 2.26 1.62 2.94 24 1 1
25 2.19 1.35 3.20 25 1 1
26 1.92 1.11 3.08 26 1 1
27 1.93 0.83 3.33 27 1 1
28 1.83 0.72 3.68 28 1 1
29 1.95 0.47 3.95 29 1 1
30 1.84 0.36 4.29 30 1 1
31 0.56 3.93 7.07 1 2 1
32 0.66 3.84 7.42 2 2 1
33 0.87 3.54 7.49 3 2 1
34 0.84 3.19 7.33 4 2 1
35 0.76 3.32 6.98 5 2 1
36 0.88 3.23 6.63 6 2 1
37 1.10 3.46 6.43 7 2 1
38 1.35 3.49 6.15 8 2 1
39 1.72 3.50 6.23 9 2 1
40 1.88 3.67 5.93 10 2 1
41 2.25 3.72 5.97 11 2 1
42 2.43 3.48 5.74 12 2 1
43 2.23 3.35 5.44 13 2 1
44 2.23 3.38 5.06 14 2 1
45 2.01 3.38 4.76 15 2 1
46 2.02 3.44 4.38 16 2 1
47 1.98 3.10 4.20 17 2 1
48 2.05 3.13 3.83 18 2 1
49 2.28 2.85 3.72 19 2 1
50 2.09 2.56 3.58 20 2 1
51 2.21 2.37 3.27 21 2 1
52 2.06 2.04 3.15 22 2 1
53 1.93 2.01 2.80 23 2 1
54 1.86 1.64 2.83 24 2 1
55 1.95 1.38 3.10 25 2 1
56 1.78 1.04 3.04 26 2 1
57 1.90 0.84 3.34 27 2 1
58 1.83 0.74 3.70 28 2 1
59 1.95 0.48 3.95 29 2 1
60 1.84 0.36 4.29 30 2 1
etc..
I'm trying to create a 3d line plot of this data, where a line consisting of 30 <x,y,z> points will be plotted for each frame_num and the points would be connected in the order of aa_num.
The code to do this is as follows:
fig = plot_ly(output_cl, x = ~x, y = ~y, z = ~z,
              type = 'scatter3d', mode = 'lines+markers', opacity = 1,
              line = list(width = 1, color = ~frame_num, colorscale = 'Viridis'),
              marker = list(size = 2, color = ~frame_num, colorscale = 'Viridis'))
When I plot a single frame, it works fine.
However, a strange issue arises when I try to plot multiple instances. For some reason, when I try to plot frame 1 and 2, point 1 and 30 connect to each other for frame 2. However, this doesn't happen for frame 1. Any ideas why? Is there some way to specify the ordering of points in 3d in plotly?
If you want to create two different traces based on column frame_num, you need to pass it as a categorical variable by using factor. As an alternative, you can use name = ~ frame_num or split = ~ frame_num to create multiple traces.
library(plotly)
output_cl <- data.frame(
  x = c(1.86,1.77,1.59,1.67,2.04,2.2,2.09,
        2.13,2.17,2.22,2.06,2.31,2.15,2.21,2,2.13,1.97,2.2,
        2.43,2.34,2.61,2.42,2.44,2.26,2.19,1.92,1.93,1.83,1.95,
        1.84,0.56,0.66,0.87,0.84,0.76,0.88,1.1,1.35,1.72,1.88,
        2.25,2.43,2.23,2.23,2.01,2.02,1.98,2.05,2.28,2.09,
        2.21,2.06,1.93,1.86,1.95,1.78,1.9,1.83,1.95,1.84),
  y = c(3.11,3.32,3.17,3.49,3.59,3.34,3.19,
        3.3,3.63,3.63,3.83,3.75,3.45,3.28,3.13,2.86,2.78,2.76,
        2.46,2.23,2.16,1.92,1.95,1.62,1.35,1.11,0.83,0.72,
        0.47,0.36,3.93,3.84,3.54,3.19,3.32,3.23,3.46,3.49,3.5,
        3.67,3.72,3.48,3.35,3.38,3.38,3.44,3.1,3.13,2.85,2.56,
        2.37,2.04,2.01,1.64,1.38,1.04,0.84,0.74,0.48,0.36),
  z = c(8.62,8.31,8,7.81,7.81,7.57,7.25,6.89,
        6.7,6.33,6.04,5.76,5.59,5.26,4.98,4.74,4.41,4.1,
        4.14,3.85,3.59,3.36,2.98,2.94,3.2,3.08,3.33,3.68,3.95,
        4.29,7.07,7.42,7.49,7.33,6.98,6.63,6.43,6.15,6.23,5.93,
        5.97,5.74,5.44,5.06,4.76,4.38,4.2,3.83,3.72,3.58,
        3.27,3.15,2.8,2.83,3.1,3.04,3.34,3.7,3.95,4.29),
  aa_num = c(1L,2L,3L,4L,5L,6L,7L,8L,9L,10L,
             11L,12L,13L,14L,15L,16L,17L,18L,19L,20L,21L,22L,23L,
             24L,25L,26L,27L,28L,29L,30L,1L,2L,3L,4L,5L,6L,7L,
             8L,9L,10L,11L,12L,13L,14L,15L,16L,17L,18L,19L,20L,
             21L,22L,23L,24L,25L,26L,27L,28L,29L,30L),
  frame_num = c(1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
                1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
                1L,1L,1L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,
                2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,
                2L,2L),
  cluster = c(1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
              1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
              1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
              1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
              1L,1L)
)
fig = plot_ly(output_cl, x = ~x, y = ~y, z = ~z,
              type = 'scatter3d', mode = 'lines+markers',
              color = ~factor(frame_num), # or name = ~ frame_num or split = ~ frame_num
              colors ="Set2",
              opacity = 1,
              line = list(width = 1),
              marker = list(size = 2))
fig
can we use clustering without target variables?
This is sample data:
Inn R B W Eco AVG SR
111 368 432 30 5.11 12.27 14.4
94 359 444 24 4.85 14.96 18.5
47 187 202 13 5.55 14.38 15.54
59 273 279 16 5.87 17.06 17.44
34 132 140 9 5.66 14.67 15.56
135 437 536 33 4.89 13.24 16.24
1 0 1 1 0 0 1
Now I would like to make a new column, Choice, with the values Good, Bad, or Moderate bowling option for each row. How can I achieve it?
Create histogram from panda frame
I am trying to create a bar histogram that will show the mean of subjects by group. My data looks like this:
   week 8 exp
   Subject Group 1 2 3 Mean
0 255 WT 0 101.8 75.6 84.1 87.166667
1 157 HD 0 92.6 87.8 82.3 87.566667
2 418 WT 0 54.5 47.0 50.8 50.766667
3 300 WT 0 48.1 73.1 72.2 64.466667
4 299 HD 0 71.8 86.0 93.4 83.733333
5 258 WT 0 88.0 98.5 50.2 78.900000
6 173 WT 0 75.4 70.5 83.9 76.600000
7 273 HD 0 103.6 94.2 108.3 102.033333
8 175 WT 0 36.7 30.7 42.2 36.533333
9 172 HD 0 82.6 91.6 73.4 82.533333
10 263 WT 0 110.7 102.4 105.5 106.200000
11 304 1 90.4 90.1 103.4 94.633333
12 305 1 128.6 141.5 123.1 131.066667
13 306 1 52.0 45.6 57.2 51.600000
14 309 0.1 41.3 52.6 79.9 57.933333
15 317 0.1 86.2 95.8 77.1 86.366667
My code is:
frame_data = pd.read_csv('final results.csv', header=[0,1])
data_avg = df.iloc[:, -3:].mean(axis=1)
frame_data[('exp', 'Mean')] = frame_data.iloc[:, -3:].mean(axis=1)
grouped_by_group = frame_data.groupby(['Group', 'Mean']).size().unstack('Mean')
grouped_by_group.plot.bar(title='Grip')
I am getting the error KeyError: 'Group'. I checked many times and that is how the column name is written. I do not know what is wrong.
I think you need to flatten the MultiIndex columns, aggregate the mean per group, and then use Series.plot:
frame_data = pd.read_csv('final results.csv', header=[0,1])
frame_data[('exp', 'Mean')] = frame_data.iloc[:, -3:].mean(axis=1)
#flatten MultiIndex to columns
frame_data.columns = frame_data.columns.map('_'.join)
grouped_by_group = frame_data.groupby('8_Group')['exp_Mean'].mean()
print (grouped_by_group)
8_Group
0.1 72.150000
1 92.433333
HD 0 88.966667
WT 0 71.519048
Name: value, dtype: float64
grouped_by_group.plot.bar(title='Grip')
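The KeyError comes from the two header rows: each column key is a (level 0, level 1) tuple, so 'Group' on its own is not a column. Below is a small self-contained sketch of the flattening step, using the first three rows from the question. The exact tuples, in particular ('week', 'Subject'), are an assumption about how the CSV header is read.

```python
import pandas as pd

# Assumed two-level header, mirroring what header=[0, 1] would produce.
columns = pd.MultiIndex.from_tuples([
    ("week", "Subject"), ("8", "Group"), ("exp", "1"), ("exp", "2"), ("exp", "3"),
])
frame_data = pd.DataFrame(
    [
        [255, "WT 0", 101.8, 75.6, 84.1],
        [157, "HD 0", 92.6, 87.8, 82.3],
        [418, "WT 0", 54.5, 47.0, 50.8],
    ],
    columns=columns,
)

# Row-wise mean of the three measurement columns, stored under a tuple key.
frame_data[("exp", "Mean")] = frame_data.iloc[:, -3:].mean(axis=1)

# Join each (level 0, level 1) tuple into a single string label.
frame_data.columns = frame_data.columns.map("_".join)

grouped_by_group = frame_data.groupby("8_Group")["exp_Mean"].mean()

print(list(frame_data.columns))
print(grouped_by_group)
```

After flattening, grouped_by_group is an ordinary Series indexed by group label, so grouped_by_group.plot.bar(title='Grip') draws one bar per group.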