Reading XML (NIST .n42 file) and data extraction


I have an XML file, shown below, and I need to extract the data in its 'ChannelData' element.
from xml.dom import minidom
xmldoc = minidom.parse('Annex_B_n42.xml')
itemlist = xmldoc.getElementsByTagName('ChannelData')
print(len(itemlist))
print(itemlist[0].attributes['compressionCode'].value)
for s in itemlist:
    print(s.attributes['compressionCode'].value)
This doesn't return the data, just the value 'None'.
I also tried another approach, adapted from a different example:
import xml.etree.ElementTree as ET
root = ET.parse('Annex_B_n42.xml').getroot()
#value=[]
for type_tag in root.findall('Spectrum'):
    value = type_tag.get('id')
    print(value)
    print("data from file " + str(value))
This did not work at all: value is never populated. I really don't understand how to parse XML.
Here is the XML file:
<?xml version="1.0"?>
<?xml-model href="http://physics.nist.gov/N42/2011/N42/schematron/n42.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://physics.nist.gov/N42/2011/N42 file:///d:/Data%20Files/ANSI%20N42%2042/V2/Schema/n42.xsd" n42DocUUID="d72b7fa7-4a20-43d4-b1b2-7e3b8c6620c1">
<RadInstrumentInformation id="RadInstrumentInformation-1">
<RadInstrumentManufacturerName>RIIDs R Us</RadInstrumentManufacturerName>
<RadInstrumentModelName>iRIID</RadInstrumentModelName>
<RadInstrumentClassCode>Radionuclide Identifier</RadInstrumentClassCode>
<RadInstrumentVersion>
<RadInstrumentComponentName>Software</RadInstrumentComponentName>
<RadInstrumentComponentVersion>1.1</RadInstrumentComponentVersion>
</RadInstrumentVersion>
</RadInstrumentInformation>
<RadDetectorInformation id="RadDetectorInformation-1">
<RadDetectorCategoryCode>Gamma</RadDetectorCategoryCode>
<RadDetectorKindCode>NaI</RadDetectorKindCode>
</RadDetectorInformation>
<EnergyCalibration id="EnergyCalibration-1">
<CoefficientValues>-21.8 12.1 6.55e-03</CoefficientValues>
</EnergyCalibration>
<RadMeasurement id="RadMeasurement-1">
<MeasurementClassCode>Foreground</MeasurementClassCode>
<StartDateTime>2003-11-22T23:45:19-07:00</StartDateTime>
<RealTimeDuration>PT60S</RealTimeDuration>
<Spectrum id="RadMeasurement-1Spectrum-1" radDetectorInformationReference="RadDetectorInformation-1" energyCalibrationReference="EnergyCalibration-1">
<LiveTimeDuration>PT59.61S</LiveTimeDuration>
<ChannelData compressionCode="None">
0 0 0 22 421 847 1295 1982 2127 2222 2302 2276
2234 1921 1939 1715 1586 1469 1296 1178 1127 1047 928 760
679 641 542 529 443 423 397 393 322 272 294 227
216 224 208 191 189 163 167 173 150 137 136 129
150 142 160 159 140 103 90 82 83 85 67 76
73 84 63 74 70 69 76 61 49 61 63 65
58 62 48 75 56 61 46 56 43 37 55 47
50 40 38 54 43 41 45 51 32 35 29 33
40 44 33 35 20 26 27 17 19 20 16 19
18 19 18 20 17 45 55 70 62 59 32 30
21 23 10 9 5 13 11 11 6 7 7 9
11 4 8 8 14 14 11 9 13 5 5 6
10 9 3 4 3 7 5 5 4 5 3 6
5 0 5 6 3 1 4 4 3 10 11 4
1 4 2 11 9 6 3 5 5 1 4 2
6 6 2 3 0 2 2 2 2 0 1 3
1 1 2 3 2 4 5 2 6 4 1 0
3 1 2 1 1 0 1 0 0 2 0 1
0 0 0 1 0 0 0 0 0 0 0 2
0 0 0 1 0 1 0 0 2 1 0 0
0 0 1 3 0 0 0 1 0 1 0 0
0 0 0 0
</ChannelData>
</Spectrum>
</RadMeasurement>
</RadInstrumentData>
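On the minidom attempt: the 'None' it prints is the value of the compressionCode attribute (in N42 it indicates that the channel data is not compressed), not the data. The counts are the element's text content, reachable through its child text nodes. A minimal sketch, assuming a trimmed inline copy of the file so it runs standalone (N42_SAMPLE is an illustrative name; call minidom.parse('Annex_B_n42.xml') for the real file):

```python
from xml.dom import minidom

# Trimmed, illustrative copy of Annex_B_n42.xml (same namespace, first 10 channels).
N42_SAMPLE = """<?xml version="1.0"?>
<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42">
  <RadMeasurement id="RadMeasurement-1">
    <Spectrum id="RadMeasurement-1Spectrum-1">
      <ChannelData compressionCode="None">
        0 0 0 22 421 847
        1295 1982 2127 2222
      </ChannelData>
    </Spectrum>
  </RadMeasurement>
</RadInstrumentData>"""

doc = minidom.parseString(N42_SAMPLE)  # minidom.parse('Annex_B_n42.xml') for the real file
node = doc.getElementsByTagName("ChannelData")[0]

# The attribute describes the encoding; it is not the data itself.
print(node.getAttribute("compressionCode"))  # prints the string None

# The counts are the element's text, which may be split across several text nodes.
text = "".join(child.data for child in node.childNodes
               if child.nodeType == child.TEXT_NODE)
counts = [int(token) for token in text.split()]
print(counts[:6])  # [0, 0, 0, 22, 421, 847]
```

Calling split() with no arguments treats spaces and line breaks alike, so the 12-values-per-line layout of the file does not matter.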

You can use BeautifulSoup to get the value of the channeldata tag as follows:
from bs4 import BeautifulSoup
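with open('Annex_B_n42.xml') as f:
    xml_text = f.read()
# An HTML parser lowercases tag names, which is why the tag is looked up as "channeldata"
bs_obj = BeautifulSoup(xml_text, "html.parser")
print(bs_obj.find_all("channeldata")[0].text)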
That will print:
' 0 0 0 22 421 847 1295 1982 2127 2222 2302 2276 2234 1921 1939 1715 1586 1469 1296 1178 1127 1047 928 760 679 641 542 529 443 423 397 393 322 272 294 227 216 224 208 191 189 163 167 173 150 137 136 129 150 142 160 159 140 103 90 82 83 85 67 76 73 84 63 74 70 69 76 61 49 61 63 65 58 62 48 75 56 61 46 56 43 37 55 47 50 40 38 54 43 41 45 51 32 35 29 33 40 44 33 35 20 26 27 17 19 20 16 19
18 19 18 20 17 45 55 70 62 59 32 30 21 23 10 9 5 13 11 11 6 7 7 9 11 4 8 8 14 14 11 9 13 5 5 6 10 9 3 4 3 7 5 5 4 5 3 6 5 0 5 6 3 1 4 4 3 10 11 4 1 4 2 11 9 6 3 5 5 1 4 2 6 6 2 3 0 2 2 2 2 0 1 3 1 1 2 3 2 4 5 2 6 4 1 0 3 1 2 1 1 0 1 0 0 2 0 1 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 1 0 1 0 0 2 1 0 0 0 0 1 3 0 0 0 1 0 1 0 0 0 0 0 0 '

Try this:
import xml.etree.ElementTree as ET
root = ET.parse('Annex_B_n42.xml').getroot()
elems = root.findall(".//*[@compressionCode='None']")
print(elems[0].text)
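Two details explain why the question's root.findall('Spectrum') returned nothing: every element lives in the default namespace http://physics.nist.gov/N42/2011/N42, and a bare tag name only searches the direct children of root. A minimal sketch, assuming a trimmed inline copy of the file so it runs standalone (SAMPLE and NS are illustrative names), showing a prefix mapping and, for Python 3.8+, the {*} namespace wildcard combined with an [@attribute='value'] predicate:

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the question's file (same default namespace).
SAMPLE = """<RadInstrumentData xmlns="http://physics.nist.gov/N42/2011/N42">
  <RadMeasurement id="RadMeasurement-1">
    <Spectrum id="RadMeasurement-1Spectrum-1">
      <ChannelData compressionCode="None">0 0 0 22 421 847</ChannelData>
    </Spectrum>
  </RadMeasurement>
</RadInstrumentData>"""

root = ET.fromstring(SAMPLE)  # ET.parse('Annex_B_n42.xml').getroot() for the real file
print(root.tag)  # {http://physics.nist.gov/N42/2011/N42}RadInstrumentData

# The question's query: the tag lacks the namespace, and a bare name only
# matches direct children of root (Spectrum is a grandchild).
print(root.findall("Spectrum"))  # []

# Option 1: map a prefix of your choosing to the namespace URI and search with './/'.
NS = {"n42": "http://physics.nist.gov/N42/2011/N42"}
spectrum_ids = [s.get("id") for s in root.findall(".//n42:Spectrum", NS)]
print(spectrum_ids)  # ['RadMeasurement-1Spectrum-1']

# Option 2 (Python 3.8+): '{*}' matches the tag in any (or no) namespace,
# and the [@...] predicate keeps only elements with that attribute value.
elems = root.findall(".//{*}ChannelData[@compressionCode='None']")
counts = [int(v) for v in elems[0].text.split()]
print(counts)  # [0, 0, 0, 22, 421, 847]
```

Printing root.tag is a quick way to discover the namespace URI of any file before writing queries against it.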

Related

Cannot read PDF Data into Sheets with Gspread-DataFrame

I want to read data from a PDF into Google Sheets using Tabula, but when I transfer the data as read into Google Sheets, I get an error. I know the extracted data is dirty, but I wanted to clean it up in Google Sheets.
Here is the PDF-extraction portion of my full code:
import tabula
import pandas as pd
file_path = 'TnPresidentbyCountyNov2016.pdf'
df = tabula.read_pdf(file_path, pages='all', multiple_tables='FALSE', stream='TRUE')
print (df)
[ Anderson 19,212 9,013 74 1,034 42 174 189 28 0 0.1
0 Bedford 11,486 3,395 25 306 8 47 75 5 0 0
1 Benton 4,716 1,474 12 83 13 11 14 2 0 0
2 Bledsoe 3,622 897 7 95 4 9 18 2 0 0
3 Blount 37,443 12,100 83 1,666 72 250 313 51 1 1
4 Bradley 29,768 7,070 66 1,098 44 143 210 29 1 1
5 Campbell 9,870 2,248 32 251 25 43 45 5 0 0
6 Cannon 4,007 1,127 8 106 7 18 29 3 0 0
7 Carroll 7,756 2,327 22 181 20 18 39 2 0 0
8 Carter 16,898 3,453 30 409 20 54 130 26 0 0
9 Cheatham 11,297 3,878 26 463 13 50 99 8 0 0
10 Chester 5,081 1,243 5 115 4 12 10 4 0 0
11 Claiborne 8,602 1,832 16 192 24 27 29 2 0 0
12 Clay 2,141 707 2 47 2 10 11 0 0 0
13 Cocke 9,791 1,981 21 211 19 27 59 2 0 2
14 Coffee 14,417 4,743 32 517 23 62 113 9 0 1
15 Crockett 3,982 1,303 7 76 3 8 13 1 0 0
16 Cumberland 20,413 5,202 37 471 26 53 99 17 0 1
17 Davidson 84,550 148,864 412 9,603 304 619 2,459 106 0 6
18 Decatur 3,588 894 5 70 4 8 16 2 0 0
19 DeKalb 5,171 1,569 10 117 6 29 49 0 0 0
20 Dickson 13,233 4,722 32 489 18 58 94 9 0 3
21 Dyer 10,180 2,816 19 193 13 27 48 3 0 0
22 Fayette 13,055 5,874 19 261 16 37 62 21 0 0
23 Fentress 6,038 1,100 10 107 14 11 37 1 0 0
24 Franklin 11,532 4,374 28 319 16 36 66 7 0 0
25 Gibson 13,786 5,258 26 305 18 36 66 8 0 0
26 Giles 7,970 2,917 16 162 11 11 41 1 0 0
27 Grainger 6,626 1,154 17 130 12 28 26 4 0 0
28 Greene 18,562 4,216 28 481 29 56 152 14 0 0
29 Grundy 3,636 999 11 80 3 13 19 0 0 0
30 Hamblen 15,857 4,075 30 443 27 73 93 8 0 0
31 Hamilton 78,733 55,316 147 5,443 138 349 1,098 121 0 0
32 Hancock 1,843 322 4 42 1 5 13 0 0 0
33 Hardeman 4,919 4,185 18 84 11 13 30 9 0 0
34 Hardin 8,012 1,622 15 134 22 48 96 0 0 0
35 Hawkins 16,648 3,507 31 397 12 52 91 7 0 3
36 Haywood 3,013 3,711 11 60 10 10 19 0 0 0
37 Henderson 8,138 1,800 13 172 9 27 39 1 0 0
38 Henry 9,508 3,063 18 223 15 27 60 4 0 0
39 Hickman 5,695 1,824 20 161 19 15 39 18 0 0
40 Houston 2,182 866 9 88 4 7 12 0 0 0
41 Humphreys 4,930 1,967 17 166 12 23 26 5 0 0
42 Jackson 3,236 1,129 2 62 1 7 17 1 0 0
43 Jefferson 14,776 3,494 34 497 22 76 115 8 0 1
44 Johnson 5,410 988 11 102 7 9 39 6 0 0
45 Knox 105,767 62,878 382 7,458 227 986 1,634 122 0 9
46 Lake 1,357 577 5 18 1 6 6 0 0 0, Lauderdale 4,884 3,056 14 87 13 10 14.1 \
0 Lawrence 12,420 2,821 21 271 13 36 77
1 Lewis 3,585 890 14 59 8 9 42
2 Lincoln 10,398 2,554 19 231 13 39 46
3 Loudon 17,610 4,919 41 573 22 77 87
This is just a sample of the data I pulled. Again, it is not quite what I envisioned, but as a beginner coder I wanted to clean it up in Sheets.
Here is an image of the PDF I was extracting data from.
Here is the link to download the PDF I am extracting data from.
Now I want to use gspread and gspread_dataframe to upload the data into a Google Sheets tab, and this is where I am having problems.
EDIT: Previously neither section showed all of my code; together, the top and bottom portions now include all of the code I have written so far.
from oauth2client.service_account import ServiceAccountCredentials
import json
import gspread
SHEET_ID = '18xad0TbNGMPh8gUSIsEr6wNsFzcpKGbyUIQ-A4GQ1bo'
SHEET_NAME = '2016'
gc = gspread.service_account('waynetennesseedems.json')
spreadsheet = gc.open_by_key(SHEET_ID)
worksheet = spreadsheet.worksheet(SHEET_NAME)
from gspread_dataframe import set_with_dataframe
set_with_dataframe(worksheet, df, include_column_header='False')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/var/folders/zc/x2w76_4121g3gzfxybkz2q480000gn/T/ipykernel_44678/2784595029.py in <module>
----> 1 set_with_dataframe(worksheet, df, include_column_header='False')
/opt/anaconda3/lib/python3.9/site-packages/gspread_dataframe.py in set_with_dataframe(worksheet, dataframe, row, col, include_index, include_column_header, resize, allow_formulas, string_escaping)
260 # If header-related params are True, the values are adjusted
261 # to allow space for the headers.
--> 262 y, x = dataframe.shape
263 index_col_size = 0
264 column_header_size = 0
AttributeError: 'list' object has no attribute 'shape'
Does it have to do with how my data was pulled from the PDF?

It seems that df is a list. First, make sure you have installed the tabula-py package; second, try passing the parameter output_format='dataframe' to tabula.read_pdf(), like so:
import pandas as pd
import json
import gspread
from tabula.io import read_pdf
from oauth2client.service_account import ServiceAccountCredentials
from gspread_dataframe import set_with_dataframe
file_path = 'TnPresidentbyCountyNov2016.pdf'
# Use real booleans: non-empty strings such as 'FALSE' are truthy in Python
df = read_pdf(file_path, output_format='dataframe', pages='all', multiple_tables=False, stream=True)
# print (df)
SHEET_ID = '18xad0TbNGMPh8gUSIsEr6wNsFzcpKGbyUIQ-A4GQ1bo'
SHEET_NAME = '2016'
gc = gspread.service_account('waynetennesseedems.json')
spreadsheet = gc.open_by_key(SHEET_ID)
worksheet = spreadsheet.worksheet(SHEET_NAME)
set_with_dataframe(worksheet, df, include_column_header=False)
I also suggest taking a look at the PEP 8 style guide for a better idea of how to write a well-formatted script.
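The traceback shows that df is a Python list: tabula-py's read_pdf can return one DataFrame per detected table (the printout above shows two, one starting at Anderson and one at Lauderdale), while set_with_dataframe expects a single DataFrame. A minimal sketch, assuming you want all tables stacked into one sheet; combine_tables is a hypothetical helper, and the two tiny DataFrames only mimic the shape of that printout:

```python
import pandas as pd


def combine_tables(tables):
    """Hypothetical helper: turn tabula's list of DataFrames into one DataFrame."""
    if isinstance(tables, pd.DataFrame):  # already a single table
        return tables
    # Relabel columns by position (0, 1, 2, ...) so tables whose header rows
    # differ still line up; narrower tables are padded with NaN by pd.concat.
    relabeled = [t.set_axis(list(range(t.shape[1])), axis=1) for t in tables]
    return pd.concat(relabeled, ignore_index=True)


# Stand-ins shaped like the printout: each header row is really a county row.
tables = [
    pd.DataFrame([["Bedford", "11,486"]], columns=["Anderson", "19,212"]),
    pd.DataFrame([["Lawrence", "12,420"]], columns=["Lauderdale", "4,884"]),
]
df = combine_tables(tables)
print(df.shape)  # (2, 2)
# set_with_dataframe(worksheet, df, include_column_header=False)
```

Tabula also used each table's first row (Anderson, Lauderdale) as column headers, so those rows are not data rows here. Passing pandas_options={'header': None} to read_pdf should keep them as data; verify that against your tabula-py version.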

How to plot multiple charts on one figure and combine them with another?

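import matplotlib.pyplot as plt
# Create an axes object
axes = plt.gca()
# Pass the same axes object to every plot call so all three lines share one chart
# (鄉鎮別 = township, 男 = male, 女 = female, 合計(男+女) = total)
df.plot(kind='line', x='鄉鎮別', y='男', ax=axes, figsize=(10, 8));
df.plot(kind='line', x='鄉鎮別', y='女', ax=axes, figsize=(10, 8));
df.plot(kind='line', x='鄉鎮別', y='合計(男+女)', ax=axes, figsize=(10, 8), title='hihii',
        xlabel='鄉鎮別', ylabel='人數')  # 人數 = number of people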
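Here is my data (column meanings: 鄉鎮別 = township, 鄰數 = number of neighborhoods, 戶數 = households, 男 = male, 女 = female, 合計(男+女) = total (male + female), 遷入 = in-migration, 遷出 = out-migration, 出生 = births, 死亡 = deaths, 結婚 = marriages, 離婚 = divorces; the last row, 總計, is the grand total):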
鄉鎮別 鄰數 戶數 男 女 合計(男+女) 遷入 遷出 出生 死亡 結婚 離婚
0 苗栗市 715 32517 42956 43362 86318 212 458 33 65 28 13
1 苑裡鎮 362 15204 22979 21040 44019 118 154 17 24 9 7
2 通霄鎮 394 11557 17034 15178 32212 73 113 5 33 3 3
3 竹南鎮 518 32061 44069 43275 87344 410 392 31 59 35 11
4 頭份市 567 38231 52858 52089 104947 363 404 39 69 31 19
5 後龍鎮 367 12147 18244 16274 34518 93 144 12 41 2 7
6 卓蘭鎮 176 5861 8206 7504 15710 29 51 1 11 2 0
7 大湖鄉 180 5206 7142 6238 13380 31 59 5 21 3 2
8 公館鄉 281 10842 16486 15159 31645 89 169 12 32 5 3
9 銅鑼鄉 218 6106 8887 7890 16777 57 62 7 13 4 1
10 南庄鄉 184 3846 5066 4136 9202 22 48 1 10 0 2
11 頭屋鄉 120 3596 5289 4672 9961 59 53 2 11 4 4
12 三義鄉 161 5625 8097 7205 15302 47 63 3 12 3 5
13 西湖鄉 108 2617 3653 2866 6519 38 20 1 17 3 0
14 造橋鄉 115 4144 6276 5545 11821 44 64 3 11 3 2
15 三灣鄉 93 2331 3395 2832 6227 27 18 2 9 0 2
16 獅潭鄉 98 1723 2300 1851 4151 28 10 1 4 0 0
17 泰安鄉 64 1994 3085 2642 5727 36 26 2 8 4 1
18 總計 4721 195608 276022 259758 535780 1776 2308 177 450 139 82
This is my df.plot output.
First question: how do I display Chinese characters?
Second: can I plot a line chart without using df.plot?
Last question: I need four graphs as subplots: line graphs of the male, female, and total population (男、女、合計(男+女)) in each township; line graphs of in-migration and out-migration (遷入和遷出); a bar graph of the number of households (戶數); and line graphs of births and deaths (出生和死亡).
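For the first two questions: Matplotlib needs a font that contains CJK glyphs before Chinese labels render, and anything df.plot draws can also be drawn with Axes methods such as ax.plot and ax.barh. A minimal sketch of the four-panel layout described above, assuming one of the listed font names is installed on your machine (the names are assumptions; use any CJK font you have) and using only the first four townships from the table as sample data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this in Jupyter
import matplotlib.pyplot as plt

# Assumption: one of these fonts is installed ("Microsoft JhengHei" ships with Windows,
# "Noto Sans CJK TC" is common on Linux); Matplotlib searches the list in order.
plt.rcParams["font.sans-serif"] = ["Microsoft JhengHei", "Noto Sans CJK TC", "DejaVu Sans"]
plt.rcParams["axes.unicode_minus"] = False  # keep minus signs readable with CJK fonts

# First four rows of the question's table (the 總計 total row is left out on purpose)
towns = ["苗栗市", "苑裡鎮", "通霄鎮", "竹南鎮"]
male = [42956, 22979, 17034, 44069]
female = [43362, 21040, 15178, 43275]
total = [86318, 44019, 32212, 87344]
moved_in = [212, 118, 73, 410]
moved_out = [458, 154, 113, 392]
households = [32517, 15204, 11557, 32061]
births = [33, 17, 5, 31]
deaths = [65, 24, 33, 59]

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

ax = axes[0, 0]  # population lines
ax.plot(towns, male, marker="o", label="男")
ax.plot(towns, female, marker="o", label="女")
ax.plot(towns, total, marker="o", label="合計(男+女)")
ax.set_title("Population")
ax.legend()

ax = axes[0, 1]  # migration lines
ax.plot(towns, moved_in, marker="o", label="遷入")
ax.plot(towns, moved_out, marker="o", label="遷出")
ax.set_title("Migration")
ax.legend()

ax = axes[1, 0]  # households as horizontal bars
ax.barh(towns, households)
ax.set_title("戶數 (households)")

ax = axes[1, 1]  # births and deaths lines
ax.plot(towns, births, marker="o", label="出生")
ax.plot(towns, deaths, marker="o", label="死亡")
ax.set_title("Births and deaths")
ax.legend()

fig.tight_layout()
fig.savefig("township_stats.png")  # arbitrary file name; use plt.show() interactively
```

With the full DataFrame you would pass columns directly, for example ax.plot(df['鄉鎮別'], df['男']), after dropping the 總計 row so the grand total does not dwarf the townships. Use ax.bar instead of ax.barh for vertical bars.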

Read online excel file with a specific sheet and only selected columns

I need to read the CTG.xls file with pandas from the following path:
https://archive.ics.uci.edu/ml/machine-learning-databases/00193/.
From this file I need to select the sheet Data, and only the columns from K to AT, so that the final dataset has these columns:
["LB", "AC", "FM", "UC", "DL", "DS", "DP", "ASTV", "MSTV", "ALTV", "MLTV", "Width", "Min", "Max", "Nmax", "Nzeros", "Mode", "Mean", "Median", "Variance", "Tendency", "CLASS", "NSP"]
How can I do this using the read function in pandas?

Use:
import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'
df = pd.read_excel(url, sheet_name='Data', skipfooter=3)
df = df.drop(columns=df.filter(like='Unnamed').columns)
df.columns = df.iloc[0].to_list()
df = df[1:].reset_index(drop=True)
Output
LB AC FM UC DL DS DP ASTV MSTV ALTV MLTV Width Min Max Nmax Nzeros Mode Mean Median Variance Tendency CLASS NSP
0 120 0 0 0 0 0 0 73 0.5 43 2.4 64 62 126 2 0 120 137 121 73 1 9 2
1 132 0.00638 0 0.00638 0.00319 0 0 17 2.1 0 10.4 130 68 198 6 1 141 136 140 12 0 6 1
2 133 0.003322 0 0.008306 0.003322 0 0 16 2.1 0 13.4 130 68 198 5 1 141 135 138 13 0 6 1
3 134 0.002561 0 0.007682 0.002561 0 0 16 2.4 0 23 117 53 170 11 0 137 134 137 13 1 6 1
4 132 0.006515 0 0.008143 0 0 0 16 2.4 0 19.9 117 53 170 9 0 137 136 138 11 1 2 1
... ... ... ... ... ... .. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..
2121 140 0 0 0.007426 0 0 0 79 0.2 25 7.2 40 137 177 4 0 153 150 152 2 0 5 2
2122 140 0.000775 0 0.006971 0 0 0 78 0.4 22 7.1 66 103 169 6 0 152 148 151 3 1 5 2
2123 140 0.00098 0 0.006863 0 0 0 79 0.4 20 6.1 67 103 170 5 0 153 148 152 4 1 5 2
2124 140 0.000679 0 0.00611 0 0 0 78 0.4 27 7 66 103 169 6 0 152 147 151 4 1 5 2
2125 142 0.001616 0.001616 0.008078 0 0 0 74 0.4 36 5 42 117 159 2 1 145 143 145 1 0 1 1
[2126 rows x 23 columns]

Trouble with PyTorchLSTM in Thinc

Running the following code:
from thinc.api import chain, PyTorchLSTM, Sigmoid, Embed, with_padded, with_array2d
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
embedding_dim = 400
hidden_dim = 256
n_layers = 2
model = chain(
    Embed(nV=vocab_size, nO=embedding_dim),
    with_padded(PyTorchLSTM(nI=embedding_dim, nO=hidden_dim, depth=n_layers)),
    with_array2d(Sigmoid(nI=hidden_dim, nO=output_size))
)
model.initialize(X=train_x[:5], Y=train_y[:5])
I get this error: ValueError: Provided 'x' array should be 2-dimensional, but found 3 dimension(s).
Here are x[0] and y[0]:
[ 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
21025 308 6 3 1050 207 8 2138 32 1 171 57
15 49 81 5785 44 382 110 140 15 5194 60 154
9 1 4975 5852 475 71 5 260 12 21025 308 13
1978 6 74 2395 5 613 73 6 5194 1 24103 5
1983 10166 1 5786 1499 36 51 66 204 145 67 1199
5194 19869 1 37442 4 1 221 883 31 2988 71 4
1 5787 10 686 2 67 1499 54 10 216 1 383
9 62 3 1406 3686 783 5 3483 180 1 382 10
1212 13583 32 308 3 349 341 2913 10 143 127 5
7690 30 4 129 5194 1406 2326 5 21025 308 10 528
12 109 1448 4 60 543 102 12 21025 308 6 227
4146 48 3 2211 12 8 215 23] 1
I am relatively new to building these models, but I think the problem is that the output of the PyTorch LSTM layer has two dimensions. In a typical PyTorch LSTM you would stack the output from the LSTM layer (I think), but I'm not sure how to do that here. I assumed with_array2d would help, but it doesn't seem to.

ValueError shape mismatch: objects cannot be broadcast to a single shape

This is the code I plan to use to create a bar chart:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def bar1():
    df = pd.read_csv(r'C:\Users\Bhuwan Bhatt\Desktop\IP PROJECT\Book1.csv', encoding='unicode_escape')  # raw string: backslashes are not escapes
    x = np.arange(11)
    Countries = df['Country']
    STotalMed = df['SummerTotal']
    WTotalMed = df['WinterTotal']
    plt.bar(x - 0.25, STotalMed, width=.2, label='Total Medals by Countries in Summer', color='g')
    plt.bar(x + 0.25, WTotalMed, width=.2, label='Total Medals by Countries in Winter', color='r')
    plt.xticks(np.arange(11), Countries, rotation=30)
    plt.title('Olympics Data Analysis of Top 10 Countries', color='red', fontsize=10)
    plt.xlabel('Countries')
    plt.ylabel('Total Medals')
    plt.grid()
    plt.legend()
    plt.show()
bar1()
I get this error for some reason:
Traceback (most recent call last):
File "C:/Users/Bhuwan Bhatt/Desktop/dsd.py", line 19, in <module>
bar1()
File "C:/Users/Bhuwan Bhatt/Desktop/dsd.py", line 10, in bar1
plt.bar(x-0.25,STotalMed,width=.2, label='Total Medals by Countries in Summer',color='g')
File "C:\Users\Bhuwan Bhatt\AppData\Local\Programs\Python\Python38-32\lib\site-packages\matplotlib\pyplot.py", line 2471, in bar
return gca().bar(
File "C:\Users\Bhuwan Bhatt\AppData\Local\Programs\Python\Python38-32\lib\site-packages\matplotlib\__init__.py", line 1438, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\Bhuwan Bhatt\AppData\Local\Programs\Python\Python38-32\lib\site-packages\matplotlib\axes\_axes.py", line 2430, in bar
x, height, width, y, linewidth = np.broadcast_arrays(
File "<__array_function__ internals>", line 5, in broadcast_arrays
File "C:\Users\Bhuwan Bhatt\AppData\Local\Programs\Python\Python38-32\lib\site-packages\numpy\lib\stride_tricks.py", line 264, in broadcast_arrays
shape = _broadcast_shape(*args)
File "C:\Users\Bhuwan Bhatt\AppData\Local\Programs\Python\Python38-32\lib\site-packages\numpy\lib\stride_tricks.py", line 191, in _broadcast_shape
b = np.broadcast(*args[:32])
ValueError: shape mismatch: objects cannot be broadcast to a single shape
This is the CSV file I've been using:
Country SummerTimesPart Sumgoldmedal Sumsilvermedal Sumbronzemedal SummerTotal WinterTimesPart Wingoldmedal Winsilvermedal Winbronzemedal WinterTotal TotalTimesPart Tgoldmedal Tsilvermedal Tbronzemedal TotalMedal
Afghanistan 14 0 0 2 2 0 0 0 0 0 14 0 0 2 2
Algeria 13 5 4 8 17 3 0 0 0 0 16 5 4 8 17
Argentina 24 21 25 28 74 19 0 0 0 0 43 21 25 28 74
Armenia 6 2 6 6 14 7 0 0 0 0 13 2 6 6 14
Australasia 2 3 4 5 12 0 0 0 0 0 2 3 4 5 12
Australia 26 147 163 187 497 19 5 5 5 15 45 152 168 192 512
Austria 27 18 33 36 87 23 64 81 87 232 50 82 114 123 319
Azerbaijan 6 7 11 24 42 6 0 0 0 0 12 7 11 24 42
Bahamas 16 6 2 6 14 0 0 0 0 0 16 6 2 6 14
Bahrain 9 2 1 0 3 0 0 0 0 0 9 2 1 0 3
Barbados 12 0 0 1 1 0 0 0 0 0 12 0 0 1 1
Belarus 6 12 27 39 78 7 8 5 5 18 13 20 32 44 96
Belgium 26 40 53 55 148 21 1 2 3 6 47 41 55 58 154
Bermuda 18 0 0 1 1 8 0 0 0 0 26 0 0 1 1
Bohemia 3 0 1 3 4 0 0 0 0 0 3 0 1 3 4
Botswana 10 0 1 0 1 0 0 0 0 0 10 0 1 0 1
Brazil 22 30 36 63 129 8 0 0 0 0 30 30 36 63 129
British WestIndies 1 0 0 2 2 0 0 0 0 0 1 0 0 2 2
Bulgaria 20 51 87 80 218 20 1 2 3 6 40 52 89 83 224
Burundi 6 1 1 0 2 0 0 0 0 0 6 1 1 0 2
Cameroon 14 3 1 2 6 1 0 0 0 0 15 3 1 2 6
Canada 26 64 102 136 302 23 73 64 62 199 49 137 166 198 501
Chile 23 2 7 4 13 17 0 0 0 0 40 2 7 4 13
China 10 224 167 155 546 11 13 28 21 62 21 237 195 176 608
Colombia 19 5 9 14 28 2 0 0 0 0 21 5 9 14 28
Costa Rica 15 1 1 2 4 6 0 0 0 0 21 1 1 2 4
Ivory Coast 13 1 1 1 3 0 0 0 0 0 13 1 1 1 3
Croatia 7 11 10 12 33 8 4 6 1 11 15 15 16 13 44
Cuba 20 78 68 80 226 0 0 0 0 0 20 78 68 80 226
Column notes: SummerTimesPart is the number of times each country participated in the Summer Olympics.
WinterTimesPart is the number of times each country participated in the Winter Olympics.

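The error occurs because x has 11 positions while SummerTotal and WinterTotal have one value per row of the CSV (29 rows in the sample above), so plt.bar cannot broadcast them to a single shape. Just change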
x=np.arange(11)
to
x = np.arange(len(df))
and
plt.xticks(np.arange(11),Countries,rotation=30)
to
plt.xticks(x,Countries,rotation=30)
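Why this works: plt.bar broadcasts the x positions, the heights and the width against each other, and NumPy can only broadcast 1-D arrays of equal length (or of length 1). A minimal sketch of the failure and the fix, assuming 29 rows as in the CSV sample above (the zero heights are placeholders, not real medal counts):

```python
import numpy as np

heights = np.zeros(29)                # placeholder: one bar height per CSV row
x_fixed = np.arange(11)               # the original hard-coded positions
x_from_df = np.arange(len(heights))   # positions derived from the data length

# Shapes (11,) and (29,) are incompatible, so broadcasting fails just as plt.bar did.
mismatch_raised = False
try:
    np.broadcast_arrays(x_fixed - 0.25, heights)
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

# With one position per row the shapes agree and broadcasting succeeds.
x_b, h_b = np.broadcast_arrays(x_from_df - 0.25, heights)
print(x_b.shape, h_b.shape)  # (29,) (29,)
```

Deriving x from len(df) keeps the positions, the heights and the tick labels in step however many rows the CSV has.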
