I have a data frame that looks like this:
import pandas as pd

data = {'State': ['24', '24', '24', '24', '24', '24', '24', '24', '24', '24', '24', '24'],
        'County code': ['001', '001', '001', '001', '002', '002', '002', '002', '003', '003', '003', '003'],
        'TT code': ['123', '123', '123', '123', '124', '124', '124', '124', '125', '125', '125', '125'],
        'BLK code': ['221', '221', '221', '221', '222', '222', '222', '222', '223', '223', '223', '223'],
        'Age Code': ['1', '1', '2', '2', '2', '2', '2', '2', '2', '1', '2', '1']}
df = pd.DataFrame(data)
Essentially, I want to keep only the rows for TT codes where every Age Code is 2 and there are no 1s. So I want the data frame to contain only:
'State': ['24', '24', '24', '24'],
'County code': ['002','002','002','002',],
'TT code': ['124','124','124','124',],
'BLK code': ['222','222','222','222'],
'Age Code': ['2','2','2','2']
Is there a way to do this?
IIUC, you want to keep only the TT code groups in which every Age Code value is '2'?
You can use groupby.transform('all') on the boolean Series:
df[df['Age Code'].eq('2').groupby(df['TT code']).transform('all')]
output:
  State County code TT code BLK code Age Code
4    24         002     124      222        2
5    24         002     124      222        2
6    24         002     124      222        2
7    24         002     124      222        2
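To make the one-liner easier to follow, here is a minimal sketch that splits it into steps on a smaller, made-up frame (the six rows below are illustrative, not the asker's data). It also shows groupby.filter as an alternative that should give the same rows.

```python
import pandas as pd

# Smaller illustrative frame: only the two columns that matter here.
df = pd.DataFrame({
    'TT code': ['123', '123', '124', '124', '125', '125'],
    'Age Code': ['1', '2', '2', '2', '2', '1'],
})

# Step 1: boolean Series, True where the age code is '2'.
is_two = df['Age Code'].eq('2')

# Step 2: per TT code, check whether all values are True and
# broadcast that single answer back onto every row of the group.
all_two = is_two.groupby(df['TT code']).transform('all')

# Step 3: use the broadcast mask to select rows.
result = df[all_two]

# Alternative: let groupby.filter keep or drop whole groups.
alt = df.groupby('TT code').filter(lambda g: g['Age Code'].eq('2').all())

print(result)
```

transform('all') returns a Series aligned with the original index, which is what makes it usable directly as a row mask.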
This should work.
df111['Age Code'] = "2"
I am just wondering why strings were chosen for values that are integers.
I am trying to iterate through a list and split each row so that I can perform a function on a specific element of the split.
What I want is something like this, so I can grab each element by position, e.g. x[2] = 220:
['2', '325', '220', '1.0']
What I get is this (split by character):
for row in range(len(pln)):
    for j in range(len(pln[row])):
        print( pln[row][j], end="")
        x = [pln[row][j].split()]
        print (x)
2[['2']]
[['', '']]
3[['3']]
2[['2']]
5[['5']]
[['', '']]
2[['2']]
2[['2']]
0[['0']]
[['', '']]
1[['1']]
.[['.']]
0[['0']]
[['\n']]
pln = (before iteration as list)
['2 325 220 1.0\n', '2 600 200 3.3\n', '2 325 100 3.3\n', '2 600 120 5.5\n', '2 600 125 5.5\n', '2 325 100 3.4']
pln = (after iteration)
2 325 220 1.0
2 600 200 3.3
2 325 100 3.3
2 600 120 5.5
2 600 125 5.5
2 325 100 3.4
Here is a solution:
lst = ['2 325 220 1.0\n', '2 600 200 3.3\n', '2 325 100 3.3\n', '2 600 120 5.5\n', '2 600 125 5.5\n', '2 325 100 3.4']
result_list = []
for i in lst:
    k = i.split()
    result_list.append(k)
print(result_list)
#Output:
[['2', '325', '220', '1.0'], ['2', '600', '200', '3.3'], ['2', '325', '100', '3.3'], ['2', '600', '120', '5.5'], ['2', '600', '125', '5.5'], ['2', '325', '100', '3.4']]
You can access an element like:
#result_list[row_number][element index]
print(result_list[2][3]) #fourth element in third row
#Output
3.3
You can iterate all rows like:
for row in result_list:
    print(row)
#Output:
['2', '325', '220', '1.0']
['2', '600', '200', '3.3']
['2', '325', '100', '3.3']
['2', '600', '120', '5.5']
['2', '600', '125', '5.5']
['2', '325', '100', '3.4']
You can iterate any column like:
for row in result_list:
    print(row[1]) #This will give second column
#Output:
325
600
325
600
600
325
Stop printing when doing your splitting. Print after the columns have been split.
pln = ['2 325 220 1.0\n', '2 600 200 3.3\n', '2 325 100 3.3\n', '2 600 120 5.5\n', '2 600 125 5.5\n', '2 325 100 3.4']
rows = [line.split() for line in pln]
for row in rows:
    print('\t'.join(row))
You can then access each row by its index
second_row = rows[1]
Then access each column by index
third_column = second_row[2]
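As for why the loop in the question split by character: pln[row][j] indexes into a string, which yields a single character, so split() never sees the whole line. A minimal sketch with two of the lines from the question; the int conversion at the end is only an assumption about what "perform a function on a specific element" might mean.

```python
pln = ['2 325 220 1.0\n', '2 600 200 3.3\n']

line = pln[0]

# Indexing a string returns one character, not one field.
first_char = line[0]

# Splitting the whole line on whitespace returns the fields
# and discards the trailing newline.
x = line.split()

# Assumed follow-up step: treat the third field as an integer.
value = int(x[2])

print(first_char, x, value)
```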
This code will store the individual characters in new_list. This is all I understood from your question; if you need anything else, do ask.
given_list = ['2','325','220','1.0']
new_list = list()
length = len(given_list)
for i in range(length):
    for j in given_list[i]:
        new_list.append(j)
    if i+1 != length:
        new_list.append(",")
print(new_list)
Output: ['2', ',', '3', '2', '5', ',', '2', '2', '0', ',', '1', '.', '0']
I have the following in Manager:
from django.conf import settings
from django.db import models
from django.db.models import F, Func, Value
from django.db.models.functions import Cast


class NullIf(Func):
    template = "NULLIF(%(expressions)s, '')"


class MySiteManager(models.Manager):
    def get_queryset(self):
        qws = MySiteQuerySet(self.model, using=self._db).filter(
            some_id=settings.BASE_SOME_ID).annotate(
            # This is made for sorting by short labels as by numeric values
            short_label_numeric=Cast(
                NullIf(Func(
                    F('short_label'),
                    Value(r'^(\D+)|(\w+)'),
                    Value(''),
                    Value('g'),
                    function='regexp_replace')),
                models.BigIntegerField())
        ).order_by('short_label_numeric', 'short_label')
        for q in qws:
            print(q.short_label, end='\n')
        return qws
Output of print values looks like:
1
10
100
101
102
103
104
105
106
107
108
109
11
110
111
112
113
114
115
116
117
118
119
12
120
121
122
123
124
125
126
127
128
129
13
130
131
132
133
134
135
136
137
138
139
14
140
141
142
143
144
145
146
147
148
149
15
150
151
152
153
154
155
156
157
158
159
16
17
18
19
20
200c
21
22
23
24
25
26
260
261
262
263
264
265fs
266fs
267c
268c
269c
27
273c
274c
275c
276c
28
29
2c
30
302
31
32c
33c
34
35c
36
37
38
3c
4
5
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
524
6
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
7
701
702
703
704
705
706
707
708
709
710
8
801
802
803
804
805
806
807
808
809
810
9
901
902
S1
S10
S11
S12
S13
S14
S15
S16
S17
S18
S19
S2
S20
S3
S4
S5
S6
S7
S8
S9
And my question:
How can I build a queryset whose output looks like, e.g., 1 2 3 3c 4 5 6 6c ... 264 265fs 266fs 267c 268c 269c ... S1 S2 S3 S4? Does anyone have any ideas?
The main idea is to order by the numeric part and then by the character part of the label. I can't reproduce and test this, but the solution may look like the following.
First, here is the SQL:
SELECT
    (regexp_matches(short_label, '^\d+'))[1]::numeric AS ln,
    regexp_matches(short_label, '^\D+') AS ls,
    short_label
FROM YOUR_APP_TABLENAME
ORDER BY 1, 2, 3;
Annotation in the ORM:
For the first SQL expression I create a custom Func:
In [1]: from myapp.models import *

In [2]: from django.db.models import F, Func, Value
   ...:
   ...: class StartNumeric(Func):
   ...:     function = 'REGEXP_MATCHES'
   ...:     template = "(%(function)s(%(expressions)s, '^\d+'))[1]::int"
   ...:
   ...: qs = Ingredient.objects.annotate(
   ...:     ln=StartNumeric('short_label'),
   ...:     ls=Func('short_label', Value('^\D+'), function='regexp_matches'),
   ...: ).values('ln').order_by('ln', 'ls', 'short_label')
   ...:
In [3]: print(qs.query)
SELECT (REGEXP_MATCHES("myapp_ingredient"."short_label", '^\d+'))[1]::int AS "ln" FROM "myapp_ingredient" ORDER BY "ln" ASC, regexp_matches("myapp_ingredient"."short_label", ^\D+) ASC, "myapp_ingredient"."short_label" ASC
In [4]: data = qs.values_list('short_label', flat=True)
...: print(list(data))
...:
...:
['1', '2c', '3c', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32c', '33c', '34', '35c', '36', '37', '38', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '148', '149', '150', '151', '152', '153', '154', '155', '156', '157', '158', '159', '200c', '260', '261', '262', '263', '264', '265fs', '266fs', '267c', '268c', '269c', '273c', '274c', '275c', '276c', '302', '501', '502', '503', '504', '505', '506', '507', '508', '509', '510', '511', '512', '513', '514', '515', '516', '517', '518', '519', '520', '521', '522', '524', '601', '602', '603', '604', '605', '606', '607', '608', '609', '610', '611', '612', '613', '614', '615', '616', '617', '618', '619', '620', '621', '622', '623', '701', '702', '703', '703', '704', '705', '706', '707', '708', '709', '710', '801', '802', '803', '804', '805', '806', '807', '808', '809', '810', '901', '902', 'aaaa', 'ddd', 'ddeee', 'rrrrr', 'S1', 'S10', 'S11', 'S12', 'S13', 'S14', 'S15', 'S16', 'S17', 'S18', 'S19', 'S2', 'S20', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'vvvv', 'zzzz']
Hope it helps.
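For readers who want to check the ordering idea without a database, here is a rough pure-Python sketch of the same three-part sort (leading number, leading non-digit prefix, full label). The helper name order_key and the sample labels are made up for illustration, and putting labels without a leading number last mirrors PostgreSQL's default of sorting NULLs last in ascending order.

```python
import re

def order_key(label):
    """Hypothetical helper mimicking ORDER BY ln, ls, short_label."""
    num = re.match(r'\d+', label)     # leading digits, like '^\d+'
    prefix = re.match(r'\D+', label)  # leading non-digits, like '^\D+'
    return (
        num is None,                      # labels without a number go last
        int(num.group()) if num else 0,   # numeric part compared as a number
        prefix.group() if prefix else '', # character prefix
        label,                            # full label as the final tie-breaker
    )

labels = ['10', '2c', 'S1', '1', 'S10', '200c', 'S2']
ordered = sorted(labels, key=order_key)
print(ordered)
```

Note that labels starting with a letter are still compared as plain text here, so S10 comes before S2, just as in the query output above.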
How about sorting the output using a natural sort:
import re
_nsre = re.compile(r'(\d+)')
def natural_sort_key(s):
    return [int(text) if text.isdigit() else text.lower()
            for text in re.split(_nsre, s)]
s = "1 10 2 100 101 102 103 104 105 106 107 108 109 11 110 111 112 113 114 115 116 117 118 119 12 120 121 122 123 124 125 126 127 128 129 13 130 131 132 133 134 135 136 137 138 139 14 140 141 142 143 144 145 146 147 148 149 15 150 151 152 153 154 155 156 157 158 159 16 17 18 19 20 200c 21 22 23 24 25 26 260 261 262 263 264 265fs 266fs 267c 268c 269c 27 273c 274c 275c 276c 28 29 2c 30 302 31 32c 33c 34 35c 36 37 38 3c 4 5 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 524 6 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 7 701 702 703 704 705 706 707 708 709 710 8 801 802 803 804 805 806 807 808 809 810 9 901 902 S1 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S2 S20 S3 S4 S5 S6 S7 S8 S9"
list1 = s.split(' ')
list1.sort(key=natural_sort_key)
Output list1:
['1', '2', '2c', '3c', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32c', '33c', '34', '35c', '36', '37', '38', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '148', '149', '150', '151', '152', '153', '154', '155', '156', '157', '158', '159', '200c', '260', '261', '262', '263', '264', '265fs', '266fs', '267c', '268c', '269c', '273c', '274c', '275c', '276c', '302', '501', '502', '503', '504', '505', '506', '507', '508', '509', '510', '511', '512', '513', '514', '515', '516', '517', '518', '519', '520', '521', '522', '524', '601', '602', '603', '604', '605', '606', '607', '608', '609', '610', '611', '612', '613', '614', '615', '616', '617', '618', '619', '620', '621', '622', '623', '701', '702', '703', '704', '705', '706', '707', '708', '709', '710', '801', '802', '803', '804', '805', '806', '807', '808', '809', '810', '901', '902', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10', 'S11', 'S12', 'S13', 'S14', 'S15', 'S16', 'S17', 'S18', 'S19', 'S20']
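The same key can be applied to objects rather than plain strings, for example to the rows of an already evaluated queryset. A minimal sketch, using SimpleNamespace objects as hypothetical stand-ins for model instances with a short_label attribute; note that this sorts in Python, not in the database.

```python
import re
from types import SimpleNamespace

_nsre = re.compile(r'(\d+)')

def natural_sort_key(s):
    # Digit runs become ints, everything else becomes lowercase text.
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

# Hypothetical stand-ins for model instances.
items = [SimpleNamespace(short_label=label)
         for label in ['S10', '3c', '10', 'S2', '3', '265fs', '1']]

# Sort the objects by the natural key of their label.
ordered = sorted(items, key=lambda obj: natural_sort_key(obj.short_label))
labels = [obj.short_label for obj in ordered]
print(labels)
```

Sorting in Python returns a list, so any later queryset operations (further filtering, pagination in the database) would no longer be available on the result.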
I'm trying to scrape the content from this URL which contains multiple tables. The desired output would be:
NAME                                  FG%    FT%    3PM  REB  AST  STL  BLK  TO  PTS  SCORE
Team Jackson (0-8)                    .4313  .7500   21   71   34   11   12  15  189  1-8-0
Team Keyrouze (4-4)                   .4441  .8090   31  130   71   18   13  45  373  8-1-0
Nutz Vs. Draymond Green (4-4)         .4292  .8769   30   86   66   15    9  28  269  3-6-0
Team Pauls 2 da Wall (3-5)            .4784  .8438   40  123   64   18   20  30  316  6-3-0
Team Noey (2-6)                       .4350  .7679   21  125   62   20    9  33  278  7-2-0
YOU REACH, I TEACH (2-5-1)            .4810  .7432   20  114   56   30    7  50  277  2-7-0
Kris Kaman His Pants (5-3)            .4328  .8000   20   74   59   20    5  27  238  3-6-0
Duke's Balls In Daniels Face (3-4-1)  .5000  .7045   42  139   38   27   22  30  303  6-3-0
Knicks Tape (5-3)                     .5000  .8152   34  143   92   12    9  47  397  4-5-0
Suck MyDirk (5-3)                     .4734  .8814   29  106   86   22   17  40  435  5-4-0
In Porzingod We Trust (4-4)           .4928  .7222   27  180   95   16   16  46  423  7-2-0
Team Aguilar (6-1-1)                  .4718  .7053   28  177   65   12   35  48  413  2-7-0
Team Li (7-0-1)                       .4714  .8118   35  134   74   17   17  47  368  6-3-0
Team Iannetta (4-4)                   .4527  .7302   22  125   90   20   13  44  288  3-6-0
If it's too difficult to format the tables like that, I'd like to know how I can scrape all the tables? My code to scrape all rows is like this:
tableStats = soup.find('table', {'class': 'tableBody'})
rows = tableStats.findAll('tr')
for row in rows:
    print(row.string)
But it only prints the value "TEAM" and nothing else... Why doesn't it contain all the rows in the table?
Thanks.
Instead of looking for the table tag, you should look for the rows directly with a more dependable class, such as linescoreTeamRow. This code snippet does the trick:
from bs4 import BeautifulSoup
import requests
a = requests.get("http://games.espn.com/fba/scoreboard?leagueId=224165&seasonId=2017")
soup = BeautifulSoup(a.text, 'lxml')
# searching for the rows directly
rows = soup.findAll('tr', {'class': 'linescoreTeamRow'})
# you will need to isolate elements in the row for the table
for row in rows:
    print(row.text)
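On the "why" part of the question: in Beautiful Soup, .string only returns text when a tag has a single child (or a single chain of single children); for a row with several cells it is None. That probably explains why only "TEAM" was printed, though this is a guess about the page's markup. A minimal sketch on made-up HTML standing in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one single-cell row and one multi-cell row.
html = """
<table>
  <tr><td>TEAM</td></tr>
  <tr><td>Team Li</td><td>.4714</td><td>.8118</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# One <td> with one string inside: .string can resolve it.
single = rows[0].string

# Three <td> children: .string is ambiguous, so it is None.
multi = rows[1].string

# Reading each cell separately works for any number of cells.
cells = [td.get_text(strip=True) for td in rows[1].find_all("td")]

print(single, multi, cells)
```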
Found a way to exactly get the 2-D matrix I specified in the question. It's stored as the list teams.
Code:
from bs4 import BeautifulSoup
import requests
source_code = requests.get("http://games.espn.com/fba/scoreboard?leagueId=224165&seasonId=2017")
plain_text = source_code.text
soup = BeautifulSoup(plain_text, 'lxml')
teams = []
rows = soup.findAll('tr', {'class': 'linescoreTeamRow'})
# Creates a 2-D matrix.
for row in range(len(rows)):
    team_row = []
    columns = rows[row].findAll('td')
    for column in columns:
        team_row.append(column.getText())
    print(team_row)
    # Add each team to a teams matrix.
    teams.append(team_row)
Output:
['Team Jackson (0-10)', '', '.4510', '.8375', '41', '135', '101', '23', '11', '50', '384', '', '5-4-0']
['YOU REACH, I TEACH (3-6-1)', '', '.4684', '.7907', '22', '169', '103', '22', '10', '32', '342', '', '4-5-0']
['Nutz Vs. Draymond Green (4-6)', '', '.4552', '.8372', '30', '157', '68', '15', '16', '39', '356', '', '2-7-0']
["Jesse's Blue Balls (4-5-1)", '', '.4609', '.7576', '47', '158', '71', '30', '20', '38', '333', '', '7-2-0']
['Team Noey (4-6)', '', '.4763', '.8261', '42', '164', '70', '25', '29', '44', '480', '', '5-4-0']
['Suck MyDirk (6-3-1)', '', '.4733', '.8403', '54', '160', '132', '23', '11', '47', '544', '', '4-5-0']
['Kris Kaman His Pants (5-5)', '', '.4569', '.8732', '53', '138', '105', '27', '21', '53', '465', '', '6-3-0']
['Team Aguilar (6-3-1)', '', '.4433', '.7229', '40', '202', '68', '30', '22', '54', '452', '', '3-6-0']
['Knicks Tape (6-3-1)', '', '.4406', '.8824', '52', '172', '108', '24', '13', '49', '513', '', '6-3-0']
['Team Iannetta (4-6)', '', '.5321', '.6923', '24', '146', '94', '32', '16', '60', '428', '', '3-6-0']
['In Porzingod We Trust (6-4)', '', '.4694', '.6364', '37', '216', '133', '31', '21', '77', '468', '', '4-5-0']
['Team Keyrouze (6-4)', '', '.4705', '.8854', '51', '135', '108', '25', '17', '43', '550', '', '5-4-0']
['Team Li (8-1-1)', '', '.4369', '.8182', '57', '203', '130', '34', '22', '54', '525', '', '6-3-0']
['Team Pauls 2 da Wall (5-5)', '', '.4780', '.5970', '27', '141', '47', '19', '25', '28', '263', '', '3-6-0']
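Each row above has 13 cells, two of them empty, while the desired table in the question has 11 columns. Assuming the empty strings are just spacer cells (an assumption, not something confirmed by the page), a possible next step is to drop them and pair the remaining values with the column names from the question:

```python
# Column names taken from the desired output in the question.
headers = ['NAME', 'FG%', 'FT%', '3PM', 'REB', 'AST', 'STL', 'BLK', 'TO', 'PTS', 'SCORE']

# One row copied from the output above; the real `teams` list holds all of them.
teams = [
    ['Team Li (8-1-1)', '', '.4369', '.8182', '57', '203', '130',
     '34', '22', '54', '525', '', '6-3-0'],
]

records = []
for team_row in teams:
    # Assumption: empty strings are spacer cells, not missing statistics.
    values = [cell for cell in team_row if cell != '']
    # Pair each remaining value with its column name.
    records.append(dict(zip(headers, values)))

print(records[0]['NAME'], records[0]['PTS'])
```

If a real statistic could ever be empty, filtering by cell position instead of by value would be the safer choice.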