I have a Google Sheets file from which I need to extract event data to create an Event model in Django. Getting data from the API works fine, but some of the cells in the spreadsheet are empty, so the API does not return those fields: for example, index 23 is complete, but at index 24 some fields are not defined. Storing empty data in the Django model is fine for me.
What I actually want: if array[22][1] is empty (which it is; array[22][0] is 'May 4'), append a null value at that index. I wrote the code below, but it doesn't work. How do I implement this?
for row in range(index):
    for column in range(6):
        try:
            print(values[row][column])
        except IndexError:
            values[row][column].append('')
If row[column] is missing, you want to append to the row, not to row[column] itself (which we've already established is missing, so values[row][column].append('') just raises another IndexError inside the except block). In other words, use values[row].append('').
Another option would be something like:
for row in values:
    if len(row) < 6:
        row.extend([''] * (6 - len(row)))
i.e. "for each row, if it's shorter than 6 items, add enough '' items to make up the difference".
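For completeness, here is a minimal runnable sketch of both approaches. The sample rows are made up and only stand in for what the Sheets API returns, and the helper names `pad_rows` and `pad_rows_try_except` are hypothetical; the width of 6 is taken from `range(6)` in the question.

```python
EXPECTED_COLUMNS = 6  # assumed column count, taken from range(6) in the question


def pad_rows(values, width=EXPECTED_COLUMNS, fill=''):
    """Pad every short row in place so each row has at least `width` items."""
    for row in values:
        if len(row) < width:
            row.extend([fill] * (width - len(row)))
    return values


def pad_rows_try_except(values, width=EXPECTED_COLUMNS, fill=''):
    """Same result, keeping the try/except shape of the original loop."""
    for row in values:
        for column in range(width):
            try:
                row[column]
            except IndexError:
                # Append to the row itself, not to the missing cell.
                row.append(fill)
    return values


# Hypothetical sample data: the second row came back with only one cell.
values = [
    ['May 3', 'Concert', '19:00', 'Hall A', 'Open', 'Free'],
    ['May 4'],
]
pad_rows(values)
print(values[1])  # ['May 4', '', '', '', '', '']
```

Both helpers modify the lists in place; rows that already have six or more items are left untouched.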
Currently I am working on my own project and I am attempting to create a new column that is based off an if statement condition in which I am trying to return an indexed element from a column. I am having issues indexing all rows and then grabbing the specific index I want.
Overview:
Dataset name = fivb_2019
'Result' is an object column and the final score of the volleyball match. I am trying to take an indexed element of this column to create a new column e.g.(set_one_points)
row value examples
rows 5-7 in the dataframe contain the value below
0-2 (10-21, 16-21)
I am trying to get the indexed value [5:7] for the home team and [8:10] for the away team
for i in fivb_2019['Result']:
    print(i[5:7])
When I run this for loop I am able to produce all of the results I desire for my column, but when I put this for loop in my function I return only the value 10 for the home team at index [5:7] and value 21 at index [8:10]. Here is the function:
def fill_set_one (Result, teamid,home_teamid):
    for i in fivb_2019['Result']:
        if home_teamid == teamid:
            return (i[5:7])
        else:
            return (i[8:10])

fivb_2019['set_one_points'] = fivb_2019.apply(lambda x: fill_set_one(x['Result'],
    x['teamid'], x['home_teamid']),axis=1)
The function runs, but the results from the value_counts of my new column [set_one_points] only return these values:
fivb_2019['set_one_points'].value_counts()
'10' 111319 times
'21' 111236 times
I think that when I index within the function with i[5:7], it might be grabbing only columns five to seven and the string index '10'. For example, in row 223765 and others, 13 should be the returned result:
1-2 (13-21, 21-14, 11-15)
Other row examples
2-1 (16-21, 21-18, 22-20)
2-1 (18-21, 23-21, 16-14)
2-0 (21-19, 21-19)
Just confused why I can't return all the results within the function to create my new column.
Thanks for the help in advance, I appreciate it.
Kyle
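One likely cause, going only by the code shown: the function ignores the `Result` value passed in for the current row and instead loops over the whole `fivb_2019['Result']` column, returning on the very first iteration. Every row therefore gets the slice of the first row's string ('10' or '21'). A minimal sketch of a fix, using a small made-up frame (`fivb_sample` and its team ids are hypothetical) in place of the real dataset:

```python
import pandas as pd


def fill_set_one(result, teamid, home_teamid):
    """Slice the set-one score out of this row's own Result string."""
    if home_teamid == teamid:
        return result[5:7]   # home team's points in set one
    return result[8:10]      # away team's points in set one


# Hypothetical sample rows; the Result strings are taken from the question.
fivb_sample = pd.DataFrame({
    'Result': ['0-2 (10-21, 16-21)', '1-2 (13-21, 21-14, 11-15)'],
    'teamid': [2, 3],
    'home_teamid': [1, 3],
})

fivb_sample['set_one_points'] = fivb_sample.apply(
    lambda x: fill_set_one(x['Result'], x['teamid'], x['home_teamid']),
    axis=1,
)
print(fivb_sample['set_one_points'].tolist())  # ['21', '13']
```

Note that the fixed slices [5:7] and [8:10] assume both set-one scores have two digits; a single-digit score would shift the positions, so splitting the string or using a regular expression may be safer on the full data.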
I have a CSV file with a column of integers that I'm reading, and I want to terminate the program if there is a duplicate value in the column, along with displaying the value that was found to be a duplicate. I am currently able to find out whether there are duplicates and terminate the program using:
for x in df.duplicated(['projectID']):  # projectID is the column header
    if x == True:
        sys.exit("ERROR: there is a duplicate projectID in the csv file. Terminating Program.")
but I want a way to tell the user which value is duplicated. This is where I'm stuck; I have no idea how to do so. I know there can be multiple duplicates, but I'm content with saying
sys.exit("ERROR: {0} is a duplicate projectID in the csv file. Terminating Program.".format(x))
to the first duplicate integer it finds. Any ideas for how the code would look?
CSV would look something like:
projectName, projectID
Alpha,1
Beta,2
Gamma,3
Delta,1
so the value '1' is a duplicate which I would like to display to the user.
Here's a way to do that:
if df.projectID.duplicated().any():
    print("There are some duplicates:")
    print(f"The first duplicate value of 'projectID' is {df[df.projectID.duplicated()].projectID.iloc[0]}")
The output is:
There are some duplicates:
The first duplicate value of 'projectID' is 1
To explain the last line:
This is the full line:
df[df.projectID.duplicated()].projectID.iloc[0]
It's composed of the following pieces:
Step 1: df.projectID.duplicated(): produces a Boolean series marking which values are duplicates (every occurrence after the first is marked True).
Step 2: df[<step-1>]: reduces the data frame to only the rows whose projectID is a duplicate.
Step 3: <step-2>.projectID: extracts the projectID series from the reduced data frame.
Step 4: <step-3>.iloc[0]: takes the value in the first location of the duplicate projectID series. This is the value you'd like to print.
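To tie this back to the error message in the question, here is a minimal sketch that wraps the steps above in a helper. The name `first_duplicate` is hypothetical, and the CSV text is the sample from the question (without the space after the comma in the header, so that the column is named exactly `projectID`).

```python
import io

import pandas as pd


def first_duplicate(df, column):
    """Return the first repeated value in `column`, or None if all are unique."""
    dupes = df.loc[df[column].duplicated(), column]
    return None if dupes.empty else dupes.iloc[0]


# Sample data from the question, read from a string instead of a file.
csv_text = "projectName,projectID\nAlpha,1\nBeta,2\nGamma,3\nDelta,1\n"
df = pd.read_csv(io.StringIO(csv_text))

dup = first_duplicate(df, 'projectID')
if dup is not None:
    message = f"ERROR: {dup} is a duplicate projectID in the csv file. Terminating Program."
    # In the real script, replace print with sys.exit(message) to terminate.
    print(message)
```

Returning None for the no-duplicates case keeps the check and the message in one place, so the caller decides whether to exit.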
I have a double-nested dictionary, where the value returned is a list with characteristics about a person. I want to write each value in the list to google sheets, so have used gspread. Here's my code:
for person in list_id:
    index = 2
    for key, value in enrich_dict.items():
        for keytwo, valuetwo in value.items():
            row = [valuetwo[0], valuetwo[1], valuetwo[2], valuetwo[3], person]
            sheet.insert_row(row, index)
            index += 1
For some reason, valuetwo[3] is never inserted into the sheet; I just get 4 columns of data. No matter what data I test with (I have tried using simple strings), this is always the case: the 4th value is skipped.
Can you post an example of your input and expected output?
I have a simple function to get all items from my sqlite database:
def get_items(self):
    stmt = "SELECT description FROM items"
    return [x[0] for x in self.conn.execute(stmt)]
It works well but I cannot figure out how to also print the row numbers along with each row's description. Right now it just prints description (which is a row with some text in it). How would I get the print output to be something like this?
1: the text in row 1
2: the text in row 2
3: etc etc etc
This is needed because eventually I will need to call upon a row number to delete a row entirely. So if Python receives input to delete row 10 in the sqlite database, I would need to be able to easily identify row 10. Kind of like if you right click + delete a row in excel and then the rest of the rows move up and become the previous number. I would like my program to work exactly like that.
Right now I think my table is only 1 column (the description column with the text). Do I need to add another column to identify row number? When I do SHOW TABLE in VSCode, I actually do see a column with the numbers labeled "#", but I'm not sure if that's really a part of the table or if it's just VSCode adding that for aesthetics.
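If you want to select the 101st row, you can use SELECT description FROM items LIMIT 1 OFFSET 100 (OFFSET counts the rows to skip, so OFFSET 101 would return the 102nd row). Add an ORDER BY clause, for example ORDER BY rowid, because the row order is not guaranteed without one. For deleting, DELETE FROM items LIMIT 1 OFFSET 100 only works if SQLite was compiled with the SQLITE_ENABLE_UPDATE_DELETE_LIMIT option; otherwise delete through the rowid, for example DELETE FROM items WHERE rowid = (SELECT rowid FROM items ORDER BY rowid LIMIT 1 OFFSET 100).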
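As a rough sketch of how this could look from Python, assuming `items` is an ordinary table (which has an implicit `rowid` column) and that the displayed numbers should always run 1, 2, 3, ... after a deletion. The helper names `get_numbered_items` and `delete_by_position` are hypothetical, and an in-memory database stands in for the real one.

```python
import sqlite3


def get_numbered_items(conn):
    """Return (position, description) pairs, with positions starting at 1."""
    rows = conn.execute("SELECT description FROM items ORDER BY rowid")
    return [(pos, row[0]) for pos, row in enumerate(rows, start=1)]


def delete_by_position(conn, position):
    """Delete the row currently displayed at `position` (1-based)."""
    conn.execute(
        "DELETE FROM items WHERE rowid = "
        "(SELECT rowid FROM items ORDER BY rowid LIMIT 1 OFFSET ?)",
        (position - 1,),  # OFFSET is the number of rows to skip
    )
    conn.commit()


# Hypothetical in-memory table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (description TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)", [("first",), ("second",), ("third",)]
)

delete_by_position(conn, 2)
for pos, desc in get_numbered_items(conn):
    print(f"{pos}: {desc}")
# 1: first
# 2: third
```

Because the positions come from `enumerate` rather than from a stored column, the remaining rows "move up" after a deletion without any renumbering in the table itself.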
I have looked at the documentation on hierarchical indexing in pandas. I tried testing it with data extracted from a URL, but I am clearly missing something:
# creating an array of rows
rows = []
# making a for loop to append every player from every 'td' instance
for r in container.select('tr'):
    rows.append([col.text.strip() for col in r.select('td')])
zipped = zip(*rows)
# first row needs to have the header of the table on the website
csvfile = pd.DataFrame(zipped)
csvfile.columns = rows[0]
I thought about creating an empty dataframe of 13 columns, but I am not sure whether that will solve my problem
What I am trying to do is to create 13 columns, where each column has some data, which I have already extracted.
EDIT:
The extracted data looks like:
What I want, more specifically, is to put the left side (Column A) as a row instead, and put the right side under each.