I have a double-nested dictionary where the value returned is a list of characteristics about a person. I want to write each value in the list to Google Sheets, so I have used gspread. Here's my code:
for person in list_id:
    index = 2
    for key, value in enrich_dict.items():
        for keytwo, valuetwo in value.items():
            row = [valuetwo[0], valuetwo[1], valuetwo[2], valuetwo[3], person]
            sheet.insert_row(row, index)
            index += 1
For some reason, valuetwo[3] is never inserted into the sheet; I just get 4 columns of data. No matter what data I test with (I have tried using simple strings), this is always the case: the 4th value is skipped.
Can you post an example of your input and expected output?
Currently I am working on my own project and I am attempting to create a new column based on an if-statement condition, in which I am trying to return a sliced element from a column. I am having issues indexing all rows and then grabbing the specific slice I want.
Overview:
Dataset name = fivb_2019
'Result' is an object (string) column holding the final score of the volleyball match. I am trying to take a sliced piece of this column to create a new column, e.g. set_one_points.
Row value examples:
Rows 5-7 in the dataframe contain the value below.
0-2 (10-21, 16-21)
I am trying to get the character slice [5:7] for the home team and [8:10] for the away team:
for i in fivb_2019['Result']:
    print(i[5:7])
When I run this for loop I am able to produce all of the results I want for my column, but when I put this loop in my function it returns only the value '10' for the home team at [5:7] and '21' for the away team at [8:10]. Here is the function:
def fill_set_one(Result, teamid, home_teamid):
    for i in fivb_2019['Result']:
        if home_teamid == teamid:
            return i[5:7]
        else:
            return i[8:10]

fivb_2019['set_one_points'] = fivb_2019.apply(lambda x: fill_set_one(x['Result'],
                                                                     x['teamid'], x['home_teamid']), axis=1)
The function runs, but value_counts on my new column set_one_points returns only these two values:
fivb_2019['set_one_points'].value_counts()
'10'    111319
'21'    111236
I think that when I slice i[5:7] inside the function, it might only ever be grabbing characters five to seven of the first row, i.e. the string '10'. As the row below shows (row 223765 and others like it), '13' should be the returned result:
1-2 (13-21, 21-14, 11-15)
Other row examples:
2-1 (16-21, 21-18, 22-20)
2-1 (18-21, 23-21, 16-14)
2-0 (21-19, 21-19)
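To make the slicing concrete, here is how the character positions line up for one of the example rows above:
result = '1-2 (13-21, 21-14, 11-15)'
print(result[5:7])   # '13' -> home team, set one
print(result[8:10])  # '21' -> away team, set one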
I'm just confused about why I can't return all the results within the function to create my new column.
Thanks for the help in advance, I appreciate it.
Kyle
So I have this Google Sheets file, and I need to extract event data and create an Event model in Django. So far I have had no problem getting data from the API, but some of the fields in the spreadsheet are empty, so the API does not return those fields: for example, index 23 is complete, but in index 24 some fields are not defined. It is OK for me to enter empty data in the Django models; it does not matter at all.
WHAT I ACTUALLY WANT is: if array[22][1] is empty (which it is; array[22][0] is 'May 4'), then append a null value for that index. I wrote the code below, but it doesn't work. How do I implement this?
for row in range(index):
    for column in range(6):
        try:
            print(values[row][column])
        except IndexError:
            values[row][column].append('')
If row[column] is missing, you want to append to row, not row[column] itself (which we've already established is missing, and will get you a TypeError or something).
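For example, the loop from the question could be patched like this (a minimal sketch, keeping the same names):
for row in range(index):
    for column in range(6):
        try:
            print(values[row][column])
        except IndexError:
            # pad the row list itself, not the missing cell
            values[row].append('')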
Another option would be something like:
for row in values:
    if len(row) < 6:
        row.extend([''] * (6 - len(row)))
i.e. "for each row, if it's shorter than 6 items, add enough '' items to make up the difference".
The Python API for Google Sheets has a get method to fetch values from a spreadsheet, but it requires a range argument. That is, your code must be something like this:
sheets_service = service.spreadsheets().values()
data = sheets_service.get(spreadsheetId = _GS_ID, range = _SHEET_NAME).execute()
and you cannot omit the range argument, nor will a value of '' work, nor a value of 'Sheet1' or similar (unless there actually is a sheet named Sheet1).
What if I do not know the sheet name ahead of time? Can I reference the first or left-most sheet somehow? Failing that, is there a way to get a list of all the sheets? I have been looking at the API and have not found anything for that purpose, but this seems like such a basic need that I feel I'm missing something obvious.
You can retrieve the values and metadata of a spreadsheet using spreadsheets.get of the Sheets API. By using the fields parameter, you can retrieve various pieces of information about the spreadsheet.
Sample 1:
This sample retrieves the index, sheet ID, and sheet name of each sheet in the spreadsheet. In this case, index: 0 means the first sheet.
service.spreadsheets().get(spreadsheetId=_GS_ID, fields='sheets(properties(index,sheetId,title))').execute()
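The response can then be unpacked like this (a small sketch of my own; sheet_names is an illustrative name):
res = service.spreadsheets().get(spreadsheetId=_GS_ID, fields='sheets(properties(index,sheetId,title))').execute()
# sheet names in index order; the first entry is the left-most sheet
sheet_names = [s['properties']['title'] for s in res['sheets']]
print(sheet_names[0])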
Sample 2:
This sample retrieves the sheet name and the last row and last column of the data range, using a sheet index. When 0 is used for the sheet index, it means the first sheet.
res = service.spreadsheets().get(spreadsheetId=_GS_ID, fields='sheets(data/rowData/values/userEnteredValue,properties(index,sheetId,title))').execute()
sheetIndex = 0
sheetName = res['sheets'][sheetIndex]['properties']['title']
lastRow = len(res['sheets'][sheetIndex]['data'][0]['rowData'])
lastColumn = max([len(e['values']) for e in res['sheets'][sheetIndex]['data'][0]['rowData'] if e])
Reference:
spreadsheets.get
Convert column index into corresponding column letter
For the column, the thread above shows a method for converting from a column index to a letter.
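As a rough sketch of that conversion (my own version, not taken from the linked thread):
def col_index_to_letter(index):
    # convert a 1-based column index to an A1-notation letter: 1 -> 'A', 27 -> 'AA'
    letters = ''
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters

print(col_index_to_letter(lastColumn))  # e.g. convert lastColumn from Sample 2 above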
So I have a problem with gspread for Python 3. When I do something like:
x = worksheet.cell(1,1).value
print(x)
Then I get the value of cell (1, 1), which in my case is:
Nice
But when I do:
x = worksheet.col_values(1)
print(x)
Then I get all the results, as in:
'Nice', 'Cool','','','','','','','','','','','','','',''
And all the empty cells as well, which I don't understand, since I am asking just for values. Why do I get all the '' empty entries, and why are the other results in quotes? I would expect something like:
Nice
Cool
when I ask for the values of a column and those are the only values. Does anyone know how to get results like that?
According to the documentation at https://github.com/burnash/gspread it should work, but it does not.
You are getting all of the column data, contained in a list. It starts at row one and gives you all rows in that column to the bottom of the spreadsheet (1000 rows by default), including empty cells. The documentation tells you this:
col_values(col) Returns a list of all values in column col.
Empty cells in this list will be rendered as None.
This seems to have been changed to return empty strings instead, but the principle is the same.
To get just values, use a list comprehension:
x = [item for item in worksheet.col_values(1) if item]
Noting that the above will remove blank rows between items, which might cause misalignment if you try to work with multiple columns where row number is important. Since it's a list, individual items are accessed with:
for item in x:
    print(item)
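If row alignment matters across columns, a variation on the sketch above is to keep each value's row number alongside it:
# pair each non-empty value with its 1-based row number
numbered = [(i + 1, item) for i, item in enumerate(worksheet.col_values(1)) if item]
for row_number, item in numbered:
    print(row_number, item)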
Looking again at the gspread documentation, I was able to create a DataFrame and then obtain the column values:
import gspread
import pandas as pd
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())
sht2 = gc.open_by_url('https://docs.google.com/spreadsheets/d/<id>')
worksheet = sht2.worksheet("Sheet-name")
dataframe = pd.DataFrame(worksheet.get_all_records())
dataframe.head(3)
Note: Don't forget to set your sheet's sharing settings to "Anyone with the link" so that you can access the sheet from, e.g., Google Colab.
You can also create a while loop and do something like this. Let's say you want columns E to G: you can start the loop at x = 5 and end it at x = 7. Just make sure that you transpose the DataFrame at the end before printing it.
columns = []
x = 5
while x < 8:
    data = sheet.col_values(x)[1:]
    x += 1
    columns.append(data)

df = pd.DataFrame(columns).T
print(df)
The long (winded) version:
I'm gathering research data using Python. My initial parsing is ugly (but functional) code which gives me some basic information and turns my raw data into a format suitable for heavy-duty statistical analysis using SPSS. However, every time I modify the experiment, I have to dive into the analysis code.
For a typical experiment, I'll have 30 files, one for each unique user. Field count is fixed for each experiment (but can vary from one experiment to another, 10-20 fields). Files are typically 700-1000 records long with a header row. Record format is tab-separated (see the sample below, which has 4 integers, 3 strings, and 9 floats per record).
I need to sort my list into categories. In a 1000-line file, I could have 4-256 categories. Rather than trying to pre-determine how many categories each file has, I'm using the code below to count them. The integers at the beginning of each line dictate which category the float values in the row correspond to. Integer combinations can be modified by the string values to produce wildly different results, and multiple combinations can sometimes be lumped together.
Once they're in categories, the number crunching begins: I get statistical info (mean, SD, etc.) for each category for each file.
The essentials:
I need to parse data like the sample below into categories. Categories are combinations of the non-float values in each record. I'm also trying to come up with a dynamic (graphical) way to associate column combinations with categories; I will make a new post for this.
I'm looking for suggestions on how to do both.
# data is a list of tab-separated records
# fields is a list of my field names

# get a list of field types via gettype on our first row
# gettype is a function to get the type from a string without changing the data
fieldtype = [gettype(n) for n in data[1].split('\t')]

# get the indexes for fields that aren't floats
mask = [i for i, field in enumerate(fieldtype) if field != "float"]

# for each row of data [skipping the first and last empty lists] we split (on tabs)
# and take the ith element of that split, where i is taken from the list mask,
# which tells us which fields are not floats
records = [[row.split('\t')[i] for i in mask] for row in data[1:-1]]

# we now get a unique set of combos
# since set doesn't happily take a list of lists, we join each row of values
# together in a comma-separated string, so we end up with a list of strings
uniquerecs = set([",".join(row) for row in records])

print len(uniquerecs)
quit()
def gettype(s):
    try:
        int(s)
        return "int"
    except ValueError:
        pass
    try:
        float(s)
        return "float"
    except ValueError:
        return "string"
Sample Data:
field0 field1 field2 field3 field4 field5 field6 field7 field8 field9 field10 field11 field12 field13 field14 field15
10 0 2 1 Right Right Right 5.76765674196 0.0310912272139 0.0573603238282 0.0582901376612 0.0648936500524 0.0655294305058 0.0720571099855 0.0748289246137 0.446033755751
3 1 3 0 Left Left Right 8.00982745764 0.0313840132052 0.0576521406854 0.0585844966069 0.0644905497442 0.0653386429438 0.0712603578765 0.0740345755708 0.2641076191
5 19 1 0 Right Left Left 4.69440026591 0.0313852052224 0.0583165354345 0.0592403274967 0.0659404609478 0.0666070804916 0.0715314027001 0.0743022054775 0.465994962101
3 1 4 2 Left Right Left 9.58648184552 0.0303649003017 0.0571579895338 0.0580911765412 0.0634304670863 0.0640132919609 0.0702920967445 0.0730697946335 0.556525293
9 0 0 7 Left Left Left 7.65374257547 0.030318719717 0.0568551744109 0.0577785415066 0.0640577002605 0.0647226582655 0.0711459854908 0.0739256050784 1.23421547397
Not sure if I understand your question, but here are a few thoughts:
For parsing the data files, you usually use the Python csv module.
For categorizing the data you could use a defaultdict with the non-float fields joined as a key for the dict. Example:
from collections import defaultdict
import csv

reader = csv.reader(open('data.file', 'rb'), delimiter='\t')
data_of_category = defaultdict(list)
lines = [line for line in reader]
mask = [i for i, n in enumerate(lines[1]) if gettype(n) != "float"]
for line in lines[1:]:
    category = ','.join([line[i] for i in mask])
    data_of_category[category].append(line)
This way you don't have to calculate the categories in the first place and can process the data in one pass.
And I didn't understand the part about "a dynamic (graphical) way to associate column combinations with categories".
For at least part of your question, have a look at Named Tuples.
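A minimal sketch of the idea (field names invented for illustration, not taken from your data):
from collections import namedtuple

# each field in a record gets a readable attribute name
Record = namedtuple('Record', ['cat0', 'cat1', 'side', 'value'])

rec = Record(10, 0, 'Right', 5.76765674196)
print(rec.side, rec.value)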
Step 1: Use something like csv.DictReader to turn the text file into an iterable of rows.
Step 2: Turn that into a dict of first entry: rest of entries.
import csv

with open("...", "rb") as data_file:
    lines = csv.reader(data_file, some_custom_dialect)
    categories = {line[0]: line[1:] for line in lines}
Step 3: Iterate over the items() of the data and do something with each line.
for category, line in categories.items():
    do_stats_to_line(line)
Some useful answers already but I'll throw mine in as well. Key points:
Use the csv module
Use collections.namedtuple for each row
Group the rows using a tuple of int field values as the key
If your source rows are sorted by the keys (the integer column values), you could use itertools.groupby; a rough sketch follows the example below. This would likely reduce memory consumption. Given your example data, and the fact that your files contain no more than ~1000 rows, this is probably not an issue to worry about.
import csv
import pprint
from collections import defaultdict, namedtuple

def coerce_to_type(value):
    _types = (int, float)
    for _type in _types:
        try:
            return _type(value)
        except ValueError:
            continue
    return value

def parse_row(row):
    return [coerce_to_type(field) for field in row]

# datafile is the path to one of your tab-separated files
with open(datafile) as srcfile:
    data = csv.reader(srcfile, delimiter='\t')

    ## Read headers, create namedtuple
    headers = next(srcfile).strip().split('\t')
    datarow = namedtuple('datarow', headers)

    ## Wrap with parser and namedtuple
    data = (parse_row(row) for row in data)
    data = (datarow(*row) for row in data)

    ## Group by the leading integer columns
    grouped_rows = defaultdict(list)
    for row in data:
        integer_fields = [field for field in row if isinstance(field, int)]
        grouped_rows[tuple(integer_fields)].append(row)

## DO SOMETHING INTERESTING WITH THE GROUPS
pprint.pprint(dict(grouped_rows))
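For completeness, a rough sketch of the itertools.groupby variant mentioned above, assuming the row stream is already sorted by its integer fields:
from itertools import groupby

def integer_key(row):
    # the leading integer columns identify the category
    return tuple(field for field in row if isinstance(field, int))

# `data` is the stream of namedtuple rows built above (used here instead of the
# defaultdict loop); groupby only groups correctly if the stream is sorted by this key
for key, group in groupby(data, key=integer_key):
    rows = list(group)
    # ...compute per-category statistics on rows here...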
EDIT: You may find the code at https://gist.github.com/985882 useful.