I'm working with pygsheets (http://pygsheets.readthedocs.io/en/latest/index.html), a wrapper around the Google Sheets API v4.
I have a small script where I am trying to select a JSON file to upload to a Google Sheets worksheet. The file name is made up as follows:
spreadsheetname_year_month_xxx.json
The code:
import tkinter as tk
import tkFileDialog
import pygsheets
root = tk.Tk()
root.withdraw()
file_path = tkFileDialog.askopenfilename()
print file_path
file_name = file_path.split('/')[-1]
print file_name
file_name_segments = file_name.split('_')
spreadsheet = file_name_segments[0]
worksheet = file_name_segments[1]+'_'+file_name_segments[2]
gc = pygsheets.authorize(outh_file='client_secret_xxxxxx.apps.googleusercontent.com.json')
ssheet = gc.open(spreadsheet)
ws = ssheet.add_worksheet(worksheet(ssheet,str(raw_input(file_path))))
The file path leads to a generated json file that looks like:
{
    "count": 12,
    "results": [
        {
            "case": "abc1",
            "case_name": "invalid",
            "case_type": "invalid"
        },
        {
            "case": "abc2",
            "case_name": "invalid",
            "case_type": "invalid"
        },
............
I am getting:
File "upload_to_google_sheets.py", line 27, in <module>
    ws = ssheet.add_worksheet(worksheet(ssheet,str(raw_input(file_path))))
TypeError: 'unicode' object is not callable
As you can see I'm trying to instantiate a worksheet with the json data. What am I doing wrong?
The jsonSheet param is not for the values of the worksheet; it is for specifying the properties of the worksheet, hence it should follow that format.
As directly converting JSON to a spreadsheet is rather ambiguous, you will first need to convert it to a numpy array (matrix) or a pandas DataFrame. Then you can use set_dataframe or just update_values to update the spreadsheet.
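For instance, a minimal sketch (assuming the JSON layout shown above, and reusing the file_path, spreadsheet and worksheet variables from the question):
import json
import pandas as pd
import pygsheets

gc = pygsheets.authorize(outh_file='client_secret_xxxxxx.apps.googleusercontent.com.json')

# Load the generated JSON and flatten the "results" list into a DataFrame
with open(file_path) as f:
    payload = json.load(f)
df = pd.DataFrame(payload['results'])

ssheet = gc.open(spreadsheet)
ws = ssheet.add_worksheet(worksheet)  # pass the title only; size properties are optional
ws.set_dataframe(df, (1, 1))          # write the DataFrame starting at cell A1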
Related
I am trying to convert a CSV file to JSON, but there is a header in my CSV that is empty. Is there a way to name it when outputting it to JSON?
Example data
"" Calories Fat Sodium
Bread 100 10 23
I got this code from GeeksforGeeks:
import csv
import json

# Function to convert a CSV to JSON
# Takes the file paths as arguments
def make_json(csvFilePath, jsonFilePath):
    # create a dictionary
    data = {}

    # Open a csv reader called DictReader
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)

        # Convert each row into a dictionary
        # and add it to data
        for rows in csvReader:
            # Assuming a column named 'No' to
            # be the primary key
            key = rows['']
            data[key] = rows

    # Open a json writer, and use the json.dumps()
    # function to dump data
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(data, indent=4))

# Driver Code
# Decide the two file paths according to your
# computer system
csvFilePath = r'Names.csv'
jsonFilePath = r'Names.json'

# Call the make_json function
make_json(csvFilePath, jsonFilePath)
I did this and it gets the first row, but I'm not sure how to rename it when I output it to JSON.
It appears as "":"Bread" in the JSON file.
key = rows['']
Thanks in advance if anyone can help!
Edit: Expected output
{
    "Food": "Bread",
    "Calories": "45",
    "Fat (g)": "0",
    "Carb. (g)": "11",
    "Fiber (g)": "0",
    "Protein": "0",
    "Sodium": "10"
}
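One way to handle this is to rename the empty fieldname on the DictReader before building the dictionary. A sketch, assuming the empty header is the first column and "Food" is the name you want (as in the expected output):
import csv
import json

def make_json(csvFilePath, jsonFilePath, firstColumnName='Food'):
    data = {}
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        # Reading .fieldnames consumes the header row; rename the empty
        # column before any data rows are read
        csvReader.fieldnames = [firstColumnName if name == '' else name
                                for name in csvReader.fieldnames]
        for row in csvReader:
            data[row[firstColumnName]] = row
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        json.dump(data, jsonf, indent=4)

make_json(r'Names.csv', r'Names.json')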
(beginner) I'm attempting to copy values from one spreadsheet to another using Python. I'm using gspread, but I can't seem to figure out how to copy the values from my first spreadsheet to the other. How can I copy values from the first spreadsheet and paste them into the other using Python?
Here is the updated code:
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/spreadsheets','https://www.googleapis.com/auth/drive.file','https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('sheetproject-956vg2854670.json',scope)
client = gspread.authorize(creds)

spreadsheetId = "1CkeZ8Xw4Xmho-mrFP9teHWVT-unkApzMSUFql5KkrGI"
sourceSheetName = "python test"
destinationSheetName = "python csv"

spreadsheet = client.open_by_key(spreadsheetId)
sourceSheetId = spreadsheet.worksheet("python test")._properties['0']  # gid value in the url
destinationSheetId = spreadsheet.worksheet("python csv")._properties['575313690']  # gid value in the url
body = {
    "requests": [
        {
            "copypaste": {
                "source": {
                    "sheetId": 0,
                    "startRowIndex": 0,
                    "endRowIndex": 20,
                    "startColumnIndex": 0,
                    "endcolumnIndex": 1
                },
                "destination": {
                    "sheetId": 575313690,
                    "startRowIndex": 0,
                    "endRowIndex": 20,
                    "startColumnIndex": 0,
                    "endcolumnIndex": 1
                },
                "pasteType": "Paste_Normal"
            }
        }
    ]
}
res = spreadsheet.batch_update(body)
print(res)
You want to copy the values from one sheet to another sheet in a Google Spreadsheet.
You want to achieve this using gspread with Python.
You have already been able to get and put values in a Google Spreadsheet using the Google Sheets API.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
In this answer, I would like to propose using batch_update for copying the values from one sheet to another sheet in the Spreadsheet. In this case, your goal can be achieved with one API call.
Sample script:
In this sample script, the authorization code has been removed; only the script for copying values from one sheet to another sheet in the Spreadsheet is shown. When you use it, please add the authorization script and then run it.
spreadsheetId = "###" # Please set the Spreadsheet ID.
sourceSheetName = "Sheet1" # Please set the sheet name of source sheet.
destinationSheetName = "Sheet2" # Please set the sheet name of destination sheet.
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)
sourceSheetId = spreadsheet.worksheet(sourceSheetName)._properties['sheetId']
destinationSheetId = spreadsheet.worksheet(destinationSheetName)._properties['sheetId']
body = {
    "requests": [
        {
            "copyPaste": {
                "source": {
                    "sheetId": sourceSheetId,
                    "startRowIndex": 0,
                    "endRowIndex": 5,
                    "startColumnIndex": 0,
                    "endColumnIndex": 5
                },
                "destination": {
                    "sheetId": destinationSheetId,
                    "startRowIndex": 0,
                    "endRowIndex": 5,
                    "startColumnIndex": 0,
                    "endColumnIndex": 5
                },
                "pasteType": "PASTE_VALUES"
            }
        }
    ]
}
res = spreadsheet.batch_update(body)
print(res)
Note:
In this sample script, the values of cells A1:E5 in "Sheet1" are copied to cells A1:E5 in "Sheet2".
In this case, the range is given as a GridRange.
This is a simple sample script. So please modify this for your actual situation.
References:
batch_update
Method: spreadsheets.batchUpdate
CopyPasteRequest
GridRange
If I misunderstood your question and this was not the direction you want, I apologize.
1) Import openpyxl library as xl.
2) Open the source excel file using the path in which it is located.
Note: The path should be a string and use double backslashes (\\) instead of single backslashes (\). E.g. the path should be C:\\Users\\Admin\\Desktop\\source.xlsx instead of C:\Users\Admin\Desktop\source.xlsx
3) Open the required worksheet to copy using its index. The index of worksheet 'n' is 'n-1'. For example, the index of worksheet 1 is 0.
4) Open the destination excel file and the active worksheet in it.
5) Calculate the total number of rows and columns in source excel file.
6) Use two for loops (one for iterating through rows and another for iterating through columns of the excel file) to read the cell value in source file to a variable and then write it to a cell in destination file from that variable.
7) Save the destination file.
Here is a Python-only code example from How to copy over an Excel sheet to another workbook in Python:
import openpyxl as xl
path1 = 'C:\\Users\\Xukrao\\Desktop\\workbook1.xlsx'
path2 = 'C:\\Users\\Xukrao\\Desktop\\workbook2.xlsx'
wb1 = xl.load_workbook(filename=path1)
ws1 = wb1.worksheets[0]
wb2 = xl.load_workbook(filename=path2)
ws2 = wb2.create_sheet(ws1.title)
for row in ws1:
    for cell in row:
        ws2[cell.coordinate].value = cell.value
wb2.save(path2)
Using only gspread
For copying the whole sheet to another spreadsheet:
import gspread
client = gspread.authorize(<put here credentials>)

source_gsheet_url = '<put here url>'
source_worksheet_name = '<put here name>'
dest_spreadsheet_id = '<put here destination spreadsheet id>'

ws = client.open_by_url(source_gsheet_url).worksheet(source_worksheet_name)
ws.copy_to(dest_spreadsheet_id)
While working with small zip files (about 8 MB) containing 25 MB of CSV files, the code below works exactly as it should. As soon as I attempt to download larger files (a 45 MB zip file containing a 180 MB CSV), the code breaks and I get the following error message:
(venv) ufulu#ufulu awr % python get_awr_ranking_data.py
https://api.awrcloud.com/v2/get.php?action=get_topsites&token=REDACTED&project=REDACTED Client+%5Bw%5D&fileName=2017-01-04-2019-10-09
Traceback (most recent call last):
  File "get_awr_ranking_data.py", line 101, in <module>
    getRankingData(project['name'])
  File "get_awr_ranking_data.py", line 67, in getRankingData
    processRankingdata(rankDateData['details'])
  File "get_awr_ranking_data.py", line 79, in processRankingdata
    domain.append(row.split("//")[-1].split("/")[0].split('?')[0])
AttributeError: 'float' object has no attribute 'split'
My goal is to download data for 170 projects and save the data to sqlite DB.
Please bear with me as I am a novice in the field of programming and Python. I would greatly appreciate any help fixing the code below, as well as any other suggestions and improvements to make the code more robust and Pythonic.
Thanks in advance
from dotenv import dotenv_values
import requests
import pandas as pd
from io import BytesIO
from zipfile import ZipFile
from sqlalchemy import create_engine

# SQL Alchemy setup
engine = create_engine('sqlite:///rankingdata.sqlite', echo=False)

# Excerpt from the initial API Call
data = {'projects': [{'name': 'Client1',
                      'id': '168',
                      'frequency': 'daily',
                      'depth': '5',
                      'kwcount': '80',
                      'last_updated': '2019-10-01',
                      'keywordstamp': 1569941983},
                     {
                         "depth": "5",
                         "frequency": "ondemand",
                         "id": "194",
                         "kwcount": "10",
                         "last_updated": "2019-09-30",
                         "name": "Client2",
                         "timestamp": 1570610327
                     },
                     {
                         "depth": "5",
                         "frequency": "ondemand",
                         "id": "196",
                         "kwcount": "100",
                         "last_updated": "2019-09-30",
                         "name": "Client3",
                         "timestamp": 1570610331
                     }
                     ]}

# setup
api_url = 'https://api.awrcloud.com/v2/get.php?action='
urls = []        # processed URLs
urlbacklog = []  # URLs that didn't return a downloadable file

# API Call to receive URL containing downloadable zip and csv
def getRankingData(project):
    action = 'get_dates'
    response = requests.get(''.join([api_url, action]),
                            params=dict(token=dotenv_values()['AWR_API'],
                                        project=project))
    response = response.json()
    action2 = 'topsites_export'
    rankDateData = requests.get(''.join([api_url, action2]),
                                params=dict(token=dotenv_values()['AWR_API'],
                                            project=project,
                                            startDate=response['details']['dates'][0]['date'],
                                            stopDate=response['details']['dates'][-1]['date']))
    rankDateData = rankDateData.json()
    print(rankDateData['details'])
    urls.append(rankDateData['details'])
    processRankingdata(rankDateData['details'])

# API Call to download and unzip csv data and process it in pandas
def processRankingdata(url):
    content = requests.get(url)
    # {"response_code":25,"message":"Export in progress. Please come back later"}
    if "response_code" not in content:
        f = ZipFile(BytesIO(content.content))
        # print(f.namelist()) to get all filenames in Zip
        with f.open(f.namelist()[0], 'r') as g:
            rankingdatadf = pd.read_csv(g)
        rankingdatadf = rankingdatadf[rankingdatadf['Search Engine'].str.contains("Google")]
        domain = []
        for row in rankingdatadf['URL']:
            domain.append(row.split("//")[-1].split("/")[0].split('?')[0])
        rankingdatadf['Domain'] = domain
        rankingdatadf['Domain'] = rankingdatadf['Domain'].str.replace('www.', '')
        rankingdatadf = rankingdatadf.drop(columns=['Title', 'Meta description', 'Snippet', 'Page'])
        print(rankingdatadf['Search Engine'][0])
        writeData(rankingdatadf)
    else:
        urlbacklog.append(url)
        pass

# Finally write the data to database
def writeData(rankingdatadf):
    table_name_from_file = project['name']
    check = engine.has_table(table_name_from_file)
    print(check)  # boolean
    if check == False:
        rankingdatadf.to_sql(table_name_from_file, con=engine)
        print(project['name'] + ' ...Done')
    else:
        print(project['name'] + ' ... already in DB')

for project in data['projects']:
    getRankingData(project['name'])
The problem seems to be the split call on a float and not necessarily the download. Try changing line 79
from
domain.append(row.split("//")[-1].split("/")[0].split('?')[0])
to
domain.append(str(str(str(row).split("//")[-1]).split("/")[0]).split('?')[0])
It looks like you're trying to parse the network location portion of the URL here. You can use urllib.parse to make this easier instead of chaining all the splits:
from urllib.parse import urlparse
...
for row in rankingdatadf['URL']:
    domain.append(urlparse(row).netloc)
I think a malformed URL is causing you issues. To diagnose, try:
for row in rankingdatadf['URL']:
    try:
        domain.append(urlparse(row).netloc)
    except Exception:
        exit(row)
Looks like you figured it out above: you have a database entry with a NULL value for the URL field. Not sure what your fidelity requirements for this data set are, but you might want to enforce database rules for the URL field, or use pandas to drop rows where URL is NaN.
rankingdatadf = rankingdatadf.dropna(subset=['URL'])
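Putting the two suggestions together, a minimal sketch (reusing the rankingdatadf DataFrame from the question):
from urllib.parse import urlparse

# Drop rows with a missing URL first, then parse out the network location
rankingdatadf = rankingdatadf.dropna(subset=['URL'])
rankingdatadf['Domain'] = [urlparse(row).netloc for row in rankingdatadf['URL']]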
I am trying to write data to an Excel file. Every link that meets the requirement in the if-test should be written out to the Excel file, starting at (0,0) and continuing downwards in the same column: (1,0), (2,0), (3,0), etc. The problem is that data only gets written to the Excel file once the if-test has run for the last time.
The JSON file:
[
    {
        "beds": "3",
        "bath": "2",
        "link": "https://www.realestate.com/5619-w-michelle-dr-glendale-az-85308--790",
        "price": "382,76"
    },
    {
        "beds": "3",
        "bath": "1",
        "link": "https://www.realestate.com/5619-w-michelle-dr-glendale-az-85308--790",
        "price": "382,76"
    },
    {
        "beds": "2",
        "bath": "3",
        "link": "https://www.realestate.com/5619-w-michelle-dr-glendale-az-85308--790",
        "price": "382,76"
    },
    {
        "beds": "3",
        "bath": "2",
        "link": "https://www.realestate.com/5619-w-michelle-dr-glendale-az-85308--790",
        "price": "382,76"
    }
]
Python code (this is what I tried):
import json
import re
from xlwt import Workbook

class Products:
    def __init__(self):
        self.list_links = []

    def product(self, index):
        for k, v in index.items():
            if k == 'link':
                link = v
            if k == 'bath':
                bath = v
                fl_bath = int(bath)
        wb = Workbook()
        sheet1 = wb.add_sheet('sheet1')
        sheet1.col(0).width = 7000
        if fl_bath >= 2:
            length = len(self.list_links)
            sheet1.write(length, 0, link)
            self.list_links.append(link)
            print(link)
        wb.save("python.xls")

with open('./try.json') as json_file:
    data = json.load(json_file)

i = 0
p = Products()
while i <= 3:
    dicts = data[i]
    p.product(dicts)
    i += 1
It should write the links downwards, one per row, for every link that meets the requirements:
row1: https://www.realestate.com/5619-w-michelle-dr-glendale-az-85308--790
row2: https://www.realestate.com/5619-w-michelle-dr-glendale-az-85308--790
row3: https://www.realestate.com/5619-w-michelle-dr-glendale-az-85308--790
I get this output (Excel file):
row1:
row2:
row3:
https://www.realestate.com/5619-w-michelle-dr-glendale-az-85308--790
Three of the links meet the criterion, but only the last one in the iteration gets written to the Excel file. Are they being overwritten in some way after each iteration? Any good tips on how to fix this?
You can simplify your code since the requirement is a simple greater-than comparison:
import json
from xlwt import Workbook

with open('inputFile.json') as json_file:
    data = json.load(json_file)

wb = Workbook()
firstSheet = wb.add_sheet('sheet1')
firstSheet.col(0).width = 7000

row = -1
for item in data:
    if int(item['bath']) >= 2:
        row = row + 1
        firstSheet.write(row, 0, item['link'])

wb.save("outputFile.xls")
It seems like you are overwriting the Excel file on each call. Move the workbook definition code into the __init__ of the Products class, move the save into a separate class method, and call it once after processing your dicts.
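A minimal sketch of that refactor (assuming the same JSON structure as above; the save method name is illustrative):
import json
from xlwt import Workbook

class Products:
    def __init__(self):
        self.list_links = []
        self.wb = Workbook()                      # one workbook for the object's lifetime
        self.sheet1 = self.wb.add_sheet('sheet1')
        self.sheet1.col(0).width = 7000

    def product(self, index):
        if int(index['bath']) >= 2:
            self.sheet1.write(len(self.list_links), 0, index['link'])
            self.list_links.append(index['link'])

    def save(self, filename="python.xls"):
        self.wb.save(filename)                    # save once, after all rows are written

with open('./try.json') as json_file:
    data = json.load(json_file)

p = Products()
for entry in data:
    p.product(entry)
p.save()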
The problem here is that on every iteration you are creating a new workbook and sheet, writing one link, and saving it as "python.xls" each time. You should create the workbook outside the function, only once, and have the product function write a link to it. Something like this:
import json
from xlwt import Workbook

wb = Workbook()
sheet1 = wb.add_sheet('sheet1')
sheet1.col(0).width = 7000

class Products:
    def __init__(self):
        self.list_links = []

    def product(self, index):
        for k, v in index.items():
            if k == 'link':
                link = v
            if k == 'bath':
                bath = v
                fl_bath = int(bath)
        if fl_bath >= 2:
            length = len(self.list_links)
            sheet1.write(length, 0, link)
            self.list_links.append(link)
            print(link)

with open('./try.json') as json_file:
    data = json.load(json_file)

i = 0
p = Products()
while i <= 3:
    dicts = data[i]
    p.product(dicts)
    i += 1

wb.save("python.xls")
I would like to export the Elasticsearch data returned by my query to a CSV file. I have been looking everywhere for how to do this, but all my attempts have failed.
My query is below:
import elasticsearch
import unicodedata
import csv

es = elasticsearch.Elasticsearch(["9200"])

# this returns up to 100 rows, adjust to your needs
res = es.search(index="search", body={
    "_source": ["CN", "NCR", "TRDT"],
    "query": {
        "bool": {
            "should": [
                {"wildcard": {
                    "CN": "TEST*"
                }}
            ]
        }
    }
}, size=10)
sample = res['hits']['hits']

# then open a csv file, and loop through the results, writing to the csv
with open('outputfile.tsv', 'wb') as csvfile:
    # we use TAB delimited, to handle cases where freeform text may have a comma
    filewriter = csv.writer(csvfile, delimiter='\t',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    # create column header row
    filewriter.writerow(["CN", "NCR", "TRDT"])  # change the column labels here
    # fill columns 1, 2, 3 with your data
    col1 = hit["some"]["deeply"]["nested"]["field"].decode('utf-8')  # replace these nested key names with your own
    col1 = col1.replace('\n', ' ')
    # col2 = , col3 = , etc...
    for hit in sample:
        filewriter.writerow(col1)
Could someone fix this code up so it works, please? I have been spending hours trying to make it work.
The error I am getting when running this is below:
Traceback (most recent call last):
  File "C:/Users/.PyCharmCE2017.2/config/scratches/trtr.py", line 30, in <module>
    filewriter.writerow(["CN", "NCR", "TRDT"])  # change the column labels here
TypeError: a bytes-like object is required, not 'str'
Thank you in advance.
Replace wb with w in with open('outputfile.tsv', 'wb'), since you are using Python 3.x. In text mode the csv writer expects str, not bytes.
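For completeness, a sketch of what the corrected writing loop could look like, reusing the sample list from the question (assuming the three fields live in each hit's _source; adjust the keys to your documents):
with open('outputfile.tsv', 'w', newline='', encoding='utf-8') as csvfile:
    filewriter = csv.writer(csvfile, delimiter='\t',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    # create column header row
    filewriter.writerow(["CN", "NCR", "TRDT"])
    for hit in sample:
        source = hit['_source']
        # .get() avoids a KeyError when a document is missing a field
        filewriter.writerow([source.get('CN', ''),
                             source.get('NCR', ''),
                             source.get('TRDT', '')])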