Load Google Sheet CSV as dictionary in Python

I have a public Google Sheet with 2 sheets: https://docs.google.com/spreadsheets/d/14hFn00O9632n96Z2xGWvfrcY-K4kHiOGR02Rx7dsj54/edit#gid=447738801
I know how to download it as a CSV with wget (if it had only one sheet), but is there a simple way to get sheet 1 and sheet 2 as dictionaries, or as CSV files (one file per sheet) that I can then parse?
The two sheets have different gids.
In the end I want each header as a key and the values below that header as its values.

Use the requests library to download each sheet, and write the response content to a file.
Working implementation:
import requests

sheets = {
    'sheet1': 'https://docs.google.com/spreadsheets/d/14hFn00O9632n96Z2xGWvfrcY-K4kHiOGR02Rx7dsj54/export?format=csv&id=14hFn00O9632n96Z2xGWvfrcY-K4kHiOGR02Rx7dsj54&gid=0',
    'sheet2': 'https://docs.google.com/spreadsheets/d/14hFn00O9632n96Z2xGWvfrcY-K4kHiOGR02Rx7dsj54/export?format=csv&id=14hFn00O9632n96Z2xGWvfrcY-K4kHiOGR02Rx7dsj54&gid=447738801',
}
for name, url in sheets.items():
    response = requests.get(url)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    with open(f'{name}.csv', 'wb') as csvfile:
        csvfile.write(response.content)
This will save each sheet in a file (sheet1.csv and sheet2.csv in this case). Note that I got the link for each sheet just by downloading it as CSV from a browser and copying the download link.
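Instead of copying each link from the browser, the export link can also be built from the spreadsheet ID and each sheet's gid. This is a sketch under the assumption (based on the links above) that `export?format=csv&gid=…` is enough for a publicly shared sheet and that the extra `id=` parameter is not required; `sheet_csv_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

SPREADSHEET_ID = "14hFn00O9632n96Z2xGWvfrcY-K4kHiOGR02Rx7dsj54"


def sheet_csv_url(spreadsheet_id, gid):
    """Return the CSV export link for one sheet (tab) of a Google Sheet."""
    # Assumption: format=csv plus the tab's gid is all the export endpoint needs
    query = urlencode({"format": "csv", "gid": gid})
    return f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?{query}"


# One entry per tab; the gids are the ones from the links in the question
sheets = {
    "sheet1": sheet_csv_url(SPREADSHEET_ID, 0),
    "sheet2": sheet_csv_url(SPREADSHEET_ID, 447738801),
}
```

The resulting `sheets` dict can be dropped into the download loop above unchanged; adding a third tab only needs its gid.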
You can then convert each file to a dictionary with Python's built-in csv module (for example csv.DictReader).
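To get the shape asked for in the question (each header as a key, the values below it as a list), `csv.DictReader` rows can be folded into a dict of columns. A minimal sketch; `csv_to_columns` is a hypothetical helper name, and it assumes the header names in a sheet are unique (a repeated header would collapse into one key). All values stay strings, since the csv module does no type conversion.

```python
import csv
import io


def csv_to_columns(text):
    """Map each header to the list of values beneath it."""
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames is None for empty input, hence the fallback to []
    columns = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            # Short rows get None for the missing cells (DictReader's restval)
            columns[name].append(row[name])
    return columns
```

For a saved file, `csv_to_columns(open("sheet1.csv", encoding="utf-8").read())` works; for a downloaded response, `csv_to_columns(response.content.decode("utf-8"))`, assuming the export is UTF-8 encoded.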

You can also use pandas to load the CSV and then convert it into a dict (or anything else):
import pandas as pd

# Read data from file 'filename.csv'
data = pd.read_csv("filename.csv")
# {header: [values...]}; use data.to_dict('series') for {header: Series}
columns = data.to_dict('list')
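pandas can also read each export link directly, since `pd.read_csv` accepts URLs as well as paths and file-like objects, which skips the intermediate files. A sketch; `sheets_to_dicts` is a hypothetical helper, and `sources` can be the sheet1/sheet2 URL dict from the requests answer.

```python
import pandas as pd


def sheets_to_dicts(sources):
    """Read each CSV source into {header: [values...]} without saving it first."""
    # sources maps a sheet name to a URL, a local path or a file-like object
    return {name: pd.read_csv(src).to_dict("list") for name, src in sources.items()}
```

Unlike the csv module, pandas infers column types, so numeric columns come back as numbers rather than strings.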

Related

Why did my code (that was supposed to put in column headers) wipe the whole Excel sheet blank with no headers?

I wrote this simple program to write column headers into the empty cells above a data table in an existing Excel .xlsx file. I don't get any errors when I run it, but when I open the file (which has a single sheet), the whole table is gone and the sheet is blank; not even the headers it was supposed to write are there. Can anyone help me figure out why this happened? I can get the data again; I just need this to work.
import pandas as pd
from openpyxl import load_workbook
headers = []
# code not shown, but just prompts user for column headers and saves in list 'headers'
#open .xlsx file
book = load_workbook(r'path.xlsx')
writer = pd.ExcelWriter(r'path.xlsx', engine='xlsxwriter')
#write column headers from list into empty cells
writer.columns = headers[:]
#save and close
writer.save()
writer.close()
book.close()
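The file ends up blank because pd.ExcelWriter(r'path.xlsx', engine='xlsxwriter') creates a brand-new workbook at that path and overwrites the existing file when it is saved: XlsxWriter can only create new files, it cannot modify existing ones. The workbook you loaded with openpyxl (book) is never connected to the writer, and assigning writer.columns only sets an unused attribute, so nothing is written. Instead, read the sheet into a DataFrame, set its columns, and write it back. You can try out this code: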
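import pandas as pd

# Read the Excel file (.xlsx). header=None because the sheet has no header row yet,
# so the first row is kept as data instead of being used as column names
book_df = pd.read_excel(r'path.xlsx', header=None)
# Drop completely empty rows such as the blank row above the table
# (note: this also drops any fully empty rows inside the table)
book_df = book_df.dropna(how='all')
# headers is a list of header names like ['header_1', 'header_2', 'header_3']
book_df.columns = headers
book_df.to_excel(r'modified_file_name.xlsx', index=False)
# To keep the same file name, close the file in Excel first, otherwise you may get a permission error
book_df.to_excel(r'path.xlsx', index=False)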
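If you would rather keep the rest of the workbook (cell formatting, other cells) than rewrite it from a DataFrame, openpyxl can open the existing file, fill in the empty header cells, and save it. A minimal sketch; `write_headers` is a hypothetical helper, and it assumes the headers go into row 1 starting at column A (change `header_row` if the empty row is elsewhere). The openpyxl documentation warns that images and charts in an existing file are lost when it is opened and saved, so try it on a copy first.

```python
from openpyxl import load_workbook


def write_headers(path, headers, header_row=1, sheet_name=None):
    """Write `headers` into row `header_row` of an existing workbook, keeping other cells."""
    wb = load_workbook(path)
    ws = wb[sheet_name] if sheet_name else wb.active
    # Fill the empty cells above the table, one header per column starting at A
    for col, name in enumerate(headers, start=1):
        ws.cell(row=header_row, column=col, value=name)
    wb.save(path)
```

Usage: `write_headers(r'path.xlsx', headers)`, with the file closed in Excel.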

python sqlite3 uploading data to DB from excel(xls/xlsx)

I'm trying to create a DB from an excel spreadsheet. I can fetch data from excel and display in the html page, but I am not able to store it in sqlite db.
A few ways you can try:
Save the Excel file as CSV, read the CSV in Python with the csv module, and save it to SQLite with sqlite3.
Read the Excel file into a pandas DataFrame with pandas.read_excel, and then save the DataFrame to SQLite with DataFrame.to_sql.
Read the Excel file directly from Python (for example with openpyxl) and save the data to SQLite.
I used the code below, which works, but it overwrites the file.
#import pandas software library
import pandas as pd
df = pd.read_excel(r'C:\Users\kmc487\PycharmProjects\myproject\Product List.xlsx')
#Print sheet1
print(df)
df.to_excel("output.xlsx", sheet_name="Sheet_1")
Below are the input file details:
My input file is in .xlsx format, but the file is stored as .xls (I need code for the .xlsx format).
The file has its heading in the second row (the first row is blank).
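Following the third option above, openpyxl can read the worksheet directly and sqlite3 can store it; skipping fully empty rows takes care of the blank first row. A minimal sketch; `xlsx_to_sqlite` and the `products` table name are hypothetical, it reads only the active sheet, and it assumes the file really contains .xlsx data and has a .xlsx extension (openpyxl does not read the old .xls format). Header cells are used as column names as-is, so they must be unique.

```python
import sqlite3

from openpyxl import load_workbook


def xlsx_to_sqlite(xlsx_path, conn: sqlite3.Connection, table):
    """Copy the active sheet into `table`, using the first non-empty row as the header."""
    # data_only=True returns the values Excel cached for formula cells
    ws = load_workbook(xlsx_path, data_only=True).active
    # Skip rows where every cell is empty, e.g. the blank first row
    rows = (row for row in ws.iter_rows(values_only=True)
            if any(value is not None for value in row))
    header = next(rows)
    columns = ", ".join(f'"{name}"' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # The remaining rows are inserted as-is; empty cells become NULL
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
```

Usage: `xlsx_to_sqlite("Product List.xlsx", sqlite3.connect("products.db"), "products")`. The columns are declared without types, so SQLite keeps numbers as numbers and text as text.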

Python save Excel .xlsx to CSV/XML and also save styling information for conversion back into .xlsx

My Python program converts Excel files (.xlsx) into CSV files using pandas' read_excel and to_csv functions, and at some point later the CSV is converted back into an Excel file. Preserving the data is fine, but of course all of the formatting and styling is lost. So I could use some help capturing that information so I can reapply it after converting the CSV back into an Excel file.
import pandas as pd

EXCEL_PATH_FROM = r'C:\absolute\path\to\excel.xlsx'
EXCEL_PATH_TO = r'C:\absolute\path\to\other\excel.xlsx'
CSV_PATH = r'C:\absolute\path\to\csv.csv'

# read excel and convert to csv
def saveData():
    read_excel = pd.read_excel(EXCEL_PATH_FROM)
    print("writing csv...")
    read_excel.to_csv(CSV_PATH, index=False, header=True)

# get csv data and import that data into an excel file
def createFromData():
    csv = pd.read_csv(CSV_PATH)
    # engine='xlsxwriter' needs the xlsxwriter package installed;
    # the with-block saves and closes the file (ExcelWriter.save() was removed in pandas 2.0)
    with pd.ExcelWriter(EXCEL_PATH_TO, engine='xlsxwriter') as excel:
        csv.to_excel(excel, index=False)
Some ideas I had were to save the Excel file as XML and store the format and style information as attributes, or to create both a CSV and an XML file from the Excel file (one for the data and one for the styling). One problem I have is figuring out how to access that information in the first place.
Are there any packages that support Python 3 (I'm currently using 3.8) that could help simplify this process? I dug through openpyxl's documentation; it has some stylesheet classes, but I don't think they are meant to be used directly, and I couldn't figure out how to use them.

Is it possible to append data to an xls file in Python?

I am trying to add a large dataset to an existing xls spreadsheet.
I'm currently writing to it with a pandas DataFrame and the .to_excel() function, but this erases the existing data in the (multi-sheet) workbook. The existing spreadsheet is very large and complex, and it also interacts with several other files, so I can't convert it to .xlsx or read and rewrite all of the data, as some answers to other questions suggest. I want the data I'm adding to be pasted starting from a set row in an existing sheet.
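You can use the library xlsxwriter (https://xlsxwriter.readthedocs.io) to write spreadsheets, but note that it only creates new .xlsx files: it cannot open, modify, or append to an existing workbook, and it does not write the .xls format.
Code example (this creates a new file):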
import xlsxwriter

name = "MyFile.xlsx"
workbook = xlsxwriter.Workbook(name)  # creates (or overwrites) MyFile.xlsx
worksheet = workbook.add_worksheet()
# Python 3 strings are already Unicode, so no .decode("utf-8") is needed
worksheet.write("A1", "Incident category")
worksheet.write("B1", "Longitude")
worksheet.write("C1", "Latitude")
workbook.close()

Python: How to handle excel data from web without saving file

I'm new to Python and having trouble manipulating Excel data in Python.
Here's my situation: I'm using requests to get an .xls file from a web server. After that I write the content to an Excel file and open it with xlrd. I'm only interested in one value in that file, and I'm retrieving thousands of files from different URLs.
I want to know how I could handle the content I get from requests in some other way, rather than creating a new file.
I've included my code below; comments on how I could improve it are welcome. Also, it doesn't work, since I'm trying to save new content into an already-created Excel file (and I couldn't figure out how to delete the contents of that file so my code would work, even if it's not efficient).
import requests
import xlrd

d = {}
for year in string_of_years:
    for month in string_of_months:
        dls = "http://.../name_year_month.xls"
        resp = requests.get(dls)
        with open('temp.xls', 'wb') as output:
            output.write(resp.content)
        workbook = xlrd.open_workbook('temp.xls')
        worksheet = workbook.sheet_by_name(mysheet_name)
        num_rows = worksheet.nrows
        for k in range(num_rows):
            if condition_im_looking_for:  # placeholder for the actual test on row k
                d[key_year_month] = worksheet.cell_value(k, 0)
                break
xlrd.open_workbook can take the file's contents (as bytes) through its file_contents argument instead of a file name. Your code can pass the downloaded XLS data directly, rather than creating a file and passing its name.
Try this:
# UNTESTED
resp = requests.get(dls)
workbook = xlrd.open_workbook(file_contents=resp.content)
Reference: xlrd.open_workbook documentation
Alternatively, save the file and delete it with os.remove at the end of each loop iteration, once you are done with it.
import os

# ... your code that writes and reads temp.xls ...
os.remove('temp.xls')  # path to the temporary file
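If you do keep a temporary file, the standard library's tempfile module can pick a unique path, and a try/finally guarantees the cleanup even when parsing fails. A minimal sketch; `with_temp_file` is a hypothetical helper, and `reader` stands for whatever opens the file, for example `xlrd.open_workbook`, which by default should load the whole workbook into memory so the file can be deleted afterwards.

```python
import os
import tempfile


def with_temp_file(data, reader, suffix=".xls"):
    """Write `data` to a temporary file, return reader(path), then delete the file."""
    # mkstemp creates the file and returns an open OS-level descriptor plus its path
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return reader(path)
    finally:
        os.remove(path)  # runs even if reader raises
```

Usage inside the loop: `workbook = with_temp_file(resp.content, xlrd.open_workbook)`. mkstemp is used rather than NamedTemporaryFile because on Windows an open NamedTemporaryFile cannot be reopened by name, which the reader would need to do.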
