Scrape images from Excel file and store into dictionary - Python

I am trying to scrape images from an Excel file and store them in a dictionary. However, I am running into some issues with the package I am using. Below is my code.
# loading the Excel file and the sheet
pxl_doc = openpyxl.load_workbook(excel_file_path)
sheet = pxl_doc["Mud Motor Report "]
# calling the image loader
image_loader = SheetImageLoader(sheet)
# get the image (put the cell you need instead of 'A1')
image_number = 0
for cell in image_loader._images:
    image = image_loader.get(cell)
    dict_new["image_{}".format(image_number)] = image
    # showing the image
    # image.show()
    image_number += 1
The code above works fine; however, once I get to the 7th file in my loop I hit an unusual error:
ValueError: I/O operation on closed file.
I found this issue on another thread with no solution. The error is misleading, as the file is open and can be read: Reading image from xlsx python won't work - I/O operation on closed file
I was wondering if there is another way to scrape images from an excel file and put them into a dictionary. I would later insert this dictionary into a sql database.
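Since an .xlsx file is really just a zip archive, one workaround is to bypass openpyxl's image handling entirely and read the embedded pictures straight out of the archive with the standard library; every embedded picture is stored under xl/media/. A minimal sketch (the images_from_xlsx helper and the per-file image_N keys are invented for illustration; cell anchors are lost this way):

```python
import zipfile

def images_from_xlsx(path):
    """Return {'image_0': b'...', ...} for every picture embedded in the file."""
    images = {}
    with zipfile.ZipFile(path) as archive:
        # embedded pictures live under xl/media/ inside the xlsx archive
        names = sorted(n for n in archive.namelist() if n.startswith("xl/media/"))
        for number, name in enumerate(names):
            # copy the raw bytes out while the archive is still open
            images["image_{}".format(number)] = archive.read(name)
    return images
```

The returned dict maps names to raw bytes, which is also the form a SQL BLOB column expects; wrap an entry in io.BytesIO and PIL's Image.open if you need an actual image object.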

Related

How to save python notebook cell code to file in Colab

TLDR: How can I make a notebook cell save its own python code to a file so that I can reference it later?
I'm doing tons of small experiments where I make adjustments to Python code to change its behaviour, and then run various algorithms to produce results for my research. I want to save the cell code (the actual python code, not the output) into a new uniquely named file every time I run it so that I can easily keep track of which experiments I have already conducted. I found lots of answers on saving the output of a cell, but this is not what I need. Any ideas how to make a notebook cell save its own code to a file in Google Colab?
For example, I'm looking to save a file that contains the text of the entire snippet below:
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
The source of every cell is stored in the list variable In.
For example, you can print the latest cell with
print(In[-1]) # show itself
So you can easily save the content of In[-1] or In[-2] to wherever you want.
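A small helper built on that idea (the save_cell name and the timestamp-based naming scheme are just one possible choice) writes a cell's source to a uniquely named file:

```python
import time

def save_cell(source, prefix="experiment"):
    """Write the given cell source to a uniquely named file and return the filename."""
    filename = "{}_{}.py".format(prefix, int(time.time() * 1000))
    with open(filename, "w") as f:
        f.write(source)
    return filename

# inside the notebook, as the last line of a cell you want to record:
# save_cell(In[-1])
```

Calling save_cell(In[-1]) at the end of a cell records that cell's own source under a new name on every run.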
Posting one potential solution, but still looking for a better and cleaner option.
By defining the entire cell as a string, I can execute it and save it to a file with a separate command:
cell_str = '''
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
'''
exec(cell_str)
with open('cell.txt', 'w') as f:
    f.write(cell_str)
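Worth noting, assuming a Colab/IPython environment: the built-in %%writefile cell magic saves a cell's source without the quoting trick, though it only writes the cell and does not execute it:

```
%%writefile cell.txt
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
```

You would then run the saved code separately, e.g. with exec(open('cell.txt').read()).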

openpyxl blocking excel file after first read

I am trying to overwrite a value in a given cell using openpyxl. I have two sheets: one called Raw, which is populated by API calls, and a second called Data, which is fed off the Raw sheet. The two sheets have exactly identical shapes (columns/rows). I am comparing the two to see if there is a bay assignment in Raw; if there is, I grab it into the Data sheet. If both Raw and Data are missing the value in that column, I run a complex algorithm (irrelevant to this question) to assign a bay number based on logic.
I am having problems with rewriting the Excel file using openpyxl.
Here's an example of my code:
data_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayData')
raw_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayRaw')
# grab rows where there is no bay assignment in a specific column
no_bay_res = data_df[data_df['Bay assignment'].isnull()].reset_index()
book = load_workbook("Algo Build v23test.xlsx")
sheet = book["MondayData"]
for index, reservation in no_bay_res.iterrows():
    idx = int(reservation['index'])
    if pd.isna(raw_df.iloc[idx, 13]):
        continue
    else:
        value = raw_df.iat[idx, 13]
        data_df.iloc[idx, 13] = value
        sheet.cell(idx + 2, 14).value = int(value)
book.save("Algo Build v23test.xlsx")
book.close()
print(value)  # 302
Now the problem is that book.close() does not seem to be working: the book is still callable in Python. It overwrites the Excel file totally fine. However, if I try to run these two lines again
data_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayData')
raw_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayRaw')
I get datasets full of NULL values, except for the value that was replaced (image attached).
However, if I open that Excel file manually from the folder, save it (Ctrl+S), and try running the code again, it works properly. Weirdest problem.
I need to loop the code above for Monday through Sunday, so I need it to be able to read the data again without manually resaving the file.
For some reason, pandas reads all the formulas as NaN after the file has been used in the script by openpyxl, until the file has been opened, saved, and closed again. (Most likely this is because openpyxl never evaluates formulas and drops the cached formula results when it saves, while pandas only reads those cached values; Excel restores them on the next save.) Here's the code that does the open/save/close from within the script. However, it is rather slow.
import xlwings as xl
def df_from_excel(path, sheet_name):
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path, sheet_name)
I got the same problem; the only workaround I found was to terminate excel.exe manually from the Task Manager. After that, everything went fine.
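If the weekly loop is the end goal, another option is to avoid re-reading the half-saved file at all: read each day's sheets once, copy the values in memory, and save the workbook a single time. A sketch under the question's assumptions (the fill_missing_bays helper is made up; the bay column is taken to be the 14th, as in the code above):

```python
import pandas as pd
from openpyxl import load_workbook

def fill_missing_bays(path, days=("Monday",)):
    """Copy bay assignments from each <day>Raw sheet into the matching
    <day>Data sheet, reading the file once and saving once at the end."""
    book = load_workbook(path)
    for day in days:
        data_df = pd.read_excel(path, sheet_name="{}Data".format(day))
        raw_df = pd.read_excel(path, sheet_name="{}Raw".format(day))
        sheet = book["{}Data".format(day)]
        missing = data_df[data_df["Bay assignment"].isnull()].index
        for idx in missing:
            value = raw_df.iat[idx, 13]
            if pd.notna(value):
                # +2 skips the header row; column 14 holds the bay assignment
                sheet.cell(idx + 2, 14).value = int(value)
    book.save(path)
```

Because nothing is read back between writes, the NaN issue never surfaces; the trade-off is that formulas in the file still won't be recalculated until Excel itself opens and saves it.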

Problem while inserting a text file/an image into SQLite using Flask: Python

I'm trying to insert a text file and an image into an SQLite database, but the code I'm using doesn't work: even though no errors are displayed, it won't store anything in the BLOB image column of SQLite. I have been searching the internet but couldn't find what I'm looking for.
The python code I'm using is:
data=sqlite3.Binary(f.read())
new_file = Samples(Samplename=filename,imagefile=data)
db.session.add(new_file)
db.session.commit()
Also, I don't want to save the file path; I want to save the file itself, even if that is not good practice. Kindly help. TIA
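For comparison, here is a minimal sketch that stores raw bytes in a BLOB column using the sqlite3 module directly (the samples table and store_file helper are invented for illustration; with Flask-SQLAlchemy the column type would be db.LargeBinary). If the column ends up empty, a common cause is calling f.read() after the upload stream has already been consumed, which returns b''.

```python
import sqlite3

def store_file(db_path, name, payload):
    """Insert one (name, blob) row; payload is the file's raw bytes,
    e.g. from f.read() on a file opened in binary mode."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples (samplename TEXT, imagefile BLOB)"
    )
    conn.execute(
        "INSERT INTO samples (samplename, imagefile) VALUES (?, ?)",
        (name, sqlite3.Binary(payload)),
    )
    conn.commit()
    conn.close()
```

Reading the row back returns the exact bytes that were inserted, so the round trip is easy to verify.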

Saving scraped documents in two sheets in an excel file

I've created a scraper which is supposed to parse some documents from a webpage and save them to an Excel file, creating two sheets. However, when I run it, I can see that it only saves the documents from the last link, in a single sheet, whereas there should be two sheets with documents from the two links. I even printed the results to see what is happening in the background, but I found nothing wrong there. I think the first sheet is overwritten and the second one is never created. How can I get around this so that the data is saved in two sheets in one Excel file? Thanks in advance for taking a look.
Here is my code:
import requests
from lxml import html
from pyexcel_ods3 import save_data

name_list = ['Altronix', 'APC']

def docs_parser(link, name):
    res = requests.get(link)
    root = html.fromstring(res.text)
    vault = {}
    for post in root.cssselect(".SubBrandList a"):
        if post.text == name:
            refining_docs(post.attrib['href'], vault)

def refining_docs(new_link, vault):
    res = requests.get(new_link).text
    root = html.fromstring(res)
    sheet = root.cssselect("#BrandContent h2")[0].text
    for elem in root.cssselect(".ProductDetails"):
        name_url = elem.cssselect("a[class]")[0].attrib['href']
        vault.setdefault(sheet, []).append([str(name_url)])
        save_data("docs.ods", vault)

if __name__ == '__main__':
    for name in name_list:
        docs_parser("http://store.immediasys.com/brands/", name)
But when I wrote code the same way for another site, it met the expectation, creating different sheets and saving the documents in them. Here is the link:
https://www.dropbox.com/s/bgyh1xxhew8hcvm/Pyexcel_so.txt?dl=0
Question: I think the first sheet is overwritten and the second one is never created. How can I get around this so that the data is saved in two sheets in an Excel file?
You overwrite the workbook file on every link that is appended.
You should never call save_data(...) within a loop, only once at the end of your script.
Comparing your two scripts, there is no difference; both behave the same, overwriting the workbook file again and again. Maybe the file I/O gets overloaded, as you overwrite the workbook file more than 160 times within a short time.
The first script should create 13 sheets:
data sheet:powerpivot-etc links:20
data sheet:flappy-owl-videos links:1
data sheet:reporting-services-videos links:20
data sheet:csharp links:14
data sheet:excel-videos links:9
data sheet:excel-vba-videos links:20
data sheet:sql-server-videos links:9
data sheet:report-builder-2016-videos links:4
data sheet:ssrs-2016-videos links:5
data sheet:sql-videos links:20
data sheet:integration-services links:19
data sheet:excel-vba-user-form links:20
data sheet:archived-videos links:16
The second script should create 2 sheets:
vault sheet:Altronix links:16
vault sheet:APC links:16
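The overwrite-per-save behaviour described above can be demonstrated without any scraping. In the sketch below, save_data is a hypothetical stand-in that mimics the real one's relevant property: it replaces the whole file on every call.

```python
saved = {}

def save_data(path, book):
    # Stand-in for pyexcel_ods3.save_data (hypothetical): like the real one,
    # each call replaces the entire workbook file with the sheets in `book`.
    saved[path] = {k: list(v) for k, v in book.items()}

links_by_sheet = [("Altronix", ["doc1", "doc2"]), ("APC", ["doc3"])]

# Buggy pattern: a fresh vault per call, saved inside the loop,
# so each save wipes out the previously written sheet.
for sheet, links in links_by_sheet:
    vault = {}
    for link in links:
        vault.setdefault(sheet, []).append([link])
        save_data("docs.ods", vault)
assert sorted(saved["docs.ods"]) == ["APC"]  # only the last sheet survives

# Fixed pattern: one shared vault, one save at the very end.
vault = {}
for sheet, links in links_by_sheet:
    for link in links:
        vault.setdefault(sheet, []).append([link])
save_data("docs.ods", vault)
assert sorted(saved["docs.ods"]) == ["APC", "Altronix"]  # both sheets present
```

Applying the fixed pattern to the scraper means passing one vault dict through both docs_parser calls and moving the single save_data call to the end of the __main__ block.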

Using win32com via python to scrape excel file for chart objects and convert them to images

I am trying to scrape a .xlsx Excel file for chart objects and export them as images. The only similar Stack Overflow question I found was this one, which attempts to do the same thing. The script, however, does not seem to work (even when I correct the syntax/methods).
I am willing to get this running in either Python 2.7.9 or 3.4.0, as I have both versions running on my computer.
Here is the code I am working with:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(r'C:\Users\Emilyn\Desktop\chartTest.xlsx')
excel.Visible = True
wb.Sheets("Sheet1").Select()
wbSheetOne = wb.Sheets(1)
wb.DisplayAlerts = False
i = 0
for chart in wbSheetOne.ChartObjects():
    print(chart.Name)
    chart.CopyPicture()
    excel.ActiveWorkbook.Sheets.Add(After=excel.ActiveWorkbook.Sheets(3)).Name = "temp_sheet" + str(i)
    temp_sheet = wb.ActiveSheet
    cht = wb.ActiveSheet.ChartObjects().Add(0, 0, 800, 600)
    cht.Chart.Export("chart" + str(i) + ".png")
    i = i + 1
excel.ActiveWorkbook.close
wb.DisplayAlerts = True
This opens my Excel file, generates three .png images in my Documents folder, and creates three new worksheets for the images, but the images are all blank. I am not sure what I can do to get the chart objects in my Excel file to copy correctly into these newly created images.
Any help I could get on this would be greatly appreciated as there seems to be no in depth documentation on pywin/win32com anywhere.
I've been searching the internet like mad and trying to get this to work for a day or two now... It's hard to get something to work when you don't know all of the methods available, or even what some of the methods do.
(Yes, I have read all the "read me" files that came with the library and read what they offered on their website as well.)
I already figured out what to do, but I suppose I'll post it for future users.
for index in range(1, count + 1):
    currentChart = wbSheet.ChartObjects(index)
    currentChart.Copy
    currentChart.Chart.Export("chart" + str(index) + ".png")
I used a count to do a for loop; this way you dynamically read the number of chart objects in an Excel file.
Also, the reason I started the range at 1 is that VBA in Excel starts object indexes at 1, not zero.
You copy an existing chart as a picture, but don't do anything with it.
You insert a worksheet without adding any data, then you embed a chart in that worksheet, which must be blank since there is no data to display, and then you export that blank chart as a PNG file.
