Python pathlib not accepted by Pandas.ExcelWriter(..) - python

After finally getting Pandas to output my file as a spreadsheet (silly issue), I ran into an issue with the file path input not being read properly, even as a raw string. Eventually I stumbled upon the pathlib library, and it's been successful in allowing me to read and write to my file as needed. However, now I'm back to ExcelWriter complaining about my file again.
Here's my ugly attempt at using pathlib to create the paths (directories substituted):
import pandas as pd
from pathlib import Path
core = input("Core number ('core_#k'): ")
part = input("Part name ('part_#'): ")
file_level = 'core_' + core + 'k'
in_file_name = file_level + '_part_' + part + '.txt'
out = in_file_name[:-4] + '.ods'
# format filenames (raw string, so the backslashes are kept as typed)
path_raw = Path(r"DRIVELABEL:\someDirectories")
raw_text_path = path_raw / "raw_text" / file_level / in_file_name
spreadsheet_path = path_raw / "spreadsheets" / file_level / out
Error:
FileNotFoundError: [Errno 2] No such file or directory: 'DRIVELABEL:/SomeDirectories/outputFile.ods'
I can't figure out how to get ExcelWriter to write this file properly. The only thing I can think of right now is that the file_level directory for spreadsheet_path doesn't exist yet, but shouldn't it be created on write?
with pd.ExcelWriter(spreadsheet_path.as_posix(), engine='odf') as doc:
    df.to_excel(doc, sheet_name="Sheet1", index=False)
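The suspicion in the question is a plausible cause: opening a file for writing does not normally create missing parent folders, so a file_level directory that does not exist yet under spreadsheets would raise this FileNotFoundError. Below is a minimal sketch, assuming that this is the problem. The helper name ensure_parent_dir and the temporary base directory are made up for illustration, and a plain open() stands in for pd.ExcelWriter.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the folder that will hold file_path (hypothetical helper)."""
    file_path = Path(file_path)
    # parents=True builds every missing level; exist_ok=True makes reruns safe.
    file_path.parent.mkdir(parents=True, exist_ok=True)
    return file_path


# Stand-in for DRIVELABEL:\someDirectories, so the sketch runs anywhere.
base_dir = Path(tempfile.mkdtemp())
spreadsheet_path = base_dir / "spreadsheets" / "core_1k" / "core_1k_part_1.ods"

# Without this call, opening the file for writing raises FileNotFoundError.
ensure_parent_dir(spreadsheet_path)
with open(spreadsheet_path, "wb") as handle:  # placeholder for pd.ExcelWriter(...)
    handle.write(b"")
```

If the missing folder is indeed the cause, calling the same mkdir line just before the with pd.ExcelWriter(...) statement should be enough.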

Related

Creating Corpus from wiki dump file using Jupyter notebook

I'm trying to follow this page to create a wiki corpus, but I'm using a Jupyter notebook: https://www.kdnuggets.com/2017/11/building-wikipedia-text-corpus-nlp.html
this is my code:
import sys
from gensim.test.utils import datapath
from gensim.corpora import WikiCorpus
path_to_wiki_dump = datapath("enwiki-latest-pages-articles.xml.bz2")
wiki = WikiCorpus(path_to_wiki_dump)
output = open('wiki_en.txt', 'w', encoding='utf-8')
i = 0
for text in wiki.get_texts():
    output.write(bytes(' '.join(text), 'utf-8').decode('utf-8') + '\n')
    i = i + 1
    if (i % 10000 == 0):
        print('Processed ' + str(i) + ' articles')
output.close()
print('Processing complete!')
The error I got was:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/anaconda3/lib/python3.8/site-packages/gensim/test/test_data/enwiki-latest-pages-articles.xml.bz2'
All the files are in one place, so I'm not sure what's wrong.
Did you ever download the file enwiki-latest-pages-articles.xml.bz2 somehow, somewhere?
Did you specifically place it at the path /opt/anaconda3/lib/python3.8/site-packages/gensim/test/test_data/enwiki-latest-pages-articles.xml.bz2?
If not, the datapath() function you're using won't construct the right path. (That particular function is meant to find a directory of test data bundled with Gensim, and shouldn't really be used to construct paths to your own downloaded or created files!)
Instead of using that function, you should just specify the actual path, local to the Jupyter notebook server, where you put the file, as a string argument to WikiCorpus.
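A minimal sketch of that advice, assuming the dump sits in some folder you control. The helper name resolve_dump_path is hypothetical, and the demo uses a throwaway folder with an empty placeholder file rather than a real dump. Checking the path up front gives a clearer error than the one raised deep inside WikiCorpus.

```python
import tempfile
from pathlib import Path


def resolve_dump_path(file_name, base_dir):
    """Return the absolute path of file_name inside base_dir, or fail early."""
    candidate = Path(base_dir).expanduser().resolve() / file_name
    if not candidate.is_file():
        raise FileNotFoundError(f"Expected the dump at {candidate}")
    return str(candidate)


# Demo with a throwaway folder and an empty placeholder file.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "enwiki-latest-pages-articles.xml.bz2").touch()
path_to_wiki_dump = resolve_dump_path("enwiki-latest-pages-articles.xml.bz2", demo_dir)
# path_to_wiki_dump would then be passed on as WikiCorpus(path_to_wiki_dump).
```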

PANDAS: to_excel executes successfully, but no output file

When I use to_excel to generate an Excel file, no error occurs, but no output file is created. I can't find any file in the location where I expect it.
And I don't know why.
import pandas as pd
import os
import xlrd
import openpyxl
pd.set_option('display.width',None)
DIR = 'E:\Process'
datapath =os.path.join(DIR, 'Data.xlsx')
formatpath = os.path.join(DIR, 'Format.xlsx')
df = pd.read_excel(datapath)
df1=pd.read_excel(formatpath)
for i in range(0, len(df)):
    target = (df.iloc[i,17])
    df2 = df1
    df2.iat[3,3] = target
    print(df2)
    filename = df.iloc[i,2]
    filename = str(filename) + ".xlsx"
    sourcepath = os.path.join(DIR, filename)
    writer = pd.ExcelWriter(sourcepath)
    df2.to_excel(writer)
    print(sourcepath)
Expanding on the comment:
Use writer.save() after calling to_excel. (In pandas 2.0 and later, save() is no longer available; use writer.close() instead.)
Alternatively you can use the with statement as suggested in the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.ExcelWriter.html
When I try to re-run this code on my local machine, I run into an error when setting DIR to 'E:\Process', because "\" is the escape character in Python string literals, which matters for Windows paths. Have you tried "E:\\Process", or the raw string literal r'E:\Process'?
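A short sketch of the two spellings suggested above. As far as I can tell, 'E:\Process' happens to keep its backslash because \P is not a recognised escape sequence, but a folder name starting with t or n would silently change, which is why the raw or doubled form is safer. The names escaped, raw, risky and safe are illustrative, and ntpath (the Windows flavour of os.path) is used so the sketch runs on any platform.

```python
import ntpath

escaped = 'E:\\Process'   # backslash doubled
raw = r'E:\Process'       # raw string literal, backslash kept as typed

# A folder name starting with t shows the hazard: '\t' becomes a tab character.
risky = 'E:\temp'
safe = r'E:\temp'

# ntpath behaves like os.path on Windows, regardless of the current OS.
source_path = ntpath.join(raw, 'Data.xlsx')
print(repr(risky), repr(safe), source_path)
```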

Objects of type 'WindowsPath' can not be converted to a COM VARIANT

I have an xlsx file which is a template for a receipt. It contains images and cells. I used to go into the file manually, update the information and then export to pdf before sending to my clients. I would like to be able to convert an xlsx to pdf through python if possible.
My problem is that I can't find a tutorial (or a decent video tutorial) that simply takes an xlsx file and converts it to PDF.
I've tried getting openpyxl to save it with a .pdf extension, but I knew that was a long shot. I also tried to follow an example on Stack Overflow, but it didn't work that well.
I keep getting:
File "<COMObject <unknown>>", line 5, in ExportAsFixedFormat
Objects of type 'WindowsPath' can not be converted to a COM VARIANT
and I'm pretty stuck.
#this file will open a wb and save it as another file name
#this first part opens a file from a location and makes a copy to another location
from pathlib import Path
from win32com import client
#sets filename and file
file_name = 'After Summer Bookings.xlsx'
dir_path = Path('C:/Users/BOTTL/Desktop/Business')
new_file_name = 'hello.pdf'
new_save_place = Path('C:/Users/BOTTL/Desktop/Business Python')
xlApp = client.Dispatch("Excel.Application")
books = xlApp.Workbooks.Open(dir_path / file_name)
ws = books.Worksheets[0]
ws.Visible = 1
ws.ExportAsFixedFormat(0, new_save_place / new_file_name)
I'd like it to open the xlsx file I have, called After Summer Bookings.xlsx, and save it as a PDF file called hello.pdf.
Solved it myself :)
from pathlib import Path
from win32com import client
#sets filename and file
file_name = 'After Summer Bookings.xlsx'
dir_path = Path('C:/Users/BOTTL/Desktop/Business')
new_file_name = 'hello.pdf'
new_save_place = ('C:/Users/BOTTL/Desktop/Business Python/')
path_and_place = new_save_place + new_file_name
xlApp = client.Dispatch("Excel.Application")
books = xlApp.Workbooks.Open(dir_path / file_name)
ws = books.Worksheets[0]
ws.Visible = 1
ws.ExportAsFixedFormat(0,path_and_place)
When concatenating the location and the filename, it didn't like that I had made the location a Path object, so now that I've removed Path and use a plain string, it works like a dream :)
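An alternative sketch that keeps pathlib for building the path and only converts to a plain string at the point where the value is handed to COM. This assumes, as the error message suggests, that the COM layer only needs a string. PureWindowsPath is used here so the sketch runs on any OS; the COM call itself is shown as a comment because it needs Excel.

```python
from pathlib import PureWindowsPath

# On Windows, Path would behave the same way for this purpose.
new_save_place = PureWindowsPath('C:/Users/BOTTL/Desktop/Business Python')
new_file_name = 'hello.pdf'

# Build the path with "/" as before, then convert it to a plain string.
path_and_place = str(new_save_place / new_file_name)
print(path_and_place)

# The COM call from the post would then receive a string, for example:
# ws.ExportAsFixedFormat(0, path_and_place)
```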

Unable to save csv file with Pandas

Sorry for the basic question. I have read lots of topics, but my code still does not create and save a .csv file.
import pandas as pd
def save_csv(lista):
    try:
        print("Salvando...")
        name_path = time.strftime('%d%m%y') + '01' + '.csv'
        df = pd.DataFrame(lista, columns=["column"])
        df.to_csv(name_path, index=False)
    except:
        pass
dados = [-0.9143399074673653, -1.0944355744868517, -1.1022400576621294]
save_csv(dados)
Path name is 'DayMonthYear01.csv' (20121701.csv).
When I run the code it finishes but no file is saved.
The output of the code is just:
>>>
RESTART: C:\Users\eduhz\AppData\Local\Programs\Python\Python36-32\testeCSV.py
Salvando...
>>>
Does anyone know what I am missing?
First, as suggested by @Abdou, I changed the code to show me what the error was.
import pandas as pd
import time
def save_csv(lista):
    try:
        print("Salvando...")
        name_path = time.strftime('%d%m%y') + '01' + '.csv'
        df = pd.DataFrame(lista, columns=["column"])
        df.to_csv(name_path, index=False)
    except Exception as e:
        print(e)
dados = [-0.9143399074673653, -1.0944355744868517, -1.1022400576621294]
save_csv(dados)
Then I found out it was due to a permission error
[Errno 13] Permission denied:
caused by the fact Notepad (without being opened as Administrator) does not have access to some directories and therefore anything run inside it wouldn't be able to write to those directories.
I tried running Notepad as administrator but it didn't work.
The solution was running the code with the Python IDLE.
Did you import the time module? All I did was add that, and it created 21121701.csv with the 3 entries in one column in the current working directory.
import pandas as pd
import time
def save_csv(lista):
    print("Salvando...")
    name_path = time.strftime('%d%m%y') + '01' + '.csv'
    df = pd.DataFrame(lista, columns=["column"])
    df.to_csv(name_path, index=False)
dados = [-0.9143399074673653, -1.0944355744868517, -1.1022400576621294]
save_csv(dados)
Removing the try/except gives a file permission error if you have a file of the same name already open. You have to close any file you are trying to write (on Windows at least).
Per Abdou's comment, if you (or the program) don't have write access to the directory then that would cause a permission error too.
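The common thread in both answers is that the bare except hid the real failure. A minimal sketch of a narrower handler follows, assuming only file system errors should be caught and reported. The standard csv module stands in for DataFrame.to_csv so the sketch has no third-party dependencies, and the directory parameter is an addition for illustration. A NameError from a missing import would not be swallowed here.

```python
import csv
import os
import tempfile
import time


def save_csv(values, directory):
    """Write values as a one-column CSV; return the path, or None on failure."""
    name_path = os.path.join(directory, time.strftime('%d%m%y') + '01.csv')
    try:
        with open(name_path, 'w', newline='') as handle:
            writer = csv.writer(handle)
            writer.writerow(['column'])
            writer.writerows([value] for value in values)
    except OSError as error:  # includes PermissionError and FileNotFoundError
        print(f'Could not write {name_path}: {error}')
        return None
    return name_path


dados = [-0.9143399074673653, -1.0944355744868517, -1.1022400576621294]
saved = save_csv(dados, tempfile.mkdtemp())
```

Returning the full path also makes it obvious which directory the file landed in, which helps when the working directory is not the one you expect.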

Shutil move function gives invalid argument error PYTHON

I am writing some code that will go through a file, edit it as a temp file, and then copy the temp file over the original file so as to make the edit. However, when using the move method from shutil I keep getting this error:
IOError: [Errno 22] Invalid Argument
I've tried using copy, copy2, and copyfile. Here is a copy of the code :
def writePPS(seekValue,newData):
    PPSFiles = findPPS("/pps")
    for file in PPSFiles:
        #create a temp file
        holder,temp = mkstemp(".pps")
        print holder, temp
        pps = open(file,"r+")
        newpps = open(temp,"w")
        info = pps.readlines()
        #parse through the pps file and find seekvalue, replace with newdata
        for data in info:
            valueBoundry = data.find(":")
            if seekValue == data[0:(valueBoundry)]:
                print "writing new value"
                newValue = data[0:(valueBoundry+1)] + str(newData)
                #write to our temp file
                newpps.write(newValue)
            else: newpps.write(data)
        pps.close()
        close(holder)
        newpps.close()
        #remove original file
        remove(file)
        #move temp file to pps
        copy(temp,"/pps/ohm.pps")
I am not exactly sure why you are getting that error, but to start you could try cleaning up your code a bit and fixing all those import statements. It's hard to see where the functions are coming from, and for all you know you could eventually have a namespace collision.
Let's start here with some actually runnable code:
import shutil
import os
import tempfile
def writePPS(seekValue,newData):
    PPSFiles = findPPS("/pps")
    for file_ in PPSFiles:
        #create a temp file
        newpps = tempfile.NamedTemporaryFile(suffix=".pps")
        print newpps.name
        with open(file_,"r+") as pps:
            #parse through the pps file and find seekvalue, replace with newdata
            for data in pps:
                valueBoundry = data.find(":")
                if seekValue == data[0:(valueBoundry)]:
                    print "writing new value"
                    newValue = data[0:(valueBoundry+1)] + str(newData)
                    #write to our temp file
                    newpps.write(newValue)
                else:
                    newpps.write(data)
        #move temp file to pps
        newpps.flush()
        shutil.copy(newpps.name,"/pps/ohm.pps")
You don't need to read all your lines into memory; you can just loop over each line. You also don't need to manage all those open/close file operations. Just use a with context and a NamedTemporaryFile, which will clean itself up when it is garbage collected.
Important note: in your example and in the code above, you are overwriting the same destination file for every source file. I left it that way for you to address. But if you start here, we can then begin to figure out why you are getting errors.
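For comparison, here is a Python 3 sketch of the same edit-via-temp-file idea. It uses a different technique from the code above: the temp file is created next to the target and swapped in with os.replace rather than copied with shutil, and each file is rewritten to itself instead of to a fixed /pps/ohm.pps destination. The function name replace_value and the "key:value" demo content are assumptions for illustration, and unlike the original it writes a newline after the replaced value.

```python
import os
import tempfile


def replace_value(file_path, seek_value, new_data):
    """Rewrite 'key:value' lines whose key equals seek_value, in place."""
    directory = os.path.dirname(os.path.abspath(file_path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(suffix=".pps", dir=directory)
    try:
        with os.fdopen(fd, "w") as temp_file, open(file_path) as source:
            for line in source:
                key, sep, _ = line.partition(":")
                if sep and key == seek_value:
                    temp_file.write(f"{key}:{new_data}\n")
                else:
                    temp_file.write(line)
        os.replace(temp_path, file_path)  # overwrite the original in one step
    except BaseException:
        os.remove(temp_path)  # do not leave a stray temp file behind
        raise


# Demo on a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), "ohm.pps")
with open(demo_path, "w") as handle:
    handle.write("volts:5\nohms:10\n")
replace_value(demo_path, "ohms", 47)
```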
