Python Pandas df is not defined - python

I have a problem with a script I wrote a while back. A couple of months ago it worked fine, but the OS has been updated since then.
The script works fine until it tries to create a dataframe with pandas:
import os
import pandas as pd
import matplotlib.pyplot as plt
dir_input = '/home/xxx/xxx/xxx/Script/input/'
osdir = []
alldir = []
for all_files in os.listdir(dir_input):
    alldir.append(all_files)
for file in os.listdir(dir_input): #Adds all the specified files to the list osdir
    if file.endswith('.xlsx'):
        osdir.append(file)
        print("Found {0}".format(file))
for filename in osdir:
    (fileroot, extension) = os.path.splitext(filename)
    print 'Processing file...'
    print fileroot
    print ''
    # pandas works with so called dataframes to import the data. Since I dont need all the columns we only use column d,f and j
    df = pd.read_excel(dir_input+filename,parse_cols="D,F,J", index=df.index)
...
The error I get using Spyder:
Traceback (most recent call last):
File "<ipython-input-5-2cf9c86bcb8c>", line 1, in <module>
runfile('/home/xxx/python_scripts/xpos-frame-mean_batch_v1.1.py', wdir='/home/cdoering/python_scripts')
File "/home/xxx/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "/home/xxx/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/home/xxx/python_scripts/script.py", line 54, in <module>
df = pd.read_excel(dir_input+filename,parse_cols="D,F,J", index=df.index)
NameError: name 'df' is not defined
My feeling is there is something wrong with pandas, maybe? I uninstalled it using conda and reinstalled it. Tried uninstalling with pip, but never used pip to install it so it couldn't find it. I am at a loss.

As @EdChum said in their comment, the problem is 'referencing the index prior to creation'. Specifically, index=df.index refers to the index attribute of df, but df has not been created yet when that argument is evaluated, so Python raises a NameError. Removing the index=df.index argument from the pd.read_excel call avoids the error.
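The same failure can be reproduced without pandas. The sketch below is only an illustration: read_table is a hypothetical stand-in for pd.read_excel, and "input.xlsx" is a made-up file name that is never opened. It shows that the arguments of a call are evaluated before the result is assigned, so a name cannot be used on the right-hand side of its own first assignment.

```python
def read_table(path, index=None):
    # Hypothetical stand-in for pd.read_excel; it only echoes its arguments.
    return {"path": path, "index": index}


try:
    # The arguments are evaluated before the call and before the assignment,
    # so `table` is looked up while it is still undefined.
    table = read_table("input.xlsx", index=table["index"])
except NameError as exc:
    error_message = str(exc)

# Dropping the self-reference lets the assignment succeed.
table = read_table("input.xlsx")
print(error_message)
```

In the original script the equivalent change would be to call pd.read_excel without index=df.index and, if a particular index is needed, to set it after the dataframe exists.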

Related

Python -- Pandas can't find file but numpy can

I am at a complete loss here. I am trying to open a txt file in pandas, I have tried multiple different approaches, but I receive the same error message every time. 'no such file'...
What is strange is that this...
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
full_file = np.loadtxt('2_Feature_Test.txt', delimiter=',')
...works completely fine, however this...
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
full_file = pd.read_csv('2_Feature_Test.txt', sep=',')
...does not.
It doesn't matter whether I use the full path, backslashes or forward slashes, or an r prefix for a raw string. Is the problem something to do with pandas and numpy being in different locations? I have no clue. Please, if you have any ideas I am all ears and would love nothing more than to get to the bottom of this. Thanks everyone.
If it helps, this is the full error message I receive...
Traceback (most recent call last):
File "C:\Users\Pat Oaks\Documents\txt_files\Thonny\lib\site-packages\thonny\workbench.py", line 1449, in event_generate
handler(event)
File "C:\Users\Pat Oaks\Documents\txt_files\Thonny\lib\site-packages\thonny\assistance.py", line 138, in handle_toplevel_response
self._explain_exception(msg["user_exception"])
File "C:\Users\Pat Oaks\Documents\txt_files\Thonny\lib\site-packages\thonny\assistance.py", line 178, in _explain_exception
+ _error_helper_classes["*"]
File "C:\Users\Pat Oaks\Documents\txt_files\Thonny\lib\site-packages\thonny\assistance.py", line 176, in <listcomp>
for helper_class in (
File "C:\Users\Pat Oaks\Documents\txt_files\Thonny\lib\site-packages\thonny\plugins\stdlib_error_helpers.py", line 555, in __init__
super().__init__(error_info)
File "C:\Users\Pat Oaks\Documents\txt_files\Thonny\lib\site-packages\thonny\assistance.py", line 478, in __init__
self.last_frame_module_source = read_source(self.last_frame.filename)
File "C:\Users\Pat Oaks\Documents\txt_files\Thonny\lib\site-packages\thonny\common.py", line 252, in read_source
with tokenize.open(filename) as fp:
File "C:\Users\Pat Oaks\Documents\txt_files\Thonny\lib\tokenize.py", line 447, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'pandas\\_libs\\index.pyx'
UPDATE: due to a more patient man than I actually reading the error message, I realize the issue is most likely with the pandas installation. Install of pandas via conda install pandas failed saying 'the specified procedure could not be found'. Might this have something to do with the issue? Anybody seen this before?
As the comments have said, clearly the missing file is one of pandas', not the file you are trying to read.
Try forcing a reinstall of pandas:
pip install -I pandas
or, if using Anaconda
conda install pandas --force-reinstall
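Before reinstalling, it can help to confirm which interpreter is running and where each package would be loaded from, since the question asks whether pandas and numpy might live in different locations. The following is a minimal sketch using only the standard library; describe_module is a hypothetical helper name, not part of pandas or conda.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the origin recorded for a top-level module, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print("interpreter:", sys.executable)
for name in ("pandas", "numpy"):
    print(name, "->", describe_module(name))
```

If the two packages resolve to different site-packages directories, or one of them resolves to None, the environment rather than the data file is the likely culprit.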

Export Python Dataframe to Excel

I'm trying to export a Python Dataframe to excel using xlsx or csv...
Here is the code I tried to use:
export_word_count = word_count.to_excel (r'C:\Users\OTR\PycharmProjects\MyProjects\word_count.xlsx', index = None, header=True)
I keep getting the following error messages:
Traceback (most recent call last):
File "C:/Users/OTR/PycharmProjects/MyProjects/CAP_Test_MotsCles.py", line 35,
in <module>
export_word_count = word_count.to_excel
(r'C:\Users\OTR\PycharmProjects\MyProjects\word_count_CAP.xlsx', index = None,
header=True)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-
packages\pandas\core\generic.py", line 2127, in to_excel
engine=engine)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\io\formats\excel.py", line 656, in write
writer = ExcelWriter(_stringify_path(writer), engine=engine)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\io\excel.py", line 1204, in __init__
from openpyxl.workbook import Workbook
ModuleNotFoundError: No module named 'openpyxl'
Any input on this would be greatly appreciated. I tried to tweak the code, but it still wouldn't export. Thank you.
EDIT:
Managed to export, but having issues with full data export
Sample Python Data:
products 58
company 53
cannabis 42
business 39
You don't have the Python openpyxl module installed.
Install it with:
pip install openpyxl
Your words are your index. Right now you are not exporting the index.
Try changing your code to:
word_count.to_excel(r'C:\Users\OTR\PycharmProjects\MyProjects\word_count.xlsx', index=True, header=True)
index=True is the default behavior, so it is not actually necessary.
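Since the question also mentions csv as an acceptable format, here is a hedged sketch of the same idea with the standard library only: the labels (the words) have to be written out as their own column, or only the numbers survive. This is a swapped-in technique, not the pandas call from the answer; the Counter below simply reuses the sample data from the question, and write_counts is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Sample data copied from the question.
word_count = Counter({"products": 58, "company": 53, "cannabis": 42, "business": 39})


def write_counts(counts, handle):
    """Write one row per word, keeping the word itself as the first column."""
    writer = csv.writer(handle)
    writer.writerow(["word", "count"])
    for word, count in counts.most_common():
        writer.writerow([word, count])


# An in-memory buffer stands in for a real file opened with newline="".
buffer = io.StringIO(newline="")
write_counts(word_count, buffer)
print(buffer.getvalue())
```

To write to disk instead, the buffer can be replaced by a file opened with open(path, "w", newline="").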

Why I am not able to load excel files generated in the morning, but can load them in the afternoon in Python using Openpyxl

I am using Python openpyxl to import Excel files which are generated by an online tool. When I import the files generated in the morning, I get an error like this:
Traceback (most recent call last):
File "test4.py", line 8, in <module>
wb = openpyxl.load_workbook (temp2)
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 201, in load_workbook
wb.properties = DocumentProperties.from_tree(src)
File "C:\Python27\lib\site-packages\openpyxl\descriptors\serialisable.py", line 89, in from_tree
return cls(**attrib)
File "C:\Python27\lib\site-packages\openpyxl\packaging\core.py", line 106, in__init__
self.modified = modified
File "C:\Python27\lib\site-packages\openpyxl\descriptors\base.py", line 267, in __set__
value = W3CDTF_to_datetime(value)
File "C:\Python27\lib\site-packages\openpyxl\utils\datetime.py", line 40, in W3CDTF_to_datetime
dt = [int(v) for v in match.groups()[:6]]
AttributeError: 'NoneType' object has no attribute 'groups'
The strange thing is that I only get this error when importing files which were generated by the online tool in the morning. When I try the same file generated in the afternoon, it works very well. I'm confused about where the problem is. There are no fields in the Excel files related to time, and the files generated in the morning and in the afternoon are exactly the same except for the modified time. Can anybody help me with this? Thank you.
Excel files created by this online tool are not fully compatible with openpyxl.
The function load_workbook opens the Excel file as a zip archive, reads workbook-level information from 'docProps/core.xml', and assigns it to wb.properties. One piece of that information is the modified time.
The value of modified raises the error because it cannot be converted to a datetime. It must match the pattern openpyxl.utils.datetime.W3CDTF_REGEX, which implements W3CDTF (W3C Date and Time Formats).
You can check whether the file's modified time conforms to W3CDTF. Here is the code:
from openpyxl.reader.excel import _validate_archive
archive = _validate_archive('/path/to/yourexcel.xlsx')
valid_files = archive.namelist()
# you'll find 'xx/core.xml' I'm not sure if it's 'docProps/core.xml'
print valid_files
# read 'xx/core.xml'
wb_info = archive.read('docProps/core.xml')
print wb_info
In wb_info, you will find something like
<dcterms:modified xsi:type="dcterms:W3CDTF">2017-04-01T22:48:48Z</dcterms:modified>.
Compare the wb_info of Excel files from the online tool with that of files created on your PC.

initialization of multiarray raised unreported exception python

I am a new programmer who is picking up Python. I have recently been trying to learn about importing csv files using numpy.
Here is my code:
import numpy as np
x = np.loadtxt("abcd.py", delimiter = True, unpack = True)
print(x)
IDLE returns:
>> True
>> Traceback (most recent call last):
>> File "C:/Python34/Scripts/a.py", line 1, in <module>
import numpy as np
>> File "C:\Python34\lib\site-packages\numpy\__init__.py", line 180, in <module>
from . import add_newdocs
>> File "C:\Python34\lib\site-packages\numpy\add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
>> File "C:\Python34\lib\site-packages\numpy\lib\__init__.py", line 8, in <module>
from .type_check import *
>> File "C:\Python34\lib\site-packages\numpy\lib\type_check.py", line 11, in <module>
import numpy.core.numeric as _nx
>> File "C:\Python34\lib\site-packages\numpy\core\__init__.py", line 14, in <module>
from . import multiarray
>> SystemError: initialization of multiarray raised unreported exception
Why do I get this system error and how can I remedy it?
I have experienced this problem too. It is caused by a file named "datetime.py" in the same folder (exactly the same problem confronted by Bruce). "datetime" is an existing Python module. However, I do not know why running my own script, e.g. plot.py, invokes my datetime.py file (I have seen the output produced by my datetime.py, and there is an auto-generated datetime.cpython-36.pyc in the __pycache__ folder).
Although I am not clear about how the error is triggered, after renaming my datetime.py file I could run plot.py immediately. Therefore, I suggest you check whether any of your files have names that collide with system modules. (P.S. I use Visual Studio Code to run Python.)
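The likely mechanism is that the folder containing the script is searched first when Python resolves an import, so a local datetime.py is found before the standard-library module of the same name. A rough way to scan a folder for such collisions is sketched below; shadowing_files is a hypothetical helper, and the default list of names relies on sys.stdlib_module_names, which is an assumption about the Python version (it exists from 3.10 onward), with a much shorter fallback otherwise.

```python
import os
import sys


def shadowing_files(directory, known_modules=None):
    """List .py files in `directory` whose names match a known module name."""
    if known_modules is None:
        # sys.stdlib_module_names exists on Python 3.10+; the fallback only
        # covers modules compiled into the interpreter.
        known_modules = getattr(sys, "stdlib_module_names", sys.builtin_module_names)
    hits = []
    for name in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(name)
        if ext == ".py" and stem in known_modules:
            hits.append(name)
    return hits


print(shadowing_files(os.getcwd()))
```

Any file it reports is a candidate for renaming; stale entries in __pycache__ with the same name may need to be deleted as well.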
As there is an error at the import line, your installation of numpy is broken in some way. My guess is that you have installed numpy for python2 but are using python3. You should remove numpy and attempt a complete re-install, taking care to pick the correct version.
There are a few oddities in the code:
You are apparently reading a Python file, abcd.py, not a csv file. Typically you want to have your data in a csv file.
The delimiter is a string, not a boolean; typically delimiter=",".
import numpy as np
x = np.loadtxt("abcd.csv", delimiter = ",", unpack = True)

Unable to import .xlsx into Python: No such file or directory

I'm trying to import data from HW3_Yld_Data.xlsx into Python. I made sure that the Excel file is in the same directory as the Python file. Here's what I wrote:
import pandas as pd
Z = pd.read_excel('HW3_Yld_Data.xlsx')
Here's the error I got:
In [2]: import pandas as pd
...:
...: Z = pd.read_excel('HW3_Yld_Data.xlsx')
Traceback (most recent call last):
File "<ipython-input-2-7237c05c79ba>", line 3, in <module>
Z = pd.read_excel('HW3_Yld_Data.xlsx')
File "/Users/Zhengnan/anaconda/lib/python2.7/site-packages/pandas/io/excel.py", line 151, in read_excel
return ExcelFile(io, engine=engine).parse(sheetname=sheetname, **kwds)
File "/Users/Zhengnan/anaconda/lib/python2.7/site-packages/pandas/io/excel.py", line 188, in __init__
self.book = xlrd.open_workbook(io)
File "/Users/Zhengnan/anaconda/lib/python2.7/site-packages/xlrd/__init__.py", line 394, in open_workbook
f = open(filename, "rb")
IOError: [Errno 2] No such file or directory: 'HW3_Yld_Data.xlsx'
What's mind-boggling is that it used to work fine. It appeared to stop working after I did a "conda update --all" yesterday.
BTW I'm using Spyder as IDE. Please help. Thank you.
Each process in the operating system has a current working directory. Any relative path is relative to the current working directory.
The current working directory is set to the directory from which you launched the process. This is very natural when using the command line, but can be confusing for people who only use GUIs.
You can retrieve it using os.getcwd(), and you can change it using os.chdir(). Of course, you can also change it before launching your script.
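A small sketch of how a relative path follows the working directory, using only the standard library. The file name is hypothetical and exists only inside a temporary folder created for the demonstration.

```python
import os
import tempfile

FILENAME = "cwd_demo_hypothetical_file.txt"  # hypothetical name used only for this demo

start = os.getcwd()
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, FILENAME), "w") as handle:
        handle.write("demo")

    # The relative name is resolved against the starting directory here.
    found_before = os.path.exists(FILENAME)

    os.chdir(folder)
    try:
        # The same relative name is now resolved against `folder`.
        found_after = os.path.exists(FILENAME)
    finally:
        os.chdir(start)  # restore, so the temporary folder can be removed

print(found_before, found_after)
```

Printing os.getcwd() at the top of the failing script is usually the quickest way to see which directory an IDE such as Spyder is actually using.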
Instead of using the relative path, use the full path of your xlsx for a test. Your conda update may have changed your environment.
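You can try something like this in order to test it (run it as a script, since __file__ is not defined in an interactive session):
import os
import pandas as pd
# Build the path relative to the folder that contains this script
pre = os.path.dirname(os.path.realpath(__file__))
fname = 'HW3_Yld_Data.xlsx'
path = os.path.join(pre, fname)
Z = pd.read_excel(path)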
