Pycharm getting column names - python

I'm new to this coding world (about 2 weeks in), so I just ran into a problem. I was following a tutorial, like most of us did in the beginning. The task was to add a new column called "Month". To do that, they suggest taking the first 2 characters from the column called "Order Date". I copied the code from the tutorial letter for letter; the only difference is that I was using PyCharm and they were using Jupyter Notebook. I like PyCharm, so maybe someone knows how to solve this.
The code is the following:
import pandas as pd
import os
files = [file for file in os.listdir("./Files")]
allmonths = pd.DataFrame()
for file in files:
    df = pd.read_csv("./Files/" + file)
    allmonths = pd.concat([allmonths,df])
alldata = pd.read_csv("allmonths.csv")
### Month Column addition
alldata["Month"] = alldata["Order Date"].str[0:2]
allmonths['Month']
print(alldata.head())
The Traceback:
Traceback (most recent call last):
File "D:\Coding\Sales_Data\venv\lib\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Order Date'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\Coding\Sales_Data\Sales Anal.py", line 11, in <module>
alldata["Month"] = alldata["Order Date"].str[0:2]
File "D:\Coding\Sales_Data\venv\lib\site-packages\pandas\core\frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "D:\Coding\Sales_Data\venv\lib\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'Order Date'
I know the problem has something to do with the column names, and maybe PyCharm can't read them from the CSV file. But I don't know how to solve it.
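A quick way to check the column-name theory is to print the exact names pandas read from the file. The sketch below is only an illustration: the CSV text is made up and stands in for allmonths.csv, and the stray spaces in its header are an assumption about what might be wrong, not something the traceback confirms.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for allmonths.csv.
# The header deliberately has stray spaces around "Order Date".
csv_text = (
    "Order ID, Order Date ,Product\n"
    "1,04/19/19 08:46,USB-C Cable\n"
    "2,12/07/19 22:30,Monitor\n"
)

alldata = pd.read_csv(io.StringIO(csv_text))

# Show the names exactly as pandas sees them, including any whitespace.
print(list(alldata.columns))

# Strip surrounding whitespace from every header, then add the new column.
alldata.columns = alldata.columns.str.strip()
alldata["Month"] = alldata["Order Date"].str[0:2]
print(alldata.head())
```

If the printed list does not contain 'Order Date' at all, the file being read is probably not the one the tutorial produced, and the names in the list show what is actually in it.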

Related

Influxdb-client Python API

I have a quite concerning problem with this API. I am using it to perform six different queries one after the other. Between the queries, I save the resulting pandas DataFrames into CSV files.
After this phase, I read the CSV files and perform some operations (I replace the NaN values with 0 in the column called '_value'). The problem is that, in some executions of the code, the CSV files seem to be missing some columns or are not filled. It returns the following error:
Traceback (most recent call last):
File "Path\code.py", line 136, in <module>
fd['_value'] = fd["_value"].fillna(0)
File "Path\Anaconda3\envs\tf\lib\site-packages\pandas\core\frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "Path\Anaconda3\envs\tf\lib\site-packages\pandas\core\indexes\base.py", line 3631, in get_loc
raise KeyError(key) from err
KeyError: '_value'
Exception ignored in: <function InfluxDBClient.__del__ at 0x000002723CB52170>
Traceback (most recent call last):
File "Path\Anaconda3\envs\tf\lib\site-packages\influxdb_client\client\influxdb_client.py", line 284, in __del__
File "Path\Anaconda3\envs\tf\lib\site-packages\influxdb_client\_sync\api_client.py", line 84, in __del__
File "Path\Anaconda3\envs\tf\lib\site-packages\influxdb_client\_sync\api_client.py", line 661, in _signout
TypeError: 'NoneType' object is not callable
Exception ignored in: <function ApiClient.__del__ at 0x000002723CB53BE0>
Traceback (most recent call last):
File "Path\Anaconda3\envs\tf\lib\site-packages\influxdb_client\_sync\api_client.py", line 84, in __del__
File "Path\Anaconda3\envs\tf\lib\site-packages\influxdb_client\_sync\api_client.py", line 661, in _signout
TypeError: 'NoneType' object is not callable
I don't know how to solve this problem, or why it occurs only sometimes. Below you can see the code I use to run the queries.
from influxdb_client import InfluxDBClient
client = InfluxDBClient(
    url=url,
    token=token,
    org=org
)
query_api = client.query_api()
df = query_api.query_data_frame(query_1)
df.to_csv('path_to_file/name1.csv')
df = query_api.query_data_frame(query_2)
df.to_csv('path_to_file/name2.csv')
df = query_api.query_data_frame(query_3)
df.to_csv('path_to_file/name3.csv')
df = query_api.query_data_frame(query_4)
df.to_csv('path_to_file/name4.csv')
df = query_api.query_data_frame(query_5)
df.to_csv('path_to_file/name5.csv')
df = query_api.query_data_frame(query_6)
df.to_csv('path_to_file/name6.csv')
client.close()
To write this code I followed the examples at GitHub Influxdb-Client Python: Queries with pandas dataframe
Should I do something like: open the client -> query -> save file -> close the client -> repeat?
EDIT: open the client -> query -> save file -> close the client -> repeat, did not solve anything.
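One thing that can be checked without touching the client at all is what each result looks like before '_value' is used. The sketch below uses only pandas. Both helper names are hypothetical, and the frames are made-up stand-ins for query results. It assumes, going by my reading of the client documentation, that query_data_frame may return either one DataFrame or a list of DataFrames, and that an empty result carries no '_value' column.

```python
import pandas as pd

def as_single_frame(result):
    """Hypothetical helper: accept one DataFrame or a list of DataFrames."""
    if isinstance(result, list):
        return pd.concat(result, ignore_index=True) if result else pd.DataFrame()
    return result

def fill_value_column(fd, column="_value"):
    """Fill NaN with 0 in `column`, or say clearly which columns were found."""
    if column not in fd.columns:
        raise ValueError(f"{column!r} missing; columns found: {list(fd.columns)}")
    fd[column] = fd[column].fillna(0)
    return fd

# Made-up stand-ins for a query result that arrived as two tables.
ok = as_single_frame([
    pd.DataFrame({"_time": [1, 2], "_value": [1.5, float("nan")]}),
    pd.DataFrame({"_time": [3, 4], "_value": [float("nan"), 2.0]}),
])
ok = fill_value_column(ok)
print(ok)

# An empty result has no '_value' column, so the helper reports it explicitly.
try:
    fill_value_column(pd.DataFrame())
except ValueError as exc:
    print(exc)
```

Running the same check right after each query, before to_csv, would show which of the six queries came back without the expected column on the failing runs.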

ValueError: Worksheet index 0 is invalid, 0 worksheets found. Cannot open xlsx with pandas in python

Could someone help me figure out why my files don't open?
import pandas as pd
file = "C://Dev//20211103_logfile Box 2.8.xlsx"
temp=pd.read_excel(file)
Here is the full error!
PS C:\Dev> & C:/Users/keyur/AppData/Local/Programs/Python/Python39/python.exe c:/Dev/test_excel.py
C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\workbook.py:88:
UserWarning: File contains an invalid specification for 20211103_logfile. This will be removed
warn(msg)
Traceback (most recent call last):
File "c:\Dev\test_excel.py", line 6, in <module>
temp=pd.read_excel(file)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 372, in read_excel
data = io.parse(
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 1272, in parse
return self._reader.parse(
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 537, in parse
sheet = self.get_sheet_by_index(asheetname)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_openpyxl.py", line 546, in get_sheet_by_index
self.raise_if_bad_sheet_by_index(index)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 468, in raise_if_bad_sheet_by_index
raise ValueError(
ValueError: Worksheet index 0 is invalid, 0 worksheets found
PS C:\Dev>
There is a problem with your Excel file.
Try creating a new Excel file and copy-pasting all the data into it, then try again. This method worked for me.

KeyError: "None of [Index(['file_path'], dtype='object')] are in the [columns]"

I tried to run this file and it gives me the following error when calling the function. I can't work out what is causing it. I want to do some image enhancement so that my model can better understand the images during training. Any other suggestions or code for doing the same are welcome.
Traceback (most recent call last):
File "C:/DIP_Ankita/image_corrector.py", line 12, in <module>
data = data[['file_path']]
File "C:\Users\srchirag27\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 3001, in __getitem__
indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
File "C:\Users\srchirag27\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexing.py", line 1285, in _convert_to_indexer
return self._get_listlike_indexer(obj, axis, **kwargs)[1]
File "C:\Users\srchirag27\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexing.py", line 1092, in _get_listlike_indexer
keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
File "C:\Users\srchirag27\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexing.py", line 1177, in _validate_read_indexer
key=key, axis=self.obj._get_axis_name(axis)
KeyError: "None of [Index(['file_path'], dtype='object')] are in the [columns]"
There was some issue with the file being saved. The previously written code was being called rather than the updated one. I deleted the file and then created another one with the same script and it worked fine.
data[['file_path']] creates a new dataframe containing only one column - file_path.
Check data.columns and if the required column is not there then you need to ensure that column is being passed.
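The check suggested above can be sketched as follows. The frame and its column names are made up for illustration; the real one would come from the asker's own file.

```python
import pandas as pd

# Hypothetical frame whose column is spelled differently from what the script expects.
data = pd.DataFrame({"filepath": ["a.png", "b.png"], "label": [0, 1]})

required = ["file_path"]
missing = [name for name in required if name not in data.columns]
print("columns:", data.columns.tolist())
print("missing:", missing)

# Once the column really exists, both selection styles work but return different types.
data = data.rename(columns={"filepath": "file_path"})
subset = data[["file_path"]]   # list of names -> DataFrame with one column
series = data["file_path"]     # single name   -> Series
print(type(subset).__name__, type(series).__name__)
```

Listing the missing names before selecting gives a more readable message than the KeyError raised from inside pandas.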
I think you are missing np before conj, np.conj() in your line 10

Executing Python script in Azure ML studio

I wanted to create a web service which provides a summary of the text at a given URL, using Python, BeautifulSoup and NLTK.
However, I encounter the following error in Azure ML Studio.
Schematics in Azure:
The EnterData module holds a URL from Wikipedia.
The Execute Python Script module holds the following code:
import pandas as pd
import urllib.request as ur
from bs4 import BeautifulSoup
def azureml_main(dataframe1="https://en.wikipedia.org/wiki/Fluid_mechanics", dataframe2 = None):
    wiki = dataframe1[0].to_string()
    page = ur.urlopen(wiki)
    soup = BeautifulSoup(page)
    df= pd.DataFrame([soup.find_all('p')[0].get_text()], columns =['article_text'])
    return dataframe1,
Running this experiment produces the following error:
Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Caught exception while executing function: Traceback (most recent call last):
File "C:\pyhome\lib\site-packages\pandas\indexes\base.py", line 1876, in get_loc
return self._engine.get_loc(key)
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4027)
File "pandas\index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas\index.c:3891)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\server\invokepy.py", line 199, in batch
odfs = mod.azureml_main(*idfs)
File "C:\temp\84d7e9fbcfe54596a2e7de022b4d236c.py", line 23, in azureml_main
wiki = dataframe1[0][0].to_string()
File "C:\pyhome\lib\site-packages\pandas\core\frame.py", line 1992, in __getitem__
return self._getitem_column(key)
File "C:\pyhome\lib\site-packages\pandas\core\frame.py", line 1999, in _getitem_column
return self._get_item_cache(key)
File "C:\pyhome\lib\site-packages\pandas\core\generic.py", line 1345, in _get_item_cache
values = self._data.get(item)
File "C:\pyhome\lib\site-packages\pandas\core\internals.py", line 3225, in get
loc = self.items.get_loc(item)
File "C:\pyhome\lib\site-packages\pandas\indexes\base.py", line 1878, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4027)
File "pandas\index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas\index.c:3891)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 0
Process returned with non-zero exit code 1
---------- End of error message from Python interpreter ----------
Start time: UTC 11/11/2018 15:34:21
End time: UTC 11/11/2018 15:34:30
I am using Anaconda 4.0/Python 3.5 to run this snippet.
When I assign the URL to the variable wiki directly, the code runs successfully on my local machine.
I am not sure why I cannot fetch the value from the input dataframe1.
The input dataframe has no header, so dataframe1[0] should fetch the URL directly.
Thanks for helping me with this.
Your dataframe1 looks like this:
dataframe1 = {'Col1' : ['https://en.wikipedia.org/wiki/Finite_element_method']}
The key is not an integer index but 'Col1', so you can fix it with
wiki = dataframe1['Col1'].to_string(index=0)
But that raises another problem: the URL is trimmed if it is too long,
https://en.wikipedia.org/wiki/Finite_element....
so it is better to use
wiki = dataframe1['Col1'][0]
Another error is
return dataframe1,
which should be
return df,
Fixed code:
import pandas as pd
import urllib.request as ur
from bs4 import BeautifulSoup
def azureml_main(dataframe1="https://en.wikipedia.org/wiki/Fluid_mechanics", dataframe2 = None):
    wiki = dataframe1['Col1'][0]
    page = ur.urlopen(wiki)
    soup = BeautifulSoup(page)
    df= pd.DataFrame([soup.find_all('p')[0].get_text()], columns=['article_text'])
    return df,
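If the column name is not known in advance, positional access avoids depending on 'Col1'. This is a small sketch of that alternative, not part of the answer's fix. The frame below is a made-up stand-in for what the input module passes in, assuming it arrives as a one-column DataFrame.

```python
import pandas as pd

url = "https://en.wikipedia.org/wiki/Finite_element_method"

# Stand-in for the input frame: one column, one row, column name assumed to be 'Col1'.
dataframe1 = pd.DataFrame({"Col1": [url]})

# dataframe1[0] looks for a column *named* 0, which does not exist here.
try:
    dataframe1[0]
except KeyError as exc:
    print("KeyError:", exc)

# iloc selects by position: first row, first column, whatever the column is called.
wiki = dataframe1.iloc[0, 0]
print(wiki)
```

Because iloc returns the cell value itself rather than a formatted string, the URL comes back complete.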

pd.to_datetime error after saving csv file without doing anything

I am using pd.to_datetime, and my code looks like this:
rate = pd.read_csv('P2training.csv', header=0)
rate['Date'] = pd.to_datetime(rate['Date'], format='%Y-%m-%d')
rate.set_index('Date', inplace=True, drop=True)
rate.tail(10)
print(rate)
In P2training.csv the first column is 'Date', and this code ran well when I first downloaded the P2training dataset. However, after I opened the CSV file and saved it without doing anything else, the code started to report the errors below. If I replace the 'saved' file with the original downloaded file, the code still runs properly.
C:\Users\yaojia\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
Traceback (most recent call last):
File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 444, in _convert_listlike
values, tz = tslib.datetime_to_datetime64(arg)
File "pandas_libs\tslib.pyx", line 1810, in pandas._libs.tslib.datetime_to_datetime64 (pandas_libs\tslib.c:33275)
TypeError: Unrecognized value type:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/yaojia/.PyCharmEdu4.0/config/scratches/scratch_7.py", line 23, in <module>
rate['Date'] = pd.to_datetime(rate['Date'], format='%Y-%m-%d')
File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 509, in to_datetime
values = _convert_listlike(arg._values, False, format)
File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 447, in _convert_listlike
raise e
File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 435, in _convert_listlike
require_iso8601=require_iso8601
File "pandas_libs\tslib.pyx", line 2355, in pandas._libs.tslib.array_to_datetime (pandas_libs\tslib.c:46617)
File "pandas_libs\tslib.pyx", line 2484, in pandas._libs.tslib.array_to_datetime (pandas_libs\tslib.c:44616)
ValueError: time data '12/31/1979' doesn't match format specified
Process finished with exit code 1
Could anyone give any hint what's going wrong?
I guess you opened the CSV with Excel? If so, Excel recognizes that the 'Date' column contains dates, converts the column to its own date format (in your case 'month/day/year', as the '12/31/1979' in the error shows) and saves it that way, while your code expects 'year-month-day'.
I suggest you open and save your CSV with a text editor, or change the default Excel date format.
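If the re-saved file has to be read as it is, the format string can be changed to match the rewritten dates. A minimal sketch, with made-up values standing in for the 'Date' column and assuming every row was rewritten in the same month/day/year style as the '12/31/1979' in the error:

```python
import pandas as pd

# Made-up values standing in for the 'Date' column of the re-saved file.
dates = pd.Series(["12/31/1979", "01/02/1980"])

# The original format string no longer matches, so this raises ValueError.
try:
    pd.to_datetime(dates, format="%Y-%m-%d")
except ValueError as exc:
    print("original format fails:", exc)

# A format string that matches month/day/year parses the same values.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")
print(parsed)
```

An explicit format is safer than letting pandas guess here, since a value such as 01/02/1980 can be read as either January 2 or February 1.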
