I'm running this code in Colab, but I get the error below and I can't fix it. I've already tried switching the column names between upper and lower case, but nothing works. Could you help me out?
#Inativos: ajustar nomes das colunas
dfInativos = dfInativos.rename(columns={'userId': 'id'})
dfInativos = dfInativos.rename(columns={'classId': 'ClasseId'})
dfInativos[['id','ClasseId','lastActivityDate','inactivityDaysCount','sevenDayInactiveStatus']] = dfInativos
#dfInativos['id'] = dfInativos['id'].astype(int, errors = 'ignore')
#dfInativos['ClasseId'] = dfInativos['ClasseId'].astype(int, errors = 'ignore')
dfInativos['id'] = pd.to_numeric(dfInativos['id'],errors = 'coerce')
dfInativos['ClasseId'] = pd.to_numeric(dfInativos['ClasseId'],errors = 'coerce')
#dfInativos.dropna(subset = ['lastActivityDate'], inplace=True)
dfInativos.drop_duplicates(subset = ['id','ClasseId'], inplace=True)
dfInativos['sevenDayInactiveStatus'] = dfInativos['sevenDayInactiveStatus'].replace(0,'')
#Add Inactive data to main data frame
df = df.merge(dfInativos, on=['id','ClasseId'], how='left')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-79-10fe94c48d1f> in <module>()
2 dfInativos = dfInativos.rename(columns={'userId': 'id'})
3 dfInativos = dfInativos.rename(columns={'classId': 'ClasseId'})
----> 4 dfInativos[['id','ClasseId','lastActivityDate','inactivityDaysCount','sevenDayInactiveStatus']] = dfInativos
5
6
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexers.py in check_key_length(columns, key, value)
426 if columns.is_unique:
427 if len(value.columns) != len(key):
--> 428 raise ValueError("Columns must be same length as key")
429 else:
430 # Missing keys in columns are represented as -1
ValueError: Columns must be same length as key
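The error means the DataFrame on the right-hand side of line 4 does not have exactly the five columns named on the left. A minimal diagnostic sketch (the fix is an assumption, since the actual columns of dfInativos aren't shown; the idea is to select the five columns instead of assigning the whole frame back to itself):

print(dfInativos.columns.tolist())  # check how many columns there really are
cols = ['id', 'ClasseId', 'lastActivityDate', 'inactivityDaysCount', 'sevenDayInactiveStatus']
dfInativos = dfInativos[cols]  # keep exactly these columns, assuming they all exist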
I hope you can help me with an issue I've been having for a while. I keep getting this error no matter what I try.
Here is how the types are set up, as far as I know:
tweets['post_date'] = pd.to_datetime(tweets['post_date'], unit='s')
tweets['date'] = pd.to_datetime(tweets['post_date'].apply(lambda date: date.date()))
tweets.head()
Output:
post_date body ticker_symbol date
19 2015-01-01 00:11:17 $UNP $ORCL $QCOM $MSFT $AAPL Top scoring mega ... MSFT 2015-01-01
43 2015-01-01 00:55:58 http://StockAviator.com....Top penny stocks, N... MSFT 2015-01-01
TypeError Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/arrays/datetimelike.py in _validate_comparison_value(self, other)
539 try:
--> 540 self._check_compatible_with(other)
541 except (TypeError, IncompatibleFrequency) as err:
13 frames
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects.
The above exception was the direct cause of the following exception:
InvalidComparison Traceback (most recent call last)
InvalidComparison: 2015-01-01 12:00:00-05:00
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/ops/invalid.py in invalid_comparison(left, right, op)
32 else:
33 typ = type(right).__name__
---> 34 raise TypeError(f"Invalid comparison between dtype={left.dtype} and {typ}")
35 return res_values
36
TypeError: Invalid comparison between dtype=datetime64[ns] and Timestamp
The error comes from this part of my code:
#market opens 14:30 closes 21:00
def getAvgPerPrice(tweets, stockk):
    stock = stockk.copy()
    result = pd.DataFrame([])
    for i in range(0, len(stock)-1):
        d = stock.index[i]
        next_d = stock.index[i+1]
        wanted_tweets = tweets[((tweets.post_date - timedelta(hours = 3)) >= (d + timedelta(hours = h))) & ((tweets.post_date - timedelta(hours = 3)) < (next_d + timedelta(hours = h)))]
        result.at[i,'date'] = d
        result.at[i,'close'] = stock.iloc[i].Close
        result.at[i,'avgScore'] = wanted_tweets['score'].mean()
I would really appreciate it if anyone could help me find the issue. I have tried many things already, but no luck. Thank you in advance!
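From the traceback (InvalidComparison: 2015-01-01 12:00:00-05:00), the stock index is tz-aware while tweets.post_date is tz-naive, so the >= comparison fails. A minimal sketch of one way to align them, assuming the timezone on the stock index can simply be dropped:

# make both sides tz-naive by stripping the timezone from the stock index
stock.index = stock.index.tz_localize(None)
# ...or make the tweets tz-aware instead (the timezone name here is an assumption):
# tweets['post_date'] = tweets['post_date'].dt.tz_localize('US/Eastern')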
I have been using the script below to calculate the storage capacity of one of our server environments. It reads the values from a report I get every two weeks and then creates a file I can import into PowerBI to create graphs. It ran without an issue two weeks ago, but today when I tried to run it I got a TypeError. I assume "if float(df['Capacity(TB)']) >= 0.01:" is causing the issue, as per the error message.
The data I am importing is an xls sheet with a header name and values underneath it. I had a look to see if there are any blank fields but could not find any. Any help/suggestions would be greatly appreciated.
import pandas as pd
import numpy as np
from datetime import datetime
import os
from os import listdir
from os.path import isfile, join
#SCCM resource import as 'df'
pathres = r'C:\Capacity Reports\SCOM Reports'
onlyfiles = [f for f in listdir(pathres) if isfile(join(pathres, f))]
df = pd.DataFrame()
for i in onlyfiles:
    print(i)
    dfresimp = pd.read_excel(pathres+'\\'+i)
    df = pd.concat([df, dfresimp])
#CMDB import as 'df2'
df2 = pd.read_excel('C:\\Capacity Reports\\CMDB_Export.xlsx')
#Windows Lifecycle import as 'df3'
df3 = pd.read_excel('C:\\Capacity Reports\\Windows Server Lifecycle.xlsx')
#SCVMM clusters import as 'df4'
df4 = pd.read_excel('C:\\Capacity Reports\\HyperV Overview.xlsx')
#SCVMM Storage reports import as 'df5'
pathstor = r'C:\Capacity Reports\Hyper-V Storage'
Storfiles = [f for f in listdir(pathstor) if isfile(join(pathstor, f))]
df5 = pd.DataFrame()
for i in Storfiles:
    print(i)
    dfstorimp = pd.read_excel(pathstor+'\\'+i)
    df5 = pd.concat([df5, dfstorimp])
#CREATE MAIN TABLE
df['NAME'] = df['Computer Name'].str.upper()
df11 = pd.DataFrame()
df11['NAME'] = df2['NAME'].str.upper()
df11['Application Owner'] = df2['Application Owner'].str.title()
df11['HW EOSL'] = df2['HW EOSL'].str.title()
#print(df11['HW EOSL'])
Main_Table = df.merge(df11, on='NAME', how='left')
Main_Table = Main_Table.merge(df3, on='Operating System Edition', how='left')
df13 = pd.DataFrame()
df13['Hyper V Cluster name'] = df4['Hyper V Cluster name']
df13['Computer Name'] = df4['Server Name'].str.upper()
Main_Table = Main_Table.merge(df13, on='Computer Name', how='left')
Main_Table['OS_Support'] = pd.to_datetime(Main_Table['Extended_Support_End_Date'], format='"%Y-%m-%d %H:%S:%f')
Main_Table['OS_Support'] = Main_Table['OS_Support'].dt.strftime("%Y-%m-%d")
#print(Main_Table['OS_Support'])
def f(df):
    if df['Host/GuestVM'] == 'GuestVM':
        result = (df['Total Physical Memory GB']-(df['Total Physical Memory GB']*(df['Memory % Used Max Value']/100)))/2
        return result
    else:
        np.nan
Main_Table['Reclaimable Memory Calculated'] = Main_Table.apply(f, axis=1)
def f(df):
    if df['Host/GuestVM'] == 'GuestVM':
        result = (df['Total Logical Processors']-(df['Total Logical Processors']*(df['CPU % Used Max Value']/100)))/2
        return result
    else:
        np.nan
Main_Table['Reclaimable CPU Calculated'] = Main_Table.apply(f, axis=1)
Main_Table['Reclaimable Memory Calculated'] = round(Main_Table['Reclaimable Memory Calculated'])
Main_Table['Reclaimable CPU Calculated'] = round(Main_Table['Reclaimable CPU Calculated'])
Main_Table['Report Timestamp'] = Main_Table['Report Timestamp'].dt.strftime("%Y%m%d")
Main_Table = Main_Table.drop_duplicates()
Main_Table['Report Timestamp Number'] = Main_Table['Report Timestamp']
column = Main_Table["Report Timestamp Number"]
max_value = column.max()
Total_Memory_Latest = 0
def f(df):
    global Total_Memory_Latest
    if df['Report Timestamp Number'] == max_value and df['Host/GuestVM'] == 'Host':
        Total_Memory_Latest += df['Total Physical Memory GB']
        return 0
    else:
        np.nan
Main_Table['DummyField'] = Main_Table.apply(f, axis=1)
Main_Table.to_excel(r'C:\Users\storm_he\OneDrive - MTN Group\Documents\Testing\Main_Table.xlsx')
#CREATE STORAGE TABLE AND EXPORT
def f(df):
    #if df['Host/GuestVM'] == 'Host':
    #try:
    if float(df['Capacity(TB)']) >= 0.01:
        result = (df['Available(TB)']/df['Capacity(TB)'])*100
        return round(result)
    else:
        return ''
    #except:
        #return np.nan
df5['% Storage free'] = df5.apply(f, axis=1)
pattern = '|'.join(['.mtn.co.za', '.mtn.com'])
df5['VMHost'] = df5['VMHost'].str.replace(pattern,'')
df5['VMHost'] = df5['VMHost'].str.upper()
df5['Report Timestamp'] = df5['Report Timestamp'].dt.strftime("%Y%m%d")
#print(df5['Report Timestamp'])
df5.to_excel(r'C:\Users\storm_he\OneDrive - MTN Group\Documents\Testing\Main_Storage_table.xlsx')
print('Run Finished')
StackTrace
TypeError Traceback (most recent call last)
<ipython-input-1-3c53bb32e311> in <module>
108 column = Main_Table["Report Timestamp Number"]
109
--> 110 max_value = column.max()
111 Total_Memory_Latest = 0
112
~\Anaconda3\lib\site-packages\pandas\core\generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
11212 if level is not None:
11213 return self._agg_by_level(name, axis=axis, level=level, skipna=skipna)
> 11214 return self._reduce(
11215 f, name, axis=axis, skipna=skipna, numeric_only=numeric_only
11216 )
~\Anaconda3\lib\site-packages\pandas\core\series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
3889 )
3890 with np.errstate(all="ignore"):
-> 3891 return op(delegate, skipna=skipna, **kwds)
3892
3893 # TODO(EA) dispatch to Index
~\Anaconda3\lib\site-packages\pandas\core\nanops.py in f(values, axis, skipna, **kwds)
123 result = alt(values, axis=axis, skipna=skipna, **kwds)
124 else:
--> 125 result = alt(values, axis=axis, skipna=skipna, **kwds)
126
127 return result
~\Anaconda3\lib\site-packages\pandas\core\nanops.py in reduction(values, axis, skipna, mask)
835 result = np.nan
836 else:
--> 837 result = getattr(values, meth)(axis)
838
839 result = _wrap_results(result, dtype, fill_value)
~\Anaconda3\lib\site-packages\numpy\core\_methods.py in _amax(a, axis, out, keepdims, initial, where)
28 def _amax(a, axis=None, out=None, keepdims=False,
29 initial=_NoValue, where=True):
---> 30 return umr_maximum(a, axis, None, out, keepdims, initial, where)
31
32 def _amin(a, axis=None, out=None, keepdims=False,
TypeError: '>=' not supported between instances of 'float' and 'str'
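Note that the traceback points at max_value = column.max(), not at the float() comparison: the 'Report Timestamp Number' column ends up holding a mix of strings ('20230101'-style stamps from strftime) and floats (NaN), which NumPy cannot order. A plausible cause is rows where 'Report Timestamp' was NaT, since .dt.strftime() turns those into NaN. A minimal sketch of a fix, assuming the column should be treated as numeric (names as in the script above):

Main_Table['Report Timestamp Number'] = pd.to_numeric(
    Main_Table['Report Timestamp Number'], errors='coerce')  # strings -> numbers, anything else -> NaN
max_value = Main_Table['Report Timestamp Number'].max()      # .max() skips NaN by default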
I get this error when I try to split one column into several columns. It works fine for one or two columns, but if I try to split into 3, 4, or 5 columns it raises:
ValueError Traceback (most recent call last)
/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
349 try:
--> 350 return self._range.index(new_key)
351 except ValueError:
ValueError: 2 is not in range
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-19-d4e6a4d03e69> in <module>
22 data_old[Col_1_Label] = newz[0]
23 data_old[Col_2_Label] = newz[1]
---> 24 data_old[Col_3_Label] = newz[2]
25 #data_old[Col_4_Label] = newz[3]
26 #data_old[Col_5_Label] = newz[4]
/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
350 return self._range.index(new_key)
351 except ValueError:
--> 352 raise KeyError(key)
353 return super().get_loc(key, method=method, tolerance=tolerance)
354
KeyError: 2
Here is my code. I have a CSV file, and when pandas reads it, it creates one column named 'Контракт'. Then I split it into other columns, but it only ever splits into two columns, and I want 7 columns! Please help me understand this logic!
import pandas as pd
from pandas import Series, DataFrame
import re
dframe1 = pd.read_csv('po.csv')
columns = ['Контракт']
data_old = pd.read_csv('po.csv', header=None, names=columns)
data_old
# The thing you want to split the column on
SplitOn = ':'
# Name of Column you want to split
Split_Col = 'Контракт'
newz = data_old[Split_Col].str.split(pat=SplitOn, n=-1, expand=True)
# Column Labels (you can add more if you will have more)
Col_1_Label = 'Номер телефону'
Col_2_Label = 'Тарифний пакет'
Col_3_Label = 'Вихідні дзвінки з України за кордон'
Col_4_Label = 'ВАРТІСТЬ ПАКЕТА/ЩОМІСЯЧНА ПЛАТА'
Col_5_Label = 'ЗАМОВЛЕНІ ДОДАТКОВІ ПОСЛУГИ ЗА МЕЖАМИ ПАКЕТА'
Col_6_Label = 'Вартість послуги "Корпоративна мережа'
Col_7_Label = 'ЗАГАЛОМ ЗА КОНТРАКТОМ (БЕЗ ПДВ ТА ПФ)'
data_old[Col_1_Label] = newz[0]
data_old[Col_2_Label] = newz[1]
data_old[Col_3_Label] = newz[2]
#data_old[Col_4_Label] = newz[3]
#data_old[Col_5_Label] = newz[4]
#data_old[Col_6_Label] = newz[5]
#data_old[Col_7_Label] = newz[6]
data_old
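A quick check (a diagnostic sketch using the names from the code above) shows why the KeyError happens: the split produced fewer columns than are being assigned.

print(newz.shape)   # e.g. (n_rows, 2): only one ':' was found per line
print(newz.head())  # there is no newz[2], hence KeyError: 2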
Pandas does not support "unstructured text"; you should parse it into a standard format or Python objects first, and then create a DataFrame from those.
Imagine that you have a file with this text named data.txt:
Contract № 12345679 Number of phone: +7984563774
Total price for month : 00.00000
Total price: 10.0000
You can load and process it with Python like this:
import re

with open('data.txt') as f:
    content = f.readlines()

# First line contains the contract number and phone information
contract, phone = content[0].split(':')
# find the contract number using a regex
contract = re.findall(r'\d+', contract)[0]
# The phone is straightforward
phone = phone.strip()
# Second and third lines hold the prices
total_month_price = float(content[1].split(':')[1].strip())
total_price = float(content[2].split(':')[1].strip())
Then, with these variables, you can create a DataFrame:
df = pd.DataFrame([dict(N_of_contract=contract, total_price=total_price, total_month_price=total_month_price)])
Repeat the same for all files.
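For example, to process a whole directory of such files into one DataFrame, a sketch along these lines could work (the directory name and file pattern are assumptions):

import glob

rows = []
for path in glob.glob('reports/*.txt'):  # hypothetical location of the text files
    with open(path) as f:
        content = f.readlines()
    contract = re.findall(r'\d+', content[0].split(':')[0])[0]
    total_month_price = float(content[1].split(':')[1].strip())
    total_price = float(content[2].split(':')[1].strip())
    rows.append(dict(N_of_contract=contract, total_price=total_price, total_month_price=total_month_price))

df = pd.DataFrame(rows)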
I'm trying to import data from multiple web pages into a data table using Python.
Basically, I'm trying to download attendance data for certain teams since 2000.
Here is what I have so far:
import requests
import pandas as pd
import numpy as np
#What is the effect of a rival team's performance on a team's attendance
Teams = ['LAA', 'LAD', 'NYY', 'NYM', 'CHC', 'CHW', 'OAK', 'SFG']
Years = []
for year in range(2000,2020):
    Years.append(str(year))
bbattend = pd.DataFrame(columns=['GM_Num','Date','Team','Home','Opp','W/L','R','RA','Inn','W-L','Rank','GB','Time','D/N','Attendance','Streak','Game_Win','Wins','Losses','Net_Wins'])
for team in Teams:
    for year in Years:
        url = 'https://www.baseball-reference.com/teams/' + team + '/' + year + '-schedule-scores.shtml'
        html = requests.get(url).content
        df_list = pd.read_html(html)
        df = df_list[-1]
        #Formatting data table
        df.rename(columns={"Gm#": "GM_Num", "Unnamed: 4": "Home", "Tm": "Team", "D/N": "Night"}, inplace = True)
        df['Home'] = df['Home'].apply(lambda x: 0 if x == '#' else 1)
        df['Game_Win'] = df['W/L'].astype(str).str[0]
        df['Game_Win'] = df['Game_Win'].apply(lambda x: 0 if x == 'L' else 1)
        df['Night'] = df['Night'].apply(lambda x: 1 if x == 'N' else 0)
        df['Streak'] = df['Streak'].apply(lambda x: -1*len(x) if '-' in x else len(x))
        df.drop('Unnamed: 2', axis=1, inplace = True)
        df.drop('Orig. Scheduled', axis=1, inplace = True)
        df.drop('Win', axis=1, inplace = True)
        df.drop('Loss', axis=1, inplace = True)
        df.drop('Save', axis=1, inplace = True)
        #Drop rows that do not have data
        df = df[df['GM_Num'].str.isdigit()]
        WL = df["W-L"].str.split("-", n = 1, expand = True)
        df["Wins"] = WL[0].astype(dtype=np.int64)
        df["Losses"] = WL[1].astype(dtype=np.int64)
        df['Net_Wins'] = df['Wins'] - df['Losses']
        bbattend.append(df)
bbattend
When I run the body of the loop on its own, with a specific hard-coded link instead of building the url by concatenation, it works. However, with this code I get the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-77-997e6aeea77e> in <module>
16 url = 'https://www.baseball-reference.com/teams/' + team + '/' + year +'-schedule-scores.shtml'
17 html = requests.get(url).content
---> 18 df_list = pd.read_html(html)
19 df = df_list[-1]
20 #Formatting data table
~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only)
1092 decimal=decimal, converters=converters, na_values=na_values,
1093 keep_default_na=keep_default_na,
-> 1094 displayed_only=displayed_only)
~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
914 break
915 else:
--> 916 raise_with_traceback(retained)
917
918 ret = []
~/anaconda3/lib/python3.7/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback)
418 if traceback == Ellipsis:
419 _, _, traceback = sys.exc_info()
--> 420 raise exc.with_traceback(traceback)
421 else:
422 # this version of raise is a syntax error in Python 3
ValueError: No tables found
I don't really understand what the error message is saying.
I'd appreciate any help!
Some of the requested pages do not contain any table at all (for example, team/year combinations that have no schedule data),
so df_list = pd.read_html(html) will raise ValueError: No tables found.
You need to use try-except here.
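For example, a sketch of that try-except (collecting the per-page frames in a list and concatenating at the end is my addition, not part of the original code):

frames = []
for team in Teams:
    for year in Years:
        url = 'https://www.baseball-reference.com/teams/' + team + '/' + year + '-schedule-scores.shtml'
        html = requests.get(url).content
        try:
            df_list = pd.read_html(html)
        except ValueError:  # no tables on this page, skip it
            continue
        df = df_list[-1]
        # ...same formatting steps as in the question...
        frames.append(df)
bbattend = pd.concat(frames, ignore_index=True)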
Please explain why the error below is occurring in this code. I will share more of the code if anyone is interested.
fit_statsHR=auth2_client.intraday_time_series('activities/heart',base_date=date, detail_level='1sec')
time_list = []
val_list = []
ids = []
dates = []
for i in fit_statsHR['activities-heart-intraday']['dataset']:
    val_list.append(i['value'])
    time_list.append(i['time'])
    ids.append(id)
    dates.append(date)
heartdf=pd.DataFrame({'heartRate':val_list,'time':time_list,'userId':ids,'date':dates})
which leads to:
KeyError Traceback (most recent call last)
<ipython-input-3-076c9750910b> in get_hps(auth2_client, id, date)
51 ids = []
52 dates = []
---> 53 for i in (fit_statsHR['activities-heart-intraday']['dataset']):
54 val_list.append(i['value'])
55 time_list.append(i['time'])
KeyError: 'activities-heart-intraday'
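The KeyError means the response dictionary has no 'activities-heart-intraday' entry for that request. With the python-fitbit client, intraday data is typically only present when your app has been granted intraday access, so the cause here is an assumption; a defensive sketch would be:

fit_statsHR = auth2_client.intraday_time_series('activities/heart', base_date=date, detail_level='1sec')
print(fit_statsHR.keys())  # inspect what actually came back
dataset = fit_statsHR.get('activities-heart-intraday', {}).get('dataset', [])
for i in dataset:  # empty list when the intraday key is absent
    val_list.append(i['value'])
    time_list.append(i['time'])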