The data in test.csv look like this:
TIMESTAMP POLYLINE
0 1408039037 [[-8.585676,41.148522],[-8.585712,41.148639],[...
1 1408038611 [[-8.610876,41.14557],[-8.610858,41.145579],[-...
2 1408038568 [[-8.585739,41.148558],[-8.58573,41.148828],[-...
3 1408039090 [[-8.613963,41.141169],[-8.614125,41.141124],[...
4 1408039177 [[-8.619903,41.148036],[-8.619894,41.148036]]
.. ... ...
315 1419171485 [[-8.570196,41.159484],[-8.570187,41.158962],[...
316 1419170802 [[-8.613873,41.141232],[-8.613882,41.141241],[...
317 1419172121 [[-8.6481,41.152536],[-8.647461,41.15241],[-8....
318 1419171980 [[-8.571699,41.156073],[-8.570583,41.155929],[...
319 1419171420 [[-8.574561,41.180184],[-8.572248,41.17995],[-...
[320 rows x 2 columns]
I read them from the CSV file in this way:
train = pd.read_csv("path/train.csv",engine='python',error_bad_lines=False)
So, I have this timestamp in Unix format. I want to convert it to UTC time and then extract the year, month, day and so on.
This is the code for the conversion from Unix timestamp to UTC datetime:
train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
To extract year and other information I do this:
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute
But I get this error when execution reaches the extraction point:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-d2249cabe965> in <module>()
67 train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
68 train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
---> 69 train["year"] = train["data_time"].dt.year
70 train["month"] = train["data_time"].dt.month
71 train["day"] = train["data_time"].dt.day
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/accessors.py in __new__(cls, data)
478 return PeriodProperties(data, orig)
479
--> 480 raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
I also read a lot of similar discussions but I can't figure out why I get this error.
Edited:
So the train["TIMESTAMP"] data are like this:
1408039037
1408038611
1408039090
Then I do this with the data:
train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute
train = train[["year", "month", "day", "hour","min"]]
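For reference, a minimal sketch of one way to avoid the error (assuming the TIMESTAMP column holds Unix epoch seconds): doing the conversion with pd.to_datetime keeps the column as a real datetime64 dtype, so the .dt accessor works:
import pandas as pd

# Assuming TIMESTAMP holds Unix epoch seconds; this yields a datetime64[ns, UTC] column
train["data_time"] = pd.to_datetime(train["TIMESTAMP"], unit="s", utc=True)
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute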
Related
I tried to use this function but I got an error.
I think I need to change the date format of the time parameters.
now = datetime.now()
past = now - timedelta(days=2)
past = str(past)
bars = client.get_historical_klines("BTCUSDT", "1m", start_str = past, end_str = None, limit = 1000)
But I got an error.
When I delete start_str and end_str, it works.
How can I handle the date string for this function?
Could you help me? (An example would be best!)
---------ERROR------------
TypeError Traceback (most recent call last)
Input In [46], in <cell line: 1>()
----> 1 bars = client.get_historical_klines(symbol="BTCUSDT", interval="1m",
2 start_str=past, end_str=None, limit=1000)
File ~/opt/anaconda3/lib/python3.9/site-packages/binance/client.py:934, in Client.get_historical_klines(self, symbol, interval, start_str, end_str, limit, klines_type)
914 def get_historical_klines(self, symbol, interval, start_str=None, end_str=None, limit=1000,
915 klines_type: HistoricalKlinesType = HistoricalKlinesType.SPOT):
916 """Get Historical Klines from Binance
917
918 :param symbol: Name of symbol pair e.g BNBBTC
(...)
932
933 """
--> 934 return self._historical_klines(
935 symbol, interval, start_str=start_str, end_str=end_str, limit=limit, klines_type=klines_type
936 )
File ~/opt/anaconda3/lib/python3.9/site-packages/binance/client.py:969, in Client._historical_klines(self, symbol, interval, start_str, end_str, limit, klines_type)
966 timeframe = interval_to_milliseconds(interval)
968 # if a start time was passed convert it
--> 969 start_ts = convert_ts_str(start_str)
971 # establish first available start timestamp
972 if start_ts is not None:
File ~/opt/anaconda3/lib/python3.9/site-packages/binance/helpers.py:76, in convert_ts_str(ts_str)
74 if type(ts_str) == int:
75 return ts_str
---> 76 return date_to_milliseconds(ts_str)
File ~/opt/anaconda3/lib/python3.9/site-packages/binance/helpers.py:24, in date_to_milliseconds(date_str)
22 epoch: datetime = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
23 # parse our date string
---> 24 d: Optional[datetime] = dateparser.parse(date_str, settings={'TIMEZONE': "UTC"})
25 if not d:
26 raise UnknownDateFormat(date_str)
File ~/opt/anaconda3/lib/python3.9/site-packages/dateparser/conf.py:92, in apply_settings.<locals>.wrapper(*args, **kwargs)
89 if not isinstance(kwargs['settings'], Settings):
90 raise TypeError("settings can only be either dict or instance of Settings class")
---> 92 return f(*args, **kwargs)
File ~/opt/anaconda3/lib/python3.9/site-packages/dateparser/__init__.py:61, in parse(date_string, date_formats, languages, locales, region, settings, detect_languages_function)
57 if languages or locales or region or detect_languages_function or not settings._default:
58 parser = DateDataParser(languages=languages, locales=locales,
59 region=region, settings=settings, detect_languages_function=detect_languages_function)
---> 61 data = parser.get_date_data(date_string, date_formats)
63 if data:
64 return data['date_obj']
File ~/opt/anaconda3/lib/python3.9/site-packages/dateparser/date.py:419, in DateDataParser.get_date_data(self, date_string, date_formats)
376 """
377 Parse string representing date and/or time in recognizable localized formats.
378 Supports parsing multiple languages and timezones.
(...)
416
417 """
418 if not isinstance(date_string, str):
--> 419 raise TypeError('Input type must be str')
421 res = parse_with_formats(date_string, date_formats or [], self._settings)
422 if res['date_obj']:
TypeError: Input type must be str
binance.client.Client.get_historical_klines() takes an int or str as the value for start_str; see the documentation of this method:
def get_historical_klines(self, symbol, interval, start_str, end_str=None, limit=500,
klines_type: HistoricalKlinesType = HistoricalKlinesType.SPOT):
"""Get Historical Klines from Binance
:param symbol: Name of symbol pair e.g BNBBTC
:type symbol: str
:param interval: Binance Kline interval
:type interval: str
:param start_str: Start date string in UTC format or timestamp in milliseconds
:type start_str: str|int
:param end_str: optional - end date string in UTC format or timestamp in milliseconds (default will fetch everything up to now)
:type end_str: str|int
:param limit: Default 500; max 1000.
:type limit: int
:param klines_type: Historical klines type: SPOT or FUTURES
:type klines_type: HistoricalKlinesType
:return: list of OHLCV values
"""
return self._historical_klines(symbol, interval, start_str, end_str=end_str, limit=limit, klines_type=klines_type)
You're trying to pass a datetime.datetime object.
You can convert this object to a timestamp (ms) with:
import datetime as dt
now = dt.datetime.now(dt.timezone.utc)
past = now - dt.timedelta(days=2)
# Gives you a timestamp in ms
past_timestamp_ms = int(round(past.timestamp() * 1000, 0))
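For example (a sketch, assuming client is a binance.client.Client instance and past_timestamp_ms is the value computed above), either the millisecond timestamp or a plain date string is accepted for start_str:
bars = client.get_historical_klines("BTCUSDT", "1m", start_str=past_timestamp_ms, limit=1000)
# or, since the date string is parsed by dateparser, something like:
bars = client.get_historical_klines("BTCUSDT", "1m", start_str="2 days ago UTC", limit=1000)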
I got this error when trying to split one column into several columns. It works when splitting into one or two columns, but if I try to split into 3, 4, or 5 columns it raises:
ValueError Traceback (most recent call last)
/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
349 try:
--> 350 return self._range.index(new_key)
351 except ValueError:
ValueError: 2 is not in range
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-19-d4e6a4d03e69> in <module>
22 data_old[Col_1_Label] = newz[0]
23 data_old[Col_2_Label] = newz[1]
---> 24 data_old[Col_3_Label] = newz[2]
25 #data_old[Col_4_Label] = newz[3]
26 #data_old[Col_5_Label] = newz[4]
/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
350 return self._range.index(new_key)
351 except ValueError:
--> 352 raise KeyError(key)
353 return super().get_loc(key, method=method, tolerance=tolerance)
354
KeyError: 2
Here is my code. I have a CSV file, and when pandas reads it, it creates one column named 'Контракт'. Then I split it into other columns, but the split only produces two columns. I want 7 columns! Please help me understand this logic!
import pandas as pd
from pandas import Series, DataFrame
import re
dframe1 = pd.read_csv('po.csv')
columns = ['Контракт']
data_old = pd.read_csv('po.csv', header=None, names=columns)
data_old
# The thing you want to split the column on
SplitOn = ':'
# Name of Column you want to split
Split_Col = 'Контракт'
newz = data_old[Split_Col].str.split(pat=SplitOn, n=-1, expand=True)
# Column Labels (you can add more if you will have more)
Col_1_Label = 'Номер телефону'
Col_2_Label = 'Тарифний пакет'
Col_3_Label = 'Вихідні дзвінки з України за кордон'
Col_4_Label = 'ВАРТІСТЬ ПАКЕТА/ЩОМІСЯЧНА ПЛАТА'
Col_5_Label = 'ЗАМОВЛЕНІ ДОДАТКОВІ ПОСЛУГИ ЗА МЕЖАМИ ПАКЕТА'
Col_6_Label = 'Вартість послуги "Корпоративна мережа'
Col_7_Label = 'ЗАГАЛОМ ЗА КОНТРАКТОМ (БЕЗ ПДВ ТА ПФ)'
data_old[Col_1_Label] = newz[0]
data_old[Col_2_Label] = newz[1]
data_old[Col_3_Label] = newz[2]
#data_old[Col_4_Label] = newz[3]
#data_old[Col_5_Label] = newz[4]
#data_old[Col_6_Label] = newz[5]
#data_old[Col_7_Label] = newz[6]
data_old
Pandas does not support "unstructured text"; you should convert it to a standard format or to Python objects and then create a DataFrame from them.
Imagine that you have a file with this text named data.txt:
Contract № 12345679 Number of phone: +7984563774
Total price for month : 00.00000
Total price: 10.0000
You can load and process it with Python like this:
import re
import pandas as pd

with open('data.txt') as f:
    content = f.readlines()
# First line contains the contract number and the phone number
contract, phone = content[0].split(':')
# Find the contract number using a regex
contract = re.findall(r'\d+', contract)[0]
# The phone number is straightforward
phone = phone.strip()
# Second line holds the monthly price, third line the total price
total_month_price = float(content[1].split(':')[1].strip())
total_price = float(content[2].split(':')[1].strip())
Then, with these variables, you can create a DataFrame:
df = pd.DataFrame([dict(N_of_contract=contract, total_price=total_price, total_month_price=total_month_price)])
Repeat the same for all files.
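For example, a sketch of that loop (assuming all files share the same three-line layout and live under a hypothetical data/ directory):
import glob
import re
import pandas as pd

rows = []
for path in glob.glob('data/*.txt'):
    with open(path) as f:
        content = f.readlines()
    # Same parsing as above, per file
    contract = re.findall(r'\d+', content[0].split(':')[0])[0]
    total_month_price = float(content[1].split(':')[1].strip())
    total_price = float(content[2].split(':')[1].strip())
    rows.append(dict(N_of_contract=contract, total_price=total_price,
                     total_month_price=total_month_price))
df = pd.DataFrame(rows)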
I have issues with converting dates in an imported .txt file and I wonder what I'm doing wrong.
I import the data by:
df_TradingMonthlyDates = pd.read_csv(TradingMonthlyDates, dtype=str, sep=',') # header=True,
and it looks like the following table (the dates represent the start/end of a month and have the header Date):
Date
0 2008-12-30
1 2008-12-31
2 2009-01-01
3 2009-01-02
4 2009-01-29
.. ...
557 2020-06-29
558 2020-06-30
559 2020-07-01
560 2020-07-02
561 2020-07-30
.. ...
624 2021-11-30
625 2021-12-01
626 2021-12-02
627 2021-12-30
628 2021-12-31
[629 rows x 1 columns]
<class 'pandas.core.frame.DataFrame'>
I then calculate today's date:
df_EndDate = datetime.now().date()
I'm trying to apply the data above in this function to get the closest date before a given date (given date = today's date in my case):
# https://stackoverflow.com/questions/32237862/find-the-closest-date-to-a-given-date
def nearest(items, pivot):
return min([i for i in items if i < pivot], key=lambda x: abs(x - pivot))
date_output = nearest(df_TradingMonthlyDates, df_EndDate)
# date_output should be = 2020-07-02 given today's date of 2020-07-12
The error messages I receive say that df_TradingMonthlyDates is not in date format, so I have tried to convert the DataFrame to datetime format but can't make it work.
Here is what I have tried to convert the data to date format:
# df_TradingMonthlyDates["Date"] = pd.to_datetime(df_TradingMonthlyDates["Date"], format="%Y-%m-%d")
# df_TradingMonthlyDates = datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d").date()
# df_TradingMonthlyDates['Date'] = df_TradingMonthlyDates['Date'].apply(lambda x: pd.to_datetime(x[0], format="%Y-%m-%d"))
# df_TradingMonthlyDates = df_TradingMonthlyDates.iloc[1:]
# print(df_TradingMonthlyDates)
# df_TradingMonthlyDates = datetime.strptime(str(df_TradingMonthlyDates), "%Y-%m-%d").date()
# for line in split_source[1:]: # skip the first line
Code:
import pandas as pd
from datetime import datetime
# Version 1
TradingMonthlyDates = "G:/MonthlyDates.txt"
# Import file where all the first/end month date exists
df_TradingMonthlyDates = pd.read_csv(TradingMonthlyDates, dtype=str, sep=',') # header=True,
print(df_TradingMonthlyDates)
# https://community.dataquest.io/t/datetime-and-conversion/213425
# df_TradingMonthlyDates["Date"] = pd.to_datetime(df_TradingMonthlyDates["Date"], format="%Y-%m-%d")
# df_TradingMonthlyDates = datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d").date()
# df_TradingMonthlyDates['Date'] = df_TradingMonthlyDates['Date'].apply(lambda x: pd.to_datetime(x[0], format="%Y-%m-%d"))
# df_TradingMonthlyDates = df_TradingMonthlyDates.iloc[1:]
# print(df_TradingMonthlyDates)
# df_TradingMonthlyDates = datetime.strptime(str(df_TradingMonthlyDates), "%Y-%m-%d").date()
# for line in split_source[1:]: # skip the first line # maybe header is the problem
print(type(df_TradingMonthlyDates))
df_TradingMonthlyDates = df_TradingMonthlyDates.datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d")
df_TradingMonthlyDates = df_TradingMonthlyDates.time()
print(df_TradingMonthlyDates)
df_EndDate = datetime.now().date()
print(type(df_EndDate))
# https://stackoverflow.com/questions/32237862/find-the-closest-date-to-a-given-date
def nearest(items, pivot):
return min([i for i in items if i < pivot], key=lambda x: abs(x - pivot))
date_output = nearest(df_TradingMonthlyDates, df_EndDate)
The error messages differ depending on how I tried to convert the data type, but I read them all as saying that my date conversion is not successful:
df_TradingMonthlyDates = df_TradingMonthlyDates.datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d")
Traceback (most recent call last):
File "g:/till2.py", line 25, in <module>
df_TradingMonthlyDates = df_TradingMonthlyDates.datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d")
File "C:\Users\ID\AppData\Roaming\Python\Python38\site-packages\pandas\core\generic.py", line 5274, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'datetime'
df_TradingMonthlyDates["Date"] = pd.to_datetime(df_TradingMonthlyDates["Date"], format="%Y-%m-%d")
Traceback (most recent call last):
File "g:/till2.py", line 40, in <module>
date_output = nearest(df_TradingMonthlyDates, df_EndDate)
File "g:/till2.py", line 38, in nearest
return min([i for i in items if i < pivot], key=lambda x: abs(x - pivot))
File "g:/till2.py", line 38, in <listcomp>
return min([i for i in items if i < pivot], key=lambda x: abs(x - pivot))
TypeError: '<' not supported between instances of 'str' and 'datetime.date'
Edit: Added Method 3, which might be the easiest, using .loc and then .iloc
You could take a slightly different approach (Method 1 or Method 2 below) by taking the absolute minimum of the difference between today's date and the data. A key thing you weren't doing was wrapping pd.to_datetime() around the datetime.date object df_EndDate to turn it into a pandas datetime value that can be compared against your (converted) Date column; both sides have to be datetime-like for the comparison to work.
Method 1:
import pandas as pd
import datetime as dt
df_TradingMonthlyDates = pd.DataFrame({'Date': {'0': '2008-12-30',
'1': '2008-12-31',
'2': '2009-01-01',
'3': '2009-01-02',
'4': '2009-01-29',
'557': '2020-06-29',
'558': '2020-06-30',
'559': '2020-07-01',
'560': '2020-07-02',
'561': '2020-07-30',
'624': '2021-11-30',
'625': '2021-12-01',
'626': '2021-12-02',
'627': '2021-12-30',
'628': '2021-12-31'}})
df_TradingMonthlyDates['Date'] = pd.to_datetime(df_TradingMonthlyDates['Date'])
df_TradingMonthlyDates['EndDate'] = pd.to_datetime(dt.datetime.now().date())
df_TradingMonthlyDates['diff'] = (df_TradingMonthlyDates['Date'] - df_TradingMonthlyDates['EndDate'])
a=min(abs(df_TradingMonthlyDates['diff']))
df_TradingMonthlyDates = df_TradingMonthlyDates.loc[(df_TradingMonthlyDates['diff'] == a)
| (df_TradingMonthlyDates['diff'] == -a)]
df_TradingMonthlyDates
output 1:
Date EndDate diff
560 2020-07-02 2020-07-11 -9 days
If you don't want the extra columns and just the date, then assign variables to create series rather than new columns:
Method 2:
d = pd.to_datetime(df_TradingMonthlyDates['Date'])
t = pd.to_datetime(dt.datetime.now().date())
e = (d-t)
a=min(abs(e))
df_TradingMonthlyDates = df_TradingMonthlyDates.loc[(e == a) | (e == -a)]
df_TradingMonthlyDates
output 2:
Date
560 2020-07-02
Method 3:
df_TradingMonthlyDates['Date'] = pd.to_datetime(df_TradingMonthlyDates['Date'])
date_output = df_TradingMonthlyDates.sort_values('Date') \
.loc[df_TradingMonthlyDates['Date'] <=
pd.to_datetime(dt.datetime.now().date())] \
.iloc[-1,:]
date_output
output 3:
Date 2020-07-02
Name: 560, dtype: datetime64[ns]
I have a data set as given below:
Timestamp = 22-05-2019 08:40 :Light = 64.00 :Temp_Soil = 20.5625 :Temp_Air = 23.1875 :Soil_Moisture_1 = 756 :Soil_Moisture_2 = 780 :Soil_Moisture_3 = 1002
Timestamp = 22-05-2019 08:42 :Light = 64.00 :Temp_Soil = 20.5625 :Temp_Air = 23.125 :Soil_Moisture_1 = 755 :Soil_Moisture_2 = 782 :Soil_Moisture_3 = 1002
I want to reshape (rearrange) the dataset so that the header columns are [Timestamp, Light, Temp_Soil, Temp_Air, Soil_Moisture_1, Soil_Moisture_2, Soil_Moisture_3] and their values form the row entries, in Python.
One possible solution:
Instead of a "true" input file, I used a string:
import io
import re
import pandas as pd

inp = """Timestamp = 22-05-2019 08:40 :Light = 64.00 :TempSoil = 20.5625 :TempAir = 23.1875 :SoilMoist1 = 756 :SoilMoist2 = 780 :SoilMoist3 = 1002
Timestamp = 22-05-2019 08:42 :Light = 64.00 :TempSoil = 20.5625 :TempAir = 23.125 :SoilMoist1 = 755 :SoilMoist2 = 782 :SoilMoist3 = 1002"""
buf = io.StringIO(inp)
To avoid "folding" of output lines, I shortened the field names.
Then let's create the result DataFrame and a list of "rows" to append to it.
For now, both of them are empty.
df = pd.DataFrame(columns=['Timestamp', 'Light', 'TempSoil', 'TempAir',
'SoilMoist1', 'SoilMoist2', 'SoilMoist3'])
src = []
Below is a loop processing input rows:
while True:
line = buf.readline()
if not(line): # EOF
break
lst = re.split(r' :', line.rstrip()) # Field list
if len(lst) < 2: # Skip empty source lines
continue
dct = {} # Source "row" (dictionary)
for elem in lst: # Process fields
k, v = re.split(r' = ', elem)
dct[k] = v # Add field : value to "row"
src.append(dct)
And the last step is to append the rows from src to df:
df = df.append(src, ignore_index =True, sort=False)
When you print(df), for my test data, you will get:
Timestamp Light TempSoil TempAir SoilMoist1 SoilMoist2 SoilMoist3
0 22-05-2019 08:40 64.00 20.5625 23.1875 756 780 1002
1 22-05-2019 08:42 64.00 20.5625 23.125 755 782 1002
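If you are on a pandas version where DataFrame.append has been removed, an equivalent under the same names would be:
df = pd.concat([df, pd.DataFrame(src)], ignore_index=True)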
For now all columns are of string type, so you can change the required
columns to either float or int:
df.Light = pd.to_numeric(df.Light)
df.TempSoil = pd.to_numeric(df.TempSoil)
df.TempAir = pd.to_numeric(df.TempAir)
df.SoilMoist1 = pd.to_numeric(df.SoilMoist1)
df.SoilMoist2 = pd.to_numeric(df.SoilMoist2)
df.SoilMoist3 = pd.to_numeric(df.SoilMoist3)
Note that the to_numeric() function is clever enough to recognize the type to convert to, so the first 3 columns changed their type to float64
and the next 3 to int64.
You can check this by executing df.info().
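A more compact equivalent of the per-column calls above (same column names assumed):
num_cols = ['Light', 'TempSoil', 'TempAir', 'SoilMoist1', 'SoilMoist2', 'SoilMoist3']
df[num_cols] = df[num_cols].apply(pd.to_numeric)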
One more possible conversion is to change the Timestamp column
to datetime type:
df.Timestamp = pd.to_datetime(df.Timestamp)
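Since these timestamps are day-first, it may be safer to pass an explicit format (the format string below is an assumption based on the sample shown):
df.Timestamp = pd.to_datetime(df.Timestamp, format='%d-%m-%Y %H:%M')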
I'm trying to extract dates from txt files using datefinder.find_dates, which returns a generator object. Everything works fine until I try to convert the generator to a list, at which point I get the following error.
I have been looking around but I can't figure out a solution to this; I'm not sure I really understand the problem either.
import datefinder
import glob
path = "some_path/*.txt"
files = glob.glob(path)
dates_dict = {}
for name in files:
with open(name, encoding='utf8') as f:
dates_dict[name] = list(datefinder.find_dates(f.read()))
Returns:
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-53-a4b508b01fe8> in <module>()
1 for name in files:
2 with open(name, encoding='utf8') as f:
----> 3 dates_dict[name] = list(datefinder.find_dates(f.read()))
C:\ProgramData\Anaconda3\lib\site-packages\datefinder\__init__.py in find_dates(self, text, source, index, strict)
29 ):
30
---> 31 as_dt = self.parse_date_string(date_string, captures)
32 if as_dt is None:
33 ## Dateutil couldn't make heads or tails of it
C:\ProgramData\Anaconda3\lib\site-packages\datefinder\__init__.py in parse_date_string(self, date_string, captures)
99 # otherwise self._find_and_replace method might corrupt them
100 try:
--> 101 as_dt = parser.parse(date_string, default=self.base_date)
102 except ValueError:
103 # replace tokens that are problematic for dateutil
C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
1354 return parser(parserinfo).parse(timestr, **kwargs)
1355 else:
-> 1356 return DEFAULTPARSER.parse(timestr, **kwargs)
1357
1358
C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
651 raise ValueError("String does not contain a date:", timestr)
652
--> 653 ret = self._build_naive(res, default)
654
655 if not ignoretz:
C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in _build_naive(self, res, default)
1222 cday = default.day if res.day is None else res.day
1223
-> 1224 if cday > monthrange(cyear, cmonth)[1]:
1225 repl['day'] = monthrange(cyear, cmonth)[1]
1226
C:\ProgramData\Anaconda3\lib\calendar.py in monthrange(year, month)
122 if not 1 <= month <= 12:
123 raise IllegalMonthError(month)
--> 124 day1 = weekday(year, month, 1)
125 ndays = mdays[month] + (month == February and isleap(year))
126 return day1, ndays
C:\ProgramData\Anaconda3\lib\calendar.py in weekday(year, month, day)
114 """Return weekday (0-6 ~ Mon-Sun) for year (1970-...), month (1-12),
115 day (1-31)."""
--> 116 return datetime.date(year, month, day).weekday()
117
118
OverflowError: Python int too large to convert to C long
Can someone explain this clearly?
Thanks in advance
RE-EDIT: After taking the remarks into consideration, I found a minimal, readable and verifiable example. The error occurs on:
import datefinder
generator = datefinder.find_dates("466990103060049")
for s in generator:
pass
This looks to be a bug in the library you are using. It is trying to parse the string as a year, but that year is too large to be handled. The library that datefinder relies on (dateutil) can raise an OverflowError in this case, but datefinder does not account for that possibility.
One quick and dirty hack just to get it working would be to do:
>>> # Make datefinder's internal `except ValueError` also catch OverflowError
>>> datefinder.ValueError = ValueError, OverflowError
>>> list(datefinder.find_dates("2019/02/01 is a date and 466990103060049 is not"))
[datetime.datetime(2019, 2, 1, 0, 0)]
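Another workaround, just a sketch and not part of datefinder itself, is to strip very long digit runs from the text before parsing, so the overflowing "year" never reaches dateutil (the 11-digit cutoff here is an arbitrary assumption):
import re
import datefinder

text = "2019/02/01 is a date and 466990103060049 is not"
cleaned = re.sub(r'\d{11,}', ' ', text)  # drop digit runs too long to be a plausible date
dates = list(datefinder.find_dates(cleaned))  # [datetime.datetime(2019, 2, 1, 0, 0)]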