OverflowError when trying to convert generators to lists - python

I'm trying to extract dates from txt files using datefinder.find_dates, which returns a generator object. Everything works fine until I try to convert the generator to a list, at which point I get the following error.
I have been looking for a solution but can't find one, and I'm not sure I really understand the problem either.
import datefinder
import glob
path = "some_path/*.txt"
files = glob.glob(path)
dates_dict = {}
for name in files:
    with open(name, encoding='utf8') as f:
        dates_dict[name] = list(datefinder.find_dates(f.read()))
Returns :
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-53-a4b508b01fe8> in <module>()
1 for name in files:
2 with open(name, encoding='utf8') as f:
----> 3 dates_dict[name] = list(datefinder.find_dates(f.read()))
C:\ProgramData\Anaconda3\lib\site-packages\datefinder\__init__.py in
find_dates(self, text, source, index, strict)
29 ):
30
---> 31 as_dt = self.parse_date_string(date_string, captures)
32 if as_dt is None:
33 ## Dateutil couldn't make heads or tails of it
C:\ProgramData\Anaconda3\lib\site-packages\datefinder\__init__.py in
parse_date_string(self, date_string, captures)
99 # otherwise self._find_and_replace method might corrupt
them
100 try:
--> 101 as_dt = parser.parse(date_string, default=self.base_date)
102 except ValueError:
103 # replace tokens that are problematic for dateutil
C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in
parse(timestr, parserinfo, **kwargs)
1354 return parser(parserinfo).parse(timestr, **kwargs)
1355 else:
-> 1356 return DEFAULTPARSER.parse(timestr, **kwargs)
1357
1358
C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in
parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
651 raise ValueError("String does not contain a date:",
timestr)
652
--> 653 ret = self._build_naive(res, default)
654
655 if not ignoretz:
C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in
_build_naive(self, res, default)
1222 cday = default.day if res.day is None else res.day
1223
-> 1224 if cday > monthrange(cyear, cmonth)[1]:
1225 repl['day'] = monthrange(cyear, cmonth)[1]
1226
C:\ProgramData\Anaconda3\lib\calendar.py in monthrange(year, month)
122 if not 1 <= month <= 12:
123 raise IllegalMonthError(month)
--> 124 day1 = weekday(year, month, 1)
125 ndays = mdays[month] + (month == February and isleap(year))
126 return day1, ndays
C:\ProgramData\Anaconda3\lib\calendar.py in weekday(year, month, day)
114 """Return weekday (0-6 ~ Mon-Sun) for year (1970-...), month(1- 12),
115 day (1-31)."""
--> 116 return datetime.date(year, month, day).weekday()
117
118
OverflowError: Python int too large to convert to C long
Can someone explain this clearly?
Thanks in advance
REEDIT : After taking into consideration the remarks that were made, I found a minimal, readable and verifiable example. The error occurs on :
import datefinder
generator = datefinder.find_dates("466990103060049")
for s in generator:
    pass

This looks like a bug in the library you are using. It tries to parse the string as a year, but the resulting year is too large to be converted to a C long, which is what datetime.date needs (see the last frame of the traceback). The dateutil parser that datefinder relies on documents that it raises an OverflowError in this case, but datefinder only catches ValueError and ignores this possibility.
One quick and dirty hack just to get it working would be to do:
>>> datefinder.ValueError = ValueError, OverflowError
>>> list(datefinder.find_dates("2019/02/01 is a date and 466990103060049 is not"))
[datetime.datetime(2019, 2, 1, 0, 0)]
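The root cause can be reproduced without datefinder or dateutil. The sketch below is only an illustration using the standard library; the helper name try_build_date is hypothetical and is not part of either library. It shows that datetime.date rejects a year that is merely out of range with ValueError, and a year that does not fit in a C integer with OverflowError, so code that catches only ValueError lets the second case through.

```python
import datetime


def try_build_date(year, month, day):
    """Return a date, or None when the components cannot form a valid date.

    Hypothetical helper for illustration: it catches both exception types
    that datetime.date can raise for a bad year.
    """
    try:
        return datetime.date(year, month, day)
    except (OverflowError, ValueError):
        # ValueError: year outside 1..9999 (or a bad month/day).
        # OverflowError: year too large to convert to a C integer.
        return None


print(try_build_date(2019, 2, 1))             # a normal date
print(try_build_date(10000, 1, 1))            # out of range
print(try_build_date(466990103060049, 1, 1))  # the value from the question
```

Catching both exceptions is the same idea as the hack above, applied in your own code rather than by patching the library's module namespace.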

Related

PyPDF2 Font Read Issue

I'm writing a script to automate extracting data from pdfs I receive. I'm using PyPDF2 to read the pdfs and extract the text to be interpreted. I've tested pdfs with two different formats. The script works perfectly for the first format. When trying it with the second format I'm getting an indexing error (below). After troubleshooting I've found the issue is due to the font used in the second format. They use "Roboto" while the first, successful format, uses Arial.
I've attached stripped-down versions of the pdfs that are causing issues. One in Roboto and one I manually changed to Arial.
https://drive.google.com/drive/folders/1BhaXPfNyLx8euR2dPQaTqdHvtYJg8yEh?usp=sharing
The snippet of code here is where I'm running into the issue:
import PyPDF2
pdf_roboto = r"C:\Users\Robert.Smyth\Python\test_pdf_roboto.pdf"
pdf_arial = r"C:\Users\Robert.Smyth\Python\test_pdf_arial.pdf"
reader = PyPDF2.PdfFileReader(pdf_roboto)
pageObj = reader.pages[0]
pages_text = pageObj.extractText()
The indexing error I'm getting is:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
C:\Users\ROBERT~1.SMY\AppData\Local\Temp/ipykernel_22076/669450932.py in <module>
1 reader = PyPDF2.PdfFileReader(pdf_roboto)
2 pageObj = reader.pages[0]
----> 3 pages_text = pageObj.extractText()
~\Anaconda3\lib\site-packages\PyPDF2\_page.py in extractText(self, Tj_sep, TJ_sep)
1539 """
1540 deprecate_with_replacement("extractText", "extract_text")
-> 1541 return self.extract_text()
1542
1543 def _get_fonts(self) -> Tuple[Set[str], Set[str]]:
~\Anaconda3\lib\site-packages\PyPDF2\_page.py in extract_text(self, Tj_sep, TJ_sep, orientations, space_width, *args)
1511 orientations = (orientations,)
1512
-> 1513 return self._extract_text(
1514 self, self.pdf, orientations, space_width, PG.CONTENTS
1515 )
~\Anaconda3\lib\site-packages\PyPDF2\_page.py in _extract_text(self, obj, pdf, orientations, space_width, content_key)
1144 if "/Font" in resources_dict:
1145 for f in cast(DictionaryObject, resources_dict["/Font"]):
-> 1146 cmaps[f] = build_char_map(f, space_width, obj)
1147 cmap: Tuple[Union[str, Dict[int, str]], Dict[str, str], str] = (
1148 "charmap",
~\Anaconda3\lib\site-packages\PyPDF2\_cmap.py in build_char_map(font_name, space_width, obj)
20 space_code = 32
21 encoding, space_code = parse_encoding(ft, space_code)
---> 22 map_dict, space_code, int_entry = parse_to_unicode(ft, space_code)
23
24 # encoding can be either a string for decode (on 1,2 or a variable number of bytes) of a char table (for 1 byte only for me)
~\Anaconda3\lib\site-packages\PyPDF2\_cmap.py in parse_to_unicode(ft, space_code)
187 cm = prepare_cm(ft)
188 for l in cm.split(b"\n"):
--> 189 process_rg, process_char = process_cm_line(
190 l.strip(b" "), process_rg, process_char, map_dict, int_entry
191 )
~\Anaconda3\lib\site-packages\PyPDF2\_cmap.py in process_cm_line(l, process_rg, process_char, map_dict, int_entry)
247 process_char = False
248 elif process_rg:
--> 249 parse_bfrange(l, map_dict, int_entry)
250 elif process_char:
251 parse_bfchar(l, map_dict, int_entry)
~\Anaconda3\lib\site-packages\PyPDF2\_cmap.py in parse_bfrange(l, map_dict, int_entry)
256 lst = [x for x in l.split(b" ") if x]
257 a = int(lst[0], 16)
--> 258 b = int(lst[1], 16)
259 nbi = len(lst[0])
260 map_dict[-1] = nbi // 2
IndexError: list index out of range
I've found that if I use the exact same pdf and all I change is the font from Roboto to Arial, PyPDF2 has no problem extracting the text. I've searched online and in the PyPDF2 documentation but I can't find any solution on how to get it to extract text in the Roboto font, or add the Roboto font to the PyPDF2 font library.
I'd really appreciate if anyone could provide some advice on how to solve this issue.
Note: manually changing the font from Roboto to Arial isn't a desirable option as I receive hundreds of these invoices monthly.

python-binance: How can I pass a time parameter to the function?

I tried to use this function but got an error.
I think I need to change the date format of the time parameters.
now = datetime.now()
past = now - timedelta(days=2)
past = str(past)
bars = client.get_historical_klines("BTCUSDT", "1m", start_str = past, end_str = None, limit = 1000)
But I got an error.
When I delete start_str and end_str, it works.
How should I format the date string for this function?
Could you help me? (An example would be best!)
---------ERROR------------
TypeError Traceback (most recent call last)
Input In [46], in <cell line: 1>()
----> 1 bars = client.get_historical_klines(symbol="BTCUSDT", interval="1m",
2 start_str=past, end_str=None, limit=1000)
File ~/opt/anaconda3/lib/python3.9/site-packages/binance/client.py:934, in Client.get_historical_klines(self, symbol, interval, start_str, end_str, limit, klines_type)
914 def get_historical_klines(self, symbol, interval, start_str=None, end_str=None, limit=1000,
915 klines_type: HistoricalKlinesType = HistoricalKlinesType.SPOT):
916 """Get Historical Klines from Binance
917
918 :param symbol: Name of symbol pair e.g BNBBTC
(...)
932
933 """
--> 934 return self._historical_klines(
935 symbol, interval, start_str=start_str, end_str=end_str, limit=limit, klines_type=klines_type
936 )
File ~/opt/anaconda3/lib/python3.9/site-packages/binance/client.py:969, in Client._historical_klines(self, symbol, interval, start_str, end_str, limit, klines_type)
966 timeframe = interval_to_milliseconds(interval)
968 # if a start time was passed convert it
--> 969 start_ts = convert_ts_str(start_str)
971 # establish first available start timestamp
972 if start_ts is not None:
File ~/opt/anaconda3/lib/python3.9/site-packages/binance/helpers.py:76, in convert_ts_str(ts_str)
74 if type(ts_str) == int:
75 return ts_str
---> 76 return date_to_milliseconds(ts_str)
File ~/opt/anaconda3/lib/python3.9/site-packages/binance/helpers.py:24, in date_to_milliseconds(date_str)
22 epoch: datetime = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
23 # parse our date string
---> 24 d: Optional[datetime] = dateparser.parse(date_str, settings={'TIMEZONE': "UTC"})
25 if not d:
26 raise UnknownDateFormat(date_str)
File ~/opt/anaconda3/lib/python3.9/site-packages/dateparser/conf.py:92, in apply_settings.<locals>.wrapper(*args, **kwargs)
89 if not isinstance(kwargs['settings'], Settings):
90 raise TypeError("settings can only be either dict or instance of Settings class")
---> 92 return f(*args, **kwargs)
File ~/opt/anaconda3/lib/python3.9/site-packages/dateparser/__init__.py:61, in parse(date_string, date_formats, languages, locales, region, settings, detect_languages_function)
57 if languages or locales or region or detect_languages_function or not settings._default:
58 parser = DateDataParser(languages=languages, locales=locales,
59 region=region, settings=settings, detect_languages_function=detect_languages_function)
---> 61 data = parser.get_date_data(date_string, date_formats)
63 if data:
64 return data['date_obj']
File ~/opt/anaconda3/lib/python3.9/site-packages/dateparser/date.py:419, in DateDataParser.get_date_data(self, date_string, date_formats)
376 """
377 Parse string representing date and/or time in recognizable localized formats.
378 Supports parsing multiple languages and timezones.
(...)
416
417 """
418 if not isinstance(date_string, str):
--> 419 raise TypeError('Input type must be str')
421 res = parse_with_formats(date_string, date_formats or [], self._settings)
422 if res['date_obj']:
TypeError: Input type must be str
binance.client.Client.get_historical_klines() takes int or str as input value for start_str, see the documentation of this method:
def get_historical_klines(self, symbol, interval, start_str, end_str=None, limit=500,
klines_type: HistoricalKlinesType = HistoricalKlinesType.SPOT):
"""Get Historical Klines from Binance
:param symbol: Name of symbol pair e.g BNBBTC
:type symbol: str
:param interval: Binance Kline interval
:type interval: str
:param start_str: Start date string in UTC format or timestamp in milliseconds
:type start_str: str|int
:param end_str: optional - end date string in UTC format or timestamp in milliseconds (default will fetch everything up to now)
:type end_str: str|int
:param limit: Default 500; max 1000.
:type limit: int
:param klines_type: Historical klines type: SPOT or FUTURES
:type klines_type: HistoricalKlinesType
:return: list of OHLCV values
"""
return self._historical_klines(symbol, interval, start_str, end_str=end_str, limit=limit, klines_type=klines_type)
You're trying to pass a datetime.datetime object.
You can convert this object to a timestamp in milliseconds with:
import datetime as dt
now = dt.datetime.now(dt.timezone.utc)
past = now - dt.timedelta(days=2)
# Gives you a timestamp in ms
past_timestamp_ms = int(round(past.timestamp() * 1000, 0))
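The same conversion can be wrapped in a small helper, shown here as a sketch. The function name to_milliseconds is hypothetical and not part of python-binance. It insists on a timezone-aware datetime so that the result does not depend on the machine's local timezone.

```python
import datetime as dt


def to_milliseconds(moment):
    """Convert a timezone-aware datetime to an integer Unix timestamp in ms."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(round(moment.timestamp() * 1000))


# Worked example with fixed values so the result is easy to check by hand.
epoch = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
two_days_later = epoch + dt.timedelta(days=2)
start_ms = to_milliseconds(two_days_later)
print(start_ms)  # 2 days * 86400 s * 1000 ms

# Hypothetical usage mirroring the question; `client` is assumed to be an
# already configured binance Client instance:
# bars = client.get_historical_klines("BTCUSDT", "1m", start_str=start_ms, limit=1000)
```

Passing an int works because, as the traceback shows, convert_ts_str returns integer input unchanged instead of sending it to the date parser.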

Error with pandas.dt while extracting year from a date

The data in test.csv are like this:
TIMESTAMP POLYLINE
0 1408039037 [[-8.585676,41.148522],[-8.585712,41.148639],[...
1 1408038611 [[-8.610876,41.14557],[-8.610858,41.145579],[-...
2 1408038568 [[-8.585739,41.148558],[-8.58573,41.148828],[-...
3 1408039090 [[-8.613963,41.141169],[-8.614125,41.141124],[...
4 1408039177 [[-8.619903,41.148036],[-8.619894,41.148036]]
.. ... ...
315 1419171485 [[-8.570196,41.159484],[-8.570187,41.158962],[...
316 1419170802 [[-8.613873,41.141232],[-8.613882,41.141241],[...
317 1419172121 [[-8.6481,41.152536],[-8.647461,41.15241],[-8....
318 1419171980 [[-8.571699,41.156073],[-8.570583,41.155929],[...
319 1419171420 [[-8.574561,41.180184],[-8.572248,41.17995],[-...
[320 rows x 2 columns]
I read them from csv file in this way:
train = pd.read_csv("path/train.csv",engine='python',error_bad_lines=False)
So, I have this timestamp in Unix format. I want to convert it to UTC time and then extract the year, month, day and so on.
This is the code for the conversion from Unix timestamp to UTC datetime:
train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
To extract year and other information I do this:
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute
But I obtain this error when the execution arrives at the extraction point:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-d2249cabe965> in <module>()
67 train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
68 train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
---> 69 train["year"] = train["data_time"].dt.year
70 train["month"] = train["data_time"].dt.month
71 train["day"] = train["data_time"].dt.day
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/accessors.py in __new__(cls, data)
478 return PeriodProperties(data, orig)
479
--> 480 raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
I also read a lot of similar discussions but I can't figure out why I get this error.
Edited:
So the train["TIMESTAMP"] data are like this:
1408039037
1408038611
1408039090
Then I do this with this data:
train["TIMESTAMP"] = [float(time) for time in train["TIMESTAMP"]]
train["data_time"] = [datetime.datetime.fromtimestamp(time, datetime.timezone.utc) for time in train["TIMESTAMP"]]
train["year"] = train["data_time"].dt.year
train["month"] = train["data_time"].dt.month
train["day"] = train["data_time"].dt.day
train["hour"] = train["data_time"].dt.hour
train["min"] = train["data_time"].dt.minute
train = train[["year", "month", "day", "hour","min"]]
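As a sanity check on the conversion described in the question, the steps can be run on the sample values with the standard library alone. This is only a sketch of the intended result, not a fix for the pandas error; the variable names are illustrative.

```python
import datetime

# Sample Unix timestamps (seconds) copied from the question.
raw_timestamps = [1408039037, 1408038611, 1408039090]

rows = []
for ts in raw_timestamps:
    # Interpret the timestamp as seconds since the epoch, in UTC.
    moment = datetime.datetime.fromtimestamp(float(ts), datetime.timezone.utc)
    rows.append(
        {
            "year": moment.year,
            "month": moment.month,
            "day": moment.day,
            "hour": moment.hour,
            "min": moment.minute,
        }
    )

print(rows[0])
```

In pandas, pd.to_datetime(train["TIMESTAMP"], unit="s", utc=True) is a common way to get a datetime-typed column on which the .dt accessor works. Whether that resolves the error in the asker's environment is an assumption, since the thread contains no confirmed answer.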

Building a pandas DataFrame from Cloudant data, error: If using all scalar values, you must pass an index

I'm just starting with pandas. None of the answers I found for this error message resolve my problem. I'm trying to build a DataFrame from a dictionary constructed from an IBM Cloudant query. I'm using a Jupyter notebook. The specific error message is: If using all scalar values, you must pass an index
The section of code where I think my error lies is here:
def read_high_low_temp(location):
    USERNAME = "*************"
    PASSWORD = "*************"
    client = Cloudant(USERNAME,PASSWORD, url = "https://**********" )
    client.connect()
    my_database = client["temps"]
    query = Query(my_database,selector= {'_id': {'$gt': 0}, 'l':location, 'd':dt.datetime.now().strftime("%m-%d-%Y")}, fields=['temp','t','d'],sort=[{'temp': 'desc'}])
    temp_dict={}
    temp_dict=query(limit=1000, skip=5)['docs']
    df = pd.DataFrame(columns = ['Temperature','Time','Date'])
    df.set_index('Time', inplace= True)
    for row in temp_dict:
        value_list.append(row['temp'])
        temp_df=pd.DataFrame({'Temperature':row['temp'],'Time':row['t'], 'Date':row['d']}, index=['Time'])
        df=df.append(temp_df)
    message="the highest temp in the " + location + " is: " + str(max(value_list)) + " the lowest " + str(min(value_list))
    return message, df
My data (output from Jupyter) looks like this:
Temperature Time Date
Time 51.6 05:07:18 12-31-2020
Time 51.6 04:59:00 12-31-2020
Time 51.5 04:50:31 12-31-2020
Time 51.5 05:15:38 12-31-2020
Time 51.5 05:03:09 12-31-2020
... ... ... ...
Time 45.3 11:56:34 12-31-2020
Time 45.3 11:52:22 12-31-2020
Time 45.3 11:14:15 12-31-2020
Time 45.2 10:32:05 12-31-2020
Time 45.2 10:36:22 12-31-2020
[164 rows x 3 columns]
My full code looks like this:
import numpy as np
import pandas as pd
import seaborn as sns
import os, shutil, glob, time, subprocess, re, sys, sqlite3, logging
#import RPi.GPIO as GPIO
from datetime import datetime
import datetime as dt
import cloudant
from cloudant.client import Cloudant
from cloudant.query import Query
from cloudant.result import QueryResult
from cloudant.error import ResultException
import seaborn as sns
def read_high_low_temp(location):
    USERNAME = "******"
    PASSWORD = "******"
    client = Cloudant(USERNAME,PASSWORD, url = "********" )
    client.connect()
    # location='Backyard'
    my_database = client["temps"]
    query = Query(my_database,selector= {'_id': {'$gt': 0}, 'l':location, 'd':dt.datetime.now().strftime("%m-%d-%Y")}, fields=['temp','t','d'],sort=[{'temp': 'desc'}])
    temp_dict={}
    temp_dict=query(limit=1000, skip=5)['docs']
    df = pd.DataFrame(columns = ['Temperature','Time','Date'])
    df.set_index('Time')
    for row in temp_dict:
        temp_df=pd.DataFrame({'Temperature':row['temp'],'Time':row['t'], 'Date':row['d']}, index=['Time'])
        df=df.append(temp_df)
    message="the highest temp in the " + location + " is: " + str(max(value_list)) + " the lowest " + str(min(value_list))
    return message, df
print ("Cloudant Jupyter Query test\nThe hour = ",dt.datetime.now().hour)
msg1, values=read_high_low_temp("Backyard")
print (msg1)
print(values)
sns.lineplot(values)
The full error message from Jupyter is:
C:\Users\ustl02870\AppData\Local\Programs\Python\Python37\lib\site-packages\seaborn\_decorators.py:43: FutureWarning: Pass the following variable as a keyword arg: x. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
FutureWarning
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-34956d8dafb0> in <module>
53
54 #df = sns.load_dataset(values)
---> 55 sns.lineplot(values)
56 #print (values)
~\AppData\Local\Programs\Python\Python37\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
47 return inner_f
48
~\AppData\Local\Programs\Python\Python37\lib\site-packages\seaborn\relational.py in lineplot(x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_boot, seed, sort, err_style, err_kws, legend, ax, **kwargs)
686 data=data, variables=variables,
687 estimator=estimator, ci=ci, n_boot=n_boot, seed=seed,
--> 688 sort=sort, err_style=err_style, err_kws=err_kws, legend=legend,
689 )
690
~\AppData\Local\Programs\Python\Python37\lib\site-packages\seaborn\relational.py in __init__(self, data, variables, estimator, ci, n_boot, seed, sort, err_style, err_kws, legend)
365 )
366
--> 367 super().__init__(data=data, variables=variables)
368
369 self.estimator = estimator
~\AppData\Local\Programs\Python\Python37\lib\site-packages\seaborn\_core.py in __init__(self, data, variables)
602 def __init__(self, data=None, variables={}):
603
--> 604 self.assign_variables(data, variables)
605
606 for var, cls in self._semantic_mappings.items():
~\AppData\Local\Programs\Python\Python37\lib\site-packages\seaborn\_core.py in assign_variables(self, data, variables)
666 self.input_format = "long"
667 plot_data, variables = self._assign_variables_longform(
--> 668 data, **variables,
669 )
670
~\AppData\Local\Programs\Python\Python37\lib\site-packages\seaborn\_core.py in _assign_variables_longform(self, data, **kwargs)
924 # Construct a tidy plot DataFrame. This will convert a number of
925 # types automatically, aligning on index in case of pandas objects
--> 926 plot_data = pd.DataFrame(plot_data)
927
928 # Reduce the variables dictionary to fields with valid data
~\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
527
528 elif isinstance(data, dict):
--> 529 mgr = init_dict(data, index, columns, dtype=dtype)
530 elif isinstance(data, ma.MaskedArray):
531 import numpy.ma.mrecords as mrecords
~\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
285 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
286 ]
--> 287 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
288
289
~\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
78 # figure out the index, if necessary
79 if index is None:
---> 80 index = extract_index(arrays)
81 else:
82 index = ensure_index(index)
~\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
389
390 if not indexes and not raw_lengths:
--> 391 raise ValueError("If using all scalar values, you must pass an index")
392
393 if have_series:
ValueError: If using all scalar values, you must pass an index
I resolved my problem with help and direction from @Ena. As it turned out, I had made several mistakes. In layman's terms: 1) I was trying to plot a tuple when it should have been a DataFrame; 2) my data was in a dictionary, and I was iterating through it trying to build a tuple when I should have used the built-in pandas tools to build a DataFrame directly from the dictionary; 3) my code should have been written so that it does NOT produce scalar values and therefore does NOT need an index; and finally 4) I was passing a tuple as the data for my seaborn plot when it should have been a DataFrame. Here is the code that now works.
#!/usr/bin/env python
# coding: utf-8
import numpy as np
import pandas as pd
import seaborn as sns
import os, shutil, glob, time, subprocess, sys
from datetime import datetime
import datetime as dt
from matplotlib import pyplot as plt
import cloudant
from cloudant.client import Cloudant
from cloudant.query import Query
from cloudant.result import QueryResult
from cloudant.error import ResultException
import seaborn as sns
def read_high_low_temp(location):
    USERNAME = "****************"
    PASSWORD = "*****************"
    client = Cloudant(USERNAME,PASSWORD, url = "**************************" )
    client.connect()
    my_database = client["temps"]
    query = Query(my_database,selector= {'_id': {'$gt': 0}, 'l':location, 'd':dt.datetime.now().strftime("%m-%d-%Y")}, fields=['temp','t','d'],sort=[{'t': 'asc'}])
    temp_dict={}
    temp_dict=query(limit=1000, skip=5)['docs']
    df = pd.DataFrame(temp_dict)
    value_list=[]
    for row in temp_dict:
        value_list.append(row['temp'])
    message="the highest temp in the " + location + " is: " + str(max(value_list)) + " the lowest " + str(min(value_list))
    return message, df
msg1, values=read_high_low_temp("Backyard")
g=sns.catplot(x='t', y='temp', data=values, kind='bar',color="darkblue",height=8.27, aspect=11.7/8.27)
print("the minimum temp is:", values['temp'].min(), " the maximum temp is:", values['temp'].max())
plt.xticks(rotation=45)
g.set(xlabel='Time', ylabel='Temperature')
plt.ylim(values['temp'].min()-1, values['temp'].max()+1)
plt.savefig("2021-01-01-temperature graph.png")
g.set_xticklabels(step=10)
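The scalar-versus-list distinction behind the error message can be reproduced in isolation. The sketch below uses made-up documents shaped like the query results in the question (keys temp, t and d); it assumes pandas is installed and does not touch Cloudant or seaborn.

```python
import pandas as pd

# Illustrative documents shaped like the Cloudant query results.
docs = [
    {"temp": 51.6, "t": "05:07:18", "d": "12-31-2020"},
    {"temp": 45.2, "t": "10:36:22", "d": "12-31-2020"},
]

# A single dict of scalars gives pandas no way to infer the number of rows.
try:
    pd.DataFrame(docs[0])
    scalar_error = None
except ValueError as exc:
    scalar_error = str(exc)
print(scalar_error)

# A list of dicts is read as one row per dict, so no index is needed.
df = pd.DataFrame(docs)
print(df)
print("highest:", df["temp"].max(), "lowest:", df["temp"].min())
```

Once the data is in a DataFrame, the column's max() and min() can replace the manual loop that builds value_list.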
The problem is that you assigned "Time" as the index everywhere. Look at how the DataFrame is structured in the seaborn.lineplot documentation: https://seaborn.pydata.org/generated/seaborn.lineplot.html
Can you try without the df.set_index('Time') part?

how to fix - error: bad escape \u at position 0

Hello, I'm trying to export a gmaps figure to HTML using ipywidgets in a Jupyter notebook, but I'm encountering the following error: bad escape \u at position 0.
I'm new to programming and could use help fixing whatever is causing this error. If there is an easier way to export the HTML file, I'm happy to change approaches.
Thanks
Here is a snippet of the code. I can add the entire thing if it's helpful.
import pandas as pd
import gmaps
from ipywidgets.embed import embed_minimal_html
from ipywidgets import IntSlider
gmaps.configure(api_key='XXXX')
pd.options.mode.chained_assignment = None # default='warn'
file2 = '005 lat:long.csv'
state2 = pd.read_csv(file2)
state2 = state2.rename(columns={'Address1': 'address', 'City':'city',
'State':'state', 'Zip': 'zip'})
storenumbs = state2['Store'].str.split('#', expand=True)
state2 = state2.join(storenumbs)
state2 = state2.drop(['Store', 0], axis=1)
state2 = state2.rename(columns={1: 'store_#'})
state2['store_#'] = state2['store_#'].astype(int)
fig = gmaps.figure(center=(42.5, -71.4), map_type='TERRAIN', zoom_level=9.8)
scale = 4
one_layer = (gmaps.symbol_layer(low_points_lat_long, fill_color='red', stroke_color='red', scale= scale))
two_layer = (gmaps.symbol_layer(low_med_points_lat_long, fill_color='red', stroke_color='yellow', scale= scale))
three_layer = (gmaps.symbol_layer(med_high_points_lat_long, fill_color='yellow', stroke_color='green', scale= scale))
four_layer = (gmaps.symbol_layer(high_points_lat_long, fill_color='green', stroke_color='green', scale= scale))
fig.add_layer(one_layer)
fig.add_layer(two_layer)
fig.add_layer(three_layer)
fig.add_layer(four_layer)
fig
embed_minimal_html('export.html', views=[fig])
Full error below:
KeyError Traceback (most recent call last)
~/miniconda3/lib/python3.7/sre_parse.py in parse_template(source, pattern)
1020 try:
-> 1021 this = chr(ESCAPES[this][1])
1022 except KeyError:
KeyError: '\\u'
During handling of the above exception, another exception occurred:
error Traceback (most recent call last)
<ipython-input-7-c096ac365396> in <module>
20
21 slider = IntSlider(value=40)
---> 22 embed_minimal_html('export.html', views=[slider], title='Widgets export')
~/miniconda3/lib/python3.7/site-packages/ipywidgets/embed.py in embed_minimal_html(fp, views, title, template, **kwargs)
300 {embed_kwargs}
301 """
--> 302 snippet = embed_snippet(views, **kwargs)
303
304 values = {
~/miniconda3/lib/python3.7/site-packages/ipywidgets/embed.py in embed_snippet(views, drop_defaults, state, indent, embed_url, requirejs, cors)
266 widget_views = u'\n'.join(
267 widget_view_template.format(view_spec=escape_script(json.dumps(view_spec)))
--> 268 for view_spec in data['view_specs']
269 )
270
~/miniconda3/lib/python3.7/site-packages/ipywidgets/embed.py in <genexpr>(.0)
266 widget_views = u'\n'.join(
267 widget_view_template.format(view_spec=escape_script(json.dumps(view_spec)))
--> 268 for view_spec in data['view_specs']
269 )
270
~/miniconda3/lib/python3.7/site-packages/ipywidgets/embed.py in escape_script(s)
239 involving `<` is readable.
240 """
--> 241 return script_escape_re.sub(r'\u003c\1', s)
242
243 #doc_subst(_doc_snippets)
~/miniconda3/lib/python3.7/re.py in _subx(pattern, template)
307 def _subx(pattern, template):
308 # internal: Pattern.sub/subn implementation helper
--> 309 template = _compile_repl(template, pattern)
310 if not template[0] and len(template[1]) == 1:
311 # literal replacement
~/miniconda3/lib/python3.7/re.py in _compile_repl(repl, pattern)
298 def _compile_repl(repl, pattern):
299 # internal: compile replacement pattern
--> 300 return sre_parse.parse_template(repl, pattern)
301
302 def _expand(pattern, match, template):
~/miniconda3/lib/python3.7/sre_parse.py in parse_template(source, pattern)
1022 except KeyError:
1023 if c in ASCIILETTERS:
-> 1024 raise s.error('bad escape %s' % this, len(this))
1025 lappend(this)
1026 else:
error: bad escape \u at position 0
This is an error in Python 3.7, and an issue with Python 3.6 (but it is OK with Python 2.7).
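If you use a raw string (prefixed by "r") for the replacement in re.sub, the backslash in \u is kept literally, so the re module receives the two characters \ and u and tries to interpret them as a replacement escape, which is not a valid one. For instance, r'\u003c\1' is the same as '\\u003c\\1': the text '\u', followed by '003c' and the group reference \1.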
The solution is to write:
return script_escape_re.sub('\u003c\\1', s)
Quoting the documentation:
Changed in version 3.7: Unknown escapes in repl consisting of '\' and an ASCII letter now are errors.
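The behaviour can be reproduced with the standard library alone. The sketch below uses a hypothetical pattern standing in for ipywidgets' script_escape_re (the real pattern is not shown in the traceback) and compares three replacement strings. Note that in a normal string literal Python itself turns '\u003c' into '<', so that form avoids the error but substitutes a plain '<'. Doubling the backslash inside a raw string emits the literal text \u003c instead.

```python
import re

# Hypothetical stand-in for the library's pattern; an assumption for this demo.
pattern = re.compile(r'<(script|/script|!--)', re.IGNORECASE)
s = '<script>alert(1)</script>'

# 1) Raw replacement with a single backslash: re sees the unknown escape \u.
try:
    pattern.sub(r'\u003c\1', s)
    raw_error = None
except re.error as exc:
    raw_error = str(exc)
print(raw_error)

# 2) Normal string: Python converts \u003c to '<' before re ever sees it,
#    so every match is replaced by itself and the text is unchanged.
unchanged = pattern.sub('\u003c\\1', s)
print(unchanged)

# 3) Raw string with a doubled backslash: re emits a literal backslash,
#    followed by 'u003c' and the captured group.
escaped = pattern.sub(r'\\u003c\1', s)
print(escaped)
```

Which form is appropriate depends on whether the goal is only to silence the error or to actually escape '<' in the output. The third form is the one that produces escaped text.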
I was facing a similar issue while trying to escape Unicode characters that have the pattern \uXXXX. Let's take a string containing Unicode characters:
>>> text = "The \u201c\u3010\u3011\u201d in this template are used to mark the variables"
>>> text
'The “【】” in this template are used to mark the variables'
Escape the Unicode characters:
>>> text = text.encode('unicode_escape').decode('ascii')
>>> text
'The \\u201c\\u3010\\u3011\\u201d in this template are used to mark the variables'
And then replace them using re.sub(r'\\u(.){4}', '', text):
>>> import re
>>> re.sub(r'\\u(.){4}', '', text)
'The  in this template are used to mark the variables'
I hit a similar error when calling re.finditer with a pattern that contains a literal parenthesis:
[m.start() for m in re.finditer('Valuation Date")', 'dummytext')]
*** sre_constants.error: unbalanced parenthesis at position 15
It was solved by escaping the pattern with re.escape:
[m.start() for m in re.finditer(re.escape('Valuation Date")'), 'dummytext')]
Enjoy.
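A self-contained version of the same idea, as a sketch with an illustrative haystack string: re.escape backslash-escapes characters that have a special meaning in patterns, so the closing parenthesis is matched literally instead of being parsed as a group delimiter.

```python
import re

needle = 'Valuation Date")'
haystack = 'x Valuation Date") y'  # illustrative text, not from the thread

# Used directly as a pattern, the unmatched ')' is a syntax error.
try:
    re.compile(needle)
    compile_error = None
except re.error as exc:
    compile_error = str(exc)
print(compile_error)

# Escaped, the same text is searched for literally.
positions = [m.start() for m in re.finditer(re.escape(needle), haystack)]
print(positions)
```

When the search text is always literal, plain string methods such as str.find avoid regular expressions altogether.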
