I am trying to get the counter to count which date appears most in the code below.
from collections import Counter
with open('dates.json', 'rb') as f:
    data = f.readlines()
c = Counter(data)
print (c.most_common()[:10])
The JSON data is stored as a list like this:
["Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016"]
I would expect the output to be something similar to this (grabbed from another program)
[('Sun Aug 07 02:29:45 +0000 2016', 4), ('Sun Aug 07 02:31:05 +0000 2016', 4), ('Sun Aug 07 02:31:04 +0000 2016', 3), ('Sun Aug 07 02:31:08 +0000 2016', 3), ('Sun Aug 07 02:31:22 +0000 2016', 3)]
But this is my output
[(48, 72), (32, 53), (49, 27), (34, 18), (117, 18), (58, 18), (65, 9), (51, 9), (103, 9), (43, 9)]
I don't really understand what it's counting there.
Instead of readlines(), you should use json.load() to load the JSON data into a Python list:
import json
with open('dates.json', 'r') as f:
    data = json.load(f)
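The integers in the question's output look like byte values (48 is '0', 32 is a space, 49 is '1'), which suggests the Counter was tallying individual bytes of the file rather than whole date strings. A minimal sketch of the corrected approach follows; the sample string `raw` is an assumption standing in for the contents of dates.json.

```python
import json
from collections import Counter

# Assumed sample standing in for the text of dates.json (a JSON array of strings).
raw = ('["Sun Aug 07 01:50:13 +0000 2016", '
       '"Sun Aug 07 01:50:13 +0000 2016", '
       '"Sun Aug 07 02:29:45 +0000 2016"]')

dates = json.loads(raw)      # parse the JSON text into a Python list of str
c = Counter(dates)           # count whole date strings, not characters
top = c.most_common(10)      # up to ten (date, count) pairs, most frequent first
print(top)
```

With a real file, `json.load(f)` on the open file object replaces `json.loads(raw)`.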
Related
I have the following dataset
id    date
7510  15 Jun 2020
7510  16 Jun 2020
7512  15 Jun 2020
7512  07 Jul 2020
7520  15 Jun 2020
7520  16 Aug 2020
I need to convert this to a dictionary, which is quite straightforward, but I need each unique id as a key and all of its corresponding dates as the values for that key.
For example:
dictionary = {7510: ["15 Jun 2020", "16 Jun 2020"], 7512: ["15 Jun 2020", "07 Jul 2020"],
7520: ["15 Jun 2020", "16 Aug 2020"] }
Try this:
df.groupby('id')['date'].agg(list).to_dict()
Output:
{7510: ['15 Jun 2020', '16 Jun 2020'],
7512: ['15 Jun 2020', '07 Jul 2020'],
7520: ['15 Jun 2020', '16 Aug 2020']}
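For comparison, the same grouping can be sketched without pandas using collections.defaultdict. The `rows` list below is an assumption: it stands in for the id/date pairs of the dataset in the question.

```python
from collections import defaultdict

# Assumed input: (id, date) pairs taken from the dataset in the question.
rows = [
    (7510, "15 Jun 2020"), (7510, "16 Jun 2020"),
    (7512, "15 Jun 2020"), (7512, "07 Jul 2020"),
    (7520, "15 Jun 2020"), (7520, "16 Aug 2020"),
]

grouped = defaultdict(list)
for id_, date in rows:
    grouped[id_].append(date)   # a new list is created on first sight of each id

dictionary = dict(grouped)      # convert to a plain dict for display
print(dictionary)
```

The groupby one-liner is the more natural choice when the data is already in a DataFrame; this variant only shows what the aggregation does.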
This is my code:
start = '2015-1-1'
end = '2020-12-31'
source = 'yahoo'
google = data.DataReader('GOOG', start=start, end=end, data_source=source).reset_index()
I was using this code until last month and it worked properly. When I tried it again a month later, it threw this error:
Unable to read URL: https://finance.yahoo.com/quote/GOOG/history?period1=1420065000&period2=1609453799&interval=1d&frequency=1d&filter=history
I can't figure it out. Can you please help me understand why this is happening?
Yahoo! Finance has slightly changed its structure: the HTTP request for the data now requires headers. Once headers are supplied, it works fine.
If you use pandas and pandas-datareader, upgrade them; the fix has already been released. Other packages that pull data from Yahoo!, such as backtrader, will probably need either an upgrade or headers added to the request in their Yahoo! script :).
pip install --upgrade pandas
pip install --upgrade pandas-datareader
Have a nice day ;).
Please upgrade pandas_datareader to version 0.10.0 or later. This bug is fixed in 0.10.0, as per the release notes:
Fixed Yahoo readers which now require headers
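To see whether the installed version already includes that fix, something like the following sketch may help. The helper names `version_tuple` and `has_header_fix` are hypothetical, and the comparison is deliberately simple (it looks only at the first three numeric components of the version string).

```python
import re
from importlib import metadata  # standard library, Python 3.8+


def version_tuple(version):
    """Turn a string such as '0.10.0' into (0, 10, 0); suffixes like 'rc1' are ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def has_header_fix(installed, minimum="0.10.0"):
    """Hypothetical helper: True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = metadata.version("pandas-datareader")
    print(installed, "OK" if has_header_fix(installed) else "needs upgrade")
except metadata.PackageNotFoundError:
    print("pandas-datareader is not installed")
```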
Yahoo! Finance is working fine with pandas without any issue.
Script:
import pandas as pd
import requests
link = 'https://finance.yahoo.com/quote/GOOG/history?period1=1420065000&period2=1609453799&interval=1d&frequency=1d&filter=history'
r = requests.get(link, headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'})
data = pd.read_html(r.text)[0]
df = pd.DataFrame(data)
df = df.iloc[0:100]
print(df)
Output:
Date Open High Low Close AdjClose Volume
Dec 31, 2020 1735.42 1758.93 1735.42 1751.88 1751.88 1011900
Dec 30, 2020 1762.01 1765.09 1725.6 1739.52 1739.52 1306100
Dec 29, 2020 1787.79 1792.44 1756.09 1758.72 1758.72 1299400
Dec 28, 2020 1751.64 1790.73 1746.33 1776.09 1776.09 1393000
Dec 24, 2020 1735 1746 1729.11 1738.85 1738.85 346800
Dec 23, 2020 1728.11 1747.99 1725.04 1732.38 1732.38 1033800
Dec 22, 2020 1734.43 1737.41 1712.57 1723.5 1723.5 936700
Dec 21, 2020 1713.51 1740.85 1699 1739.37 1739.37 1828400
Dec 18, 2020 1754.18 1755.11 1720.22 1731.01 1731.01 4016400
Dec 17, 2020 1768.51 1771.78 1738.66 1747.9 1747.9 1624700
Dec 16, 2020 1772.88 1773 1756.08 1763 1763 1513500
Dec 15, 2020 1764.42 1771.42 1749.95 1767.77 1767.77 1482300
Dec 14, 2020 1775 1797.39 1757.21 1760.06 1760.06 1600200
Dec 11, 2020 1763.06 1784.45 1760 1781.77 1781.77 1220700
Dec 10, 2020 1769.8 1781.31 1740.32 1775.33 1775.33 1362800
Dec 09, 2020 1812.01 1834.27 1767.81 1784.13 1784.13 1507600
Dec 08, 2020 1810.1 1821.9 1796.2 1818.55 1818.55 1096300
Dec 07, 2020 1819 1832.37 1805.78 1819.48 1819.48 1320900
Dec 04, 2020 1824.52 1833.16 1816.99 1827.99 1827.99 1378200
Dec 03, 2020 1824.01 1847.2 1822.65 1826.77 1826.77 1227300
Dec 02, 2020 1798.1 1835.65 1789.47 1827.95 1827.95 1222000
Dec 01, 2020 1774.37 1824.83 1769.37 1798.1 1798.1 1736900
Nov 30, 2020 1781.18 1788.06 1755 1760.74 1760.74 1823800
Nov 27, 2020 1773.09 1804 1772.44 1793.19 1793.19 884900
Nov 25, 2020 1772.89 1778.54 1756.54 1771.43 1771.43 1045800
Nov 24, 2020 1730.5 1771.6 1727.69 1768.88 1768.88 1578000
Nov 23, 2020 1749.6 1753.9 1717.72 1734.86 1734.86 2161600
Nov 20, 2020 1765.21 1774 1741.86 1742.19 1742.19 2313500
Nov 19, 2020 1738.38 1769.59 1737.01 1763.92 1763.92 1249900
Nov 18, 2020 1765.23 1773.47 1746.14 1746.78 1746.78 1173500
Nov 17, 2020 1776.94 1785 1767 1770.15 1770.15 1147100
Nov 16, 2020 1771.7 1799.07 1767.69 1781.38 1781.38 1246800
Nov 13, 2020 1757.63 1781.04 1744.55 1777.02 1777.02 1499900
Nov 12, 2020 1747.63 1768.27 1745.6 1749.84 1749.84 1247500
Nov 11, 2020 1750 1764.22 1747.36 1752.71 1752.71 1264000
Nov 10, 2020 1731.09 1763 1717.3 1740.39 1740.39 2636100
Nov 09, 2020 1790.9 1818.06 1760.02 1763 1763 2268300
Nov 06, 2020 1753.95 1772.43 1740.35 1761.75 1761.75 1660900
Nov 05, 2020 1781 1793.64 1750.51 1763.37 1763.37 2065800
Nov 04, 2020 1710.28 1771.36 1706.03 1749.13 1749.13 3570900
Nov 03, 2020 1631.78 1661.7 1616.62 1650.21 1650.21 1661700
Nov 02, 2020 1628.16 1660.77 1616.03 1626.03 1626.03 2535400
Oct 30, 2020 1672.11 1687 1604.46 1621.01 1621.01 4329100
Oct 29, 2020 1522.36 1593.71 1522.24 1567.24 1567.24 2003100
Oct 28, 2020 1559.74 1561.35 1514.62 1516.62 1516.62 1834000
Oct 27, 2020 1595.67 1606.84 1582.78 1604.26 1604.26 1229000
Oct 26, 2020 1625.01 1638.24 1576.5 1590.45 1590.45 1853300
Oct 23, 2020 1626.07 1642.36 1620.51 1641 1641 1375800
Oct 22, 2020 1593.05 1621.99 1585 1615.33 1615.33 1433600
Oct 21, 2020 1573.33 1618.73 1571.63 1593.31 1593.31 2568300
Oct 20, 2020 1527.05 1577.5 1525.67 1555.93 1555.93 2241700
Oct 19, 2020 1580.46 1588.15 1528 1534.61 1534.61 1607100
Oct 16, 2020 1565.85 1581.13 1563 1573.01 1573.01 1434700
Oct 15, 2020 1547.15 1575.1 1545.03 1559.13 1559.13 1540000
Oct 14, 2020 1578.59 1587.68 1550.53 1568.08 1568.08 1929300
Oct 13, 2020 1583.73 1590 1563.2 1571.68 1571.68 1601000
Oct 12, 2020 1543 1593.86 1532.57 1569.15 1569.15 2482600
Oct 09, 2020 1494.7 1516.52 1489.45 1515.22 1515.22 1435300
Oct 08, 2020 1465.09 1490 1465.09 1485.93 1485.93 1187800
Oct 07, 2020 1464.29 1468.96 1436 1460.29 1460.29 1746200
Oct 06, 2020 1475.58 1486.76 1448.59 1453.44 1453.44 1245400
Oct 05, 2020 1466.21 1488.21 1464.27 1486.02 1486.02 1113300
Oct 02, 2020 1462.03 1483.2 1450.92 1458.42 1458.42 1284100
Oct 01, 2020 1484.27 1499.04 1479.21 1490.09 1490.09 1779500
Sep 30, 2020 1466.8 1489.75 1459.88 1469.6 1469.6 1701600
Sep 29, 2020 1470.39 1476.66 1458.81 1469.33 1469.33 978200
Sep 28, 2020 1474.21 1476.8 1449.3 1464.52 1464.52 2007900
Sep 25, 2020 1432.63 1450 1413.34 1444.96 1444.96 1323000
Sep 24, 2020 1411.03 1443.71 1409.85 1428.29 1428.29 1450200
Sep 23, 2020 1458.78 1460.96 1407.7 1415.21 1415.21 1657400
Sep 22, 2020 1450.09 1469.52 1434.53 1465.46 1465.46 1583200
Sep 21, 2020 1440.06 1448.36 1406.55 1431.16 1431.16 2888800
Sep 18, 2020 1498.01 1503 1437.13 1459.99 1459.99 3103900
Sep 17, 2020 1496 1508.3 1470 1495.53 1495.53 1879800
Sep 16, 2020 1555.54 1562 1519.82 1520.9 1520.9 1311700
Sep 15, 2020 1536 1559.57 1531.83 1541.44 1541.44 1331100
Sep 14, 2020 1539.01 1564 1515.74 1519.28 1519.28 1696600
Sep 11, 2020 1536 1575.2 1497.36 1520.72 1520.72 1597100
Sep 10, 2020 1560.64 1584.08 1525.81 1532.02 1532.02 1618600
Sep 09, 2020 1557.53 1569 1536.05 1556.96 1556.96 1774700
Sep 08, 2020 1533.51 1563.86 1528.01 1532.39 1532.39 2610900
Sep 04, 2020 1624.26 1645.11 1547.61 1591.04 1591.04 2608600
Sep 03, 2020 1709.71 1709.71 1615.06 1641.84 1641.84 3107800
Sep 02, 2020 1673.78 1733.18 1666.33 1728.28 1728.28 2511200
Sep 01, 2020 1636.63 1665.73 1632.22 1660.71 1660.71 1825300
Aug 31, 2020 1647.89 1647.96 1630.31 1634.18 1634.18 1823400
Aug 28, 2020 1633.49 1647.17 1630.75 1644.41 1644.41 1499000
Aug 27, 2020 1653.68 1655 1625.75 1634.33 1634.33 1861600
Aug 26, 2020 1608 1659.22 1603.6 1652.38 1652.38 3993400
Aug 25, 2020 1582.07 1611.62 1582.07 1608.22 1608.22 2247100
Aug 24, 2020 1593.98 1614.17 1580.57 1588.2 1588.2 1409900
Aug 21, 2020 1577.03 1597.72 1568.01 1580.42 1580.42 1446500
Aug 20, 2020 1543.45 1585.87 1538.2 1581.75 1581.75 1706900
Aug 19, 2020 1553.31 1573.68 1543.95 1547.53 1547.53 1660600
Aug 18, 2020 1526.18 1562.47 1523.71 1558.6 1558.6 2027100
Aug 17, 2020 1514.67 1525.61 1507.97 1517.98 1517.98 1378300
Aug 14, 2020 1515.66 1521.9 1502.88 1507.73 1507.73 1354800
Aug 13, 2020 1510.34 1537.25 1508.01 1518.45 1518.45 1455200
Aug 12, 2020 1485.58 1512.39 1485.25 1506.62 1506.62 1437000
Aug 11, 2020 1492.44 1510 1478 1480.32 1480.32 1454400
Yahoo finance has decommissioned their API. Try this python library.
I am new to Python and was trying to sort the dates in a list. Below is the code that I wrote. The line
date_object = datetime_object.date()
raises the following error:
'list' object has no attribute 'date'
from datetime import datetime,date
lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
datetime_object = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
date_object = datetime_object.date()
print(date_object)
Please help me understand what the issue is. Thanks.
Python lists don't have a date() method. With the code below, the list of dates can be sorted in place:
from datetime import datetime
lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
lst_dates.sort(key=lambda date: datetime.strptime(date, "%d %b %Y"))
print(lst_dates)
The problem with your code starts on line 3, where you write
datetime_object = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
The sorted function in Python returns a new list object, not a datetime. If you want to check, run
type(datetime_object)
So in order to achieve what you want here you need to iterate over that list. Your final code would be something like this
from datetime import datetime,date
lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
datetime_obj_list = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
for datetime_object in datetime_obj_list:
    datetime_object = datetime.strptime(datetime_object, "%d %b %Y")
    print(datetime_object.date())
UPDATE:
Here's a working sample of the code https://ideone.com/YRDQR7
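A variation on the same idea, as a sketch: parse each string once, sort the resulting date objects directly, and format them back only for display. The shortened `lst_dates` is an assumption made to keep the example small, and `%b` assumes an English locale.

```python
from datetime import datetime

# Shortened sample of the list from the question.
lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Jan 2018', '01 Feb 2017']

# Parse each string to a date object, then sort; date objects compare chronologically.
dates = sorted(datetime.strptime(s, '%d %b %Y').date() for s in lst_dates)

# Format back to the original style for display.
formatted = [d.strftime('%d %b %Y') for d in dates]
print(formatted)
```

Here `.date()` is called on each individual datetime, never on the list, which is what avoids the original error.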
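The problem is on the 4th line: sorted() returns a list, and a list has no date() method.
To get a date, call date() on an individual datetime instead, for example date_object = datetime.strptime(datetime_object[0], '%d %b %Y').date()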
This works just fine:
from datetime import datetime,date
lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
datetime_object = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
#date_object = datetime_object.date() # <<-- remove this line
print(datetime_object)
testing:
>>> from datetime import datetime,date
>>> lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
>>> datetime_object = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
>>> print(datetime_object)
['01 Jan 2017', '01 Feb 2017', '01 Apr 2017', '01 Aug 2017', '01 Dec 2017', '01 Jan 2018', '01 Feb 2018', '01 Apr 2018', '01 Aug 2018', '01 Dec 2018']
>>>
I'm not sure if this is possible but I have a very large array containing dates
a = ['Fri, 19 Aug 2011 19:28:17 -0000',....., 'Wed, 05 Feb 2012 11:00:00 -0000']
I'm trying to find out whether there is a way to count the frequency of the days and months in the array. In this case I'm trying to count the abbreviated names of months and days (such as Fri, Mon, Apr, Jul).
You can use Counter() from the collections module.
from collections import Counter
a = ['Fri, 19 Aug 2011 19:28:17 -0000',
'Fri, 09 June 2017 11:11:11 -0000',
'Wed, 05 Feb 2012 11:00:00 -0000']
# This generator splits the dates into words and strips ".,;-:" characters from each word:
# ['Fri', '19', 'Aug', '2011', '19:28:17', '0000', 'Fri', '09', 'June',
# '2017', '11:11:11', '0000', 'Wed', '05', 'Feb', '2012', '11:00:00', '0000']
# and feeds it to counting:
c = Counter( (x.strip().strip(".,;-:") for word in a for x in word.split() ))
for key in c:
    if key.isalpha():
        print(key, c[key])
The if prints only those keys from the counter that are pure "letters" - not digits:
Fri 2
Aug 1
June 1
Wed 1
Feb 1
Day-names and Month-names are the only pure isalpha() parts of your dates.
Full c output:
Counter({'0000': 3, 'Fri': 2, '19': 1, 'Aug': 1, '2011': 1,
'19:28:17': 1, '09': 1, 'June': 1, '2017': 1, '11:11:11': 1,
'Wed': 1, '05': 1, 'Feb': 1, '2012': 1, '11:00:00': 1})
Improvement from @AzatIbrakov's comment:
c = Counter( (x.strip().strip(".,;-:") for word in a for x in word.split()
if x.strip().strip(".,;-:").isalpha()))
would weed out non-alpha words in the generation step already.
Python lists have a built-in .count method, which is very useful here:
lista = [
'Fri, 19 Aug 2011 19:28:17 -0000',
'Fri, 19 Aug 2011 19:28:17 -0000',
'Sun, 19 Jan 2011 19:28:17 -0000',
'Sun, 19 Aug 2011 19:28:17 -0000',
'Fri, 19 Jan 2011 19:28:17 -0000',
'Mon, 05 Feb 2012 11:00:00 -0000',
'Mon, 05 Nov 2012 11:00:00 -0000',
'Wed, 05 Feb 2012 11:00:00 -0000',
'Tue, 05 Nov 2012 11:00:00 -0000',
'Tue, 05 Dec 2012 11:00:00 -0000',
'Wed, 05 Jan 2012 11:00:00 -0000',
]
# Join with a space so the end of one entry does not fuse with the start of the next
listb = ' '.join(lista).split()
# Count how often each word occurs
count = {}
for item in listb:
    count[item] = listb.count(item)
months = ['Jan', 'Feb', 'Aug', 'Nov', 'Dec']
for k in count:
    if k in months:
        print(f"{k}: {count[k]}")
Output:
(xenial)vash@localhost:~/python/stack_overflow$ python3.7 count_months.py
Aug: 3
Jan: 3
Feb: 2
Nov: 2
Dec: 1
What happens is we take all the items of the lista and join them into one string. Then we split that string to get all the individual words.
Now we can use the count method and create a dictionary to hold the counts. We can then create a list of the items we want to retrieve from the dictionary and print only those keys.
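If days and months should be tallied separately, one possible sketch relies on position rather than on a list of known names. It assumes every entry has the shape 'Day, DD Mon YYYY HH:MM:SS -0000', so the first word is the day and the third is the month; the sample list `a` is illustrative.

```python
from collections import Counter

# Illustrative sample in the format shown in the question.
a = ['Fri, 19 Aug 2011 19:28:17 -0000',
     'Fri, 09 Jun 2017 11:11:11 -0000',
     'Wed, 05 Feb 2012 11:00:00 -0000']

days = Counter()
months = Counter()
for stamp in a:
    parts = stamp.split()
    days[parts[0].rstrip(',')] += 1   # 'Fri,' -> 'Fri'
    months[parts[2]] += 1             # third word is the month abbreviation

print(days)
print(months)
```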
I have a Python list of dates and I'm using min and max to find the most recent and the oldest (first, is that the best method?). I also need to convert the dates into something that lets me take the current time and subtract the oldest date in the list, so I can say something like "In the last 27 minutes..." and state how many days, hours, or minutes have passed since the oldest. Here is my list (the dates obviously change depending on what I'm pulling) so you can see the current format. How do I get the info I need?
[u'Sun Oct 06 18:00:55 +0000 2013', u'Sun Oct 06 17:57:41 +0000 2013', u'Sun Oct 06 17:55:44 +0000 2013', u'Sun Oct 06 17:54:10 +0000 2013', u'Sun Oct 06 17:35:58 +0000 2013', u'Sun Oct 06 17:35:58 +0000 2013', u'Sun Oct 06 17:35:25 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:30:35 +0000 2013', u'Sun Oct 06 17:25:28 +0000 2013', u'Sun Oct 06 17:24:04 +0000 2013', u'Sun Oct 06 17:24:04 +0000 2013', u'Sun Oct 06 17:22:08 +0000 2013', u'Sun Oct 06 17:22:08 +0000 2013', u'Sun Oct 06 17:21:00 +0000 2013', u'Sun Oct 06 17:18:49 +0000 2013', u'Sun Oct 06 17:18:49 +0000 2013', u'Sun Oct 06 17:15:29 +0000 2013', u'Sun Oct 06 17:15:29 +0000 2013', u'Sun Oct 06 17:13:35 +0000 2013', u'Sun Oct 06 17:12:18 +0000 2013', u'Sun Oct 06 17:12:00 +0000 2013', u'Sun Oct 06 17:07:34 +0000 2013', u'Sun Oct 06 17:03:59 +0000 2013']
You won't get the oldest and newest date/time entries from your list with the entries by using min and max - "Fri" will come before "Mon", for example. So you'll want to put things into a data structure that represents date/time stamps properly.
Fortunately, Python's datetime module comes with a method to convert lots of date/time stamp strings into a proper representation - datetime.datetime.strptime. See the guide for how to use it.
Once that's done you can use min and max and then timedelta to compute the difference.
from datetime import datetime
# Start with the initial list
A = [u'Sun Oct 06 18:00:55 +0000 2013', u'Sun Oct 06 17:57:41 +0000 2013', u'Sun Oct 06 17:55:44 +0000 2013', u'Sun Oct 06 17:54:10 +0000 2013', u'Sun Oct 06 17:35:58 +0000 2013', u'Sun Oct 06 17:35:58 +0000 2013', u'Sun Oct 06 17:35:25 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:30:35 +0000 2013', u'Sun Oct 06 17:25:28 +0000 2013', u'Sun Oct 06 17:24:04 +0000 2013', u'Sun Oct 06 17:24:04 +0000 2013', u'Sun Oct 06 17:22:08 +0000 2013', u'Sun Oct 06 17:22:08 +0000 2013', u'Sun Oct 06 17:21:00 +0000 2013', u'Sun Oct 06 17:18:49 +0000 2013', u'Sun Oct 06 17:18:49 +0000 2013', u'Sun Oct 06 17:15:29 +0000 2013', u'Sun Oct 06 17:15:29 +0000 2013', u'Sun Oct 06 17:13:35 +0000 2013', u'Sun Oct 06 17:12:18 +0000 2013', u'Sun Oct 06 17:12:00 +0000 2013', u'Sun Oct 06 17:07:34 +0000 2013', u'Sun Oct 06 17:03:59 +0000 2013']
# This is the format string the date/time stamps are using
# On Python 3.3 on Windows you can use this format
# s_format = "%a %b %d %H:%M:%S %z %Y"
# However, Python 2.7 on Windows doesn't work with that. If all of your date/time stamps use the same timezone you can do:
s_format = "%a %b %d %H:%M:%S +0000 %Y"
# Convert the text list into datetime objects
A = [datetime.strptime(d, s_format) for d in A]
# Get the extremes
oldest = min(A)
newest = max(A)
# If you subtract oldest from newest you get a timedelta object, which can give you the total number of seconds between them. You can use this to calculate days, hours, and minutes.
delta = int((newest - oldest).total_seconds())
delta_days, rem = divmod(delta, 86400)
delta_hours, rem = divmod(rem, 3600)
delta_minutes, delta_seconds = divmod(rem, 60)
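To produce the "In the last 27 minutes..." phrasing from the question, the same arithmetic can be applied between the oldest stamp and the current time. This is a sketch assuming Python 3, where `%z` parses the "+0000" offset and yields timezone-aware datetimes; the function name `describe_elapsed` and the fixed `now` value are illustrative only.

```python
from datetime import datetime, timezone

S_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # %z parses the "+0000" offset on Python 3


def describe_elapsed(stamps, now):
    """Hypothetical helper: describe the time from the oldest stamp until `now`."""
    oldest = min(datetime.strptime(s, S_FORMAT) for s in stamps)
    seconds = int((now - oldest).total_seconds())
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    # Report the largest non-zero unit, falling back to minutes.
    for value, unit in ((days, "days"), (hours, "hours"), (minutes, "minutes")):
        if value:
            break
    if value == 1:
        unit = unit[:-1]  # singular form
    return "In the last {} {}...".format(value, unit)


stamps = ['Sun Oct 06 17:07:34 +0000 2013', 'Sun Oct 06 17:03:59 +0000 2013']
# A fixed "now" keeps the example reproducible; real code would use datetime.now(timezone.utc).
now = datetime(2013, 10, 6, 17, 30, 59, tzinfo=timezone.utc)
message = describe_elapsed(stamps, now)
print(message)
```

Both operands of the subtraction must be timezone-aware (or both naive); mixing the two raises a TypeError.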
Your question can be divided into three pieces:
A) how to read string-formatted dates
B) how to sort a list of dates in Python
C) how to calculate the difference between two dates