Python DataReader - Update with new information

import pandas as pd
from pandas_datareader import data as wb
tickers = ["MMM", "ABT", "ABBV", "ABMD", "ACN", "ATVI", "ADBE", "AMD", "AAP", "AES", "AFL", "A", "APD", "AKAM", "ALK", "ALB", "ARE", "ALXN", "ALGN", "ALLE", "LNT", "ALL", "GOOGL", "GOOG", "MO", "AMZN", "AMCR", "AEE", "AAL", "AEP", "AXP", "AIG", "AMT", "AWK", "AMP", "ABC", "AME", "AMGN", "APH", "ADI", "ANSS", "ANTM", "AON", "AOS", "APA", "AAPL", "AMAT", "APTV", "ADM", "ANET", "AJG", "AIZ", "T", "ATO", "ADSK", "ADP", "AZO", "AVB", "AVY", "BKR", "BLL", "BAC", "BK", "BAX", "BDX", "BBY", "BIO", "BIIB", "BLK", "BA", "BKNG", "BWA", "BXP", "BSX", "BMY", "AVGO", "BR", "CHRW", "COG", "CDNS", "CZR", "CPB", "COF", "CAH", "KMX", "CCL", "CARR", "CTLT", "CAT", "CBOE", "CBRE", "CDW", "CE", "CNC", "CNP", "CERN", "CF", "SCHW", "CHTR", "CVX", "CMG", "CB", "CHD", "CI", "CINF", "CTAS", "CSCO", "C", "CFG", "CTXS", "CLX", "CME", "CMS", "KO", "CTSH", "CL", "CMCSA", "CMA", "CAG", "COP", "ED", "STZ", "COO", "CPRT", "GLW", "CTVA", "COST", "CCI", "CSX", "CMI", "CVS", "DHI", "DHR", "DRI", "DVA", "DE", "DAL", "XRAY", "DVN", "DXCM", "FANG", "DLR", "DFS", "DISCA", "DISCK", "DISH", "DG", "DLTR", "D", "DPZ", "DOV", "DOW", "DTE", "DUK", "DRE", "DD", "DXC", "EMN", "ETN", "EBAY", "ECL", "EIX", "EW", "EA", "EMR", "ENPH", "ETR", "EOG", "EFX", "EQIX", "EQR", "ESS", "EL", "ETSY", "EVRG", "ES", "RE", "EXC", "EXPE", "EXPD", "EXR", "XOM", "FFIV", "FB", "FAST", "FRT", "FDX", "FIS", "FITB", "FE", "FRC", "FISV", "FLT", "FLIR", "FMC", "F", "FTNT", "FTV", "FBHS", "FOXA", "FOX", "BEN", "FCX", "GPS", "GRMN", "IT", "GNRC", "GD", "GE", "GIS", "GM", "GPC", "GILD", "GL", "GPN", "GS", "GWW", "HAL", "HBI", "HIG", "HAS", "HCA", "PEAK", "HSIC", "HSY", "HES", "HPE", "HLT", "HFC", "HOLX", "HD", "HON", "HRL", "HST", "HWM", "HPQ", "HUM", "HBAN", "HII", "IEX", "IDXX", "INFO", "ITW", "ILMN", "INCY", "IR", "INTC", "ICE", "IBM", "IP", "IPG", "IFF", "INTU", "ISRG", "IVZ", "IPGP", "IQV", "IRM", "JKHY", "J", "JBHT", "SJM", "JNJ", "JCI", "JPM", "JNPR", "KSU", "K", "KEY", "KEYS", "KMB", "KIM", "KMI", "KLAC", "KHC", "KR", "LB", "LHX", "LH", "LRCX", "LW", "LVS", "LEG", "LDOS", "LEN", "LLY", "LNC", "LIN", "LYV", "LKQ", "LMT", "L", "LOW", "LUMN", "LYB", "MTB", "MRO", "MPC", "MKTX", "MAR", "MMC", "MLM", "MAS", "MA", "MKC", "MXIM", "MCD", "MCK", "MDT", "MRK", "MET", "MTD", "MGM", "MCHP", "MU", "MSFT", "MAA", "MHK", "TAP", "MDLZ", "MPWR", "MNST", "MCO", "MS", "MOS", "MSI", "MSCI", "NDAQ", "NTAP", "NFLX", "NWL", "NEM", "NWSA", "NWS", "NEE", "NLSN", "NKE", "NI", "NSC", "NTRS", "NOC", "NLOK", "NCLH", "NOV", "NRG", "NUE", "NVDA", "NVR", "NXPI", "ORLY", "OXY", "ODFL", "OMC", "OKE", "ORCL", "OTIS", "PCAR", "PKG", "PH", "PAYX", "PAYC", "PYPL", "PENN", "PNR", "PBCT", "PEP", "PKI", "PRGO", "PFE", "PM", "PSX", "PNW", "PXD", "PNC", "POOL", "PPG", "PPL", "PFG", "PG", "PGR", "PLD", "PRU", "PEG", "PSA", "PHM", "PVH", "QRVO", "PWR", "QCOM", "DGX", "RL", "RJF", "RTX", "O", "REG", "REGN", "RF", "RSG", "RMD", "RHI", "ROK", "ROL", "ROP", "ROST", "RCL", "SPGI", "CRM", "SBAC", "SLB", "STX", "SEE", "SRE", "NOW", "SHW", "SPG", "SWKS", "SNA", "SO", "LUV", "SWK", "SBUX", "STT", "STE", "SYK", "SIVB", "SYF", "SNPS", "SYY", "TMUS", "TROW", "TTWO", "TPR", "TGT", "TEL", "TDY", "TFX", "TER", "TSLA", "TXN", "TXT", "TMO", "TJX", "TSCO", "TT", "TDG", "TRV", "TRMB", "TFC", "TWTR", "TYL", "TSN", "UDR", "ULTA", "USB", "UAA", "UA", "UNP", "UAL", "UNH", "UPS", "URI", "UHS", "UNM", "VLO", "VAR", "VTR", "VRSN", "VRSK", "VZ", "VRTX", "VFC", "VIAC", "VTRS", "V", "VNO", "VMC", "WRB", "WAB", "WMT", "WBA", "DIS", "WM", "WAT", "WEC", "WFC", "WELL", "WST", "WDC", "WU", "WRK", "WY", "WHR", "WMB", "WLTW", "WYNN", 
"XEL", "XLNX", "XYL", "YUM", "ZBRA", "ZBH", "ZION", "ZTS"]
financial_data = pd.DataFrame()
for t in tickers:
    financial_data[t] = wb.DataReader(t, data_source='yahoo', start='1995-1-1')["Adj Close"]
financial_data.to_excel("Financial Data.xlsx")
I am using DataReader to gather some stock info. I am grabbing a lot of data (from 1995 to 2021) and then exporting it to Excel. I was wondering if there is a way, say tomorrow, to speed up the update instead of running the whole script from start to finish, since my goal tomorrow would just be to add a single new row to the Excel file. If I just execute the script, it will overwrite the Excel file and add the new row of info. This seems pretty inefficient, and I was wondering if there is a way to tell the script I am only looking for tomorrow's info, instead of telling it to grab all the information again starting from 1995.
Thanks.

I don't know exactly how pandas works internally, but I would say the loading itself is lazy and not very computationally expensive; the costly part is operating on each piece of loaded data. In your case, if the data is ordered by date in increasing order, it should be enough to have a variable called timestamp_toStart, initialized the first time to '1995-1-1' and, after the first execution, updated to the last date read. You could save this value in a file and reload it every time you rerun the script.
financial_data = pd.DataFrame()
# Load timestamp_toStart from the file here
for t in tickers:
    financial_data[t] = wb.DataReader(t, data_source='yahoo', start=timestamp_toStart)["Adj Close"]
# The DataFrame index holds the dates, so its last value is the most recent
# date downloaded (not sure about the exact syntax you need here)
timestamp_toStart = financial_data.index[-1]
# Save timestamp_toStart to the file here
financial_data.to_excel("Financial Data.xlsx")
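Building on that idea, here is a minimal sketch of the full bookkeeping, assuming a hypothetical state file called last_date.txt and a shortened ticker list for illustration; new rows are merged into the existing workbook instead of re-downloading 25+ years of history:

import os
import pandas as pd
from pandas_datareader import data as wb

STATE_FILE = "last_date.txt"          # hypothetical file holding the last fetched date
EXCEL_FILE = "Financial Data.xlsx"

# Read the saved start date, falling back to 1995 on the first run
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        start = f.read().strip()
else:
    start = "1995-1-1"

new_data = pd.DataFrame()
for t in ["MMM", "AAPL"]:             # shortened ticker list for illustration
    new_data[t] = wb.DataReader(t, data_source="yahoo", start=start)["Adj Close"]

# Merge with what is already on disk instead of rebuilding the whole history
if os.path.exists(EXCEL_FILE):
    old_data = pd.read_excel(EXCEL_FILE, index_col=0)
    combined = pd.concat([old_data, new_data])
    combined = combined[~combined.index.duplicated(keep="last")]
else:
    combined = new_data

combined.to_excel(EXCEL_FILE)

# Remember the newest date we have, so the next run only asks for newer rows
with open(STATE_FILE, "w") as f:
    f.write(str(combined.index[-1].date()))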

Related

How to add entire columns to selected cells in template in Python?

Problem
With help from stack overflow, I can now append data to an existing Excel sheet on Amazon Web Service's S3. My current problem is that I want to add columns from a dataframe to multiple cells in the Excel sheet. At the moment, I can only add them individually. The main part in the reprex below where I need help is where it says "THIS IS WHERE I NEED HELP":-)
Reprex
#load packages
from io import BytesIO
from tempfile import NamedTemporaryFile
import boto3
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
# Load Template from S3
bucket_name="main_folder"
object_key="sub_folder/template.xlsx"
bucket_object = boto3.resource('s3').Bucket(bucket_name).Object(object_key)
content = bucket_object.get()['Body'].read()
# Input Data
data_input = {
    'Area': ['North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'East', "East", "East"],
    "Sub-Area": ["North2", "North1", "North2", "South2", "South1", "South2", "West3", "West9", "West9", "East1",
                 "East4", "East1"],
    "Workers": [1, 20, 30, 2, 33, 5, 3, 6, 44, 1, 11, 111],
    "Job1": ["T", "T", "T", "X", "T", "T", "T", "X", "T", "X", "T", "T"],
    "Job2": ["F", "X", "T", "X", "T", "F", "T", "X", "F", "X", "T", "T"],
    "Job3": ["T", "F", "T", "X", "X", "F", "F", "T", "X", "X", "T", "T"]}
# Create DataFrame
df = pd.DataFrame(data_input)
# Load Workbook
wb = load_workbook(filename=(BytesIO(content)))
ws = wb['Sheet1']
# THIS IS WHERE I NEED HELP
#Change contents
ws["A2"] = df1["Area"][0]
ws["A3"] = df1["Area"][1]
ws["A4"] = df1["Area"][2]
ws["A5"] = df1["Area"][3]
...
ws["A14"] = df1["Area"][11]
ws["D2"] = df1["Sub-Area"][0]
ws["D3"] = df1["Sub-Area"][1]
ws["D4"] = df1["Sub-Area"][2]
...
ws["D14"] = df1["Sub-Area"][11]
etc.
# Save Workbook back to S3
s3 = boto3.client('s3')
with NamedTemporaryFile() as tmp:
    filename = '/tmp/{}'.format("template.xlsx")
    wb.save(filename)
    s3.upload_file(Bucket=bucket_name, Filename=filename, Key=object_key)
Help
Is there an easier way of doing this?
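One way to avoid spelling out every cell, sketched here against the df and ws from the reprex above (the mapping of 'Area' to column A and 'Sub-Area' to column D is an assumption taken from the example), is to loop over the DataFrame rows and assign the cells programmatically:

# Assumed mapping: which DataFrame column goes into which worksheet column letter
column_map = {"Area": "A", "Sub-Area": "D"}

for col_name, col_letter in column_map.items():
    # Data starts in row 2, matching ws["A2"], ws["D2"], ... in the reprex
    for i, value in enumerate(df[col_name], start=2):
        ws[f"{col_letter}{i}"] = value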

How do I load a dataframe into an Excel template on Amazon Web Service's S3?

Issue
I have a dataframe. The template I want to use only has column headings, and they are identical to the dataframe's column headings. How do I paste the contents of the dataframe into the template Excel sheet?
Reprex
Example dataframe
import pandas as pd
data_input = {'Area': ['North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'East', "East", "East"],
              "Sub-Area": ["North2", "North1", "North2", "South2", "South1", "South2", "West3", "West9", "West9", "East1", "East4", "East1"],
              "Workers": [1, 20, 30, 2, 33, 5, 3, 6, 44, 1, 11, 111],
              "Job1": ["T", "T", "T", "X", "T", "T", "T", "X", "T", "X", "T", "T"],
              "Job2": ["F", "X", "T", "X", "T", "F", "T", "X", "F", "X", "T", "T"],
              "Job3": ["T", "F", "T", "X", "X", "F", "F", "T", "X", "X", "T", "T"]}
# Create DataFrame
df1 = pd.DataFrame(data_input)
Attempt #1
# Save dataframe to the template file on S3
import io
import boto3

with io.BytesIO() as output:
    with pd.ExcelWriter(output, engine='openpyxl') as writer:
        df1.to_excel(writer, sheet_name='Sheet1', startcol=0, startrow=2)
    data = output.getvalue()
s3 = boto3.resource('s3')
s3.Bucket('main_folder').put_object(Key='sub_folder/template.xlsx', Body=data)
Problem: The above solution just writes my dataset over the template file.
Attempt #2: Appending the dataframe via mode = "a"
# Save dataframe to the template file on S3
with io.BytesIO() as output:
    # here I add mode = "a"
    with pd.ExcelWriter(output, engine='openpyxl', mode="a") as writer:
        df1.to_excel(writer, sheet_name='Sheet1', startcol=0, startrow=2)
    data = output.getvalue()
s3 = boto3.resource('s3')
s3.Bucket('main_folder').put_object(Key='sub_folder/template.xlsx', Body=data)
Problem: Error Message
BadZipFile: File is not a zip file
Attempt 3
In response to a comment from jsn, I tried first appending the df to the template and then loading that to S3, but it overwrote all the formatting of the template again.
# downloading template
template = pd.read_excel('s3://main_folder/sub_folder/template.xlsx', sheet_name="Sheet1")
# appending the dataframe
template = template.append(df1)
# now loading to S3
with io.BytesIO() as output:
    with pd.ExcelWriter(output, engine='openpyxl') as writer:
        template.to_excel(writer, sheet_name='Sheet1')
    data = output.getvalue()
s3 = boto3.resource('s3')
s3.Bucket('main_folder').put_object(Key='sub_folder/template.xlsx', Body=data)
Any help would be appreciated
The pandas library may not be suited to preserving xlsx formatting state.
The alternative here could be the openpyxl library, which lets you load a workbook and integrates with pandas to append your data.
You could attempt something like this:
from io import BytesIO
from tempfile import NamedTemporaryFile
import boto3
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
# Load Template from S3
bucket_name = "main_folder"                 # same bucket/key as in the question
object_key = "sub_folder/template.xlsx"
s3_resource = boto3.resource('s3')
bucket_object = s3_resource.Bucket(bucket_name).Object(object_key)
content = bucket_object.get()['Body'].read()
# Input Data
data_input = {
    'Area': ['North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'East', "East", "East"],
    "Sub-Area": ["North2", "North1", "North2", "South2", "South1", "South2", "West3", "West9", "West9", "East1",
                 "East4", "East1"],
    "Workers": [1, 20, 30, 2, 33, 5, 3, 6, 44, 1, 11, 111],
    "Job1": ["T", "T", "T", "X", "T", "T", "T", "X", "T", "X", "T", "T"],
    "Job2": ["F", "X", "T", "X", "T", "F", "T", "X", "F", "X", "T", "T"],
    "Job3": ["T", "F", "T", "X", "X", "F", "F", "T", "X", "X", "T", "T"]}
# Create DataFrame
df = pd.DataFrame(data_input)
# Load Workbook
wb = load_workbook(filename=(BytesIO(content)))
ws = wb['Sheet1']
# Append contents of Input Data to Workbook
for r in dataframe_to_rows(df, index=False, header=False):
    ws.append(r)
# Save Workbook back to S3
with NamedTemporaryFile(suffix=".xlsx") as tmp:
    wb.save(tmp.name)
    s3_resource.Bucket(bucket_name).upload_file(Filename=tmp.name, Key=object_key)
Reference material as follows:
https://openpyxl.readthedocs.io/en/stable/pandas.html
https://danh-was-here.netlify.app/save-excel-workbook-to-aws-s3-with-python/
Note:
This was tested locally (i.e. without AWS) and the output suggested that formatting applied to the heading columns in the template file remained even after the new data was added.
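One small follow-up, since the original attempts used startrow=2: ws.append always writes below the last used row, so if the template expects the data at a fixed position you can write the cells explicitly instead. A sketch, assuming the data should start in row 3, column A:

# Write the DataFrame values starting at a fixed cell instead of appending
# after the last used row (start_row and the column offset are assumptions)
start_row = 3
for r_offset, row in enumerate(dataframe_to_rows(df, index=False, header=False)):
    for c_offset, value in enumerate(row):
        ws.cell(row=start_row + r_offset, column=1 + c_offset, value=value)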

Trying to sum Python list [closed]

This is my first post here; I am learning and practicing Python.
The problem is that anything I try to run after the for loop doesn't run, so at the end I can't get a total count. Maybe I should have created a function, but for now I need to know why this is happening. What have I done wrong?
lst = ["32.225.012", "US", "574.280", "17.997.267", "India", "201.187", "14.521.289", "Brazil", "398.185", "5.626.942", "France", "104.077", "4.751.026", "Turkey", "39.398", "4.732.981", "Russia", "107.547", "4.427.433", "United Kingdom", "127.734", "3.994.894", "Italy", "120.256", "3.504.799", "Spain", "77.943", "3.351.014", "Germany", "82.395", "2.905.172", "Argentina", "62.599", "2.824.626", "Colombia", "72.725", "2.776.927", "Poland", "66.533", "2.459.906", "Iran", "70.966", "2.333.126", "Mexico", "215.547", "2.102.130", "Ukraine", "45.211", "1.775.062", "Peru", "60.416", "1.657.035", "Indonesia", "45.116", "1.626.033", "Czechia", "29.141", "1.578.450", "South Africa", "54.285", "1.506.455", "Netherlands", "17.339", "1.210.077", "Canada", "24.105", "1.184.271", "Chile", "26.073", "1.051.868", "Iraq", "15.392", "1.051.779", "Romania", "27.833", "1.020.495", "Philippines", "17.031", "979.034", "Belgium", "24.104", "960.520", "Sweden", "14.000", "838.323", "Israel", "6.361", "835.563", "Portugal", "16.973", "810.231", "Pakistan", "17.530", "774.399", "Hungary", "27.172", "754.614", "Bangladesh", "11.305", "708.265", "Jordan", "8.754", "685.937", "Serbia", "6.312", "656.077", "Switzerland", "10.617", "614.510", "Austria", "10.152", "580.666", "Japan", "10.052", "524.241", "Lebanon", "7.224", "516.301", "United Arab Emirates", "1.580", "510.465", "Morocco", "9.015", "415.281", "Saudi Arabia", "6.935", "402.491", "Bulgaria", "16.278", "401.593", "Malaysia", "1.477", "381.180", "Slovakia", "11.611", "377.662", "Ecuador", "18.470", "366.709", "Kazakhstan", "3.326", "363.533", "Panama", "6.216", "355.924", "Belarus", "2.522", "340.493", "Greece", "10.242", "327.737", "Croatia", "7.001", "316.521", "Azerbaijan", "4.461", "312.699", "Nepal", "3.211","307.401", "Georgia", "4.077", "305.313", "Tunisia", "10.563", "300.258", "Bolivia", "12.885", "294.550", "West Bank and Gaza", "3.206", "271.814", "Paraguay", "6.094", "271.145", "Kuwait", "1.546", "265.819", "Dominican Republic", "3.467", "255.288", "Ethiopia", "3.639", "250.479", "Denmark", "2.482", "250.138", "Moldova", "5.780", "247.857", "Ireland", "4.896", "244.555", "Lithuania", "3.900", "243.167", "Costa Rica", "3.186", "238.421", "Slovenia", "4.236", "224.621", "Guatemala", "7.478", "224.517", "Egypt", "13.168", "214.872", "Armenia", "4.071", "208.356", "Honduras", "5.212", "204.289", "Qatar", "445","197.378", "Bosnia and Herzegovina", "8.464", "193.721", "Venezuela", "2.082", "192.326", "Oman", "2.001","190.096", "Uruguay", "2.452", "176.701", "Libya", "3.019","174.659", "Bahrain", "632","164.912", "Nigeria", "2.063", "158.326", "Kenya", "2.688","151.569", "North Macedonia", "4.772", "142.790", "Burma", "3.209","130.859", "Albania", "2.386", "121.580", "Algeria", "3.234", "121.232", "Estonia", "1.148", "120.673", "Korea. 
South", "1.821", "117.099", "Latvia", "2.118", "111.915", "Norway", "753","104.953", "Sri Lanka", "661", "104.512", "Cuba", "614","103.638", "Kosovo", "2.134", "102.426", "China", "4.845","97.080", "Montenegro", "1.485", "94.599", "Kyrgyzstan", "1.592", "92.513", "Ghana", "779","91.484", "Zambia", "1.249","90.008", "Uzbekistan", "646", "86.405", "Finland", "908","69.804", "Mozambique", "814", "68.922", "El Salvador", "2.117", "66.826", "Luxembourg", "792", "65.998", "Cameroon", "991","63.720", "Cyprus", "303","61.699", "Thailand", "178","61.086", "Singapore", "30","59.370", "Afghanistan", "2.611", "48.177", "Namibia", "638","46.600", "Botswana", "702","45.885", "Cote d'Ivoire", "285", "45.292", "Jamaica", "770","41.766", "Uganda", "341","40.249", "Senegal", "1.107", "38.191", "Zimbabwe", "1.565", "36.510", "Madagascar", "631", "34.052", "Malawi", "1.147","33.944", "Sudan", "2.349","33.608", "Mongolia", "97","30.249", "Malta", "413","29.768", "Congo Kinshasa", "763", "29.749", "Australia", "910", "29.052", "Maldives", "72","25.942", "Angola", "587","24.888", "Rwanda", "332","23.181", "Cabo Verde", "213", "22.568", "Gabon", "138","22.513", "Syria", "1.572","22.087", "Guinea", "141","18.452", "Eswatini", "671","18.314", "Mauritania", "455", "13.915", "Somalia", "713","13.780", "Mali", "477","13.308", "Tajikistan", "90", "13.286", "Burkina Faso", "157", "13.148", "Andorra", "125","13.017", "Haiti", "254","12.963", "Guyana", "293","12.898", "Togo", "122","12.631", "Belize", "322","11.761", "Cambodia", "88","10.986", "Djibouti", "142","10.915", "Papua New Guinea", "107", "10.730", "Lesotho", "316","10.678", "Congo Brazzaville", "144", "10.553", "South Sudan", "114", "10.220", "Bahamas", "198","10.170", "Trinidad and Tobago", "163", "10.157", "Suriname", "201","7.821", "Benin", "99","7.559", "Equatorial Guinea", "107", "6.898", "Nicaragua", "182","6.456", "Iceland", "29","6.359", "Central African Republic", "87", "6.220", "Yemen", "1.207","5.882", "Gambia", "174","5.354", "Seychelles", "26","5.220", "Niger", "191","5.059", "San Marino", "90","4.789", "Chad", "170","4.508", "Saint Lucia", "74", "4.049", "Sierra Leone", "79", "3.941", "Burundi", "6","3.833", "Comoros", "146","3.831", "Barbados", "44","3.731", "Guinea-Bissau", "67", "3.659", "Eritrea", "10","2.908", "Liechtenstein", "57", "2.865", "Vietnam", "35","2.610", "New Zealand", "26", "2.447", "Monaco", "32","2.301", "Sao Tome and Principe", "35", "2.124", "Timor-Leste", "3","2.099", "Liberia", "85","1.850", "Saint Vincent and the Grenadines", "11", "1.232", "Antigua and Barbuda", "32", "1.207", "Mauritius", "17","1.116", "Taiwan", "12","1.059", "Bhutan", "1","712", "Diamond Princess", "13", "604", "Laos", "0","509", "Tanzania", "21","224", "Brunei", "3","173", "Dominica", "0","159", "Grenada", "1","111", "Fiji", "2","44", "Saint Kitts and Nevis", "0", "27", "Holy See", "0","20", "Solomon Islands", "0", "9", "MS Zaandam", "2","4", "Marshall Islands", "0", "4", "Vanuatu", "1","3", "Samoa", "0","1", "Micronesia", "0"]
countryIndex = 1
casesIndex = 0
deathsIndex = 2
countries = []
cases = []
deaths = []
for item in lst:
    print(f"Country: {lst[countryIndex]}")
    print(f"Cases: {lst[casesIndex]}")
    print(f"Deaths: {lst[deathsIndex]}")
    print("")
    countryToAppend = lst[countryIndex]
    casesToAppend = lst[casesIndex]
    deathsToAppend = lst[deathsIndex]
    countries.append(countryToAppend)
    cases.append(casesToAppend)
    deaths.append(deathsToAppend)
    countryIndex += 3
    casesIndex += 3
    deathsIndex += 3
total = sum(deaths)
print(f"Total deaths: {total}")
On top of the suggestion to rename the data set so it does not shadow the built-in name list, my recommendation would be to use the step argument of the built-in range to move through the data three items at a time, like so:
# Lists to store data
countries = []
total_cases = []
total_deaths = []
# Iterate over the range of the data, skipping 3 at a time: 0, 3, ...
for x in range(0, len(data), 3):
    # Parse the cases and deaths out to ints
    cases = int(data[x].replace('.', ''))
    deaths = int(data[x+2].replace('.', ''))
    # The country label can be extracted as-is
    country_label = data[x+1]
    countries.append(country_label)
    total_cases.append(cases)
    total_deaths.append(deaths)
# Get the desired sums
sum_cases = sum(total_cases)
sum_deaths = sum(total_deaths)
print(f"The total cases: {sum_cases}")
print(f"The total deaths: {sum_deaths}")
Above I renamed your dataset to be data and was able to sum up each list.
sum = 0
for i in range(2, len(l), 3):   # l is your list of data
    sum = sum + int(l[i].replace('.', ''))   # here I removed the point between the digits, e.g. 574.280 --> 574280
print(sum)
# output: 3145239

PYTHON: How to get seaborn attributes to reset to default?

I cannot for the life of me get seaborn to go back to default settings.
I will place the code that I believe caused this issue, but I would recommend against running it unless you know how to fix it.
The culprit, I believe, is in the last chunk:
sns.set(font_scale = 4)
Before this question gets deleted because it's already been asked: I have tried the other posted solutions with no success. Just to name a quick few, resetting using sns.set(), sns.set_style(), and sns.restore_defaults(). I have also tried resetting matplotlib settings to defaults. This setting persists across all my files, so I can't even open a new file, delete the line of code that caused it, or run any past programs, or it will apply to those graphs too.
My seaborn version is 0.10.1. I have tried to update it, but I can't get it to go through. I am using Anaconda's Spyder IDE.
The documentation says for versions after 0.8 that the styles/themes must be invoked to be reset, but if I try to use their solution sns.set_theme() I get an error saying that this module has no attribute.
I'm sure this persistence is considered a feature, but I desperately need it to go away!
import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
if __name__ == '__main__':
    # data prep
    data_path = './assets/'
    out_path = './output'
    # scraping javascript map data via xml
    endpoint = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData"
    data = requests.get(endpoint, params={"id": "US_MAP_DATA"}).json()
    # convert to df and export raw data as csv
    df = pd.DataFrame(data["US_MAP_DATA"])
    path = os.path.join(out_path, 'Raw_CDC_Data.csv')
    df.to_csv(path)
    # Remove last data point (Total USA)
    df.drop(df.tail(1).index, inplace=True)
    # Create DF of just 50 states
    state_abbr = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
                  "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
                  "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
                  "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
                  "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]
    states = df[df['abbr'].isin(state_abbr)]
    # Adding NYC to state of NY
    # FILL THIS IN LATER
    # Graphing
    plt.style.use('default')
    sns.set()
    # add new col survival rate and save
    states['survival_rate'] = states['tot_cases'] - states['tot_death']
    states.drop(df.columns[[0]], axis=1)
    states.reset_index(drop=True)
    path = os.path.join(out_path, 'CDC_Data_By_State.csv')
    states.to_csv(path)
    # Stacked BarPlot
    fig, ax = plt.subplots()
    colors = ['#e5c5b5', '#a8dda8']
    r = range(0, len(states.index))
    plt.bar(r, states['survival_rate'], color=colors[0])
    # ax = stacked['survival_rate','tot_death'].plot.bar(stacked=True, color=colors, ax=ax)
    fig, ax = plt.subplots()
    plt.figure(figsize=(20, 35))
    sns.set(font_scale=4)
    ax = sns.barplot(x='tot_cases', y='abbr', data=states)
    ax.set(title='USA Covid-19 Cases by State', ylabel='State', xlabel='Confirmed Cases')
    path = os.path.join(out_path, 'Total_Deaths_Bar.png')
    plt.savefig(path)
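For what it's worth, sns.set_theme() only exists from seaborn 0.11 onward, which would explain the AttributeError on 0.10.1. A minimal sketch of the reset calls that do exist in 0.10.x, assuming the problem is only the rcParams changed by sns.set(font_scale=4):

import matplotlib.pyplot as plt
import seaborn as sns

# Revert matplotlib's rcParams to seaborn's own defaults (undoes font_scale=4)
sns.reset_defaults()

# Or revert to the rcParams that were active before seaborn touched them
sns.reset_orig()

# Pure-matplotlib fallback: restore matplotlib's built-in defaults
plt.rcdefaults()

If the oversized fonts seem to persist across files in Spyder, that is usually because every script runs in the same IPython console; restarting the kernel clears the rcParams as well.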

Able to generate csv file, but they are not split into multiple columns, all columns are in first column

I have the code below, which takes in a JSON file and then parses it to CSV. My code works, but the issue is that instead of the CSV having multiple columns (what I want), it lists all of the data under column 1 without any separation. Not sure where I'm wrong in my code below:
# flattens and converts json to csv file
def json2csv():
    parse = format_json()
    now = datetime.datetime.now()
    today_str = now.strftime('%Y%m%d%H%M%S')
    outputFilePathCSV = 'datastream_log/'
    # outputs to given csv file
    with open('datastream_csv/' + 'datastream:' + today_str + '.csv', 'w') as csv_file:
        filename.append(csv_file.name)
        writer = csv.writer(csv_file, delimiter='\t')
        # appends headers to top row
        keylist = []
        for key in parse[1]:
            y = keylist.append(key)
        writer.writerow(keylist)
        # appends rows from dictionary
        for data, data_rec in parse.items():
            try:
                writer.writerow(list(data_rec.values()))
            except (KeyError, TypeError, AttributeError) as e:
                pass
json2csv()
Example of JSON:
{"target": {"icao_address": "8963BB", "timestamp": "2019-12-02T21:41:03Z", "altitude_baro": 3075, "heading": 235.0, "speed": 162.0, "latitude": 43.403778, "longitude": 77.139225, "callsign": "FDB1735", "vertical_rate": -960, "squawk_code": "0573", "collection_type": "terrestrial", "ingestion_time": "2019-12-02T21:41:07Z", "tail_number": "A6-FEY", "icao_actype": "B738", "flight_number": "FZ1735", "origin_airport_icao": "OMDB", "destination_airport_icao": "UAAA", "scheduled_departure_time_utc": "2019-12-02T17:30:00Z", "scheduled_departure_time_local": "2019-12-02T21:30:00", "scheduled_arrival_time_utc": "2019-12-02T21:50:00Z", "scheduled_arrival_time_local": "2019-12-03T03:50:00"}}
{"target": {"icao_address": "4CA4EF", "timestamp": "2019-12-02T21:41:06Z", "altitude_baro": 15125, "heading": 80.0, "speed": 290.0, "latitude": 53.56636, "longitude": -3.047137, "callsign": "RYR9ZN", "vertical_rate": -1080, "collection_type": "terrestrial", "ingestion_time": "2019-12-02T21:41:10Z", "tail_number": "EI-DPK", "icao_actype": "B738", "flight_number": "FR156", "origin_airport_icao": "EIDW", "destination_airport_icao": "EGNM", "scheduled_departure_time_utc": "2019-12-02T20:55:00Z", "scheduled_departure_time_local": "2019-12-02T20:55:00", "scheduled_arrival_time_utc": "2019-12-02T21:55:00Z", "scheduled_arrival_time_local": "2019-12-02T21:55:00", "estimated_arrival_time_utc": "2019-12-02T21:53:00Z", "estimated_arrival_time_local": "2019-12-02T21:53:00"}}
CSV image of what I get:
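A likely culprit is the delimiter='\t' passed to csv.writer: when Excel opens a .csv it expects commas, so a tab-separated file shows up in a single column. As a sketch under that assumption, and assuming each parsed record is a flat dict like the "target" objects shown above, csv.DictWriter with the default comma delimiter produces one properly separated column per key:

import csv

def dicts_to_csv(records, out_path):
    # records: an iterable of flat dicts that share the same keys
    records = list(records)
    if not records:
        return
    with open(out_path, "w", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=list(records[0].keys()))
        writer.writeheader()           # header row taken from the dict keys
        writer.writerows(records)      # one comma-separated row per record

# Hypothetical usage with the flattened "target" objects from the JSON above:
# dicts_to_csv((rec["target"] for rec in parsed_json), "datastream.csv")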
