Python DataReader - Update with new information
import pandas as pd
from pandas_datareader import data as wb
tickers = ["MMM", "ABT", "ABBV", "ABMD", "ACN", "ATVI", "ADBE", "AMD", "AAP", "AES", "AFL", "A", "APD", "AKAM", "ALK", "ALB", "ARE", "ALXN", "ALGN", "ALLE", "LNT", "ALL", "GOOGL", "GOOG", "MO", "AMZN", "AMCR", "AEE", "AAL", "AEP", "AXP", "AIG", "AMT", "AWK", "AMP", "ABC", "AME", "AMGN", "APH", "ADI", "ANSS", "ANTM", "AON", "AOS", "APA", "AAPL", "AMAT", "APTV", "ADM", "ANET", "AJG", "AIZ", "T", "ATO", "ADSK", "ADP", "AZO", "AVB", "AVY", "BKR", "BLL", "BAC", "BK", "BAX", "BDX", "BBY", "BIO", "BIIB", "BLK", "BA", "BKNG", "BWA", "BXP", "BSX", "BMY", "AVGO", "BR", "CHRW", "COG", "CDNS", "CZR", "CPB", "COF", "CAH", "KMX", "CCL", "CARR", "CTLT", "CAT", "CBOE", "CBRE", "CDW", "CE", "CNC", "CNP", "CERN", "CF", "SCHW", "CHTR", "CVX", "CMG", "CB", "CHD", "CI", "CINF", "CTAS", "CSCO", "C", "CFG", "CTXS", "CLX", "CME", "CMS", "KO", "CTSH", "CL", "CMCSA", "CMA", "CAG", "COP", "ED", "STZ", "COO", "CPRT", "GLW", "CTVA", "COST", "CCI", "CSX", "CMI", "CVS", "DHI", "DHR", "DRI", "DVA", "DE", "DAL", "XRAY", "DVN", "DXCM", "FANG", "DLR", "DFS", "DISCA", "DISCK", "DISH", "DG", "DLTR", "D", "DPZ", "DOV", "DOW", "DTE", "DUK", "DRE", "DD", "DXC", "EMN", "ETN", "EBAY", "ECL", "EIX", "EW", "EA", "EMR", "ENPH", "ETR", "EOG", "EFX", "EQIX", "EQR", "ESS", "EL", "ETSY", "EVRG", "ES", "RE", "EXC", "EXPE", "EXPD", "EXR", "XOM", "FFIV", "FB", "FAST", "FRT", "FDX", "FIS", "FITB", "FE", "FRC", "FISV", "FLT", "FLIR", "FMC", "F", "FTNT", "FTV", "FBHS", "FOXA", "FOX", "BEN", "FCX", "GPS", "GRMN", "IT", "GNRC", "GD", "GE", "GIS", "GM", "GPC", "GILD", "GL", "GPN", "GS", "GWW", "HAL", "HBI", "HIG", "HAS", "HCA", "PEAK", "HSIC", "HSY", "HES", "HPE", "HLT", "HFC", "HOLX", "HD", "HON", "HRL", "HST", "HWM", "HPQ", "HUM", "HBAN", "HII", "IEX", "IDXX", "INFO", "ITW", "ILMN", "INCY", "IR", "INTC", "ICE", "IBM", "IP", "IPG", "IFF", "INTU", "ISRG", "IVZ", "IPGP", "IQV", "IRM", "JKHY", "J", "JBHT", "SJM", "JNJ", "JCI", "JPM", "JNPR", "KSU", "K", "KEY", "KEYS", "KMB", "KIM", "KMI", "KLAC", "KHC", "KR", "LB", "LHX", "LH", "LRCX", "LW", "LVS", "LEG", "LDOS", "LEN", "LLY", "LNC", "LIN", "LYV", "LKQ", "LMT", "L", "LOW", "LUMN", "LYB", "MTB", "MRO", "MPC", "MKTX", "MAR", "MMC", "MLM", "MAS", "MA", "MKC", "MXIM", "MCD", "MCK", "MDT", "MRK", "MET", "MTD", "MGM", "MCHP", "MU", "MSFT", "MAA", "MHK", "TAP", "MDLZ", "MPWR", "MNST", "MCO", "MS", "MOS", "MSI", "MSCI", "NDAQ", "NTAP", "NFLX", "NWL", "NEM", "NWSA", "NWS", "NEE", "NLSN", "NKE", "NI", "NSC", "NTRS", "NOC", "NLOK", "NCLH", "NOV", "NRG", "NUE", "NVDA", "NVR", "NXPI", "ORLY", "OXY", "ODFL", "OMC", "OKE", "ORCL", "OTIS", "PCAR", "PKG", "PH", "PAYX", "PAYC", "PYPL", "PENN", "PNR", "PBCT", "PEP", "PKI", "PRGO", "PFE", "PM", "PSX", "PNW", "PXD", "PNC", "POOL", "PPG", "PPL", "PFG", "PG", "PGR", "PLD", "PRU", "PEG", "PSA", "PHM", "PVH", "QRVO", "PWR", "QCOM", "DGX", "RL", "RJF", "RTX", "O", "REG", "REGN", "RF", "RSG", "RMD", "RHI", "ROK", "ROL", "ROP", "ROST", "RCL", "SPGI", "CRM", "SBAC", "SLB", "STX", "SEE", "SRE", "NOW", "SHW", "SPG", "SWKS", "SNA", "SO", "LUV", "SWK", "SBUX", "STT", "STE", "SYK", "SIVB", "SYF", "SNPS", "SYY", "TMUS", "TROW", "TTWO", "TPR", "TGT", "TEL", "TDY", "TFX", "TER", "TSLA", "TXN", "TXT", "TMO", "TJX", "TSCO", "TT", "TDG", "TRV", "TRMB", "TFC", "TWTR", "TYL", "TSN", "UDR", "ULTA", "USB", "UAA", "UA", "UNP", "UAL", "UNH", "UPS", "URI", "UHS", "UNM", "VLO", "VAR", "VTR", "VRSN", "VRSK", "VZ", "VRTX", "VFC", "VIAC", "VTRS", "V", "VNO", "VMC", "WRB", "WAB", "WMT", "WBA", "DIS", "WM", "WAT", "WEC", "WFC", "WELL", "WST", "WDC", "WU", "WRK", "WY", "WHR", "WMB", "WLTW", "WYNN", 
"XEL", "XLNX", "XYL", "YUM", "ZBRA", "ZBH", "ZION", "ZTS"]
financial_data = pd.DataFrame()
for t in tickers:
    financial_data[t] = wb.DataReader(t, data_source='yahoo', start='1995-1-1')["Adj Close"]
financial_data.to_excel("Financial Data.xlsx")
I am using DataReader to gather stock data. I am grabbing a lot of history (from 1995 to 2021) and then exporting it to Excel. I was wondering if there is a way, say tomorrow, to speed up the update, instead of running the whole script from top to bottom, since all I would need tomorrow is a single new row in the Excel file. If I just rerun the script, it re-downloads everything and overwrites the Excel file just to add that one new row. This seems pretty inefficient, and I was wondering if there's a way to tell the script I am only looking for tomorrow's data, instead of asking it to fetch everything again starting from 1995.
Thanks.
I don't know exactly how pandas works internally, but the loading itself is probably not the expensive part; the costly part is downloading all that history again on every run. Since the data is ordered by date in increasing order, it would be enough to keep a variable, say timestamp_toStart, initialized to '1995-1-1' on the first run and updated after each run to the last date that was read. You can save this value to a file and reload it every time you rerun the script.
financial_data = pd.DataFrame()
# Load timestamp_toStart from its file here; fall back to '1995-1-1' on the first run
for t in tickers:
    financial_data[t] = wb.DataReader(t, data_source='yahoo', start=timestamp_toStart)["Adj Close"]
# The DataFrame index holds the dates, so its last entry is the last date read
timestamp_toStart = financial_data.index[-1]
# Save timestamp_toStart back to the file here
financial_data.to_excel("Financial Data.xlsx")
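Building on that idea, here is a minimal sketch of the full update cycle: persist the last fetched date to a small text file, download only from that date onwards on the next run, and merge the new rows into the existing workbook instead of rebuilding 26 years of history. It reuses the tickers list from the question; the state-file name last_date.txt is made up for illustration, and the index handling assumes the dates survive the Excel round trip as datetimes, so treat it as a sketch rather than a drop-in solution.

import os
import pandas as pd
from pandas_datareader import data as wb

STATE_FILE = "last_date.txt"        # hypothetical file remembering where the last run stopped
EXCEL_FILE = "Financial Data.xlsx"

# Resume from the saved date if present, otherwise start from 1995
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        start_date = f.read().strip()
else:
    start_date = "1995-1-1"

# Download only the data from start_date onwards
new_data = pd.DataFrame()
for t in tickers:
    new_data[t] = wb.DataReader(t, data_source='yahoo', start=start_date)["Adj Close"]

# Merge with what was saved previously, dropping any overlapping dates
if os.path.exists(EXCEL_FILE):
    old_data = pd.read_excel(EXCEL_FILE, index_col=0)
    combined = pd.concat([old_data, new_data])
    combined = combined[~combined.index.duplicated(keep="last")]
else:
    combined = new_data

combined.to_excel(EXCEL_FILE)

# Remember the last date fetched for the next run
with open(STATE_FILE, "w") as f:
    f.write(str(combined.index[-1].date()))

Note that the workbook is still rewritten in full at the end; xlsx files cannot be appended to in place. What this saves is re-downloading the full history for every ticker, which is where almost all of the runtime goes.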
Related
How to add entire columns to selected cells in template in Python?
Problem

With help from Stack Overflow, I can now append data to an existing Excel sheet on Amazon Web Service's S3. My current problem is that I want to add columns from a dataframe to multiple cells in the Excel sheet. At the moment, I can only add them individually. The main part in the reprex below where I need help is where it says "THIS IS WHERE I NEED HELP" :-)

Reprex

# load packages
from io import BytesIO
from tempfile import NamedTemporaryFile
import boto3
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Load Template from S3
bucket_name = "main_folder"
object_key = "sub_folder/template.xlsx"
bucket_object = boto3.resource('s3').Bucket(bucket_name).Object(object_key)
content = bucket_object.get()['Body'].read()

# Input Data
data_input = {
    'Area': ['North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'East', "East", "East"],
    "Sub-Area": ["North2", "North1", "North2", "South2", "South1", "South2", "West3", "West9", "West9", "East1", "East4", "East1"],
    "Workers": [1, 20, 30, 2, 33, 5, 3, 6, 44, 1, 11, 111],
    "Job1": ["T", "T", "T", "X", "T", "T", "T", "X", "T", "X", "T", "T"],
    "Job2": ["F", "X", "T", "X", "T", "F", "T", "X", "F", "X", "T", "T"],
    "Job3": ["T", "F", "T", "X", "X", "F", "F", "T", "X", "X", "T", "T"]}

# Create DataFrame
df1 = pd.DataFrame(data_input)

# Load Workbook
wb = load_workbook(filename=BytesIO(content))
ws = wb['Sheet1']

# THIS IS WHERE I NEED HELP
# Change contents
ws["A2"] = df1["Area"][0]
ws["A3"] = df1["Area"][1]
ws["A4"] = df1["Area"][2]
ws["A5"] = df1["Area"][3]
...
ws["A14"] = df1["Area"][11]
ws["D2"] = df1["Sub-Area"][0]
ws["D3"] = df1["Sub-Area"][1]
ws["D4"] = df1["Sub-Area"][2]
...
ws["D14"] = df1["Sub-Area"][11]
etc.

# Save Workbook back to S3
s3 = boto3.client('s3')
with NamedTemporaryFile() as tmp:
    filename = '/tmp/{}'.format("template.xlsx")
    wb.save(filename)
    s3.upload_file(Bucket=bucket_name, Filename="template.xlsx", Key=object_key)

Help

Is there an easier way of doing this?
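No answer was posted for this one, but the cell-by-cell assignments can be collapsed into a loop, since openpyxl accepts assignment to computed cell references. A minimal sketch, assuming the same df1 and ws as in the reprex and taking the target columns ("Area" into column A, "Sub-Area" into column D, data starting at row 2) from the hand-written assignments above:

# Map each dataframe column to the worksheet column it should land in
column_map = {"Area": "A", "Sub-Area": "D"}

for df_col, ws_col in column_map.items():
    # Row 2 is the first data row, matching ws["A2"], ws["D2"], ... above
    for offset, value in enumerate(df1[df_col]):
        ws[f"{ws_col}{offset + 2}"] = value

Any further columns can be handled by extending column_map rather than adding more assignments.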
How do I load a dataframe into an Excel template on Amazon Web Service's S3?
Issue

I have a dataframe. The template I want to use only has column headings. The column headings in the dataframe are identical to the template column headings. How do I paste the contents of the dataframe into the template Excel sheet?

Reprex

Example dataframe:

import pandas as pd

data_input = {
    'Area': ['North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'East', "East", "East"],
    "Sub-Area": ["North2", "North1", "North2", "South2", "South1", "South2", "West3", "West9", "West9", "East1", "East4", "East1"],
    "Workers": [1, 20, 30, 2, 33, 5, 3, 6, 44, 1, 11, 111],
    "Job1": ["T", "T", "T", "X", "T", "T", "T", "X", "T", "X", "T", "T"],
    "Job2": ["F", "X", "T", "X", "T", "F", "T", "X", "F", "X", "T", "T"],
    "Job3": ["T", "F", "T", "X", "X", "F", "F", "T", "X", "X", "T", "T"]}

# Create DataFrame
df1 = pd.DataFrame(data_input)

Attempt #1

# Save dataframe to the template file on S3
with io.BytesIO() as output:
    with pd.ExcelWriter(output, engine='openpyxl') as writer:
        df1.to_excel(writer, sheet_name='Sheet1', startcol=0, startrow=2)
    data = output.getvalue()
s3 = boto3.resource('s3')
s3.Bucket('main_folder').put_object(Key='sub_folder/template.xlsx', Body=data)

Problem: The above solution just writes my dataset over the template file.

Attempt #2: Appending the dataframe via mode="a"

# Save dataframe to the template file on S3
with io.BytesIO() as output:
    # here I add mode="a"
    with pd.ExcelWriter(output, engine='openpyxl', mode="a") as writer:
        df1.to_excel(writer, sheet_name='Sheet1', startcol=0, startrow=2)
    data = output.getvalue()
s3 = boto3.resource('s3')
s3.Bucket('main_folder').put_object(Key='sub_folder/template.xlsx', Body=data)

Problem: Error message BadZipFile: File is not a zip file

Attempt #3

In response to a comment from jsn, I tried to first append the df to the template and then load that to S3, but it overwrote all the formatting of the template again.

# downloading template
template = pd.read_excel('s3://main_folder/sub_folder/template.xlsx', sheet_name="Sheet1")
# appending the dataframe
template = template.append(df1)
# now loading to S3
with io.BytesIO() as output:
    with pd.ExcelWriter(output, engine='openpyxl') as writer:
        template.to_excel(writer, sheet_name='Sheet1')
    data = output.getvalue()
s3 = boto3.resource('s3')
s3.Bucket('main_folder').put_object(Key='sub_folder/template.xlsx', Body=data)

Any help would be appreciated.
The pandas library may not be suited to storing xlsx formatting state. The alternative here could be the openpyxl library, which lets you load a workbook and integrates with pandas to append your data. You could attempt something like this:

from io import BytesIO
from tempfile import NamedTemporaryFile

import boto3
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Load Template from S3 (bucket and key as in the question)
bucket_name = "main_folder"
object_key = "sub_folder/template.xlsx"
s3_resource = boto3.resource('s3')
bucket_object = s3_resource.Bucket(bucket_name).Object(object_key)
content = bucket_object.get()['Body'].read()

# Input Data
data_input = {
    'Area': ['North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'East', "East", "East"],
    "Sub-Area": ["North2", "North1", "North2", "South2", "South1", "South2", "West3", "West9", "West9", "East1", "East4", "East1"],
    "Workers": [1, 20, 30, 2, 33, 5, 3, 6, 44, 1, 11, 111],
    "Job1": ["T", "T", "T", "X", "T", "T", "T", "X", "T", "X", "T", "T"],
    "Job2": ["F", "X", "T", "X", "T", "F", "T", "X", "F", "X", "T", "T"],
    "Job3": ["T", "F", "T", "X", "X", "F", "F", "T", "X", "X", "T", "T"]}

# Create DataFrame
df = pd.DataFrame(data_input)

# Load Workbook
wb = load_workbook(filename=BytesIO(content))
ws = wb['Sheet1']

# Append contents of Input Data to Workbook, below the rows that already exist
for r in dataframe_to_rows(df, index=False, header=False):
    ws.append(r)

# Save Workbook back to S3 via a temporary file
with NamedTemporaryFile(suffix=".xlsx") as tmp:
    wb.save(tmp.name)
    s3_resource.Bucket(bucket_name).upload_file(Filename=tmp.name, Key=object_key)

Reference material as follows:

https://openpyxl.readthedocs.io/en/stable/pandas.html
https://danh-was-here.netlify.app/save-excel-workbook-to-aws-s3-with-python/

Note: This was tested locally (i.e. without AWS) and the output suggested that formatting applied to the heading columns in the template file remained even after the new data was added.
Trying to sum Python list [closed]
Closed. This question needs debugging details. It is not currently accepting answers. Closed 1 year ago.

This is my first post here; I am learning and practicing Python. The problem is that anything I try to run after the for loop does not run, so at the end I can't get a total count. Maybe I should have created a function, but now I need to know why this is happening. What have I done wrong?

lst = ["32.225.012", "US", "574.280", "17.997.267", "India", "201.187", "14.521.289", "Brazil", "398.185",
       "5.626.942", "France", "104.077", "4.751.026", "Turkey", "39.398", "4.732.981", "Russia", "107.547",
       "4.427.433", "United Kingdom", "127.734", "3.994.894", "Italy", "120.256", "3.504.799", "Spain", "77.943",
       "3.351.014", "Germany", "82.395", "2.905.172", "Argentina", "62.599", "2.824.626", "Colombia", "72.725",
       "2.776.927", "Poland", "66.533", "2.459.906", "Iran", "70.966", "2.333.126", "Mexico", "215.547",
       "2.102.130", "Ukraine", "45.211", "1.775.062", "Peru", "60.416", "1.657.035", "Indonesia", "45.116",
       "1.626.033", "Czechia", "29.141", "1.578.450", "South Africa", "54.285", "1.506.455", "Netherlands", "17.339",
       "1.210.077", "Canada", "24.105", "1.184.271", "Chile", "26.073", "1.051.868", "Iraq", "15.392",
       "1.051.779", "Romania", "27.833", "1.020.495", "Philippines", "17.031", "979.034", "Belgium", "24.104",
       "960.520", "Sweden", "14.000", "838.323", "Israel", "6.361", "835.563", "Portugal", "16.973",
       "810.231", "Pakistan", "17.530", "774.399", "Hungary", "27.172", "754.614", "Bangladesh", "11.305",
       "708.265", "Jordan", "8.754", "685.937", "Serbia", "6.312", "656.077", "Switzerland", "10.617",
       "614.510", "Austria", "10.152", "580.666", "Japan", "10.052", "524.241", "Lebanon", "7.224",
       "516.301", "United Arab Emirates", "1.580", "510.465", "Morocco", "9.015", "415.281", "Saudi Arabia", "6.935",
       "402.491", "Bulgaria", "16.278", "401.593", "Malaysia", "1.477", "381.180", "Slovakia", "11.611",
       "377.662", "Ecuador", "18.470", "366.709", "Kazakhstan", "3.326", "363.533", "Panama", "6.216",
       "355.924", "Belarus", "2.522", "340.493", "Greece", "10.242", "327.737", "Croatia", "7.001",
       "316.521", "Azerbaijan", "4.461", "312.699", "Nepal", "3.211", "307.401", "Georgia", "4.077",
       "305.313", "Tunisia", "10.563", "300.258", "Bolivia", "12.885", "294.550", "West Bank and Gaza", "3.206",
       "271.814", "Paraguay", "6.094", "271.145", "Kuwait", "1.546", "265.819", "Dominican Republic", "3.467",
       "255.288", "Ethiopia", "3.639", "250.479", "Denmark", "2.482", "250.138", "Moldova", "5.780",
       "247.857", "Ireland", "4.896", "244.555", "Lithuania", "3.900", "243.167", "Costa Rica", "3.186",
       "238.421", "Slovenia", "4.236", "224.621", "Guatemala", "7.478", "224.517", "Egypt", "13.168",
       "214.872", "Armenia", "4.071", "208.356", "Honduras", "5.212", "204.289", "Qatar", "445",
       "197.378", "Bosnia and Herzegovina", "8.464", "193.721", "Venezuela", "2.082", "192.326", "Oman", "2.001",
       "190.096", "Uruguay", "2.452", "176.701", "Libya", "3.019", "174.659", "Bahrain", "632",
       "164.912", "Nigeria", "2.063", "158.326", "Kenya", "2.688", "151.569", "North Macedonia", "4.772",
       "142.790", "Burma", "3.209", "130.859", "Albania", "2.386", "121.580", "Algeria", "3.234",
       "121.232", "Estonia", "1.148", "120.673", "Korea. South", "1.821", "117.099", "Latvia", "2.118",
       "111.915", "Norway", "753", "104.953", "Sri Lanka", "661", "104.512", "Cuba", "614",
       "103.638", "Kosovo", "2.134", "102.426", "China", "4.845", "97.080", "Montenegro", "1.485",
       "94.599", "Kyrgyzstan", "1.592", "92.513", "Ghana", "779", "91.484", "Zambia", "1.249",
       "90.008", "Uzbekistan", "646", "86.405", "Finland", "908", "69.804", "Mozambique", "814",
       "68.922", "El Salvador", "2.117", "66.826", "Luxembourg", "792", "65.998", "Cameroon", "991",
       "63.720", "Cyprus", "303", "61.699", "Thailand", "178", "61.086", "Singapore", "30",
       "59.370", "Afghanistan", "2.611", "48.177", "Namibia", "638", "46.600", "Botswana", "702",
       "45.885", "Cote d'Ivoire", "285", "45.292", "Jamaica", "770", "41.766", "Uganda", "341",
       "40.249", "Senegal", "1.107", "38.191", "Zimbabwe", "1.565", "36.510", "Madagascar", "631",
       "34.052", "Malawi", "1.147", "33.944", "Sudan", "2.349", "33.608", "Mongolia", "97",
       "30.249", "Malta", "413", "29.768", "Congo Kinshasa", "763", "29.749", "Australia", "910",
       "29.052", "Maldives", "72", "25.942", "Angola", "587", "24.888", "Rwanda", "332",
       "23.181", "Cabo Verde", "213", "22.568", "Gabon", "138", "22.513", "Syria", "1.572",
       "22.087", "Guinea", "141", "18.452", "Eswatini", "671", "18.314", "Mauritania", "455",
       "13.915", "Somalia", "713", "13.780", "Mali", "477", "13.308", "Tajikistan", "90",
       "13.286", "Burkina Faso", "157", "13.148", "Andorra", "125", "13.017", "Haiti", "254",
       "12.963", "Guyana", "293", "12.898", "Togo", "122", "12.631", "Belize", "322",
       "11.761", "Cambodia", "88", "10.986", "Djibouti", "142", "10.915", "Papua New Guinea", "107",
       "10.730", "Lesotho", "316", "10.678", "Congo Brazzaville", "144", "10.553", "South Sudan", "114",
       "10.220", "Bahamas", "198", "10.170", "Trinidad and Tobago", "163", "10.157", "Suriname", "201",
       "7.821", "Benin", "99", "7.559", "Equatorial Guinea", "107", "6.898", "Nicaragua", "182",
       "6.456", "Iceland", "29", "6.359", "Central African Republic", "87", "6.220", "Yemen", "1.207",
       "5.882", "Gambia", "174", "5.354", "Seychelles", "26", "5.220", "Niger", "191",
       "5.059", "San Marino", "90", "4.789", "Chad", "170", "4.508", "Saint Lucia", "74",
       "4.049", "Sierra Leone", "79", "3.941", "Burundi", "6", "3.833", "Comoros", "146",
       "3.831", "Barbados", "44", "3.731", "Guinea-Bissau", "67", "3.659", "Eritrea", "10",
       "2.908", "Liechtenstein", "57", "2.865", "Vietnam", "35", "2.610", "New Zealand", "26",
       "2.447", "Monaco", "32", "2.301", "Sao Tome and Principe", "35", "2.124", "Timor-Leste", "3",
       "2.099", "Liberia", "85", "1.850", "Saint Vincent and the Grenadines", "11", "1.232", "Antigua and Barbuda", "32",
       "1.207", "Mauritius", "17", "1.116", "Taiwan", "12", "1.059", "Bhutan", "1",
       "712", "Diamond Princess", "13", "604", "Laos", "0", "509", "Tanzania", "21",
       "224", "Brunei", "3", "173", "Dominica", "0", "159", "Grenada", "1",
       "111", "Fiji", "2", "44", "Saint Kitts and Nevis", "0", "27", "Holy See", "0",
       "20", "Solomon Islands", "0", "9", "MS Zaandam", "2", "4", "Marshall Islands", "0",
       "4", "Vanuatu", "1", "3", "Samoa", "0", "1", "Micronesia", "0"]

countryIndex = 1
casesIndex = 0
deathsIndex = 2

countries = []
cases = []
deaths = []

for item in lst:
    print(f"Country: {lst[countryIndex]}")
    print(f"Cases: {lst[casesIndex]}")
    print(f"Deaths: {lst[deathsIndex]}")
    print("")
    countryToAppend = lst[countryIndex]
    casesToAppend = lst[casesIndex]
    deathsToAppend = lst[deathsIndex]
    countries.append(countryToAppend)
    cases.append(casesToAppend)
    deaths.append(deathsToAppend)
    countryIndex += 3
    casesIndex += 3
    deathsIndex += 3

total = sum(deaths)
print(f"Total deaths: {total}")
On top of the suggestion to rename the dataset so it doesn't shadow the built-in name list, my recommendation would be to leverage the step argument of the built-in range, like so:

# Lists to store data
countries = []
total_cases = []
total_deaths = []

# Iterate over the range of the data, skipping 3 at a time: 0, 3, ...
for x in range(0, len(data), 3):
    # Parse the cases and deaths into ints
    cases = int(data[x].replace('.', ''))
    deaths = int(data[x + 2].replace('.', ''))
    # We can just extract the country label
    country_label = data[x + 1]
    countries.append(country_label)
    total_cases.append(cases)
    total_deaths.append(deaths)

# Get the desired sums
sum_cases = sum(total_cases)
sum_deaths = sum(total_deaths)
print(f"The total cases: {sum_cases}")
print(f"The total deaths: {sum_deaths}")

Above I renamed your dataset to data and was able to sum up each list.
total = 0
for i in range(2, len(l), 3):  # l is your list of data
    # remove the points between number groups, e.g. 574.280 --> 574280
    total = total + int(l[i].replace('.', ''))
print(total)  # output: 3145239
PYTHON: How to get seaborn attributes to reset to default?
I cannot for the life of me get seaborn to go back to default settings. I will place the code that I believe caused this issue below, but I would recommend against running it unless you know how to fix it. The culprit, I believe, is sns.set(font_scale=4) in the last chunk.

Before this question gets deleted because it's already been asked: I have tried the other posted solutions with no success. Just to name a quick few, resetting using sns.set(), sns.set_style(), sns.restore_defaults(). I have also tried resetting matplotlib settings to defaults. The setting persists across all my files, so I can't even open a new file, delete the line of code that caused it, or run any past programs without it applying to those graphs too.

My seaborn version is 0.10.1; I have tried to update it, but I can't get the update to go through. I am using Anaconda's Spyder IDE. The documentation says that for versions after 0.8 the styles/themes must be invoked to be reset, but if I try their solution, sns.set_theme(), I get an error saying that the module has no such attribute. I'm sure this persistence is considered a feature, but I desperately need it to go away!

import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

if __name__ == '__main__':
    # data prep
    data_path = './assets/'
    out_path = './output'

    # scraping javascript map data via xml
    endpoint = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData"
    data = requests.get(endpoint, params={"id": "US_MAP_DATA"}).json()

    # convert to df and export raw data as csv
    df = pd.DataFrame(data["US_MAP_DATA"])
    path = os.path.join(out_path, 'Raw_CDC_Data.csv')
    df.to_csv(path)

    # Remove last data point (Total USA)
    df.drop(df.tail(1).index, inplace=True)

    # Create DF of just 50 states
    state_abbr = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]
    states = df[df['abbr'].isin(state_abbr)]

    # Adding NYC to state of NY
    # FILL THIS IN LATER

    # Graphing
    plt.style.use('default')
    sns.set()

    # add new col survival rate and save
    states['survival_rate'] = states['tot_cases'] - states['tot_death']
    states.drop(df.columns[[0]], axis=1)
    states.reset_index(drop=True)
    path = os.path.join(out_path, 'CDC_Data_By_State.csv')
    states.to_csv(path)

    # Stacked BarPlot
    fig, ax = plt.subplots()
    colors = ['#e5c5b5', '#a8dda8']
    r = range(0, len(states.index))
    plt.bar(r, states['survival_rate'], color=colors[0])
    #ax = stacked['survival_rate','tot_death'].plot.bar(stacked=True, color=colors, ax=ax)

    fig, ax = plt.subplots()
    plt.figure(figsize=(20, 35))
    sns.set(font_scale=4)
    ax = sns.barplot(x='tot_cases', y='abbr', data=states)
    ax.set(title='USA Covid-19 Cases by State', ylabel='State', xlabel='Confirmed Cases')
    path = os.path.join(out_path, 'Total_Deaths_Bar.png')
    plt.savefig(path)
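No answer is attached here, but for seaborn 0.10.x the usual escape hatches are seaborn's own reset helpers, which restore matplotlib's rc parameters (sns.set_theme() only exists from seaborn 0.11 onwards, which matches the AttributeError). A minimal sketch of what one could try:

import matplotlib as mpl
import seaborn as sns

# Restore the rc parameters that were active before seaborn changed them
sns.reset_orig()

# Or reset everything to matplotlib's built-in defaults;
# this has the same intent as mpl.rcParams.update(mpl.rcParamsDefault)
sns.reset_defaults()

One more observation: rc parameters live in the running Python session, not in the files. In Spyder the same IPython kernel typically serves every script you run, which is why font_scale=4 appears to "persist across files"; restarting the kernel (or calling one of the resets above at the top of the script) clears it.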
Able to generate CSV file, but it is not split into multiple columns; all the data ends up in the first column
I have the code below, which takes in a JSON file and then parses it to CSV. My code works, but the issue is that instead of the CSV having multiple columns (what I want), it lists all of the data under column 1 without any separation. Not sure where I'm wrong in my code below:

# flattens and converts json to csv file
def json2csv():
    parse = format_json()
    now = datetime.datetime.now()
    today_str = now.strftime('%Y%m%d%H%M%S')
    outputFilePathCSV = 'datastream_log/'
    # outputs to given csv file
    with open('datastream_csv/' + 'datastream:' + today_str + '.csv', 'w') as csv_file:
        filename.append(csv_file.name)
        writer = csv.writer(csv_file, delimiter='\t')
        # appends headers to top row
        keylist = []
        for key in parse[1]:
            y = keylist.append(key)
        writer.writerow(keylist)
        # appends rows from dictionary
        for data, data_rec in parse.items():
            try:
                writer.writerow(list(data_rec.values()))
            except (KeyError, TypeError, AttributeError) as e:
                pass

json2csv()

Example of JSON:

{"target": {"icao_address": "8963BB", "timestamp": "2019-12-02T21:41:03Z", "altitude_baro": 3075, "heading": 235.0, "speed": 162.0, "latitude": 43.403778, "longitude": 77.139225, "callsign": "FDB1735", "vertical_rate": -960, "squawk_code": "0573", "collection_type": "terrestrial", "ingestion_time": "2019-12-02T21:41:07Z", "tail_number": "A6-FEY", "icao_actype": "B738", "flight_number": "FZ1735", "origin_airport_icao": "OMDB", "destination_airport_icao": "UAAA", "scheduled_departure_time_utc": "2019-12-02T17:30:00Z", "scheduled_departure_time_local": "2019-12-02T21:30:00", "scheduled_arrival_time_utc": "2019-12-02T21:50:00Z", "scheduled_arrival_time_local": "2019-12-03T03:50:00"}}
{"target": {"icao_address": "4CA4EF", "timestamp": "2019-12-02T21:41:06Z", "altitude_baro": 15125, "heading": 80.0, "speed": 290.0, "latitude": 53.56636, "longitude": -3.047137, "callsign": "RYR9ZN", "vertical_rate": -1080, "collection_type": "terrestrial", "ingestion_time": "2019-12-02T21:41:10Z", "tail_number": "EI-DPK", "icao_actype": "B738", "flight_number": "FR156", "origin_airport_icao": "EIDW", "destination_airport_icao": "EGNM", "scheduled_departure_time_utc": "2019-12-02T20:55:00Z", "scheduled_departure_time_local": "2019-12-02T20:55:00", "scheduled_arrival_time_utc": "2019-12-02T21:55:00Z", "scheduled_arrival_time_local": "2019-12-02T21:55:00", "estimated_arrival_time_utc": "2019-12-02T21:53:00Z", "estimated_arrival_time_local": "2019-12-02T21:53:00"}}

(A screenshot of the resulting CSV, showing every field crammed into the first column, was attached but is not reproduced here.)
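No answer is attached here, but the symptom matches the delimiter='\t' argument: the rows are written tab-separated, and Excel splits a .csv file on commas, so each tab-joined row lands in the first column. A minimal sketch of the fix, using csv.writer's default comma delimiter and sample values taken from the JSON above:

import csv

rows = [
    ["icao_address", "timestamp", "altitude_baro"],   # header row
    ["8963BB", "2019-12-02T21:41:03Z", 3075],         # sample record values
]

with open("out.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)   # default delimiter is ',', which Excel splits correctly
    for row in rows:
        writer.writerow(row)

Alternatively, keep the tabs but name the output .tsv, or import the file into Excel with the tab delimiter selected.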