Trying to sum Python list [closed] - python

This is my first post here; I am learning and practicing Python.
The problem is that nothing I put after the for loop runs, so at the end I can't get a total count. Maybe I should have written a function, but first I need to understand why this is happening. What have I done wrong?
lst = ["32.225.012", "US", "574.280", "17.997.267", "India", "201.187", "14.521.289", "Brazil", "398.185", "5.626.942", "France", "104.077", "4.751.026", "Turkey", "39.398", "4.732.981", "Russia", "107.547", "4.427.433", "United Kingdom", "127.734", "3.994.894", "Italy", "120.256", "3.504.799", "Spain", "77.943", "3.351.014", "Germany", "82.395", "2.905.172", "Argentina", "62.599", "2.824.626", "Colombia", "72.725", "2.776.927", "Poland", "66.533", "2.459.906", "Iran", "70.966", "2.333.126", "Mexico", "215.547", "2.102.130", "Ukraine", "45.211", "1.775.062", "Peru", "60.416", "1.657.035", "Indonesia", "45.116", "1.626.033", "Czechia", "29.141", "1.578.450", "South Africa", "54.285", "1.506.455", "Netherlands", "17.339", "1.210.077", "Canada", "24.105", "1.184.271", "Chile", "26.073", "1.051.868", "Iraq", "15.392", "1.051.779", "Romania", "27.833", "1.020.495", "Philippines", "17.031", "979.034", "Belgium", "24.104", "960.520", "Sweden", "14.000", "838.323", "Israel", "6.361", "835.563", "Portugal", "16.973", "810.231", "Pakistan", "17.530", "774.399", "Hungary", "27.172", "754.614", "Bangladesh", "11.305", "708.265", "Jordan", "8.754", "685.937", "Serbia", "6.312", "656.077", "Switzerland", "10.617", "614.510", "Austria", "10.152", "580.666", "Japan", "10.052", "524.241", "Lebanon", "7.224", "516.301", "United Arab Emirates", "1.580", "510.465", "Morocco", "9.015", "415.281", "Saudi Arabia", "6.935", "402.491", "Bulgaria", "16.278", "401.593", "Malaysia", "1.477", "381.180", "Slovakia", "11.611", "377.662", "Ecuador", "18.470", "366.709", "Kazakhstan", "3.326", "363.533", "Panama", "6.216", "355.924", "Belarus", "2.522", "340.493", "Greece", "10.242", "327.737", "Croatia", "7.001", "316.521", "Azerbaijan", "4.461", "312.699", "Nepal", "3.211","307.401", "Georgia", "4.077", "305.313", "Tunisia", "10.563", "300.258", "Bolivia", "12.885", "294.550", "West Bank and Gaza", "3.206", "271.814", "Paraguay", "6.094", "271.145", "Kuwait", "1.546", "265.819", "Dominican 
Republic", "3.467", "255.288", "Ethiopia", "3.639", "250.479", "Denmark", "2.482", "250.138", "Moldova", "5.780", "247.857", "Ireland", "4.896", "244.555", "Lithuania", "3.900", "243.167", "Costa Rica", "3.186", "238.421", "Slovenia", "4.236", "224.621", "Guatemala", "7.478", "224.517", "Egypt", "13.168", "214.872", "Armenia", "4.071", "208.356", "Honduras", "5.212", "204.289", "Qatar", "445","197.378", "Bosnia and Herzegovina", "8.464", "193.721", "Venezuela", "2.082", "192.326", "Oman", "2.001","190.096", "Uruguay", "2.452", "176.701", "Libya", "3.019","174.659", "Bahrain", "632","164.912", "Nigeria", "2.063", "158.326", "Kenya", "2.688","151.569", "North Macedonia", "4.772", "142.790", "Burma", "3.209","130.859", "Albania", "2.386", "121.580", "Algeria", "3.234", "121.232", "Estonia", "1.148", "120.673", "Korea. South", "1.821", "117.099", "Latvia", "2.118", "111.915", "Norway", "753","104.953", "Sri Lanka", "661", "104.512", "Cuba", "614","103.638", "Kosovo", "2.134", "102.426", "China", "4.845","97.080", "Montenegro", "1.485", "94.599", "Kyrgyzstan", "1.592", "92.513", "Ghana", "779","91.484", "Zambia", "1.249","90.008", "Uzbekistan", "646", "86.405", "Finland", "908","69.804", "Mozambique", "814", "68.922", "El Salvador", "2.117", "66.826", "Luxembourg", "792", "65.998", "Cameroon", "991","63.720", "Cyprus", "303","61.699", "Thailand", "178","61.086", "Singapore", "30","59.370", "Afghanistan", "2.611", "48.177", "Namibia", "638","46.600", "Botswana", "702","45.885", "Cote d'Ivoire", "285", "45.292", "Jamaica", "770","41.766", "Uganda", "341","40.249", "Senegal", "1.107", "38.191", "Zimbabwe", "1.565", "36.510", "Madagascar", "631", "34.052", "Malawi", "1.147","33.944", "Sudan", "2.349","33.608", "Mongolia", "97","30.249", "Malta", "413","29.768", "Congo Kinshasa", "763", "29.749", "Australia", "910", "29.052", "Maldives", "72","25.942", "Angola", "587","24.888", "Rwanda", "332","23.181", "Cabo Verde", "213", "22.568", "Gabon", "138","22.513", "Syria", 
"1.572","22.087", "Guinea", "141","18.452", "Eswatini", "671","18.314", "Mauritania", "455", "13.915", "Somalia", "713","13.780", "Mali", "477","13.308", "Tajikistan", "90", "13.286", "Burkina Faso", "157", "13.148", "Andorra", "125","13.017", "Haiti", "254","12.963", "Guyana", "293","12.898", "Togo", "122","12.631", "Belize", "322","11.761", "Cambodia", "88","10.986", "Djibouti", "142","10.915", "Papua New Guinea", "107", "10.730", "Lesotho", "316","10.678", "Congo Brazzaville", "144", "10.553", "South Sudan", "114", "10.220", "Bahamas", "198","10.170", "Trinidad and Tobago", "163", "10.157", "Suriname", "201","7.821", "Benin", "99","7.559", "Equatorial Guinea", "107", "6.898", "Nicaragua", "182","6.456", "Iceland", "29","6.359", "Central African Republic", "87", "6.220", "Yemen", "1.207","5.882", "Gambia", "174","5.354", "Seychelles", "26","5.220", "Niger", "191","5.059", "San Marino", "90","4.789", "Chad", "170","4.508", "Saint Lucia", "74", "4.049", "Sierra Leone", "79", "3.941", "Burundi", "6","3.833", "Comoros", "146","3.831", "Barbados", "44","3.731", "Guinea-Bissau", "67", "3.659", "Eritrea", "10","2.908", "Liechtenstein", "57", "2.865", "Vietnam", "35","2.610", "New Zealand", "26", "2.447", "Monaco", "32","2.301", "Sao Tome and Principe", "35", "2.124", "Timor-Leste", "3","2.099", "Liberia", "85","1.850", "Saint Vincent and the Grenadines", "11", "1.232", "Antigua and Barbuda", "32", "1.207", "Mauritius", "17","1.116", "Taiwan", "12","1.059", "Bhutan", "1","712", "Diamond Princess", "13", "604", "Laos", "0","509", "Tanzania", "21","224", "Brunei", "3","173", "Dominica", "0","159", "Grenada", "1","111", "Fiji", "2","44", "Saint Kitts and Nevis", "0", "27", "Holy See", "0","20", "Solomon Islands", "0", "9", "MS Zaandam", "2","4", "Marshall Islands", "0", "4", "Vanuatu", "1","3", "Samoa", "0","1", "Micronesia", "0"]
countryIndex = 1
casesIndex = 0
deathsIndex = 2
countries = []
cases = []
deaths = []
for item in lst:
    print(f"Country: {lst[countryIndex]}")
    print(f"Cases: {lst[casesIndex]}")
    print(f"Deaths: {lst[deathsIndex]}")
    print("")
    countryToAppend = lst[countryIndex]
    casesToAppend = lst[casesIndex]
    deathsToAppend = lst[deathsIndex]
    countries.append(countryToAppend)
    cases.append(casesToAppend)
    deaths.append(deathsToAppend)
    countryIndex += 3
    casesIndex += 3
    deathsIndex += 3
total = sum(deaths)
print(f"Total deaths: {total}")

On top of the suggestion to rename the dataset so it doesn't shadow the builtin name list, my recommendation would be to use the step argument of the builtin range, like so:
# Lists to store data
countries = []
total_cases = []
total_deaths = []

# Iterate over the range of the data, skipping 3 at a time: 0, 3, ...
for x in range(0, len(data), 3):
    # Parse the cases and deaths out to ints
    cases = int(data[x].replace('.', ''))
    deaths = int(data[x + 2].replace('.', ''))
    # We can just extract the country label
    country_label = data[x + 1]
    countries.append(country_label)
    total_cases.append(cases)
    total_deaths.append(deaths)

# Get the desired sums
sum_cases = sum(total_cases)
sum_deaths = sum(total_deaths)
print(f"The total cases: {sum_cases}")
print(f"The total deaths: {sum_deaths}")
Above I renamed your dataset to be data and was able to sum up each list.

total = 0  # avoid shadowing the builtin sum()
for i in range(2, len(l), 3):  # l is your list of data
    total = total + int(l[i].replace('.', ''))  # remove the dots between digits, e.g. 574.280 --> 574280
print(total)
# output: 3145239
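A third option, sticking with the same repeating cases/country/deaths layout, is list slicing with a step of 3, which avoids index bookkeeping entirely. The small sample here is illustrative:

```python
# Illustrative sample in the same repeating layout: cases, country, deaths
data = ["32.225.012", "US", "574.280",
        "17.997.267", "India", "201.187"]

countries = data[1::3]  # every third item, starting at index 1
deaths = [int(d.replace('.', '')) for d in data[2::3]]

print(countries)    # ['US', 'India']
print(sum(deaths))  # 775467
```

Each slice walks the list in steps of three from a different offset, so no counters can run past the end of the list.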

Related

How to iterate and group through dictionary in python

I am struggling with a query: I have to count the total CO2 emissions for each airline.
I have managed to get all the data into output which looks like:
Flight Number is 2HX and the airline is IT and the aircraft is E195 going from EDDF to LIMF
Flight distance is 542.93 km
Flight CO2 emissions is 16.87 kg
Flight Number is 8031 and the airline is ES and the aircraft is B752 going from LEBL to EDDP
Flight distance is 1365.97 km
Flight CO2 emissions is 31.07 kg
Flight Number is 39DV and the airline is ES and the aircraft is A320 going from LEPA to LEMD
Flight distance is 546.33 km
Flight CO2 emissions is 16.92 kg
The calculations are done for all of the flights, but I would like to group them by AIRLINE, accumulating the totals for each one and printing them accordingly.
Any ideas how I could start?
JSON file loaded looks like this
[{"hex": "150694", "reg_number": "RA-67220", "flag": "RU", "lat": 51.633911, "lng": 50.050518, "alt": 11582, "dir": 290, "speed": 761, "v_speed": 0.3, "squawk": "0507", "flight_number": "9004", "flight_icao": "TUL9004", "dep_icao": "VIDP", "dep_iata": "DEL", "airline_icao": "PLG", "aircraft_icao": "CRJ2", "updated": 1675528289, "status": "en-route"}, {"hex": "152038", "reg_number": "RA-73784", "flag": "RU", "lat": 43.352108, "lng": 35.634342, "alt": 11277, "dir": 4, "speed": 881, "v_speed": 0, "squawk": "7313", "flight_number": "427", "flight_icao": "AFL427", "flight_iata": "SU427", "dep_icao": "HESH", "dep_iata": "SSH", "arr_icao": "UUEE", "arr_iata": "SVO", "airline_icao": "AFL", "airline_iata": "SU", "aircraft_icao": "A333", "updated": 1675528054, "status": "en-route"}, {"hex": "152052", "reg_number": "RA-73810", "flag": "RU", "lat": 59.739784, "lng": 85.652138, "alt": 9745, "dir": 89, "speed": 801, "v_speed": 0, "squawk": "5521", "flight_number": "173", "flight_icao": "SVR173", "flight_iata": "U6173", "dep_icao": "USSS", "dep_iata": "SVX", "arr_icao": "UHHH", "arr_iata": "KHV", "airline_icao": "SVR", "airline_iata": "U6", "aircraft_icao": "A319", "updated": 1675528294, "status": "en-route"}
Basically, the function for listing flights looks like this, but I would like to group flights by airline and add the CO2 emission value to each individual result:
def list_all_flights(self):
    # List all flights
    total_result = 0
    for i in self.flights_list.read_data_file():  # json file
        if i.get('dep_icao') and i.get('arr_icao'):
            print(f"Flight Number is {i['flight_number']} and the airline is {i['flag']} and the aircraft is {i['aircraft_icao']} going from {i['dep_icao']} to {i['arr_icao']}")
I have managed to count all occurrences of the different airlines inside a new dictionary, and it works:
if 'flag' in i:
    temp[i['flag']] = temp.get(i['flag'], 0) + 1
Now I would like to add the result for co2 emissions as a total for an airline.
By making use of the Pandas module (you can install it via pip install pandas) I made this:
import pandas as pd
df = pd.read_json("data.json")
result = {a_iata: [] for a_iata in df.airline_iata.unique()}
for a_iata in result:
    result[a_iata] = df.loc[df.airline_iata == a_iata]
Where data.json is the data that you have provided. The code essentially filters every entry by its airline_iata value and stores the groups as individual DataFrames. You can look up the data with result['AIRLINE_CODE'], which returns the DataFrame.
When you are constructing your message, you can use something like this:
temp_df = result['AIRLINE_CODE']
message = f"Flight reg number is {temp_df.reg_number}..."
You can fill the message out however you like.
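If you'd rather stay with plain dictionaries, the per-airline CO2 total can be accumulated the same way the flight counter in the question works. This is a sketch; the co2_kg field is a hypothetical stand-in for the emission value computed per flight, and the sample list mirrors the three flights shown above:

```python
# Sketch: accumulate per-airline CO2 totals in a dict.
# 'co2_kg' is a hypothetical field holding the computed emission per flight.
flights = [
    {"airline_iata": "IT", "co2_kg": 16.87},
    {"airline_iata": "ES", "co2_kg": 31.07},
    {"airline_iata": "ES", "co2_kg": 16.92},
]

totals = {}
for f in flights:
    airline = f.get("airline_iata")
    if airline:  # skip entries with no airline code
        totals[airline] = totals.get(airline, 0.0) + f["co2_kg"]

for airline, total in totals.items():
    print(f"Airline {airline}: {total:.2f} kg CO2")
```

The dict.get(key, default) pattern is the same one already used for the flight counter, just adding a float instead of 1.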

Python DataReader - Update with new information

import pandas as pd
from pandas_datareader import data as wb
tickers = ["MMM", "ABT", "ABBV", "ABMD", "ACN", "ATVI", "ADBE", "AMD", "AAP", "AES", "AFL", "A", "APD", "AKAM", "ALK", "ALB", "ARE", "ALXN", "ALGN", "ALLE", "LNT", "ALL", "GOOGL", "GOOG", "MO", "AMZN", "AMCR", "AEE", "AAL", "AEP", "AXP", "AIG", "AMT", "AWK", "AMP", "ABC", "AME", "AMGN", "APH", "ADI", "ANSS", "ANTM", "AON", "AOS", "APA", "AAPL", "AMAT", "APTV", "ADM", "ANET", "AJG", "AIZ", "T", "ATO", "ADSK", "ADP", "AZO", "AVB", "AVY", "BKR", "BLL", "BAC", "BK", "BAX", "BDX", "BBY", "BIO", "BIIB", "BLK", "BA", "BKNG", "BWA", "BXP", "BSX", "BMY", "AVGO", "BR", "CHRW", "COG", "CDNS", "CZR", "CPB", "COF", "CAH", "KMX", "CCL", "CARR", "CTLT", "CAT", "CBOE", "CBRE", "CDW", "CE", "CNC", "CNP", "CERN", "CF", "SCHW", "CHTR", "CVX", "CMG", "CB", "CHD", "CI", "CINF", "CTAS", "CSCO", "C", "CFG", "CTXS", "CLX", "CME", "CMS", "KO", "CTSH", "CL", "CMCSA", "CMA", "CAG", "COP", "ED", "STZ", "COO", "CPRT", "GLW", "CTVA", "COST", "CCI", "CSX", "CMI", "CVS", "DHI", "DHR", "DRI", "DVA", "DE", "DAL", "XRAY", "DVN", "DXCM", "FANG", "DLR", "DFS", "DISCA", "DISCK", "DISH", "DG", "DLTR", "D", "DPZ", "DOV", "DOW", "DTE", "DUK", "DRE", "DD", "DXC", "EMN", "ETN", "EBAY", "ECL", "EIX", "EW", "EA", "EMR", "ENPH", "ETR", "EOG", "EFX", "EQIX", "EQR", "ESS", "EL", "ETSY", "EVRG", "ES", "RE", "EXC", "EXPE", "EXPD", "EXR", "XOM", "FFIV", "FB", "FAST", "FRT", "FDX", "FIS", "FITB", "FE", "FRC", "FISV", "FLT", "FLIR", "FMC", "F", "FTNT", "FTV", "FBHS", "FOXA", "FOX", "BEN", "FCX", "GPS", "GRMN", "IT", "GNRC", "GD", "GE", "GIS", "GM", "GPC", "GILD", "GL", "GPN", "GS", "GWW", "HAL", "HBI", "HIG", "HAS", "HCA", "PEAK", "HSIC", "HSY", "HES", "HPE", "HLT", "HFC", "HOLX", "HD", "HON", "HRL", "HST", "HWM", "HPQ", "HUM", "HBAN", "HII", "IEX", "IDXX", "INFO", "ITW", "ILMN", "INCY", "IR", "INTC", "ICE", "IBM", "IP", "IPG", "IFF", "INTU", "ISRG", "IVZ", "IPGP", "IQV", "IRM", "JKHY", "J", "JBHT", "SJM", "JNJ", "JCI", "JPM", "JNPR", "KSU", "K", "KEY", "KEYS", "KMB", "KIM", "KMI", "KLAC", "KHC", "KR", "LB", "LHX", 
"LH", "LRCX", "LW", "LVS", "LEG", "LDOS", "LEN", "LLY", "LNC", "LIN", "LYV", "LKQ", "LMT", "L", "LOW", "LUMN", "LYB", "MTB", "MRO", "MPC", "MKTX", "MAR", "MMC", "MLM", "MAS", "MA", "MKC", "MXIM", "MCD", "MCK", "MDT", "MRK", "MET", "MTD", "MGM", "MCHP", "MU", "MSFT", "MAA", "MHK", "TAP", "MDLZ", "MPWR", "MNST", "MCO", "MS", "MOS", "MSI", "MSCI", "NDAQ", "NTAP", "NFLX", "NWL", "NEM", "NWSA", "NWS", "NEE", "NLSN", "NKE", "NI", "NSC", "NTRS", "NOC", "NLOK", "NCLH", "NOV", "NRG", "NUE", "NVDA", "NVR", "NXPI", "ORLY", "OXY", "ODFL", "OMC", "OKE", "ORCL", "OTIS", "PCAR", "PKG", "PH", "PAYX", "PAYC", "PYPL", "PENN", "PNR", "PBCT", "PEP", "PKI", "PRGO", "PFE", "PM", "PSX", "PNW", "PXD", "PNC", "POOL", "PPG", "PPL", "PFG", "PG", "PGR", "PLD", "PRU", "PEG", "PSA", "PHM", "PVH", "QRVO", "PWR", "QCOM", "DGX", "RL", "RJF", "RTX", "O", "REG", "REGN", "RF", "RSG", "RMD", "RHI", "ROK", "ROL", "ROP", "ROST", "RCL", "SPGI", "CRM", "SBAC", "SLB", "STX", "SEE", "SRE", "NOW", "SHW", "SPG", "SWKS", "SNA", "SO", "LUV", "SWK", "SBUX", "STT", "STE", "SYK", "SIVB", "SYF", "SNPS", "SYY", "TMUS", "TROW", "TTWO", "TPR", "TGT", "TEL", "TDY", "TFX", "TER", "TSLA", "TXN", "TXT", "TMO", "TJX", "TSCO", "TT", "TDG", "TRV", "TRMB", "TFC", "TWTR", "TYL", "TSN", "UDR", "ULTA", "USB", "UAA", "UA", "UNP", "UAL", "UNH", "UPS", "URI", "UHS", "UNM", "VLO", "VAR", "VTR", "VRSN", "VRSK", "VZ", "VRTX", "VFC", "VIAC", "VTRS", "V", "VNO", "VMC", "WRB", "WAB", "WMT", "WBA", "DIS", "WM", "WAT", "WEC", "WFC", "WELL", "WST", "WDC", "WU", "WRK", "WY", "WHR", "WMB", "WLTW", "WYNN", "XEL", "XLNX", "XYL", "YUM", "ZBRA", "ZBH", "ZION", "ZTS"]
financial_data = pd.DataFrame()
for t in tickers:
    financial_data[t] = wb.DataReader(t, data_source='yahoo', start='1995-1-1')["Adj Close"]
financial_data.to_excel("Financial Data.xlsx")
I am using DataReader to gather some stock info. I am grabbing a lot of data (from 1995 to 2021) and then exporting it to Excel. I was wondering if there is a way, say tomorrow, to speed up the update instead of running the whole script from top to bottom, since tomorrow my goal would just be to add a single new line to the Excel file. If I just re-execute the script, it overwrites the Excel file and adds a new line of info. This seems pretty inefficient, and I was wondering if there's a way to "tell the script" I am only looking for tomorrow's info, instead of "telling it" to grab all the information again starting from 1995.
Thanks.
I don't know exactly how pandas works internally, but I would say it does lazy loading, which is not very computationally expensive; the costly part is operating on the loaded data. If the data is ordered by date in increasing order, it should be enough to keep a variable timestamp_toStart, initialized to '1995-1-1' on the first run and updated after each run to the last date read. You could save this value to a file and reload it every time you rerun the script.
financial_data = pd.DataFrame()
# Load timestamp_toStart from the file here
for t in tickers:
    financial_data[t] = wb.DataReader(t, data_source='yahoo', start=timestamp_toStart)["Adj Close"]
timestamp_toStart = financial_data.index[-1]  # last date fetched
# Save timestamp_toStart to the file
financial_data.to_excel("Financial Data.xlsx")
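The append step itself could look like the sketch below, independent of DataReader. Here existing stands in for the frame loaded from the old file, and new_rows for a fresh DataReader call started just after the last saved date; both frames are illustrative:

```python
import pandas as pd

# Sketch: append only rows newer than what's already saved, instead of
# refetching everything from 1995. `existing` stands in for the old file's
# contents; `new_rows` for a fresh fetch starting after the last saved date.
existing = pd.DataFrame(
    {"MMM": [100.0, 101.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
)
last_date = existing.index[-1]

new_rows = pd.DataFrame(
    {"MMM": [102.0]},
    index=pd.to_datetime(["2021-01-06"]),
)

# Keep only rows strictly after what we already have, then append.
new_rows = new_rows[new_rows.index > last_date]
updated = pd.concat([existing, new_rows])
print(updated)
```

The filter guards against overlap if the fetch window starts a day early or the market was closed.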

Convert Json to XML - Json with array values

I know there are a lot of similar questions like mine, but none of them worked for me.
My JSON file has arrays for actors, directors and genre. I'm having difficulty dealing with these arrays while building the XML.
This is the json file:
[
{
"title":"The Kissing Booth",
"year":"2018",
"actors":[
"Megan du Plessis",
"Lincoln Pearson",
"Caitlyn de Abrue",
"Jack Fokkens",
"Stephen Jennings",
"Chloe Williams",
"Michael Miccoli",
"Juliet Blacher",
"Jesse Rowan-Goldberg",
"Chase Dallas",
"Joey King",
"Joel Courtney",
"Jacob Elordi",
"Carson White",
"Hilton Pelser"
],
"genre":[
"Comedy",
"Romance"
],
"description":"A high school student is forced to confront her secret crush at a kissing booth.",
"directors":[
"Vince Marcello"
]
},
{
"title":"Dune",
"year":"2020",
"actors":[
"Rebecca Ferguson",
"Zendaya",
"Jason Momoa",
"Timoth\u00e9e Chalamet",
"Dave Bautista",
"Josh Brolin",
"Oscar Isaac",
"Stellan Skarsg\u00e5rd",
"Javier Bardem",
"Charlotte Rampling",
"David Dastmalchian",
"Stephen McKinley Henderson",
"Sharon Duncan-Brewster",
"Chen Chang",
"Babs Olusanmokun"
],
"genre":[
"Adventure",
"Drama",
"Sci-Fi"
],
"description":"Feature adaptation of Frank Herbert's science fiction novel, about the son of a noble family entrusted with the protection of the most valuable asset and most vital element in the galaxy.",
"directors":[
"Denis Villeneuve"
]
},
{
"title":"Parasite",
"year":"2019",
"actors":[
"Kang-ho Song",
"Sun-kyun Lee",
"Yeo-jeong Jo",
"Woo-sik Choi",
"So-dam Park",
"Jeong-eun Lee",
"Hye-jin Jang",
"Myeong-hoon Park",
"Ji-so Jung",
"Hyun-jun Jung",
"Keun-rok Park",
"Jeong Esuz",
"Jo Jae-Myeong",
"Ik-han Jung",
"Kim Gyu Baek"
],
"genre":[
"Comedy",
"Drama",
"Thriller"
],
"description":"Greed and class discrimination threaten the newly formed symbiotic relationship between the wealthy Park family and the destitute Kim clan.",
"directors":[
"Bong Joon Ho"
]
},
{
"title":"Money Heist",
"year":null,
"actors":[
"\u00darsula Corber\u00f3",
"\u00c1lvaro Morte",
"Itziar Itu\u00f1o",
"Pedro Alonso",
"Miguel Herr\u00e1n",
"Jaime Lorente",
"Esther Acebo",
"Enrique Arce",
"Darko Peric",
"Alba Flores",
"Fernando Soto",
"Mario de la Rosa",
"Juan Fern\u00e1ndez",
"Rocco Narva",
"Paco Tous",
"Kiti M\u00e1nver",
"Hovik Keuchkerian",
"Rodrigo De la Serna",
"Najwa Nimri",
"Luka Peros",
"Roberto Garcia",
"Mar\u00eda Pedraza",
"Fernando Cayo",
"Antonio Cuellar Rodriguez",
"Anna Gras",
"Aitana Rinab Perez",
"Olalla Hern\u00e1ndez",
"Carlos Su\u00e1rez",
"Mari Carmen S\u00e1nchez",
"Antonio Romero",
"Pep Munn\u00e9"
],
"genre":[
"Action",
"Crime",
"Mystery",
"Thriller"
],
"description":"An unusual group of robbers attempt to carry out the most perfect robbery in Spanish history - stealing 2.4 billion euros from the Royal Mint of Spain."
},
{
"title":"The Vampire Diaries",
"year":null,
"actors":[
"Paul Wesley",
"Ian Somerhalder",
"Kat Graham",
"Candice King",
"Zach Roerig",
"Michael Trevino",
"Nina Dobrev",
"Steven R. McQueen",
"Matthew Davis",
"Michael Malarkey"
],
"genre":[
"Drama",
"Fantasy",
"Horror",
"Mystery",
"Romance",
"Thriller"
],
"description":"The lives, loves, dangers and disasters in the town, Mystic Falls, Virginia. Creatures of unspeakable horror lurk beneath this town as a teenage girl is suddenly torn between two vampire brothers."
}
]
I want to convert my JSON file to XML; here is my Python code:
import json as j
import xml.etree.ElementTree as ET

with open("imdb_movie_sample.json") as json_format_file:
    data = j.load(json_format_file)

root = ET.Element("movie")
ET.SubElement(root, "title").text = data["title"]
ET.SubElement(root, "year").text = str(data["year"])
actors = ET.SubElement(root, "actors")  # .text = data["actors"]
actors.text = ''
for i in jsondata[0]['movie'][0]['actors']:
    actors.text = actors.text + '\n\t\t' + i
genre = ET.SubElement(root, "genre")  # .text = data["genre"]
genre.text = ''
for i in jsondata[0]['movie'][0]['genre']:
    genre.text = genre.text + '\n\t\t' + i
ET.SubElement(root, "description").text = data["description"]
directors = ET.SubElement(root, "directors")  # .text = data["directors"]
directors.text = ''
for i in jsondata[0]['movie'][0]['directors']:
    directors.text = directors.text + '\n\t\t' + i
tree = ET.ElementTree(root)
tree.write("imdb_sample.xml")
Does anyone know how to help me do this? Thanks.
I found this on PyPI. I would always try looking on PyPI to see what exists before asking others. It's an awesome resource with Python packages created by tons of developers.
https://pypi.org/project/json2xml/
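If you'd rather stay with the standard library, the main fix to the question's code is to loop over the JSON list and give every movie its own element, using dict.get() for optional keys (e.g. "Money Heist" has no "directors"). A minimal sketch with an inline sample; the real code would json.load the file instead:

```python
import json
import xml.etree.ElementTree as ET

# Inline sample in the same shape as the question's file (illustrative).
movies_json = '''[{"title": "Dune", "year": "2020",
                   "actors": ["Rebecca Ferguson", "Zendaya"],
                   "genre": ["Adventure", "Drama"],
                   "description": "Feature adaptation of Frank Herbert's novel.",
                   "directors": ["Denis Villeneuve"]}]'''
data = json.loads(movies_json)

root = ET.Element("movies")
for movie in data:
    m = ET.SubElement(root, "movie")
    ET.SubElement(m, "title").text = movie["title"]
    ET.SubElement(m, "year").text = str(movie["year"])
    # One wrapper element per array field; missing keys are skipped.
    for key in ("actors", "genre", "directors"):
        parent = ET.SubElement(m, key)
        for name in movie.get(key, []):
            ET.SubElement(parent, "item").text = name
    ET.SubElement(m, "description").text = movie.get("description", "")

print(ET.tostring(root, encoding="unicode"))
```

The "item" tag name is an arbitrary choice here; rename it (e.g. to "actor") as your target schema requires.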

change column values in one df to match column values in different df?

I have 2 data frames I want to merge based on the names column. The names column in one df has abbreviated versions while the names column in the other df has the full name. What is the most efficient way to change the column values to match each other?
df1['names'] = ["Man Utd", "Man City", "Chelsea", "Liverpool", "Spurs", "Arsenal"]
df2['names'] = ["Manchester United", "Manchester City", "Chelsea FC", "Liverpool FC", "Tottenham Hotspurs", "Arsenal FC"]
You can create a dictionary like below using dict(zip())
df1['names'] = ["Man Utd", "Man City", "Chelsea", "Liverpool", "Spurs", "Arsenal"]
df2['names'] = ["Manchester United", "Manchester City", "Chelsea FC", "Liverpool FC", "Tottenham Hotspurs", "Arsenal FC"]
d=dict(zip(df1['names'],df2['names'])) #created a mapping dictionary
print(d)
{'Man Utd': 'Manchester United',
'Man City': 'Manchester City',
'Chelsea': 'Chelsea FC',
'Liverpool': 'Liverpool FC',
'Spurs': 'Tottenham Hotspurs',
'Arsenal': 'Arsenal FC'}
Then change df1['names'] by
df1['names'] = df1['names'].map(d)
Post this you can perform merge as column names are same now.
The only way you can achieve it is to maintain a reference mapping in order to match the two names columns:
df1 = pd.DataFrame()
referential = {
    "Man Utd": "Manchester United",
    "Man City": "Manchester City",
    "Chelsea": "Chelsea FC",
    "Liverpool": "Liverpool FC",
    "Spurs": "Tottenham Hotspurs",
    "Arsenal": "Arsenal FC"
}
df1['names'] = ["Man Utd", "Man City", "Chelsea", "Liverpool", "Spurs", "Arsenal"]
df1['names'] = df1['names'].map(referential)
print(df1)
Constructing a dictionary and then feeding to pd.Series.map is one way. But, sticking with Pandas, you can also use pd.Series.replace directly:
lst1 = ["Man Utd", "Man City", "Chelsea", "Liverpool", "Spurs", "Arsenal"]
lst2 = ["Manchester United", "Manchester City", "Chelsea FC", "Liverpool FC",
"Tottenham Hotspurs", "Arsenal FC"]
# define input dictionary
df = pd.DataFrame({'names': lst1})
# replace values in lst1 by lst2, by index
df['names'] = df['names'].replace(lst1, lst2)
print(df)
names
0 Manchester United
1 Manchester City
2 Chelsea FC
3 Liverpool FC
4 Tottenham Hotspurs
5 Arsenal FC
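Once the values agree, the merge itself is a single call. A minimal sketch with illustrative frames (the points and stadium columns are made up for demonstration):

```python
import pandas as pd

# Illustrative frames: one with abbreviated names, one with full names.
df1 = pd.DataFrame({"names": ["Man Utd", "Spurs"], "points": [70, 60]})
df2 = pd.DataFrame({"names": ["Manchester United", "Tottenham Hotspurs"],
                    "stadium": ["Old Trafford", "Tottenham Hotspur Stadium"]})

# Map abbreviations to full names, then merge on the now-shared column.
mapping = {"Man Utd": "Manchester United", "Spurs": "Tottenham Hotspurs"}
df1["names"] = df1["names"].map(mapping)

merged = df1.merge(df2, on="names")
print(merged)
```

Note that map() returns NaN for any name missing from the dictionary, which would then drop that row from an inner merge; check for NaN after mapping if the lists might not line up.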

trying to loop over random list and get vars as a list

I'm trying to loop over a set of lists and dicts and pull the correct info from them.
It should run like:
get a random item from music; if the item is a list then print the list; if the list contains a dict, print the dict.
This is as far as I got before I became confused! Please help a noob!
import random
music = ['Band1', 'Band2', 'Band3', 'Band4']
Band1 = ['Album1']
Band2 = ['Album2']
Band3 = ['Album3']
Band4 = ['Album4']
Album1 = {
    "01": 'Track1', "02": 'Track2', "03": 'Track3', "04": 'Track4',
    "05": 'Track5', "06": 'Track6', "07": 'Track7', "08": 'Track8',
    "09": 'Track9', "10": 'Track10', "11": 'Track11'}
i = random.choice(music)
if isinstance(i, list):
    print('is instance')
I suggest a different data structure:
music = {
    "Band 1": {
        "Album A": ["1-Track A1", "1-Track A2", "1-Track A3"],
        "Album B": ["1-Track B1", "1-Track B2", "1-Track B3"],
        "Album C": ["1-Track C1", "1-Track C2", "1-Track C3"]
    },
    "Band 2": {
        "Album A": ["2-Track A1", "2-Track A2", "2-Track A3"],
        "Album B": ["2-Track B1", "2-Track B2", "2-Track B3"],
        "Album C": ["2-Track C1", "2-Track C2", "2-Track C3"]
    },
    "Band 3": {
        "Album A": ["3-Track A1", "3-Track A2", "3-Track A3"],
        "Album B": ["3-Track B1", "3-Track B2", "3-Track B3"],
        "Album C": ["3-Track C1", "3-Track C2", "3-Track C3"]
    }
}
This is a dictionary of bands (key: band name) where each band is a dictionary containing albums (key: album name) where each album is a list containing the track names (index: track number - 1).
Then we can assume that our data structure contains only dictionaries, lists and strings. We want a function that picks a random track, i.e. a string.
Here's a recursive approach. If wanted, it could also be adapted to return the keys and indexes where it found the track as well. It's also capable of any nesting depth, so if you would want to group bands by countries or language or genre etc. that would be no problem.
import random
def pick_track(music_collection):
    # we must pick a key and look that up if we get a dictionary
    if isinstance(music_collection, dict):
        chosen = music_collection[random.choice(list(music_collection.keys()))]
    else:
        chosen = random.choice(music_collection)
    if isinstance(chosen, str):  # it's a string, so it represents a track
        return chosen
    else:  # it's a collection (list or dict) so we have to pick something from inside it
        return pick_track(chosen)
Now we use this function like this to print, e.g., 5 random tracks:
for i in range(5):
    print(pick_track(music))
This could output the following example:
1-Track C1
2-Track C3
2-Track A3
3-Track A3
2-Track B1
Update:
You want to also get the keys and indexes where a track was found i.e. the band name, album name and track number? No problem, here's a modified function:
def pick_track2(music_collection):
    if isinstance(music_collection, dict):
        random_key = random.choice(list(music_collection.keys()))
    else:
        random_key = random.randrange(len(music_collection))
    chosen = music_collection[random_key]
    if isinstance(chosen, str):
        return [random_key, chosen]
    else:
        return [random_key] + pick_track2(chosen)
It now does not return the track name as string, but a list of keys/indices that create the path to the picked track. You would use it like this:
for i in range(5):
    print("Band: '{}' - Album: '{}' - Track {}: '{}'".format(*pick_track2(music)))
An example output:
Band: 'Band 1' - Album: 'Album C' - Track 1: '1-Track C2'
Band: 'Band 2' - Album: 'Album B' - Track 0: '2-Track B1'
Band: 'Band 1' - Album: 'Album B' - Track 0: '1-Track B1'
Band: 'Band 3' - Album: 'Album B' - Track 2: '3-Track B3'
Band: 'Band 3' - Album: 'Album B' - Track 2: '3-Track B3'
Twisting your order and using the actual variables (not their names as strings) in your lists should get you started:
Album1 = {
    "01": 'Track1', "02": 'Track2', "03": 'Track3', "04": 'Track4',
    "05": 'Track5', "06": 'Track6', "07": 'Track7', "08": 'Track8',
    "09": 'Track9', "10": 'Track10', "11": 'Track11'
}
Album2 = []
Album3 = ""
Album4 = 0
Band1 = [Album1]
Band2 = [Album2]
Band3 = [Album3]
Band4 = [Album4]
music = [Band1, Band2, Band3, Band4]
When I debug this code, I get "i" as a string. So first you have to obtain the variable by name using the globals() function.
This code may help you:
import random

music = ['Band1', 'Band2', 'Band3', 'Band4']
Band1 = ['Album1']
Band2 = ['Album2']
Band3 = ['Album3']
Band4 = ['Album4']
Album1 = {
    "01": 'Track1', "02": 'Track2', "03": 'Track3', "04": 'Track4',
    "05": 'Track5', "06": 'Track6', "07": 'Track7', "08": 'Track8',
    "09": 'Track9', "10": 'Track10', "11": 'Track11'}
Album2 = []
Album3 = ""
Album4 = 0

i = random.choice(music)
print(i)
# val = eval(i)[0]
# print(type(eval(val)))
val2 = globals()[i][0]
print(type(globals()[val2]))
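That said, globals() lookups are fragile (they break inside functions and hide typos until runtime); an explicit dict mapping names to objects gives the same indirection without string evaluation. A minimal sketch, keeping the question's names:

```python
import random

# Explicit name -> object mapping instead of globals()/eval.
Album1 = {"01": 'Track1', "02": 'Track2'}
albums_by_name = {'Album1': Album1, 'Album2': [], 'Album3': "", 'Album4': 0}
bands = {'Band1': ['Album1'], 'Band2': ['Album2'],
         'Band3': ['Album3'], 'Band4': ['Album4']}

band = random.choice(list(bands.keys()))
album_name = bands[band][0]
album = albums_by_name[album_name]  # direct dict lookup, no globals()
print(band, album_name, type(album))
```

A missing name raises a clear KeyError instead of silently resolving to some unrelated global.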
