I had a dictionary like:
a = {'date': ['2012-03-09', '2012-01-12', '2012-11-11'],
     'rate': ['199', '900', '899'],
     'country code': ['1', '2', '44'],
     'area code': ['114', '11', '19'],
     'product': ['Mobile', 'Teddy', 'Handbag']}
Then I used the zip() function to concatenate the values:
data = [(a,b,c+d,e) for a,b,c,d,e in zip(*a.values())]
Output:
data = [('2012-03-09', '199', '1114', 'Mobile'),
('2012-01-12', '900', '211', 'Teddy'),
('2012-11-11', '899', '4419', 'Handbag')]
What if I want the function to search for 'country code' and 'area code' itself and merge them? Any suggestions, please?
A generic method to merge 'columns', letting you specify what columns to expect and what to merge up front:
def merged_pivot(data, *output_names, **merged_columns):
    input_names = []
    column_map = {}
    for col in output_names:
        start = len(input_names)
        input_names.extend(merged_columns.get(col, [col]))
        column_map[col] = slice(start, len(input_names))
    for row in zip(*(data[c] for c in input_names)):
        yield tuple(''.join(row[column_map[c]]) for c in output_names)
which you call with:
list(merged_pivot(a, 'date', 'rate', 'code', 'product', code=('country code', 'area code')))
passing in:
the input dictionary of columns (a in the example)
each column that makes up the output ('date', 'rate', 'code', 'product' in the above example)
any column in the output that is composed of a merged list of input columns (code=('country code', 'area code') in the example, so code in the output is formed by merging country code and area code).
Output:
>>> list(merged_pivot(a, 'date', 'rate', 'code', 'product', code=('country code', 'area code')))
[('2012-03-09', '199', '1114', 'Mobile'), ('2012-01-12', '900', '211', 'Teddy'), ('2012-11-11', '899', '4419', 'Handbag')]
or, slightly reformatted:
[('2012-03-09', '199', '1114', 'Mobile'),
('2012-01-12', '900', '211', 'Teddy'),
('2012-11-11', '899', '4419', 'Handbag')]
Instead of calling list() on the merged_pivot() generator, you can also just loop over its output if all you need to do is process each row separately:
columns = ('date', 'rate', 'code', 'product')
for row in merged_pivot(a, *columns, code=('country code', 'area code')):
    # do something with `row`
    print(row)
You have to define the order of keys yourself (otherwise a.values() returns them in an arbitrary order; dictionaries only preserve insertion order from Python 3.7 onward). I renamed your original dictionary to dd:
[(a,b,c+d,e) for a,b,c,d,e in zip(*(dd[k] for k in ('date', 'rate', 'country code', 'area code', 'product')))]
returns
[('2012-03-09', '199', '1114', 'Mobile'),
('2012-01-12', '900', '211', 'Teddy'),
('2012-11-11', '899', '4419', 'Handbag')]
This may be very simple, but I am not able to find a way forward. I have a dataframe with columns of different datatypes, as shown below:
Now, I need to find those columns (years) for which we have values in both rows (i.e. omit columns with null values). I tried the code below and could get the columns which have values in both rows.
Iran.columns[~Iran.isnull().any()]
Output:
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
'1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
'1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
'1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
'1987', '1988', '1989', '1990', '1993', '1994', '1995', '1996', '1997',
'1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006',
'2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015',
'2016', '2017'],
dtype='object')
My problem is that I am also getting the 'Country Name', 'Country Code', 'Indicator Name', 'Indicator Code' columns, which are not required. I am not able to think of a way through.
I actually have World Bank data for Population and GDP in two separate dataframes. From there, I created this 'Iran' dataframe by concatenating the parent dataframes, to answer this question: for what years do we have complete data (GDP and Population) for Iran?
I just need the answer in a simple and neat way. Thanks for your help!
Select a subset of your data into a new dataframe, keeping only the column names that are numeric (or match a regex for 4 digits, for example), and filter out every column that has blanks. Then get the remaining column names with:
df.columns.tolist()
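A minimal runnable sketch of that approach, using a small hypothetical two-row frame in place of the concatenated World Bank data: keep only the columns whose names are 4-digit years, drop any year column containing nulls, and list what remains.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the concatenated GDP/Population frame
Iran = pd.DataFrame({
    'Country Name': ['Iran', 'Iran'],
    '1960': [1.0, 2.0],
    '1961': [np.nan, 2.0],   # incomplete year
    '1962': [3.0, 4.0],
})

# Keep only the columns whose names are 4-digit years
years = Iran[[c for c in Iran.columns if c.isdigit() and len(c) == 4]]

# Drop any year column that has a null in either row
complete = years.dropna(axis=1).columns.tolist()
print(complete)  # → ['1960', '1962']
```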
How to add a column based on one of the values of a list of lists in python?
I have the following list, and I need to add a new column based on the value of Currency:
If Pound, Euro = Amount * 0.9
If USD, Euro = Amount * 1.2
I need to do this without libraries.
[['Buyer', 'Seller', 'Amount', 'Property_Type', 'Currency'],
 ['100', '200', '4923', 'c', 'Pound'],
 ['600', '429', '838672', 'a', 'USD'],
 ['650', '400', '8672', 'a', 'Euro']]
Result
[['Buyer', 'Seller', 'Amount', 'Property_Type', 'Currency', 'Euro'],
 ['100', '200', '5000', 'c', 'Livre', '6000'],
 ['600', '429', '10000', 'a', 'USD', '9000'],
 ['650', '400', '8600', 'a', 'Euro', '8600']]
Thank you very much; any readings on how to import a CSV and manipulate it without libraries would be much appreciated.
Assuming the columns are always in the same order...
from decimal import Decimal

EXCH_RATES = {
    'Pound': Decimal('0.9'),
    'USD': Decimal('1.2'),
    'Euro': 1,
}

rows[0].append('Euro')
for row in rows[1:]:
    exch_rate = EXCH_RATES[row[4]]
    row.append(str(exch_rate * Decimal(row[2])))
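For example, running that loop over the sample data from the question (a sketch assuming the list is bound to the name rows):

```python
from decimal import Decimal

EXCH_RATES = {'Pound': Decimal('0.9'), 'USD': Decimal('1.2'), 'Euro': 1}

rows = [['Buyer', 'Seller', 'Amount', 'Property_Type', 'Currency'],
        ['100', '200', '4923', 'c', 'Pound'],
        ['600', '429', '838672', 'a', 'USD'],
        ['650', '400', '8672', 'a', 'Euro']]

rows[0].append('Euro')
for row in rows[1:]:
    exch_rate = EXCH_RATES[row[4]]
    row.append(str(exch_rate * Decimal(row[2])))

print(rows[1])  # → ['100', '200', '4923', 'c', 'Pound', '4430.7']
```

Decimal keeps the arithmetic exact, which is why '4923' * 0.9 comes out as '4430.7' rather than a float like 4430.700000000001.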
Check the currency (the last item in each inner list), then append the converted amount, like so:
lst = [['Buyer', 'Seller', 'Amount', 'Property_Type', 'Currency'],
       ['100', '200', '4923', 'c', 'Pound'],
       ['600', '429', '838672', 'a', 'USD'],
       ['650', '400', '8672', 'a', 'Euro']]

lst[0].append('Euro')
for row in lst[1:]:
    if row[-1] == 'Pound':
        row.append(str(int(row[2]) * 0.9))
    elif row[-1] == 'USD':
        row.append(str(int(row[2]) * 1.2))
    else:
        row.append(row[2])
Although you would be better off storing the data in a CSV file, you would then have to use the csv library.
Tell me if this helps, and if you want to use the csv library, let me know so I can show you how.
I have a nested dictionary with the following structure. I am trying to convert it to a pandas dataframe; however, I have problems splitting the 'mappings' dictionary into separate columns.
{'16':
{'label': 't1',
'prefLab': 'name',
'altLabel': ['test1', 'test3'],
'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]
},
'17':
{'label': 't2',
'prefLab': 'name2',
'broader': ['18'],
'altLabel': ['test2'],
'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]
}
}
The ideal outcome would be a dataframe with the following structure:
label prefLab broader altLab ciID, map1, map2, map3 ...
16
17
Try this: assuming your dictionary is named data, then
train = pd.DataFrame.from_dict(data, orient='index')
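That gives one column per top-level key, but the 'map' column still holds nested dictionaries. One possible sketch for splitting those into their own columns, assuming (as in the sample) each 'map' list holds a single entry, is to flatten each record before building the frame:

```python
import pandas as pd

data = {
    '16': {'label': 't1', 'prefLab': 'name', 'altLabel': ['test1', 'test3'],
           'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]},
    '17': {'label': 't2', 'prefLab': 'name2', 'broader': ['18'], 'altLabel': ['test2'],
           'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]},
}

rows = {}
for key, rec in data.items():
    # Copy everything except the nested 'map' list
    flat = {k: v for k, v in rec.items() if k != 'map'}
    # Pull the idMap fields of the first 'map' entry up into the row
    flat.update(rec['map'][0]['idMap'])
    rows[key] = flat

train = pd.DataFrame.from_dict(rows, orient='index')
print(train.columns.tolist())
```

Missing keys (like 'broader' for '16', or 'map3' for '17') simply come out as NaN in the resulting frame.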
I am getting one HTML table per day, so if I search for 20 days it brings me 20 tables, and I want to add all 20 tables into one table so I can verify the data as a time series.
I have tried the merge and add functions of pandas, but they just concatenate the values as strings.
Table one
[['\xa0', 'All Issues', 'Investment Grade', 'High Yield', 'Convertible'],
['Total Issues Traded', '8039', '5456', '2386', '197'],
['Advances', '3834', '2671', '1075', '88'],
['Declines', '3668', '2580', '994', '94'],
['Unchanged', '163', '54', '99', '10'],
['52 Week High', '305', '100', '193', '12'],
['52 Week Low', '152', '83', '63', '6'],
['Dollar Volume*', '27568', '17000', '9299', '1269']]
Table two
[['\xa0', 'All Issues', 'Investment Grade', 'High Yield', 'Convertible'],
['Total Issues Traded', '8039', '5456', '2386', '197'],
['Advances', '3834', '2671', '1075', '88'],
['Declines', '3668', '2580', '994', '94'],
['Unchanged', '163', '54', '99', '10'],
['52 Week High', '305', '100', '193', '12'],
['52 Week Low', '152', '83', '63', '6'],
['Dollar Volume*', '27568', '17000', '9299', '1269']]
My code, but it adds the values as strings:
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in tables.select("tr")]
df = pd.DataFrame(tab_data)
df2 = pd.DataFrame(tab_data)
df3 = df.add(df2, fill_value=0)
df
If you want to convert the numeric cells into integers, you would need to do that explicitly, as follows:
tab_data = [[int(item.text) if item.text.isdigit() else item.text
             for item in row_data.select("th,td")]
            for row_data in tables.select("tr")]
Hope it helps.
The way you are converting the data frame treats all values as text.
There are two options here:
Explicitly convert the strings to the data type you want using astype.
Use read_html to create data frames from HTML tables; it also tries to do the data type conversion for you.
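A minimal sketch of the astype route, using two rows from the sample tables above: once the numeric columns are converted, df.add() sums element-wise instead of concatenating strings.

```python
import pandas as pd

rows = [['Total Issues Traded', '8039', '5456', '2386', '197'],
        ['Advances', '3834', '2671', '1075', '88']]
cols = ['Metric', 'All Issues', 'Investment Grade', 'High Yield', 'Convertible']

# Build a frame per table, move the label column into the index,
# and convert the remaining string cells to integers
df = pd.DataFrame(rows, columns=cols).set_index('Metric').astype(int)
df2 = pd.DataFrame(rows, columns=cols).set_index('Metric').astype(int)

# Now add() sums the numbers instead of joining '3834' + '3834'
total = df.add(df2, fill_value=0)
print(total.loc['Advances', 'All Issues'])  # → 7668
```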
I have lists containing values with measurement units, and I want to remove the units. The original lists are shown below:
['Dawn:', 'Sunrise:', 'Moonrise:', 'Dusk:', 'Sunset:\xa0', 'Moonset:', 'Daylight:', 'Length:', 'Phase:', 'Temperature', 'Dew\xa0Point ', 'Windchill', 'Humidity', 'Heat Index', 'Apparent Temperature', 'Solar Radiation', 'Evapotranspiration Today', 'Rainfall\xa0Today', 'Rainfall\xa0Rate', 'Rainfall\xa0This\xa0Month', 'Rainfall\xa0This\xa0Year', 'Rainfall\xa0Last Hour', 'Last rainfall', 'Wind\xa0Speed\xa0(gust)', 'Wind\xa0Speed\xa0(avg)', 'Wind Bearing', 'Beaufort\xa0F1', 'Barometer\xa0', 'Rising slowly']
['07:30', '08:04', '17:03', '19:05', '18:31', '01:45', '11:35', '10:27', 'Waxing Gibbous', '13.7\xa0°C', '11.4\xa0°C', '13.7\xa0°C', '86%', '13.7\xa0°C', '13.0\xa0°C', '0\xa0W/m²', '0.15\xa0mm', '0.0\xa0mm', '0.0\xa0mm/hr', '36.4\xa0mm', '36.4\xa0mm', '0.0\xa0mm', '2018-10-14 08:52', '6.1\xa0kts', '2.6\xa0kts', '229° SW', 'Light air', '1026.89\xa0mb', '0.27\xa0mb/hr']
To remove measurement units like degrees, kts, mb, etc., I follow the approach below:
newlist = [word for line in test for word in line.split()]
#print(newlist)
testlist = ['°C', 'W/m²', 'mm','mm/hr', 'mb','kts', 'mb/hr', '%']
t = [x for x in newlist for d in testlist if d in x]
s = [r for r in newlist if r not in testlist]
After this code I am able to remove all units, but values which are strings separated by spaces, like 'Waxing Gibbous', become separate comma-separated items. Is it possible to join them back with spaces?
Result of code:
['Dawn:', 'Sunrise:', 'Moonrise:', 'Dusk:', 'Sunset:\xa0', 'Moonset:', 'Daylight:', 'Length:', 'Phase:', 'Temperature', 'Dew\xa0Point ', 'Windchill', 'Humidity', 'Heat Index', 'Apparent Temperature', 'Solar Radiation', 'Evapotranspiration Today', 'Rainfall\xa0Today', 'Rainfall\xa0Rate', 'Rainfall\xa0This\xa0Month', 'Rainfall\xa0This\xa0Year', 'Rainfall\xa0Last Hour', 'Last rainfall', 'Wind\xa0Speed\xa0(gust)', 'Wind\xa0Speed\xa0(avg)', 'Wind Bearing', 'Beaufort\xa0F1', 'Barometer\xa0', 'Rising slowly']
['07:30', '08:04', '17:03', '19:05', '18:31', '01:45', '11:35', '10:27', 'Waxing', 'Gibbous', '13.7', '11.4', '13.7', '86%', '13.7', '13.0', '0', '0.15', '0.0', '0.0', '36.4', '36.4', '0.0', '2018-10-14', '08:52', '5.2', '2.4', '188°', 'S', 'Light', 'air', '1026.21', '0.23']
Main source code from where data is being fetched:
Data origin source code
Any help would be appreciated, thanks
So your source data, identified earlier, comes from a dict called grouped (if you could put that back in and show an example, that would be great).
From grouped you want to get all the keys as headers and the values as values, while stripping all the unit symbols you do not need.
The code below does that, starting from your grouped dict, and stores your headers and values in two separate lists:
headers = []
values = []
testlist = ['°C', 'W/m²', 'mm', 'mm/hr', 'mb', 'kts', 'mb/hr']
for i in a[0]:
    for k, v in i.items():
        headers.append(k)
        values.append(v)
for idx, v in enumerate(values):
    for t in testlist:
        values[idx] = values[idx].replace(t, '')
for h, v in zip(headers, values):
    print('Header: {}, Value: {}'.format(h, v))
In the future, it definitely helps if you outline where your source data begins and what your expected output is.