Python Function returns wrong value - python

periodsList = []
su = '0:'
Su = []
sun = []
SUN = ''
I'm formatting timetables by converting
extendedPeriods = ['0: 1200 - 1500',
                   '0: 1800 - 2330',
                   '2: 1200 - 1500',
                   '2: 1800 - 2330',
                   '3: 1200 - 1500',
                   '3: 1800 - 2330',
                   '4: 1200 - 1500',
                   '4: 1800 - 2330',
                   '5: 1200 - 1500',
                   '5: 1800 - 2330',
                   '6: 1200 - 1500',
                   '6: 1800 - 2330']
into '1200 - 1500/1800 - 2330'.
Here su is the day identifier, Su and sun store intermediate values, and SUN stores the converted timetable.
for line in extendedPeriods:
    if su in line:
        Su.append(line)
for item in Su:
    sun.append(item.replace(su, '', 1).strip())
SUN = '/'.join([str(x) for x in sun])
Then I tried to write a function to apply my "converter" to the other days as well...
def formatPeriods(id, store1, store2, periodsDay):
    for line in extendedPeriods:
        if id in line:
            store1.append(line)
    for item in store1:
        store2.append(item.replace(id, '', 1).strip())
    periodsDay = '/'.join([str(x) for x in store2])
    return periodsDay
But the function returns 12 misformatted strings, for example:
'1200 - 1500', '1200 - 1500/1200 - 1500/1800 - 2330',
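A likely cause of the accumulating output is that the same store1 and store2 lists are reused across calls, so each call appends on top of the previous day's results; since the posted code's indentation is lost, this is only a guess. A sketch that sidesteps the problem by building fresh local lists on every call (the name format_periods is mine):

```python
def format_periods(day_id, periods):
    """Collect all time ranges for one day and join them with '/'.

    Builds a fresh list on every call, so results from earlier
    calls cannot leak into later ones.
    """
    matches = [line.replace(day_id, '', 1).strip()
               for line in periods
               if line.startswith(day_id)]
    return '/'.join(matches)
```

format_periods('0:', extendedPeriods) then yields '1200 - 1500/1800 - 2330'.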

You can use collections.OrderedDict here; if order doesn't matter, then use collections.defaultdict.
>>> from collections import OrderedDict
>>> dic = OrderedDict()
>>> for item in extendedPeriods:
...     k, v = item.split(': ')
...     dic.setdefault(k, []).append(v)
...
>>> for k, v in dic.iteritems():
...     print "/".join(v)
...
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
To access a particular day you can use:
>>> print "/".join(dic['0']) #sunday
1200 - 1500/1800 - 2330
>>> print "/".join(dic['2']) #tuesday
1200 - 1500/1800 - 2330

This is your general logic:
from collections import defaultdict

d = defaultdict(list)
for i in extended_periods:
    bits = i.split(':')
    d[bits[0].strip()].append(bits[1].strip())
for i, v in d.iteritems():
    print i, '/'.join(v)
The output is:
0 1200 - 1500/1800 - 2330
3 1200 - 1500/1800 - 2330
2 1200 - 1500/1800 - 2330
5 1200 - 1500/1800 - 2330
4 1200 - 1500/1800 - 2330
6 1200 - 1500/1800 - 2330
To make it a function for a single day, simply select d['0'] (for Sunday, for example):
def schedule_per_day(day):
    d = defaultdict(list)
    for i in extended_periods:
        bits = i.split(':')
        d[bits[0].strip()].append(bits[1].strip())
    return '/'.join(d[day]) if d.get(day) else None


Slow processing of Python list

I have a file that has around 440K lines of data. I need to read this data and find the actual "table" in the text file. Part of the text file looks like this.
[BEGIN] 2022/4/8 14:00:05
<Z0301IPBBPE03>screen-length 0 temporary
Info: The configuration takes effect on the current user terminal interface only.
<Z0301IPBBPE03>display bgp vpnv4 vpn-instance Charging_VRF routing-table
BGP Local router ID is 10.12.24.19
Status codes: * - valid, > - best, d - damped, x - best external, a - add path,
h - history, i - internal, s - suppressed, S - Stale
Origin : i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V - valid, I - invalid, N - not-found
VPN-Instance Charging_VRF, Router ID 10.12.24.19:
Total Number of Routes: 2479
Network NextHop MED LocPrf PrefVal Path/Ogn
*>i 10.0.19.0/24 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.143.0/24 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.144.128/25 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.148.80/32 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.148.81/32 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.201.16/28 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.201.64/29 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.201.94/32 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
...
<Z0301IPBBPE03>display bgp vpnv4 vpn-instance Gb_VRF routing-table
BGP Local router ID is 10.12.24.19
Status codes: * - valid, > - best, d - damped, x - best external, a - add path,
h - history, i - internal, s - suppressed, S - Stale
Origin : i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V - valid, I - invalid, N - not-found
VPN-Instance Gb_VRF, Router ID 10.12.24.19:
Total Number of Routes: 1911
Network NextHop MED LocPrf PrefVal Path/Ogn
*>i 10.1.133.192/30 10.12.8.63 0 100 300 ?
* i 10.12.8.63 0 100 0 ?
*>i 10.1.133.216/30 10.12.8.64 0 100 300 ?
* i 10.12.8.64 0 100 0 ?
*>i 10.1.160.248/29 10.12.40.7 0 100 300 ?
* i 10.12.40.7 0 100 0 ?
*>i 10.1.161.0/29 10.12.40.8 0 100 300 ?
* i 10.12.40.8 0 100 0 ?
*>i 10.1.161.248/32 10.12.40.7 2 100 300 ?
* i 10.12.40.7 2 100 0 ?
*>i 10.1.161.249/32 10.12.40.7 2 100 300 ?
* i 10.12.40.7 2 100 0 ?
*>i 10.1.164.248/29 10.12.40.7 0 100 300 ?
* i 10.12.40.7 0 100 0 ?
*>i 10.1.165.0/29 10.12.40.8 0 100 300 ?
* i 10.12.40.8 0 100 0 ?
*>i 10.1.165.248/32 10.12.40.7 2 100 300 ?
* i 10.12.40.7 2 100 0 ?
The text file goes on for a long way, and it has plenty of garbage lines that I don't want, so I am trying to find the keyword (display bgp vpnv4 vpn-instance) and start reading once I find it. The code, which converts each table into my dataframe, looks like this.
My problem is that reading these 440k lines and converting them into a dataframe takes almost half an hour to complete, so I am here to seek help to see if there is a better way to improve the efficiency. Thank you!
bgp_df = pd.DataFrame()
vrf_list = ['Charging_VRF', 'Gb_VRF', 'Gn_VRF']

def generate_bgp_network_list(block, vrf):
    global bgp_df
    ip_address_list = block.split('\n')
    # split each non-empty line into its whitespace-separated fields
    ip_addresses = [[address for address in ip_address.strip().split(' ') if address]
                    for ip_address in ip_address_list if ip_address]
    ip_addresses = [address for address in ip_addresses if len(address) > 0]  # remove empty lists
    ip_addresses = [(ipaddress.IPv4Network(ip_address[1], False), ip_address[-1])
                    for ip_address in ip_addresses if validate_ipaddress(ip_address[1])]
    bgp_data = [{'ip_network': address, 'vrf': vrf, 'as_number': as_number}
                for address, as_number in ip_addresses]
    bgp_df = bgp_df.append(bgp_data, ignore_index=True)
def read_bgp_file(file):
    if file == '':
        return
    file = open(file, encoding=get_encoding_type(file))
    lines = file.readlines()
    start = False
    block = ''
    lines = iter(lines)
    for line in lines:
        if '<' in line and len(block) > 0:
            generate_bgp_network_list(block, vrf)
            start = False
            block = ''
        if 'display bgp vpnv4 vpn-instance' in line:
            vrf = line.strip().split(' ')[-2]
            if vrf in vrf_list:
                start = True
        if start:
            block += line
It looks to me like you only need the lines starting with *>i. If that is the case, how about this simple approach:
def input_file_to_dataframe(file_name: str):
    result = []
    prefix = '*>i'
    with open(file_name, "r") as file:
        lines = file.readlines()
        for line in lines:
            line = line.strip()
            if line.startswith(prefix):
                line = line.replace(prefix, '').split()
                result.append(line)
    return pd.DataFrame(data=result)
Run with ~50k lines:
input_file_to_dataframe('file.txt')
# 46.3 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
For me the readlines call here is the major issue, because it loads all the lines into memory at once.
If you iterate directly over the file object, it reads line by line, which I expect to give a faster result:
with open(file_name, "r") as the_file:
    for line in the_file:
        ...

Inconsistent index value in re module

I have two lists which hold different values. I tried to put the a list into an organized format with g.split(). Although it works fine on the a list, it can't filter the b list properly.
a = ['Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 18,688 kes (1,262,540 kes)\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 6,565 (465,015)\nWPKL - 1,883 (140,404)\nJohor - 1,308 (100,452)\nSabah -Lagi 1,379 (93,835)\nSarawak - 581 (81,328)\nNegeri Sembilan - 1,140 (78,777)\nKedah - 1,610 (56,598)\nPulau Pinang - 694 (52,368)\nKelantan - 870 (49,433)\nPerak - 861 (43,924)\nMelaka - 526 (35,584)\nPahang - 602 (29,125)\nTerengganu - 598 (20,696)\nWP Labuan - 2 (9,711)\nWP Putrajaya - 63 (4,478)\nPerlis - 6 (812)\n\n- KPK KKM']
b = ['Sehingga 9 Ogos 2021. Jumlah kes COVID-19 yang dilaporkan adalah 17,236 kes (1,279,776 kes).\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 5,740 (470,755)\nWPKL - 1,567 (141,971)\nJohor - 1,232 (101,684)\nSabah -Lagi 1,247 (95,082)\nSarawak - 589 (81,917)\nNegeri Sembilan - 1,215 (79,992)\nKedah - 1,328 (57,926)\nPulau Pinang - 908 (53,276)\nKelantan - 914 (50,347)\nPerak - 935 (44,859)\nMelaka - 360 (35,944)\nPahang - 604 (29,729)\nTerengganu - 501 (21,197)\nWP Labuan - 8 (9,719)\nWP Putrajaya - 66 (4,544)\nPerlis - 22 (834)\n\n- KPK KKM']
My code
out = []
for v in b:
    for g in re.findall(r"^(.*?\(.*?\))\n", v, flags=re.M):
        out.append(g.split(":")[0])
print(*out[0])
Whenever I print out[0] for the b list it only shows me 'Selangor - 5 , 7 4 0 (470,755)', which is wrong; it should be 'Sehingga 9 Ogos 2021'.
I tried the same code on the a list and it works properly without any issues. However, I noticed there is a minor difference between the two lists: the ':' versus the '.' after 'Sehingga 8 Ogos 2021'. How can I make the function work on both lists? I'm still new to re and split(); does anyone have any ideas? Thanks.
There are issues with your data format and regex. I am not that good at regex, but this works for me.
import re
a = ['Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 18,688 kes (1,262,540 kes)\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 6,565 (465,015)\nWPKL - 1,883 (140,404)\nJohor - 1,308 (100,452)\nSabah -Lagi 1,379 (93,835)\nSarawak - 581 (81,328)\nNegeri Sembilan - 1,140 (78,777)\nKedah - 1,610 (56,598)\nPulau Pinang - 694 (52,368)\nKelantan - 870 (49,433)\nPerak - 861 (43,924)\nMelaka - 526 (35,584)\nPahang - 602 (29,125)\nTerengganu - 598 (20,696)\nWP Labuan - 2 (9,711)\nWP Putrajaya - 63 (4,478)\nPerlis - 6 (812)\n\n- KPK KKM']
b = ['Sehingga 9 Ogos 2021. Jumlah kes COVID-19 yang dilaporkan adalah 17,236 kes (1,279,776 kes).\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 5,740 (470,755)\nWPKL - 1,567 (141,971)\nJohor - 1,232 (101,684)\nSabah -Lagi 1,247 (95,082)\nSarawak - 589 (81,917)\nNegeri Sembilan - 1,215 (79,992)\nKedah - 1,328 (57,926)\nPulau Pinang - 908 (53,276)\nKelantan - 914 (50,347)\nPerak - 935 (44,859)\nMelaka - 360 (35,944)\nPahang - 604 (29,729)\nTerengganu - 501 (21,197)\nWP Labuan - 8 (9,719)\nWP Putrajaya - 66 (4,544)\nPerlis - 22 (834)\n\n- KPK KKM']
out = []
for v in b:
    # normalize: drop the trailing '.' at line ends, then turn the remaining '.' into ':'
    regex_list = re.findall(r"^(.*?\(.*?\))\n", v.replace('.\n', '\n').replace('.', ':'), flags=re.M)
    for g in regex_list:
        print(g)
        out.append(g.split(":")[0])
print(out[0])  # print(*out[0]) would unpack the string into single characters
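An alternative sketch that handles both variants without rewriting the data first: allow an optional trailing '.' after the closing parenthesis in the regex, and cut the prefix at the first ':' or '.' that is followed by whitespace. The helper name extract is mine:

```python
import re

def extract(texts):
    out = []
    for v in texts:
        # allow an optional trailing '.' after the closing parenthesis
        for g in re.findall(r"^(.*?\(.*?\))\.?$", v, flags=re.M):
            # cut at the first ':' or '.' that is followed by whitespace
            out.append(re.split(r"[:.]\s", g, maxsplit=1)[0])
    return out
```

This returns 'Sehingga 8 Ogos 2021' for the a list and 'Sehingga 9 Ogos 2021' for the b list as the first element.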

Change timezone based on data from another column

Here's my example dataframe:
Office Design ... SiteLog Duration
0 DQFEMOZM - 2141 ZMI_PE ... 6/28/2019 7:59 6
1 DQFEMOZM - 2141 ZMI_PE ... 6/28/2019 7:47 5
2 DQFEMOZM - 2141 ZMI_PE ... 6/27/2019 4:58 2
3 DQFEMOZM - 2141 ZMI_PE ... 6/27/2019 4:52 2
4 YMTSZUXXQN - 1031 ZMI_PE ... 6/3/2019 4:10 4
6 YMTSZUXXQN - 1031 ZMI_PE ... 6/2/2019 22:36 6
9 UTUXMW - 1046 ZMI_PE ... 6/26/2019 20:01 336
10 UTUXMW - 1046 ZMI_PE ... 6/26/2019 14:16 828
11 UTUXMW - 1046 ZMI_PE ... 6/14/2019 16:33 2
12 UTUXMW - 1046 ZMI_PE ... 6/14/2019 15:07 2
14 GMUH-FZAB XMHMX - 2114 ZMI_PE ... 6/25/2019 5:35 3
15 TSGADANXDMY - 1215 ZMI_PE ... 6/9/2019 3:10 3
16 TSGADANXDMY - 1215 ZMI_PE ... 6/8/2019 19:03 2
17 TSGADANXDMY - 1215 ZMI_PE ... 6/8/2019 3:59 2
18 PDARPQY - 1154 ZMI_PE ... 6/30/2019 7:06 1
19 PDARPQY - 1154 ZMI_PE ... 6/18/2019 5:04 216
21 MSGMEEUEEUY - 2027 ZMI_PE ... 6/27/2019 17:36 2
23 MSGMEEUEEUY - 2027 ZMI_PE ... 6/4/2019 9:32 11
25 MSGMEEUEEUY - 2027 ZMI_PE ... 6/2/2019 22:37 4
26 MSGMEEUEEUY - 2027 ZMI_PE ... 6/2/2019 22:25 2
28 MSGMEEUEEUY - 2027 ZMI_PE ... 5/29/2019 23:24 2
All the example site logs are in PST. What I'm trying to do is take certain rows, say office "DQFEMOZM - 2141" and change the site log timestamp to EST.
I've tried using the tz_localize and tz_convert functions but haven't been able to get them to work.
import pandas as pd
from pytz import all_timezones
data = pd.read_csv('lab.csv')
data = data.drop_duplicates('SiteLog')
data = data.drop(data[data.Duration == 0].index)
DQFEMOZM = data[data.Office == 'DQFEMOZM - 2141'].index
DQFEMOZM = DQFEMOZM.tz_localize('America/Los_Angeles')
DQFEMOZM = DQFEMOZM.tz_convert('America/New_York')
Part of the error message I'm receiving:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
DQFEMOZM = DQFEMOZM.tz_convert('America/New_York')
AttributeError: 'Int64Index' object has no attribute 'tz_convert'
You are assigning the index to DQFEMOZM, which makes it an Int64Index. You can accomplish the task like the following:
from pandas import Timestamp
import pandas as pd

data = pd.read_csv("/Users/user/Desktop/Book6.csv")
data = data.drop_duplicates("SiteLog")
for office, datetime in zip(data["Office"], data["SiteLog"]):
    if office == "DQFEMOZM - 2141":
        raw_time = Timestamp(datetime)
        print(raw_time)
        loc_raw_time = raw_time.tz_localize("America/Los_Angeles")
        print(loc_raw_time)
        new_raw_time = loc_raw_time.tz_convert("America/New_York")
        print(new_raw_time)
I used first two rows for example:
Office Design ... SiteLog Duration
0 DQFEMOZM - 2141 ZMI_PE ... 6/28/2019 7:59 6
1 DQFEMOZM - 2141 ZMI_PE ... 6/28/2019 7:47 5
and the code output is (for your reference, I print every stage of the conversion):
2019-06-28 07:59:00
2019-06-28 07:59:00-07:00
2019-06-28 10:59:00-04:00
2019-06-28 07:47:00
2019-06-28 07:47:00-07:00
2019-06-28 10:47:00-04:00
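The loop above can also be written in vectorized form with the pandas datetime accessors, which avoids converting row by row. A sketch, assuming the 'Office' and 'SiteLog' column names from the question (the 'SiteLog_NY' column name is mine):

```python
import pandas as pd

# Sample rows copied from the question's dataframe.
df = pd.DataFrame({
    'Office': ['DQFEMOZM - 2141', 'UTUXMW - 1046'],
    'SiteLog': ['6/28/2019 7:59', '6/28/2019 7:47'],
})
mask = df['Office'] == 'DQFEMOZM - 2141'
times = pd.to_datetime(df.loc[mask, 'SiteLog'], format='%m/%d/%Y %H:%M')
# localize as Pacific, convert to Eastern, format back to strings
df.loc[mask, 'SiteLog_NY'] = (
    times.dt.tz_localize('America/Los_Angeles')
         .dt.tz_convert('America/New_York')
         .dt.strftime('%m/%d/%Y %H:%M')
)
```

Rows whose office does not match simply keep NaN in the new column.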
You can't convert the time like that. You'll need to use the pytz and datetime modules.
import pytz, datetime
I started out with smaller data frame to test.
>>> data = pd.read_csv ('test.csv')
>>> df = pd.DataFrame (data)
>>> df
Office Design SiteLog Duration
0 DQFEMOZM - 2141 ZMI_PE 6/28/2019 7:59 6
1 UTUXMW - 1046 ZMI_PE 6/28/2019 7:47 5
2 YMTSZUXXQN - 1031 ZMI_PE 6/27/2019 4:58 2
3 DQFEMOZM - 2144 ZMI_PE 6/27/2019 4:52 2
Next, create a date/time conversion function.
>>> def date_conversion(df):
...     nytimes = []
...     for i, record in enumerate(df.Office):
...         if 'DQFEMOZM - 2141' in record:
...             # convert the string into a datetime object for localization and conversion
...             time_obj = datetime.datetime.strptime(df.SiteLog[i], '%m/%d/%Y %H:%M')
...             pacific_time = pytz.timezone('America/Los_Angeles').localize(time_obj)
...             new_york_time = pacific_time.astimezone(pytz.timezone('America/New_York'))
...             nytimes.append(new_york_time.strftime('%m/%d/%Y %H:%M'))  # convert the time object back to a string
...         else:
...             nytimes.append('-')
...     return nytimes
Finally, insert the converted time to your dataframe.
>>> df.insert (3, 'SiteLog_NY', date_conversion (df), True)
>>> df
Office Design SiteLog SiteLog_NY Duration
0 DQFEMOZM - 2141 ZMI_PE 6/28/2019 7:59 06/28/2019 10:59 6
1 UTUXMW - 1046 ZMI_PE 6/28/2019 7:47 - 5
2 YMTSZUXXQN - 1031 ZMI_PE 6/27/2019 4:58 - 2
3 DQFEMOZM - 2144 ZMI_PE 6/27/2019 4:52 - 2

Python 2.7. BeautifulSoup not opening HTML

I have some weird problem with BeautifulSoup. I'm trying to read this website: http://lol.esportswikis.com/wiki/2015_International_Wild_Card_Invitational/Match_History and all I get is an empty list. But if I try a different site like http://lol.esportswikis.com/wiki/Season_1_World_Championship/Match_History it works like a charm. Any idea what the problem is?
From the Chrome inspector tool I get the same code for the first line of the table on both websites, so what's the problem?
#mw-content-text > table.wikitable > tbody > tr:nth-child(3)
mw-content-text > table.wikitable > tr  # I use this
Even if I try to open just the wikitable:
url = 'insert url here'
con = urllib2.urlopen(url)
HTML = con.read()
soup = BeautifulSoup(HTML, 'html.parser')
stuff = soup.select('#mw-content-text > table.wikitable')
print stuff
It just prints an empty list.
import requests
from bs4 import BeautifulSoup

r = requests.get('http://lol.esportswikis.com/wiki/2015_International_Wild_Card_Invitational/Match_History')
soup = BeautifulSoup(r.text, 'lxml')
for row in soup.select('.wikitable tr')[2:]:
    print(row.get_text(' ', strip=True).replace(' • ', '•'))
I don't have a 2.7 environment, so I used requests + bs4, but it's almost the same. Note the 'lxml' parser: it is more tolerant of malformed markup than 'html.parser', which is likely why your select() call came back empty for that particular page.
I separate each cell by a space in each row, in case you want to split it.
Out:
2015-04-25 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Yang•Revolta•Tockers•micaO•Jockster 36:54 k k - - - - - - SB - -
2015-04-25 5.6 Yang•Revolta•Tockers•micaO•Jockster Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 53:47 k k - - - - - - SB - -
2015-04-25 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Yang•Revolta•Tockers•micaO•Jockster 53:40 k k - - - - - - SB - -
2015-04-25 5.6 Yang•Revolta•Tockers•micaO•Jockster Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 32:30 k k - - - - - - SB - -
2015-04-24 5.6 WarL0cK•007x•G4•Moss•Lloyd Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 45:53 k k - - - - - - SB - -
2015-04-24 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge WarL0cK•007x•G4•Lloyd•Moss 34:16 k k - - - - - - SB - -
2015-04-24 5.6 WarL0cK•007x•G4•Lloyd•Moss Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 39:39 k k - - - - - - SB - -
2015-04-24 5.6 Yang•Revolta•Tockers•micaO•Jockster Smurf•Symphony•Kira•LeX•Dimonko 36:58 k k - - - - - - SB - -
2015-04-24 5.6 Smurf•Symphony•Kira•LeX•Dimonko Yang•Revolta•Tockers•micaO•Jockster 32:03 k k - - - - - - SB - -
2015-04-24 5.6 Yang•Revolta•Tockers•micaO•Jockster Smurf•Symphony•Kira•LeX•Dimonko 24:06 k k - - - - - - SB - -
2015-04-24 5.6 Smurf•Symphony•Kira•LeX•Dimonko Yang•Revolta•Tockers•micaO•Jockster 38:01 k k - - - - - - SB - -
2015-04-23 5.6 BonziN•Astarore•Ceros•Yutapongo•KazuXD Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 32:01 k k - - - - - - SB - -
2015-04-23 5.6 WarL0cK•007x•G4•Lloyd•Moss Swip3rR•Spookz•Swiffer•Raydere•Rosey 35:05 k k - - - - - - SB - -
2015-04-23 5.6 Yang•Revolta•Tockers•micaO•Jockster Helior•Juliostito•Regi•Zeicro•BearJew 18:40 k k - - - - - - SB - -
2015-04-23 5.6 Swip3rR•Spookz•Swiffer•Raydere•Rosey BonziN•Astarore•Ceros•Yutapongo•KazuXD 36:26 k k - - - - - - SB - -
2015-04-23 5.6 Helior•Juliostito•Regi•Zeicro•BearJew WarL0cK•007x•G4•Lloyd•Moss 28:26 k k - - - - - - SB - -
2015-04-22 5.6 Smurf•Symphony•Kira•LeX•Dimonko Yang•Revolta•Tockers•micaO•Jockster 37:03 k k - - - - - - SB - -
2015-04-22 5.6 BonziN•Astarore•Ceros•Yutapongo•KazuXD Helior•Juliostito•Regi•Zeicro•BearJew 34:25 k k - - - - - - SB - -
2015-04-22 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Smurf•Symphony•Kira•LeX•Dimonko 40:49 k k - - - - - - SB - -
2015-04-22 5.6 Yang•Revolta•Tockers•micaO•Jockster WarL0cK•007x•G4•Lloyd•Moss 29:30 k k - - - - - - SB - -
2015-04-22 5.6 Smurf•Symphony•Kira•LeX•Dimonko Swip3rR•Spookz•Swiffer•Raydere•Rosey 29:42 k k - - - - - - SB - -
2015-04-22 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Yang•Revolta•Tockers•micaO•Jockster 37:33 k k - - - - - - SB - -
2015-04-22 5.6 BonziN•Astarore•Ceros•Yutapongo•KazuXD WarL0cK•007x•G4•Lloyd•Moss 27:38 k k - - - - - - SB - -
2015-04-22 5.6 Helior•Juliostito•Regi•Zeicro•BearJew Smurf•Symphony•Kira•LeX•Dimonko 38:38 k k - - - - - - SB - -
2015-04-21 5.6 Yang•Revolta•Tockers•micaO•Jockster BonziN•Astarore•Ceros•Yutapongo•KazuXD 35:52 k k - - - - - - SB - -
2015-04-21 5.6 Swip3rR•Spookz•Swiffer•Raydere•Rosey Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 44:25 k k - - - - - - SB - -
2015-04-21 5.6 WarL0cK•007x•G4•Lloyd•Moss Smurf•Symphony•Kira•LeX•Dimonko 45:10 k k - - - - - - SB - -
2015-04-21 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Helior•Juliostito•Regi•Zeicro•BearJew 35:51 k k - - - - - - SB - -
2015-04-21 5.6 Swip3rR•Spookz•Swiffer•Raydere•Rosey Yang•Revolta•Tockers•micaO•Jockster 43:21 k k - - - - - - SB - -
2015-04-21 5.6 WarL0cK•007x•G4•Lloyd•Moss Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 29:40 k k - - - - - - SB - -
2015-04-21 5.6 Smurf•Symphony•Kira•LeX•Dimonko BonziN•Astarore•Ceros•Yutapongo•KazuXD 30:17 k k - - - - - - SB - -
2015-04-21 5.6 Helior•Juliostito•Regi•Zeicro•BearJew Swip3rR•Spookz•Swiffer•Raydere•Rosey 43:52 63.3k 16 3 2 81.4k 22 11 3 -18.1k ▼ 18.1k -6 ▼ 6 -8 ▼ 8 -1 ▼ 1 - - SB - -

Convert python pandas rows to columns

Decade difference (kg) Version
0 1510 - 1500 -0.346051 v1.0h
1 1510 - 1500 -3.553251 A2011
2 1520 - 1510 -0.356409 v1.0h
3 1520 - 1510 -2.797978 A2011
4 1530 - 1520 -0.358922 v1.0h
I want to transform the pandas dataframe so that the 2 unique entries in the Version column are transferred to become columns. How do I do that?
The resulting dataframe should not have a multiindex
In [28]: df.pivot(index='Decade', columns='Version', values='difference (kg)')
Out[28]:
Version A2011 v1.0h
Decade
1510 - 1500 -3.553251 -0.346051
1520 - 1510 -2.797978 -0.356409
1530 - 1520 NaN -0.358922
or
In [31]: df.pivot(index='difference (kg)', columns='Version', values='Decade')
Out[31]:
Version A2011 v1.0h
difference (kg)
-3.553251 1510 - 1500 None
-2.797978 1520 - 1510 None
-0.358922 None 1530 - 1520
-0.356409 None 1520 - 1510
-0.346051 None 1510 - 1500
both satisfy your requirements.
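One caveat: pivot leaves 'Decade' (or 'difference (kg)') as the index and labels the column axis 'Version'. If a completely flat frame is wanted, chaining reset_index() and clearing the axis name should do it; a sketch built from the sample data above:

```python
import pandas as pd

# Rebuild the question's frame (values copied from the sample).
df = pd.DataFrame({
    'Decade': ['1510 - 1500', '1510 - 1500', '1520 - 1510',
               '1520 - 1510', '1530 - 1520'],
    'difference (kg)': [-0.346051, -3.553251, -0.356409,
                        -2.797978, -0.358922],
    'Version': ['v1.0h', 'A2011', 'v1.0h', 'A2011', 'v1.0h'],
})
out = (df.pivot(index='Decade', columns='Version', values='difference (kg)')
         .reset_index())
out.columns.name = None  # drop the leftover 'Version' axis label
```

The result has plain columns 'Decade', 'A2011', 'v1.0h' and a default integer index.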
