Python Function returns wrong value - python

periodsList = []
su = '0:'
Su = []
sun = []
SUN = ''
I'm formatting timetables by converting
extendedPeriods = ['0: 1200 - 1500',
                   '0: 1800 - 2330',
                   '2: 1200 - 1500',
                   '2: 1800 - 2330',
                   '3: 1200 - 1500',
                   '3: 1800 - 2330',
                   '4: 1200 - 1500',
                   '4: 1800 - 2330',
                   '5: 1200 - 1500',
                   '5: 1800 - 2330',
                   '6: 1200 - 1500',
                   '6: 1800 - 2330']
into '1200 - 1500/1800 - 2330'.
Here su is the day identifier, Su and sun store intermediate values, and SUN stores the converted timetable.
for line in extendedPeriods:
    if su in line:
        Su.append(line)
for item in Su:
    sun.append(item.replace(su, '', 1).strip())
SUN = '/'.join([str(x) for x in sun])
Then I tried to write a function to apply my "converter" to the other days as well...
def formatPeriods(id, store1, store2, periodsDay):
    for line in extendedPeriods:
        if id in line:
            store1.append(line)
    for item in store1:
        store2.append(item.replace(id, '', 1).strip())
    periodsDay = '/'.join([str(x) for x in store2])
    return periodsDay
But the function returns 12 misformatted strings, for example:
'1200 - 1500', '1200 - 1500/1200 - 1500/1800 - 2330',
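A likely cause of the accumulating output is that the same store1 and store2 lists are reused across calls, so each call appends on top of the previous day's results; since the posted code's indentation is lost, this is only a guess. A sketch that sidesteps the problem by building fresh local lists on every call (the name format_periods is mine):

```python
def format_periods(day_id, periods):
    """Collect all time ranges for one day and join them with '/'.

    Builds a fresh list on every call, so results from earlier
    calls cannot leak into later ones.
    """
    matches = [line.replace(day_id, '', 1).strip()
               for line in periods
               if line.startswith(day_id)]
    return '/'.join(matches)
```

format_periods('0:', extendedPeriods) then yields '1200 - 1500/1800 - 2330'.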

You can use collections.OrderedDict here; if order doesn't matter, then use collections.defaultdict.
>>> from collections import OrderedDict
>>> dic = OrderedDict()
>>> for item in extendedPeriods:
...     k, v = item.split(': ')
...     dic.setdefault(k, []).append(v)
...
>>> for k, v in dic.iteritems():
...     print "/".join(v)
...
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
To access a particular day you can use:
>>> print "/".join(dic['0']) #sunday
1200 - 1500/1800 - 2330
>>> print "/".join(dic['2']) #tuesday
1200 - 1500/1800 - 2330

This is your general logic:
from collections import defaultdict

d = defaultdict(list)
for i in extended_periods:
    bits = i.split(':')
    d[bits[0].strip()].append(bits[1].strip())
for i, v in d.iteritems():
    print i, '/'.join(v)
The output is:
0 1200 - 1500/1800 - 2330
3 1200 - 1500/1800 - 2330
2 1200 - 1500/1800 - 2330
5 1200 - 1500/1800 - 2330
4 1200 - 1500/1800 - 2330
6 1200 - 1500/1800 - 2330
To make it a function for a single day, simply select d['0'] (for Sunday, for example):
def schedule_per_day(day):
    d = defaultdict(list)
    for i in extended_periods:
        bits = i.split(':')
        d[bits[0].strip()].append(bits[1].strip())
    return '/'.join(d[day]) if d.get(day) else None


Slow processing of Python list

I have a file that has around 440K lines of data. I need to read this data and find the actual "table" in the text file. Part of the text file looks like this.
[BEGIN] 2022/4/8 14:00:05
<Z0301IPBBPE03>screen-length 0 temporary
Info: The configuration takes effect on the current user terminal interface only.
<Z0301IPBBPE03>display bgp vpnv4 vpn-instance Charging_VRF routing-table
BGP Local router ID is 10.12.24.19
Status codes: * - valid, > - best, d - damped, x - best external, a - add path,
h - history, i - internal, s - suppressed, S - Stale
Origin : i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V - valid, I - invalid, N - not-found
VPN-Instance Charging_VRF, Router ID 10.12.24.19:
Total Number of Routes: 2479
Network NextHop MED LocPrf PrefVal Path/Ogn
*>i 10.0.19.0/24 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.143.0/24 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.144.128/25 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.148.80/32 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.148.81/32 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.201.16/28 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.201.64/29 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
*>i 10.0.201.94/32 10.12.8.21 0 100 300 ?
* i 10.12.8.22 0 100 0 ?
...
<Z0301IPBBPE03>display bgp vpnv4 vpn-instance Gb_VRF routing-table
BGP Local router ID is 10.12.24.19
Status codes: * - valid, > - best, d - damped, x - best external, a - add path,
h - history, i - internal, s - suppressed, S - Stale
Origin : i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V - valid, I - invalid, N - not-found
VPN-Instance Gb_VRF, Router ID 10.12.24.19:
Total Number of Routes: 1911
Network NextHop MED LocPrf PrefVal Path/Ogn
*>i 10.1.133.192/30 10.12.8.63 0 100 300 ?
* i 10.12.8.63 0 100 0 ?
*>i 10.1.133.216/30 10.12.8.64 0 100 300 ?
* i 10.12.8.64 0 100 0 ?
*>i 10.1.160.248/29 10.12.40.7 0 100 300 ?
* i 10.12.40.7 0 100 0 ?
*>i 10.1.161.0/29 10.12.40.8 0 100 300 ?
* i 10.12.40.8 0 100 0 ?
*>i 10.1.161.248/32 10.12.40.7 2 100 300 ?
* i 10.12.40.7 2 100 0 ?
*>i 10.1.161.249/32 10.12.40.7 2 100 300 ?
* i 10.12.40.7 2 100 0 ?
*>i 10.1.164.248/29 10.12.40.7 0 100 300 ?
* i 10.12.40.7 0 100 0 ?
*>i 10.1.165.0/29 10.12.40.8 0 100 300 ?
* i 10.12.40.8 0 100 0 ?
*>i 10.1.165.248/32 10.12.40.7 2 100 300 ?
* i 10.12.40.7 2 100 0 ?
The text file goes on for a long way, and it has plenty of garbage lines that I don't want, so I am trying to find the keyword (display bgp vpnv4 vpn-instance) and start reading once I find it. The code, which converts each table into my dataframe, looks like this.
My problem is that reading these 440k lines and converting them into a dataframe takes almost half an hour to complete, so I am here to seek help to see if there is a better way to improve the efficiency. Thank you!
bgp_df = pd.DataFrame()
vrf_list = ['Charging_VRF', 'Gb_VRF', 'Gn_VRF']

def generate_bgp_network_list(block, vrf):
    global bgp_df
    ip_address_list = block.split('\n')
    # split each non-empty line into its whitespace-separated fields
    ip_addresses = [[address for address in ip_address.strip().split(' ') if address]
                    for ip_address in ip_address_list if ip_address]
    ip_addresses = [address for address in ip_addresses if len(address) > 0]  # remove empty lists
    ip_addresses = [(ipaddress.IPv4Network(ip_address[1], False), ip_address[-1])
                    for ip_address in ip_addresses if validate_ipaddress(ip_address[1])]
    bgp_data = [{'ip_network': address, 'vrf': vrf, 'as_number': as_number}
                for address, as_number in ip_addresses]
    bgp_df = bgp_df.append(bgp_data, ignore_index=True)
def read_bgp_file(file):
    if file == '':
        return
    file = open(file, encoding=get_encoding_type(file))
    lines = file.readlines()
    start = False
    block = ''
    lines = iter(lines)
    for line in lines:
        if '<' in line and len(block) > 0:
            generate_bgp_network_list(block, vrf)
            start = False
            block = ''
        if 'display bgp vpnv4 vpn-instance' in line:
            vrf = line.strip().split(' ')[-2]
            if vrf in vrf_list:
                start = True
        if start:
            block += line
It looks to me like you only need the lines starting with *>i. If that is the case, how about this simple approach:
def input_file_to_dataframe(file_name: str):
    result = []
    prefix = '*>i'
    with open(file_name, "r") as file:
        lines = file.readlines()
        for line in lines:
            line = line.strip()
            if line.startswith(prefix):
                line = line.replace(prefix, '').split()
                result.append(line)
    return pd.DataFrame(data=result)
Run with ~50k lines:
input_file_to_dataframe('file.txt')
# 46.3 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
For me the readlines call here is the major issue, because it loads all the lines into memory at once.
If you iterate directly over the file object, it reads line by line, which I expect to give a faster result:
with open(file_name, "r") as the_file:
    for line in the_file:
        ...

Inconsistent index value in re module

I have two lists which hold different values. I tried to put the a list into an organized format with g.split(). Although it works fine on the a list, it can't filter the b list properly.
a = ['Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 18,688 kes (1,262,540 kes)\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 6,565 (465,015)\nWPKL - 1,883 (140,404)\nJohor - 1,308 (100,452)\nSabah -Lagi 1,379 (93,835)\nSarawak - 581 (81,328)\nNegeri Sembilan - 1,140 (78,777)\nKedah - 1,610 (56,598)\nPulau Pinang - 694 (52,368)\nKelantan - 870 (49,433)\nPerak - 861 (43,924)\nMelaka - 526 (35,584)\nPahang - 602 (29,125)\nTerengganu - 598 (20,696)\nWP Labuan - 2 (9,711)\nWP Putrajaya - 63 (4,478)\nPerlis - 6 (812)\n\n- KPK KKM']
b = ['Sehingga 9 Ogos 2021. Jumlah kes COVID-19 yang dilaporkan adalah 17,236 kes (1,279,776 kes).\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 5,740 (470,755)\nWPKL - 1,567 (141,971)\nJohor - 1,232 (101,684)\nSabah -Lagi 1,247 (95,082)\nSarawak - 589 (81,917)\nNegeri Sembilan - 1,215 (79,992)\nKedah - 1,328 (57,926)\nPulau Pinang - 908 (53,276)\nKelantan - 914 (50,347)\nPerak - 935 (44,859)\nMelaka - 360 (35,944)\nPahang - 604 (29,729)\nTerengganu - 501 (21,197)\nWP Labuan - 8 (9,719)\nWP Putrajaya - 66 (4,544)\nPerlis - 22 (834)\n\n- KPK KKM']
My code
out = []
for v in b:
    for g in re.findall(r"^(.*?\(.*?\))\n", v, flags=re.M):
        out.append(g.split(":")[0])
print(*out[0])
Whenever I print out[0] for the b list it only shows me 'Selangor - 5 , 7 4 0 (470,755)', which is wrong; it should be 'Sehingga 9 Ogos 2021'.
I tried the same code on the a list and it works properly without any issues. However, I noticed there is a minor difference between the two lists: the ':' versus the '.' after 'Sehingga 8 Ogos 2021'. How can I make the function work on both lists? I'm still new to re and split(); does anyone have any ideas? Thanks.
There are issues with your data format and regex. I am not that good at regex, but this works for me.
import re
a = ['Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 18,688 kes (1,262,540 kes)\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 6,565 (465,015)\nWPKL - 1,883 (140,404)\nJohor - 1,308 (100,452)\nSabah -Lagi 1,379 (93,835)\nSarawak - 581 (81,328)\nNegeri Sembilan - 1,140 (78,777)\nKedah - 1,610 (56,598)\nPulau Pinang - 694 (52,368)\nKelantan - 870 (49,433)\nPerak - 861 (43,924)\nMelaka - 526 (35,584)\nPahang - 602 (29,125)\nTerengganu - 598 (20,696)\nWP Labuan - 2 (9,711)\nWP Putrajaya - 63 (4,478)\nPerlis - 6 (812)\n\n- KPK KKM']
b = ['Sehingga 9 Ogos 2021. Jumlah kes COVID-19 yang dilaporkan adalah 17,236 kes (1,279,776 kes).\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 5,740 (470,755)\nWPKL - 1,567 (141,971)\nJohor - 1,232 (101,684)\nSabah -Lagi 1,247 (95,082)\nSarawak - 589 (81,917)\nNegeri Sembilan - 1,215 (79,992)\nKedah - 1,328 (57,926)\nPulau Pinang - 908 (53,276)\nKelantan - 914 (50,347)\nPerak - 935 (44,859)\nMelaka - 360 (35,944)\nPahang - 604 (29,729)\nTerengganu - 501 (21,197)\nWP Labuan - 8 (9,719)\nWP Putrajaya - 66 (4,544)\nPerlis - 22 (834)\n\n- KPK KKM']
out = []
for v in b:
    # normalize: drop the trailing '.' at line ends, then turn the remaining '.' into ':'
    regex_list = re.findall(r"^(.*?\(.*?\))\n", v.replace('.\n', '\n').replace('.', ':'), flags=re.M)
    for g in regex_list:
        print(g)
        out.append(g.split(":")[0])
print(out[0])  # print(*out[0]) would unpack the string into single characters
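An alternative sketch that handles both variants without rewriting the data first: allow an optional trailing '.' after the closing parenthesis in the regex, and cut the prefix at the first ':' or '.' that is followed by whitespace. The helper name extract is mine:

```python
import re

def extract(texts):
    out = []
    for v in texts:
        # allow an optional trailing '.' after the closing parenthesis
        for g in re.findall(r"^(.*?\(.*?\))\.?$", v, flags=re.M):
            # cut at the first ':' or '.' that is followed by whitespace
            out.append(re.split(r"[:.]\s", g, maxsplit=1)[0])
    return out
```

This returns 'Sehingga 8 Ogos 2021' for the a list and 'Sehingga 9 Ogos 2021' for the b list as the first element.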

Change timezone based on data from another column

Here's my example dataframe:
Office Design ... SiteLog Duration
0 DQFEMOZM - 2141 ZMI_PE ... 6/28/2019 7:59 6
1 DQFEMOZM - 2141 ZMI_PE ... 6/28/2019 7:47 5
2 DQFEMOZM - 2141 ZMI_PE ... 6/27/2019 4:58 2
3 DQFEMOZM - 2141 ZMI_PE ... 6/27/2019 4:52 2
4 YMTSZUXXQN - 1031 ZMI_PE ... 6/3/2019 4:10 4
6 YMTSZUXXQN - 1031 ZMI_PE ... 6/2/2019 22:36 6
9 UTUXMW - 1046 ZMI_PE ... 6/26/2019 20:01 336
10 UTUXMW - 1046 ZMI_PE ... 6/26/2019 14:16 828
11 UTUXMW - 1046 ZMI_PE ... 6/14/2019 16:33 2
12 UTUXMW - 1046 ZMI_PE ... 6/14/2019 15:07 2
14 GMUH-FZAB XMHMX - 2114 ZMI_PE ... 6/25/2019 5:35 3
15 TSGADANXDMY - 1215 ZMI_PE ... 6/9/2019 3:10 3
16 TSGADANXDMY - 1215 ZMI_PE ... 6/8/2019 19:03 2
17 TSGADANXDMY - 1215 ZMI_PE ... 6/8/2019 3:59 2
18 PDARPQY - 1154 ZMI_PE ... 6/30/2019 7:06 1
19 PDARPQY - 1154 ZMI_PE ... 6/18/2019 5:04 216
21 MSGMEEUEEUY - 2027 ZMI_PE ... 6/27/2019 17:36 2
23 MSGMEEUEEUY - 2027 ZMI_PE ... 6/4/2019 9:32 11
25 MSGMEEUEEUY - 2027 ZMI_PE ... 6/2/2019 22:37 4
26 MSGMEEUEEUY - 2027 ZMI_PE ... 6/2/2019 22:25 2
28 MSGMEEUEEUY - 2027 ZMI_PE ... 5/29/2019 23:24 2
All the example site logs are in PST. What I'm trying to do is take certain rows, say office "DQFEMOZM - 2141" and change the site log timestamp to EST.
I've tried using the tz_localize and tz_convert functions but haven't been able to get them to work.
import pandas as pd
from pytz import all_timezones
data = pd.read_csv('lab.csv')
data = data.drop_duplicates('SiteLog')
data = data.drop(data[data.Duration == 0].index)
DQFEMOZM = data[data.Office == 'DQFEMOZM - 2141'].index
DQFEMOZM = DQFEMOZM.tz_localize('America/Los_Angeles')
DQFEMOZM = DQFEMOZM.tz_convert('America/New_York')
Part of the error message I'm receiving:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
DQFEMOZM = DQFEMOZM.tz_convert('America/New_York')
AttributeError: 'Int64Index' object has no attribute 'tz_convert'
You are assigning the index to DQFEMOZM, which makes it an Int64Index. You can accomplish the task like the following:
from pandas import Timestamp
import pandas as pd

data = pd.read_csv("/Users/user/Desktop/Book6.csv")
data = data.drop_duplicates("SiteLog")
for office, datetime in zip(data["Office"], data["SiteLog"]):
    if office == "DQFEMOZM - 2141":
        raw_time = Timestamp(datetime)
        print(raw_time)
        loc_raw_time = raw_time.tz_localize("America/Los_Angeles")
        print(loc_raw_time)
        new_raw_time = loc_raw_time.tz_convert("America/New_York")
        print(new_raw_time)
I used first two rows for example:
Office Design ... SiteLog Duration
0 DQFEMOZM - 2141 ZMI_PE ... 6/28/2019 7:59 6
1 DQFEMOZM - 2141 ZMI_PE ... 6/28/2019 7:47 5
and the code output is (for your reference, I print every stage of the conversion):
2019-06-28 07:59:00
2019-06-28 07:59:00-07:00
2019-06-28 10:59:00-04:00
2019-06-28 07:47:00
2019-06-28 07:47:00-07:00
2019-06-28 10:47:00-04:00
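The loop above can also be written in vectorized form with the pandas datetime accessors, which avoids converting row by row. A sketch, assuming the 'Office' and 'SiteLog' column names from the question (the 'SiteLog_NY' column name is mine):

```python
import pandas as pd

# Sample rows copied from the question's dataframe.
df = pd.DataFrame({
    'Office': ['DQFEMOZM - 2141', 'UTUXMW - 1046'],
    'SiteLog': ['6/28/2019 7:59', '6/28/2019 7:47'],
})
mask = df['Office'] == 'DQFEMOZM - 2141'
times = pd.to_datetime(df.loc[mask, 'SiteLog'], format='%m/%d/%Y %H:%M')
# localize as Pacific, convert to Eastern, format back to strings
df.loc[mask, 'SiteLog_NY'] = (
    times.dt.tz_localize('America/Los_Angeles')
         .dt.tz_convert('America/New_York')
         .dt.strftime('%m/%d/%Y %H:%M')
)
```

Rows whose office does not match simply keep NaN in the new column.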
You can't convert the time like that. You'll need to use the pytz and datetime modules.
import pytz, datetime
I started out with smaller data frame to test.
>>> data = pd.read_csv ('test.csv')
>>> df = pd.DataFrame (data)
>>> df
Office Design SiteLog Duration
0 DQFEMOZM - 2141 ZMI_PE 6/28/2019 7:59 6
1 UTUXMW - 1046 ZMI_PE 6/28/2019 7:47 5
2 YMTSZUXXQN - 1031 ZMI_PE 6/27/2019 4:58 2
3 DQFEMOZM - 2144 ZMI_PE 6/27/2019 4:52 2
Next, create a date/time conversion function.
>>> def date_conversion(df):
...     nytimes = []
...     for i, record in enumerate(df.Office):
...         if 'DQFEMOZM - 2141' in record:
...             # convert the string into a datetime object for localization and conversion
...             time_obj = datetime.datetime.strptime(df.SiteLog[i], '%m/%d/%Y %H:%M')
...             pacific_time = pytz.timezone('America/Los_Angeles').localize(time_obj)
...             new_york_time = pacific_time.astimezone(pytz.timezone('America/New_York'))
...             nytimes.append(new_york_time.strftime('%m/%d/%Y %H:%M'))  # convert the time object back to a string
...         else:
...             nytimes.append('-')
...     return nytimes
Finally, insert the converted time to your dataframe.
>>> df.insert (3, 'SiteLog_NY', date_conversion (df), True)
>>> df
Office Design SiteLog SiteLog_NY Duration
0 DQFEMOZM - 2141 ZMI_PE 6/28/2019 7:59 06/28/2019 10:59 6
1 UTUXMW - 1046 ZMI_PE 6/28/2019 7:47 - 5
2 YMTSZUXXQN - 1031 ZMI_PE 6/27/2019 4:58 - 2
3 DQFEMOZM - 2144 ZMI_PE 6/27/2019 4:52 - 2

Python 2.7. BeautifulSoup not opening HTML

I have some weird problem with BeautifulSoup. I'm trying to read this website: http://lol.esportswikis.com/wiki/2015_International_Wild_Card_Invitational/Match_History and all I get is an empty list. But if I try a different site like http://lol.esportswikis.com/wiki/Season_1_World_Championship/Match_History it works like a charm. Any idea what the problem is?
From the Chrome inspector tool I get the same code for the first line of the table on both websites, so what's the problem?
#mw-content-text > table.wikitable > tbody > tr:nth-child(3)
mw-content-text > table.wikitable > tr  # I use this
Even if I try to open just the wikitable:
url = 'insert url here'
con = urllib2.urlopen(url)
HTML = con.read()
soup = BeautifulSoup(HTML, 'html.parser')
stuff = soup.select('#mw-content-text > table.wikitable')
print stuff
It just prints an empty list.
import requests
from bs4 import BeautifulSoup

r = requests.get('http://lol.esportswikis.com/wiki/2015_International_Wild_Card_Invitational/Match_History')
soup = BeautifulSoup(r.text, 'lxml')
for row in soup.select('.wikitable tr')[2:]:
    print(row.get_text(' ', strip=True).replace(' • ', '•'))
I don't have a 2.7 environment, so I used requests + bs4, but it's almost the same. Note the 'lxml' parser: it is more tolerant of malformed markup than 'html.parser', which is likely why your select() call came back empty for that particular page.
I separate each cell by a space in each row, in case you want to split it.
Out:
2015-04-25 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Yang•Revolta•Tockers•micaO•Jockster 36:54 k k - - - - - - SB - -
2015-04-25 5.6 Yang•Revolta•Tockers•micaO•Jockster Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 53:47 k k - - - - - - SB - -
2015-04-25 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Yang•Revolta•Tockers•micaO•Jockster 53:40 k k - - - - - - SB - -
2015-04-25 5.6 Yang•Revolta•Tockers•micaO•Jockster Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 32:30 k k - - - - - - SB - -
2015-04-24 5.6 WarL0cK•007x•G4•Moss•Lloyd Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 45:53 k k - - - - - - SB - -
2015-04-24 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge WarL0cK•007x•G4•Lloyd•Moss 34:16 k k - - - - - - SB - -
2015-04-24 5.6 WarL0cK•007x•G4•Lloyd•Moss Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 39:39 k k - - - - - - SB - -
2015-04-24 5.6 Yang•Revolta•Tockers•micaO•Jockster Smurf•Symphony•Kira•LeX•Dimonko 36:58 k k - - - - - - SB - -
2015-04-24 5.6 Smurf•Symphony•Kira•LeX•Dimonko Yang•Revolta•Tockers•micaO•Jockster 32:03 k k - - - - - - SB - -
2015-04-24 5.6 Yang•Revolta•Tockers•micaO•Jockster Smurf•Symphony•Kira•LeX•Dimonko 24:06 k k - - - - - - SB - -
2015-04-24 5.6 Smurf•Symphony•Kira•LeX•Dimonko Yang•Revolta•Tockers•micaO•Jockster 38:01 k k - - - - - - SB - -
2015-04-23 5.6 BonziN•Astarore•Ceros•Yutapongo•KazuXD Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 32:01 k k - - - - - - SB - -
2015-04-23 5.6 WarL0cK•007x•G4•Lloyd•Moss Swip3rR•Spookz•Swiffer•Raydere•Rosey 35:05 k k - - - - - - SB - -
2015-04-23 5.6 Yang•Revolta•Tockers•micaO•Jockster Helior•Juliostito•Regi•Zeicro•BearJew 18:40 k k - - - - - - SB - -
2015-04-23 5.6 Swip3rR•Spookz•Swiffer•Raydere•Rosey BonziN•Astarore•Ceros•Yutapongo•KazuXD 36:26 k k - - - - - - SB - -
2015-04-23 5.6 Helior•Juliostito•Regi•Zeicro•BearJew WarL0cK•007x•G4•Lloyd•Moss 28:26 k k - - - - - - SB - -
2015-04-22 5.6 Smurf•Symphony•Kira•LeX•Dimonko Yang•Revolta•Tockers•micaO•Jockster 37:03 k k - - - - - - SB - -
2015-04-22 5.6 BonziN•Astarore•Ceros•Yutapongo•KazuXD Helior•Juliostito•Regi•Zeicro•BearJew 34:25 k k - - - - - - SB - -
2015-04-22 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Smurf•Symphony•Kira•LeX•Dimonko 40:49 k k - - - - - - SB - -
2015-04-22 5.6 Yang•Revolta•Tockers•micaO•Jockster WarL0cK•007x•G4•Lloyd•Moss 29:30 k k - - - - - - SB - -
2015-04-22 5.6 Smurf•Symphony•Kira•LeX•Dimonko Swip3rR•Spookz•Swiffer•Raydere•Rosey 29:42 k k - - - - - - SB - -
2015-04-22 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Yang•Revolta•Tockers•micaO•Jockster 37:33 k k - - - - - - SB - -
2015-04-22 5.6 BonziN•Astarore•Ceros•Yutapongo•KazuXD WarL0cK•007x•G4•Lloyd•Moss 27:38 k k - - - - - - SB - -
2015-04-22 5.6 Helior•Juliostito•Regi•Zeicro•BearJew Smurf•Symphony•Kira•LeX•Dimonko 38:38 k k - - - - - - SB - -
2015-04-21 5.6 Yang•Revolta•Tockers•micaO•Jockster BonziN•Astarore•Ceros•Yutapongo•KazuXD 35:52 k k - - - - - - SB - -
2015-04-21 5.6 Swip3rR•Spookz•Swiffer•Raydere•Rosey Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 44:25 k k - - - - - - SB - -
2015-04-21 5.6 WarL0cK•007x•G4•Lloyd•Moss Smurf•Symphony•Kira•LeX•Dimonko 45:10 k k - - - - - - SB - -
2015-04-21 5.6 Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge Helior•Juliostito•Regi•Zeicro•BearJew 35:51 k k - - - - - - SB - -
2015-04-21 5.6 Swip3rR•Spookz•Swiffer•Raydere•Rosey Yang•Revolta•Tockers•micaO•Jockster 43:21 k k - - - - - - SB - -
2015-04-21 5.6 WarL0cK•007x•G4•Lloyd•Moss Thaldrin•Theokoles•Energy•Nardeus•Dumbledoge 29:40 k k - - - - - - SB - -
2015-04-21 5.6 Smurf•Symphony•Kira•LeX•Dimonko BonziN•Astarore•Ceros•Yutapongo•KazuXD 30:17 k k - - - - - - SB - -
2015-04-21 5.6 Helior•Juliostito•Regi•Zeicro•BearJew Swip3rR•Spookz•Swiffer•Raydere•Rosey 43:52 63.3k 16 3 2 81.4k 22 11 3 -18.1k ▼ 18.1k -6 ▼ 6 -8 ▼ 8 -1 ▼ 1 - - SB - -

Convert python pandas rows to columns

Decade difference (kg) Version
0 1510 - 1500 -0.346051 v1.0h
1 1510 - 1500 -3.553251 A2011
2 1520 - 1510 -0.356409 v1.0h
3 1520 - 1510 -2.797978 A2011
4 1530 - 1520 -0.358922 v1.0h
I want to transform the pandas dataframe so that the 2 unique entries in the Version column are transferred to become columns. How do I do that?
The resulting dataframe should not have a multiindex
In [28]: df.pivot(index='Decade', columns='Version', values='difference (kg)')
Out[28]:
Version A2011 v1.0h
Decade
1510 - 1500 -3.553251 -0.346051
1520 - 1510 -2.797978 -0.356409
1530 - 1520 NaN -0.358922
or
In [31]: df.pivot(index='difference (kg)', columns='Version', values='Decade')
Out[31]:
Version A2011 v1.0h
difference (kg)
-3.553251 1510 - 1500 None
-2.797978 1520 - 1510 None
-0.358922 None 1530 - 1520
-0.356409 None 1520 - 1510
-0.346051 None 1510 - 1500
both satisfy your requirements.
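One caveat: pivot leaves 'Decade' (or 'difference (kg)') as the index and labels the column axis 'Version'. If a completely flat frame is wanted, chaining reset_index() and clearing the axis name should do it; a sketch built from the sample data above:

```python
import pandas as pd

# Rebuild the question's frame (values copied from the sample).
df = pd.DataFrame({
    'Decade': ['1510 - 1500', '1510 - 1500', '1520 - 1510',
               '1520 - 1510', '1530 - 1520'],
    'difference (kg)': [-0.346051, -3.553251, -0.356409,
                        -2.797978, -0.358922],
    'Version': ['v1.0h', 'A2011', 'v1.0h', 'A2011', 'v1.0h'],
})
out = (df.pivot(index='Decade', columns='Version', values='difference (kg)')
         .reset_index())
out.columns.name = None  # drop the leftover 'Version' axis label
```

The result has plain columns 'Decade', 'A2011', 'v1.0h' and a default integer index.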
