I have a pandas dataframe (df2) with about 160,000 rows. I'm trying to change some of the values in a column (url).
The strings in this column have lengths between 108 and 150 characters. If a string is not 108 characters long, I want to replace it with the same string minus its last 10 characters. If the string is exactly 108 characters, I want to leave it alone. Please note that I'm not trying to make every string 108 characters; I'm just trying to cut off the last 10 characters of any string that isn't 108 characters.
example: len(s) = 114, replace with s[:-10]
I built a loop that does this, but it's insanely slow, probably because it rebuilds the dataframe on each iteration.
for i in df2.url:
    if len(i) != 108:
        new_i = i[:-10]
        df2 = df2.replace(i, new_i)
There has to be a faster way to do this, but I haven't been able to figure out how. I would love the expertise of someone more versed in pandas.
Below is an example of 200 rows of the column I'm trying to change:
['https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1301108?gameHash=bde58669fc59c853&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291187?gameHash=f7fcd2d6ca775fb5&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291192?gameHash=005335984c8f8a3a&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1301128?gameHash=fcbd2630c0faec49&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1301159?gameHash=9a7726176fdabfde&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1301169?gameHash=5d816e6d30d2b659&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1301183?gameHash=396641afdcdd99d9&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT02/1271494?gameHash=bd51798e1358c47f',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130153?gameHash=00a7861ac0a23aef',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT02/1271495?gameHash=0d828bbc9aa9996c',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT02/1271497?gameHash=bd4810bb801abf24',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130166?gameHash=1cff679b64acb047',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130177?gameHash=1f92cbefd9a965e0',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT02/1271500?gameHash=abbdae6c3e7b4006',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT02/1271505?gameHash=7c970a84e132a578',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130182?gameHash=ccb50f6e86e4c3df',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130193?gameHash=0995997660a65721',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1301262?gameHash=c594a9a52f46cc50',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130196?gameHash=31553f5bb6ba4420',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1301270?gameHash=5b3babb5d392d78d',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130201?gameHash=3d2aa031c17d90ae',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1301290?gameHash=31ce80069fdbc873',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130210?gameHash=91c7b22cded939ff',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1301305?gameHash=3f8d664b3b988446',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130221?gameHash=a8580ee66ffbb525',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291406?gameHash=5220923eb35c42c6',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291426?gameHash=83c7c51530ea074e',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291442?gameHash=28f7b485f710168f',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291458?gameHash=49cc14d02ccd0674',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291470?gameHash=f087c853097c2dd9',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT02/1261474?gameHash=e6c01a288de5dc41',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130229?gameHash=1489421028163983',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT02/1261475?gameHash=c984e795d6406cd5',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130243?gameHash=5491d110de253089',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT02/1261482?gameHash=f2283324f82caa66&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130253?gameHash=f8e39ae785d11c0c',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130264?gameHash=a98718c088ce663c',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT02/1261488?gameHash=6517011920487fbf&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291651?gameHash=5ec1b3473060dfd2',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291682?gameHash=a8f2c06d04117279',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291703?gameHash=cfb2d078f289825c',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291737?gameHash=cf67a15df43c2bb2',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291748?gameHash=7a3c085cf703d7bd',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291789?gameHash=51e5ed28085fd299',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291812?gameHash=e540d208bbc69bb3',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291835?gameHash=a75ab48a22470022',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1291845?gameHash=2eab12f8ffd0dfd0',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130294?gameHash=ecf040ad60fa9726&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130299?gameHash=499a21480080a722&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130306?gameHash=d0e60bf49b6bf008&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292296?gameHash=3db885bd11a047bc',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292315?gameHash=2ecf71aaea031312',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292329?gameHash=5ed85b948b32b8e8',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292341?gameHash=7335d6ca06763dc0&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292345?gameHash=6f86444cce429244',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292348?gameHash=c6a4eec48810e8d5&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292353?gameHash=6db57c090ed235bd&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292354?gameHash=79845cdf9a6e88db',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120429?gameHash=436739b9e99a246e&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120436?gameHash=58bc4281a76534f3&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120440?gameHash=4b74592ff226c39f&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120447?gameHash=9358d210749ab778&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292579?gameHash=14865e88bd1e30a7',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292607?gameHash=0ae34d7f67620dc4',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292635?gameHash=f94944bb4f061f0d',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292648?gameHash=1338dde99c71877f&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302501?gameHash=f71748ae9cad5866&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302519?gameHash=672c1377c3d37ed0&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302531?gameHash=49cf9a8f3942b9c8&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302595?gameHash=314d39ea940b354f',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302628?gameHash=0ab39ec364a3ff5b',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302635?gameHash=5625553825f5994e',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302651?gameHash=555c7cd73dff952d',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292960?gameHash=e3ce73c142354517',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1292974?gameHash=ab79b8f6f354bc0b',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302827?gameHash=6a1a5de57a7ce6b9&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302855?gameHash=f9144d0822d68632&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302881?gameHash=369cd071defeadd9&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1302906?gameHash=c65d2e76e9aa721e&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130488?gameHash=411522a3de69bb79',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130489?gameHash=51c4c81c13a484c7',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130496?gameHash=9575986535e4f4c2',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293312?gameHash=8e2209227e28843b',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120557?gameHash=f5bec07774ed5a5e',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293319?gameHash=762cb3a92744846f',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130515?gameHash=548d7e528ef1f81e',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293370?gameHash=8a70038d2eba61de',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293393?gameHash=841d85edbfa78057',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130518?gameHash=6764d64a5ef8377e&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120578?gameHash=838a1db0f44411c8&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120583?gameHash=c542c3368048efd6&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120585?gameHash=925d9c523a0b0bdb&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293765?gameHash=53412e36eb2eab86',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303478?gameHash=8df5ef3d826ad211&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303509?gameHash=d0849b1ba82d4826&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293812?gameHash=48d825f1bb110b55',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303533?gameHash=3a712b015a672d8d&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293850?gameHash=0a29fdee10ed35d0',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293885?gameHash=1ffaffd98da7e806',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303581?gameHash=2bf61273d44c302f&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293897?gameHash=77ccf507e1eaa05c',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293899?gameHash=aa93723cded96f3b',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293901?gameHash=f5fb660360f96ad6',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293909?gameHash=245dbdf428788434',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303619?gameHash=2e2f2ff9c6a32595&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303626?gameHash=3bba86d0f9ff1d11&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293929?gameHash=f4b6f53e68bbbc86',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303641?gameHash=25ffa91aeb9ed707',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293950?gameHash=e2f3a99412844d36',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303655?gameHash=4ff2ebbe72e635bb',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1293964?gameHash=10bd6ec239231196',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130548?gameHash=afd267703d3cbbb1&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303666?gameHash=a30e98d241d22eef',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130553?gameHash=f4360fb632593491&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130560?gameHash=e1e5bae936585a24&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120607?gameHash=f4b702f689f87c90',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130563?gameHash=43a7c73ecd281a63&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120622?gameHash=c87f08d06f392f3f',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120629?gameHash=6b39ee929c2ebc47',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120638?gameHash=eb17c2013b9ee77d',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120649?gameHash=aab6f321110ef3ed',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294174?gameHash=8f5cb3f02bf790d7&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303861?gameHash=02847551947ca67d&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294191?gameHash=e574ac58bbe81abb&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303876?gameHash=e733bc45e47f4856&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303904?gameHash=d8aac7332b9edfe8',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294233?gameHash=6762c2c72bc47359&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303974?gameHash=28f566b2fa35a32a&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294260?gameHash=246841d34c9660aa&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1303999?gameHash=619d5a2d571a1b01&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304020?gameHash=99508a2da285eb4c&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304032?gameHash=30e5b243a407326c&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304035?gameHash=f8e3702e77f87cc7',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304054?gameHash=db39c4bcb7c2320e&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304075?gameHash=a3d0d6acfb8b92f1&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304079?gameHash=8339c23d8d925f8b',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304081?gameHash=d3e5c8f0270ce96f&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304111?gameHash=64ded61e41c18ccd',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304122?gameHash=bf7e80351592ce98',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304132?gameHash=ff37582431bd7e7b&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130611?gameHash=a099e1df984018a1',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304158?gameHash=62b9c13c8cecf652',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294417?gameHash=746905a629b8f374',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130621?gameHash=9d171c9622870a7b',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304165?gameHash=c34ae80c4ee8c7bd',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304169?gameHash=ee6bc6a087a6bc36&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304193?gameHash=fe234e8ca7d2343f',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130630?gameHash=b1b183ad3374db06',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304217?gameHash=b29c2b7461c7700f',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304223?gameHash=7c70a52e69b01c56',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130643?gameHash=15bb88ac79a622a1&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120705?gameHash=a6532b3af6accaf2',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304264?gameHash=f5e69d8e2f6bae5e',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1120711?gameHash=79659ad2d107f0d9',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304282?gameHash=f9ea42cec97e930f',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130676?gameHash=f3e34e47140460ff',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304302?gameHash=617f3af3e7d2ab4d',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130696?gameHash=532d412c3f38c0c5',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304324?gameHash=01bf1a7465a412ba',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130709?gameHash=1491ee8228ad66ec',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304356?gameHash=fc584c5143087c0b',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304374?gameHash=175112ba57cbce5e',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304391?gameHash=72ad86120a14eb54',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304399?gameHash=2536b98ac19e617d',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304412?gameHash=a30a480459e9151c',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294710?gameHash=0d01dd80aa803997',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294725?gameHash=3ae63821918e2b43',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294738?gameHash=5fd20a0eec2c86f4',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304487?gameHash=0d5d1e4e719e8c46&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130723?gameHash=c5113e7f25839c2e&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304504?gameHash=fda4895c0bea1e8a&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294750?gameHash=fa9335e1a61165a5',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130734?gameHash=88de231d14ea4b07&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304521?gameHash=b70af7bde6c54520&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294784?gameHash=7d1bf4754cda9b46',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130739?gameHash=4ddb470392dd9248&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304544?gameHash=b14635dac9add7b4&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1294813?gameHash=51f4579db6e7049f',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130748?gameHash=8f544ac73c53a606&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304574?gameHash=a5dcde67b90f29e3&tab=overview',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304609?gameHash=7a5a6778a7074f09',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304635?gameHash=4cda5972cf8dd6bd',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304671?gameHash=67e29eccbbc8f667',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304691?gameHash=1572b2bb76b73da1',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304700?gameHash=b50bc9265ac35f9f',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304739?gameHash=b80bb99cbce5bd71',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304768?gameHash=6ed6e9d7108f27e0',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304785?gameHash=13054e8f14fcae76',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304794?gameHash=b4c27881f0c4481c',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304884?gameHash=29e8f7f002108b46',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304889?gameHash=7774d22665d9526d',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304894?gameHash=bac5c580914f7aaa',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1304928?gameHash=b8b029b3d4002fbc',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT04/1130793?gameHash=a3f6e45612b56302',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1295356?gameHash=731afc76037bd245',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1295369?gameHash=e743122ca08b77d8',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1295383?gameHash=072ee1028f03f4c9',
'http://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1295402?gameHash=560c984fc1ba1168',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1295440?gameHash=32cdbf5ce1441159&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1295467?gameHash=bcc21c92fa78e889&tab=overview',
'https://matchhistory.na.leagueoflegends.com/en/#match-details/ESPORTSTMNT01/1295489?gameHash=95386dea5cf1ab09&tab=overview']
Basic Solution
The solution below passes a lambda function to pandas.Series.apply().
df['url'] = df['url'].apply(lambda x: x if len(x) == 108 else x[:-10])
Here, each value within df['url'] (x) remains the same if len(x) == 108, otherwise it is updated to be x[:-10].
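Because apply() still calls the lambda once per row in Python, a vectorized variant may be quicker on 160,000 rows. The following is a minimal sketch, assuming every value in the column is a string; the three-row frame is made-up sample data, not the asker's.

```python
import pandas as pd

# Made-up sample data: one string of length 108 and two longer ones
df = pd.DataFrame({"url": ["a" * 108, "b" * 118, "c" * 150]})

# Length of every string, computed column-wise
lengths = df["url"].str.len()

# Keep rows whose length is 108; replace the others with the trimmed string
df["url"] = df["url"].where(lengths == 108, df["url"].str[:-10])

new_lengths = df["url"].str.len().tolist()
print(new_lengths)
```

Series.where keeps the original value wherever the condition is True and takes the second argument elsewhere, so no Python-level loop is needed.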
Handling Exceptions
The solution below is similar to the one above, but adds basic exception handling inside the url_trim() function that is passed to pandas.Series.apply().
This is more robust than the first solution: it will not halt execution when an unexpected value in a df['url'] row (for example numpy.nan used for nulls) raises an exception inside apply(). In those cases the value is simply left unchanged.
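def url_trim(x):
    try:
        # Trim the last 10 characters unless the string is exactly 108 long
        if len(x) != 108:
            return x[:-10]
        else:
            return x
    except Exception:
        # Values with no usable length or slice (e.g. numpy.nan) are left unchanged
        return x

df['url'] = df['url'].apply(url_trim)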
The following code checks the length of each value in the url column and trims the last 10 characters when the length is not 108. The modified URL is stored in a new modified_url column.
# Get string length
df["string_length"] = df["url"].astype(str).str.len()
# Create a filter for strings whose length is not 108
filter_length = df["string_length"] != 108
# Copy the column, then trim only the filtered rows
df["modified_url"] = df["url"]
df.loc[filter_length, "modified_url"] = df.loc[filter_length, "url"].astype(str).str[:-10]
I have a SpaCy dependency tree made by this code:
import spacy
from spacy import displacy
# Assumes an English pipeline such as en_core_web_lg is installed
nlp = spacy.load('en_core_web_lg')
text = "We could say to them that if in fact that's all there is, then we could, Oh, we can do something."
print(displacy.render(nlp(text), style='dep', jupyter=True, options={'distance': 120}))
That prints out this:
SpaCy determines that this entire string is connected in a dependency tree. What I am trying to figure out is how to discern how direct or indirect the connection is between a word and the next word. For example, looking at the first 3 words:
'We' is connected to the next word 'could' because it is directly connected to 'say', which is directly connected to 'could'. Therefore, it is 2 connection points away from the next word.
'could' is directly connected to 'say'. Therefore, it is 1 connection point away from the next word.
and so on.
Essentially, I want to make a df that would look like this:
word connection_points_to_next_word
We 2
could 1
say 1
...
I'm not sure how to achieve this. As SpaCy makes this graph, I'm sure there is some efficient way to calculate the number of vertices required to connect adjacent nodes, but all of SpaCy's tools I've found, such as:
token.lefts
token.rights
token.subtree
token.children
more here https://spacy.io/api/token
include connection information, but not how direct the connection is. Any ideas on how to get closer to solving this problem?
Using the networkx library, we can build an undirected graph from the edgelist of token-children relationships. I am using the index of the token in the document as a unique identifier so that repeat words are treated as separate nodes.
import spacy
import networkx as nx
nlp= spacy.load('en_core_web_lg')
text = "We could say to them that if in fact that's all there is, then we could, Oh, we can do something."
doc = nlp(text)
edges = []
for tok in doc:
    edges.extend([(tok.i, child.i) for child in tok.children])
# Build the undirected graph that the shortest-path query below runs on
graph = nx.Graph(edges)
The shortest path between neighboring tokens can be calculated as below:
for idx, _ in enumerate(doc):
    if idx < len(doc) - 1:
        print(doc[idx], doc[idx + 1], nx.shortest_path_length(graph, source=idx, target=idx + 1))
Output:
We could 2
could say 1
say to 1
to them 1
them that 4
that if 3
if in 2
in fact 1
fact that 3
that 's 1
's all 1
all there 2
there is 1
is , 4
, then 2
then we 2
we could 2
could , 2
, Oh 2
Oh , 2
, we 2
we can 2
can do 1
do something 1
something . 3
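To see what shortest_path_length is counting, here is a dependency-free sketch that runs the same breadth-first search by hand. The edge list is hand-written from the relationships described in the question ('We', 'could' and 'to' each attach directly to 'say'); it is not produced by spaCy, and hops is a hypothetical helper name.

```python
from collections import deque

words = ["We", "could", "say", "to"]
# (head index, child index) pairs, hand-written for illustration
edges = [(2, 0), (2, 1), (2, 3)]

# Undirected adjacency list
neighbors = {i: set() for i in range(len(words))}
for head, child in edges:
    neighbors[head].add(child)
    neighbors[child].add(head)

def hops(source, target):
    """Number of edges on the shortest path from source to target (BFS)."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected

# One (word, distance to next word) pair per adjacent token pair
rows = [(words[i], hops(i, i + 1)) for i in range(len(words) - 1)]
print(rows)
```

A list of pairs like rows can be handed to pandas.DataFrame(rows, columns=["word", "connection_points_to_next_word"]) to get the table layout asked for in the question.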
I want to scrape the Interactions table from the Entrez Gene page.
The Interactions table is populated from a web server and when I tried to use the XML package in R, I could get the Entrez gene page, but the Interactions table body was empty (it had not been populated by the web server).
Dealing with the web server issue in R may be solvable (and I'd love to see how), but it seemed Biopython was an easier path.
I put together the following, which gives me what I want for an example gene:
# Pull the Entrez gene page for MAP1B using Biopython
from Bio import Entrez
Entrez.email = "jamayfie#vasci.umass.edu"
handle = Entrez.efetch(db="gene", id="4131", retmode="xml")
record = Entrez.read(handle)
handle.close()
PPI_Entrez = []
PPI_Sym = []
# Find the Dictionary that contains the Interaction table
for x in range(1, len(record[0]["Entrezgene_comments"])):
    if ('Gene-commentary_heading', 'Interactions') in record[0]["Entrezgene_comments"][x].items():
        for y in range(0, len(record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'])):
            EntrezID = record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'][y]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_src']['Dbtag']['Dbtag_tag']['Object-id']['Object-id_id']
            PPI_Entrez.append(EntrezID)
            Sym = record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'][y]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_anchor']
            PPI_Sym.append(Sym)
# Return the desired values: I want the Entrez ID and Gene symbol for each interacting protein
PPI_Entrez # Returns the EntrezID
PPI_Sym # Returns the gene symbol
This code works, giving me what I want. But I think it's ugly, and I am concerned that if the Entrez gene page changes slightly in format, the code will break. In particular, there must be a better way to extract the desired information than specifying the full path, as I do with:
record[0]["Entrezgene_comments"][x]['Gene-commentary_comment'][y]['Gene-commentary_comment'][1]['Gene-commentary_source'][0]['Other-source_anchor']
But I cannot figure out how to search through a dictionary of dictionaries without specifying each level I want to descend. When I try functions like find(), they operate on the next level down, but not all the way to the bottom.
Is there a wildcard symbol, a Python equivalent of "//", or a function I can use to get to ['Object-id_id'] without naming the full path? Other suggestions for cleaner code are also appreciated.
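One way to approximate "//" on nested data is a small recursive generator. The sketch below assumes the parsed record behaves like ordinary nested dicts and lists; find_key is a hypothetical helper, not part of Biopython, and the sample structure is made up to mimic the nesting style of the record rather than copied from real output.

```python
def find_key(obj, key):
    """Yield every value stored under `key`, at any depth of nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending, in case the same key also appears deeper down
            yield from find_key(v, key)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from find_key(item, key)

# Made-up structure for illustration only
sample = {
    "Gene-commentary_comment": [
        {"Gene-commentary_source": [
            {"Other-source_anchor": "EED",
             "Other-source_src": {"Dbtag": {"Dbtag_tag": {"Object-id": {"Object-id_id": "8726"}}}}}
        ]},
        {"Gene-commentary_source": [
            {"Other-source_anchor": "FOS",
             "Other-source_src": {"Dbtag": {"Dbtag_tag": {"Object-id": {"Object-id_id": "2353"}}}}}
        ]},
    ]
}

ids = list(find_key(sample, "Object-id_id"))
symbols = list(find_key(sample, "Other-source_anchor"))
print(ids, symbols)
```

On a real record this would return every matching key in whatever object it is given, so it is probably safest to first select the comment whose heading is 'Interactions' and search only inside that.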
I'm not sure about XPath in Python, but if the code works, I would not worry about removing the full paths or about whether the Entrez Gene XML will change. Since you first tried R, you could get the XML using a system call to Entrez Direct, as below, or a package like rentrez.
library(XML)
doc <- xmlParse( system("efetch -db=gene -id=4131 -format xml", intern=TRUE) )
Next, get the nodes corresponding to rows in the table at http://www.ncbi.nlm.nih.gov/gene/4131#interactions
x <- getNodeSet(doc, "//Gene-commentary_heading[.='Interactions']/../Gene-commentary_comment/Gene-commentary" )
length(x)
[1] 64
x[1]
x[50]
Try the easy stuff first
xmlToDataFrame(x[1:4])
  Gene-commentary_type Gene-commentary_text  Gene-commentary_refs Gene-commentary_source Gene-commentary_comment
1 18                   Affinity Capture-MS   24457600             BioGRID110304BioGRID   255BioGRID110304255GeneID8726EEDBioGRID114265
2 18                   Reconstituted Complex 20195357             BioGRID110304BioGRID   255BioGRID110304255GeneID2353FOSBioGRID108636
3 18                   Reconstituted Complex 20195357             BioGRID110304BioGRID   255BioGRID110304255GeneID1936EEF1DBioGRID108256
4 18                   Affinity Capture-MS   2345592220562859     BioGRID110304BioGRID   255BioGRID110304255GeneID6789STK4BioGRID112665
  Gene-commentary_create-date Gene-commentary_update-date
1 2014461120                  201410513330
2 201312810490                201410513330
3 201312810490                201410513330
4 20137710360                 201410513330
Some tags like text, refs, source, and dates should be easy to parse
sapply(x, function(x) paste( xpathSApply(x, ".//PubMedId", xmlValue), collapse=", "))
I'm not sure about the comments or how Products, Interactants and Other Genes listed in the table are stored in the XML, but I get one or three symbols and three ids for each node here.
sapply(x, function(x) paste( xpathSApply(x, ".//Gene-commentary_comment//Other-source_anchor", xmlValue), collapse=" + "))
sapply(x, function(x) paste( xpathSApply(x, ".//Gene-commentary_comment//Object-id_id", xmlValue), collapse=" + "))
Finally, since I think Entrez Gene just copies IntAct and BioGRID, you could try those sites too. BioGRID has a really simple REST service, but you have to register for a key.
url <- "http://webservice.thebiogrid.org/interactions?geneList=MAP1B&taxId=9606&includeHeader=TRUE&accesskey=[ your ACCESSKEY ]"
biogrid <- read.delim(url)
dim(biogrid)
[1] 58 24
head(biogrid[, c(8:9,12)])
  Official.Symbol.Interactor.A Official.Symbol.Interactor.B Experimental.System
1 ANP32A                       MAP1B                        Two-hybrid
2 MAP1B                        ANP32A                       Two-hybrid
3 RASSF1                       MAP1B                        Affinity Capture-Western
4 RASSF1                       MAP1B                        Two-hybrid
5 ANP32A                       MAP1B                        Affinity Capture-Western
6 GAN                          MAP1B                        Affinity Capture-Western
Environment: Win 7; Python 2.7.6
Hello all. I need to extract some pieces of text from a string that looks like this:
“C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/4Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.#54500RPMC-60150ccGasEngineCylinder:4VerticalInlineBore:1Stroke:1Cycle:4Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/16Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:9Weight:4LBS1.75H.P.#65200RPM”
What I want to extract:
I. Combinations of 1 letter + 3 digits, joined by ‘-’, such as C-603, K-720, C-606, etc.
II. Runs of 5 consecutive digits, such as 45256, 54500, 60150, 65200, etc.
My idea is to:
slice the string into individual characters, like ‘C’, ‘-’, ‘6’, ‘0’, ‘3’, … ‘R’, ‘P’, ‘M’
combine them into short overlapping groups, like ‘C-60’, ‘-603’, ‘603W’… and ‘C-603W’, ‘-603W’ , ‘603Wa’
pick out the ones that fit criteria I and II
Does this sound like a workable approach? If so, what commands can I use for each step?
Thanks.
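The sliding-window idea described above can be sketched in a few lines. This is only an illustration on a shortened, made-up sample string; it assumes both criteria are checked on 5-character windows, since a letter, a dash and three digits also add up to 5 characters.

```python
text = "C-603Wall45256CC"  # shortened, made-up sample

# All 5-character windows, one starting at each position
windows = [text[i:i + 5] for i in range(len(text) - 4)]

# Criterion I: one letter, '-', then three digits
codes = [w for w in windows
         if w[0].isalpha() and w[1] == "-" and w[2:].isdigit()]

# Criterion II: five digits in a row
numbers = [w for w in windows if w.isdigit()]

print(codes, numbers)
```

This works, but it inspects every window by hand; a pattern-matching tool does the same scan with less code.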
Going with regular expressions is one way to do it:
>>> data = '''C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/4Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.#54500RPMC-60150ccGasEngineCylinder:4VerticalInlineBore:1Stroke:1Cycle:4Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/16Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:9Weight:4LBS1.75H.P.#65200RPM'''
>>> import re
>>> one_letter_three_numbers = re.compile(r'.\-\d{3}', re.IGNORECASE)
>>> re.findall(one_letter_three_numbers, data)
['C-603', '1-111', 'K-720', 'C-601', 'L-233', 'c-609', 'C-600', 'P-305', 'C-606', 'J-142']
>>> five_continuous = re.compile(r'\d{5}', re.IGNORECASE)
>>> re.findall(five_continuous, data)
['45256', '54500', '60150', '65200']
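Note that the '.' in the first pattern matches any character, which is why '1-111' appears in the result, and that 'C-601' is the front of 'C-60150'. A possible stricter variant is sketched below. It assumes that only ASCII letters count as a "letter" and that a match should not be part of a longer digit run; the sample string is a shortened stand-in for the question's data.

```python
import re

# Shortened sample modeled on the question's string
data = "C-603WallWizard45256CCBore:1-111Height:6Width:K-720Cooling#54500RPMC-60150cc"

# One ASCII letter, a dash, exactly three digits (no fourth digit following)
letter_dash_digits = re.compile(r"[A-Za-z]-\d{3}(?!\d)")

# Exactly five digits, with no digit directly before or after
five_digits = re.compile(r"(?<!\d)\d{5}(?!\d)")

codes = letter_dash_digits.findall(data)
numbers = five_digits.findall(data)
print(codes)
print(numbers)
```

Whether 'C-60150' should count as a letter-dash-digits code at all is a judgment call about the data; dropping the (?!\d) lookahead restores the original behaviour for that case.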