I have a data frame like
query
-----------
[]
[(apple,10),(orange,15)]
[(apple,2),(orange,5)]
Python is reading this column as strings instead of lists, because when I do
df['query'].apply(lambda x: len(x))
I get 2 instead of 0 for the first row. Is there a way to convert this to a list?
You can use apply():
df['query'] = df['query'].apply(lambda x: x.strip('[]').split(','))
or, by list comprehension:
df['query'] = [x.strip('[]').split(',') for x in df['query']]
or, use ast.literal_eval():
import ast
df['query'] = df['query'].apply(lambda x: ast.literal_eval(x))
or,
df['query'] = df['query'].apply(ast.literal_eval)
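One caveat with the approaches above: if the strings really look like the sample (names such as apple without quotes), ast.literal_eval() raises a ValueError, and the strip/split version turns '[]' into [''] (length 1, not 0). A minimal sketch of a regex-based parser, assuming every entry has the form (name,integer); the helper name parse_query is made up for this example:

```python
import re

# One "(name,count)" pair; assumes the count is always a non-negative integer.
PAIR = re.compile(r"\(\s*([^,()]+?)\s*,\s*(\d+)\s*\)")

def parse_query(text):
    """Turn '[(apple,10),(orange,15)]' into [('apple', 10), ('orange', 15)]."""
    return [(name, int(count)) for name, count in PAIR.findall(text)]

rows = ["[]", "[(apple,10),(orange,15)]", "[(apple,2),(orange,5)]"]
parsed = [parse_query(row) for row in rows]
print([len(p) for p in parsed])  # [0, 2, 2]
```

With a data frame this would be applied as df['query'] = df['query'].apply(parse_query).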
I have a list of files like this:
my_list=['l.txt','PPT_6_202008062343HLC.txt','PPT_6_202008070522HLC.txt','PPT_12_202008062343HLC.txt','PPT_12_202008070522HLC.txt']
and I want a final list that keeps only the latest of the files that begin with PPT_6 and PPT_12, plus all the other items, like this:
final_list=
['PPT_6_202008070522HLC.txt', 'PPT_12_202008070522HLC.txt', 'l.txt']
right now I'm doing this:
from datetime import datetime
now = datetime.now()
new_arc=[]
time_6=[]
time_12=[]
for i in my_list:
    if i[4:5]=='6':
        time_6.append(i)
    elif i[4:5]=='1':
        time_12.append(i)
    else:
        new_arc.append(i)
time_6 = [max(t for t in time_6 if datetime.strptime(t[-15:-3], '%Y%m%d%H%M') < now)]
time_12 = [max(t for t in time_12 if datetime.strptime(t[-15:-3], '%Y%m%d%H%M') < now)]
final_list=time_6+time_12+new_arc
Is there a better way of doing this?
Because the timestamp in these filenames is written as YYYYMMDDHHMM, you don't need datetime functions: alphabetical order is enough.
You can filter out all items matching the two prefixes and then append the most recent of each, which is the (alphabetically) maximum element.
p1 = [x for x in my_list if x.startswith("PPT_6")]
p2 = [x for x in my_list if x.startswith("PPT_12")]
result = [x for x in my_list if x not in p1 and x not in p2]
result.append(max(p1))
result.append(max(p2))
print(result)
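To see why plain string comparison is enough here, the following sketch compares max() on the raw names with max() on parsed timestamps. It assumes each name contains exactly one 12-digit YYYYMMDDHHMM run and that the names being compared share the same prefix; the helper stamp is made up for this example.

```python
import re
from datetime import datetime

names = [
    "PPT_6_202008062343HLC.txt",
    "PPT_6_202008070522HLC.txt",
]

def stamp(name):
    # Pull out the 12-digit timestamp and parse it as a real datetime.
    digits = re.search(r"\d{12}", name).group()
    return datetime.strptime(digits, "%Y%m%d%H%M")

by_name = max(names)             # plain alphabetical comparison
by_time = max(names, key=stamp)  # comparison on the parsed timestamp
print(by_name == by_time)
```

Fixed-width, zero-padded timestamps ordered from year down to minute sort the same way as the dates they encode, which is why the two results agree.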
Since the file names already sort in date order, you can simply sort them, then group by the prefix (PPT_6 and PPT_12), and finally take the first item from each group.
from itertools import groupby
# get the prefix up to the nth underscore
def split_nth(text, n):
    grp = text.split('_')
    return '_'.join(grp[:n])
my_list =['l.txt','PPT_6_202008062343HLC.txt','PPT_6_202008070522HLC.txt',
'PPT_12_202008062343HLC.txt','PPT_12_202008070522HLC.txt']
sorted_list = sorted(my_list[1:], reverse=True)
groups = groupby(sorted_list, key=lambda x: split_nth(x, 2))
result = [next(v) for _, v in groups]
result.append(my_list[0])
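If the prefixes are not known in advance, or the non-PPT items are not always first in the list, a single pass with a dictionary is another option. This is only a sketch: the function name latest_per_prefix and the marker parameter are made up, and it assumes the prefix is everything up to the second underscore.

```python
def latest_per_prefix(names, marker="PPT_"):
    """Keep the alphabetically largest name per prefix, plus all other names."""
    latest = {}   # prefix -> most recent file name seen so far
    others = []   # names that do not start with the marker
    for name in names:
        if name.startswith(marker):
            prefix = "_".join(name.split("_")[:2])  # e.g. "PPT_12"
            if prefix not in latest or name > latest[prefix]:
                latest[prefix] = name
        else:
            others.append(name)
    return list(latest.values()) + others

files = ['l.txt', 'PPT_6_202008062343HLC.txt', 'PPT_6_202008070522HLC.txt',
         'PPT_12_202008062343HLC.txt', 'PPT_12_202008070522HLC.txt']
print(latest_per_prefix(files))
```

Dictionaries preserve insertion order (Python 3.7+), so the prefixes come out in the order they first appear in the input.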
The best I could come up with was this:
import re
my_list = [
'l.txt','PPT_6_202008062343HLC.txt','PPT_6_202008070522HLC.txt',
'PPT_12_202008062343HLC.txt','PPT_12_202008070522HLC.txt'
]
patterns = (re.compile("PPT_6_"), re.compile("PPT_12_"))
# max() picks the latest file, since the timestamps sort alphabetically
final_list = [max(filter(pattern.match, my_list))
              for pattern in patterns]
# keep every name that does not start with "PPT_"
final_list += list(filter(re.compile(r"(?!PPT_)").match, my_list))
Depending on how many file names you're going to be working with, I don't think it should be too bad.
I have a list of strings. Each string contains the same two markers, and I would like to extract the text between those two markers for each string in the list.
Example:
The markers are 'XXX' and 'YYY', so I want to extract 78665786 and 6866 from
['XXX78665786YYYjajk', 'XXX6866YYYz6767'....]
You can just loop over your list and grab the substring. You can do something like:
import re
my_list = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
output = []
for item in my_list:
    output.append(re.search('XXX(.*)YYY', item).group(1))
print(output)
Output:
['78665786', '6866']
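Two things to watch with the pattern above: .* is greedy, so a string containing YYY twice is matched up to the last one, and re.search() returns None when a marker is missing, which makes .group(1) raise an AttributeError. A hedged variant with a non-greedy pattern that skips strings without both markers (the function name between_markers is made up for this example):

```python
import re

# Non-greedy: stop at the first 'YYY' after 'XXX'.
BETWEEN = re.compile(r"XXX(.*?)YYY")

def between_markers(items):
    """Return the text between the markers for every item that has both."""
    found = []
    for item in items:
        match = BETWEEN.search(item)
        if match:  # skip strings that lack one of the markers
            found.append(match.group(1))
    return found

sample = ['XXX78665786YYYjajk', 'XXX6866YYYz6767', 'no markers', 'XXX1YYYaYYY']
print(between_markers(sample))
```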
import re
l = ['XXX78665786YYYjajk', 'XXX6866YYYz6767'....]
l = [re.search(r'XXX(.*)YYY', i).group(1) for i in l]
This should work
Another solution would be:
import re
test_string=['XXX78665786YYYjajk','XXX78665783336YYYjajk']
int_val=[int(re.search(r'\d+', x).group()) for x in test_string]
The split() method splits a string into parts at a separator.
list1 = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
list2 = []
for i in list1:
    # take the text after 'XXX', then the part of it before 'YYY'
    after_start = i.split("XXX")[1]
    list2.append(after_start.split("YYY")[0])
print(list2)
The extracted values are saved in list2.
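A similar regex-free idea can be written with str.partition(), which also reports whether each marker was found. This is a sketch; the function name between and its default markers are chosen for this example.

```python
def between(text, start="XXX", end="YYY"):
    """Return the text between start and end, or None if either is missing."""
    _, found_start, rest = text.partition(start)
    middle, found_end, _ = rest.partition(end)
    if found_start and found_end:
        return middle
    return None

values = [between(s) for s in ['XXX78665786YYYjajk', 'XXX6866YYYz6767']]
print(values)
```

partition() returns an empty separator when the marker is absent, so the missing-marker case can be handled without an exception.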
I am trying to split a list into individual lists whenever a specific character or group of characters occurs.
E.g.
Main_list = [ 'abcd 1233','cdgfh3738','hryg21','**L**','gdyrhr657','abc31637','**R**','7473hrtfgf'...]
I want to break this list up and start a new sublist whenever I encounter an 'L' or an 'R'.
Desired Result:
sublist_1 = ['abcd 1233','cdgfh3738','hryg21']
sublist_2 = ['gdyrhr657','abc31637']
sublist_3 = ['7473hrtfgf'...]
Is there a built-in function or a quick way to do this?
Edit: I do not want the delimiter to be in the list.
Use a dictionary for a variable number of variables.
In this case, you can use itertools.groupby to efficiently separate your lists:
L = ['abcd 1233','cdgfh3738','hryg21','**L**',
'gdyrhr657','abc31637','**R**','7473hrtfgf']
from itertools import groupby
# define separator keys
def split_condition(x):
    return x in {'**L**', '**R**'}
# define groupby object
grouper = groupby(L, key=split_condition)
# convert to dictionary via enumerate
res = dict(enumerate((list(j) for i, j in grouper if not i), 1))
print(res)
{1: ['abcd 1233', 'cdgfh3738', 'hryg21'],
2: ['gdyrhr657', 'abc31637'],
3: ['7473hrtfgf']}
Consider using one of the many helpful tools from a third-party library, e.g. more_itertools.split_at:
Given
import more_itertools as mit
lst = [
"abcd 1233", "cdgfh3738", "hryg21", "**L**",
"gdyrhr657", "abc31637", "**R**",
"7473hrtfgf"
]
Code
result = list(mit.split_at(lst, pred=lambda x: set(x) & {"L", "R"}))
Demo
sublist_1, sublist_2, sublist_3 = result
sublist_1
# ['abcd 1233', 'cdgfh3738', 'hryg21']
sublist_2
# ['gdyrhr657', 'abc31637']
sublist_3
# ['7473hrtfgf']
Details
The more_itertools.split_at function splits an iterable at positions that meet a special condition. The conditional function (predicate) happens to be a lambda function, which is equivalent to and substitutable with the following regular function:
def pred(x):
    a = set(x)
    b = {"L", "R"}
    return a.intersection(b)
Whenever the characters of string x intersect with L or R, the predicate returns a non-empty (truthy) set, and the split occurs at that position.
Install this package at the command line via > pip install more_itertools.
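Note that the predicate above treats any string containing an uppercase L or R as a delimiter, not only '**L**' and '**R**'. If installing a package is not an option, the same idea can be sketched with the standard library alone. The function split_on below is made up for this example, matches delimiters exactly, and is not claimed to reproduce every detail of split_at:

```python
def split_on(items, delimiters):
    """Split items into chunks at every element found in delimiters."""
    chunks = []
    current = []
    for item in items:
        if item in delimiters:
            chunks.append(current)  # close the chunk, drop the delimiter
            current = []
        else:
            current.append(item)
    chunks.append(current)          # whatever follows the last delimiter
    return chunks

lst = ["abcd 1233", "cdgfh3738", "hryg21", "**L**",
       "gdyrhr657", "abc31637", "**R**", "7473hrtfgf"]
result = split_on(lst, {"**L**", "**R**"})
print(result)
```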
@Polyhedronic, you can also try this.
>>> import re
>>> Main_list = [ 'abcd 1233','cdgfh3738','hryg21','**L**','gdyrhr657','abc31637','**R**','7473hrtfgf']
>>>
>>> s = ','.join(Main_list)
>>> s
'abcd 1233,cdgfh3738,hryg21,**L**,gdyrhr657,abc31637,**R**,7473hrtfgf'
>>>
>>> items = re.split(r'\*\*R\*\*|\*\*L\*\*', s)
>>> items
['abcd 1233,cdgfh3738,hryg21,', ',gdyrhr657,abc31637,', ',7473hrtfgf']
>>>
>>> output = [[a for a in item.split(',') if a] for item in items]
>>> output
[['abcd 1233', 'cdgfh3738', 'hryg21'], ['gdyrhr657', 'abc31637'], ['7473hrtfgf']]
>>>
>>> sublist_1 = output[0]
>>> sublist_2 = output[1]
>>> sublist_3 = output[2]
>>>
>>> sublist_1
['abcd 1233', 'cdgfh3738', 'hryg21']
>>>
>>> sublist_2
['gdyrhr657', 'abc31637']
>>>
>>> sublist_3
['7473hrtfgf']
>>>
I have a list like the one below. From it I have to filter the tables whose names begin with 'test:SF.AcuraUsage_' (string matching):
test:SF.AcuraUsage_20150311
test:SF.AcuraUsage_20150312
test:SF.AcuraUsage_20150313
test:SF.AcuraUsage_20150314
test:SF.AcuraUsage_20150315
test:SF.AcuraUsage_20150316
test:SF.AcuraUsage_20150317
test:SF.ClientUsage_20150318
test:SF.ClientUsage_20150319
test:SF.ClientUsage_20150320
test:SF.ClientUsage_20150321
I am using this for loop, but I'm not sure why it does not work:
for x in list:
    if(x 'test:SF.AcuraUsage_'):
        print x
I tried this out:
for x in list:
    alllist = x
vehiclelist = [x for x in alllist if x.startswith('geotab-bigdata-test:StoreForward.VehicleInfo')]
But I still get the error 'dictionary object has no attribute startswith'.
You shouldn't name your list list, since that shadows the built-in type list.
But if you'd like to filter that list using Python, consider using this list comprehension:
acura = [x for x in list if x.startswith('test:SF.AcuraUsage')]
Then, if you'd like to output it:
for x in acura:
    print(x)
List comprehensions are good for that.
Get a list with all the items that begin with 'test:SF.AcuraUsage_':
new_list = [x for x in list if x.startswith('test:SF.AcuraUsage_')]
Or the items that do not begin with 'test:SF.AcuraUsage_':
new_list = [x for x in list if not x.startswith('test:SF.AcuraUsage_')]
Using the re module:
import re
for x in list:
    # escape the dot so it matches a literal '.'
    ret = re.match(r'test:SF\.AcuraUsage_(.*)', x)
    if ret:
        print(ret.group())
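The error message quoted in the question suggests that at least some items are dictionaries rather than strings, in which case startswith() has to be called on a value inside each dictionary. The sketch below assumes, purely for illustration, that the table name is stored under a key called "id"; the real key is not given in the question, and the names tables and table_name are made up.

```python
PREFIX = "test:SF.AcuraUsage_"

# Hypothetical data: the "id" key is an assumption, not taken from the question.
tables = [
    {"id": "test:SF.AcuraUsage_20150311"},
    {"id": "test:SF.ClientUsage_20150318"},
    "test:SF.AcuraUsage_20150312",
]

def table_name(item):
    """Return the table name whether the item is a dict or a plain string."""
    if isinstance(item, dict):
        return item.get("id", "")
    return str(item)

acura = [table_name(t) for t in tables if table_name(t).startswith(PREFIX)]
print(acura)
```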
I have a question about printing the values in a list.
import time
strings = time.strftime("%Y,%m,%d")
t = strings.split(',')
date = [int(x) for x in t]
print date
The result is
[2016, 5, 15]
But I want to print the values in date like this:
20160515
How can I fix it?
What's wrong with doing it like this:
>>> strings = time.strftime("%Y%m%d")
>>> strings
'20160515'
You just need to change your code:
import time
strings = time.strftime("%Y%m%d")  # commas removed from the format
print(strings)
Why not just do:
time.strftime("%Y%m%d")
On the other hand, if you are just looking for a way to concatenate elements of a list, use join:
In [110]: s = time.strftime("%Y,%m,%d")
In [111]: sl = s.split(',')
In [112]: ''.join(sl)
Out[112]: '20160515'
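If you already have the list of integers from the question, joining str(x) for each element would give '2016515', because converting to int drops the leading zeros. A small sketch that pads each part back to a fixed width (the variable name stamp is chosen for this example):

```python
date = [2016, 5, 15]  # the list of ints from the question

year, month, day = date
# Zero-pad month and day to two digits so the result is always 8 characters.
stamp = f"{year:04d}{month:02d}{day:02d}"
print(stamp)  # 20160515
```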