Hello, I would like to remove the time from this list of dates generated by an API:
['2020-07-31 00:00:00.000', '2020-04-30 04:00:00.000', '2020-01-28 05:00:00.000', '2019-10-30 04:00:00.000', '2019-07-30 04:00:00.000', '2019-04-30 04:00:00.000', '2019-01-29 05:00:00.000']
I want the list to look like this:
['2020-07-31', '2020-04-30', '2020-01-28', '2019-10-30', '2019-07-30', '2019-04-30', '2019-01-29']
The thing is I have no idea how to do this task and would like some help.
You can split each string on whitespace and keep the first piece, using a list comprehension:
dates = [date.split()[0] for date in dates]
Here is how I would do it, assuming that orig is your original list:
new_list = [item.split()[0] for item in orig]
This builds a new list from the original by splitting each item on whitespace and taking the first piece of each split.
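If you would rather have malformed entries fail loudly, parsing each string is a slightly stricter alternative to splitting. This is only a minimal sketch, assuming every entry follows the `YYYY-MM-DD HH:MM:SS.mmm` pattern shown in the question; the name `strip_time` is made up for illustration.

```python
from datetime import datetime

def strip_time(stamps):
    """Return the date part of each 'YYYY-MM-DD HH:MM:SS.mmm' string."""
    # strptime raises ValueError on input that does not match the pattern,
    # whereas split()[0] would silently pass it through.
    return [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f").date().isoformat()
            for s in stamps]

dates = ['2020-07-31 00:00:00.000', '2020-04-30 04:00:00.000', '2020-01-28 05:00:00.000']
print(strip_time(dates))  # ['2020-07-31', '2020-04-30', '2020-01-28']
```

For data that is known to be clean, the split-based one-liners above are simpler and do the same job.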
I would like to use something like a VLOOKUP/map function in Python.
I have only part of the full name of some companies, and I would like to know whether each company is in the dataset, as in the following example.
Thank you
I can recreate the results by checking one list against another, but it's not clear what your match criteria are. "john usa" is a successful match with "aviation john" on the basis that "john" appears in both. But would "john usa" constitute a match with "usa mark sas", since "usa" appears in both? What about hyphens, commas, etc.?
It would help if this was cleared up.
In any case, I hope the following will help. Good luck:
import pandas as pd

# Create two lists of records based on the existing dataframes.
check_list = list(df_check.to_records(index=False))
full_list = list(df_full.to_records(index=False))
# Create a set - entries in a set are unique.
results = set()
for check in check_list:  # for each record to check...
    # Take the first column and split it into its words, using space as a delimiter.
    for search_word in check[0].split(" "):
        # Is the word a substring of any of the records in the full list? True or False.
        found = any(search_word in rec[0] for rec in full_list)
        # Add the record we checked to the set with the result (the set avoids duplicate entries).
        results.add((check[0], found))
# Build a dataframe based on the results (a set must be converted to a list first).
df_results = pd.DataFrame(list(results), columns=["check", "found"])
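Since the match criteria are ambiguous, it may help to see one possible rule spelled out. The sketch below is pure Python and matches on whole words rather than substrings; the sample names and the helper `shares_word` are hypothetical and not taken from the asker's data.

```python
def shares_word(name, candidates):
    """True if any whole word of `name` is also a whole word of some candidate."""
    words = set(name.lower().split())
    # Set intersection is non-empty when at least one word is shared.
    return any(words & set(c.lower().split()) for c in candidates)

full = ["aviation john", "usa mark sas"]
check = ["john usa", "peter", "mark"]
results = {name: shares_word(name, full) for name in check}
print(results)  # {'john usa': True, 'peter': False, 'mark': True}
```

Under this rule "john usa" matches both example records, which is exactly the kind of case the question needs to settle.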
df1['in DATASET'] = df1['NAME'].isin(df2['FULL DATASET'])
Given a list containing dates as part of a string:
['2021/08/01/EUR_USD.json',
'2021/08/02/EUR_USD.json',
'2021/08/03/EUR_USD.json',
'2021/08/04/EUR_USD.json',
'2021/08/05/EUR_USD.json',
'2021/08/06/EUR_USD.json',
'2021/08/08/EUR_USD.json',
'2021/08/09/EUR_USD.json',
'2021/08/10/EUR_USD.json',
'2021/08/11/EUR_USD.json',
'2021/08/12/EUR_USD.json',
'2021/08/13/EUR_USD.json',
'2021/08/15/EUR_USD.json']
I want to filter and return only the date and only for Sundays.
This can be done with a list comprehension as follows but I would be interested in other ways of doing this:
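from datetime import datetime

def dayOfWeek(l):
    # Keep the date part of each path, but only for Sundays (weekday() == 6).
    subset = ['-'.join(i.split("/")[0:3])
              for i in l
              if datetime.strptime('-'.join(
                  i.split("/")[0:3]), '%Y-%m-%d').weekday() == 6]
    return subset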
['2021-08-01', '2021-08-08', '2021-08-15']
A list comprehension is fine, but another way is to use the filter builtin:
list(filter(lambda x:datetime.strptime(x.rsplit('/', 1)[0], '%Y/%m/%d').weekday()==6, l))
['2021/08/01/EUR_USD.json', '2021/08/08/EUR_USD.json', '2021/08/15/EUR_USD.json']
There are other things you can do:
Instead of splitting and joining the strings, you can use rsplit with maxsplit=1 and take the first item of the result.
Converting the /-delimited date string to a --delimited one with '-'.join(i.split("/")[0:3]) and '%Y-%m-%d' is not needed for the weekday check; strptime can parse the original string directly with '%Y/%m/%d'.
You can use the calendar module with an assignment expression for a shorter comprehension:
import calendar as cl, re
l = ['2021/08/01/EUR_USD.json', '2021/08/02/EUR_USD.json', '2021/08/03/EUR_USD.json', '2021/08/04/EUR_USD.json', '2021/08/05/EUR_USD.json', '2021/08/06/EUR_USD.json', '2021/08/08/EUR_USD.json', '2021/08/09/EUR_USD.json', '2021/08/10/EUR_USD.json', '2021/08/11/EUR_USD.json', '2021/08/12/EUR_USD.json', '2021/08/13/EUR_USD.json', '2021/08/15/EUR_USD.json']
r = ['-'.join(d) for i in l if cl.weekday(*map(int, (d:=re.findall(r'\d+', i)))) == 6]
Output:
['2021-08-01', '2021-08-08', '2021-08-15']
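If readability matters more than brevity, the same filter can be written as a plain loop over `datetime.date` objects. This is a rough sketch, assuming every path starts with `YYYY/MM/DD/` as in the question; the function name `sundays` is made up for illustration.

```python
from datetime import date

def sundays(paths):
    """Return ISO dates for the paths whose leading Y/M/D falls on a Sunday."""
    out = []
    for p in paths:
        year, month, day = map(int, p.split("/")[:3])
        d = date(year, month, day)
        if d.weekday() == 6:  # Monday is 0, Sunday is 6
            out.append(d.isoformat())
    return out

paths = ['2021/08/01/EUR_USD.json', '2021/08/02/EUR_USD.json', '2021/08/08/EUR_USD.json']
print(sundays(paths))  # ['2021-08-01', '2021-08-08']
```

Building the `date` once means the same object serves both the weekday check and the output formatting.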
How can I select rows from a dataframe using a list of words as a reference?
E.g. I have a dataframe, df_business, and each item in the last column is a string containing comma-separated categories, like this:
categories: "Restaurants, Burgers, Coffee & Tea, Fast Food, Food"
I tried this, but it only gives me the rows for the businesses containing ONLY the word coffee in their categories:
bus_int = df_business.loc[(df_business['categories'].isin(['Coffee']))]
How can I get the businesses containing my word even when it's present among others, as shown above?
What you want is the contains method:
bus_int = df_business.loc[df_business.categories.str.contains('Coffee', regex=False)]
(The default is to treat the supplied argument as a regular expression, which you don't need if you're simply looking for a substring like this.)
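To make the difference concrete, here is a small sketch comparing the two approaches. The three rows are invented sample data, and only the `categories` column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df_business = pd.DataFrame({
    "name": ["A", "B", "C"],
    "categories": [
        "Restaurants, Burgers, Coffee & Tea, Fast Food, Food",
        "Coffee",
        "Bars, Nightlife",
    ],
})

# isin compares the whole cell value, so only the exact string "Coffee" matches.
exact = df_business[df_business["categories"].isin(["Coffee"])]
# str.contains looks for the substring anywhere in the cell.
substring = df_business[df_business["categories"].str.contains("Coffee", regex=False)]

print(exact["name"].tolist())      # ['B']
print(substring["name"].tolist())  # ['A', 'B']
```

If the column can contain missing values, passing `na=False` to `str.contains` is probably what you want, so that those rows are treated as non-matches.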
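Just use .isin(list_name), passing the list of strings you want:
Example:
fruits = ['apple', 'mango', 'banana']  # creating a list (avoid naming it `list`, which shadows the builtin)
df[df['Fruits_name'].isin(fruits)]
# this will find all the rows whose value is exactly one of these 3 fruit names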
I have a dataframe named final with a column named CleanedText that holds user reviews (text). Each review spans multiple lines. I have done preprocessing and removed all commas, full stops, HTML tags, etc., so the data looks like this: Review1 (row 1): pizza extremely delicious delivery late. I have 10000 such reviews (one per row). Now I want a list of lists, where every review is its own list of words. Ex: [['Pizza','extremely','delicious','delivery','late'],['Tommatos','rotten'......[]...[]].
This assumes you've truly stripped the text of all of the 'fun' stuff. Give this a shot.
fulltext = 'stuff with many\nlines and words'
text_array = [line.split() for line in fulltext.splitlines()]
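If the reviews are already in the dataframe, the same idea can probably be applied directly to the column. A minimal sketch, assuming `final` has a string column `CleanedText` with one review per row; the two sample reviews are invented.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe.
final = pd.DataFrame({
    "CleanedText": [
        "pizza extremely delicious delivery late",
        "tomatoes rotten",
    ]
})

# str.split() with no arguments splits each review on any whitespace,
# including newlines; tolist() turns the result into a list of lists.
tokens = final["CleanedText"].str.split().tolist()
print(tokens)
# [['pizza', 'extremely', 'delicious', 'delivery', 'late'], ['tomatoes', 'rotten']]
```

Because the split is on any whitespace, a review that spans several lines still ends up as a single list of words.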
Hi everyone, I am using Python to find the origin of a word, and the result comes back as a list. I want to separate or split it at the comma (,).
origin=ety.origins(wordtodo)
print(origin)
>>[Word(how, Middle English (1100-1500) [enm]), Word(haugr, Old Norse [non])]
From the result, I want the text inside the (...) brackets of each item, stored in separate variables,
e.g.
forigin=(how, Middle English (1100-1500) [enm])
and
sorigin=(haugr, Old Norse [non])
forigin = repr(origin[0])[4:]
sorigin = repr(origin[1])[4:]
Author of ety here 🙂
ety.origins returns a list of Word objects.
Get properties of a Word with specific fields .word and .language, or use .pretty to get a string version of the word/language in format {word} ({lang}) - e.g. 'how (Middle English (1100-1500))'