Compare a list with a date in pandas in Python

Hello I have this list:
b = [[2018-12-14, 2019-01-11, 2019-01-25, 2019-02-08, 2019-02-22, 2019-07-26],
     [2018-06-14, 2018-07-11, 2018-07-25, 2018-08-08, 2018-08-22, 2019-01-26],
     [2017-12-14, 2018-01-11, 2018-01-25, 2018-02-08, 2018-02-22, 2018-07-26]]
dtype: datetime64[ns]
and I want to know if it's possible to compare this list of dates with another date. I am doing it like this:
r = df.loc[(b[1] > vdate)]
with:
vdate = dt.date(2018, 9, 19)
The output is correct because it selects the values that satisfy the condition. But the problem is that I want to do that for all the list values. Something like:
r = df.loc[(b > vdate)] # Without [1]
but this raises an error, as I expected.
I tried a for loop and it seems to work, but I am not sure:
g = []
for i in range(len(b)):
    r = df.loc[(b[i] > vdate)]
    g.append(r)
Thank you so much for your time and any help would be perfect.

One may use the apply function as stated by @Joseph Developer, but a simple list comprehension does not require you to write a separate function. The following will give you a list of booleans telling you whether or not each date is greater than vdate:
is_after_b = [x > vdate for x in b]
And if you want to include this directly in your DataFrame you may write:
df['is_after_b'] = [x > vdate for x in df.b]
This assumes that b is a column of df, which, by the way, guarantees that the length of b matches the length of your DataFrame.
EDIT
I did not consider that b was a list of lists; you need to flatten b first:
flat_b = [item for sublist in b for item in sublist]
And you can now use :
is_after_b = [x > vdate for x in flat_b]
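For completeness, here is a minimal, self-contained sketch of the flatten-then-compare approach using plain datetime.date objects; the sample dates and variable names are assumptions, not the asker's actual data:
import datetime as dt

# assumed sample data mirroring the shape of the asker's b (a list of lists of dates)
b = [[dt.date(2018, 12, 14), dt.date(2019, 1, 11), dt.date(2019, 7, 26)],
     [dt.date(2018, 6, 14), dt.date(2018, 7, 11), dt.date(2019, 1, 26)]]
vdate = dt.date(2018, 9, 19)

# flatten the nested list, then compare every date with vdate
flat_b = [item for sublist in b for item in sublist]
is_after_b = [x > vdate for x in flat_b]

# keep only the dates that actually come after vdate
after_vdate = [x for x in flat_b if x > vdate]
print(is_after_b)
print(after_vdate)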

If you want to go through the entire list, just use the following method:
ds['new_list'] = ds['list_dates'].apply(function)
Use the .apply() method to process your list through a function.
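As a concrete illustration of that approach, here is a minimal sketch; ds, list_dates and the sample rows are assumed names and data, not the asker's actual DataFrame:
import datetime as dt
import pandas as pd

vdate = dt.date(2018, 9, 19)

# assumed toy DataFrame whose 'list_dates' column holds a list of dates per row
ds = pd.DataFrame({
    'list_dates': [
        [dt.date(2018, 12, 14), dt.date(2019, 1, 11)],
        [dt.date(2018, 6, 14), dt.date(2018, 7, 11)],
    ]
})

# apply a function to each row's list, comparing every date with vdate
ds['new_list'] = ds['list_dates'].apply(lambda dates: [d > vdate for d in dates])
print(ds)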

Related

Python (iteration problem) with an exercise

The code:
import pandas as pd
import numpy as np
import csv

data = pd.read_csv("/content/NYC_temperature.csv", header=None, names=['temperatures'])
np.cumsum(data['temperatures'])

printcounter = 0
list_30 = [15.22]  # first temperature, i could have also added it by doing: list_30.append(i)[0] since it's every 30 values but doesn't append the first one :)
list_2 = []  # this is for the values of the subtraction (for the second iteration)
for i in data['temperatures']:
    if (printcounter == 30):
        list_30.append(i)
        printcounter = 0
    printcounter += 1
**for x in list_30:
    substract = list_30[x] - list_30[x+1]**
    list_2.append(substraction)
print(max(list_2))
Hey guys! I'm really having trouble with the part in bold:
**for x in list_30:
    substract = list_30[x] - list_30[x+1]**
I'm trying to iterate over the elements and subtract the next element (x+1) from element x, but the following error pops up: TypeError: 'float' object is not iterable. I have also tried to iterate using x instead of list_30[x], but then when I use next(x) I get another error.
for x in list_30: iterates over list_30 and assigns to x the value of each item in the list, not its index.
For your case you would prefer to loop over your list with indexes:
index = 0
while index < len(list_30):
    substract = list_30[index] - list_30[index + 1]
    index += 1
Edit: you will still have a problem when you reach the last element of list_30, because there is no element list_30[last_index + 1], so you should probably stop before the end with while index < len(list_30) - 1:
In case you want both the index and the value, you can do:
for i, v in enumerate(list_30):
    substract = v - list_30[i + 1]
but the first one looks cleaner in my opinion.
If you're trying to find the difference between two adjacent elements of an array (like differentiating it), you should probably use the zip function:
inp = [1, 2, 3, 4, 5]
delta = []
for x0, x1 in zip(inp, inp[1:]):
    delta.append(x1 - x0)
print(delta)
Note that the list of deltas will be one element shorter than the input.
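Putting the pieces together for the asker's case, a minimal sketch could look like the following; the sample temperatures are assumptions, since the real values come from the NYC_temperature CSV:
# assumed sample of the every-30th-reading list; in the real code these
# values are collected from data['temperatures']
list_30 = [15.22, 14.80, 16.05, 13.90]

# pair each element with the next one and subtract, matching the intent of
# list_30[x] - list_30[x+1] without running past the end of the list
list_2 = [x0 - x1 for x0, x1 in zip(list_30, list_30[1:])]

print(list_2)       # differences between consecutive readings
print(max(list_2))  # the largest difference, as in the original print(max(list_2))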

Python How to use multiple for loops in list comprehension with conditions

I am still working on the code below. The code works perfectly fine; I am just trying to reduce the number of lines.
import calendar as c
def solve(first, last):
    weekends = []
    # x = [weekends.append(m) if c.weekday(y,m,1) == 4 and c.weekday(y,m,31) == 6 else 0 for m in [1,3,5,7,8,10,12] for y in range(first,last+1)]
    for y in range(first, last+1):
        for m in [1, 3, 5, 7, 8, 10, 12]:
            if c.weekday(y, m, 1) == 4 and c.weekday(y, m, 31) == 6:
                weekends.append(m)
    return c.month_abbr[weekends[0]], c.month_abbr[weekends[len(weekends)-1]], len(weekends)
When called as solve(2016, 2020):
This code returns the first month of 2016 which has 5 Fridays, Saturdays and Sundays, likewise the last such month of 2020, and how many months satisfy this condition.
So the OUTPUT is: ('Jan', 'May', 5)
The commented-out x = ... part is what I tried, and it returns 0 and None (because of the else statement).
The order of statements in your x = ... is a bit mixed up; your if should filter which values to include, not choose between two alternative values. But also: do not use append in a list comprehension to append to another list! Instead, the list comprehension itself should be your result.
def solve(first, last):
    weekends = [c.month_abbr[m] for y in range(first, last+1)
                for m in [1, 3, 5, 7, 8, 10, 12]
                if c.weekday(y, m, 1) == 4]
    return weekends[0], weekends[-1], len(weekends)
Some minor points I fixed:
get month_abbr directly in list comp instead of twice at the end
-1 in itself is a valid index
the two weekday checks are redundant: every month in the list has 31 days, so if the 1st is a Friday, the 31st (30 days, i.e. 4 weeks and 2 days, later) is always a Sunday
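As a quick sanity check of that redundancy claim, a small sketch with the same calendar module confirms that, for 31-day months, a Friday 1st always implies a Sunday 31st (the year range here is arbitrary):
import calendar as c

# For every 31-day month over a range of years: whenever the 1st is a Friday
# (weekday 4), the 31st should automatically be a Sunday (weekday 6).
for y in range(1990, 2031):
    for m in [1, 3, 5, 7, 8, 10, 12]:
        if c.weekday(y, m, 1) == 4:
            assert c.weekday(y, m, 31) == 6  # never fails
print('the second check is implied by the first')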

Concatenate List in Python

I have these lists, each holding one value. I want to concatenate them, so I used the + operator, but the output is not what I expected. I need to use lists because in some cases I can get more than one result.
Lists:
A = ["F"]
B = ["SZLY"]
C = ["RQTS"]
D = ["19230711"]
Output:
['F']['SZLY']['RQTS']['19230711']
Expected Output:
FSZLYRQTS19230711
Update:
I used the code below to concatenate. I used str() because I want to cast each list's topmost element to a string.
hrk = str(A)+str(B)+str(C)+str(D)
How can I get the expected output?
str on a list produces a representation of the whole list (meant for debugging); it's a bad idea to process that as a string further in your code.
The most pythonic way: use join over the first (and only) item of each of your lists:
A = ["F"]
B = ["SZLY"]
C = ["RQTS"]
D = ["19230711"]
print(["".join(x[0] for x in (A,B,C,D))])
results in:
FSZLYRQTS19230711
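Since the asker mentions that some lists can hold more than one result, a slightly more general sketch (the extra value here is made up for illustration) joins every item of every list:
A = ["F"]
B = ["SZLY", "EXTRA"]  # hypothetical second value, to show the general case
C = ["RQTS"]
D = ["19230711"]

# flatten all the lists in order and join their items into one string
hrk = "".join(item for lst in (A, B, C, D) for item in lst)
print(hrk)  # FSZLYEXTRARQTS19230711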
Try like this,
In [32]: A[0]+B[0]+C[0]+D[0]
Out[32]: 'FSZLYRQTS19230711'
Try:
A[0] + B[0] + C[0] + D[0]
You are trying to access the first element of each list, so you have to access them by index.
What you are currently doing will create a single list with all the elements. Like:
A = ['2414214']
B = ['fefgg']
C = A + B
print(C)
# Will print
['2414214', 'fefgg']

iterate over several collections in parallel

I am trying to create a list of objects (from a class defined earlier) through a loop. The structure looks something like:
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in ticker_symbols:
    stock = Share(i)
    pe = stock.get_price_earnings_ratio()
    ps = stock.get_price_sales()
    stock_object = Company(pe, ps)
    stock_list.append(stock_object)
I would however want to add one more attribute to the Company objects (stock_object) through the loop. The attribute would be a value from another list, like (arbitrary numbers) [5, 10, 20], where the first value would go to the first object, the second to the second object, etc. Is it possible to do something like:
for i, j in ticker_symbols, list2:
    # do stuff
? I could not get this sort of loop to work on my own. Thankful for any help.
I believe that all you have to do is change the for loop.
Instead of "for i in ticker_symbols:" you should loop like
"for i in range(len(ticker_symbols))" and then use the index i to do whatever you want with the second list.
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in range(len(ticker_symbols)):
    stock = Share(ticker_symbols[i])
    pe = stock.get_price_earnings_ratio()
    ps = stock.get_price_sales()
    # And then you can take whatever you need from the second list
    px = list2[i]
    stock_object = Company(pe, ps, px)
    stock_list.append(stock_object)
Some people say that iterating by index is not good practice, but I don't think so, especially if the code works.
Try:
for i, j in zip(ticker_symbols, list2):
Or:
for (k, i) in enumerate(ticker_symbols):
    j = list2[k]
Equivalently:
for index in range(len(ticker_symbols)):
    i = ticker_symbols[index]
    j = list2[index]
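Applied to the asker's loop, the zip variant could look like the sketch below; Share and Company come from the asker's own code, and extra_values is an assumed name for the second list:
# Share and Company are the classes from the question
ticker_symbols = ["AZN", "AAPL", "YHOO"]
extra_values = [5, 10, 20]  # the arbitrary numbers from the question

stock_list = []
# zip pairs each ticker symbol with its corresponding value from the second list
for symbol, px in zip(ticker_symbols, extra_values):
    stock = Share(symbol)
    pe = stock.get_price_earnings_ratio()
    ps = stock.get_price_sales()
    stock_list.append(Company(pe, ps, px))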

Prepare my bigdata with Spark via Python

My quantized data, 100M in size:
('1424411938', [3885, 7898])
('3333333333', [3885, 7898])
Desired result:
(3885, [3333333333, 1424411938])
(7898, [3333333333, 1424411938])
So what I want is to transform the data so that I group 3885 (for example) with all the data[0] values (point ids) that have it. Here is what I did in Python:
def prepare(data):
    result = []
    for point_id, cluster in data:
        for index, c in enumerate(cluster):
            found = 0
            for res in result:
                if c == res[0]:
                    found = 1
            if (found == 0):
                result.append((c, []))
            for res in result:
                if c == res[0]:
                    res[1].append(point_id)
    return result
but when I mapPartitions()'ed the data RDD with prepare(), it seems to do what I want only within the current partition, thus returning a bigger result than desired.
For example, if the 1st record in the start was in the 1st partition and the 2nd in the 2nd, then I would get as a result:
(3885, [3333333333])
(7898, [3333333333])
(3885, [1424411938])
(7898, [1424411938])
How to modify my prepare() to get the desired effect? Alternatively, how to process the result that prepare() produces, so that I can get the desired result?
As you may already have noticed from the code, I do not care about speed at all.
Here is a way to create the data:
data = []
from random import randint
for i in xrange(0, 10):
    data.append((randint(0, 100000000), (randint(0, 16000), randint(0, 16000))))
data = sc.parallelize(data)
You can use a bunch of basic pyspark transformations to achieve this.
>>> rdd = sc.parallelize([(1424411938, [3885, 7898]),(3333333333, [3885, 7898])])
>>> r = rdd.flatMap(lambda x: ((a,x[0]) for a in x[1]))
We used flatMap to get a key-value pair for every item in x[1], changing each record's format to (a, x[0]), where a is each item of x[1]. To understand flatMap better you can look at the documentation.
>>> r2 = r.groupByKey().map(lambda x: (x[0],tuple(x[1])))
We then grouped all key-value pairs by their keys and used the tuple function to convert the resulting iterable to a tuple.
>>> r2.collect()
[(3885, (1424411938, 3333333333)), (7898, (1424411938, 3333333333))]
As you said, you can use [:150] to keep the first 150 elements; I guess this would be the proper usage:
r2 = r.groupByKey().map(lambda x: (x[0],tuple(x[1])[:150]))
I tried to be as explanatory as possible. I hope this helps.
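For reference, a minimal end-to-end sketch of this approach as a standalone script might look like the following; the local SparkContext setup and the sample records are assumptions added so it runs on its own:
from pyspark import SparkContext

sc = SparkContext('local[*]', 'group-clusters')  # assumed local setup

# sample records in the asker's (point_id, [cluster_ids]) format
rdd = sc.parallelize([
    (1424411938, [3885, 7898]),
    (3333333333, [3885, 7898]),
])

# emit one (cluster_id, point_id) pair per cluster id, then group by cluster id
pairs = rdd.flatMap(lambda x: ((a, x[0]) for a in x[1]))
grouped = pairs.groupByKey().map(lambda x: (x[0], tuple(x[1])[:150]))  # cap at 150 ids per cluster

print(grouped.collect())
# e.g. [(3885, (1424411938, 3333333333)), (7898, (1424411938, 3333333333))]

sc.stop()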
