After trying to scrape data from Twitter using snscrape, I am unable to get the data of tweets posted within the past hour only.
import pandas as pd
import snscrape.modules.twitter as sntwitter
from datetime import datetime, time
from datetime import timedelta
now = datetime.utcnow()
since = now - timedelta(hours=1)
since_str = since.strftime('%Y-%m-%d %H:%M:%S.%f%z')
until_str = now.strftime('%Y-%m-%d %H:%M:%S.%f%z')
# Query tweets with hashtag #SOSREX in the last one hour
query = '#SOSREX Since:' + since_str + ' until:' + until_str
SOSREX_data = []
for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    if len(SOSREX_data) > 100:
        break
    else:
        SOSREX_data.append([tweet.date, tweet.user.username, tweet.user.displayname,
                            tweet.content, tweet.likeCount, tweet.retweetCount,
                            tweet.sourceLabel, tweet.user.followersCount, tweet.user.location])
Tweets_data = pd.DataFrame(SOSREX_data,
                           columns=["Date_tweeted", "username", "display_name",
                                    "Tweets", "Number_of_Likes", "Number_retweets",
                                    "Source_of_Tweet", "number_of_followers", "location"])
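Two things worth checking: Twitter's search operators are lowercase (since:/until:), and those two accept plain dates only, so a "past hour" window can't be expressed with them; also, strftime('%z') on a naive datetime produces an empty string. One approach that has worked with snscrape, assuming the Unix-timestamp operators since_time:/until_time: (not guaranteed by Twitter), is a sketch like this:

import snscrape.modules.twitter as sntwitter
from datetime import datetime, timedelta, timezone

# Unix timestamps give second-level precision, unlike since:/until: (date-only)
now = datetime.now(timezone.utc)
since = now - timedelta(hours=1)
query = f'#SOSREX since_time:{int(since.timestamp())} until_time:{int(now.timestamp())}'

for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    print(tweet.date, tweet.user.username)  # same fields as above are available
    break  # smoke test: confirm the dates fall inside the last hour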
What is the best way to filter a REST request by date? Would it work passing a variable, maybe like this:
today = date.today()
today_90 = today - timedelta(days=90)
service-now.com/api/now/table/incident?sysparm_limit=1000&sysparm_query=sys_created_on**date values here?**
If I understand your problem correctly, something like this should work:
from datetime import date, timedelta
import requests
today = date.today()
today_90 = today - timedelta(days = 90)
# ServiceNow joins conditions inside one sysparm_query with ^ (AND);
# passing sysparm_query twice would let the second value overwrite the first
r = requests.get(
    'https://xxxx.service-now.com/api/now/table/incident'
    '?sysparm_limit=1000'
    '&sysparm_query=sys_created_on>' + str(today_90) +
    '^sys_created_on<' + str(today))
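A slightly cleaner variant of the same request (same instance URL assumed) lets requests build and URL-encode the query string through its params argument:

from datetime import date, timedelta
import requests

today = date.today()
today_90 = today - timedelta(days=90)

params = {
    'sysparm_limit': 1000,
    # one encoded query; ^ is ServiceNow's AND between conditions
    'sysparm_query': f'sys_created_on>{today_90}^sys_created_on<{today}',
}
r = requests.get('https://xxxx.service-now.com/api/now/table/incident', params=params)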
I have code that successfully pulls out data between 2 specific times, but I get a blank file when I try to save it as a CSV.
Here is the code:
import argparse
import boto3
import datetime
import pandas as pd
import csv
import json
parser = argparse.ArgumentParser()
parser.add_argument('--days', type=int, default=30)
args = parser.parse_args()
session = boto3.Session(profile_name='UMW')
cd = session.client('ce', 'us-west-1')
results = []
token = None
while True:
    if token:
        kwargs = {'NextPageToken': token}
    else:
        kwargs = {}
    # Filter= is reconstructed here; the original paste was garbled at this call
    data = cd.get_cost_and_usage(
        TimePeriod={'Start': '2020-03-11', 'End': '2020-06-10'},
        Granularity='DAILY',
        Metrics=['AmortizedCost'],
        GroupBy=[{'Type': 'DIMENSION', 'Key': 'LINKED_ACCOUNT'}],
        Filter={'Dimensions': {'Key': 'LINKED_ACCOUNT',
                               'Values': ['12394850028']}},
        **kwargs)
    for info in data['ResultsByTime']:
        for group in info['Groups']:
            print(group['Keys'][0], info['TimePeriod']['Start'],
                  group['Metrics']['AmortizedCost']['Amount'])  # , group['Keys'][1])
    token = data.get('NextPageToken')
    if not token:
        break

with open('test.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([results])
I'm trying to save the results to a CSV, but this gives me a blank file. It runs without errors and prints the results on the command line, yet nothing ends up in the CSV.
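The CSV comes out blank because results is never populated: the loop only prints, so writer.writerow([results]) writes one row containing an empty list. A minimal sketch of the fix, assuming the same response shape as above:

# inside the pagination loop, collect rows instead of (or as well as) printing
for info in data['ResultsByTime']:
    for group in info['Groups']:
        results.append([group['Keys'][0],
                        info['TimePeriod']['Start'],
                        group['Metrics']['AmortizedCost']['Amount']])

# after the loop: one CSV row per record, plus a header
with open('test.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['linked_account', 'start', 'amortized_cost'])
    writer.writerows(results)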
You can use something like this:
import datetime

def ask_date():
    year = int(input('Input year'))
    month = int(input('Input month'))
    day = int(input('Input day'))
    date = datetime.date(year, month, day)
    return date
If you use a specific date format (dd-mm-yyyy) that you know your users will always use correctly, you can ask for the date in this format and then parse it:
def ask_date():
    res = input()
    day, month, year = res.split('-')
    date = datetime.date(int(year), int(month), int(day))
    return date
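A more defensive variant of the same idea lets strptime do the parsing and the validation in one step (the prompt text is illustrative):

import datetime

def ask_date():
    res = input('Enter a date (dd-mm-yyyy): ')
    # strptime raises ValueError on malformed input such as 31-02-2020
    return datetime.datetime.strptime(res, '%d-%m-%Y').date()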
The goal is to use datetime to iterate over
http://www.harness.org.au/racing/results/?firstDate=01-01-2019
http://www.harness.org.au/racing/results/?firstDate=02-01-2019 ... up to yesterday's date
(this should happen in new_url = base_url + str(enddate1)).
Then, once on each day's page, I want to loop over the meetingListFull table to get each track's name and href, and then pull the results data for each track that day.
My current error is '<=' not supported between instances of 'datetime.timedelta' and 'str', which comes from my while loop. Why is this? I've never used datetime before.
from datetime import datetime, date, timedelta
import requests
import re
from bs4 import BeautifulSoup
base_url = "http://www.harness.org.au/racing/results/?firstDate="
base1_url = "http://www.harness.org.au"
webpage_response = requests.get('http://www.harness.org.au/racing/results/?firstDate=')
soup = BeautifulSoup(webpage_response.content, "html.parser")
format = "%d-%m-%y"
delta = timedelta(days=1)
yesterday = datetime.today() - timedelta(days=1)
yesterday1 = yesterday.strftime(format)
enddate = datetime(2019, 1, 1)
enddate1 = enddate.strftime(format)
while enddate1 <= yesterday1:
    enddate1 =+ timedelta(days=1)
    new_url = base_url + str(enddate1)
    soup12 = requests.get(new_url)
    soup1 = BeautifulSoup(soup12.content, "html.parser")
    table1 = soup1.find('table', class_='meetingListFull')
    for tr in table1.find_all('tr'):
        all_cells = tr.find_all('td')
        track = all_cells.a.href.get_text()
        href = all_cells.get('href')
        trackresults = base1_url + href
This:
yesterday1 = yesterday.strftime(format)
is a string, which is the 'str' side of your error. On top of that, enddate1 =+ timedelta(days=1) is not +=: it rebinds enddate1 to a bare timedelta, which is the 'datetime.timedelta' side. Keep both loop variables as datetime objects and only call strftime when building the URL.
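A minimal corrected sketch, keeping the original names where possible (the table-parsing lines are a guess at the intended extraction):

from datetime import datetime, timedelta
import requests
from bs4 import BeautifulSoup

base_url = "http://www.harness.org.au/racing/results/?firstDate="
base1_url = "http://www.harness.org.au"

fmt = "%d-%m-%Y"                      # four-digit year, matching the site's URLs
yesterday = datetime.today() - timedelta(days=1)
enddate = datetime(2019, 1, 1)

while enddate <= yesterday:           # datetime vs datetime: valid comparison
    new_url = base_url + enddate.strftime(fmt)
    soup1 = BeautifulSoup(requests.get(new_url).content, "html.parser")
    table1 = soup1.find('table', class_='meetingListFull')
    if table1:
        for a in table1.find_all('a', href=True):
            track = a.get_text(strip=True)
            trackresults = base1_url + a['href']
    enddate += timedelta(days=1)      # += (not =+) keeps enddate a datetime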
I'm very new to Python and having trouble adjusting the results of an API request to handle UK daylight saving time (Greenwich Mean Time / British Summer Time). The Dark Sky documentation states:
The timezone is only used for determining the time of the request; the response will always be relative to the local time zone
I have built the below code to return historic weather information based upon hourly min/max/avg temperatures for each day for specific weather stations. I've been able to turn the returned UNIX time into a time stamp, but I need a way to deal with the data when the clocks change.
This is my whole code; if anyone can offer any advice I would be very grateful.
import requests
import json
import pytemperature
import pandas as pd
pd.options.mode.chained_assignment = None # default='warn'
from pandas.io.json import json_normalize
import datetime
#Import CSV with station data
data_csv = pd.read_csv("Degree Days Stations.csv")
#Basic Information for the API request
URL = "https://api.darksky.net/forecast/"
AUTH = "REDACTED/"
EXCLUDES = "?exclude=currently,daily,minutely,alerts,flags"
#Use ZIP function to loop through the station_df dataframe
for STATION, NAME, CODE, LAT, LON in zip(data_csv['Station'], data_csv['Name'], data_csv['Code'], data_csv['Lat'], data_csv['Lon']):
    #Result is based upon the time zone when running the script
    date1 = '2019-10-26T00:00:00'
    date2 = '2019-10-29T00:00:00'
    start = datetime.datetime.strptime(date1, '%Y-%m-%dT%H:%M:%S')
    end = datetime.datetime.strptime(date2, '%Y-%m-%dT%H:%M:%S')
    step = datetime.timedelta(days=1)
    while start <= end:
        #Compile the daily values
        DATE = datetime.date.strftime(start.date(), "%Y-%m-%dT%H:%M:%S")
        #Build the API URL request
        response = requests.get(URL+AUTH+str(LAT)+","+str(LON)+","+str(DATE)+EXCLUDES+"?units=si")
        json_data = response.json()
        #Flatten the data
        json_df = json_normalize(json_data['hourly'], record_path=['data'], sep="_")
        #Extract to new df
        output_df = json_df[['time', 'temperature']]
        #Insert station name into the dataframe for debugging
        output_df.insert(0, 'Name', NAME)
        #Convert UNIX date to datetime
        output_df['time'] = pd.to_datetime(output_df['time'], unit='s')
        ##################
        # Deal with timezone of output_df['time'] here
        ##################
        #Convert temperatures from degrees F to degrees C
        output_df['temperature'] = pytemperature.f2c(output_df['temperature'])
        #Find the MIN/MAX/AVG from the hourly data for this day
        MIN = output_df['temperature'].min()
        MAX = output_df['temperature'].max()
        AVG = output_df['temperature'].mean()
        #Build the POST query
        knackURL = "https://api.knack.com/v1/objects/object_34/records"
        payload = '{ \
            "field_647": "' + STATION + '", \
            "field_649": "' + NAME + '", \
            "field_650": "' + str(CODE) + '", \
            "field_651": "' + str(DATE) + '", \
            "field_652": "' + str(MIN) + '", \
            "field_653": "' + str(MAX) + '", \
            "field_655": "' + str(AVG) + '" \
        }'
        knackHEADERS = {
            'X-Knack-Application-Id': "REDACTED",
            'X-Knack-REST-API-Key': "REDACTED",
            'Content-Type': "application/json"
        }
        #response = requests.request("POST", knackURL, data=payload, headers=knackHEADERS)
        start += step
My results for October 27th (BST) and 28th (GMT) are shown below and are relative to the current timezone (GMT). How can I ensure that I get the same size of dataset starting from 00:00:00?
I've looked at arrow and pytz but can't seem to get them to work in the context of the dataframe. I've been working on the assumption that I need to detect and deal with the data when converting it from Unix... but I just can't get it right. Even trying to pass the data through arrow.get, e.g.
d1 = arrow.get(output_df['time'])
print(d1)
leaves me with the error: "Can't parse single argument type of '{}'".format(type(arg))
TypeError: Can't parse single argument type of ''
So I'm guessing that it doesn't want to work as part of a dataframe?
Thank you in advance
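For what it's worth, one way to handle the GMT/BST switch inside the dataframe is to parse the epoch seconds as UTC and convert with pandas' own timezone support rather than arrow (a sketch, assuming the Europe/London zone is what's wanted):

# parse as tz-aware UTC, then convert; pandas applies the GMT/BST offset per row
output_df['time'] = (pd.to_datetime(output_df['time'], unit='s', utc=True)
                       .dt.tz_convert('Europe/London'))
# grouping by output_df['time'].dt.date then gives full 00:00-23:00 local days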
I want to repeatedly call a web URL which has a timestamp at the end.
Example URL:
'https://mywebApi/StartTime=2019-05-01%2000:00:00&&endTime=2019-05-01%2003:59:59'
StartTime=2019-05-01%2000:00:00
is the URL representation of the time 2019-05-01 00:00:00
endTime=2019-05-01%2003:59:59
is the URL representation of the time 2019-05-01 03:59:59
The requirement is to make repeated calls with a 4-hour window. When adding 4 hours, the date may change.
Is there a lean way to generate the URL string? Something like:
baseUrl = 'https://mywebApi/StartTime='
startTime = DateTime(2018-05-03 00:01:00)
terminationTime = DateTime(2019-05-03 00:05:00)
while (startTime < terminationTime):
    endTime = startTime + hours(4)
    url = baseUrl + str(startTime) + "endtime=" + str(startTime)
    # request get url
    startTime = startTime + hours(1)
You can use datetime.timedelta together with the strftime function, as follows:
from datetime import datetime, timedelta

baseUrl = 'https://mywebApi/StartTime='
startTime = datetime(year=2018, month=5, day=3, hour=0, minute=1, second=0)
terminationTime = datetime(year=2018, month=5, day=3, hour=3, minute=59, second=59)

while startTime < terminationTime:
    endTime = startTime + timedelta(hours=4)
    # %% escapes the percent sign, so strftime emits a literal %20 for the space
    url = (baseUrl + startTime.strftime("%Y-%m-%d%%20%H:%M:%S") +
           "&&endTime=" + endTime.strftime("%Y-%m-%d%%20%H:%M:%S"))
    # request get url
    startTime = endTime
This link is a useful primer: https://www.guru99.com/date-time-and-datetime-classes-in-python.html, or you can look at the official datetime documentation.
edit: using what u/John Gordan said to declare the initial dates
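If the %%20 escaping feels brittle, an alternative under the same assumptions (hypothetical endpoint) is to format with a plain space and let urllib.parse.quote do the URL encoding:

from datetime import datetime, timedelta
from urllib.parse import quote

def window_url(start, end):
    fmt = "%Y-%m-%d %H:%M:%S"
    # quote() percent-encodes the space; safe=':' keeps the colons literal
    return ('https://mywebApi/StartTime=' + quote(start.strftime(fmt), safe=':')
            + '&&endTime=' + quote(end.strftime(fmt), safe=':'))

start = datetime(2018, 5, 3, 0, 1, 0)
print(window_url(start, start + timedelta(hours=4)))
# -> https://mywebApi/StartTime=2018-05-03%2000:01:00&&endTime=2018-05-03%2004:01:00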