For loop issues with Quandl -- Python

I'm trying to create a for-loop that runs through my parsed list of NASDAQ stocks and inserts each ticker into a Quandl code, which is then retrieved from Quandl's database, essentially creating a large data set of stocks to perform data analysis on. My code "appears" right, but when I print the query it only prints 'GOOG/NASDAQ_Ticker' and nothing else. Any help or suggestions would be most appreciated.
import quandl
import pandas as pd
import matplotlib.pyplot as plt
import numpy
def nasdaq():
    nasdaq_list = pd.read_csv(r'C:\Users\NAME\Documents\DATASETS\NASDAQ.csv')
    nasdaq_list = nasdaq_list[[0]]
    print(nasdaq_list)
    for abbv in nasdaq_list:
        query = 'GOOG/NASDAQ_' + str(abbv)
        print(query)
        df = quandl.get(query, authtoken="authoken")
        print(df.tail()[['Close', 'Volume']])

Iterating over a pd.DataFrame as you have done iterates over the column labels, not the rows. For example,
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.arange(9).reshape((3,3)))
>>> df
   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8
>>> for i in df[[0]]: print(i)
0
I would just get the first column as a Series with .iloc (the older .ix indexer was removed in pandas 1.0),
>>> for i in df.iloc[:,0]: print(i)
0
3
6
Note that in general, if you want to iterate by row over a DataFrame, you're looking for iterrows().
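Applied to the loop in the question, the same idea can be sketched as follows. This is only an illustration: build_queries is a hypothetical helper, the three tickers are made-up sample data standing in for the CSV, and the quandl.get call is left out because it needs network access and a real API key.

```python
import pandas as pd

def build_queries(tickers_df, prefix="GOOG/NASDAQ_"):
    """Return one Quandl-style code per row of the first column."""
    first_column = tickers_df.iloc[:, 0]  # a Series holding the ticker symbols
    return [prefix + str(ticker) for ticker in first_column]

# Stand-in for pd.read_csv(...): a tiny frame with made-up contents
nasdaq_list = pd.DataFrame({"Ticker": ["AAPL", "MSFT", "GOOG"]})
queries = build_queries(nasdaq_list)
for query in queries:
    print(query)
```

Each string in queries could then be passed to quandl.get inside the loop, one request per ticker.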

Related

Making a table by combining data from 4 different lists

I am trying to build a table by combining four lists.
My code is below:
from selenium import webdriver
import time
driver_path= "C:\\Users\\Bacanli\\Desktop\\chromedriver.exe"
browser=webdriver.Chrome(driver_path)
browser.get("http://www.bddk.org.tr/BultenHaftalik")
time.sleep(3)
Krediler=browser.find_element_by_xpath("//*[@id='tabloListesiItem-253']/span")
Krediler.click()
elements = browser.find_elements_by_css_selector("td.ortala:nth-child(2)")
TPs=browser.find_elements_by_css_selector("td[data-label='TP']")
YPs=browser.find_elements_by_css_selector("td[data-label='YP']")
Toplams=browser.find_elements_by_css_selector("td[data-label='Toplam']")
My intent is to make a new table by combining elements, TPs, YPs, and Toplams.
Thanks for your help.
Pandas makes this easy for you:
import pandas as pd
df = pd.read_html('http://www.bddk.org.tr/BultenHaftalik')
This creates a list of pandas DataFrames from the HTML tables on the page. The table you want is df[3].
Result of df[3].head():
   Unnamed: 0  Sektör / Krediler ( 9 Temmuz 2021 Cuma ) (Milyon TL)           TP           YP       TOPLAM
0           1  Toplam Krediler (2+10)                                2.479.94928  1.427.80395  3.907.75323
1           2  Tüketici Kredileri ve Bireysel Kredi Kartları (3+7)     877.62363        30181    877.92544
2           3  Tüketici Kredileri (4+5+6)                              710.18775        11070    710.29845
3           4  a) Konut                                                278.38213         7473    278.45686
4           5  b) Taşıt                                                 14.91958          000     14.91958
Export to CSV with df[3].to_csv('filename.csv')
(or you could use the "export to Excel" button above the table on the website).
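If you would rather keep the Selenium lists from the question, they can also be combined column by column. The sketch below is an assumption-laden illustration: lists_to_table is a hypothetical helper, the column names are chosen for illustration, and the placeholder strings stand in for the .text of each element found on the page.

```python
import pandas as pd

def lists_to_table(names, tps, yps, totals):
    """Combine four equally long lists into one DataFrame, one row per entry."""
    return pd.DataFrame({"Name": names, "TP": tps, "YP": yps, "Toplam": totals})

# Placeholder values standing in for [el.text for el in elements], and so on
names = ["row a", "row b"]
tps = ["1", "2"]
yps = ["3", "4"]
totals = ["4", "6"]

table = lists_to_table(names, tps, yps, totals)
print(table)
```

Passing a dict of lists keeps each list as its own column, so the four lists must have the same length.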

How to convert text data into a DataFrame

How can I convert the text data below into a pandas DataFrame?
(-9.83334315,-5.92063135,-7.83228037,5.55314146), (-5.53137301,-8.31010785,-3.28062536,-6.86067081),
(-11.49239039,-1.68053601,-4.14773043,-3.54143976), (-22.25802006,-10.12843806,-2.9688831,-2.70574665), (-20.3418791,-9.4157625,-3.348587,-7.65474665)
I want to convert this to a DataFrame with 4 rows and 5 columns. For example, the first row contains the first element of each parenthesized group.
Thanks for your contribution.
Try this:
import pandas as pd
with open("file.txt") as f:
    file = f.read()
df = pd.DataFrame([{f"name{id}": val.replace("(", "").replace(")", "") for id, val in enumerate(row.split(",")) if val} for row in file.split()])
Alternatively, with a regular expression (the output below corresponds to one parenthesized group per line in file.txt):
import re
import pandas as pd
with open('file.txt') as f:
    data = [re.findall(r'([\-\d.]+)',data) for data in f.readlines()]
df = pd.DataFrame(data).T.astype(float)
Output:
           0         1          2          3          4
0  -9.833343 -5.531373 -11.492390 -22.258020 -20.341879
1  -5.920631 -8.310108  -1.680536 -10.128438  -9.415762
2  -7.832280 -3.280625  -4.147730  -2.968883  -3.348587
3   5.553141 -6.860671  -3.541440  -2.705747  -7.654747
Your data is basically a tuple of tuples, so you can pass it to pd.DataFrame as a list of tuples and get a DataFrame out of it.
Your Sample Data:
text_data = ((-9.83334315,-5.92063135,-7.83228037,5.55314146),(-5.53137301,-8.31010785,-3.28062536,-6.86067081),(-11.49239039,-1.68053601,-4.14773043,-3.54143976),(-22.25802006,-10.12843806,-2.9688831,-2.70574665),(-20.3418791,-9.4157625,-3.348587,-7.65474665))
By default pandas displays 6 decimal places, while your values have up to 8, so you can set pd.options.display.float_format accordingly:
pd.options.display.float_format = '{:,.8f}'.format
To get your desired layout, simply transpose the result:
pd.DataFrame(list(text_data)).T
             0           1            2            3            4
0  -9.83334315 -5.53137301 -11.49239039 -22.25802006 -20.34187910
1  -5.92063135 -8.31010785  -1.68053601 -10.12843806  -9.41576250
2  -7.83228037 -3.28062536  -4.14773043  -2.96888310  -3.34858700
3   5.55314146 -6.86067081  -3.54143976  -2.70574665  -7.65474665
OR
More simply, you can create the DataFrame directly from a tuple (or list) of plain tuples:
data = (-9.83334315,-5.92063135,-7.83228037,5.55314146),(-5.53137301,-8.31010785,-3.28062536,-6.86067081),(-11.49239039,-1.68053601,-4.14773043,-3.54143976),(-22.25802006,-10.12843806,-2.9688831,-2.70574665),(-20.3418791,-9.4157625,-3.348587,-7.65474665)
# data = [(-9.83334315,-5.92063135,-7.83228037,5.55314146),(-5.53137301,-8.31010785,-3.28062536,-6.86067081),(-11.49239039,-1.68053601,-4.14773043,-3.54143976),(-22.25802006,-10.12843806,-2.9688831,-2.70574665),(-20.3418791,-9.4157625,-3.348587,-7.65474665)]
pd.DataFrame(data).T
             0           1            2            3            4
0  -9.83334315 -5.53137301 -11.49239039 -22.25802006 -20.34187910
1  -5.92063135 -8.31010785  -1.68053601 -10.12843806  -9.41576250
2  -7.83228037 -3.28062536  -4.14773043  -2.96888310  -3.34858700
3   5.55314146 -6.86067081  -3.54143976  -2.70574665  -7.65474665
Wrap the tuples in a list:
data=[(-9.83334315,-5.92063135,-7.83228037,5.55314146),
(-5.53137301,-8.31010785,-3.28062536,-6.86067081),
(-11.49239039,-1.68053601,-4.14773043,-3.54143976),
(-22.25802006,-10.12843806,-2.9688831,-2.70574665),
(-20.3418791,-9.4157625,-3.348587,-7.65474665)]
df=pd.DataFrame(data, columns=['A','B','C','D'])
print(df)
output:
            A          B         C         D
0   -9.833343  -5.920631 -7.832280  5.553141
1   -5.531373  -8.310108 -3.280625 -6.860671
2  -11.492390  -1.680536 -4.147730 -3.541440
3  -22.258020 -10.128438 -2.968883 -2.705747
4  -20.341879  -9.415762 -3.348587 -7.654747
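Since the text in the question already looks like Python tuple literals, another option is to let the standard library parse it. The sketch below is one possible approach, not the method used in the answers above: text_to_frame is a hypothetical helper, and it assumes the text contains nothing but comma-separated parenthesized groups of numbers.

```python
import ast
import pandas as pd

def text_to_frame(text):
    """Parse '(a, b, ...), (c, d, ...)' text and return one column per group."""
    # Wrapping in brackets turns the text into a list literal, which may span lines
    groups = ast.literal_eval("[" + text.strip() + "]")
    return pd.DataFrame(groups).T

raw = """(-9.83334315,-5.92063135,-7.83228037,5.55314146), (-5.53137301,-8.31010785,-3.28062536,-6.86067081),
(-11.49239039,-1.68053601,-4.14773043,-3.54143976), (-22.25802006,-10.12843806,-2.9688831,-2.70574665), (-20.3418791,-9.4157625,-3.348587,-7.65474665)"""

df = text_to_frame(raw)
print(df.shape)  # prints (4, 5)
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval when the text comes from a file.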

Calculate Time difference between two points in the same column (ArcGIS)

I am trying to calculate the time difference between two points in ArcGIS, using VBScript or Python. I have a dataset of over 10 thousand points. Each has coordinates, dates, and times. I want to create a new field and calculate the time difference in seconds.
The data looks as follows:
FID  Shape  N              E              DateTime
0    Point  4768252.94469  4768252.94469  2021/05/06 12:12:05
1    Point  4768245.79949  4768245.79949  2021/05/06 12:12:11
2    Point  4768241.44071  4768241.44071  2021/05/06 12:12:15
3    Point  4768237.3568   4768237.3568   2021/05/06 12:12:18
So, the result for the data shown would be "6, 4, 3, ...". I would appreciate your help a lot, as I have tried many things and none worked.
Here is one way to do it using the pandas module for Python:
# Import the pandas module
import pandas as pd
# Data as a Python dictionary. Can be imported from CSV too.
data = {
    'N' : ['4768252.94469', '4768245.79949', '4768241.44071', '4768237.3568'],
    'E' : ['4768252.94469', '4768245.79949', '4768241.44071', '4768237.3568'],
    'Time': ['12:12:05','12:12:11','12:12:15','12:12:18']
}
# Create a pandas DataFrame object
df = pd.DataFrame(data)
# If you want to import the data from CSV use df = pd.read_csv('csvname.csv')
# Convert the Time column to datetime objects
df['Time'] = pd.to_datetime(df['Time'])
# Print the differences between consecutive rows
print(df["Time"].diff())
output:
0               NaT
1   0 days 00:00:06
2   0 days 00:00:04
3   0 days 00:00:03
Name: Time, dtype: timedelta64[ns]
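The question asks for the difference in seconds, so the timedeltas still need converting. A minimal sketch, assuming the full DateTime strings from the question's table and a new column named SecondsDiff (a name chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "DateTime": ["2021/05/06 12:12:05", "2021/05/06 12:12:11",
                 "2021/05/06 12:12:15", "2021/05/06 12:12:18"],
})
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%Y/%m/%d %H:%M:%S")

# Seconds elapsed since the previous row; the first row has no predecessor (NaN)
df["SecondsDiff"] = df["DateTime"].diff().dt.total_seconds()

seconds = df["SecondsDiff"].dropna().tolist()
print(seconds)  # prints [6.0, 4.0, 3.0]
```

Parsing the date together with the time also keeps the result correct if the points span midnight.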

I want to add the values of two cells in the same column based on their index value

I have a data frame with the column "Key" as index, like below:
Key    Prediction
C11D0  0
C11D1  8
C12D0  1
C12D1  5
C13D0  3
C13D1  9
C14D0  4
C14D1  9
C15D0  5
C15D1  3
C1D0   5
C2D0   7
C3D0   4
C4D0   1
C4D1   9
I want to add the values of two cells in the Prediction column when their indexes match a given pattern. The logic is that I want to add the values whose indexes match for up to 4 characters, for example the indexes "C11D0 & C11D1" or "C14D0 & C14D1". The output would then be:
Operation    Addition Result
C11D0+C11D1  8
C12D0+C12D1  6
C13D0+C13D1  12
You can use the isin function.
Example:
import pandas as pd
df = pd.DataFrame({'id':[1,2,3,4,5,6], 'value':[1,2,1,3,7,1]})
df[df.id.isin([1,5,6])].value.sum()
output:
9
For your case:
idx = ['C11D0', 'C11D1']
print(df[df.Key.isin(idx)].Prediction.sum()) #outputs 8
First make Key a column if it is the index:
df.reset_index(inplace=True)
Then you can use DataFrame.loc with boolean indexing:
df.loc[df['Key'].isin(["C11D0","C11D1"]),'Prediction'].sum()
You can also create a function for it:
def sum_select_df(key_list,df):
    return pd.concat([df[df['Key'].isin(['C'+str(key)+'D1','C'+str(key)+'D0'])] for key in key_list])['Prediction'].sum()
sum_select_df([11,14],df)
Output:
21
Here is a complete solution, slightly different from the other answers so far. I tried to make it pretty self-explanatory, but let me know if you have any questions!
import numpy as np # only used to generate test data
import pandas as pd
import itertools as itt
start_inds = ["C11D0", "C11D1", "C12D0", "C12D1", "C13D0", "C13D1", "C14D0", "C14D1",
              "C15D0", "C15D1", "C1D0", "C2D0", "C3D0", "C4D0", "C4D1"]
test_vals = np.random.randint(low=0, high=10, size=len(start_inds))
df = pd.DataFrame(data=test_vals, index=start_inds, columns=["prediction"])
ind_combs = itt.combinations(df.index.array, 2)
sum_records = ((f"{ind1}+{ind2}", df.loc[[ind1, ind2], "prediction"].sum())
               for (ind1, ind2) in ind_combs if ind1[:4] == ind2[:4])
res_ind, res_vals = zip(*sum_records)
res_df = pd.DataFrame(data=res_vals, index=res_ind, columns=["sum_result"])
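A grouping-based variant is also possible. The sketch below is a different technique from the answers above: it assumes every key has the form C<number>D<digit> and groups on the part before the "D", using the values from the question's table. The variable name sums is chosen here for illustration.

```python
import pandas as pd

keys = ["C11D0", "C11D1", "C12D0", "C12D1", "C13D0", "C13D1", "C14D0", "C14D1",
        "C15D0", "C15D1", "C1D0", "C2D0", "C3D0", "C4D0", "C4D1"]
values = [0, 8, 1, 5, 3, 9, 4, 9, 5, 3, 5, 7, 4, 1, 9]
df = pd.DataFrame({"Prediction": values}, index=pd.Index(keys, name="Key"))

# groupby calls the function on each index label; "C11D0" and "C11D1" both map to "C11"
sums = df.groupby(lambda key: key.split("D")[0])["Prediction"].sum()
print(sums)
```

Keys without a partner, such as C1D0, end up in a group of their own, so their "sum" is just the single value.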

How to access elements from imported csv file with pandas in python?

Apologies for this basic question. I am new to Python and having some problems with my code. I used pandas to load a .csv file and am having trouble accessing particular elements.
import pandas as pd
dateYTM = pd.read_csv('Date.csv')
print(dateYTM)
## Result
# Date
# 0 20030131
# 1 20030228
# 2 20030331
# 3 20030430
# 4 20030530
#
# Process finished with exit code 0
How can I access, say, the first date? I tried many different ways but wasn't able to achieve what I want. Many thanks.
You can use read_csv with the parse_dates parameter and then select with loc (see Selection By Label):
import pandas as pd
import io
temp=u"""Date,no
20030131,1
20030228,3
20030331,5
20030430,6
20030530,3
"""
#after testing, replace io.StringIO(temp) with the filename
dateYTM = pd.read_csv(io.StringIO(temp), parse_dates=['Date'])
print(dateYTM)
        Date  no
0 2003-01-31   1
1 2003-02-28   3
2 2003-03-31   5
3 2003-04-30   6
4 2003-05-30   3
#df.loc[index, column]
print(dateYTM.loc[0, 'Date'])
2003-01-31 00:00:00
print(dateYTM.loc[0, 'no'])
1
But if you need only one value, it is better to use at (see Fast scalar value getting and setting):
#df.at[index, column]
print(dateYTM.at[0, 'Date'])
2003-01-31 00:00:00
print(dateYTM.at[0, 'no'])
1
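For the single-column file in the question, positional access works as well. A minimal sketch, assuming the CSV contents shown in the question (inlined here through io.StringIO so the snippet runs without Date.csv); the variable names first_by_position and first_by_label are illustrative:

```python
import io
import pandas as pd

csv_text = "Date\n20030131\n20030228\n20030331\n20030430\n20030530\n"
dateYTM = pd.read_csv(io.StringIO(csv_text))

first_by_position = dateYTM["Date"].iloc[0]  # first row of the Date column
first_by_label = dateYTM.loc[0, "Date"]      # row label 0, column "Date"
print(first_by_position, first_by_label)
```

Without parse_dates the column is read as integers, so both lookups return the number 20030131 rather than a timestamp.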
