Using a text file to create a dataframe with Python

I am trying to use a text file to create a database that I can easily search by specific values, such as "items with ROI > 50%" or "items with monthly sales > 50".
I got as far as reading the file in and splitting it into a list at each 'New Flip!', so that each entry is one element of the list, but I don't know how best to name the attributes of each entry so that they can be referenced.
I also tried creating a pandas DataFrame with one row per item and one column per attribute. I was able to create rows that each contain the entire text of one entry, but I don't know how to split each entry into its individual attributes.
The text file is laid out as follows:
Donquixote
`New Flip!`
__**Product Details**__
> **Name:** Star Wars
> **ASIN:** B0
__**Analytics**__
> **Rank:** 79,371/7,911,836 (Top 1.00%) **Toys & Games**
> **Monthly Sales:** 150
> **Offer Count #:** 8
__**Calculation**__
> **Sell:** $49.99
> **Buy:** $20.99
> **FBA Fees:** $7.50
> **Pick and Pack Fees:** $3.64
> **Ship to Amazon:** $0.53
> ━━━━━━━━━━━━
> **Profit:** $17.34
> **ROI:** 82%
__**Links**__
> **[Check Restriction](https://sellercentr)**
> **[Amazon](https://www.amazon.com/dp/B)**
> **[Keepa](https://keepa.com/#!product/1-B)**
> **[Bestbuy](https://www.bestbuy.com/si**
https://images-ext-1.discorda
https://images-ext-1.discordapp.net/extern
A︱01/03/22
{Reactions}
💾 keepa barcode 📦 📋
[02-Jan-22 11:23 PM]
{Embed}
Donquixote

I have assumed that your text file contains one item after another, all in the same file (I have added "Batman" as an additional item so that you can see how the code works).
Donquixote
`New Flip!`
__**Product Details**__
> **Name:** Star Wars
> **ASIN:** B0
__**Analytics**__
> **Rank:** 79,371/7,911,836 (Top 1.00%) **Toys & Games**
> **Monthly Sales:** 150
> **Offer Count #:** 8
__**Calculation**__
> **Sell:** $49.99
> **Buy:** $20.99
> **FBA Fees:** $7.50
> **Pick and Pack Fees:** $3.64
> **Ship to Amazon:** $0.53
> ━━━━━━━━━━━━
> **Profit:** $17.34
> **ROI:** 82%
__**Links**__
> **[Check Restriction](https://sellercentr)**
> **[Amazon](https://www.amazon.com/dp/B)**
> **[Keepa](https://keepa.com/#!product/1-B)**
> **[Bestbuy](https://www.bestbuy.com/si**
https://images-ext-1.discorda
https://images-ext-1.discordapp.net/extern
A︱01/03/22
{Reactions}
💾 keepa barcode 📦 📋
[02-Jan-22 11:23 PM]
{Embed}
Donquixote
`New Flip!`
__**Product Details**__
> **Name:** Batman
> **ASIN:** B1
__**Analytics**__
> **Rank:** 79,371/7,911,836 (Top 1.00%) **Toys & Games**
> **Monthly Sales:** 20
> **Offer Count #:** 8
__**Calculation**__
> **Sell:** $45.99
> **Buy:** $20.99
> **FBA Fees:** $7.50
> **Pick and Pack Fees:** $3.64
> **Ship to Amazon:** $0.53
> ━━━━━━━━━━━━
> **Profit:** $17.34
> **ROI:** 56%
__**Links**__
> **[Check Restriction](https://sellercentr)**
> **[Amazon](https://www.amazon.com/dp/B)**
> **[Keepa](https://keepa.com/#!product/1-B)**
> **[Bestbuy](https://www.bestbuy.com/si**
https://images-ext-1.discorda
https://images-ext-1.discordapp.net/extern
A︱01/03/22
{Reactions}
💾 keepa barcode 📦 📋
[02-Jan-22 11:23 PM]
{Embed}
This admittedly rough code could then be used:
import pandas as pd

def clean(data):
    # convert the numeric fields of one item from strings to floats
    for key, value in data.items():
        if "$" in value:
            # convert money values to floats
            data[key] = float(value.strip("$"))
        elif value.isdigit():
            # convert other numbers to floats
            data[key] = float(value)
        elif key == "ROI:":
            # convert ROI to float
            data[key] = float(value.strip("%"))
    return data

database = {}
data = {}
name = ""
# open file
with open("data.txt", "rt", encoding="utf8") as file:
    for line in file:
        # if "New Flip!" then new item
        if "New Flip!" in line:
            # add the previous item's data to the database, if there was one
            if name:
                database[name] = clean(data)
            # start a new item
            data = {}
            name = ""
            continue
        parts = line.split("**")
        # skip lines that do not contain a "**key** value" pair
        if len(parts) < 3:
            continue
        # dictionary key names are after the first "**", and values after the second
        key, value = parts[1], parts[2].strip()
        # if a name, define "name" as this value to name the data correctly
        if key == "Name:":
            name = value
        data[key] = value
# add the last item, which is not followed by another "New Flip!"
if name:
    database[name] = clean(data)
# convert database to dataframe and transpose so each row is an item and each column a dictionary key
df = pd.DataFrame(database).T
# filter for items with ROI > 60
df[df["ROI:"] > 60]
# Product Details Name: ASIN: Analytics Rank: Monthly Sales: Offer Count #: Calculation Sell: Buy: FBA Fees: Pick and Pack Fees: Ship to Amazon: Profit: ROI: Links [Check Restriction](https://sellercentr) [Amazon](https://www.amazon.com/dp/B) [Keepa](https://keepa.com/#!product/1-B) [Bestbuy](https://www.bestbuy.com/si
#Star Wars __ Star Wars B0 __ 79,371/7,911,836 (Top 1.00%) 150.0 8.0 __ 49.99 20.99 7.5 3.64 0.53 17.34 82.0 __
Notice that Batman, which has ROI of 56%, is not included in the filtered dataframe.
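The question also asks for searches such as "items with monthly sales > 50". As a minimal sketch, assuming a DataFrame shaped like the one built above (the two rows here are typed in by hand from the sample items rather than parsed from the file), several conditions can be combined with `&`:

```python
import pandas as pd

# Hand-built stand-in for the parsed data (values copied from the sample items).
df = pd.DataFrame(
    {
        "Monthly Sales:": [150.0, 20.0],
        "ROI:": [82.0, 56.0],
    },
    index=["Star Wars", "Batman"],
)

# Each condition needs its own parentheses when combined with & (and) or | (or).
high_roi_and_sales = df[(df["ROI:"] > 50) & (df["Monthly Sales:"] > 50)]
print(high_roi_and_sales.index.tolist())
```

Batman passes the ROI test here but fails the monthly sales test, so only Star Wars remains.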

Related

How do I restrict opening trades to a time range when backtesting with backtrader?

I am trying to backtest a strategy where trades are only opened between 8:30 and 16:00 using backtrader.
With the attempt below, my code runs but returns no trades, so my closing balance is the same as the opening balance. If I remove this filter, the code runs correctly and trades open and close, so the filter is definitely the issue. Can anyone please help?
I have tried adding the datetime column of the data to a data feed using the code below:
`def __init__(self):
    # Keep a reference to the "close" line in the data[0] dataseries
    self.dataclose = self.datas[0].close
    self.datatime = mdates.num2date(self.datas[0].datetime)
    self.datatsi = self.datas[0].tsi
    self.datapsar = self.datas[0].psar
    self.databbmiddle = self.datas[0].bbmiddle
    self.datastlower = self.datas[0].stlower
    self.datastupper = self.datas[0].stupper
    # To keep track of pending orders
    self.order = None`
I then used the following code to try to filter by this time range:
# Check if we are in the market
if not self.position:
    current_time = self.datatime[0].time()
    if datetime.time(8, 30) < current_time < datetime.time(16, 0):
        if self.datatsi < 0 and self.datastupper[0] > self.dataclose[0] and self.datastlower[1] < self.dataclose[1] and self.dataclose[0] < self.databbmiddle[0] and self.datapsar[0] > self.dataclose[0]:
            self.log('SELL CREATE, %.2f' % self.dataclose[0])
            # Keep track of the created order to avoid a 2nd order
            os = self.sell_bracket(size=100,price=sp1, stopprice=sp2, limitprice=sp3)
            self.orefs = [o.ref for o in os]
        else:
            o1 = self.buy(exectype=bt.Order.Limit,price=bp1,transmit=False)
            print('{}: Oref {} / Buy at {}'.format(self.datetime.date(), o1.ref, bp1))
            o2 = self.sell(exectype=bt.Order.Stop,price=bp2,parent=o1,transmit=False)
            print('{}: Oref {} / Sell Stop at {}'.format(self.datetime.date(), o2.ref, bp2))
            o3 = self.sell(exectype=bt.Order.Limit,price=bp3,parent=o1,transmit=True)
            print('{}: Oref {} / Sell Limit at {}'.format(self.datetime.date(), o3.ref, bp3))
            self.orefs = [o1.ref, o2.ref, o3.ref] # self.sell(size=100, exectype=bt.Order.Limit, price=self.data.close[0]+16, parent=self.order, parent_bracket=bt.Order.Market)
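One possible cause, offered as an assumption rather than a confirmed diagnosis: `mdates.num2date` is applied once in `__init__` to the datetime line object itself, not to the value of each bar. backtrader's datetime line has a `time()` accessor, so inside `next()` the time of the current bar can be read with `self.datas[0].datetime.time(0)`. The window test itself can be sketched without backtrader; `in_trading_window` is a hypothetical helper name, not part of the library:

```python
import datetime

SESSION_START = datetime.time(8, 30)
SESSION_END = datetime.time(16, 0)


def in_trading_window(current_time, start=SESSION_START, end=SESSION_END):
    """Return True if current_time lies in the half-open range [start, end)."""
    return start <= current_time < end


# In a backtrader strategy's next(), current_time would come from
# self.datas[0].datetime.time(0); plain datetime.time values are used here.
print(in_trading_window(datetime.time(9, 15)))
print(in_trading_window(datetime.time(16, 30)))
```

Note that this sketch includes 8:30 itself, whereas the strict `<` comparison in the question excludes it.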

Adding new fields conditioned and calculate the length using Python QGIS

I am developing a new plugin using Python in QGIS. I want to produce an output table computed from the attribute table of a layer that already exists in my project, the layer "SUPPORT".
This is what I have done so far.
My code:
`
tableSUPPORT = QgsVectorLayer('None', 'table_SUPPORT', 'memory')
tableSUPPORT.dataProvider().addAttributes(
    [QgsField("Propriétaire", QVariant.String), QgsField("Nb. tronçons", QVariant.Int),
     QgsField("Long. (m)", QVariant.Int)])
tableSUPPORT.updateFields()
dicoSUPPORT = {'FT': (0, 0), 'FREE MOBILE': (0, 0), 'PRIVE': (0, 0)}
for sup in coucheSUPPORT.getFeatures():
    proprietaire = sup['PROPRIETAI']
    if proprietaire in ['FT', 'FREE MOBILE', 'PRIVE']:
        dicoSUPPORT[proprietaire] = tuple(map(operator.add, (1, sup['LGR_REEL']), dicoSUPPORT[proprietaire]))
    else:
        QMessageBox.critical(None, "Problème de support",
                             "Le support %s possède un propriétaire qui ne fait pas partie de la liste." % str(sup['LIBELLE']))
        return None
ligneFT = QgsFeature()
ligneFT.setAttributes(['FT', int(dicoSUPPORT['FT'][0]), int(dicoSUPPORT['FT'][1])])
ligneFREE = QgsFeature()
ligneFREE.setAttributes(['FREE MOBILE', int(dicoSUPPORT['FREE MOBILE'][0]), int(dicoSUPPORT['FREE MOBILE'][1])])
lignePRIVE = QgsFeature()
lignePRIVE.setAttributes(['PRIVE', int(dicoSUPPORT['PRIVE'][0]), int(dicoSUPPORT['PRIVE'][1])])
tableSUPPORT.dataProvider().addFeatures([ligneFT])
tableSUPPORT.dataProvider().addFeatures([ligneFREE])
tableSUPPORT.dataProvider().addFeatures([lignePRIVE])
QgsProject.instance().addMapLayer(tableSUPPORT)
`
Here is the result obtained with this code:
result
But what I actually want is a table with these specific rows and columns:
table
This is my SUPPORT attribute table:
support attribute table
Description of each row I want in the result:
`
> row 1 => FT_SOUT: sum of the length ('LGR_REEL') and count the number if ('PROPRIETAI'='FT' and 'TYPE_STRUC'='TRANCHEE')
>
> row 2 => FT_AERIEN: sum of the length ('LGR_REEL') if ('PROPRIETAI'='FT' and 'TYPE_STRUC'='AERIEN')
>
> row 3 => CREATION GC FREE RESEAU: sum of the length ('LGR_REEL') if ('PROPRIETAI'='FREE MOBILE' and 'TYPE_STRUC'='TRANCHEE')
>
> row 4 => CREATION AERIEN FREE RESEAU: sum of the length ('LGR_REEL') if ('PROPRIETAI'='FREE MOBILE' and 'TYPE_STRUC'='AERIEN')
>
> row 5 => PRIVE: sum of the length ('LGR_REEL') if ('PROPRIETAI'='PRIVE')
`
I asked this on GIS Stack Exchange, but no one has answered.
This is my question:
https://gis.stackexchange.com/questions/448698/adding-new-fields-conditioned-and-calculate-the-length-using-python-qgis
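The row descriptions above amount to grouping by owner and structure type. As a minimal sketch of that grouping in plain Python, with no QGIS calls: the field names come from the question, while the sample values and the names `ROW_LABELS` and `summarize` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for coucheSUPPORT.getFeatures();
# the field names come from the question, the values are invented.
features = [
    {"PROPRIETAI": "FT", "TYPE_STRUC": "TRANCHEE", "LGR_REEL": 120.0},
    {"PROPRIETAI": "FT", "TYPE_STRUC": "AERIEN", "LGR_REEL": 80.0},
    {"PROPRIETAI": "FT", "TYPE_STRUC": "TRANCHEE", "LGR_REEL": 30.0},
    {"PROPRIETAI": "FREE MOBILE", "TYPE_STRUC": "TRANCHEE", "LGR_REEL": 45.0},
    {"PROPRIETAI": "FREE MOBILE", "TYPE_STRUC": "AERIEN", "LGR_REEL": 10.0},
    {"PROPRIETAI": "PRIVE", "TYPE_STRUC": "AERIEN", "LGR_REEL": 5.0},
]

# Map (owner, structure type) to the row label wanted in the output table.
ROW_LABELS = {
    ("FT", "TRANCHEE"): "FT_SOUT",
    ("FT", "AERIEN"): "FT_AERIEN",
    ("FREE MOBILE", "TRANCHEE"): "CREATION GC FREE RESEAU",
    ("FREE MOBILE", "AERIEN"): "CREATION AERIEN FREE RESEAU",
}


def summarize(features):
    """Return {row label: [segment count, total length]}."""
    totals = defaultdict(lambda: [0, 0.0])
    for feat in features:
        owner = feat["PROPRIETAI"]
        if owner == "PRIVE":
            label = "PRIVE"  # PRIVE is not split by structure type
        else:
            label = ROW_LABELS.get((owner, feat["TYPE_STRUC"]))
        if label is None:
            continue  # combination not listed in the question; skipped here
        totals[label][0] += 1
        totals[label][1] += feat["LGR_REEL"]
    return dict(totals)


summary = summarize(features)
for label, (count, length) in summary.items():
    print(label, count, length)
```

Each entry of `summary` could then be written to the memory layer with `QgsFeature.setAttributes`, in the same way the original code does for its three rows.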

Why does my Python while loop with a nested if end in an infinite loop?

order = 2
selected = 0
while selected < 21:  # at most 20 rows can be selected at once
    current_tr = driver.find_element_by_xpath('/ html / body / table / tbody / tr / td / div / div[3] / table / tbody / tr[%d]' % order)  # from row 1, below the table's header
    if current_tr.get_attribute("bgcolor") is None:  # no bgcolor means not yet reviewed
        driver.find_element_by_xpath("//td[2]/div/a").click()  # open the onclick content
        div_content = driver.find_element_by_xpath("//td[2]/div/div").text  # fetch the onclick content
        driver.find_element_by_xpath("//td[2]/div/div/a").click()  # close the onclick content
        print(div_content)
        if "car" in div_content:  # check whether the string exists in the onclick content
            list_content = div_content.split("【car】")
            car_close = list_content[1].strip()  # fetch the content
            list_car = car_close.split(" ")
            car = list_car[0]
            print(car)
            orderminus = order - 1
            driver.find_element_by_xpath('// *[ # id = "%d"] / td[6] / a' % orderminus).click()  # pick this row
            time.sleep(1)
            selected = selected + 1
            order = order + 0  # if this row is picked, the row disappears, so the order does not change
        else:  # the problem is here: this else branch never seems to run. The problem occurs at the first div_content without "car"
            order = order + 1  # if "car" is not in the content
            time.sleep(1)
    else:  # if already reviewed, order + 1
        order = order + 1
Above is my code, which uses Selenium to navigate a web page containing a table.
First check: has the current row been reviewed?
Not yet reviewed: print the info.
Already reviewed: skip it.
Then a second check: does the info contain the string "car"?
No: skip the row.
Yes: click it, and the row disappears.
But when I run this, what actually happens is:
during the second check, if the string "car" is not in the info,
it keeps printing the info. It seems that the else branch is never executed and that lines 6 to 9 of this snippet run over and over, in an infinite loop.
Why? Can anybody give me a clue?
To make things clear, I have simplified my code as below:
list = []
list.append("ff122")
list.append("carff")
list.append("ff3232")
list.append("ffcar")
list.append("3232")
order = 0
selected = 0
while selected < 6:
    current_tr = list[order]
    print("round %d %s" % (order, current_tr))
    if "ff" in current_tr:
        print("ff is in current_tr")
        if "car" in current_tr:
            print("car")
            selected = selected + 1
            order = order + 0
        else:
            order = order + 1
            print("order is %d" % order)
    else:  # if already reviewed, order + 1
        order = order + 1
        print("order is %d" % order)
Anybody can run this. What I need to do is first filter on "ff"; if "ff" is present, then filter on "car". When both conditions are true, selected is incremented by 1, until selected reaches a certain number. In the real case, assume that the list is long enough.
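In the simplified version a selected item stays in the list, so with `order` unchanged the same element is tested again on every pass. A sketch of one way to mirror the web page, assuming a selected row really does disappear: remove the item when it is selected, and stop when the index runs past the end of the list. The names `rows` and `target` are illustrative.

```python
rows = ["ff122", "carff", "ff3232", "ffcar", "3232"]
target = 2  # how many rows to select (hypothetical limit)

order = 0
selected = []
while len(selected) < target and order < len(rows):
    current = rows[order]
    if "ff" in current and "car" in current:
        # Selecting removes the row, so the next row slides into this index
        # and order must stay where it is.
        selected.append(rows.pop(order))
    else:
        # Nothing was removed, so move on to the next row.
        order += 1

print(selected)
print(rows)
```

In the Selenium version, note that `driver.find_element_by_xpath("//td[2]/div/a")` searches the whole page and returns the first match, so it may not be reading the row held in `current_tr`. That is something to check, not a confirmed cause.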

Error when using google translate API to translate a dataframe

I'm trying to translate part of the SQuAD 1.1 dataset to Sinhalese. I don't know whether I can feed the JSON file straight into the translation.
What I have tried so far is building a small dataframe from the SQuAD dataset and translating that as a demo for myself, but I got various errors. Below is the error I'm getting now. Can you help me fix that error, or tell me a better way to complete my task using Python?
```
import googletrans
from googletrans import Translator
import os
from google.cloud import translate_v2 as translate
os.environ['GOOGLE_APPLICATION_CREDENTIALS']=r"C:\Users\Sathsara\Documents\Python Learning\Translation test\translationAPI\flash-medley-278816-b2012b874797.json"
# create a translator object
translator = Translator()
# use translate method to translate a string - by default, the destination language is english
translated = translator.translate('I am Sathsara Rasantha',dest='si')
# the translate method returns an object
print(translated)
# obtain translated string by using attribute .text
translated.text
import pandas as pd
translate_example = pd.read_json("example2.json")
translate_example
contexts = []
questions = []
answers_text = []
answers_start = []
for i in range(translate_example.shape[0]):
    topic = translate_example.iloc[i,0]['paragraphs']
    for sub_para in topic:
        for q_a in sub_para['qas']:
            questions.append(q_a['question'])
            answers_start.append(q_a['answers'][0]['answer_start'])
            answers_text.append(q_a['answers'][0]['text'])
            contexts.append(sub_para['context'])
df = pd.DataFrame({"context":contexts, "question": questions, "answer_start": answers_start, "text": answers_text})
df
df=df.loc[0:2,:]
df
# make a deep copy of the data frame
df_si = df.copy()
# translate columns' name using rename function
df_si.rename(columns=lambda x: translator.translate(x).text, inplace=True)
df_si.columns
translations = {}
for column in df_si.columns:
    # unique elements of the column
    unique_elements = df_si[column].unique()
    for element in unique_elements:
        # add translation to the dictionary
        translations[element] = translator.translate(element,dest='si').text
print(translations)
# modify all the terms of the data frame by using the previously created dictionary
df_si.replace(translations, inplace = True)
# check translation
df_si.head()
```
This is the error I get:
> --------------------------------------------------------------------------- TypeError Traceback (most recent call
> last) <ipython-input-24-f55a5ca59c36> in <module>
> 5 for element in unique_elements:
> 6 # add translation to the dictionary
> ----> 7 translations[element] = translator.translate(element,dest='si').text
> 8
> 9 print(translations)
>
> ~\Anaconda3\lib\site-packages\googletrans\client.py in translate(self,
> text, dest, src)
> 170
> 171 origin = text
> --> 172 data = self._translate(text, dest, src)
> 173
> 174 # this code will be updated when the format is changed.
>
> ~\Anaconda3\lib\site-packages\googletrans\client.py in
> _translate(self, text, dest, src)
> 73 text = text.decode('utf-8')
> 74
> ---> 75 token = self.token_acquirer.do(text)
> 76 params = utils.build_params(query=text, src=src, dest=dest,
> 77 token=token)
>
> ~\Anaconda3\lib\site-packages\googletrans\gtoken.py in do(self, text)
> 199 def do(self, text):
> 200 self._update()
> --> 201 tk = self.acquire(text)
> 202 return tk
>
> ~\Anaconda3\lib\site-packages\googletrans\gtoken.py in acquire(self,
> text)
> 144 a = []
> 145 # Convert text to ints
> --> 146 for i in text:
> 147 val = ord(i)
> 148 if val < 0x10000:
>
> TypeError: 'numpy.int64' object is not iterable
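The traceback ends in `for i in text`, which suggests that the translator was handed a value that is not a string; the `answer_start` column holds integers. A minimal sketch of one workaround is to translate only the string cells. `fake_translate` is a stand-in defined here so that the example runs offline; it is not part of googletrans, and the sample rows are invented.

```python
import pandas as pd


def fake_translate(text):
    """Stand-in for translator.translate(text, dest='si').text."""
    return "si:" + text


df = pd.DataFrame(
    {
        "question": ["Who wrote it?", "When?"],
        "answer_start": [12, 40],  # integers, like SQuAD's answer_start
    }
)

translations = {}
for column in df.columns:
    for element in df[column].unique():
        # Skip anything that is not a string (for example integer offsets).
        if isinstance(element, str):
            translations[element] = fake_translate(element)

# Look each cell up in the dictionary; cells without a translation are kept.
df_si = df.copy()
for column in df_si.columns:
    df_si[column] = df_si[column].map(lambda v: translations.get(v, v))

print(df_si)
```

The lookup is done with `Series.map` here instead of the `replace` call in the question; either should work once the dictionary contains only string keys.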

How to fix out of range index

I am trying to get test data from two CSV files. However, whenever I run a test case, for example:
val = Test_data("AAPL.csv", "close", 25)
print (val)
I get:
Open.append(temp[1])
IndexError: list index out of range
I have tried changing the file-reading call to read, readlines, and readline.
def main():
    pass
if __name__ == '__main__':
    main()
def test_data(filename, col, day):
    """A test function to query the data you loaded into your program.
    Args:
        filename: A string for the filename containing the stock data,
            in CSV format.
        col: A string of either "date", "open", "high", "low", "close",
            "volume", or "adj_close" for the column of stock market data to
            look into.
            The string arguments MUST be LOWERCASE!
        day: An integer reflecting the absolute number of the day in the
            data to look up, e.g. day 1, 15, or 1200 is row 1, 15, or 1200
            in the file.
    Returns:
        A value selected for the stock on some particular day, in some
        column col. The returned value *must* be of the appropriate type,
        such as float, int or str.
    """
    Date = list()
    Open = list()
    High = list()
    Low = list()
    Close = list()
    AdjClose = list()
    Volume = list()
    file = open(filename, "r")
    x= file.read()
    for line in x:
        temp = line.split(",")
        Date.append(temp[0])
        Open.append(temp[1])
        High.append(temp[2])
        Low.append(temp[3])
        Close.append(temp[4])
        AdjClose.append(temp[5])
        Volume.append(temp[6])
    if col == 'Date':
        return Date[day - 1]
    elif col == 'Open':
        return Open[day - 1]
    elif col == 'High':
        return High[day - 1]
    elif col == 'Low':
        return Low[day - 1]
    elif col == 'Close':
        return Close[day - 1]
    elif col == 'Adj close':
        return AdjClose[day - 1]
    elif col == 'Volume':
        return Volume[day - 1]
def transact(funds, stocks, qty, price, buy=False, sell=False):
    """A bookkeeping function to help make stock transactions.
    Args:
        funds: An account balance, a float; it is a value of how much money you have,
            currently.
        stocks: An int, representing the number of stock you currently own.
        qty: An int, representing how many stock you wish to buy or sell.
        price: A float reflecting a price of a single stock.
        buy: This option parameter, if set to true, will initiate a buy.
        sell: This option parameter, if set to true, will initiate a sell.
    Returns:
        Two values *must* be returned. The first (a float) is the new
        account balance (funds) as the transaction is completed. The second
        is the number of stock now owned (an int) after the transaction is
        complete.
        Error condition #1: If the `buy` and `sell` keyword parameters are both set to true,
        or both false. You *must* print an error message, and then return
        the `funds` and `stocks` parameters unaltered. This is an ambiguous
        transaction request!
        Error condition #2: If you buy, or sell without enough funds or
        stocks to sell, respectively. You *must* print an error message,
        and then return the `funds` and `stocks` parameters unaltered. This
        is an ambiguous transaction request!
    """
    if buy == sell:
        # print("Ambiguous transaction! Can't determine whether to buy or sell. No action performed.")
        return funds, stocks
    elif buy and funds >= qty * price:
        funds -= qty * price
        stocks += qty
        # print("Insufficient funds")
        return funds, stocks
    elif sell and stocks >= qty:
        funds += qty * price
        stocks -= qty
        # print("Insufficient stock")
        return funds, stocks
Error:
import project
cash_balance = 1000
stocks_owned = 25
price = test_data("AAPL.csv", "close", 42)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
NameError: name 'test_data' is not defined
What I'm supposed to get:
>>> cash_balance = 1000
>>> stocks_owned = 25
>>> price = test_data("AAPL.csv", "close", 42)
>>> price
4.357143
I don't know if the problem is because it is not reading my data.

Aren't you supposed to do project.test_data()?
Or from project import test_data?
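On the original `IndexError`: `file.read()` returns a single string, and iterating over a string yields one character at a time, so `line.split(",")` produces a one-element list and `temp[1]` fails. Below is a minimal sketch using the standard `csv` module. It assumes that the file has a header row such as `Date,Open,High,Low,Close,Adj Close,Volume`; the actual header of AAPL.csv is not shown in the question, and the sample rows and the name `lookup` are made up for illustration.

```python
import csv
import io

# Made-up sample standing in for AAPL.csv; real code would pass
# open(filename, newline="") instead of io.StringIO(SAMPLE).
SAMPLE = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,10.0,11.0,9.5,10.5,10.4,1000
2020-01-03,10.5,12.0,10.0,11.5,11.4,1500
"""


def lookup(file_obj, col, day):
    """Return the value of the lowercase column col on the 1-based day."""
    reader = csv.DictReader(file_obj)
    # Normalize header names so that "close" matches "Close" and
    # "adj_close" matches "Adj Close".
    rows = [
        {name.lower().replace(" ", "_"): value for name, value in row.items()}
        for row in reader
    ]
    value = rows[day - 1][col]
    if col == "date":
        return value
    if col == "volume":
        return int(value)
    return float(value)


price = lookup(io.StringIO(SAMPLE), "close", 2)
print(price)
```

The `NameError` is a separate issue: after `import project`, the function has to be called as `project.test_data(...)` or imported by name, as the reply above suggests.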
