Pandas won't create .csv file - python

I recently started diving into algo trading and building a bot for crypto trading.
For this I created a backtester with pandas to run different strategies with different parameters. The datasets (csv files) I use are rather large (around 40 MB each).
These are processed, but as soon as I want to save the processed data to a csv, nothing happens. No output whatsoever, not even an error message. I tried using the full path, I tried saving it just with the filename, I even tried saving it as a .txt file. Nothing seems to work. I also tried the solutions I was able to find on Stack Overflow.
I am using Anaconda3 in case that could be the source of my problem.
Here you can find the part of my code which tries to save the dataframe to a file.
results_df = pd.DataFrame(results)
results_df.columns = ['strategy', 'number_of_trades', "capital"]
print(results_df)
for i in range(2, len(results_df)):
    if results_df.capital.iloc[i] < results_df.capital.iloc[0]:
        results_df.drop([i], axis="index")
        #results to csv
        current_dir = os.getcwd()
        results_df.to_csv(os.getcwd()+'\\file.csv')
        print(results_df)
Thank you for your help!

You can simplify your code a great deal and write it as follows (it should also run faster):
import os
import pandas as pd

results_df = pd.DataFrame(results)
results_df.columns = ['strategy', 'number_of_trades', "capital"]
print(results_df)
first_row_capital = results_df.capital.iloc[0]
indexer_capital_smaller = results_df.capital < first_row_capital
values_to_delete = indexer_capital_smaller[indexer_capital_smaller].index
results_df.drop(index=values_to_delete, inplace=True)
#results to csv
current_dir = os.getcwd()
results_df.to_csv(os.getcwd() + '\\file.csv')
print(results_df)
I think the main problem in your code might be that you write the csv inside the loop, and only when you find an entry in the dataframe where capital satisfies the condition; if no row ever satisfies it, the file is never written.
And if you only need the deletion for the csv output and don't need the filtered dataframe in memory afterwards, you can make it even simpler:
import os
import pandas as pd

results_df = pd.DataFrame(results)
results_df.columns = ['strategy', 'number_of_trades', "capital"]
print(results_df)
first_row_capital = results_df.capital.iloc[0]
indexer_capital_smaller = results_df.capital < first_row_capital
#results to csv (write only the rows that are not below the first row's capital)
current_dir = os.getcwd()
results_df[~indexer_capital_smaller].to_csv(os.getcwd() + '\\file.csv')
print(results_df[~indexer_capital_smaller])
This second variant doesn't change the dataframe in memory; it only applies the (negated) filter right before writing the csv and printing the content.
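As a side note (not part of the original answer): building the path with os.getcwd() + '\\file.csv' hard-codes the Windows separator. A more portable sketch of the same write, assuming results_df from above, would be:
import os

out_path = os.path.join(os.getcwd(), 'file.csv')   # portable, no hard-coded '\\'
results_df.to_csv(out_path)
print('written to', out_path)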

Related

Python Multiprocessing Executing Function on Many Files

I have a list of csv files that contain numbers, all the same size. I also have lists of data associated with each file (e.g. a list of dates, a list of authors). What I'm trying to do is read in each file, apply a function to it, then put the maximum value, along with the data associated with that file, into a dataframe.
I have approximately 100,000 files to process, so being able to use multiprocessing would save an incredible amount of time. Important to note: I'm running the following code in a Jupyter Notebook on Windows, which I know has known issues with multiprocessing. The code I've tried so far is:
import multiprocessing as mp
import pandas as pd

files = [file1, file2, ..., fileN]
dates = [date1, ..., dateN]
authors = [author1, ..., authorN]

def func(file):
    # Reads in the file and computes the function on its values (omitted here)
    index = files.index(file)
    d1 = {'Filename': file, 'Max': max_val, 'X-value': x_values, 'Date': dates[index], 'Author': authors[index]}
    return d1

if __name__ == '__main__':
    with mp.Pool() as pool:
        data = pool.map(func, files)
    summaryDF = pd.DataFrame(data)
This runs indefinitely, a known issue, but I'm not sure what the error is or how to fix it. Thank you in advance for any help.
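No answer is quoted here, but the usual workaround for the Jupyter-on-Windows case is to put the worker in a separate importable module (Windows starts child processes with "spawn", so they must be able to import the function) and to pass the per-file metadata in explicitly instead of relying on module-level lists. A minimal sketch along those lines; worker.py, process_one and the column names are hypothetical, and the actual computation is left as a placeholder:
# worker.py  (a separate file, so the notebook's child processes can import it)
import pandas as pd

def process_one(args):
    path, date, author = args                      # metadata passed in explicitly
    values = pd.read_csv(path, header=None)
    max_val = values.to_numpy().max()              # placeholder for the real computation
    return {'Filename': path, 'Max': max_val, 'Date': date, 'Author': author}

# in the notebook / main script
import multiprocessing as mp
import pandas as pd
from worker import process_one

if __name__ == '__main__':
    tasks = list(zip(files, dates, authors))
    with mp.Pool() as pool:
        data = pool.map(process_one, tasks)
    summaryDF = pd.DataFrame(data)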

Is there any feasible solution to read WOT battle results .dat files?

I am new here and trying to solve one of my questions about World of Tanks. I heard that every battle's data is saved on the client's disk in the Wargaming.net folder; I want to do batch data analysis of our clan's battle performance.
It is said that these .dat files are a kind of JSON file, so I tried a couple of lines of Python code to read one, but failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and obviously fails. Could anyone tell me what these files really contain?
Added on Feb. 9th, 2022
After trying another snippet in Jupyter Notebook, it seems like something can be read out of the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of byte values, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb') # This can be improved to not keep the file opened.
# To convert pickled items from Python 2 to Python 3 you need to use the "bytes" or "latin1" encoding.
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw
# The data stored inside the pickled file will be a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')
# Lastly you can print all of these and see a lot of data inside.
The result contains a mixture of further binary blobs as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
First you can look at the replay file itself in a text editor, but there is code at the beginning of the file that has to be cleaned out before it is readable. Then there is a ton of info that you have to read in and figure out, but it is the stats for each player in the game. Only after that comes the part that has to do with the actual replay; you don't need that stuff.
You can grab the player IDs and tank IDs from WoT developer area API if you want.
After loading the pickle files as gabzo mentioned, you will see that it is simply a list of values, and without knowing what each value refers to, it's hard to make sense of it. The identifiers for the values can be extracted from your game installation:
import zipfile

WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"

archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the extracted Python files (e.g. with uncompyle6) and go through the code to see the identifiers for the values.
One thing to note is that the list of values for the main pickle objects (like brAccount from gabzo's code) always has a checksum as the first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled python files.
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.

openpyxl blocking excel file after first read

I am trying to overwrite a value in a given cell using openpyxl. I have two sheets: one is called Raw and is populated by API calls, the second is called Data and is fed off the Raw sheet. The two sheets have exactly identical shapes (cols/rows). I compare the two to see if there is a bay assignment in Raw; if there is, I grab it into the Data sheet. If both Raw and Data are missing the value in that column, I run a complex algorithm (irrelevant for this question) to assign a bay number based on logic.
I am having problems with rewriting Excel using openpyxl.
Here's example of my code.
import pandas as pd
from openpyxl import load_workbook

data_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayData')
raw_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayRaw')

no_bay_res = data_df[data_df['Bay assignment'].isnull()].reset_index()  # grab rows with no bay assignment in a specific column

book = load_workbook("Algo Build v23test.xlsx")
sheet = book["MondayData"]

for index, reservation in no_bay_res.iterrows():
    idx = int(reservation['index'])
    if pd.isna(raw_df.iloc[idx, 13]):
        continue
    else:
        value = raw_df.iat[idx, 13]
        data_df.iloc[idx, 13] = value
        sheet.cell(idx + 2, 14).value = int(value)

book.save("Algo Build v23test.xlsx")
book.close()
print(value)  # 302
Now the problem is that book.close() doesn't seem to be working; the book object is still usable in Python. The write itself works and the Excel file is overwritten just fine. However, if I try to run these two lines again
data_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayData')
raw_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayRaw')
I get dataframes full of NULL values, except for the value that was replaced.
However, if I open that Excel file manually from the folder, save it (CTRL+S) and run the code again, it works properly. It's the weirdest problem.
I need to loop the code above for Monday-Sunday, so I need it to be able to read the data again without manually resaving the file.
The reason is that openpyxl does not evaluate formulas: when it saves the workbook, the cached formula results are dropped, so pandas (which only reads those cached values) sees NaN until the file has been opened, saved and closed in Excel again. Here's code (using xlwings) that does that from within the script. However, it is rather slow.
import xlwings as xl
import pandas as pd

def df_from_excel(path, sheet_name):
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path, sheet_name)
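Used with the workbook from the question, that would look something like:
data_df = df_from_excel('Algo Build v23test.xlsx', 'MondayData')
raw_df = df_from_excel('Algo Build v23test.xlsx', 'MondayRaw')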
I got the same problem; the only workaround I found was to terminate excel.exe manually from Task Manager. After that everything went fine.
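If you'd rather not use Task Manager, the same workaround can be scripted (Windows only; note this kills every running Excel instance, not just the one holding the file):
import subprocess

subprocess.run(['taskkill', '/f', '/im', 'excel.exe'], check=False)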

How to read a lot of excel files in python pandas?

I have lots of Excel files (xlsx format) and want to read and handle them.
For example, the file names are ex201901, ex201902, ..., ex201912.
The names follow an exYYYYMM format.
To import these files in pandas the usual way is easy:
import pandas as pd

df201901 = pd.read_excel(r'C:\users\ex201901.xlsx')
df201902 = pd.read_excel(r'C:\users\ex201902.xlsx')
df201903 = pd.read_excel(r'C:\users\ex201903.xlsx')
df201904 = pd.read_excel(r'C:\users\ex201904.xlsx')
....
df201912 = pd.read_excel(r'C:\users\ex201912.xlsx')
However, this seems boring and tedious.
In SAS I would use MACRO() syntax, but in Python I have no idea how to do the equivalent.
Can you help me handle this kind of repeated job in an easy way, like a SAS MACRO()?
Thanks for reading.
Given that you'll probably want to work with all of the data frames at once afterwards, it's a smell if you even put them into separate local variables. In general, whenever a task feels repetitive because you're doing the same thing over and over again, that calls for introducing a loop of some sort. As you're planning to use pandas, chances are you'll be iterating again soon (once your files are loaded, you'll probably be performing some transformations on their rows), in which case you'll be best off looking into how control flow and loops work in Python (and in pandas) in general; good tutorials are plentiful.
In your particular case, depending on what kind of processing you are planning on doing afterwards, you'd probably benefit from having something like
df2019 = [pd.read_excel(rf'C:\users\ex2019{str(i).zfill(2)}.xlsx') for i in range(1, 13)]
With that, you can access the individual data frames through e.g. df2019[5] to get the data frame corresponding to June, or you can collapse all of them into a single data frame using df = pd.concat(df2019) if that's what suits your need.
If you have less structure in your file names, glob can come in handy. With that, the above could become something like
import glob
df2019 = list(map(pd.read_excel, glob.glob(r'C:\users\ex2019*.xlsx')))
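If you also want to keep track of which month each frame came from (closer to the separate df201901, df201902, ... variables in the question), a dictionary keyed by the period works; the keys below are just derived from the assumed exYYYYMM file naming:
import glob
import os
import pandas as pd

dfs = {os.path.splitext(os.path.basename(p))[0][2:]: pd.read_excel(p)
       for p in glob.glob(r'C:\users\ex2019*.xlsx')}
# dfs['201906'] is June; pd.concat(dfs) stacks them with the period as an extra index level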
You can use the os module from Python. It has a method listdir which lists all the file names in a folder. Check the code below:
import os, re
import pandas as pd

listDir = os.listdir(FILE_PATH)
dfList = []
for aFile in listDir:
    if re.search(r'ex2019[0-9]{2}\.xlsx', aFile):   # matches all twelve months, not just 01-09
        tmpDf = pd.read_excel(os.path.join(FILE_PATH, aFile))
        dfList.append(tmpDf)
outDf = pd.concat(dfList)

Is there a way for 2 scripts to write to the same file?

I have limited Python experience, but I am determined to learn. I am trying to create a script that writes some data inputs to Excel until it is stopped. It is very straightforward when a single person is using it, but the problem is that two people will be using it at once.
I am thinking about keeping it simple and just having two copies of the same script running at the same time, but the problem comes when the file is saved. If two files are saved with the same name, one will overwrite the other and the data will be lost. Is there a way to have the scripts create files with different names without having to manually change the code? (This would eventually be scaled up to 20 computers running it.)
The loop looks like:
import xlwt
from xlwt import Workbook

wb = Workbook()
s1 = wb.add_sheet('Sheet 1')

data = []
row = 0        # not shown in the original snippet; assumed starting row
user = None    # not shown in the original snippet; needed for the loop condition

while user != '0':
    user = input('Scan ID Badge: ')
    data.append(user)
    order = input('Scan order: ')
    data.append(order)
    item = input('Scan item barcode: ')
    data.append(item)
    for i in range(len(data)):
        s1.write(row, i, data[i])
    wb.save('OrderData.xls')
    data = []
    row += 1
If you want a tabular form of data storage anyway, you could switch to a real database and periodically create an Excel-like summary from the database file.
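A minimal sketch of that idea using sqlite3 from the standard library (the file, table and column names here are only examples): every copy of the script appends rows to the same database file, SQLite's locking serializes the writes, and a separate step exports the summary.
import sqlite3
import pandas as pd

conn = sqlite3.connect('orders.db', timeout=30)   # waits if another script holds the write lock
conn.execute('CREATE TABLE IF NOT EXISTS orders (user TEXT, order_id TEXT, item TEXT)')

def save_row(user, order_id, item):
    with conn:                                     # commits and releases the lock right after the insert
        conn.execute('INSERT INTO orders VALUES (?, ?, ?)', (user, order_id, item))

# later, or on a schedule: dump everything to an Excel-like summary (needs openpyxl for .xlsx output)
pd.read_sql('SELECT * FROM orders', conn).to_excel('OrderSummary.xlsx', index=False)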
If you know all of the users using this script will be using machines with different network names, you could include the computer name in the XLS name:
import platform
filename = 'AssociateEfficiencyTemp-' + platform.node() + '.xls'
# ...
wb.save(filename)
(You can also use getpass.getuser() to (try and) get the username of the user running the script.)
You can then write another script that reads all of the separate files (glob.glob('AssociateEfficiencyTemp-*.xls') etc.) and combines them.
(I would suggest using a format other than .xls for the intermediary files, though, such as plain text files of JSON lines.)
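The combining script could be as small as this (assuming the per-machine .xls files written by the snippet above; with JSON-lines intermediaries you'd use pd.read_json(..., lines=True) instead):
import glob
import pandas as pd

# reading legacy .xls files requires the xlrd package
parts = [pd.read_excel(p, header=None) for p in glob.glob('AssociateEfficiencyTemp-*.xls')]
combined = pd.concat(parts, ignore_index=True)
combined.to_excel('AssociateEfficiencyCombined.xlsx', index=False, header=False)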
