It would be really useful (and cool) if I were able to load a csv file to a pandas dataframe, in the browser, using pyscript, without starting a local server. It would allow me to create easily distributable tools.
Is it even possible?
The closest I've seen is this code. It doesn't really load the csv to a single dataframe object (that I can then manipulate). It does skip the need to start a local server and displays the csv file in the browser.
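For reference, a rough sketch of what I'm hoping for, assuming a recent PyScript/Pyodide build, an <input type="file" id="csv-file"> element in the page, and a JS FileReader handing the CSV text to pandas via io.StringIO (the element id and handler names are just illustrative):

import io

import pandas as pd
from js import document, FileReader
from pyodide.ffi import create_proxy

def handle_loaded(event):
    # FileReader hands us the file contents as a string
    df = pd.read_csv(io.StringIO(event.target.result))
    print(df.head())   # df is a normal DataFrame that can be manipulated further

def handle_file_select(event):
    file = event.target.files.item(0)
    reader = FileReader.new()
    reader.onload = create_proxy(handle_loaded)
    reader.readAsText(file)

document.getElementById("csv-file").addEventListener(
    "change", create_proxy(handle_file_select)
)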
Short Explanation
Some csv files arrive in a OneDrive folder that is synced to a machine where a script runs to read them and push them to BigQuery. While the script runs fine now, I intend to run it only after all files added since the last push are fully synced (i.e. available offline) on that machine...
Long Explanation
So basically I use a local database for our organization's sales history, which I also want to push to BigQuery to reflect (lagged) real-time info on dashboards and for other analyses, since a lot of other data besides sales history resides there. The database is strictly on-premises and cannot be accessed outside the organization's network (so there is literally no way to link it to BigQuery!), so I have some people there who periodically (every 1-2 hours) export day-to-time sales (sales from the start of the day until the time of export) from the database and upload the files to OneDrive. I have OneDrive synced on a machine where many other scripts are hosted (it's just convenient!), and I run a Python script there that reads all the csvs, combines them and pushes them to BigQuery. There are often duplicates, so it is necessary to read all the files, remove duplicates and then push them to BigQuery, for which I use:
import os
import pandas as pd
from google.oauth2 import service_account

# read every export in the input directory and de-duplicate the combined rows
files = [file for file in os.listdir(input_directory) if file.count('-') <= 1]
data = [pd.read_excel(input_directory + file) for file in files if file.endswith('.xlsx')]
all_data = pd.concat(data, ignore_index=True).drop_duplicates()

def upload():
    all_data.to_gbq(project_id=project_id,
                    destination_table=table,
                    credentials=service_account.Credentials.from_service_account_file(
                        'credentials.json'),
                    progress_bar=True,
                    if_exists='replace')
What I am trying to do is update the BigQuery table only if there are any new changes when the script is run, since they don't always get time to do the exports.
My current approach is to write the length of the dataframe to a file at the end of the script:
with open("length.txt", "w") as f:
f.write(len(all_data))
and once all files are read into the dataframe, I use:
if len(all_data) > int(open("length.txt", "r").readlines()[0]):
    upload()
But doing this requires all files to be read into RAM, and reading so many files makes the machine a bit congested (RAM-wise) for the other scripts running on it. So I would rather not read them all into RAM, as my current approach does.
I tried accessing file attributes as well and building logic based on the modified date, but as soon as a new file is added the date changes, even when the file is not yet fully downloaded onto the machine. I also searched for a way to access the sync status of files and came across Determine OneDrive Sync Status From Batch File, but that did not help. Any help improving this situation is appreciated!
We have similar workflows to this where we load data from files into a database regularly by script. For us, once a file has been processed, we move it to a different directory as part of the python script. This way, we allow the python script to load all data from all files in the directory as it is definitely new data.
If the files are cumulative (contain old data as well as new data) and you therefore only want to load the rows that are new, this is where it gets tricky. You are definitely on the right track, as we use the modified date to ascertain whether a file has changed since we last processed it. In Python you can get this from the os library: os.path.getmtime(file_path).
This should give you the last date/time the file was changed in any way, for any operating system.
I recommend simply moving the files out of your folder of new files once they are loaded, to make them easier for your Python script to handle; a rough sketch of that pattern is below. I do not know much about OneDrive though, so I cannot help with that aspect.
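A minimal sketch of the pattern described above, not the asker's actual setup: only files modified since the last recorded run are processed, and they are then moved out of the incoming folder. The file and directory names (incoming/, processed/, last_run.txt) are illustrative assumptions.

import os
import shutil
import time

input_directory = "incoming/"
processed_directory = "processed/"
stamp_file = "last_run.txt"

# timestamp of the previous run (0 on the first run)
last_run = 0.0
if os.path.exists(stamp_file):
    with open(stamp_file) as f:
        last_run = float(f.read().strip())

run_started = time.time()

# only files changed since the last run count as new
new_files = [
    f for f in os.listdir(input_directory)
    if f.endswith(".xlsx")
    and os.path.getmtime(os.path.join(input_directory, f)) > last_run
]

if new_files:
    # ... read, de-duplicate and upload new_files here ...
    # then move them out of the incoming folder so they are never re-processed
    for f in new_files:
        shutil.move(os.path.join(input_directory, f),
                    os.path.join(processed_directory, f))

    with open(stamp_file, "w") as f:
        f.write(str(run_started))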
Good luck!
I am developing a web application in which users can upload excel files. I know I can use the OPENROWSET function to read data from excel into a SQL Server but I am refraining from doing so because this function requires a file path.
It seems kind of indirect as I am uploading a file to a directory and then telling SQL Server go look in that directory for the file instead of just giving SQL Server the file.
The other option would be to read the Excel file into a pandas dataframe and then use the to_sql function, but pandas' read_excel function is quite slow, and I am sure the other method would be faster.
Which of these two methods is "correct" when handling file uploads from a web application?
If the first method is not frowned upon or "incorrect", then I am almost certain it is faster and will use it. I just want an experienced developer's thoughts or opinions. The web app's backend is Python and Flask.
If I am understanding your question correctly, you are trying to load the contents of an xls(x) file into a SQL Server database. This is actually not trivial to do, as depending on what is in the Excel file you might want one table, or more probably multiple tables based on the data. So I would step back for a bit and ask three questions:
What is the data I need to save, and how should that data be structured in my SQL tables? Forget about Excel at this point; maybe just examine the first row of data and see how you need to save it.
How do I get the file into my web application? For example, when the user uploads a file you would want to use a POST form and send the file data to your server and your server to save that file (for example, either on S3, or in a /tmp folder, or into memory for temporary processing).
Now that you know what your input is (the xls(x) file and its location) and how you need to save your data (the SQL schema), it's time to decide what the best tool for the job is. Pandas is probably not going to be a good tool unless you literally just want to load the file and dump it as-is, with minimal (if any) changes, into a single table. At this point I would suggest something like xlrd if you only have legacy xls files, or openpyxl for xlsx files. That way you can shape your data any way you want and handle, for example, malformed dates entered by the user, empty cells (should they default to something?), mismatched types, and so on.
In other words, the task you're describing is not trivial at all. It will take quite a bit of planning and designing, and then a good deal of Python code once you have your design decided. Feel free to ask more specific questions here if you need to (for example, how to capture the POST data in a file upload or whatever else you need help with).
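As a rough illustration of the upload-then-parse flow described above (a sketch, not a definitive implementation): a minimal Flask endpoint that receives the POSTed file, reads it with openpyxl and inserts rows into SQL Server via pyodbc. The form field name, table, column layout and connection string are all assumptions.

import io
import openpyxl
import pyodbc
from flask import Flask, request

app = Flask(__name__)
CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes")  # assumption

@app.route("/upload", methods=["POST"])
def upload():
    file = request.files["spreadsheet"]               # form field name is an assumption
    wb = openpyxl.load_workbook(io.BytesIO(file.read()), read_only=True)
    ws = wb.active

    rows = []
    for row in ws.iter_rows(min_row=2, values_only=True):   # skip the header row
        name, amount, sold_on = row                          # hypothetical columns
        if name is None:                                      # basic cleaning/validation
            continue
        rows.append((name, float(amount or 0), sold_on))

    with pyodbc.connect(CONN_STR) as conn:
        conn.cursor().executemany(
            "INSERT INTO sales (name, amount, sold_on) VALUES (?, ?, ?)", rows
        )
    return {"inserted": len(rows)}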
I am running my Python script, in which I write Excel files, to put them onto my EC2 instance. However, I have noticed that these Excel files, although they are created, only appear on the server once the code stops.
I guess they are kept in cache but I would like them to be added to the server straight away. Is there a "commit()" to add to the code?
Many thanks
I guess they are kept in cache but I would like them to be added to the server straight away. Is there a "commit()" to add to the code?
No. It isn't possible to stream or write a partial xlsx file the way you can with a CSV or HTML file, since the file format is a collection of XML files in a Zip container and it can't be generated until the file is closed.
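For illustration, a tiny sketch assuming xlsxwriter (the question doesn't say which library writes the files): the .xlsx only lands on disk when the workbook is closed, so closing each workbook as soon as it is finished, rather than at the end of the whole script, makes the file appear immediately.

import xlsxwriter

workbook = xlsxwriter.Workbook("report.xlsx")   # nothing is on disk yet
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, "hello")
workbook.close()   # the Zip container is assembled and written to disk here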
I have data in an Excel file that I would like to use to create a case in PSSE. The data is organized as it would appear in a case in PSSE (i.e. for each bus: bus number, name, base kV, and so on). Of course the data can be entered manually, but I'm working with over 500 buses. I have tried copying and pasting, but that seems to work only sometimes. For machine data, it barely works.
Is there a way to import this data to PSSE from an excel file? I have recently started running PSSE with Python, and maybe there is a way to do this?
--
MK.
Yes. You can import data from an Excel file into PSSE using the Python package xlrd; however, I would recommend converting your Excel file to csv before you import, and using csv, as it is much easier. Importing data using the API is not just a copy-and-paste job into the nicely tabulated spreadsheet that PSSE has in its case data.
Refer to the API documentation for PSSE, chapter II, and search for the function BUS_DATA_2. You will see that you can create buses with this function.
So your job should be threefold (a rough sketch follows the note below).
Import the csv file data with each line being a list of the data parameters for your bus (voltage, name, base kV, PU, etc.), and store it in another list.
Iterate through the new list you just created and call:
ierr = bus_data_2(i, intgar, realar, name)
and pass in your data from the csv file (see the PSSE API documentation on how to do this). This will effectively load the data from the csv file into your case (in the form of nodes or buses).
After you are finished, call psspy.save("Casename.sav") to save your work as a new PSSE case.
Note: there are functions to load in line data, fixed shunt data, generator data, etc.
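A heavily simplified sketch of those three steps, assuming psspy is importable and a hypothetical buses.csv with columns (bus number, name, base kV); the intgar/realar placeholder values must be checked against the BUS_DATA_2 entry in the API documentation for your PSSE version.

import csv
import psspy

psspy.psseinit(10000)   # initialise PSSE (the bus limit here is an arbitrary choice)

with open("buses.csv") as f:
    for bus_number, name, base_kv in csv.reader(f):
        intgar = [1, 1, 1, 1]                # placeholder integer data (see BUS_DATA_2 docs)
        realar = [float(base_kv), 1.0, 0.0]  # placeholder real data (see BUS_DATA_2 docs)
        ierr = psspy.bus_data_2(int(bus_number), intgar, realar, name)
        if ierr != 0:
            print("bus_data_2 failed for bus %s: ierr=%s" % (bus_number, ierr))

psspy.save("Casename.sav")   # save the new case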
Your other option is to call up the PTI folks as they can give you training.
Good luck
If you have an Excel data file with exactly the same "format" and same "info" as the regular case file (.sav), try this:
Open any small example .sav file from the Example sub-folder of PSSE's installation folder.
Copy the corresponding spreadsheet data into the working case (shown in spreadsheet view) with the same "info" (say, bus, branch, etc.) in the PSSE GUI.
After you finish copying everything, save the edited working case in the GUI as a new working case.
If this doesn't work, I suggest you ask this question on the "Python for Power Systems" forum:
https://psspy.org/psse-help-forum/questions/
I'm developing a web application which creates visualizations of some data.
The data is taken from third parties, using their APIs, and imported in my database. The importation will be done sporadically, therefore my database will be pretty static.
The visualizations will be dynamically created in JavaScript, using d3.
When thinking about how to pass (and format) the data from the server to the client I thought I could export it to a .csv file and then load it from javascript (d3 has a builtin csv parser).
This way the csv file doubles as a caching system: it will be regenerated (and therefore the database queried) only if it is older than, say, a week.
My question is: where and how should I save the generated csv file? STATIC_ROOT, MEDIA_ROOT, another hardlinked directory?
Also, do you think the csv system is a good idea?
Sorry if the questions may seem useless, I literally picked up both django and d3 less than a week ago.
You can place the file in STATIC_ROOT, that would be a suitable location.
Two thoughts on the side:
Did you think about locking/mutexing the csv file while it is being written? Or is it not a problem if a client gets half a CSV file when the request comes in at an unlucky moment?
CSV is not the standard way to transfer a data series to a JS client. I would probably write a JSON array to the file.
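A minimal sketch of that suggestion, assuming a Django view named chart_data and a hypothetical fetch_rows() helper that queries the database: the JSON file is regenerated only when it is older than a week, and the write-then-rename avoids serving a half-written file (the locking concern above).

import json
import os
import time

from django.conf import settings
from django.http import FileResponse

CACHE_PATH = os.path.join(settings.STATIC_ROOT, "chart-data.json")
MAX_AGE = 7 * 24 * 3600  # one week, in seconds

def chart_data(request):
    stale = (not os.path.exists(CACHE_PATH)
             or time.time() - os.path.getmtime(CACHE_PATH) > MAX_AGE)
    if stale:
        rows = fetch_rows()                    # hypothetical DB query helper
        tmp_path = CACHE_PATH + ".tmp"
        with open(tmp_path, "w") as f:         # write to a temp file, then rename,
            json.dump(rows, f)                 # so a half-written file is never served
        os.replace(tmp_path, CACHE_PATH)
    return FileResponse(open(CACHE_PATH, "rb"), content_type="application/json")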
In Django, we usually store static files (files used by our website to render content, like CSS and JS) under STATIC_ROOT. Files under MEDIA_ROOT are usually media files, like images and videos, that Django lets the web server serve. I would store the visualization data file under a data directory within my app (which goes under the main Django project directory). This article is a good resource on structuring your Django project.
As for using a CSV file as the data file that drives the visualization, I would prefer exporting your data as JSON, since it is a more compact notation. Also, I would assume decoding JSON in JavaScript would be faster than decoding CSV, although that will depend on other parameters like the size and structure of the data in the file.