I have installed chart of accounts A for company1. This chart was used for a couple of months of accounting. How can I convert to chart of accounts B and keep the old data for the accounts (debit, credit, etc.)? In other words, is it possible to migrate data from one chart of accounts to another? The solution can be programmatic or through the web client interface (not important). Virtual charts of accounts can't be used. Chart of accounts B must become the main chart with the old data.
Any advice will help me a lot. Thanks
I don't know of any way to install another chart of accounts after you've run the initial configuration wizard on a new database. However, if all you want to do is change the account numbers, names, and parents to match a different chart of accounts, then you should be able to do that with a bunch of database updates. Either manually edit each account if there aren't too many accounts, or write a SQL or Python script to update all the accounts. To do that, you'll need to map each old account to a new account code, name, and parent, then use that map to generate a script.
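For the mapping-based route, here is a minimal sketch of such a script in Python. It assumes an OpenERP-style account_account table with code and name columns and a hand-made mapping CSV (old code, new code, new name); those names are assumptions, so verify them against your actual schema and review the generated SQL before running it inside a transaction. Parent remapping is omitted for brevity.

import csv

# Hand-made mapping file: old_code,new_code,new_name (file name is illustrative)
with open("account_mapping.csv") as f:
    mapping = list(csv.reader(f))

# Generate UPDATE statements; assumes accounts live in account_account
# with "code" and "name" columns (check against your schema).
with open("remap_accounts.sql", "w") as out:
    for old_code, new_code, new_name in mapping:
        out.write(
            "UPDATE account_account "
            "SET code = '%s', name = '%s' "
            "WHERE code = '%s';\n" % (new_code, new_name.replace("'", "''"), old_code)
        )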
IMO it's very difficult; we are currently migrating some data and it's proving to be hard.
I would advise you to pick a date in the future and tell everyone to just use another DB with the correct chart of accounts.
Your finance dept will be the one to suggest which date is best. How about when a new period starts?
I needed to do something similar. It is possible to massage the chart from one form to another, but I found in the end that creating a new database, bringing in the modules, assigning the new chart and then importing all the critical elements was the best and safest path.
If you have a lot of transactions, the import will be more difficult. If that is the case, then massage your chart from one form to another.
I am sure there will be some way to do an active migration sometime in the future. You definitely don't want to live with a bad chart or without your history if you can help it.
The fastest way to do this is to use an ETL tool like Talend or Pentaho (provided there is a logic as to which account maps to which other account during the process). If not, you will have to do it by hand.
If there is a logic, you would export the data to a format you can transform and re-import. Uninstall your chart of accounts and install the new one. Then import all the data you formatted using those tools.
I would like to create a website where I show some text but mainly dynamic data in tables and plots. Let us assume that the user can choose whether they want to see the DAX or the DOW JONES prices for a specific timeframe. I guess I have to store these data in a database. As I am not experienced with creating websites, I have no idea what the most reasonable setup for this website would be.
Would it be reasonable for this example to choose a database where every row consists of 9 fields, where the first column is the timestamp (let's say data for every minute), the next four columns correspond to the high, low, open, close prices of the DAX for this timestamp, and columns 6 to 9 correspond to the high, low, open, close prices of the DOW JONES?
Could this be scaled to hundreds of columns while keeping the database reasonably fast?
Is this an efficient implementation?
When this website is online, you can choose whether you want to see DAX or DOW JONES prices for a specific timeframe. The corresponding data would be fetched from the database via Python and plotted in the graph. Is this the general idea of how this would be implemented?
To get the data, could I run another Python script on the webserver to dynamically collect the desired data and write it to the database?
As a total beginner with webhosting (is this even the right term?) it is very hard for me to ask precise questions. I would be happy to find out what general structure I need to create the website, the database, and the connection between the two. I was thinking about Amazon Web Services.
You could use a database, but that doesn't seem necessary for what you described.
It would be reasonable to build the database as you described. Look into SQL for doing so. You can download a package like XAMPP that will give you pretty much everything you need for that. This is easily scalable to hundreds of thousands of entries - that's what databases are for.
If your example of stock prices is actually what you are trying to show, however, this is completely unnecessary as there are already plenty of databases that have this data and will allow you to query them. What you would really want in this scenario is an API. Alpha Vantage is a free service that will serve you data on stock prices, and has plenty of documentation to help you get it set up with python.
I would structure the project like this:
Use the python library Flask to set up the back end.
In addition to instantiating the Flask app, instantiate the Alpha Vantage class as well (you will need to pip install both of these).
In one of the routes you declare under Flask, use the Alpha Vantage api to get the data you need and simply display it to the screen.
If I am assuming you are a complete beginner, one or more of those steps may not make sense to you, in which case take them one at a time. Start by learning how to build a basic Flask app, then look at the API.
YouTube is your friend for both of these things.
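To make those steps concrete, here is a minimal sketch combining Flask and the alpha_vantage package. The route name, API key placeholder, and symbol are illustrative, and the exact shape of the returned data may vary with the library version, so treat this as a starting point rather than a finished implementation.

from flask import Flask, jsonify
from alpha_vantage.timeseries import TimeSeries  # pip install flask alpha_vantage

app = Flask(__name__)
ts = TimeSeries(key="YOUR_API_KEY")  # placeholder key

@app.route("/prices/<symbol>")
def prices(symbol):
    # Fetch daily prices for the requested symbol, e.g. /prices/MSFT
    data, meta = ts.get_daily(symbol=symbol)
    return jsonify(data)

if __name__ == "__main__":
    app.run(debug=True)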
I am working on a web application for downloading resources of an unimportant type. It's written in Python using the Flask web framework. I use SQLAlchemy as the database layer.
It has a user authentication system and you can download the resources only while logged in.
What I am trying to do is a download history chart for every resource and every user. To elaborate, each user could see two charts of their download activity on their profile page, for the last 7 days and the last year respectively. Each resource would also have a similar pair of charts, but they would instead visualize how many times the resource itself was downloaded in the time periods.
Here is an example screenshot of the charts
(Don't have enough reputation to embed images)
http://dl.dropbox.com/u/5011799/Selection_049.png
The problem is, I can't seem to figure out what the best way to store the downloads in a database would be. I found 2 ways that are relatively easy to implement and should work:
1) I could store the download count for each day in the last week in separate fields and every 24 hours just drop the first one and shift the rest to the left by one. This, however, seems like a hacky way to do it.
2) I could also create a separate table for the downloads, and every time a user downloads a resource I would insert a row into the table with the datetime, the user_id of the downloader and the resource_id of the downloaded resource. This would allow me to do some nice querying of time periods etc. The problem with that configuration could be the row count in the table. I have no idea how heavily the website is going to be used, but if I do the math with 1000 downloads/day, I am going to end up with over 360k rows in just the first year. I don't know how fast that would perform. I know I could just archive old entries if performance started being a huge problem.
I would like to know whether the 2nd option would be fast enough for a web app and what configuration you would use.
Thanks in advance.
I recommend the second approach, with periodic aggregation to improve performance.
Storing counts by day will force you to SELECT the existing count so that you can either add to it with an UPDATE statement or know that you need to INSERT a new record. That's two trips to the database on every download. And if things get out of whack, there's really no easy way to determine what happened or what the correct numbers ought to be. (You're not saving information about the individual events.) That's probably not a significant concern for a simple download count, but if this were sensitive information it might matter.
The second approach simply requires a single INSERT for each download, and because each event is stored separately, it's easy to troubleshoot. And, as you point out, you can slice this data any way you like.
As for performance, 360,000 rows is trivial for a modern RDBMS on contemporary hardware, but you do want to make sure you have an index on date, username/resource name or any other columns that will be used to select data.
Still, you might have more volume than you expect, or maybe your DB is iffy (I'm not familiar with SQLAlchemy). To reduce your row count you could create a weekly batch process (yeah, I know, batch ain't dead despite what some people say) during non-peak hours to create summary records by week.
It would probably be easiest to create your summary records in a different table that is simply keyed by week and year, or start/end dates, depending on how you want to use it. After you've generated the weekly summary for a period, you can archive or delete the daily detail records for that period.
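Here is a minimal sketch of the second approach with SQLAlchemy (declarative style). The table, column, and function names are illustrative - in a real app user_id and resource_id would be foreign keys to your users and resources tables - and the last-7-days query is just one way to slice the data:

from datetime import datetime, timedelta
from sqlalchemy import Column, Integer, DateTime, func
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Download(Base):
    __tablename__ = "downloads"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, index=True)       # FK to users in the real app
    resource_id = Column(Integer, index=True)   # FK to resources in the real app
    timestamp = Column(DateTime, default=datetime.utcnow, index=True)

def daily_counts(session, resource_id):
    # Downloads per day for one resource over the last 7 days
    since = datetime.utcnow() - timedelta(days=7)
    return (
        session.query(func.date(Download.timestamp), func.count(Download.id))
        .filter(Download.resource_id == resource_id, Download.timestamp >= since)
        .group_by(func.date(Download.timestamp))
        .all()
    )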
Note: Scroll down to the Background section for useful details. Assume the project uses Python-Django and South, in the following illustration.
What's the best way to import the following CSV
"john","doe","savings","personal"
"john","doe","savings","business"
"john","doe","checking","personal"
"john","doe","checking","business"
"jemma","donut","checking","personal"
Into a PostgreSQL database with the related tables Person, Account, and AccountType considering:
Admin users can change the database model and CSV import-representation in real-time via a custom UI
The saved CSV-to-Database table/field mappings are used when regular users import CSV files
So far, two approaches have been considered:
ETL-API Approach: Providing an ETL API a spreadsheet, my CSV-to-Database table/field mappings, and connection info to the target database. The API would then load the spreadsheet and populate the target database tables. Looking at pygrametl, I don't think what I'm aiming for is possible. In fact, I'm not sure any ETL APIs do this.
Row-level Insert Approach: Parsing the CSV-to-Database table/field mappings, parsing the spreadsheet, and generating SQL inserts in "join-order".
I implemented the second approach but am struggling with algorithm defects and code complexity. Is there a Python ETL API out there that does what I want? Or an approach that doesn't involve reinventing the wheel?
Background
The company I work at is looking to move hundreds of project-specific design spreadsheets hosted in SharePoint into databases. We're near completing a web application that meets the need by allowing an administrator to define/model a database for each project, store spreadsheets in it, and define the browse experience. At this stage of completion, transitioning to a commercial tool isn't an option. Think of the web application as a django-admin alternative, though it isn't, with a DB modeling UI, CSV import/export functionality, customizable browsing, and modularized code to address project-specific customizations.
The implemented CSV import interface is cumbersome and buggy, so I'm trying to get feedback and find alternate approaches.
How about separating the problem into two separate problems?
Create a Person class which represents a person in the database. This could use Django's ORM, or extend it, or you could do it yourself.
Now you have two issues:
Create a Person instance from a row in the CSV.
Save a Person instance to the database.
Now, instead of just CSV-to-Database, you have CSV-to-Person and Person-to-Database. I think this is conceptually cleaner. When the admins change the schema, that changes the Person-to-Database side. When the admins change the CSV format, they're changing the CSV-to-Person side. Now you can deal with each separately.
Does that help any?
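A rough sketch of that separation, using a plain Python class in front of whatever persistence layer you choose. The field names are taken from the sample CSV, and the save() body is a hypothetical stand-in for your Django ORM calls or SQL inserts:

import csv

class Person:
    def __init__(self, first_name, last_name, account, account_type):
        self.first_name = first_name
        self.last_name = last_name
        self.account = account
        self.account_type = account_type

    @classmethod
    def from_csv_row(cls, row):
        # CSV-to-Person: knows only about the spreadsheet layout
        return cls(*row)

    def save(self):
        # Person-to-Database: knows only about the schema. Replace this
        # with Django ORM calls (e.g. get_or_create on Person, AccountType,
        # Account) or SQL inserts as appropriate.
        print("would save", self.first_name, self.last_name,
              self.account, self.account_type)

with open("accounts.csv") as f:
    for row in csv.reader(f):
        Person.from_csv_row(row).save()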
I write import sub-systems almost every month at work, and because I do that kind of task so often I wrote django-data-importer some time ago. This importer works like a Django form and has readers for CSV, XLS and XLSX files that give you lists of dicts.
With the data_importer readers you can read a file into a list of dicts, iterate over it with a for loop, and save the lines to the DB.
With the importer you can do the same, but with the bonus of validating each field of a line, logging errors and actions, and saving everything at the end.
Please take a look at https://github.com/chronossc/django-data-importer. I'm pretty sure that it will solve your problem and will help you process any kind of CSV file from now on :)
To solve your problem I suggest using data-importer with Celery tasks. You upload the file and fire the import task via a simple interface. The Celery task sends the file to the importer, and you can validate the lines, save them, and log the errors. With some effort you can even present the progress of the task to the users who uploaded the sheet.
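The Celery side of that could look roughly like the sketch below. run_import is a hypothetical wrapper around whatever importer you use (django-data-importer or otherwise); only the Celery plumbing is shown here:

from celery import shared_task

@shared_task
def import_spreadsheet(path, user_id):
    # run_import is a hypothetical function wrapping your importer: it should
    # read the file, validate each line, save the valid rows, and return a
    # list of (line_number, error) pairs.
    from myapp.importing import run_import  # hypothetical module
    errors = run_import(path)
    # Persist or email the error log so the uploading user can review it.
    return {"user_id": user_id, "errors": errors}

# Fired from the upload view, e.g.:
# import_spreadsheet.delay("/tmp/upload.csv", request.user.id)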
I ended up taking a few steps back to address this problem per Occam's razor using updatable SQL views. It meant a few sacrifices:
Removing: South.DB-dependent real-time schema administration API, dynamic model loading, and dynamic ORM syncing
Defining models.py and an initial south migration by hand.
This allows for a simple approach to importing flat datasets (CSV/Excel) into a normalized database:
Define unmanaged models in models.py for each spreadsheet
Map those to updatable SQL Views (INSERT/UPDATE-INSTEAD SQL RULEs) in the initial south migration that adhere to the spreadsheet field layout
Iterate through the CSV/Excel spreadsheet rows and perform an INSERT INTO <VIEW> (<COLUMNS>) VALUES (<CSV-ROW-FIELDS>); (see the sketch below)
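A condensed sketch of that last step, using psycopg2 and the sample CSV from the question. The view name, column names, connection string and the underlying rule are assumptions meant to show the shape of the approach, not the actual schema:

import csv
import psycopg2

# The view and its INSERT-INSTEAD rule live in the initial South migration;
# shown here only for context (names are illustrative):
#
#   CREATE VIEW account_import (first_name, last_name, account, account_type) AS ...;
#   CREATE RULE account_import_insert AS ON INSERT TO account_import
#     DO INSTEAD ( ...inserts into the person/account/account_type tables... );

conn = psycopg2.connect("dbname=projectdb")  # connection string is illustrative
cur = conn.cursor()

with open("accounts.csv") as f:
    for row in csv.reader(f):
        cur.execute(
            "INSERT INTO account_import "
            "(first_name, last_name, account, account_type) "
            "VALUES (%s, %s, %s, %s)",
            row,
        )

conn.commit()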
Here is another approach that I found on GitHub. Basically it detects the schema and allows overrides. Its whole goal is just to generate raw SQL to be executed by psql or whatever driver.
https://github.com/nmccready/csv2psql
% python setup.py install
% csv2psql --schema=public --key=student_id,class_id example/enrolled.csv > enrolled.sql
% psql -f enrolled.sql
There are also a bunch of options for doing ALTERs (creating primary keys from several existing columns) and merges/dumps.
The question is conceptual rather than specific.
What's the best solution to keep two different calendars synchronised? I can run a cron job, for example every minute, and I can keep additional information in a database. How do I avoid event conflicts?
So far I have been thinking about these two solutions. The first one is keeping one database which gathers information from both calendars and each time checks whether something new has appeared in either of them. Inside this database we can decide which events should be added, edited or removed, and then send that information back to both calendars.
The second one is keeping two databases for the two calendars and collecting information separately. Then, after those databases are compared, we can tell where the changes occurred and send information from database A to calendar B or from database B to calendar A. I'm afraid this solution leads to more conflicts when changes were made to both calendars.
What do you think of these? To be more precise, I mean two Google calendars and a script written in Python using gdata. Any ideas for a simpler solution?
Most calendars, including the Google calendar, have ways to import and synchronize data. You can use those. Just import the gdata information (perhaps you need to convert it to ics first, I don't know) into the Google calendar.
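If you do end up writing the reconciliation yourself (the first approach in the question), here is a minimal sketch of the comparison step. It deliberately avoids the gdata API and assumes each calendar can be read into a dict keyed by event id with a "last updated" timestamp; resolving conflicts as "most recently updated wins" is an assumption, not a recommendation, and deletions need extra state that isn't handled here:

def reconcile(events_a, events_b):
    """events_a / events_b: dicts of {event_id: {"updated": datetime, ...}}.
    Returns two lists of (event_id, event) pairs to push to A and to B."""
    push_to_a, push_to_b = [], []
    for event_id in set(events_a) | set(events_b):
        in_a, in_b = events_a.get(event_id), events_b.get(event_id)
        if in_a and not in_b:
            push_to_b.append((event_id, in_a))
        elif in_b and not in_a:
            push_to_a.append((event_id, in_b))
        elif in_a["updated"] > in_b["updated"]:   # conflict: newest wins
            push_to_b.append((event_id, in_a))
        elif in_b["updated"] > in_a["updated"]:
            push_to_a.append((event_id, in_b))
    return push_to_a, push_to_b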
I'd like to start by asking for your opinion on how I should tackle this task, instead of simply how to structure my code.
Here is what I'm trying to do: I have a lot of data loaded into a MySQL table for a large number of unique names + dates (i.e., where the date is a separate field). My goal is to be able to select a particular name (using raw_input, and perhaps in the future add a drop-down menu) and see a monthly trend, with a moving average, and perhaps other stats, for one of the fields (revenue, revenue per month, clicks, etc.). What is your advice: move this data to an Excel workbook via Python, or is there a way to display this information in Python (with charts comparable to Excel's, of course)?
Thanks!
Analysis of such data (name, date) can be seen as issuing ad-hoc SQL queries to get time-series information.
You will 'sample' your information by a date/time frame (day/week/month/year, or more finely by hour/minute) depending on how large your dataset is.
I often use a query where the date field is truncated to the sample rate; in MySQL the DATE_FORMAT function is handy for that (Postgres and Oracle use date_trunc and trunc respectively).
What you want to see in your data goes in your WHERE conditions.
SELECT DATE_FORMAT(date_field, '%Y-%m-%d') AS day,
       COUNT(*) AS nb_event
FROM yourtable
WHERE name = 'specific_value_to_analyze'
GROUP BY DATE_FORMAT(date_field, '%Y-%m-%d');
Execute this query and output it to a CSV file. You could use direct mysql commands for that, but I recommend writing a Python script that executes such a query; you can use getopt options for output formatting (with or without column headers, a different separator than the default, etc.), and you can even build the query dynamically based on some options.
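A bare-bones version of that script might look like the sketch below (using pymysql; the connection parameters and table/column names are illustrative, and the getopt handling is omitted for brevity). Note the doubled %% in DATE_FORMAT, needed because the driver uses %s placeholders:

import csv
import sys
import pymysql  # or MySQLdb; pip install pymysql

QUERY = """
    SELECT DATE_FORMAT(date_field, '%%Y-%%m-%%d') AS day, COUNT(*) AS nb_event
    FROM yourtable
    WHERE name = %s
    GROUP BY day
    ORDER BY day
"""

conn = pymysql.connect(host="localhost", user="user", password="pass", database="db")
with conn.cursor() as cur:
    cur.execute(QUERY, ("specific_value_to_analyze",))
    writer = csv.writer(sys.stdout)
    writer.writerow(["day", "nb_event"])
    writer.writerows(cur.fetchall())
conn.close()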
To plot this information, look at time-series tools. If you have missing data (dates that won't appear in the result of such a SQL query), you should take that into account when choosing one. Excel is not the right tool for that, I think (or I don't master it well enough), but it could be a start.
Personally I find Dygraphs, a JavaScript library, really good for time-series plotting, and it can use a CSV file as its source. Be careful in such a configuration: due to cross-domain security constraints, the CSV file and the HTML page that displays the Dygraph object should be served from the same server (or whatever your browser's security constraints will accept).
I have built such webapps using Django, as it's my favourite web framework, wrapping the URL calls like this:
GET /timeserie/view/<category>/<value_to_plot>
GET /timeserie/csv/<category>/<value_to_plot>
The first URL calls a view that simply outputs a template file with a variable referencing the URL from which the Dygraph object fetches the CSV:
<script type="text/javascript">
  g3 = new Dygraph(
    document.getElementById("graphdiv3"),
    "{{ csv_url }}",
    {
      rollPeriod: 15,
      showRoller: true
    }
  );
</script>
The second URL calls a view that generates the SQL query and outputs the result as text/csv to be rendered by Dygraph.
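A minimal sketch of that second view in Django (the query, table name, and parameter use are illustrative; error handling and whitelisting of the requested column are omitted):

import csv
from django.db import connection
from django.http import HttpResponse

def timeserie_csv(request, category, value_to_plot):
    # Illustrative query; in practice build it from value_to_plot with
    # proper whitelisting of column names.
    sql = """
        SELECT DATE_FORMAT(date_field, '%%Y-%%m-%%d') AS day, COUNT(*) AS nb_event
        FROM yourtable
        WHERE name = %s
        GROUP BY day
        ORDER BY day
    """
    response = HttpResponse(content_type="text/csv")
    writer = csv.writer(response)
    writer.writerow(["Date", value_to_plot])
    with connection.cursor() as cur:
        cur.execute(sql, [category])
        writer.writerows(cur.fetchall())
    return response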
It's "home made" could stand simple or be extended, run easily on any desktop computer, could be extended to output json format for use by others javascript libraries/framework.
Else there is tool in opensource, related to such reporting (but timeseries capabilities are often not enough for my need) like Pentaho, JasperReport, SOFA. You make the query as datasource inside a report in such tool and build a graph that output timeserie.
I found that today web technique with correct javascript library/framework is really start to be correct to challenge that old fashion of reporting by such classical BI tools and it make things interactive :-)
Your problem can be broken down into two main pieces: analyzing the data, and presenting it. I assume that you already know how to do the data analysis part, and you're wondering how to present it.
This seems like a problem that's particularly well suited to a web app. Is there a reason why you would want to avoid that?
If you're very new to web programming and programming in general, then something like web2py could be an easy way to get started. There's a simple tutorial here.
For a desktop database-heavy app, have a look at Dabo. It makes things like creating views on database tables really simple. wxPython, on which it's built, also has lots of simple graphing features.