Thread: Implementing "thick"/"fat" databases
I've been following a few blogs (http://database-programmer.blogspot.com/, http://thehelsinkideclaration.blogspot.com/) that make a very compelling argument, in my opinion, to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures. Although there seems to be a lot of discussion out there of the reasons why one might want to do this, I'm really at a loss for finding good, concrete examples of how to do it. Consequently, I'm hoping that somebody can share their experience(s), or point me to some examples, of doing this with PostgreSQL. I'd consider myself fairly well-versed in using the various features of PostgreSQL to enforce data integrity, but much less so for implementing transactional logic.

To focus on a more concrete example, let's consider adding a financial transaction to the database. The "traditional" way to do this, with the business logic in the application layer, leaves us with two steps: insert the transaction "header", then insert the line items:

BEGIN;

INSERT INTO transaction (id, date, description)
VALUES (1, CURRENT_DATE, 'Transaction 1');

INSERT INTO line_item (transaction_id, account_id, amount)
VALUES (1, 1, 50), (1, 2, -50);

END;

Now if we start moving this logic to the database, we'd have something like:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_item(1, 1, 50);
SELECT create_line_item(1, 2, -50);
END;

But we've actually taken a step back, since we're making a round-trip to the database for each line item. That could be resolved by doing:

BEGIN;

SELECT create_transaction(1, current_date, 'Transaction 1');

SELECT create_line_item(transaction_id, account_id, amount)
FROM (VALUES (1, 1, 50), (1, 2, -50))
  AS line_item (transaction_id, account_id, amount);

END;

Better, but still not good, since we're invoking the function for each individual line item, which ultimately means separate INSERTs for each one. What we'd want is something like:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_items(((1, 1, 50), (1, 2, -50)));
END;

But this still falls short, since we're still basically managing the transaction in the application layer. The holy grail, so to speak, would be:

SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50), (2, -50)));

Perhaps I just need to spend more time digging through the documentation, but I really have no idea how to do something like this, or if it's even possible. I'm really hoping someone can provide an example, point me to some resources, or even just share their real-world experience of doing something like this. It would be very much appreciated.

Thanks.

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
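One way to get this single-call shape in PostgreSQL (8.4 or later) is to pass an array of a composite type and unnest it inside the function. A minimal sketch, assuming the transaction and line_item tables above; the line_item_input type name and balance check are invented for illustration:

CREATE TYPE line_item_input AS (account_id int, amount numeric);

CREATE OR REPLACE FUNCTION create_transaction(
    in_id int, in_date date, in_description text,
    in_line_items line_item_input[]
) RETURNS int AS $$
BEGIN
    -- enforce the double-entry rule before touching the tables
    IF (SELECT sum(amount) FROM unnest(in_line_items)) <> 0 THEN
        RAISE EXCEPTION 'unbalanced transaction';
    END IF;

    INSERT INTO transaction (id, date, description)
    VALUES (in_id, in_date, in_description);

    -- one set-based INSERT for all line items, no per-row round trips
    INSERT INTO line_item (transaction_id, account_id, amount)
    SELECT in_id, account_id, amount FROM unnest(in_line_items);

    RETURN in_id;
END;
$$ LANGUAGE plpgsql;

-- usage: a single statement, a single round trip
SELECT create_transaction(1, current_date, 'Transaction 1',
       ARRAY[(1, 50), (2, -50)]::line_item_input[]);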
Karl Nack wrote:
> I've been following a few blogs
> (http://database-programmer.blogspot.com/,
> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
> argument, in my opinion, to move as much business/transactional logic as
> possible into the database, so that client applications become little
> more than moving data into and out of the database using a well-defined
> API, most commonly (but not necessarily) through the use of stored
> procedures.

I strongly agree with that design philosophy. One principle is that the buck stops with the database: regardless of what the application does, any business logic should be enforced by the database itself. Another principle is to treat the database like a code library, where the tables are its internal variables and its public API is stored procedures. Using stored procedures means you can interact with the database from your application in the same way your application interacts with itself, meaning with parameterized routine calls.

<snip>

> To focus on a more concrete example, let's consider adding a financial
> transaction to the database. The "traditional" way to do this, with the
> business logic in the application layer, leaves us with two steps:
> insert the transaction "header", then insert the line items:
>
> BEGIN;
>
> INSERT INTO transaction (id, date, description)
> VALUES (1, CURRENT_DATE, 'Transaction 1');
>
> INSERT INTO line_item (transaction_id, account_id, amount)
> VALUES (1, 1, 50), (1, 2, -50);
>
> END;

<snip>

Anything intended to be a single transaction can be a single stored procedure. The code is something like this (out of my head, adjust to make it correct):

-- "when" and "desc" are reserved words, so those parameters are renamed
CREATE OR REPLACE FUNCTION financial_trans
(trans_id int, trans_date date, descrip text,
 dest_acct int, src_acct int, amt numeric)
RETURNS void AS $$
BEGIN
    INSERT INTO transaction (id, date, description)
    VALUES (trans_id, trans_date, descrip);

    INSERT INTO line_item (transaction_id, account_id, amount)
    VALUES (trans_id, dest_acct, amt),
           (trans_id, src_acct, -amt);
END;
$$ LANGUAGE plpgsql;

SELECT financial_trans(
    1, CURRENT_DATE, 'Transaction 1', 1, 2, 50
);

> But this still falls short, since we're still basically managing the
> transaction in the application layer. The holy grail, so to speak, would
> be:
>
> SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50),
> (2, -50)));

Well, not quite, because specifying the number "50" twice would be ridiculous for such a non-generic function; you can calculate the "-50" from it in the function.

> Perhaps I just need to spend more time digging through the
> documentation, but I really have no idea how to do something like this,
> or if it's even possible. I'm really hoping someone can provide an
> example, point me to some resources, or even just share their real-world
> experience of doing something like this. It would be very much
> appreciated.

A general rule of thumb: however you would design a routine in a normal programming language, try to do it that way in PL/PgSQL, assuming that PL/PgSQL is a competent language, and then tweak to match what you actually can do.

-- Darren Duncan
On 07/22/11 4:11 PM, Darren Duncan wrote:
> Karl Nack wrote:
>> I've been following a few blogs
>> (http://database-programmer.blogspot.com/,
>> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
>> argument, in my opinion, to move as much business/transactional logic as
>> possible into the database, so that client applications become little
>> more than moving data into and out of the database using a well-defined
>> API, most commonly (but not necessarily) through the use of stored
>> procedures.
>
> I strongly agree with that design philosophy. One principle is that
> the buck stops with the database and that regardless of what the
> application does, any business logic should be enforced by the
> database itself. Another principle is to treat the database like a
> code library, where the tables are its internal variables and its
> public API is stored procedures. Using stored procedures means you
> can interact with the database from your application in the same way
> your application interacts with itself, meaning with parameterized
> routine calls.

The alternative 'modern' architecture is to implement the business logic in a webservices engine that sits in front of the database, and only use stored procedures for things that get a significant performance boost where that is needed to meet your performance goals. Only this business logic is allowed to directly query the operational database. The business logic in this middle tier still relies on the database server for data integrity and such. The presentation layer is implemented either in a conventional client application or in a webserver (not to be confused with the webservices). So you have:

user -> browser -> webserver/presentation layer -> webservices/business logic -> database

The main rationale for this sort of design pattern is that large complex business logic implemented in SQL stored procedures can be rather difficult to develop and maintain.

-- 
john r pierce
N 37, W 122
santa cruz ca
mid-left coast
John R Pierce wrote:
> On 07/22/11 4:11 PM, Darren Duncan wrote:
>> Karl Nack wrote:
>>> I've been following a few blogs
>>> (http://database-programmer.blogspot.com/,
>>> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
>>> argument, in my opinion, to move as much business/transactional logic as
>>> possible into the database, so that client applications become little
>>> more than moving data into and out of the database using a well-defined
>>> API, most commonly (but not necessarily) through the use of stored
>>> procedures.
>>
>> I strongly agree with that design philosophy. One principle is that
>> the buck stops with the database and that regardless of what the
>> application does, any business logic should be enforced by the
>> database itself. Another principle is to treat the database like a
>> code library, where the tables are its internal variables and its
>> public API is stored procedures. Using stored procedures means you
>> can interact with the database from your application in the same way
>> your application interacts with itself, meaning with parameterized
>> routine calls.
>
> the alternative 'modern' architecture is to implement the business logic
> in a webservices engine that sits in front of the database, and only use
> stored procedures for things that get significant performance boost
> where that is needed to meet your performance goals. Only this
> business logic is allowed to directly query the operational database.
> The business logic in this middle tier still relies on the database
> server for data integrity and such. The presentation layer is
> implemented either in a conventional client application or in a
> webserver (not to be confused with the webservices). So you have
> user -> browser -> webserver/presentation layer -> webservices/business
> logic -> database
>
> The main rationale for this sort of design pattern is that large complex
> business logic implemented in SQL stored procedures can be rather
> difficult to develop and maintain

I should clarify that the primary thing I support, with respect to putting it in the database, is the business rules/constraints, because the buck stops there. It should not be possible for any database user lacking data-definition privileges to circumvent any of the business rules. So one cannot circumvent them by using a generic SQL shell, for example.

As for the rest, yes, I agree with you that this doesn't have to actually be in the database, though from a standpoint of good design principles, all of the business logic should still be in one place, next to if not in the database, and all database access should go through the business logic layer. All logic that is not specific to an application should go in a logic layer, so it is shared by multiple applications, whether web or command-line or whatever; the application is then largely just a user interface. In other words, thinking in the Model-View-Controller paradigm, the Model should be fat and the Controller should be thin.

-- Darren Duncan
On 22 Jul 2011, at 21:15, Karl Nack wrote:
> But this still falls short, since we're still basically managing the
> transaction in the application layer.

The problem you're facing here is that database statements work with records, while your example needs to handle a set of (different types of) records in one go.

> The holy grail, so to speak, would be:
>
> SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50),
> (2, -50)));

Now imagine someone not familiar with your design reading this query... To start with, they're going to assume this query SELECTs data, while it actually inserts it. Secondly, it's impossible to see what the different data values are supposed to represent without looking up the function - and quite possibly, its implementation. They're going to wonder what (1, 50) and (2, -50) mean, what kind of date current_date gets assigned to, etc.

Having to write queries like these is even worse, even if you designed the function. You'll be looking at your own documentation a lot while writing these.

It would seem to me that the API you would provide for business logic like this should provide the users of said API with enough context to create valid statements. For example, you could use XML to describe the data (I'm no fan of XML, but it does suit a need here and allows validation of the provided data), especially as Postgres has XML parsing functionality.

Or you could use a more sophisticated procedural language (plpython or plphp, for example) that's capable of marshalling and unmarshalling data structures to strings and vice versa (eg. '{foo:1,bar:2}').

You would still have a SELECT statement that INSERTs data, which is semantically a bad thing to do IMHO. Perhaps the better solution is (as others mentioned already) to move the data interpretation to a (web)service/application server and have that perform the actual database operations.

With that in mind, you would put business logic ("process an invoice") into an "application server", while you put data integrity logic ("don't allow transactions with no line_items") into the database.

Alban Hertroys

--
The scale of a problem often equals the size of an ego.
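To sketch the XML route Alban mentions, using PostgreSQL's built-in xpath() function (the document layout here is invented for illustration; a create_transaction(txn xml) function could run such a query over its argument and insert the results):

SELECT (xpath('/line_item/@account_id', li))[1]::text::int     AS account_id,
       (xpath('/line_item/@amount',     li))[1]::text::numeric AS amount
  FROM unnest(xpath('/transaction/line_item',
       '<transaction id="1" description="Transaction 1">
          <line_item account_id="1" amount="50"/>
          <line_item account_id="2" amount="-50"/>
        </transaction>'::xml)) AS li;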
On Fri, Jul 22, 2011 at 12:15 PM, Karl Nack <karlnack@futurityinc.com> wrote:
> I've been following a few blogs
> (http://database-programmer.blogspot.com/,
> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
> argument, in my opinion, to move as much business/transactional logic as
> possible into the database, so that client applications become little
> more than moving data into and out of the database using a well-defined
> API, most commonly (but not necessarily) through the use of stored
> procedures.

There are costs and benefits associated with this. The major argument against is that if you have business processes that may frequently or quickly change at the requirement level, a heavier-weight process might not work so well. On the other hand, if your database is being used by more than one application, or if you want it to be usable by more than one, then it can ideally provide a way to consistently enforce business logic and security across multiple applications.

Another benefit is that, depending on your architecture, you might be able to place a single API to call such procedures, generate parameterized query strings, and then pass those through, reducing the possibility of one type of SQL injection. Note, however, that you have to worry about other forms of SQL injection inside your stored procs, so this isn't a magic bullet even if it helps.

So I think it boils down to how much consistency and stability you want and where you want it.

> Although there seems to be a lot of discussion out there of
> the reasons why one might want to do this, I'm really at a loss for
> finding good, concrete examples of how to do it. Consequently, I'm
> hoping that somebody can share their experience(s), or point me to some
> examples, of doing this with PostgreSQL. I'd consider myself fairly
> well-versed in using the various features of PostgreSQL to enforce data
> integrity, but much less so for implementing transactional logic.

In LedgerSMB, we take this a step further by making the procedures into discoverable interfaces, so the application logic itself is a sort of thin glue between a UI layer and the database procedure layer. One thing I would suggest is to try to keep API calls as atomic as possible. You want to enforce consistency, so you need to have all relevant inputs passed to the function. See below for a suggested change to your API.

> To focus on a more concrete example, let's consider adding a financial
> transaction to the database. The "traditional" way to do this, with the
> business logic in the application layer, leaves us with two steps:
> insert the transaction "header", then insert the line items:
>
> BEGIN;
>
> INSERT INTO transaction (id, date, description)
> VALUES (1, CURRENT_DATE, 'Transaction 1');
>
> INSERT INTO line_item (transaction_id, account_id, amount)
> VALUES (1, 1, 50), (1, 2, -50);
>
> END;
>
> Now if we start moving this logic to the database, we'd have something
> like:
>
> BEGIN;
> SELECT create_transaction(1, current_date, 'Transaction 1');
> SELECT create_line_item(1, 1, 50);
> SELECT create_line_item(1, 2, -50);
> END;

Now, if you are doing double-entry bookkeeping this doesn't provide enough consistency, IMO. You can't check inside the function to ensure that the transaction is balanced. It would be better to:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1',
       '{{1, 1, 50}, {1, 2, -50}}');
COMMIT;

Now for the application, you can create an API that is semantically clearer. But PostgreSQL doesn't provide an easy way of calling procedures of this sort out of SELECT/UPDATE/INSERT statements, and SELECT is the only way to do this.

Best Wishes,
Chris Travers
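For what it's worth, inside such a function the '{{...}}' argument can be unpacked set-wise rather than row by row. A sketch of plpgsql fragments, assuming each inner array is {transaction_id, account_id, amount} as in Chris's example:

-- fragments from inside a hypothetical create_transaction(..., in_items numeric[]):

-- the balance check Chris mentions becomes possible
IF (SELECT sum(in_items[i][3])
      FROM generate_subscripts(in_items, 1) AS i) <> 0 THEN
    RAISE EXCEPTION 'unbalanced transaction';
END IF;

-- one set-based INSERT over the 2-D array
INSERT INTO line_item (transaction_id, account_id, amount)
SELECT in_items[i][1]::int, in_items[i][2]::int, in_items[i][3]
  FROM generate_subscripts(in_items, 1) AS i;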
> Now, if you are doing double-entry bookkeeping this doesn't provide
> enough consistency, IMO. You can't check inside the function to
> ensure that the transaction is balanced. it would be better to:
>
> BEGIN;
> SELECT create_transaction(1, current_date, 'Transaction 1',
>        '{{1, 1, 50}, {1, 2, -50}}');
> COMMIT;
>
> Now for the application, you can create an API that is semantically
> clearer. But PostgreSQL doesn't provide an easy way of calling
> procedures of this sort out of select/update/insert statements and
> select is the only way to do this.

One option to consider is restricting the final tables but making staging tables available. You use normal inserts to build up the staging table and then validate and transfer the data to the final table using a function. In the example, you can build a work-in-process transaction however you see fit, but the final real transaction creation process would first confirm that the entry balances before copying the records to the transaction table and cleaning up the work-in-process table.

David J.
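A minimal sketch of that staging pattern (the staging table and function names are invented; the final tables are those from Karl's example):

CREATE TABLE line_item_staging (
    txn_id     int,
    account_id int,
    amount     numeric
);

CREATE OR REPLACE FUNCTION post_staged_transaction(in_txn_id int)
RETURNS void AS $$
BEGIN
    -- refuse to post an entry that does not balance
    IF (SELECT sum(amount) FROM line_item_staging
         WHERE txn_id = in_txn_id) <> 0 THEN
        RAISE EXCEPTION 'transaction % does not balance', in_txn_id;
    END IF;

    INSERT INTO line_item (transaction_id, account_id, amount)
    SELECT txn_id, account_id, amount
      FROM line_item_staging WHERE txn_id = in_txn_id;

    -- clean up the work-in-process rows
    DELETE FROM line_item_staging WHERE txn_id = in_txn_id;
END;
$$ LANGUAGE plpgsql;

The application would then get INSERT rights on the staging table but not on line_item itself.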
> Now for the application, you can create an API that is semantically
> clearer. But PostgreSQL doesn't provide an easy way of calling
> procedures of this sort out of select/update/insert statements and
> select is the only way to do this.

A semantically accurate way to run "procedures" is:

DO $$ BEGIN PERFORM function(); END $$;

It would be nice if you could use PERFORM outside of plpgsql...

Now, this is not standard SQL, but it does convey the desired semantics. That said, proper function naming can convey similar semantic information as well.

David J.
On Sat, Jul 23, 2011 at 1:32 PM, David Johnston <polobo@yahoo.com> wrote:
>> Now for the application, you can create an API that is semantically
>> clearer. But PostgreSQL doesn't provide an easy way of calling
>> procedures of this sort out of select/update/insert statements and
>> select is the only way to do this.
>
> A semantically accurate way to run "procedures" is:
>
> DO $$ BEGIN PERFORM function(); END $$;
>
> It would be nice if you could use PERFORM outside of plpgsql...
>
> Now, this is not standard SQL, but it does convey the desired semantics.
> That said, proper function naming can convey similar semantic
> information as well.

The other option (one we try to follow in LedgerSMB) is to always make the procedure return useful information. So you are essentially calling a procedure and selecting the result for further use by your application.

Best Wishes,
Chris Travers
Karl Nack wrote:
>>> SELECT create_transaction(1, current_date, 'Transaction 1', ((1,
>>> 50), (2, -50)));
>>
>> Well, not quite, because specifying the number "50" twice would be
>> ridiculous for such a non-generic function; you can calculate the
>> "-50" from it in the function.
>
> Not if there were more than two line-items per transaction. A paycheck
> is a good example, where generally one or more income accounts are
> credited and multiple tax accounts as well as one or more asset accounts
> are debited. Ensuring that all the line-items add up to 0 would be one
> of the data integrity rules implemented in the database (though most
> likely checked in the application layer as well).

It would help if you clarified your intent with another example. I was assuming here that we were dealing with a double-entry accounting system where every line item in one account had a corresponding line item in another account of equal magnitude, and so all line items were in pairs, because you showed what looked like 1 pair, hence specifying the "50" once makes sense.

So are you wanting the 1 function to take a set of line-item pairs, or are you wanting to say do +50 in one account and -25 in each of 2 different accounts, as a total of 3 line items? If the former, I would still just specify the "50" once and have a set of {from, to, amount} triples as an argument, which would be relation/rowset-typed. If the latter, then you would specify the "50" twice.

>> A general rule of thumb, however you would design a routine in a
>> normal programming language, try to do it that way in PL/PgSQL,
>> assuming that PL/PgSQL is a competent language, and then tweak to
>> match what you actually can do.
>
> In the language I'm most familiar with, PHP, I could do this with an
> associative array:
>
> $transaction = array(
>     'id' => 1,
>     'date' => date('Y-m-d'),
>     'description' => 'Transaction 1',
>     'line_items' => array(
>         array('account_id' => 1, 'amount' => 50),
>         array('account_id' => 2, 'amount' => -50),
>     ),
> );
>
> From which I can easily build the appropriate SQL statements. This would
> be very similar in Python. I wonder if this would be achievable in
> PL/PGSQL, maybe through the use of composite types and/or domains?

Yes, you could. PL/PgSQL supports passing a set of rows as an argument, declared as an array of a composite/row type. Or alternately you could use temporary staging tables as quasi-arguments rather than using an actual argument.

-- Darren Duncan
> > Now, if you are doing double-entry bookkeeping this doesn't provide
> > enough consistency, IMO. You can't check inside the function to
> > ensure that the transaction is balanced.
>
> One option to consider is restricting final tables but making staging
> tables available.

I would implement this using triggers. First, add a boolean "is_balanced" column to the transaction table, along with the following trigger:

create or replace function check_txn_balance()
returns trigger language plpgsql as $$
declare
    _amt numeric;
begin
    if 'UPDATE' = TG_OP and new.is_balanced then
        return null;
    end if;

    select sum(amt) into _amt
      from line_item
     where txn_id = new.id;

    if _amt <> 0 then
        raise exception 'unbalanced transaction';
    end if;

    update txn set is_balanced = true where id = new.id;

    return null;
end;
$$;

create constraint trigger check_txn_balance
after insert or update on txn
deferrable initially deferred
for each row execute procedure check_txn_balance();

Then, whenever we add, remove, or update a line item, unbalance the parent transaction, which triggers the balance check:

create or replace function unbalance_txn()
returns trigger language plpgsql as $$
begin
    if 'UPDATE' = TG_OP then
        if (new.txn_id, new.amt) = (old.txn_id, old.amt) then
            return null;
        end if;
    end if;

    if TG_OP in ('INSERT', 'UPDATE') then
        update txn set is_balanced = false
         where (id, is_balanced) = (new.txn_id, true);
    end if;

    if TG_OP in ('DELETE', 'UPDATE') then
        update txn set is_balanced = false
         where (id, is_balanced) = (old.txn_id, true);
    end if;

    return null;
end;
$$;

create trigger unbalance_txn
after insert or delete or update on line_item
for each row execute procedure unbalance_txn();

At least, this seems to be a fairly efficient and foolproof way to do it to me.

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
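To see the deferred constraint trigger in action, a quick demonstration (assuming txn and line_item tables matching the column names in the trigger code; other columns omitted, and the account_id column is assumed):

BEGIN;
INSERT INTO txn (id) VALUES (1);
INSERT INTO line_item (txn_id, account_id, amt)
VALUES (1, 1, 50), (1, 2, -50);
COMMIT;  -- succeeds: at commit time the deferred trigger sees sum(amt) = 0

BEGIN;
INSERT INTO txn (id) VALUES (2);
INSERT INTO line_item (txn_id, account_id, amt) VALUES (2, 1, 50);
COMMIT;  -- fails: the deferred check raises 'unbalanced transaction'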
> I was assuming here that we were dealing with a double-entry accounting
> system where every line item in one account had a corresponding line
> item in another account of equal magnitude, and so all line items were
> in pairs, because you showed what looked like 1 pair, hence specifying
> the "50" once makes sense.
>
> So are you wanting the 1 function to take a set of line-item pairs, or
> are you wanting to say do +50 in one account and -25 in each of 2
> different accounts, as a total of 3 line items? If the former, I would
> still just specify the "50" once and have a set of {from, to, amount}
> triples as an argument, which would be relation/rowset-typed. If the
> latter, then you would specify the "50" twice.
A little OT, but all double-entry accounting requires is that the debits and credits of an entry - when totaled - equal zero (i.e., debits and credits are equal); there is no requirement that each item have an equal but opposite counterpart. Nor are multiple accounts required; some ad-hoc entries simply "re-class" money within a single account so that different "controls" and/or amounts remain.
Ideally what you would want is a type called "accounting detail entry" which represents a credit/debit. For the function you can either make any "negative" amounts credits OR pass in two arrays - one for debits and one for credits. Either way you'd pass in an array of this "accounting detail entry" type along with information such as accounting date and journal number. To avoid using a type I would make a "create_entry" function that you'd use to create the detail items on a staging table, attached to a specific ID, and then create the final entry by calling the original function with user/date/journal/etc information and reference the detail records via the ID.
David J.
On Sat, Jul 23, 2011 at 3:51 PM, Karl Nack <karlnack@futurityinc.com> wrote:
>> In LedgerSMB, we take this a step further by making the procedures
>> into discoverable interfaces, so the application logic itself is a
>> sort of thin glue between a UI layer and the database procedure layer.
>> One thing I would suggest is to try to keep API calls as atomic as
>> possible. You want to enforce consistency and so you need to have all
>> relevant inputs passed to the function. See below for a suggested
>> change to your API.
>
> Would you say LedgerSMB follows the thick/fat database principle? If so,
> I shall have to spend some time with your source code. I'd be very
> curious to see how you handle this.

The current svn trunk (to be 1.3) does. Older code in trunk or 1.2 follows the "survival" principle (we inherited one heck of a codebase when we forked) :-).

>> > BEGIN;
>> > SELECT create_transaction(1, current_date, 'Transaction 1');
>> > SELECT create_line_item(1, 1, 50);
>> > SELECT create_line_item(1, 2, -50);
>> > END;
>>
>> Now, if you are doing double-entry bookkeeping this doesn't provide
>> enough consistency, IMO. You can't check inside the function to
>> ensure that the transaction is balanced.
>
> Yes, but I'd implement the constraint "all transactions must balance" as
> a trigger that fires when the transaction is complete. This would
> enforce data integrity regardless of whether or not the database API is
> used, which I think is also important.

That's problematic to do in PostgreSQL because statement-level triggers don't have access to statement args, and I don't believe they can be deferred.

Note the GL stuff is not on the thick-db system yet, but we are working on it (for 1.4). Here's an example:

CREATE OR REPLACE FUNCTION payment_bulk_post
(in_transactions numeric[], in_batch_id int, in_source text, in_total numeric,
 in_ar_ap_accno text, in_cash_accno text,
 in_payment_date date, in_account_class int, in_payment_type int,
 in_exchangerate numeric, in_curr text)
RETURNS int AS $$
DECLARE
    out_count int;
    t_voucher_id int;
    t_trans_id int;
    t_amount numeric;
    t_ar_ap_id int;
    t_cash_id int;
    t_currs text[];
    t_exchangerate numeric;
BEGIN
    IF in_batch_id IS NULL THEN
        -- t_voucher_id := NULL;
        RAISE EXCEPTION 'Bulk Post Must be from Batch!';
    ELSE
        INSERT INTO voucher (batch_id, batch_class, trans_id)
        VALUES (in_batch_id,
                (SELECT batch_class_id FROM batch WHERE id = in_batch_id),
                in_transactions[1][1]);

        t_voucher_id := currval('voucher_id_seq');
    END IF;

    SELECT string_to_array(value, ':') INTO t_currs
      FROM defaults
     WHERE setting_key = 'curr';

    IF (in_curr IS NULL OR in_curr = t_currs[0]) THEN
        t_exchangerate := 1;
    ELSE
        t_exchangerate := in_exchangerate;
    END IF;

    CREATE TEMPORARY TABLE bulk_payments_in (id int, amount numeric);

    SELECT id INTO t_ar_ap_id FROM chart WHERE accno = in_ar_ap_accno;
    SELECT id INTO t_cash_id  FROM chart WHERE accno = in_cash_accno;

    FOR out_count IN
        array_lower(in_transactions, 1) ..
        array_upper(in_transactions, 1)
    LOOP
        EXECUTE $E$
            INSERT INTO bulk_payments_in(id, amount)
            VALUES ($E$ || quote_literal(in_transactions[out_count][1]) ||
                    $E$, $E$ || quote_literal(in_transactions[out_count][2]) ||
                    $E$)$E$;
    END LOOP;

    EXECUTE $E$
        INSERT INTO acc_trans
               (trans_id, chart_id, amount, approved, voucher_id,
                transdate, source, payment_type)
        SELECT id,
               CASE WHEN $E$ || quote_literal(in_account_class) || $E$ = 1
                    THEN $E$ || t_cash_id || $E$
                    WHEN $E$ || quote_literal(in_account_class) || $E$ = 2
                    THEN $E$ || t_ar_ap_id || $E$
                    ELSE -1 END,
               amount * $E$ || quote_literal(t_exchangerate) || $E$,
               CASE WHEN $E$ || t_voucher_id || $E$ IS NULL
                    THEN true ELSE false END,
               $E$ || t_voucher_id || $E$,
               $E$ || quote_literal(in_payment_date) || $E$,
               $E$ || COALESCE(quote_literal(in_source), 'NULL') || $E$,
               $E$ || COALESCE(quote_literal(in_payment_type), 'NULL') || $E$
          FROM bulk_payments_in WHERE amount <> 0 $E$;

    EXECUTE $E$
        INSERT INTO acc_trans
               (trans_id, chart_id, amount, approved, voucher_id,
                transdate, source, payment_type)
        SELECT id,
               CASE WHEN $E$ || quote_literal(in_account_class) || $E$ = 1
                    THEN $E$ || t_ar_ap_id || $E$
                    WHEN $E$ || quote_literal(in_account_class) || $E$ = 2
                    THEN $E$ || t_cash_id || $E$
                    ELSE -1 END,
               amount * -1 * $E$ || quote_literal(t_exchangerate) || $E$,
               CASE WHEN $E$ || t_voucher_id || $E$ IS NULL
                    THEN true ELSE false END,
               $E$ || t_voucher_id || $E$,
               $E$ || quote_literal(in_payment_date) || $E$,
               $E$ || COALESCE(quote_literal(in_source), 'null') || $E$,
               $E$ || COALESCE(quote_literal(in_payment_type), 'NULL') || $E$
          FROM bulk_payments_in WHERE amount <> 0 $E$;

    IF in_account_class = 1 THEN
        EXECUTE $E$
            UPDATE ap
               SET paid = paid + (SELECT amount
                                    FROM bulk_payments_in b
                                   WHERE b.id = ap.id)
             WHERE id IN (SELECT id FROM bulk_payments_in) $E$;
    ELSE
        EXECUTE $E$
            UPDATE ar
               SET paid = paid + (SELECT amount
                                    FROM bulk_payments_in b
                                   WHERE b.id = ar.id)
             WHERE id IN (SELECT id FROM bulk_payments_in) $E$;
    END IF;

    EXECUTE $E$ DROP TABLE bulk_payments_in $E$;

    PERFORM unlock_all();

    RETURN out_count;
END;
$$ LANGUAGE plpgsql;

in_transactions is an n-by-2 array of numeric values. The first is an integer representation of the invoice id to be paid; the second is the numeric amount to be paid on that invoice.

The temporary table proved necessary because of cache misses when trying to loop through the array when hundreds of invoices were paid to one vendor. The code here is still far from ideal, as in the future we will probably just query against the array using generate_series.

Hope this helps.

Chris Travers
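To make the argument layout concrete, a hypothetical call (all values invented; the account numbers and batch id would come from the application's own data):

SELECT payment_bulk_post(
    '{{101, 250.00}, {102, 375.50}}', -- (invoice id, amount) pairs
    5,            -- in_batch_id
    'CHK 1001',   -- in_source
    625.50,       -- in_total
    '2100',       -- in_ar_ap_accno
    '1060',       -- in_cash_accno
    '2011-07-25', -- in_payment_date
    1,            -- in_account_class (selects the AP or AR branch above)
    NULL,         -- in_payment_type
    1,            -- in_exchangerate
    'USD'         -- in_curr
);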
> CREATE OR REPLACE FUNCTION payment_bulk_post
> (in_transactions numeric[], in_batch_id int, in_source text, in_total numeric,
>  in_ar_ap_accno text, in_cash_accno text,
>  in_payment_date date, in_account_class int, in_payment_type int,
>  in_exchangerate numeric, in_curr text)

Oh, and as a disclaimer, this was tacked onto a database schema which a) I did not design and b) is patently insane. That part of the database schema is being redesigned for 1.4. Those wonderful disclaimers when having to post code against databases designed by.... well, better to say nothing at all >:-D

Best Wishes,
Chris Travers
> In LedgerSMB, we take this a step further by making the procedures
> into discoverable interfaces, so the application logic itself is a
> sort of thin glue between a UI layer and the database procedure layer.
> One thing I would suggest is to try to keep API calls as atomic as
> possible. You want to enforce consistency and so you need to have all
> relevant inputs passed to the function. See below for a suggested
> change to your API.

Would you say LedgerSMB follows the thick/fat database principle? If so, I shall have to spend some time with your source code. I'd be very curious to see how you handle this.

> > BEGIN;
> > SELECT create_transaction(1, current_date, 'Transaction 1');
> > SELECT create_line_item(1, 1, 50);
> > SELECT create_line_item(1, 2, -50);
> > END;
>
> Now, if you are doing double-entry bookkeeping this doesn't provide
> enough consistency, IMO. You can't check inside the function to
> ensure that the transaction is balanced.

Yes, but I'd implement the constraint "all transactions must balance" as a trigger that fires when the transaction is complete. This would enforce data integrity regardless of whether or not the database API is used, which I think is also important.

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
> For example, you could use XML to describe the
> data (I'm no fan of XML, but it does suit a need here and allows
> validation of the provided data), especially as Postgres has XML parsing
> functionality.

Would you go so far as to suggest making the entire parameter an XML statement, something like:

CREATE FUNCTION create_transaction (txn xml) ...

I don't know; it seems like with converting everything to and from XML, we really haven't made things any better, just added the complexity of an intermediate technology. Also, I'm not a fan of XML either, and for the most part have managed to steer relatively clear of it so far. Getting pulled into that black hole is something I'd rather avoid, if possible....

> Or you could use a more sophisticated procedural language (plpython or
> plphp, for example) that's capable of marshalling and unmarshalling data
> structures to strings and vice versa (eg. '{foo:1,bar:2}').

I haven't looked into the other procedural languages besides PL/pgSQL, although I am familiar with Python in general. Perhaps that may be the way to go ...

> You would still have a SELECT statement that INSERTs data, which is
> semantically a bad thing to do IMHO.

True, although others have suggested always returning something useful from the function. For example, it's more likely the transaction id would be assigned using a sequence inside the database. The function could then return that id after successfully creating the transaction.

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
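A sketch of that sequence-assigned, id-returning shape (the sequence name is invented for illustration):

CREATE OR REPLACE FUNCTION create_transaction(
    in_date date, in_description text
) RETURNS int AS $$
DECLARE
    new_id int;
BEGIN
    INSERT INTO transaction (id, date, description)
    VALUES (nextval('transaction_id_seq'), in_date, in_description)
    RETURNING id INTO new_id;

    -- the SELECT that calls us now yields something the application uses
    RETURN new_id;
END;
$$ LANGUAGE plpgsql;

SELECT create_transaction(current_date, 'Transaction 1') AS txn_id;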
> > SELECT create_transaction(1, current_date, 'Transaction 1', ((1,
> > 50), (2, -50)));
>
> Well, not quite, because specifying the number "50" twice would be
> ridiculous for such a non-generic function; you can calculate the "-
> 50" from it in the function.

Not if there were more than two line-items per transaction. A paycheck is a good example, where generally one or more income accounts are credited and multiple tax accounts as well as one or more asset accounts are debited. Ensuring that all the line-items add up to 0 would be one of the data integrity rules implemented in the database (though most likely checked in the application layer as well).

> A general rule of thumb, however you would design a routine in a
> normal programming language, try to do it that way in PL/PgSQL,
> assuming that PL/PgSQL is a competent language, and then tweak to
> match what you actually can do.

In the language I'm most familiar with, PHP, I could do this with an associative array:

$transaction = array(
    'id' => 1,
    'date' => date('Y-m-d'),
    'description' => 'Transaction 1',
    'line_items' => array(
        array('account_id' => 1, 'amount' => 50),
        array('account_id' => 2, 'amount' => -50),
    ),
);

From which I can easily build the appropriate SQL statements. This would be very similar in Python. I wonder if this would be achievable in PL/PGSQL, maybe through the use of composite types and/or domains?

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
On 23/07/11 12:05, John R Pierce wrote:
> On 07/22/11 4:11 PM, Darren Duncan wrote:
>> Karl Nack wrote:
>>> I've been following a few blogs
>>> (http://database-programmer.blogspot.com/,
>>> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
>>> argument, in my opinion, to move as much business/transactional logic as
>>> possible into the database, so that client applications become little
>>> more than moving data into and out of the database using a well-defined
>>> API, most commonly (but not necessarily) through the use of stored
>>> procedures.
>>
>> I strongly agree with that design philosophy. <snip>
>
> the alternative 'modern' architecture is to implement the business logic
> in a webservices engine that sits in front of the database, and only use
> stored procedures for things that get significant performance boost
> where that is needed to meet your performance goals. <snip>
>
> The main rationale for this sort of design pattern is that large complex
> business logic implemented in SQL stored procedures can be rather
> difficult to develop and maintain

I was thinking similar thoughts, but you not only beat me to it, you made some good points I had not thought of!

The only thing I can think of adding is that it would be good to lock down the database so that only the middleware can access it; everything else accesses the database via the middleware.

Cheers,
Gavin
> I was thinking similar thoughts, but you not only beat me to it, you made
> some good points I had not thought of!
>
> The only thing I can think of adding is that it would be good to lock down
> the database so that only the middleware can access it; everything else
> accesses the database via the middleware.

In general, I am not convinced that middleware is inherently more maintainable than in-db procedures.

But the fundamental question is: is this a one-application database? If it is, you can use the middleware to be that application, lock the db down so only the middleware can use it, etc.

But what if it isn't? What if we want to support a variety of applications against the same relational database? This has to be fairly commonplace.....

In this way my experience is that it is often helpful to maintain several levels of stable, public APIs: on a table level if possible (as attachment points for triggers), stored proc APIs for actually inserting data into relevant areas while enforcing appropriate business logic, and so forth.

One of the things we are doing in LedgerSMB is to make the stored procedures discoverable, so the argument names (and eventually the return types) will have meaning the application can use in building calls for the procedure. This eases one important maintenance point because arguments are automatically picked up by the application and, as long as best practices in coding are followed, will be handled sanely. (The interface will be extended in the future so that return types determine the class, and the argument names determine whether we are talking about a presumed object property or a presumed application-specified argument.) Theoretically, we should be able to build objects in languages picking up methods and properties from the Pg system catalogs, but we haven't gotten that far yet with code generation.

Best Wishes,
Chris Travers
I gave a talk on using PostgreSQL as an application server at PG East in March.

Basically, we try to implement all business logic using functions, using plpythonu when necessary. For example, we have functions that send email, ftp files, sync remote databases, etc.

It is important to keep your MVC intact and not "php" your function code by mixing business logic with SQL statements.

I am currently playing with interactive queries, where the function stops in the middle and sends a message to the client asking for input. This isn't necessarily a good idea in all cases, but there are some functions where you don't have the ability to ask a question until it is mostly finished processing (i.e. you don't have the information needed to ask the question until it finishes munging the data, which might be a long process).

Let me know if you would like a copy of my presentation.

Sim

On 07/22/2011 10:15 PM, Karl Nack wrote:
> I've been following a few blogs
> (http://database-programmer.blogspot.com/,
> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
> argument, in my opinion, to move as much business/transactional logic as
> possible into the database, so that client applications become little
> more than moving data into and out of the database using a well-defined
> API, most commonly (but not necessarily) through the use of stored
> procedures.
<snip>
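Sim's mail-sending function itself isn't shown; a minimal plpythonu sketch of what one might look like (the host and addresses are placeholders, not the actual implementation):

CREATE OR REPLACE FUNCTION send_email(v_to text, v_subject text, v_body text)
RETURNS void AS $BODY$
import smtplib
from email.mime.text import MIMEText

# build a plain-text message
msg = MIMEText(v_body)
msg['Subject'] = v_subject
msg['From'] = 'db@example.com'   # placeholder sender address
msg['To'] = v_to

# assumes an MTA listening on localhost
s = smtplib.SMTP('localhost')
s.sendmail('db@example.com', [v_to], msg.as_string())
s.quit()
$BODY$ LANGUAGE plpythonu VOLATILE;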
Hello Chris,

>> In LedgerSMB, we take this a step further by making the procedures
>> into discoverable interfaces,

how do you do that?

Karsten
On 24/07/11 17:51, Chris Travers wrote:
>> I was thinking similar thoughts, but you not only beat me to it, you made
>> some good points I had not thought of!
>>
>> The only thing I can think of adding is that it would be good to lock down
>> the database so that only the middleware can access it; everything else
>> accesses the database via the middleware.
>
> In general, I am not convinced that middleware is inherently more
> maintainable than in-db procedures.
>
> But the fundamental question is: is this a one-application database?
> If it is, you can use the middleware to be that application, lock the
> db down so only the middleware can use it, etc.
>
> But what if it isn't? What if we want to support a variety of
> applications against the same relational database? This has to be
> fairly commonplace.....
>
> In this way my experience is that it is often helpful to maintain
> several levels of stable, public APIs: on a table level if possible
> (as attachment points for triggers), stored proc APIs for actually
> inserting data into relevant areas while enforcing appropriate
> business logic, and so forth.
>
> One of the things we are doing in LedgerSMB is to make the stored
> procedures discoverable, so the argument names (and eventually the
> return types) will have meaning the application can use in building
> calls for the procedure. <snip>

So it really boils down to 'It depends...' :-)

I first started designing systems over 30 years ago. I remember the first design principle I came up with, but more importantly that my next project ignored it for good reasons (same mainframe COBOL environment in both cases)!

I feel that for a large company the middleware approach is probably better when you have many diverse applications that share a lot in common, but it depends on many different factors.
On Sun, Jul 24, 2011 at 12:34 AM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
> Hello Chris,
>
>>> In LedgerSMB, we take this a step further by making the procedures
>>> into discoverable interfaces,
>
> how do you do that?

The way we do things in 1.3 beta is relatively free-form and not ideal. In future versions I think we intend to tighten this up. I will give you examples of where we are for now as well as where we are headed.

In 1.3, the idea is that as much of the function declaration should be semantically meaningful as possible. However, we didn't spec out everything (problems and solutions below). So you might have a function like:

CREATE OR REPLACE FUNCTION batch_search
(in_class_id int, in_description text, in_created_by_eid int,
 in_date_from date, in_date_to date,
 in_amount_gt numeric, in_amount_lt numeric, in_approved bool)

An application can query the argument names as long as the application knows this is a routine for searching batches. It can then map the class_id, description, created_by_eid, date_from, date_to, amount_gt, amount_lt, and approved fields from a specified source to the arguments and generate an SQL query to call this function.

There are some problems with this approach as we follow it right now. The biggest one is consistency: if the variables are not always named consistently with object properties, it can cause maintenance problems, because object definitions and SQL statements form complementary portions of code and are in different languages. A second problem is function overloading with loosely typed languages; our solution is to test for and not allow function overloading. The final problem is that this does not separate object properties from function arguments.

A better approach would be to name object properties and arguments with different prefixes. So:

CREATE OR REPLACE FUNCTION
-- we aren't doing anything with a batch, just looking for them
batch_search
(arg_class_id int, arg_description text, arg_created_by_eid int,
 arg_date_from date, arg_date_to date,
 arg_amount_gt numeric, arg_amount_lt numeric, arg_approved bool)
RETURNS SETOF batch_search_result

but

CREATE OR REPLACE FUNCTION batch_post(obj_id INTEGER) RETURNS batch

If we enforce that the function must return a type whose properties can be specified as input properties, the following becomes possible:

1) Automatic argument mapping to object properties in a fairly robust way, and
2) Automatic merging of return results back into the object that called it.

It would also be possible to write code generators to create object wrappers for the SQL types and functions, which could then be extended as needed. It also means that object properties and methods as general data structures have at least part of their definition in SQL code. Applications of course may have to extend this in various ways in order to avoid ravioli code. But it means you have an internally consistent basis to build things on.

Best Wishes,
Chris Travers
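The discovery itself can be done against the system catalogs; for example, a sketch of the kind of query an application could run (whether LedgerSMB does exactly this isn't shown here):

SELECT p.proname,
       p.proargnames,                         -- e.g. {in_class_id,in_description,...}
       pg_get_function_arguments(p.oid) AS arg_list
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE n.nspname = 'public'
   AND p.proname = 'batch_search';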
On Sat, Jul 23, 2011 at 11:44 PM, Sim Zacks <sim@compulab.co.il> wrote:
> I gave a talk on using PostgreSQL as an application server at PG East in
> March.
>
> Basically, we try to implement all business logic using functions, using
> plpythonu when necessary.
>
> For example, we have functions that send email, ftp files, sync remote
> databases, etc.
>
> It is important to keep your MVC intact and not "php" your function code by
> mixing business logic with SQL statements.

It depends on what you mean by "business logic." In general my view is that some business logic doesn't belong in stored procedures.

In general, where I draw the line is between 'data logic' (i.e. logic necessary to retrieve, process, and store data in a meaningful and consistent way, meeting inherent requirements including security requirements) and 'application logic' (i.e. logic as to how you obtain, display, present, and use the data).

If the goal is to provide a consistent set of business logic to several applications hitting the database, in general you may not want your database to make assumptions about how the data is going to be used, or where it is going to come from.

OTOH, if this is a single-application database, then I would ask: what do you gain by putting it in plpythonu on the db server instead of on the client?

> I am currently playing with interactive queries, where the function stops in
> the middle and sends a message to the client asking for input. This isn't
> necessarily a good idea in all cases, but there are some functions where you
> don't have the ability to ask a question until it is mostly finished
> processing (i.e. you don't have the information needed to ask the question
> until it finishes munging the data, which might be a long process).

How does this interface work? Do you have a separate network socket for this? Something else?

What happens if a user goes to lunch or leaves early and then the query hangs pending input? How do you handle this? Is there a timeout on the input request?

Best Wishes,
Chris Travers
On 07/24/2011 06:58 PM, Chris Travers wrote:
> It depends on what you mean by "business logic." In general my view
> is that some business logic doesn't belong in stored procedures.
>
> In general, where I draw the line is between 'data logic' (i.e. logic
> necessary to retrieve, process, and store data in a meaningful and
> consistent way, meeting inherent requirements including security
> requirements) and 'application logic' (i.e. logic as to how you
> obtain, display, present, and use the data).
>
> If the goal is to provide a consistent set of business logic to
> several applications hitting the database, in general you may not want
> your database to make assumptions about how the data is going to be
> used, or where it is going to come from.
>
> OTOH, if this is a single-application database, then I would ask:
> what do you gain by putting it in plpythonu on the db server instead
> of on the client?

The goal is to make our system client-agnostic. Most of our GUI is written in wxPython; we also have some web functions and even a barcode-terminal function written in C#. We would like to use an application server so that all the code runs on the server, but we don't want to be web-based. I don't want to have the same business logic written into various clients, as that causes maintenance headaches.

The way that PostgreSQL works is that each session is its own process. That means that from the server perspective (at least in most ways), my plpython function is not connected to any other call on the database. My goal is to allow any client to request functionality and have it execute the same way every time. Data logic is built in by use of constraints, triggers, and some functions; business logic is built only into functions. In other words, our PostgreSQL server is a hybrid database/application server.

> How does this interface work? Do you have a separate network socket for
> this? Something else?
>
> What happens if a user goes to lunch or leaves early and then the
> query hangs pending input? How do you handle this? Is there a
> timeout on the input request?

As I mentioned, I am still playing around with and testing this. The concept is that each client has a SocketServer thread that listens on a specific port (8080 in this case). My database function is called python_prompt, pasted below. It has a timeout of 2 minutes, after which the function errors out with a timeout; the user will then get a message that the function did not complete. The important thing is that it has a timeout - if the timeout is 20 minutes, that is also ok. Depending on the query that calls this function, there do not have to be any locks being used, and as this is its own process and it is basically sleeping while waiting for the answer, it doesn't use resources (aside from a connection and the work_mem in the calling query, of course).

Using this, a plpgsql query can prompt the user, "You are about to use more stock than you have, are you sure you want to do this?"

CREATE OR REPLACE FUNCTION python_prompt(v_question text)
RETURNS boolean AS $BODY$
import socket

# find the address of the client that issued the calling query
ipaddr = plpy.execute("select inet_client_addr()")[0]["inet_client_addr"]
HOST, PORT = str(ipaddr), 8080

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(120)   # error out if the client doesn't answer in 2 minutes
sock.connect((HOST, PORT))
sock.send(v_question + "\n")
ans = sock.recv(1024)
sock.close()

if ans == "yes":
    return True
else:
    return False
$BODY$ LANGUAGE plpythonu VOLATILE;

I haven't fully decided to use this yet, but one of my goals is to make my GUI as dumb as possible.
On 22.07.2011 21:15, Karl Nack wrote:
> to move as much business/transactional logic as
> possible into the database, so that client applications become little
> more than moving data into and out of the database using a well-defined
> API, most commonly (but not necessarily) through the use of stored
> procedures.

Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as the DB cluster cannot be scaled as easily as a farm of application servers can.

Cheers,
Frank
2011/7/25 Frank Lanitz <frank@frank.uvena.de>:
> Am 22.07.2011 21:15, schrieb Karl Nack:
>> to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures.
>
> Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as a DB cluster cannot be scaled as easily as a farm of application servers.

Yes and no - this can decrease network overhead and data-conversion overhead. Sometimes I was surprised how much time I gained by moving to stored procedures.

Regards
Pavel
Am 25.07.2011 10:12, schrieb Pavel Stehule:
> 2011/7/25 Frank Lanitz<frank@frank.uvena.de>:
>> Am 22.07.2011 21:15, schrieb Karl Nack:
>>> to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures.
>>
>> Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as a DB cluster cannot be scaled as easily as a farm of application servers.
>
> Yes and no - this can decrease network overhead and data-conversion overhead. Sometimes I was surprised how much time I gained by moving to stored procedures.

Yep. It always depends on what you are doing, I guess.

Cheers,
Frank
On 07/25/2011 11:06 AM, Frank Lanitz wrote:
> Am 22.07.2011 21:15, schrieb Karl Nack:
>> to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures.
>
> Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as a DB cluster cannot be scaled as easily as a farm of application servers.
>
> Cheers,
> Frank

If I understand you correctly, you are saying that to handle business logic processing, I may require X servers. Only a percentage of that traffic actually requires database processing. If I use a cluster of application servers against a single database, it will scale better than if I have to cluster my database, which brings in all sorts of messy master-master replication issues.

Is this accurate?

Sim
Am 25.07.2011 10:24, schrieb Sim Zacks:
> On 07/25/2011 11:06 AM, Frank Lanitz wrote:
>> Am 22.07.2011 21:15, schrieb Karl Nack:
>>> to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures.
>>
>> Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as a DB cluster cannot be scaled as easily as a farm of application servers.
>
> If I understand you correctly, you are saying that to handle business logic processing, I may require X servers. Only a percentage of that traffic actually requires database processing. If I use a cluster of application servers against a single database, it will scale better than if I have to cluster my database, which brings in all sorts of messy master-master replication issues.
>
> Is this accurate?

As I don't know your kind of application and business, or the structure of the code you already have, I cannot say for sure. There is no golden rule that solves everything 100% ... that much I can say.

Cheers,
Frank
On Sun, Jul 24, 2011 at 11:53 PM, Sim Zacks <sim@compulab.co.il> wrote:
> The goal is to make our system client agnostic. Most of our GUI is written in wxpython; we also have some web functions and even a barcode terminal function, written in C#. We would like to use an application server, so that all the code is run on the server, but we don't want to be web-based. I don't want to have the same business logic written into various clients, as that causes maintenance headaches.

Ok. So we are talking about a multi-application database, and you are trying to provide some degree of consistency in business logic across the software applications (and hence uses of your data). So far, so good. Agreed on putting anything that might need to be common into the database.

> The way that postgresql works is that each session is its own process. That means that from the server perspective (at least in most ways), my plpython function is not connected to any other call on the database. My goal is to allow any client to request functionality and have it execute the same way every time.

So.... I assume that means using Pl/Mono to make your barcode stuff work in the db too?

> Data logic is built in by use of constraints, triggers, and some functions; business logic is built only into functions.

I guess I am using "data logic" more broadly than you. Any select, update, or insert statement against a relation (except, of course, simple function calls) is part of that data logic, and there are supporting dependencies too, such as security and permissions. IOW, I think a lot of 'business logic' is data logic. I separate this from what you actually use the data for.

IOW, "generate a report in the following format" is data logic (perhaps even "and email it to the head of product ordering"), but using this report to decide what inventory to order is application logic (you probably want people making these decisions). Similarly, "generate me a list of invoices to be paid in the following format" is data logic, but selecting those invoices to be paid, entering the data, determining what bank account you want to pay them from, etc. is application logic. Recording the payments is again data logic. Application logic might be better thought of as workflow logic.

> In other words, our postgresql server is a hybrid database/application server.

Right. One thing we are seeing in LedgerSMB as we go through this is that the database does indeed become an application server. The areas which are taking the new approach are mostly SQL and HTML templates (in TemplateToolkit), while the Perl code is shrinking pretty fast.

I guess what I am wondering is if you are using it as an application server, is there a benefit to trying to put everything in the database? Or just those parts which necessarily follow from the database independent of workflow, etc., or are likely to be common across diverse applications hitting your server? If so, what is it?

> As I mentioned, I am still playing around and testing this. The concept is that each client has a SocketServer thread that listens on a specific port (8080 in this case).

Ok, so just pointing out here that if you go this route you are starting to break the client-agnosticism requirement, unless I misunderstand what you are trying to do :-).

> My database function is called python_prompt, pasted below. It has a timeout of 2 minutes, after which the function errors out with a timeout. The user will get a message that the function did not complete.
> The important thing is that it has a timeout. If the timeout is 20 minutes, that is also ok. Depending on the query that calls this function, there do not have to be any locks held, and as this is its own process that is basically sleeping while waiting for the answer, it doesn't use many resources (aside from a connection and the work_mem of the calling query, of course).

Now, if for the sake of argument this happens after an insert or update, that means things are locked for that time, right? This would be more likely to cause deadlocks, and if the person steps away while entering the data, there would be a rollback, right?

> Using this, a plpgsql query can prompt the user, "You are about to use more stock than you have, are you sure you want to do this?"
>
> CREATE OR REPLACE FUNCTION python_prompt(v_question text)
> RETURNS boolean AS
> $BODY$
> import socket
> ipaddr = plpy.execute("select inet_client_addr()")[0]["inet_client_addr"]
> HOST, PORT = str(ipaddr), 8080
> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> sock.settimeout(120)
> sock.connect((HOST, PORT))
> sock.send(v_question + "\n")
> ans = sock.recv(1024)
> sock.close()
> if ans == "yes":
>     return True
> else:
>     return False
> $BODY$
> LANGUAGE plpythonu VOLATILE;

I guess I am wondering what you get by requiring that a client listens on port 8080 AND knows how to communicate with the server on it that you don't get from assuming that the client can manage the workflow (breaking the routines into two separate ones and prompting for input between). I also think there is a better way to solve this problem (see below).

> I haven't fully decided to use this yet, but one of my goals is to make my GUI as dumb as possible.

Probably a personal preference, but I don't like to try to push something as far as possible :-)

Anyway, here's an alternative to your interactive query approach, the one we use for LedgerSMB user creation for 1.3. It's the "if you don't tell me to do it anyway, I am raising an exception" approach.

Basically the problem is that db users are used to enforce application permissions, and db users are cluster-wide while application users are specific to a database. So what happens when you are running two companies (which have to have complete data separation and different permissions) but a single user account is needed to access both of them?

What we decided to do is have a stored procedure which creates the user and sets the password (valid for 1 day by default, until changed by the user, at which point it is valid for an admin-specified time). Now, one parameter to the function is whether this is an import of an existing db role or not (i.e. not == creation of new user/role). In the event you specify that this is an import, if you also specify a password, you get an error unconditionally. We don't want to change the password on import. However, if you don't specify that this is an import and the role exists, you get a different exception. Exception strings in this case are short but descriptive ('No password allowed' and 'Duplicate user'), and these are handed back up to the application to process. The application then can flag that this role exists, and the user of the software can resubmit as an import. Now there's no reason the application (if it wasn't a web-based app) couldn't just pop up a confirmation box saying "This role exists. Do you want to import? Yes/no" and handle it there.
Similarly, if you add a do_anyway arg to your invoice item query, you can raise an exception "Negative Inventory" and the application can handle that, flag the user, and let the user decide whether to do it anyway (a sketch follows below). It seems to me that as you have timeouts, the application has to be prepared to retry functions anyway.

There is also one huge cost to putting too much interactive logic in the database in some environments: I haven't yet found a decent way of localizing strings in database queries. That means that if you want to support multiple languages, it's a LOT easier to deal with these strings if they are in the client codebase than if they are in the server codebase and interpolated in the client (gettext throws errors with variable interpolation if you identify the string on the server and mix localization of client-side things with server-side strings). I suppose you can patch gettext to ignore these errors.......

Best Wishes,
Chris Travers
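A minimal sketch of that do_anyway pattern, as the exception-based counterpart to the python_prompt sketch earlier (the part table, on_hand column, and use_stock function are invented for illustration):

CREATE OR REPLACE FUNCTION use_stock(
    in_part_id int, in_qty numeric, in_do_anyway bool DEFAULT false
) RETURNS void AS $$
DECLARE
    t_on_hand numeric;
BEGIN
    SELECT on_hand INTO t_on_hand
      FROM part WHERE id = in_part_id FOR UPDATE;
    IF t_on_hand - in_qty < 0 AND NOT in_do_anyway THEN
        -- the client traps this exception, asks the user, and may
        -- resubmit the same call with in_do_anyway := true
        RAISE EXCEPTION 'Negative Inventory';
    END IF;
    UPDATE part SET on_hand = on_hand - in_qty WHERE id = in_part_id;
END;
$$ LANGUAGE plpgsql;

The transaction rolls back on the exception, so nothing stays locked while the user decides; the retry is a fresh, short transaction.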
On Mon, Jul 25, 2011 at 1:24 AM, Sim Zacks <sim@compulab.co.il> wrote:
> If I understand you correctly, you are saying that to handle business logic processing, I may require X servers. Only a percentage of that traffic actually requires database processing. If I use a cluster of application servers against a single database, it will scale better than if I have to cluster my database, which brings in all sorts of messy master-master replication issues.
>
> Is this accurate?

I'd also say I have seen applications not put enough logic in the database, and that this can cause worse bottlenecks. Last time I looked at SQL-Ledger's payment processing interface, it did in Perl in the web app what would have been far better solved with a HAVING clause in the SQL (retrieve ALL invoices ever entered, along with aggregated payments, and then compare these numbers in Perl and discard those which are already paid). Needless to say, this approach, when run on a large database, caused DB and web server bottlenecks....... Doh!

Best Wishes,
Chris Travers
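For illustration, a sketch of the HAVING approach Chris describes; the invoice/payment schema here is invented:

-- let the database discard the paid invoices, instead of fetching
-- every invoice ever entered and filtering in Perl
SELECT i.id, i.amount, coalesce(sum(p.amount), 0) AS paid
  FROM invoice i
  LEFT JOIN payment p ON p.invoice_id = i.id
 GROUP BY i.id, i.amount
HAVING i.amount > coalesce(sum(p.amount), 0);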
On Sun, Jul 24, 2011 at 12:51 AM, Chris Travers <chris.travers@gmail.com> wrote:
>> I was thinking similar thoughts, but you not only beat me to it, you made some good points I had not thought of!
>>
>> The only thing I can think of adding is that it would be good to lock down the database so that only the middleware can access it; everything else accesses the database via the middleware.
>
> In general, I am not convinced that middleware is inherently more maintainable than in-db procedures.

exactly. procedural middlewares written in languages like java tend to be bug factories:
*) over-(mis-)use of threads
*) performance wins moving logic outside the database to scale it are balanced out by the extra traffic
*) database concurrency issues
*) OO representation of data is brittle and inflexible
*) extremely (and IMNSHO unnecessarily) verbose
*) framework all over the place
*) libraries all over the place

On the plus side though, languages like java have huge pools of available talent and excellent tools. These factors are *critical* for many IT companies. plpgsql may be the bee's knees (it is), but DIY tools and methodologies tend to translate directly to a high cost of labor, and application developers used to rich tool environments tend to really dislike code practices that pl/pgsql requires, like debugging by logging and remembering where stuff is and what it does.

plpgsql is a niche language that tends to attract the same really smart people who write code in a more functional style. When done well, you can do a lot with a very small amount of code. I would love to see stronger validation features (with a higher emphasis on warnings) and better tools/editors to help bring plpgsql to the mainstream.

merlin
On Mon, Jul 25, 2011 at 12:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> exactly. procedural middlewares written in languages like java tend to be bug factories:
> *) over-(mis-)use of threads
> *) performance wins moving logic outside the database to scale it are balanced out by the extra traffic

Well, typically you have other performance issues other than the traffic. A well written SQL query can do more, and more efficiently, than most application programmers realize. The tendency to move stuff out of the database IMHO often occurs at the same time the SQL code itself ends up being less efficient. A lot of things (aggregates for example) use CPU cycles on the db server that could probably be saved by moving things out of the db, but only at the cost of memory usage.

I have seen CRAZY stuff coming out of middleware and I tend to suspect that the "move it out of the db" crowd sometimes tend to be stuck thinking in their languages and thus unable to take advantage of good, well written SQL.

I have seen Perl code be used instead of HAVING clauses. I have seen loops through query results, firing other queries..... Ok, to be fair, I would call that program something way below average in quality, but still.

> *) database concurrency issues
> *) OO representation of data is brittle and inflexible

And the DB ends up being designed around the ORM....... Which sucks.......

> *) extremely (and IMNSHO unnecessarily) verbose
> *) framework all over the place
> *) libraries all over the place

generally agreed.

> On the plus side though, languages like java have huge pools of available talent and excellent tools. These factors are *critical* for many IT companies. plpgsql may be the bee's knees (it is), but DIY tools and methodologies tend to translate directly to a high cost of labor, and application developers used to rich tool environments tend to really dislike code practices that pl/pgsql requires, like debugging by logging and remembering where stuff is and what it does.

I dunno. It depends on how you use plpgsql.

The approach we have taken in the LedgerSMB project is to use stored procs basically as what has been called "named queries." We try to put as much as possible into single SQL statements, and we extensively use PLPGSQL's syntactic sugar to make things more manageable (a very large number of our stored procs, probably a clear majority, could be written in SQL).

I think that if you come up with all these excellent tools, the quality of code will go down and you will end up with something that approximates most of the Middleware problems today. IMHO, the best use for PLPGSQL is that of straight SQL queries with just a little procedural logic around it. There are exceptions due to the fact that utility statements are unparameterized...... But for the most part, that is what I have found to work best.

> plpgsql is a niche language that tends to attract the same really smart people who write code in a more functional style. When done well, you can do a lot with a very small amount of code. I would love to see stronger validation features (with a higher emphasis on warnings) and better tools/editors to help bring plpgsql to the mainstream.

What kinds of tools/editors do you think we need? A heck of a lot of us really like VIM or EMACS (just don't ask which is best ;-) ).

Best Wishes,
Chris Travers
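The "named query" style of stored procedure Chris describes might look something like the sketch below; the customer table and the customer__search function are invented for illustration:

-- a stored procedure as a "named query": one SQL statement
-- behind a stable, parameterized API
CREATE OR REPLACE FUNCTION customer__search(in_name text)
RETURNS SETOF customer AS $$
    SELECT * FROM customer
     WHERE name ILIKE '%' || in_name || '%';
$$ LANGUAGE sql;

The application then calls SELECT * FROM customer__search('smith') and never touches the table directly.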
On 07/25/2011 06:24 PM, Chris Travers wrote:
> On Sun, Jul 24, 2011 at 11:53 PM, Sim Zacks <sim@compulab.co.il> wrote:
>> The goal is to make our system client agnostic. <snip> I don't want to have the same business logic written into various clients, as that causes maintenance headaches.
> Ok. So we are talking about a multi-application database, and you are trying to provide some degree of consistency in business logic across the software applications (and hence uses of your data). So far, so good. Agreed on putting anything that might need to be common into the database.
>> The way that postgresql works is that each session is its own process. <snip> My goal is to allow any client to request functionality and have it execute the same way every time.
> So.... I assume that means using Pl/Mono to make your barcode stuff work in the db too?

No need for PL/Mono or any other client-specific language. The GUI should be dumb, so all I really need to design is the interface and input/output methods. When you push a button, it should call the appropriate function. The functions on the barcode terminal (which is a Windows Mobile platform) can also be run in the client application written in wxpython. Keeping as much as possible off of the client allows me to share the work and be sure that both clients do the exact same thing when the user pushes the button.
There is always functionality that needs to be on the client; however, in my experience with multi-client applications, if it is business logic you will end up duplicating it and then needing to maintain 2 copies of the same thing.
>> Data logic is built in by use of constraints, triggers, and some functions; business logic is built only into functions.
> I guess I am using "data logic" more broadly than you. Any select, update, or insert statement against a relation (except, of course, simple function calls) is part of that data logic, and there are supporting dependencies too, such as security and permissions. IOW, I think a lot of 'business logic' is data logic. I separate this from what you actually use the data for. <snip> Application logic might be better thought of as workflow logic.

So much of human interaction is definition of terms. Now I understand what you mean by data logic.
To me, data logic is strictly about whether the data will be considered corrupt if the database allows an action to occur. For example, not enforcing relationships: if you erase the parent, the child has no meaning, and therefore you have data in the system which is suspect.
As my db is designed with the business rules in mind, some of the business logic is included in the data logic; for example, 3 tables must be updated at the same time in order for the transaction to be considered valid. This is taken care of either through a function or through triggers.
An example of business logic: when my stock is updated to 0, I want the record deleted.
This is done through a rule: on any update that changes the value to 0, delete instead.
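A minimal sketch of such a rule (the stock table and its columns are invented for illustration):

CREATE RULE stock_zero_delete AS
    ON UPDATE TO stock
    WHERE NEW.quantity = 0
    DO INSTEAD
        DELETE FROM stock WHERE id = OLD.id;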
>> In other words, our postgresql server is a hybrid database/application server.
> Right. One thing we are seeing in LedgerSMB as we go through this is that the database does indeed become an application server. <snip> I guess what I am wondering is if you are using it as an application server, is there a benefit to trying to put everything in the database? Or just those parts which necessarily follow from the database independent of workflow, etc., or are likely to be common across diverse applications hitting your server? If so, what is it?

The benefit of putting everything into the database is having one platform to maintain. Also, anything that can talk to the database can call the functions, so we have a defined interface for all our functions. If I wanted to use JBOSS, for example, how would my wxpython or my C# app talk to it?
IOW, I did not find a middleware that was more appropriate for a multi-client/multi-platform environment than the database server.
Another big advantage of using the database for business logic is that the plpython functions can be called from other functions. For example, with the python prompt function I posted earlier, I can have a plpgsql function say:
if python_prompt('Are you sure?') then
    -- process
end if;
It can also do an insert into table X, with a trigger on the table that calls a send-email function.
I have a database function that builds an HTML page and FTPs it to our website. This gives me a number of benefits: a) I don't have to worry about the ftp client version on the client, or connection issues, firewalls, etc.; b) it can be done from multiple clients without changing the code; c) I don't have to worry about path issues.
All I need to know is that my server can do it and that my clients can connect to my server.
>> As I mentioned, I am still playing around and testing this. The concept is that each client has a SocketServer thread that listens on a specific port (8080 in this case).
> Ok, so just pointing out here that if you go this route you are starting to break the client-agnosticism requirement, unless I misunderstand what you are trying to do :-).

Any client that has the ability to listen on a port will be able to use this functionality.
In C#, for example, I would use System.Net.Sockets and the TcpListener class.
In Python, I would use the SocketServer module.
>> My database function is called python_prompt, pasted below. It has a timeout of 2 minutes, after which the function errors out with a timeout. The user will get a message that the function did not complete. The important thing is that it has a timeout. <snip>
> Now, if for the sake of argument this happens after an insert or update, that means things are locked for that time, right? This would be more likely to cause deadlocks, and if the person steps away while entering the data, there would be a rollback, right?

Yes. This is the type of functionality that must be used smartly.
>> Using this, a plpgsql query can prompt the user, "You are about to use more stock than you have, are you sure you want to do this?"
>> <snip>
> I guess I am wondering what you get by requiring that a client listens on port 8080 AND knows how to communicate with the server on it that you don't get from assuming that the client can manage the workflow (breaking the routines into two separate ones and prompting for input between). I also think there is a better way to solve this problem (see below).

Breaking it into 2 functions is not always practical, because you lose your context (all variables, etc.). It is not all done in a single transaction, which means when you start the second function you would have to verify that the work that is there is actually what you put there and that nobody modified it.
What we do today (the reason I am playing with this interactivity) is have the database return an error code, which the client understands to be a question. It then prompts the user and the query is sent again with the answer as a parameter. It works but it is ugly (IMO).
>> I haven't fully decided to use this yet, but one of my goals is to make my GUI as dumb as possible.
> Probably a personal preference, but I don't like to try to push something as far as possible :-) Anyway, here's an alternative to your interactive query approach, the one we use for LedgerSMB user creation for 1.3. It's the "if you don't tell me to do it anyway, I am raising an exception" approach. <snip>
>
> Best Wishes,
> Chris Travers
On Tue, Jul 26, 2011 at 1:04 AM, Sim Zacks <sim@compulab.co.il> wrote:
> No need for PL/Mono or any other client-specific language. The GUI should be dumb, so all I really need to design is the interface and input/output methods. When you push a button, it should call the appropriate function. The functions on the barcode terminal (which is a Windows Mobile platform) can also be run in the client application written in wxpython. Keeping as much as possible off of the client allows me to share the work and be sure that both clients do the exact same thing when the user pushes the button.
>
> There is always functionality that needs to be on the client; however, in my experience with multi-client applications, if it is business logic you will end up duplicating it and then needing to maintain 2 copies of the same thing.

I am not sure it applies to all business logic. For example, suppose the product team and the sales team both need to be able to be assigned tickets relating to customer feedback. They may have very different rules and processes for dealing with that. There is commonality (how the data is stored, retrieved, presented to the application), but if you want to enforce the different rules, you are going to have to put the differences in business logic somewhere.

> So much of human interaction is definition of terms. Now I understand what you mean by data logic.
>
> To me, data logic is strictly about whether the data will be considered corrupt if the database allows an action to occur. For example, not enforcing relationships: if you erase the parent, the child has no meaning, and therefore you have data in the system which is suspect.
>
> As my db is designed with the business rules in mind, some of the business logic is included in the data logic; for example, 3 tables must be updated at the same time in order for the transaction to be considered valid. This is taken care of either through a function or through triggers.
>
> An example of business logic: when my stock is updated to 0, I want the record deleted. This is done through a rule: on any update that changes the value to 0, delete instead.

Ok, that's a pretty simple example of logic that belongs in the database.

> The benefit of putting everything into the database is having one platform to maintain. Also, anything that can talk to the database can call the functions, so we have a defined interface for all our functions. If I wanted to use JBOSS, for example, how would my wxpython or my C# app talk to it?

That's not really what I am getting at. The question could be better phrased: Why put workflow-specific logic in the database (your interactive query idea is an example of that)? Why not use the db procedures to create a well-formed API which enforces business logic consistently across applications, but allows the applications to manage their own workflow? Isn't this what you'd want to use middleware for if you were going this route?

> IOW, I did not find a middleware that was more appropriate for a multi-client/multi-platform environment than the database server.
>
> Another big advantage of using the database for business logic is that the plpython functions can be called from other functions. For example, with the python prompt function I posted earlier, I can have a plpgsql function say:
>
> if python_prompt('Are you sure?') then
>     -- process
> end if;

So if an arbitrary model function calls workflow functions, does that mean you have already broken out of the MVC approach?
I mean, does the MVC approach allow the model to control the controller? I guess the approach I take is to put as much of the model in the database as possible, and put the controllers on the client. Deciding when to prompt the user for input seems to me it should be a controller function.

> Any client that has the ability to listen on a port will be able to use this functionality.
> In C#, for example, I would use System.Net.Sockets and the TcpListener class.
> In Python, I would use the SocketServer module.

Any application also has an ability to call database functions as an API as well.

> What we do today (the reason I am playing with this interactivity) is have the database return an error code, which the client understands to be a question. It then prompts the user and the query is sent again with the answer as a parameter. It works but it is ugly (IMO).

That's what we do with LedgerSMB and the user creation workflow. I think it has the advantage of a cleaner API. For example, our function definition for the save_user function is:

CREATE OR REPLACE FUNCTION admin__save_user(
    in_id int,
    in_entity_id int,
    in_username text,
    in_password text,
    in_import bool
) RETURNS int AS $$

There are some nice things about this. It allows the client application to specify at the outset whether the prospective user is to be imported into the application (i.e. created as a user of the application without having a new role created) or whether this is a new user that needs to have these created. The actual checking is done by:

    PERFORM rolname FROM pg_roles WHERE rolname = in_username;
    t_is_role := found;
    t_is_user := admin__is_user(in_username);

    IF t_is_role is true and t_is_user is false and in_import is false THEN
        RAISE EXCEPTION 'Duplicate user';
    END IF;

    IF t_is_role and in_password is not null THEN
        EXECUTE 'ALTER USER ' || quote_ident(in_username) ||
                ' WITH ENCRYPTED PASSWORD ' || quote_literal(in_password) ||
                $e$ valid until $e$ ||
                quote_literal(now() + '1 day'::interval);
    ELSIF in_import is false AND t_is_user is false AND in_password IS NULL THEN
        RAISE EXCEPTION 'No password';
    ELSIF t_is_role is false THEN
        -- create an actual user
        EXECUTE 'CREATE USER ' || quote_ident(in_username) ||
                ' WITH ENCRYPTED PASSWORD ' || quote_literal(in_password) ||
                $e$ valid until $e$ ||
                quote_literal(now() + '1 day'::interval);
    END IF;

In other words.... if there is a problem we raise an exception and the client has to check the exception code. In that case we don't return anything. The client can then respond to the exception string and either tell the user what happened or otherwise handle the error.

So for example if the "No password" exception is hit, the client application can tell the user "I am sorry, but you must specify a password when creating a new user." On the other hand, the "Duplicate user" exception can be checked, and if that is the case, display a dialog: "Username already taken by another postgresql user. Import that user into LedgerSMB?" (yes/no). "Yes" can resubmit without a password field, and with in_import set to true.

One major win of doing things this way is that the output can be readily localized. With the PLPython function I am not at all sure it can be without a lot of error-prone, manual extraction of strings. This may not be a big deal for an in-house app used in one country, but it could be a bigger issue as soon as national borders are crossed, unless you expect everyone to be using the application in one specific language.
And it keeps the API clean while keeping the client from also having to be a server ;-) Best Wishes, Chris Travers
On Mon, Jul 25, 2011 at 5:21 PM, Chris Travers <chris.travers@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 12:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> exactly. procedural middlewares written in languages like java tend to be bug factories: <snip>
>
> Well, typically you have other performance issues other than the traffic. A well written SQL query can do more, and more efficiently, than most application programmers realize. <snip>
>
> I dunno. It depends on how you use plpgsql.
>
> The approach we have taken in the LedgerSMB project is to use stored procs basically as what has been called "named queries." We try to put as much as possible into single SQL statements, and we extensively use PLPGSQL's syntactic sugar to make things more manageable (a very large number of our stored procs, probably a clear majority, could be written in SQL).
>
> I think that if you come up with all these excellent tools, the quality of code will go down and you will end up with something that approximates most of the Middleware problems today. IMHO, the best use for PLPGSQL is that of straight SQL queries with just a little procedural logic around it. <snip>
>
> What kinds of tools/editors do you think we need?
> A heck of a lot of us really like VIM or EMACS (just don't ask which is best ;-) ).

Of course I pretty much agree on all points, but you are preaching to the choir. You and I both code in a spartan, console-heavy manner. Of course we prefer plpgsql, because over the course of years of database application programming we've realized the advantages of first-class queries and having code running tightly coupled with the data. Unfortunately, when I talk about these concepts to the majority of programmers I've worked with over the years, they look at me as if I just got off my spacecraft from Neptune. I think I've been cursed due to having discovered the secret to fast, efficient programming while continually being constrained from developing that way. Then again, most programmers probably feel like that :-).

merlin
On Tue, Jul 26, 2011 at 8:13 PM, Karl Nack <karlnack@futurityinc.com> wrote:
>> The current svn trunk (to be 1.3) does.
>
> So how far do you take this? I've been playing around with plpgsql a bit and am pretty encouraged by what I'm discovering; now I'm at a point where I'm thinking, "how far do/can I go with this?"

Here are the limitations I have discovered:

1) Localization of exception strings is a bit of a problem. Hence exceptions need to be aimed at communicating to the application rather than the user.

2) Difficulties passing complex data structures back and forth and properly parsing them in the application. Currently we do a lot with two-dimensional arrays but will probably shift to more arrays of complex types as we drop support for older versions of PostgreSQL and DBD::Pg.

There are queries which do a lot of things in the db in a single SQL statement. The longest single SQL statement I have found thus far is a bit over 100 lines long (due to complex requirements and some shortcomings in the db schema we have inherited that we are working on replacing). It's still pretty easy to read and understand at that length, at least when compared to a function in a more general-purpose language.

Menu data is also stored in the database (application settings and menu argument data are the two areas where key/value modelling is used).

The result is that the Perl codebase is shrinking in absolute terms, being replaced in part by SQL. However, a rigorous separation of named query and lightweight application logic has allowed us to shrink the amount of code total in the project while significantly adding functionality.

> Probably the best example is input validation. Constraints and triggers on the database will (or at least should) prevent bad data from being added to the database, but the UI generally needs to provide more informative messages than errors thrown by the database, and provide error messages for every invalid field, whereas the database will fail/stop on the first error. Consequently, I find that much of the data logic ends up being duplicated outside of the database to enhance the user experience. Might there be a way to move these validation routines into the database as well, and unify all the data logic into one place?

The best option is to use exceptions to communicate to the application what went wrong and then allow the application to handle those exceptions in many cases. In other cases, the application may need to know which inputs are mandatory.

In general, what we do is side with exception trapping and handling. This means that if the query fails, we take the sql state, detect the type of error, and display an appropriate message. In some cases ("Access denied") we are terse. In other cases we are adding the full SQL error message to the message, simply because the combination of an easy-to-read description of what happened ("Required input not provided") and the sql message mentioning the field is enough for many users to figure out what they did wrong. It's still not ideal.

>> > Yes, but I'd implement the constraint "all transactions must balance" as a trigger that fires when the transaction is complete. This would enforce data integrity regardless of whether or not the database API is used, which I think is also important.
>>
>> That's problematic to do in PostgreSQL because statement-level triggers don't have access to statement args, and I don't believe they can be deferred.
> In another email I did provide a simple example of how I might implement this; I don't know if you saw it. Of course, that assumes your database isn't designed by .... :-)

I didn't see it even looking back (I saw the initial attempt and the PHP sample).

The ideal interface at the moment is something like

SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -50)}');

This would allow you to do something like:

SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -30), (3, -20)}');

as well, since we are now talking about arrays of records.

But this is a pain to code to/from SQL in a robust way. Good db drivers sometimes handle this automatically though.

Best Wishes,
Chris Travers
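A minimal sketch of what that interface could look like, using an array of a composite type; the line_item_input type and the function body are invented here, while the transaction/line_item tables come from Karl's example earlier in the thread:

CREATE TYPE line_item_input AS (account_id int, amount numeric);

CREATE OR REPLACE FUNCTION create_transaction(
    in_id int, in_date date, in_description text, in_items line_item_input[]
) RETURNS int AS $$
BEGIN
    INSERT INTO transaction (id, date, description)
    VALUES (in_id, in_date, in_description);

    -- unnest() expands the array so all line items land in one INSERT
    INSERT INTO line_item (transaction_id, account_id, amount)
    SELECT in_id, i.account_id, i.amount
      FROM unnest(in_items) AS i;

    RETURN in_id;
END;
$$ LANGUAGE plpgsql;

SELECT create_transaction(1, current_date, 'Transaction 1',
                          '{"(1,50)","(2,-30)","(3,-20)"}');

The whole financial transaction then becomes a single round trip, and the array literal is the robust-encoding pain point Chris mentions.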
Le mardi 26 juillet 2011 à 11:23 -0500, Merlin Moncure a écrit :
> I think I've been cursed due to having discovered the secret to fast, efficient programming while continually being constrained from developing that way. Then again, most programmers probably feel like that :-).

I think there is a very good reason for that: contrary to the official discourse, I believe executives in organizations actively maintain complexity, consciously or not. See the accomplishments of free/open source software in sometimes very complex projects, without the need for a hierarchical management structure. I know, I'm astounded.

--
Vincent Veyron
http://marica.fr/
Claims and litigation management software for legal departments
> The current svn trunk (to be 1.3) does.

So how far do you take this? I've been playing around with plpgsql a bit and am pretty encouraged by what I'm discovering; now I'm at a point where I'm thinking, "how far do/can I go with this?"

Probably the best example is input validation. Constraints and triggers on the database will (or at least should) prevent bad data from being added to the database, but the UI generally needs to provide more informative messages than errors thrown by the database, and provide error messages for every invalid field, whereas the database will fail/stop on the first error. Consequently, I find that much of the data logic ends up being duplicated outside of the database to enhance the user experience. Might there be a way to move these validation routines into the database as well, and unify all the data logic into one place?

> > Yes, but I'd implement the constraint "all transactions must balance" as a trigger that fires when the transaction is complete. This would enforce data integrity regardless of whether or not the database API is used, which I think is also important.
>
> That's problematic to do in PostgreSQL because statement-level triggers don't have access to statement args, and I don't believe they can be deferred.

In another email I did provide a simple example of how I might implement this; I don't know if you saw it. Of course, that assumes your database isn't designed by .... :-)

Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
On Wed, Jul 27, 2011 at 1:41 AM, Chris Travers <chris.travers@gmail.com> wrote:
> On Tue, Jul 26, 2011 at 8:13 PM, Karl Nack <karlnack@futurityinc.com> wrote:
>> So how far do you take this? I've been playing around with plpgsql a bit and am pretty encouraged by what I'm discovering; now I'm at a point where I'm thinking, "how far do/can I go with this?"
>
> <snip>
>
> The ideal interface at the moment is something like
>
> SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -50)}');
>
> This would allow you to do something like:
>
> SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -30), (3, -20)}');
>
> as well, since we are now talking about arrays of records.
>
> But this is a pain to code to/from SQL in a robust way. Good db drivers sometimes handle this automatically though.

Note, we wrote libpqtypes (http://libpqtypes.esilo.com/) precisely to deal with this problem -- first-class handling of arrays and composites in the client. It's not much help for a perl client, but I think similar methodologies can be made for most languages. Sending rich data structures directly to procedures in the database transforms the way the application/database communications work for the better. It's new and weird to many developers, especially those trained on ORM usage patterns, but is also entirely effective.

merlin
On Wed, Jul 27, 2011 at 7:01 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > Note, we wrote libpqtypes (http://libpqtypes.esilo.com/) precisely to > deal with this problem -- first class handling of arrays and > composites in the client. It's not much help for a perl client, but I > think similar methodologies can be made for most languages. Sending > rich data structures directly to procedures in the database transforms > the way the application/database communications work for the better. > It's new and weird to many developers, especially those trained on ORM > usage patterns, but is also entirely effective. > Cool :-) As I understand it DBD::Pg has excellent handling of both these things too. The reason we are not doing more with the composite types yet is because we currently support versions of DBD::Pg which support arrays well but not the composite types, though that will probably change in 1.4. I wonder which other languages have first class support for these areas of Pg? Best Wishes, Chris Travers
> I wonder which other languages have first class support for these areas of Pg?

While already supporting most if not all standard PG datatypes, the psycopg2 interface lets you write in/out wrappers of arbitrary complexity mapping PG <-> Python datatypes and insert them into the driver at runtime.

Karsten
> I wonder which other languages have first class support for these areas > of Pg? It'd be nice if PHP could get there. :p Maybe it's time to look at some of these other languages. Or finally learn C and try hacking on the extension myself. Karl Nack Futurity, Inc 5121 N Ravenswood Ave Chicago, IL 60640 773-506-2007
On Wed, Jul 27, 2011 at 04:35:45PM +0200, Karsten Hilbert wrote: > > I wonder which other languages have first class support for these areas of > > Pg? > > While already supporting most if not all standard PG datatypes the > psycopg2 interface lets you write in/out wrappers of arbitray > complexity mapping PG <-> Python datatypes and insert them into > the driver at runtime. The same is true for the Chicken Scheme PostgreSQL egg. Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
> The best option is to use exceptions to communicate to the application what went wrong and then allow the application to handle those exceptions in many cases. In other cases, the application may need to know which inputs are mandatory.

So other than sanitizing input and making sure it's an appropriate data type, are you relying solely on the database for all your input validation?

I guess a good generic example might be setting up a new user account, which might require a username, password, real name, and birth date. We might also put a reasonable constraint that, at the very least, the birth date cannot be greater than the current date. Now if the user submitted a blank page, ideally the page would come back with four errors, one for each field that was left blank. If you submitted this basically as-is to the database, it'd arbitrarily fail on the first column that didn't meet the NOT NULL constraint, and that would be the only error sent back to the client. So yes, this would work, but in theory it could take four or five times before every error was identified and the user notified.

>> In another email I did provide a simple example of how I might implement this; I don't know if you saw it. Of course, that assumes your database isn't designed by .... :-)
>
> I didn't see it even looking back (I saw the initial attempt and the PHP sample).

Here's a link to the archived message:

http://archives.postgresql.org/pgsql-general/2011-07/msg00631.php

Feel free to comment/praise/criticize! :-)

> SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -30), (3, -20)}'); as well, since we are now talking about arrays of records.
>
> But this is a pain to code to/from SQL in a robust way. Good db drivers sometimes handle this automatically though.

Yes, I've coded exactly this with a plpgsql function and have been mostly pleased by how easy it is. Unfortunately, at least with PHP, it's not so straightforward to format user input into an SQL statement that uses arrays and composite types. It's even worse going the other way -- just Google how to convert SQL arrays into PHP arrays. :-(

Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
On 07/27/2011 07:18 PM, Karl Nack wrote:
>> The best option is to use exceptions to communicate to the application what went wrong and then allow the application to handle those exceptions in many cases. In other cases, the application may need to know which inputs are mandatory.
>
> So other than sanitizing input and making sure it's an appropriate data type, are you relying solely on the database for all your input validation?
>
> I guess a good generic example might be setting up a new user account, which might require a username, password, real name, and birth date. We might also put a reasonable constraint that, at the very least, the birth date cannot be greater than the current date. Now if the user submitted a blank page, ideally the page would come back with four errors, one for each field that was left blank. If you submitted this basically as-is to the database, it'd arbitrarily fail on the first column that didn't meet the NOT NULL constraint, and that would be the only error sent back to the client. So yes, this would work, but in theory it could take four or five times before every error was identified and the user notified.

You can certainly have your function do all the tests before trying to insert the values. If you have 4 fields that require validation, have a return code of 4 chars (0000). Each char stands for 1 field: if it is a 0 it worked, if it is a 1 it didn't. When you finish your tests, check the value; if it is all 0s, it worked, so do your insert; otherwise return the value. This is the binary method (a sketch follows below).

Another option is that validation checks should be done by the client and only good input should be given to the function to begin with.
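A minimal sketch of that binary return-code idea (the account table and the create_account function are invented for illustration):

CREATE OR REPLACE FUNCTION create_account(
    in_username text, in_password text, in_real_name text, in_birth_date date
) RETURNS text AS $$
DECLARE
    code text;
BEGIN
    -- one char per field: 0 = passed, 1 = failed
    code := CASE WHEN in_username IS NULL OR in_username = ''
                 THEN '1' ELSE '0' END
         || CASE WHEN in_password IS NULL OR in_password = ''
                 THEN '1' ELSE '0' END
         || CASE WHEN in_real_name IS NULL OR in_real_name = ''
                 THEN '1' ELSE '0' END
         || CASE WHEN in_birth_date IS NULL OR in_birth_date > current_date
                 THEN '1' ELSE '0' END;

    IF code <> '0000' THEN
        RETURN code;  -- every validation failure reported in one round trip
    END IF;

    INSERT INTO account (username, password, real_name, birth_date)
    VALUES (in_username, in_password, in_real_name, in_birth_date);
    RETURN code;
END;
$$ LANGUAGE plpgsql;

The client then maps each '1' back to a per-field error message, which addresses Karl's "four errors at once" requirement in a single call.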