Thread: Implementing "thick"/"fat" databases
I've been following a few blogs (http://database-programmer.blogspot.com/, http://thehelsinkideclaration.blogspot.com/) that make a very compelling argument, in my opinion, to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures. Although there seems to be a lot of discussion out there of the reasons why one might want to do this, I'm really at a loss for finding good, concrete examples of how to do it. Consequently, I'm hoping that somebody can share their experience(s), or point me to some examples, of doing this with PostgreSQL. I'd consider myself fairly well-versed in using the various features of PostgreSQL to enforce data integrity, but much less so for implementing transactional logic.

To focus on a more concrete example, let's consider adding a financial transaction to the database. The "traditional" way to do this, with the business logic in the application layer, leaves us with two steps: insert the transaction "header", then insert the line items:

BEGIN;

INSERT INTO transaction (id, date, description)
VALUES (1, CURRENT_DATE, 'Transaction 1');

INSERT INTO line_item (transaction_id, account_id, amount)
VALUES (1, 1, 50), (1, 2, -50);

END;

Now if we start moving this logic to the database, we'd have something like:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_item(1, 1, 50);
SELECT create_line_item(1, 2, -50);
END;

But we've actually taken a step back, since we're making a round-trip to the database for each line item. That could be resolved by doing:

BEGIN;

SELECT create_transaction(1, current_date, 'Transaction 1');

SELECT create_line_item(transaction_id, account_id, amount)
FROM (VALUES (1, 1, 50), (1, 2, -50))
  AS line_item (transaction_id, account_id, amount);

END;

Better, but still not good, since we're invoking the function for each individual line item, which ultimately means separate INSERTs for each one. What we'd want is something like:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_items(((1, 1, 50), (1, 2, -50)));
END;

But this still falls short, since we're still basically managing the transaction in the application layer. The holy grail, so to speak, would be:

SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50), (2, -50)));

Perhaps I just need to spend more time digging through the documentation, but I really have no idea how to do something like this, or if it's even possible. I'm really hoping someone can provide an example, point me to some resources, or even just share their real-world experience of doing something like this. It would be very much appreciated.

Thanks.

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
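One way to get this single-call shape in PostgreSQL (8.4 or later) is to pass an array of a composite type and unnest it inside the function. A minimal sketch, assuming the transaction and line_item tables above; the line_item_input type name and balance check are invented for illustration:

CREATE TYPE line_item_input AS (account_id int, amount numeric);

CREATE OR REPLACE FUNCTION create_transaction(
    in_id int, in_date date, in_description text,
    in_line_items line_item_input[]
) RETURNS int AS $$
BEGIN
    -- enforce the double-entry rule before touching the tables
    IF (SELECT sum(amount) FROM unnest(in_line_items)) <> 0 THEN
        RAISE EXCEPTION 'unbalanced transaction';
    END IF;

    INSERT INTO transaction (id, date, description)
    VALUES (in_id, in_date, in_description);

    -- one set-based INSERT for all line items, no per-row round trips
    INSERT INTO line_item (transaction_id, account_id, amount)
    SELECT in_id, account_id, amount FROM unnest(in_line_items);

    RETURN in_id;
END;
$$ LANGUAGE plpgsql;

-- usage: a single statement, a single round trip
SELECT create_transaction(1, current_date, 'Transaction 1',
       ARRAY[(1, 50), (2, -50)]::line_item_input[]);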
Karl Nack wrote:
> I've been following a few blogs
> (http://database-programmer.blogspot.com/,
> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
> argument, in my opinion, to move as much business/transactional logic as
> possible into the database, so that client applications become little
> more than moving data into and out of the database using a well-defined
> API, most commonly (but not necessarily) through the use of stored
> procedures.

I strongly agree with that design philosophy. One principle is that the buck stops with the database: regardless of what the application does, any business logic should be enforced by the database itself. Another principle is to treat the database like a code library, where the tables are its internal variables and its public API is stored procedures. Using stored procedures means you can interact with the database from your application in the same way your application interacts with itself, meaning with parameterized routine calls.

<snip>

> To focus on a more concrete example, let's consider adding a financial
> transaction to the database. The "traditional" way to do this, with the
> business logic in the application layer, leaves us with two steps:
> insert the transaction "header", then insert the line items:
>
> BEGIN;
>
> INSERT INTO transaction (id, date, description)
> VALUES (1, CURRENT_DATE, 'Transaction 1');
>
> INSERT INTO line_item (transaction_id, account_id, amount)
> VALUES (1, 1, 50), (1, 2, -50);
>
> END;

<snip>

Anything intended to be a single transaction can be a single stored procedure. The code is something like this (out of my head, adjust to make it correct):

-- "when" and "desc" are reserved words, so those parameters are renamed
CREATE OR REPLACE FUNCTION financial_trans
(trans_id int, trans_date date, descrip text,
 dest_acct int, src_acct int, amt numeric)
RETURNS void AS $$
BEGIN
    INSERT INTO transaction (id, date, description)
    VALUES (trans_id, trans_date, descrip);

    INSERT INTO line_item (transaction_id, account_id, amount)
    VALUES (trans_id, dest_acct, amt),
           (trans_id, src_acct, -amt);
END;
$$ LANGUAGE plpgsql;

SELECT financial_trans(
    1, CURRENT_DATE, 'Transaction 1', 1, 2, 50
);

> But this still falls short, since we're still basically managing the
> transaction in the application layer. The holy grail, so to speak, would
> be:
>
> SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50),
> (2, -50)));

Well, not quite, because specifying the number "50" twice would be ridiculous for such a non-generic function; you can calculate the "-50" from it in the function.

> Perhaps I just need to spend more time digging through the
> documentation, but I really have no idea how to do something like this,
> or if it's even possible. I'm really hoping someone can provide an
> example, point me to some resources, or even just share their real-world
> experience of doing something like this. It would be very much
> appreciated.

A general rule of thumb: however you would design a routine in a normal programming language, try to do it that way in PL/PgSQL, assuming that PL/PgSQL is a competent language, and then tweak to match what you actually can do.

-- Darren Duncan
On 07/22/11 4:11 PM, Darren Duncan wrote:
> Karl Nack wrote:
>> I've been following a few blogs
>> (http://database-programmer.blogspot.com/,
>> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
>> argument, in my opinion, to move as much business/transactional logic as
>> possible into the database, so that client applications become little
>> more than moving data into and out of the database using a well-defined
>> API, most commonly (but not necessarily) through the use of stored
>> procedures.
>
> I strongly agree with that design philosophy. One principle is that
> the buck stops with the database and that regardless of what the
> application does, any business logic should be enforced by the
> database itself. Another principle is to treat the database like a
> code library, where the tables are its internal variables and its
> public API is stored procedures. Using stored procedures means you
> can interact with the database from your application in the same way
> your application interacts with itself, meaning with parameterized
> routine calls.

The alternative 'modern' architecture is to implement the business logic in a webservices engine that sits in front of the database, and only use stored procedures for things that get a significant performance boost where that is needed to meet your performance goals. Only this business logic is allowed to directly query the operational database. The business logic in this middle tier still relies on the database server for data integrity and such. The presentation layer is implemented either in a conventional client application or in a webserver (not to be confused with the webservices). So you have:

user -> browser -> webserver/presentation layer -> webservices/business logic -> database

The main rationale for this sort of design pattern is that large complex business logic implemented in SQL stored procedures can be rather difficult to develop and maintain.

-- 
john r pierce
N 37, W 122
santa cruz ca
mid-left coast
John R Pierce wrote:
> On 07/22/11 4:11 PM, Darren Duncan wrote:
>> Karl Nack wrote:
>>> I've been following a few blogs
>>> (http://database-programmer.blogspot.com/,
>>> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
>>> argument, in my opinion, to move as much business/transactional logic as
>>> possible into the database, so that client applications become little
>>> more than moving data into and out of the database using a well-defined
>>> API, most commonly (but not necessarily) through the use of stored
>>> procedures.
>>
>> I strongly agree with that design philosophy. One principle is that
>> the buck stops with the database and that regardless of what the
>> application does, any business logic should be enforced by the
>> database itself. Another principle is to treat the database like a
>> code library, where the tables are its internal variables and its
>> public API is stored procedures. Using stored procedures means you
>> can interact with the database from your application in the same way
>> your application interacts with itself, meaning with parameterized
>> routine calls.
>
> the alternative 'modern' architecture is to implement the business logic
> in a webservices engine that sits in front of the database, and only use
> stored procedures for things that get significant performance boost
> where that is needed to meet your performance goals. Only this
> business logic is allowed to directly query the operational database.
> The business logic in this middle tier still relies on the database
> server for data integrity and such. The presentation layer is
> implemented either in a conventional client application or in a
> webserver (not to be confused with the webservices). So you have
> user -> browser -> webserver/presentation layer -> webservices/business
> logic -> database
>
> The main rationale for this sort of design pattern is that large complex
> business logic implemented in SQL stored procedures can be rather
> difficult to develop and maintain

I should clarify that the primary thing I support, with respect to putting it in the database, is the business rules/constraints, because the buck stops there. It should not be possible for any database user lacking data-definition privileges to circumvent any of the business rules. So one cannot circumvent them by using a generic SQL shell, for example.

As for the rest, yes, I agree with you that this doesn't have to actually be in the database, though from a standpoint of good design principles, all of the business logic should still be in one place, next to if not in the database, and all database access should go through the business logic layer. All logic that is not specific to an application should go in a logic layer, so it is shared by multiple applications, whether web or command-line or whatever; the application is then largely just a user interface. In other words, thinking in the Model-View-Controller paradigm, the Model should be fat and the Controller should be thin.

-- Darren Duncan
On 22 Jul 2011, at 21:15, Karl Nack wrote:
> But this still falls short, since we're still basically managing the
> transaction in the application layer.

The problem you're facing here is that database statements work with records, while your example needs to handle a set of (different types of) records in one go.

> The holy grail, so to speak, would be:
>
> SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50),
> (2, -50)));

Now imagine someone not familiar with your design reading this query... To start with, they're going to assume this query SELECTs data, while it actually inserts it. Secondly, it's impossible to see what the different data values are supposed to represent without looking up the function - and quite possibly, its implementation. They're going to wonder what (1, 50) and (2, -50) mean, what kind of date current_date gets assigned to, etc.

Having to write queries like these is even worse, even if you designed the function. You'll be looking at your own documentation a lot while writing these.

It would seem to me that the API you would provide for business logic like this should provide the users of said API with enough context to create valid statements. For example, you could use XML to describe the data (I'm no fan of XML, but it does suit a need here and allows validation of the provided data), especially as Postgres has XML parsing functionality.

Or you could use a more sophisticated procedural language (plpython or plphp, for example) that's capable of marshalling and unmarshalling data structures to strings and vice versa (eg. '{foo:1,bar:2}').

You would still have a SELECT statement that INSERTs data, which is semantically a bad thing to do IMHO. Perhaps the better solution is (as others mentioned already) to move the data interpretation to a (web)service/application server and have that perform the actual database operations.

With that in mind, you would put business logic ("process an invoice") into an "application server", while you put data integrity logic ("don't allow transactions with no line_items") into the database.

Alban Hertroys

--
The scale of a problem often equals the size of an ego.
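To sketch the XML route Alban mentions, using PostgreSQL's built-in xpath() function (the document layout here is invented for illustration; a create_transaction(txn xml) function could run such a query over its argument and insert the results):

SELECT (xpath('/line_item/@account_id', li))[1]::text::int     AS account_id,
       (xpath('/line_item/@amount',     li))[1]::text::numeric AS amount
  FROM unnest(xpath('/transaction/line_item',
       '<transaction id="1" description="Transaction 1">
          <line_item account_id="1" amount="50"/>
          <line_item account_id="2" amount="-50"/>
        </transaction>'::xml)) AS li;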
On Fri, Jul 22, 2011 at 12:15 PM, Karl Nack <karlnack@futurityinc.com> wrote:
> I've been following a few blogs
> (http://database-programmer.blogspot.com/,
> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
> argument, in my opinion, to move as much business/transactional logic as
> possible into the database, so that client applications become little
> more than moving data into and out of the database using a well-defined
> API, most commonly (but not necessarily) through the use of stored
> procedures.

There are costs and benefits associated with this. The major argument against is that if you have business processes that may frequently or quickly change at the requirement level, a heavier-weight process might not work so well. On the other hand, if your database is being used by more than one application, or if you want it to be usable by more than one, then it can ideally provide a way to consistently enforce business logic and security across multiple applications.

Another benefit is that, depending on your architecture, you might be able to place a single API to call such procedures, generate parameterized query strings, and then pass those through, reducing the possibility of one type of SQL injection. Note, however, that you have to worry about other forms of SQL injection inside your stored procs, so this isn't a magic bullet even if it helps.

So I think it boils down to how much consistency and stability you want and where you want it.

> Although there seems to be a lot of discussion out there of
> the reasons why one might want to do this, I'm really at a loss for
> finding good, concrete examples of how to do it. Consequently, I'm
> hoping that somebody can share their experience(s), or point me to some
> examples, of doing this with PostgreSQL. I'd consider myself fairly
> well-versed in using the various features of PostgreSQL to enforce data
> integrity, but much less so for implementing transactional logic.

In LedgerSMB, we take this a step further by making the procedures into discoverable interfaces, so the application logic itself is a sort of thin glue between a UI layer and the database procedure layer. One thing I would suggest is to try to keep API calls as atomic as possible. You want to enforce consistency, so you need to have all relevant inputs passed to the function. See below for a suggested change to your API.

> To focus on a more concrete example, let's consider adding a financial
> transaction to the database. The "traditional" way to do this, with the
> business logic in the application layer, leaves us with two steps:
> insert the transaction "header", then insert the line items:
>
> BEGIN;
>
> INSERT INTO transaction (id, date, description)
> VALUES (1, CURRENT_DATE, 'Transaction 1');
>
> INSERT INTO line_item (transaction_id, account_id, amount)
> VALUES (1, 1, 50), (1, 2, -50);
>
> END;
>
> Now if we start moving this logic to the database, we'd have something
> like:
>
> BEGIN;
> SELECT create_transaction(1, current_date, 'Transaction 1');
> SELECT create_line_item(1, 1, 50);
> SELECT create_line_item(1, 2, -50);
> END;

Now, if you are doing double-entry bookkeeping this doesn't provide enough consistency, IMO. You can't check inside the function to ensure that the transaction is balanced. It would be better to:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1',
       '{{1, 1, 50}, {1, 2, -50}}');
COMMIT;

Now for the application, you can create an API that is semantically clearer. But PostgreSQL doesn't provide an easy way of calling procedures of this sort out of SELECT/UPDATE/INSERT statements, and SELECT is the only way to do this.

Best Wishes,
Chris Travers
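For what it's worth, inside such a function the '{{...}}' argument can be unpacked set-wise rather than row by row. A sketch of plpgsql fragments, assuming each inner array is {transaction_id, account_id, amount} as in Chris's example:

-- fragments from inside a hypothetical create_transaction(..., in_items numeric[]):

-- the balance check Chris mentions becomes possible
IF (SELECT sum(in_items[i][3])
      FROM generate_subscripts(in_items, 1) AS i) <> 0 THEN
    RAISE EXCEPTION 'unbalanced transaction';
END IF;

-- one set-based INSERT over the 2-D array
INSERT INTO line_item (transaction_id, account_id, amount)
SELECT in_items[i][1]::int, in_items[i][2]::int, in_items[i][3]
  FROM generate_subscripts(in_items, 1) AS i;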
> Now, if you are doing double-entry bookkeeping this doesn't provide
> enough consistency, IMO. You can't check inside the function to
> ensure that the transaction is balanced. it would be better to:
>
> BEGIN;
> SELECT create_transaction(1, current_date, 'Transaction 1',
>        '{{1, 1, 50}, {1, 2, -50}}');
> COMMIT;
>
> Now for the application, you can create an API that is semantically
> clearer. But PostgreSQL doesn't provide an easy way of calling
> procedures of this sort out of select/update/insert statements and
> select is the only way to do this.

One option to consider is restricting the final tables but making staging tables available. You use normal inserts to build up the staging table and then validate and transfer the data to the final table using a function. In the example, you can build a work-in-process transaction however you see fit, but the final real transaction creation process would first confirm that the entry balances before copying the records to the transaction table and cleaning up the work-in-process table.

David J.
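A minimal sketch of that staging pattern (the staging table and function names are invented; the final tables are those from Karl's example):

CREATE TABLE line_item_staging (
    txn_id     int,
    account_id int,
    amount     numeric
);

CREATE OR REPLACE FUNCTION post_staged_transaction(in_txn_id int)
RETURNS void AS $$
BEGIN
    -- refuse to post an entry that does not balance
    IF (SELECT sum(amount) FROM line_item_staging
         WHERE txn_id = in_txn_id) <> 0 THEN
        RAISE EXCEPTION 'transaction % does not balance', in_txn_id;
    END IF;

    INSERT INTO line_item (transaction_id, account_id, amount)
    SELECT txn_id, account_id, amount
      FROM line_item_staging WHERE txn_id = in_txn_id;

    -- clean up the work-in-process rows
    DELETE FROM line_item_staging WHERE txn_id = in_txn_id;
END;
$$ LANGUAGE plpgsql;

The application would then get INSERT rights on the staging table but not on line_item itself.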
> Now for the application, you can create an API that is semantically
> clearer. But PostgreSQL doesn't provide an easy way of calling
> procedures of this sort out of select/update/insert statements and
> select is the only way to do this.

A semantically accurate way to run "procedures" is:

DO $$ BEGIN PERFORM function(); END $$;

It would be nice if you could use PERFORM outside of plpgsql...

Now, this is not standard SQL, but it does convey the desired semantics. That said, proper function naming can convey similar semantic information as well.

David J.
On Sat, Jul 23, 2011 at 1:32 PM, David Johnston <polobo@yahoo.com> wrote:
>> Now for the application, you can create an API that is semantically
>> clearer. But PostgreSQL doesn't provide an easy way of calling
>> procedures of this sort out of select/update/insert statements and
>> select is the only way to do this.
>
> A semantically accurate way to run "procedures" is:
>
> DO $$ BEGIN PERFORM function(); END $$;
>
> It would be nice if you could use PERFORM outside of plpgsql...
>
> Now, this is not standard SQL, but it does convey the desired semantics.
> That said, proper function naming can convey similar semantic
> information as well.

The other option (one we try to follow in LedgerSMB) is to always make the procedure return useful information. So you are essentially calling a procedure and selecting the result for further use by your application.

Best Wishes,
Chris Travers
Karl Nack wrote:
>>> SELECT create_transaction(1, current_date, 'Transaction 1', ((1,
>>> 50), (2, -50)));
>>
>> Well, not quite, because specifying the number "50" twice would be
>> ridiculous for such a non-generic function; you can calculate the
>> "-50" from it in the function.
>
> Not if there were more than two line-items per transaction. A paycheck
> is a good example, where generally one or more income accounts are
> credited and multiple tax accounts as well as one or more asset accounts
> are debited. Ensuring that all the line-items add up to 0 would be one
> of the data integrity rules implemented in the database (though most
> likely checked in the application layer as well).

It would help if you clarified your intent with another example. I was assuming here that we were dealing with a double-entry accounting system where every line item in one account had a corresponding line item in another account of equal magnitude, and so all line items were in pairs, because you showed what looked like 1 pair, hence specifying the "50" once makes sense.

So are you wanting the 1 function to take a set of line-item pairs, or are you wanting to say do +50 in one account and -25 in each of 2 different accounts, as a total of 3 line items? If the former, I would still just specify the "50" once and have a set of {from, to, amount} triples as an argument, which would be relation/rowset-typed. If the latter, then you would specify the "50" twice.

>> A general rule of thumb, however you would design a routine in a
>> normal programming language, try to do it that way in PL/PgSQL,
>> assuming that PL/PgSQL is a competent language, and then tweak to
>> match what you actually can do.
>
> In the language I'm most familiar with, PHP, I could do this with an
> associative array:
>
> $transaction = array(
>     'id' => 1,
>     'date' => date('Y-m-d'),
>     'description' => 'Transaction 1',
>     'line_items' => array(
>         array('account_id' => 1, 'amount' => 50),
>         array('account_id' => 2, 'amount' => -50),
>     ),
> );
>
> From which I can easily build the appropriate SQL statements. This would
> be very similar in Python. I wonder if this would be achievable in
> PL/PGSQL, maybe through the use of composite types and/or domains?

Yes, you could. PL/PgSQL supports passing a set of rows as an argument, declared as an array of a composite/row type. Or alternately you could use temporary staging tables as quasi-arguments rather than using an actual argument.

-- Darren Duncan
> > Now, if you are doing double-entry bookkeeping this doesn't provide
> > enough consistency, IMO. You can't check inside the function to
> > ensure that the transaction is balanced.
>
> One option to consider is restricting final tables but making staging
> tables available.

I would implement this using triggers. First, add a boolean "is_balanced" column to the transaction table, along with the following trigger:

create or replace function check_txn_balance()
returns trigger language plpgsql as $$
declare
    _amt numeric;
begin
    if 'UPDATE' = TG_OP and new.is_balanced then
        return null;
    end if;

    select sum(amt) into _amt
      from line_item
     where txn_id = new.id;

    if _amt <> 0 then
        raise exception 'unbalanced transaction';
    end if;

    update txn set is_balanced = true where id = new.id;

    return null;
end;
$$;

create constraint trigger check_txn_balance
after insert or update on txn
deferrable initially deferred
for each row execute procedure check_txn_balance();

Then, whenever we add, remove, or update a line item, unbalance the parent transaction, which triggers the balance check:

create or replace function unbalance_txn()
returns trigger language plpgsql as $$
begin
    if 'UPDATE' = TG_OP then
        if (new.txn_id, new.amt) = (old.txn_id, old.amt) then
            return null;
        end if;
    end if;

    if TG_OP in ('INSERT', 'UPDATE') then
        update txn set is_balanced = false
         where (id, is_balanced) = (new.txn_id, true);
    end if;

    if TG_OP in ('DELETE', 'UPDATE') then
        update txn set is_balanced = false
         where (id, is_balanced) = (old.txn_id, true);
    end if;

    return null;
end;
$$;

create trigger unbalance_txn
after insert or delete or update on line_item
for each row execute procedure unbalance_txn();

At least, this seems to be a fairly efficient and foolproof way to do it to me.

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
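To see the deferred constraint trigger in action, a quick demonstration (assuming txn and line_item tables matching the column names in the trigger code; other columns omitted, and the account_id column is assumed):

BEGIN;
INSERT INTO txn (id) VALUES (1);
INSERT INTO line_item (txn_id, account_id, amt)
VALUES (1, 1, 50), (1, 2, -50);
COMMIT;  -- succeeds: at commit time the deferred trigger sees sum(amt) = 0

BEGIN;
INSERT INTO txn (id) VALUES (2);
INSERT INTO line_item (txn_id, account_id, amt) VALUES (2, 1, 50);
COMMIT;  -- fails: the deferred check raises 'unbalanced transaction'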
> I was assuming here that we were dealing with a double-entry accounting
> system where every line item in one account had a corresponding line
> item in another account of equal magnitude, and so all line items were
> in pairs, because you showed what looked like 1 pair, hence specifying
> the "50" once makes sense.
>
> So are you wanting the 1 function to take a set of line-item pairs, or
> are you wanting to say do +50 in one account and -25 in each of 2
> different accounts, as a total of 3 line items? If the former, I would
> still just specify the "50" once and have a set of {from, to, amount}
> triples as an argument, which would be relation/rowset-typed. If the
> latter, then you would specify the "50" twice.
A little OT, but all double-entry accounting requires is that the debits and credits of an entry - when totaled - equal zero (i.e., debits and credits are equal); there is no requirement that each item have an equal but opposite counterpart. Nor are multiple accounts required; some ad-hoc entries simply "re-class" money within a single account so that different "controls" and/or amounts remain.
Ideally what you would want is a type called "accounting detail entry" which represents a credit/debit. For the function you can either make any "negative" amounts credits OR pass in two arrays - one for debits and one for credits. Either way you'd pass in an array of this "accounting detail entry" type along with information such as accounting date and journal number. To avoid using a type I would make a "create_entry" function that you'd use to create the detail items on a staging table, attached to a specific ID, and then create the final entry by calling the original function with user/date/journal/etc information and reference the detail records via the ID.
David J.
On Sat, Jul 23, 2011 at 3:51 PM, Karl Nack <karlnack@futurityinc.com> wrote:
>> In LedgerSMB, we take this a step further by making the procedures
>> into discoverable interfaces, so the application logic itself is a
>> sort of thin glue between a UI layer and the database procedure layer.
>> One thing I would suggest is to try to keep API calls as atomic as
>> possible. You want to enforce consistency and so you need to have all
>> relevant inputs passed to the function. See below for a suggested
>> change to your API.
>
> Would you say LedgerSMB follows the thick/fat database principle? If so,
> I shall have to spend some time with your source code. I'd be very
> curious to see how you handle this.

The current svn trunk (to be 1.3) does. Older code in trunk or 1.2 follows the "survival" principle (we inherited one heck of a codebase when we forked) :-).

>> > BEGIN;
>> > SELECT create_transaction(1, current_date, 'Transaction 1');
>> > SELECT create_line_item(1, 1, 50);
>> > SELECT create_line_item(1, 2, -50);
>> > END;
>>
>> Now, if you are doing double-entry bookkeeping this doesn't provide
>> enough consistency, IMO. You can't check inside the function to
>> ensure that the transaction is balanced.
>
> Yes, but I'd implement the constraint "all transactions must balance" as
> a trigger that fires when the transaction is complete. This would
> enforce data integrity regardless of whether or not the database API is
> used, which I think is also important.

That's problematic to do in PostgreSQL because statement-level triggers don't have access to statement args, and I don't believe they can be deferred.

Note the GL stuff is not on the thick-db system yet, but we are working on it (for 1.4). Here's an example:

CREATE OR REPLACE FUNCTION payment_bulk_post
(in_transactions numeric[], in_batch_id int, in_source text, in_total numeric,
 in_ar_ap_accno text, in_cash_accno text,
 in_payment_date date, in_account_class int, in_payment_type int,
 in_exchangerate numeric, in_curr text)
RETURNS int AS $$
DECLARE
    out_count int;
    t_voucher_id int;
    t_trans_id int;
    t_amount numeric;
    t_ar_ap_id int;
    t_cash_id int;
    t_currs text[];
    t_exchangerate numeric;
BEGIN
    IF in_batch_id IS NULL THEN
        -- t_voucher_id := NULL;
        RAISE EXCEPTION 'Bulk Post Must be from Batch!';
    ELSE
        INSERT INTO voucher (batch_id, batch_class, trans_id)
        VALUES (in_batch_id,
                (SELECT batch_class_id FROM batch WHERE id = in_batch_id),
                in_transactions[1][1]);

        t_voucher_id := currval('voucher_id_seq');
    END IF;

    SELECT string_to_array(value, ':') INTO t_currs
      FROM defaults
     WHERE setting_key = 'curr';

    IF (in_curr IS NULL OR in_curr = t_currs[0]) THEN
        t_exchangerate := 1;
    ELSE
        t_exchangerate := in_exchangerate;
    END IF;

    CREATE TEMPORARY TABLE bulk_payments_in (id int, amount numeric);

    SELECT id INTO t_ar_ap_id FROM chart WHERE accno = in_ar_ap_accno;
    SELECT id INTO t_cash_id  FROM chart WHERE accno = in_cash_accno;

    FOR out_count IN
        array_lower(in_transactions, 1) ..
        array_upper(in_transactions, 1)
    LOOP
        EXECUTE $E$
            INSERT INTO bulk_payments_in(id, amount)
            VALUES ($E$ || quote_literal(in_transactions[out_count][1]) ||
                    $E$, $E$ || quote_literal(in_transactions[out_count][2]) ||
                    $E$)$E$;
    END LOOP;

    EXECUTE $E$
        INSERT INTO acc_trans
               (trans_id, chart_id, amount, approved, voucher_id,
                transdate, source, payment_type)
        SELECT id,
               CASE WHEN $E$ || quote_literal(in_account_class) || $E$ = 1
                    THEN $E$ || t_cash_id || $E$
                    WHEN $E$ || quote_literal(in_account_class) || $E$ = 2
                    THEN $E$ || t_ar_ap_id || $E$
                    ELSE -1 END,
               amount * $E$ || quote_literal(t_exchangerate) || $E$,
               CASE WHEN $E$ || t_voucher_id || $E$ IS NULL
                    THEN true ELSE false END,
               $E$ || t_voucher_id || $E$,
               $E$ || quote_literal(in_payment_date) || $E$,
               $E$ || COALESCE(quote_literal(in_source), 'NULL') || $E$,
               $E$ || COALESCE(quote_literal(in_payment_type), 'NULL') || $E$
          FROM bulk_payments_in WHERE amount <> 0 $E$;

    EXECUTE $E$
        INSERT INTO acc_trans
               (trans_id, chart_id, amount, approved, voucher_id,
                transdate, source, payment_type)
        SELECT id,
               CASE WHEN $E$ || quote_literal(in_account_class) || $E$ = 1
                    THEN $E$ || t_ar_ap_id || $E$
                    WHEN $E$ || quote_literal(in_account_class) || $E$ = 2
                    THEN $E$ || t_cash_id || $E$
                    ELSE -1 END,
               amount * -1 * $E$ || quote_literal(t_exchangerate) || $E$,
               CASE WHEN $E$ || t_voucher_id || $E$ IS NULL
                    THEN true ELSE false END,
               $E$ || t_voucher_id || $E$,
               $E$ || quote_literal(in_payment_date) || $E$,
               $E$ || COALESCE(quote_literal(in_source), 'null') || $E$,
               $E$ || COALESCE(quote_literal(in_payment_type), 'NULL') || $E$
          FROM bulk_payments_in WHERE amount <> 0 $E$;

    IF in_account_class = 1 THEN
        EXECUTE $E$
            UPDATE ap
               SET paid = paid + (SELECT amount
                                    FROM bulk_payments_in b
                                   WHERE b.id = ap.id)
             WHERE id IN (SELECT id FROM bulk_payments_in) $E$;
    ELSE
        EXECUTE $E$
            UPDATE ar
               SET paid = paid + (SELECT amount
                                    FROM bulk_payments_in b
                                   WHERE b.id = ar.id)
             WHERE id IN (SELECT id FROM bulk_payments_in) $E$;
    END IF;

    EXECUTE $E$ DROP TABLE bulk_payments_in $E$;

    PERFORM unlock_all();

    RETURN out_count;
END;
$$ LANGUAGE plpgsql;

in_transactions is an n-by-2 array of numeric values. The first is an integer representation of the invoice id to be paid; the second is the numeric amount to be paid on that invoice.

The temporary table proved necessary because of cache misses when trying to loop through the array when hundreds of invoices were paid to one vendor. The code here is still far from ideal, as in the future we will probably just query against the array using generate_series.

Hope this helps.

Chris Travers
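To make the argument layout concrete, a hypothetical call (all values invented; the account numbers and batch id would come from the application's own data):

SELECT payment_bulk_post(
    '{{101, 250.00}, {102, 375.50}}', -- (invoice id, amount) pairs
    5,            -- in_batch_id
    'CHK 1001',   -- in_source
    625.50,       -- in_total
    '2100',       -- in_ar_ap_accno
    '1060',       -- in_cash_accno
    '2011-07-25', -- in_payment_date
    1,            -- in_account_class (selects the AP or AR branch above)
    NULL,         -- in_payment_type
    1,            -- in_exchangerate
    'USD'         -- in_curr
);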
> CREATE OR REPLACE FUNCTION payment_bulk_post
> (in_transactions numeric[], in_batch_id int, in_source text, in_total numeric,
>  in_ar_ap_accno text, in_cash_accno text,
>  in_payment_date date, in_account_class int, in_payment_type int,
>  in_exchangerate numeric, in_curr text)

Oh, and as a disclaimer, this was tacked onto a database schema which a) I did not design and b) is patently insane. That part of the database schema is being redesigned for 1.4. Those wonderful disclaimers when having to post code against databases designed by.... well, better to say nothing at all >:-D

Best Wishes,
Chris Travers
> In LedgerSMB, we take this a step further by making the procedures
> into discoverable interfaces, so the application logic itself is a
> sort of thin glue between a UI layer and the database procedure layer.
> One thing I would suggest is to try to keep API calls as atomic as
> possible. You want to enforce consistency and so you need to have all
> relevant inputs passed to the function. See below for a suggested
> change to your API.

Would you say LedgerSMB follows the thick/fat database principle? If so, I shall have to spend some time with your source code. I'd be very curious to see how you handle this.

> > BEGIN;
> > SELECT create_transaction(1, current_date, 'Transaction 1');
> > SELECT create_line_item(1, 1, 50);
> > SELECT create_line_item(1, 2, -50);
> > END;
>
> Now, if you are doing double-entry bookkeeping this doesn't provide
> enough consistency, IMO. You can't check inside the function to
> ensure that the transaction is balanced.

Yes, but I'd implement the constraint "all transactions must balance" as a trigger that fires when the transaction is complete. This would enforce data integrity regardless of whether or not the database API is used, which I think is also important.

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
> For example, you could use XML to describe the
> data (I'm no fan of XML, but it does suit a need here and allows
> validation of the provided data), especially as Postgres has XML parsing
> functionality.

Would you go so far as to suggest making the entire parameter an XML statement, something like:

CREATE FUNCTION create_transaction (txn xml) ...

I don't know; it seems like with converting everything to and from XML, we really haven't made things any better, just added the complexity of an intermediate technology. Also, I'm not a fan of XML either, and for the most part have managed to steer relatively clear of it so far. Getting pulled into that black hole is something I'd rather avoid, if possible....

> Or you could use a more sophisticated procedural language (plpython or
> plphp, for example) that's capable of marshalling and unmarshalling data
> structures to strings and vice versa (eg. '{foo:1,bar:2}').

I haven't looked into the other procedural languages besides PL/pgSQL, although I am familiar with Python in general. Perhaps that may be the way to go ...

> You would still have a SELECT statement that INSERTs data, which is
> semantically a bad thing to do IMHO.

True, although others have suggested always returning something useful from the function. For example, it's more likely the transaction id would be assigned using a sequence inside the database. The function could then return that id after successfully creating the transaction.

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
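A sketch of that sequence-assigned, id-returning shape (the sequence name is invented for illustration):

CREATE OR REPLACE FUNCTION create_transaction(
    in_date date, in_description text
) RETURNS int AS $$
DECLARE
    new_id int;
BEGIN
    INSERT INTO transaction (id, date, description)
    VALUES (nextval('transaction_id_seq'), in_date, in_description)
    RETURNING id INTO new_id;

    -- the SELECT that calls us now yields something the application uses
    RETURN new_id;
END;
$$ LANGUAGE plpgsql;

SELECT create_transaction(current_date, 'Transaction 1') AS txn_id;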
> > SELECT create_transaction(1, current_date, 'Transaction 1', ((1,
> > 50), (2, -50)));
>
> Well, not quite, because specifying the number "50" twice would be
> ridiculous for such a non-generic function; you can calculate the "-
> 50" from it in the function.

Not if there were more than two line-items per transaction. A paycheck is a good example, where generally one or more income accounts are credited and multiple tax accounts as well as one or more asset accounts are debited. Ensuring that all the line-items add up to 0 would be one of the data integrity rules implemented in the database (though most likely checked in the application layer as well).

> A general rule of thumb, however you would design a routine in a
> normal programming language, try to do it that way in PL/PgSQL,
> assuming that PL/PgSQL is a competent language, and then tweak to
> match what you actually can do.

In the language I'm most familiar with, PHP, I could do this with an associative array:

$transaction = array(
    'id' => 1,
    'date' => date('Y-m-d'),
    'description' => 'Transaction 1',
    'line_items' => array(
        array('account_id' => 1, 'amount' => 50),
        array('account_id' => 2, 'amount' => -50),
    ),
);

From which I can easily build the appropriate SQL statements. This would be very similar in Python. I wonder if this would be achievable in PL/PGSQL, maybe through the use of composite types and/or domains?

Karl Nack

Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
On 23/07/11 12:05, John R Pierce wrote:
> On 07/22/11 4:11 PM, Darren Duncan wrote:
>> Karl Nack wrote:
>>> I've been following a few blogs
>>> (http://database-programmer.blogspot.com/,
>>> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
>>> argument, in my opinion, to move as much business/transactional logic as
>>> possible into the database, so that client applications become little
>>> more than moving data into and out of the database using a well-defined
>>> API, most commonly (but not necessarily) through the use of stored
>>> procedures.
>>
>> I strongly agree with that design philosophy. <snip>
>
> the alternative 'modern' architecture is to implement the business logic
> in a webservices engine that sits in front of the database, and only use
> stored procedures for things that get significant performance boost
> where that is needed to meet your performance goals. <snip>
>
> The main rationale for this sort of design pattern is that large complex
> business logic implemented in SQL stored procedures can be rather
> difficult to develop and maintain

I was thinking similar thoughts, but you not only beat me to it, you made some good points I had not thought of!

The only thing I can think of adding is that it would be good to lock down the database so that only the middleware can access it; everything else accesses the database via the middleware.

Cheers,
Gavin
> I was thinking similar thoughts, but you not only beat me to it, you made
> some good points I had not thought of!
>
> The only thing I can think of adding is that it would be good to lock down
> the database so that only the middleware can access it; everything else
> accesses the database via the middleware.

In general, I am not convinced that middleware is inherently more maintainable than in-db procedures.

But the fundamental question is: is this a one-application database? If it is, you can use the middleware to be that application, lock the db down so only the middleware can use it, etc.

But what if it isn't? What if we want to support a variety of applications against the same relational database? This has to be fairly commonplace.....

In this way my experience is that it is often helpful to maintain several levels of stable, public APIs: on a table level if possible (as attachment points for triggers), stored proc APIs for actually inserting data into relevant areas while enforcing appropriate business logic, and so forth.

One of the things we are doing in LedgerSMB is to make the stored procedures discoverable, so the argument names (and eventually the return types) will have meaning the application can use in building calls for the procedure. This eases one important maintenance point because arguments are automatically picked up by the application and, as long as best practices in coding are followed, will be handled sanely. (The interface will be extended in the future so that return types determine the class, and the argument names determine whether we are talking about a presumed object property or a presumed application-specified argument.) Theoretically, we should be able to build objects in languages picking up methods and properties from the Pg system catalogs, but we haven't gotten that far yet with code generation.

Best Wishes,
Chris Travers
I gave a talk on using PostgreSQL as an application server at PG East in March.

Basically, we try to implement all business logic using functions, using plpythonu when necessary. For example, we have functions that send email, ftp files, sync remote databases, etc.

It is important to keep your MVC intact and not "php" your function code by mixing business logic with SQL statements.

I am currently playing with interactive queries, where the function stops in the middle and sends a message to the client asking for input. This isn't necessarily a good idea in all cases, but there are some functions where you don't have the ability to ask a question until it is mostly finished processing (i.e. you don't have the information needed to ask the question until it finishes munging the data, which might be a long process).

Let me know if you would like a copy of my presentation.

Sim

On 07/22/2011 10:15 PM, Karl Nack wrote:
> I've been following a few blogs
> (http://database-programmer.blogspot.com/,
> http://thehelsinkideclaration.blogspot.com/) that make a very compelling
> argument, in my opinion, to move as much business/transactional logic as
> possible into the database, so that client applications become little
> more than moving data into and out of the database using a well-defined
> API, most commonly (but not necessarily) through the use of stored
> procedures.
<snip>
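Sim's mail-sending function itself isn't shown; a minimal plpythonu sketch of what one might look like (the host and addresses are placeholders, not the actual implementation):

CREATE OR REPLACE FUNCTION send_email(v_to text, v_subject text, v_body text)
RETURNS void AS $BODY$
import smtplib
from email.mime.text import MIMEText

# build a plain-text message
msg = MIMEText(v_body)
msg['Subject'] = v_subject
msg['From'] = 'db@example.com'   # placeholder sender address
msg['To'] = v_to

# assumes an MTA listening on localhost
s = smtplib.SMTP('localhost')
s.sendmail('db@example.com', [v_to], msg.as_string())
s.quit()
$BODY$ LANGUAGE plpythonu VOLATILE;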
Hello Chris,

>> In LedgerSMB, we take this a step further by making the procedures
>> into discoverable interfaces,

how do you do that?

Karsten
On 24/07/11 17:51, Chris Travers wrote:
>> I was thinking similar thoughts, but you not only beat me to it, you made
>> some good points I had not thought of!
>>
>> The only thing I can think of adding is that it would be good to lock down
>> the database so that only the middleware can access it; everything else
>> accesses the database via the middleware.
>
> In general, I am not convinced that middleware is inherently more
> maintainable than in-db procedures.
>
> But the fundamental question is: is this a one-application database?
> If it is, you can use the middleware to be that application, lock the
> db down so only the middleware can use it, etc.
>
> But what if it isn't? What if we want to support a variety of
> applications against the same relational database? This has to be
> fairly commonplace.....
>
> In this way my experience is that it is often helpful to maintain
> several levels of stable, public APIs: on a table level if possible
> (as attachment points for triggers), stored proc APIs for actually
> inserting data into relevant areas while enforcing appropriate
> business logic, and so forth.
>
> One of the things we are doing in LedgerSMB is to make the stored
> procedures discoverable, so the argument names (and eventually the
> return types) will have meaning the application can use in building
> calls for the procedure. <snip>

So it really boils down to 'It depends...' :-)

I first started designing systems over 30 years ago. I remember the first design principle I came up with, but more importantly that my next project ignored it for good reasons (same mainframe COBOL environment in both cases)!

I feel that for a large company the middleware approach is probably better when you have many diverse applications that share a lot in common, but it depends on many different factors.
On Sun, Jul 24, 2011 at 12:34 AM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
> Hello Chris,
>
>>> In LedgerSMB, we take this a step further by making the procedures
>>> into discoverable interfaces,
>
> how do you do that?

The way we do things in 1.3 beta is relatively free-form and not ideal. In future versions I think we intend to tighten this up. I will give you examples of where we are for now as well as where we are headed.

In 1.3, the idea is that as much of the function declaration should be semantically meaningful as possible. However, we didn't spec out everything (problems and solutions below). So you might have a function like:

CREATE OR REPLACE FUNCTION batch_search
(in_class_id int, in_description text, in_created_by_eid int,
 in_date_from date, in_date_to date,
 in_amount_gt numeric, in_amount_lt numeric, in_approved bool)

An application can query the argument names as long as the application knows this is a routine for searching batches. It can then map the class_id, description, created_by_eid, date_from, date_to, amount_gt, amount_lt, and approved fields from a specified source to the arguments and generate an SQL query to call this function.

There are some problems with this approach as we follow it right now. The biggest one is consistency: if the variables are not always named consistently with object properties, it can cause maintenance problems, because object definitions and SQL statements form complementary portions of code and are in different languages. A second problem is function overloading with loosely typed languages; our solution is to test for and not allow function overloading. The final problem is that this does not separate object properties from function arguments.

A better approach would be to name object properties and arguments with different prefixes. So:

CREATE OR REPLACE FUNCTION
-- we aren't doing anything with a batch, just looking for them
batch_search
(arg_class_id int, arg_description text, arg_created_by_eid int,
 arg_date_from date, arg_date_to date,
 arg_amount_gt numeric, arg_amount_lt numeric, arg_approved bool)
RETURNS SETOF batch_search_result

but

CREATE OR REPLACE FUNCTION batch_post(obj_id INTEGER) RETURNS batch

If we enforce that the function must return a type whose properties can be specified as input properties, the following becomes possible:

1) Automatic argument mapping to object properties in a fairly robust way, and
2) Automatic merging of return results back into the object that called it.

It would also be possible to write code generators to create object wrappers for the SQL types and functions, which could then be extended as needed. It also means that object properties and methods as general data structures have at least part of their definition in SQL code. Applications of course may have to extend this in various ways in order to avoid ravioli code. But it means you have an internally consistent basis to build things on.

Best Wishes,
Chris Travers
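The discovery itself can be done against the system catalogs; for example, a sketch of the kind of query an application could run (whether LedgerSMB does exactly this isn't shown here):

SELECT p.proname,
       p.proargnames,                         -- e.g. {in_class_id,in_description,...}
       pg_get_function_arguments(p.oid) AS arg_list
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE n.nspname = 'public'
   AND p.proname = 'batch_search';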
On Sat, Jul 23, 2011 at 11:44 PM, Sim Zacks <sim@compulab.co.il> wrote:
> I gave a talk on using PostgreSQL as an application server at PG East in
> March.
>
> Basically, we try to implement all business logic using functions, using
> plpythonu when necessary.
>
> For example, we have functions that send email, ftp files, sync remote
> databases, etc.
>
> It is important to keep your MVC intact and not "php" your function code by
> mixing business logic with SQL statements.

It depends on what you mean by "business logic." In general my view is that some business logic doesn't belong in stored procedures.

In general, where I draw the line is between 'data logic' (i.e. logic necessary to retrieve, process, and store data in a meaningful and consistent way, meeting inherent requirements including security requirements) and 'application logic' (i.e. logic as to how you obtain, display, present, and use the data).

If the goal is to provide a consistent set of business logic to several applications hitting the database, in general you may not want your database to make assumptions about how the data is going to be used, or where it is going to come from.

OTOH, if this is a single-application database, then I would ask: what do you gain by putting it in plpythonu on the db server instead of on the client?

> I am currently playing with interactive queries, where the function stops in
> the middle and sends a message to the client asking for input. This isn't
> necessarily a good idea in all cases, but there are some functions where you
> don't have the ability to ask a question until it is mostly finished
> processing (i.e. you don't have the information needed to ask the question
> until it finishes munging the data, which might be a long process).

How does this interface work? Do you have a separate network socket for this? Something else?

What happens if a user goes to lunch or leaves early and then the query hangs pending input? How do you handle this? Is there a timeout on the input request?

Best Wishes,
Chris Travers
On 07/24/2011 06:58 PM, Chris Travers wrote:
> It depends on what you mean by "business logic." In general my view
> is that some business logic doesn't belong in stored procedures.
>
> In general, where I draw the line is between 'data logic' (i.e. logic
> necessary to retrieve, process, and store data in a meaningful and
> consistent way, meeting inherent requirements including security
> requirements) and 'application logic' (i.e. logic as to how you
> obtain, display, present, and use the data).
>
> If the goal is to provide a consistent set of business logic to
> several applications hitting the database, in general you may not want
> your database to make assumptions about how the data is going to be
> used, or where it is going to come from.
>
> OTOH, if this is a single-application database, then I would ask:
> what do you gain by putting it in plpythonu on the db server instead
> of on the client?

The goal is to make our system client-agnostic. Most of our GUI is written in wxPython; we also have some web functions and even a barcode-terminal function written in C#. We would like to use an application server so that all the code runs on the server, but we don't want to be web-based. I don't want to have the same business logic written into various clients, as that causes maintenance headaches.

The way that PostgreSQL works is that each session is its own process. That means that from the server perspective (at least in most ways), my plpython function is not connected to any other call on the database. My goal is to allow any client to request functionality and have it execute the same way every time. Data logic is built in by use of constraints, triggers, and some functions; business logic is built only into functions. In other words, our PostgreSQL server is a hybrid database/application server.

> How does this interface work? Do you have a separate network socket for
> this? Something else?
>
> What happens if a user goes to lunch or leaves early and then the
> query hangs pending input? How do you handle this? Is there a
> timeout on the input request?

As I mentioned, I am still playing around with and testing this. The concept is that each client has a SocketServer thread that listens on a specific port (8080 in this case). My database function is called python_prompt, pasted below. It has a timeout of 2 minutes, after which the function errors out with a timeout; the user will then get a message that the function did not complete. The important thing is that it has a timeout - if the timeout is 20 minutes, that is also ok. Depending on the query that calls this function, there do not have to be any locks being used, and as this is its own process and it is basically sleeping while waiting for the answer, it doesn't use resources (aside from a connection and the work_mem in the calling query, of course).

Using this, a plpgsql query can prompt the user, "You are about to use more stock than you have, are you sure you want to do this?"

CREATE OR REPLACE FUNCTION python_prompt(v_question text)
RETURNS boolean AS $BODY$
import socket

# find the address of the client that issued the calling query
ipaddr = plpy.execute("select inet_client_addr()")[0]["inet_client_addr"]
HOST, PORT = str(ipaddr), 8080

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(120)   # error out if the client doesn't answer in 2 minutes
sock.connect((HOST, PORT))
sock.send(v_question + "\n")
ans = sock.recv(1024)
sock.close()

if ans == "yes":
    return True
else:
    return False
$BODY$ LANGUAGE plpythonu VOLATILE;

I haven't fully decided to use this yet, but one of my goals is to make my GUI as dumb as possible.
On 22.07.2011 21:15, Karl Nack wrote:
> to move as much business/transactional logic as
> possible into the database, so that client applications become little
> more than moving data into and out of the database using a well-defined
> API, most commonly (but not necessarily) through the use of stored
> procedures.

Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as the DB cluster cannot be scaled as easily as a farm of application servers can.

Cheers,
Frank
2011/7/25 Frank Lanitz <frank@frank.uvena.de>:
> Am 22.07.2011 21:15, schrieb Karl Nack:
>> to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures.
>
> Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as a DB cluster cannot be scaled as easily as a farm of application servers.

Yes and no - this can decrease network overhead and data-conversion overhead. Sometimes I was surprised how much time I gained by moving to stored procedures.

Regards
Pavel
Am 25.07.2011 10:12, schrieb Pavel Stehule:
> 2011/7/25 Frank Lanitz<frank@frank.uvena.de>:
>> Am 22.07.2011 21:15, schrieb Karl Nack:
>>> to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures.
>>
>> Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as a DB cluster cannot be scaled as easily as a farm of application servers.
>
> Yes and no - this can decrease network overhead and data-conversion overhead. Sometimes I was surprised how much time I gained by moving to stored procedures.

Yep. It always depends on what you are doing, I guess.

Cheers,
Frank
On 07/25/2011 11:06 AM, Frank Lanitz wrote:
> Am 22.07.2011 21:15, schrieb Karl Nack:
>> to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures.
>
> Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as a DB cluster cannot be scaled as easily as a farm of application servers.
>
> Cheers,
> Frank

If I understand you correctly, you are saying that to handle business logic processing, I may require X servers. Only a percentage of that traffic actually requires database processing. If I use a cluster of application servers against a single database, it will scale better than if I have to cluster my database, which brings in all sorts of messy master-master replication issues.

Is this accurate?

Sim
Am 25.07.2011 10:24, schrieb Sim Zacks:
> On 07/25/2011 11:06 AM, Frank Lanitz wrote:
>> Am 22.07.2011 21:15, schrieb Karl Nack:
>>> to move as much business/transactional logic as possible into the database, so that client applications become little more than moving data into and out of the database using a well-defined API, most commonly (but not necessarily) through the use of stored procedures.
>>
>> Besides the points already mentioned, doing this might cause bottlenecks if you have complicated transactions, as a DB cluster cannot be scaled as easily as a farm of application servers.
>
> If I understand you correctly, you are saying that to handle business logic processing, I may require X servers. Only a percentage of that traffic actually requires database processing. If I use a cluster of application servers against a single database, it will scale better than if I have to cluster my database, which brings in all sorts of messy master-master replication issues.
>
> Is this accurate?

As I don't know your kind of application and business, or the structure of the code you already have, I cannot say for sure. There is no golden rule that solves everything 100% ... that much I can say.

Cheers,
Frank
On Sun, Jul 24, 2011 at 11:53 PM, Sim Zacks <sim@compulab.co.il> wrote:
> The goal is to make our system client agnostic. Most of our GUI is written in wxpython; we also have some web functions and even a barcode terminal function, written in C#. We would like to use an application server, so that all the code is run on the server, but we don't want to be web-based. I don't want to have the same business logic written into various clients, as that causes maintenance headaches.

Ok. So we are talking about a multi-application database, and you are trying to provide some degree of consistency in business logic across the software applications (and hence uses of your data). So far, so good. Agreed on putting anything that might need to be common into the database.

> The way that postgresql works is that each session is its own process. That means that from the server perspective (at least in most ways), my plpython function is not connected to any other call on the database. My goal is to allow any client to request functionality and have it execute the same way every time.

So.... I assume that means using Pl/Mono to make your barcode stuff work in the db too?

> Data logic is built in by use of constraints, triggers, and some functions; business logic is built only into functions.

I guess I am using "data logic" more broadly than you. Any select, update, or insert statement against a relation (except, of course, simple function calls) is part of that data logic, and there are supporting dependencies too, such as security and permissions. IOW, I think a lot of 'business logic' is data logic. I separate this from what you actually use the data for.

IOW, "generate a report in the following format" is data logic (perhaps even "and email it to the head of product ordering"), but using this report to decide what inventory to order is application logic (you probably want people making these decisions). Similarly, "generate me a list of invoices to be paid in the following format" is data logic, but selecting those invoices to be paid, entering the data, determining what bank account you want to pay them from, etc. is application logic. Recording the payments is again data logic. Application logic might be better thought of as workflow logic.

> In other words, our postgresql server is a hybrid database/application server.

Right. One thing we are seeing in LedgerSMB as we go through this is that the database does indeed become an application server. The areas which are taking the new approach are mostly SQL and HTML templates (in TemplateToolkit), while the Perl code is shrinking pretty fast.

I guess what I am wondering is if you are using it as an application server, is there a benefit to trying to put everything in the database? Or just those parts which necessarily follow from the database independent of workflow, etc., or are likely to be common across diverse applications hitting your server? If so, what is it?

> As I mentioned, I am still playing around and testing this. The concept is that each client has a SocketServer thread that listens on a specific port (8080 in this case).

Ok, so just pointing out here that if you go this route you are starting to break the client-agnosticism requirement, unless I misunderstand what you are trying to do :-).

> My database function is called python_prompt, pasted below. It has a timeout of 2 minutes, after which the function errors out with a timeout. The user will get a message that the function did not complete.
> The important thing is that it has a timeout. If the timeout is 20 minutes, that is also ok. Depending on the query that calls this function, there do not have to be any locks held, and as this is its own process that is basically sleeping while waiting for the answer, it doesn't use many resources (aside from a connection and the work_mem of the calling query, of course).

Now, if for the sake of argument this happens after an insert or update, that means things are locked for that time, right? This would be more likely to cause deadlocks, and if the person steps away while entering the data, there would be a rollback, right?

> Using this, a plpgsql query can prompt the user, "You are about to use more stock than you have, are you sure you want to do this?"
>
> CREATE OR REPLACE FUNCTION python_prompt(v_question text)
> RETURNS boolean AS
> $BODY$
> import socket
> ipaddr = plpy.execute("select inet_client_addr()")[0]["inet_client_addr"]
> HOST, PORT = str(ipaddr), 8080
> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> sock.settimeout(120)
> sock.connect((HOST, PORT))
> sock.send(v_question + "\n")
> ans = sock.recv(1024)
> sock.close()
> if ans == "yes":
>     return True
> else:
>     return False
> $BODY$
> LANGUAGE plpythonu VOLATILE;

I guess I am wondering what you get by requiring that a client listens on port 8080 AND knows how to communicate with the server on it that you don't get from assuming that the client can manage the workflow (breaking the routines into two separate ones and prompting for input between). I also think there is a better way to solve this problem (see below).

> I haven't fully decided to use this yet, but one of my goals is to make my GUI as dumb as possible.

Probably a personal preference, but I don't like to try to push something as far as possible :-)

Anyway, here's an alternative to your interactive query approach, the one we use for LedgerSMB user creation for 1.3. It's the "if you don't tell me to do it anyway, I am raising an exception" approach.

Basically the problem is that db users are used to enforce application permissions, and db users are cluster-wide while application users are specific to a database. So what happens when you are running two companies (which have to have complete data separation and different permissions) but a single user account is needed to access both of them?

What we decided to do is have a stored procedure which creates the user and sets the password (valid for 1 day by default, until changed by the user, at which point it is valid for an admin-specified time). Now, one parameter to the function is whether this is an import of an existing db role or not (i.e. not == creation of new user/role). In the event you specify that this is an import, if you also specify a password, you get an error unconditionally. We don't want to change the password on import. However, if you don't specify that this is an import and the role exists, you get a different exception. Exception strings in this case are short but descriptive ('No password allowed' and 'Duplicate user'), and these are handed back up to the application to process. The application then can flag that this role exists, and the user of the software can resubmit as an import. Now there's no reason the application (if it wasn't a web-based app) couldn't just pop up a confirmation box saying "This role exists. Do you want to import? Yes/no" and handle it there.
Similarly, if you add a do_anyway arg to your invoice item query, you can raise an exception "Negative Inventory" and the application can handle that, flag the user, and let the user decide whether to do it anyway (a sketch follows below). It seems to me that as you have timeouts, the application has to be prepared to retry functions anyway.

There is also one huge cost to putting too much interactive logic in the database in some environments: I haven't yet found a decent way of localizing strings in database queries. That means that if you want to support multiple languages, it's a LOT easier to deal with these strings if they are in the client codebase than if they are in the server codebase and interpolated in the client (gettext throws errors with variable interpolation if you identify the string on the server and mix localization of client-side things with server-side strings). I suppose you can patch gettext to ignore these errors.......

Best Wishes,
Chris Travers
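A minimal sketch of that do_anyway pattern, as the exception-based counterpart to the python_prompt sketch earlier (the part table, on_hand column, and use_stock function are invented for illustration):

CREATE OR REPLACE FUNCTION use_stock(
    in_part_id int, in_qty numeric, in_do_anyway bool DEFAULT false
) RETURNS void AS $$
DECLARE
    t_on_hand numeric;
BEGIN
    SELECT on_hand INTO t_on_hand
      FROM part WHERE id = in_part_id FOR UPDATE;
    IF t_on_hand - in_qty < 0 AND NOT in_do_anyway THEN
        -- the client traps this exception, asks the user, and may
        -- resubmit the same call with in_do_anyway := true
        RAISE EXCEPTION 'Negative Inventory';
    END IF;
    UPDATE part SET on_hand = on_hand - in_qty WHERE id = in_part_id;
END;
$$ LANGUAGE plpgsql;

The transaction rolls back on the exception, so nothing stays locked while the user decides; the retry is a fresh, short transaction.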
On Mon, Jul 25, 2011 at 1:24 AM, Sim Zacks <sim@compulab.co.il> wrote:
> If I understand you correctly, you are saying that to handle business logic processing, I may require X servers. Only a percentage of that traffic actually requires database processing. If I use a cluster of application servers against a single database, it will scale better than if I have to cluster my database, which brings in all sorts of messy master-master replication issues.
>
> Is this accurate?

I'd also say I have seen applications not put enough logic in the database, and that this can cause worse bottlenecks. Last time I looked at SQL-Ledger's payment processing interface, it did in Perl in the web app what would have been far better solved with a HAVING clause in the SQL (retrieve ALL invoices ever entered, along with aggregated payments, and then compare these numbers in Perl and discard those which are already paid). Needless to say, this approach, when run on a large database, caused DB and web server bottlenecks....... Doh!

Best Wishes,
Chris Travers
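For illustration, a sketch of the HAVING approach Chris describes; the invoice/payment schema here is invented:

-- let the database discard the paid invoices, instead of fetching
-- every invoice ever entered and filtering in Perl
SELECT i.id, i.amount, coalesce(sum(p.amount), 0) AS paid
  FROM invoice i
  LEFT JOIN payment p ON p.invoice_id = i.id
 GROUP BY i.id, i.amount
HAVING i.amount > coalesce(sum(p.amount), 0);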
On Sun, Jul 24, 2011 at 12:51 AM, Chris Travers <chris.travers@gmail.com> wrote:
>> I was thinking similar thoughts, but you not only beat me to it, you made some good points I had not thought of!
>>
>> The only thing I can think of adding is that it would be good to lock down the database so that only the middleware can access it; everything else accesses the database via the middleware.
>
> In general, I am not convinced that middleware is inherently more maintainable than in-db procedures.

exactly. procedural middlewares written in languages like java tend to be bug factories:
*) over-(mis-)use of threads
*) performance wins moving logic outside the database to scale it are balanced out by the extra traffic
*) database concurrency issues
*) OO representation of data is brittle and inflexible
*) extremely (and IMNSHO unnecessarily) verbose
*) framework all over the place
*) libraries all over the place

On the plus side though, languages like java have huge pools of available talent and excellent tools. These factors are *critical* for many IT companies. plpgsql may be the bee's knees (it is), but DIY tools and methodologies tend to translate directly to a high cost of labor, and application developers used to rich tool environments tend to really dislike code practices that pl/pgsql requires, like debugging by logging and remembering where stuff is and what it does.

plpgsql is a niche language that tends to attract the same really smart people who write code in a more functional style. When done well, you can do a lot with a very small amount of code. I would love to see stronger validation features (with a higher emphasis on warnings) and better tools/editors to help bring plpgsql to the mainstream.

merlin
On Mon, Jul 25, 2011 at 12:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> exactly. procedural middlewares written in languages like java tend to be bug factories:
> *) over-(mis-)use of threads
> *) performance wins moving logic outside the database to scale it are balanced out by the extra traffic

Well, typically you have other performance issues other than the traffic. A well written SQL query can do more, and more efficiently, than most application programmers realize. The tendency to move stuff out of the database IMHO often occurs at the same time the SQL code itself ends up being less efficient. A lot of things (aggregates for example) use CPU cycles on the db server that could probably be saved by moving things out of the db, but only at the cost of memory usage.

I have seen CRAZY stuff coming out of middleware and I tend to suspect that the "move it out of the db" crowd sometimes tend to be stuck thinking in their languages and thus unable to take advantage of good, well written SQL.

I have seen Perl code be used instead of HAVING clauses. I have seen loops through query results, firing other queries..... Ok, to be fair, I would call that program something way below average in quality, but still.

> *) database concurrency issues
> *) OO representation of data is brittle and inflexible

And the DB ends up being designed around the ORM....... Which sucks.......

> *) extremely (and IMNSHO unnecessarily) verbose
> *) framework all over the place
> *) libraries all over the place

generally agreed.

> On the plus side though, languages like java have huge pools of available talent and excellent tools. These factors are *critical* for many IT companies. plpgsql may be the bee's knees (it is), but DIY tools and methodologies tend to translate directly to a high cost of labor, and application developers used to rich tool environments tend to really dislike code practices that pl/pgsql requires, like debugging by logging and remembering where stuff is and what it does.

I dunno. It depends on how you use plpgsql.

The approach we have taken in the LedgerSMB project is to use stored procs basically as what has been called "named queries." We try to put as much as possible into single SQL statements, and we extensively use PLPGSQL's syntactic sugar to make things more manageable (a very large number of our stored procs, probably a clear majority, could be written in SQL).

I think that if you come up with all these excellent tools, the quality of code will go down and you will end up with something that approximates most of the Middleware problems today. IMHO, the best use for PLPGSQL is that of straight SQL queries with just a little procedural logic around it. There are exceptions due to the fact that utility statements are unparameterized...... But for the most part, that is what I have found to work best.

> plpgsql is a niche language that tends to attract the same really smart people who write code in a more functional style. When done well, you can do a lot with a very small amount of code. I would love to see stronger validation features (with a higher emphasis on warnings) and better tools/editors to help bring plpgsql to the mainstream.

What kinds of tools/editors do you think we need? A heck of a lot of us really like VIM or EMACS (just don't ask which is best ;-) ).

Best Wishes,
Chris Travers
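The "named query" style of stored procedure Chris describes might look something like the sketch below; the customer table and the customer__search function are invented for illustration:

-- a stored procedure as a "named query": one SQL statement
-- behind a stable, parameterized API
CREATE OR REPLACE FUNCTION customer__search(in_name text)
RETURNS SETOF customer AS $$
    SELECT * FROM customer
     WHERE name ILIKE '%' || in_name || '%';
$$ LANGUAGE sql;

The application then calls SELECT * FROM customer__search('smith') and never touches the table directly.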
On 07/25/2011 06:24 PM, Chris Travers wrote:
> On Sun, Jul 24, 2011 at 11:53 PM, Sim Zacks <sim@compulab.co.il> wrote:
>> The goal is to make our system client agnostic. <snip> I don't want to have the same business logic written into various clients, as that causes maintenance headaches.
> Ok. So we are talking about a multi-application database, and you are trying to provide some degree of consistency in business logic across the software applications (and hence uses of your data). So far, so good. Agreed on putting anything that might need to be common into the database.
>> The way that postgresql works is that each session is its own process. <snip> My goal is to allow any client to request functionality and have it execute the same way every time.
> So.... I assume that means using Pl/Mono to make your barcode stuff work in the db too?

No need for PL/Mono or any other client-specific language. The GUI should be dumb, so all I really need to design is the interface and input/output methods. When you push a button, it should call the appropriate function. The functions on the barcode terminal (which is a Windows Mobile platform) can also be run in the client application written in wxpython. Keeping as much as possible off of the client allows me to share the work and be sure that both clients do the exact same thing when the user pushes the button.
There is always functionality that needs to be on the client; however, in my experience with multi-client applications, if it is business logic you will end up duplicating it and then needing to maintain 2 copies of the same thing.
>> Data logic is built in by use of constraints, triggers, and some functions; business logic is built only into functions.
> I guess I am using "data logic" more broadly than you. Any select, update, or insert statement against a relation (except, of course, simple function calls) is part of that data logic, and there are supporting dependencies too, such as security and permissions. IOW, I think a lot of 'business logic' is data logic. I separate this from what you actually use the data for. <snip> Application logic might be better thought of as workflow logic.

So much of human interaction is definition of terms. Now I understand what you mean by data logic.
To me, data logic is strictly about whether the data will be considered corrupt if the database allows an action to occur. For example, not enforcing relationships: if you erase the parent, the child has no meaning, and therefore you have data in the system which is suspect.
As my db is designed with the business rules in mind, some of the business logic is included in the data logic; for example, 3 tables must be updated at the same time in order for the transaction to be considered valid. This is taken care of either through a function or through triggers.
An example of business logic: when my stock is updated to 0, I want the record deleted.
This is done through a rule: on any update that changes the value to 0, delete instead.
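A minimal sketch of such a rule (the stock table and its columns are invented for illustration):

CREATE RULE stock_zero_delete AS
    ON UPDATE TO stock
    WHERE NEW.quantity = 0
    DO INSTEAD
        DELETE FROM stock WHERE id = OLD.id;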
>> In other words, our postgresql server is a hybrid database/application server.
> Right. One thing we are seeing in LedgerSMB as we go through this is that the database does indeed become an application server. <snip> I guess what I am wondering is if you are using it as an application server, is there a benefit to trying to put everything in the database? Or just those parts which necessarily follow from the database independent of workflow, etc., or are likely to be common across diverse applications hitting your server? If so, what is it?

The benefit of putting everything into the database is having one platform to maintain. Also, anything that can talk to the database can call the functions, so we have a defined interface for all our functions. If I wanted to use JBOSS, for example, how would my wxpython or my C# app talk to it?
IOW, I did not find a middleware that was more appropriate for a multi-client/multi-platform environment than the database server.
Another big advantage of using the database for business logic is that the plpython functions can be called from other functions. For example, with the python prompt function I posted earlier, I can have a plpgsql function say:
if python_prompt('Are you sure?') then
    -- process
end if;
It can also do an insert into table X, with a trigger on the table that calls a send-email function.
I have a database function that builds an HTML page and FTPs it to our website. This gives me a number of benefits: a) I don't have to worry about the ftp client version on the client, or connection issues, firewalls, etc.; b) it can be done from multiple clients without changing the code; c) I don't have to worry about path issues.
All I need to know is that my server can do it and that my clients can connect to my server.
>> As I mentioned, I am still playing around and testing this. The concept is that each client has a SocketServer thread that listens on a specific port (8080 in this case).
> Ok, so just pointing out here that if you go this route you are starting to break the client-agnosticism requirement, unless I misunderstand what you are trying to do :-).

Any client that has the ability to listen on a port will be able to use this functionality.
In C#, for example, I would use System.Net.Sockets and the TcpListener class.
In Python, I would use the SocketServer module.
>> My database function is called python_prompt, pasted below. It has a timeout of 2 minutes, after which the function errors out with a timeout. The user will get a message that the function did not complete. The important thing is that it has a timeout. <snip>
> Now, if for the sake of argument this happens after an insert or update, that means things are locked for that time, right? This would be more likely to cause deadlocks, and if the person steps away while entering the data, there would be a rollback, right?

Yes. This is the type of functionality that must be used smartly.
>> Using this, a plpgsql query can prompt the user, "You are about to use more stock than you have, are you sure you want to do this?"
>> <snip>
> I guess I am wondering what you get by requiring that a client listens on port 8080 AND knows how to communicate with the server on it that you don't get from assuming that the client can manage the workflow (breaking the routines into two separate ones and prompting for input between). I also think there is a better way to solve this problem (see below).

Breaking it into 2 functions is not always practical, because you lose your context (all variables, etc.). It is not all done in a single transaction, which means when you start the second function you would have to verify that the work that is there is actually what you put there and that nobody modified it.
What we do today (the reason I am playing with this interactivity) is have the database return an error code, which the client understands to be a question. It then prompts the user and the query is sent again with the answer as a parameter. It works but it is ugly (IMO).
>> I haven't fully decided to use this yet, but one of my goals is to make my GUI as dumb as possible.
> Probably a personal preference, but I don't like to try to push something as far as possible :-) Anyway, here's an alternative to your interactive query approach, the one we use for LedgerSMB user creation for 1.3. It's the "if you don't tell me to do it anyway, I am raising an exception" approach. <snip>
>
> Best Wishes,
> Chris Travers
On Tue, Jul 26, 2011 at 1:04 AM, Sim Zacks <sim@compulab.co.il> wrote:
> No need for PL/Mono or any other client-specific language. The GUI should be dumb, so all I really need to design is the interface and input/output methods. When you push a button, it should call the appropriate function. The functions on the barcode terminal (which is a Windows Mobile platform) can also be run in the client application written in wxpython. Keeping as much as possible off of the client allows me to share the work and be sure that both clients do the exact same thing when the user pushes the button.
>
> There is always functionality that needs to be on the client; however, in my experience with multi-client applications, if it is business logic you will end up duplicating it and then needing to maintain 2 copies of the same thing.

I am not sure it applies to all business logic. For example, suppose the product team and the sales team both need to be able to be assigned tickets relating to customer feedback. They may have very different rules and processes for dealing with that. There is commonality (how the data is stored, retrieved, presented to the application), but if you want to enforce the different rules, you are going to have to put the differences in business logic somewhere.

> So much of human interaction is definition of terms. Now I understand what you mean by data logic.
>
> To me, data logic is strictly about whether the data will be considered corrupt if the database allows an action to occur. For example, not enforcing relationships: if you erase the parent, the child has no meaning, and therefore you have data in the system which is suspect.
>
> As my db is designed with the business rules in mind, some of the business logic is included in the data logic; for example, 3 tables must be updated at the same time in order for the transaction to be considered valid. This is taken care of either through a function or through triggers.
>
> An example of business logic: when my stock is updated to 0, I want the record deleted. This is done through a rule: on any update that changes the value to 0, delete instead.

Ok, that's a pretty simple example of logic that belongs in the database.

> The benefit of putting everything into the database is having one platform to maintain. Also, anything that can talk to the database can call the functions, so we have a defined interface for all our functions. If I wanted to use JBOSS, for example, how would my wxpython or my C# app talk to it?

That's not really what I am getting at. The question could be better phrased: Why put workflow-specific logic in the database (your interactive query idea is an example of that)? Why not use the db procedures to create a well-formed API which enforces business logic consistently across applications, but allows the applications to manage their own workflow? Isn't this what you'd want to use middleware for if you were going this route?

> IOW, I did not find a middleware that was more appropriate for a multi-client/multi-platform environment than the database server.
>
> Another big advantage of using the database for business logic is that the plpython functions can be called from other functions. For example, with the python prompt function I posted earlier, I can have a plpgsql function say:
>
> if python_prompt('Are you sure?') then
>     -- process
> end if;

So if an arbitrary model function calls workflow functions, does that mean you have already broken out of the MVC approach?
I mean, does the MVC approach allow the model to control the controller? I guess the approach I take is to put as much of the model in the database as possible, and put the controllers on the client. Deciding when to prompt the user for input seems to me it should be a controller function.

> Any client that has the ability to listen on a port will be able to use this functionality.
> In C#, for example, I would use System.Net.Sockets and the TcpListener class.
> In Python, I would use the SocketServer module.

Any application also has an ability to call database functions as an API as well.

> What we do today (the reason I am playing with this interactivity) is have the database return an error code, which the client understands to be a question. It then prompts the user and the query is sent again with the answer as a parameter. It works but it is ugly (IMO).

That's what we do with LedgerSMB and the user creation workflow. I think it has the advantage of a cleaner API. For example, our function definition for the save_user function is:

CREATE OR REPLACE FUNCTION admin__save_user(
    in_id int,
    in_entity_id int,
    in_username text,
    in_password text,
    in_import bool
) RETURNS int AS $$

There are some nice things about this. It allows the client application to specify at the outset whether the prospective user is to be imported into the application (i.e. created as a user of the application without having a new role created) or whether this is a new user that needs to have these created. The actual checking is done by:

    PERFORM rolname FROM pg_roles WHERE rolname = in_username;
    t_is_role := found;
    t_is_user := admin__is_user(in_username);

    IF t_is_role is true and t_is_user is false and in_import is false THEN
        RAISE EXCEPTION 'Duplicate user';
    END IF;

    IF t_is_role and in_password is not null THEN
        EXECUTE 'ALTER USER ' || quote_ident(in_username) ||
                ' WITH ENCRYPTED PASSWORD ' || quote_literal(in_password) ||
                $e$ valid until $e$ ||
                quote_literal(now() + '1 day'::interval);
    ELSIF in_import is false AND t_is_user is false AND in_password IS NULL THEN
        RAISE EXCEPTION 'No password';
    ELSIF t_is_role is false THEN
        -- create an actual user
        EXECUTE 'CREATE USER ' || quote_ident(in_username) ||
                ' WITH ENCRYPTED PASSWORD ' || quote_literal(in_password) ||
                $e$ valid until $e$ ||
                quote_literal(now() + '1 day'::interval);
    END IF;

In other words.... if there is a problem we raise an exception and the client has to check the exception code. In that case we don't return anything. The client can then respond to the exception string and either tell the user what happened or otherwise handle the error.

So for example if the "No password" exception is hit, the client application can tell the user "I am sorry, but you must specify a password when creating a new user." On the other hand, the "Duplicate user" exception can be checked, and if that is the case, display a dialog: "Username already taken by another postgresql user. Import that user into LedgerSMB?" (yes/no). "Yes" can resubmit without a password field, and with in_import set to true.

One major win of doing things this way is that the output can be readily localized. With the PLPython function I am not at all sure it can be without a lot of error-prone, manual extraction of strings. This may not be a big deal for an in-house app used in one country, but it could be a bigger issue as soon as national borders are crossed, unless you expect everyone to be using the application in one specific language.
And it keeps the API clean while keeping the client from also having to be a server ;-) Best Wishes, Chris Travers
On Mon, Jul 25, 2011 at 5:21 PM, Chris Travers <chris.travers@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 12:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> exactly. procedural middlewares written in languages like java tend to be bug factories: <snip>
>
> Well, typically you have other performance issues other than the traffic. A well written SQL query can do more, and more efficiently, than most application programmers realize. <snip>
>
> I dunno. It depends on how you use plpgsql.
>
> The approach we have taken in the LedgerSMB project is to use stored procs basically as what has been called "named queries." We try to put as much as possible into single SQL statements, and we extensively use PLPGSQL's syntactic sugar to make things more manageable (a very large number of our stored procs, probably a clear majority, could be written in SQL).
>
> I think that if you come up with all these excellent tools, the quality of code will go down and you will end up with something that approximates most of the Middleware problems today. IMHO, the best use for PLPGSQL is that of straight SQL queries with just a little procedural logic around it. <snip>
>
> What kinds of tools/editors do you think we need?
> A heck of a lot of us really like VIM or EMACS (just don't ask which is best ;-) ).

Of course I pretty much agree on all points, but you are preaching to the choir. You and I both code in a spartan, console-heavy manner. Of course we prefer plpgsql, because over the course of years of database application programming we've realized the advantages of first-class queries and having code running tightly coupled with the data. Unfortunately, when I talk about these concepts to the majority of programmers I've worked with over the years, they look at me as if I just got off my spacecraft from Neptune. I think I've been cursed due to having discovered the secret to fast, efficient programming while continually being constrained from developing that way. Then again, most programmers probably feel like that :-).

merlin
On Tue, Jul 26, 2011 at 8:13 PM, Karl Nack <karlnack@futurityinc.com> wrote:
>> The current svn trunk (to be 1.3) does.
>
> So how far do you take this? I've been playing around with plpgsql a bit and am pretty encouraged by what I'm discovering; now I'm at a point where I'm thinking, "how far do/can I go with this?"

Here are the limitations I have discovered:

1) Localization of exception strings is a bit of a problem. Hence exceptions need to be aimed at communicating to the application rather than the user.

2) Difficulties passing complex data structures back and forth and properly parsing them in the application. Currently we do a lot with two-dimensional arrays but will probably shift to more arrays of complex types as we drop support for older versions of PostgreSQL and DBD::Pg.

There are queries which do a lot of things in the db in a single SQL statement. The longest single SQL statement I have found thus far is a bit over 100 lines long (due to complex requirements and some shortcomings in the db schema we have inherited that we are working on replacing). It's still pretty easy to read and understand at that length, at least when compared to a function in a more general-purpose language.

Menu data is also stored in the database (application settings and menu argument data are the two areas where key/value modelling is used).

The result is that the Perl codebase is shrinking in absolute terms, being replaced in part by SQL. However, a rigorous separation of named query and lightweight application logic has allowed us to shrink the amount of code total in the project while significantly adding functionality.

> Probably the best example is input validation. Constraints and triggers on the database will (or at least should) prevent bad data from being added to the database, but the UI generally needs to provide more informative messages than errors thrown by the database, and provide error messages for every invalid field, whereas the database will fail/stop on the first error. Consequently, I find that much of the data logic ends up being duplicated outside of the database to enhance the user experience. Might there be a way to move these validation routines into the database as well, and unify all the data logic into one place?

The best option is to use exceptions to communicate to the application what went wrong and then allow the application to handle those exceptions in many cases. In other cases, the application may need to know which inputs are mandatory.

In general, what we do is side with exception trapping and handling. This means that if the query fails, we take the sql state, detect the type of error, and display an appropriate message. In some cases ("Access denied") we are terse. In other cases we are adding the full SQL error message to the message, simply because the combination of an easy-to-read description of what happened ("Required input not provided") and the sql message mentioning the field is enough for many users to figure out what they did wrong. It's still not ideal.

>> > Yes, but I'd implement the constraint "all transactions must balance" as a trigger that fires when the transaction is complete. This would enforce data integrity regardless of whether or not the database API is used, which I think is also important.
>>
>> That's problematic to do in PostgreSQL because statement-level triggers don't have access to statement args, and I don't believe they can be deferred.
> In another email I did provide a simple example of how I might implement this; I don't know if you saw it. Of course, that assumes your database isn't designed by .... :-)

I didn't see it even looking back (I saw the initial attempt and the PHP sample).

The ideal interface at the moment is something like

SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -50)}');

This would allow you to do something like:

SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -30), (3, -20)}');

as well, since we are now talking about arrays of records.

But this is a pain to code to/from SQL in a robust way. Good db drivers sometimes handle this automatically though.

Best Wishes,
Chris Travers
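A minimal sketch of what that interface could look like, using an array of a composite type; the line_item_input type and the function body are invented here, while the transaction/line_item tables come from Karl's example earlier in the thread:

CREATE TYPE line_item_input AS (account_id int, amount numeric);

CREATE OR REPLACE FUNCTION create_transaction(
    in_id int, in_date date, in_description text, in_items line_item_input[]
) RETURNS int AS $$
BEGIN
    INSERT INTO transaction (id, date, description)
    VALUES (in_id, in_date, in_description);

    -- unnest() expands the array so all line items land in one INSERT
    INSERT INTO line_item (transaction_id, account_id, amount)
    SELECT in_id, i.account_id, i.amount
      FROM unnest(in_items) AS i;

    RETURN in_id;
END;
$$ LANGUAGE plpgsql;

SELECT create_transaction(1, current_date, 'Transaction 1',
                          '{"(1,50)","(2,-30)","(3,-20)"}');

The whole financial transaction then becomes a single round trip, and the array literal is the robust-encoding pain point Chris mentions.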
Le mardi 26 juillet 2011 à 11:23 -0500, Merlin Moncure a écrit :
> I think I've been cursed due to having discovered the secret to fast, efficient programming while continually being constrained from developing that way. Then again, most programmers probably feel like that :-).

I think there is a very good reason for that: contrary to the official discourse, I believe executives in organizations actively maintain complexity, consciously or not. See the accomplishments of free/open source software in sometimes very complex projects, without the need for a hierarchical management structure. I know, I'm astounded.

--
Vincent Veyron
http://marica.fr/
Claims and litigation management software for legal departments
> The current svn trunk (to be 1.3) does.

So how far do you take this? I've been playing around with plpgsql a bit and am pretty encouraged by what I'm discovering; now I'm at a point where I'm thinking, "how far do/can I go with this?"

Probably the best example is input validation. Constraints and triggers on the database will (or at least should) prevent bad data from being added to the database, but the UI generally needs to provide more informative messages than errors thrown by the database, and provide error messages for every invalid field, whereas the database will fail/stop on the first error. Consequently, I find that much of the data logic ends up being duplicated outside of the database to enhance the user experience. Might there be a way to move these validation routines into the database as well, and unify all the data logic into one place?

> > Yes, but I'd implement the constraint "all transactions must balance" as a trigger that fires when the transaction is complete. This would enforce data integrity regardless of whether or not the database API is used, which I think is also important.
>
> That's problematic to do in PostgreSQL because statement-level triggers don't have access to statement args, and I don't believe they can be deferred.

In another email I did provide a simple example of how I might implement this; I don't know if you saw it. Of course, that assumes your database isn't designed by .... :-)

Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
On Wed, Jul 27, 2011 at 1:41 AM, Chris Travers <chris.travers@gmail.com> wrote:
> On Tue, Jul 26, 2011 at 8:13 PM, Karl Nack <karlnack@futurityinc.com> wrote:
>> So how far do you take this? I've been playing around with plpgsql a bit and am pretty encouraged by what I'm discovering; now I'm at a point where I'm thinking, "how far do/can I go with this?"
>
> <snip>
>
> The ideal interface at the moment is something like
>
> SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -50)}');
>
> This would allow you to do something like:
>
> SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -30), (3, -20)}');
>
> as well, since we are now talking about arrays of records.
>
> But this is a pain to code to/from SQL in a robust way. Good db drivers sometimes handle this automatically though.

Note, we wrote libpqtypes (http://libpqtypes.esilo.com/) precisely to deal with this problem -- first-class handling of arrays and composites in the client. It's not much help for a perl client, but I think similar methodologies can be made for most languages. Sending rich data structures directly to procedures in the database transforms the way the application/database communications work for the better. It's new and weird to many developers, especially those trained on ORM usage patterns, but is also entirely effective.

merlin
On Wed, Jul 27, 2011 at 7:01 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > Note, we wrote libpqtypes (http://libpqtypes.esilo.com/) precisely to > deal with this problem -- first class handling of arrays and > composites in the client. It's not much help for a perl client, but I > think similar methodologies can be made for most languages. Sending > rich data structures directly to procedures in the database transforms > the way the application/database communications work for the better. > It's new and weird to many developers, especially those trained on ORM > usage patterns, but is also entirely effective. > Cool :-) As I understand it DBD::Pg has excellent handling of both these things too. The reason we are not doing more with the composite types yet is because we currently support versions of DBD::Pg which support arrays well but not the composite types, though that will probably change in 1.4. I wonder which other languages have first class support for these areas of Pg? Best Wishes, Chris Travers
> I wonder which other languages have first class support for these areas of Pg?

While already supporting most if not all standard PG datatypes, the psycopg2 interface lets you write in/out wrappers of arbitrary complexity mapping PG <-> Python datatypes and insert them into the driver at runtime.

Karsten
> I wonder which other languages have first class support for these areas > of Pg? It'd be nice if PHP could get there. :p Maybe it's time to look at some of these other languages. Or finally learn C and try hacking on the extension myself. Karl Nack Futurity, Inc 5121 N Ravenswood Ave Chicago, IL 60640 773-506-2007
On Wed, Jul 27, 2011 at 04:35:45PM +0200, Karsten Hilbert wrote: > > I wonder which other languages have first class support for these areas of > > Pg? > > While already supporting most if not all standard PG datatypes the > psycopg2 interface lets you write in/out wrappers of arbitray > complexity mapping PG <-> Python datatypes and insert them into > the driver at runtime. The same is true for the Chicken Scheme PostgreSQL egg. Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
> The best option is to use exceptions to communicate to the application what went wrong and then allow the application to handle those exceptions in many cases. In other cases, the application may need to know which inputs are mandatory.

So other than sanitizing input and making sure it's an appropriate data type, are you relying solely on the database for all your input validation?

I guess a good generic example might be setting up a new user account, which might require a username, password, real name, and birth date. We might also put a reasonable constraint that, at the very least, the birth date cannot be greater than the current date. Now if the user submitted a blank page, ideally the page would come back with four errors, one for each field that was left blank. If you submitted this basically as-is to the database, it'd arbitrarily fail on the first column that didn't meet the NOT NULL constraint, and that would be the only error sent back to the client. So yes, this would work, but in theory it could take four or five times before every error was identified and the user notified.

>> In another email I did provide a simple example of how I might implement this; I don't know if you saw it. Of course, that assumes your database isn't designed by .... :-)
>
> I didn't see it even looking back (I saw the initial attempt and the PHP sample).

Here's a link to the archived message:

http://archives.postgresql.org/pgsql-general/2011-07/msg00631.php

Feel free to comment/praise/criticize! :-)

> SELECT create_transaction(1, current_date, 'Transaction 1', '{(1, 50), (2, -30), (3, -20)}'); as well, since we are now talking about arrays of records.
>
> But this is a pain to code to/from SQL in a robust way. Good db drivers sometimes handle this automatically though.

Yes, I've coded exactly this with a plpgsql function and have been mostly pleased by how easy it is. Unfortunately, at least with PHP, it's not so straightforward to format user input into an SQL statement that uses arrays and composite types. It's even worse going the other way -- just Google how to convert SQL arrays into PHP arrays. :-(

Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
On 07/27/2011 07:18 PM, Karl Nack wrote:
>> The best option is to use exceptions to communicate to the application what went wrong and then allow the application to handle those exceptions in many cases. In other cases, the application may need to know which inputs are mandatory.
>
> So other than sanitizing input and making sure it's an appropriate data type, are you relying solely on the database for all your input validation?
>
> I guess a good generic example might be setting up a new user account, which might require a username, password, real name, and birth date. We might also put a reasonable constraint that, at the very least, the birth date cannot be greater than the current date. Now if the user submitted a blank page, ideally the page would come back with four errors, one for each field that was left blank. If you submitted this basically as-is to the database, it'd arbitrarily fail on the first column that didn't meet the NOT NULL constraint, and that would be the only error sent back to the client. So yes, this would work, but in theory it could take four or five times before every error was identified and the user notified.

You can certainly have your function do all the tests before trying to insert the values. If you have 4 fields that require validation, have a return code of 4 chars (0000). Each char stands for 1 field: if it is a 0 it worked, if it is a 1 it didn't. When you finish your tests, check the value; if it is all 0s, it worked, so do your insert; otherwise return the value. This is the binary method (a sketch follows below).

Another option is that validation checks should be done by the client and only good input should be given to the function to begin with.
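A minimal sketch of that binary return-code idea (the account table and the create_account function are invented for illustration):

CREATE OR REPLACE FUNCTION create_account(
    in_username text, in_password text, in_real_name text, in_birth_date date
) RETURNS text AS $$
DECLARE
    code text;
BEGIN
    -- one char per field: 0 = passed, 1 = failed
    code := CASE WHEN in_username IS NULL OR in_username = ''
                 THEN '1' ELSE '0' END
         || CASE WHEN in_password IS NULL OR in_password = ''
                 THEN '1' ELSE '0' END
         || CASE WHEN in_real_name IS NULL OR in_real_name = ''
                 THEN '1' ELSE '0' END
         || CASE WHEN in_birth_date IS NULL OR in_birth_date > current_date
                 THEN '1' ELSE '0' END;

    IF code <> '0000' THEN
        RETURN code;  -- every validation failure reported in one round trip
    END IF;

    INSERT INTO account (username, password, real_name, birth_date)
    VALUES (in_username, in_password, in_real_name, in_birth_date);
    RETURN code;
END;
$$ LANGUAGE plpgsql;

The client then maps each '1' back to a per-field error message, which addresses Karl's "four errors at once" requirement in a single call.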