Thread: "Dynamic routing" to different databases
Hello all, We have a rather complex service delivery application that is presently designed with a "single-tenant" use case in mind: one self-contained PostgreSQL database with all necessary tables, and one service consuming that database, configured via a static connection string. We are trying to figure out the most economical technical possibilities for retrofitting "multi-tenant" functionality into it; you might think of it as a kind of virtual hosting, where different requests coming into the consuming service, segregated by, say, request URI domain, result in very different behaviour. Obviously, one way to do this is to add an additional layer of abstraction at the database schema and the application level itself, creating nested entities and so forth. This is impractical, however; it would be nothing short of a wholesale overhaul of the entire codebase (which consists in large part of PostgreSQL stored procedures for the business layer). So, what we're trying to figure out is the feasibility of grafting multi-tenant functionality into the existing paradigm. For instance, we could have a "meta-data" database with a table that defines DNS domains and maps them to other database names/connection info on the same server. We could then create a new and self-contained database for each "tenant". I think that would make the most sense by far. The major limitation is that the consuming service can connect to one database and one database only, so what is needed is some way to present foreign databases through one PostgreSQL connection handle to one particular database. The dblink facility seems to provide ways of doing this, but what I am not clear on is: 1) how transparently the foreign database can be rendered, vs. having to graft in lots of dblink_*() calls and 2) whether there is a significant performance hit in using dblink, assuming the "remote" tenant databases are located on the same cluster. Can this be done using foreign data wrappers or something of the sort, for instance? Failing that, is there some functionality in PgPool or another connection proxy which can effectively take SELECT * FROM tbl META_ATTRIB = 'domain.com'; and transform this into SELECT * FROM tbl; on the appropriate database for 'domain.com', which is a mapping that would be maintained in something that is, at least in principle, dynamically reloadable? Advice much appreciated! -- Alex -- Alex Balashov | Principal | Evariste Systems LLC 303 Perimeter Center North, Suite 300 Atlanta, GA 30346 United States Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
After some more research, my conclusion - unless anyone has better suggestions - is to just bite the bullet and partition these multiple databases into schemas. That's going to take a lot of code modification, but not nearly as much as a full-on application-layer multitenant alternative. On 09/30/2015 11:23 AM, Alex Balashov wrote: > Hello all, > > We have a rather complex service delivery application that is presently > designed with a "single-tenant" use case in mind: one self-contained > PostgreSQL database with all necessary tables, and one service consuming > that database, configured via a static connection string. > > We are trying to figure out the most economical technical possibilities > for retrofitting "multi-tenant" functionality into it; you might think > of it as a kind of virtual hosting, where different requests coming into > the consuming service, segregated by, say, request URI domain, result in > very different behaviour. > > Obviously, one way to do this is to add an additional layer of > abstraction at the database schema and the application level itself, > creating nested entities and so forth. This is impractical, however; it > would be nothing short of a wholesale overhaul of the entire codebase > (which consists in large part of PostgreSQL stored procedures for the > business layer). > > So, what we're trying to figure out is the feasibility of grafting > multi-tenant functionality into the existing paradigm. For instance, we > could have a "meta-data" database with a table that defines DNS domains > and maps them to other database names/connection info on the same > server. We could then create a new and self-contained database for each > "tenant". I think that would make the most sense by far. > > The major limitation is that the consuming service can connect to one > database and one database only, so what is needed is some way to present > foreign databases through one PostgreSQL connection handle to one > particular database. > > The dblink facility seems to provide ways of doing this, but what I am > not clear on is: 1) how transparently the foreign database can be > rendered, vs. having to graft in lots of dblink_*() calls and 2) whether > there is a significant performance hit in using dblink, assuming the > "remote" tenant databases are located on the same cluster. Can this be > done using foreign data wrappers or something of the sort, for instance? > > Failing that, is there some functionality in PgPool or another > connection proxy which can effectively take > > SELECT * FROM tbl META_ATTRIB = 'domain.com'; > > and transform this into > > SELECT * FROM tbl; > > on the appropriate database for 'domain.com', which is a mapping that > would be maintained in something that is, at least in principle, > dynamically reloadable? > > Advice much appreciated! > > -- Alex > -- Alex Balashov | Principal | Evariste Systems LLC 303 Perimeter Center North, Suite 300 Atlanta, GA 30346 United States Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
-----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Alex Balashov Sent: Wednesday, September 30, 2015 7:24 PM To: pgsql-admin@postgresql.org Subject: Re: [ADMIN] "Dynamic routing" to different databases After some more research, my conclusion - unless anyone has better suggestions - is to just bite the bullet and partitionthese multiple databases into schemas. That's going to take a lot of code modification, but not nearly as much as a full-on application-layer multitenant alternative. On 09/30/2015 11:23 AM, Alex Balashov wrote: > Hello all, > > We have a rather complex service delivery application that is > presently designed with a "single-tenant" use case in mind: one > self-contained PostgreSQL database with all necessary tables, and one > service consuming that database, configured via a static connection string. > > We are trying to figure out the most economical technical > possibilities for retrofitting "multi-tenant" functionality into it; > you might think of it as a kind of virtual hosting, where different > requests coming into the consuming service, segregated by, say, > request URI domain, result in very different behaviour. > > Obviously, one way to do this is to add an additional layer of > abstraction at the database schema and the application level itself, > creating nested entities and so forth. This is impractical, however; > it would be nothing short of a wholesale overhaul of the entire > codebase (which consists in large part of PostgreSQL stored procedures > for the business layer). > > So, what we're trying to figure out is the feasibility of grafting > multi-tenant functionality into the existing paradigm. For instance, > we could have a "meta-data" database with a table that defines DNS > domains and maps them to other database names/connection info on the > same server. We could then create a new and self-contained database > for each "tenant". I think that would make the most sense by far. > > The major limitation is that the consuming service can connect to one > database and one database only, so what is needed is some way to > present foreign databases through one PostgreSQL connection handle to > one particular database. > > The dblink facility seems to provide ways of doing this, but what I am > not clear on is: 1) how transparently the foreign database can be > rendered, vs. having to graft in lots of dblink_*() calls and 2) > whether there is a significant performance hit in using dblink, > assuming the "remote" tenant databases are located on the same > cluster. Can this be done using foreign data wrappers or something of the sort, for instance? > > Failing that, is there some functionality in PgPool or another > connection proxy which can effectively take > > SELECT * FROM tbl META_ATTRIB = 'domain.com'; > > and transform this into > > SELECT * FROM tbl; > > on the appropriate database for 'domain.com', which is a mapping that > would be maintained in something that is, at least in principle, > dynamically reloadable? > > Advice much appreciated! > > -- Alex > Why should it " take a lot of code modification"? Modifying "search_path" is not enough? Regards, Igor Neyman
On 10/01/2015 09:44 AM, Igor Neyman wrote: > Why should it " take a lot of code modification"? > Modifying "search_path" is not enough? You're right, it's not that big of a deal. I had been under the impression that SET search_path TO schema; is a psql-level construct, not something that can be embedded in live queries. So, I was expecting to have to rewrite thousands of lines of stored procedures to take a schema parameter and construct dynamic query strings with dynamic table names and so on. Assuming one can simply do: SET search_path TO <schema>; SELECT query_business_as_usual(); I'm in a pretty good place! -- Alex Balashov | Principal | Evariste Systems LLC 303 Perimeter Center North, Suite 300 Atlanta, GA 30346 United States Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
> -----Original Message----- > From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin- > owner@postgresql.org] On Behalf Of Alex Balashov > Sent: Thursday, October 01, 2015 10:47 AM > To: pgsql-admin@postgresql.org > Subject: Re: [ADMIN] "Dynamic routing" to different databases > is a psql-level construct, not something that can be embedded in live queries. > So, I was expecting to have to rewrite thousands of lines of stored procedures > to take a schema parameter and construct dynamic query strings with dynamic > table names and so on. > > Assuming one can simply do: > > SET search_path TO <schema>; SELECT query_business_as_usual(); > > I'm in a pretty good place! You can set the default search path on a per-role/user basis. What you could do is map a DB user (or set of users) to aspecific schema, then set that users search path for the appropriate schema and the queries will automatically get routedto the appropriate schema. Segmenting your different applications user also allows you to use the role system to restrictaccess to the data between applications. The potential downside to this approach is that you won't be able to use something like pgbouncer to share DB connectionsbetween the different users/schemas. This can, depending on your usage and access patterns lead to problems withhigh connection counts to the DB being needed. Brad.
Brad, On 10/01/2015 12:48 PM, Nicholson, Brad (Toronto, ON, CA) wrote: > You can set the default search path on a per-role/user basis. What > you could do is map a DB user (or set of users) to a specific schema, > then set that users search path for the appropriate schema and the > queries will automatically get routed to the appropriate schema. > Segmenting your different applications user also allows you to use > the role system to restrict access to the data between applications. I'd love to, but unfortunately the service consuming the database is very, very strictly limited to one connection string, which entails one set of credentials, one server and one database, and more connections cannot be created as-needed at runtime without restarting the service, which in our industry is not acceptable. So, unfortunately I cannot avail myself of such an elegant solution; instead, I must access all the schemas via one role. -- Alex -- Alex Balashov | Principal | Evariste Systems LLC 303 Perimeter Center North, Suite 300 Atlanta, GA 30346 United States Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/