IN list processing performance (yet again) - Mailing list pgsql-performance
From | Dave Tenny |
---|---|
Subject | IN list processing performance (yet again) |
Date | |
Msg-id | 3ED4B0E5.3050703@attbi.com Whole thread Raw |
Responses |
Re: IN list processing performance (yet again)
Re: IN list processing performance (yet again) Re: IN list processing performance (yet again) Re: IN list processing performance (yet again) |
List | pgsql-performance |
Having grepped the web, it's clear that this isn't the first or last time this issue will be raised. My application relies heavily on IN lists. The lists are primarily constant integers, so queries look like: SELECT val FROM table WHERE id IN (43, 49, 1001, 100002, ...) Performance is critical, and the size of these lists depends a lot on how the larger 3-tier applicaiton is used, but it wouldn't be out of the question to retrieve 3000-10000 items. PostgreSQL 7.3.2 seems to have a lot of trouble with large lists. I ran an experiment that ran queries on a table of two integers (ID, VAL), where ID is a primary key and the subject of IN list predicates. The test used a table with one million rows ID is appropriately indexed, and I have VACUUMED/analyzed the database after table load. I ran tests on in-lists from about 100 to 100,000 entries. I also ran tests where I picked the rows out one-by-one using parameterized statements, e.g. SELECT val FROM table WHERE id = ? I'm trying to decide how much I should use parameterized statements and when to work around buffer size limitations in JDBC transport, general query processing, etc. So here are my questions as a prelude to the data noted below: 1) What is the acceptable limit of jdbc Statement buffer sizes for executeQuery()? (Presumably it's not really a JDBC question, but a PostgreSQL query/buffering/transport question). 2) What is the expected acceptable limit for the number of items in an IN list predicate such as those used here. (List of constants, not subselects). 3) What should I expect for performance capabilities/limitations from PostgreSQL on this type of problem? (Set my expectations, perhaps they're unrealistic). 4) What are my alternatives for picking specific rows for thousands of elements with sub-second response times if not IN lists? (This is crucial!) --------------------------------------------- Here is a summary of my observations of query times, and I've attached the test program (more on that below). 1) PostgreSQL exhibits worse-than-linear performance behavior with respect to IN list size. This is bad. 2) Parameterized statements exhibit the expected linear performance characteristics. 3) The break-even point for using IN lists vs. parameterized statements in my environment (RedHat Linux 9.0, PostgreSQL 7.3.2, 512MB memory, 7200RPM 100UDMA IDE disks, AMD1600Mhz) is about 700 items in the IN list. Beyond that, IN the IN list scalability curve makes it impractical. 4) God help you if you haven't vacuum/analyzed that the newly loaded table. Without this, IN list processing is even worse! For just 10 elements in the IN list: *Without* VACUUMDB, IN lists suck beyond measure: Elapsed time for 10 IN list elements: 2638 ms Elapsed time for 10 parameterized elements: 9 ms *With* VACUUMDB: IN lists recover a bit: Elapsed time for 10 IN list elements: 10 ms Elapsed time for 10 parameterized elements: 24 ms However it's VERY interesting to note that parameterized statements worked well. That implies probable disparity in plan generation. It's worth noting that it didn't *feel* like I was getting the same delay when I ran the query from the 'psql' client, but since it doesn't report times I can't be sure. 5) Rest of my results are vacuumed, (and listed in the attached program in detail). The interesting points are: For an IN list of 700 elements: MySQL 3.23.56 (with INNODB tables) takes 19ms, 73ms with parameterized statements. PostgreSQL takes 269ms, 263ms with parameterized statements. For larger lists, MySQL happily processed a 90,000 element IN list in 1449ms, 9283 ms using parameterized statements. PostgreSQL craps out trying to process 8000 elements with the error: out of free buffers: time to abort! PostgreSQL takes 45,566ms for 7000 elements in an IN list (ouch!) , and 2027ms for a parameterized statement. MySQL easily beats that with 10 times the data. 6) Using a remote client on the lan (10/100) to run the client piece on a separate machine from the database server yielded little change int he results. Prepared statements worked pretty well even with actual wire latency, surprise! (Of course it's very little latency on my lan, not like, say, the database server running in a different city which customers have been known to do). The MySQL and PostgreSQL installations are the default RedHat 9.0 distribution packages, I haven't tweaked either's installation parameters. (though MySQL was updated by RedHat from 3.23.54a to 3.23.56 as part of an automatic upgrade). My goal here isn't to say "why aren't you like MySQL". I've been using PostgreSQL for a year as the development database of choice with satisfactory results. But I won't be able to support customers who want to use PostgreSQL on large deployments of my products if I can't selectively retrieve data for several thousand elements in sub-second times. (PostgreSQL devos, if you want a feel good bullet, note that I don't support MySQL at all since lack of MVCC transactions is a showstopper from a multi-user performance standpoint). So I'm looking for (a) solutions, answers to my questions above, and (b) a statement of "we're on this" or "you're stuck with it" from PostgreSQL devos who know. ---------------------------------------------------------------- On the attached program. (a Java JDBC program) Sorry, you can't just run it "as is". Search for formfeeds (^L) if you want to skip all the result data I logged. Compilation is straightforward, simply supply the location of your JDBC jar file for compiling and running (mine is noted in the program). First, move the "if (false)" element below the table creation statements and run the program to create the table. Then VACUUM/analyze your database (suuuuure would be nice not to have to vacuum). Then move the "if (false)" element above the table creation so you won't have to do it every time. Then move past the formfeed and adjust the 'tryAmount' for loop to test the amounts you're interested. 100 to 1000 by 100's is a good starting point. Ignore the section (part of the if (false) logic) that attempts to figure out what the largest query size is. Unless you want to reproduce a hung postmaster in a CPU loop, which is what I got when I tried to run that logic, though whan I ran it I was missing the closing ')' in my IN list, which I've since added to that code. Thanks for any help! Dave import java.sql.* ; /* Query huge inlists, test for buffer overflow and performance using in-lists vs. parameterized statements, and perhaps batched statements (not yet) Results: MySQL blows PostgreSQL away here, and that's AFTER doing a VACUUMDB in PostgreSQL PostgreSQL does ok with VACUUMDB to 700 entries, the prepared statements are better. MySQL: IN lists are always superior to any practical limit (tested to 100,000 elements) Local-to-local on AMD1600Mhz, MySQL 3.23.56 with JDBC driver 3.0.8 [MYSQL was much better here...] Table update time: 109927 ms For 10,000 -> 100,000 by 10,000 Elapsed time for 90000 IN list elements: 1449 ms 62 elements/ms Elapsed time for 90000 parameterized elements: 9283 ms 9 elements/ms For 100 -> 10,000 by 100 Last received via IN list: 99 Elapsed time for 100 IN list elements: 6 ms 16 elements/ms Elapsed time for 100 parameterized elements: 20 ms 5 elements/ms Last received via parameterized elements: 99 Last received via IN list: 199 Elapsed time for 200 IN list elements: 7 ms 28 elements/ms Elapsed time for 200 parameterized elements: 45 ms 4 elements/ms Last received via parameterized elements: 199 Last received via IN list: 299 Elapsed time for 300 IN list elements: 22 ms 13 elements/ms Elapsed time for 300 parameterized elements: 64 ms 4 elements/ms Last received via parameterized elements: 299 Last received via IN list: 399 Elapsed time for 400 IN list elements: 39 ms 10 elements/ms Elapsed time for 400 parameterized elements: 63 ms 6 elements/ms Last received via parameterized elements: 399 Last received via IN list: 499 Elapsed time for 500 IN list elements: 12 ms 41 elements/ms Elapsed time for 500 parameterized elements: 116 ms 4 elements/ms Last received via parameterized elements: 499 Last received via IN list: 599 Elapsed time for 600 IN list elements: 8 ms 75 elements/ms Elapsed time for 600 parameterized elements: 65 ms 9 elements/ms Last received via parameterized elements: 599 Last received via IN list: 699 Elapsed time for 700 IN list elements: 19 ms 36 elements/ms Elapsed time for 700 parameterized elements: 73 ms 9 elements/ms Last received via parameterized elements: 699 Last received via IN list: 799 Elapsed time for 800 IN list elements: 12 ms 66 elements/ms Elapsed time for 800 parameterized elements: 84 ms 9 elements/ms Last received via parameterized elements: 799 Last received via IN list: 899 Elapsed time for 900 IN list elements: 12 ms 75 elements/ms Elapsed time for 900 parameterized elements: 97 ms 9 elements/ms Last received via parameterized elements: 899 Last received via IN list: 999 Elapsed time for 1000 IN list elements: 15 ms 66 elements/ms Elapsed time for 1000 parameterized elements: 109 ms 9 elements/ms Last received via parameterized elements: 999 Last received via IN list: 1099 Elapsed time for 1100 IN list elements: 16 ms 68 elements/ms Elapsed time for 1100 parameterized elements: 115 ms 9 elements/ms Last received via parameterized elements: 1099 Last received via IN list: 1199 Elapsed time for 1200 IN list elements: 16 ms 75 elements/ms Elapsed time for 1200 parameterized elements: 138 ms 8 elements/ms Last received via parameterized elements: 1199 Last received via IN list: 1299 Elapsed time for 1300 IN list elements: 19 ms 68 elements/ms Elapsed time for 1300 parameterized elements: 136 ms 9 elements/ms Last received via parameterized elements: 1299 Last received via IN list: 1399 Elapsed time for 1400 IN list elements: 21 ms 66 elements/ms Elapsed time for 1400 parameterized elements: 150 ms 9 elements/ms Last received via parameterized elements: 1399 Last received via IN list: 1499 Elapsed time for 1500 IN list elements: 21 ms 71 elements/ms Elapsed time for 1500 parameterized elements: 158 ms 9 elements/ms Last received via parameterized elements: 1499 Last received via IN list: 1599 Elapsed time for 1600 IN list elements: 22 ms 72 elements/ms Elapsed time for 1600 parameterized elements: 178 ms 8 elements/ms Last received via parameterized elements: 1599 Last received via IN list: 1699 Elapsed time for 1700 IN list elements: 22 ms 77 elements/ms Elapsed time for 1700 parameterized elements: 180 ms 9 elements/ms Last received via parameterized elements: 1699 Last received via IN list: 1799 Elapsed time for 1800 IN list elements: 26 ms 69 elements/ms Elapsed time for 1800 parameterized elements: 205 ms 8 elements/ms Last received via parameterized elements: 1799 Last received via IN list: 1899 Elapsed time for 1900 IN list elements: 57 ms 33 elements/ms Elapsed time for 1900 parameterized elements: 198 ms 9 elements/ms Last received via parameterized elements: 1899 Last received via IN list: 1999 Elapsed time for 2000 IN list elements: 29 ms 68 elements/ms Elapsed time for 2000 parameterized elements: 215 ms 9 elements/ms Last received via parameterized elements: 1999 Last received via IN list: 2099 Elapsed time for 2100 IN list elements: 30 ms 70 elements/ms Elapsed time for 2100 parameterized elements: 222 ms 9 elements/ms Last received via parameterized elements: 2099 Last received via IN list: 2199 Elapsed time for 2200 IN list elements: 32 ms 68 elements/ms Elapsed time for 2200 parameterized elements: 243 ms 9 elements/ms Last received via parameterized elements: 2199 Last received via IN list: 2299 Elapsed time for 2300 IN list elements: 32 ms 71 elements/ms Elapsed time for 2300 parameterized elements: 241 ms 9 elements/ms Last received via parameterized elements: 2299 Last received via IN list: 2399 Elapsed time for 2400 IN list elements: 33 ms 72 elements/ms Elapsed time for 2400 parameterized elements: 260 ms 9 elements/ms Last received via parameterized elements: 2399 Last received via IN list: 2499 Elapsed time for 2500 IN list elements: 64 ms 39 elements/ms Elapsed time for 2500 parameterized elements: 275 ms 9 elements/ms Last received via parameterized elements: 2499 Last received via IN list: 2599 Elapsed time for 2600 IN list elements: 35 ms 74 elements/ms Elapsed time for 2600 parameterized elements: 275 ms 9 elements/ms Last received via parameterized elements: 2599 Last received via IN list: 2699 Elapsed time for 2700 IN list elements: 36 ms 75 elements/ms Elapsed time for 2700 parameterized elements: 292 ms 9 elements/ms Last received via parameterized elements: 2699 Last received via IN list: 2799 Elapsed time for 2800 IN list elements: 41 ms 68 elements/ms Elapsed time for 2800 parameterized elements: 308 ms 9 elements/ms Last received via parameterized elements: 2799 Last received via IN list: 2899 Elapsed time for 2900 IN list elements: 40 ms 72 elements/ms Elapsed time for 2900 parameterized elements: 341 ms 8 elements/ms Last received via parameterized elements: 2899 Last received via IN list: 2999 Elapsed time for 3000 IN list elements: 42 ms 71 elements/ms Elapsed time for 3000 parameterized elements: 356 ms 8 elements/ms Last received via parameterized elements: 2999 Last received via IN list: 3099 Elapsed time for 3100 IN list elements: 42 ms 73 elements/ms Elapsed time for 3100 parameterized elements: 342 ms 9 elements/ms Last received via parameterized elements: 3099 Last received via IN list: 3199 Elapsed time for 3200 IN list elements: 43 ms 74 elements/ms Elapsed time for 3200 parameterized elements: 345 ms 9 elements/ms Last received via parameterized elements: 3199 Last received via IN list: 3299 Elapsed time for 3300 IN list elements: 43 ms 76 elements/ms Elapsed time for 3300 parameterized elements: 359 ms 9 elements/ms Last received via parameterized elements: 3299 Last received via IN list: 3399 Elapsed time for 3400 IN list elements: 77 ms 44 elements/ms Elapsed time for 3400 parameterized elements: 356 ms 9 elements/ms Last received via parameterized elements: 3399 Last received via IN list: 3499 Elapsed time for 3500 IN list elements: 50 ms 70 elements/ms Elapsed time for 3500 parameterized elements: 370 ms 9 elements/ms Last received via parameterized elements: 3499 Last received via IN list: 3599 Elapsed time for 3600 IN list elements: 84 ms 42 elements/ms Elapsed time for 3600 parameterized elements: 387 ms 9 elements/ms Last received via parameterized elements: 3599 Last received via IN list: 3699 Elapsed time for 3700 IN list elements: 52 ms 71 elements/ms Elapsed time for 3700 parameterized elements: 390 ms 9 elements/ms Last received via parameterized elements: 3699 Last received via IN list: 3799 Elapsed time for 3800 IN list elements: 54 ms 70 elements/ms Elapsed time for 3800 parameterized elements: 407 ms 9 elements/ms Last received via parameterized elements: 3799 Last received via IN list: 3899 Elapsed time for 3900 IN list elements: 84 ms 46 elements/ms Elapsed time for 3900 parameterized elements: 422 ms 9 elements/ms Last received via parameterized elements: 3899 Last received via IN list: 3999 Elapsed time for 4000 IN list elements: 58 ms 68 elements/ms Elapsed time for 4000 parameterized elements: 434 ms 9 elements/ms Last received via parameterized elements: 3999 Last received via IN list: 4099 Elapsed time for 4100 IN list elements: 58 ms 70 elements/ms Elapsed time for 4100 parameterized elements: 446 ms 9 elements/ms Last received via parameterized elements: 4099 Last received via IN list: 4199 Elapsed time for 4200 IN list elements: 61 ms 68 elements/ms Elapsed time for 4200 parameterized elements: 453 ms 9 elements/ms Last received via parameterized elements: 4199 Last received via IN list: 4299 Elapsed time for 4300 IN list elements: 63 ms 68 elements/ms Elapsed time for 4300 parameterized elements: 457 ms 9 elements/ms Last received via parameterized elements: 4299 Last received via IN list: 4399 Elapsed time for 4400 IN list elements: 94 ms 46 elements/ms Elapsed time for 4400 parameterized elements: 473 ms 9 elements/ms Last received via parameterized elements: 4399 Last received via IN list: 4499 Elapsed time for 4500 IN list elements: 64 ms 70 elements/ms Elapsed time for 4500 parameterized elements: 506 ms 8 elements/ms Last received via parameterized elements: 4499 Last received via IN list: 4599 Elapsed time for 4600 IN list elements: 72 ms 63 elements/ms Elapsed time for 4600 parameterized elements: 527 ms 8 elements/ms Last received via parameterized elements: 4599 Last received via IN list: 4699 Elapsed time for 4700 IN list elements: 68 ms 69 elements/ms Elapsed time for 4700 parameterized elements: 499 ms 9 elements/ms Last received via parameterized elements: 4699 Last received via IN list: 4799 Elapsed time for 4800 IN list elements: 70 ms 68 elements/ms Elapsed time for 4800 parameterized elements: 539 ms 8 elements/ms Last received via parameterized elements: 4799 Last received via IN list: 4899 Elapsed time for 4900 IN list elements: 100 ms 49 elements/ms Elapsed time for 4900 parameterized elements: 590 ms 8 elements/ms Last received via parameterized elements: 4899 Last received via IN list: 4999 Elapsed time for 5000 IN list elements: 111 ms 45 elements/ms Elapsed time for 5000 parameterized elements: 531 ms 9 elements/ms Last received via parameterized elements: 4999 Last received via IN list: 5099 Elapsed time for 5100 IN list elements: 79 ms 64 elements/ms Elapsed time for 5100 parameterized elements: 542 ms 9 elements/ms Last received via parameterized elements: 5099 Last received via IN list: 5199 Elapsed time for 5200 IN list elements: 108 ms 48 elements/ms Elapsed time for 5200 parameterized elements: 560 ms 9 elements/ms Last received via parameterized elements: 5199 Last received via IN list: 5299 Elapsed time for 5300 IN list elements: 78 ms 67 elements/ms Elapsed time for 5300 parameterized elements: 560 ms 9 elements/ms Last received via parameterized elements: 5299 Last received via IN list: 5399 Elapsed time for 5400 IN list elements: 81 ms 66 elements/ms Elapsed time for 5400 parameterized elements: 572 ms 9 elements/ms Last received via parameterized elements: 5399 Last received via IN list: 5499 Elapsed time for 5500 IN list elements: 80 ms 68 elements/ms Elapsed time for 5500 parameterized elements: 627 ms 8 elements/ms Last received via parameterized elements: 5499 Last received via IN list: 5599 Elapsed time for 5600 IN list elements: 81 ms 69 elements/ms Elapsed time for 5600 parameterized elements: 590 ms 9 elements/ms Last received via parameterized elements: 5599 Last received via IN list: 5699 Elapsed time for 5700 IN list elements: 86 ms 66 elements/ms Elapsed time for 5700 parameterized elements: 606 ms 9 elements/ms Last received via parameterized elements: 5699 Last received via IN list: 5799 Elapsed time for 5800 IN list elements: 85 ms 68 elements/ms Elapsed time for 5800 parameterized elements: 663 ms 8 elements/ms Last received via parameterized elements: 5799 Last received via IN list: 5899 Elapsed time for 5900 IN list elements: 83 ms 71 elements/ms Elapsed time for 5900 parameterized elements: 629 ms 9 elements/ms Last received via parameterized elements: 5899 Last received via IN list: 5999 Elapsed time for 6000 IN list elements: 91 ms 65 elements/ms Elapsed time for 6000 parameterized elements: 658 ms 9 elements/ms Last received via parameterized elements: 5999 Last received via IN list: 6099 Elapsed time for 6100 IN list elements: 87 ms 70 elements/ms Elapsed time for 6100 parameterized elements: 702 ms 8 elements/ms Last received via parameterized elements: 6099 Last received via IN list: 6199 Elapsed time for 6200 IN list elements: 88 ms 70 elements/ms Elapsed time for 6200 parameterized elements: 667 ms 9 elements/ms Last received via parameterized elements: 6199 Last received via IN list: 6299 Elapsed time for 6300 IN list elements: 91 ms 69 elements/ms Elapsed time for 6300 parameterized elements: 670 ms 9 elements/ms Last received via parameterized elements: 6299 Last received via IN list: 6399 Elapsed time for 6400 IN list elements: 91 ms 70 elements/ms Elapsed time for 6400 parameterized elements: 708 ms 9 elements/ms Last received via parameterized elements: 6399 Last received via IN list: 6499 Elapsed time for 6500 IN list elements: 96 ms 67 elements/ms Elapsed time for 6500 parameterized elements: 687 ms 9 elements/ms Last received via parameterized elements: 6499 Last received via IN list: 6599 Elapsed time for 6600 IN list elements: 95 ms 69 elements/ms Elapsed time for 6600 parameterized elements: 701 ms 9 elements/ms Last received via parameterized elements: 6599 Last received via IN list: 6699 Elapsed time for 6700 IN list elements: 134 ms 50 elements/ms Elapsed time for 6700 parameterized elements: 711 ms 9 elements/ms Last received via parameterized elements: 6699 Last received via IN list: 6799 Elapsed time for 6800 IN list elements: 98 ms 69 elements/ms Elapsed time for 6800 parameterized elements: 723 ms 9 elements/ms Last received via parameterized elements: 6799 Last received via IN list: 6899 Elapsed time for 6900 IN list elements: 99 ms 69 elements/ms Elapsed time for 6900 parameterized elements: 728 ms 9 elements/ms Last received via parameterized elements: 6899 Last received via IN list: 6999 Elapsed time for 7000 IN list elements: 141 ms 49 elements/ms Elapsed time for 7000 parameterized elements: 733 ms 9 elements/ms Last received via parameterized elements: 6999 Last received via IN list: 7099 Elapsed time for 7100 IN list elements: 108 ms 65 elements/ms Elapsed time for 7100 parameterized elements: 752 ms 9 elements/ms Last received via parameterized elements: 7099 Last received via IN list: 7199 Elapsed time for 7200 IN list elements: 104 ms 69 elements/ms Elapsed time for 7200 parameterized elements: 769 ms 9 elements/ms Last received via parameterized elements: 7199 Last received via IN list: 7299 Elapsed time for 7300 IN list elements: 142 ms 51 elements/ms Elapsed time for 7300 parameterized elements: 778 ms 9 elements/ms Last received via parameterized elements: 7299 Last received via IN list: 7399 Elapsed time for 7400 IN list elements: 127 ms 58 elements/ms Elapsed time for 7400 parameterized elements: 796 ms 9 elements/ms Last received via parameterized elements: 7399 Last received via IN list: 7499 Elapsed time for 7500 IN list elements: 110 ms 68 elements/ms Elapsed time for 7500 parameterized elements: 796 ms 9 elements/ms Last received via parameterized elements: 7499 Last received via IN list: 7599 Elapsed time for 7600 IN list elements: 144 ms 52 elements/ms Elapsed time for 7600 parameterized elements: 994 ms 7 elements/ms Last received via parameterized elements: 7599 Last received via IN list: 7699 Elapsed time for 7700 IN list elements: 122 ms 63 elements/ms Elapsed time for 7700 parameterized elements: 819 ms 9 elements/ms Last received via parameterized elements: 7699 Last received via IN list: 7799 Elapsed time for 7800 IN list elements: 114 ms 68 elements/ms Elapsed time for 7800 parameterized elements: 893 ms 8 elements/ms Last received via parameterized elements: 7799 Last received via IN list: 7899 Elapsed time for 7900 IN list elements: 152 ms 51 elements/ms Elapsed time for 7900 parameterized elements: 841 ms 9 elements/ms Last received via parameterized elements: 7899 Last received via IN list: 7999 Elapsed time for 8000 IN list elements: 119 ms 67 elements/ms Elapsed time for 8000 parameterized elements: 853 ms 9 elements/ms Last received via parameterized elements: 7999 Last received via IN list: 8099 Elapsed time for 8100 IN list elements: 121 ms 66 elements/ms Elapsed time for 8100 parameterized elements: 858 ms 9 elements/ms Last received via parameterized elements: 8099 Last received via IN list: 8199 Elapsed time for 8200 IN list elements: 157 ms 52 elements/ms Elapsed time for 8200 parameterized elements: 871 ms 9 elements/ms Last received via parameterized elements: 8199 Last received via IN list: 8299 Elapsed time for 8300 IN list elements: 131 ms 63 elements/ms Elapsed time for 8300 parameterized elements: 879 ms 9 elements/ms Last received via parameterized elements: 8299 Last received via IN list: 8399 Elapsed time for 8400 IN list elements: 124 ms 67 elements/ms Elapsed time for 8400 parameterized elements: 890 ms 9 elements/ms Last received via parameterized elements: 8399 Last received via IN list: 8499 Elapsed time for 8500 IN list elements: 163 ms 52 elements/ms Elapsed time for 8500 parameterized elements: 904 ms 9 elements/ms Last received via parameterized elements: 8499 Last received via IN list: 8599 Elapsed time for 8600 IN list elements: 131 ms 65 elements/ms Elapsed time for 8600 parameterized elements: 907 ms 9 elements/ms Last received via parameterized elements: 8599 Last received via IN list: 8699 Elapsed time for 8700 IN list elements: 127 ms 68 elements/ms Elapsed time for 8700 parameterized elements: 947 ms 9 elements/ms Last received via parameterized elements: 8699 Last received via IN list: 8799 Elapsed time for 8800 IN list elements: 166 ms 53 elements/ms Elapsed time for 8800 parameterized elements: 933 ms 9 elements/ms Last received via parameterized elements: 8799 Last received via IN list: 8899 Elapsed time for 8900 IN list elements: 133 ms 66 elements/ms Elapsed time for 8900 parameterized elements: 947 ms 9 elements/ms Last received via parameterized elements: 8899 Last received via IN list: 8999 Elapsed time for 9000 IN list elements: 142 ms 63 elements/ms Elapsed time for 9000 parameterized elements: 951 ms 9 elements/ms Last received via parameterized elements: 8999 Last received via IN list: 9099 Elapsed time for 9100 IN list elements: 173 ms 52 elements/ms Elapsed time for 9100 parameterized elements: 961 ms 9 elements/ms Last received via parameterized elements: 9099 Last received via IN list: 9199 Elapsed time for 9200 IN list elements: 138 ms 66 elements/ms Elapsed time for 9200 parameterized elements: 979 ms 9 elements/ms Last received via parameterized elements: 9199 Last received via IN list: 9299 Elapsed time for 9300 IN list elements: 135 ms 68 elements/ms Elapsed time for 9300 parameterized elements: 985 ms 9 elements/ms Last received via parameterized elements: 9299 Last received via IN list: 9399 Elapsed time for 9400 IN list elements: 175 ms 53 elements/ms Elapsed time for 9400 parameterized elements: 1000 ms 9 elements/ms Last received via parameterized elements: 9399 Last received via IN list: 9499 Elapsed time for 9500 IN list elements: 136 ms 69 elements/ms Elapsed time for 9500 parameterized elements: 1010 ms 9 elements/ms Last received via parameterized elements: 9499 Last received via IN list: 9599 Elapsed time for 9600 IN list elements: 144 ms 66 elements/ms Elapsed time for 9600 parameterized elements: 1017 ms 9 elements/ms Last received via parameterized elements: 9599 Last received via IN list: 9699 Elapsed time for 9700 IN list elements: 186 ms 52 elements/ms Elapsed time for 9700 parameterized elements: 1021 ms 9 elements/ms Last received via parameterized elements: 9699 Last received via IN list: 9799 Elapsed time for 9800 IN list elements: 142 ms 69 elements/ms Elapsed time for 9800 parameterized elements: 1048 ms 9 elements/ms Last received via parameterized elements: 9799 Last received via IN list: 9899 Elapsed time for 9900 IN list elements: 145 ms 68 elements/ms Elapsed time for 9900 parameterized elements: 1054 ms 9 elements/ms Last received via parameterized elements: 9899 Local-to-local on AMD 1600Mhz, PGSQL7.3.2, RH9.0 Table update time: 169xxx ms *Without* VACUUMDB, IN lists suck beyond measure: For just 10 elements in the IN list: Elapsed time for 10 IN list elements: 2638 ms 0 elements/ms Elapsed time for 10 parameterized elements: 9 ms 1 elements/ms *With* VACUUMDB: IN lists recover a bit: Elapsed time for 10 IN list elements: 10 ms 1 elements/ms Elapsed time for 10 parameterized elements: 24 ms 0 elements/ms [...] Elapsed time for 99 IN list elements: 16 ms 6 elements/ms Elapsed time for 99 parameterized elements: 36 ms 2 elements/ms Last received via parameterized elements: 98 Testing for some larger numbers of elements, however, reveals another story. Elapsed time for 1000 IN list elements: 676 ms 1 elements/ms Elapsed time for 1000 parameterized elements: 351 ms 2 elements/ms Elapsed time for 2000 IN list elements: 3197 ms 0 elements/ms Elapsed time for 2000 parameterized elements: 713 ms 2 elements/ms Elapsed time for 3000 IN list elements: 7266 ms 0 elements/ms Elapsed time for 3000 parameterized elements: 1080 ms 2 elements/ms Elapsed time for 4000 IN list elements: 13341 ms 0 elements/ms Elapsed time for 4000 parameterized elements: 1449 ms 2 elements/ms Based on the 'by-the-1000's approach, IN lists are on a seriously bad curve, while parameterized statements are linear. The cutoff on my AMD 1600 Mhz machine is at about 700 elements: Elapsed time for 600 IN list elements: 157 ms 3 elements/ms Elapsed time for 600 parameterized elements: 246 ms 2 elements/ms Elapsed time for 700 IN list elements: 269 ms 2 elements/ms Elapsed time for 700 parameterized elements: 263 ms 2 elements/ms Local(Intel PIII-M 1Ghz)-to-REMOTE(AMD1600Mhz), PostgreSQL 7.3.2 100 -> 1000 by 100's. IN List Query string length is 637 Last received via IN list: 99 Elapsed time for 100 IN list elements: 22 ms 4 elements/ms Elapsed time for 100 parameterized elements: 117 ms 0 elements/ms Last received via parameterized elements: 99 IN List Query string length is 1237 Last received via IN list: 199 Elapsed time for 200 IN list elements: 39 ms 5 elements/ms Elapsed time for 200 parameterized elements: 169 ms 1 elements/ms Last received via parameterized elements: 199 IN List Query string length is 1837 Last received via IN list: 299 Elapsed time for 300 IN list elements: 81 ms 3 elements/ms Elapsed time for 300 parameterized elements: 136 ms 2 elements/ms Last received via parameterized elements: 299 IN List Query string length is 2437 Last received via IN list: 399 Elapsed time for 400 IN list elements: 108 ms 3 elements/ms Elapsed time for 400 parameterized elements: 157 ms 2 elements/ms Last received via parameterized elements: 399 IN List Query string length is 3037 Last received via IN list: 499 Elapsed time for 500 IN list elements: 99 ms 5 elements/ms Elapsed time for 500 parameterized elements: 232 ms 2 elements/ms Last received via parameterized elements: 499 IN List Query string length is 3637 Last received via IN list: 599 Elapsed time for 600 IN list elements: 150 ms 4 elements/ms Elapsed time for 600 parameterized elements: 182 ms 3 elements/ms Last received via parameterized elements: 599 IN List Query string length is 4237 Last received via IN list: 699 Elapsed time for 700 IN list elements: 256 ms 2 elements/ms Elapsed time for 700 parameterized elements: 219 ms 3 elements/ms Last received via parameterized elements: 699 IN List Query string length is 4837 Last received via IN list: 799 Elapsed time for 800 IN list elements: 370 ms 2 elements/ms Elapsed time for 800 parameterized elements: 245 ms 3 elements/ms Last received via parameterized elements: 799 IN List Query string length is 5437 Last received via IN list: 899 Elapsed time for 900 IN list elements: 511 ms 1 elements/ms Elapsed time for 900 parameterized elements: 265 ms 3 elements/ms Last received via parameterized elements: 899 Curiously, the difference between local->local and local->remote here is small. The breakeven point was about 650 elements, versus slightly closer to 700 in local->local. Local(Intel PIII-M 1Ghz)-to-REMOTE(AMD1600Mhz), PostgreSQL 7.3.2 1000 -> 10000 by 1000's. IN List Query string length is 6037 Last received via IN list: 999 Elapsed time for 1000 IN list elements: 772 ms 1 elements/ms Elapsed time for 1000 parameterized elements: 495 ms 2 elements/ms Last received via parameterized elements: 999 IN List Query string length is 12037 Last received via IN list: 1999 Elapsed time for 2000 IN list elements: 3055 ms 0 elements/ms Elapsed time for 2000 parameterized elements: 681 ms 2 elements/ms Last received via parameterized elements: 1999 IN List Query string length is 18037 Last received via IN list: 2999 Elapsed time for 3000 IN list elements: 7205 ms 0 elements/ms Elapsed time for 3000 parameterized elements: 888 ms 3 elements/ms Last received via parameterized elements: 2999 IN List Query string length is 24037 Last received via IN list: 3999 Elapsed time for 4000 IN list elements: 12912 ms 0 elements/ms Elapsed time for 4000 parameterized elements: 1162 ms 3 elements/ms Last received via parameterized elements: 3999 IN List Query string length is 30037 Last received via IN list: 4999 Elapsed time for 5000 IN list elements: 21844 ms 0 elements/ms Elapsed time for 5000 parameterized elements: 1461 ms 3 elements/ms Last received via parameterized elements: 4999 IN List Query string length is 36037 Last received via IN list: 5999 Elapsed time for 6000 IN list elements: 32595 ms 0 elements/ms Elapsed time for 6000 parameterized elements: 1780 ms 3 elements/ms Last received via parameterized elements: 5999 IN List Query string length is 42037 Last received via IN list: 6999 Elapsed time for 7000 IN list elements: 46566 ms 0 elements/ms Elapsed time for 7000 parameterized elements: 2027 ms 3 elements/ms Last received via parameterized elements: 6999 IN List Query string length is 48037 java.sql.SQLException: ERROR: out of free buffers: time to abort! at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:126) at org.postgresql.jdbc1.AbstractJdbc1Connection.ExecSQL(AbstractJdbc1Connection.java:451) at org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:281) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:48) at org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:144) at org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:132) at INList.main(INList.java:775) */ class INList { public static void main(String[] args) { try { final int NROWS = 1000000 ; // one million final int IDSTART = 10000 ; // first ID value // ---------------- MySql driver ----------------- // -classpath .:/usr/java/mysql-connector-java-2.0.14/mysql-connector-java-2.0.14-bin.jar // RH9: -classpath .:/home/dave/mysql-connector-java-3.0.8-stable/mysql-connector-java-3.0.8-stable-bin.jar //Class.forName("com.mysql.jdbc.Driver") ; //Connection conn = DriverManager.getConnection("jdbc:mysql:///test", "dave", "") ; //String innodb = " TYPE=INNODB" ; // --------------- postgres 7.3.2 distribution JDBC2 driver linked as /usr/java/postgresql-jdbc.jar // -classpath .:/usr/java/postgresql-jdbc.jar Class.forName("org.postgresql.Driver") ; Connection conn = DriverManager.getConnection("jdbc:postgresql:test", "dave", "") ; String innodb = "" ; conn.setAutoCommit(false) ; long oldmillis, newmillis, elapsed ; Statement statement = conn.createStatement() ; PreparedStatement pstatement ; ResultSet rs ; int lastReceived ; if (false) { statement.executeUpdate("CREATE TABLE MILLION (ID INTEGER PRIMARY KEY, VAL INTEGER)" + innodb) ; conn.commit() ; // Read that "all statements of a connection commit/rollback as a group, API certainly implies this // conn.commit() ; pstatement should potentially fail because table isn't there without this! // Create the rows pstatement = conn.prepareStatement("INSERT INTO MILLION (ID, VAL) VALUES(?, ?)") ; oldmillis = System.currentTimeMillis() ; for (int i = 0 ; i < NROWS ; i++) { pstatement.setInt(1, IDSTART+i) ; pstatement.setInt(2, i) ; pstatement.executeUpdate() ; } conn.commit() ; newmillis = System.currentTimeMillis() ; pstatement.close() ; System.out.println("Table update time: " + (newmillis-oldmillis) + " ms") ; rs = statement.executeQuery("SELECT COUNT(*) FROM MILLION") ; rs.first() ; if (rs.getInt(1) != NROWS) System.out.println("**** Expected " + (NROWS) + " rows, got " + rs.getInt(1)) ; conn.commit() ; // Determine largest IN LIST StringBuffer sb = new StringBuffer(8000*1024) ; boolean successful = false ; int tryAmount = NROWS ; while (!successful) { sb.setLength(0) ; sb.append("SELECT MAX(VAL) FROM MILLION WHERE ID IN (") ; for (int i = 0 ; i < tryAmount ; i++) { sb.append(i+IDSTART) ; if (i < tryAmount-1) sb.append(',') ; } sb.append(')') ; try { rs = statement.executeQuery(sb.toString()) ; successful = true ; } catch (SQLException e) { System.out.println("IN list failed for " + tryAmount + " entries, buffer size was " + sb.length()) ; tryAmount -= 10000 ; if (tryAmount <= 0) break ; // be done with the while loop } } // while (!successful) conn.commit() ; // more for the next txn than this one, eliminate txn caching System.out.println("Largest IN list: " + tryAmount + " elements.") ; } // if (false) else { for (int tryAmount = 100 ; tryAmount < 1000 ; tryAmount+= 100) { StringBuffer sb = new StringBuffer(tryAmount*20) ; sb.append("SELECT VAL FROM MILLION WHERE ID IN (") ; for (int i = 0 ; i < tryAmount ; i++) { sb.append(i+IDSTART) ; if (i+1 < tryAmount) sb.append(',') ; } sb.append(')') ; // Try query with largest inlist, getting resultset with all values String query = sb.toString() ; System.out.println("IN List Query string length is " + query.length()) ; //query = query.replaceFirst("MAX(VAL)", "VAL") ; oldmillis = System.currentTimeMillis() ; rs = statement.executeQuery(query) ; lastReceived = -1 ; while (rs.next()) lastReceived = rs.getInt(1) ; System.out.println("Last received via IN list: " + lastReceived) ; conn.commit() ; newmillis = System.currentTimeMillis() ; elapsed = newmillis - oldmillis ; System.out.println("Elapsed time for " + tryAmount + " IN list elements: " + elapsed + " ms") ; System.out.println((tryAmount/elapsed) + " elements/ms") ; // Try the query with prepared statements lastReceived = -1 ; pstatement = conn.prepareStatement("SELECT VAL FROM MILLION WHERE ID = ?") ; oldmillis = System.currentTimeMillis() ; for (int i = 0 ; i < tryAmount ; i++) { pstatement.setInt(1, i+IDSTART) ; rs = pstatement.executeQuery() ; if (!rs.first()) System.out.println("Pstatement query failed") ; else lastReceived = rs.getInt(1) ; } conn.commit() ; newmillis = System.currentTimeMillis() ; pstatement.close() ; elapsed = newmillis - oldmillis ; System.out.println("Elapsed time for " + tryAmount + " parameterized elements: " + elapsed + " ms") ; System.out.println((tryAmount/elapsed) + " elements/ms") ; System.out.println("Last received via parameterized elements: " + lastReceived) ; } // for (tryAmount ...) } // else (if (false)) // Done, delete our table // statement.executeUpdate("DROP TABLE MILLION") ; conn.commit() ; } catch (Exception e) { e.printStackTrace() ; } } // main() } // class INList
pgsql-performance by date: