PostgreSQL Weekly News - March 28, 2021

Posted on 2021-03-29 by PWN
PWN

PostgreSQL Weekly News - March 28, 2021

Person of the week: https://postgresql.life/post/julien_riou/

PostgreSQL Product News

pspg 4.5.0 a pager designed for PostgreSQL, released. https://github.com/okbob/pspg/releases/tag/4.5.0

pgAdmin4 5.1, a web- and native GUI control center for PostgreSQL, released. https://www.pgadmin.org/docs/pgadmin4/5.1/release_notes_5_1.html

PostgreSQL Jobs for March

https://archives.postgresql.org/pgsql-jobs/2021-03/

PostgreSQL in the News

Planet PostgreSQL: https://planet.postgresql.org/

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to [email protected].

Applied Patches

Andrew Dunstan pushed:

  • Don't run recover crash_temp_files test in Windows perl. This reverts commit 677271a3a125e294b33b891669f594a2c8cb36ce. "Unbreak recovery test on Windows" The test hangs on Windows, and attempts to remedy the problem have proved fragile at best. So we simply disable the test on Windows perl. (Msys perl seems perfectly happy). Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/ef823873840c9f341239e18633bdb0116d8d2738

  • Allow for installation-aware instances of PostgresNode. Currently instances of PostgresNode find their Postgres executables in the PATH of the caller. This modification allows for instances that know the installation path they are supposed to use, and the module adjusts the environment of methods that call Postgres executables appropriately. This facility is activated by passing the installation path to the constructor: my $node = PostgresNode->get_new_node('mynode', installation_path => '/path/to/installation'); This makes a number of things substantially easier, including . testing third party modules . testing different versions of postgres together . testing different builds of postgres together Discussion: https://postgr.es/m/[email protected] Reviewed-By: Alvaro Herrera, Michael Paquier, Dagfinn Ilmari Mannsåker https://git.postgresql.org/pg/commitdiff/b34ca595abd697e716ce369ec1b58624bdd1c431

Tom Lane pushed:

  • Make compression.sql regression test independent of default. This test will fail in "make installcheck" if the installation's default_toast_compression setting is not 'pglz'. Make it robust against that situation. Dilip Kumar Discussion: https://postgr.es/m/CAFiTN-t0w+Rc2U3S+y=7KWcLuOYNB5MfWeGdNa7+pg0UovVdcQ@mail.gmail.com https://git.postgresql.org/pg/commitdiff/fd1ac9a548966786cf7978e590be816c55936a50

  • Bring configure support for LZ4 up to snuff. It's not okay to just shove the pkg_config results right into our build flags, for a couple different reasons:

  • This fails to maintain the separation between CPPFLAGS and CFLAGS, as well as that between LDFLAGS and LIBS. (The CPPFLAGS angle is, I believe, the reason for warning messages reported when building with MacPorts' liblz4.) * If pkg_config emits anything other than -I/-D/-L/-l switches, it's highly unlikely that we want to absorb those. That'd be more likely to break the build than do anything helpful. (Even the -D case is questionable; but we're doing that for libxml2, so I kept it.) Also, it's not okay to skip doing an AC_CHECK_LIB probe, as evidenced by recent build failure on topminnow; that should have been caught at configure time. Model fixes for this on configure's libxml2 support. It appears that somebody overlooked an autoheader run, too. Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/4d399a6fbeb720b34d33441330910b7d853f703d

  • Fix assorted silliness in ATExecSetCompression(). It's not okay to scribble directly on a syscache entry. Nor to continue accessing said entry after releasing it. Also get rid of not-used local variables. Per valgrind testing. https://git.postgresql.org/pg/commitdiff/ac897c483485d3858ada23ca49650a0f2742a50f

  • Remove useless configure probe for <lz4/lz4.h>. This seems to have been just copied-and-pasted from some other header checks. But our C code is entirely unprepared to support such a header name, so it's only wasting cycles to look for it. If we did need to support it, some #ifdefs would be required. (A quick trawl at codesearch.debian.net finds some packages that reference lz4/lz4.h; but they use only that spelling, and appear to be intending to reference their own copy rather than a system-level installation of liblz4. There's no evidence of freestanding installations that require this spelling.) Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/2c75f8a612b207c7d36e5dc73317dc9ab6fb29d4

  • Mostly-cosmetic adjustments of TOAST-related macros. The authors of bbe0a81db hadn't quite got the idea that macros named like SOMETHING_4B_C were only meant for internal endianness-related details in postgres.h. Choose more legible names for macros that are intended to be used elsewhere. Rearrange postgres.h a bit to clarify the separation between those internal macros and ones intended for wider use. Also, avoid using the term "rawsize" for true decompressed size; we've used "extsize" for that, because "rawsize" generally denotes total Datum size including header. This choice seemed particularly unfortunate in tests that were comparing one of these meanings to the other. This patch includes a couple of not-purely-cosmetic changes: be sure that the shifts aligning compression methods are unsigned (not critical today, but will be when compression method 2 exists), and fix broken definition of VARATT_EXTERNAL_GET_COMPRESSION (now VARATT_EXTERNAL_GET_COMPRESS_METHOD), whose callers worked only accidentally. Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/aeb1631ed207cef2d80e20f79eb52c72f03bca7d

  • Short-circuit slice requests that are for more than the object's size. substring(), and perhaps other callers, isn't careful to pass a slice length that is no more than the datum's true size. Since toast_decompress_datum_slice's children will palloc the requested slice length, this can waste memory. Also, close study of the liblz4 documentation suggests that it is dependent on the caller to not ask for more than the correct amount of decompressed data; this squares with observed misbehavior with liblz4 1.8.3. Avoid these problems by switching to the normal full-decompression code path if the slice request is >= datum's decompressed size. Tom Lane and Dilip Kumar Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/063dd37ebc7644e8db6419565b50dca019e69e86

  • Avoid possible crash while finishing up a heap rewrite. end_heap_rewrite was not careful to ensure that the target relation is open at the smgr level before performing its final smgrimmedsync. In ordinary cases this is no problem, because it would have been opened earlier during the rewrite. However a crash can be reproduced by re-clustering an empty table with CLOBBER_CACHE_ALWAYS enabled. Although that exact scenario does not crash in v13, I think that's a chance result of unrelated planner changes, and the problem is likely still reachable with other test cases. The true proximate cause of this failure is commit c6b92041d, which replaced a call to heap_sync (which was careful about opening smgr) with a direct call to smgrimmedsync. Hence, back-patch to v13. Amul Sul, per report from Neha Sharma; cosmetic changes and test case by me. Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com https://git.postgresql.org/pg/commitdiff/9d523119fd38fd205cb9c8ea8e7cceeb54355818

  • Fix psql's \connect command some more. Jasen Betts reported yet another unintended side effect of commit 85c54287a: reconnecting with "\c service=whatever" did not have the expected results. The reason is that starting from the output of PQconndefaults() effectively allows environment variables (such as PGPORT) to override entries in the service file, whereas the normal priority is the other way around. Not using PQconndefaults at all would require yet a third main code path in do_connect's parameter setup, so I don't really want to fix it that way. But we can have the logic effectively ignore all the default values for just a couple more lines of code. This patch doesn't change the behavior for "\c -reuse-previous=on service=whatever". That remains significantly different from before 85c54287a, because many more parameters will be re-used, and thus not be possible for service entries to replace. But I think this is (mostly?) intentional. In any case, since libpq does not report where it got parameter values from, it's hard to do differently. Per bug #16936 from Jasen Betts. As with the previous patches, back-patch to all supported branches. (9.5 is unfortunately now out of support, so this won't get fixed there.) Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/ea80138545043c0cfcff8405b15626796f2695fe

Peter Geoghegan pushed:

  • Recycle nbtree pages deleted during same VACUUM. Maintain a simple array of metadata about pages that were deleted during nbtree VACUUM's current btvacuumscan() call. Use this metadata at the end of btvacuumscan() to attempt to place newly deleted pages in the FSM without further delay. It might not yet be safe to place any of the pages in the FSM by then (they may not be deemed recyclable), but we have little to lose and plenty to gain by trying. In practice there is a very good chance that this will work out when vacuuming larger indexes, where scanning the index naturally takes quite a while. This commit doesn't change the page recycling invariants; it merely improves the efficiency of page recycling within the confines of the existing design. Recycle safety is a part of nbtree's implementation of what Lanin & Shasha call "the drain technique". The design happens to use transaction IDs (they're stored in deleted pages), but that in itself doesn't align the cutoff for recycle safety to any of the XID-based cutoffs used by VACUUM (e.g., OldestXmin). All that matters is whether or not other backends might be able to observe various inconsistencies in the tree structure (that they cannot just detect and recover from by moving right). Recycle safety is purely a question of maintaining the consistency (or the apparent consistency) of a physical data structure. Note that running a simple serial test case involving a large range DELETE followed by a VACUUM VERBOSE will probably show that any newly deleted nbtree pages are not yet reusable/recyclable. This is expected in the absence of even one concurrent XID assignment. It is an old implementation restriction. In practice it's unlikely to be the thing that makes recycling remain unsafe, at least with larger indexes, where recycling newly deleted pages during the same VACUUM actually matters. An important high-level goal of this commit (as well as related recent commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup operations in index AMs rare in general. If index vacuuming frequently depends on the next VACUUM operation finishing off work that the current operation started, then the general behavior of index vacuuming is hard to predict. This is relevant to ongoing work that adds a vacuumlazy.c mechanism to skip index vacuuming in certain cases. Anything that makes the real world behavior of index vacuuming simpler and more linear will also make top-down modeling in vacuumlazy.c more robust. Author: Peter Geoghegan pg@bowt.ie Reviewed-By: Masahiko Sawada sawada.mshk@gmail.com Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com https://git.postgresql.org/pg/commitdiff/9dd963ae2534e9614f0abeccaafbd39f1b93ff8a

  • nbtree VACUUM: Cope with buggy opclasses. Teach nbtree VACUUM to press on with vacuuming in the event of a page deletion attempt that fails to "re-find" a downlink for its child/target page. There is no good reason to treat this as an irrecoverable error. But there is a good reason not to: pressing on at this point removes any question of VACUUM not making progress solely due to misbehavior from user-defined operator class code. Discussion: https://postgr.es/m/CAH2-Wzma5G9CTtMjbrXTwOym+U=aWg-R7=-htySuztgoJLvZXg@mail.gmail.com https://git.postgresql.org/pg/commitdiff/5b861baa550a369e04bf67fbe83f3a5a8c742fb4

Michaël Paquier pushed:

  • Fix timeline assignment in checkpoints with 2PC transactions. Any transactions found as still prepared by a checkpoint have their state data read from the WAL records generated by PREPARE TRANSACTION before being moved into their new location within pg_twophase/. While reading such records, the WAL reader uses the callback read_local_xlog_page() to read a page, that is shared across various parts of the system. This callback, since 1148e22a, has introduced an update of ThisTimeLineID when reading a record while in recovery, which is potentially helpful in the context of cascading WAL senders. This update of ThisTimeLineID interacts badly with the checkpointer if a promotion happens while some 2PC data is read from its record, as, by changing ThisTimeLineID, any follow-up WAL records would be written to an timeline older than the promoted one. This results in consistency issues. For instance, a subsequent server restart would cause a failure in finding a valid checkpoint record, resulting in a PANIC, for instance. This commit changes the code reading the 2PC data to reset the timeline once the 2PC record has been read, to prevent messing up with the static state of the checkpointer. It would be tempting to do the same thing directly in read_local_xlog_page(). However, based on the discussion that has led to 1148e22a, users may rely on the updates of ThisTimeLineID when a WAL record page is read in recovery, so changing this callback could break some cases that are working currently. A TAP test reproducing the issue is added, relying on a PITR to precisely trigger a promotion with a prepared transaction still tracked. Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao and myself. Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com Backpatch-through: 10 https://git.postgresql.org/pg/commitdiff/595b9cba2ab0cdd057e02d3c23f34a8bcfd90a2d

  • Simplify TAP tests of kerberos with expected log file contents. The TAP tests of kerberos rely on the logs generated by the backend to check various connection scenarios. In order to make sure that a given test does not overlap with the log contents generated by a previous test, the test suite relied on a logic with the logging collector and a rotation of the log files to ensure the uniqueness of the log generated with a wait phase. Parsing the log contents for expected patterns is a problem that has been solved in a simpler way by PostgresNode::issues_sql_like() where the log file is truncated before checking for the contents generated, with the backend sending its output to a log file given by pg_ctl instead. This commit switches the kerberos test suite to use such a method, removing any wait phase and simplifying the whole logic, resulting in less code. If a failure happens in the tests, the contents of the logs are still showed to the user at the moment of the failure thanks to like(), so this has no impact on debugging capabilities. I have bumped into this issue while reviewing a different patch set aiming at extending the kerberos test suite to check for multiple log patterns instead of one now. Author: Michael Paquier Reviewed-by: Stephen Frost, Bharath Rupireddy Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/11e1577a576fec6307aa0bfcde7333e63f907fa7

  • Fix new TAP test for 2PC transactions and PITRs on Windows. The test added by 595b9cb forgot that on Windows it is necessary to set up pg_hba.conf (see PostgresNode::set_replication_conf) with a specific entry or base backups fail. Any node that requires to support replication just needs to pass down allows_streaming at initialization. This updates the test to do so. Simplify things a bit while on it. Per buildfarm member fairywren. Any Windows hosts running this test would have failed, and I have reproduced the problem as well. Backpatch-through: 10 https://git.postgresql.org/pg/commitdiff/992d353a190c551db39bcab2dec0ecf14fbc7a40

  • Fix concurrency issues with WAL segment recycling on Windows. This commit is mostly a revert of aaa3aed, that switched the routine doing the internal renaming of recycled WAL segments to use on Windows a combination of CreateHardLinkA() plus unlink() instead of rename(). As reported by several users of Postgres 13, this is causing concurrency issues when manipulating WAL segments, mostly in the shape of the following error: LOG: could not rename file "pg_wal/000000XX000000YY000000ZZ": Permission denied This moves back to a logic where a single rename() (well, pgrename() for Windows) is used. This issue has proved to be hard to hit when I tested it, facing it only once with an archive_command that was not able to do its work, so it is environment-sensitive. The reporters of this issue have been able to confirm that the situation improved once we switched back to a single rename(). In order to check things, I have provided to the reporters a patched build based on 13.2 with aaa3aed reverted, to test if the error goes away, and an unpatched build of 13.2 to test if the error still showed up (just to make sure that I did not mess up my build process). Extra thanks to Fujii Masao for pointing out what looked like the culprit commit, and to all the reporters for taking the time to test what I have sent them. Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz Reviewed-by: Tom Lane, Andres Freund Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org Discussion: https://postgr.es/m/[email protected] Backpatch-through: 13 https://git.postgresql.org/pg/commitdiff/909b449e00fc2f71e1a38569bbddbb6457d28485

  • Add per-index stats information in verbose logs of autovacuum. Once a relation's autovacuum is completed, the logs include more information about this relation state if the threshold of log_autovacuum_min_duration (or its relation option) is reached, with for example contents about the statistics of the VACUUM operation for the relation, WAL and system usage. This commit adds more information about the statistics of the relation's indexes, with one line of logs generated for each index. The index stats were already calculated, but not printed in the context of autovacuum yet. While on it, some refactoring is done to keep track of the index statistics directly within LVRelStats, simplifying some routines related to parallel VACUUMs. Author: Masahiko Sawada Reviewed-by: Michael Paquier, Euler Taveira Discussion: https://postgr.es/m/CAD21AoAy6SxHiTivh5yAPJSUE4S=QRPpSZUdafOSz0R+fRcM6Q@mail.gmail.com https://git.postgresql.org/pg/commitdiff/5aed6a1fc214913de9ac69c1717dc64a2483e16d

  • Reword slightly logs generated for index stats in autovacuum. Using "remain" is confusing, as it implies that the index file can shrink. Instead, use "in total". Per discussion with Peter Geoghegan. Discussion: https://postgr.es/m/CAH2-WzkYgHZzpGOwR14CScJsjaQpvJrEkEfkh_=wGhzLb=yVdQ@mail.gmail.com https://git.postgresql.org/pg/commitdiff/99dd75fb99baa9188971cf47779ed8d7a5e6eb29

  • Sanitize the term "combo CID" in code comments. Combo CIDs were referred in the code comments using different terms across various places of the code, so unify a bit the term used with what is currently in use in some of the READMEs. Author: "Hou, Zhijie" Discussion: https://postgr.es/m/1d42865c91404f46af4562532fdbea31@G08CNEXMBPEKD05.g08.fujitsu.local https://git.postgresql.org/pg/commitdiff/a1999a01bb56c5f5451116abe61b892b2eec5e49

Noah Misch pushed:

Fujii Masao pushed:

  • pgbench: Improve error-handling in \sleep command. This commit improves pgbench \sleep command so that it handles the following three cases more properly. (1) When only one argument was specified in \sleep command and it's not a number, previously pgbench reported a confusing error message like "unrecognized time unit, must be us, ms or s". This commit fixes this so that more proper error message like "invalid sleep time, must be an integer" is reported. (2) When two arguments were specified in \sleep command and the first argument was not a number, previously pgbench treated that argument as the sleep time 0. No error was reported in this case. This commit fixes this so that an error is thrown in this case. (3) When a variable was specified as the first argument in \sleep command and the variable stored non-digit value, previously pgbench treated that argument as the sleep time 0. No error was reported in this case. This commit fixes this so that an error is thrown in this case. Author: Kota Miyake Reviewed-by: Hayato Kuroda, Alvaro Herrera, Fujii Masao Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/8c6eda2d1c926be76baa79c28521275323bd26fd

  • pg_waldump: Fix bug in per-record statistics. pg_waldump --stats=record identifies a record by a combination of the RmgrId and the four bits of the xl_info field of the record. But XACT records use the first bit of those four bits for an optional flag variable, and the following three bits for the opcode to identify a record. So previously the same type of XACT record could have different four bits (three bits are the same but the first one bit is different), and which could cause pg_waldump --stats=record to show two lines of per-record statistics for the same XACT record. This is a bug. This commit changes pg_waldump --stats=record so that it processes only XACT record differently, i.e., filters the opcode out of xl_info and uses a combination of the RmgrId and those three bits as the identifier of a record, only for XACT record. For other records, the four bits of the xl_info field are still used. Back-patch to all supported branches. Author: Kyotaro Horiguchi Reviewed-by: Shinya Kato, Fujii Masao Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/51893c8463501fc9a38e39cc097773dbdfb9db82

  • Change the type of WalReceiverWaitStart wait event from Client to IPC. Previously the type of this wait event was Client. But while this wait event is being reported, walreceiver process is waiting for the startup process to set initial data for streaming replication. It's not waiting for any activity on a socket connected to a user application or walsender. So this commit changes the type for WalReceiverWaitStart wait event to IPC. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/1e3e8b51bda8ddd59984230f876f199c9ce3166a

  • Log when GetNewOidWithIndex() fails to find unused OID many times. GetNewOidWithIndex() generates a new OID one by one until it finds one not in the relation. If there are very long runs of consecutive existing OIDs, GetNewOidWithIndex() needs to iterate many times in the loop to find unused OID. Since TOAST table can have a large number of entries and there can be such long runs of OIDs, there is the case where it takes so many iterations to find new OID not in TOAST table. Furthermore if all (i.e., 2^32) OIDs are already used, GetNewOidWithIndex() enters something like busy loop and repeats the iterations until at least one OID is marked as unused. There are some reported troubles caused by a large number of iterations in GetNewOidWithIndex(). For example, when inserting a billion of records into the table, all the backends doing that insertion operation got hang with 100% CPU usage at some point. Previously there was no easy way to detect that GetNewOidWithIndex() failed to find unused OID many times. So, for example, gdb full backtrace of hanged backends needed to be taken, in order to investigate that trouble. This is inconvenient and may not be available in some production environments. To provide easy way for that, this commit makes GetNewOidWithIndex() log that it iterates more than GETNEWOID_LOG_THRESHOLD but have not yet found OID unused in the relation. Also this commit makes it repeat logging with exponentially increasing intervals until it iterates more than GETNEWOID_LOG_MAX_INTERVAL, and makes it finally repeat logging every GETNEWOID_LOG_MAX_INTERVAL unless an unused OID is found. Those macro variables are used not to fill up the server log with the similar messages. In the discusion at pgsql-hackers, there was another idea to report the lots of iterations in GetNewOidWithIndex() via wait event. But since GetNewOidWithIndex() traverses indexes to find unused OID and which will do I/O, acquire locks, etc, which will overwrite the wait event and reset it to nothing once done. So that idea doesn't work well, and we didn't adopt it. Author: Tomohiro Hiramitsu Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/7fbcee1b2d5f1012c67942126881bd492e95077e

  • Rename wait event WalrcvExit to WalReceiverExit. Commit de829ddf23 added wait event WalrcvExit. But its name is not consistent with other wait events like WalReceiverMain or WalReceiverWaitStart, etc. So this commit renames WalrcvExit to WalReceiverExit. Author: Fujii Masao Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/84007043fc1b1be68dad5d0a78269347c12094b6

  • Improve connection denied error message during recovery. Previously when an archive recovery or a standby was starting and reached the consistent recovery state but hot_standby was configured to off, the error message when a client connectted was "the database system is starting up", which was needless confusing and not really all that accurate either. This commit improves the connection denied error message during recovery, as follows, so that the users immediately know that their servers are configured to deny those connections.

  • If hot_standby is disabled, the error message "the database system is not accepting connections" and the detail message "Hot standby mode is disabled." are output when clients connect while an archive recovery or a standby is running. - If hot_standby is enabled, the error message "the database system is not yet accepting connections" and the detail message "Consistent recovery state has not been yet reached." are output when clients connect until the consistent recovery state is reached and postmaster starts accepting read only connections. This commit doesn't change the connection denied error message of "the database system is starting up" during normal server startup and crash recovery. Because it's still suitable for those situations. Author: James Coleman Reviewed-by: Alvaro Herrera, Andres Freund, David Zhang, Tom Lane, Fujii Masao Discussion: https://postgr.es/m/CAAaqYe8h5ES_B=F_zDT+Nj9XU7YEwNhKhHA2RE4CFhAQ93hfig@mail.gmail.com https://git.postgresql.org/pg/commitdiff/df9384492b89aac370ab9d12eb89375aeb38a1d4

  • Fix bug in WAL replay of COMMIT_TS_SETTS record. Previously the WAL replay of COMMIT_TS_SETTS record called TransactionTreeSetCommitTsData() with the argument write_xlog=true, which generated and wrote new COMMIT_TS_SETTS record. This should not be acceptable because it's during recovery. This commit fixes the WAL replay of COMMIT_TS_SETTS record so that it calls TransactionTreeSetCommitTsData() with write_xlog=false and doesn't generate new WAL during recovery. Back-patch to all supported branches. Reported-by: lx zou zoulx1982@163.com Author: Fujii Masao Reviewed-by: Alvaro Herrera Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/438fc4a39c3905b7af88bb848bc5aeb1308a017d

Robert Haas pushed:

Tomáš Vondra pushed:

Bruce Momjian pushed:

Amit Kapila pushed:

Peter Eisentraut pushed:

Stephen Frost pushed:

Michael Meskes pushed:

Álvaro Herrera pushed:

  • Remove StoreSingleInheritance reimplementation. I introduced this duplicate code in commit 8b08f7d4820f for no good reason. Remove it, and backpatch to 11 where it was introduced. Author: Álvaro Herrera alvherre@alvh.no-ip.org https://git.postgresql.org/pg/commitdiff/a24ae3d7b9efb3b113c0d53030aa99de0d41b40a

  • Rework HeapTupleHeader macros to reuse itemptr.h. The original definitions pointlessly disregarded existing ItemPointer macros that do the same thing. Reported-by: Michael Paquier michael@paquier.xyz Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/4669cacbd4b4b1baa1b7f2ea53d461433a1b6276

  • Let ALTER TABLE Phase 2 routines manage the relation pointer. Struct AlteredRelationInfo gains a new Relation member, to be used only by Phase 2 (ATRewriteCatalogs); this allows ATExecCmd() subroutines open and close the relation internally. A future commit will use this facility to implement an ALTER TABLE subcommand that closes and reopens the relation across transaction boundaries. (It is possible to keep the relation open past phase 2 to be used by phase 3 instead of having to reopen it that point, but there are some minor complications with that; it's not clear that there is much to be won from doing that, though.) Author: Álvaro Herrera alvherre@alvh.no-ip.org Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/cd03c6e94b09ff402cbc3ce8da5587f09f0b5e58

  • Add comments for AlteredTableInfo->rel. The prior commit which introduced it was pretty squalid in terms of code documentation, so add some comments. https://git.postgresql.org/pg/commitdiff/cc121d5596964f8aac93607e6f14607184558b16

  • Document lock obtained during partition detach. On partition detach, we acquire a SHARE lock on all tables that reference the partitioned table that we're detaching a partition from, but failed to document this fact. My oversight in commit f56f8f8da6af. Repair. Backpatch to 12. Author: Álvaro Herrera alvherre@alvh.no-ip.org Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/650d623530c884c087c565f1d3b8cd76f8fe2b95

  • ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY. Allow a partition be detached from its partitioned table without blocking concurrent queries, by running in two transactions and only requiring ShareUpdateExclusive in the partitioned table. Because it runs in two transactions, it cannot be used in a transaction block. This is the main reason to use dedicated syntax: so that users can choose to use the original mode if they need it. But also, it doesn't work when a default partition exists (because an exclusive lock would still need to be obtained on it, in order to change its partition constraint.) In case the second transaction is cancelled or a crash occurs, there's ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final steps. The main trick to make this work is the addition of column pg_inherits.inhdetachpending, initially false; can only be set true in the first part of this command. Once that is committed, concurrent transactions that use a PartitionDirectory will include or ignore partitions so marked: in optimizer they are ignored if the row is marked committed for the snapshot; in executor they are always included. As a result, and because of the way PartitionDirectory caches partition descriptors, queries that were planned before the detach will see the rows in the detached partition and queries that are planned after the detach, won't. A CHECK constraint is created that duplicates the partition constraint. This is probably not strictly necessary, and some users will prefer to remove it afterwards, but if the partition is re-attached to a partitioned table, the constraint needn't be rechecked. Author: Álvaro Herrera alvherre@alvh.no-ip.org Reviewed-by: Amit Langote amitlangote09@gmail.com Reviewed-by: Justin Pryzby pryzby@telsasoft.com Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/71f4c8c6f74ba021e55d35b1128d22fb8c6e1629

Pending Patches

Kyotaro HORIGUCHI sent in another revision of a patch to make async replica wait for the lsn to be replayed.

Bharath Rupireddy sent in a patch to remove extra memset calls in BloomInitPage, GinInitPage, and SpGistInitPage.

Hou Zhijie sent in another revision of a patch to avoid CommandCounterIncrement in RI trigger when INSERT INTO referencing table.

Amul Sul sent in another revision of a patch to build the infrastructure that will support ALTER SYSTEM READ {ONLY | WRITE}.

Amit Langote sent in another revision of a patch to fix an infelicity between UPDATE ... RETURNING and moving tuples between partitions.

Greg Nancarrow sent in another revision of a patch to enable parallel INSERT and or SELECT for INSERT INTO.

Tang sent in another revision of a patch to support tab completion with a query result for upper case character inputs in psql.

Tom Lane sent in another revision of a patch to allow an alias to be attached directly to a JOIN ... USING per the SQL standard.

David Oksman sent in a patch to implement ALTER TABLE ... RENAME COLUMN IF EXISTS.

Andrei Zubkov sent in two revisions of a patch to add statement entry timestamps to pg_stat_statements.

Thomas Munro sent in another revision of a patch to add PSQL_WATCH_PAGER for psql's \watch command.

Thomas Munro sent in four more revisions of a patch to detect dropped connections while running queries.

Fujii Masao sent in a patch intended to fix a bug that manifested as Failed assertion on standby while shutdown by making the startup process call ShutdownRecoveryTransactionEnvironment() when it exits.

Peter Eisentraut sent in another revision of a patch to add a result_format_auto_binary_types setting.

Jan Wieck sent in three more revisions of a patch to fix pg_upgrade to preserve datdba.

Bertrand Drouvot sent in four more revisions of a patch to implement minimal logical decoding on standbys.

Pavel Stěhule sent in two more revisions of a patch to implement schema variables.

Marcus Wanner sent in two more revisions of a patch to add an xid argument to the filter_prepare callback for output plugins.

Amul Sul sent in two more revisions of a patch to add an RelationGetSmgr() inline function.

Peter Smith and Amit Kapila traded patches to add logical decoding of two-phase transactions.

Euler Taveira de Oliveira and Peter Eisentraut traded patches to add row filtering for logical replication.

Kyotaro HORIGUCHI sent in another revision of a patch to change the stats collector's temporary storage from files to shared memory.

Masahiro Ikeda and Fujii Masao traded patches to make the WAL receiver report WAL statistics.

Bruce Momjian and Julien Rouhaud traded patches to expose queryid in pg_stat_activity, log_line_prefix, and verbose explain.

Atsushi Torikoshi sent in four more revisions of a patch to add a function, pg_get_backend_memory_contexts(), which does what it says on the label.

Daniel Gustafsson sent in two more revisions of a patch to support NSS as a libpq TLS backend.

Michaël Paquier and Jeevan Chalke traded patches to log authenticated identity from all auth backends.

Stephen Frost sent in another revision of a patch to use a WaitLatch for vacuum/autovacuum sleeping.

Stephen Frost sent in three more revisions of a patch to add a documentation stub for the now obsolete recovery.conf.

Justin Pryzby sent in another revision of a patch to add an optional ACCESS METHOD to CREATE TABLE ... LIKE.

Takayuki Tsunakawa sent in two more revisions of a patch to speed up COPY FROM for the case of remote partitions.

Amit Langote sent in another revision of a patch to create foreign key triggers in partitioned tables, and use this to enforce foreign key correctly during cross-partition updates.

David Rowley sent sent in two more revisions of a patch to add a Result Cache executor node.

Li Japin sent in another revision of a patch to implement ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION.

Tomáš Vondra sent in a patch to fix up an opclass storage type.

Fujii Masao sent in another revision of a patch to rename WalrcvExit wait_event to WalReceiverExit.

Andrey V. Lepikhov sent in another revision of a patch to implement global snapshots.

Atsushi Torikoshi sent in two more revisions of a patch to add plan type to pg_stat_statements.

Denis Hirn sent in a patch to allow multiple recursive self-references in WITH RECURSIVE.

Masahiro Ikeda and Fujii Masao traded patches to get pgstat to avoid writing on SIGQUIT.

Kyotaro HORIGUCHI sent in another revision of a patch to protect syscache from bloating with negative cache entries.

Hou Zhijie sent in another revision of a patch to add a nullif case for eval_const_expressions.

Mark Dilger and Robert Haas traded patches to add a pg_amcheck contrib application.

Daniel Gustafsson sent in another revision of a patch to refactor the SSL test harness to allow for multiple libraries.

Pavel Stěhule sent in two more revisions of a patch to add routine labels.

Thomas Munro sent in four more revisions of a patch to make all SLRU buffer sizes configurable.

Peter Geoghegan and Masahiko Sawada traded patches to centralize state for each VACUUM, break lazy_scan_heap() up into functions, remove the tupgone special case from vacuumlazy.c, and skip index vacuuming in some cases.

Kyotaro HORIGUCHI sent in another revision of a patch to implement in-place table persistence change and add a new command, ALTER TABLE ALL IN TABLESPACE SET LOGGED/UNLOGGED, to go with it.

Ashutosh Bapat sent in another revision of a patch to fix a memory leak in decoding speculative inserts with TOAST.

Ekaterina Sokolova sent in another revision of a patch to add extra statistics to explain for Nested Loop.

Pavel Borisov sent in two more revisions of a patch to implement covering SP-GiST indexes, i.e. support for INCLUDE columns.

Marcus Wanner add a concurrent_abort callback for the output plugin.

Joel Jacobson sent in another revision of a patch to add views pg_permissions and pg_ownerships.

Bharath Rupireddy sent in another revision of a patch to make the error messages while adding tables to publications a bit more informative and consistent.

Kyotaro HORIGUCHI sent in another revision of a intended to fix a bug that manifested as Walsender may fail to send wal to the end.

Jim Finnerty sent in another revision of a patch to add a capability to have 64-bit GUCs, use XID_FMT to format xids, and use ClogPageNumber in place of int for type safety.

Sven Klemm sent in a patch to allow CustomScan nodes to signal whether they support projection.

Andrew Dunstan and Nikita Glukhov traded patches to implement the JSON_TABLE part of SQL/JSON.

Andrew Dunstan and Nikita Glukhov traded patches to implement the functions part of SQL/JSON.

Amit Langote and Tom Lane traded patches to make updates in inheritance trees scale better by overhauling how updates compute new tuples, and revise how inherited update/delete are handled.

David Steele sent in two revisions of a patch to document the fact that backup labels may need to be opened in binary mode on Windows.

Cai Mengjuan sent in a patch to update walrcv->flushedUpto each time when requesting xlog streaming.

Andrew Dunstan sent in another revision of a patch to allow matching the whole DN from a client certificate.

Masahiro Ikeda sent in a patch to improve the performance of reporting wal stats.

Tomáš Vondra sent in a patch to show applied extended statistics in explain.

Noah Misch sent in another revision of a patch to add a public schema default ACL.

Lætitia Avrot sent in two revisions of a patch to make it possible to dump only functions using pg_dump.

Noah Misch sent in another revision of a patch to accept slightly-filled pages for tuples larger than fillfactor.

Álvaro Herrera sent in two more revisions of a patch to add tracing capability to libpq.

Kazutaka Onishi sent in another revision of a patch to make TRUNCATE on foreign tables work.

Andrew Dunstan sent in another revision of a patch to implement global temporary tables.

Yoan SULTAN sent in a patch to make it possible for pg_stat_statements to track the most recent statement.

David Rowley sent in another revision of a patch to get better results from valgrind leak tracking.