PostgreSQL Weekly News - March 7, 2021

Posted on 2021-03-08 by PWN
PWN

PostgreSQL Weekly News - March 7, 2021

PostgreSQL Product News

parquet_s3_fdw 0.1, a foreign data wrapper for parquet files on S3, released. https://github.com/pgspider/parquet_s3_fdw/releases/tag/v0.1

PostgreSQL Jobs for March

https://archives.postgresql.org/pgsql-jobs/2021-03/

PostgreSQL in the News

Planet PostgreSQL: https://planet.postgresql.org/

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm PST8PDT to [email protected].

Applied Patches

Thomas Munro pushed:

Amit Kapila pushed:

  • Update the docs and comments for decoding of prepared xacts. Commit a271a1b50e introduced decoding at prepare time in ReorderBuffer. This can lead to deadlock for out-of-core logical replication solutions that uses this feature to build distributed 2PC in case such transactions lock [user] catalog tables exclusively. They need to inform users to not have locks on catalog tables (via explicit LOCK command) in such transactions. Reported-by: Andres Freund Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/b4e3dc7fd420cdc2287f30a638c4affd071f01b2

  • Update docs of logical replication for commit ce0fdbfe97. Forgot to update the logical replication configuration settings page. After commit ce0fdbfe97, table synchronization workers also started using replication origins to track the progress and the same should be reflected in docs. Author: Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KkbppndxxRKbaT2sXrLkdPwy44F4pjEZ0EDrVjD9MPjQ@mail.gmail.com https://git.postgresql.org/pg/commitdiff/cf54e04b9e29a13c5e7c2386c5e2ff785e830414

  • Avoid repeated decoding of prepared transactions after a restart. In commit a271a1b50e, we allowed decoding at prepare time and the prepare was decoded again if there is a restart after decoding it. It was done that way because we can't distinguish between the cases where we have not decoded the prepare because it was prior to consistent snapshot or we have decoded it earlier but restarted. To distinguish between these two cases, we have introduced an initial_consistent_point at the slot level which is an LSN at which we found a consistent point at the time of slot creation. This is also the point where we have exported a snapshot for the initial copy. So, prepare transaction prior to this point are sent along with commit prepared. This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It will break existing slots which is fine in a major release. Author: Ajin Cherian, based on idea by Andres Freund Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/8bdb1332eb51837c15a10a972c179b84f654279e

  • Add option to enable two_phase commits via pg_create_logical_replication_slot. Commit 0aa8a01d04 extends the output plugin API to allow decoding of prepared xacts and allowed the user to enable/disable the two-phase option via pg_logical_slot_get_changes(). This can lead to a problem such that the first time when it gets changes via pg_logical_slot_get_changes() without two_phase option enabled it will not get the prepared even though prepare is after consistent snapshot. Now next time during getting changes, if the two_phase option is enabled it can skip prepare because by that time start decoding point has been moved. So the user will only get commit prepared. Allow to enable/disable this option at the create slot time and default will be false. It will break the existing slots which is fine in a major release. Author: Ajin Cherian Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/19890a064ebf53dedcefed0d8339ed3d449b06e6

  • Clarify the usage of max_replication_slots on the subscriber side. It was not clear in the docs that the max_replication_slots is also used to track replication origins on the subscriber side. Author: Paul Martinez Reviewed-by: Amit Kapila Backpatch-through: 10 where logical replication was introduced Discussion: https://postgr.es/m/CACqFVBZgwCN_pHnW6dMNCrOS7tiHCw6Retf_=U2Vvj3aUSeATw@mail.gmail.com https://git.postgresql.org/pg/commitdiff/8af3c233e423e106a5121000ecd92abf57c11704

Andres Freund pushed:

  • Fix recovery test hang in 021_row_visibility.pl on windows. The psql processes were not explicitly killed (but would eventually exit due postgres shutting down). For some reason windows perl doesn't like that, resulting in errors like Warning: unable to close filehandle GEN20 properly: Bad file descriptor during global destruction. The test was introduced in d6734a897e3, so no backpatching necessary. https://git.postgresql.org/pg/commitdiff/1e6e40447115ca7b4749d7d117b81b016ee5e2c2

Tom Lane pushed:

  • Improve reporting for syntax errors in multi-line JSON data. Point to the specific line where the error was detected; the previous code tended to include several preceding lines as well. Avoid re-scanning the entire input to recompute which line that was. Simplify the logic a bit. Add test cases. Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com https://git.postgresql.org/pg/commitdiff/ffd3944ab9d481906137bc7d20f5325a2bd68acc

  • Fix semantics of regular expression back-references. POSIX defines the behavior of back-references thus: The back-reference expression '\n' shall match the same (possibly empty) string of characters as was matched by a subexpression enclosed between "(" and ")" preceding the '\n'. As far as I can see, the back-reference is supposed to consider only the data characters matched by the referenced subexpression. However, because our engine copies the NFA constructed from the referenced subexpression, it effectively enforces any constraints therein, too. As an example, '(^.)\1' ought to match 'xx', or any other string starting with two occurrences of the same character; but in our code it does not, and indeed can't match anything, because the '^' anchor constraint is included in the backref's copied NFA. If POSIX intended that, you'd think they'd mention it. Perl for one doesn't act that way, so it's hard to conclude that this isn't a bug. Fix by modifying the backref's NFA immediately after it's copied from the reference, replacing all constraint arcs by EMPTY arcs so that the constraints are treated as automatically satisfied. This still allows us to enforce matching rules that depend only on the data characters; for example, in '(^\d+).*\1' the NFA matching step will still know that the backref can only match strings of digits. Perhaps surprisingly, this change does not affect the results of any of a rather large corpus of real-world regexes. Nonetheless, I would not consider back-patching it, since it's a clear compatibility break. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/4aea704a5bfd4b5894a268499369ccab89940c9c

  • Improve performance of regular expression back-references. In some cases, at the time that we're doing an NFA-based precheck of whether a backref subexpression can match at a particular place in the string, we already know which substring the referenced subexpression matched. If so, we might as well forget about the NFA and just compare the substring; this is faster and it gives an exact rather than approximate answer. In general, this optimization can help while we are prechecking within the second child expression of a concat node, while the capture was within the first child expression; then the substring was saved during cdissect() of the first child and will be available to NFA checks done while cdissect() recurses into the second child. It can help quite a lot if the tree looks like concat / \ capture concat / \ expensive stuff backref as we will be able to avoid recursively dissecting the "expensive stuff" before discovering that the backref isn't satisfied with a particular midpoint that the lower concat node is testing. This doesn't help if the concat tree is left-deep, as the capture node won't get set soon enough (and it's hard to fix that without changing the engine's match behavior). Fortunately, right-deep concat trees are the common case. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/0c3405cf11a12da1a4278c6833f4d979fe06c866

  • Suppress unnecessary regex subre nodes in a couple more cases. This extends the changes made in commit cebc1d34e, teaching parseqatom() to generate fewer or cheaper subre nodes in some edge cases. The case of interest here is a quantified atom that is "messy" only because it has greediness opposite to what preceded it (whereas captures and backrefs are intrinsically messy). In this case we don't need an iteration node, since we don't care where the sub-matches of the quantifier are; and we might also not need a second concatenation node. This seems of only marginal real-world use according to my testing, but I wanted to get it in before wrapping up this series of regex performance fixes. Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/4604f83fdfe030a2f1984159ede5270c1d825310

  • Mark default_transaction_read_only as GUC_REPORT. This allows clients to find out the setting at connection time without having to expend a query round trip to do so; which is helpful when trying to identify read/write servers. (One must also look at in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.) Modifying libpq to make use of this will come soon, but I felt it cleaner to push the server change separately. Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times by Laurenz Albe, Takayuki Tsunakawa, Peter Smith. Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com https://git.postgresql.org/pg/commitdiff/d16f8c8e416d288bd4734ed5f14076b62ec8d153

  • Extend the abilities of libpq's target_session_attrs parameter. In addition to the existing options of "any" and "read-write", we now support "read-only", "primary", "standby", and "prefer-standby". "read-write" retains its previous meaning of "transactions are read-write by default", and "read-only" inverts that. The other three modes test specifically for hot-standby status, which is not quite the same thing. (Setting default_transaction_read_only on a primary server renders it read-only to this logic, but not a standby.) Furthermore, if talking to a v14 or later server, no extra network round trip is needed to detect the session's status; the GUC_REPORT variables delivered by the server are enough. When talking to an older server, a SHOW or SELECT query is issued to detect session read-only-ness or server hot-standby state, as needed. Haribabu Kommi, Greg Nancarrow, Vignesh C, Tom Lane; reviewed at various times by Laurenz Albe, Takayuki Tsunakawa, Peter Smith. Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com https://git.postgresql.org/pg/commitdiff/ee28cacf619f4d9c23af5a80e1171a5adae97381

  • Silence perlcritic warning in commit ee28cacf6. Per buildfarm; this fix is from Michael Paquier (vignesh C proposed nearly the same). Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/d422a2a94b1e7a7bb25da9d5511fffff750c3f21

  • Make test_target_session_attrs more robust against connection failure. Feed the desired command to psql via "-c" not stdin, else Perl may complain that it can't push stdin to an already-failed psql process, as seen in intermittent buildfarm failures. Make some minor cosmetic improvements while at it. Before commit ee28cacf6 we had no tests here that expected failure to connect, so there seems no need for a back-patch. Discussion: https://postgr.es/m/CALDaNm2mo8YED=M2ZJKGf1U3F3mw6SaQuLXWCK8rZP6sECYcrA@mail.gmail.com https://git.postgresql.org/pg/commitdiff/3769e11a31831fc2f3bd4c4a24b4f45c352fb8fb

  • Add trim_array() function. This has been in the SQL spec since 2008. It's a pretty thin wrapper around the array slice functionality, but the spec says we should have it, so here it is. Vik Fearing, reviewed by Dian Fay Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/0a687c8f103d217ff1ca8c34a644b380d89bb0ad

  • Remove deprecated containment operators for contrib types. Since PG 8.2, @ and ~ have been deprecated aliases for the containment operators @> and <@. It seems like enough time has passed to actually remove them, so do so. This completes the project begun in commit 2f70fdb06. Note that in the core types, the relation to the preferred operator names was reversed from what it is in these contrib modules. The confusion that induced was a large part of the reason for deprecation. Justin Pryzby Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/112d411fbeb56afd18c117e20b524a86afc9aba5

  • Doc: remove obsolete entries in table of BRIN strategy numbers. See commit 591d282e8. Noted by Justin Pryzby Discussion: https://postgr.es/m/[email protected] https://git.postgresql.org/pg/commitdiff/0ce4cd04da558178b0186057b721c50a00b7a945

  • Add binary I/O capability for cube datatype. We can adjust the not-yet-released cube--1.4--1.5.sql upgrade rather than making a whole new version. KaiGai Kohei Discussion: https://postgr.es/m/CAOP8fzZO4y60QPTK=RGDXeVeVHV9tLHKOsh7voUOoUouVCPV8A@mail.gmail.com https://git.postgresql.org/pg/commitdiff/1265a9c8f871cfa5aea4d607b1ab72b45b96dbb7

Michaël Paquier pushed:

Álvaro Herrera pushed:

Peter Geoghegan pushed:

  • Fix nbtree page deletion error messages. Adjust some "can't happen" error messages that assumed that the page deletion target page must be a half-dead page. This assumption was wrong in the case of an internal target page. Simply refer to these pages as the target page instead. Internal pages are never marked half-dead. There is exactly one half-dead page for each subtree undergoing deletion. The half-dead page is also the target subtree's leaf-level page. This has been the case since commit efada2b8, which totally overhauled nbtree page deletion. https://git.postgresql.org/pg/commitdiff/3d8d5787a358156edaa7782f0c88e231af974a01

  • nbtree page deletion: Add leaftopparent assertion. Add documenting assertion. This makes it easier to follow how we maintain the top parent link in target subtree's half-dead/leaf level page. https://git.postgresql.org/pg/commitdiff/5b2f2af3d9d57626b9052a05337e32ad1399093d

Peter Eisentraut pushed:

Heikki Linnakangas pushed:

Fujii Masao pushed:

Andrew Dunstan pushed:

Magnus Hagander pushed:

Pending Patches

Pavel Stěhule sent in another revision of a patch to implement schema variables.

Maxim Orlov sent in a patch to fix an infelicity between TLS and multiple hosts in libpq connect strings.

Thomas Munro sent in two more revisions of a patch to add a condition variable for walreceiver state, another for for recovery pause/resume, and poll the postmaster less frequently in recovery.

Ajin Cherian and Amit Kapila traded patches to add an option to enable two_phase commits via pg_create_logical_replication_slot.

Michael Banck sent in a patch to add a --create-only option to pg_dump/pg_dumpall.

Michaël Paquier sent in another revision of a patch to clarify the documentation of signal handling in the archiver.

Etsuro Fujita sent in another revision of a patch to implement asynchronous append on postgres_fdw nodes.

Peter Eisentraut sent in another revision of a patch to deal with macOS SIP in builds.

Laurenz Albe and Amit Langote traded patches to allow setting parallel_workers on partitioned tables.

Joel Jacobson sent in three revisions of a patch to implement regexp_positions(), which returns the starting and ending positions of each match.

Thomas Munro sent in two more revisions of a patch to provide a new optional GUC that can be used to check whether the client connection has gone away periodically while running very long queries.

Paul Förster sent in another revision of a patch to mention URIs and services in psql --help's output.

Peter Smith and Ajin Cherian traded patches to implement logical decoding of two-phase transactions.

Greg Nancarrow and Amit Kapila traded patches to enable parallel SELECT for "INSERT INTO ... SELECT ...", add a new boolean GUC, enable_parallel_dml, and a similar table option, parallel_dml_enabled.

Dian M Fay sent in two revisions of a patch to suppress explicit casts of text constants in postgres_fdw when the other side of a binary OpExpr is also text.

Dilip Kumar sent in three more revisions of a patch to add a compression method option for tables.

Julien Rouhaud sent in another revision of a patch to move query jumbling from pg_stat_statements into core, expose queryid in pg_stat_activity and log_line_prefix, and expose query identifier in EXPLAIN ... VERBOSE.

Bruce Momjian sent in a patch to fix up some GiST code comments.

Vik Fearing sent in another revision of a patch to implement GROUP BY DISTINCT, which elides duplicated GROUPING SETs.

Mark Dilger sent in another revision of a patch to add a contrib extension, pg_amcheck.

Pavel Borisov sent in another revision of a patch to make amcheck checking a UNIQUE constraint for btree indexes.

Jacob Champion sent in another revision of a patch to add an API to the table AM to accept a column projection list.

Joel Jacobson sent in two revisions of a patch to support empty ranges with bounds information.

Tomáš Vondra sent in three more revisions of a patch to implement BRIN multi-range indexes.

Peter Eisentraut sent in another revision of a patch to implement SQL-standard function body.

Thomas Munro sent in two more revisions of a patch to implement parallel full hash join.

Magnus Hagander sent in five revisions of a patch to add PROXY protocol support.

Vigneshwaran C sent in a patch to fix a Buildfarm failure in crake by separating a declaration from an assignment in Perl code.

Michaël Paquier sent in another revision of a patch to enable the libcrypto callbacks before checking whether SSL needs to be done.

Daniel Gustafsson sent in another revision of a patch to support NSS as a libpq TLS backend.

Gilles Darold sent in a patch to implement regexp_count, regexp_instr, regexp_substr and regexp_replace.

David Rowley sent in another revision of a patch to reduce the number of special cases to build contrib modules on Windows.

Amit Langote sent in another revision of a patch to make UPDATE and DELETE scale better on inheritance trees.

John Naylor sent in a patch to review some alternatives to the current popcount implementations.

Mark Rofail sent in another revision of a patch to implement foreign key arrays.

Pavel Stěhule sent in three more revisions of a patch to make psql able to use a pager for \watch.

Tom Lane sent in another revision of a patch to remove support for COPY FROM STDIN in protocol version 2.

Mark Dilger sent in two revisions of a patch to add a pg_amcheck contrib application.

Soumyadeep Chakraborty and Kyotaro HORIGUCHI traded patches to prevent the checkpointer from writing to older timeline.

Amit Kapila sent in another revision of a patch to track replication origin progress for rollbacks.

Masahiro Ikeda and Fujii Masao traded patches to add WAL write/fsync statistics to pg_stat_wal.

Justin Pryzby and Takayuki Tsunakawa traded patches to speed up COPY FROM on tables with foreign partitions.

Álvaro Herrera and Justin Pryzby traded patches to add batch/pipelining support for libpq.

Daniel Gustafsson sent in three more revisions of a patch to disallow SSL compression.

Andrey V. Lepikhov and Tom Lane traded patches to remove the 64K rangetable limit.

Robins Tharakan sent in a patch to fix pg_upgrade for large numbers of large objects.

Jacob Champion sent in a patch to rework the sslfiles Makefile target.

Kyotaro HORIGUCHI sent in another revision of a patch to move the callback call from ReadPageInternal to XLogReadRecord, move the page reader out of XLogReadRecord, remove globals readOff, readLen and readSegNo, and allow xlogreader to use different xlog blocksizes.

Andy Fan sent in a patch to adjust the cost model for the partition pruning case.

Kyotaro HORIGUCHI sent in another revision of a patch to make the End-Of-Recovery error less scary.

Heikki Linnakangas sent in another revision of a patch to force lookahead in COPY FROM parsing.

Daniel Gustafsson sent in another revision of a patch to remove deprecated parameters authtype and pqtty from libpq.

Amul Sul sent in another revision of a patch to implement wal prohibit state using a global barrier, and error or Assert before START_CRIT_SECTION for WAL writes.

Thomas Munro sent in a patch to make relfile tombstone files conditional on WAL level.

Amit Langote sent in another revision of a patch to make some cosmetic improvements to the partition pruning step generation code.

Thomas Munro sent in another revision of a patch to replace buffer I/O locks with condition variables.

Thomas Munro sent in two more revisions of a patch to add missing pthread_barrier_t, refactor the way thread portability is done in pgbench, improve pgbench's time measurement code, and synchronize its client threads.

Julien Rouhaud sent in another revision of a patch to fix various shared memory estimates.

Laurenz Albe sent in two more revisions of a patch to improve \e, \ef and \ev if the editor is quit without saving by retaining the current query buffer rather than executing the previous query.

Masahiko Sawada sent in two revisions of a patch to ensure that pg_stat_xxx_tables.n_mod_since_analyze is reset on TRUNCATE.

Tomáš Vondra sent in two more revisions of a patch to make it possible to collect extended statistics on expressions.

Hayato Kuroda sent in another revision of a patch to refactor ECPGconnect and allow IPv6 connections in same.

Kyotaro HORIGUCHI sent in another revision of a patch to delay checkpoint completion after truncate success.

Kota Miyake sent in a patch to remove some warts from pgbench's \sleep metacommand.

Hou Zhijie sent in a patch to avoid CCI in RI trigger when INSERTing an FK relation.

Ibrar Ahmed sent in another revision of a patch to fix tests broken by an earlier patch to make GROUP BY work more efficiently.

Kirk Jamison sent in another revision of a patch to implement tracing in libpq.

Kyotaro HORIGUCHI sent in another revision of a patch to change the stats collector from using files for temporary storage to shared memory.

Andres Freund sent in another revision of a patch to ensure that on Windows, PostgreSQL is only considered to be running as a service if stderr is invalid.

Thomas Munro sent in another revision of a patch to inject fault timing, and fix a race condition in parallel hash join batch cleanup.

Andrew Dunstan sent in a patch to allow matching the whole DN from a client certificate.

KaiGai Kohei and Tom Lane traded patches to add binary input/output handlers to contrib/cube.

Bharath Rupireddy sent in two more revisions of a patch to refactor the code for refreshing materialized views into a group of functions, and implement EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW.

Justin Pryzby sent in a patch to bootstrap that makes it possible to convert a Typ to a List*, and use that to allow composite types in bootstrap.

Kyotaro HORIGUCHI sent in a patch to fix an intermittent failure in 011_crash_recovery.pl by adding a CHECKPOINT.

Joel Jacobson sent in a patch to create a pg_permissions view.

Joel Jacobson sent in a patch to create a pg_ownerships view.

Justin Pryzby sent in another revision of a patch to make it possible to run CLUSTER on partitioned indexes.

Bharath Rupireddy sent in another revision of a patch to add another flavor of pg_terminate_backend that allows people to specify wait and timeout.

Bharath Rupireddy sent in another revision of a patch to improve the "PID XXXX is not a PostgreSQL server process" message by splitting out the case where sending a signal is not allowed.

Li Japin sent in another revision of a patch to implement ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION.

Joe Conway sent in a patch to fix a bug in has_column_privilege()" that got attnums and non-existent columns wrong.

Justin Pryzby sent in another revision of a patch to implement ALTER TABLE SET ACCESS METHOD, allow specifying the table acccess method of partitioned tables to be inherited by partitions, and implement lsyscache get_rel_relam() because a table's AM is now in play.