PostgreSQL

(September 2019)

https://www.postgresql.org/docs/

In addition to the excellent web docs and unix man pages, the psql client has two forms of help: \h for SQL commands and \? for psql commands.

mydb=> help
You are using psql, the command-line interface to PostgreSQL.
Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

🐚 ~ $ sudo apt install postgresql
🐚 ~ $ sudo -u postgres psql
🐚 ~ $ sudo -u postgres createuser -e --interactive --pwprompt paul
🐚 ~ $ psql -h 127.0.0.1 -U paul -W -d template1
template1=> CREATE DATABASE mydb;
CREATE DATABASE
template1=> \connect mydb
Password for user paul: 
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
You are now connected to database "mydb" as user "paul".
mydb=> \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 mydb      | paul     | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

mydb=> \quit

A cluster can hold many databases.
Databases can hold many schemas.
Every database starts with a default schema named public.
Coming from MySQL, it’s mostly safe to ignore schemas (just use the default, implicit public schema) except for the purpose of granting permissions.

=> REVOKE ALL ON DATABASE mydb FROM public;
=> GRANT CONNECT ON DATABASE mydb TO mygrp;
=> GRANT USAGE ON SCHEMA public TO mygrp;
=> GRANT ALL ON ALL TABLES IN SCHEMA public TO mygrp;
=> GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO mygrp;
=> ALTER DEFAULT PRIVILEGES FOR ROLE myusr IN SCHEMA public
->   GRANT ALL ON TABLES TO mygrp;
=> ALTER DEFAULT PRIVILEGES FOR ROLE myusr IN SCHEMA public
->   GRANT ALL ON SEQUENCES TO mygrp;
=> GRANT mygrp TO myusr;
=> \du
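
One way to check that grants like these took effect is to ask the server with its built-in privilege inquiry functions. A minimal sketch, assuming the mydb, mygrp and myusr names from the example above:

```sql
-- Run while connected to mydb. Each function returns t (true) or f (false).
-- Privileges inherited through group membership count toward the answer.
SELECT has_database_privilege('myusr', 'mydb', 'CONNECT');
SELECT has_schema_privilege('myusr', 'public', 'USAGE');

-- List the roles that myusr is a direct member of.
SELECT r.rolname AS member_of
FROM pg_auth_members m
JOIN pg_roles r ON r.oid = m.roleid
JOIN pg_roles u ON u.oid = m.member
WHERE u.rolname = 'myusr';
```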

Use of groups isn’t required.

🐚 ~ $ sudo -u postgres psql
psql (11.5 (Debian 11.5-1+deb10u1))
Type "help" for help.

postgres=# CREATE DATABASE mydb;
CREATE DATABASE
postgres=# CREATE USER myuser WITH ENCRYPTED PASSWORD 'mysecret';
CREATE ROLE
postgres=# GRANT ALL PRIVILEGES ON DATABASE mydb TO myuser;
GRANT
postgres=# \q
🐚 ~ $ psql -h 127.0.0.1 -W -U myuser -d mydb
Password: 
psql (11.5 (Debian 11.5-1+deb10u1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

mydb=>

Import a file? Either of these:

mydb=> \i create.sql
🐚 ~ $ sudo -u postgres psql mydb < dumpfile.sql

Template Databases

https://www.postgresql.org/docs/current/manage-ag-templatedbs.html

CREATE DATABASE actually works by copying an existing database. By default, it copies the standard system database named template1. Thus that database is the “template” from which new databases are made. If you add objects to template1, these objects will be copied into subsequently created user databases. This behavior allows site-local modifications to the standard set of objects in databases. For example, if you install the procedural language PL/Perl in template1, it will automatically be available in user databases without any extra action being taken when those databases are created.

There is a second standard system database named template0. This database contains the same data as the initial contents of template1, that is, only the standard objects predefined by your version of PostgreSQL. template0 should never be changed after the database cluster has been initialized. By instructing CREATE DATABASE to copy template0 instead of template1, you can create a “virgin” user database that contains none of the site-local additions in template1. This is particularly handy when restoring a pg_dump dump: the dump script should be restored in a virgin database to ensure that one recreates the correct contents of the dumped database, without conflicting with objects that might have been added to template1 later on.

Another common reason for copying template0 instead of template1 is that new encoding and locale settings can be specified when copying template0, whereas a copy of template1 must use the same settings it does. This is because template1 might contain encoding-specific or locale-specific data, while template0 is known not to.

To create a database by copying template0, use:

=> CREATE DATABASE dbname TEMPLATE template0;

from the SQL environment, or from the shell:

🐚 ~ $ createdb -T template0 dbname
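
As a sketch of the encoding and locale point above: copying template0 lets the new database pick its own settings. The database name and locale here are made up for illustration, and the locale must already be installed on the server's operating system.

```sql
-- Hypothetical name and locale; adjust both to your system.
CREATE DATABASE german_db
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'de_DE.UTF-8'
    LC_CTYPE 'de_DE.UTF-8';
```

Choosing a different encoding or locale with TEMPLATE template1 is what the paragraph above says is not allowed.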

Backup and Restore

https://www.postgresql.org/docs/current/backup.html

🐚 ~ $ sudo -u postgres pg_dump mydb > dumpfile.sql
🐚 ~ $ sudo -u postgres psql mydb < dumpfile.sql

The file created by pg_dump does not recreate the database. When restoring from scratch, manually create the database before restoring the dump file.

The dumps produced by pg_dump are relative to template0. This means that any languages, procedures, etc. added via template1 will also be dumped by pg_dump. As a result, when restoring, if you are using a customized template1, you must create the empty database from template0.

After restoring a backup, it is wise to run ANALYZE on each database so the query optimizer has useful statistics.
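
A minimal sketch of that step, run from psql while connected to the restored database (repeat per database):

```sql
-- With no table name, ANALYZE processes every table in the current
-- database that the current role is allowed to analyze.
ANALYZE;

-- Same, but prints a progress message for each table.
ANALYZE VERBOSE;
```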

pg_dump output also does not include roles or tablespaces, since they belong to the cluster as a whole rather than to any one database.

Use pg_dumpall to back up the entire contents of a database cluster:

🐚 ~ $ sudo -u postgres pg_dumpall > dumpfile
🐚 ~ $ sudo -u postgres psql -f dumpfile postgres

(Actually, you can specify any existing database name to start from, but if you are loading into an empty cluster then postgres should usually be used.)

Since pg_dump writes to STDOUT, things like this work:

🐚 ~ $ pg_dump dbname | gzip > filename.gz
🐚 ~ $ gunzip -c filename.gz | psql dbname

Routine Maintenance

Vacuuming:

recovers space from updated or deleted rows
updates stats used by the PostgreSQL query planner
updates the visibility map to speed up index-only scans
guards against data loss from transaction ID and multixact ID wraparound

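Two types of vacuuming:

Normal VACUUM can run during regular production database operations, although a table's definition cannot be changed (for example with ALTER TABLE) while it is being vacuumed.
VACUUM FULL reclaims more space, but runs much more slowly and holds an exclusive lock on each table while it works.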

PostgreSQL offers an optional but recommended autovacuum feature that automates VACUUM and ANALYZE. Because it uses the stats to optimize its operations, track_counts must be set to true for autovacuum to work.

A persistent daemon, the autovacuum launcher, starts autovacuum worker processes.

On Debian, the postgresql package appears to enable autovacuum by default.
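
The vacuuming notes above might look like this in practice. This is a sketch only; widgets stands in for any table name (it is the example table from the Sequences section of these notes).

```sql
-- Plain VACUUM: marks dead row space as reusable and runs alongside
-- normal reads and writes.
VACUUM widgets;

-- VACUUM plus a refresh of the planner statistics, with progress output.
VACUUM (VERBOSE, ANALYZE) widgets;

-- Rewrites the whole table to hand space back to the operating system;
-- holds an exclusive lock on the table for the duration.
VACUUM FULL widgets;

-- Check that autovacuum and the statistics it depends on are enabled.
SHOW autovacuum;
SHOW track_counts;

-- When autovacuum last processed each table (NULL if it never has).
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
```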

Reindexing:

An index page remains allocated even if all but a few of its index keys have been deleted. Reindexing reclaims the space held by such mostly empty pages.
Indexes become less efficient after being heavily modified. Reindexing improves speed by moving logically adjacent pages to also be physically adjacent.

REINDEX can be used safely and easily, although the command requires an exclusive table lock.
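
For illustration, the common forms of the command. The table name widgets and its primary key index widgets_pkey are assumptions borrowed from the Sequences example in these notes.

```sql
-- Rebuild a single index.
REINDEX INDEX widgets_pkey;

-- Rebuild every index on one table.
REINDEX TABLE widgets;

-- Rebuild every index in the current database; the name given
-- must match the database you are connected to.
REINDEX DATABASE mydb;
```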

Make sure log rotation works. PostgreSQL has a built-in log rotation feature, or use the system log rotation. If the system has syslog already set up to handle log rotation, it may be desirable to set log_destination to syslog in postgresql.conf.

Schemas

https://www.postgresql.org/docs/current/ddl-schemas.html

A PostgreSQL database cluster contains one or more named databases. Users and groups of users are shared across the entire cluster, but no other data is shared across databases. Any given client connection to the server can access only the data in a single database, the one specified in the connection request.

Note: Users of a cluster do not necessarily have the privilege to access every database in the cluster. Sharing of user names means that there cannot be different users named, say, joe in two databases in the same cluster; but the system can be configured to allow joe access to only some of the databases.

A database contains one or more named schemas, which in turn contain tables. Schemas also contain other kinds of named objects, including data types, functions, and operators. The same object name can be used in different schemas without conflict; for example, both schema1 and myschema can contain tables named mytable. Unlike databases, schemas are not rigidly separated: a user can access objects in any of the schemas in the database they are connected to, if they have privileges to do so.

There are several reasons why one might want to use schemas:

To allow many users to use one database without interfering with each other.

To organize database objects into logical groups to make them more manageable.

Third-party applications can be put into separate schemas so they do not collide with the names of other objects.

Schemas are analogous to directories at the operating system level, except that schemas cannot be nested.

CREATE SCHEMA myschema;
CREATE TABLE myschema.mytable (
	…
);
CREATE SCHEMA schema_name AUTHORIZATION user_name;
DROP SCHEMA myschema CASCADE;

By default tables (and other objects) are automatically put into a schema named “public”. Every new database contains such a schema. Thus, the following are equivalent:

CREATE TABLE products ( … );
CREATE TABLE public.products ( … );

Qualified names are tedious to write, and it’s often best not to wire a particular schema name into applications anyway. Therefore tables are often referred to by unqualified names, which consist of just the table name. The system determines which table is meant by following a search path, which is a list of schemas to look in. The first matching table in the search path is taken to be the one wanted. If there is no match in the search path, an error is reported, even if matching table names exist in other schemas in the database.

The ability to create like-named objects in different schemas complicates writing a query that references precisely the same objects every time. It also opens up the potential for users to change the behavior of other users’ queries, maliciously or accidentally. Due to the prevalence of unqualified names in queries and their use in PostgreSQL internals, adding a schema to search_path effectively trusts all users having CREATE privilege on that schema. When you run an ordinary query, a malicious user able to create objects in a schema of your search path can take control and execute arbitrary SQL functions as though you executed them.
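
One mitigation described in the PostgreSQL docs is to keep ordinary users out of the shared public schema and give each a private one. A sketch, assuming a role named myusr already exists (as in the grants example earlier in these notes):

```sql
-- Stop ordinary users from creating objects in the shared public schema.
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- Give the user a private schema named after the role. The "$user" entry
-- in the default search_path then resolves to it ahead of public.
CREATE SCHEMA myusr AUTHORIZATION myusr;
```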

The first schema named in the search path is called the current schema. Aside from being the first schema searched, it is also the schema in which new tables will be created if the CREATE TABLE command does not specify a schema name.

To show the current search path, use the following command:

SHOW search_path;

In the default setup this returns:

 search_path
--------------
 "$user", public

The first element specifies that a schema with the same name as the current user is to be searched. If no such schema exists, the entry is ignored. The second element refers to the public schema that we have seen already.

The first schema in the search path that exists is the default location for creating new objects. That is the reason that by default objects are created in the public schema. When objects are referenced in any other context without schema qualification (table modification, data modification, or query commands) the search path is traversed until a matching object is found. Therefore, in the default configuration, any unqualified access again can only refer to the public schema.

To put our new schema in the path, we use:

SET search_path TO myschema,public;

(We omit the $user here because we have no immediate need for it.) And then we can access the table without schema qualification:

DROP TABLE mytable;

Also, since myschema is the first element in the path, new objects would by default be created in it.

We could also have written:

SET search_path TO myschema;
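
To see the search path pick between like-named tables, here is a small self-contained sketch. The schema names scratch_a and scratch_b and the table notes are invented for the demonstration.

```sql
CREATE SCHEMA scratch_a;
CREATE SCHEMA scratch_b;
CREATE TABLE scratch_a.notes (note text);
CREATE TABLE scratch_b.notes (note text);
INSERT INTO scratch_a.notes VALUES ('from scratch_a');
INSERT INTO scratch_b.notes VALUES ('from scratch_b');

SET search_path TO scratch_a, scratch_b;
SELECT current_schema();   -- scratch_a
SELECT note FROM notes;    -- from scratch_a (first match in the path wins)

SET search_path TO scratch_b;
SELECT note FROM notes;    -- from scratch_b

-- Clean up and return to the configured default path.
RESET search_path;
DROP SCHEMA scratch_a CASCADE;
DROP SCHEMA scratch_b CASCADE;
```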

In the SQL standard, the notion of objects in the same schema being owned by different users does not exist. Moreover, some implementations do not allow you to create schemas that have a different name than their owner. In fact, the concepts of schema and user are nearly equivalent in a database system that implements only the basic schema support specified in the standard. Therefore, many users consider qualified names to really consist of user_name.table_name. This is how PostgreSQL will effectively behave if you create a per-user schema for every user.

Sequences

A sequence is a database object that generates numbers. PostgreSQL can use sequences for primary keys, similarly to MySQL AUTO_INCREMENT.

CREATE TABLE widgets (
    id    SERIAL PRIMARY KEY,
    name  TEXT,
    age   INT4
);

When used as above, PostgreSQL creates an index on the id column, and assumes the associated sequence is used only to generate values for that column. If we drop the column, PostgreSQL automatically removes the sequence.
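
A short sketch of working with the sequence behind a SERIAL column, assuming the widgets table above was created in the public schema:

```sql
-- Name of the sequence that feeds widgets.id; with default naming
-- this is public.widgets_id_seq.
SELECT pg_get_serial_sequence('widgets', 'id');

-- Leave id out and let the sequence supply it; RETURNING reports
-- the value that was generated.
INSERT INTO widgets (name, age) VALUES ('sprocket', 3) RETURNING id;
```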

However sequences can be created as independent database objects.

CREATE SEQUENCE threes INCREMENT BY 3 MINVALUE 3 MAXVALUE 9 CYCLE;

CREATE TABLE foo (
	name VARCHAR(256),
	three SMALLINT DEFAULT nextval('threes') NOT NULL
);

The threes sequence would persist if we drop table foo.

Sequences are not transactional. If we roll back a transaction, any sequence it advanced stays at the advanced value.
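
A quick way to see this for yourself, using a throwaway sequence (demo_seq is a made-up name):

```sql
CREATE SEQUENCE demo_seq;    -- defaults: starts at 1, increments by 1

BEGIN;
SELECT nextval('demo_seq');  -- 1
ROLLBACK;

SELECT nextval('demo_seq');  -- 2, not 1: the rollback did not rewind it

DROP SEQUENCE demo_seq;
```

This is why gaps in SERIAL ids are normal and not a sign of lost rows.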

https://www.postgresql.org/docs/current/sql-createsequence.html

Search

https://www.postgresql.org/docs/current/textsearch.html

Full text search looks for documents that satisfy a query. The document may be a logical entity comprised of data from multiple tables.
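
A minimal sketch of both ideas. The first query is self-contained. The second assumes two hypothetical tables, messages(mid, title) and docs(did, body), and treats their joined text as one document.

```sql
-- Does the document contain both "cat" and "rat"?
SELECT to_tsvector('english', 'a fat cat sat on a mat and ate a fat rat')
       @@ to_tsquery('english', 'cat & rat');   -- t

-- A document assembled from two tables (hypothetical schema).
SELECT m.mid
FROM messages m
JOIN docs d ON d.did = m.mid
WHERE to_tsvector('english', coalesce(m.title, '') || ' ' || coalesce(d.body, ''))
      @@ to_tsquery('english', 'cat & rat');
```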