PostgreSQL (postgres or pg) is an amazing open source relational database that provides the SQL DSL for interacting with data. Installation is a breeze with any package manager, packages to grab:

  • postgresql and postgresql-common: core server
  • postgresql-client-common and postgresql-client: client libs and binaries
  • postgresql-contrib: useful bolt on modules

Once installed, is managed as a daemon by systemd.

$ sudo systemctl start postgresql
$ sudo systemctl stop postgresql
$ sudo systemctl restart postgresql
$ sudo systemctl reload postgresql
$ sudo systemctl status postgresql

Core Concepts

Configuration

Depends on distro, generally somewhere like /etc/postgresql/11/main.

  • postgresql.conf the main config file, by default psql runs on port 5432, and the data directory lives in var somewhere like /var/lib/postgresql/11/main
  • pg_hba.conf host based authentication
  • pg_ident.conf ident authentication config

Roles and Security

Postgres simplifies users and groups, into the role (i.e. they are users, they are roles).

A number of authentication methods are available, with the default being ident.

  • ident and peer: if a local nix account that has the same name as a psql role, can sign in as that role. Optional user to role mappings can go in pg_ident.conf. ident applies to TCP/IP, and peer for local connections.
  • md5: takes an MD5 hashed password
  • password: clear text
  • trust: no password needed, based on an IP/user/DB match
  • reject: deny immediately
  • GSSAPI: standard secure auth protocol defined in RFC2743, and often combined with Kerberos
  • SSPI: for Microsoft environments, uses Kerberos when possible, will fall back to NTLM
  • and lots more kerberos, LDAP, RADIUS, Certificate and PAM

By default, only the postgres user has read/write access to the configuration files, to edit make sure you are running as this identity.

To allow a non-default login, open up the the pg_hba.conf (host based auth) config file.

$ sudo -u postgres vim pg_hba.conf

Basics

Connectivity

Starting off with ident authentication, by default on installation a new OS user and PG role called postgres is created. Running the psql client as this user is one way to gain initial access, i.e. yb sudo -i -u postgres or by running psql under the user sudo -u postgres psql.

postgres=# SELECT version();
                                                         version
----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 (Ubuntu 11.1-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, 64-bit
(1 row)

Enabling remote access, is generally a first desirable goal.

  1. Open postgresql.conf (from above), and uncomment listen_addresses = 'localhost', change localhost to * (bind to all addresses)

  2. Open pg_hba.conf, and add host all all 192.168.1.0/24 trust (substituting the network and CIDR for whats appropriate)

  3. Bounce the PostgreSQL server process sudo systemctl restart postgresql

  4. From a remote (client) host verify the port is open with a port scan nmap -sT 192.168.1.238 (thats my psql server).

    $ nmap -sT 192.168.1.238

    Nmap scan report for 192.168.1.238 PORT STATE SERVICE 22/tcp open ssh 5432/tcp open postgresql

Client Utilities

PostgreSQL ships with several handy client applications. You can see the symlinks to them in /usr/bin by running ls -l /usr/bin | grep postgres.

  • clusterdb cluster a database
  • createdb create a new database
  • createlang install a procedural language
  • createuser define a new user account
  • dropdb remove a database
  • droplang remove a procedural language
  • dropuser remove a user account
  • pg_basebackup take a base backup of a cluster
  • pgbench run a benchmark test
  • pg_dump extract a database into a script file or other archive file
  • pg_dumpall extract a database cluster into a script file
  • pg_isready check the connection status of a server
  • pg_receivewal stream write-ahead logs from a server
  • pg_receivexlog streams transaction logs from a cluster
  • pg_recvlogical control logical decoding streams
  • pg_restore restore a database from an archive file created by pg_dump
  • psql interactive terminal
  • reindexdb reindex a database
  • vacuumdb garbage-collect and analyze a database

For example to create a user and database, could run:

$ sudo -u postgres createuser --login ben

And:

$ sudo -u postgres createdb --owner=ben foodb

To make the ben role a superuser could have passed the -s switch to createuser, to promote the existing low powered role, can leverage the ALTER USER SQL command. First can review existing role permissions with a quick \du:

$ sudo -u postgres psql
postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 ben       |                                                            | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

Then the SQL:

postgres=# ALTER USER ben WITH SUPERUSER CREATEDB CREATEROLE REPLICATION LOGIN;
ALTER ROLE

Verification:

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 ben       | Superuser, Create role, Create DB, Replication             | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

Several privileges are available and well documented:

  • SUPERUSER can override any access controls within the database. Dangerous.
  • CREATEDB can spin up new dbs
  • CREATEROLE can create/drop roles (users)
  • INHERIT automatically applies powers of roles that the role is a member of (without explicit SET ROLE)
  • LOGIN can be used for initial session authorisation (i.e. a user)
  • REPLICATION needed for connectivity to servers in replication mode.
  • BYPASSRLS bypass row-level security (RLS) policy.

Each of the below has an explicit deny version of the privilege, prefixed with NO, e.g. NOSUPERUSER.

psql Primer

  • \?: show all psql commands
  • \h show SQL syntax help
  • \c dbname [username]: Connect to database, with an optional username (or \connect)

Display commands (append + for extra details):

  • \l: List all database (or \list)
  • \d: Display all tables, indexes, views, and sequences
  • \dt: Display all tables
  • \di: Display all indexes
  • \dv: Display all views
  • \ds: Display all sequences
  • \dT: Display all types
  • \dS: Display all system tables
  • \du: Display all users
  • \x auto|on|off: Toggle|On|Off expanded output mode

To change database use \c:

\c enceladus
enceladus=#

To get specific SQL statement help, try handy psql command \h:

enceladus=# \h CREATE TABLE

Command:     CREATE TABLE
Description: define a new table
Syntax:
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
  { column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
    | table_constraint
....

Go for it:

enceladus=# drop table if exists master_plan;
enceladus=# create table master_plan(
enceladus(#   id serial primary key,
enceladus(#   the_date date,
enceladus(#   title varchar(100),
enceladus(#   description text
enceladus(# );
CREATE TABLE

SQL script files

Great for reusable SQL, and putting into Git.

psql foodb -h 10.1.2.3 < build.sql

Or:

psql foodb -h 10.1.2.3 -f build.sql

Importing Data (ETL)

Generally ETL software is bloat. KISS with:

  • Idempotent shell scripts and make files. Idempotency is important, so that scripts can be reliably run numerous times and result in the same outcome. All data should be typed as TEXT, typing and cleanup can be dealt with later on.
  • If needed, elevate to Python using Pandas or PETL
  • If needed, elevate by considering a dedicating piece of middleware such as Kafka or Logstash

Digging through the help \h COPY found COPY FROM will be useful for extracting the CSV on disk:

copy data between a file and a table

For example:

COPY master_plan
FROM '<put_path_here>/master_plan.csv'
WITH DELIMITER ',' HEADER CSV;

Note, the SQL based approach is server based, i.e. the csv path will need to be accessible by the server itself.

For a client side approach can leverage the \copy commands, like so:

\copy master_plan FROM '/home/ben/code/psql/cassini_data/curious_data/data/master_plan.csv' WITH DELIMITER ',' HEADER CSV;