How to prevent table lock when add/drop index in Postgres?

The CONCURRENTLY option in PostgreSQL is used when creating or dropping indexes to avoid locking the entire table during the process. It’s especially useful in production environments where you want to ensure minimal downtime or performance impact while working with large tables.

Here’s a more detailed explanation of CONCURRENTLY:

How Indexes Work Without CONCURRENTLY

By default, when you create or drop an index on a PostgreSQL table, the operation takes a table-level lock for its entire duration. This means:

  • Writes (INSERT/UPDATE/DELETE) to the table are blocked: a plain CREATE INDEX takes a SHARE lock.
  • Reads (SELECT queries) are still allowed during a plain CREATE INDEX, but DROP INDEX takes an ACCESS EXCLUSIVE lock and blocks reads as well.

This is problematic in production systems where database activity is continuous, and blocking writes or reads can result in downtime or degraded performance.

How CONCURRENTLY Works

The CONCURRENTLY option allows PostgreSQL to create or drop an index without locking the table for writing. This means your application can continue operating normally while the index operation proceeds.

Key Points about CONCURRENTLY:

  1. Index Creation without Table Locking:
    When using CONCURRENTLY to create an index, PostgreSQL performs the index creation in multiple steps:
  • It takes a snapshot of the table’s current state.
  • It begins building the index in the background without interfering with ongoing read or write operations.
  • Once the index is built, it synchronizes the index with any new rows that were added during the process.
  • Of course, the extra CPU and I/O load imposed by the index creation might slow other operations.
  2. Non-Blocking Index Drops:
    Similarly, when dropping an index with CONCURRENTLY, PostgreSQL will not block the table. Instead, it allows concurrent operations to continue while the index is being removed.
  3. Longer Duration:
    Creating an index CONCURRENTLY is slower than the normal method because PostgreSQL performs multiple table scans and must wait for in-flight transactions to finish so their changes can be incorporated into the index. However, the benefit is that the table remains available for both reads and writes.
  4. Cannot Run Inside a Transaction:
    Operations with CONCURRENTLY cannot be run inside a transaction block because they require multiple stages, each committed separately. Therefore, in Rails migrations, you need to call disable_ddl_transaction! to disable the default transaction wrapping.
  5. Increased System Load:
    Since CONCURRENTLY operations take longer and run alongside normal traffic, they add CPU and I/O load: the database is maintaining the new index and serving normal operations at the same time.
  6. Unique Indexes:
    You can also create unique indexes CONCURRENTLY, but with a caveat: if a uniqueness violation is found while the index is being built, the operation fails and leaves behind an invalid index that must be dropped before retrying.

Example of Creating an Index Concurrently:

CREATE INDEX CONCURRENTLY index_name ON table_name (column_name);

In Rails, this looks like:

class AddIndexToUsersEmail < ActiveRecord::Migration[6.0]
  disable_ddl_transaction!

  def change
    add_index :users, :email, algorithm: :concurrently
  end
end
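A failed concurrent build leaves the migration in a half-done state, so it can help to make it safe to re-run. One possible sketch uses raw SQL with IF NOT EXISTS (PostgreSQL 9.5+); the index name here is an assumption:

```ruby
class AddIndexToUsersEmail < ActiveRecord::Migration[6.0]
  disable_ddl_transaction!

  def up
    # IF NOT EXISTS makes the migration safe to re-run. Caveat: it
    # matches by index name, so it will also skip an INVALID leftover
    # index from a failed build -- drop any invalid index first.
    execute <<~SQL
      CREATE INDEX CONCURRENTLY IF NOT EXISTS index_users_on_email
        ON users (email)
    SQL
  end

  def down
    execute "DROP INDEX CONCURRENTLY IF EXISTS index_users_on_email"
  end
end
```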

Example of Dropping an Index Concurrently:

DROP INDEX CONCURRENTLY index_name;

In Rails:

class RemoveIndexFromUsers < ActiveRecord::Migration[6.0]
  disable_ddl_transaction!

  def change
    remove_index :users, :email, algorithm: :concurrently
  end
end
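If the drop might be retried (for example, after a partially failed deploy), adding IF EXISTS makes the statement idempotent; the index name is illustrative:

```sql
-- No error is raised if the index has already been removed.
DROP INDEX CONCURRENTLY IF EXISTS index_users_on_email;
```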

Important Considerations:

  1. Disable DDL Transaction: As mentioned earlier, you must disable transactions (disable_ddl_transaction!) because CONCURRENTLY cannot be used inside a transaction.
  2. Slower Performance: The process is slower than the usual index creation. Depending on the size of the table, it can take a considerable amount of time to complete.
  3. Error Handling: If a concurrent build fails (for example, due to a deadlock or a uniqueness violation), it leaves behind an index marked "invalid". An invalid index still incurs write overhead but is never used by queries, so you need to drop it manually (or rebuild it) before retrying the migration.
  4. Database Version: CREATE INDEX CONCURRENTLY is supported in PostgreSQL 8.2 and later; DROP INDEX CONCURRENTLY requires PostgreSQL 9.2 or later.
  5. Unique Indexes: Creating unique indexes CONCURRENTLY is possible, but if the table already contains duplicate values, the build will fail and leave an invalid index behind.
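The invalid indexes mentioned in point 3 can be found by querying the system catalogs; as a sketch, this lists every invalid index in the current database:

```sql
-- pg_index.indisvalid is false for indexes whose concurrent build failed.
SELECT c.relname AS index_name
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE NOT i.indisvalid;

-- Clean up each one before retrying, e.g.:
-- DROP INDEX CONCURRENTLY index_name;
```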

Benefits of Using CONCURRENTLY:

  1. No Table Locking: The biggest advantage is that you can perform index creation or removal without locking the table for writes or reads, ensuring your application remains functional during the migration.
  2. Minimized Downtime: For high-traffic applications where downtime must be avoided at all costs, CONCURRENTLY helps you deploy large-scale database changes with minimal impact on your users.

Limitations of Using CONCURRENTLY:

  1. PostgreSQL-Specific Syntax: The CONCURRENTLY keyword is specific to PostgreSQL. Other databases have their own mechanisms for online schema changes (for example, MySQL's InnoDB online DDL), but this exact syntax is not portable, and SQLite has no equivalent.
  2. Not in Transactions: You can’t run CONCURRENTLY inside a transaction, which means other parts of your migration cannot be automatically rolled back if something goes wrong.
  3. Slower: The trade-off for avoiding locks is that the index creation or removal takes longer, so you need to account for the extra time during deployment.

When to Use CONCURRENTLY:

  • Large tables: When working with large datasets that cannot afford to be locked.
  • Production environments: For applications that need to stay live and responsive during schema changes.
  • Avoiding downtime: For index creation and deletion in mission-critical systems where uptime is crucial.

When Not to Use CONCURRENTLY:

  • Small tables: For small tables, the overhead of using CONCURRENTLY may not be necessary, and a normal index creation will be faster.
  • When immediate uniqueness enforcement is critical: If you need an index to enforce uniqueness immediately, it’s better to avoid CONCURRENTLY since uniqueness is only guaranteed after the operation is complete.

In conclusion, the CONCURRENTLY option is an essential tool in PostgreSQL for performing index-related operations in production without locking the table. It helps keep applications running smoothly while performing migrations that involve large datasets, especially when downtime must be minimized.