Skip to content

Making SQLite much faster in a local dev environment

Abstract

TIL: it is possible to make SQLite data insertions 3 times faster ๐Ÿš€ in a "local development" environment - where data integrity is not a crucial criteria.

UPDATE:
It turns out that we can even reach a "30 times faster" gain with additional tweaks ๐Ÿคฏ - see the update at the end of the article.

Inserting data in a SQLite database can be quite slow

A Quick "TIL" this time... ๐Ÿ™‚

Contrary to my "Postgres" habits, for one of my side projects I'm using SQLite quite heavily.

Everything works fine, except that... Data insertions are very slow ๐ŸŒ

I suspected it could be caused by the extra work SQLite has to do to maintain data integrity, and started to check if it was possible to tune the level of strictness we want for this.

SQLite has a "journal mode", and it can be customised

That's how I found about the journal mode of SQLite:

We can see on this documentation that it can be set to DELETE, TRUNCATE, PERSIST, MEMORY, WAL or OFF.

๐ŸŒŸ Right, let's try to change that!
At the beginning of the Python function that processes my big batch of data to insert, I've added a SQL query that - given that I understand that doc correctly - should make my data insertions faster:

pragma journal_mode = memory;

Let's launch my data insertion script with this update! ๐Ÿคž

๐Ÿ‘‰ Result: nothing has changed, the insertion of my โ‰ˆ2.000 rows still takes 50 seconds ๐Ÿ˜”

Ah, but maybe I need to send this query when the database connection is initiated, rather than on the fly when I need it?

Note also that the journal_mode cannot be changed while a transaction is active.

The SQLite documentation

Setting the "journal mode" when the database connection is initialised

As I'm using Django for this side project, I had to find a way to send a SQL query as soon as possible to the database, right after the connection is initialised.

According to this (old) ticket on the Django bug tracker, which refers to that Stack Overflow thread, a way to do this is to plug a custom function to the signal that Django sends when the connection is initialised, and send the query there.

๐ŸŒŸ Alright, let's try this!
I want to lower the data integrity work only on my "local dev" environment, so I'll have to add this code to my project.settings.development module.
(explanations about this module in this other post)

# file: src/project/settings/development.py

# My existing settings...
...
# ...to which I'm adding the following:

# Setting SQLite journal mode to 'memory' - much faster writes, 
# at the expense of database safety and integrity.
# @link https://www.sqlite.org/pragma.html#pragma_journal_mode
# @link https://code.djangoproject.com/ticket/24018#comment:4

from django.db.backends.signals import connection_created


def _disable_sqlite_journal(sender, connection, **kwargs):
    import logging

    if connection.vendor == "sqlite":
        logging.getLogger("apps").warning("Setting SQLite journal mode to 'memory'")
        cursor = connection.cursor()
        cursor.execute("PRAGMA journal_mode = memory;")


connection_created.connect(_disable_sqlite_journal) 

Let's launch my data insertion script again! ๐Ÿคž(x2)

๐Ÿ‘‰ Result: the insertion of my โ‰ˆ2.000 rows now takes 15 seconds, instead of 50! โœŒ

Closing notes

The trade-off is clear:

The MEMORY journaling mode stores the rollback journal in volatile RAM.
This saves disk I/O but at the expense of database safety and integrity.

If the application using SQLite crashes in the middle of a transaction when the MEMORY journaling mode is set, then the database file will very likely go corrupt.

The SQLite documentation

That's why in my case I want to customise this journal mode only on my local environment, and won't do it in production.
But being able to opt in for 3 times faster data insertions is still a pretty good discovery, as being able to iterate quickly is crucial when working on such a local environment! ๐Ÿ™‚

UPDATE

Following the publication of this article, the amazing Albin gave me a pointer to this URL:

If I activate all these settings, the original 50 seconds now become... 1.7 seconds! โšก
That's pretty much a "30 times faster" gain! ๐Ÿคฏ

from django.db.backends.signals import connection_created

_UNLEASH_SQLITE_QUERIES = [
    # @link https://avi.im/blag/2021/fast-sqlite-inserts/#sqlite-optimisations
    "PRAGMA journal_mode = memory",
    "PRAGMA synchronous = 0",
    "PRAGMA cache_size = 1000000",
    "PRAGMA locking_mode = EXCLUSIVE",
    "PRAGMA temp_store = MEMORY",
]


def _disable_sqlite_journal(sender, connection, **kwargs):
    import logging

    if connection.vendor == "sqlite":
        logging.getLogger("apps").warning("Setting SQLite journal mode to 'memory', and various other settings")
        cursor = connection.cursor()
        for sql in _UNLEASH_SQLITE_QUERIES:
            cursor.execute(sql)


connection_created.connect(_disable_sqlite_journal)