Abstract
TIL: it is possible to make SQLite data insertions 3 times faster π in a "local development" environment - where data integrity is not a crucial criteria.
UPDATE: It turns out that we can even reach a "30 times faster" gain with additional tweaks π€― - see the update at the end of the article.
Inserting data in a SQLite database can be quite slow
A Quick "TIL" this time... π
Contrary to my "Postgres" habits, for one of my side projects I'm using SQLite quite heavily.
Everything works fine, except that... Data insertions are very slow π
I suspected it could be caused by the extra work SQLite has to do to maintain data integrity, and started to check if it was possible to tune the level of strictness we want for this.
SQLite has a "journal mode", and it can be customised
That's how I found about the journal mode of SQLite:
We can see on this documentation that it can be set to DELETE
, TRUNCATE
, PERSIST
, MEMORY
, WAL
or OFF
.
π Right, let's try to change that! At the beginning of the Python function that processes my big batch of data to insert, I've added a SQL query that - given that I understand that doc correctly - should make my data insertions faster:
Let's launch my data insertion script with this update! :fingers_crossed:
π Result: nothing has changed, the insertion of my β2.000 rows still takes 50 seconds π
Ah, but maybe I need to send this query when the database connection is initiated, rather than on the fly when I need it?
Note also that the journal_mode cannot be changed while a transaction is active.
The SQLite documentation
Setting the "journal mode" when the database connection is initialised
As I'm using Django for this side project, I had to find a way to send a SQL query as soon as possible to the database, right after the connection is initialised.
According to this (old) ticket on the Django bug tracker, which refers to that Stack Overflow thread, a way to do this is to plug a custom function to the signal that Django sends when the connection is initialised, and send the query there.
π Alright, let's try this!
I want to lower the data integrity work only on my "local dev" environment, so I'll have to add this code
to my project.settings.development
module.
(explanations about this module in this other post)
# file: src/project/settings/development.py
# My existing settings...
...
# ...to which I'm adding the following:
# Setting SQLite journal mode to 'memory' - much faster writes,
# at the expense of database safety and integrity.
# @link https://www.sqlite.org/pragma.html#pragma_journal_mode
# @link https://code.djangoproject.com/ticket/24018#comment:4
from django.db.backends.signals import connection_created
def _disable_sqlite_journal(sender, connection, **kwargs):
import logging
if connection.vendor == "sqlite":
logging.getLogger("apps").warning("Setting SQLite journal mode to 'memory'")
cursor = connection.cursor()
cursor.execute("PRAGMA journal_mode = memory;")
connection_created.connect(_disable_sqlite_journal)
Let's launch my data insertion script again! :fingers_crossed:(x2)
π Result: the insertion of my β2.000 rows now takes 15 seconds, instead of 50! βοΈ
Closing notes
The trade-off is clear:
The MEMORY journaling mode stores the rollback journal in volatile RAM. This saves disk I/O but at the expense of database safety and integrity.
If the application using SQLite crashes in the middle of a transaction when the MEMORY journaling mode is set, then the database file will very likely go corrupt.
The SQLite documentation
That's why in my case I want to customise this journal mode only on my local environment, and won't do it in production. But being able to opt in for 3 times faster data insertions is still a pretty good discovery, as being able to iterate quickly is crucial when working on such a local environment! π
UPDATE
Following the publication of this article, the amazing Albin gave me a pointer to this URL:
If I activate all these settings, the original 50 seconds now become... 1.7 seconds! β‘ That's pretty much a "30 times faster" gain! π€―
from django.db.backends.signals import connection_created
_UNLEASH_SQLITE_QUERIES = [
# @link https://avi.im/blag/2021/fast-sqlite-inserts/#sqlite-optimisations
"PRAGMA journal_mode = memory",
"PRAGMA synchronous = 0",
"PRAGMA cache_size = 1000000",
"PRAGMA locking_mode = EXCLUSIVE",
"PRAGMA temp_store = MEMORY",
]
def _disable_sqlite_journal(sender, connection, **kwargs):
import logging
if connection.vendor == "sqlite":
logging.getLogger("apps").warning("Setting SQLite journal mode to 'memory', and various other settings")
cursor = connection.cursor()
for sql in _UNLEASH_SQLITE_QUERIES:
cursor.execute(sql)
connection_created.connect(_disable_sqlite_journal)