Sarah Ting

Meilisearch & Laravel Scout (Part 2)

Learnings from migrating a fulltext index into a Meilisearch instance with Laravel Scout —


Meilisearch index updates are single-threaded

Updates on a Meilisearch index can only happen in one thread at a time. This seems obvious in hindsight, but going into this I had misunderstood two parts of Meilisearch’s functionality —

  • Meilisearch has a [MEILI_MAX_INDEXING_THREADS environment variable](https://docs.meilisearch.com/learn/configuration/instance_options.html#max-indexing-memory)
    • MEILI_MAX_INDEXING_THREADS does set an upper limit on how many threads can be running indexing tasks at once, but each of those tasks must be on a different index. Only one thread at a time will ever be updating a given index.
  • Meilisearch will sometimes have multiple tasks for the same index in “processing” state at the same time
    • This is due to Meilisearch’s autobatching — if it receives multiple indexing updates within a small window of time, it automatically groups them into the same batch. That shows up as multiple tasks in “processing” at once, but the update work for a given index still runs on a single thread.
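
If you want to see this behaviour for yourself, watching the task queue during an import makes it fairly obvious. Here's a rough sketch using Laravel's HTTP client against the tasks endpoint (the env variable names and host are assumptions; swap in however you configure Meilisearch):

use Illuminate\Support\Facades\Http;

// Peek at the task queue while an import is running.
$tasks = Http::withToken(env('MEILISEARCH_KEY'))
    ->get(env('MEILISEARCH_HOST', 'http://127.0.0.1:7700').'/tasks')
    ->json('results');

// Even with MEILI_MAX_INDEXING_THREADS raised, only one batch of
// documentAdditionOrUpdate work will actually be progressing per index.
collect($tasks)
    ->whereIn('status', ['enqueued', 'processing'])
    ->each(fn ($task) => dump($task['uid'], $task['indexUid'], $task['type'], $task['status']));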

Task cancellation is not available until Meilisearch 0.30

Before I implemented the batching job, I originally spun this up on 0.29 and ended up with a ~20 hour backlog of single-document tasks, which I just nuked and remade lol 😭
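
For what it's worth, once you're on 0.30+ you can cancel a backlog like that instead of nuking the index. A hedged sketch hitting the task cancellation endpoint directly (same env-name assumptions as above):

use Illuminate\Support\Facades\Http;

// Cancel everything that is still enqueued or processing, e.g. a runaway
// backlog of single-document tasks. Requires Meilisearch >= 0.30.
Http::withToken(env('MEILISEARCH_KEY'))
    ->post(env('MEILISEARCH_HOST', 'http://127.0.0.1:7700').'/tasks/cancel?statuses=enqueued,processing');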


Filtering with Laravel Scout

Out of the box, Laravel Scout doesn’t support anything beyond simple where statements, but you are able to reach into Scout’s search implementation and override it. In my case I wanted to filter by greater-than/less-than (GT/LT), which Scout doesn’t support, and I had some complex OR requirements which weren’t satisfied by whereIn.

Step 1: Specify which attributes must be searchable/filterable

This can be run in tinker, or you can hit the Meilisearch API directly.

$dummy = new Post;
$engine = $dummy->searchableUsing();
$index = $engine->index($dummy->searchableAs());
$index->updateSearchableAttributes(['title', 'text', 'blurb']); // things to keyword search by
$index->updateFilterableAttributes(['author_id', 'status', 'published_at']); // things to filter by
$index->updateSortableAttributes(['published_at']); // things to sort by
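
Recent versions of Scout can also sync these settings from config, which is a bit tidier than tinker if it’s available to you. A sketch of the equivalent config/scout.php block, using the attribute lists from the example above, followed by php artisan scout:sync-index-settings:

// config/scout.php
'meilisearch' => [
    'host' => env('MEILISEARCH_HOST', 'http://127.0.0.1:7700'),
    'key' => env('MEILISEARCH_KEY'),
    'index-settings' => [
        \App\Models\Post::class => [
            'searchableAttributes' => ['title', 'text', 'blurb'],
            'filterableAttributes' => ['author_id', 'status', 'published_at'],
            'sortableAttributes' => ['published_at'],
        ],
    ],
],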

Step 2: Override the search functionality

The Meilisearch documentation has a good section on filtering; below is a generic example of how to do this with Scout —


use Meilisearch\Endpoints\Indexes; // MeiliSearch\Endpoints\Indexes on pre-1.0 versions of the PHP client

$posts = Post::search(
    'foo', // keyword search
    function (Indexes $meilisearch, string $query, array $options) {
        // filterable items (published_at is compared as a Unix timestamp)
        $options['filter'] = '(status = 1 OR (status = 0 AND author_id = 1)) AND published_at > 1669820400 AND published_at < 1669906800';

        // sortable items
        $options['sort'] = ['published_at:desc'];

        return $meilisearch->rawSearch($query, $options);
    }
)->get();
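
One thing that makes the GT/LT filter above work: Meilisearch’s comparison operators only apply to numbers, so published_at has to be indexed as a numeric timestamp rather than a date string. A sketch of the kind of toSearchableArray() that lines up with the attributes used in these examples (the model fields themselves are assumptions):

use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Post extends Model
{
    use Searchable;

    public function toSearchableArray(): array
    {
        return [
            'id' => $this->id,
            'title' => $this->title,
            'text' => $this->text,
            'blurb' => $this->blurb,
            'author_id' => $this->author_id,
            'status' => $this->status,
            // Stored as a Unix timestamp so the > / < filters work.
            'published_at' => $this->published_at?->getTimestamp(),
        ];
    }
}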

Numbers

Before getting into this I wasn’t able to find much information on what to expect from Meilisearch in terms of performance and hardware requirements, so I’m just going to dump my experience here.

  • Server:
    • E3-1270v2 / 32GB RAM / 1TB SSD
  • Dataset:
    • 9,498,766 documents
    • Final index size of 234GB
    • 3 searchable attributes and 17 filterable attributes

The larger the index got, the slower indexing got, but searching never slowed down. With the full index a search takes 200ms~600ms, regardless of how many filters I add.

It took around 32 hours of continuous processing to index, in batches of 500k documents.

I started out indexing in batches of 5k, which sounded reasonable to me at the time; each batch took about 30~60 seconds to run. Indexing speed degraded as more documents were added; around the 1.5 million document mark these 5k batches were taking ~15 minutes each, which meant the index was realistically never going to finish.

I raised my batch size to the largest round number I was comfortable fitting into RAM (500k documents per batch), which was much more efficient. For reference, the time taken on these batches:

| Documents | Batch time | Notes |
| --- | --- | --- |
| 2.0-2.5 million | 52 minutes | Database size at this point is ~70GB |
| 2.5-3.0 million | 76 minutes | |
| 3.0-3.5 million | 66 minutes | |
| 3.5-4.0 million | 92 minutes | |
| 4.0-4.5 million | 108 minutes | |
| 4.5-5.0 million | 101 minutes | |
| 5.0-5.5 million | 146 minutes | |
| 5.5-6.0 million | 156 minutes | Database size at this point is ~160GB. I stopped after this batch to remove all filterableAttributes and see if that’d help indexing speed. Removing filterableAttributes took ~3 hours. |
| 6.0-6.5 million | 117 minutes | |
| 6.5-7.0 million | 127 minutes | |
| 7.0-7.5 million | 141 minutes | |
| 7.5-8.0 million | 119 minutes | |
| 8.0-8.5 million | 213 minutes | |
| 8.5-9.0 million | 157 minutes | |
| 9.0-9.5 million | 180 minutes | Database size at this point is 234GB. Re-adding filterableAttributes took ~5 hours. |
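
If you’re importing through Scout’s own scout:import command, the number of documents sent to Meilisearch per request is controlled by the chunk config, so that’s the knob to turn for large batches. A sketch (my batching job is homegrown, so treat this as the simplest equivalent rather than exactly what I ran):

// config/scout.php
// Each chunk is loaded into PHP and pushed to Meilisearch as a single
// document-addition task, so this is effectively the batch size.
// 500k rows per chunk means 500k models in memory at once.
'chunk' => [
    'searchable' => 500000,
    'unsearchable' => 500000,
],

// Then: php artisan scout:import "App\Models\Post"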

If I were to do this again:

  • I’d remove all searchableAttributes and filterableAttributes before starting the index, then add them back after the index is complete (sketched after this list).
  • If I were on a cloud service I’d be tempted to spin up a better-specced server just to run the batching job and make the batches as large as possible. Maybe with a 256GB RAM server I could do everything in one batch… lol
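
A sketch of what the first bullet looks like for the filterable/sortable settings, using the same attribute lists as earlier:

$dummy = new Post;
$index = $dummy->searchableUsing()->index($dummy->searchableAs());

// Before the bulk import: clear the filterable/sortable settings so each
// batch of document additions does less work.
$index->updateFilterableAttributes([]);
$index->updateSortableAttributes([]);

// ... run the bulk import in large batches ...

// After the import: add the settings back as one long-running task
// (re-adding filterableAttributes took ~5 hours on my full index).
$index->updateFilterableAttributes(['author_id', 'status', 'published_at']);
$index->updateSortableAttributes(['published_at']);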

With the current index, even a 1k batch of documents takes ~30 minutes to process, so making batches as large as possible seems necessary. For day-to-day updates, I have a re-indexing job that batches updates every 3 hours. This usually needs to update around 10~15k documents and takes 60~70 minutes to run. I could increase server resources for better indexing performance, but I’m stingy and a 3-hour delay on updates works fine for my use case.
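
A minimal sketch of a periodic re-index job along those lines (the updated_at cutoff and the three-hour window are assumptions; use whatever change tracking you actually have):

// app/Console/Kernel.php (sketch)
namespace App\Console;

use App\Models\Post;
use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule): void
    {
        $schedule->call(function () {
            // Scout's searchable() macro chunks the query and pushes each
            // chunk to Meilisearch; with SCOUT_QUEUE=true the pushes go
            // through the queue instead of blocking the scheduler.
            Post::query()
                ->where('updated_at', '>=', now()->subHours(3))
                ->searchable();
        })->everyThreeHours();
    }
}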

The objective of this was to clear out space from a MySQL database. The initial database size was 650GB, but a lot of it was taken up by a single fulltext index which did not perform very well. After I got everything swapped over and dropped the index, the new database size is now 350GB. This means a ~300GB MySQL fulltext index translated into a 234GB Meilisearch index — I assume the main reason was that the Meilisearch index only contains searchable documents whereas the MySQL index had to include every row in the table (including hidden/unlisted and soft-deleted entries). Keyword search is now much faster and doesn’t drag the database down with it.

Overall I’m satisfied with the results. I was planning on treating this as just a test, since I assumed a 32GB RAM server wouldn’t be able to handle an index this size (everything I read recommended having enough RAM to hold the entire index in memory), but everything seems to be working fine barring the rather slow indexing.