Implementing materialized views in Rails
When you have a complex query which takes a lot of time to execute and you want to render the data in a page which will be accessed frequently materialized view is your best option. You will get an excellent performance benefit with it as the data is already fetched in the cache table and ready to use like an actual db table. But it comes with its own challenges.
Let’s begin with creating one and I will walk you through the challenges I faced in my project.
Creating a materialized view:
I will use Scenic gem and create a view for users table.
$ rails generate scenic:view active_users
# output:
# create db/views/active_users_v01.sql
# create db/migrate/[TIMESTAMP]_create_active_users.rb
Assume I added a relevant query(the details of the query doesn’t matter much here) into active_users_v01.sql.
The migration file looks as below.
def change
create_view :active_users, materialized: true
end
Run rake db:migrate
and you have created your first materialized view 🎊. Now you can interact with it like any other active record model.
class ActiveUser < ApplicationRecord
self.primary_key = 'id'
belongs_to :user
# This will be used by rails to avoid creating new records as this is a view.
def readonly?
true
end
end
# ActiveUser.count etc..
Refreshing the view:
There are 2 ways to refresh the view.
- Through Active record callbacks.
pros: You can control in the code when to call the refresh.
Scenic gem has a class method to do this. We just need to call this from all the events the affect this view.
eg:
class User < ApplicationRecord
after_commit :refresh_active_users
def refresh_active_users
ActiveUser.refresh
end
end
class ActiveUser < ApplicationRecord
self.primary_key = 'id'
belongs_to :user
# This will be used by rails to avoid creating new records as this is a view.
def readonly?
true
end
def self.refresh
Scenic.database.
# note concurrently: true
refresh_materialized_view(table_name, concurrently: true, cascade: false)
end
end
*Important*: While the view is updating concurrent selects on the materialized view will be locked. In my case we create 12k+ user records from a spreadsheet in a background job and during this time all the front end requests timed out as the view is locked while updating.
That’s where the concurrently
option shines. It will refresh the data in the background without locking the view. You can read more here.
Note: If your data is changing very rapidly it might be a overkill to move to materialized views as it will have too many refreshes queued up.
concurrently true option only works if it satisfy the below two conditions.
- There is at least on unique
index
on one of the columns. - View should not have a
where
condition.
which will make the migration
look as follows
def change
create_view :active_users, materialized: true
add_index :active_users, :id, unique: true
end
2. Through database function and trigger
pros: Quick as it is executed in db level and less code. We can also make it per STATEMENT so that it triggers once for multiple row updates in same transaction.
I used fx gem to create the function
and trigger
.
rails generate fx:function refresh_active_users_view
# output
# create db/functions/refresh_active_users_view_v01.sql
# create db/migrate/20230801021608_create_function_refresh_active_users_view.rb
Creation of a function
looks like this
CREATE OR REPLACE FUNCTION refresh_active_users_view() RETURNS trigger AS $function$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY active_users;
RETURN NULL;
END;
$function$ LANGUAGE plpgsql;
Creation of a trigger
looks like this
rails generate fx:trigger refresh_active_users_view table_name:users
# output
# create db/triggers/refresh_active_users_view_v01.sql
# create db/migrate/20230801021952_create_trigger_refresh_active_users_view.rb
The trigger file looks as below.
CREATE TRIGGER refresh_active_users_view
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_active_users_view();
That’s those are the two ways to refresh
the view.
Things to consider:
- If your data is going to change very often you might need to find alternate ways to refresh like doing it with a cron job or user action when needed.
- DB size if you are storing large set of data in the view.