Automated Firestore Replication to BigQuery (2023)

Firestore is a very popular serverless document database on Google Cloud Platform (GCP). While Firestore is great at storing, updating, and querying documents of data, it is not as efficient as tools like BigQuery for analytic reporting use cases. Fortunately, there is native integration between Firestore and BigQuery where all inserts/updates/deletes in Firestore can be automatically replicated to BigQuery — and that is what will be demonstrated below.

This blog will show:

  1. Creating a Firestore database in GCP (not through Firebase)
  2. Inserting a simple document in Firestore via the console
  3. Adding Firebase to the GCP project
  4. Enable the Firebase extension for Firestore to BigQuery replication
  5. Using an extension script to replicate the existing Firestore database in BigQuery
  6. Creating a more relational data view for reporting tools on top of the default replicated Firestore view in BigQuery
  7. Inserting a new document into Firestore to see it automatically replicated to BigQuery

First, we will create a Firestore database in GCP by going to the Firestore console:

Automated Firestore Replication to BigQuery (1)

Then, create a Native Mode Firestore instance:

Automated Firestore Replication to BigQuery (2)
Automated Firestore Replication to BigQuery (3)

Once the database is created, we will click “Start Collection” to add an initial dataset to Firestore:

Automated Firestore Replication to BigQuery (4)

For the new collection, we need to specify a Collection ID, Document ID (for our first document), and some key value pairs for the first document, then click Save.

Automated Firestore Replication to BigQuery (5)

Congratulations! You have successfully created a Firestore instance in your GCP project, added a collection to it, and populated the collection with your first document:

Automated Firestore Replication to BigQuery (6)
(Video) Export Firestore collections to BigQuery and use Data Studio to analyze and aggregate data

Now that we have a collection in Firestore (and yours might be much larger and much older), we probably want to do some analytic reporting against the Firestore dataset that BigQuery would be GREAT for, so we want to replicate the Firestore database to BigQuery.

The replication from Firestore to BigQuery is done via a Firebase extension. Firebase is a suite of developer tools to make web and mobile application development more simple and efficient. In order to install that extension, we will have to configure Firebase for our GCP project.

If you are not familiar with Firebase, you may be worried about the added cost of installing Firebase. As with most GCP technologies, it is only pay for use, so if you only use the Firestore to BigQuery extension, then that is all you will pay for. The estimated cost for that extension is typically around $0.01/month: https://firebase.google.com/products/extensions/firestore-bigquery-export

The extension itself uses Firebase triggers and Cloud Functions to copy the changes from Firestore to BigQuery, so there may be some Cloud Function and BigQuery costs as well, but no other Firebase costs.

To enable Firebase in our project, go to https://console.firebase.google.com:

Automated Firestore Replication to BigQuery (7)

Click “Add project”, then find your existing GCP project and click “Continue:

Automated Firestore Replication to BigQuery (8)

Confirm the Pay as You Go Firebase plan:

Automated Firestore Replication to BigQuery (9)

Click Continue on the second step:

Automated Firestore Replication to BigQuery (10)

Disable Google Analytics in the third step if the Firestore to BQ extension is the only part of Firebase you will be using, and click “Add Firebase”.

(Video) Query SQL no Firebase Firestore com BigQuery

Automated Firestore Replication to BigQuery (11)

Once you see the message that Firebase was added, click Continue to get to the Firebase console.

In the Firebase console that just opened from the last step, you will select “Extensions” on the left and then click “Install” by the “Export Collections to BigQuery” extension as shown below.

Automated Firestore Replication to BigQuery (12)

Click “Next” in the first install page that comes up.

Automated Firestore Replication to BigQuery (13)

Click “Next” through the first three install screens. But on the fourth screen (shown below), you’ll need to specify where you want your replication Cloud Functions and BigQuery dataset location, what Firestore collection you want replicated, and what BigQuery dataset/table you want the replication to end up in.

Note: This replication extension only replicates one Firestore collection. However, you can install the extension multiple times for multiple Firestore collections (or even sub-collections) if you have multiple Firestore collections that you want to have replicated.

Automated Firestore Replication to BigQuery (14)

Once installed, it should look something like this. At this point, everything is installed — your collection has triggers for inserts/updates/deletes, a Cloud Function gets notified by the trigger, and the Cloud Function will update the BigQuery table with the Firestore changes.

Automated Firestore Replication to BigQuery (15)

Note: The BigQuery dataset and table that you specified above will not be created until the first document update happens in Firestore, so don’t be surprised it you don’t see the BigQuery dataset/table yet.

If your Firestore database is empty, you can skip this step. If you have existing data in your Firestore collection that you want moved over to the BQ table, then that is what we will do next.

(Video) Firestore in a minute

There is script in the extension we just installed that will automatically replicate the existing Firestore data in our collection to BQ. The script is documented here (look at this linked page for more details than we will cover in this example):

https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/guides/IMPORT_EXISTING_DOCUMENTS.md

Running the script is as simple as running these two commands in a Cloud Shell terminal:

gcloud auth application-default loginnpx @firebaseextensions/fs-bq-import-collection 

Below are the results of executing these two commands in Cloud Shell. The commands will prompt you for additional information, including logging in to get a code. But, it is fairly simple to follow the command instructions:

Automated Firestore Replication to BigQuery (16)
Automated Firestore Replication to BigQuery (17)

Now if we look in BQ, we should see the dataset and table populated with our initial Firestore collection:

Automated Firestore Replication to BigQuery (18)

Note that for each collection that we installed the extension for, there will be a “raw_changelog” table that tracks all the changes to the Firestore collection, and there will be a “raw_latest” view that does the hard work of filtering out old changes, so you only see the latest data that is in Firestore.

If you want to get the data in a relational format instead of the JSON format, you can add another view on top of the raw_latest view that extracts the data out of the data documents into columns. For example:

Here is an example for creating a view that uses BQ JSON functions to extract information from the JSON documents in a more relational format that would work well with reporting tools:

CREATE VIEW my_firestore_dataset.reporting_view AS
SELECT
JSON_VALUE(data, '$.name') AS name,
UPPER(JSON_VALUE(data, '$.name')) AS name_upper,
FROM `apestel7.my_firestore_dataset.firestore_documents_table_raw_latest`
Automated Firestore Replication to BigQuery (19)

This JSON document is very simple and therefore the view is simple, but you can use this same strategy to make these replicated views of more complex JSON document data look like normal everyday relational views — that are near real time updated from Firestore via this replication extension.

Next, we will test inserting a new document into our Firestore database to make sure that it is replicated to BigQuery.

(Video) How to store data on Google Cloud

Note: The insert into Firebase does not need to be through a Firebase API in order to be replicated. The insert can come from an application anywhere (AppEngine, GKE, GCE, or even the Firestore console that we will use below).

Back in our Firestore console, will will add a new document:

Automated Firestore Replication to BigQuery (20)
Automated Firestore Replication to BigQuery (21)

Now we have two documents in Firestore:

Automated Firestore Replication to BigQuery (22)

Let’s go back to BigQuery and see if we can see this “Second Person” in our custom view:

Automated Firestore Replication to BigQuery (23)

If we look at the raw_changlog table in BQ, we can see one record came across in the import and the second record came across from a “CREATE” operation:

Automated Firestore Replication to BigQuery (24)

Feel free to test other Firestore collection changes as well — updates, deletes, etc. to see how they come across to the raw_changlog table but only the latest rows are displayed in the raw_latest view (and any views you create on top of that view).

The Firestore to BigQuery extension is a powerful tool to enable applications to use the right tool for the right job. Applications can use Firestore for their web applications, with all the efficiency and flexibility of a NoSQL document store while also using BigQuery with all the efficiency and flexibility of a SQL based reporting database.

For more details about the extension demoed in this blog, please refer to:

(Video) Automating your BigQuery Data Pipeline with Cloud Dataprep GSP826

https://firebase.google.com/products/extensions/firestore-bigquery-export

There is more to Firebase than just “extensions”. Though you don’t have to use anything else, it may be worth looking through some of the other Firebase capabilities to see if any of them would add value to your application.

Videos

1. Spanner, Firestore, and Bigtable: Connecting the Dots (Cloud Next '19)
(Google Cloud Tech)
2. Get started with Firestore
(Google Cloud Tech)
3. Lunch & Learn BigQuery & Firebase from other Google Cloud customers
(Daniel Zivkovic)
4. 5. Streaming into Google BigQuery works!
(Michael Houston)
5. Connecting Looker Studio to Views in BigQuery
(Better Than Data)
6. Migrating from Datastore to Firestore
(Google Cloud Tech)
Top Articles
Latest Posts
Article information

Author: Carlyn Walter

Last Updated: 22/05/2023

Views: 5987

Rating: 5 / 5 (70 voted)

Reviews: 85% of readers found this page helpful

Author information

Name: Carlyn Walter

Birthday: 1996-01-03

Address: Suite 452 40815 Denyse Extensions, Sengermouth, OR 42374

Phone: +8501809515404

Job: Manufacturing Technician

Hobby: Table tennis, Archery, Vacation, Metal detecting, Yo-yoing, Crocheting, Creative writing

Introduction: My name is Carlyn Walter, I am a lively, glamorous, healthy, clean, powerful, calm, combative person who loves writing and wants to share my knowledge and understanding with you.