Firestore is a very popular serverless document database on Google Cloud Platform (GCP). While Firestore is great at storing, updating, and querying documents of data, it is not as efficient as tools like BigQuery for analytic reporting use cases. Fortunately, there is native integration between Firestore and BigQuery where all inserts/updates/deletes in Firestore can be automatically replicated to BigQuery — and that is what will be demonstrated below.
This blog will show:
- Creating a Firestore database in GCP (not through Firebase)
- Inserting a simple document in Firestore via the console
- Adding Firebase to the GCP project
- Enabling the Firebase extension for Firestore to BigQuery replication
- Using an extension script to replicate the existing Firestore database in BigQuery
- Creating a more relational data view for reporting tools on top of the default replicated Firestore view in BigQuery
- Inserting a new document into Firestore to see it automatically replicated to BigQuery
First, we will create a Firestore database in GCP by going to the Firestore console:
Then, create a Native Mode Firestore instance:
Once the database is created, we will click “Start Collection” to add an initial dataset to Firestore:
For the new collection, we need to specify a Collection ID, a Document ID (for our first document), and some key-value pairs for the first document, then click Save.
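For illustration, a first document with a single name field (the field name here is an assumed example, chosen to match the reporting view built later in this post) would look like this as JSON:

```json
{
  "name": "First Person"
}
```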
Congratulations! You have successfully created a Firestore instance in your GCP project, added a collection to it, and populated the collection with your first document:
Now that we have a collection in Firestore (and yours might be much larger and much older), we probably want to do some analytic reporting against the Firestore dataset that BigQuery would be GREAT for, so we want to replicate the Firestore database to BigQuery.
The replication from Firestore to BigQuery is done via a Firebase extension. Firebase is a suite of developer tools that makes web and mobile application development simpler and more efficient. In order to install that extension, we will have to configure Firebase for our GCP project.
If you are not familiar with Firebase, you may be worried about the added cost of installing Firebase. As with most GCP technologies, it is only pay for use, so if you only use the Firestore to BigQuery extension, then that is all you will pay for. The estimated cost for that extension is typically around $0.01/month: https://firebase.google.com/products/extensions/firestore-bigquery-export
The extension itself uses Firebase triggers and Cloud Functions to copy the changes from Firestore to BigQuery, so there may be some Cloud Function and BigQuery costs as well, but no other Firebase costs.
To enable Firebase in our project, go to https://console.firebase.google.com:
Click “Add project”, then find your existing GCP project and click “Continue”:
Confirm the Pay as You Go Firebase plan:
Click Continue on the second step:
Disable Google Analytics in the third step if the Firestore to BQ extension is the only part of Firebase you will be using, and click “Add Firebase”.
Once you see the message that Firebase was added, click Continue to get to the Firebase console.
In the Firebase console that just opened from the last step, you will select “Extensions” on the left and then click “Install” by the “Export Collections to BigQuery” extension as shown below.
Click “Next” in the first install page that comes up.
Click “Next” through the first three install screens. On the fourth screen (shown below), you’ll need to specify the location for your replication Cloud Functions and BigQuery dataset, which Firestore collection you want replicated, and which BigQuery dataset/table the replicated data should land in.
Note: This replication extension only replicates one Firestore collection. However, you can install the extension multiple times if you have multiple Firestore collections (or even sub-collections) that you want replicated.
Once installed, it should look something like this. At this point, everything is installed — your collection has triggers for inserts/updates/deletes, a Cloud Function gets notified by the trigger, and the Cloud Function will update the BigQuery table with the Firestore changes.
Note: The BigQuery dataset and table that you specified above will not be created until the first document update happens in Firestore, so don’t be surprised if you don’t see the BigQuery dataset/table yet.
If your Firestore database is empty, you can skip this step. If you have existing data in your Firestore collection that you want moved over to the BQ table, then that is what we will do next.
There is a script in the extension we just installed that will automatically replicate the existing Firestore data in our collection to BQ. The script is documented here (see the linked page for more details than we will cover in this example):
Running the script is as simple as running these two commands in a Cloud Shell terminal:
```shell
gcloud auth application-default login
npx @firebaseextensions/fs-bq-import-collection
```
Below are the results of executing these two commands in Cloud Shell. The commands will prompt you for additional information, including logging in to obtain an authorization code, but the prompts are fairly simple to follow:
Now if we look in BQ, we should see the dataset and table populated with our initial Firestore collection:
Note that for each collection that we installed the extension for, there will be a “raw_changelog” table that tracks all the changes to the Firestore collection, and there will be a “raw_latest” view that does the hard work of filtering out old changes, so you only see the latest data that is in Firestore.
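Conceptually, the raw_latest view behaves roughly like the following query. This is a hedged sketch, not the extension’s exact generated SQL; the dataset and table names are the example ones used in this post, and the timestamp, operation, and document_name columns follow the extension’s changelog schema:

```sql
-- Conceptual sketch only: keep the newest change per document,
-- then drop documents whose latest operation was a DELETE.
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY document_name
      ORDER BY timestamp DESC
    ) AS row_num
  FROM my_firestore_dataset.my_collection_raw_changelog
)
WHERE row_num = 1
  AND operation != 'DELETE';
```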
If you want the data in a relational format instead of JSON, you can add another view on top of the raw_latest view that extracts fields from the JSON documents into columns. Here is an example that uses BQ JSON functions to present the document data in a more relational shape that works well with reporting tools:
```sql
CREATE VIEW my_firestore_dataset.reporting_view AS
SELECT
  JSON_VALUE(data, '$.name') AS name,
  UPPER(JSON_VALUE(data, '$.name')) AS name_upper
FROM
  -- Replace with the raw_latest view the extension created for your collection.
  my_firestore_dataset.my_collection_raw_latest;
```
This JSON document is very simple, and therefore the view is simple, but you can use the same strategy to make replicated views of more complex JSON document data look like normal, everyday relational views that are updated in near real time from Firestore via this replication extension.
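Reporting tools (or ad hoc queries) can then treat the view like any relational table, for example:

```sql
SELECT name, name_upper
FROM my_firestore_dataset.reporting_view
ORDER BY name;
```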
Next, we will test inserting a new document into our Firestore database to make sure that it is replicated to BigQuery.
Note: The insert into Firestore does not need to go through a Firebase API in order to be replicated. The insert can come from an application running anywhere (App Engine, GKE, GCE, or even the Firestore console, which we will use below).
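For example, any client that can reach the Firestore REST API can create a document that will flow through the replication. This is a hedged sketch, not something we run in this walkthrough: PROJECT_ID and the "people" collection name are placeholders, and it assumes you are already authenticated with gcloud:

```shell
# Sketch: create a Firestore document via the Firestore REST API.
# PROJECT_ID and the collection name "people" are placeholders.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://firestore.googleapis.com/v1/projects/PROJECT_ID/databases/(default)/documents/people" \
  -d '{"fields": {"name": {"stringValue": "Second Person"}}}'
```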
Back in our Firestore console, we will add a new document:
Now we have two documents in Firestore:
Let’s go back to BigQuery and see if we can see this “Second Person” in our custom view:
If we look at the raw_changelog table in BQ, we can see one record came across in the import and the second record came across from a “CREATE” operation:
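To inspect how each record arrived, you can query the changelog table directly. This sketch assumes the example dataset/table names used earlier in this post:

```sql
SELECT
  timestamp,
  operation,
  document_name,
  JSON_VALUE(data, '$.name') AS name
FROM my_firestore_dataset.my_collection_raw_changelog
ORDER BY timestamp;
```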
Feel free to test other Firestore collection changes as well (updates, deletes, etc.) to see how they come across in the raw_changelog table while only the latest rows are displayed in the raw_latest view (and any views you create on top of that view).
The Firestore to BigQuery extension is a powerful tool that enables applications to use the right tool for the right job. Applications can use Firestore for their web applications, with all the efficiency and flexibility of a NoSQL document store, while also using BigQuery with all the efficiency and flexibility of a SQL-based reporting database.
For more details about the extension demoed in this blog, please refer to:
There is more to Firebase than just “extensions”. Though you don’t have to use anything else, it may be worth looking through some of the other Firebase capabilities to see if any of them would add value to your application.