Choosing between Filestore and Persistent disks

I'm looking for insights/tips around factors to consider when designing storage for a workload being migrated to GCP, specifically the choice between file storage (i.e., Filestore) and block storage (i.e., GCE persistent disks).

One factor could be that if the app reads and writes data on shared file systems (over NFS, for example), then Filestore might be a good choice, since the app can be migrated with zero to minimal refactoring.
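To illustrate the minimal-refactoring point: if the app already does its shared reads and writes through an ordinary directory path, moving to Filestore can come down to mounting the share and pointing that path at it. A minimal sketch, assuming a hypothetical `SHARED_DATA_DIR` environment variable and a hypothetical mount point `/mnt/filestore`; the helpers `write_report` and `read_report` are likewise illustrative.

```python
import os
from pathlib import Path

# Hypothetical default mount point for the shared file system.
# On-premises this could be an existing NFS export; on GCP it would be
# the directory where the Filestore share is mounted.
DEFAULT_SHARED_DIR = "/mnt/filestore"


def shared_dir() -> Path:
    """Return the shared data directory, overridable via SHARED_DATA_DIR."""
    return Path(os.environ.get("SHARED_DATA_DIR", DEFAULT_SHARED_DIR))


def write_report(name: str, content: str) -> Path:
    """Write a report using plain file I/O; no storage-specific API is involved."""
    path = shared_dir() / "reports" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path


def read_report(name: str) -> str:
    """Read a report back from the same shared location."""
    return (shared_dir() / "reports" / name).read_text(encoding="utf-8")
```

On-premises, `SHARED_DATA_DIR` would point at the existing NFS export; on GCP it would point at wherever the Filestore share is mounted, with no change to the functions themselves.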

What other technical factors would you consider, such as performance or scaling? I'd appreciate any thoughts from the community.

Thank you!

Hi @kumards, some other factors you can consider are management overhead (self-managed vs. fully managed), capacity, performance throughput, scaling, cost, NFSv3/v4 support, backups, HA, DR, latency, file locks, encryption (at rest, in transit, in-memory), compliance, IAM, and network security. There are a few limitations when using PDs with multiple instances, such as the ability to write from only 2 instances simultaneously. You can also consider some third-party solutions such as NetApp Cloud Volumes or Dell EMC, along with the Elastifile architecture and the multiple Filestore service tiers available within GCP.
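As a rough first-pass filter, the two hard constraints mentioned so far (an existing NFS dependency, and more concurrent writers than PDs allow) can be captured in a few lines. A sketch under stated assumptions: the 2-writer limit is taken from the reply above and should be checked against current GCP documentation, and `Workload`, `recommend`, and `PD_MAX_SIMULTANEOUS_WRITERS` are hypothetical names. The remaining factors (cost, latency, encryption, compliance, and so on) still need to be weighed separately.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption from the reply above: a PD can be written by at most 2 VMs at once.
# Check the current GCP documentation before relying on this number.
PD_MAX_SIMULTANEOUS_WRITERS = 2


@dataclass
class Workload:
    """Hypothetical summary of a workload's sharing requirements."""
    writer_vms: int  # number of VMs that must write the same data concurrently
    uses_nfs: bool   # True if the app already expects an NFS share


@dataclass
class Recommendation:
    choice: str
    reasons: List[str] = field(default_factory=list)


def recommend(w: Workload) -> Recommendation:
    """First-pass filter: any shared-write or NFS requirement points to Filestore."""
    reasons: List[str] = []
    if w.writer_vms > PD_MAX_SIMULTANEOUS_WRITERS:
        reasons.append(
            f"{w.writer_vms} concurrent writers exceeds the assumed PD limit "
            f"of {PD_MAX_SIMULTANEOUS_WRITERS}"
        )
    if w.uses_nfs:
        reasons.append("app already reads/writes data on an NFS share")
    if reasons:
        return Recommendation("Filestore", reasons)
    # No hard constraint hit: PDs remain an option; compare on the other factors.
    return Recommendation("Persistent Disk", ["no shared-write or NFS requirement"])
```

For example, `recommend(Workload(writer_vms=3, uses_nfs=False))` returns a Filestore recommendation whose reason cites the writer limit, while a single-writer, non-NFS workload falls through to Persistent Disk.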

Are there any simultaneous-write constraints on Filestore similar to those on PDs?

Thank you, @dshah, for taking the time to respond to my question!

Taking a closer look at the parameters you mentioned, for just the Google-provided options (not third-party ones for now):

  • PDs score over Filestore on a few factors (e.g., encryption using customer-supplied keys, multi-zone data redundancy, lower minimum size and more granular increments).
  • Performance-wise, PDs appear to be better, though the comparison is not straightforward. Also, PD performance scales linearly with size, whereas Filestore performance increases in steps, not linearly.
  • The one area where Filestore clearly scores over PDs is shared storage, compared with the 2-VM limit for shared write access to PDs.
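The difference in how performance grows can be made concrete with two toy models: one where performance is proportional to provisioned size, and one where it only changes when capacity crosses a tier boundary. A sketch only: every number below (`PD_IOPS_PER_GIB`, `PD_IOPS_CAP`, and the `FILESTORE_STEPS` table) is a hypothetical placeholder, not a published GCP figure, so look up the real values for the disk type and Filestore tier you are evaluating.

```python
# Illustrative only: all numbers are hypothetical placeholders,
# not published GCP performance figures.
PD_IOPS_PER_GIB = 30      # hypothetical linear rate for a PD
PD_IOPS_CAP = 100_000     # hypothetical ceiling (stands in for per-disk/per-VM limits)

# Hypothetical stepped tiers: (minimum capacity in GiB, IOPS at that step)
FILESTORE_STEPS = [
    (1024, 5_000),
    (10_240, 30_000),
    (102_400, 60_000),
]


def pd_iops(size_gib: int) -> int:
    """Linear model: performance grows in proportion to size, up to a cap."""
    return min(size_gib * PD_IOPS_PER_GIB, PD_IOPS_CAP)


def filestore_iops(size_gib: int) -> int:
    """Step model: performance changes only when size crosses a step boundary."""
    iops = 0
    for min_gib, step_iops in FILESTORE_STEPS:
        if size_gib >= min_gib:
            iops = step_iops
    return iops
```

With these placeholder numbers, going from 1024 GiB to 2048 GiB doubles `pd_iops` (30720 to 61440) but leaves `filestore_iops` at 5000; the step model only moves once size reaches 10240 GiB. In practice PDs also stop scaling at per-disk and per-VM limits, which is what the cap stands in for.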

PS: I looked up Elastifile, which you mentioned, and noticed that it's marked as deprecated in the Marketplace.

Hi kumards/dshah,

Just to mention: Elastifile features are now (or will be) rolled into Filestore, following Google's acquisition of Elastifile.

One of the things that prevented me from using Filestore was its lack of CMEK support, as well as its lack of support for VPC Service Controls (VPC SC). Another thing to consider with partner solutions such as NetApp CVS is your security requirements: is it acceptable to use resources hosted in a colocation facility?

We now have a "storage advisor" doc that provides an overview of the available storage options in GCP and helps you choose an option that meets your requirements: https://cloud.google.com/architecture/storage-advisor. You might find the following sections of the doc particularly relevant to the discussion in this thread.

Please take a look, and share your feedback in this thread. Thank you!

We use Filestore as common shared storage for application caches and analytics cubes, so if one clustered app server goes down, the other server can pick up the data from Filestore and respond faster for analytics.

We use persistent disks, on the other hand, as Compute Engine disk storage for the VMs.
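The reply above describes a failover pattern: one app server writes analytics results to the shared Filestore share, and if it goes down, the other server reads them instead of recomputing. A minimal sketch, assuming both servers mount the share at the same path (passed in here as `shared_root`) and that cubes can be stored as JSON files; `publish_cube`, `load_cube`, and `get_cube` are hypothetical helpers, not part of any GCP API.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def publish_cube(shared_root: Path, cube_id: str, data: dict) -> Path:
    """Write a computed cube to the shared directory.

    Writes to a temp file in the same directory, then renames it, so readers
    see either no file or the complete one (relies on rename within one
    directory being atomic, which should be verified for your NFS setup).
    """
    target_dir = shared_root / "cubes"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{cube_id}.json"
    fd, tmp = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to storage before the rename
    os.replace(tmp, target)
    return target


def load_cube(shared_root: Path, cube_id: str) -> Optional[dict]:
    """Return a cube another server already computed, or None if it is absent."""
    path = shared_root / "cubes" / f"{cube_id}.json"
    try:
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def get_cube(shared_root: Path, cube_id: str, compute: Callable[[], dict]) -> dict:
    """Reuse a shared cube if present; otherwise compute it and publish it."""
    cached = load_cube(shared_root, cube_id)
    if cached is not None:
        return cached
    data = compute()
    publish_cube(shared_root, cube_id, data)
    return data
```

If the server that originally computed a cube goes down, the surviving server's `get_cube` call finds the file on the share and skips the expensive `compute` step.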