Fivetran vs Stitch for data ingestion

  • 24 June 2019
  • 3 replies

Userlevel 2

Hi all,


I’m looking for opinions on Fivetran vs Stitch, specifically focused on the ingestion piece. Has anyone had experience of both and can explain their preference? I’m interested in anything from customer service standards, roadmap transparency, robustness or anything else that has left anyone with data ingestion battle scars.


No doubt people from both companies keep an eye on discourse, but it would be great to keep this to a customer-only viewpoint please. Both tools have lots of advantages as far as I can see, so I’d love to get a feel for what this community thinks are the big differentiators.


Cheers,


Jon


3 replies

I have used both platforms and each has its own strengths and weaknesses. Our company uses both for different use cases. I’ll make a small, opinionated list of what I like/dislike about each here, but I’m happy to expand on any of these thoughts if you’d like.


Stitch



  • Great for smaller data sources, but gets expensive when you want to ingest a lot of data. Since Stitch’s pricing is based on the number of records replicated per month, we tend to use Stitch for many of our ‘smaller’ integrations, where we’re ingesting much less data.

  • Good debugging and monitoring tools. In general, I find Stitch’s platform does this a lot better than Fivetran’s.


  • Singer integration is awesome. Technically you can run any open source Singer tap on their platform or even build your own if it doesn’t exist.

  • Very good customer support. Stitch’s customer support has always been excellent.

  • Can only use one data warehouse. With Stitch, you can only set up a single ‘Destination’ per account. Our company uses Redshift and BigQuery, so this has been a bit of a challenge.

  • Better modeling in BigQuery. In the integrations I’ve used, Stitch makes better use of BigQuery nested/repeated fields.

  • Finer grained control over what to replicate. Stitch lets you control what tables/fields to replicate on most integrations pretty well. This is great if you don’t want to pull in potentially sensitive data like PII or passwords.


Integrations



  • Inferior MongoDB integration. We struggled with Stitch’s Mongo integration because it only supports specific versions of MongoDB, requires you to set up specific indexes on your source collections, and can’t handle ‘hard deletes’ (marking records deleted in Mongo as deleted in your warehouse, rather than removing them there).

  • Better Stripe integration. Stitch’s Stripe integration beats Fivetran’s, mostly because it has fewer data quality issues and pulls in raw Stripe events.


Fivetran



  • Great for replicating a lot of data. With Fivetran you pay a fixed price per connector and you can replicate as much data as you like. For example, our company replicates 1TB+ worth of MongoDB data for a fixed price of $250 per month, which would be really pricey on Stitch.

  • Customer support can be a bit lacking. I’ve had mixed experiences here, but overall I’d say Fivetran’s support is nowhere near as good as Stitch’s.

  • Can have many different data warehouses. Unlike Stitch, Fivetran lets you replicate data to multiple destinations on one account (each integration can only send data to one warehouse though).

  • Worse modeling in BigQuery. Unlike Stitch, Fivetran doesn’t model data in BigQuery using nested/repeated fields and in many cases will map data as a JSON string instead. This means you often have to add another modeling layer to extract data from the JSON.

  • Less control over what data to replicate. With Fivetran you can’t always control what specific fields to replicate, which could lead you to pull in potentially sensitive data into your warehouse.
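The extra modeling layer mentioned above for JSON-string columns might look something like the following. This is only a minimal Python sketch: the `line_items` column, its contents and the `flatten_json_column` helper are invented for illustration, and in a real warehouse you would more likely do the equivalent in SQL.

```python
import json


def flatten_json_column(rows, column):
    """Parse a JSON-string column holding a list of objects and
    return one flat row per element, carrying the parent fields along."""
    flat = []
    for row in rows:
        # Everything except the JSON column is copied onto each output row.
        parent = {k: v for k, v in row.items() if k != column}
        raw = row.get(column)
        items = json.loads(raw) if raw else []  # treat NULL/empty as no items
        for item in items:
            flat.append({**parent, **item})
    return flat


# Hypothetical rows as they might land when nested data arrives as a JSON string.
rows = [
    {"order_id": 1, "line_items": '[{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]'},
    {"order_id": 2, "line_items": None},
]

flat = flatten_json_column(rows, "line_items")
```

With nested/repeated fields this step is unnecessary, which is the difference the two BigQuery bullets are pointing at.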


Integrations



  • Superior MongoDB integration. Fivetran’s Mongo integration is pretty amazing: it can replicate all the collections of a huge database without setting up indexes, supports all versions of Mongo and supports hard deletes. The only edge that Stitch has is finer-grained control over what fields to replicate.

  • Inferior Stripe integration. Fivetran’s Stripe integration has some data quality issues that need to be fixed and it is pretty unreliable (it’s broken a few times for us, taking up to a week for them to fix it). It also doesn’t pull in raw Stripe events.

Userlevel 3

Hello,


I’ll share my experience, as we compared Fivetran and Stitch in December 2018 for our new data platform.


Please note this only reflects our use cases, which were centered around advertising sources, and it was 6 months ago, so things may have changed, especially on the Stitch side.


What we noticed is that Fivetran handles syncs much better, especially in BigQuery. The reason is that Stitch writes to BigQuery in an append-only way, meaning it duplicates data every time it writes to the warehouse at each sync.

For example, on Google Ads, both solutions update the last 30 days of data at each sync, but on the Stitch side, once the historical sync is over, every subsequent sync starts duplicating data.


As an example, on a Fivetran table, if you want to calculate a sum of clicks per day, you just take the click metric and include it in a viz, and it works right away:



Whereas, on Stitch, if you do that you’ll get:



So you have to take extra steps to deduplicate this using the “last stitch synced date” so that only the latest version of the data is counted. But even then, you end up with a lot of useless storage in the warehouse, so at some point you have to clean it out as well.
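The deduplication step described above can be sketched roughly as follows. This is a minimal Python illustration rather than Stitch’s documented procedure: the row layout, the key fields (`date`, `campaign_id`) and the `synced_at` field standing in for the “last stitch synced date” are all assumptions made for the example. In practice you would most likely express the same logic in SQL in the warehouse.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def latest_rows(rows: Iterable[Row],
                key_fields: Tuple[str, ...],
                synced_field: str = "synced_at") -> List[Row]:
    """Keep only the most recently synced version of each logical row."""
    latest: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        current = latest.get(key)
        # ISO-8601 timestamp strings compare correctly as plain strings.
        if current is None or row[synced_field] > current[synced_field]:
            latest[key] = row
    return list(latest.values())


def clicks_per_day(rows: Iterable[Row]) -> Dict[str, int]:
    """Sum the clicks metric per day."""
    totals: Dict[str, int] = {}
    for row in rows:
        totals[row["date"]] = totals.get(row["date"], 0) + row["clicks"]
    return totals


# Hypothetical append-only table: campaign "a" was written by two syncs.
appended = [
    {"date": "2019-06-01", "campaign_id": "a", "clicks": 10, "synced_at": "2019-06-02T00:00:00"},
    {"date": "2019-06-01", "campaign_id": "a", "clicks": 12, "synced_at": "2019-06-03T00:00:00"},
    {"date": "2019-06-01", "campaign_id": "b", "clicks": 5, "synced_at": "2019-06-03T00:00:00"},
]

naive = clicks_per_day(appended)  # counts both versions of campaign "a"
deduped = clicks_per_day(latest_rows(appended, ("date", "campaign_id")))
```

Here `naive` overcounts the day (27 clicks) because both versions of campaign "a" are summed, while `deduped` gives 17. The older rows still sit in storage until they are cleaned out.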


So given the fact we had hundreds of ad accounts to sync and we wanted to do it every 5 minutes, it was definitely a no go for us.


Beyond that, I totally agree with Michael that Fivetran lets you replicate A LOT of data for every source, so if your use case requires scale, it’s likely to be better suited than Stitch. To give you an idea, we are currently replicating more than 500M rows per day and it’s working beautifully.


Regarding debugging and monitoring, I didn’t really dig into what Stitch offers here, but I found that Fivetran provides some good options:



  • You can connect Fivetran to your logging service on AWS (Cloudwatch), GCP (Stackdriver) or Azure

  • You can use the Fivetran API to pull the status of every connector

  • You get an “audit” table in every schema in your warehouse with a summary of what has been done after each sync


Regarding customer support, we’ve been using Fivetran for the last 6 months and I’m really happy with the speed and efficiency of the team: the fact that customer service works 24/7 from 3 different continents is definitely a great thing.


At the moment, the only caveat we are facing is related to the UI and the initial configuration: as we had a lot of connectors to create, it was a lot of manual work to select the accounts, the dimensions, the metrics and so on. You can’t do things in bulk or copy a connector configuration, for example. But I know Fivetran is currently working on this by supporting more and more sources in their API.


To sum up, we found that Fivetran delivered much better data quality and service for our use cases and needs! But as always, there are so many different sources and needs that you should always test and compare, or even end up using both, even if it’s just for one application!


Anthony

Userlevel 2

Thanks for the great feedback @Michael_Erasmus and @antho. That’s really helpful!
