Best way to batch load / historic backfill a BigQuery table?

We've got a MySQL server running at another cloud provider, and the plan is to use Datastream to pull near-real-time data into BigQuery for our analytics team to use. Datastream has a backfill option, but our largest table comes in at 500 GB and a few others are over 100 GB, so I'd rather not put too much strain on the MySQL server using the built-in option.

Are there any recommendations for how to get the historical table data into BigQuery in a more controlled manner? We don't have a large engineering team, so a simple solution would suit us better.

Solved
1 ACCEPTED SOLUTION

Here are some recommendations for getting the historical table data into BigQuery in a controlled manner without putting too much strain on your MySQL server:

Option 1: Use a third-party tool

A number of third-party ETL tools can migrate data from MySQL to BigQuery. These tools typically offer features such as:

  • Incremental data migration: data is copied in batches, which reduces the load on your MySQL server (the sketch after this list shows the underlying pattern).
  • Schema conversion: your MySQL schema is converted to a BigQuery schema automatically.
  • Data transformation: data can be reshaped or cleaned before it lands in BigQuery.
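
To make the incremental idea concrete, the pattern most of these tools use for a large backfill is keyset pagination: each batch is a short, index-friendly query against the primary key, so no single long-running SELECT ties up the source server. A minimal sketch (the connection details, table, and column names are placeholders):

import time

import mysql.connector

# Placeholder connection details, matching the example further down
db = mysql.connector.connect(host='localhost', database='my_database',
                             user='my_user', password='my_password')
cursor = db.cursor()

BATCH_SIZE = 10_000
last_id = 0  # resume point; persist this so an interrupted run can restart

while True:
    # Keyset pagination: an index range scan per batch, unlike OFFSET,
    # which rescans every skipped row
    cursor.execute(
        'SELECT id, name, age FROM my_table'
        ' WHERE id > %s ORDER BY id LIMIT %s',
        (last_id, BATCH_SIZE),
    )
    rows = cursor.fetchall()
    if not rows:
        break
    last_id = rows[-1][0]  # advance past the last row we saw
    # ... hand `rows` to whatever loads them into BigQuery ...
    time.sleep(1)  # throttle between batches to keep source load low

The sleep is the knob that makes this "controlled": you trade migration time for headroom on the MySQL server.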

Option 2: Use a custom script

If you want more control over the migration process, you can write a custom script to move the data from MySQL to BigQuery. This is more work, but it lets you control batch sizes and pacing, which is exactly what you need to keep the load on the source server down.

Here is a basic example of a Python script that can be used to migrate data from MySQL to BigQuery:

 

import mysql.connector
from google.cloud import bigquery

# Connect to the MySQL server
mysql_db = mysql.connector.connect(
    host='localhost',
    database='my_database',
    user='my_user',
    password='my_password',
)

# Create a BigQuery client
bigquery_client = bigquery.Client()

# Define the destination table with its schema and create it
table = bigquery.Table(
    'my_project.my_dataset.my_table',
    schema=[
        bigquery.SchemaField('id', 'INT64'),
        bigquery.SchemaField('name', 'STRING'),
        bigquery.SchemaField('age', 'INT64'),
    ],
)
table = bigquery_client.create_table(table)  # returns the created table

# Query the MySQL table, selecting columns explicitly so they line up
# with the BigQuery schema
mysql_cursor = mysql_db.cursor()
mysql_cursor.execute('SELECT id, name, age FROM my_table')

# Copy rows over in batches rather than one API call per row
while True:
    rows = mysql_cursor.fetchmany(500)
    if not rows:
        break
    errors = bigquery_client.insert_rows(table, rows)
    if errors:
        raise RuntimeError(f'Failed to insert rows: {errors}')

# Close the MySQL connection
mysql_cursor.close()
mysql_db.close()
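
One caveat with the script above: insert_rows uses BigQuery's streaming API, which is billed per volume ingested and subject to quotas, so it is a poor fit for a 500 GB backfill. For one-off historical loads, a batch load job is generally cheaper and more robust. Here is a minimal sketch of the same idea using a load job from newline-delimited JSON (the table name is the same placeholder as above):

import json
import tempfile

from google.cloud import bigquery

bigquery_client = bigquery.Client()

# Stage a batch of rows as newline-delimited JSON in a local file,
# then submit it as a single batch load job instead of streaming
with tempfile.NamedTemporaryFile(mode='w+b', suffix='.json') as f:
    for row in [{'id': 1, 'name': 'alice', 'age': 42}]:  # your batch here
        f.write((json.dumps(row) + '\n').encode('utf-8'))
    f.seek(0)  # rewind so the client reads from the start of the file

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = bigquery_client.load_table_from_file(
        f, 'my_project.my_dataset.my_table', job_config=job_config)
    job.result()  # block until the load job completes

For a table in the hundreds of gigabytes you would more likely dump files to Cloud Storage and point a load job at the bucket, but the API shape is the same.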

You can also use a hybrid approach to migrate your data from MySQL to BigQuery. For example, you could use a third-party tool to migrate the initial batch of data, and then use a custom script to migrate the incremental data.

This approach can be helpful if you have a large amount of data to migrate and you need to minimize the load on your MySQL server.
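
If you do script the incremental leg yourself, the usual pattern is a high-water mark: remember the newest modification timestamp you have copied and pull only rows beyond it on each run. A sketch, assuming a hypothetical indexed updated_at column on the source table:

import mysql.connector

# Hypothetical: assumes my_table carries an indexed updated_at column
db = mysql.connector.connect(host='localhost', database='my_database',
                             user='my_user', password='my_password')
cursor = db.cursor()

last_sync = '1970-01-01 00:00:00'  # high-water mark; persist between runs

cursor.execute(
    'SELECT id, name, age, updated_at FROM my_table'
    ' WHERE updated_at > %s ORDER BY updated_at',
    (last_sync,),
)
rows = cursor.fetchall()
if rows:
    last_sync = str(rows[-1][3])  # advance the mark for the next run
    # ... load `rows` into BigQuery with a batch load job, as above ...

Note this catches inserts and updates but not deletes; that is the gap Datastream's change data capture fills once the backfill is done.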

Recommendation

If you have a small engineering team, I recommend using a third-party tool to migrate your data from MySQL to BigQuery. It is the simplest option and leaves you with the least code to maintain.

However, if you need more control over the migration process, or if you have a very large amount of data to migrate, you may want to consider using a custom script or a hybrid approach.

Here are some additional tips for migrating your data from MySQL to BigQuery:

  • Test the migration process in a staging environment before migrating your production data.
  • Use a data schema conversion tool to convert your MySQL schema to a BigQuery schema.
  • Use a data transformation tool to transform your data before it is migrated to BigQuery.
  • Monitor the migration process closely to ensure that it is running smoothly; a quick row-count comparison (sketched below) catches most problems.
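
On that last point, a cheap sanity check after each stage is to compare row counts on both sides (same placeholder names as above):

import mysql.connector
from google.cloud import bigquery

db = mysql.connector.connect(host='localhost', database='my_database',
                             user='my_user', password='my_password')
cursor = db.cursor()
cursor.execute('SELECT COUNT(*) FROM my_table')
mysql_count = cursor.fetchone()[0]

bigquery_client = bigquery.Client()
result = bigquery_client.query(
    'SELECT COUNT(*) AS n FROM `my_project.my_dataset.my_table`').result()
bq_count = next(iter(result)).n

print(f'MySQL: {mysql_count}, BigQuery: {bq_count}')
if mysql_count != bq_count:
    print('Row counts differ -- investigate before relying on the data')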
