Shopify Connector: Missing Data

Hi, 

I created a main integration and a sub-integration for 2 different tables coming from Shopify, so in total I have 4 integrations in Application Integration.

For the first table, the data was added; however, the gift cards table has only 25 rows. Some rows seem to be missing, as in Shopify I can see more gift card results. Even so, the execution is shown as successful.

For the second table, the flow starts but stops after a while with the error "UNHANDLED_TASK_ERROR" (screenshot attached). In this case, the execution is shown as failed.

(screenshot attached: ianala_0-1690445389618.png)

Could you guide me on how to fetch the missing rows for the first table, and how to get the data fetched from the Shopify REST API and added to BigQuery for the second table? Thank you.


Hi @ianala,

You should look into the Connectors task. Under Entity Operations, listEntitiesPageSize specifies the number of results to return per page. If the result set is too large, the Connectors task might fail, as there is a limit on the data size the connector can process at a time. By breaking the result into smaller chunks, you can avoid this issue.

By default, the page size is 25 and the maximum number of pages supported by the task is 200. If you want to change this, set listEntitiesPageSize to the value you want. You can read more under Input Parameters.
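To make those two numbers concrete, here is the back-of-the-envelope ceiling on what a single Connectors task call can return. The arithmetic is my own illustration of the documented limits, not an official guarantee:

```python
# Rough ceiling on rows one Connectors task call can return,
# using the documented defaults quoted above.
DEFAULT_PAGE_SIZE = 25   # listEntitiesPageSize default
MAX_PAGES = 200          # maximum pages supported by the task

print(DEFAULT_PAGE_SIZE * MAX_PAGES)  # 5000 rows at the defaults
print(200 * MAX_PAGES)                # 40000 rows with listEntitiesPageSize=200
```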

Check out Application Integration with Shopify and make sure you have followed all the steps, as you might have missed one.

If the above options don't work, you can contact Google Cloud Support to look further into your case. Let me know if this helped, thanks!

Hi @Marramirez,

Thanks for these comments. I have a specific table that should consist of 377 rows. To optimize the integration, I set listEntitiesPageSize to 200 and ran a test with this table. Only 200 rows were retrieved, leaving 177 rows missing. I then ran the integration once more, and it fetched the same set of rows again and appended them to BigQuery.

I'm looking for a strategy that retrieves the data seamlessly. Ideally, I would fetch the first 200 rows, then the next 200, and so on, all within a single execution run, making sure that no redundant data is fetched and that all collected results land in a single table. The official documentation says: "If your result set has a large number of pages, you can consider using the  task to repeatedly call the Connectors task and use the   task to automatically assign token values to the listEntitiesPageToken input parameter after each run." (the two task names were links and did not paste over). However, I am not sure how to implement this, e.g. what needs to change in my current setup and whether it is indeed the solution to this problem. Do you have a suggestion? The general docs are not helpful in this case.
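To check my understanding of that paragraph, here is the loop I believe it describes, written as plain Python pseudocode. connector_list_entities is a made-up stand-in for the Connectors task; in the real integration the looping and the token assignment would presumably be handled by the two tasks the docs mention:

```python
# Made-up stand-in for one "List entities" call of the Connectors task,
# backed by fake data so the pagination pattern is runnable on its own.
FAKE_ROWS = [{"id": i} for i in range(377)]   # e.g. the 377-row table

def connector_list_entities(page_size, page_token=None):
    start = int(page_token or 0)
    page = FAKE_ROWS[start:start + page_size]
    next_start = start + page_size
    next_token = str(next_start) if next_start < len(FAKE_ROWS) else None
    return {"entities": page, "nextPageToken": next_token}

def fetch_all_rows(page_size=200):
    """Feed each returned token back in as listEntitiesPageToken until
    no token comes back, i.e. the last page has been reached."""
    rows, token = [], None
    while True:
        result = connector_list_entities(page_size, token)
        rows.extend(result["entities"])
        token = result.get("nextPageToken")
        if token is None:
            break
    return rows

print(len(fetch_all_rows()))   # 377 -- pages of 200 + 177 in one run
```

Is that the right mental model, and if so, which tasks map to the while loop and the token hand-off?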

Moreover, there are additional tables, each containing around 50,000 rows. When I ran the integration for the table with the highest expected row count, the execution failed. Even with listEntitiesPageSize set to 200 for this scenario, the process fell short of the expected number of rows. I'm open to any suggestions or insights on how to address this.
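For context, this is roughly the fallback I have in mind: pulling pages directly from the Shopify REST API and appending each page to BigQuery as it arrives, so that no single call has to hold all ~50,000 rows. It is only a sketch; the shop name, access token, table ID, and endpoint are placeholders I made up, and real rows would first need shaping to match the BigQuery schema:

```python
import requests
from google.cloud import bigquery

SHOP = "your-shop"                                # placeholder
TOKEN = "shpat_..."                               # placeholder Admin API token
TABLE_ID = "my-project.my_dataset.gift_cards"     # placeholder

client = bigquery.Client()
url = f"https://{SHOP}.myshopify.com/admin/api/2023-07/gift_cards.json"
params = {"limit": 250}   # Shopify REST caps the page size at 250

while url:
    resp = requests.get(url, params=params,
                        headers={"X-Shopify-Access-Token": TOKEN})
    resp.raise_for_status()
    rows = resp.json().get("gift_cards", [])
    if rows:
        # Streaming insert; real rows would be mapped to the BQ schema first.
        errors = client.insert_rows_json(TABLE_ID, rows)
        if errors:
            raise RuntimeError(f"BigQuery insert errors: {errors}")
    # requests parses the Link header; rel="next" carries the page_info cursor.
    url = resp.links.get("next", {}).get("url")
    params = None   # the next-page URL already embeds page_info and limit
```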

Hi @Marramirez,

I have faced a similar issue while fetching data using Shopify's connector with the GraphQL schema. I set listEntitiesPageSize to 200; the execution started but failed after some time. Can you suggest a good way to fetch the full historical data from a particular table so that the integration does not fail?

(screenshot attached: ianala_0-1691566710562.png)

Hi @ianala,

Please check out this post: https://www.googlecloudcommunity.com/gc/Integration-Services/API-Pagination-in-Application-Integrati... and let me know if it answers your questions.
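In the meantime, here is a very rough sketch of what cursor pagination looks like directly against the Shopify GraphQL Admin API, in case seeing the shape of the loop helps. The shop, token, API version, and the orders query below are placeholders I made up, not something from your setup:

```python
import requests

SHOP = "your-shop"        # placeholder
TOKEN = "shpat_..."       # placeholder access token
URL = f"https://{SHOP}.myshopify.com/admin/api/2023-07/graphql.json"

QUERY = """
query($cursor: String) {
  orders(first: 100, after: $cursor) {
    pageInfo { hasNextPage endCursor }
    edges { node { id name createdAt } }
  }
}
"""

cursor, all_rows = None, []
while True:
    resp = requests.post(
        URL,
        json={"query": QUERY, "variables": {"cursor": cursor}},
        headers={"X-Shopify-Access-Token": TOKEN},
    )
    resp.raise_for_status()
    payload = resp.json()
    if "errors" in payload:
        raise RuntimeError(payload["errors"])
    conn = payload["data"]["orders"]
    all_rows.extend(edge["node"] for edge in conn["edges"])
    if not conn["pageInfo"]["hasNextPage"]:
        break
    cursor = conn["pageInfo"]["endCursor"]   # resume after the last row seen

print(f"fetched {len(all_rows)} rows")
```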
