Working with large volumes
The challenges of working with large volumes of data and how to overcome them
Working with large volumes of data presents many challenges that need to be overcome. While in the near future our preferred way to exchange large volumes of data will be the Unit4 Data Hub, today many customers still use the APIs to extract large volumes of data, for example to build their own data warehouse. This is not a trivial task: there are many considerations, such as rate limits and quotas (see the relevant sections), pagination and the edge cases that can occur during it, ongoing data changes during extraction, timeouts, and request limits.
Strategies for Efficient Data Retrieval
Filtering and segmenting extracted data
For requests made to ObjectAPI, consider horizontal partitioning of the data: only read what you really need by filtering the request and segmenting it so that you receive only slices of the Enterprise Documents, as in the sketch below.
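A minimal Python sketch of such a segmented request, assuming a hypothetical ObjectAPI base URL and filter parameters (companyId, period); replace them with the filters your endpoint actually supports:

```python
import requests

BASE_URL = "https://api.example.com/v1/objects"   # hypothetical ObjectAPI base URL
TOKEN = "<access-token>"                          # obtained via your normal auth flow

def fetch_segment(doc_type: str, company: str, period: str) -> list:
    """Fetch only one slice of the Enterprise Documents, filtered server-side."""
    response = requests.get(
        f"{BASE_URL}/{doc_type}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={
            "companyId": company,   # hypothetical filter parameters
            "period": period,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

# Request one slice at a time instead of the whole data set in a single call.
invoices_q1 = fetch_segment("customer-invoices", company="100", period="202401")
```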
Separate states
Several ObjectAPI endpoints expose transactional Enterprise Documents whose data can be in specific states. While it is easy to simply request all states at once, this can hurt performance. For large volumes it is recommended to request each state in a separate call.
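A minimal sketch of requesting each state separately; the state query parameter and the list of state values are assumptions, not the actual ObjectAPI contract:

```python
import requests

BASE_URL = "https://api.example.com/v1/objects"   # hypothetical base URL
TOKEN = "<access-token>"
STATES = ["draft", "pending", "approved", "paid"] # hypothetical state values

def fetch_by_state(doc_type: str, state: str) -> list:
    response = requests.get(
        f"{BASE_URL}/{doc_type}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"state": state},                  # hypothetical state filter parameter
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

# One call per state instead of one heavy call covering all states.
documents = []
for state in STATES:
    documents.extend(fetch_by_state("customer-invoices", state))
```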
Data Compression
Enable gzip compression for responses by sending the header Accept-Encoding: gzip. Doing this also reduces the quota you consume.
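For example, with Python's requests library (which decompresses gzip responses transparently), the header can be sent explicitly as below; the URL is a placeholder:

```python
import requests

response = requests.get(
    "https://api.example.com/v1/objects/customer-invoices",  # hypothetical URL
    headers={
        "Authorization": "Bearer <access-token>",
        "Accept-Encoding": "gzip",   # ask the server for a compressed response
    },
    timeout=60,
)
response.raise_for_status()
# requests decompresses the body transparently; this shows whether gzip was applied.
print(response.headers.get("Content-Encoding"))
```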
Pagination and Incremental Data Loading
Requests can be chunked using the strategies described in Pagination or Pagination (optimized). Additionally, in many scenarios it is highly recommended to use Incremental Loading, where you request only the changes that occurred since the previous data loading cycle.
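The sketch below combines page-by-page retrieval with an incremental filter; the parameter names (changedSince, pageSize, pageIndex) are assumptions and should be replaced with those documented for the endpoint you call:

```python
import requests
from datetime import datetime, timezone

BASE_URL = "https://api.example.com/v1/objects"   # hypothetical base URL
TOKEN = "<access-token>"
PAGE_SIZE = 500                                   # assumed page size

def load_changes_since(doc_type: str, last_run: datetime) -> list:
    """Page through documents changed since the previous loading cycle."""
    documents, page = [], 1
    while True:
        response = requests.get(
            f"{BASE_URL}/{doc_type}",
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={
                "changedSince": last_run.isoformat(),  # hypothetical parameter names
                "pageSize": PAGE_SIZE,
                "pageIndex": page,
            },
            timeout=60,
        )
        response.raise_for_status()
        batch = response.json()
        documents.extend(batch)
        if len(batch) < PAGE_SIZE:                     # last page reached
            return documents
        page += 1

# Remember when this cycle started and pass it as last_run on the next cycle.
cycle_started = datetime.now(timezone.utc)
previous_cycle_start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # loaded from state storage
changes = load_changes_since("customer-invoices", last_run=previous_cycle_start)
```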
Data Consistency
When using pagination or incremental loading, be prepared to deal with data changes that happen during the loading process, as described in Data Consistency.
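One common, generic way to tolerate mid-extraction changes (not necessarily the exact approach described in Data Consistency) is to overlap the incremental windows slightly and deduplicate by document id, keeping the newest version; the field names below (id, lastUpdated) are assumptions:

```python
from datetime import datetime, timedelta

OVERLAP = timedelta(minutes=5)   # assumed safety margin between loading cycles

def next_window_start(previous_run_started: datetime) -> datetime:
    """Start the next incremental window slightly before the previous run began."""
    return previous_run_started - OVERLAP

def deduplicate(documents: list) -> dict:
    """Keep only the newest version of each document (assumes 'id' and 'lastUpdated' fields)."""
    latest = {}
    for doc in documents:
        doc_id = doc["id"]
        if doc_id not in latest or doc["lastUpdated"] > latest[doc_id]["lastUpdated"]:
            latest[doc_id] = doc
    return latest
```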
Retry logic
Handle transient failures with a retry policy based on exponential backoff, so that temporary errors such as timeouts or rate-limit responses do not abort the whole extraction.
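A minimal retry sketch with exponential backoff and jitter using Python's requests; the set of retryable status codes and the backoff schedule are assumptions you should tune to your quotas:

```python
import random
import time
import requests

TRANSIENT_STATUS = {429, 500, 502, 503, 504}   # assumed set of retryable statuses

def get_with_retry(url: str, headers: dict, max_attempts: int = 5) -> requests.Response:
    """GET with exponential backoff and jitter for transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url, headers=headers, timeout=60)
            if response.status_code not in TRANSIENT_STATUS:
                response.raise_for_status()    # non-retryable errors surface immediately
                return response
        except (requests.ConnectionError, requests.Timeout):
            pass                               # network errors are treated as transient
        if attempt == max_attempts:
            raise RuntimeError(f"Giving up on {url} after {max_attempts} attempts")
        # exponential backoff: 1s, 2s, 4s, ... plus random jitter
        time.sleep(2 ** (attempt - 1) + random.uniform(0, 1))
```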