Storage Tiering and Archival Job

Info

Available from: Resilio Active Everywhere 4.2.0
Supported for: Local and network storages, AWS Glacier storage tiers, and other cloud storages.
Requires: This job type must be enabled in the license.

The Storage Tiering and Archival job is a dataflow-driven job. Use it to place files in an AWS storage tier, retrieve them from archive, or migrate data between servers.

Configuring the Storage tiering job

Before creating a job, edit Default Job Profile or create a new one with the following custom parameters:

  • Lazy Indexing: Yes. Otherwise, files (objects) will be retrieved from the AWS Glacier archive twice.
  • fs_enable_meta: false. Advised for AWS Glacier storage. Disabling metadata synchronization is not mandatory, but if it stays enabled and the metadata of a retrieved object changes, the object will be moved to the AWS Glacier archive again.

Click Jobs -> Create a new job -> Storage Tiering and Archival job.

Select the created Job profile.

Only one source Agent and one destination Agent are supported. Data can be transferred between cloud and non-cloud storages.
Resilio AE 4.2.0 implements native support for AWS Glacier storage tiers.

High availability groups are supported as source and destination.

The Storage Tiering and Archival job can be used in conjunction with any other job's "Path" target, including:

  • Synchronization
  • Distribution
  • Consolidation
  • File Caching (Primary Storage)
  • Hybrid Work (Primary Storage)

Do not use this job with Agents that have any of the following roles:

  • TSS (Transparent Selective Sync)
  • Caching Gateway
  • End-User Agents

Note

When files are transferred to the Glacier Tier by a Storage Tiering Job, they become inaccessible until explicitly restored. If another job attempts to access these files while they are still in the Glacier Tier, it will fail and report an error, as the files must be retrieved before they can be read or processed.

Parameters for data transfer

On the STORAGE TIERING tab, configure the parameters for the files to be transferred. These parameters apply to the files/objects in the source directory.
Modification / access time. Select files depending on their modification or access time.
The profile parameters "Max/min file modification time (sec)" from the Job profile are not applied to this job.
Access time configuration is disabled if the source Agent points to a cloud storage, since cloud storages don't support access time on objects.
Access time must be enabled on a Windows Server acting as the source in the job (it is disabled by default; for more information, see Recording of files access time is not enabled on Windows). Otherwise, the source will list no files for transfer, and the job run will show 0 (zero) files discovered by the source Agent.

Exact files list. Supports regular expressions, one expression per line; locations are relative to the job path configured for the source. Matching is case sensitive. The field cannot be empty. To transfer all files in the source folder, use .* as the list.
After editing a rule, click outside of the edit box so that the Management Console initializes the rule and the Next button becomes active to continue editing the job.

Indexing optimization can be enabled by starting every pattern with ^. The Agent then performs partial matching to exclude unsuitable folders while indexing. For instance, with the pattern ^folder1/.*\.txt and the data structure

folder1/
    file1.txt
    file2.txt
folder2/
    file3.txt
    file4.txt

directory ‘folder2’ won't even be visited by the Agent, since there are no possible paths inside it that could match the given pattern.
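The matching behavior above can be sketched with Python's re module (used here only to illustrate the pattern semantics; the Agent's actual matcher is internal to Resilio):

```python
import re

# The anchored pattern from the example above
pattern = re.compile(r"^folder1/.*\.txt")

# Relative paths from the example data structure
paths = [
    "folder1/file1.txt",
    "folder1/file2.txt",
    "folder2/file3.txt",
    "folder2/file4.txt",
]

# Each path is matched against the files list relative to the job path.
matched = [p for p in paths if pattern.match(p)]

# Only files under folder1/ match; thanks to the leading "^", the Agent
# can rule out folder2/ during indexing without descending into it.
print(matched)
```

Without the leading ^, the Agent cannot tell in advance whether a folder might contain matching paths, so every directory has to be indexed.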

Storage class. Select the storage class if AWS storage is selected as the destination. Files from the source will be placed in the chosen storage tier.

Pre-seeded AWS Glacier storage

Pre-seeded objects in the AWS storage already have a storage class assigned. On the next job run, such objects will be moved to the new storage class per the job's settings only if the file on the source differs from the one in the bucket (by size or timestamp). Otherwise, the storage class is preserved as is.

Restoring. Select the restoring tier to retrieve data from an AWS Glacier archive.
Expedited retrieval is not available for the Glacier Deep Archive class. Such objects will not be retrieved, and the Agent will report an error.

Retrieval classes are also supported for Oracle cloud storage.

By default, objects are restored for 3 days. This is configurable with the custom parameter cloudfs.s3.retrieval_days in the Job profile.

File retention and deletion settings. An option to clear files from the source Agent once they're transferred to the destination. For more information, see Deleting files from source after transfer in Distribution Jobs.

Dry run for the Archival job

Before starting the job itself, it's possible to launch a dry run. It runs only on the source Agent and scans for the files that match the configured files list and would be transferred by the job.

Note that the dry run only checks which files match the rules. It does not calculate file hashes, does not verify the files' availability for transfer, and does not perform any action on the files.
Also, files may change after the dry run completes but before the job run starts, as well as during the job run itself.

A dry run can be started only manually, and it cannot be started if the job does not have a destination Agent.

The job run and the dry run for the same job cannot work simultaneously; the dry run must be stopped in order to launch the job run itself.
A dry run cannot be paused, but it can be stopped.

The resulting list of files is available on the FILES tab.

The following MC functionality is not supported for the dry run:

Monitoring the storage tiering job

During retrieval from the Glacier archive, the source Agent reports the status "waiting for files to be restored". Until an object is retrieved, there is no active data transfer, and the job run's ETA shows an "unknown" state.
The destination Agent reports the status "downloading" with the corresponding files count.

The file is downloaded to the destination once the object is restored. Depending on the selected retrieval tier, this may take 2 to 48 hours. Check the restoring details in the object's details in the AWS bucket.

Objects are downloaded immediately (without retrieval from the archive) if they are identical on the destination Agent: the Agents compare hashes, and no actual transfer is needed.

The retrieval of an object cannot be stopped by Resilio means, even if the job run is stopped and the job is deleted.

Peculiarities and limitations

  • Agent does not store objects’ versions in Resilio's Archive. Make sure to enable file versioning in the bucket.

  • The "before finalizing download" trigger is not supported in this job.

  • S3 counts retrieval days using its own internal algorithm, so if the cloudfs.s3.retrieval_days parameter is set to 1, the restore period won't be exactly 24 hours.

    Example

    Restoring started on October 29, 2024, 11:06:42 (UTC+01:00) with cloudfs.s3.retrieval_days=1; restoring then expires on October 31, 2024, 01:00:00 (UTC+01:00).

  • While a file is being restored from the AWS archive, the job is in progress, and the file query shows "can't be synced now, uploading agent has problems".
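The retrieval-days example above is consistent with the restore expiring at the first midnight UTC after the start time plus the configured number of days. The sketch below reproduces that rounding; it is an assumption inferred from the example, not an official AWS formula:

```python
from datetime import datetime, timedelta, timezone

def approx_restore_expiry(start_utc: datetime, retrieval_days: int) -> datetime:
    """Approximate S3 restore expiry: the first midnight UTC after
    start + retrieval_days (an assumption inferred from the example above)."""
    end = start_utc + timedelta(days=retrieval_days)
    # Round up to the following midnight UTC
    return (end + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )

# Example from above: restore started October 29, 2024, 11:06:42 (UTC+01:00),
# i.e. 10:06:42 UTC, with cloudfs.s3.retrieval_days=1
start = datetime(2024, 10, 29, 10, 6, 42, tzinfo=timezone.utc)
expiry = approx_restore_expiry(start, retrieval_days=1)
print(expiry.isoformat())  # 2024-10-31T00:00:00+00:00, i.e. 01:00 in UTC+01:00
```

This matches the documented example: the restore lasts noticeably longer than 24 hours because the expiry is rounded up to a midnight-UTC boundary.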