Talend Cloud Platform Engines: Cloud Engine, Remot…

Talend Cloud platform provides computational capabilities that allow organizations to securely run data integration processes natively from cloud to cloud, on-premises to cloud, or cloud to on-premises environments.

These capabilities are powered by compute resources, commonly known as Engines. This article covers the four basic types.

Cloud Engine (CE)

A Cloud Engine is a compute resource managed by Talend in Talend Cloud that executes Job tasks.

  • You can allocate Cloud Engines to environments in proportion to the number of concurrent task executions, workloads, and Job designs you plan to run.
  • All environments can use unassigned Cloud Engines. If Cloud Engines are not allocated to specific environments, you may not be able to run certain tasks because other tasks might keep all the unassigned Cloud Engines occupied.
  • Cloud Engines can handle parallel execution of three tasks. That means a maximum of three different tasks can run in parallel on a single Cloud Engine. (A task cannot run more than once concurrently on a single Cloud Engine.) So, if three different tasks are already running on a Cloud Engine or if the same task is already running on that engine, another Cloud Engine is selected to execute the task.
  • If you run your task in Cloud Exclusive mode, you cannot execute other tasks on that Cloud Engine. You can only use Cloud Exclusive engines in environments that do not have Cloud Engines assigned to them.
  • Cloud Engines have limited system resources – memory usage: 8 GB, disk usage: 200 GB.
  • Only standard Data Integration Batch Jobs can run on Cloud Engines.
  • You cannot group Cloud Engines together to form clusters.
  • Cloud Engines are hosted on AWS or Azure Cloud.
  • Talend manages Cloud Engines.
  • Cloud Engines employ TCP communication.
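The placement rule described in the list above (at most three different tasks per engine, never the same task twice on one engine, otherwise another engine is selected) can be sketched as follows. This is an illustration only, not Talend's scheduler: the `CloudEngine` class, the `select_engine` function, and the task and engine names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_PARALLEL_TASKS = 3  # per-engine limit stated in the list above


@dataclass
class CloudEngine:
    """Hypothetical model of one Cloud Engine and the tasks running on it."""
    name: str
    running: set[str] = field(default_factory=set)  # IDs of tasks currently running

    def can_accept(self, task_id: str) -> bool:
        # A task cannot run twice concurrently on the same engine,
        # and at most three different tasks may run in parallel.
        return task_id not in self.running and len(self.running) < MAX_PARALLEL_TASKS


def select_engine(engines: list[CloudEngine], task_id: str) -> Optional[CloudEngine]:
    """Return the first engine that can take the task, or None if none can."""
    for engine in engines:
        if engine.can_accept(task_id):
            engine.running.add(task_id)
            return engine
    return None


# ce-1 is already running three tasks, one of which is "load_orders".
engines = [
    CloudEngine("ce-1", {"load_orders", "sync_crm", "export_gl"}),
    CloudEngine("ce-2"),
]
chosen = select_engine(engines, "load_orders")
print(chosen.name if chosen else "no engine available")  # prints "ce-2"
```

When the function returns `None`, every engine in the pool is either full or already running the task. That is the situation the list warns about when unassigned Cloud Engines are kept occupied by other tasks.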

 

Remote Engine (RE)

A Remote Engine is a capability in Talend Cloud platform that allows you to securely run data integration Jobs natively from cloud to cloud, on-premises to cloud, or cloud to on-premises environments completely within your environment for enhanced performance and security, without transferring the data through the Cloud Engines in Talend Cloud platform.

It is a Java-based runtime (similar to a Cloud Engine) to execute Talend Jobs on-premises or on another cloud platform that you control.

  • Remote Engines allow you to run Jobs, Routes, and Data Service tasks.
  • Data Service and Route Microservice tasks can only be deployed on Remote Engines. OSGi type deployments require that Talend Runtime version 7.1.1 or higher is installed and running on the same machine as the Talend Remote Engine.
  • Remote Engines support configurable max parallel execution: by default, a maximum of three different tasks can run in parallel on the same Remote Engine. However, this is a modifiable configuration.
  • Remote Engines can be grouped to form clusters called Remote Engine Cluster. Remote Engines added to a cluster cannot be used to execute tasks directly from Talend Studio.
  • Remote Engines are hosted on-premises or on the cloud.
  • You manage Remote Engines.
  • Remote Engines employ HTTPS communication.

 

Remote Engine Gen2 (REG2)

A Remote Engine Gen2 is a secure execution engine on which you can safely execute data pipelines (that is, data flows designed using Talend Pipeline Designer). It allows you to have control over your execution environment and resources because you can create and configure the engine in your own environment (Virtual Private Cloud or on-premises). Previously referred to as Remote Engines for Pipelines, this engine was renamed Remote Engine Gen2 during H1/2020. It is a Docker-based runtime to execute data pipelines on-premises or on another cloud platform that you control.

A Remote Engine Gen2 ensures:

  • data processing in a safe and secure environment, because Talend never has access to your pipelines' data and resources
  • optimal performance and security by increasing data locality instead of moving large amounts of data to computation

 

Cloud Engine for Design (CE4D)

Cloud Engine for Design is a built-in runner that allows you to easily design pipelines without setting up any processing engines. With this engine you can run two pipelines in parallel. For advanced processing of data, Talend recommends installing the secure Remote Engine Gen2.

  • CE4Ds have limited system resources – memory usage: 8 GB
  • CE4Ds support a maximum of two pipelines that can run in parallel on a single CE4D
  • CE4Ds should be used only for design purposes; that is, you shouldn’t use them to execute data pipelines in a Production environment

 

Cloud Engine versus Remote Engine

The following table compares the two engines:

| Cloud Engine (CE) | Remote Engine (RE) |
| --- | --- |
| Consumes 45,000 engine tokens | Consumes 9,000 engine tokens |
| Runs within Talend Cloud platform – no download required | Downloadable software from Talend Cloud platform |
| Managed by Talend, run on-demand as needed to execute Jobs | Managed by the customer |
| No customer resources required | Customer can run on Windows, Linux, or OS X |
| Set physical specifications (Memory, CPU, Temp Disk Space) | Unlimited Memory, CPU, and Temp Space |
| Requires data sources/targets to be visible through the internet to the Cloud Engine | Hybrid cloud or on-premises data sources |
| Restricted to three concurrent Jobs | Unlimited concurrent Jobs (default three) |
| Available within Talend Cloud portal | Available in AWS and Azure Marketplace |
| Runs natively within Talend Cloud iPaaS infrastructure | Uses HTTPS calls to Talend Cloud service to get configuration information and Job definition and schedules |
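As a quick arithmetic check on the token figures in the comparison above, the following sketch totals the engine tokens for a mix of engines. The function name `engine_tokens` is hypothetical, and the sketch assumes tokens simply add up per engine, which the comparison implies but does not state outright.

```python
CE_TOKENS = 45_000  # tokens consumed per Cloud Engine (from the comparison above)
RE_TOKENS = 9_000   # tokens consumed per Remote Engine (from the comparison above)


def engine_tokens(cloud_engines: int, remote_engines: int) -> int:
    """Total engine tokens, assuming consumption is additive per engine."""
    return cloud_engines * CE_TOKENS + remote_engines * RE_TOKENS


print(engine_tokens(2, 3))     # 2 * 45,000 + 3 * 9,000 = 117000
print(CE_TOKENS // RE_TOKENS)  # 5: one Cloud Engine costs as much as five Remote Engines
```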

 

Cloud Engine for Design versus Remote Engine Gen2

| Cloud Engine for Design (CE4D) | Remote Engine Gen2 (REG2) |
| --- | --- |
| Consumes zero engine tokens | Consumes 9,000 engine tokens |
| Built upon a Docker-compose stack | Built upon a Docker-compose stack |
| Available as a Cloud Image and instantiated in Talend Cloud platform on behalf of the customer | Available as an AMI Cloud Formation Template (for AWS) and Azure Image (for Azure) |
| Not available as downloadable software, as this type of engine is only suitable for design using Pipeline Designer in Talend Cloud portal | Available as .zip or .tar.gz (for local deployment) |
| Included with Talend Cloud platform to offer a serverless experience during design and testing. It is not meant for production (that is, not for running pipelines in non-development environments): it won't scale for production-size volumes or long-running pipelines. Design teams should use it to get a preview working and to test execution during development. | Used to run artifacts, tasks, preparations, and pipelines in the cloud, as well as to create connections and fetch data samples |
| Static IP cannot be enabled for CE4D within Talend Management Console | Not applicable, as REG2 runs outside Talend Management Console (that is, in the customer data center) |

 

Need for additional engines

Additional engines (CE or RE) may be required if you have one or more of the following use cases:

  • Continuous delivery – for example, Dev and QA separate from UAT and Production environments
  • Data access – data is in two different private locations where an engine is needed in each site (or a mix of Cloud and Remote Engines)
  • Scalability – concurrent Job volume requires additional engines, Jobs are complex and require significant memory or CPU

These use cases depend on the deployment architecture in the specific customer environment and on how Remote Engines are laid out at the environment or workspace level. They call for proper capacity planning and for automatic horizontal and vertical scaling of the compute Engines.

 

Cloud Engine – usage considerations

| Question | Guideline |
| --- | --- |
| How much data must be transferred per hour? | Each Cloud Engine can transfer 225 GB per hour. |
| How many separate flows can run in parallel? | Each Cloud Engine can run up to three flows in parallel. |
| How much temporary disk space is needed? | Each Cloud Engine has 200 GB of temp space. |
| How CPU and memory intensive are the flows? | Each Cloud Engine provides 8 GB of memory and two vCPUs. This is shared among any concurrent flows. |
| Are separate execution environments required? | Many users desire separate execution for QA/Test/Development and Production. If this is needed, additional Cloud Engines should be added as required. |
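The guidelines above can be turned into a rough sizing estimate. The sketch below is a back-of-the-envelope calculation, not an official sizing tool. The function name and parameters are hypothetical, and it assumes the workload spreads evenly across engines and that every environment needs the same capacity. Memory and CPU are left out because they are shared among concurrent flows and depend on the Job design.

```python
import math

# Per-engine figures taken from the guidelines above.
GB_PER_HOUR = 225
PARALLEL_FLOWS = 3
TEMP_SPACE_GB = 200


def cloud_engines_needed(gb_per_hour: float, parallel_flows: int,
                         temp_space_gb: float, environments: int = 1) -> int:
    """Estimate the Cloud Engine count: the tightest constraint sets the number."""
    per_environment = max(
        1,
        math.ceil(gb_per_hour / GB_PER_HOUR),       # throughput
        math.ceil(parallel_flows / PARALLEL_FLOWS),  # concurrency
        math.ceil(temp_space_gb / TEMP_SPACE_GB),    # temporary disk
    )
    return per_environment * environments


# 500 GB/hour, 4 parallel flows, 150 GB temp space, 2 separate environments:
# throughput needs 3 engines, concurrency 2, disk 1, so 3 per environment.
print(cloud_engines_needed(500, 4, 150, environments=2))  # prints 6
```

Adding engines only helps when the load can be split. A single flow that exceeds one engine's limits is not covered by this estimate.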

 

Remote Engine – recommendation

If a source or target system is not accessible through the internet:

If one of the systems is not accessible using the internet, then a Remote Engine is needed.

When single flow requirements exceed the capacity of a Talend Cloud Engine:

If the Cloud Engine is too small (for example, the maximum memory of 5.25 GB, temporary space of 200 GB, two vCPU, or the maximum of 225 GB per hour), then a Remote Engine is needed.

If a native driver is required:

If the solution requires a native driver that is not part of the Talend action or Job generated code, then a Remote Engine is needed. Typical cases are SAP with the JCo v3 library and MS SQL Server Windows Authentication.

Data jurisdiction, security, or compliance reasons:

It may be desirable or required to retain data in a particular region or country for data privacy reasons. The data being processed may be subject to regulations such as PCI or HIPAA, or it may be more efficient to process the data within a single data center or public cloud location. These are all valid reasons to use a Remote Engine.
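The four recommendations above amount to a checklist, which can be sketched as a small helper that returns the reasons a flow calls for a Remote Engine. The `FlowProfile` class, its field names, and the example flow are hypothetical. The capacity limits are the ones quoted in the text above.

```python
from dataclasses import dataclass

# Cloud Engine limits as quoted in the recommendation above.
CE_MAX_MEMORY_GB = 5.25
CE_TEMP_SPACE_GB = 200
CE_VCPUS = 2
CE_MAX_GB_PER_HOUR = 225


@dataclass
class FlowProfile:
    """Hypothetical description of one flow; all field names are illustrative."""
    endpoints_reachable_from_internet: bool
    memory_gb: float
    temp_space_gb: float
    vcpus: int
    gb_per_hour: float
    needs_native_driver: bool = False
    has_data_residency_constraint: bool = False


def remote_engine_reasons(flow: FlowProfile) -> list[str]:
    """Return every reason this flow needs a Remote Engine (empty if none)."""
    reasons = []
    if not flow.endpoints_reachable_from_internet:
        reasons.append("source or target not accessible through the internet")
    if (flow.memory_gb > CE_MAX_MEMORY_GB
            or flow.temp_space_gb > CE_TEMP_SPACE_GB
            or flow.vcpus > CE_VCPUS
            or flow.gb_per_hour > CE_MAX_GB_PER_HOUR):
        reasons.append("flow exceeds the capacity of a single Cloud Engine")
    if flow.needs_native_driver:
        reasons.append("native driver required")
    if flow.has_data_residency_constraint:
        reasons.append("data jurisdiction, security, or compliance")
    return reasons


# Example: a small SAP extract that fits a Cloud Engine but needs the JCo library.
sap_extract = FlowProfile(
    endpoints_reachable_from_internet=True,
    memory_gb=4, temp_space_gb=50, vcpus=2, gb_per_hour=20,
    needs_native_driver=True,
)
print(remote_engine_reasons(sap_extract))  # prints ['native driver required']
```

An empty list means none of the listed conditions applies, so a Cloud Engine remains an option for that flow.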

 

Summary

| Cloud Engine (CE) | Remote Engine (RE) | Remote Engine Gen2 (REG2) |
| --- | --- | --- |
| Cloud Engines allow you to run batch tasks that use on-premises or cloud applications and datasets (sources, targets) | Remote Engines allow you to run batch tasks or microservices (APIs or Routes) that use on-premises or cloud applications and datasets (sources, targets) | The Remote Engine Gen2 is used to run artifacts, tasks, preparations, and pipelines in the cloud, as well as to create connections and fetch data samples |
| Consumes 45,000 engine tokens | Consumes 9,000 engine tokens | Consumes 9,000 engine tokens |
| No download required – runs within Talend Cloud platform | Downloadable software from Talend Cloud platform | Downloadable software from Talend Cloud platform |
| Managed by Talend, run on-demand as needed to execute Jobs | Managed by the customer | Managed by the customer |
| No customer resources required | Can run on Windows, Linux, or OS X | Requires compatible Docker and Docker Compose versions for Linux, Mac, and Windows |
| Set physical specifications (Memory, CPU, and Temp Disk Space) | Unlimited Memory, CPU, and Temp Space | Unlimited Memory, CPU, and Temp Space |
| Requires data sources/targets to be visible through the internet to the Cloud Engine | Hybrid cloud or on-premises data sources | Hybrid cloud or on-premises data sources |
| Restricted to three concurrent Jobs | Unlimited concurrent Jobs (default three) | Unlimited concurrent pipelines (configurable) |
| Available within Talend Cloud portal | Available in AWS and Azure Marketplace | Available as an AMI Cloud Formation Template (for AWS) and Azure Image (for Azure) |
| Runs natively within Talend Cloud iPaaS infrastructure | Uses HTTPS calls to Talend Cloud service to get configuration information and Job definition and schedules | Uses HTTPS calls to Talend Cloud service to get configuration information and pipeline definition and schedules |

 
