Depending on your account, there may be two network platform options for your cluster:
EC2-Classic and EC2-VPC. In EC2-Classic, your instances run in a single, flat network
that you share with other customers. EC2-Classic is available only with certain accounts
in certain regions. For more information, see "Amazon EC2 and Amazon VPC" in the
Amazon EC2 User Guide for Linux Instances. In EC2-VPC, your cluster uses Amazon Virtual
Private Cloud (Amazon VPC), and EC2 instances run in a VPC that’s logically isolated
within your AWS account. Amazon VPC enables you to provision a virtual private cloud
(VPC), an isolated area within AWS where you can configure a virtual network,
controlling aspects such as private IP address ranges, subnets, routing tables, and
network gateways.
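
To make the idea of private IP address ranges and subnets concrete, the following is a minimal sketch that divides a VPC address range into equally sized subnets using Python's standard `ipaddress` module. The CIDR block `10.0.0.0/16`, the `/24` subnet size, and the helper name `plan_subnets` are illustrative assumptions, not values taken from this guide.

```python
import ipaddress
from itertools import islice

# Hypothetical address plan: this CIDR block is only an example.
VPC_CIDR = "10.0.0.0/16"


def plan_subnets(vpc_cidr, new_prefix, count):
    """Return the first `count` subnet CIDR blocks of size /new_prefix inside vpc_cidr."""
    network = ipaddress.ip_network(vpc_cidr)
    if not network.is_private:
        # A VPC normally uses a private (RFC 1918) address range.
        raise ValueError(f"{vpc_cidr} is not a private address range")
    # subnets() yields the child networks in ascending address order.
    return [str(s) for s in islice(network.subnets(new_prefix=new_prefix), count)]


if __name__ == "__main__":
    for cidr in plan_subnets(VPC_CIDR, 24, 3):
        print(cidr)
```

Planning the ranges up front this way can help avoid overlap with the address space of a data center that you later connect to the VPC.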
VPC offers the following capabilities:
- Processing sensitive data
  Launching a cluster into a VPC is similar to launching the cluster into a private
  network with additional tools, such as routing tables and network ACLs, to define who
  has access to the network. If you are processing sensitive data in your cluster,
  you may want the additional access control that launching your cluster into a
  VPC provides. Furthermore, you can choose to launch your resources into a
  private subnet where none of those resources has direct internet connectivity.
- Accessing resources on an internal network
  If your data source is located in a private network, it may be impractical or
  undesirable to upload that data to AWS for import into Amazon EMR, either because
  of the amount of data to transfer or because of the sensitive nature of the data.
  Instead, you can launch the cluster into a VPC and connect your data center to your
  VPC through a VPN connection, enabling the cluster to access resources on your
  internal network. For example, if you have an Oracle database in your data center,
  launching your cluster into a VPC connected to that network by VPN makes it possible
  for the cluster to access the Oracle database.
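
As a rough illustration of launching a cluster into a VPC, the sketch below builds the request for the boto3 EMR client's `run_job_flow` call, where the `Ec2SubnetId` field selects the subnet (and therefore the VPC) for the cluster. The helper names, the instance types and count, and the subnet ID in the usage comment are hypothetical; the role names are the EMR default role names and are assumed to already exist in your account.

```python
def build_cluster_request(name, release_label, subnet_id):
    """Build keyword arguments for the boto3 EMR client's run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Instances": {
            # Illustrative sizing only; choose types and counts for your workload.
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
            # The subnet determines which VPC the cluster launches into.
            "Ec2SubnetId": subnet_id,
        },
        # Assumed to exist: the default EMR instance profile and service role.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request):
    """Submit the request to Amazon EMR and return the new cluster ID."""
    # Imported here so the request builder can be used without boto3 installed.
    import boto3

    emr = boto3.client("emr")
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


# Example usage (requires AWS credentials; the subnet ID is a placeholder):
# request = build_cluster_request("vpc-demo", "emr-5.36.0", "subnet-0123456789abcdef0")
# print(launch_cluster(request))
```

Keeping the request construction separate from the API call makes it possible to inspect or log the network settings before anything is launched.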
Public and Private Subnets
You can launch EMR clusters in both public and private VPC subnets. This means you
do not need internet connectivity to run an EMR cluster. However, you may need to
configure network address translation (NAT) and VPN gateways to access services or
resources located outside of the VPC, for example, resources in a corporate intranet
or public AWS service endpoints such as AWS Key Management Service.
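
One way to tell public and private subnets apart is to inspect the route table associated with the subnet: a subnet is generally considered public when its route table contains a route to an internet gateway. The sketch below applies that rule to route entries shaped like the `Routes` items returned by the EC2 `DescribeRouteTables` API. The function names and the sample IDs are hypothetical, and the check is a simplification that looks only at the routes you pass in.

```python
def classify_subnet(routes):
    """Return 'public' if any route targets an internet gateway, else 'private'."""
    for route in routes:
        # Internet gateway IDs start with "igw-"; the VPC-local route uses "local".
        if route.get("GatewayId", "").startswith("igw-"):
            return "public"
    return "private"


def has_nat_route(routes):
    """Return True if any route sends traffic through a NAT gateway."""
    return any(route.get("NatGatewayId") for route in routes)


# Sample route entries with made-up IDs, for illustration only.
public_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"},
]
private_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
]

if __name__ == "__main__":
    print(classify_subnet(public_routes), has_nat_route(public_routes))
    print(classify_subnet(private_routes), has_nat_route(private_routes))
```

A private subnet with a NAT route, as in the second sample, can still reach public service endpoints for outbound requests even though its instances are not directly reachable from the internet.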
Important
Amazon EMR supports launching clusters in private subnets only in release 4.2 or
later.
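
A script that launches clusters could guard against this restriction before choosing a private subnet. The following is a minimal sketch, assuming release labels of the plain form `emr-x.y.z`; the function names are hypothetical and labels with extra suffixes are not handled.

```python
def parse_release_label(label):
    """Convert a release label such as 'emr-5.36.0' into a tuple of integers."""
    prefix = "emr-"
    if not label.startswith(prefix):
        raise ValueError(f"unexpected release label: {label!r}")
    return tuple(int(part) for part in label[len(prefix):].split("."))


def supports_private_subnets(label):
    """Return True if the release is 4.2 or later, per the note above."""
    # Compare only the major and minor components against (4, 2).
    return parse_release_label(label)[:2] >= (4, 2)


if __name__ == "__main__":
    for label in ("emr-4.1.0", "emr-4.2.0", "emr-5.36.0"):
        print(label, supports_private_subnets(label))
```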
For more information about Amazon VPC, see the Amazon VPC User Guide.
More Resources for Learning About VPCs
Use the following topics to learn more about VPCs and subnets.
- Private Subnets in a VPC
- Public Subnets in a VPC
- General VPC Information