Configure Networking – Amazon EMR

Depending on your account and region, you can choose between two network platform options for your cluster:
EC2-Classic or EC2-VPC. In EC2-Classic,
your instances run in a single, flat network that you share with other customers.
EC2-Classic is available only with certain accounts in certain regions. For more
information, see Amazon EC2 and Amazon VPC in the Amazon EC2 User Guide for Linux Instances. In
EC2-VPC, your cluster uses Amazon Virtual Private Cloud (Amazon VPC), and EC2 instances
run in a VPC that’s logically isolated within your AWS account. Amazon VPC enables
you to provision a virtual private cloud (VPC), an isolated area within AWS where
you can configure a virtual network, controlling aspects such as private IP
address ranges, subnets, routing tables, and network gateways.
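
To make the idea of private IP address ranges and subnets concrete, the following is a minimal sketch that uses only Python's standard ipaddress module. The 10.0.0.0/16 range and the /24 subnet size are assumptions chosen for illustration and are not values prescribed by Amazon VPC or Amazon EMR.

```python
import ipaddress

# Hypothetical VPC address range; any private (RFC 1918) block would work here.
vpc_cidr = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC range into /24 subnets and pick the first two.
subnets = list(vpc_cidr.subnets(new_prefix=24))
subnet_a, subnet_b = subnets[0], subnets[1]

# Both subnets sit inside the VPC range and do not overlap each other.
print(vpc_cidr.is_private)
print(subnet_a, subnet_b)
print(subnet_a.subnet_of(vpc_cidr), subnet_a.overlaps(subnet_b))
```

Whether a subnet behaves as public or private is decided by its routing, not by its address range, so the two subnets above are interchangeable until route tables are attached.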

Launching a cluster into a VPC is useful in the following cases:

  • Processing sensitive data

    Launching a cluster into a VPC is similar to launching the cluster into a
    private network with additional tools, such as routing tables and network
    ACLs, to define who has access to the network. If you are processing
    sensitive data in your cluster, you may want the additional access control
    that launching your cluster into a VPC provides. Furthermore, you can choose
    to launch your resources into a private subnet where none of those resources
    has direct internet connectivity.

  • Accessing resources on an internal network

    If your data source is located in a private network, it may be impractical
    or undesirable to upload that data to AWS for import into Amazon EMR, either
    because of the amount of data to transfer or because of the sensitive nature
    of the data. Instead, you can launch the cluster into a VPC and connect your
    data center to your VPC through a VPN connection, enabling the cluster to
    access resources on your internal network. For example, if you have an
    Oracle database in your data center, launching your cluster into a VPC
    connected to that network by VPN makes it possible for the cluster to access
    the Oracle database.
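
In practice, launching a cluster into a VPC comes down to naming a subnet in the launch request. The sketch below builds the keyword arguments that the boto3 EMR client's run_job_flow call accepts, with the subnet passed as Instances.Ec2SubnetId. The helper name build_cluster_request, the cluster name, the release label, the instance types and counts, the subnet ID, and the default role names are all placeholders assumed for illustration. The code only builds the request and does not contact AWS.

```python
def build_cluster_request(name, release_label, subnet_id):
    """Return keyword arguments for a cluster launch that targets one VPC subnet.

    Hypothetical helper: the instance types, counts, and role names below are
    placeholder assumptions, not recommendations.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # The subnet decides which VPC (and which routing) the cluster gets.
            "Ec2SubnetId": subnet_id,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Placeholder subnet ID; substitute a subnet from your own VPC.
request = build_cluster_request("vpc-demo", "emr-5.36.0", "subnet-0123456789abcdef0")

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("emr").run_job_flow(**request)
```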

Public and Private Subnets

You can launch EMR clusters in both public and private VPC subnets. This means
you do not need internet connectivity to run an EMR cluster; however, you may
need to configure network address translation (NAT) and VPN gateways to access
services or resources located outside of the VPC, for example, resources in a
corporate intranet or public AWS service endpoints such as AWS Key Management
Service.
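
The difference between the two kinds of subnet lies in the route table: a public subnet has a default route to an internet gateway, while a private subnet has none and typically reaches outside services through a NAT device or VPN gateway. The following is a minimal sketch of that rule. The function name is_public_subnet and the simplified route dictionaries (with "destination" and "target" keys) are assumptions made for illustration and do not mirror the shape of any AWS API response; the gateway IDs are placeholders.

```python
def is_public_subnet(routes):
    """Treat a subnet as public if its default route targets an internet gateway."""
    return any(
        route["destination"] == "0.0.0.0/0" and route["target"].startswith("igw-")
        for route in routes
    )


# Hypothetical route tables with placeholder gateway IDs.
public_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-0a1b2c3d"},
]
private_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "nat-0e5f6a7b"},
]

print(is_public_subnet(public_routes))   # default route goes to an internet gateway
print(is_public_subnet(private_routes))  # default route goes through NAT instead
```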

Important

Amazon EMR supports launching clusters in private subnets only in release 4.2
or later.
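
A launch script could guard against this restriction before choosing a private subnet. The sketch below assumes release labels of the form emr-x.y.z; the function name supports_private_subnets is hypothetical and not part of any AWS SDK.

```python
def supports_private_subnets(release_label):
    """Return True if the release label is 4.2 or later (hypothetical helper)."""
    if not release_label.startswith("emr-"):
        raise ValueError(f"unexpected release label: {release_label!r}")
    # Compare (major, minor) as integers so that 4.10 sorts after 4.2.
    major, minor = (int(part) for part in release_label[4:].split(".")[:2])
    return (major, minor) >= (4, 2)


print(supports_private_subnets("emr-4.1.0"))
print(supports_private_subnets("emr-4.2.0"))
```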

For more information about Amazon VPC, see the Amazon VPC User Guide.

More Resources for Learning About VPCs

Use the following topics to learn more about VPCs and subnets.

  • Private Subnets in a VPC

  • Public Subnets in a VPC

  • General VPC Information
