Parameters:
  InstanceType:
    Type: String
    Default: t3.small
    AllowedValues:
      - t3.micro
      - t3.small
      - t3.medium
      - t3.large
      - t3.xlarge
      - t3.2xlarge
      - m5.large
      - m5.xlarge
      - m5.2xlarge
      - m5.4xlarge
      - m5.8xlarge
      - m5.12xlarge
      - m5.16xlarge
      - m5.24xlarge
      - r5.large
      - r5.xlarge
      - r5.2xlarge
      - r5.4xlarge
      - r5.8xlarge
      - r5.12xlarge
      - r5.16xlarge
      - r5.24xlarge
    Description: Instance type for the EC2 instance that hosts the Spark history server. Enter one of [t3.micro/small/medium/large/xlarge/2xlarge, m5.large/xlarge/2xlarge/4xlarge/8xlarge/12xlarge/16xlarge/24xlarge, r5.large/xlarge/2xlarge/4xlarge/8xlarge/12xlarge/16xlarge/24xlarge]. Default is t3.small.
  LatestAmiId:
    Type: 'AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>'
    Description: Latest AMI ID of Amazon Linux 2 for Spark history server instance. You can use the default value.
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: "VPC ID for Spark history server instance. You can use a VPC in your account. Warning: Using default VPC with a default NACL is not recommended."
    Default: ''
  SubnetId:
    Type: AWS::EC2::Subnet::Id
    Description: Subnet ID for Spark history server instance. You can use any subnet in your VPC. You need to have network reachability from your client to the subnet. If you want to access it via the Internet, you need to use a public subnet which has an Internet gateway in the route table.
    Default: ''
  IpAddressRange:
    Type: String
    Description: "IP address range that can be used to view the Spark UI. You should use a custom value if you want to restrict access from a specific IP address range. Warning: Using the IP address range of 0.0.0.0/0 would make Spark UI publicly accessible."
    MinLength: 9
    MaxLength: 18
  HistoryServerPort:
    Type: Number
    Description: History Server Port for the Spark UI. You can use the default value.
    Default: 18080
    MinValue: 1150
    MaxValue: 65535
  EventLogDir:
    Type: String
    Description: "*Event Log Directory* where Spark event logs are stored from the Glue job or dev endpoints. You must use s3a:// for the event logs path scheme (example: s3a://path_to_eventlog)."
    Default: s3a://path_to_eventlog
  SparkPackageLocation:
    Type: String
    Description: You can use the default value.
    Default: 'https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop.tgz'
  KeystorePath:
    Type: String
    Description: SSL/TLS keystore path for HTTPS. If you want to use a custom keystore file, you can specify the S3 path s3://path_to_your_keystore_file here. If you leave this parameter empty, a self-signed certificate based keystore is used.
  KeystorePassword:
    Type: String
    NoEcho: true
    Description: SSL/TLS keystore password for HTTPS. A valid password can contain 6 to 30 characters.
    MinLength: 6
    MaxLength: 30

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Spark UI Configuration
        Parameters:
          - IpAddressRange
          - HistoryServerPort
          - EventLogDir
          - SparkPackageLocation
          - KeystorePath
          - KeystorePassword
      - Label:
          default: EC2 Instance Configuration
        Parameters:
          - InstanceType
          - LatestAmiId
          - VpcId
          - SubnetId

Mappings:
  # Value exported as SPARK_DAEMON_MEMORY in spark-env.sh, keyed by instance type.
  MemoryBasedOnInstanceType:
    t3.micro:
      SparkDaemonMemory: '512m'
    t3.small:
      SparkDaemonMemory: '1g'
    t3.medium:
      SparkDaemonMemory: '3g'
    t3.large:
      SparkDaemonMemory: '6g'
    t3.xlarge:
      SparkDaemonMemory: '12g'
    t3.2xlarge:
      SparkDaemonMemory: '28g'
    m5.large:
      SparkDaemonMemory: '6g'
    m5.xlarge:
      SparkDaemonMemory: '12g'
    m5.2xlarge:
      SparkDaemonMemory: '28g'
    m5.4xlarge:
      SparkDaemonMemory: '28g'
    m5.8xlarge:
      SparkDaemonMemory: '28g'
    m5.12xlarge:
      SparkDaemonMemory: '28g'
    m5.16xlarge:
      SparkDaemonMemory: '28g'
    m5.24xlarge:
      SparkDaemonMemory: '28g'
    r5.large:
      SparkDaemonMemory: '12g'
    r5.xlarge:
      SparkDaemonMemory: '28g'
    r5.2xlarge:
      SparkDaemonMemory: '28g'
    r5.4xlarge:
      SparkDaemonMemory: '28g'
    r5.8xlarge:
      SparkDaemonMemory: '28g'
    r5.12xlarge:
      SparkDaemonMemory: '28g'
    r5.16xlarge:
      SparkDaemonMemory: '28g'
    r5.24xlarge:
      SparkDaemonMemory: '28g'

Resources:
  HistoryServerInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: !Ref InstanceType
      SubnetId: !Ref SubnetId
      SecurityGroupIds:
        - !Ref InstanceSecurityGroup
      IamInstanceProfile: !Ref HistoryServerInstanceProfile
      UserData:
        'Fn::Base64': !Sub |
          #!/bin/bash -xe
          yum update -y aws-cfn-bootstrap
          echo "CA_OVERRIDE=/etc/pki/tls/certs/ca-bundle.crt" >> /etc/environment
          export CA_OVERRIDE=/etc/pki/tls/certs/ca-bundle.crt
          rpm -Uvh https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
          /opt/aws/bin/cfn-init -v -s ${AWS::StackName} -r HistoryServerInstance --region ${AWS::Region}
          /opt/aws/bin/cfn-signal -e -s ${AWS::StackName} -r HistoryServerInstance --region ${AWS::Region}
    Metadata:
      AWS::CloudFormation::Init:
        # cfn-init runs the configs below in the order listed in the default config set.
        configSets:
          default:
            - cloudwatch_agent_configure
            - cloudwatch_agent_restart
            - spark_download
            - spark_init
            - spark_configure
            - spark_hs_start
            - spark_hs_test
        cloudwatch_agent_configure:
          files:
            /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json:
              content: !Sub |
                {
                  "logs": {
                    "logs_collected": {
                      "files": {
                        "collect_list": [
                          {
                            "file_path": "/var/log/cfn-init.log",
                            "log_group_name": "/aws-glue/sparkui_cfn/cfn-init.log"
                          },
                          {
                            "file_path": "/opt/spark/logs/spark-*",
                            "log_group_name": "/aws-glue/sparkui_cfn/spark_history_server.log"
                          }
                        ]
                      }
                    }
                  }
                }
        cloudwatch_agent_restart:
          commands:
            01_stop_service:
              command: /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a stop
            02_start_service:
              command: /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -s
        spark_download:
          packages:
            yum:
              java-1.8.0-openjdk: []
              maven: []
              python3: []
              python3-pip: []
          sources:
            /opt: !Ref SparkPackageLocation
          commands:
            create-symlink:
              command: ln -s /opt/spark-* /opt/spark
            export:
              command: !Sub |
                echo "export JAVA_HOME=/usr/lib/jvm/jre" | sudo tee -a /etc/profile.d/jdk.sh
                echo "export SPARK_HOME=/opt/spark" | sudo tee -a /etc/profile.d/spark.sh
                export JAVA_HOME=/usr/lib/jvm/jre
                export SPARK_HOME=/opt/spark
            download-pom-xml:
              command: curl -o /tmp/pom.xml https://aws-glue-sparkui-prod-ap-southeast-2.s3-ap-southeast-2.amazonaws.com/public/mvn/pom.xml
            download-setup-py:
              command: curl -o /tmp/setup.py https://aws-glue-sparkui-prod-ap-southeast-2.s3-ap-southeast-2.amazonaws.com/public/misc/setup.py
            download-systemd-file:
              command: curl -o /usr/lib/systemd/system/spark-history-server.service https://aws-glue-sparkui-prod-ap-southeast-2.s3-ap-southeast-2.amazonaws.com/public/misc/spark-history-server.service
        spark_init:
          commands:
            download-mvn-dependencies:
              command: cd /tmp; mvn dependency:copy-dependencies -DoutputDirectory=/opt/spark/jars/
            install-boto:
              command: pip3 install boto --user; pip3 install boto3 --user
          files:
            /opt/spark/conf/spark-defaults.conf:
              content: !Sub |
                spark.eventLog.enabled true
                spark.history.fs.logDirectory ${EventLogDir}
                spark.history.ui.port 0
                spark.ssl.historyServer.enabled true
                spark.ssl.historyServer.port ${HistoryServerPort}
                spark.ssl.historyServer.keyStorePassword ${KeystorePassword}
              group: ec2-user
              mode: '000644'
              owner: ec2-user
            /opt/spark/conf/spark-env.sh:
              content: !Sub
                - |
                  export SPARK_DAEMON_MEMORY=${SparkDaemonMemoryConfig}
                  export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem"
                - SparkDaemonMemoryConfig: !FindInMap [ MemoryBasedOnInstanceType, !Ref InstanceType, SparkDaemonMemory ]
              group: ec2-user
              mode: '000644'
              owner: ec2-user
        spark_configure:
          commands:
            create-symlink:
              command: ln -s /usr/lib/systemd/system/spark-history-server.service /etc/systemd/system/multi-user.target.wants/
            enable-spark-hs:
              command: systemctl enable spark-history-server
            configure-keystore:
              command: !Sub |
                python3 /tmp/setup.py --keystore "${KeystorePath}" --keystorepw "${KeystorePassword}" > /tmp/setup_py.log 2>&1
        spark_hs_start:
          commands:
            start_spark_hs_server:
              command: systemctl start spark-history-server
        spark_hs_test:
          commands:
            # Polls the HTTPS endpoint, then reports curl's exit status to the wait handle.
            check-spark-hs-server:
              command: !Sub |
                curl --retry 60 --retry-delay 10 --retry-max-time 600 --retry-connrefused https://localhost:${HistoryServerPort} --insecure;
                /opt/aws/bin/cfn-signal -e $? "${WaitHandle}"
  WaitHandle:
    Type: AWS::CloudFormation::WaitConditionHandle
  WaitCondition:
    Type: AWS::CloudFormation::WaitCondition
    DependsOn: HistoryServerInstance
    Properties:
      Handle: !Ref WaitHandle
      Timeout: 1200
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable HTTPS access
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: !Ref HistoryServerPort
          ToPort: !Ref HistoryServerPort
          CidrIp: !Ref IpAddressRange
  HistoryServerRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: "root"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action:
                  - kms:Decrypt
                Resource: "*"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
  HistoryServerInstanceProfile:
    Type: "AWS::IAM::InstanceProfile"
    Properties:
      Path: "/"
      Roles:
        - !Ref HistoryServerRole

Outputs:
  SparkUiPublicUrl:
    Description: The Public URL of Spark UI
    Value: !Join
      - ''
      - - 'https://'
        - !GetAtt 'HistoryServerInstance.PublicDnsName'
        - ':'
        - !Ref HistoryServerPort
  SparkUiPrivateUrl:
    Description: The Private URL of Spark UI
    Value: !Join
      - ''
      - - 'https://'
        - !GetAtt 'HistoryServerInstance.PrivateDnsName'
        - ':'
        - !Ref HistoryServerPort
  CloudWatchLogsCfnInit:
    Description: CloudWatch Logs Console URL for cfn-init.log in History Server Instance
    Value: !Join
      - ''
      - - 'https://console.aws.amazon.com/cloudwatch/home?region='
        - !Ref AWS::Region
        - '#logEventViewer:group=/aws-glue/sparkui_cfn/cfn-init.log;stream='
        - !Ref HistoryServerInstance
  CloudWatchLogsSparkHistoryServer:
    Description: CloudWatch Logs Console URL for spark history server logs in History Server Instance
    Value: !Join
      - ''
      - - 'https://console.aws.amazon.com/cloudwatch/home?region='
        - !Ref AWS::Region
        - '#logEventViewer:group=/aws-glue/sparkui_cfn/spark_history_server.log;stream='
        - !Ref HistoryServerInstance