by Stanislas Nanchen

Restricting access to CloudFront distributions

Recently, we needed to restrict access to a static website stored in an S3 bucket and served by CloudFront. We decided to implement a simple authentication application using a Cognito user pool and a pair of AWS Lambda functions. We call this application the CogCF Barrier (or simply the barrier). The barrier is available as free software on GitHub in the repository forge-cogcf-barrier.

The barrier assumes that two AWS resources are already in place: a CloudFront distribution¹ and a Cognito user pool.

The following figure shows the organisation of the barrier.

Architecture of the barrier

A CloudFront distribution ① receives requests from the internet and forwards them to one or more origins ②, which can be either S3 buckets or publicly available servers. The first component of the barrier is the session checker ③, an AWS Lambda@Edge function. It responds to CloudFront viewer request events and verifies that users accessing the website are authenticated. An authenticated user sends a session cookie containing a session id that corresponds to a session stored in a DynamoDB session table ④. The session checker ③ verifies that this session is valid. If it is, the request may proceed to the origin; if it is not, the user is redirected to the second component of the barrier: the session manager ⑤. The session manager is an AWS Lambda function behind an API Gateway, attached to the CloudFront distribution ① on the path prefix /_identity/*. The session manager uses a Cognito user pool ⑥ to handle the login process. After a user has logged in, the session manager creates a session, stores it in the DynamoDB session table ④ and sets a session cookie with the session id.

Note: An S3 bucket can be made private and still be served through a CloudFront distribution². A custom origin, however, must be publicly available. In such cases, the custom origin must also check that users are authenticated³; there is no need to redirect to the login page, though, as users are supposed to reach the origin only through the CloudFront distribution.

We first take a closer look at the two Lambda functions of the barrier and then demonstrate its usage in a full example.

Lambda Functions

Session Manager

The session manager is an AWS Lambda function written in Python 3.7. It sits behind an API Gateway so that CloudFront can forward HTTP requests to it. As explained above, it forwards login requests to a Cognito user pool for the login process; when a user successfully logs in, it creates a session for that user and stores it in a DynamoDB table. The session is returned to the user in the form of a cookie containing a session id. Since we want to set cookies for the CloudFront domain, the session manager must be an origin of the CloudFront distribution. The session manager uses the path prefix /_identity/* and implements the following endpoints:

  1. _identity/login to create a short-lived login session and redirect the user to the login page of the Cognito user pool;
  2. _identity/auth to serve as redirect URI for the Cognito user pool and to create sessions for authenticated users.
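
As a rough illustration of how a single Lambda function behind an API Gateway proxy integration might dispatch these two endpoints, consider the following sketch. The handler names (handle_login, handle_auth) and the stub responses are hypothetical and are not taken from the barrier's code; the sketch only assumes that the proxy event carries the request path under the key path.

```python
# Sketch only: hypothetical routing for the two session manager endpoints.
# Assumes an API Gateway proxy event, which carries the request path in 'path'.

def handle_login(event):
    # Placeholder: the real endpoint creates a login session and redirects to Cognito.
    return {'statusCode': 307,
            'headers': {'Location': 'https://example.invalid/login'},
            'body': None}


def handle_auth(event):
    # Placeholder: the real endpoint validates the callback and creates the session.
    return {'statusCode': 307, 'headers': {'Location': '/'}, 'body': None}


# Map each endpoint path to the function that serves it.
ROUTES = {
    '/_identity/login': handle_login,
    '/_identity/auth': handle_auth,
}


def lambda_handler(event, context):
    handler = ROUTES.get(event.get('path', ''))
    if handler is None:
        # Anything outside the two known endpoints is rejected.
        return {'statusCode': 404, 'headers': {}, 'body': 'Not found'}
    return handler(event)
```

Keeping the routing in a small table makes it easy to add further endpoints under the same path prefix later.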

The endpoint _identity/login is responsible for initiating a login with the Cognito User Pool. To use the login application provided by AWS, we simply have to redirect to the login endpoint of the Cognito User Pool. The URL for this redirect has the following form.

  https://<cognito domain>/login?response_type=code
     &client_id=<user pool client id>
     &redirect_uri=<auth endpoint>
     &state=<state>

The <cognito domain> is the domain configured for the Cognito user pool, the <user pool client id> is the id of a Cognito user pool client and the <auth endpoint> corresponds to the endpoint _identity/auth. The session manager is created with CloudFormation, and its CloudFormation template also creates a suitable Cognito user pool client. The next code snippet is the CloudFormation resource for the user pool client. On lines 4-5, we enable the so-called “authorisation code” flow for the login process, and on line 12, we configure the redirect URI. It must be the same as the one used in the login URL above.

  UserPoolClient:
    Type: AWS::Cognito::UserPoolClient
    Properties:
      AllowedOAuthFlows:
        - code
      AllowedOAuthFlowsUserPoolClient: true
      AllowedOAuthScopes:
        - openid
        - profile
      CallbackURLs:
        - !Sub "https://${CloudfrontDomainName}/_identity/auth"
      DefaultRedirectURI: !Sub "https://${CloudfrontDomainName}/_identity/auth"
      GenerateSecret: false
      LogoutURLs:
        - !Sub "https://${CloudfrontDomainName}/_identity/logout"
      SupportedIdentityProviders: !Ref SupportedIdentityProviders
      UserPoolId: !Ref UserPoolId
      ReadAttributes: !Ref ReadAttributes

Finally, the login redirect contains the <state> value, which is used to prevent CSRF attacks. We use the state to carry a random secret and the original path that the user requested on the CloudFront distribution, so that we can redirect users to the URL they requested once they have logged in. The secret is also stored in a short-lived login session: when the Cognito user pool calls back with the secret, the session manager compares the secret from the callback with the secret stored in the short-lived login session. The login process can finish only if both secrets are the same.

The relevant Python code for the login redirect looks like the following:

    def redirect_to_login(self, user_requested_path):
        login_session_id = secrets.token_urlsafe(64)
        secret = secrets.token_urlsafe(64)
        self.ddb.put_item(
            TableName=self.session_table,
            Item={
                'session_id': {'S': login_session_id},
                'valid_until': {'N': str(time.time() + FIVE_MINUTES_IN_SECONDS)},
                'secret': {'S': secret}
            },
            ReturnValues='NONE'
        )

        state = urllib.parse.quote(json.dumps({
            'secret': secret,
            'path': user_requested_path
        }))
        
        return {
            'statusCode': 307,
            'body': None,
            'headers': {
                'Set-Cookie': f'{self.login_cookie_name}={login_session_id}; HttpOnly; Max-Age={FIVE_MINUTES_IN_SECONDS}; Path=/; Secure; SameSite=Lax',
                'Location': f'https://{self.user_pool_domain}/login?response_type=code&client_id={self.user_pool_client_id}&redirect_uri={self.redirect_uri}&state={state}'
            }
        }

On lines 2 to 12, we create a random secret and store it in a short-lived login session in the session table. On lines 14 to 17, we encode the secret and the originally requested path as a URL-encoded JSON document to form the state query parameter of the login URL. Finally, on lines 19 to 26, we redirect the user to the Cognito user pool login page while setting the login session cookie.

When a user successfully logs in via the Cognito user pool, the latter redirects the user to the auth redirect URI with an authorisation code and the original state that was sent with the login URL. The _identity/auth endpoint loads the login session and compares the callback secret with the session secret. If both secrets are identical, the endpoint fetches the user information using the authorisation code that the Cognito user pool transferred as part of the callback. Finally, the new session is created and the user is redirected to the originally requested URL together with a new session cookie. The response is similar to the one shown above for the login session.
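
The secret comparison described above can be sketched as follows. This is a minimal illustration and not the barrier's actual code: the function name check_state is hypothetical, and the sketch assumes that the state parameter arrives still URL-encoded. If your framework has already decoded the query parameters, the unquote step should be dropped.

```python
import hmac
import json
import urllib.parse


def check_state(raw_state, login_session_secret):
    """Return the originally requested path if the callback secret matches, else None.

    raw_state is the (still URL-encoded) 'state' query parameter of the callback;
    login_session_secret is the secret loaded from the short-lived login session.
    """
    try:
        state = json.loads(urllib.parse.unquote(raw_state))
        secret = state['secret']
        path = state['path']
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing keys or an unexpected JSON type: reject.
        return None
    if not isinstance(secret, str) or not isinstance(path, str):
        return None
    # Constant-time comparison avoids leaking how much of the secret matched.
    if not hmac.compare_digest(secret.encode(), login_session_secret.encode()):
        return None
    return path
```

Returning None for every failure case keeps the caller simple: it can abort the login whenever no path comes back.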

Session Checker

The session checker is a Lambda@Edge function that intercepts CloudFront viewer request events for the origins with restricted access. Because it is a Lambda@Edge function, the session checker must be installed in the us-east-1 region. It is also written in Python 3.7, as support for this runtime was added recently.

As explained above, the session checker only lets authenticated users access the restricted origins of the CloudFront distribution. To do that, the session checker retrieves the session id from the session cookie and fetches the expiry time of the session.

    @cached(cache=TTLCache(maxsize=1024, ttl=300))
    def fetch_session_valid_until(self, session_id):
        item = self.ddb.get_item(
            TableName=self.session_table,
            Key={'session_id': {'S': session_id}},
            ConsistentRead=True,
            ProjectionExpression='valid_until'
        )
        return float(item['Item']['valid_until']['N'])

Note that we use a TTL cache from cachetools to keep the expiry times of sessions in memory. This way we avoid querying DynamoDB on every request. The cache has a short duration (5 minutes) so that a manual invalidation of a session propagates within at most 5 minutes.
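
The two steps around this lookup, reading the session id from the cookie and comparing the expiry time with the current time, could look roughly like the sketch below. The function names are hypothetical and the code is not taken from the barrier; it only assumes the Lambda@Edge request format, in which headers are lists of key/value dictionaries indexed by the lower-cased header name.

```python
import time
from http.cookies import SimpleCookie


def extract_session_id(request, cookie_name):
    """Return the value of the named cookie from a CloudFront request, or None."""
    # CloudFront passes headers as
    # {'cookie': [{'key': 'Cookie', 'value': 'a=1; b=2'}, ...]}.
    for header in request.get('headers', {}).get('cookie', []):
        cookie = SimpleCookie()
        cookie.load(header.get('value', ''))
        if cookie_name in cookie:
            return cookie[cookie_name].value
    return None


def is_session_valid(valid_until, now=None):
    """A session is valid while its expiry timestamp lies in the future."""
    now = time.time() if now is None else now
    return valid_until > now
```

Passing the current time as an optional argument keeps the validity check easy to test without waiting for a session to expire.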

If the session exists and is still valid, the request can proceed. If the session does not exist or is no longer valid, the session checker redirects the user to the _identity/login endpoint of the session manager. The relevant code is:

    def redirect_to_login(self, request):
        querystring = request.get('querystring')
        query = f'?{querystring}' if querystring else ''
        path = urllib.parse.quote(f"{request.get('uri', '/')}{query}")
        return {'body': '',
                'bodyEncoding': 'text',
                'headers': {
                    'location': [{
                        'key': 'Location',
                        'value': f'{self.login_url}?path={path}'
                    }],
                    'set-cookie': [{
                        'key': 'Set-Cookie',
                        'value': f'{self.cookie_name}=invalid; HttpOnly; Max-Age=-1; Path=/; Secure; SameSite=Lax'
                    }],
                },
                'status': '307',
                'statusDescription': 'Temporary Redirect'
                }

You can see on lines 2-4 that we encode the current destination of the request; we pass it as a parameter in the login redirect (lines 8-11) so that the user is presented with the correct page after authentication has taken place. On lines 12-15, we make sure to delete the session cookie: the session is no longer valid, but the cookie might still be stored in the browser.
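
To see why the destination has to be quoted, the following small sketch replays the encoding step and then parses the resulting URL the way a receiving endpoint could. The helper names and the login URL /_identity/login used as login_url are assumptions made for the illustration.

```python
import urllib.parse


def build_login_location(login_url, uri, querystring):
    """Mirror the encoding step: quote path and query string into one parameter."""
    query = f'?{querystring}' if querystring else ''
    path = urllib.parse.quote(f'{uri}{query}')
    return f'{login_url}?path={path}'


def recover_path(location):
    """Parse the login URL and return the decoded 'path' parameter."""
    parsed = urllib.parse.urlsplit(location)
    return urllib.parse.parse_qs(parsed.query)['path'][0]


location = build_login_location('/_identity/login', '/docs/page.html', 'x=1&y=2')
print(location)                # /_identity/login?path=/docs/page.html%3Fx%3D1%26y%3D2
print(recover_path(location))  # /docs/page.html?x=1&y=2
```

Without the quoting, the fragment &y=2 would be read as a separate parameter of the login URL and would be lost from the user's original destination.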

Full Example

We now turn to a full example: a simple static website served from a private S3 bucket. There are some prerequisites if you want to follow the tutorial.

  1. You need to install the preview version (≥ 1.0.0.b3) of the poetry dependency manager for Python;
  2. You need Python 3.7;
  3. You need to set up CloudWatch logging for the AWS API Gateway in the us-east-1 region;
  4. You need to install the AWS CLI;
  5. You need to configure credentials locally on your machine in a named profile.

To simplify the example, we will create all resources in the us-east-1 region and we will use the S3 bucket for both website and artifacts for the Lambda functions.

First, log in to the AWS Console, switch to the us-east-1 region and create an S3 bucket with versioning enabled. Then create a simple index.html file and upload it to the S3 bucket under the key public/index.html.

<!DOCTYPE html>
<html>
<head>
	<title>Hello From CloudFront!</title>
</head>
<body>
Hello From CloudFront!
</body>
</html>

Your bucket should look like the following.

We can then create the CloudFront distribution to serve the file, without restriction at first. In the AWS Console, go to CloudFront and choose “Create Distribution”. We want to create a Web distribution. The creation wizard will ask us to configure an origin for the distribution.

As Origin Domain Name, choose the S3 bucket that you created earlier. The path of the origin is /public; it is the root folder of the site in the S3 bucket. Since the S3 bucket itself is not public, you must create (or reuse) an Origin Access Identity. This grants the CloudFront distribution access to the S3 bucket. Choose to update the bucket policy so that the wizard can modify it for you. Also, we want to always redirect HTTP requests to HTTPS.

When you have filled in the top part of the distribution creation wizard, scroll down the form and configure the default root object to be index.html.

You can now create the distribution. It takes between 15 and 30 minutes to be created. In the list of distributions, you will find the domain name of the distribution. After a while, your new website will be ready at the URL https://<distribution domain name>.

Go look at the bucket policy of your S3 bucket and you will see that the CloudFront creation wizard has updated it to grant read permission to the Origin Access Identity that you created earlier.
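
For orientation, a policy statement of this kind typically has the shape built by the sketch below. It is not a copy of what the wizard generates: the bucket name and the Origin Access Identity id are placeholder values, and the policy version and any ids written by the wizard may differ from the ones shown here.

```python
import json


def oai_read_policy(bucket_name, oai_id):
    """Build a bucket policy granting an Origin Access Identity read access to objects."""
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Principal': {
                # The principal format CloudFront uses for Origin Access Identities.
                'AWS': f'arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {oai_id}'
            },
            'Action': 's3:GetObject',
            'Resource': f'arn:aws:s3:::{bucket_name}/*',
        }],
    }


# Placeholder values, for illustration only.
print(json.dumps(oai_read_policy('my-example-bucket', 'EXAMPLEOAIID'), indent=2))
```

The important point is that only the Origin Access Identity is allowed to read objects, so the bucket itself can stay private.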

We have a working CloudFront distribution, but it is publicly available and we want to restrict access with the barrier. To this end, we create a Cognito User Pool. In the AWS console, go to Cognito and choose “Manage User Pools”. Then choose “Create a user pool”, give your new user pool a name and create it without modifying any settings; the default settings are sufficient for our example. When the user pool has been created, note its pool id. As explained above, AWS provides a simple web application to actually handle user logins. To activate it, we must create a domain name for the user pool. To do that, choose “Domain name” under “App integration” on the left side menu.

We do not need our own domain name for this example. Choose a suitable prefix in the upper part of the wizard and save the change.

We now have all the resources required to create an instance of the barrier application. To do that, clone the forge-cogcf-barrier repository and create a file test-assembly.yaml with the following content. Replace the placeholders with the name of your S3 bucket, your CLI named profile, the domain name of your CloudFront distribution, and the id and domain name prefix of your Cognito user pool.

StacksPrefix: TestBarrier
CookieName: TestBarrier
Environment:
  AWSProfile: <<>>
  Region: us-east-1
Packaging:
  S3Buckets:
    us-east-1: <<>>
  S3Prefix: TestBarrier
Cloudfront:
  DomainName: <<>>
CognitoUserPool:
  UserPoolId: <<>>
  DomainName: https://<<>>.auth.us-east-1.amazoncognito.com
  UserPoolClientProperties:
    ReadAttributes:
      - email
    WriteAttributes:
      - email
    SupportedIdentityProviders:
      - COGNITO

In a terminal, inside the root folder of the barrier, run the command poetry run -- deploy -f test-assembly.yaml to deploy the barrier. The Python script found in the folder forge_cognitobarrier will then package the code of the two Lambda functions, upload it to your S3 bucket and create two CloudFormation stacks. When the installation has finished, your terminal will print a table with the following information.

Session Manager Origin Host     XXXXXXX.execute-api.us-east-1.amazonaws.com
Session Manager Origin Path     /prod
Session Checker Function Arn    arn:aws:lambda:us-east-1:999999999999:function:XXXXX:1

The Session Manager Origin Host is the domain name of the API Gateway that has been created to forward HTTP requests to the session manager; the Session Checker Function Arn is the full ARN of the session checker Lambda@Edge function.

In the AWS console, switch to CloudFormation and you will see the two CloudFormation stacks.

With the information from the barrier installation, we can now modify our CloudFront distribution to authenticate users before serving the content. In the AWS console, go back to CloudFront and choose your new distribution to edit it. We need to make two changes: first, route requests with the prefix /_identity/* to the session manager, and then have CloudFront use the session checker for requests that would be forwarded to the S3 bucket.

For the session manager, go to “Origins and Origin Groups”, choose “Create Origin” and fill in the creation wizard with the Session Manager Origin Host and Path obtained after the deployment of the barrier. Importantly, choose “HTTPS Only” as the origin protocol policy.

Creating an origin is not enough to have CloudFront forward requests; it is necessary to create a so-called behaviour that links a path pattern to an origin. Under the tab “Behaviors”, choose “Create Behavior”. Fill in the upper part of the form with the path pattern /_identity/* and with the origin you just created. Also, we want to redirect all HTTP requests to HTTPS.

By default, a CloudFront behavior is configured to cache the responses to requests. For the session manager, however, we do not want to cache anything. We can control caching and request forwarding in the lower part of the behaviour creation form. Put 0 in all the TTL fields and forward all cookies and all query strings; you can then create the behavior for the session manager origin.

The session manager is now integrated into the CloudFront distribution. There is one last step: modifying the default behaviour to use the session checker on all user requests. To do that, edit the Default behaviour and add a Lambda function association of type Viewer Request with the full ARN of the session checker Lambda function.

After saving the modification of the default behavior, you must wait 15 to 30 minutes for the changes to propagate on CloudFront. When this is done, trying to access the CloudFront distribution in a browser should redirect you to a login page similar to the following.

By default, the Cognito user pool allows new users to sign up; you can now create a user and log in. In real usage of the barrier, you will want to configure your Cognito user pool to prevent users from signing up themselves or to restrict which users can sign up.

This concludes the tutorial. To delete the resources created during the tutorial, do the following:

  1. Delete the CloudFront distribution (in CloudFront on the AWS console)
  2. Delete the Origin Access Identity (in CloudFront on the AWS console)
  3. Delete the two CloudFormation stacks (in CloudFormation on the AWS console)
  4. Delete the S3 Bucket (in S3 on the AWS console)
  5. Delete the Cognito User Pool (in Cognito on the AWS console)

¹It is possible to create the CloudFront distribution after installing the barrier. However, the barrier needs to know the domain name of the CloudFront distribution. In this case, the CloudFront distribution must be created with a custom domain name and a corresponding ACM certificate.

²It is done via an Origin Access Identity.

³The session checker propagates the cookie and adds a custom header X-Barrier-Session-Id with the session id.

One can only set cookies for the current domain or its subdomains.

You can invalidate sessions by deleting the corresponding item from the DynamoDB session table.

You might have to install a Python Version Manager like pyenv.

The domain name space is shared among all user pools in a given region; you can use the “Check availability” function to see whether the domain prefix that you chose is still available.