On June 16th, Amazon Web Services released the long-awaited Amazon Elastic File System (EFS) support for its AWS Lambda service. It is a huge leap forward in serverless computing, enabling a whole new set of use cases, and it could have a massive impact on our AI infrastructure, reducing machine learning inference costs to pennies for a wide range of applications.
The most exciting feature of Elastic File System is that the same file system can be mounted on EC2 virtual machines, Fargate containers, and AWS Lambda functions alike. Shared mounting is nothing new for EFS: many applications already use it to share stored data, and it helps customers evolve their applications towards stateless services. A couple of EC2 instances or containers can save vast amounts of data on an EFS volume and share it between producers and consumers. This avoids the complexity (and latency) of storing objects on S3 and downloading them every time they are needed. Moreover, EFS throughput can be configured up to 1024 MB/s, so it can be used much like a local file system.
Since Amazon SageMaker relies on EC2 to run Jupyter notebooks, mounting shared storage between different instances is quite easy to set up and can avoid a lot of pain when managing huge deep learning models with millions of parameters and sizes of hundreds of megabytes.
The case for Image Memorability
Image analysis, classification, object detection, and picture rating through machine learning were applications unimaginable only a few years ago. Now they are commoditized services offered by any cloud provider, such as Amazon Rekognition, through an API that can easily be invoked with no need to understand the complexities of how a neural network works.
Nevertheless, some specific image analysis tasks are far from being solved by general, standardized services and require a context-specific approach, which means building a customized neural network to process images and extract meaningful insights. One of the most exciting services within Neosperience Cloud scores images on metrics that help marketers decide whether or not an image should be used in a campaign. The wrong image, or merely a picture that does not make the viewer focus on the message, can make your company waste a lot of money and drag down the return on investment. Focus groups have long been used, with the constraint of testing a campaign on a tiny number of potential users. Standing on the shoulders of giants such as Netflix, techniques such as A/B testing try to find the most suitable image for the vast majority of users.
Drawing on our experience over the last decade of digital marketing, we understood that all these methods are flawed because they do not consider the emotional impact of an image, which is deeply connected to its success. To fill this gap for our customers and provide a set of tools that support empathic analysis of content, we started building our Neosperience Image Services leveraging serverless technologies offered by AWS. One of the most interesting is our Image Memorability scoring system. Starting from preliminary work published by MIT in 2015, we went on to build our own memorability dataset, focused on images typically used in marketing campaigns and product showcases, to increase the model accuracy. After that, we used Amazon SageMaker to train an ensemble of neural networks to predict a memorability score for a given image and extract the features of the picture contributing to that score.
Service architecture before EFS support
The overall service architecture is outlined in the image below. A simplified version of this architecture is discussed in an episode of This is My Architecture (in Italian).
In this article, we focus on the part of the architecture in which a trained model is uploaded from SageMaker to Amazon S3 and then served either to on-premises customer servers or to SageMaker endpoints, to support a variety of use cases.
Upgrading inference to serverless
Thanks to EFS support for AWS Lambda, it is now possible to remove the need for Amazon SageMaker endpoints and shift the load to a Lambda function triggered by an image saved to Amazon S3. It is a huge leap forward for many reasons:
- removing the Amazon EC2 instances from the architecture lowered costs by one order of magnitude
- AWS Lambda scaling is more granular than Amazon EC2 Auto Scaling groups and takes less time to scale up and down, which improves service response time
- by using S3 triggers for the Lambda function, we no longer need to poll an SQS queue containing processing jobs
- since there are no more EC2 instances to manage, there is no longer any need for the instance pool DynamoDB table (used to coordinate different workers on the same data)
The final architecture is much simpler and uses fewer AWS resources to accomplish the same task.
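The S3-triggered flow can be sketched as a Lambda handler in Python. This is a minimal sketch under stated assumptions, not our production code: `score_image` is a hypothetical placeholder for the memorability model, and the event is assumed to follow the standard S3 notification layout, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def score_image(bucket, key):
    """Hypothetical placeholder: the real service would run model inference here."""
    return {"bucket": bucket, "key": key, "score": None}


def extract_objects(event):
    """Return (bucket, key) pairs for every record in an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # S3 URL-encodes object keys in notifications (spaces become '+').
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def handler(event, context):
    """Lambda entry point, invoked by S3 on upload, so no queue polling is needed."""
    return [score_image(bucket, key) for bucket, key in extract_objects(event)]
```

Because every upload triggers its own invocation, a batch of images fans out across as many concurrent function instances as Lambda allows, without any worker coordination on our side.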
About performance and costs
The typical usage pattern for Image Memorability is a marketer uploading a bunch of pictures from her laptop while preparing the next campaign. With an average of 15–20 images per upload, our service experiences peaks when different users upload sets of photos during the same timeframe (i.e., working hours). For each batch, the marketer identifies some of the most impactful images, then computes memorability heatmaps on them to ensure the remembered details are what she wants.
To serve our customers with a responsive platform, we dedicated an inference worker to each of them. This choice was necessary to avoid spinning up an EC2 instance every time a batch is uploaded, which would result in a poor user experience. The drawback is an EC2 Auto Scaling group dedicated to each customer, with a minimum of one p2.xlarge instance per customer within their cluster.
By adopting AWS Lambda with EFS support, we were able to shift all the load related to image scoring (which is the most common operation done by customers) to serverless, while maintaining a shared cluster of p2.xlarge workers to compute heatmaps when required.
This change increased our score computation time from 5 seconds to between 10 and 12 seconds per image. Still, it allowed us to lower the overall batch execution time to less than 20 seconds thanks to Lambda parallelization: a noticeable improvement over the batch processing time of more than 90 seconds of the previous service version.
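A back-of-the-envelope calculation shows why slower per-image inference can still give a faster batch. The sketch below assumes, for illustration only, that the previous version scored a batch sequentially on a single worker and that Lambda scores every image of the batch concurrently; cold starts and upload time are ignored.

```python
import math


def batch_seconds(images, seconds_per_image, concurrency=1):
    """Wall-clock time for a batch when `concurrency` images are scored at once."""
    waves = math.ceil(images / concurrency)
    return waves * seconds_per_image


# 20-image batch, one worker at 5 s per image (assumed sequential processing).
print(batch_seconds(20, 5))                   # 100
# Same batch on Lambda at 12 s per image, one invocation per image.
print(batch_seconds(20, 12, concurrency=20))  # 12
```

For batches of 15 to 20 images, the sequential estimate lands between 75 and 100 seconds, while the parallel estimate stays close to the per-image time, which is in the same range as the figures we measured.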
On the cost side, we were able to replace the dedicated p2.xlarge autoscaling groups, which are now needed only for heatmaps, with a single shared group. This results in considerable savings across the overall customer base and better resource utilization.
Where to go from here?
AWS Lambda support for Elastic File System is a game changer for serverless computing, making it possible to overcome the Lambda storage size limit of 512 MB and the bottleneck of S3 transfers to the function runtime. ML models are often quite heavy (especially if they work on images), so it is good advice to load the model into memory once. This can be done by initializing the neural network object of your deep learning framework and keeping it alive, cached across invocations, for the life of the Lambda container. Such a simple trick speeds up subsequent inferences even more, avoiding the wait for a 100+ MB transfer from the file system to memory.
At this time, we have migrated only the Image Memorability services related to image scoring. Memorability heatmaps still need GPUs to backpropagate the score through the network and extract features, and GPUs are unfortunately not yet available on AWS Lambda.
Image Memorability was migrated to this architecture in just a few days, because configuring a new mount point for EFS is an easy task if your functions are already within a Virtual Private Cloud (VPC). A reduction in computing costs by one order of magnitude means we are now able to consider some improvements to our service. For example, different prediction models could be run on the same image, sending back to users different scores at the same time, together with their performance, to let them choose which best suits their use case.
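As an illustration of how small the mount point configuration is, the sketch below builds the arguments for the Lambda `UpdateFunctionConfiguration` API and applies them through boto3. It is an assumption-laden example, not our deployment code: the function name and access point ARN are supplied by the caller, the `/mnt/ml` default path is hypothetical, and the function is assumed to already run in a VPC that can reach the EFS mount targets, with an execution role allowed to mount the access point.

```python
def build_mount_config(function_name, access_point_arn, mount_path="/mnt/ml"):
    """Build the arguments for Lambda's UpdateFunctionConfiguration call."""
    # Lambda requires the local mount path to live under /mnt/.
    if not mount_path.startswith("/mnt/"):
        raise ValueError("mount path must start with /mnt/")
    return {
        "FunctionName": function_name,
        "FileSystemConfigs": [
            {"Arn": access_point_arn, "LocalMountPath": mount_path}
        ],
    }


def attach_efs(config):
    """Apply the configuration; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder works without the SDK installed

    return boto3.client("lambda").update_function_configuration(**config)
```

The same setting can also be applied from the console or from an infrastructure-as-code template; the Python call is shown only because it keeps the example self-contained.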
Neosperience Image Memorability service is free to try and more information is available on our product page to help marketing teams maximize their campaign ROI.
To get started with AWS Lambda and EFS support, please refer to this great blog post by Danilo Poccia; details can be found in the documentation. Getting started with Amazon EFS and Lambda is quite easy, while shifting to modern architectures that leverage serverless and microservices requires support from an experienced team. Feel free to reach out to our consultancy team at Mikamai to get help building your next cloud-native solution on AWS.
My name is Luca Bianchi. I am Chief Technology Officer at Neosperience and the author of Serverless Design Patterns and Best Practices. I have built software architectures for production workloads at scale on AWS for nearly a decade.
Neosperience Cloud is the one-stop SaaS solution for brands aiming to bring Empathy in Technology, leveraging innovation in machine learning to provide support for 1:1 customer experiences.