Generative AI with AWS


Intro

GenAI at AWS is powered by foundation models: models pre-trained on vast amounts of unstructured data that can be applied to a wide range of contexts, making it easy to build a custom layer on top.
 
Use cases:
  • Enhanced customer experience, such as chatbots and analytics
  • Boosted productivity and creativity, such as better search, summarisation and code generation
  • Optimised business processes, such as document processing and cyber security
 
There are 3 primary ways to set up generative AI
  1. Use an existing model as-is and provide it with context (easiest to set up, but least specific to your domain)
  2. Fine-tune a foundation model with your own curated, labeled data
  3. Train your own LLM with specialised data (most expensive)
The following services map onto these categories:
 
Amazon Q
  • Chatbot that can connect to whatever data sources you have and is easy to integrate with
  • Works much the same as ChatGPT/Gemini
  • Provide some context and hope for the best
 
Amazon Bedrock
  • Choose a foundation model from various providers
  • Customise per organisation
  • Customer data isn't used to train the model, and it remains within your VPC
  • Use Guardrails for filtering, PII removal and censoring (see the sketch below)
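As a rough sketch of applying a guardrail at inference time, assuming a guardrail has already been created in the Bedrock console (the identifier and version below are placeholders):

import json
import boto3

bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

response = bedrock.invoke_model(
    modelId='anthropic.claude-v2',
    body=json.dumps({
        "prompt": "\n\nHuman: Summarise this customer record.\n\nAssistant:",
        "max_tokens_to_sample": 300,
    }),
    guardrailIdentifier='GUARDRAIL_ID',  # placeholder for your guardrail's ID
    guardrailVersion='1',                # placeholder version
)
print(json.loads(response['body'].read())['completion'])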
 
Amazon SageMaker
  • Infrastructure layer
  • Allows developers to build and train their own language models using AWS hardware
  • Different instance classes for different needs, e.g. tiered GPU power
 
One example of a purpose-trained model is:
Amazon CodeWhisperer
  • Code suggestion and implementation
  • Works much the same as GitHub Copilot
  • Code isn’t used to train any future models
 
There are various FMs we can use; the current best (as of April 2024) is Claude 3, which is multi-modal, meaning it can read and respond to text and also interpret and describe images.
Do note, you don't always need to use the most state-of-the-art models; ensure you balance cost with accuracy.
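As a minimal sketch of this multi-modality, the snippet below sends an image to Claude 3 on Bedrock and asks for a description. It assumes the Anthropic Messages request format on Bedrock; the model ID and the local file photo.jpg are illustrative placeholders:

import base64
import json
import boto3

bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

# 'photo.jpg' is a placeholder for any local image file.
with open('photo.jpg', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('utf-8')

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 300,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64}},
            {"type": "text", "text": "Describe this image."},
        ],
    }],
})

response = bedrock.invoke_model(modelId='anthropic.claude-3-sonnet-20240229-v1:0', body=body)
print(json.loads(response['body'].read())['content'][0]['text'])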
 

Agents

An agent completes tasks like a person would: it can write, click and consume APIs in a step-by-step format. Agents built in Bedrock can also be invoked programmatically, as sketched below.
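The sketch below assumes an agent already exists in Bedrock and uses the boto3 bedrock-agent-runtime client; the agent and alias IDs are placeholders:

import boto3

agent_runtime = boto3.client(service_name='bedrock-agent-runtime', region_name='us-east-1')

response = agent_runtime.invoke_agent(
    agentId='AGENT_ID',             # placeholder for your agent's ID
    agentAliasId='AGENT_ALIAS_ID',  # placeholder for its alias ID
    sessionId='demo-session-1',
    inputText='Book a meeting room for Friday at 2pm.',
)

# The agent's reply streams back as chunked events.
for event in response['completion']:
    if 'chunk' in event:
        print(event['chunk']['bytes'].decode('utf-8'), end='')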
 

Pricing

On-Demand - Pay per token, with requests-per-minute and tokens-per-minute limits enforced
Provisioned Throughput - Fixed hourly cost for guaranteed throughput over a commitment period you define
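As a back-of-the-envelope illustration of on-demand pricing (the per-token rates below are invented placeholders; check the current Bedrock pricing page for real numbers):

# Hypothetical on-demand rates, NOT real prices.
input_rate_per_1k = 0.008   # $ per 1,000 input tokens (assumed)
output_rate_per_1k = 0.024  # $ per 1,000 output tokens (assumed)

input_tokens = 500_000    # tokens sent in prompts over a month
output_tokens = 150_000   # tokens generated in responses

cost = (input_tokens / 1000) * input_rate_per_1k \
     + (output_tokens / 1000) * output_rate_per_1k
print(f"Estimated monthly cost: ${cost:.2f}")  # $7.60 with these assumed rates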
 

Demonstration Repos

 
 

Retrieval Augmented Generation Setup Workflow

Step 1. Vector Embeddings in a Database
Vector embeddings are the foundation for an LLM's grasp of language. They act as a numerical representation of meaning, enabling LLMs to process information efficiently and understand the relationships between words. This understanding is what allows LLMs to perform various tasks like text generation, translation, and question answering.
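To make this concrete, here is a minimal sketch of generating an embedding through Bedrock. It assumes the Amazon Titan text-embeddings model has been enabled under Model Access:

import json
import boto3

bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

response = bedrock.invoke_model(
    modelId='amazon.titan-embed-text-v1',
    body=json.dumps({"inputText": "Vector embeddings represent meaning as numbers."}),
)

# The response body contains a list of floats (1,536 dimensions for this model).
embedding = json.loads(response['body'].read())['embedding']
print(len(embedding))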
A vector database is also required for the following workflow, depending on workload. AWS of course offers services that integrate with their other offerings, such as vector search as part of Amazon MemoryDB for Redis. Below is an architecture diagram where such a tool is utilised:
[Architecture diagram: RAG workflow using vector search in MemoryDB]
 
Step 2. Define Chunking Strategy
There are various techniques for chunking. In classic NLP, chunking relies on part-of-speech (POS) tags assigned during the training process, which indicate the grammatical role of each word (noun, verb, adjective, etc.); by analyzing sequences of POS tags, a model can identify likely chunk boundaries. In a RAG pipeline, chunking more commonly means splitting your source documents into pieces before embedding them.
Common strategies include fixed-size, sentence splitting, recursive and specialised chunking.
Fixed-size, the most common, breaks data into chunks of x tokens each, for example when breaking down a large document. Note, you would normally keep some overlap between chunks to help minimise context being lost. A related technique, content-aware chunking, instead splits on natural boundaries such as sentences or paragraphs.
You don't do this once; in fact you iterate with different chunk sizes, feeding in differently processed data and evaluating its performance.
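A minimal fixed-size chunker with overlap might look like the following sketch. Whitespace-split words stand in for tokens here; a production pipeline would count tokens with the model's own tokenizer:

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of roughly chunk_size tokens,
    repeating `overlap` tokens between consecutive chunks."""
    words = text.split()  # crude stand-in for real tokenisation
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

document = "word " * 1000
for i, chunk in enumerate(chunk_text(document)):
    print(i, len(chunk.split()))  # 200-word chunks, final one shorter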
[Diagram: example chunking workflow using AWS services]
 
Step 3. Search Algorithms and Neural Networks
There are 2 primary vector search algorithms that quantify the similarity between data points.
FLAT (search without an index) is ideal for small datasets; it is precise and simple. HNSW (Hierarchical Navigable Small World) is ideal for larger sets; it builds layers of 'graph neighbourhoods' and is relatively quick.
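As a rough sketch of what FLAT search does under the hood, here is a brute-force cosine-similarity search in plain numpy, with random vectors standing in for real embeddings:

import numpy as np

rng = np.random.default_rng(0)

docs = rng.normal(size=(1000, 384))  # stand-ins for stored document embeddings
query = rng.normal(size=384)         # stand-in for a query embedding

# After normalising, cosine similarity reduces to a dot product.
docs_norm = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_norm = query / np.linalg.norm(query)

scores = docs_norm @ query_norm
top_k = np.argsort(scores)[::-1][:5]  # indices of the 5 most similar documents
print(top_k, scores[top_k])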
Artificial Neural Networks (ANNs) encompass a broad range of algorithms inspired by the structure and function of the human brain. Here are some popular ANN algorithms categorized by their learning approach:
Supervised Learning Algorithms:
  • Gradient Descent: This is a foundational optimization algorithm used to train many ANNs. It iteratively adjusts the weights within the network to minimize the error between the network's predictions and the desired outputs based on training data (a minimal sketch follows this list).
  • Backpropagation: This is a specific type of gradient descent commonly used in multi-layer ANNs. It propagates the error signal backward through the network, allowing adjustments to be made to the weights of all layers.
Unsupervised Learning Algorithms:
  • K-Means Clustering: This algorithm groups similar data points together without labeled data. It iteratively assigns data points to clusters based on their similarity and refines the cluster centroids (central points) until convergence.
Deep Learning Algorithms (a sub-field of ANNs):
  • Convolutional Neural Networks (CNNs): These excel at image recognition and analysis. They use convolutional layers to extract features from images and pooling layers to reduce dimensionality.
  • Recurrent Neural Networks (RNNs): These are adept at handling sequential data like text or time series. They have loops that allow them to process information from previous steps and incorporate it into their current output.
    • Long Short-Term Memory (LSTM): A specific type of RNN that addresses the vanishing gradient problem, allowing RNNs to learn long-term dependencies in data.
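Referring back to the gradient descent bullet above, here is a minimal sketch that fits a single weight to toy data by gradient descent on a squared-error loss:

# Fit y = w * x to toy data by minimising mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]  # roughly y = 2x

w = 0.0
learning_rate = 0.01

for step in range(200):
    # dL/dw for L = mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

print(round(w, 2))  # converges to roughly 2.0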
 
 

Amazon Bedrock Tutorial

Assuming you already have an AWS account, your first step is to open the dedicated Amazon Bedrock console page and navigate to the Model Access page to select any models you want to use. Model enablement takes a few minutes after you confirm your selections.
Once the models are enabled, head over to the Chat Playground to have your first interactions with them. Select a model category and model to get started.
Enter a sample prompt, hit Run, and await the results of your first Bedrock query. If you have streaming enabled, your results will appear in real time as they are generated.
Experiment with the various controls while running the same prompt to see how different settings affect your output. Temperature, Top P, and Top K control the randomness and diversity of the response, while the maximum length setting caps the tokens used. Each base model exposes different controls, but they all work in roughly the same way.

Interacting with the Bedrock API

Once you've got to grips with the Playground, you will soon want to move on to interacting with the Bedrock API. This allows you to integrate Bedrock services into your own applications. For the purpose of this guide, we will:
  • Use boto3 to connect to AWS services
  • Invoke the Bedrock API
  • List all Bedrock models available
  • Interact with Claude using the API
Before progressing, ensure your Python environment is set up with the latest version of boto3 (the AWS SDK for Python).
Making your first API call to Amazon Bedrock is incredibly straightforward. We will import boto3 and then call the list_foundation_models() function to get the latest list of models available to us.
import boto3

# The 'bedrock' client covers control-plane operations such as listing models.
bedrock = boto3.client(
    service_name='bedrock',
    region_name='us-east-1'
)

bedrock.list_foundation_models()
Run your Python code; if successful, you should see an output listing all the available models.
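If you only want the model IDs rather than the full response, you can pull them out of the modelSummaries field; a small convenience sketch:

import boto3

bedrock = boto3.client(service_name='bedrock', region_name='us-east-1')

# Print just the model IDs from the full listing.
for summary in bedrock.list_foundation_models()['modelSummaries']:
    print(summary['modelId'])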
Moving on to running an inference to get a result back from Claude, we need to expand the code to use the service name 'bedrock-runtime' along with the 'invoke_model' command.
import boto3
import json

# The 'bedrock-runtime' client is used for inference calls.
bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

# Claude v2 requires the prompt wrapped as "\n\nHuman: ...\n\nAssistant:".
body = json.dumps({
    "prompt": "\n\nHuman: This is a test prompt.\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.1,
    "top_p": 0.9,
})

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
print(response_body.get('completion'))
Here we have expanded the code and told Bedrock that we want to use Claude v2 and want a response in JSON format. Claude needs to take its prompt in the format '\n\nHuman: <prompt>\n\nAssistant:' and will not accept anything else. You can play around with the max tokens, temperature and top_p to vary the results.
My initial results returned as:
Hello! I’m Claude, an AI assistant created by Anthropic. I don’t actually have experiences or preferences, since I’m an AI without subjective experiences. I’m happy to chat, but I don’t have personal stories or opinions to share.
