GenAI at AWS is powered by foundation models: models pre-trained on vast amounts of unstructured data that can be applied to a wide range of contexts, making it easy to build a custom layer on top.
Use cases:
Enhanced customer experience, such as chatbots and analytics
Boosted productivity and creativity, such as better search, summarisation and code generation
Optimised business processes, such as document processing and cyber security
There are three primary ways to set up generative AI
Use an existing model and provide it with context; the easiest to set up but the least specific to your domain
Fine-tune a foundation model with your own curated, labelled data
Train your own LLM with specialised data (Most expensive)
The following services are available for each of these categories:
Amazon Q
Chatbot that can connect to a wide range of data sources and is easy to integrate with
Works much the same as ChatGPT/Gemini
Provide some context and hope for the best
Amazon Bedrock
Choose a Foundation Model from various providers
Customise per organisation
Customer data isn’t used to train the model; it remains within your VPC
Use Guardrails for filtering, PII removal and censoring
Amazon SageMaker
Infrastructure layer
Allows developers to build and train their own language models using AWS hardware
Different instance classes for different needs, e.g. tiered GPU power
One example of a model trained for a targeted purpose is:
Amazon CodeWhisperer
Code suggestion and implementation
Works much the same as GitHub Copilot
Code isn’t used to train any future models
There are various FMs we can use; the current best (as of April 2024) is Claude 3, which is multi-modal, meaning it can read and respond to text as well as describe images.
Do note, you don’t always need to use the most state-of-the-art model; ensure you balance cost with accuracy.
Agents
Completes tasks like a person would: it can write, click and consume APIs in a step-by-step format.
Pricing
On-Demand - Pay per token with requests-per-minute and tokens-per-minute limits enforced
Provisioned - Fixed cost at an hourly rate, with high throughput guaranteed over a period you define
Vector embeddings are the foundation for an LLM's grasp of language. They act as a numerical representation of meaning, enabling LLMs to process information efficiently and understand the relationships between words. This understanding is what allows LLMs to perform various tasks like text generation, translation, and question answering.
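Since embeddings underpin the rest of this workflow, here is a hedged sketch of generating one through Bedrock. The model ID (amazon.titan-embed-text-v1) and region are assumptions; check the Model Access page for what is enabled in your account:

```python
import json
import boto3

# Sketch: generate a vector embedding with Bedrock's Titan Embeddings
# model (model ID and region are assumptions; adjust to your account).
bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "The quick brown fox jumps over the lazy dog"}),
)

embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # Titan v1 embeddings are 1536-dimensional
```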
A vector database is also required for the following workflow, depending on workload. AWS of course offers services that integrate with its other offerings, such as vector search as part of Amazon MemoryDB for Redis. Below is an architecture diagram where such a tool is utilised:
Step 2. Define Chunking Strategy
There are various techniques for chunking, often relying on the LLM's existing knowledge of parts-of-speech (POS) tags. These tags are assigned during the training process and indicate the grammatical role of each word (noun, verb, adjective, etc.). By analyzing sequences of POS tags, the LLM can identify likely chunk boundaries.
Common strategies include fixed-size, sentence splitting, recursive and specialised chunking.
Fixed-size is the most common: breaking the data into chunks of x tokens each, for example when breaking down a large document. Note that you normally ensure some overlap between chunks to help minimise context being lost at chunk boundaries; this is sometimes grouped under ‘content-aware chunking’.
You don’t do this once; in fact, you iterate with different chunk sizes, feeding in the differently processed data and evaluating its performance, as sketched below.
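A minimal sketch of fixed-size chunking with overlap (whitespace-separated words stand in for tokens; the sizes are arbitrary starting points to iterate from):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Fixed-size chunking with overlap. Words stand in for tokens here;
    a real pipeline would use the embedding model's tokeniser."""
    words = text.split()
    step = chunk_size - overlap  # the overlap carries context across boundaries
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

document = "some long document " * 500  # placeholder text
for size in (100, 200, 400):            # iterate over candidate chunk sizes
    chunks = chunk_text(document, chunk_size=size, overlap=size // 4)
    print(size, len(chunks))            # evaluate retrieval quality for each size
```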
Example workflow using AWS:
Step 3. Algorithms and Neural Networks
There are two primary vector search algorithms that quantify the similarity between data points.
FLAT (search without an index) is ideal for small datasets; it is precise and simple. HNSW (Hierarchical Navigable Small World) is ideal for larger sets; it builds layers of ‘graph neighbourhoods’ and is relatively quick.
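To make the trade-off concrete, here is a minimal sketch of the FLAT approach in plain numpy: every stored vector is compared against the query with cosine similarity, which is exact but scales linearly with the dataset (the data is random and purely illustrative):

```python
import numpy as np

def flat_search(query, vectors, k=3):
    """FLAT search: compare the query against every stored vector
    (no index), ranking by cosine similarity. Exact but O(n)."""
    vectors = np.asarray(vectors)
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    return np.argsort(sims)[::-1][:k]  # indices of the k nearest vectors

# Toy example: five random 4-dimensional embeddings.
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 4))
print(flat_search(db[2], db))  # index 2 should rank itself first
```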
Artificial Neural Networks (ANNs) encompass a broad range of algorithms inspired by the structure and function of the human brain. Here are some popular ANN algorithms categorized by their learning approach:
Supervised Learning Algorithms:
Gradient Descent: This is a foundational optimization algorithm used to train many ANNs. It iteratively adjusts the weights within the network to minimize the error between the network's predictions and the desired outputs based on training data.
Backpropagation: This is a specific type of gradient descent commonly used in multi-layer ANNs. It propagates the error signal backward through the network, allowing adjustments to be made to the weights of all layers.
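As a concrete illustration, here is a minimal gradient descent sketch, fitting a single weight to toy data by repeatedly stepping against the gradient of the squared error (the learning rate and data are arbitrary choices for the example):

```python
# Fit y = w * x to toy data by gradient descent on the squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # the true relationship is y = 2x

w, lr = 0.0, 0.01
for _ in range(200):
    # dE/dw for E = sum((w*x - y)^2) is sum(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
    w -= lr * grad  # step opposite the gradient to reduce the error

print(round(w, 3))  # converges towards 2.0
```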
Unsupervised Learning Algorithms:
K-Means Clustering: This algorithm groups similar data points together without labeled data. It iteratively assigns data points to clusters based on their similarity and refines the cluster centroids (central points) until convergence.
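A minimal k-means sketch on a toy 2D dataset; it alternates between assigning points to their nearest centroid and moving each centroid to the mean of its cluster:

```python
import numpy as np

def kmeans(points, k=2, iters=10):
    """Minimal k-means sketch: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster, and repeat."""
    centroids = points[:k].copy()  # naive initialisation: the first k points
    for _ in range(iters):
        # distance from every point to every centroid
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([points[labels == i].mean(axis=0) for i in range(k)])
    return labels, centroids

pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, _ = kmeans(pts)
print(labels)  # the two tight groups land in separate clusters
```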
Deep Learning Algorithms (a sub-field of ANNs):
Convolutional Neural Networks (CNNs): These excel at image recognition and analysis. They use convolutional layers to extract features from images and pooling layers to reduce dimensionality.
Recurrent Neural Networks (RNNs): These are adept at handling sequential data like text or time series. They have loops that allow them to process information from previous steps and incorporate it into their current output.
Long Short-Term Memory (LSTM): A specific type of RNN that addresses the vanishing gradient problem, allowing RNNs to learn long-term dependencies in data.
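To illustrate the ‘loop’ that RNNs rely on, here is a minimal sketch of a vanilla RNN cell in numpy (the weights are random and untrained; this only shows how the hidden state carries information across steps):

```python
import numpy as np

rng = np.random.default_rng(1)
W_xh = rng.normal(scale=0.1, size=(3, 4))  # input -> hidden weights
W_hh = rng.normal(scale=0.1, size=(4, 4))  # hidden -> hidden (the "loop")
h = np.zeros(4)                            # hidden state starts empty

sequence = rng.normal(size=(5, 3))  # five 3-dimensional inputs
for x in sequence:
    # each step mixes the new input with the previous hidden state,
    # which is what lets the network "remember" earlier steps
    h = np.tanh(x @ W_xh + h @ W_hh)

print(h)  # final hidden state summarising the whole sequence
```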
Amazon Bedrock Tutorial
Assuming you already have an AWS account, your first step in getting started with Amazon Bedrock is to open the dedicated AWS console page, where you will need to navigate to the Model Access page and select any models you might want to use. Model enablement takes a few minutes after you confirm your selections.
When enabled, head over to the Chat Playground to have your first interactions with the models. Select a model category and model to get started.
Enter a sample prompt and hit Run and await the results of your first Bedrock query. If you have streaming enabled your results will appear in real time as they are generated.
Experiment with the various controls whilst running the same prompt to see how different settings affect your output. Temperature, Top P and Top K control the randomness and diversity of the response, while the response-length setting caps the tokens used. Each base model has different controls, but all work in roughly the same way.
Interacting with the Bedrock API
Once you’ve got to grips with the Playground you will soon want to move onto interacting with the Bedrock API. This allows you to integrate Bedrock services into your own applications. For the purpose of this guide, we will:
Use boto3 to connect to AWS services
Invoke the Bedrock API
List all Bedrock models available
Interact with Claude using the API
Before progressing, ensure your Python environment is set up with the latest version of boto3, the AWS SDK for Python.
Making your first API call to Amazon Bedrock is incredibly straightforward. We will import boto3 and then call the list_foundation_models() function to get the latest list of models available to us.
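A minimal sketch of that first call (the region is an assumption; use whichever region you enabled your models in):

```python
import boto3

# Connect to the Bedrock control plane (region is an assumption;
# use whichever region you enabled your models in).
bedrock = boto3.client(service_name="bedrock", region_name="us-east-1")

# Fetch and print the ID of every foundation model available to us.
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])
```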
Run your Python code and, if successful, you should see output listing all the available models.
Moving on to running an inference to get a result back from Claude, we need to expand the code to use the service name ‘bedrock-runtime’ along with the ‘invoke_model’ command.
Below we expand the code, telling Bedrock that we want to use Claude v2 and that we want a response in JSON format. Claude needs to take its prompt in the format ‘\n\nHuman: <prompt>\n\nAssistant:’ and will not accept anything else. You can play around with the max tokens, temperature and top_p to vary the results.
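A sketch of that expanded call; the prompt, region and parameter values here are illustrative:

```python
import json
import boto3

# Invoke Claude v2 through the Bedrock runtime (region and parameter
# values are illustrative; adjust to taste).
bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "prompt": "\n\nHuman: Hello, how are you?\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.5,
    "top_p": 0.9,
})

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-v2",
    body=body,
    accept="application/json",
    contentType="application/json",
)

# The completion text lives under the "completion" key of the response body.
print(json.loads(response["body"].read())["completion"])
```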
My initial results returned as:
Hello! I’m Claude, an AI assistant created by Anthropic. I don’t actually have experiences or preferences, since I’m an AI without subjective experiences. I’m happy to chat, but I don’t have personal stories or opinions to share.