🚀AWS Certified ML Specialty

List of concepts required without any exaggeration :)

Guys, I appeared for the AWS Certified Machine Learning - Specialty exam yesterday and just got the results.

Here is a list of concepts that are required and that actually have been asked in the exam, without any exaggeration :)

--DATA SCIENCE --

Descriptive statistics: correlation, summary statistics, p-value
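As a quick refresher on the statistics side, correlation and its p-value can be computed in a few lines with SciPy (a minimal sketch on synthetic data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)  # strongly correlated with x

# Pearson correlation coefficient and its two-sided p-value
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.3g}")
```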

Probability Distributions: Binomial, Poisson, Bernoulli, Gaussian, Dirichlet, Gamma, etc.

Scaling and Normalization: Min-Max Scaling, Standardization (Z-score normalization), Logarithmic scaling
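A minimal scikit-learn sketch of the three scaling approaches on a toy column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

X_minmax = MinMaxScaler().fit_transform(X)    # rescales to [0, 1]
X_zscore = StandardScaler().fit_transform(X)  # zero mean, unit variance
X_log = np.log1p(X)                           # log scaling for skewed data

print(X_minmax.ravel())
print(X_zscore.ravel())
```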

Dimensionality reduction: PCA, LDA (Linear Discriminant Analysis), t-SNE (t-distributed Stochastic Neighbor Embedding)
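PCA is the one you are most likely to apply hands-on; a small scikit-learn sketch on synthetic data where nearly all the variance lives along two directions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 100 samples in 5 dimensions, generated from a 2-dimensional latent space
base = rng.normal(size=(100, 2))
X = base @ rng.normal(size=(2, 5)) + rng.normal(scale=0.01, size=(100, 5))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # two components per sample
print(pca.explained_variance_ratio_.sum())  # close to 1.0 for this data
```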

Feature Selection: Univariate feature selection (e.g., chi-squared, ANOVA), Recursive feature elimination, Feature importance from tree-based models

Time Series Feature Engineering: Lag features (time-shifted values), Rolling statistics (e.g., moving averages), Seasonal decomposition
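Lag features and rolling statistics are one-liners in pandas; a small sketch on a toy daily series:

```python
import pandas as pd

s = pd.Series([10, 12, 13, 15, 14, 16],
              index=pd.date_range("2024-01-01", periods=6, freq="D"))

df = pd.DataFrame({"value": s})
df["lag_1"] = df["value"].shift(1)                    # yesterday's value
df["rolling_mean_3"] = df["value"].rolling(3).mean()  # 3-day moving average
print(df)
```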

Text Feature Engineering: Bag of Words, TF-IDF (Term Frequency-Inverse Document Frequency), Word embeddings (e.g., Word2Vec, Object2Vec, GloVe)
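Bag of Words and TF-IDF differ only in how the counts are weighted; a scikit-learn sketch on three toy documents:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

bow = CountVectorizer().fit_transform(docs)    # Bag of Words: raw term counts
tfidf = TfidfVectorizer().fit_transform(docs)  # counts reweighted by rarity

# Same vocabulary, so same matrix shape: one row per doc, one column per term
print(bow.shape, tfidf.shape)
```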

Cross-Validation: k-fold, stratified k-fold
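Stratification matters when classes are imbalanced, since a plain split can leave a fold with no minority samples; a scikit-learn sketch:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)  # imbalanced: only 2 positive samples

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
folds = list(skf.split(X, y))
for train_idx, test_idx in folds:
    # each test fold preserves the 8:2 class ratio, so it sees one positive
    print(np.bincount(y[test_idx]))
```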

Text Preprocessing: Tokenization, Stop-word removal, Stemming, Lemmatization
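These steps are usually done with NLTK or spaCy; the sketch below uses only the standard library, with a deliberately crude suffix-stripping "stemmer" for illustration (real stemming, e.g. Porter, and lemmatization come from those libraries):

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "on", "and"}  # tiny illustrative list

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())         # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    # toy suffix stripping, standing in for a real stemmer
    return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]

print(preprocess("The cats are sitting on the mats"))
```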

Imputation: MICE (Multiple Imputation by Chained Equations)
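scikit-learn's IterativeImputer implements a MICE-style approach (it is still marked experimental, hence the extra enable import):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, np.nan],  # missing value to be imputed
              [4.0, 8.0]])

# each feature with missing values is modeled as a function of the others
X_filled = IterativeImputer(random_state=0).fit_transform(X)
print(X_filled)  # the NaN becomes roughly 6, since column 2 is about 2x column 1
```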

Encoding Categorical Data: One-Hot Encoding, Label Encoding, Target Encoding
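A pandas sketch contrasting one-hot and label encoding on a toy column (target encoding needs label statistics, so it is omitted here):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-Hot Encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="color")
print(one_hot)

# Label Encoding: map each category to an integer (order is arbitrary here)
df["color_label"] = df["color"].astype("category").cat.codes
print(df)
```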

Visualization: graphs (scatter plots, time series, histograms, box plots, bar plots)

--MACHINE LEARNING--

Modelling techniques: classification, regression, forecasting, clustering, recommendation

Models (Supervised): XGBoost, Regression, KNN, SVM, Decision Trees, Random Forest, Ensemble Learning, Neural Networks (RNN, CNN, LSTM), Multi-Layer Perceptron, BERT

Models (Unsupervised): Clustering techniques (K-Means)

Transfer Learning

Model Evaluation: RMSE, MAE, MAPE, Confusion Matrix, AUC-ROC, detecting and handling bias & variance, A/B testing
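A scikit-learn sketch of the headline metrics on toy predictions:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, mean_absolute_error,
                             mean_squared_error, roc_auc_score)

# Classification metrics
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])
y_score = np.array([0.2, 0.6, 0.9, 0.8, 0.4, 0.1])  # predicted probabilities

cm = confusion_matrix(y_true, y_pred)  # rows: actual, cols: predicted
auc = roc_auc_score(y_true, y_score)   # threshold-independent ranking quality

# Regression metrics
actual = np.array([3.0, 5.0, 2.5])
pred = np.array([2.5, 5.0, 3.0])
mae = mean_absolute_error(actual, pred)
rmse = np.sqrt(mean_squared_error(actual, pred))

print(cm, auc, mae, rmse)
```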

Hyperparameter optimization: Regularization (L1,L2), Drop out, Early stopping, Understanding neural network architecture (number of layers and nodes), learning rate, activation functions (Sigmoid, Softmax, ReLU, TanH etc.)
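The L1 vs. L2 distinction is a favorite exam point: L1 (Lasso) drives irrelevant weights to exactly zero, while L2 (Ridge) only shrinks them. A scikit-learn sketch on synthetic data where just two of ten features matter:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# only the first two features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1: sparse weights, implicit selection
ridge = Ridge(alpha=1.0).fit(X, y)  # L2: small but nonzero weights

print("L1 zero weights:", np.sum(lasso.coef_ == 0))  # most of the 8 noise features
print("L2 zero weights:", np.sum(ridge.coef_ == 0))
```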

Deployment and operationalization of ML solutions: Exposing endpoints and interacting with them, Retraining pipelines, Debugging and troubleshooting ML models, Detecting and mitigating drops in model performance. Monitoring model performance

--AWS Services--

Create data repositories for ML:

- Identify data sources (for example, content and location, primary sources such as user data)

- Determine storage mediums (for example, databases, Amazon S3, Amazon Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon EBS])

Identify and implement a data ingestion and transformation solution:

- Identify data job styles and job types (for example, batch load, streaming)

- Transform data in transit (ETL, AWS Glue, Amazon EMR, AWS Batch)

- Handle ML-specific data by using MapReduce (for example, Apache Hadoop, Apache Spark, Apache Hive)

- Orchestrate data ingestion and transformation pipelines (batch-based ML workloads and streaming-based ML workloads) with the help of:

  - Amazon Kinesis

  - Amazon Kinesis Data Firehose

  - Amazon EMR

  - AWS Glue

  - Amazon Managed Service for Apache Flink

  - Data Wrangler

- Schedule jobs

Recommend and implement the appropriate ML services and features for a given problem:

- Amazon SageMaker for custom development

- Amazon Comprehend

- AWS Deep Learning AMIs (DLAMI)

- AWS DeepLens

- Amazon Forecast

- Amazon Fraud Detector

- Amazon Lex

- Amazon Mechanical Turk

- Amazon Polly

- Amazon Rekognition

- Amazon Textract

- Amazon Transcribe

- Amazon Translate

Compute:

- AWS Batch

- Amazon EC2

- AWS Lambda

Containers:

- Amazon Elastic Container Registry (Amazon ECR)

- Amazon Elastic Container Service (Amazon ECS)

- Amazon Elastic Kubernetes Service (Amazon EKS)

- AWS Fargate

Database:

- Amazon Redshift

Internet of Things:

- AWS IoT Greengrass

Management and Governance:

- AWS CloudTrail

- Amazon CloudWatch

Networking and Content Delivery:

- Amazon VPC

Security, Identity, and Compliance:

- AWS Identity and Access Management (IAM)

Storage:

- Amazon Elastic Block Store (Amazon EBS)

- Amazon Elastic File System (Amazon EFS)

- Amazon FSx

- Amazon S3

The questions are usually framed as business scenarios in which several of these services are combined, and you must identify the architecture that solves the stated problem.

It is best to have working knowledge of MLOps, that is, how machine learning solutions run in production (in the AWS context as well, of course), before appearing for the exam.

Here is my honest experience & thoughts-

To be very honest, I was planning to reschedule my exam since I have been caught up with work and couldn't put much time into prep. But it totally slipped my mind to reschedule! 😂

So, I had less than 24 hours to prepare.

But, on the other hand, I had faith in my understanding of Machine Learning and my experience with AWS ML so far.

So I pulled an all-nighter to revise the prerequisites, and it was enough!

I started wondering whether my result was down to just one all-nighter. And the answer is NO, of course not. I have been rigorously studying ML since around the second semester of my bachelor's. Not counting the time I invested in college exams, assignments, interview prep, college fun (obvi), etc., I have been following and studying ML for at least 2.5 to 3 years.

All that time invested in rigorous reading and practice has now enabled me to attempt such specialization exams without spending too much time prepping for them.

I am thankful it is this way, since I would rather apply my knowledge (like I did for this one) than just mug up answers for a specific exam and move on without learning anything.

Conclusion: Certifications are cool but knowledge retention is the coolest!
