Like other notes on this site, this note contains only a few noteworthy points of the topic.
👉 My GitHub repo for this note: dinhanhthi/google-vertex-ai
👉 All services needed for Data Science on Google Cloud.
Good to know
- You should choose the same location/region for all services (Google project, notebook instances,...). 👉 Check this section.
- Troubleshooting.
- Access Cloud Storage buckets.
- Google Cloud Pricing Calculator
- gcloud ai references (for Vertex AI).
- Always use the Logging service to track problems.
- When building models, especially for serving in production, don't forget to use the logging services.
- When creating a new notebook instance, consider choosing a larger size for the "boot disk" (the default 100GB is not enough as it is).
- If you run gcloud commands inside Workbench, you don't have to provide credentials to connect to GCP; they are passed automatically. The Python client libraries pick them up too (see the sketch below).
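A minimal sketch of that last point, assuming the google-cloud-storage package is installed in the notebook environment: client libraries authenticate via the instance's service account (Application Default Credentials), so no key file is needed.
# Inside a Workbench notebook: no explicit credentials needed,
# the instance's service account is used automatically (ADC).
from google.cloud import storage

client = storage.Client()  # picks up project + credentials from the VM
for bucket in client.list_buckets():
    print(bucket.name)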
Tutorials & references
- What is Vertex AI? -- Official video.
- Google Cloud Vertex AI Samples -- Official github repository.
- Vertex AI Documentation AIO: Samples - References -- Guides.
Notebooks (Workbench)
If you are going to build Docker images inside the virtual machine, you should choose more boot disk space (the default is 100GB; pick more than that). In case you want to change the disk size later, go to Compute Engine / Disks [ref].
Remember to shut down the notebook if you don't use it!!
Workbench notebook vs Colab
👉 Note: Google Colab.
| | Workbench notebook | Colab |
|---|---|---|
| Free | No | Yes (with limits) |
| Persistent storage | Yes | No |
| Easy to share | No | Yes |
| Idle shutdown | No (user-managed) / Yes (managed) | Yes (1h in the free version) |
"Managed notebook" vs "User-managed notebook"
👉 Official doc. Below are some notable points.
| | Managed notebook | User-managed notebook |
|---|---|---|
| SSH | No (but we can 👉 see below) | Yes |
| sudo access | No | Yes |
| Idle shutdown | Yes | No (remember to shut down when not using) |
| Flexibility | No | Yes |
| Schedule notebook runs | Yes | No |
| Connect with Compute Engine | No | Yes |
| Health status monitoring | No | Yes |
| Use third-party JupyterLab extensions | No | Yes |
| Apply a custom script after creating a new instance | No | Yes |
| Use a custom Docker image (for a custom kernel in Jupyter, alongside the prebuilt ones) | Yes | No |
| Can edit machine type + GPU | Yes | Yes |
| Can edit storage | No | Yes |
gcloud CLI
👉 Official references.
# Start instance
gcloud compute instances start thi-managed-notebook --zone=europe-west1-d
# Stop instance
gcloud compute instances stop thi-managed-notebook --zone=europe-west1-d
Sync with GitHub using gh CLI
Inside the notebook, open a Terminal tab. Then install the GitHub CLI (ref),
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \
| sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null
sudo apt update
sudo apt install gh
Log in to gh,
gh auth login
Then follow the guides.
Open Jupyter notebook on your local machine
JupyterLab is running on the Vertex notebook at port 8080. You have 2 options to open it on your local machine:

gcloud SSH port forwarding
gcloud compute ssh \
--project <project-id> \
--zone <zone> <instance-name> \
-- \
-L 8081:localhost:8080
Then open http://localhost:8081
ngrok
First, you have to sign up for an account on ngrok; without this step, you cannot open HTML pages.
Open a Terminal in the Vertex machine and then install ngrok. Here, I use snap,
sudo apt update
sudo apt install snapd
sudo snap install ngrok
# Check
ngrok --version
If it's not found, you can find it in /snap/ngrok/current. Add this line to .bashrc or .zshrc,
export PATH="/snap/ngrok/current:$PATH"
Then run source ~/.bashrc or source ~/.zshrc to apply the changes.
Log in to your ngrok account, go to your AuthToken page and copy the token. Back in the terminal on the Vertex machine,
ngrok authtoken <token>
Then,
ngrok http 8080
It returns something like,
Account Anh-Thi Dinh (Plan: Free)
Version 2.3.40
Region United States (us)
Web Interface http://127.0.0.1:4040
Forwarding http://4c1f-34-79-165-21.ngrok.io -> http://localhost:8080
Forwarding https://4c1f-34-79-165-21.ngrok.io -> http://localhost:8080
Go to http://4c1f-34-79-165-21.ngrok.io and see the result!
SSH to User-managed notebook
You have to use a user-managed notebook! A managed notebook doesn't allow you to use SSH (officially). If you want to connect via SSH to a managed notebook, read the next section.
First, connect using gcloud.
👉 Note: Google Cloud CLI.
👉 Note: SSH.

gcloud command + SSH port forwarding
👉 Official doc.
gcloud compute ssh --project <project-id> --zone <zone> <instance-name> -- -L 8081:localhost:8080
- You can find all the information for <thing> by clicking the notebook name in Workbench.
- 8081 is the port on your machine and 8080 is the port on Vertex.
- For <instance-name>, you can also use the instance id (which can be found in Compute Engine > VM instances).
- For the popup "Build Recommended" in JupyterLab, you can run jupyter lab build.
ssh (and also on VS Code)
You can follow the official instructions. For me, they're complicated. I use another way.
Make sure you've created an SSH key on your local machine, e.g. /Users/thi/.ssh/id_rsa.ideta.pub is mine.
# Show the public keys
cat /Users/thi/.ssh/id_rsa.ideta.pub
# Then copy it
On the Vertex notebook instance (you can use the gcloud method to access it or just open the notebook in the browser and then open a Terminal).
# Make sure you are "jupyter" user
whoami # returns "jupyter"
# If not
su - jupyter
# If it asks your password, check next section in this note.
# Create and open /home/jupyter/.ssh/authorized_keys
nano /home/jupyter/.ssh/authorized_keys
# Then paste the public key you copied in previous step here
# Ctrl + X > Y > Enter to save!
On your local machine,
ssh -i /Users/thi/.ssh/id_rsa.ideta jupyter@<ip_of_notebook>
The ip of the instance will change each time you reset the instance. Go to the Compute Engine section to check the up-to-date ip address.
You are good. On VS Code, do the same thing with the Remote - SSH extension.
- After running the above command, you enter the instance's container (with your username, e.g. when you run whoami, it will be thi) and you can also open http://localhost:8081 for the Jupyter notebook on your local machine. To stop, type exit and also Cmd + C.
- The user (and folder) on which the notebook is running is jupyter (you can check /home/jupyter/). You can use su - jupyter to change to this user.
For example, the default user after I connect via SSH is thi and the user for the Jupyter notebook is jupyter. However, you don't know their passwords. Just change them!
sudo passwd thi
# then enter the new password for this user
sudo passwd jupyter
Why zsh? The default bash has a problem with the backspace key when you connect via SSH.
👉 Note: Zsh.
sudo -i
sudo apt install zsh # install zsh
Make zsh the default for each user; below is for jupyter,
su - jupyter
chsh -s $(which zsh) # make zsh be default
exit # to log out
su - jupyter # log in again
# Then follow the instructions
Then, install oh-my-zsh,
sh -c "$(curl -fsSL https://raw.githubusercontent.com/robbyrussell/oh-my-zsh/master/tools/install.sh)"
Then install the spaceship theme (optional),
git clone https://github.com/denysdovhan/spaceship-prompt.git "$ZSH_CUSTOM/themes/spaceship-prompt"
ln -s "$ZSH_CUSTOM/themes/spaceship-prompt/spaceship.zsh-theme" "$ZSH_CUSTOM/themes/spaceship.zsh-theme"
If you change the GPU type?
You have to re-install the GPU driver on the virtual machine. Check this official instruction.
SSH to managed notebook
When creating a new notebook, make sure to enable the terminal for this notebook. Open the notebook and then open the terminal.
# On your local machine => check the public keys
cat ~/.ssh/id_rsa.pub
# On managed notebook, make sure you're at /home/jupyter
pwd
mkdir .ssh
touch .ssh/authorized_keys
vim .ssh/authorized_keys
# Paste the public key here
# Then save & exit (Press ESC then type :wq!)
# Check
cat .ssh/authorized_keys
# Check the external ip address of this notebook instance
curl -s http://whatismyip.akamai.com
Connect from local,
ssh -i ~/.ssh/id_rsa jupyter@<ip-returned-in-previous-step>
AIO steps
Remark: This section is mostly for me (all the steps here are already described in the previous sections).
Remember to shut down the notebook if you don't use it!!
# Update system
sudo apt update
# You have to connect with the line below to install zsh
# (the terminal in Workbench doesn't allow you to do that!)
gcloud compute ssh --project <project-id> --zone <zone> <name-of-instance> -- -L 8081:localhost:8080
# Change user's password
sudo passwd thi
sudo passwd jupyter
Go back to the Terminal on Vertex (to make sure you're jupyter),
# Install zsh
sudo apt install zsh
chsh -s $(which zsh)
# Install oh-my-zsh
sh -c "$(curl -fsSL https://raw.githubusercontent.com/robbyrussell/oh-my-zsh/master/tools/install.sh)"
# Install theme "spaceship"
git clone https://github.com/denysdovhan/spaceship-prompt.git "$ZSH_CUSTOM/themes/spaceship-prompt"
ln -s "$ZSH_CUSTOM/themes/spaceship-prompt/spaceship.zsh-theme" "$ZSH_CUSTOM/themes/spaceship.zsh-theme"
# Change theme to "spaceship"
nano ~/.zshrc # then change the theme to "spaceship"
# Change the plugins line too
plugins=(git docker docker-compose python emoji)
# Add aliases
alias gs='git status'
alias ud_zsh='source ~/.zshrc'
# Update changes
source ~/.zshrc
# Add local ssh keys to this instance (for accessing via ssh)
# (Below is for local machine)
cat ~/.ssh/id_rsa.ideta.pub
# Then copy the public keys
# On vertex
mkdir ~/.ssh
nano /home/jupyter/.ssh/authorized_keys
# Then paste the key copied above to this and save
# If needed, make the same thing for user "thi"
# Install Github CLI
curl -fsSL \
https://cli.github.com/packages/githubcli-archive-keyring.gpg \
| sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null
sudo apt update
sudo apt install gh
gh auth login
# Add conda path
nano ~/.zshrc
# Then add the following to the end of the file
export PATH="/opt/conda/bin:$PATH"
# After that Ctrl + X > Y > Enter to save
source ~/.zshrc
# Add more space to swap
# (to prevent the error: "[Errno 12] Cannot allocate memory")
sudo dd if=/dev/zero of=/swapfile bs=1024 count=1024k
sudo mkswap /swapfile
sudo swapon /swapfile
# Check
sudo swapon -s
Troubleshooting
[Errno 12] Cannot allocate memory
👉 Reference to this solution.
sudo swapon -s
If it is empty it means you don't have any swap enabled. To add a 1GB swap:
sudo dd if=/dev/zero of=/swapfile bs=1024 count=1024k
sudo mkswap /swapfile
sudo swapon /swapfile
Add the following line to /etc/fstab to make the swap permanent.
sudo nano /etc/fstab
/swapfile none swap sw 0 0
Error processing tar file(exit status 1): write /home/model-server/pytorch_model.bin: no space left on device
It's because the disk space is full. You can check by running df -h,
# Filesystem Size Used Avail Use% Mounted on
# /dev/sda1 99G 82G 13G 87% /
If you use the notebook to create Docker images, be careful: the space will be used implicitly (use docker info to check where the images are stored; normally they are in /var/lib/docker, which belongs to the boot disk). You can check the unused images with docker images and remove them with docker image rm img_id.
Wanna increase the disk space? Go to Compute Engine / Disks > choose the right disk and edit its size [ref].
With Hugging Face models (A-Z)
👉 Hugging Face models.
A-Z text classification with PyTorch
👉 Original repo & codes.
👉 The official blog about this task (the notebook is good, but you need to read this blog too; there are useful points and links).
👉 My already-executed notebook (with my comments).
- Take a model from Hugging Face + the IMDB dataset + train again with this dataset to get only 2 sentiments -- "Positive" and "Negative". They cut the head off the model from HF and put another one on top (fine-tuning).
- They do all the steps (preprocessing, training, predicting, post-processing) inside the notebook first and then use these functions in the container they create before pushing to Vertex AI's registry.
- They run a custom job on Vertex AI with a pre-built container 👉 Before doing that "online", they perform the same steps locally to make things work! (See the sketch after this list.)
- They also show the way of using a custom container for the training task:
  - Create a new Docker image locally with a Dockerfile and all necessary settings.
  - Push the image to Google's Container Registry.
  - Use aiplatform to init + run the job on Vertex AI.
- Hyperparameter tuning: use Vertex AI to train the model with different values of the hyperparameters and then choose the best ones for the final model.
- Deploying: they use TorchServe + create a custom container for the prediction step:
  - All the code is in the folder /predictor/.
  - The most important code is in the file custom_handler.py, which does the same things as the local prediction step (the beginning steps of the notebook).
  - A custom image is created via a Dockerfile.
  - Test the image with a container locally before pushing to Vertex AI.
- Create a model with this image and then deploy this model to an endpoint (this step can be done on the Vertex platform).
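As an illustration of the "run a custom job with a custom container" step above, here is a minimal sketch using the aiplatform SDK (not the repo's exact code; the image URI, bucket, region and args are placeholders):

# Minimal sketch: launch a custom container training job on Vertex AI.
# All names below (image URI, bucket, region, args) are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="<project-id>", location="europe-west1",
                staging_bucket="gs://<bucket-name>")

job = aiplatform.CustomContainerTrainingJob(
    display_name="finetune-text-classification",
    container_uri="eu.gcr.io/<project-id>/<training-image>:latest",
)

# Runs the container on a single machine; add accelerator_* args for GPUs.
job.run(
    replica_count=1,
    machine_type="n1-standard-8",
    args=["--epochs", "2", "--model-dir", "gs://<bucket-name>/model"],
)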
# Vertex AI SDK for Python
# https://googleapis.dev/python/aiplatform/latest/index.html
pip -q install --upgrade google-cloud-aiplatform
# Create a new bucket
gsutil mb -l <region> gs://<bucket-name>
# Check
gsutil ls -al gs://<bucket-name>
# To disable the warning:
# huggingface/tokenizers: The current process just got forked, after parallelism has
# already been used. Disabling parallelism to avoid deadlocks...
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
Load & train the model from Hugging Face's SDK,
tokenizer = AutoTokenizer.from_pretrained(
model_name_or_path,
use_fast=True,
)
# 'use_fast' ensures that we use fast tokenizers (backed by Rust)
# from the 🤗 Tokenizers library.
model = AutoModelForSequenceClassification.from_pretrained(
model_name_or_path, num_labels=len(label_list)
)
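The fine-tuning itself is done in the notebook; as a reference, here is a minimal sketch with the Trainer API (not the repo's exact code; train_dataset and eval_dataset are placeholders assumed to be already tokenized):

# Minimal fine-tuning sketch with the Trainer API.
# 'train_dataset' and 'eval_dataset' are assumed to be already tokenized.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./model_output",
    num_train_epochs=2,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,                 # the model loaded above
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,         # the tokenizer loaded above
)

trainer.train()
trainer.save_model("./model_output")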
# Upload from local to a Cloud Storage bucket
gsutil cp path/to/local/file gs://<bucket-name>/path/on/bucket
# Validate
gsutil ls -l gs://<bucket-name>/path/on/bucket
# Init the vertex ai sdk
aiplatform.init(project=PROJECT_ID, staging_bucket=BUCKET_NAME, location=REGION)
# Check the health of the container just created
!curl http://localhost:7080/ping
# The port and "/ping" are defined manually in the Dockerfile
# You have to wait ~1 minute after the container is successfully created to run this line
Make sure to add location=REGION where your project is located. You should make this region/location the same across the services (buckets, Workbench, Compute Engine, registry,...).
For other tasks (creating the Docker container, testing locally,...), read the notebook. A deployment sketch with the SDK is given below.
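For reference, a minimal sketch (not the repo's exact code) of uploading a custom serving container as a Vertex AI model and deploying it to an endpoint with the Python SDK; the image URI is a placeholder and the routes/port must match your own container:

# Minimal sketch: register a custom serving container and deploy it.
# The image URI is a placeholder; routes and port must match your container.
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_NAME)

model = aiplatform.Model.upload(
    display_name="pt-text-classification",
    serving_container_image_uri="eu.gcr.io/<project-id>/<serving-image>:latest",
    serving_container_predict_route=f"/predictions/{APP_NAME}",
    serving_container_health_route="/ping",
    serving_container_ports=[7080],
)

endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=1,
)
print(endpoint.resource_name)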
Just deploying?
In case you skip the training phase and just use the model given by Hugging Face community.
👉 Notebook for testing load/use models from Hugging Face.
👉 Notebook for creating an image and deploying to Vertex AI.
👉 Export Transformers Models
I use the idea given in this blog.
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

MAX_SEQ_LEN = 100
tf_model = TFAutoModelForSequenceClassification.from_pretrained("joeddav/xlm-roberta-large-xnli")
callable = tf.function(tf_model.call)
concrete_function = callable.get_concrete_function([tf.TensorSpec([None, MAX_SEQ_LEN], tf.int32, name="input_ids"), tf.TensorSpec([None, MAX_SEQ_LEN], tf.int32, name="attention_mask")])
tf_model.save(data_loc + '/xlm-roberta-large-xnli', signatures=concrete_function)
# Upload to bucket
! gsutil cp -r $modelPath $BUCKET_NAME
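To sanity-check the export before (or after) uploading, a quick sketch assuming the local path used above:

# Quick check that the SavedModel exposes the expected serving signature.
import tensorflow as tf

loaded = tf.saved_model.load(data_loc + '/xlm-roberta-large-xnli')
print(list(loaded.signatures.keys()))  # e.g. ['serving_default']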
To make some tests with curl, check this note. Below is a short snippet,
instance = b"Who are you voting for in 2020?"
b64_encoded = base64.b64encode(instance)
test_instance = {
"instances": [
{
"data": {
"b64": b64_encoded.decode('utf-8')
},
"labels": ["Europe", "public health", "politics"]
}
]
}
payload = json.dumps(test_instance)
r = requests.post(
f"http://localhost:7080/predictions/{APP_NAME}/",
headers={"Content-Type": "application/json", "charset": "utf-8"},
data=payload
)
r.json()
Using Transformers' pipeline with Vertex AI?
You can check a full example in this notebook. In this section, I note about using Transformers' pipeline with TorchServe and Vertex AI.
The principal idea focuses on the file custom_handler.py, which is used with TorchServe when creating a new container image for serving the model.
In this custom_handler.py file, we have to create the methods initialize(), preprocess(), and inference(), which extend the class BaseHandler. Most of the problems come from the format of the outputs of these methods.
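For orientation, the methods below live inside a handler class extending TorchServe's BaseHandler; a minimal skeleton (the class name is my own, not from the notebook) could look like this, with BaseHandler.handle() chaining preprocess → inference → postprocess:

# Skeleton only: the methods shown below are defined inside such a class.
from ts.torch_handler.base_handler import BaseHandler

class TransformersClassifierHandler(BaseHandler):
    def __init__(self):
        super().__init__()
        self.initialized = False

    # initialize(), preprocess(), inference() are defined as shown below;
    # BaseHandler.handle() calls preprocess -> inference -> postprocess.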
For using pipeline(), we can define initialize(), preprocess() and inference() like below,
def initialize(self, ctx):
""" Loads the model.pt file and initialized the model object.
Instantiates Tokenizer for preprocessor to use
Loads labels to name mapping file for post-processing inference response
"""
self.manifest = ctx.manifest
properties = ctx.system_properties
model_dir = properties.get("model_dir")
self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
# Read model serialize/pt file
serialized_file = self.manifest["model"]["serializedFile"]
model_pt_path = os.path.join(model_dir, serialized_file)
if not os.path.isfile(model_pt_path):
raise RuntimeError("Missing the model.pt or pytorch_model.bin file")
# Load model
self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
self.model.to(self.device)
self.model.eval()
# Ensure to use the same tokenizer used during training
self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
# pipeline()
# We should create this pipe here so that it is not created again and again
# for each request
self.pipe = pipeline(task='zero-shot-classification', model=self.model, tokenizer=self.tokenizer)
self.initialized = True
def preprocess(self, data):
""" Preprocessing input request by tokenizing
Extend with your own preprocessing steps as needed
"""
text = data[0].get("data")
if text is None:
text = data[0].get("body")
sentences = text.decode('utf-8')
# Tokenize the texts
tokenizer_args = ((sentences,))
inputs = self.tokenizer(*tokenizer_args,
padding='max_length',
max_length=128,
truncation=True,
return_tensors = "pt")
return inputs
def inference(self, inputs):
""" Predict the class of a text using a trained transformer model.
"""
decoded_text = self.tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)
prediction = self.pipe(decoded_text, candidate_labels=["negative", "neutral", "positive"])
return [prediction] # YES, A LIST HERE!!!!
Another way to define preprocess() and the corresponding inference() (thanks to this idea).
def preprocess(self, data):
""" Preprocessing input request by tokenizing
Extend with your own preprocessing steps as needed
"""
text = data[0].get("data")
if text is None:
text = data[0].get("body")
sentences = text.decode('utf-8')
processed_sentences = []
num_separated = [s.strip() for s in re.split(r"(\d+)", sentences)]
digit_processed = " ".join(num_separated)
processed_sentences.append(digit_processed)
return processed_sentences
def inference(self, inputs):
""" Predict the class of a text using a trained transformer model.
"""
prediction = self.pipe(inputs[0], candidate_labels=["negative", "neutral", "positive"])
if len(inputs) == 1:
prediction = [prediction]
return prediction # YES, IT'S ALREADY A LIST FROM preprocess()
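About the output format: BaseHandler also calls postprocess() on the result of inference(), and TorchServe expects a list with one entry per request in the batch. A minimal postprocess() (my own addition, not from the original notebook) just passes the list through:

def postprocess(self, inference_output):
    """ TorchServe expects a list with one entry per request in the batch.
    inference() above already returns such a list, so pass it through.
    """
    return inference_output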
Encode example text in base64 format
For an online prediction request, format the prediction input instances as JSON with base64 encoding as shown here:
[
{
"data": {
"b64": "<base64 encoded string>"
}
}
]
👉 Converting a text to base64 online.
import base64
# Without non-ascii characters
instance = b"This film is not so good as it is."
b64_encoded = base64.b64encode(instance)
print(b64_encoded)
# b'VGhpcyBmaWxtIGlzIG5vdCBzbyBnb29kIGFzIGl0IGlzLg=='
# With non-ascii characters (like Vietnamese, Russian,...)
instance = "Bạn sẽ bầu cho ai trong năm 2020?"
b64_encoded = base64.b64encode(bytes(instance, "utf-8"))
print(b64_encoded)
# b'QuG6oW4gc+G6vSBi4bqndSBjaG8gYWkgdHJvbmcgbsSDbSAyMDIwPw=='
b64_encoded.decode('utf-8')
# 'QuG6oW4gc+G6vSBi4bqndSBjaG8gYWkgdHJvbmcgbsSDbSAyMDIwPw=='
# To decode?
base64.b64decode(b64_encoded).decode("utf-8", "ignore")
# 'Bạn sẽ bầu cho ai trong năm 2020?'
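Since the same b64 wrapping is needed every time, a small helper (my own convenience function, not from the notebooks) can build an instance from raw text:

import base64

def make_instance(text, labels):
    """Wrap a raw string into the {data: {b64: ...}, labels: [...]} format
    expected by the custom TorchServe container."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
    return {"data": {"b64": encoded}, "labels": labels}

# Example
test_instance = {"instances": [make_instance("I love this film!", ["positive", "negative", "neutral"])]}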
Testing created endpoint
curl
The code below is run in a Jupyter notebook.
ENDPOINT_ID="<id-if-endpoint>"
PROJECT_ID="<project-id>"
test_instance = {
"instances": [
{
"data": {
"b64": b64_encoded.decode('utf-8')
},
"labels": ["positive", "negative", "neutral"]
}
]
}
payload = json.dumps(test_instance)
# '{"instances": [{"data": {"b64": "VGhpcyBmaWxtIGlzIG5vdCBzbyBnb29kIGFzIGl0IGlzIQ=="}, "labels": ["positive", "negative", "neutral"]}]}'
%%bash -s $PROJECT_ID $ENDPOINT_ID
PROJECT_ID=$1
ENDPOINT_ID=$2
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://europe-west1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/europe-west1/endpoints/${ENDPOINT_ID}:predict \
-d '{"instances": [{"data": {"b64": "VGhpcyBmaWxtIGlzIG5vdCBzbyBnb29kIGFzIGl0IGlzIQ=="}, "labels": ["positive", "negative", "neutral"]}]}'
{
"predictions": [
{
"scores": [
0.92624515295028687,
0.04236096516251564,
0.031393911689519882
],
"labels": [
"negative",
"positive",
"neutral"
],
"sequence": "This film is not so good as it is!"
}
],
"deployedModelId": "***",
"model": "projects/***/locations/europe-west1/models/***",
"modelDisplayName": "***"
}
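The same call can also be made from Python with the Vertex AI SDK instead of curl; a short sketch, reusing PROJECT_ID, ENDPOINT_ID and test_instance from above:

# Same prediction via the Python SDK instead of curl.
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location="europe-west1")
endpoint = aiplatform.Endpoint(ENDPOINT_ID)

response = endpoint.predict(instances=test_instance["instances"])
print(response.predictions)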
Node.js client
👉 My repo: dinhanhthi/google-api-playground
👉 Note: Google APIs
First, you have to create a Service Account (you can take the one you used to work with Vertex at the beginning; for me, it's "Compute Engine default service account").
Next, you have to create and download a JSON key for this Service Account.
// File .env
PRIVATE_KEY = "***"
CLIENT_EMAIL = "***"
// File predict.js
import { PredictionServiceClient, helpers } from "@google-cloud/aiplatform";
const credentials = {
  private_key: process.env.PRIVATE_KEY,
  client_email: process.env.CLIENT_EMAIL,
};
const projectId = "***";
const location = "europe-west1";
const endpointId = "***";
async function main(text = "I love you so much!") {
const clientOptions = {
credentials,
apiEndpoint: `${location}-aiplatform.googleapis.com`,
};
const predictionServiceClient = new PredictionServiceClient(clientOptions);
const endpoint = `projects/${projectId}/locations/${location}/endpoints/${endpointId}`;
const parameters = {
structValue: {
fields: {},
},
};
const buff = Buffer.from(text);
const base64encoded = buff.toString("base64");
const _instances = {
data: { b64: base64encoded },
};
const instance = {
structValue: {
fields: {
data: {
structValue: {
fields: { b64: { stringValue: _instances.data.b64 } },
},
},
},
},
};
const instances = [instance];
const request = { endpoint, instances, parameters };
const [response] = await predictionServiceClient.predict(request);
console.log("Predict custom trained model response");
console.log(`Deployed model id : ${response.deployedModelId}`);
const predictions = response.predictions;
console.log("Predictions :");
for (const prediction of predictions) {
const decodedPrediction = helpers.fromValue(prediction);
console.log(`- Prediction : ${JSON.stringify(decodedPrediction)}`);
}
}
process.on("unhandledRejection", (err) => {
console.error(err.message);
process.exitCode = 1;
});
main(...process.argv.slice(2));
Then run the test,
node -r dotenv/config vertex-ai/predict.js "text to be predicted"
The results,
Predict custom trained model response
Deployed model id : 3551950323297812480
Predictions :
- Prediction : {"scores":[0.9942014217376709,0.0030435377266258,0.002755066612735391],"sequence":"You aren't kind, i hate you.","labels":["negative","neutral","positive"]}
Below are some links which may be useful for you,
Some remarks for Hugging Face's things
Without the option return_tensors when encoding,
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(pt_model_dir)
inputs = tokenizer("I am happy")
tokenizer.decode(inputs["input_ids"], skip_special_tokens=True)
With the option return_tensors when encoding,
tokenizer = AutoTokenizer.from_pretrained(pt_model_dir)
inputs = tokenizer("I am happy", return_tensors="pt")
# "pt" for PyTorch, "tf" for TensorFlow
tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)
Choose the same locations
👉 Vertex locations (you can check all supported locations here).
Below are some places where you have to indicate the location where your service will run (remark: this list is not exhaustive, just what I've met in these notebooks),
# When working with notebooks
# (You can choose it visually on the Vertex Platform)
gcloud notebooks instances ... --location=us-central1-a ...
# When initializing the Vertex AI SDK
aiplatform.init(project=PROJECT_ID, staging_bucket=BUCKET_NAME, location=REGION)
When pushing the image to the Container Registry, check this link for the right locations. For example, gcr.io or us.gcr.io is for US, eu.gcr.io is for EU, asia.gcr.io is for Asia.
Container Registry to Artifact Registry
Step 1: Activate the Artifact Registry API.
Step 2: Go to Artifact Registry. If you see a warning like "You have gcr.io repositories in Container Registry. Create gcr.io repositories in Artifact Registry?", click CREATE GCR REPOSITORIES.
Step 3: Copy images from Container Registry to Artifact Registry. What you need are the URLs of the "from" (CR) and "to" (AR) repositories.
- Check the AR page: there is a small warning icon ⚠️; hover over it to see the "not complete" URL. Example: copy your images from eu.gcr.io/ideta-ml-thi to europe-docker.pkg.dev/ideta-ml-thi/eu.gcr.io.
- Check the CR page: click the copy button and the full URL of the image is copied to the clipboard, e.g. gcr.io/ideta-ml-thi/pt-xlm-roberta-large-xnli_3.
- Finally, combine them with the tag (use :latest if you don't have others already).
- Example: from gcr.io/ideta-ml-thi/pt-xlm-roberta-large-xnli_3:latest to us-docker.pkg.dev/ideta-ml-thi/gcr.io/pt-xlm-roberta-large-xnli_3:latest.
👉 Transitioning to repositories with gcr.io domain support (also in this link: copying from Container Registry to Artifact Registry).
gcrane (this tool is recommended by Google)
Next, install the gcrane tool (it uses Go). In case you just want to use it directly, you can download it, then add it to the $PATH in your .bashrc or .zshrc. On macOS, don't forget to go to System Preferences > Security & Privacy > Run it anyway.
Finally, read this official guide.
gcloud
Good practice: Use Cloud Shell instead.
# Change to the current project with gcloud
gcloud config set project <project-id>
π Follow this official guide.
gcloud container images add-tag gcr.io/ideta-ml-thi/name-of-image:latest us-docker.pkg.dev/ideta-ml-thi/gcr.io/name-of-image:latest
Remark: It takes a long time to run. Don't close the terminal window!! That's why we should (or shouldn't?) try Cloud Shell instead.
Step 4: Route to AR (after step 3, the images in AR have the same route as in CR, but the traffic still only recognizes the CR route; we need this step to make all traffic use AR's instead). You need these permissions to perform the action (click the button ROUTE TO ARTIFACT).
Problems?
I met this problem when my model was around 2.5GB, but it was OK for a model around 500MB.
Solution: When creating a new endpoint, set "Maximum number of compute nodes" to a number (don't leave it empty) and also choose a more powerful "Machine type".
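Those two UI fields map to deploy() parameters in the Python SDK; a hedged sketch (reusing a model object uploaded as in the earlier SDK example; the values are illustrative, not a recommendation):

# The UI fields "Machine type" and "Maximum number of compute nodes"
# correspond to machine_type and max_replica_count here.
endpoint = model.deploy(
    machine_type="n1-standard-8",   # a more powerful machine for a ~2.5GB model
    min_replica_count=1,
    max_replica_count=2,            # don't leave the maximum unset
)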