AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 [Clearml Serving] Hi Everyone! I Am Trying To Automatically Generate An Online Endpoint For Inference When Manually Adding Tag

Hi @<1636175432829112320:profile|PlainSealion45>

I used this initial model to create the endpoint with

model add

command.

I think that the initial model needs to be added with model auto-aupdate Not with model add
basically do not call model add - this is static, always using the model ID specified (you can deploy new models with manually callign model add on the same endpoint and specifying diffrent model ID , but again manual)

To Automatically have the m...

10 months ago

0 [Clearml Serving] Hi Everyone! I Am Trying To Automatically Generate An Online Endpoint For Inference When Manually Adding Tag

Thanks for the logs @<1627478122452488192:profile|AdorableDeer85>
Notice that the log you attached means the preprocessing is executed and the GPU backend is returning an error.
Could you provide the log of the docker compose specifically the intersting part is the Triton container, I want to verify it loads the model properly

10 months ago

0 Hey, Trying To Use Trains-Agent To Run An Experiment On My Computer. When Trying To Execute A Job From The Queue On My Agent Im Getting An Error That Numpy Is Not Installed. How Do I Have The Trains-Agent Install My

Hi CloudyHamster42

how do i have the trains-agent install my

requirements.txt

file from my repo when creating the environment?

BTW if you clear all "the installed packages", then trains-agent will user requirements.txt and update back all the packages in the UI

3 years ago

0 Given I Want To Run A Task In A Pipeline Using A Base Task Id. One Of My Steps Just Finds The Latest Model To Use. I Want The Task To Output The Id, And The Next Step To Use It. How Would I Go About Doing This?

VexedCat68 yes 🙂 you can also pass the parent folder and it will zip the entire subfolders into a single artifact

2 years ago

0 [Clearml Serving] Hi Everyone! I Am Trying To Automatically Generate An Online Endpoint For Inference When Manually Adding Tag

. I am not sure this is related to the fact the model is not correctly converted to TorchScript

Because Triton Only supports TorchScript (Not torch models) 🙂

10 months ago

0 Hi All, I Have Deployed A Clearml Server With Docker To One Of Our Local Machine. I Had Set Up The Filesserver Folder As Mount Point To The Cloud. How Easy Is It To Migrate Our Existing Experiments Later On To A Clearml Server That We Deploy In The Cloud

Correct

one year ago

0 Hi, I'M Trying To Clone And Queue Experiments For Running Them On My Workers. I Am Able To Successfully Clone And Queue The Task, But Seems Like The Task Does Not Pass The Correct Parameters To My Python Script On The Worker. We Use Hydra For Configuring

JumpyPig73 Do you see all the configurations under the Args section in the "Configuration" Tab ?
(Maybe I'm wrong and the latest RC does Not include the python-fire support)

2 years ago

0 Any Info On The Lifecycle Of Datasets Downloaded To $Home/.Clearml/Cache/Storage_Manager/Datasets Via Get_Local_Copy I Have A Task Running And I Was Watching The Above Path And Datasets Were Being Downloaded And Then They Are All Removed And For A Partic

Hmm, Notice that it does store sym links to parent data versions (to save on multiple copies of the same file). If you call get_mutable_local_copy() you will get a standalone copy

3 years ago

Could that be it ?

3 years ago

0 Thank You For Your Help So Far. I Have A Question About Trains Authentication And Privacy When Deploying On K8S. I Want Integrate Building A Trains-Server Into Our Iac. Now That I Got A Server To Work With An Agent Deployment Im Thinking About Authorizati

ColossalAnt7 I would do the following:
Configure trains-server user/pass, mounting the API server configuration file as pointed in the trains-server documentation (intermediate temporary step) Start by providing the ML guys with a VPN access that allows them to access directly the trains-server api/web/file pos (caveat is the IP/sub-domain needs to be solved) Configure a ConfigMap to do the routing/ingest (this solves the IP/Sub-Domain issue) and allow the VPN to access the single entrypoint...

3 years ago

0 Hi, Is There A Way To Pull Clearml Datasets To A Mounted Pv Instead Of The Pod'S Local Directory.

By the way, will downloading still happen if the datasets is available in the cache folder?

If it is cached, then there is no need to re-download 🙂

one year ago

0 [Clearml Serving] Hi Everyone! I Am Trying To Automatically Generate An Online Endpoint For Inference When Manually Adding Tag

I notice that, in my Serving Service situated in the DevOps project, the "endpoints" section doesn't seem to get updated when I tag a new model with "released".

It takes it a few minutes (I think 5 min is the default) to update.
Notice that you need to add the model with

model auto-update --engine triton --endpoint "test_model_pytorch_auto" ...

Not with model add (if for some reason that does not work please let me know)
No need to pass the model version i.e. 1 you can ...

10 months ago

0 [Clearml Serving] Hi Everyone! I Am Trying To Automatically Generate An Online Endpoint For Inference When Manually Adding Tag

Hi @<1636175432829112320:profile|PlainSealion45>

I am trying to automatically generate an online endpoint for inference when manually adding tag

released

to a model.

So the "automatic" here means that the model endpoint will be updated with the latest model, but not that a new endpoint will be created.
Does that make sense ?
To add a new endpoint on Tagging a model, you should combine it with ModelTrigger and have a fucntion that calls the clearml-serving to cr...

10 months ago

0 [Clearml Serving] Hi Everyone! I Am Trying To Automatically Generate An Online Endpoint For Inference When Manually Adding Tag

Yes! That's exactly what I meant, as you can see the Triton backend was not able to load your model. I'm assuming because it was Not converted to torch script, like we do in the original example
https://github.com/allegroai/clearml-serving/blob/6c4bece6638a7341388507a77d6993f447e8c088/examples/pytorch/train_pytorch_mnist.py#L136

10 months ago

0 Different Question. How Can I Pass Pythonpath Env Variable To A Task, Run By Agent (So Python Can Find Classes Inside M Subdirectories)?

Happy to hear 🙂

2 years ago

0 Hi Everyone, Does Anybody Now If The Latest Release 1.15 Is Still Vulnerable To

Hi @<1658281099807166464:profile|SmallCamel52>

Lack of authentication in all versions of the fileserver component

Are you leaving the fileserver open to the world ?

6 months ago

0 Question About The File Server. Currently, We Have A Machine With Minio Installed, And All File Communication Is Made Using The Minio Sdk Client. [Minio Is Just Like An S3 Bucket, Fully Compliant With S3 Protocol]. In The Examples I'Ve Seen The

basically the default_output_uri will cause all models to be uploaded to this server (with specific subfolder per project/task)
You can have the same value there as the files_server.
The files_server is where you have all your artifacts / debug samples

3 years ago

0 If I Am Using The Demo Servers, Do I Need To Do Something Special To Use

Hi HealthyStarfish45

is there an advantage in using tensorboard over your reporting?

Not unless your code already uses TB or has some built in TB loggers.

html reporting looks powerfull, can one inject some javascript inside?

As long as the JS is self contained in the html script, anything goes :)

3 years ago

0 If I Am Using The Demo Servers, Do I Need To Do Something Special To Use

HealthyStarfish45
No, it should work 🙂

3 years ago

0 What Is The Suggested Way Of Running Trains-Agent With Slurm? I Was Able To Do A Very Naive Setup: Trains-Agent Runs A Slurm Job. It Has The Disadvantage That This Slurm Job Is Blocking A Gpu Even If The Worker Is Not Running Any Task. Is There An Easy Wa

Okay, what you can do is the following:
assuming you want to launch task id aabb12
The actual slurm command will be:
trains-agent execute --full-monitoring --id aabb12
You can test it on your local machine as well.
Make sure the trains.conf is available in the slurm job
(use trains-agent --config-file to point to a globally shared one)
What do you think?

3 years ago

do you have docker installed on all slurm agent/worker machines

Docker support?

3 years ago

0 Hi, When Running A Training Script From Pycharm, It Seems That Clearml Logs Only Those Packages That Are Explicitly Imported By My .Py Files; It Seems To Not Take The Pacakges That Are In The Requirements.Txt My Training Uses Keras

Hi RoughTiger69

seems to not take the pacakges that are in the requirements.txt

The reason for not taking the entire python packages, it will most likely break when trying to run inside the agent.
The directly imported packages aill essentially pull their required packages, and thus create a stable env on the remote machine. The agent then will store the Entire env, as it assumes it will be able to fully replicate it the next time it runs.
If the "Installed Packages" section is empty...

3 years ago

Okay this more complicated but possible.
The idea is to write a glue layer (service) that pulls from the (i.e UI) queue
sets the slurm job
and puts it in a pending queue (so you know the job s waiting in the slurm scheduler)
There is a template here:
https://github.com/allegroai/trains-agent/blob/master/trains_agent/glue/k8s.py
I would love to help and setup a slurm glue in a similar manner
what do you think?

3 years ago

HealthyStarfish45 could you take a look at the code, see if it makes sense to you?
What I'm getting to, is maybe we build a template, then you could fill in the gaps ?

3 years ago

but I need to dig digger into the architecture to understand what we need exactly from k8s glue.

Once you do, feel free to share, basically there are two options , use the k8s scheduler with dynamic pods, or spin the trains-agent as a service pod, and let it spin the jobs

3 years ago

I hope you can do this without containers.

I think you should be fine, the only caveat is CUDA drivers, nothing we can do about that ...

3 years ago

HealthyStarfish45 if I understand correctly the trains-agent is running as daemon (i.e. automatically pulling jobs and executes them), the only point might be cancelling a daemon will cause the Task executed by that daemon to be canceled as well.
Other than that, sounds great!

3 years ago

NICE!

3 years ago

HealthyStarfish45

Is there a way to say to a worker that it should not take new tasks? If there is such a feature then one could avoid the race condition

Still undocumented, but yes, you can tag it as disabled.
Let me check exactly how.

3 years ago

0 In Ui Under Execution Tab, I See That The Trains Has

PompousParrot44 What is the "working directory" on the experiment itself? and the "script path"?
Based on what you wrote above, in order for it work you should have:
working directory: "."
script path: "-m test.scripts.script"
notice no "--args" and working directory is "." (i.e. the root of the repository)

4 years ago

Show more results