ItchyJellyfish73
Unfortunately this needs backend support, and is only available in the enterprise version. What is your use case for it? (It was designed to allow out-of-the-box bare-metal multi-GPU dynamic allocation; think a DGX with 8 GPUs where, instead of spinning down agents when you want to change the queue->num-gpu mapping, you can do it on the fly)
server-->agent is fast, but agent-->server is slow.
Then multiple connections will not help; the bottleneck is the upload speed of your machine, regardless of what the target is (file-server, S3, etc...)
Hi @<1576381444509405184:profile|ManiacalLizard2>
If you make sure all server access is via a host name (i.e. instead of IP:port, use host_address:port), you should be able to replace it with a cloud host on the same port
ReassuredTiger98 yes this is exactly it 🙂
agent.package_manager.type will select whether the agent should use conda or pip to do the installation. Basically, if you develop on conda you should select conda.
The agent will first try to install packages using conda, then it will collect the missing packages and install them into the same environment using pip.
the hack doesn't work if conda is not installed
Of course conda needs to be installed, it is using a pre-existing conda env, no?! what am I missing
Ideally it would just pull an experiment from a dedicated HPO queue and run it inplace
And the assumption is the code is also there ?
However, SNPE performs quantization with precompiled CLI binary instead of python library (which also needs to be installed). What would be the pipeline in this case?
I would imagine a container with preinstalled SNPE compiler / quantizer, and a python script triggering the process ?
one more question: in case of triggering the quantization process, will it be considered as separate task?
I think this makes sense, since you probably want a container with the SNPE environment, m...
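A rough sketch of how such a step could look, assuming it runs inside a container that already ships the SNPE toolchain; the binary name, flags and file paths below are placeholders, not the real SNPE CLI:
import subprocess
from clearml import Task

# this step is assumed to run in a container that already has the SNPE binaries on PATH
task = Task.init(project_name="examples", task_name="snpe quantization step")

# placeholder command - swap in the actual SNPE quantizer binary and its flags
cmd = ["snpe-quantizer-placeholder", "--input", "model.dlc", "--output", "model_quantized.dlc"]
subprocess.run(cmd, check=True)

# register the produced file on the task, so it is tracked like any other artifact
task.upload_artifact(name="quantized_model", artifact_object="model_quantized.dlc")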
This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas
What's the value of running outside of an experiment management context? Don't you want to log it?
There is no real penalty here, no?!
ResponsiveHedgehong88 so I would suggest using execute_remotely in your code: basically you start locally, make sure everything is passed as intended, then from within the code you call task.execute_remotely(...)
which will stop the current process and enqueue the Task on the selected queue for the agent to execute.
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/advanced/execute_remotely_example.py#L127
This way you can both easily test...
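For reference, a minimal sketch of that flow (project, task and queue names here are just examples):
from clearml import Task

task = Task.init(project_name="examples", task_name="remote execution test")

# everything up to this point runs locally, so you can verify all arguments are passed as intended
task.execute_remotely(queue_name="default", exit_process=True)

# this part only runs once an agent pulls the task from the "default" queue
print("running on the agent")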
WickedGoat98 Nice!!!
BTW: The fix should solve both (i.e. no need to manually cast), I'll make sure the fix is on GitHub so you'll be able to verify 🙂
Is task.parent something that could help?
Exactly 🙂 something like:
# my step is running here
the_pipeline_task = Task.get_task(task_id=task.parent)
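A slightly fuller sketch of the same idea, assuming the step was launched by a pipeline controller (the calls below just read information from the parent/pipeline task):
from clearml import Task

# my step is running here
task = Task.current_task()

# the controlling pipeline task is the step's parent
the_pipeline_task = Task.get_task(task_id=task.parent)

# for example, read pipeline-level information
print(the_pipeline_task.name)
print(the_pipeline_task.get_parameters())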
OddAlligator72 I like this idea.
The single thing I'm not sure about is the "function entry point"
Why would one do that? Meaning why wouldn't you have a proper python entry-point.
The reason I'm reluctant is that you might have calls/functions/variables in the global scope of the file storing the function, and then users will not know why something broke, and it will be very cumbersome to debug.
A simple script entry point seems trivial to launch and debug locally.
What do you think ? What woul...
Hi OutrageousGrasshopper93
which framework are you using? trains-agent will pull the correct torch based on the cuda version it detects, but there is no such thing for TF.
In the default venv mode, trains-agent creates a new venv for the experiment (not conda), and everything is installed there. If you need conda you need to change the package_manager to conda: https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c6736d12614de9870eff48bc/docs/trains.conf#L49
The safest way to control CUDA dri...
Yes, the same will work with artifacts, just pass the full url as the artifact_object
it should just register it as is.
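Something along these lines (the URL is a placeholder):
from clearml import Task

task = Task.init(project_name="examples", task_name="register external artifact")

# passing a full URL registers the artifact as-is, no re-upload of the file itself
task.upload_artifact(name="external_data", artifact_object="s3://my-bucket/path/to/data.csv")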
what do you say to me manually killing the services agent and launching one myself?
Makes sense 🙂
Hi SillyPuppy19
I think I lost you half way through.
I have a single script that launches training jobs for various models.
Is this like the automation example on the Github, i.e. cloning/enqueue experiments?
flag which is the model name, and dynamically loading the module to train it.
a Model has a UUID in the system as well, so you can use that instead of name (which is not unique), would that solve the problem?
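For example, something like this, using the model ID (UUID) from the UI instead of the non-unique name (the ID string is a placeholder):
from clearml import InputModel

# the model ID (UUID) is shown on the model's page in the UI
model = InputModel(model_id="aabbccddeeff00112233445566778899")

# fetch a local copy of the weights file
local_weights = model.get_local_copy()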
This didn't mesh well with Trains, because the project a...
Ohh then you do docker sibling:
Basically you map the docker socket into the agent's docker, which lets the agent launch another docker on the host machine.
You can see an example here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L144
but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
How so?
Hi @<1545216070686609408:profile|EnthusiasticCow4>
is there a way to get the date from the InputModel?
You should be able to with model._get_model_data()
But I think we should have it all exposed, wdyt?
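A rough sketch of what that could look like; note that _get_model_data() is an internal call, and the exact fields on the returned object (e.g. a creation timestamp) are an assumption here:
from clearml import InputModel

model = InputModel(model_id="aabbccddeeff00112233445566778899")  # placeholder ID

# internal call - returns the backend model object
model_data = model._get_model_data()

# assumption: the backend object exposes a creation timestamp field
print(getattr(model_data, "created", None))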
When I look at the model artifact details in the ClearML UI, it's been saved the usual way, and none of the tags I added in the OutputModel constructor are there.
Did you disable the autologging ? Are you saying the tags not appearing is a bug (it might be) ?
Also, I don't mind auto logging either if I have control over publishing the model or not directly from that script, and adding tags etc, like OutputModel.
Sure, you can publish models / add tags etc, either from the UI or pr...
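For example, a minimal sketch (names, tags and the weights file path are illustrative):
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model logging")

# tags can be set directly in the OutputModel constructor
output_model = OutputModel(task=task, name="my_model", tags=["reviewed", "candidate"])

# register the weights file, then decide explicitly when to publish
output_model.update_weights(weights_filename="model.pt")
output_model.publish()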
Hi @<1526371965655322624:profile|NuttyCamel41>
I think that the only way to actually get a huge number of API calls is with a lot of machines.
For example, regardless of the amount of console logs you print, it will only be a single call, as these are packaged every 2-10 seconds. The same with metric reporting etc.
On the free tier you can already test the amount of API calls, I think the mechanism is exactly the same
fyi: I would put this question in the channel
ldconfig from /etc/profile which is put there by the interactive_session_task
LackadaisicalOtter14 are you sure ? maybe this is done as part of the installation the interactive session runs ?
Could that be the issue?
apt-get update && apt-get install -y openssh-server
Hi LackadaisicalOtter14
However, whenever we spin up a session, ... always gets run and overwrites our configs
what do you mean by that?
What configs are being overwritten? (generally speaking, it just adds the OS environment it needs for the setup process)
That would be great! Might have to use 2>/dev/null in some of my bash scripts
Feel free to test and PR :)
One other question regarding connecting. We have setup sshd inside the docker image we are using.
Actually the remote session opens port 10022 on the host machine (so it does not collide with the default ssh port)
It actually runs an additional sshd inside the docker, setting its port.
And the clearml-session will ssh directly into the container sshd...
Hi LackadaisicalOtter14
Is it possible to remove this line to stop it from being executed
Everything is possible 🙂 I think the main question is why it is there (which, to the best of my understanding, is to solve for any cuda drivers and installed packages, meaning anything that is installed at runtime)
I think we can suppress the error, wdyt?
'echo "ldconfig" 2>/dev/null >> /etc/profile && '
logger.report_scalar("loss", "train", iteration=0, value=100)
logger.report_scalar("loss", "test", iteration=0, value=200)
Hi JitteryCoyote63
Is this close ?
https://github.com/allegroai/clearml/issues/283
This only talks about bug reporting and enhancement suggestions
I'll make sure this is fixed 🙂