Yes, let's assume we have a task with id aabbcc
On two different machines you can do the following:
trains-agent execute --docker --id aabbcc
This means you manually spin up two simultaneous copies of the same experiment. Once they are up and running, will your code be able to make the connection between them? (i.e. OpenMPI, torch.distributed, etc.?)
One last question: Is it possible to set the pip_version task-dependent?
no... but why would it matter on a Task basis ? (meaning what would be a use case to change the pip version per Task)
Hi SmallDeer34
Hmm I'm not sure you can, the code will by default use rglob
with the last part of the path as wildcard selection
😞
You can of course manually create a zip file...
How would you change the interface to support it ?
Hmm I suspect the 'set_initial_iteration' does not change/store the state on the Task, so when it is launched, the value is not overwritten. Could you maybe open a GitHub issue on it?
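For reference, a minimal sketch of the usage I'd expect to work (project/task names and the offset value here are just placeholders):
```python
from clearml import Task

# continue the previous run and offset all subsequently reported iterations
task = Task.init(
    project_name="examples",          # placeholder
    task_name="continue run",         # placeholder
    continue_last_task=True,
)
task.set_initial_iteration(1000)      # example offset
```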
Assuming the git repo looks something like:
.git
readme.txt
module
 |
 +---- script.py
The working directory should be "."
The script path should be: "-m module.script"
And under the Configuration/Args, you should have:
args1 = value
args2 = another_value
Make sense?
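Just to illustrate, a small sketch of what module/script.py could look like so the args show up under Configuration/Args (this assumes the script itself calls Task.init and parses the arguments with argparse, which ClearML auto-logs):
```python
# module/script.py -- a minimal sketch, names are placeholders
from argparse import ArgumentParser
from clearml import Task

task = Task.init(project_name="examples", task_name="run module with -m")

parser = ArgumentParser()
parser.add_argument("--args1", default="value")
parser.add_argument("--args2", default="another_value")
args = parser.parse_args()  # argparse arguments are auto-logged under Configuration/Args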
Oh if this is the case you can probably do
` import os
import subprocess
from time import sleep

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_ids = client.queues.get_all(name="queue_name_here")

while True:
    # poll the queue for the next pending task
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    # mark the task as started on the server
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = ta...
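The snippet above is cut off; purely as an assumption about where it was going, a continuation might hand the dequeued task to an agent process, along these lines:
```python
# hedged sketch of a possible continuation -- not the original ending of the snippet above
env['CLEARML_TASK_ID'] = task_id
# launch the dequeued task in a subprocess via the agent's execute command
subprocess.check_call(['clearml-agent', 'execute', '--id', task_id], env=env)
```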
Thanks for the details TroubledJellyfish71 !
So the agent should have automatically resolved this line:
torch == 1.11.0+cu113
into the correct torch version (based on the cuda version installed, or cpu version if no cuda is installed)
Can you send the Task log (console) as executed by the agent (and failed)?
(you can DM it to me, so it's not public)
or point to the self-signed certificate:
export REQUESTS_CA_BUNDLE=/path/to/your/certificate.pem
Ad1. yes, I think this is kind of a bug. Using _task to get pipeline input values is a little bit ugly
Good point, let's fix it 🙂
A new pipeline is built from scratch (all steps etc.), but clicking "NEW RUN" in the GUI just reuses the existing pipeline. Is that correct?
Oh I think I understand what happens: the way the pipeline logic is built, the "DAG" is created the first time the code runs; then when you re-run the pipeline it deserializes the DAG from the Task/backend.
Th...
GiganticTurtle0 this one worked for me 🙂
` from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue1")
def step_1(msg: str):
    msg += "\nI've survived step 1!"
    return msg

@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue2")
def step_2(msg: str):
    msg += "\nI've also survived step 2!"
    return msg

@PipelineDecorator.component(return_values=["m...
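The snippet above is cut off; for completeness, a decorator-based pipeline like this is typically wrapped and launched roughly as follows (pipeline name, project and version are placeholders, and this assumes step_1/step_2 are the components defined above):
```python
# minimal sketch of launching the decorated steps above
@PipelineDecorator.pipeline(name="pipeline demo", project="examples", version="0.1")
def pipeline_logic():
    msg = step_1("Hello")
    msg = step_2(msg)
    print(msg)

if __name__ == "__main__":
    # run the whole pipeline locally (as subprocesses) instead of enqueuing it
    PipelineDecorator.run_locally()
    pipeline_logic()
```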
This doesn't seem to be running inside a container...
What's the clearml-agent launch command you are using ? (i.e. do you have --docker flag)
Maybe we should rename it?! it actually creates a Task but will not auto connect it...
Hmm is this similar to this one https://allegroai-trains.slack.com/archives/CTK20V944/p1597845996171600?thread_ts=1597845996.171600&cid=CTK20V944
StaleButterfly40 just making sure I understand, are we trying to solve the "import offline zip file/folder" issue, where we create multiple Tasks (i.e. Task per import)? Or are you suggesting the Actual task (the one running in offline mode) needs support for continue-previous execution ?
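For context, a minimal sketch of the current offline flow (paths and names are placeholders), where each import creates a new Task:
```python
from clearml import Task

# run with no server connection
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")  # placeholders
# ... training code ...
task.close()

# later, on a machine with server access, import the offline session zip
Task.import_offline_session(session_folder_zip="/path/to/task_offline_session.zip")
```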
you need to set
CLEARML_DEFAULT_BASE_SERVE_URL:
So it knows how to access itself
Since pytorch is a special example (the agent will pick the correct pytorch based on the installed CUDA), the agent will first make sure the file is downloaded, and then pass the resolving to pip to decide whether it is necessary to install. (Bottom line, we downloaded the torch for no reason, but it is cached, so no real harm done.) It might be that the second package needs a specific numpy version... this resolving is done by pip, not the agent specifically. Anyhow --system-site-packages is applicable o...
When we enqueue the task using the web-ui we have the above error
ShallowGoldfish8 I think I understand the issue,
basically I think the issue is:
task.connect(model_params, 'model_params')
Since this is a nested dict:
model_params = {
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "class_weights": {0: 1, 1: 60},
    "learning_rate": 0.1
}
The class_weights keys are stored as Strings, but catboost expects int keys, hence it fails.
One op...
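The message is truncated above; one possible workaround (just a sketch, not necessarily the option being referred to) is to cast the class_weights keys back to int after connecting:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="catboost params")  # placeholders

model_params = {
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "class_weights": {0: 1, 1: 60},
    "learning_rate": 0.1,
}
task.connect(model_params, "model_params")

# nested dict keys come back as strings, so cast them back to int before passing to catboost
model_params["class_weights"] = {int(k): v for k, v in model_params["class_weights"].items()}
```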
How can I reproduce it?
I still don't get resource logging when I run in an agent.
@<1533620191232004096:profile|NuttyLobster9> there should be no difference ... are we still talking about <30 sec? or a sleep test? (no resource logging at all?)
have a separate task that is logging metrics with tensorboard. When running locally, I see the metrics appear in the "scalars" tab in ClearML, but when running in an agent, nothing. Any suggestions on where to look?
This is odd and somewhat consistent with actu...
I'm assuming these are the only packages that are imported directly (i.e. pandas requires other packages, but the code imports pandas, so this is what's listed).
The way ClearML detects packages, it first tries to understand if this is a "standalone" script; if it is, then only imports in the main script are logged. If it "thinks" this is not a standalone script, it will analyze the entire repository.
make sense ?
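If the automatic analysis ever misses something, you can also add a requirement explicitly (a small sketch; the package name here is just an example):
```python
from clearml import Task

# must be called before Task.init(); explicitly add a package the analysis might miss
Task.add_requirements("pandas")
task = Task.init(project_name="examples", task_name="manual requirements")  # placeholders
```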
Yes, offline got broken in 1.3.0 😞 , RC fixed it:
pip install clearml==1.3.1rc0
Stable release later this week
This looks like 'feast' error, could it be a configuration missing?
Ohhh I see, yes this is regexp matching, if you want the exact match:
'^{}$'.format(name)
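For example (just a sketch, assuming the name is being matched via something like Task.get_tasks, where task_name is treated as a regular expression):
```python
from clearml import Task

name = "my exact experiment name"  # example
# anchor the pattern so only exact name matches are returned
tasks = Task.get_tasks(project_name="examples", task_name='^{}$'.format(name))
```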
Hi CheerfulGorilla72
see
Notice all posts on that channel are @ channel 🙂
Hi JitteryCoyote63
Is this close ?
https://github.com/allegroai/clearml/issues/283