okay so the error should have been:
trains_agent: ERROR: Connection Error: it seems api_server is misconfigured. Is this the TRAINS API server http://<IP>:8008 ?
Not https nor 8010 ?!
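For reference, a minimal sketch of the api section in trains.conf that matches this setup (the <IP> and ports here are placeholders; the point is that the API server listens on 8008 over plain http, not https/8010):

```
api {
    api_server: http://<IP>:8008
    web_server: http://<IP>:8080
    files_server: http://<IP>:8081
}
```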
is there a way to increase the size of the text input for fields or a better way to handle lists?
No 😞
Maybe an easier way is to use connect_configuration instead? It will take an entire dict and store it as text (the format is HOCON, which is YAML/JSON compatible, which means it is hard to break when editing).
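A minimal sketch of what that could look like (the project/task names and dict values are just placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")

# store the whole dict as editable HOCON text on the Task;
# when running via an agent, the returned dict reflects any UI edits
params = {"batch_size": 32, "lr": 0.001, "layers": [64, 64]}
params = task.connect_configuration(params, name="my config")
```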
But the same configuration does not work on the machine with the trains-agent?
Hi OddShrimp85
right place to ask about clearml serving.
It is 🙂
I did not manage to get clearml-serving to work with my own clearml server and triton setup.
Yes it should have been updated already, apologies.
Until we manage to sync the docs, what seems to be your issue, maybe we can help here?
In the documentation it warns about
.close()
"Only call Task.close if you are certain the Task is not needed."
Maybe this is not clear enough; "not needed" means you no longer need to automatically Add/Log/Track things into the Task from the current process.
This does Not mean you cannot access the Task or its artifacts
Mark closed means to externally (i.e. not from the process that created the Task, maybe even from a different machine) close and mark the task as completed (this...
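For example, a hedged sketch of doing exactly that from the outside (the task ID is a placeholder):

```python
from clearml import Task

# from a different process (or machine): fetch the Task by ID
task = Task.get_task(task_id="<task-id>")

# externally mark it as completed
task.mark_completed()
```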
Hi @<1541954607595393024:profile|BattyCrocodile47>
Has anyone used ClearML for this use case?
you mean as experiment management / model registry / data? I think this is the bread & butter of clearml 🙂
Regarding the other options in the list, I think most of them are alternatives to metaflow, not covering the parts you mentioned, no?
Hi @<1653207659978952704:profile|LovelyStork78>
I have a docker container with all the dependencies.
Well, I think the main question is: are you using the clearml-agent to launch jobs/experiments? If you do, it makes sense to specify your docker as the "base docker image" (in the UI, look under the Execution tab, Container section).
This means the agent will use the pre-installed environment and will add anything that your Task needs on top of it, this of course includes pushing your codebase i...
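If it helps, a minimal sketch of setting the same thing from code instead of the UI (the image name is a placeholder, assuming a recent clearml version):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker demo")

# equivalent to filling in the Container field under the Execution tab
task.set_base_docker(docker_image="my_repo/my_image:latest")
```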
Actually you cannot breakpoint at "atexit" calls (or at least it doesn't work with my gdb)
But I would add a few prints here:
https://github.com/allegroai/clearml/blob/aa4e5ea7454e8f15b99bb2c77c4599fac2373c9d/clearml/task.py#L3166
do you have your Task.init call inside the "train.py" script ? (and if you do, what are you getting in the Execution tab of the task) ?
would I have to execute each task in the pipeline locally (but still connected to trains),
Somehow you have to have the pipeline step Task in the system: you can import it from code, or you can run it once; then the pipeline will clone it and reuse it. Am I missing something?
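As a sketch, referencing an existing Task as a pipeline step could look like this (project/task names are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")

# point the step at a Task that already exists in the system;
# the pipeline will clone it and enqueue the clone
pipe.add_step(
    name="step1",
    base_task_project="examples",
    base_task_name="my step task",
)

pipe.start()
```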
WickedGoat98
for such pods instantiating additional workers listening on queues
I would recommend creating a "devops" user and having its credentials spread across all agents. Sounds good?
EDIT:
There is no limit on number of users on the system, so login as a new one and create credentials in the "profile" page :)
Hi PanickyMoth78
```python
torch.save(net.state_dict(), PATH)  # auto-uploads to GCS

# get all the models from the Task
output_models = Task.current_task().models["output"]

# get the last one
last_model = output_models[-1]

# set meta-data
last_model.set_metadata(key="my key", value="my value", type="str")
```
Hi GiganticTurtle0
Let me check
ColossalDeer61 btw, it turns out the docker-compose services section was misconfigured on GitHub 😞 I suggest you get the latest copy of it:
curl -o docker-compose.yml
Hi DilapidatedCow43
I'm assuming the returned object cannot be pickled (which is ClearML's way of serializing it)
You can upload it as a model with
```python
uploaded_model_url = Task.current_task().update_output_model(model_path="/path/to/local/model")
...
return uploaded_model_url
```
wdyt?
Yes, i basically plan to use ClearML as user-friendly cluster manager
and it is 🙂
I think the main "drawback" is that you cannot "reserve" nodes for the multi-node training. The easiest solution is to have a high-priority queue that is never otherwise used, and then have the DDP master process push into that high-priority queue, which will ensure these are the next Tasks to be executed (now the only thing that is missing is preemption of running Tasks, but this automation policy is unfortunate...
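A rough sketch of that master-process logic (the Task ID, queue name, and node count are all placeholders):

```python
from clearml import Task

BASE_TASK_ID = "<task-id>"   # the worker Task to replicate
WORLD_SIZE = 4               # hypothetical number of nodes

# clone the worker Task per node and push the clones into the
# otherwise-unused high priority queue so they are executed next
for rank in range(1, WORLD_SIZE):
    worker = Task.clone(source_task=BASE_TASK_ID)
    Task.enqueue(worker, queue_name="high_priority")
```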
BoredHedgehog47
is this ( https://clearml.slack.com/archives/CTK20V944/p1665426268897429?thread_ts=1665422655.799449&cid=CTK20V944 ) the same issue (or solution) ?
yes 🙂
But I think that when you get the internal_task_representation.execution.script you are basically already getting the API object (obviously with the correct version) so you can edit it in place and pass it too
Sorry @<1657918706052763648:profile|SillyRobin38> I missed this reply
Is ClearML-Serving using either System or CUDA shared memory?
This needs to be set on the docker-compose:
and I think this line actually includes `ipc: host`, which means there is no need to set the `shm_size`, but you can play around with it and let me know if you see a difference
[None](https://github.com/allegroai/clearml-serving/blob/7ba356efc97a6ae2159283d198d981b3c1ab85e6/docker/docker-compose-triton-gpu.yml#L1...
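For illustration, the relevant docker-compose knobs would look something like this (service name as in the linked compose file; the shm_size value is a placeholder):

```yaml
services:
  clearml-serving-triton:
    ipc: host          # share the host's shared memory with the container
    # shm_size: "1gb"  # alternative knob if you drop ipc: host
```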
Task.running_locally()
Should do the trick
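e.g. a minimal guard:

```python
from clearml import Task

if Task.running_locally():
    # True only when the script was launched manually,
    # not when it is executed by a clearml-agent
    print("running on the dev machine")
```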
or at least stick to the requirements.txt file rather than the actual environment
You can also force it to log the requirements.txt with:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(...)
GiganticTurtle0 fix was pushed 🙂
you can test with: pip install git+ 🤞
ZanyPig66 this should have worked, any chance you can send the full execution log (in the UI "results -> console" download full log) and attach it here? (you can also DM it so it is not public)
SubstantialElk6 if you call Task.init with continue_last_task=<task_id> it will automatically add the last_iteration of the previous run, to any logging/report so you never overwrite the previous reports 🙂
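Something along these lines (project/task names and the previous task ID are placeholders):

```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="resume run",
    continue_last_task="<previous-task-id>",
)
```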
I can't find out how to pass my custom clearml.conf
Hi @<1544491301435609088:profile|TeenyElk27>
The easiest is to map it into the container in your docker-compose
(map a host clearml.conf into /root/clearml.conf inside the container)
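For example, a sketch of the volume mapping (the service name and host path are assumptions; adjust to your compose file):

```yaml
services:
  agent-services:
    volumes:
      - /home/me/clearml.conf:/root/clearml.conf
```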
This is strange... Could you send the browser console log, maybe there is an exception there
@<1523701304709353472:profile|OddShrimp85> are you trying to shut down the one running on your machine ?
Do you want to PR it? should be a quick fix