CrookedWalrus33
Force SSH git authentication; the agent will auto-mount the host's ~/.ssh into the docker container:
https://github.com/allegroai/clearml-agent/blob/6c5087e425bcc9911c78751e2a6ae3e1c0640180/docs/clearml.conf#L25
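For reference, a minimal sketch of the relevant setting in clearml.conf (agent section):
agent {
    # force cloning over SSH; the agent then mounts the host's ~/.ssh into the container
    force_git_ssh_protocol: true
}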
Hi SmallDeer34
Generally, any torch.save(...) call is logged/uploaded by clearml automatically. Specifically in your case I think the only missing one is trainer_state.json, which I assume is a plain JSON file, and I imagine is part of the huggingface framework. You can easily upload it as an additional artifact with Task.upload_artifact. wdyt?
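For example, a minimal sketch (assuming the file sits in the working directory):
from clearml import Task

task = Task.current_task()  # the task created by your Task.init() call
# attach the HF trainer state file as an additional artifact
task.upload_artifact(name="trainer_state", artifact_object="trainer_state.json")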
Is it only for modified changes and not untracked files?
Basically everything that "git diff" outputs.
The agent will then re-apply it on the remote machine.
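Conceptually it is equivalent to something like this (a sketch, not the agent's actual implementation):
git diff HEAD > uncommitted.patch   # captured when the task is created
git apply uncommitted.patch         # replayed on the remote clone
Since untracked files never show up in the diff, they are not re-applied.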
The Overview panel would be extremely well suited for selecting a number of projects to compare.
Could you elaborate ?
Another useful feature would be to allow adding information (e.g. metrics or metadata) to the tooltip.
You mean... are we still talking about the "Overview" tab?
Hi @<1554275779167129600:profile|ProudCrocodile47>
Do you mean @ clearml.io ?
If so, then this is the same domain (.ml is sometimes flagged as spam, I'm assuming this is why they use it)
RipeGoose2 you are not limited to the automagic
From anywhere in your code you can always do:
from trains import Logger
Logger.current_logger().report_plotly(...)
So you can add any manual reporting on top of the one generated by lightning.
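For example, a minimal sketch of a manual plotly report (figure contents are made up):
import plotly.graph_objects as go
from trains import Logger

fig = go.Figure(data=go.Scatter(y=[1, 3, 2]))
Logger.current_logger().report_plotly(title="manual plot", series="example", iteration=0, figure=fig)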
Sounds good?
Thanks @doru! BTW, if you are running code from outside the trains repo, do you still get the double package?
Wait, so the pipeline step only runs if the pre execute callback returns True? It'll stop if it doesn't run?
Only if you have a callback function and that callback returns False will the step be skipped (otherwise it will be processed)
Another question, in the parents sequence in pipe.add_step, we have to pass in the name of the step right?
Correct, the step name is a unique identifier for the pipeline
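A minimal sketch (project/task names are hypothetical):
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
pipe.add_step(name="step_a", base_task_project="examples", base_task_name="task a")
pipe.add_step(
    name="step_b",
    parents=["step_a"],  # parents are referenced by step name
    base_task_project="examples",
    base_task_name="task b",
    # returning False from the callback skips the step
    pre_execute_callback=lambda pipeline, node, params: True,
)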
how would I access the artifact of a previous step within the pre ...
Hi @<1523701523954012160:profile|ShallowCormorant89>
This is generally based on the number of agents, or am I missing something? Also, is it based on Tasks or decorated functions?
Exactly! It is very cool to see it in action, and it really works very well, kudos to these guys
BroadSeaturtle49 agent RC is out with a fix:
pip3 install clearml-agent==1.5.0rc0
Let me know if it solved the issue
CheerfulGorilla72
yes, IP-based access,
hmm, so this is the main downside of using an IP-based server: the links (debug images, models, artifacts) store the full URL (e.g. http://IP:8081/...). This means that if you switch IPs they will no longer work. Any chance to fix the new server to the old IP?
(the other option is somehow edit the DB with the links, I guess doable but quite risky)
This is a horrible setup; it means no authentication will pass, and it will literally break every JWT authentication scheme
Notice that if you are using TB, everything you report to the TB will appear as well 🙂
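A minimal sketch, assuming the standard torch TensorBoard writer:
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
writer.add_scalar("val/loss", 0.42, 1)  # auto-captured by clearml once Task.init() was called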
Yes, offline got broken in 1.3.0 😞, the RC fixed it:
pip install clearml==1.3.1rc0
Stable release later this week
Thanks SarcasticSparrow10 !
I'll reply on the GitHub issue later (for better visibility)
But my initial thoughts:
(1) I think this was suggested, and hopefully we will get to implementing it, I can definitely see the value. Meanwhile you can achieve some of the functionality with the experiment table and custom columns 🙂
(2) "Don't display the performance metric" -> isn't that important? what am I missing?
(3) Hmm you mean just extra columns?
(4) sounds like a bug
(5) is this a plotly issue?...
Hi GreasyPenguin14
However the cleanup service is also running in a docker container. How is it possible that the cleanup service has access and can remove these model checkpoints?
The easiest solution is to launch the cleanup script with a mount point from the storage directory into the container ( -v <host_folder>:<container_folder> )
The other option, supported by clearml version 1.0 and above, is using Task.delete, which now supports deleting the artifacts and mod...
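A minimal sketch of that second option (kwarg name per clearml >= 1.0):
from clearml import Task

task = Task.get_task(task_id="<task id>")
task.delete(delete_artifacts_and_models=True)  # also removes the stored artifacts/models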
Yey!
Out of curiosity, what's the workflow with snowflake?
That might be me, let me check...
too large to be stored in the .cache path? It will be stored there anyway?
oh that is exactly why the latest release supports chunks, so you can get a partial copy 🙂
nonetheless, the assumption is that you will have to end up with the data locally, otherwise the network becomes a huge bottleneck
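For example, a sketch of fetching a single chunk (project/dataset names are hypothetical):
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my dataset")
folder = ds.get_local_copy(part=0, num_parts=4)  # download only the first of four chunks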
Make sense?
Thanks NonchalantDeer14 !
BTW: how do you submit the multi-GPU job? Is it multi-GPU or multi-node?
do you have a video showing the use case for clearml-session
I totally think we should, I'll pass it along 🙂
what is the difference between vscode via clearml-session and vscode via remote ssh extension ?
Nice! Remote vscode is usually thought of as SSH: basically you have your vscode running on your machine, and over SSH vscode automatically connects to the remote machine.
Clearml-Session also adds a new capability: VSCode inside your browser, where the VSCode itself as well...
Hi @<1523711619815706624:profile|StrangePelican34>
You can either report on the Model itself:
None
or you can force it on the Task:
task = Task.get_task("task id here")
task.mark_started(force=True)  # force-reopen the completed task for reporting
task.get_logger().report_scalar(...)
task.mark_completed(force=True)  # close it again when done
Hi @<1715175986749771776:profile|FuzzySeaanemone21>
and then run "clearml-agent daemon --gpus 0 --queue gcp-l4" to start the worker.
I'm assuming the docker service cannot spin up a container with GPU access; usually this means you are missing the nvidia docker runtime component
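A quick sanity check you could run on that machine (the image tag is just an example):
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If it fails, installing the nvidia-container-toolkit and restarting docker usually fixes it.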
DeliciousBluewhale87
Upon ssh-ing into the folders in both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there..
Hmm that means it is working...
Do you see any *.conf files there? What do they contain? (do they point to the correct clearml-server config?)
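For reference, the api section of clearml.conf should look something like this (the host/ports below are the defaults, adjust to your server):
api {
    web_server: http://<server-ip>:8080
    api_server: http://<server-ip>:8008
    files_server: http://<server-ip>:8081
}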
Wait, that makes no sense to me. The API from python and the API from the UI are getting the same data from the backend ...
What are you getting with:
from clearml import Task
task = Task.get_task(task_id="<put task id here>")
print(task.models)