I think the non-master processes are trying to log something, but they have no Logger instance because they have no Task instance.
Hmm, is your code calling Logger.current_logger() directly?
Do the logs in the master process include all the training history, or do I need to concatenate the logs from the different nodes somehow?
So the main problem is that you need to pass the TASK ID that the master node creates to the second node, so it can report to the same Task.
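A minimal sketch of that idea (using the clearml package; the --rank / --master-task-id wiring below is just illustrative, share the ID however fits your setup):
```python
import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--rank", type=int, default=0)
parser.add_argument("--master-task-id", default=None,
                    help="Task ID created by the master node (illustrative flag)")
args = parser.parse_args()

if args.rank == 0:
    # master node: create the Task and share its ID with the other nodes
    task = Task.init(project_name="examples", task_name="multi-node training")
    print("pass this to the other nodes:", task.id)
else:
    # other nodes: attach to the master's Task instead of creating a new one
    task = Task.get_task(task_id=args.master_task_id)

# every node can now report to the same Task
task.get_logger().report_scalar(
    title="loss", series="node-{}".format(args.rank), value=0.5, iteration=1)
```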
I know that the enterprise version of ClearML support...
Hi EagerOtter28
The agent knows how to do the http->ssh conversion on the fly. In your clearml.conf (on the agent's machine) set force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/docs/clearml.conf#L25
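For reference, this is roughly where it sits in the agent's clearml.conf (rest of the section omitted):
```
# clearml.conf on the agent's machine
agent {
    # convert http(s) git urls to ssh on the fly when cloning
    force_git_ssh_protocol: true
}
```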
Yes 🙂
BTW: do you guys do remote machine development (i.e. Jupyter / vscode-server) ?
Sounds good to me 🙂
Woot woot! 🤩
Hi AverageBee39
What are the clearml-server and clearml package versions you are using?
(It looks like some capability that is missing from the server, i.e. it needs an upgrade?!)
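If it helps, a quick way to grab the package version (the server version should be listed at the bottom of the WebApp settings page, if I remember correctly):
```
python -c "import clearml; print(clearml.__version__)"
```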
EnviousPanda91 'connect' will log the object properties; the automagic logging is controlled in the Task.init call. Specifically, which framework produces metrics that are not logged? Your sample code manually reports some scalars/values, do you see these as well?
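For example, something along these lines (the framework flags below are just an illustration, adjust per your case):
```python
from clearml import Task

# automagic logging (framework outputs, argparse, etc.) is controlled here
task = Task.init(
    project_name="examples",
    task_name="my experiment",
    auto_connect_frameworks={"tensorflow": True, "pytorch": True, "matplotlib": True},
)

# 'connect' only logs/syncs the object's properties as configuration
params = {"batch_size": 32, "lr": 0.001}
params = task.connect(params)

# manual reporting goes through the logger
task.get_logger().report_scalar(title="val", series="accuracy", value=0.93, iteration=1)
```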
Hi SubstantialBaldeagle49
Yes, you can back up the entire trains-server (see the GitHub docs on how).
You mean upgrading the server?
Yes, you can change the name or add comments (Info tab / description), and you can add key/value descriptions (under the Configuration tab, see User Properties).
CloudySwallow27 okay essentially this defs file is kind of a user "secret vault" for access credentials, is that correct?
Are you running inside a kubernetes cluster ?
I think my main point is that the k8s glue on AKS or GKE basically takes care of spinning up new nodes, since the k8s service does that. The AWS autoscaler is kind of a replacement for that, does that make sense?
Hi WackyRabbit7 ,
Running in Docker mode gives you greater flexibility in terms of environment control, from switching CUDA versions to pre-compiled packages that are needed (think apt-get), etc. Specifically for DL, if you are using multiple TensorFlow versions, they are notorious for compiling against a specific CUDA version, and the only easy way to switch between them is different dockers. If you are a PyTorch user, then you are in luck, they have all the pytorch ver...
@<1535793988726951936:profile|YummyElephant76> oh, you mean the Jupyter server was running, then inside the notebook you would start a new venv, and in that venv the "notebook" package was missing, hence it failed to detect the notebook?
StaleKangaroo85 check https://demoapp.trains.allegro.ai/projects/0e152d03acf94ae4bb1f3787e293a9f5/experiments/193ac2bced184c49a57658fceb4bd7f9/info-output/metrics/plots?columns=type&columns=name&columns=status&columns=project.name&columns=user.name&columns=started&columns=last_update&columns=last_iteration&order=last_update on the demo server, seems okay to me...
Hi WackyRabbit7
First, always check the functions on the Task object, they are the most straightforward access to the system.
Then, if you need general-purpose API calls, currently they are only documented in the doc-strings of the API schema (that said, it should be fairly well documented)
You can check all the endpoints https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
And finally, if you want to easily use the RestAPI:
` from trains.backend_api.session.client impo...
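For example, something along these lines (the filters below are just placeholders):
```python
from trains.backend_api.session.client import APIClient

client = APIClient()
# e.g. fetch the 10 most recently updated completed tasks
tasks = client.tasks.get_all(
    status=["completed"],
    order_by=["-last_update"],
    page=0,
    page_size=10,
)
for t in tasks:
    print(t.id, t.name, t.status)
```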
LudicrousParrot69
I "think" I have a better handle on what you wish to do.
Is it kind of a generic "serving" solution?
FYI:
A model artifact is, usually, a weights/model file. The idea is that later you will be able to access it and serve it. Now the problem is (and I think this is what you are referring to) that there is usually a specific piece of code tied to that model that can use it (a.k.a. pyfunc)
A few ideas:
These days everyone is trying to build their models with generic interface, so that scik...
UnevenDolphin73 something like this one?
https://github.com/allegroai/clearml/pull/225
Hi JealousParrot68
Spinning up the clearml-agent with docker support (i.e. each experiment runs inside its own container):
https://clear.ml/docs/latest/docs/clearml_agent#docker-mode
Basically you can specify a default docker to use (per agent) and a specific docker container to use per Task (configured in the UI under execution at the bottom)
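For example, something like (the image name here is just an example):
```
clearml-agent daemon --queue default --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
```
And if I'm not mistaken, from code you can also set the per-Task container with task.set_base_docker(...), same as editing it in the UI.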
Make sure you follow all the steps:
https://clear.ml/docs/latest/docs/deploying_clearml/upgrade_server_linux_mac
(basically make sure you get the latest docker-compose.yml and then pull it)
curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
docker-compose -f /opt/clearml/docker-compose.yml pull
docker-compose -f /opt/clearml/docker-compose.yml up -d
LudicrousParrot69 this is an implementation issue; this entire page is based on "task comparison", and a single Task means a totally different interface for querying the data 🙂
BeefyCow3 if you are trying to optimize a specific metric (i.e. a scalar on a graph), the template Task should report it with the same title/series combination, which should be easy enough to verify in the UI 🙂
You can either report with Tensorboard or with the Trains Logger, either way will work.
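e.g. with the Trains Logger it would look something like this (the title/series names are just placeholders, as long as the optimizer is configured with the same combination):
```python
from trains import Task

task = Task.init(project_name="examples", task_name="hpo template")
logger = task.get_logger()

# the optimizer will look up this exact title/series combination
for iteration, accuracy in enumerate([0.71, 0.78, 0.83]):
    logger.report_scalar(title="validation", series="accuracy",
                         value=accuracy, iteration=iteration)
```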
JitteryCoyote63 it should just "freeze" after a while as it will constantly try to resend logs. Basically you should be fine 🙂
(If for some reason something crashed, please let me know so we can fix it)
JitteryCoyote63 I think that with 0.17.2 we stopped mounting the venv build to the host machine, which means it is all stored inside the docker.
it will constantly try to resend logs
Notice this happens in the background; in theory you will just get stderr messages when it fails to send, but the training should continue