Hi CooperativeFox72
Sure 🙂
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
DilapidatedDucks58
all our workers went down after starting the slack bot, is it expected?)
Oh dear... I can't see any connection... What is the last log you have there?
JitteryCoyote63 I think that without specifically adding torch to the requirements, the agent will not be able to automatically resolve the correct cuda/torch version. Basically you should add torch to the requirements.txt file and provide it to Task.create, or use Task.force_requirements_env_freeze
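For example (the torch version pin below is just a placeholder):
` # requirements.txt should explicitly list torch, e.g.:
# torch==2.1.0

from clearml import Task

# force the agent to install this exact requirements file,
# skipping automatic package resolution (call before Task.init)
Task.force_requirements_env_freeze(requirements_file="requirements.txt")

task = Task.init(project_name="examples", task_name="torch training") `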
JitteryCoyote63 did you add the bash script here: https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L99
I pass my dataset as a parameter of the pipeline:
@<1523704757024198656:profile|MysteriousWalrus11> I think you were expecting the dataset_df dataframe to be automatically serialized and passed, is that correct ?
If you are using add_step, all arguments are simple types (i.e. str, int etc.)
If you want to pass complex types, your code should be able to upload it as an artifact and then you can pass the artifact url (or name) for the next step.
Another option is to use a pipeline built from decorators (PipelineDecorator), where complex objects can be passed between steps.
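For example, the artifact hand-off between add_step steps could look like this (step/artifact names are just placeholders):
` from clearml import Task

# inside the first step: upload the dataframe as an artifact
task = Task.current_task()
task.upload_artifact(name="dataset_df", artifact_object=dataset_df)

# inside the next step: fetch it back, given the first step's Task ID
# (passed between steps as a simple string argument)
source_task = Task.get_task(task_id=first_step_task_id)
dataset_df = source_task.artifacts["dataset_df"].get() `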
Hi MistakenDragonfly51
Notice that Models are their own entity; you can query them based on tags/projects/names etc.
Querying and getting Models is done with the Model class:
https://clear.ml/docs/latest/docs/references/sdk/model_model#modelquery_models
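Something along these lines (project/tag values are just placeholders):
` from clearml import Model

# query models by project / tags / name
models = Model.query_models(project_name="examples", tags=["production"])
for model in models:
    print(model.id, model.name) `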
task.get_models()
is always empty.
How come there are no Models on the Task? (in other words how come this is empty?)
Hi MelancholyChicken65
I'm not sure you can control it; the UI deduces the URL based on the address you are browsing to. So if you go to http://app.clearml.example.com you will get the correct ones, but you have to put them on the right subdomains:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#subdomain-configuration
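For reference, the client-side clearml.conf would then point at the three subdomains (hostnames taken from the example above):
` api {
    web_server: https://app.clearml.example.com
    api_server: https://api.clearml.example.com
    files_server: https://files.clearml.example.com
} `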
for a GPU with more than 16GB GRAM and less than 40GB, so sometimes we need to provision an A100 to get the training speed we want but we don't use all the GRAM
Oh that makes sense...
Just saw this one, this might help?
https://www.globenewswire.com/news-release/2022/10/24/2539924/0/en/ClearML-and-Genesis-Cloud-Announce-New-MLOps-Partnership-Delivering-100-Green-Energy-Compute-Solution-for-Machine-Learning.html
Yes it seems so 😞
Seems like a Task contained an invalid artifact link.
I wouldn't sweat over it, it's basically a warning that it could not locate the actual file to delete (albeit an ugly warning 🙂 )
I think AnxiousSeal95 would know when will the new version be ready.
regardless, is it actually deleting old Tasks ?
Hi RipeGoose2
Any logs on the console ?
Could you test with a dummy example on the demoserver ?
CourageousLizard33 specifically section (4) is the issue (and it's related to any elastic docker, nothing specific to trains-server):
` echo "vm.max_map_count=262144" > /tmp/99-trains.conf
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart `
Did you try the above, and you are still getting the same error ?
MysteriousBee56 what do you mean by "delete a worker"?
stop the agent running remotely ?
Task.connect is "automagic", i.e. it writes to the server when in manual mode, and reads from the server when in agent mode.
set_parameter is one-way only and should be used to set an external Task's parameters.
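A quick sketch of the difference (the Task ID is a placeholder):
` from clearml import Task

task = Task.init(project_name="examples", task_name="params demo")

# two-way: values are sent to the server in manual mode,
# and overridden from the server when executed by an agent
config = {"lr": 0.001, "batch_size": 32}
config = task.connect(config)

# one-way: explicitly set a parameter on an external Task
other_task = Task.get_task(task_id="1234567890abcdef")  # placeholder ID
other_task.set_parameter("General/lr", 0.01) `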
Hi FiercePenguin76
It seems it fails to detect the notebook server and thinks this is a "script running".
What is exactly your setup?
docker image ?
jupyter-lab version ?
clearml version?
Also are you getting any warning when calling Task.init ?
BoredHedgehog47 you need to configure the clearml k8s glue to spin up pods (instead of statically allocating agents per pod). Does that make sense ?
A quick fix will be:
` import os
import dotenv

# load the .env file before importing clearml, so the env vars are set
# (load_dotenv does not expand "~" on its own)
dotenv.load_dotenv(os.path.expanduser('~/.env'))

from clearml import Task  # now we can load it
import argparse

if __name__ == "__main__":
    # do stuff `
wdyt?
my question is how to recover, must I recreate the agents or is there another way?
Yes you have to recreate the Task (I assume they failed, no?!)
server-->agent is fast, but agent-->server is slow.
Then multiple connections will not help; this is the bottleneck of the upload speed of your machine, regardless of what the target is (file-server, S3, etc...)
Notice the parents argument when creating a new Dataset
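For example (project/dataset names are just placeholders; the SDK keyword is parent_datasets):
` from clearml import Dataset

# get the current version to use as a parent
parent = Dataset.get(dataset_project="examples", dataset_name="my dataset")

# create a child version on top of it
child = Dataset.create(
    dataset_project="examples",
    dataset_name="my dataset",
    parent_datasets=[parent.id],
)
child.add_files("new_files/")
child.upload()
child.finalize() `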
UnevenDolphin73 are you saying offline does not work?
` stream.write(msg + self.terminator)
ValueError: I/O operation on closed file. `
This is an internal Python error, how come there is no stream?
Then, as you suggested, I would just use sys.path; it is probably the easiest and actually very safe (because the subfolders are always next to the "main" source code)
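Something like this (folder and module names are hypothetical):
` import os
import sys

# the subfolder is always next to this "main" script, so build the path from __file__
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "subfolder"))

import my_helper_module  # hypothetical module inside ./subfolder `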
I've tried setting up a clearml application on openshift
First, my condolences 🙂 openshift ...
Second, what you need to make sure is that each container (i.e. ELK/Mongo etc.) has its own PV for persistent storage; I'm assuming this is the root cause for the error.
Make sense to you ?
If this is the case, there is nothing you need to change, just provide the docker image (no need to pass packages)
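For example (the image name is just a placeholder):
` from clearml import Task

task = Task.init(project_name="examples", task_name="docker run")
# when executed by an agent, run inside this image and use
# the packages already installed in it
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04") `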
Wait, how do I reproduce it on the community server? Maybe it has something to do with the number of columns ? Or whether it is already wider than the screen? What's your browser / OS ?
Hmm I tested on chromium and it seemed to work, let me see if I can reproduce it...
Thanks GiganticTurtle0
So the bug is that "mock_step" is storing the "NUMBER_2" argument value in the second instance?
no need for it actually