JitteryCoyote63

214 Questions, 1021 Answers

Active since 10 January 2023

Last activity 7 months ago

Reputation

Badges 1

979 × Eureka!

Questions 214
Answers 1021

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Looks Like Trains-Agent 0.16

Looks like trains-agent 0.16 doesn't support --install-globally documented parameter -> Only available for trains-agent build command. Would it be possible t...

clearml

4 years ago

0 Votes

3 Answers

988 Views

0 Votes 3 Answers 988 Views

Hi, In The Context Of Multi-Gpu Training, Is

Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? or should make sure only the first process calls it first, then others

clearml

3 years ago

0 Votes

30 Answers

1K Views

0 Votes 30 Answers 1K Views

Hi, I Face A Strange Behavior From The Clearml-Agent: It’S Running In Services Mode, Not In Docker Mode, Cpu Only. I Want To Execute Two Tasks On This Service Agent. One Works, The Other Always Fails After Being Enqueued And Picked By The Agent With The E

Hi, I face a strange behavior from the clearml-agent: it’s running in services mode, not in docker mode, cpu only. I want to execute two tasks on this servic...

mlops

3 years ago

0 Votes

12 Answers

1K Views

0 Votes 12 Answers 1K Views

Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Hey there, since a bit I often find experiments being stuck while training a model. It seems to happen randomly and I could not find a reproducible scenario ...

mlops

2 years ago

0 Votes

26 Answers

1K Views

0 Votes 26 Answers 1K Views

Hi, I Attached An Iam Role To An Ec2 Instance To Grant Access To An S3 Bucket. The Ec2 Instance Is Running A Clearml-Agent (V1.1.0). I Didn’T Specify Any Key/Secret For Clearml. The Tasks Fail With The Following Error:

Hi, I attached an IAM role to an ec2 instance to grant access to an s3 bucket. The ec2 instance is running a clearml-agent (v1.1.0). I didn’t specify any key...

aws

3 years ago

0 Votes

2 Answers

921 Views

0 Votes 2 Answers 921 Views

Hey There

Hey there 🙂 Still my journey to deploy the aws-autoscaler with spot instances, I have another question: I would like to limit the amount of time spent setti...

mlops

3 years ago

0 Votes

6 Answers

1K Views

0 Votes 6 Answers 1K Views

Hi, How Does

Hi, how does agent.enable_git_ask_pass works? I am using the clearml-agent in docker mode and my experiment is stuck at downloading a private dependency: Clo...

mlops

one year ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi, I Have A Configuration File That I Read And Connect To My Training Tasks. I Cannot Use

Hi, I have a configuration file that I read and connect to my training tasks. I cannot use config = task.get_parameters_as_dict()["General"]["param"]["nested...

clearml

3 years ago

0 Votes

7 Answers

976 Views

0 Votes 7 Answers 976 Views

Hi, I Am Currently Using

Hi, I am currently using CLEARML_AGENT_GIT_USER and CLEARML_AGENT_GIT_PASS when starting my clearml-agent and I would like to switch to using a single auth t...

clearml

2 years ago

0 Votes

9 Answers

1K Views

0 Votes 9 Answers 1K Views

Hi, I Want To Upgrade Clearml Server From 1.1 To 1.2 (Self Hosted). I Have The Following Setup:

Hi, I want to upgrade clearml server from 1.1 to 1.2 (self hosted). I have the following setup: /dev/nvme0n1p1 30G 21G 8.9G 70% / <- This is where /opt/clear...

clearml

2 years ago

0 Votes

1 Answers

900 Views

0 Votes 1 Answers 900 Views

Is It Possible To Shutdown The Clearml Server, Upgrade To V1, Restart It While Experiments Are Running? Or Is It Dancing With The Devil?

Is it possible to shutdown the clearml server, upgrade to v1, restart it while experiments are running? Or is it dancing with the devil? 😄

clearml

3 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi, Is It Possible To Start A Clearml-Agent (Not In Docker Mode) On A Machine With A Gpu, But Enforce The Clearml-Agent To Not “See” The Gpu? So That The Experiments Run By This Agent Fail If They Try To Access A Gpu? Like The

Hi, is it possible to start a clearml-agent (not in docker mode) on a machine with a gpu, but enforce the clearml-agent to not “see” the gpu? So that the exp...

mlops

3 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

How Can I Filter Out Archived Tasks With Task.Get_Tasks?

How can I filter out archived tasks with Task.get_tasks?

clearml

3 years ago

0 Votes

2 Answers

996 Views

0 Votes 2 Answers 996 Views

Is There An Option To Make Trains-Agent Create Experiment Virtualenvs With

Is there an option to make trains-agent create experiment virtualenvs with --system-site-packages parameter?

clearml

4 years ago

0 Votes

1 Answers

606 Views

0 Votes 1 Answers 606 Views

Quick Question: Why Does Clearml-Server 1.15.0 Api-Server Python Package Require Es 8.12.0 But The Docker-Compose References Es 7.17.18?

Quick question: Why does clearml-server 1.15.0 api-server python package require ES 8.12.0 but the docker-compose references ES 7.17.18?

clearml

8 months ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Hi, I Encountered A Bug On Clearml-Server 1.0.1: I Tried To Add In A Project Page A Custom Column In +Hyper Parameters > Args > Queue And Got An Error Pop Up With The Following Message:

Hi, I encountered a bug on clearml-server 1.0.1: I tried to add in a project page a custom column in +HYPER PARAMETERS > Args > queue and got an error pop up...

clearml

3 years ago

0 Votes

12 Answers

926 Views

0 Votes 12 Answers 926 Views

Hey, Would It Possible To Add An Option To Make

Hey, would it possible to add an option to make task.upload_artifact() blocking? (Not running in background)

clearml

4 years ago

0 Votes

2 Answers

923 Views

0 Votes 2 Answers 923 Views

Hi, Is It Possible To Get An Artifact From A Task And Force Not Using Local Cache? The Task Itself Updated The Artifact In The Meantime And I Cannot Get The Latest Version Of The Artifact. I Saw That

Hi, is it possible to get an artifact from a Task and force not using local cache? The task itself updated the artifact in the meantime and I cannot get the ...

clearml

3 years ago

0 Votes

2 Answers

926 Views

0 Votes 2 Answers 926 Views

Hi, Another Idea For Clearml Web Ui: In The Projects View, If I Have Several Experiments Being Enqueued And I Sort By “Started” Ascending (Newest On Top), I Expect To See Enqueued Experiments At The Very Top, While They Are Shown At The Very Bottom - Woul

Hi, another idea for ClearML web UI: in the projects view, if I have several experiments being enqueued and I sort by “Started” ascending (newest on top), I ...

clearml

3 years ago

0 Votes

3 Answers

961 Views

0 Votes 3 Answers 961 Views

Hi Clearml Team Members! Is There Any Progress Made On The Clearml-Serving Repo? I’D Love To Start Using It But I Lack A Straightforward Get Started Example. My Use Case Is The Following:

Hi ClearML team members! Is there any progress made on the clearml-serving repo? I’d love to start using it but I lack a straightforward get started example....

clearml

3 years ago

0 Votes

5 Answers

979 Views

0 Votes 5 Answers 979 Views

Hi There, I Would Like To Report A Bug With The Resizing Of The Columns In The Projects View: It Doesn’T Work As Expected. Please Look At The Behavior Of The Resizing On The Following Screen Recording

Hi there, I would like to report a bug with the resizing of the columns in the projects view: it doesn’t work as expected. Please look at the behavior of the...

clearml

3 years ago

0 Votes

15 Answers

1K Views

0 Votes 15 Answers 1K Views

Hi, I Restarted My Clearml-Server (1.1.0) And The Login Page Always Redirects Me To The Login Page. I Am Using Fixed Users In Config Files. In The Logs Of The Api Server I Can See:

Hi, I restarted my clearml-server (1.1.0) and the login page always redirects me to the login page. I am using fixed users in config files. In the logs of th...

clearml

3 years ago

0 Votes

4 Answers

950 Views

0 Votes 4 Answers 950 Views

Hey, I Would Like My Experiment To Call At Some Point A Cli Program Installed As A Dependency Of The Experiment. Here Is What I Do:

Hey, I would like my experiment to call at some point a CLI program installed as a dependency of the experiment. Here is what I do: myTask = Task.init(...) i...

clearml

4 years ago

0 Votes

5 Answers

941 Views

0 Votes 5 Answers 941 Views

Hi Guys, I Would Like To Start Using The Aws Autoscaler Shipped In Trains. I Need To Create A Iam User To Get And I Would Like To Know What Are The Minimal Permissions Required For The Autoscaler To Work?

Hi guys, I would like to start using the AWS autoscaler shipped in trains. I need to create a IAM user to get and I would like to know what are the minimal p...

mlops

4 years ago

0 Votes

7 Answers

1K Views

0 Votes 7 Answers 1K Views

Hi, I Have A Question Regarding The Aws_Autoscaler: It Usually Takes ~Hours To Get A Gpu Instance Nowadays. I Was Thinking, It Would Be Much More Interesting To Stop The Instances (Clearml-Agents) Instead Of Terminating Them Once They Are Inactive, So Tha

Hi, I have a question regarding the aws_autoscaler: It usually takes ~hours to get a GPU instance nowadays. I was thinking, it would be much more interesting...

mlops

2 years ago

0 Votes

1 Answers

976 Views

0 Votes 1 Answers 976 Views

Hi There, I Moved My Clearml Server From Us To Eu And Now I Am Trying To Setup The Aws Autoscaler With The Different Architecture That I Have Now. So Far I Used An Old Version

Hi there, I moved my ClearML server from US to EU and now I am trying to setup the AWS autoscaler with the different architecture that I have now. So far I u...

mlops

3 years ago

0 Votes

1 Answers

966 Views

0 Votes 1 Answers 966 Views

Hi, I Have A Clearml-Agent (1.1.2) In A G4Dn.4Xlarge Aws Instance (With One T4 Gpu), That Reports

Hi, I have a clearml-agent (1.1.2) in a g4dn.4xlarge AWS instance (with one T4 GPU), that reports agent.cuda_version = 0 agent.cudnn_version = 0and does not ...

clearml

2 years ago

0 Votes

14 Answers

1K Views

0 Votes 14 Answers 1K Views

Hi There, I Have A Bit Of A Problem With Aws Secrets: I Pass Keys As Env Var To Clearml-Agents To Retrieve Data From A Bucket In Us-East-1 But I Use A Bucket To Store Task Artifacts In A Bucket In Eu-Central-1. So When I Pass Aws Keys As Env Vars, The Tas

Hi there, I have a bit of a problem with AWS secrets: I pass keys as env var to clearml-agents to retrieve data from a bucket in us-east-1 but I use a bucket...

mlops

3 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi Guys, Last Night One Of Our Agents (0.16.1) Was Disconnected From Our Trains-Server While Executing An Experiment. I Saw That Because The Experiment It Was Running Had The Status Aborted And I Could Not See The Agent In The List Of Available Workers. H

Hi guys, Last night one of our agents (0.16.1) was disconnected from our trains-server while executing an experiment. I saw that because the experiment it wa...

mlops

4 years ago

0 Votes

1 Answers

900 Views

0 Votes 1 Answers 900 Views

Hey There

Hey there 🙂 Would in the WebUI, on an experiment CONFIGURATION tab, for a specific parameter, would it be possible not show its value as a single string whe...

clearml

2 years ago

Show more results

0 Hey, I Have One Question Regarding The Cleanup_Service Task In The Devops Project: Does It Assume That The Agent In Services Mode Is In The Trains-Server Machine?

Thanks TimelyPenguin76 and AgitatedDove14 ! I would like to delete artifacts/models related to the old archived experiments, but they are stored on s3. Would that be possible?

4 years ago

0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Hi AgitatedDove14 , sorry somehow this message got lost 😄
clearml version is the latest at the time, 1.7.1 Yes, I always see the "model uploaded completed" for such stuck tasks I am using python 3.8.10

2 years ago

0 Hi There, Is It Possible To Configure The Clearml-Agent To Run Some Commands Before Running Each Experiment It Launches? Eg.

yes but they are in plain text and I would like to avoid that

3 years ago

0 Hi, From Within An Experiment, How Can I Intercept The Signal That The Experiment Was Aborted And Execute A Cleanup Function? I Tried To Intercept Sigint And Sigterm, Unsuccessfully:

How exactly is the clearml-agent killing the task?

2 years ago

0 Hi, From Within An Experiment, How Can I Intercept The Signal That The Experiment Was Aborted And Execute A Cleanup Function? I Tried To Intercept Sigint And Sigterm, Unsuccessfully:

yes

2 years ago

0 Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

The task I cloned from is not the one I though

4 years ago

mmmh good point actually, I didn’t think about it

2 years ago

0 Hi, Is It Possible To Disable Some Of The System Metrics Monitored? And Also Downsample The Rate Of Logging?

AgitatedDove14 I see that the default sample_frequency_per_sec=2. , but in the UI, I see that there isn’t such resolution (ie. it logs every ~120 iterations, corresponding to ~30 secs.) What is the difference with report_frequency_sec=30. ?

3 years ago

0 Hi, Is It Possible To Disable Some Of The System Metrics Monitored? And Also Downsample The Rate Of Logging?

Nice, thanks!

3 years ago

and just run the same code I run production

2 years ago

0 How Can I Do The Following? (Basically, Filtering By Task Type)

I found, the filter actually has to be an iterable:
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(type=["training"])))

4 years ago

0 How Can I Do The Following? (Basically, Filtering By Task Type)

Thanks!

4 years ago

0 Hi There, I Have A Problem With Pyjwt: I Am Using

Note: I can verify that post_packages is well picked up by the trains-agent, since in the experiment log I see:
agent.package_manager.type = pip agent.package_manager.pip_version = \=\=20.2.3 agent.package_manager.system_site_packages = true agent.package_manager.force_upgrade = false agent.package_manager.post_packages.0 = PyJWT\=\=1.7.1

3 years ago

0 Hi, I Am Trying To Use The Clearml-Agent In Docker Mode To Run An Experiment, But It Seems To Fail Passing The Clearml.Conf File To The Docker Container:

The rest of the configuration is set with env variables

one year ago

0 Hi, I Face A Strange Behavior From The Clearml-Agent: It’S Running In Services Mode, Not In Docker Mode, Cpu Only. I Want To Execute Two Tasks On This Service Agent. One Works, The Other Always Fails After Being Enqueued And Picked By The Agent With The E

Oof now I cannot start the second controller in the services queue on the same second machine, it fails with
` Processing /tmp/build/80754af9/cffi_1605538068321/work
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/tmp/build/80754af9/cffi_1605538068321/work'
clearml_agent: ERROR: Could not install task requirements!
Command '['/home/machine/.clearml/venvs-builds.1.3/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r'...

3 years ago

Alright SuccessfulKoala55 I was able to make it work by downgrading clearml-agent to 0.17.2

3 years ago

Actually I think I am approaching the problem from the wrong angle

2 years ago

Can I simply set agent.python_binary = path/to/conda/python3.6 ?

3 years ago

Ok, deleting installed packages list worked for the first task

3 years ago

in clearml.conf:
agent.package_manager.system_site_packages = true agent.package_manager.pip_version = "==20.2.3"

3 years ago

SInce it fails on the first machine (clearml-server), I try to run it on another, on-prem machine (also used as an agent)

3 years ago

0 Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

After I started clearml-session

2 years ago

I execute the clearml-agent this way:
/home/machine/miniconda3/envs/py36/bin/python3 /home/machine/miniconda3/envs/py36/bin/clearml-agent daemon --services-mode --cpu-only --queue services --create-queue --log-level DEBUG --detached

3 years ago

yes exactly

3 years ago

0 Hi, I Am Trying To Use The Clearml-Agent In Docker Mode To Run An Experiment, But It Seems To Fail Passing The Clearml.Conf File To The Docker Container:

might be worth documenting 😄

one year ago

0 Hey There, Happy New Year To All Of You

Hi AgitatedDove14 , thanks for the answer! I will try adding 'multiprocessing_context='forkserver' to the DataLoader. In the issue you linked, nirraviv mentionned that forkserver was slower and shared a link to another issue https://github.com/pytorch/pytorch/issues/15849#issuecomment-573921048 where someone implemented a fast variant of the DataLoader to overcome the speed problem.
Did you experiment any drop of performances using forkserver? If yes, did you test the variant suggested i...

3 years ago

0 Hi There,

Ok interestingly using matplotlib.use('agg') it doesn't leak (idea from here )

one year ago

0 Hi, I Would Like To Report Something Else Weird In The Clearml-Agent 1.5.1 Running In Docker Mode: In The Logs, When It Dumps Its Config, It Writes:

Hi SuccessfulKoala55 , not really wrong, rather I don't understand it, the docker image with the args after it

one year ago

0 Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

here is the function used to create the task:

` def schedule_task(parent_task: Task,
task_type: str = None,
entry_point: str = None,
force_requirements: List[str] = None,
queue_name="default",
working_dir: str = ".",
extra_params=None,
wait_for_status: bool = False,
raise_on_status: Iterable[Task.TaskStatusEnum] = (Task.TaskStatusEnum.failed, Task.Ta...

4 years ago

0 Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

In execution tab, I see old commit, in logs, I see an empty branch and the old commit

4 years ago

Show more results