Eureka! Now I can do nvcc --version and I get: Cuda compilation tools, release 10.1, V10.1.243
awesome 🙂
Maybe then we can extend task.upload_artifact?
def upload_artifact(..., wait_for_upload: bool = False):
    ...
    if wait_for_upload:
        self.flush(wait_for_uploads=True)
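For context, a minimal sketch of how the proposed flag could be used from the caller's side (wait_for_upload is the suggested addition, not an existing argument of upload_artifact):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact upload")

# With the proposed flag, the call would block until the artifact is
# actually stored, instead of returning while the upload runs in the
# background.
task.upload_artifact(
    name="predictions",
    artifact_object="predictions.csv",
    wait_for_upload=True,  # proposed parameter, not in the current API
)
```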
I killed both trains-agents and restarted one to have a clean start. This way it correctly spins up docker containers for services tasks. So the bug probably appears when an error occurs while setting up a task: the agent cannot go back to the main task afterwards. I would need to do some tests to validate that hypothesis though
ok, what is the 3.8 release? a server release? how does this number relate to the numbers above?
AgitatedDove14 Yes, I have xpack security disabled, as in the link you shared (note that it's xpack.security.enabled: "false", with quotes around false), but this command throws:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
That would be awesome 🙂
Is there one?
No, I rather wanted to understand how it works behind the scenes 🙂
The latest RC (0.17.5rc6) moved all logging into a separate subprocess to improve speed with PyTorch dataloaders
That’s awesome!
there is no error from this side, I think the aws autoscaler just waits for the agent to connect, which will never happen since the agent won’t start because the userdata script fails
I think this is because this API is not available in elastic 5.6
If I remove security_group_ids and just keep subnet_id in the configuration, it is not taken into account (the instances are created in the default subnet)
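Just to illustrate what I would expect to happen under the hood (a rough boto3 sketch, not the autoscaler's actual code; all IDs are made up): when only SubnetId is passed, the instance should still land in that subnet.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# If only SubnetId is given (no SecurityGroupIds), EC2 still launches the
# instance in that subnet, so I would expect the same from the autoscaler.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # hypothetical AMI
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",        # hypothetical subnet
    SecurityGroupIds=["sg-0123456789abcdef0"],  # hypothetical security group
)
```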
AgitatedDove14 I was able to redirect the logger by doing so:
clearml_logger = Task.current_task().get_logger().report_text
early_stopping = EarlyStopping(...)
early_stopping.logger.debug = clearml_logger
early_stopping.logger.info = clearml_logger
early_stopping.logger.setLevel(logging.DEBUG)
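For reference, the same redirect as a self-contained sketch (assuming early_stopping.logger is a standard logging.Logger, as in pytorch-ignite's EarlyStopping handler; replacing its debug/info methods is a quick hack rather than a proper logging handler, and the score function below is just a placeholder):

```python
import logging

from clearml import Task
from ignite.engine import Engine
from ignite.handlers import EarlyStopping

task = Task.init(project_name="examples", task_name="early stopping logs")
clearml_logger = task.get_logger().report_text

def score_function(engine: Engine) -> float:
    # Higher is better for EarlyStopping, so negate the loss
    return -engine.state.metrics["loss"]

trainer = Engine(lambda engine, batch: None)  # placeholder training step
early_stopping = EarlyStopping(patience=5, score_function=score_function, trainer=trainer)

# Route the handler's log messages to the ClearML console log
early_stopping.logger.debug = clearml_logger
early_stopping.logger.info = clearml_logger
early_stopping.logger.setLevel(logging.DEBUG)
```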
Would be very cool if you could include this use case!
Would adding an ILM (index lifecycle management) policy be an appropriate solution?
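If so, a rough sketch of what I had in mind (the endpoint, policy name and 30-day retention are made up, and ILM requires Elasticsearch 6.6 or newer):

```python
import requests

ES_URL = "http://localhost:9200"  # hypothetical Elasticsearch endpoint

# Delete indices 30 days after they enter the delete phase;
# the retention period here is only illustrative.
policy = {
    "policy": {
        "phases": {
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            }
        }
    }
}

resp = requests.put(f"{ES_URL}/_ilm/policy/clearml-events-cleanup", json=policy)
resp.raise_for_status()
print(resp.json())
```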
mmmh it fails, but if I connect to the instance and execute ulimit -n, I do see 65535, while the tasks I send to this agent fail with:
OSError: [Errno 24] Too many open files: '/root/.commons/images/aserfgh.png'
and from the task itself, I run:
import subprocess
print(subprocess.check_output("ulimit -n", shell=True))
which gives me in the logs of the task:
b'1024'
So nofile is still 1024, the default value, but not when I ssh, damn. Maybe rebooting would work
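For what it's worth, the same check (and a possible workaround) can also be done from inside the task with the standard resource module; raising the soft limit only works up to the hard limit the container was started with:

```python
import resource

# Soft/hard limits for the maximum number of open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile soft={soft} hard={hard}")

# Try to raise the soft limit to the hard limit (e.g. 1024 -> 65535,
# if the container's hard limit allows it)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print("new nofile:", resource.getrlimit(resource.RLIMIT_NOFILE))
```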
Could be also related to https://allegroai-trains.slack.com/archives/CTK20V944/p1597928652031300
ok, thanks SuccessfulKoala55 !
here is the function used to create the task:
` def schedule_task(parent_task: Task,
task_type: str = None,
entry_point: str = None,
force_requirements: List[str] = None,
queue_name="default",
working_dir: str = ".",
extra_params=None,
wait_for_status: bool = False,
raise_on_status: Iterable[Task.TaskStatusEnum] = (Task.TaskStatusEnum.failed, Task.Ta...
And I am wondering if only the main process (rank=0) should attach the ClearMLLogger or if all the processes within the node should do that
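In case it helps the discussion, this is the rank-0-only pattern I had in mind (a sketch using the standard torch.distributed rank check, with Task.init as a stand-in for attaching the ClearMLLogger; I don't know yet whether that is the recommended approach):

```python
import torch.distributed as dist
from clearml import Task

def is_rank_zero() -> bool:
    # Before/without distributed init, treat the process as rank 0
    return not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0

task = None
if is_rank_zero():
    # Only the main process reports to ClearML in this sketch;
    # the same guard would apply to attaching a ClearMLLogger.
    task = Task.init(project_name="examples", task_name="ddp training")
```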
Very good job! One note: in this version of the web-server, the experiment type logos are all blank; what was the reason for changing them? Having a color code in the logos helps a lot to quickly check the nature of the different experiment tasks, doesn't it?
Hi NonchalantHedgehong19, thanks for the hint! What should be the content of the requirements file then? Can I specify my local package inside? How?
I did change the replica setting on the same index, yes; I reverted it back from 1 to 0 afterwards
yes, what happens in the case of installation from pip wheel files?
Sure 🙂 Opened https://github.com/allegroai/clearml/issues/568
Alright, I have a follow-up question then: I used the param --user-folder "~/projects/my-project", but any change I make is not reflected in this folder. I guess I am in the docker space, but this folder is not linked to the folder on the machine. Is it possible to do so?