Alright, I am starting to get a better picture of this puzzle
But we can easily extend, right?
I don't think there is an example for this use case in the repo currently, but the code should be fairly simple (below is a rough draft of what it could look like)
` from clearml import Task
import time

controller_task = Task.init(...)
controller_task.execute_remotely(queue_name="services", clone=False, exit_process=True)
while True:
    # template_task_id and TRIGGER_TASK_INTERVAL_SECS are placeholders for your setup
    periodic_task = Task.clone(source_task=template_task_id)
    # Change parameters of periodic_task here if necessary
    Task.enqueue(periodic_task, queue_name="default")
    time.sleep(TRIGGER_TASK_INTERVAL_SECS) `
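The controller itself only clones, enqueues and sleeps, so it can live on the services queue; the cloned tasks will run on whatever workers are listening to the default queue.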
Also maybe we are not on the same page - by clean up, I mean kill a detached subprocess on the machine executing the agent
Sorry, I didn't get that
AgitatedDove14 Unfortunately no, I already had the problem before using the function, I added it hoping it would fix the issue but it didn't
btw I monkey patched ignite's function global_step_from_engine to print the iteration and passed the modified function to ClearMLLogger.attach_output_handler(…, global_step_transform=patched_global_step_from_engine(engine)). It prints the correct iteration number when calling ClearMLLogger.OutputHandler.__call__ .
` def __call__(self, engine: Engine, logger: ClearMLLogger, event_name: Union[str, Events]) -> None:
    if not isinstance(logger, ClearMLLogger):
        ... `
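Roughly, the patch looks like this (a minimal sketch, not my exact code - import paths for ClearMLLogger / global_step_from_engine may differ between ignite versions, and trainer stands in for my actual Engine):
` from ignite.engine import Engine, Events
from ignite.contrib.handlers.clearml_logger import ClearMLLogger
from ignite.contrib.handlers import global_step_from_engine

def patched_global_step_from_engine(engine: Engine):
    # Wrap ignite's original transform and print the step it returns
    original_transform = global_step_from_engine(engine)

    def wrapper(_engine, event_name):
        step = original_transform(_engine, event_name)
        print(f"global step passed to the OutputHandler: {step}")
        return step

    return wrapper

def train_step(engine, batch):
    return 0.0  # dummy loss, stands in for the real training step

trainer = Engine(train_step)
clearml_logger = ClearMLLogger(project_name="examples", task_name="ignite-step-debug")  # hypothetical names
clearml_logger.attach_output_handler(
    trainer,
    event_name=Events.ITERATION_COMPLETED,
    tag="training",
    output_transform=lambda loss: {"loss": loss},
    global_step_transform=patched_global_step_from_engine(trainer),
) `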
I ended up dropping omegaconf altogether
Otherwise I can try loading the file with a custom loader, saving it as a temp file, and passing that temp file to connect_configuration; it will return another temp file with the overwritten config, which I can then pass to OmegaConf
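i.e. something like this rough sketch (the loader and file names are placeholders; the only ClearML call I rely on is connect_configuration() returning a local copy of the possibly-overridden file):
` import tempfile
from clearml import Task
from omegaconf import OmegaConf

task = Task.init(project_name="examples", task_name="omegaconf-workaround")  # hypothetical names

# 1. Load the raw file (stands in for the custom loader) and dump it to a temp file
cfg = OmegaConf.load("config.yaml")
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as tmp:
    tmp.write(OmegaConf.to_yaml(cfg))
    local_copy = tmp.name

# 2. Let ClearML track (and, when running remotely, override) the configuration
overridden_path = task.connect_configuration(local_copy, name="OmegaConf")

# 3. Load the possibly-overridden file back into OmegaConf
cfg = OmegaConf.load(overridden_path)
print(OmegaConf.to_yaml(cfg)) `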
Interesting idea! (I assume for reporting only, not configuration)
Yes, for reporting only - also to understand which CUDA version the agent uses to decide which torch wheel to download
Regarding the CUDA check with nvcc , I'm not saying this is a perfect solution, I just mentioned that this is how it is currently done.
I'm actually not sure if there is an easy way to get it from the nvidia-smi interface, worth checking though ...
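For reference, both probes could look something like this (parsing is illustrative only; nvcc reports the toolkit version, while the nvidia-smi header reports the driver's supported CUDA version):
` import re
import subprocess
from typing import Optional

def cuda_version_from_nvcc() -> Optional[str]:
    # Toolkit version, e.g. "release 11.1" in the nvcc --version output
    try:
        out = subprocess.check_output(["nvcc", "--version"]).decode()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # nvcc not installed / not on PATH
    match = re.search(r"release (\d+\.\d+)", out)
    return match.group(1) if match else None

def cuda_version_from_nvidia_smi() -> Optional[str]:
    # Driver-supported CUDA version, e.g. "CUDA Version: 11.1" in the nvidia-smi header
    try:
        out = subprocess.check_output(["nvidia-smi"]).decode()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", out)
    return match.group(1) if match else None

print(cuda_version_from_nvcc(), cuda_version_from_nvidia_smi()) `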
Ok, but when nvcc is not ava...
If I manually call report_matplotlib_figure, yes. If I don't (just create the figure), no memory leak
Here is the data disk (/opt/clearml) on the left and the OS disk on the right
I've reindexed the data for the logs, now the mappings are correct but I am missing one month of data, I have literally no idea where this data is/how it disappeared
When can we expect the next self-hosted release, btw?
Yes, perfect!!
And after the update, the loss graph appears
So it could be that when restarting docker-compose, it used another volume, hence the loss of data
AgitatedDove14 https://clear.ml/docs/latest/docs/apps/clearml_session/#running-in-docker in the docs there is a --docker option, that's what confuses me, since the agent should always run in docker mode
I am using 0.17.5, it could be either a bug in ignite or indeed a delay on the send. I will try to build a simple reproducible example to understand the cause
My use case is: on a spot instance marked for termination by AWS (with 2 minutes' notice), I want to close the running task and prevent the clearml-agent from picking up a new task afterwards.
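Roughly what I have in mind (a sketch only - the metadata URL is the standard EC2 spot interruption notice, Task.mark_stopped() is the ClearML call I plan to use, and stopping the agent via clearml-agent daemon --stop is an assumption about how the agent was started):
` import subprocess
import time
import requests
from clearml import Task

# Standard EC2 spot interruption notice endpoint (404 until a termination is scheduled)
SPOT_NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

while True:
    try:
        notice = requests.get(SPOT_NOTICE_URL, timeout=2)
    except requests.RequestException:
        notice = None
    if notice is not None and notice.status_code == 200:
        task = Task.current_task()
        if task is not None:
            task.mark_stopped()  # close the running task gracefully
        # Assumption: the agent runs as a daemon on this machine; stop it so it
        # does not pull a new task before the instance is reclaimed
        subprocess.run(["clearml-agent", "daemon", "--stop"], check=False)
        break
    time.sleep(5) `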
Hi AgitatedDove14 , Here is the full log.
Both python versions (local and remote) are python 3.6. Locally (macOS), I get pytorch3d== (from versions: 0.0.1, 0.1.1, 0.2.0, 0.2.5, 0.3.0, 0.4.0, 0.5.0). Remotely (Ubuntu), I get (from versions: 0.0.1, 0.1.1, 0.2.0, 0.2.5, 0.3.0). So I guess it's not related to clearml-agent really, rather pip that cannot find a proper wheel for Ubuntu for the latest versions of pytorch3d, right? If yes, is there a way to build the wheel on the remote machine...
Probably 6. I think for some reason it did not go back to the main trains-agent. Nevertheless I am not sure, because a second task could start. It could also be that the second task was aborted for some reason while installing the task requirements (not the system requirements, so while executing the trains-agent setup within the docker container) and therefore again it couldn't go back to the main trains-agent. But ps -aux shows that the trains-agent is stuck running the first experiment, not the second...
I'll definitely check that out!
Hey FriendlySquid61 ,
I ended up asking for full control of EC2 so as not to be blocked, so unfortunately I cannot give you a more precise list
Thanks! I would like to use this opportunity to split the indices into multiple shards, as explained here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html#indices-split-index
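i.e. roughly this flow (index names are placeholders, and per that page the source index must be made read-only before the split):
` import requests

ES = "http://localhost:9200"  # placeholder for the ClearML Elasticsearch endpoint

# 1. Block writes on the source index (required before splitting)
requests.put(f"{ES}/events-log/_settings",
             json={"settings": {"index.blocks.write": True}})

# 2. Split it into a new index with more primary shards
requests.post(f"{ES}/events-log/_split/events-log-split",
              json={"settings": {"index.number_of_shards": 2}}) `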
ProxyDictPostWrite._to_dict() will recursively convert to a plain dict, and OmegaConf will not complain then
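For example (a minimal sketch - _to_dict() is an internal helper on the proxy object returned by Task.connect(), so this relies on the current ClearML implementation):
` from clearml import Task
from omegaconf import OmegaConf

task = Task.init(project_name="examples", task_name="proxy-dict")  # hypothetical names

# Task.connect() returns a ProxyDictPostWrite, which OmegaConf.create() may reject
params = task.connect({"lr": 0.001, "batch_size": 32})

# Converting it back to a plain nested dict first keeps OmegaConf happy
cfg = OmegaConf.create(params._to_dict())
print(OmegaConf.to_yaml(cfg)) `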