But you might want to double check
This is the mapping of the faulty index:
```
{
  "events-plot-d1bd92a3b039400cbafc60a7a5b1e52b_new" : {
    "mappings" : {
      "dynamic" : "strict",
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "iter" : {
          "type" : "long"
        },
        "metric" : {
          "type" : "keyword"
        },
        "plot_data" : {
          "type" : "binary"
        },
        "plot_len" : {
          "type" : "long"
        },
        "plot_str" : {
          ...
```
I see what I described in https://allegroai-trains.slack.com/archives/CTK20V944/p1598522409118300?thread_ts=1598521225.117200&cid=CTK20V944 :
randomly, one of the two experiments is shown for that agent
Awesome! (Broken link in migration guide, step 3: https://allegro.ai/docs/deploying_trains/trains_server_es7_migration/ )
MagnificentSeaurchin79 You could also just fork the tensorflow repo, make changes in a specific branch and specify your forked repo with your custom branch in the install_requires of your setup.py
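For example, a minimal setup.py sketch using a PEP 508 direct reference (the fork URL, branch and package names are placeholders, not something from your project):
```python
from setuptools import setup, find_packages

setup(
    name="my_package",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        # Placeholder: pip will install tensorflow from your fork's custom branch
        "tensorflow @ git+https://github.com/your-user/tensorflow.git@custom-branch",
    ],
)
```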
Both ^^, I already adapted the code for GCP and I was planning to adapt to Azure now
Ok thanks! And for this?
Would it be possible to support such a use case? (i.e. have the clearml-agent set up a different Python version when a task needs it?)
On /data or /opt/clearml? These are two different disks
And I can verify that ~/trains.conf exists in the su home folder
Not really: I just need to find the one that is compatible with torch==1.3.1
Also, maybe we are not on the same page - by "clean up", I mean killing a detached subprocess on the machine executing the agent
Hi CostlyOstrich36! No, I am running in venv mode
trains==0.16.4
I want to make sure that an agent has finished uploading its artifacts before marking itself as complete, so that the controller does not try to access these artifacts while they are not yet available
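For reference, a rough sketch of the kind of blocking I mean, assuming a clearml version where upload_artifact exposes wait_on_upload (the artifact name and file are placeholders):
```python
from clearml import Task

task = Task.current_task()

# Block until the artifact is actually uploaded, instead of returning
# while the upload keeps running in the background (placeholder name/file).
task.upload_artifact("results", artifact_object="results.csv", wait_on_upload=True)

# Alternatively, wait for all pending uploads before the task completes.
task.flush(wait_for_uploads=True)
```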
no, one worker (trains-agent-1) "forgets" the current experiment it is running from time to time and picks up another experiment on top of the one it is currently running
It worked like a charm 😱 Awesome, thanks AgitatedDove14!
I will probably just use an absolute path everywhere to be robust against different machine user accounts: /home/user/trains.conf
AgitatedDove14 Unfortunately no, I already had the problem before using the function; I added it hoping it would fix the issue, but it didn't
trains-elastic container fails with the following error:
I assume you’re using a self-hosted server?
Yes
Would adding an ILM (index lifecycle management) policy be an appropriate solution?
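If it helps, a rough sketch of what I have in mind, using the standard Elasticsearch 7.x ILM API from Python (host, policy name, index pattern and the 30-day retention are all placeholders):
```python
import requests

ES = "http://localhost:9200"  # placeholder Elasticsearch host

# ILM policy that deletes indices once they are older than 30 days
policy = {
    "policy": {
        "phases": {
            "delete": {"min_age": "30d", "actions": {"delete": {}}}
        }
    }
}
requests.put(f"{ES}/_ilm/policy/events-cleanup", json=policy)

# Attach the policy to new event indices via a (legacy) index template
template = {
    "index_patterns": ["events-*"],
    "settings": {"index.lifecycle.name": "events-cleanup"},
}
requests.put(f"{ES}/_template/events-cleanup-template", json=template)
```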
Ok, but that means this cleanup code should live somewhere other than inside the task itself, right? Otherwise it won't be executed, since the task will be killed
Never mind, the nvidia-smi command fails on that instance; the problem lies somewhere else
This works well when I run the agent in virtualenv mode (removing --docker)
I think waiting for the apt locks to be released with something like this would work:
```python
startup_bash_script = [
    "#!/bin/bash",
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done",
    "sudo apt-get update",
    ...
```
Weirdly this throws an error in the autoscaler:
```
Spinning new instance type=v100_spot
Error: Failed to start new instance, unexpected '{' in field...
```
I guess I can have a workaround by passing the pipeline controller task id to the last step, so that the last step can download all the artifacts from the controller task.
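Something like this minimal sketch, assuming the controller task id is passed to the last step (function and parameter names are illustrative):
```python
from clearml import Task

def last_step(controller_task_id: str):
    # Fetch the pipeline controller task by id and pull its artifacts locally
    controller = Task.get_task(task_id=controller_task_id)
    for name, artifact in controller.artifacts.items():
        local_path = artifact.get_local_copy()
        print(f"Downloaded artifact '{name}' to {local_path}")
```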
AgitatedDove14, my "uncommitted changes" ends with:
```python
if __name__ == "__main__":
    task = clearml.Task.get_task(clearml.config.get_remote_task_id())
    task.connect(config)
    run()
from clearml import Task
Task.init()
```
So in my minimal reproducible example, it does work 🤣 Very frustrating; I will continue searching for that nasty bug