ExcitedFish86 I have several machines with different CUDA driver/runtime versions, that is why you might be confused, as I am referring to one or another 🙂
I also did run sudo apt install nvidia-cuda-toolkit
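For reference, a quick way to see which versions a given machine actually has (assuming PyTorch and the NVIDIA driver are installed; this snippet is illustrative, not from the thread):

import subprocess

import torch

# The CUDA runtime PyTorch was built against and the CUDA version the driver
# reports can legitimately differ from machine to machine.
print("torch CUDA runtime:", torch.version.cuda)        # e.g. "10.0"
print("cuDNN:", torch.backends.cudnn.version())
smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
print(next((line for line in smi.splitlines() if "CUDA Version" in line), "CUDA Version line not found"))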
AgitatedDove14 After investigation, another program on the machine had consumed all the available memory, most likely causing the OS to kill the agent/task
Downloading the artifacts is done only when actually calling get()/get_local_copy()
Yes, I rather meant: reproduce this behavior even for getting metadata on the artifacts 🙂
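For reference, a minimal sketch of the lazy-download behavior described above (the task ID and artifact name are placeholders):

from clearml import Task

task = Task.get_task(task_id="<remote-task-id>")   # placeholder ID
artifact = task.artifacts["my_artifact"]           # metadata only, nothing downloaded yet
print(artifact.url, artifact.size)                 # still only metadata
local_path = artifact.get_local_copy()             # the actual download happens here
obj = artifact.get()                               # or deserialize straight into memory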
I specified torch @ https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl and it didn't detect the link; it tried to install the latest version instead: 1.6.0
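One possible workaround to try (hedged: whether Task.add_requirements accepts a full direct-URL reference like this is an assumption, not something confirmed in this thread):

from clearml import Task

# Force the exact wheel into the recorded requirements before Task.init(),
# so the agent installs this link instead of resolving the latest torch.
Task.add_requirements(
    "torch",
    "@ https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl",
)
task = Task.init(project_name="examples", task_name="pinned-torch")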
You mean "docker" was not installed and it did not throw an error?
Yes, docker was not installed on the machine
Yes, you must make sure the docker container can mount a persistent folder for you to work on.
Ok, it would be nice to have a --user-folder-mounted option that does the linking automatically
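In the meantime, a hedged sketch of one way to get a persistent folder mounted when the task runs in docker mode (image name and paths are placeholders; older clearml SDK versions take a single docker command string here):

from clearml import Task

task = Task.init(project_name="examples", task_name="docker-mount")
# Append "-v host:container" mounts to the base docker command the agent will use.
task.set_base_docker("nvidia/cuda:10.1-runtime-ubuntu18.04 -v /home/user/persistent:/persistent")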
I actually need to be able to overwrite files, so in my case it makes sense to grant the DeleteObject permission in S3. But for other cases, why not simply catch this error, display a warning to the user, and store internally that delete is not possible?
python3 -m trains_agent --config-file "~/trains.conf" daemon --queue default --log-level DEBUG --detached --gpus 1 > ~/trains-agent.startup.log 2>&1
UnevenDolphin73, task = clearml.Task.get_task(clearml.config.get_remote_task_id()) worked, thanks
and this works. However, without the trick from UnevenDolphin73, the following won't work (returns None):

from clearml import Task

Task.init()

if __name__ == "__main__":
    task = Task.current_task()
    task.connect(config)
    run()
Actually I think I am approaching the problem from the wrong angle
You mean to run it on the CI machine?
yes
That should not happen, no? Maybe there is a bug that needs fixing in clearml-agent?
It's just to test that the logic executed inside if not Task.running_locally() is correct
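For context, the pattern under test looks roughly like this (sketch only; project/task names and the config are placeholders):

import clearml
from clearml import Task

task = Task.init(project_name="examples", task_name="ci-check")
config = {"lr": 0.001}

if not Task.running_locally():
    # Executing under an agent (e.g. in CI): reattach to the remote task
    task = Task.get_task(clearml.config.get_remote_task_id())

task.connect(config)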
In my GitHub Action, I should just have a dummy ClearML server and run the task there, connecting to that dummy server
Yes, I guess that's fine then - Thanks!
I’ll definitely check that out! 🤩
Oh wow! Is it possible to not specify a remote task? (If I am working with Task.set_offline(True))
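For reference, a minimal sketch of offline mode (the session zip path at the end is a placeholder; the exact location depends on your setup):

from clearml import Task

Task.set_offline(offline_mode=True)      # nothing is sent to a server
task = Task.init(project_name="examples", task_name="offline-run")
task.connect({"batch_size": 32})
task.close()
# Later, from a machine that can reach the server:
# Task.import_offline_session("<path-to-offline-session>.zip")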
The instances take so much time to start, like 5 minutes
btw, in the pytorch_distributed_example I see that you call average_gradients,
but the PyTorch docs ( https://pytorch.org/tutorials/beginner/dist_overview.html ) say: DDP takes care of gradient communication to keep model replicas synchronized and overlaps it with the gradient computations to speed up training.
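For comparison, a rough sketch of the two approaches (manual averaging as in the example vs. letting DDP handle it):

import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def average_gradients(model):
    # Manual all-reduce of every gradient, which is what the example does by hand.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size

# With DDP the manual step above is unnecessary: after
#   model = DDP(model)   # once dist.init_process_group(...) has been called
# gradient communication is overlapped with backward() automatically.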
Hi AgitatedDove14, how should we proceed to fix this bug? Should I open an issue on GitHub? Should I try to make a minimal reproducible example? It's blocking me atm