Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
JitteryCoyote63
Moderator
214 Questions, 1021 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation

0

Badges 1

979 × Eureka!
0 Votes
30 Answers
1K Views
0 Votes 30 Answers 1K Views
Hi guys, with the new venv caching available in clearml, I have the following problem: I force my pip requirements to be: torch==1.7.1 pytorch-ignite clearml...
3 years ago
0 Votes
7 Answers
958 Views
0 Votes 7 Answers 958 Views
Hi, is there a way to get some stats about the use of workers? I would like to know, over the past 3 months: Number of training hours per user Number of trai...
3 years ago
0 Votes
19 Answers
1K Views
0 Votes 19 Answers 1K Views
I guess one experiment is running backwards in time 😄
2 years ago
0 Votes
5 Answers
989 Views
0 Votes 5 Answers 989 Views
Hey there, since which version, clearml stops connecting to the demo server by default?
3 years ago
0 Votes
30 Answers
1K Views
0 Votes 30 Answers 1K Views
Hi, in one of my agents with CUDA Version: 11.1 (from nvidia-smi), clearml agent 0.17.1 detects version 100 (I can see from experiments logs: agent.cuda_vers...
3 years ago
0 Votes
18 Answers
979 Views
0 Votes 18 Answers 979 Views
Hello there, I would like to do run cleanup code in case the user aborts one task from the dashboard (the agent is not using the task in docker). What signal...
4 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
Hi again, it seems like the aws autoscaler is not spinning instances with the EBS configuration I configured. Here is the configuration: resource_configurati...
3 years ago
0 Votes
30 Answers
1K Views
0 Votes 30 Answers 1K Views
3 years ago
0 Votes
30 Answers
1K Views
0 Votes 30 Answers 1K Views
Hi again, my clearml api-server is having a memory leak. Each time I restart it, its ram consumption grows until getting OOM, is not killed and make the ec2 ...
3 years ago
0 Votes
4 Answers
1K Views
0 Votes 4 Answers 1K Views
Hi there, I think there is a bug with clearml sdk v0.17.5rc2: when running a task locally, the dashboard doesnt not shows the task as finished once the task ...
3 years ago
0 Votes
8 Answers
1K Views
0 Votes 8 Answers 1K Views
Hi, I would like to create backups of my trains-server periodically. I was thinking about creating a service task under the devops project. The backup task w...
3 years ago
0 Votes
12 Answers
1K Views
0 Votes 12 Answers 1K Views
Hi there! Is there an easy way to retrieve the site-package directory that was created by an agent from inside a task? Eg. task = Task.init(...) task.add_req...
2 years ago
0 Votes
18 Answers
1K Views
0 Votes 18 Answers 1K Views
Hi, I just updated clearml server 1.0 using docker-compose down & docker-compose pull & docker-compose up -d , it worked ant it looks amazing! I found two pr...
3 years ago
0 Votes
1 Answers
975 Views
0 Votes 1 Answers 975 Views
Hi, would it be possible to parse torch requirement when it’s part of the extras_require dict? In my code, I have the following: train_task._update_requireme...
3 years ago
0 Votes
4 Answers
1K Views
0 Votes 4 Answers 1K Views
Hi, are the experiments logs stored in s3 or in the trains-server? (When using s3 as artifact storage)
3 years ago
0 Votes
11 Answers
1K Views
0 Votes 11 Answers 1K Views
Hey, I moved my trains-server to another machine, zipping the /opt/trains/data folder as described in the docs https://allegro.ai/docs/deploying_trains/train...
4 years ago
0 Votes
22 Answers
1K Views
0 Votes 22 Answers 1K Views
Hi there, I used clearml-task to send a script to be executed remotely. When being executed remotely Task.current_task() returns None, how should I get the c...
2 years ago
0 Votes
15 Answers
1K Views
0 Votes 15 Answers 1K Views
Hi, I restarted my clearml-server (1.1.0) and the login page always redirects me to the login page. I am using fixed users in config files. In the logs of th...
3 years ago
0 Votes
6 Answers
1K Views
0 Votes 6 Answers 1K Views
Hi, I am using the aws autoscaler and getting the following error while trying to spin up spot instances: 2021-08-16 17:18:48 Spinning new instance type=v100...
3 years ago
0 Votes
3 Answers
943 Views
0 Votes 3 Answers 943 Views
Hey there, I see that in the autoscaler configuration, the queues param accept dictionaries with values of type list of lists (see eg below.) What does it me...
3 years ago
0 Votes
20 Answers
1K Views
0 Votes 20 Answers 1K Views
Is it possible to run an agent, listen to the services queue without using docker?
4 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Hi there, congrats for releasing v1 😄 I observed that with pytorch ignite (4.2.0), the metrics of the validation engines are delayed by one epoch. I am not ...
3 years ago
0 Votes
30 Answers
1K Views
0 Votes 30 Answers 1K Views
Hello, I am getting ValueError: Could not get access credentials for ' s3://my-bucket ' , check configuration file ~/trains.conf but I did specify them in my...
4 years ago
0 Votes
30 Answers
1K Views
0 Votes 30 Answers 1K Views
Hi, I am getting the following errors in the experiments I am currently running: 2021-06-25 17:11:47,911 - clearml.Metrics - ERROR - Action failed <504/0: ev...
3 years ago
0 Votes
16 Answers
1K Views
0 Votes 16 Answers 1K Views
Hi guys, coming this time to share an idea of a killer feature for ClearML 🚀 I am pretty sure you guys already heard of https://www.streamlit.io/ , which is...
3 years ago
0 Votes
29 Answers
1K Views
0 Votes 29 Answers 1K Views
Hi, although https://github.com/allegroai/clearml/issues/181 is resolved, clearml-agent (0.17.2) still logs tqdm iterations as different lines, is there some...
3 years ago
0 Votes
1 Answers
1K Views
0 Votes 1 Answers 1K Views
2 years ago
0 Votes
30 Answers
943 Views
0 Votes 30 Answers 943 Views
Hi, if I am starting my training with the following command: python -u -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --config configs/tra...
3 years ago
0 Votes
5 Answers
980 Views
0 Votes 5 Answers 980 Views
Hi there! I have a question regarding s3 access: I created a s3 user with read/write access but not delete, and trains seems to requires delete permissions (...
4 years ago
0 Votes
14 Answers
1K Views
0 Votes 14 Answers 1K Views
3 years ago
Show more results questions
0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

ExcitedFish86 I have several machines with different cuda driver/runtime versions, that I why you might be confused as I am referring to one or another 🙂

3 years ago
0 Hi, I Have Several Long Running Experiments Failing With

AgitatedDove14 After investigation, another program on the machine consumed all the memory available, most likely making the OS killing the agent/task

3 years ago
0 Hi, I Have Another Problem

I just started one and it wrote:
...

4 years ago
0 Hi, I Have Another Problem

thanks, I will do that

4 years ago
0 Hello, In The Following Context:

Downloading the artifacts is done only when actually calling get()/get_local_copy()

Yes, I rather meant: reproduce this behavior even for getting metadata on the artifacts 🙂

4 years ago
0 Hi, I Have Another Problem

I specified a torch @ https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl and it didn't detect the link, it tried to install latest version: 1.6.0

4 years ago
0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

 you mean “docker” was not installed and it did not throw an error ?

Yes docker was not installed in the machine

Yes you must make sure the docker can mount a persistent folder for you to work on.

Ok, it would be nice to have a --user-folder-mounted that do the linking automatically

3 years ago
0 Hi, I Have Another Problem

OK but nowhere I specified that, I just checked my trains.conf file

4 years ago
0 Hi, I Have Another Problem

ho, that might be it then, thanks!

4 years ago
0 Hi There! I Have A Question Regarding S3 Access: I Created A S3 User With Read/Write Access But Not Delete, And Trains Seems To Requires Delete Permissions (See Errors Below). Why Does It Need Delete Permissions?

I actually need to be able to overwrite files, so in my case it makes sense to give the Deleteobject permission in s3. But for other cases, why not simply catch this error, display a warning to the user and store internally that delete is not possible?

4 years ago
0 Hi, I Have Another Problem

yes, I use --gpus parameter

4 years ago
0 Hi, I Have Another Problem

python3 -m trains_agent --config-file "~/trains.conf" daemon --queue default --log-level DEBUG --detached --gpus 1 > ~/trains-agent.startup.log 2>&1

4 years ago
0 Hi There, I Used

UnevenDolphin73 , task = clearml.Task.get_task(clearml.config.get_remote_task_id()) worked, thanks

2 years ago
0 Hi There, I Used

and this works. However, without the trick from UnevenDolphin73 , the following won’t work (return None):
if __name__ == "__main__": task = Task.current_task() task.connect(config) run() from clearml import Task Task.init()

2 years ago
2 years ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

you mean to run it on the CI machine ?

yes

That should not happen, no? Maybe there is a bug that needs fixing on clearml-agent ?

It just to test that the logic being executed in if not Task.running_locally() is correct

2 years ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

In my github action, I should just have a dummy clearml server and run the task there, connecting to this dummy clearml server

2 years ago
0 Hi, Coming Back With The Venv Caching: With The Following Setting:

Yes, I guess that's fine then - Thanks!

3 years ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

Ho wow! is it possible to not specify a remote task? (If i am working with Task.set_offline(True))

2 years ago
0 Hi, If I Am Starting My Training With The Following Command:

btw I see in the pytorch_distributed_example I see that you average_gradients , but from pytorch https://pytorch.org/tutorials/beginner/dist_overview.html it says:
DDP takes care of gradient communication to keep model replicas synchronized and overlaps it with the gradient computations to speed up training.

3 years ago
0 Hi, If I Am Starting My Training With The Following Command:

Hi AgitatedDove14 , How should we proceed to fix this bug? Should I open an issue in github? Should I try to make a minimal reproducible example? It’s blocking me atm

3 years ago
Show more results compactanswers