Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
JitteryCoyote63
Moderator
215 Questions, 1023 Answers
  Active since 10 January 2023
  Last activity 3 months ago

Reputation

0

Badges 1

981 × Eureka!
0 Votes
5 Answers
2K Views
0 Votes 5 Answers 2K Views
Quick question: How can I clone a task and change the cloned task type? I see no Task.set_type() function
5 years ago
0 Votes
10 Answers
2K Views
0 Votes 10 Answers 2K Views
Hi guys, any plan to integrate the https://github.com/allegroai/trains-agent/blob/master/examples/dynamic_cloud_cluster.ipynb in trains-server? The code ther...
5 years ago
0 Votes
13 Answers
2K Views
0 Votes 13 Answers 2K Views
Hey there, Is it possible for a clearml pipeline step to log a folder instead of numpy/pickle objects? Looking at the docs, monitor_artifacts could be what I...
3 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi, in a subproject, would it be possible to hide the parent project if it is empty?
3 years ago
0 Votes
5 Answers
2K Views
0 Votes 5 Answers 2K Views
Hi, I have a long running experiment that was running on AWS instance that got killed after ~4 days with the following reason: STATUS REASON: Forced stop (no...
3 years ago
0 Votes
5 Answers
2K Views
0 Votes 5 Answers 2K Views
Does trains 0.16 supports pip >=20.2?
5 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi, I see that there is a new parameter in aws autoscaler: max_spin_up_time_min - What is the difference with max_idle_time_min ?
aws
4 years ago
0 Votes
6 Answers
2K Views
0 Votes 6 Answers 2K Views
2 years ago
0 Votes
4 Answers
2K Views
0 Votes 4 Answers 2K Views
Hi, are the experiments logs stored in s3 or in the trains-server? (When using s3 as artifact storage)
4 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Hi, is it possible to start a clearml-agent (not in docker mode) on a machine with a gpu, but enforce the clearml-agent to not “see” the gpu? So that the exp...
4 years ago
0 Votes
23 Answers
2K Views
0 Votes 23 Answers 2K Views
Hi, I started a trains-agent (0.15) in services mode (full command: trains-agent daemon --services-mode --detached --queue services --create-queue --docker u...
5 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Hey there šŸ™‚ Still my journey to deploy the aws-autoscaler with spot instances, I have another question: I would like to limit the amount of time spent setti...
4 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
hi guys, is it possible to spin up two agents on one GPU? Something like trains-agent daemon --gpus 0 --queue default & trains-agent daemon --gpus 0 --queue ...
4 years ago
0 Votes
5 Answers
2K Views
0 Votes 5 Answers 2K Views
Hi again, it seems like the aws autoscaler is not spinning instances with the EBS configuration I configured. Here is the configuration: resource_configurati...
4 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Hi, where can I find the logs of trains-agent by default?
5 years ago
0 Votes
7 Answers
2K Views
0 Votes 7 Answers 2K Views
Hi, I deleted all archived experiments in a project and I just realized all experiments of all projects were deleted (clearml server v1.0.0) šŸ¤”
4 years ago
0 Votes
30 Answers
2K Views
0 Votes 30 Answers 2K Views
Hi, is it possible to pass environment variables to agents created by the AWS AutoScaler service?
4 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
āš ļø Hi there, I recently updated clearml server to 1.7.0, and found the following critical regression: When I reset an experiment, it is actually deleted 😵 ,...
2 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? or should make sure only the first process calls it first, then others
3 years ago
0 Votes
8 Answers
2K Views
0 Votes 8 Answers 2K Views
4 years ago
0 Votes
5 Answers
2K Views
0 Votes 5 Answers 2K Views
Hey there, since which version, clearml stops connecting to the demo server by default?
4 years ago
0 Votes
4 Answers
2K Views
0 Votes 4 Answers 2K Views
Hi guys, I got a very unexpected error today on in one of my agents: ... Collecting tqdm Using cached tqdm-4.48.2-py2.py3-none-any.whl (68 kB) Processing /ro...
5 years ago
0 Votes
23 Answers
2K Views
0 Votes 23 Answers 2K Views
Hi, I would like to bring awareness on this issue , this impacts my work as I cannot install the older version of torch (1.11.0)
2 years ago
0 Votes
0 Answers
2K Views
0 Votes 0 Answers 2K Views
Hi, I encountered a bug on clearml-server 1.0.1: I tried to add in a project page a custom column in +HYPER PARAMETERS > Args > queue and got an error pop up...
4 years ago
0 Votes
0 Answers
2K Views
0 Votes 0 Answers 2K Views
Hi all, Would it be possible to make the aws autoscaler log each scale in/out operation in the console to help debugging/understanding the course of events?
4 years ago
0 Votes
25 Answers
2K Views
0 Votes 25 Answers 2K Views
Hi, I have another problem šŸ˜… in one of my agent, one experiment started without torch using GPU. In the logs of the experiment shared below, we can see that...
5 years ago
0 Votes
16 Answers
2K Views
0 Votes 16 Answers 2K Views
Got some errors while running migration script from ES5 to ES7: 2020-08-11 15:21:50,130 Running on: Linux 2020-08-11 15:21:50,227 Docker allocated memory: 16...
5 years ago
0 Votes
5 Answers
2K Views
0 Votes 5 Answers 2K Views
Hi there, I would like to report a bug with the resizing of the columns in the projects view: it doesn’t work as expected. Please look at the behavior of the...
4 years ago
0 Votes
26 Answers
2K Views
0 Votes 26 Answers 2K Views
Hi, I would like to follow-up in this https://clearml.slack.com/archives/CTK20V944/p1646123127790389 happening on clearml server 1.2.0 (self hosted on a sing...
3 years ago
0 Votes
4 Answers
2K Views
0 Votes 4 Answers 2K Views
The “Manage queue” option in the right tab on a queued experiment is broken in v1.0 (it does nothing)
4 years ago
Show more results questions
0 Got Some Errors While Running Migration Script From Es5 To Es7:

Thanks! Unfortunately still not working, here is the log file:

5 years ago
0 Hello, I Have Some Problems With Allegro. I Run A Programm And Then I Saw It On The Trains Server. But Now I Change Something With The Code And I Pushed It Again. Now I Cloned It. But The Old Code Was Executed. How Can I Run The New Code I Pushed?

Make sure the cloned task is in Draft mode, if not, reset it
Then in the Execution tab of th task, in the Source Code section (first one), you can edit the values

4 years ago
0 Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

AgitatedDove14 Yes with the command you shared I can now ssh again manually to the agent, but I still clearml-agent will raise the same error

3 years ago
0 Hi There, I Have A Problem With Pyjwt: I Am Using

hooo now I understand, thanks for clarifying AgitatedDove14 !

4 years ago
0 Hello, ~3 Months Ago I Created A Trains-Server In A Machine With 30Gb Of Disk Space. Today I Wasn'T Able To Connect To Trains-Server, So I Checked The Server And Found That The Disk Full. I Ran:

Stopping the server Editing the docker-compose.yml file, adding the logging section to all services Restarting the serverDocker-compose freed 10Go of logs

4 years ago
0 Does Trains 0.16 Supports Pip >=20.2?

Yes, but a minor one. I would need to do more experiments to understand what is going on with pip skipping some packages but reinstalling others.

5 years ago
0 Trains-Elastic | {"Type": "Server", "Timestamp": "2020-12-07T15:19:11,101Z", "Level": "Error", "Component": "O.E.B.Elasticsearchuncaughtexceptionhandler", "Cluster.Name": "Trains", "Node.Name": "Trains", "Message": "Uncaught Exception In Thread [Main]",

it worked for the other folder, so I assume yes --> I archived the /opt/trains/data/mongo, sent the archive via scp, unarchived, updated the rights and now it works

4 years ago
0 Hi There, I Used

AgitatedDove14 So I copied pasted locally the https://github.com/pytorch-ignite/examples/blob/main/tutorials/intermediate/cifar10-distributed.py from the examples of pytorch-ignite. Then I added a requirements.txt and called clearml-task to run it on one of my agents. I adapted a bit the script (removed python-fire since it’s not yet supported by clearml).

3 years ago
0 Hi There, I Used

and this works. However, without the trick from UnevenDolphin73 , the following won’t work (return None):
if __name__ == "__main__": task = Task.current_task() task.connect(config) run() from clearml import Task Task.init()

3 years ago
0 Hey There, I Would Like To Increase The

So actually I don’t need to play with this limit, I am OK with the default for now

4 years ago
0 Hey There, I Would Like To Increase The

it actually looks like I don’t need such a high number of files opened at the same time

4 years ago
0 Hey There, I Would Like To Increase The

yes please, I think indeed that’s the problen

4 years ago
0 Hey There, I Would Like To Increase The

I will try adding
sudo sh -c "echo '\n* soft nofile 65535\n* hard nofile 65535' >> /etc/security/limits.conf"to the extra_vm_bash_script , maybe that’s enough actually

4 years ago
0 Hi There, I Used

AgitatedDove14 , my ā€œuncommitted changesā€ ends with
... if __name__ == "__main__": task = clearml.Task.get_task(clearml.config.get_remote_task_id()) task.connect(config) run() from clearml import Task Task.init()

3 years ago
0 Hey There, I Would Like To Increase The

mmmh it fails, but if I connect to the instance and execute ulimit -n , I do see
65535while the tasks I send to this agent fail with:
OSError: [Errno 24] Too many open files: '/root/.commons/images/aserfgh.png'and from the task itself, I run:
import subprocess print(subprocess.check_output("ulimit -n", shell=True))Which gives me in the logs of the task:
b'1024'So nnofiles is still 1024, the default value, but not when I ssh, damn. Maybe rebooting would work

4 years ago
0 Hi There

AgitatedDove14 I cannot confirm at 100%, the context is different (see previous messages) but it could be the same bug behind the scene...

5 years ago
0 Hi There

What is weird is:
Executing the task from an agent: task.get_parameters() returns an empty dict Calling task.get_parameters() from a local standalone script returns the correct properties, as shown in web UI, even if I updated them in UI.So I guess the problem comes from trains-agent?

5 years ago
0 Hi There

More context:
trains, trains-agent and trains-server all 0.16 Session.api_version -> 2.9 (both when executed in trains-agent and in local script)

5 years ago
0 Hi There

So in my minimal reproducable example, it does work 🤣 very frustrating, I will continue searching for that nasty bug

5 years ago
0 Hi There

I just read, I do have the trains version 0.16 and the experiment is created with that version

5 years ago
0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

ExcitedFish86 I have several machines with different cuda driver/runtime versions, that I why you might be confused as I am referring to one or another šŸ™‚

4 years ago
0 Hi There

As to why: This is part of the piping that I described in a previous message: Task B requires an artifact from task A, so I pass the name of the artifact as a parameter of task B, so that B knows what artifact from A it should retrieve

5 years ago
0 Hey There, I Would Like To Increase The

that works from within the ssh session

4 years ago
0 Hi There, I Used

AgitatedDove14 No, should I?

3 years ago
0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

(btw, yes I adapted to use Task.init(...output_uri=)

5 years ago
Show more results compactanswers