JitteryCoyote63
Moderator
215 Questions, 1023 Answers
  Active since 10 January 2023
  Last activity one month ago

0 Votes 3 Answers 1K Views
Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? Or should I make sure only the first process calls it first, then the others?
3 years ago
0 Votes 3 Answers 1K Views
Hi, I am getting an error while running task.mark_stopped(), any idea why? (clearml 1.0.2, clearml-agent 1.0.0, python 3.6) File "/home/machine/.clearml/ven...
4 years ago
0 Votes 5 Answers 2K Views
2 years ago
0 Votes 7 Answers 2K Views
3 years ago
0 Votes 6 Answers 2K Views
4 years ago
0 Votes 10 Answers 2K Views
Hey, what is the exact difference between agent.package_manager.system_site_packages and trains-agent --install-globally ?
5 years ago
0 Votes 8 Answers 2K Views
Hi guys, does a Task update its status to 'Complete' before it finishes uploading its artifacts/metrics in the background?
4 years ago
0 Votes 5 Answers 1K Views
Hi, is it possible to disable some of the system metrics monitored? and also downsample the rate of logging?
4 years ago
0 Votes 30 Answers 2K Views
Hello, I am getting ValueError: Could not get access credentials for 's3://my-bucket', check configuration file ~/trains.conf but I did specify them in my...
4 years ago
0 Votes 30 Answers 2K Views
4 years ago
0 Votes 30 Answers 2K Views
Hi, in one of my agents with CUDA Version: 11.1 (from nvidia-smi), clearml agent 0.17.1 detects version 100 (I can see from experiments logs: agent.cuda_vers...
4 years ago
0 Votes 6 Answers 1K Views
Hi there, is it possible to configure the clearml-agent to run some commands before running each experiment it launches? Eg. echo "test" > "test.txt" && <-- ...
3 years ago
0 Votes 3 Answers 1K Views
Hi, is clearml-server compatible with latest versions of ES ( > 7.6.2)?
4 years ago
0 Votes 6 Answers 2K Views
Hi, is it possible to specify the required version of python for a Task that is different from the python running the clearml-agent? Example: my clearml-agen...
2 years ago
0 Votes 12 Answers 2K Views
2 years ago
0 Votes 3 Answers 2K Views
aws
3 years ago
0 Votes 6 Answers 2K Views
Hi there, maybe this was already asked but I don't remember: Would it be possible to have the clearml-agent switch between docker mode and virtualenv mode at...
2 years ago
0 Votes 18 Answers 2K Views
Hi, kudos for the 0.15 guys! I am having an issue related to git auth: I have an issue with trains-agent (0.15): it does not use git creds while trying to cl...
5 years ago
0 Votes 3 Answers 2K Views
2 years ago
0 Votes 4 Answers 1K Views
Hey, I would like my experiment to call at some point a CLI program installed as a dependency of the experiment. Here is what I do: myTask = Task.init(...) i...
4 years ago
0 Votes 14 Answers 2K Views
3 years ago
0 Votes 3 Answers 195 Views
one month ago
0 Votes 14 Answers 2K Views
4 years ago
0 Votes 18 Answers 2K Views
Hey there, I would like to increase the ulimit for the number of files opened at the same time in a ec2 instance. According to this https://stackoverflow.com...
4 years ago
0 Votes 5 Answers 2K Views
3 years ago
0 Votes 16 Answers 1K Views
Hey, I have a problem with the following task: def main(args): config = yaml.load(open(args.config)) if __name__ == '__main__': parser = argparse.ArgumentPar...
4 years ago
0 Votes 5 Answers 2K Views
aws
3 years ago
0 Votes 0 Answers 2K Views
Hi all, Would it be possible to make the aws autoscaler log each scale in/out operation in the console to help debugging/understanding the course of events?
4 years ago
0 Votes 30 Answers 2K Views
Hi again, my clearml api-server is having a memory leak. Each time I restart it, its ram consumption grows until getting OOM, is not killed and make the ec2 ...
4 years ago
0 Votes 5 Answers 2K Views
Hi, I have an error with clearml-agent 1.5.1 when importing tensorflow 2.10 from tensorflow.python.client._pywrap_tf_session import * File "/root/.clearml/ve...
2 years ago
0 Hi, One More Question: When Creating A Task With Task.Init(), We Can Specify The

I still don't see why you would change the type of the cloned Task, I'm assuming the original Task had the correct type, no?

Because it is easier for me that I create a training task out of the controller task by cloning it (so that parameters are prefilled and I can set the parent task id)

5 years ago
0 Hi, In The Aws Autoscaler, Is It Possible To Specify Multiple Regions (Availability_Zone)? I Currently Use Eu-West-1A, And Would Like To Start Using Eu-West-1B And Eu-West-1C. I Tried Specifying A List In Availability_Zone Parameter, But Without Success:

yea I just realized that you would also need to specify different subnets, etc… not sure how easy it is 😞 But it would be very valuable, on-demand GPU instances are so hard to spin up nowadays in aws 😄

3 years ago
0 Hi There,

Update: I successfully isolated one of the reasons, a memory leak in matplotlib itself. I opened an issue on their repo here

2 years ago
5 years ago
0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

I'll try to pass these values using the env vars

4 years ago
0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

From my experience, I only installed cuda drivers on my machines. I didn't use conda to install torch or cudatoolkit, I just let clearml-agent download the torch wheel file and install it

4 years ago
0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

but I also make sure to write the trains.conf to the root directory in this bash script:
echo "
sdk.aws.s3.key = ***
sdk.aws.s3.secret = ***
" > ~/trains.conf
...
python3 -m trains_agent --config-file "~/trains.conf"
...

4 years ago
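One detail worth checking in a script like the one above: a POSIX shell does not expand `~` inside double quotes, so the program receives the literal string `~/trains.conf`. Whether trains_agent expands it internally is not confirmed here, so this is only a sketch of what the argument looks like and how a program would have to resolve it. The helper name `config_arg_from_command` is hypothetical.

```python
import os
import shlex


def config_arg_from_command(command: str) -> str:
    """Return the value that follows --config-file, as the program would receive it."""
    # shlex.split strips the quotes like a POSIX shell would. A shell also leaves
    # a quoted "~" untouched, so the literal tilde reaches the program.
    argv = shlex.split(command)
    return argv[argv.index("--config-file") + 1]


command = 'python3 -m trains_agent --config-file "~/trains.conf"'
config_arg = config_arg_from_command(command)

# The program has to expand the tilde itself to get a usable path.
resolved = os.path.expanduser(config_arg)
print(config_arg, "->", resolved)
```

If the tilde turns out to be the problem, writing `--config-file ~/trains.conf` without quotes, or using `"$HOME/trains.conf"`, lets the shell do the expansion.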
0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

After some investigation, I think it could come from the way you catch errors when checking the creds in trains.conf: when I passed the aws creds using env vars, another error popped up: https://github.com/boto/botocore/issues/2187 , linked to boto3

4 years ago
0 Is It Possible To Run An Agent, Listen To The Services Queue Without Using Docker?

Alright, so the steps would be:

trains-agent build --docker nvidia/cuda --id myTaskId --target base_env_services

That would create a base docker image, base_env_services. Then how should I ensure that trains-agent uses that base image for the services queue? My guess is:

trains-agent daemon --services-mode --detached --queue services --create-queue --docker base_env_services --cpu-only

Would that work?

5 years ago
0 Hi, In The Context Of Multi-Gpu Training, Is

if I want to resume a training on multi gpu, I will need to call this function on each process to send the weights to each gpu

3 years ago
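Whether `Model.get_local_copy()` is itself safe across processes is not settled in this thread. A minimal sketch of the "first process downloads, the others wait" pattern, assuming a POSIX system (it relies on `fcntl`) and a hypothetical `download` callable standing in for the real call:

```python
import fcntl
import os
from typing import Callable


def get_copy_once(cache_path: str, download: Callable[[str], None]) -> str:
    """Return cache_path, letting only one process run download(cache_path)."""
    lock_path = cache_path + ".lock"
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the exclusive lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Whoever gets the lock first downloads; later holders find the file.
            if not os.path.exists(cache_path):
                download(cache_path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return cache_path
```

Each GPU process can then call `get_copy_once(...)` and load the weights from the returned path, so every process still gets the file but the download happens once.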
0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

So the problem comes when I do
my_task.output_uri = "s3://my-bucket": trains checks in the background whether it has access to this bucket, and it is not able to find or read the creds

4 years ago
0 Hi, Is It Possible To Disable Some Of The System Metrics Monitored? And Also Downsample The Rate Of Logging?

AgitatedDove14 I see that the default is sample_frequency_per_sec=2., but in the UI I don't see such resolution (i.e. it logs every ~120 iterations, corresponding to ~30 secs). What is the difference with report_frequency_sec=30.?

4 years ago
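One way to read the two parameters, as an assumption rather than a statement of how the monitor is implemented: samples are taken at `sample_frequency_per_sec` and averaged into one reported point every `report_frequency_sec`. That would explain a ~30 sec resolution in the UI. A small sketch of that interpretation (the function names are hypothetical):

```python
from typing import List, Sequence


def samples_per_report(sample_frequency_per_sec: float, report_frequency_sec: float) -> int:
    """Number of raw samples that fall into one reporting window."""
    return int(sample_frequency_per_sec * report_frequency_sec)


def averaged_reports(
    readings: Sequence[float],
    sample_frequency_per_sec: float = 2.0,
    report_frequency_sec: float = 30.0,
) -> List[float]:
    """Average raw readings window by window; incomplete trailing windows are dropped."""
    window = samples_per_report(sample_frequency_per_sec, report_frequency_sec)
    return [
        sum(readings[start:start + window]) / window
        for start in range(0, len(readings) - window + 1, window)
    ]


# With the defaults quoted above, one reported point summarizes 60 samples.
print(samples_per_report(2.0, 30.0))
```

Under this reading, lowering `report_frequency_sec` increases the resolution seen in the UI, while `sample_frequency_per_sec` only changes how many readings go into each point.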
0 Hi, I Attached An Iam Role To An Ec2 Instance To Grant Access To An S3 Bucket. The Ec2 Instance Is Running A Clearml-Agent (V1.1.0). I Didn’T Specify Any Key/Secret For Clearml. The Tasks Fail With The Following Error:

I am confused now because I see in the master branch, the clearml.conf file has the following section:
# Or enable credentials chain to let Boto3 pick the right credentials.
# This includes picking credentials from environment variables,
# credential file and IAM role using metadata service.
# Refer to the latest Boto3 docs
use_credentials_chain: false
So it states that IAM role using metadata service should be supported, right?

3 years ago
0 Hi, I Have Another Problem

I specified a torch @ https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl and it didn't detect the link, it tried to install latest version: 1.6.0

4 years ago
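For reference, `name @ url` is the direct-reference form of a requirement (PEP 508). The sketch below is not how clearml-agent parses requirements; it only shows what the line quoted above encodes, assuming the URL points at a wheel whose filename has no build tag. The helper name `parse_direct_reference` is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlparse


def parse_direct_reference(requirement: str) -> dict:
    """Split 'name @ url' and read the version and tags from the wheel filename."""
    name, _, url = requirement.partition("@")
    name, url = name.strip(), url.strip()
    # %2B in the URL is an encoded '+', as in the local version '1.3.1+cu100'.
    filename = unquote(posixpath.basename(urlparse(url).path))
    stem = filename[: -len(".whl")]
    # Assumes the five-part wheel name: dist-version-python-abi-platform.
    dist, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "url": url,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


req = "torch @ https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl"
info = parse_direct_reference(req)
print(info["name"], info["version"], info["python_tag"])
```

The tags matter here: a `cp36` wheel only installs on Python 3.6, so it is worth checking that the interpreter used by the agent matches before concluding the link was ignored.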
0 Got Some Errors While Running Migration Script From Es5 To Es7:

I should also rename /opt/trains/data/elastic_migrated_2020-08-11_15-27-05 folder to /opt/trains/data/elastic before running the migration tool right?

5 years ago
0 Hi, I Have A Local Package That I Use To Train My Models. To Start Training, I Have A Script That Calls

Sure! Here are the relevant parts:
...
Current configuration (clearml_agent v1.2.3, location: /tmp/.clearml_agent.3m6hdm1_.cfg):

...
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 ...

3 years ago
0 Hey There, I Moved The Clearml S3 Bucket Where I Stored All My Clearml Data From One S3 Bucket To Another And Now I Realized That All The Models/Experiments Logged In The Clearml-Server Still Refer To The Old S3 Bucket. Is There A Way To Update All The Re

Yes, I would like to update all references to the old bucket unfortunately… I think I’ll simply delete the old s3 bucket, wait for its name to be available again, recreate it on the other aws account and move the data there. This way I don’t have to mess with clearml data - I am afraid of doing something wrong and losing data

4 years ago
0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

even if I explicitly use previous_task.output_uri = "s3://my_bucket", it is ignored and still saves the json file locally

5 years ago