
Ha, nice! Where can I find the mapping template of the original ClearML so that I can copy and adapt it?
When starting, the apiserver shows:
    clearml-apiserver | [2021-07-13 11:09:34,552] [9] [INFO] [clearml.es_factory] Using override elastic host
    clearml-apiserver | [2021-07-13 11:09:34,552] [9] [INFO] [clearml.es_factory] Using override elastic port 9200
    ...
    clearml-apiserver | [2021-07-13 11:09:38,407] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 1 of 4. Waiting for 30sec
    clearml-apiserver | [2021-07-13 11:10:08,414] [9] [WARNING] [clearml.initia...
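One way to narrow down the "Could not connect to ElasticSearch Service" warning is to run a plain TCP connection test to the ES host and port from the apiserver's machine. The following is a minimal standard-library sketch. The function name `can_reach` is illustrative and is not part of ClearML.

```python
import socket


def can_reach(host: str, port: int = 9200, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name first, so a value that still
        # carries a scheme (e.g. "http://es-host") fails name resolution here
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors
        return False
```

Suppose `can_reach("<es-ip>", 9200)` returns False when run on the apiserver's host. That points to a network, firewall, or name-resolution problem rather than an apiserver setting.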
SuccessfulKoala55, I was able to recreate the indices in the new ES cluster. I specified number_of_shards: 4 for the events-log-d1bd92a3b039400cbafc60a7a5b1e52b index, then copied the documents from the old ES using the _reindex API. In the old cluster, the index is 7.5 GB on one shard.
Now I see that this index is ~19.4 GB on the new ES cluster. The index is divided into the 4 shards, but each shard is between 4.7 GB and 5 GB!
I was expecting to have the same index size as in the previous e...
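The two steps described above map onto two Elasticsearch REST calls:
1. Recreate the index with an explicit shard count.
2. Copy the documents over with `_reindex` pulling from the old cluster.

The following is a minimal sketch of their request bodies. It assumes the old cluster is reachable at a placeholder URL ("http://old-es:9200" is hypothetical). For remote reindex, that host must also be listed in the new cluster's `reindex.remote.whitelist` setting (7.x naming). The index mappings ClearML expects are deliberately omitted and would have to come from the original template.

```python
import json


def create_index_body(shards: int, replicas: int = 0) -> dict:
    """Body for PUT /<index>: primary shard and replica counts only (mappings omitted)."""
    return {"settings": {"index": {"number_of_shards": shards, "number_of_replicas": replicas}}}


def remote_reindex_body(old_es_url: str, index: str) -> dict:
    """Body for POST /_reindex on the new cluster, pulling `index` from the old cluster."""
    return {
        "source": {"remote": {"host": old_es_url}, "index": index},
        "dest": {"index": index},
    }


if __name__ == "__main__":
    name = "events-log-d1bd92a3b039400cbafc60a7a5b1e52b"
    # Sent as PUT /<name> to the new cluster
    print(json.dumps(create_index_body(shards=4), indent=2))
    # Sent as POST /_reindex to the new cluster (old_es_url is a placeholder)
    print(json.dumps(remote_reindex_body("http://old-es:9200", name), indent=2))
```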
I am not sure I can do both operations at the same time (migration + splitting). Do you think it's better to do the splitting first or the migration first?
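For the splitting half, Elasticsearch's `_split` API has two preconditions worth weighing when choosing the order:
- The source index must first be made read-only.
- The target's primary shard count must be a larger multiple of the source's.

The following is a minimal sketch that checks the multiple rule and lists the two requests involved. The helper name and the index names are hypothetical, and the sketch does not decide the order for you.

```python
def split_requests(source: str, target: str, src_shards: int, dst_shards: int) -> list:
    """Return (method, path, body) tuples for splitting `source` into a new index `target`."""
    if src_shards < 1:
        raise ValueError("source shard count must be at least 1")
    if dst_shards <= src_shards or dst_shards % src_shards != 0:
        raise ValueError(
            f"target shard count {dst_shards} must be a larger multiple of {src_shards}"
        )
    return [
        # 1. Block writes: _split requires a read-only source index
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": True}}),
        # 2. Split into a new index with more primary shards
        ("POST", f"/{source}/_split/{target}",
         {"settings": {"index.number_of_shards": dst_shards}}),
    ]
```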
Ha, wait: I removed the http:// from the host and it worked.
OK, thanks! And for this?
Would it be possible to support such a use case (having the clearml-agent set up a different Python version when a task needs it)?
Yes, thanks a lot, AgitatedDove14!
Yeah, I really need that feature: I need to move away from keys/secrets to IAM roles.
There is no need to add credentials on the machine, since the EC2 instance has an attached IAM instance profile that grants access to S3. Boto3 is able to retrieve the files from the S3 bucket.
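To confirm that an instance profile is actually attached, you can ask the EC2 instance metadata service for the role name. That is one of the sources Boto3's default credential chain consults. The following is a minimal standard-library sketch using IMDSv2: it requests a session token first, then looks up the role. It only works when run on the EC2 instance itself, and the function names are illustrative.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def imds_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """IMDSv2 step 1: PUT request for a session token valid for ttl_seconds."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def role_name_request(token: str) -> urllib.request.Request:
    """IMDSv2 step 2: GET the name of the IAM role attached to this instance."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/iam/security-credentials/",
        headers={"X-aws-ec2-metadata-token": token},
    )


def attached_role_name(timeout: float = 2.0) -> str:
    """Return the attached role name; raises (e.g. urllib.error.URLError) off EC2 or without a role."""
    with urllib.request.urlopen(imds_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(role_name_request(token), timeout=timeout) as resp:
        return resp.read().decode().strip()
```

Calling `attached_role_name()` on the instance should return the instance profile's role name.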
I will go for lunch actually, back in ~1h.
OK, and if that's not the case, it will fall back to 3.8, right? Would it be possible to support such a use case (having the clearml-agent set up a different Python version when a task needs it)?
SuccessfulKoala55, I was able to make it work with use_credentials_chain: true in the clearml.conf and the following patch: https://github.com/allegroai/clearml/pull/478
I am confused now, because in the master branch the clearml.conf file has the following section:
    # Or enable credentials chain to let Boto3 pick the right credentials.
    # This includes picking credentials from environment variables,
    # credential file and IAM role using metadata service.
    # Refer to the latest Boto3 docs
    use_credentials_chain: false
So it states that an IAM role obtained through the metadata service should be supported, right?
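In Boto3 terms, "enable credentials chain" amounts to not passing explicit keys. When `boto3.client("s3")` receives no `aws_access_key_id`/`aws_secret_access_key`, Boto3 searches its default sources itself. These include environment variables, the shared credentials file and the instance metadata service. The following hypothetical helper builds the client keyword arguments either way. It is not ClearML's actual implementation.

```python
from typing import Optional


def s3_client_kwargs(
    use_credentials_chain: bool,
    key: Optional[str] = None,
    secret: Optional[str] = None,
    region: Optional[str] = None,
) -> dict:
    """Keyword arguments for boto3.client("s3", ...) (hypothetical helper)."""
    kwargs = {}
    if region:
        kwargs["region_name"] = region
    if use_credentials_chain:
        # No explicit keys (any given key/secret is ignored): Boto3 resolves
        # credentials itself, e.g. env vars, credentials file, IAM role via metadata
        return kwargs
    if not (key and secret):
        raise ValueError("key and secret are required when the credentials chain is disabled")
    kwargs["aws_access_key_id"] = key
    kwargs["aws_secret_access_key"] = secret
    return kwargs
```

With the chain enabled, `boto3.client("s3", **s3_client_kwargs(True))` would rely entirely on what Boto3 finds on the machine, such as the instance profile.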
SuccessfulKoala55
In the docker-compose file, the apiserver service has environment settings for the ES host and port (CLEARML_ELASTIC_SERVICE_HOST and CLEARML_ELASTIC_SERVICE_PORT); changing those will allow you to point the server to another ES service.
The ES cluster is running on another machine; how can I set its IP in CLEARML_ELASTIC_SERVICE_HOST? Would I need to add the host to the networks of the apiserver service somehow? How can I do that?
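Removing http:// from the host is what made it work earlier. A small check can therefore keep a scheme or port from slipping into the value. The following hypothetical helper assumes two things:
- CLEARML_ELASTIC_SERVICE_HOST expects a bare host name or IPv4 address.
- The port belongs in CLEARML_ELASTIC_SERVICE_PORT.

```python
from urllib.parse import urlsplit


def normalize_es_host(value: str) -> str:
    """Reduce a value such as "http://10.0.0.5:9200/" to the bare host "10.0.0.5"."""
    value = value.strip()
    if "://" in value:
        # urlsplit extracts (and lowercases) the host part of a full URL
        return urlsplit(value).hostname or ""
    # No scheme: drop an accidental ":port" suffix (IPv6 literals are not handled)
    return value.split(":", 1)[0]


def es_env(value: str, port: int = 9200) -> dict:
    """Environment entries for the apiserver service (hypothetical helper)."""
    host = normalize_es_host(value)
    if not host:
        raise ValueError(f"could not extract a host from {value!r}")
    return {
        "CLEARML_ELASTIC_SERVICE_HOST": host,
        "CLEARML_ELASTIC_SERVICE_PORT": str(port),
    }
```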
Do you mean it will resolve by itself in the following days, or should I do something? Or is there nothing to do and it will stay this way?
PS: in the new env, I've set num_replicas: 0, so I'm only talking about primary shards…
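To compare only primary shards between the two clusters, the `_cat/shards` API can return JSON with byte-exact sizes (`?format=json&bytes=b`). In that response, `prirep` is `p` for primaries and `r` for replicas. The following minimal sketch totals primary store size per index from such a response. It assumes the response has already been fetched and parsed into a list of dicts.

```python
from collections import defaultdict


def primary_store_bytes(cat_shards: list) -> dict:
    """Sum the `store` size of primary shards per index from _cat/shards JSON rows."""
    totals = defaultdict(int)
    for row in cat_shards:
        # Skip replicas and shards without a reported size (e.g. unassigned ones)
        if row.get("prirep") != "p" or row.get("store") is None:
            continue
        totals[row["index"]] += int(row["store"])
    return dict(totals)


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (Elasticsearch's human-readable "gb" is 1024-based)."""
    return n_bytes / 1024 ** 3
```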
Thanks AgitatedDove14! I created a project with a default output destination pointing to an S3 bucket, but I don't have local access to this bucket (only the agents can access it, for security reasons). Because of that, I cannot programmatically create a task in this project from my machine: it tries to access the bucket and fails. And there is no easy way to change the default output location (neither in the web UI nor in the SDK).
Indeed, I actually had the old configuration that was not JSON; I converted it to JSON and now it works.
Then print(Task.get_project_object().default_output_destination) still shows the old value.
Task.get_project_object().default_output_destination = None
Yes, perfect!!
Hi SuccessfulKoala55, thanks for the idea! The function isn't called when registered with atexit.register(), though; maybe the way the agent kills the task is not supported by atexit.
How exactly is the clearml-agent killing the task?
SuccessfulKoala55, could you please point me to where I could quickly patch that in the code?