Actually I think I am approaching the problem from the wrong angle
This https://discuss.elastic.co/t/index-size-explodes-after-split/150692 seems to say that with the _split API this situation happens and resolves itself after a couple of days, maybe it's the same in my case?
It seems that around here, a Task that is created using init remotely in the main process gets its output_uri parameter ignored
Thanks AgitatedDove14 !
Could we add this task.refresh() to the docs? Might be helpful for other users as well 🙂
OK! Maybe there is a middle ground: for artifacts already registered, simply return the entry, and for artifacts that don't exist yet, contact the server to retrieve them
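Roughly what I mean by that middle ground, as a minimal sketch. `LazyArtifacts` and `fetch_from_server` are made-up names standing in for whatever actually contacts the server, not ClearML's API:

```python
from typing import Any, Callable, Dict


class LazyArtifacts:
    """Hypothetical artifact lookup: local entry first, server only on a miss."""

    def __init__(self, fetch_from_server: Callable[[str], Any]) -> None:
        # fetch_from_server is an assumed stand-in for the real server call.
        self._fetch = fetch_from_server
        self._registered: Dict[str, Any] = {}

    def register(self, name: str, entry: Any) -> None:
        # Record an artifact that is already known locally.
        self._registered[name] = entry

    def get(self, name: str) -> Any:
        # Already registered: return the entry without contacting the server.
        if name in self._registered:
            return self._registered[name]
        # Not known yet: ask the server once, then remember the result.
        entry = self._fetch(name)
        self._registered[name] = entry
        return entry
```

With this shape, repeated lookups of a known artifact stay local, and only unknown names cost a round trip.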
So it is there already, but commented out, any reason why?
I am confused now because I see in the master branch, the clearml.conf file has the following section:
# Or enable credentials chain to let Boto3 pick the right credentials.
# This includes picking credentials from environment variables,
# credential file and IAM role using metadata service.
# Refer to the latest Boto3 docs
use_credentials_chain: false
So it states that IAM role using metadata service should be supported, right?
There is no need to add creds on the machine, since the EC2 instance has an attached IAM profile that grants access to S3. Boto3 is able to retrieve the files from the S3 bucket
Yea I really need that feature, I need to move away from key/secrets to IAM roles
AgitatedDove14 WOW, thanks a lot! I will dig into that 🚀
mmmh there is no closing of the task happening at that point. Note that just before the task.upload_artifact, I call task.logger.report_table("Metric summary", "Metric summary", 0, df_scores), if that matters
I ended up dropping omegaconf altogether
SuccessfulKoala55 I tried to set up the clearml-agent on a different machine and now I get a different error message in the logs:
Warning: could not locate requested Python version 3.6, reverting to version 3.6
clearml_agent: ERROR: Python executable with version '3.6' defined in configuration file, key 'agent.default_python', not found in path, tried: ('python3.6', 'python3', 'python')
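To see which of those names actually resolve on the agent machine, something like this can be run there. It is only a sketch: the candidate names are copied from the error message, `locate_python` is a made-up helper, and it relies on the standard-library `shutil.which`, which is not necessarily how clearml-agent does its own lookup.

```python
import shutil
from typing import Dict, Optional, Sequence


def locate_python(
    candidates: Sequence[str] = ("python3.6", "python3", "python"),
) -> Dict[str, Optional[str]]:
    """Map each candidate executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in candidates}


if __name__ == "__main__":
    # Print one line per candidate so missing interpreters are easy to spot.
    for name, path in locate_python().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

It needs to be run as the same user and in the same environment as the agent, since PATH can differ between shells and services.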
Oh, and also use the colors of the series. That would be a killer feature. Then I simply need to match the color of the series to the name to check the tags
Configuration:
` {
  "resource_configurations": {
    "v100": {
      "instance_type": "g4dn.2xlarge",
      "availability_zone": "us-east-1a",
      "ami_id": "ami-05e329519be512f1b",
      "ebs_device_name": "/dev/sda1",
      "ebs_volume_size": 100,
      "ebs_volume_type": "gp3",
      "key_name": "key.name",
      "security_group_ids": [
        "sg-asd"
      ],
      "is_spot": false,
      "extra_configura...
Make sure the cloned task is in Draft mode, if not, reset it
Then in the Execution tab of the task, in the Source Code section (first one), you can edit the values
The jump in the loss when resuming at iteration 31 is probably another issue -> for now I can conclude that:
I need to set sdk.development.report_use_subprocess = false
I need to call task.set_initial_iteration(0)
super, thanks SuccessfulKoala55 !
Thanks for sharing the issue UnevenDolphin73 , I’ll comment on it!
Yes, perfect!!
because I cannot locate libcudart or because cudnn_version = 0?
Awesome, thanks WackyRabbit7 , AgitatedDove14 !
But clearml does read from env vars as well right? It’s not just delegating resolution to the aws cli, so it should be possible to specify the region to use for the logger, right?