Here it is
This is the error I get from setting the logger upload destination: `botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.`
Is there a clearml.conf for this agent somewhere?
Okay, great! I just want to run the cleanup service; however, I am running into SSH issues, so I wanted to restart it to debug.
I have set default_output_uri to s3://my_minio_instance:9000/clearml
If I set files_server to s3://my_minio_instance:9000/bucket_that_does_not_exist it fails at uploading metrics, but model upload still works:
WARNING - Failed uploading to s3://my_minio_instance:9000/bucket_that_does_not_exist ('NoneType' object has no attribute 'upload')
clearml.Task - INFO - Completed model upload to s3://my_minio_instance:9000/clearml
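For context, a minimal sketch of the clearml.conf entries involved here, assuming a MinIO endpoint at my_minio_instance:9000 and a bucket named clearml; the key/secret values are placeholders. Missing or wrong credentials in this section produce exactly the InvalidAccessKeyId error quoted above.
` # ~/clearml.conf (sketch)
api {
    # fileserver destination used for debug samples and other Logger output
    files_server: "s3://my_minio_instance:9000/clearml"
}
sdk {
    development {
        # default destination for artifacts and models
        default_output_uri: "s3://my_minio_instance:9000/clearml"
    }
    aws {
        s3 {
            credentials: [
                {
                    # non-AWS endpoint (MinIO): host includes the port
                    host: "my_minio_instance:9000"
                    key: "minio-access-key"       # placeholder
                    secret: "minio-secret-key"    # placeholder
                    multipart: false
                    secure: false
                }
            ]
        }
    }
} `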
What is `default_out...
Or maybe a different question: what is not Artifacts and Models? Debug samples (or anything else the Logger class creates)?
Also, is it not possible to use multiple file servers? E.g. log tasks to different S3 buckets without changing clearml.conf?
Hi SuccessfulKoala55
I meant that in the WebUI, deletion should only be allowed for artifacts for which deletion actually works.
For example, I now have a lot of lingering artifacts that exist on the fileservers, but not on the clearml-api-server (I think).
Another example: I delete a task via the WebUI. The ClearML server tries to delete the task and the artifacts belonging to it. However, it shows the task as successfully deleted even though some artifacts were not. Now there is no way...
@<1523701994743664640:profile|AppetizingMouse58> Thank you very much. I forgot the volume mapping.
So can I just add the config to the async_delete container and mirror the directory structure from GitHub?
` volumes:
  - /opt/clearml/config:/opt/clearml/config
  - /opt/clearml/logs:/var/log/clearml `
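A hedged sketch of where that mapping could sit in docker-compose.yml, assuming the async_delete service name from the standard clearml-server compose file (all other keys elided):
` services:
  async_delete:
    # ...image, command, depends_on etc. stay as in the stock compose file...
    volumes:
      - /opt/clearml/config:/opt/clearml/config
      - /opt/clearml/logs:/var/log/clearml `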
Perfect, thank you 🙂
Alright, that's unfortunate. But thank you very much!
If you think the explanation takes too much time, no worries! I do not want to waste your time on my confusion 😄
SweetBadger76 I am using the Cleanup Service
Thank you very much. I am going to try that.
Is sdk.development.default_output_uri used with s3://ip:9000/clearml or ip:9000/clearml?
` ==> 2021-03-11 13:54:59 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
# conda version: 4.9.2
+defaults/linux-64::_libgcc_mutex-0.1-main
+defaults/linux-64::ca-certificates-2021.1.19-h06a4308_1
+defaults/linux-64::certifi-2020.12.5-py38h06a4308_0
+defaults/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7
+defaults/linux-64::libedit-3.1.20191231-h14c3975_1
+defaults/linux-64::libffi-3.3-he6710b0_2
+defaults/linux-64... `
I got the error again. Seems to happen only when I try to delete "large" experiments.
Okay, I see. Unfortunately, I don't get how ClearML tasks are intended to be used. Could you help me with that? (see code)
` from clearml import PipelineController

def start_carla_factory():
    task = # How do I create this task?
    long_blocking_call_to_start_carla()
    return task

pipe = PipelineController(
    name="carla-autostart",
    project="rlad/carla-servers",
    version="0.0.1",
    add_pipeline_tags=False,
)
pipe.add_step(name="start-carla", base_task_factory=start_carla_factory)
pipe.start() `
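A hedged sketch of one way the factory could create that task, assuming base_task_factory receives the pipeline node and must return a Task object (worth verifying against the SDK docs); Task.create and the project/task names are illustrative:
` from clearml import Task

def start_carla_factory(node):
    # create a draft task that this pipeline step will represent
    task = Task.create(
        project_name="rlad/carla-servers",
        task_name="start-carla",
    )
    long_blocking_call_to_start_carla()
    return task `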
SuccessfulKoala55 I just had the issue again. The logs show nothing of interest. It looks like OOM to me, but I will test this again with a much larger swap, so the server only slows down instead of killing something. Unfortunately, the kernel logs also do not show much (maybe my server logs are misconfigured; I am no expert).
What is interesting, though, is that Docker showed only my nginx, minio and docker-registry containers as exited, while all the ClearML containers were still running. I restarted ...
Quick question: where does clearml place the venv again? I want to take a look at it after the task has failed.
Sounds good. I think it is obvious that immutability has to be managed by the user then, but this is no different from not using clearml-data, so it is not a disadvantage in my opinion.
And clearml-agent should pull these datasets from network storage...
AgitatedDove14 SuccessfulKoala55 Could you briefly explain whether clearml supports no-copy add for datasets?
Yea, the real problem is that I have very large datasets in network storage. I am looking for a way to add the datasets on the network storage as a clearml-dataset.
Yea, clearml-data is immutable, but the underlying data is not if I just store a pointer to some location.
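For reference, a hedged sketch of that link-only approach, assuming the installed clearml version provides Dataset.add_external_files (the URL and names are hypothetical): the files stay on network storage, and only their links are versioned.
` from clearml import Dataset

# register links to files that already live on network storage; nothing is copied
dataset = Dataset.create(dataset_name="large-dataset", dataset_project="rlad")
dataset.add_external_files(source_url="s3://my_minio_instance:9000/datasets/large-dataset/")
dataset.upload()    # uploads dataset state/metadata only
dataset.finalize() `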
Maybe a related question: has anyone ever worked with datasets larger than the clearml-agent cache? A colleague of mine has a dataset of ~1 terabyte...
I'll add creating an issue to my todo list.