Thank you very much! 😃
There is no way to create an artifact/model/dataset without a task, right? Just always inherit from the parent task. And if cloned, change the user to the user who did the clone.
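Just to illustrate what I mean, a minimal sketch (assuming the usual SDK entry points):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="model-owner")
# The model is registered against this task; if task is omitted,
# the current task is used, so there is effectively no task-less model.
model = OutputModel(task=task, framework="PyTorch")
```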
(Just out of my own interest: how much does the enterprise version diverge from the open-source version? Is it just extended, or are there core changes in the enterprise version?)
Thank you for clearing that up 🙂
So I just tried again and it still does not work.
This is what is in .ssh on my clearml-agent:
```
-rw------- 1 tim tim 1,5K Apr  8 14:28 authorized_keys
-rw-rw-r-- 1 tim tim  208 Apr 29 11:15 config
-rw------- 1 tim tim  432 Apr  8 14:53 id_ed25519
-rw-r--r-- 1 tim tim  119 Apr  8 14:53 id_ed25519.pub
-rw------- 1 tim tim  432 Apr 29 11:16 id_gitlab
-rw-r--r-- 1 tim tim  119 Apr 29 11:25 id_gitlab.pub
-rw-rw-r-- 1 tim tim 3,1K Apr 29 11:33 known_hosts
```
I have a related question: I read here that 4GB is an HTTP limitation and that ClearML will not chunk single files. I take from that that ClearML did not want to, or there was no need to, implement its own solution so far. But what about models that are larger than 4GB?
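As a workaround sketch for the >4GB case (plain Python; the file name model.bin and the chunk size are assumptions of mine), the file could be split and each part registered as its own artifact:
```python
from pathlib import Path
from clearml import Task

CHUNK_SIZE = 3 * 1024**3  # stay below the 4GB HTTP limit

def upload_in_chunks(task: Task, path: Path):
    # Split the file into parts and upload each part as a separate artifact
    with path.open("rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            part = path.with_suffix(f".part{index}")
            part.write_bytes(chunk)
            task.upload_artifact(name=f"{path.name}.part{index}", artifact_object=part)
            index += 1

task = Task.init(project_name="examples", task_name="chunked-upload")
upload_in_chunks(task, Path("model.bin"))
```
Downloading would then mean fetching the parts and concatenating them again.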
Well, I guess no hurdles vs. safety is an inherently unsolvable trade-off. I am all for hurdles, as long as it is clear how to overcome them. And in my opinion, referring to clearml-init is something that makes sense from both a developer and a user perspective.
Okay, I found something out: when I use the docker image ubuntu:22.04, it does not spin up a service agent and aborts the task. When I use python:latest, everything works fine!
One question: does ClearML resolve the CUDA version from the driver or from conda?
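To clarify what I mean by the two sources (illustration only, not necessarily ClearML's actual resolution logic):
```python
import subprocess

# CUDA version reported by the driver (shown in the nvidia-smi header)
print(subprocess.check_output(["nvidia-smi"], text=True))

# CUDA toolkit installed in the active conda environment
print(subprocess.check_output(["conda", "list", "cudatoolkit"], text=True))
```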
Nvm, I think I understood: when the file has never been added to the repository, it is not tracked.
Hard to answer now. I just wiped everything and reinstalled. If I encounter this problem again, I will investigate further.
It is weird though. The task is submitted by the original user and then run on the agent. The task, however, is still registered to the original user, since it was created by that user.
Doesn't it make more sense to inherit the user from the task rather than from the agent?
To answer my own question: In the WebUI where one inputs the credentials, use https for the host instead of the auto-added http
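I.e. the pasted configuration should end up looking like this (hosts and ports are examples):
```
api {
    web_server: https://my-clearml-host:8080
    api_server: https://my-clearml-host:8008
    files_server: https://my-clearml-host:8081
    credentials {
        "access_key" = "..."
        "secret_key" = "..."
    }
}
```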
I am going to try it again and send you the relevant part of the logs in a minute. Maybe I am interpreting something wrong.
Another example of what I would expect:
```python
### start_carla.py
from clearml import Task

def get_task():
    task = Task.init(project_name="examples", task_name="start-carla", task_type="application")
    # The experiment is not run here. It is only run when this is
    # executed as standalone or on a clearml-agent.
    return task

def run_experiment(task):
    ...

# This task can also be run as standalone or run by a clearml-agent
if __name__ == "__main__":
    task = get_task()
    run_experiment(task)
    run_pi...
```
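If the intent is that the experiment itself only ever runs through the queue, a minimal sketch (assuming a queue named "default" exists) would be to stop local execution and enqueue the task:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="start-carla", task_type="application")
# Everything after this call runs only on the clearml-agent that pulls
# the task from the queue; the local process exits once it is enqueued.
task.execute_remotely(queue_name="default", exit_process=True)
```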
Exactly. I don't want people to circumvent the queue 🙂
Thank you. Seems like someone implemented a type check: `Error: Dataset id=8d7355655830427f9243671c8cf0a6b0 is not of type Dataset` :)
Artifact Size: 74.62 MB
I will debug this myself a little more.
Maybe a related question: will there be some documentation about ClearML internals with the new documentation? ClearML seems to store stuff that's relevant to script execution outside of clearml.Task, if I am not mistaken. I would like to learn a little bit about the code structure / internal mechanisms.
CostlyOstrich36 Actually, no container exits, so I guess if it is because of OOM as SuccessfulKoala55 implies, then maybe a process inside the container gets killed and the container hangs? Is this possible?
SuccessfulKoala55 I did not observe elastic using much RAM (at least right after starting). Doesn't this line in the docker-compose control the RAM usage?
```
ES_JAVA_OPTS: -Xms2g -Xmx2g -Dlog4j2.formatMsgNoLookups=true
```
These are the errors I get if I use files_server without a bucket (s3://my_minio_instance:9000):
```
2022-11-16 17:13:28,852 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key and secret for S3 storage access ( )
2022-11-16 17:13:28,853 - clearml.metrics - WARNING - Failed uploading to ('NoneType' object has no attribute 'upload_from_stream')
2022-11-16 17:13:28,854 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key...
```
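For reference, this is the kind of per-host credentials section I would expect to need in clearml.conf for MinIO (host, key, and secret are placeholders):
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    # MinIO endpoint, including the port
                    host: "my_minio_instance:9000"
                    key: "..."
                    secret: "..."
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```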
I use fixed users!
It seems like the services-docker is always started with Ubuntu 18.04, even when I use
```python
task.set_base_docker(
    "continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:{}".format(file_server_mount)
)
```
Is this working in the latest version? clearml-agent falls back to /usr/bin/python3.8 no matter how I configure clearml.conf. Just want to make sure, so I can investigate what's wrong with my machine if it is working for you.
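For completeness, this is the kind of setting I tried in clearml.conf (the interpreter path is just an example):
```
agent {
    # force the agent to use this interpreter instead of its default
    python_binary: "/usr/local/bin/python3.10"
}
```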