I ran into the same problem again, but within a remote pipeline setup.
Are you saying the issue is not fixed? Can you verify the pipeline & pipeline components are using at least version 1.104rc0?
https://hub.docker.com/layers/nvidia/cuda/10.1-cudnn7-runtime-ubuntu18.04/images/sha256-963696628c9a0d27e9e5c11c5a588698ea22eeaf138cc9bff5368c189ff79968?context=explore
The docker image is missing cuDNN, which is a must for TF to work 🙂
Good news: a dedicated class for exactly that will be out in a few days 🙂
Basically a task scheduler and a task trigger scheduler, running as a service, cloning/launching tasks either based on time (cron-like) or based on a trigger.
wdyt?
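If it helps, here is a rough sketch of how such a scheduler could be driven from code once the class lands; the clearml.automation.TaskScheduler name and the exact add_task arguments are assumptions on my side, so double-check against the release notes:

from clearml.automation import TaskScheduler

# create the scheduler service (assumed API, argument names may differ)
scheduler = TaskScheduler()

# clone + enqueue an existing task every day at 09:30 (cron-like schedule)
scheduler.add_task(
    schedule_task_id="<task-id-to-clone>",  # hypothetical placeholder ID
    queue="default",
    minute=30,
    hour=9,
    recurring=True,
)

# run the scheduler itself as a service on the services queue
scheduler.start_remotely(queue="services")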
BTW: basically just call Task.init(...), the rest is magic 🙂
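For reference, a minimal sketch of what that looks like (the project/task names here are just placeholders):

from clearml import Task

# creates (or reuses) a task and starts auto-logging frameworks, console output, etc.
task = Task.init(project_name="examples", task_name="my experiment")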
Hi CheerfulGorilla72 ,
Sure there are:
https://github.com/allegroai/clearml/tree/master/examples/frameworks/pytorch-lightning
Hi DrabCockroach54
Do we know if gpu_0_mem_usage and gpu_0_mem_used_gb both show current GPU usage?
The first is the percentage used (memory % used at any specific moment) and the second is the memory used in GiB, both for the video memory.
How can I tell from this how much GPU memory is reserved for the task while the task is in progress?
What do you mean by how much is reserved? Are you running with an agent?
YummyWhale40 you mean like continue training?
https://github.com/allegroai/trains/issues/160
Creating a dataset with parents worked very well and produced great visuals in the UI!
woot woot!
I tried the squash solution, however this somehow caused a download of all the datasets into my
So this actually works, kind of like git squash. Bottom line: it will repackage the data from all the different versions into one new version. This means downloading the data from all squashed versions, then repackaging it into a single new version. Make sense?
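For reference, a minimal sketch of squashing from code, assuming the Dataset.squash API (the dataset IDs are placeholders):

from clearml import Dataset

# repackage several dataset versions into one new standalone version
squashed = Dataset.squash(
    dataset_name="my_dataset_squashed",
    dataset_ids=["<version_1_id>", "<version_2_id>"],  # placeholder IDs
)
print(squashed.id)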
Yeah, the hack would work, but I'm trying to use it from the command line to put in Airflow. I'll post on GH.
Oh, then set the TMP/TMPDIR environment variable, it should have the same effect.
Hi @<1651395720067944448:profile|GiddyHedgehong81>
However, for YOLOv8 (object detection with around 20k JPGs and .txt files) I need the data.yaml file:
Just add the entire folder with your files to a dataset, then get it in your code.
Add files (you can do that from the CLI, for example):
clearml-data add --files my_folder_with_files
Then get it from code:
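Something along these lines (the dataset name/project are placeholders):

from clearml import Dataset

# fetch the dataset and get a cached local copy of the whole folder,
# including your images, labels and the data.yaml file
ds = Dataset.get(dataset_name="my_yolo_dataset", dataset_project="my_project")
local_path = ds.get_local_copy()
print(local_path)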
JitteryCoyote63 to filter out archived tasks (i.e. exclude archived tasks):
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(system_tags=["-archived"]))
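And a quick usage sketch, just to show what you get back:

from clearml import Task

# returns a list of Task objects, excluding archived ones
tasks = Task.get_tasks(
    project_name="my-project",
    task_name="my-task",
    task_filter=dict(system_tags=["-archived"]),
)
for t in tasks:
    print(t.id, t.name)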
Correct (with the port mapping service in it)
Hi DisgustedDove53
Is redis used as permanent data storage or just cache?
Mostly cache (I think)
Would there be any problems if it is restarted and comes up clean?
Pretty sure it should be fine, why do you ask?
EnormousWorm79 you mean to get the DAG graph of the Dataset (like you see in the plots section)?
If we set up an ingress with MetalLB or Nginx, and added a LoadBalancer into the template YAML, do you think this would work?
I would configure the k8s glue pod template to have a "Service" port-forward to the pod's 10022 port (the default SSH port for the clearml-session), basically allowing the k8s ingress to allocate a port to the pod.
To test if it worked, spin up the clearml-session and try to SSH to the external IP:port.
Once that works you can basically tell the clearml-session client which ...
SubstantialElk6 it seems the auto-resolve of the PyTorch CUDA version failed.
What do you have in the "installed packages" section?
BTW: @<1673501397007470592:profile|RelievedDuck3> we just released 1.3.1 with better debugging; it prints the full exception stack on failure to the ClearML Serving Session Task.
I suggest you pull the latest image, re-run the docker compose, and check what you have on the Serving Session Task in the UI.
Hmm, I think the issue is here (the docker command mount):
'-v', '/tmp/.clearml_agent.de0n48pm.cfg:/root/clearml.conf'
You will have to build your own docker image based on that Dockerfile, and then update the docker compose.
Hi GrotesqueOctopus42
In theory it can be built, the main hurdle is getting elk/mongo/redis containers for arm64 ...
Hi GrotesqueOctopus42
Despite having reuse_last_task_id=True on Task.init, it always creates a new task ID. Anyone ever had this issue?
So the way reuse_last_task_id=True works is that if there are no artifacts on the Task it will reuse it, but when running inside Jupyter it always has artifacts (the notebook itself), so it starts a new Task.
You can, however, pass a specific Task ID and it will reuse it: reuse_last_task_id="aabb11". Would that help?
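i.e. something along these lines (the task ID is just the placeholder from above):

from clearml import Task

# reuse an existing task by passing its ID instead of True/False
task = Task.init(
    project_name="examples",
    task_name="notebook experiment",
    reuse_last_task_id="aabb11",  # placeholder task ID
)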
Hi @<1545216070686609408:profile|EnthusiasticCow4>
The auto detection of clearml is based on the actual imported packages, not the requirements.txt of your entire python environment. This is why some of them are missing.
That said you can always manually add them
Task.add_requirements("hydra-colorlog")  # optionally pin a version: Task.add_requirements("hydra-colorlog", "1.2.0")
task = Task.init(...)
(notice: call add_requirements before Task.init)
Hi VirtuousFish83
Apologies for the documentation 🙂 It sounds complicated but should actually be relatively simple. Based on what I understand, you already have the server set up and your code integrated. The question is: "can you see an experiment in the UI?" If you do, then you can right-click it, clone the experiment, edit parameters and send it for execution (enqueue). If the experiment is not in the UI you can either (1) run the code with the Task.init call, it will automatica...
You can change the CWD (working directory) folder: if you put . in the working dir it will be the root of the git repo, but you can use any subfolder; obviously you then need to change the script path to match the folder, e.g. ./folder/script.py etc.
What does the folder structure look like, and where are the "package" and the entry script?