Yes, I still see those errors, but queues are working :)
Hi Jake, thanks for your answer!
So I just have a very simple file "project.py" with this content:
```python
from clearml import Task
task = Task.init(project_name='project-no-git', task_name='experiment-1')
import pandas as pd
print("OK")
```
If I run `python project.py` from a folder that is not in a git repository, I can clone the task and enqueue it from the UI, and it runs in the agent with no problems.
If I copy the same file into a folder that is in a git repository, when I enqueue the ex...
Thanks CostlyOstrich36. I was thinking more of a setting of the environment; for example, the documentation mentions the "--cpu-only" flag (which I am not sure I can use, as I am using the helm charts from AllegroAI and I don't think I can override the command), or setting the env var NVIDIA_VISIBLE_DEVICES to an empty string (which I did, but I can still see the message).
Hi folks, I think I found the issue: the documentation says to set NVIDIA_VISIBLE_DEVICES to "", when in reality it should be "none" according to the code:
```python
if Session.get_nvidia_visible_env() == 'none':
    # NVIDIA_VISIBLE_DEVICES set to none, marks cpu_only flag
    # active_gpus == False means no GPU reporting
    self._active_gpus = False
```
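In other words (a tiny sketch of my own, not ClearML's actual code), the env var has to be the literal string "none", not an empty string, for the CPU-only check to trigger:
```python
import os

# Sketch of the behaviour described above: only the literal string "none"
# marks the agent as CPU-only; an empty string does not trip the check,
# which is why I kept seeing the GPU message.
visible = os.environ.get("NVIDIA_VISIBLE_DEVICES", "")
cpu_only = visible == "none"
print("cpu_only:", cpu_only)  # True only when NVIDIA_VISIBLE_DEVICES=none
```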
As much as possible, I'd like to take the burden off the shoulders of the people writing their models.
My understanding is that in Task.init you have a reuse_last_task_id parameter (or a similar name) that defaults to True. In that case, if your experiment wasn't "published" it will be overwritten (based on project and experiment name). However, if you do publish it, a new experiment will be created.
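Something like this, if I read the docs right (the parameter name may differ slightly between versions):
```python
from clearml import Task

# If reuse_last_task_id is left at its default (True), an unpublished task
# with the same project/name can be overwritten on the next run; setting it
# to False should always create a new experiment instead.
task = Task.init(
    project_name='project-no-git',
    task_name='experiment-1',
    reuse_last_task_id=False,
)
```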
Thanks SuccessfulKoala55. Any idea why going to the address https://allegroai.github.io/clearml-helm-charts
returns a 404 error?
For other repositories used in the Argo CD examples (e.g. https://bitnami-labs.github.io/sealed-secrets, which is also hosted on GitHub), instead of returning a 404, the index.yaml page is loaded.
I suspect this might be the reason why I can't make it work with ClearML.
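For reference, here's the quick check I had in mind to compare what the two repositories return (just a sketch with requests, nothing ClearML-specific):
```python
import requests

# Compare the repository root and index.yaml for both chart repos: the
# sealed-secrets repo serves its index.yaml at the root, while the ClearML
# one returns 404 there (index.yaml is what Helm/Argo CD actually fetch).
for base in (
    "https://allegroai.github.io/clearml-helm-charts",
    "https://bitnami-labs.github.io/sealed-secrets",
):
    for path in ("", "/index.yaml"):
        resp = requests.get(base + path, timeout=10)
        print(base + path, "->", resp.status_code)
```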
Just to understand the problems you helped me fix:
for Elasticsearch, it looked like I wasn't running the cluster with enough memory;
but what happened to the FileServer? And how can I prevent it from happening in a potential "production" deployment?
many thanks 🙂 I am going to play with ClearML a little and re-read the thread carefully to learn something from what you made me do today!
using the --set you advised above, right?
thanks a lot 🙂 that was quick 🙂
and one more question, in the values, I also see the values for the default tokens:
```yaml
credentials:
  apiserver:
    # -- Set for apiserver_key field
    accessKey: "5442F3443MJMORWZA3ZH"
    # -- Set for apiserver_secret field
    secretKey: "BxapIRo9ZINi8x25CRxz8Wdmr2pQjzuWVB4PNASZqCtTyWgWVQ"
  tests:
    # -- Set for tests_user_key field
    accessKey: "ENP39EQM4SLACGD5FXB7"
    # -- Set for tests_user_secret field
    secretKey: "lPcm0imbcBZ8mwgO7tpadutiS3gnJD05x9j7a...
```
Hi AgitatedDove14, I have spent some time going through the helm charts, but I admit it's still not clear to me how things should work.
I see that with the default values (mostly what I am using), the K8s Glue agent is deployed (which is what you suggested to use).
Thanks, I'll try to understand how the default agent that comes with the helm chart is configured, and copy from there how to set up a different one.
great! thanks a lot!
Right now I see the default agent that comes with the helm chart...
Is there a way I can check whether the apiserver is reachable?
(like: https://clearml-apiserver.ds.bumble.dev/health )
Hi Jake, unfortunately I realized we put a load balancer in front, so any address like address.domain would ping.
I can ping it without issues, but I am not sure if the communication is set up correctly.
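So instead of ping, something like this HTTP-level check is probably what I need (a rough sketch; I'm assuming the apiserver exposes a debug.ping endpoint, otherwise any endpoint that returns a status code would do):
```python
import requests

# Rough reachability check through the load balancer: getting any HTTP
# response at all proves the route to the apiserver works; swap in whatever
# health endpoint your deployment actually exposes.
url = "https://clearml-apiserver.ds.bumble.dev/debug.ping"
try:
    resp = requests.get(url, timeout=5)
    print(url, "->", resp.status_code, resp.text[:200])
except requests.RequestException as exc:
    print(url, "-> unreachable:", exc)
```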
Hi Jake, I mean that when I create a token, I would like the users to see the right hosts, so that they can just copy and paste them when they run clearml-init.
OK, I could connect with the SDK, so everything is working; I'd just like to get the right hosts shown in the UI when a new token is created.