![Profile picture](https://clearml-web-assets.s3.amazonaws.com/scoold/avatars/SarcasticSquirrel56.png)
Reputation
Badges 1
137 × Eureka!In the ClearML ui it stays in a Pending state
Thanks Martin.. I'll add this and check whether it fixes the issue, but I don't get quite well this though.. The local code doesn't need to import pandas, because the get method returns a DataFrame object that has a .loc
method.
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
Thanks Martin! If I end up having sometime I'll dig into the code and check if I can bake something!
And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?
actually there are some network issues right now, I'll share the output as soon as I manage to run it
Before any experiment enqueueing, theare are the queue I have available
when I run it on my laptop...
what I am trying to achieve is not having to worry about this setting, and have all the artifacts and models uploaded to the file server automatically
The behaviour I'd like to achieve is that any artefact is automatically saved to an S3 bucket, possibly without having the Data Scientist having to configure much on their side.
Right now, we are storing artefacts in the fileserver, and we have to make sure that we use output_uri=True in the Task.init call to have artefacts uploaded to ClearML fileserver.
What's the ideal setup to keep the boilerplate for DS code minimal?
So I set this to sdk.development.default_output_uri: <url to fileserver>
in the K8s Agent Glue pod, but DS in order for models to be uploaded,
you still have to set: output_uri=True
in the Task.init()
command...
otherwise artifacts are only stored on my laptop, and in the "models" I see a uri like : file:///Users/luca/etc.etc./
So I am not really sure what this does
Thanks CostlyOstrich36 I was thinking more to a setting of the environment, for example the documentation mentions the "--cpu-only" flag (which I am not sure I can use as I am using the helm charts from AllegroAI, I don't think I can override the command), or to set the env var NVIDIA_VISIBLE_DEVICES to an empty string (which I did, but I can still see the message)
Effectively kubectl commands don't work from within the agent pod, I'll try to figure out why
OK, thanks a lot, I'll try to get the networking thing sorted (and then I am sure I'll have lots more many doubts 😂 )
What I still don't get, is how you would create different queues, targeting different nodes with different GPUs, and having them using the appropriate Cuda image.
Looking at the template, I don't understand how that's possible.
ah I see, I'll give it a try then
sure, give me a couple of minutes to make the changes
Yes, I add it to "default" queue (which is the one used in the config file for the k8 glue agent)
The workaround that works for me is:
clone the experiment that I run on my laptop in the newly cloned experiment, modify the hyperparameters and configurations to my need in user properties set "k8s-queue" to "cpu" (or the name of queue I want to use) enqueue the experiment to the same queue I just set...
When I do like that in the K8sGlue pod for the cpu queue I can see that it has been correctly picked up:
` No tasks in queue 54d3edb05a89462faaf51e1c878cf2c7
No tasks in Queues, sleeping fo...
now, I go to experiment, clone an experiment that I previously executed on my laptop. In the newly created experiment, I modify some parameter, and enqueue the experiment in the CPU queue.
Hi SuccessfulKoala55 I can confirm that the "id-like" queue created by ClearML
actually correspond to the id of queue "k8s_scheduler" (so it looks like that instead of submitting the experiment to the scheduler to be enqueued to the right queue), a new queue whose name corresponds to the id of the k8s_scheduler is created instead.
Hope this helps 🙂
Next week I can take some screenshots if you need them, ai just closed the laptop and will be off for a couple of days :))
and one more question, in the values, I also see the values for the default tokens:
` credentials:
apiserver:
# -- Set for apiserver_key field
accessKey: "5442F3443MJMORWZA3ZH"
# -- Set for apiserver_secret field
secretKey: "BxapIRo9ZINi8x25CRxz8Wdmr2pQjzuWVB4PNASZqCtTyWgWVQ"
tests:
# -- Set for tests_user_key field
accessKey: "ENP39EQM4SLACGD5FXB7"
# -- Set for tests_user_secret field
secretKey: "lPcm0imbcBZ8mwgO7tpadutiS3gnJD05x9j7a...
perfect, let me try 🙂 (thanks a lot for all the help!)
Hi folks, I think I found the issue, the documentation mention to set NVIDIA_VISIBLE_DEVICES to "", when in reality it should be "none" according to the code:
if Session.get_nvidia_visible_env() == 'none': # NVIDIA_VISIBLE_DEVICES set to none, marks cpu_only flag # active_gpus == False means no GPU reporting self._active_gpus = False