
Hi Jake, thanks for your answer!
So I just have a very simple file "project.py" with this content:
`from clearml import Task
task = Task.init(project_name='project-no-git', task_name='experiment-1')
import pandas as pd
print("OK")`
If I run `python project.py` from a folder that is not in a git repository, I can clone the task and enqueue it from the UI, and it runs in the agent with no problems.
If I copy the same file into a folder that is in a git repository, when I enqueue the ex...
Maybe I did something wrong...
the clearml.conf in the agent glue pod looks like:
`sdk {
    development {
        default_output_uri: "fileserver_url"
    }
}
agent {
    package_manager: {
        extra_index_url: ["extra_index_url"]
    }
}`
but when I removed output_uri from Task.init, the pickled model has path file:///Users/luca/path/to/pickle.file
I actually found out it was an indentation error 😅 and the credentials weren't picked up
So I set this to sdk.development.default_output_uri: <url to fileserver>
in the K8s Agent Glue pod, but in order for models to be uploaded, the DS
still has to set output_uri=True
in the Task.init()
command...
otherwise artifacts are only stored on my laptop, and in "models" I see a URI like file:///Users/luca/etc.etc./
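Just to make it concrete, this is roughly what the DS script ends up looking like with that flag (the project/task names here are just placeholders):
`from clearml import Task

# output_uri=True asks ClearML to upload models/artifacts to the configured
# default output destination, instead of only recording a local file:// path
task = Task.init(
    project_name='my-project',    # placeholder
    task_name='experiment-1',     # placeholder
    output_uri=True,
)`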
So I am not really sure what this does
The behaviour I'd like to achieve is that any artefact is automatically saved to an S3 bucket, ideally without the Data Scientist having to configure much on their side.
Right now, we are storing artefacts in the fileserver, and we have to make sure that we use output_uri=True in the Task.init call to have artefacts uploaded to ClearML fileserver.
What's the ideal setup to keep the boilerplate for DS code minimal?
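To make the two options concrete (the bucket name and prefix below are made up): either the DS passes an explicit destination, or the agent-side clearml.conf sets sdk.development.default_output_uri and the script only opts in with output_uri=True.
`from clearml import Task

# Option A: explicit destination in code (bucket/prefix are placeholders)
task = Task.init(
    project_name='my-project',
    task_name='experiment-1',
    output_uri='s3://my-artifacts-bucket/clearml',
)

# Option B: rely on sdk.development.default_output_uri configured on the
# agent/server side, and just opt in to uploading:
# task = Task.init(project_name='my-project', task_name='experiment-1',
#                  output_uri=True)`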
CostlyOstrich36 so I don't have to write the clearml.conf?
I would like to set things up so that a data scientist working on a project doesn't have to know about buckets and this sort of thing... Ideally the server and the agents are configured with a default bucket...
no, there's no task with a name of cpu or gpu... Where can I find the id of the queue to check?
2. what do you mean by initial log dumps, the very early rows when it's being deployed?
Anyway, sure I can send it to you, but I just turned off my laptop :) and won't be able to for a few days.
(though so far I am not quite managing to make it work even using the right hosts and ports)
now, I go to the experiments page and clone an experiment that I previously executed on my laptop. In the newly created experiment, I modify some parameter and enqueue the experiment in the CPU queue.
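The programmatic equivalent of that UI flow, as a rough sketch with placeholder IDs and names, would be something like:
`from clearml import Task

# clone an experiment that already ran locally (task_id is a placeholder)
original = Task.get_task(task_id='abc123')
cloned = Task.clone(source_task=original, name='experiment-1 (cloned)')

# tweak a hyperparameter on the clone (section/name are placeholders)
cloned.set_parameter('General/learning_rate', 0.01)

# send it to the CPU queue
Task.enqueue(cloned, queue_name='cpu')`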
Before any experiment enqueueing, these are the queues I have available
do I need something else in the clearml.conf?
Exactly that :) if I go in the queue tab, I see a new queue name (that I didn't create),
with a name like "4gh637aqetc"
OK, thanks a lot, I'll try to get the networking thing sorted (and then I am sure I'll have many more doubts 😂 )
Thanks, in DM I sent you the conf we use to deploy the agents.
the experiment is supposed to run in this queue, but then it hangs in a pending state
Hi Jake, I mean that when I create a token, I would like the users to see the right
hosts, so that they can just copy and paste when they do clearml-init
Hi Alon, thanks, I actually watched those videos. But they don't help with setting things up 🙂
From your explanation, I understand that Agents are indeed needed for ClearML to work.
also, if I clone an experiment on which I had to manually set the k8s-queue user property to run it on a queue, say cpu, and then enqueue it to a different queue, say gpu, the property is not updated, and the experiment is enqueued in a queue with a random hash-like name. I either have to delete the attribute, or set it to the right queue name, before enqueuing it, to have it run in the right queue
I have tried this several times and the behaviour is always the same. It looks like, when I modify some hyperparameter and enqueue the experiment to a queue, things don't work unless I have previously set the value of k8s-queue to the name of the queue that I want to use. If I don't modify the configuration (e.g. I abort, or reset the job and enqueue it again, or clone and enqueue it without modifying the hyperparameters), then everything works as expected.
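So for now the workaround boils down to something like this before enqueuing (the task ID and queue names are placeholders, and I'm assuming set_user_properties accepts a name/value dict for the hyphenated key):
`from clearml import Task

cloned = Task.get_task(task_id='abc123')  # placeholder ID of the cloned experiment

# overwrite the stale k8s-queue user property so the glue agent
# picks up the queue I actually want, before enqueuing
cloned.set_user_properties({'name': 'k8s-queue', 'value': 'gpu'})

Task.enqueue(cloned, queue_name='gpu')`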
Next week I can take some screenshots if you need them, I just closed the laptop and will be off for a couple of days :))
Not really 🙂
The files are clearly different, but if I understand correctly, is it enough to add
` storage {
    cache {
        # Defaults to system temp folder / cache
        default_base_dir: "~/.clearml/cache"
        # default_cache_manager_size: 100
    }
    direct_access: [
        # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
        # or cached, and any download request will return a di...
but I set up only the apiserver, fileserver and webserver hosts, and the access keys... the rest is what was produced by clearml-init
OK. In the pod spawned by the K8s Glue Agent, clearml.conf is the same as in the K8s Glue Agent itself
Makes absolute sense! Thanks a lot Martin, I thought it was being done by the backend!